Monday, April 27, 2009

Research Computing and Infrastructure Technology



The opening keynote at Bio-IT World Expo 2009 was given by Chris Dagdigian from BioTeam, titled Research Computing and Infrastructure Technology. Chris was introduced by Rudy Potenzone, Ph.D., WW Industry Technology Strategist for Pharmaceuticals, Microsoft Corporation, who noted in his remarks that "there will be more data generated in the next five years than in the history of mankind."

Chris is known for his 90-slides-in-30-minutes "Trends from the Trenches" presentations, but this was a somewhat less manic version of that talk. The first topic was virtualization, which he says is the lowest-hanging fruit in infrastructure. He talked about a West Coast campus that ran out of power in its data center and, in response, built a "virtual colocation service" that recovered facilities, lowered costs, and provided more flexible services to users.

He made an interesting observation: the data deluge is not looking quite as scary as it did even last fall, not because technology is coming to the rescue, but because people are realizing that there must be "data triage", or data management - we just can't keep it all. Furthermore, in the world of next-gen sequencing, he argues that infrastructure is not the gating element; the chemistry, reagent costs, and human factors are the bottlenecks to throughput.

In the last six months, BioTeam has put up its first 1 PB filesystem - he had a screen grab from a df command showing "1.1P". He really likes the "P". He noted that more and more customers are not backing up huge systems - one customer has 50 TB of Isilon storage for research use, and they don't back it up.

He talked a bit about cloud computing and storage, and several times noted how much he likes James Hamilton's blog. James is with Amazon, which Chris says is the cloud - all the other providers are several years behind. Chris noted that James has said that cloud storage providers can offer 4x geographically distributed storage for $0.80/GB/year, which he says is less than it costs any organization to store data in a single location, much less distributed. He said those kinds of economics are going to drive all data, even huge data, into the cloud. The problem that needs to be solved at the moment is that there is no good way to get large data (e.g., 1 TB/day) into the cloud, but he said Amazon is working on this and that it will be overcome as well.
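
To put those two figures in perspective, here is a rough back-of-the-envelope sketch in Python. It is my own arithmetic, not something from the talk: it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), a network link that is fully utilized around the clock with no protocol overhead, and the $0.80/GB/year price exactly as quoted above. The function names are just illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 1e12  # bytes, decimal units (assumption)
GB = 1e9   # bytes, decimal units (assumption)


def sustained_mbps(bytes_per_day: float) -> float:
    """Average link speed in Mbit/s needed to move this much data every day,
    assuming a perfectly utilized link with zero overhead."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


def annual_storage_cost(size_bytes: float, usd_per_gb_year: float = 0.80) -> float:
    """Yearly cost in USD at a flat per-GB price (the figure quoted in the talk)."""
    return size_bytes / GB * usd_per_gb_year


if __name__ == "__main__":
    print(f"1 TB/day needs about {sustained_mbps(1 * TB):.1f} Mbit/s sustained")
    print(f"50 TB costs about ${annual_storage_cost(50 * TB):,.0f} per year")
    print(f"1 PB costs about ${annual_storage_cost(1000 * TB):,.0f} per year")
```

Under those assumptions, 1 TB/day works out to a bit over 90 Mbit/s running flat out all day, every day, which gives a feel for why getting large data into the cloud is the sticking point.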

Looking out on the near horizon, Chris noted Google's recent release of videos of their 2004 data center technology, and asked: if that's what they were doing five years ago, imagine what they and Amazon are doing now? The economics and competition surrounding these huge facilities are driving incredible but secret innovation. Slowly these innovations are starting to leak out, which is a good thing for the rest of the field. One example is the rising operating temperature of systems, and the huge energy savings associated with every extra degree hotter the facilities can run. Pushed by big customers, Dell is now offering systems warrantied for operation at 94F, and Rackable offers systems that are supported at 104F.

Lastly, he said he thinks federated storage is on the horizon, and referenced the recently formed partnership between BioTeam, Cambridge Computer, and General Atomics to deliver GA's Nirvana storage platform.
