Thursday, November 20, 2008

Visualization Gets Touchy-Feely at SC08

SC has always had some of the coolest visualization displays on the planet, and this year continues that trend, with a big jump in interactive multi-touch systems.

This is the Lambda Table from the Electronic Visualization Laboratory (evl), "an interdisciplinary graduate research laboratory that combines art and computer science, specializing in advanced visualization and networking technologies."
One of the applications was created for the Minnesota Museum of Science to teach students how rain and water flow over terrain (apparently there is a widespread misconception that water flows south, instead of down). When you touched the screen, it rained on that portion of the landscape, and the water then flowed down the terrain. Another application on the device was a view of the rat brain cortex, which could be zoomed in and out to fairly high magnification.

There were also some of the more traditional techniques, where you manipulate the visualization with a handheld controller and glasses; in this case it is wireless...

and in this case it is a 3D joystick with feedback. The setup was built by the High-Performance Molecular Simulation Team with the Computational Systems Biology Research Group from RIKEN in Japan, and is looking at protein folding. It includes a display in the back showing a protein and a target drug molecule in 3D, and allows the user (wearing special 3D glasses) to move the drug molecule around and in and out of the larger protein. All the while, the joystick provides feedback on the interactions between the drug and the protein, giving resistance and bumps as the molecules collide.

Cyviz was showing a gorgeous large display with 3650 x 1050 resolution, generated by DLP projectors and made seamless by their special blending technology.
Next is a beautiful 64" prototype LCD monitor from Sharp in the San Diego Supercomputer Center booth, with 4096 x 2160 resolution.

There were a couple of other multi-touch tables, one from the University of Amsterdam

and another from RENCI. The RENCI folks said it took them about six months to put their hardware and software together.

Wednesday, November 19, 2008

The SC08 Green500 BoF

The SC08 Green500 BoF was led by Wu Feng from Virginia Tech. He started with a brief history for perspective, then launched into the issues surrounding the creation of the list, which were the focus of the BoF. There were brief presentations from several vendors, from analyst firm IDC and the publication HPCwire, and from a couple of users.

I hadn't really intended to go to this session but tagged along with Glen (ironically because I needed to find power for my laptop). I tried to listen to the conversation, which got pretty energetic at times, from the perspective of a CIO or other decision maker: what would they want from the Green500? Here are my thoughts:
  • A single, simple number will be hard to deliver. There is simply too great a range of uses for HPC, in terms of both system size and computational problem, for a single number to mean much of anything. CIOs want a single number, but they will also understand the complexity of the situation, and might even be skeptical of a single number as oversimplified. Several in the audience suggested creating segments or classes; I think this is appropriate, probably based on the size of the system in Tflops.
  • The number should not have anything to do with cost or "value". While the vendors may want that, the Green500 should not evaluate value - let IDC, HPCwire, and ultimately the buyers do that.
  • The major challenge is sorting out what is part of the data center facility and what is part of the system. I don't see an easy answer here. With the CIO hat on, I would say leave the facility out of it, as that's a separate issue with separate challenges and opportunities (i.e., tax incentives for green buildings, etc.). Just tell me how much electricity the system takes, and how much heat it rejects. If you want to try to improve your score by taking on some of the facility issues, I guess I'm okay with that, as I expect you will be able to improve your score, but at a greater cost (which again I will evaluate on my own).
  • No CIO will decide to spend five to eight figures based on a Green500 ranking alone. They will ask their people: what can the system do, and what will it cost to acquire and operate it? The main goal of the Green500 should be to help vendors and IT staff have a conversation about energy efficiency. It should also help lend credibility to the final purchasing recommendation that the CIO sees. The CIO should be ensuring that the system meets the compute needs of the organization in a cost-effective and environmentally responsible way.
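To make the segmentation idea concrete, here is a minimal Python sketch. The Green500 ranks systems by Linpack performance per watt (MFLOPS/W); the size-class boundaries and the example systems in the usage note are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical size classes by sustained Linpack Tflops: (upper bound, label).
# These boundaries are an illustration, not an official Green500 scheme.
SIZE_CLASSES = [(10.0, "small"), (100.0, "medium"), (float("inf"), "large")]


def mflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Energy efficiency as sustained MFLOPS per watt."""
    return (rmax_tflops * 1e6) / (power_kw * 1e3)


def size_class(rmax_tflops: float) -> str:
    """Return the label of the first class whose upper bound exceeds rmax."""
    for upper_bound, label in SIZE_CLASSES:
        if rmax_tflops < upper_bound:
            return label
    raise ValueError("rmax_tflops must be a finite number")


def rank_by_class(systems):
    """systems: iterable of (name, rmax_tflops, power_kw).

    Returns {class_label: [(name, mflops_per_watt), ...]}, most efficient first.
    """
    groups = defaultdict(list)
    for name, rmax, kw in systems:
        groups[size_class(rmax)].append((name, mflops_per_watt(rmax, kw)))
    return {label: sorted(entries, key=lambda e: e[1], reverse=True)
            for label, entries in groups.items()}
```

With four made-up systems, `rank_by_class([("A", 50, 200), ("B", 5, 40), ("C", 500, 2000), ("D", 80, 250)])` puts D (320 MFLOPS/W) ahead of A (250) in the medium class, while B and C are each ranked only against peers of similar size. Ranking within classes keeps a 5 Tflops departmental cluster from being compared directly with a 500 Tflops flagship.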
It was a very interesting discussion to listen to, and I'll be interested to see where things stand a year from now.

SC08 Data Lifecycle Management BoF

The Data Lifecycle Management: ILM in an HPC World BoF at SC08 Tuesday afternoon was an interesting discussion of the issues surrounding the storage, archival, and management of large data sets. Sponsored by Avetec's Data Intensive Computing Environment (DICE), it was led by Tracey Wilson of Computer Sciences Corporation and DICE, and Ralph McEldowney of the Air Force Research Laboratory Major Shared Resource Center.

Many in the audience, including folks from NASA and other large sites, had years of experience with this - they were past the initial challenge of simply getting the capacity in place, and were now struggling to figure out what data needs to be kept for how long, who owns it, how to find it again, and how to keep growing the service at something resembling an affordable cost. At the heart of the problem is the need for users to tag their stored data with metadata that describes such things as the owner, contents, projected size, expected lifespan, etc. Major HPC sites are finding that storage needs are eating into funding originally aimed at HPC capacity, potentially limiting the compute resources available for research.

They listed some of the more commonly used solutions today in this general space, although each has different levels of functionality when it comes to handling the metadata associated with DLM.

The group seemed to agree that user training was a major goal - users need to understand the critical importance of associating metadata with their storage. Some expressed frustration in getting agreement on this, however - there was discussion of how to provide incentives to drive the right behavior, as well as the suggestion of deleting any data that still has no associated metadata after a set period of time.
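A minimal Python sketch of the metadata tagging and the delete-if-untagged idea discussed above. The fields mirror the examples from the session (owner, contents, projected size, expected lifespan); the 90-day grace period and the action names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

GRACE_DAYS = 90  # hypothetical grace period before untagged data is deleted


@dataclass
class StoredItem:
    path: str
    created: date
    owner: Optional[str] = None                # who is responsible for the data
    contents: Optional[str] = None             # what the data is
    projected_size_tb: Optional[float] = None  # expected eventual size
    lifespan_days: Optional[int] = None        # how long it should be kept

    def is_tagged(self) -> bool:
        """Count an item as tagged only if owner, contents and lifespan are all set."""
        return None not in (self.owner, self.contents, self.lifespan_days)


def disposition(item: StoredItem, today: date) -> str:
    """Suggest an action for one item under the hypothetical policy."""
    age_days = (today - item.created).days
    if not item.is_tagged():
        # Untagged data gets a reminder first, then is deleted after the grace period.
        return "delete" if age_days > GRACE_DAYS else "notify"
    if age_days > item.lifespan_days:
        return "review-for-expiry"
    return "keep"
```

Untagged data that outlives the grace period is flagged for deletion, while tagged data past its declared lifespan is only flagged for review, which leaves room for the records-management and legal input discussed later in the session.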

One site provides disincentives through increased fees for inefficient use of storage - space that is requested but not utilized is charged at a higher rate, as is data that is retained beyond a certain timeframe.
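The fee structure described above can be sketched in a few lines of Python. The site's actual rates and thresholds weren't given, so every number below (base rate, surcharge multipliers, retention limit) is a hypothetical placeholder.

```python
# All rates and thresholds are hypothetical placeholders.
BASE_RATE = 100.0            # $ per TB-month for space that is actually used
IDLE_MULTIPLIER = 2.0        # requested-but-unused space costs 2x
RETENTION_LIMIT_MONTHS = 24  # data kept longer than this is surcharged
RETENTION_MULTIPLIER = 1.5


def monthly_charge(requested_tb: float, used_tb: float, months_retained: int) -> float:
    """Monthly storage fee with disincentives for idle allocations and long retention."""
    if not 0 <= used_tb <= requested_tb:
        raise ValueError("used space must be between 0 and the requested allocation")
    used_rate = BASE_RATE
    if months_retained > RETENTION_LIMIT_MONTHS:
        used_rate *= RETENTION_MULTIPLIER
    idle_tb = requested_tb - used_tb
    return used_tb * used_rate + idle_tb * BASE_RATE * IDLE_MULTIPLIER
```

Under these placeholder rates, a 10 TB allocation with 6 TB in use costs $1,400 a month, rising to $1,700 once the data has been kept past the retention limit; shrinking the allocation to what is actually used is the cheapest option, which is the behavior the fees are meant to encourage.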

There was general consensus that establishing and enforcing policy around data management was a must - the example of eDiscovery was brought up as justification for properly classifying data so that it could be located easily during the discovery phase of a civil or criminal legal case. Failure to be prepared for eDiscovery can cost as much as $15M for a single case. One person noted that policy was relatively useless if developed by IT - it should be driven by the business needs of the organization, i.e., the records management and legal needs, working in partnership with IT. Another person commented that there is a high likelihood of a tragedy of the commons in the effort to develop policy if it is not driven from the top down. Most researchers will be certain that there is no risk to themselves re: eDiscovery, so they have no personal motivation to participate.

SC08 Music Room

I swung by the SC08 Music Room yesterday, as I'm scheduled to play there today (Wed) from 4:00-4:45. I was fortunate to catch a pro, Darcie Deaville, who was there with sponsorship from enParallel. It's a great room, very nicely done, and Darcie sounded great playing a variety of tunes on guitar and fiddle. She had been planning to bring a hurdy-gurdy, but it didn't make it. She did, however, have two fiddles in tow, one of which she keeps in an open tuning (F?) and plays with drone strings to give a feel similar to a hurdy-gurdy.

From the SC08 Showfloor

I started my usual walk through the showfloor yesterday after the keynote to get the lay of the land and try to ID interesting talks at some of the lab booths. The Fermilab booth always has cool stuff; last year it was a cloud chamber where you could see the trails of particles. This year they have an interesting sculpture modeled on a dark matter detector. The actual unit is underground, but it is approximately the same size as this. So far they haven't detected much and are working to make the detection wafers larger.

Update 11/20/08 9:05: I should have noted they are looking for weakly interacting massive particles (WIMPs), and there is an interesting story over at Wired from yesterday reporting that another group of researchers on the same hunt seems to have found something interesting.

nVidia is getting a lot of attention for their "personal" supercomputer. The numbers look great, but one has to keep in mind that, as Glen points out, these are not general purpose HPC systems.

Rich Brueckner over at Sun (which seemed to be one of the busier booths) was in chaps and ready to ride, but with good reason - he had brought the Java Chopper to the show:

It's interesting to note in Rich's blog that he is looking to add a SunSpot to the bike...

These are very interesting little devices coming out of Sun Labs, the research arm of Sun. They have a variety of sensors and I/O options (accelerometers, light detectors, temperature sensors, LEDs, push buttons and general I/O pins), they run Java, they automatically form wireless mesh networks to communicate, and they can run for months on a single battery charge. A Java chopper with a Java-embedded SunSpot cranks the geek meter up pretty high...

which then quickly thwacks back to the zero pin across the aisle at the Microsoft booth. They were working a golf theme, obviously, and had a video golf course setup:

My guess is that this attracted mostly sales people from other vendors, as opposed to HPC customers, who don't as a group strike me as major golfers. I don't think Microsoft marketing has quite the right line on the SC crowd... my condolences to the MS booth staffers who have to toe the line. To their credit, they were doing the best they could with what they had to work with.

Tuesday, November 18, 2008

Live Blog of the SC08 Keynote - Michael Dell

Here we go with the Welcome and Keynote for SC08. Michael Dell is the keynote speaker this year.

Patricia Teller from UTexas is the General Chair of the conference, and she welcomed the audience with the news that the conference is setting a number of records this year, including the number of exhibits, the number of education attendees, and a total of more than 10,000 attendees.

This is the 20th anniversary of SC, and a video retrospective included a clip of the Seymour Cray keynote, where he joked that his 10" slide rule was the leading edge of computing power.

9:07 - Nine organizations have been involved in sponsoring all 21 SC conferences: Cray, HPCwire, IBM, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, NASA, NEC, Numerical Algorithms Group, and Sun Microsystems.

9:10 - Michael Dell is introduced.

Michael says it takes 20 Pflops to model the human brain, which runs on 20 watts(!). The Japanese are spending $1.8B on a 10 Pflop system.

9:18 - Dell is announcing today that they are extending their partnership with nVidia to add Tesla cards to their Precision workstations, which will deliver one Tflop to the desktop.

9:19 - 489 of the Top500 supercomputers are based on the x86 architecture.

9:21 - Video of the vis wall at TACC, which is powered by Dell gaming systems. They are using it to support a cancer researcher who noted that we have reached a point where we can generate 250 TB of data for a single cell. The picture is part of a rotating 3D model from the cancer research.

9:26 - Technology advancements over the last five years - you get more Tflops with 90% fewer systems. We need to build petascale software to take advantage of petascale hardware.

9:30 - 70% of HPC budgets go to staffing and facilities.

9:31 - Facebook utilizes 10,000 Dell servers.

9:36 - Brief video - review of the various generations, baby boomers, X, Y... the next generation is the regeneration.

9:37 - Q & A - Michael clarifies that his reference to the brain at the beginning was not to suggest that we need to recreate the brain, but simply to compare current HPC to the brain. "There are enormous opportunities to improve the man-machine interface."

9:43 - Question about his vision for education. He responds that public education needs to be delivering students familiar and at ease with 21st century technology - "it's as important as reading, writing, and arithmetic."

9:46 - Question about alternative energy. A typical desktop uses $120 of energy a year; the new Dell system uses $6 a year. This isn't a source of new energy, but it does dramatically reduce demand. Also, Dell is one of the first companies to become carbon neutral - you'll see companies begin to compete in this area.
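Some quick arithmetic on the quoted figures, as a Python sketch. The $0.10/kWh electricity price and the 1,000-desktop fleet are hypothetical assumptions, used only to translate dollars into savings at scale and into average power draw.

```python
HOURS_PER_YEAR = 24 * 365


def fraction_saved(old_cost: float, new_cost: float) -> float:
    """Share of the annual energy bill eliminated by the new system."""
    return (old_cost - new_cost) / old_cost


def fleet_savings(old_cost: float, new_cost: float, machines: int) -> float:
    """Annual dollars saved across a fleet of identical desktops."""
    return (old_cost - new_cost) * machines


def implied_average_watts(annual_cost: float, usd_per_kwh: float) -> float:
    """Average continuous draw implied by an annual energy bill."""
    kwh_per_year = annual_cost / usd_per_kwh
    return kwh_per_year * 1000 / HOURS_PER_YEAR
```

The quoted numbers work out to a 95% reduction, or $114,000 a year for a hypothetical fleet of 1,000 desktops. At the assumed $0.10/kWh, $120 a year corresponds to an average draw of about 137 W and $6 a year to under 7 W.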

9:50 - Another energy question - how to push component vendors to improve? Dell's suppliers report back on their carbon emissions, it's one of the things that Dell evaluates them on.

9:51 - Keynote is done.

Monday, November 17, 2008

Best Conference Gear Award

My neighbor in the SAM-QFS BoF session gets the award for best conference gear. Finding power in these hotel conference rooms is a constant battle - this little unit has three plugs and two USB jacks. Outstanding!

Glen is Blogging!

I guess Glen (that's his left hand) got sick of my nagging (or was it encouragement?) and is also blogging over at Exploding Frog. He has a slightly different perspective on events, as he actually knows how to use HPC instead of just spewing acronyms and jargon, so make sure you add him to your RSS feed as well.

SAM-QFS BoF Session: Great for Backup Too

Harriet Coverston, Sun Distinguished Engineer and architect of Sun's distributed SAN file system, led a Birds of a Feather session with users, Sun staff, and third party vendors on Sun Storage Archive Manager and QFS. It was interesting to hear from current users at Clemson, the Arctic Region Supercomputer Center, Mississippi State, and TACC, who are all putting these solutions to substantial use.

Clemson is using SAM-QFS not only for home directories and moving large data to their cluster (which runs Lustre internally), but also as a substantial addition to their backup environment. The SAM-QFS approach of creating copies of files as they come into the system makes it much more scalable for very large storage. It also makes for much faster recoveries - in one test they were able to restore 9 million files in nine hours, while a test restore with their traditional backup system was still running after 24 hours. This is reminiscent of conversations we've had with the Friedrich Miescher Institute for Biomedical Research, who are also using SAM-QFS as an add-on to traditional backup services.
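The restore result translates into a throughput figure that is easier to compare across systems. A minimal Python sketch; the extrapolation to a larger file count assumes the rate stays constant at scale, which the test itself did not establish.

```python
SECONDS_PER_HOUR = 3600


def files_per_second(files: int, hours: float) -> float:
    """Average restore throughput over a test run."""
    return files / (hours * SECONDS_PER_HOUR)


def hours_to_restore(files: int, rate_fps: float) -> float:
    """Time to restore a file count at a fixed rate (assumes linear scaling)."""
    return files / rate_fps / SECONDS_PER_HOUR


rate = files_per_second(9_000_000, 9)  # Clemson's test: 9 million files in nine hours
```

That is about 278 files per second, or a million files an hour; at the same rate, a hypothetical 100-million-file restore would take about 100 hours.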

The fact that numerous users we have talked to have come to utilize SAM-QFS to solve backup challenges in their environments speaks well to the software's data protection approach.

Jim Pepin: Arpanet < iPhone

Jim Pepin, CTO at Clemson, had some thought-provoking points about the challenges facing research environments. Similar to some of Ray Kurzweil's analysis of the rates of change for technology, Jim compared technology today to his start in the field in the 1970s, and noted that there have been five to seven orders of magnitude of growth in storage, compute and networking capacity over that time. He noted that his iPhone has more compute and storage capacity than the entire Arpanet did in the 1970s.
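One way to relate "five to seven orders of magnitude" to Moore's-law-style growth is to convert it into an implied doubling time. A small Python sketch; the 35-year span is an assumption, since the post only says Jim started in the 1970s.

```python
import math


def doubling_time_years(growth_factor: float, span_years: float) -> float:
    """Years per doubling implied by a total growth factor over a span."""
    return span_years / math.log2(growth_factor)


SPAN_YEARS = 35  # assumption: roughly mid-1970s to 2008
for exponent in (5, 6, 7):
    years = doubling_time_years(10 ** exponent, SPAN_YEARS)
    print(f"10^{exponent} growth over {SPAN_YEARS} years -> doubling every {years:.1f} years")
```

Under that assumption, the range works out to a doubling roughly every 1.5 to 2.1 years, in line with the familiar Moore's-law cadence.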

He also got a round of unsolicited applause from the crowd this morning for pointing out the value of SAM-QFS and chiding Sun for spending too much time talking about ZFS and not giving more attention to the widely used software for large-scale archival storage.

Jim said that storage is in many cases moving closer to the user, with GBs of storage in laptops or in pockets on phones and USB drives. That conflicts with organizational drives to do a better job of organizing and protecting data, and vendor pushes to move data and systems into the cloud. He noted that the speed of light is not a suggestion, it's a law - we can't change it to make clouds work better. He went on to suggest that campus IT has come to think of itself as plumbers and not innovators, and that trend needs to reverse if we are going to successfully address these challenges. He listed several efforts under way at Clemson, including the broadening of IT support beyond the traditional Helpdesk out to power users and user groups.

Now It's Official: The Grid RIP

It's official now, Arnie put it on the big screen. I was too busy helping with his demo to take notes on the talk, but he's going to send me the slides, so I'll update this post with more later.

Where's "Head" Bubba?

Glen and I have been wondering where "Head" Bubba is... he has been at recent Sun HPC Consortium meetings, and as much as we're struck by his presence, we're even more struck by his absence this year. Enough so that we felt compelled to learn more.

To give you some sense of why this is interesting, this gentleman is the VP of IT Research and Development at Credit Suisse First Boston, bears more than a passing resemblance to one of the greatest singers of all time, and has a badge (and, it turns out, a legal name) that reads "Head" Bubba.

Turns out the more you look, the more interesting it gets. I hope his absence from this year's conference isn't the result of the downturn in the economy, or worse the impact of seven barrels of Jack Daniels. As someone who has kept kegs in the basement for the last 20 years, I know first hand that surviving large quantities of good beverages in easy reach is all about pacing. Plus I'm a bourbon fan (Basil Hayden's), so I'll have to ask him more about his experiences with the great Tennessee whiskey next time I see him.

Update 11/20/08 16:44 - He's Here After All - I ran into HB this afternoon outside one of the conference sessions, introduced myself and had a brief conversation with him and one of his colleagues. "Head" doesn't drink, but gives away the bottles of Jack to people with creative technology ideas - he mentioned a couple of recent ones (individuals at Mellanox and I think Stanford), which I think is really cool. He suggested that maybe I could come up with an idea and get a bottle, which I said I thought was unlikely. I mentioned that I am a homebrewer, and his colleague suggested that perhaps I could improve on the venerable VAXtap, and I said I thought that was more in my league. I'll have to keep my eyes peeled for an old half rack coming out of the data center.

Update 7/1/11 - HB and JD - I haven't been back to SC since 2008, however I got email from HB this morning, looks like I've qualified for a bottle from his latest barrel! I haven't made any progress on the VAXtap, inspiring as the recent New York Times article about the surge in popularity of home brewing might be.

Sunday, November 16, 2008

Tour of Ranger - Loud and Windy

A tour of the Ranger cluster at TACC, the largest public research cluster in the world at 579 Tflops, is not your typical data center tour. The system took $30M and two and a half years to implement, with total costs of $60M over four years. Two sides of the data center have glass walls, making for a nice showcase.

Before going in, Glen gives us a tour of one of the blades. The cluster has almost 4000 of these, for a total of 15,744 processors, 62,976 cores, and 123 TB of memory.
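The Ranger totals hang together, as a quick Python sketch shows. The blade count is derived from 15,744 processors divided by four; the per-blade configuration (four quad-core sockets and 32 GB per blade) and the 2.3 GHz clock with 4 double-precision flops per cycle are assumptions not stated in this post, used here because they reproduce its totals, including the 579 Tflops peak figure above.

```python
BLADES = 3936              # "almost 4000"; derived as 15,744 processors / 4
SOCKETS_PER_BLADE = 4      # assumption
CORES_PER_SOCKET = 4       # assumption: quad-core processors
GB_PER_BLADE = 32          # assumption
CLOCK_HZ = 2.3e9           # assumption
FLOPS_PER_CYCLE = 4        # assumption: double-precision flops per core per cycle

processors = BLADES * SOCKETS_PER_BLADE
cores = processors * CORES_PER_SOCKET
memory_tb = BLADES * GB_PER_BLADE / 1024   # binary units: GiB -> TiB
peak_tflops = cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12

print(processors, cores, round(memory_tb), round(peak_tflops, 1))  # 15744 62976 123 579.4
```

Note that the 123 TB memory figure matches only if the per-blade gigabytes are read as binary units (125,952 GB / 1024).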

When you enter the room, the immediate sensation is of an intense environment. It's very loud, and in the rows, very windy. The APC chillers I'm in front of are on each side of the blade racks, returning cooled air from the enclosed hot aisle. Unfortunately I don't have enough hair to really give you a true sense of it.

Here we see six SunBlade 6000 chassis in two racks, each with twelve blades. Storage racks built from Sun Thumpers, providing 1.7 PB of capacity and running the Lustre filesystem, are at the ends of each aisle. Data can only stay on the system for 30 days, as they are creating 5-20 TB/day.
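The 30-day purge bounds how much scratch data can pile up. A minimal Python sketch of the steady-state arithmetic, assuming a constant daily ingest rate and decimal units (1 PB = 1,000 TB):

```python
PURGE_DAYS = 30
CAPACITY_TB = 1700  # 1.7 PB, decimal units


def steady_state_tb(ingest_tb_per_day: float, purge_days: int = PURGE_DAYS) -> float:
    """With a fixed ingest rate, each day's data lives purge_days before removal."""
    return ingest_tb_per_day * purge_days


def days_to_fill(ingest_tb_per_day: float, capacity_tb: float = CAPACITY_TB) -> float:
    """How long the filesystem would last with no purge at all."""
    return capacity_tb / ingest_tb_per_day


for rate in (5, 20):
    resident = steady_state_tb(rate)
    print(f"{rate} TB/day -> {resident:.0f} TB resident ({resident / CAPACITY_TB:.0%} of capacity)")
```

The purge keeps roughly 150-600 TB resident, well under capacity; without it, the filesystem would fill in about 85 days at the high end of the ingest range.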

These are power distribution units (PDUs) for the cluster, which draws 2.4 MW at peak load, or enough power for 2,400 typical residential homes. At $0.06/kWh, the annual power bill is ~$1M. There are no UPSes or generators for the system, though they are planning to add UPSes for the storage and network fabric. Also, notice the floor vents - there is room-based HVAC in addition to the in-row systems to manage humidity and the larger environment.

The cluster itself takes 2000 sf, but another 1500 sf is needed for PDUs and chillers. The room overall is 6000 sf.
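The power figures can be cross-checked with a short Python sketch, assuming 24x7 operation at a constant average load. Running flat out at 2.4 MW all year at $0.06/kWh would cost about $1.26M, so the ~$1M figure implies an average draw of roughly 1.9 MW; the density lines divide the peak draw across the floor areas given above.

```python
HOURS_PER_YEAR = 24 * 365
USD_PER_KWH = 0.06


def annual_cost_usd(avg_kw: float, usd_per_kwh: float = USD_PER_KWH) -> float:
    """Energy cost of a constant average load running all year."""
    return avg_kw * HOURS_PER_YEAR * usd_per_kwh


def implied_avg_kw(annual_usd: float, usd_per_kwh: float = USD_PER_KWH) -> float:
    """Average load implied by an annual energy bill."""
    return annual_usd / (HOURS_PER_YEAR * usd_per_kwh)


def watts_per_sq_ft(kw: float, sq_ft: float) -> float:
    """Power density over a floor area."""
    return kw * 1000 / sq_ft


peak_bill = annual_cost_usd(2400)                  # upper bound: peak load all year
avg_kw = implied_avg_kw(1_000_000)                 # load implied by a ~$1M bill
cluster_density = watts_per_sq_ft(2400, 2000)      # cluster footprint only
with_support = watts_per_sq_ft(2400, 2000 + 1500)  # plus PDUs and chillers
```

That is about 1,200 W per square foot over the cluster footprint alone, or roughly 690 W per square foot once the PDU and chiller space is included.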

The enclosed hot aisles allow for much greater efficiency in cooling the tremendous heat load. The fire suppression is water sprinklers (dry pipe pre-action); however, they had to add more smoke sensors and alarms to deal with the enclosed aisles, as there was concern that someone working inside a closed aisle wouldn't hear or see an alarm.

This is one of two massive Magnum InfiniBand switches, the world's largest such device, with 3456 non-blocking ports. It is so dense that engineering the cabling was a major challenge.

Ranger supports over 1500 users and 400 research projects, and has handled more than 300,000 jobs for a total of some 220,000,000 CPU hours. There are larger clusters out there, but Ranger gets kudos for its density, and for putting together a very highly performing system with a considerably smaller budget than the DOE labs.

Clemson has Good Experience with Sun Cluster Install

James Leylek, PhD, Exec Director of the Clemson Computer Center for Mobility Systems (CU-CCMS), spoke yesterday at the Sun HPC Consortium Conference about their experiences installing and testing a SunBlade 6000 cluster. The system has 3,440 processing cores: 31 SunBlade 6000 chassis, 10 blades/chassis, with two Intel quad-core CPUs and 32 GB of memory per blade. They wanted to do stringent acceptance testing on the cluster - install the system and then run it at full peak for 72 hours straight. They were told it couldn't be done. The installation included putting together the entire compute grid (19 miles of cabling) in three days. The testing began with individual blades, then chassis, then portions of the cluster, and finally the entire cluster. The tests ran successfully for over 130 hours, and the entire project was completed in 16 days.

Large Scale Visualization @ TACC

Kelly Gaither, Associate Director of Data and Informational Analysis, UTexas Austin, gave an interesting presentation yesterday on the visualization components of the TACC Ranger cluster, which we'll be touring later today. Known as Spur, it consists of eight Sun servers configured with high-end graphics capabilities, providing 128 cores, 1 TB of aggregate memory, and 32 GPUs. It is integrated into the fabric of the Ranger cluster. Since the system went into production in October, it has had 38 users with over 120 hours/week of interactive usage, including one user who produced 43 TB of analysis output from a single run. Yeesh.

Interesting History for the Austin Airport Hilton

Glen discovered some interesting info on the history of the Hilton Austin Airport, where the Sun HPC Consortium Conference is being held. The building was originally the headquarters for Bergstrom Air Force Base, and served as a strategic air command center during the Vietnam War, the Persian Gulf War and Desert Storm. It was rescued from demolition and converted to a hotel and convention center about ten years ago. It's one of the better conference hotels I've been in. If you follow the link, you'll need to click on the correct link in the left nav bar to get to the page.

Update Sun 11/16 21:44: The bartender says there used to be seven floors beneath the current lowest floor, and it was indeed one of three places in the US the president would have been brought during a crisis. Three or four of the floors have been filled in; however, the rest had to be left because of piping/ductwork/infrastructure. The area is currently closed and not in use by the hotel. She had a handout of the old building, which had an open courtyard in the center. The current atrium was created during the renovation.

Update2 Monday 11/17 9:33: I thought I'd see if there was any chance of getting a tour of the lower floors. I talked to a supervisor who was in the military and familiar with the building when it was in military use. The front door was originally on the second floor, and there was a ramp that went up to that level. Also, despite his military clearances, he hasn't been able to see the lower floors. Two of the floors were used by the hotel for a while; however, the City of Austin, which owns the building, had them move out of one of them. Oh well.

Andy Bechtolsheim: Flash Is Here, But Tape's Not Dead

Andy Bechtolsheim, cofounder and Chief Geek at Sun, spoke about Sun's HPC Storage Roadmap this morning. Andy's presentations go at about one slide every ten seconds, so there was a lot of info, most of which I can't talk about publicly in detail. Generally speaking, he argues that compute is becoming I/O bound - CPU clock speed increases are gaining less and less because it's getting harder and harder to get data to the processors. Disk capacity has grown rapidly, but I/O has remained relatively flat. The answer to this problem is flash memory - performance is not as good as traditional memory, but still substantially faster than disk. The cost is coming down at ~50% a year, and is now at a point where storage systems can be built on this technology. For example, an SSD delivers 8000 IOPS compared to 180 IOPS for a 2.5" HDD.
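Two quick calculations behind Andy's argument, as a Python sketch: how many 2.5" drives it takes to match one SSD's IOPS, and what a 50%-a-year price decline does over a few years. The drive-count comparison deliberately ignores capacity, RAID overhead and workload mix, and the starting price of 100 in the usage note is a hypothetical index rather than a real price.

```python
import math

SSD_IOPS = 8000
HDD_IOPS = 180


def drives_to_match(target_iops: float, per_drive_iops: float) -> int:
    """Smallest number of drives whose combined IOPS meets the target."""
    return math.ceil(target_iops / per_drive_iops)


def projected_price(price_now: float, years: int, annual_decline: float = 0.5) -> float:
    """Price after compounding a fixed fractional decline each year."""
    return price_now * (1 - annual_decline) ** years
```

One SSD's worth of IOPS takes 45 of those hard drives, and at a 50% annual decline a price index of 100 today falls to 12.5 in three years.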

Andy went on to note that Sun is building a line of storage based on three different approaches (SSD, PCIe, and DIMMs) that will provide very interesting options for very high performance storage. An early version of a storage system based on flash DIMMs will be coming out in a couple of days at SC08. The system is 1U with roughly half the density and 840x the performance of traditional disk. It also caught my eye that the recently announced 7000 system gets 192 3.5" disks into a 42U rack with only 7 kW of power.

This gets more interesting in light of Sun's recent open storage announcement called Amber Road, which is aimed at taking advantage of ZFS to pool this flash based storage with lower cost and lower performance traditional disk into a hybrid system to provide better read performance and lower power utilization for the same cost.
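The hybrid idea, a small fast flash tier absorbing most reads in front of large, cheap disk, can be illustrated with a toy Python model. This is plain LRU caching, not ZFS's actual ARC/L2ARC algorithms, and the latencies in the usage note are hypothetical.

```python
from collections import OrderedDict


class FlashReadCache:
    """Toy model: an LRU flash tier holding recently read blocks in front of disk."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block ids, ordered least to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, block_id) -> str:
        """Return which tier served the read, updating the cache."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return "flash"
        self.misses += 1
        self.blocks[block_id] = None  # populate flash after the disk read
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return "disk"


def effective_latency_ms(hit_ratio: float, flash_ms: float, disk_ms: float) -> float:
    """Average read latency for a given flash hit ratio."""
    return hit_ratio * flash_ms + (1 - hit_ratio) * disk_ms
```

With hypothetical latencies of 0.1 ms for flash and 8 ms for disk, a 90% flash hit ratio brings the average read down to about 0.89 ms. Most of the benefit comes from the hit ratio, which is why even a modest amount of flash can pay off when the working set is small relative to total capacity.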

Andy was asked by Jim Pepin from Clemson what the implications are for tape. Andy noted that while flash and disk are getting cheaper, so is tape. Tape also has other advantages, like very low power consumption, and "won't go away for the next ten years at least." When asked about MAID, he noted that ZFS will include features to improve the power consumption of traditional disks, but that MAID generally suffers from slow spin-up and read times, as well as the wear and tear of shutting down and starting up drives.

The Grid is Dead

Glen and I had dinner with Arnie Miles, Senior Systems Architect for Advanced Research Computing and Adjunct Assistant Professor of Computer Science at Georgetown. Arnie is working on next-generation middleware called Thebes, a collaborative effort supported by Sun to create a secure attribute-based infrastructure for distributed computational environments. Arnie is looking for development assistance on the project, and was interested to find that Glen is a Torque developer who could potentially help with APIs to that scheduler. Glen noted that it might be a good opportunity for a collaboration with the Computer Science department at UMaine. Arnie was pretty emphatic that the term "grid" is dead, and is looking for a replacement - my suggestions were "condiment" or maybe "fascia". Arnie offered to come to JAX to present on this work, and talk about collaborating with us.

Hitting the Big Time in Austin

We had a great Texas barbecue for dinner last night out at a ranch on the southwest side of town. Leeann Atherton and her band served up some great traditional country and western, and were good enough to let hacks from the crowd join in (Merle Haggard's "Sing Me Back Home" seemed like a good fit).

I had breakfast yesterday with Josh Simons, a Distinguished Engineer in the Sun Systems Group, and also an amateur photographer. He's posted more pics over at his Sun blog.