Monday, October 11, 2010

50 Ideas to Change Science: MPD

Source: MPD
Jackson's own Mouse Phenome Database has made New Scientist's list of 50 Ideas to Change Science. From the article:
"If we want truly to understand the living world, the genome won't do. We need to get to grips with the "phenome": the sum total of all traits, from genes to behaviour, that make up a living thing. [...] That complexity perhaps explains why there is as yet no "human phenome project", though such a thing was first mooted in 2003. But smaller-scale projects such as the Mouse Phenome Database are now springing up. From personalised medicine to our understanding of evolution, science will be the beneficiary."
MPD enables searches across 2208 physiological and behavioral measurements, as well as by strain, project, protocol, intervention (i.e., drugs, chemicals, diets, etc.), testing apparatus, and other criteria, and is used by close to 7000 different visitors per month.

Cars That Drive Themselves Pass the 1000 Mile Mark

Source: NY Times
Not a new story per se, but interesting in that Google has been testing autonomous cars in traffic, and they've gone 1000+ miles without human intervention (and 140,000 miles with only occasional intervention). The NY Times story suggests that autonomous vehicles have the potential to transform society as dramatically as the Internet. It would certainly fix the texting-while-driving issue once and for all. The programming is sophisticated enough at this point to offer driving styles ranging from cautious to aggressive. It's fascinating that Google is obviously putting non-trivial resources into this research... makes you wonder what else they are working on.

UPDATE: Deepak Singh has an interesting post on this kind of speculative research.

Wednesday, September 1, 2010

The Cancer Breathalyzer? On Your Cellphone?

A friend forwarded a link to a very interesting study in the British Journal of Cancer that shows the potential for developing a breath test that can identify volatile organic compounds associated with four different kinds of cancer (breast, colon, lung, and prostate). Peng et al. utilized a custom nanosensor "array of cross-reactive sensors based on organically functionalised gold nanoparticles." It would be fascinating if this kind of array technology could be combined with the X Prize initiative to create an artificially intelligent physician on a smartphone.

Tuesday, August 3, 2010

Next-Gen HDR Photography: Down to the Pixel

I've been mucking with high-dynamic range photography for a year or so, and ever since I was introduced to the technology, it seemed clear to me that it was only a matter of time before the ability to adjust for differing light intensities moved from the entire sensor (the current HDR process) to individual sensor pixels. Canon has recently submitted a patent application for just such an approach... while we're still years away from such a device, it's interesting to see movement in that direction.

Multitouch Table Taken to the Next Level

Very interesting demo of a multitouch interactive table that includes magnetically controlled widgets. While there have been tables that allow physical widgets such as knobs to provide input to the table, this setup gives the table the ability to control physical objects. While fascinating to watch, I'm having some trouble thinking up a good application for this.

Friday, July 30, 2010

The Microbiome: You Are Not Alone In There

In the age-old dispute of nature vs nurture, the study of the microbiome has a lot to offer to the nurture crowd. As noted in Science, a recent paper in Nature looked at the genomes of the viruses (viromes) living in the digestive tracts of human twins, and found remarkably little similarity between twins: their viromes were no more alike than those of any two random individuals. This is somewhat surprising because earlier work looking at the bacterial communities in the gut showed much greater similarity between individuals who are related.

It feels like we may be on the cusp of a change in how we think about ourselves, at least with respect to our health. As we improve our ability to understand the complex relationships among the huge number of microbes we have for roommates, and how our external environment and diet affect these communities, it seems likely that we will uncover major implications for our health and wellness. The spiritual world has long nurtured the idea of a deep and powerful connection between ourselves and the world around us, but perhaps that connection is much closer and more personal than we ever thought. As noted in an NY Times editorial related to this paper:
"We are not just the expression of an individual human genome. We are, as Dr. Gordon writes, “a genetic landscape,” a collective of genomes of hundreds of different species all working together — in ways that leave our minds mysteriously free to focus on getting our bodies to the office and wondering what’s for lunch."  

Costs of Solar vs Nuclear Power: Fight!

Very interesting news from a Duke University study looking at the cost trends of solar and nuclear energy, at least in North Carolina. As one would expect, the cost of solar has been falling, but what's surprising is that the cost of nuclear is climbing at the rate they indicate. This is attributed to the "nuclear renaissance" and the associated redesign of the facilities, for which projections continue to climb as the planning gets into the specifics. Based on this work, it would seem unlikely that many more nuclear plants will be constructed.

The bad news for those of us at higher latitudes is that the solar photovoltaic potential is considerably lower here. But then we've got quite a few wind power initiatives in the works. I'd love to see the wind power costs overlaid on this chart.

8/3/10 - UPDATE: Stanford researchers announce a new method for harvesting both light and heat energy from the sun, potentially doubling power output. This could be the nail in the coffin of nuclear if it pans out.

Wednesday, July 7, 2010

The Mathematics of Genome Sequencing

The mathematics magazine Plus has an interesting, accessible article that looks at the mathematics and programming approaches to genome sequencing. My suspicion is that most people assume sequencing is perfectly accurate, but once you get into the weeds you realize it's messy in there.

Tuesday, June 15, 2010

HDR and Microscopy

I've been having a blast over the last year or so getting back into photography after about a 3 decade lapse (and major advances in technology). The thing that really got me excited was something called high dynamic range or HDR photography.

One of the basic rules in photography has been that you should have the shot's light source (in particular the sun) at your back, otherwise your exposure might be funky. HDR involves taking multiple images at different exposures of the same subject so that you can get the very bright and very dark areas at appropriate exposures. The human eye and brain have an amazing ability to handle these tremendous differences in a particular view, and to my eye HDR gives the photographer some tools to get closer to what we perceive (and to also take some artistic license with the images).  I really enjoy the ability to shoot directly into the sun.
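To make the "multiple exposures" idea concrete, here is a minimal sketch of one way bracketed shots can be blended. It is a stripped-down form of exposure fusion that weights each pixel by how close it sits to mid-gray; it is not how Photomatix Pro or any particular camera does it. The function names, the `sigma` parameter, and the toy pixel values are all assumptions made up for illustration, and the images are plain grayscale lists scaled to 0.0-1.0.

```python
import math


def well_exposedness(value, sigma=0.2):
    """Weight a pixel (0.0-1.0) by how close it is to mid-gray (0.5)."""
    return math.exp(-((value - 0.5) ** 2) / (2 * sigma ** 2))


def fuse_exposures(exposures, sigma=0.2):
    """Blend same-sized grayscale images pixel by pixel.

    exposures: list of images, each a list of rows of floats in [0, 1].
    Pixels that are nearly black or blown out in one shot get little
    weight, so the well-exposed shot dominates at that location.
    """
    height = len(exposures[0])
    width = len(exposures[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [img[y][x] for img in exposures]
            weights = [well_exposedness(v, sigma) for v in values]
            total = sum(weights)  # always > 0 because exp() is positive
            row.append(sum(w * v for w, v in zip(weights, values)) / total)
        fused.append(row)
    return fused


# Toy 1x2 "scene": a shadow pixel and a bright-sky pixel, shot three times.
dark_shot = [[0.02, 0.45]]
mid_shot = [[0.30, 0.95]]
bright_shot = [[0.55, 1.00]]
fused = fuse_exposures([dark_shot, mid_shot, bright_shot])
print(fused)
```

In the toy scene the shadow pixel ends up drawn mostly from the bright shot and the sky pixel mostly from the dark shot, which is the whole point of bracketing. Real tools add alignment, tone mapping, and multi-scale blending on top of this.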

Even more interesting is the potential to apply this technique to scientific imaging. We've had some discussions about improving surgery images, and I spent some time brainstorming with our microscopy folks, one of whom forwarded along this paper which covers very nicely how HDR can enhance bright-field microscopy. Interestingly enough, they also use the same HDR software I've been working with (Photomatix Pro). It's really exciting to see someone tackling this.

Images C and D are HDR.
Image: Joerg Piper, Bad Bertrich, Germany, 2010

Wednesday, June 9, 2010

New Study Shows Genomic Counseling Results in Lifestyle Changes

While there has been a lot of attention paid to the incredible pace of advancement in sequencing technology, there is growing discussion around the medical value of the data. Knowing that you carry a certain allele for disease risk may be interesting, but what do you do with that data? Do you change your lifestyle? Do you begin a proactive course of medication? What is appropriate, and who helps you understand the data and the potentially difficult choices it creates? There has even been concern that individuals, upon finding out they are at higher risk for a disease, would "give up", resulting in worse health.

A recent study by the Coriell Institute finds that:
"People who find out they have high genetic risk for cardiovascular disease are more likely to change their diet and exercise patterns than are those who learn they have a high risk from family history, according to preliminary research."
It's interesting to see that genetic testing appears to have the potential to be more motivating for patients than traditional sources of similar information, though it's hard to know whether it's a function of the novelty of the data or something more lasting. It's also important to note that there appears to have been high-quality counseling associated with this study, which is widely seen as critical to appropriate use of genetic data.

Thursday, June 3, 2010

A New Culture of Scientific Communication

It's great to see Josh Sommer's excellent talk at the Sage Congress getting more attention. Josh is a 22-year-old college dropout who was diagnosed with a very rare brain tumor called chordoma as a freshman at Duke. When he discovered that the average life expectancy for this disease is seven years and only 20-30% of patients are cured, he immediately began reading the research. He was disturbed to discover how little research was going on related to chordoma and began working in the lab of a Duke researcher, but realized that progress was still way too slow. He then began to identify the things that were slowing down the research, and eventually founded the Chordoma Foundation in 2007 to begin attacking these barriers.

The first barrier they decided to tackle was the relatively limited flow of information between chordoma researchers. As the foundation led workshops and brought researchers together, they saw a substantial increase in the number of questions around the disease being answered, and eventually built a research roadmap. He makes a compelling case for the need to change from a publishing model established 400 years ago that is no longer appropriate for the rapid pace of technology to an open access model that changes the culture for data and information sharing across the scientific community.

Saturday, May 29, 2010

Grow New Teeth

Columbia researchers have developed a method of applying stem cells to scaffolding infused with growth hormone to grow teeth in as little as nine weeks in an animal model. A major move forward here has been to grow the teeth in situ instead of in the laboratory, which improves their final shape.

Thursday, May 27, 2010

One Mutation for Every Three Cigarettes

If you need more evidence that cigarettes are tremendously unhealthy, Genentech and Complete Genomics have published the full genome sequence of a primary lung tumor from a heavy smoker in a Nature paper. The tumor had as many as 50,000 mutations, from which they calculated that one mutation occurred for every three cigarettes smoked.
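The headline ratio is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below only uses the two numbers quoted above (50,000 and one per three cigarettes); the pack size and the pack-a-day habit are my own assumptions for illustration, not figures from the paper.

```python
MUTATIONS = 50_000            # figure quoted in the post
CIGARETTES_PER_MUTATION = 3   # rate quoted in the post
CIGARETTES_PER_PACK = 20      # assumption: standard pack size
PACKS_PER_DAY = 1             # assumption: hypothetical pack-a-day smoker


def cigarettes_implied(mutations, cigarettes_per_mutation):
    """Total cigarettes implied by a mutation count and a per-mutation rate."""
    return mutations * cigarettes_per_mutation


def years_of_smoking(cigarettes, packs_per_day,
                     cigarettes_per_pack=CIGARETTES_PER_PACK):
    """Years needed to smoke that many cigarettes at a steady daily rate."""
    return cigarettes / (packs_per_day * cigarettes_per_pack * 365)


total = cigarettes_implied(MUTATIONS, CIGARETTES_PER_MUTATION)
print(total)                                             # 150000
print(round(years_of_smoking(total, PACKS_PER_DAY), 1))  # 20.5
```

So the ratio corresponds to roughly 150,000 cigarettes, or about two decades for the hypothetical pack-a-day smoker.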

Wednesday, May 26, 2010

Very Impressive Quadruped Robot

Fascinating video of a quadrupedal robot from USC's Computational Learning and Motor Control Lab that can handle novel, challenging terrain, and can learn how to make good foothold choices.

Thursday, May 20, 2010

A Switch to Turn on Youthful Memory

A very interesting study found that acetylation of chromatin resulted in expression changes that gave older mice the learning and memory performance of a youngster. What a great example of how critical epigenetics is to understanding the big picture: just knowing the genes isn't enough; you have to understand how they are (or are not) expressed.

New Blog Techtilis Focusing on Bio-IT Infrastructure Planning

I've started a new blog Techtilis to focus on the issues and challenges of planning IT infrastructure in support of genetics research. Mental Burdocks will continue on as a more general blog of science and technology items.

Thursday, April 22, 2010

Matthew Trunnell on Data Management Challenges at The Broad

Matthew Trunnell, Acting Director, Advanced IT, Broad Institute, gave a talk in Track 1 Infrastructure - Hardware at Bio-IT World on Adjusting to the New Scale of Research Data Management. The Broad has been struggling with the petabytes of data associated with their massive sequencing facilities for four years (he noted he ordered 1.1 PB of new storage last week), and is now encountering the issues associated with managing massive data collections that have challenged the physics and space engineering communities (among others) for the last 10-20 years.

He noted that when very large data collections are not well managed, you start to spend substantial amounts of time looking for data instead of actually working with it. Their legacy data is growing faster than the costs of technology are dropping, which is driving total costs up to the point where they can no longer afford to back up all data. They have also developed their own tool (fsview) to analyze current utilization of data storage, and found as many as 18 redundant copies of files.
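I haven't seen fsview, so the following is not their tool, just a minimal sketch of how one might hunt for redundant copies on a filesystem: group files by size first (cheap), then hash only the candidates that share a size. The function names are hypothetical, and SHA-256 is simply my choice of digest.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_redundant_copies(root):
    """Map digest -> sorted list of paths for content stored more than once."""
    # Pass 1: bucket regular files by size; a unique size cannot be a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Calling `find_redundant_copies("/some/data/root")` (a placeholder path) returns only the groups with two or more identical files. At petabyte scale you would want something smarter than a single-threaded walk, but the size-then-hash idea carries over.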

The primary issue is that the simple tools included with filesystems provide very limited metadata for managing data (typically file owner, group, size, and creation/modification/access times). Information such as project, laboratory, security classification, availability requirements, and lifespan is not available. This information is critical for managing efficient and cost-effective storage of the data, as the data needs to be identified long after it's created, when the original creator may not be available or may not remember the details.

He quoted investment guru Peter Lynch's adage, "Know what you own and why you own it," as the guiding principle of data management. As a step towards tackling the problem, the management of the Broad directed that all files be associated with funded projects. Matthew noted that there is an established field of solutions specifically designed to address this challenge: digital asset management (DAM). They are working with the iRODS software that is derived from the Storage Resource Broker (SRB) from the UCSD Supercomputer Center. SRB and variants have made fairly substantial penetration into large-scale data management (I've previously talked to the folks at General Atomics, who provide a commercially supported version of SRB called Nirvana SRB).

But he said the technology is not really the challenge; the biggest challenge is the cultural change required of the scientists, who will need to tag their data as it is created. Some of the tagging can be automated, but other metadata will need to be provided by people. He said his approach will be to provide the service, and data won't get the usual services (backup, security, etc.) until it has been registered in the DAM.
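The "no registration, no services" policy is simple to picture in code. This is a toy sketch in plain Python, not the iRODS API and not anything the Broad described; the class names, field names, and example values are hypothetical, with the fields taken from the missing-metadata list in the talk (project, laboratory, security classification, availability, lifespan).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AssetRecord:
    """Metadata a filesystem does not track on its own (hypothetical fields)."""
    path: str
    project: str                  # funded project the file belongs to
    laboratory: str
    security_classification: str
    availability: str             # e.g. "online" or "archive"
    lifespan_years: int


class Registry:
    """Toy digital asset registry keyed by file path."""

    def __init__(self) -> None:
        self._records: Dict[str, AssetRecord] = {}

    def register(self, record: AssetRecord) -> None:
        # Mirrors the directive that every file be tied to a funded project.
        if not record.project:
            raise ValueError("every file must be tied to a funded project")
        self._records[record.path] = record

    def lookup(self, path: str) -> Optional[AssetRecord]:
        return self._records.get(path)

    def eligible_for_services(self, path: str) -> bool:
        """Backup, security, etc. are offered only to registered data."""
        return path in self._records


registry = Registry()
registry.register(AssetRecord(
    path="/data/run42/reads.bam",   # made-up example path
    project="P-001",
    laboratory="Example Lab",
    security_classification="internal",
    availability="online",
    lifespan_years=5,
))
print(registry.eligible_for_services("/data/run42/reads.bam"))    # True
print(registry.eligible_for_services("/data/scratch/untagged"))   # False
```

The interesting part is not the data structure but the incentive: the backup job would consult something like `eligible_for_services` before touching a file, so tagging becomes the price of admission.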

I'm very interested in these efforts as it's a natural follow-on to the whitepaper we are working on focusing on Research IT Infrastructure, which clearly demonstrates the challenges we face in the next five years, and the need for better data management.

Chris Dag from BioTeam - Trends from the Trenches

Chris Dagdigian from BioTeam chaired the Track 1 Infrastructure - Hardware talks, and got some additional time because Phil Butcher from Sanger was unfortunately unable to attend due to the volcano. Chris ran through the latest Trends From the Trenches presentation, and had some interesting updates.
  • He had expected blades to win the HPC hardware battle, but has not seen that come to pass, it's still a split field
  • Intel is currently the chip of choice, but AMD might be back in the game
  • BioTeam has done more Sun Grid Engine consulting in the first quarter of 2010 than all of 2009; he's not concerned about SGE's future following the Oracle acquisition of Sun
  • He got a laugh from the crowd with "private clouds - still stupid in 2010". He notes that this is just marketing speak and doesn't really mean anything.
  • Public clouds, on the other hand, are very real, and close to being mainstream... he's a strong supporter of their use in the right situations.
  • DIY cluster/parallel filesystems have a higher risk of implementation failure due to lack of pre-sales planning and design, especially in smaller shops. He also recommended commercial solutions with formal support programs.
  • Clusters are increasingly utilizing fat nodes (32 core, 128 GB+ memory)
  • Petascale storage is no longer risky, and single namespace solutions are recommended
  • He expressed concern about the downstream analysis of data (such as sequencing) eating up storage capacity - while the HTPS pipeline is relatively easy to model, secondary analysis is much more difficult.
He had an interesting observation regarding communication between IT and scientists. He gave the example that scientists will often ask for 100% uptime and full data protection, but don't realize that the difference between five 9s and four 9s of uptime is several million dollars. He emphasized the need for better communication between IT and research, and that IT needs accurate accounting of IT costs so that it can explain the costs of services and facilities.
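For anyone who hasn't worked with "nines", the gap between the two service levels is easier to appreciate as allowed downtime per year. A minimal sketch (the function name is mine, and I'm ignoring leap years); the dollar figures are Chris's, and nothing here tries to reproduce them.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(nines):
    """Allowed downtime for an availability of N nines (4 -> 99.99%)."""
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Four 9s allows a bit under an hour of downtime a year (about 52.6 minutes), while five 9s allows only about 5.3 minutes. Closing that gap is what costs the extra millions.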

He argues that the DNA data deluge will get better, mostly because the sequencing vendors are becoming more efficient in delivering data from the instruments. I would agree that the per run sizes will stabilize, but as the costs continue to plummet for sequencing, it will drive much greater demand, thus continuing the pressure on storage and compute infrastructures.

He also talked about an issue that Jackson has had recent experience with, and that is the challenge of high-speed networking. He noted that moving large data around requires more than just big pipes and bandwidth. His experience is the number of hops between locations can have a huge impact on performance, as well as the tools and protocols utilized.
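One plausible reason big pipes alone don't help (my interpretation, not something Chris spelled out) is that every hop adds latency, and a single TCP stream can never move more than one window of data per round trip. A rough sketch of that bound, with illustrative numbers I picked myself:

```python
def max_stream_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on one TCP stream: one window per round trip, in bits/s."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps, rtt_seconds):
    """Bytes that must be in flight to keep a link of this speed full."""
    return link_bps * rtt_seconds / 8


# Assumed example: a 64 KiB window over an 80 ms round trip.
bound = max_stream_throughput_bps(64 * 1024, 0.080)
print(f"{bound / 1e6:.2f} Mbit/s per stream")

# Assumed example: filling a 10 Gbit/s link at the same 80 ms round trip.
needed = bandwidth_delay_product_bytes(10e9, 0.080)
print(f"{needed / 1e6:.0f} MB in flight")
```

With those assumed numbers a single untuned stream tops out around 6.5 Mbit/s no matter how fat the pipe is, and keeping a 10 Gbit/s link full takes about 100 MB in flight. That is why window tuning, parallel streams, and purpose-built transfer tools matter as much as raw bandwidth.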

John Halamka on EHR

"From the doctor's brain to the patient's vein." - John Halamka, CIO of Harvard Medical School, on the impacts of EHR.

Wednesday at Bio-IT World started out with a keynote by John Halamka, M.D., M.S., CIO of Harvard Medical School, and fourth person to have his genome publicly sequenced. John walked the group through the implications of hundreds of pages of $30B healthcare IT legislation. Regarding privacy concerns he said, "there will not be a massive database in the basement of the White House run by Sarah Palin." He said the goal is to go from 20% to 100% use of EHR in five years, and characterized fully implemented Electronic Health Records as improving the accuracy and efficiency of medical records - "from the doctor's brain to the patient's vein."

John also related an interesting story about their one and only data breach. It started with an employee looking at a particular clinical trial involving 4000+ subjects. They found the data very compelling, and made a copy on their laptop (encrypted), which was then forgotten. A year later the employee left Beth Israel Deaconess and went to UCSF, and in the process copied the contents of their laptop to a new unencrypted laptop (CA has less stringent encryption requirements than MA). The laptop was stolen and pawned, and when the pawnshop owner couldn't get the system to boot, he called Dell Tech Support. Dell, upon discovering the contents of the laptop, contacted Beth Israel, and the laptop was returned in 24 hours. He said that he spends $1M annually on information security for BID, and that they are attacked every seven seconds over the Internet, with half of the attacks coming from eastern Europe and the other half from eastern Cambridge.

Some other points of interest:
  • Lab tests will start using controlled vocabulary to ensure consistency across providers.
  • Patients will be able to get a full copy of their EHR.
  • The Social Security Administration spends $500M annually managing paper records, which are subsequently digitized.
He also commented on the growing collection of wifi-enabled devices capable of measuring and reporting body telemetry. He is using a home scale which automatically transmits his weight, body mass, and other data to Google Health and Microsoft HealthVault.

Wednesday, April 21, 2010

Bio-IT World Keynote: How to Start a Drug Company

"I will be shocked if there aren't drugs in the market in the next 10-15 years that target aging genes [and pathways]." - Christoph Westphal, CEO of Sirtris Pharmaceuticals, responds to a question from Kevin Davies, Editor-in-Chief of Bio-IT World.

The kick-off keynote at Bio-IT World 2010 was given by Christoph Westphal, a doctor and scientist who has started a number of small drug companies. While most of them lost (or are losing) money, one was a widely-acclaimed success, at least for a time.

The conference was opened by Cindy Crowninshield, the conference director. One major challenge the conference has faced has been the need to find 20 replacement speakers due to travel disruptions caused by the Icelandic volcano.

The keynote was introduced by Ronald Ranauro, CEO of GenomeQuest, a company trying to carve out a niche in the next-gen sequencing data market with a SaaS offering they describe as "SDM" (sequence data management). It's not clear to me yet how they differ from or integrate with LIMS solutions, but we'll find out more on Thursday, as I'll be meeting with them and another colleague from Jackson. Ron pointed to the exponential growth in genome data as a tremendous opportunity.

GenomeQuest is looking to aggregate 1 million public genomes.

Back to Christoph, who told the story of Sirtris, which was eventually acquired by GlaxoSmithKline for $720M. They are working with resveratrol, the anti-aging compound found in red wine. The talk focused on the process and components of a drug startup that has a chance of making it.

This wasn't a topic of particular interest to me, but there were some interesting thoughts. By coincidence I had just had a long conversation with bandmate Jim Coffman, who is researching aging with sea urchin larvae at the MDI Biological Lab, on a ride back from band practice. It was a great exercise for what I've learned in Genetics I and II over the last year. Jim's work focuses more on the TOR pathway (which is linked to rapamycin, something being studied by Dave Harrison's lab at Jackson), but it seems similar to the SIRT1 pathway, which is regulated by resveratrol. The big picture here is that caloric intake affects these pathways, in that a calorie-restricted diet has been repeatedly shown to extend lifespan in multiple organisms.

But that wasn't the interesting part of the talk, really. More interesting were the insights into the incredible pace of progress in this field of research. Fifteen years ago, Christoph noted, it was considered crazy that there were genes involved in aging; now it's a major area of research.

Some other conclusions he's reached that apply fairly broadly:
  • it's more about the people than the technology
  • good teams overcome failures
  • a powerful vision/idea will attract supporters
He also noted that there is debate about whether to share data and information, or withhold it for competitive advantage. He is very strongly of the opinion that it is more important to share data and show you're a thought leader than worry about proprietary issues.

Tuesday, April 20, 2010

Prepping for Bio-IT World 2010

It's spring and I'm at the World Trade Center in Boston, which means it's time for Bio-IT World. Workshops started today and the main conference kicks off with a keynote and reception later this afternoon.

I'm really looking forward to a number of talks, and am struggling with the usual conflicts. The conference has seven different focus areas or tracks and I'm interested in the first four - IT Infrastructure Hardware, IT Infrastructure Software, Bioinformatics & Next Gen Data, and Systems and Predictive Biology.

Track 1 (Hardware) will spend all day Wednesday on Scaling up for the Data Deluge, and then on Thursday look at Sequencing, Genetics Data Management & Grid Computing in the morning and Data Storage & Usage for Computational Tasks in the afternoon. Talks in this track that I'm particularly interested in are:
  • Wed, 11:00 - Sanger Centre's Perspective on Data Storage Challenges, Phil Butcher, Head of Systems at Sanger
  • Wed, 11:30 - Adjusting to the New Scale of Research Data Management, Matthew Trunnell, Acting Director, Advanced IT, Broad Institute
  • Wed, 2:15 - Improving Storage Efficiency for Unstructured Research Data, Richard Shaginaw, Project Manager, Scientific Computing Services, Bristol-Myers Squibb
  • Wed, 3:45 - ResearchStation: A Bioinformatics Platform for Research Collaboration in Translational Medicine, Lynn H. Vogel, Ph.D., VP and CIO, Associate Professor, Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center
  • Thurs, 11:00 - Pallas, a Computational Analysis Network, Charles Hurmiz, Director, Research Informatics, Information Sciences, St. Jude Children's Research Hospital
  • Thurs, 11:30 - Making Systems and Services Easy: Secure File Sharing and Computational Portals, Shawn Houston, Technical Lead, Life Sciences Informatics, University of Alaska Fairbanks
  • Plus I'm presenting on IT Infrastructure Strategy in Support of Next-Gen Biological Research at 1:45 on Wed.
Track 2 (Software) spends Tuesday on Collaboration & Open Source Tools, Genomics Data & Wikis, and Semantic Web & Linked Data Technologies. Wednesday is focused on Information Exchange, Integration & Security. Items that catch my eye include:
  • Wed, 11:00 - Test to Best - Evidence for Collaboration and Science Driven IT as Criteria for Personalized Medicine, Michael Berens, Ph.D., Director of the Cancer and Cell Biology Division, Brain Tumor Research Lab, Translational Genomics Research Institute (ack! already a conflict with Track 1!)
  • Wed, 2:45 - Feature Presentation: The BIG Idea: Strategies to Achieve a Rapid-Learning Health System, Ken Buetow, Ph.D., Associate Director, Bioinformatics and Information Technology, National Cancer Institute (this talk spans Tracks 2, 3, 4, 6, and 7).
  • Thurs, 2:30 - Sharing Data While Keeping Control, Werner Ceusters, Professor, Director, Ontology Research Group, NYS Center of Excellence in Bioinformatics & Life Sciences
Track 3 (Bioinformatics and Next-Gen Data) focuses on Driving Biomarker Discovery and Translational Research on Wed morning, then Data Management & Integration Strategies on Wed afternoon, followed up with Application of Data on Thursday.
  • Wed, 11:00 - Leverage Emerging Technologies to Manage Genomic and Clinical Data, Stephen Friend, M.D., Ph.D., President, Sage Bionetworks. This first slot Wed is brutal. Sage is the non-profit off-shoot of Merck that looks very interesting.
  • Wed, 12:00 - Pipelining Your NGS Data, Nancy Miller Latimer, M.S. Senior Product Manager, Biological Sciences and Analytics, Accelrys. 
  • Wed, 1:45 - CASTOR QC - A Database Approach for Handling Large Genomic Data Sets, Marc Bouffard, M.Sc., Senior Bioinformatician, Montreal Heart Institute and Genome Quebec Pharmacogenomics Center
  • Thurs, 11:00 - Unbiased Prioritization of Mutations in Cancer Genomes, David Dooling, Ph.D., Director, Analysis Developers, Laboratory Information Management Systems (LIMS), and the Information Systems Groups, The Genome Center at Washington University in St. Louis School of Medicine
  • Thurs, 3:00 - Toward Meaningful Whole-Genome Interpretation with Open Access Tools from the Genome Commons, Reece Hart, Ph.D., Chief Scientist, Genome Commons, UC Berkeley QB3 and Center for Computational Biology
Track 4 covers Data Modeling: Enabling Systems Medicine, Data Generation: Good Models Start with Good Data, and Data Integration: Modeling Disparate "Omic" Sources on Wed. On Thursday the focus continues with Data Integration in the morning, followed by Data Validation: From Benchtop to Clinical Outcomes. There are only a handful of talks in this track that interest me, and I've run out of time as the first keynote is about to start. And I haven't even covered the main morning keynotes. More later...