CATCH OF THE DAY: GENES --- Under Venter-UCSD collaboration, millions of genes and thousands of protein families

ROCKVILLE, Md.—Using new computational tools and "whole environment shotgun sequencing," researchers from the J. Craig Venter Institute (JCVI) recently discovered millions of new genes and thousands of new protein families and characterized thousands of new protein kinases from ocean microbes. The work, published in PLoS Biology, has major implications for drug discovery as well as several other areas of scientific research. It was conducted as part of the ongoing Sorcerer II Global Ocean Sampling Expedition (GOS), which is funded in part by the Gordon and Betty Moore Foundation and the U.S. Department of Energy.

More importantly, the overall genomic and metagenomic data will now be available from two online sources so that researchers can begin to dig into it and advance their particular areas of research, says Paul Gilna, executive director of the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) and former director of the Joint Genome Institute of the U.S. Department of Energy.

One of the locations where the data is being deposited is GenBank, a public database of the National Institutes of Health. The other location is CAMERA, which was developed by the University of California, San Diego division of the California Institute for Telecommunications and Information Technology (Calit2) in partnership with JCVI and UCSD's Center for Earth Observations and Applications at Scripps Institution of Oceanography. CAMERA is funded by a seven-year, $24.5 million grant from the Moore Foundation that began in January 2006, and it is a direct offshoot of the GOS research.

"The scale and complexity of the GOS data required Calit2 to architect a powerful new cyberinfrastructure to enable both interactive access as well as high-performance computation on the data by the global metagenomic community," said Larry Smarr, Calit2 director and principal investigator on CAMERA, in a news release about the GOS findings and CAMERA launch.

The GOS dataset is 90-fold larger than other marine metagenomic datasets, making it the largest ever released in the public domain, according to the Venter Institute and Calit2. Also, the GOS work has nearly doubled the number of known proteins. This data is expected to help researchers better understand the genomic structure and evolution of microorganisms, as well as the function of important protein families.

While discoveries made from the genomic data may lead to profits for various companies, neither UCSD nor the Venter Institute is itself seeking to commercialize anything that comes out of this work.

"A major tenet of CAMERA is that the data be open," Gilna says. "That not only means the datasets but also the software tools—also available online—that we and Venter have developed and will develop for handling the genomic information. The software tools will all be designed around open-source standards."

The data coming out of the GOS and CAMERA work may help the genomic and proteomic areas of drug discovery. Gilna also thinks that as the GOS findings reveal more about how microbes work and evolve, researchers may learn new ways to harness microbes in drug production.