WALNUT CREEK, Calif.—Scientists at the Lawrence Berkeley National Laboratory, in concert with other national labs across the country, recently launched an online microbial genome database that for the first time integrates all previously published sequence data from the Joint Genome Institute (JGI) and other public research arms. The goal is to provide a single, real-time sequencing resource and collaborative tool for DNA researchers in areas such as drug discovery, industrial processes and agriculture.
"We think this will be valuable to researchers since we have taken steps to clean up the data of these original 296 genomes we now have online," says Nikos Kyrpides, scientific manager of the Integrated Microbial Genomes (IMG) data management system. "One of the big problems with the available sequencing data out there is much of it contains errors, so doing the work to provide clean data was important to the success of this project."
Getting microbial sequencing information online was an important outreach step for JGI, which currently accounts for approximately one-quarter of all microbial genome projects worldwide. JGI will continue to deposit genome sequence information into GenBank, the repository maintained by the National Center for Biotechnology Information, but the IMG is more than just a repository of data. Kyrpides sees it as an evolving portal through which researchers in far-flung locations can share data and work together.
"We have made it very user-friendly and easy to navigate and that is because of the strength of our development team led by Victor Markowitz," says Kyrpides. "What we have provided is a workbench that provides a variety of computations—pre-computed data—and eventually it will allow researchers to provide annotations to specific sequences based on their work. That is the real beauty of the system."
For Markowitz, creating the architecture for the system wasn't simply about getting the information online. The aim was "to provide high-quality data in a comprehensible system that is diverse in the number of genomes it covers," he says in a prepared statement. "This goal follows the fundamental principle that the value of the genome depends on the quality of the data and increases the number of genomes available for comparative analysis."
Now that it is up and running, the IMG will update the data it carries quarterly, with the next update scheduled for June. Kyrpides says that by the end of 2005 the number of sequenced genomes in the system will top 600.
In the coming months, the IMG team will also add a number of site enhancements, including more data analysis capabilities and mechanisms that will allow scientists to participate in the annotation effort.