MENLO PARK, Calif.–SRI International, an independent nonprofit research and development organization, announced in July the expansion of its BioCyc collection of pathway/genome databases from 18 to 160 organisms. SRI collaborated with the European Bioinformatics Institute (EBI) in enlarging the collection to include databases for most eukaryotic and prokaryotic species whose genomes have been sequenced.
Peter Karp, director of SRI's bioinformatics research group, says BioCyc's genome collection ranges from human to pathogens, including "most if not all of the biodefense bacteria," such as those that cause anthrax and tularemia. Human genome data are from the public sequencing project but also incorporate about a dozen pathways curated by SRI, including cholesterol biosynthesis, which Karp calls "one of the longest pathways in any of our databases."
Researchers can query and visualize the individual databases at SRI's website, and Karp says "people who want to update the databases would download the databases" plus editing software. "Then they can use the same software to publish the updated database on their website" or submit it for registry at SRI.
Updates, says Christos Ouzounis, head of the computational genomics group at EBI, allow researchers to share the responsibility of maintaining data, and the innovation "has the potential of being a major step forward" for the community if researchers present their revisions to the public. Updates are crucial for large genomes, such as those of human and mouse, says Ouzounis.
SRI developed the software for the project, with EBI serving as a "power user," says Ouzounis. "We understand each other's needs," he adds, which is helped by the fact that he and Karp have collaborated since Ouzounis was a postdoc in Karp's lab 10 years ago.
One of BioCyc's advantages, says Karp, is that it can use the integrated collection of genomes to extract predicted pathway and operon information. He calls the "omics viewer" a "killer app" that enables users to download an omics data set (perhaps for gene expression or proteomics) "and paint it onto a cellular overview diagram." Diagrams, which might combine metabolomics and proteomics information, can also be animated, making them "a very powerful analysis tool" for organisms in the database.