Tip of the Iceberg
BETHESDA, Md.—October 9, 2007—To launch the next phase of the ENCyclopedia Of DNA Elements (ENCODE) project, the National Human Genome Research Institute (NHGRI) announced more than $80 million in grants. The institute hopes the funded projects will fill in the remaining 99 percent of the genome that the project’s pilot phase left unexamined. High-profile recipients include Yale University’s Dr. Michael Snyder, Affymetrix’s Dr. Thomas Gingeras, and the Wellcome Trust Sanger Institute’s Dr. Tim Hubbard.
BETHESDA, Md.—Researchers involved in the Encyclopedia of DNA Elements (ENCODE) Project have announced that the human genome may be vastly more complex than first thought, a finding that could swamp the already busy fields of genomics-based drug discovery and pathophysiology. What makes the announcement even more daunting is that the scientists have explored only 1 percent of the genome so far.
The research, a four-year feasibility study aimed at building a comprehensive parts list of functional DNA elements, was published in Nature and in 28 papers in Genome Research. It suggests that the human genome may contain very little unused, or “junk,” sequence, and that the protein-coding genes that were the focus of the original Human Genome Project (HGP) may be only a small part of the genetic machinery that determines cell function.
In an effort coordinated by the NHGRI, 35 research groups from 80 centers around the world examined 30 million base pairs of the human genome, roughly 1 percent of its sequence, in detail.
“One of the biggest technical hurdles was bringing the experimental methods to the point of being able to apply them at a production scale, which included creating quantitative data standards so that different groups could readily compare their results and so that there could be a quantitative measure of data quality,” says Dr. Elise Feingold, ENCODE Program Director at the NHGRI. “While it seems simple, it was not trivial to get groups to be able to compare their data in a direct manner.”
The researchers looked not only at the structure of individual genes, but also at the regulatory sequences around these genes, how the genes were laid out with respect to each other, and even how the DNA itself was structured to fold into chromosomes.
What they found was a startlingly complex network of interacting parts.
“Even in those well-studied regions, we found several surprises,” says Dr. Emmanouil “Manolis” Dermitzakis, investigator at the Cambridge, UK-based Wellcome Trust Sanger Institute. “In my view, the biggest surprise was the lack of strong conservation of regulatory regions that are essential for the proper expression of genes.”
The newfound genomic wealth may prove a double-edged sword, however, for drug companies already struggling to deal with the data arising from the HGP and their own gene sequencing and target identification efforts.
According to Justin Saeks, a biotechnology analyst with Kalorama Information, the new data and the enabling technologies arising from ENCODE could serve as the rungs of a ladder, helping companies move beyond the largely picked-over low-hanging fruit to reach higher, still-unexploited branches.
“At the same time, the complexity being uncovered also explains why side effects and ADME/Toxicity issues continue to plague the industry,” he says. “These two sides of the coin will probably offset each other in terms of cost or benefit to business; it’s a question of how much.”
Dermitzakis is more upbeat about the prospects.
“The full-genome ENCODE project will provide a framework for researchers and drug companies to tease apart which of the sequence variants such as SNPs may have a direct functional effect,” he says. “But even with the pilot project, people now have protocols to study the functional landscape of the human and other genomes.”
He also sees possibilities in the technologies that were developed by the different groups to execute the project.
“I think there is a lot of potential for spin-offs based on the technologies,” he says. “Many of these methodologies are now being revolutionized with the new sequencing technologies and the tremendous throughput they offer.”
Feingold is equally enthusiastic. “More data is better! The ENCODE data actually represents different kinds of data that helps interpret the human genome sequence.”