Treasure in the junk
BETHESDA, Md.—While the Human Genome Project might have been the dictionary to tell us the entirety of what's in our genetic makeup, the multiyear, international effort known as the Encyclopedia of DNA Elements (ENCODE) project may be our first grammar guide, telling us what those genes actually do.
One of the key things the scientific community has already learned from ENCODE is that the "junk DNA" making up much of our human genome isn't junk after all.
While it is still true that less than two percent of our genome actually contains instructions for making proteins, the roughly 440 ENCODE researchers in 32 labs around the world linked more than 80 percent of the human genome sequence to specific biological functions and mapped more than 4 million regulatory regions where proteins specifically interact with the DNA. So, while the vast majority of the human genome is still non-coding DNA (the more technical and accurate name for what has often been called junk DNA), far more of that DNA serves necessary functions than was previously thought.
The ENCODE consortium has been at this since September 2003, when the National Human Genome Research Institute (NHGRI) of the U.S. National Institutes of Health launched it as a public research effort to follow up on the Human Genome Project. Now that initial results are in, the NHGRI says the findings "represent a significant advance in understanding the precise and complex controls over the expression of genetic information within a cell. The findings bring into much sharper focus the continually active genome in which proteins routinely turn genes on and off using sites that are sometimes at great distances from the genes themselves. They also identify where chemical modifications of DNA influence gene expression and where various functional forms of RNA … help regulate the whole system."
As NHGRI Director Dr. Eric D. Green puts it, "During the early debates about the Human Genome Project, researchers had predicted that only a few percent of the human genome sequence encoded proteins, the workhorses of the cell, and that the rest was junk. We now know that this conclusion was wrong. ENCODE has revealed that most of the human genome is involved in the complex molecular choreography required for converting genetic information into living cells and organisms."
In a bit of understatement, Dr. Ewan Birney of the European Bioinformatics Institute in the United Kingdom—and lead analysis coordinator for the ENCODE project—notes that "we've come a long way," and adds, "by carefully piecing together a simply staggering variety of data, we've shown that the human genome is simply alive with switches, turning our genes on and off and controlling when and where proteins are produced. ENCODE has taken our knowledge of the genome to the next level, and all of that knowledge is being shared openly."
Three scientific journals (Nature, Genome Research and Genome Biology) are hosting the initial sets of articles about the ENCODE findings, and the data from the project are said to be so complex that the journals had to create a new way to present and review the information in an integrated manner: a navigational function they call threads.
"The ENCODE catalog is like Google Maps for the human genome," said Dr. Elise Feingold, an NHGRI program director who helped start the ENCODE Project, in the NHGRI news release about the ENCODE findings. "Simply by selecting the magnification in Google Maps, you can see countries, states, cities, streets—even individual intersections—and by selecting different features, you can get directions, see street names and photos and get information about traffic and even weather. The ENCODE maps allow researchers to inspect the chromosomes, genes, functional elements and individual nucleotides in the human genome in much the same way."
As Dr. Tim Reddy of the Duke Institute for Genome Sciences & Policy, one of the lead analysts for ENCODE, notes, when the Human Genome Project wrapped up more than 10 years ago, researchers still didn't know what the vast majority of the genome meant.
"There is still a lot we don't know, but this is a step in the direction to actually understanding it," he says. "ENCODE tells us where the controls are; it starts to fill in the 97 percent gap we had in our understanding. This is important for disease studies because much of the variation that's been connected to diseases has been found in regions involved in how genes are regulated as opposed to their structure. Now that we know where those control elements are, we can start to understand how diseases are the result of changes in regulation. It leads us to new mechanisms."
Cold Spring Harbor Laboratory professor Dr. Thomas Gingeras, who was on one of the many ENCODE teams, points out that in addition to showing that some three-quarters of our DNA may be transcribed into RNA, the ENCODE data strongly suggest that non-coding RNA transcripts may act like components of a giant, complex switchboard rather than being "genetic padding" as was often assumed previously.
"We see the boundaries of what were assumed to be the regions between genes shrinking in length and genic regions making many overlapping RNAs," Gingeras says, noting that this challenges the notion that a gene is a discrete, localized region of a genome separated by inert DNA and suggesting that "new definitions of a gene are needed."
ENCODE initially formed as a pilot project to develop methods and strategies that would be needed to produce results and did so by focusing on only one percent of the human genome. By 2007, NHGRI concluded that the technology had evolved enough to undertake a full-scale project, in which the institute invested approximately $123 million over five years. In addition, NHGRI devoted about $40 million to the ENCODE pilot project and has contributed about $125 million for ENCODE-related technology development and model organism research since 2003.
The data from ENCODE are likely to make a splash outside of life-sciences research, too, with the Electronic Frontier Foundation already having requested that a court factor in the ENCODE findings as it weighs the legality of DNA collection during arrest. In a letter to the U.S. Court of Appeals for the Ninth Circuit regarding the case Haskell v. Harris, the foundation writes that the court's panel opinion in that case "relied heavily on the assumption that a DNA profile does nothing more than identify a person" and that the 13 markers typically used in forensic cases involve junk DNA that is "not linked to any genetic or physical trait." That statement is now in doubt, the foundation says, with at least 80 percent of the genome involved in at least one biochemical function.