Treasure in the junk
ENCODE project finds much less garbage in what has been called ‘junk DNA’
BETHESDA, Md.—While the Human Genome Project might have been the dictionary to tell us the entirety of what's in our genetic makeup, the multiyear, international effort known as the Encyclopedia of DNA Elements (ENCODE) project may be our first grammar guide, telling us what those genes actually do.
One of the key things the scientific community has already learned from ENCODE is that the "junk DNA" making up much of our human genome isn't junk after all.
While it is still true that less than two percent of our genome actually contains instructions for making proteins, the roughly 440 ENCODE researchers in 32 labs around the world linked more than 80 percent of the human genome sequence to specific biological functions and mapped more than 4 million regulatory regions where proteins specifically interact with the DNA. So, while the vast majority of the human genome is still non-coding DNA (the more technical and accurate name for what has often been called junk DNA), much more of that DNA serves necessary functions than was previously thought.
The ENCODE consortium has been at this since September 2003, when the National Human Genome Research Institute (NHGRI) of the U.S. National Institutes of Health launched the public research consortium as a follow-up to the Human Genome Project. Now that initial results are in, the NHGRI says the findings "represent a significant advance in understanding the precise and complex controls over the expression of genetic information within a cell. The findings bring into much sharper focus the continually active genome in which proteins routinely turn genes on and off using sites that are sometimes at great distances from the genes themselves. They also identify where chemical modifications of DNA influence gene expression and where various functional forms of RNA … help regulate the whole system."
As NHGRI Director Dr. Eric D. Green puts it, "During the early debates about the Human Genome Project, researchers had predicted that only a few percent of the human genome sequence encoded proteins, the workhorses of the cell, and that the rest was junk. We now know that this conclusion was wrong. ENCODE has revealed that most of the human genome is involved in the complex molecular choreography required for converting genetic information into living cells and organisms."
In a bit of understatement, Dr. Ewan Birney of the European Bioinformatics Institute in the United Kingdom, who is also lead analysis coordinator for the ENCODE project, notes that "we've come a long way," and adds, "by carefully piecing together a simply staggering variety of data, we've shown that the human genome is simply alive with switches, turning our genes on and off and controlling when and where proteins are produced. ENCODE has taken our knowledge of the genome to the next level, and all of that knowledge is being shared openly."
Three scientific journals are playing home to the initial sets of articles about the ENCODE findings (Nature, Genome Research and Genome Biology), and the data from the project are said to be so complex that the journals had to create a new way to present and review the information in an integrated manner: a navigational function they call threads.
"The ENCODE catalog is like Google Maps for the human genome," said Dr. Elise Feingold, an NHGRI program director who helped start the ENCODE Project, in the NHGRI news release about the ENCODE findings. "Simply by selecting the magnification in Google Maps, you can see countries, states, cities, streets—even individual intersections—and by selecting different features, you can get directions, see street names and photos and get information about traffic and even weather. The ENCODE maps allow researchers to inspect the chromosomes, genes, functional elements and individual nucleotides in the human genome in much the same way."
As Dr. Tim Reddy of the Duke Institute for Genome Sciences & Policy and one of the lead analysts for ENCODE notes, when the Human Genome Project wrapped up more than 10 years ago, researchers still didn't know what the vast majority of it meant.
"There is still a lot we don't know, but this is a step in the direction to actually understanding it," he says. "ENCODE tells us where the controls are; it starts to fill in the 97 percent gap we had in our understanding. This is important for disease studies because much of the variation that's been connected to diseases has been found in regions involved in how genes are regulated as opposed to their structure. Now that we know where those control elements are, we can start to understand how diseases are the result of changes in regulation. It leads us to new mechanisms."
Cold Spring Harbor Laboratory professor Dr. Thomas Gingeras, who was on one of the many ENCODE teams, points out that in addition to showing that some three-quarters of our DNA may be transcribed into RNA, the ENCODE data strongly suggest that non-coding RNA transcripts may act like components of a giant, complex switchboard rather than being "genetic padding" as was often assumed previously.
"We see the boundaries of what were assumed to be the regions between genes shrinking in length and genic regions making many overlapping RNAs," Gingeras says, noting that this challenges the notion that a gene is a discrete, localized region of a genome separated by inert DNA, and suggesting that "new definitions of a gene are needed."
ENCODE initially formed as a pilot project to develop methods and strategies that would be needed to produce results and did so by focusing on only one percent of the human genome. By 2007, NHGRI concluded that the technology had evolved enough to undertake a full-scale project, in which the institute invested approximately $123 million over five years. In addition, NHGRI devoted about $40 million to the ENCODE pilot project and has contributed about $125 million for ENCODE-related technology development and model organism research since 2003.
The data from ENCODE are likely to make a splash outside of life-sciences research, too, with the Electronic Frontier Foundation already having requested that a court factor in the ENCODE findings as it weighs in on the legality of DNA collection during arrest. In a letter to the U.S. Court of Appeals for the Ninth Circuit regarding the case Haskell v. Harris, the foundation writes that the court's panel opinion in that case "relied heavily on the assumption that a DNA profile does nothing more than identify a person" and that the 13 markers typically used in forensic cases involve junk DNA that is "not linked to any genetic or physical trait," a statement that is now in doubt, the foundation says, with at least 80 percent of the genome involved in at least one biochemical function.