BETHESDA, Md.—Scientists in the United States, Britain and China have launched the 1000 Genomes Project, which will sequence the genomes of at least 1,000 people from around the world in an effort to create the most detailed and medically useful picture to date of human genetic variation.
The project will receive support from the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute (BGI) in Shenzhen, China; and the National Human Genome Research Institute (NHGRI), part of the NIH.
NHGRI director Dr. Francis S. Collins says the new project will increase the sensitivity of disease discovery efforts across the genome five-fold and within gene regions at least 10-fold.
"Our existing databases do a reasonably good job of cataloging variations found in at least 10 percent of a population," says Collins. "By harnessing the power of new sequencing technologies and novel computational methods, we hope to give biomedical researchers a genome-wide map of variation down to the one percent level."
Adam L. Felsenfeld, program director, large-scale sequencing, NHGRI, says the project and resulting data will have an immediate impact.
"All sequence data will be released rapidly after it is generated to NCBI and EBI," Felsenfeld says. "Some derivative data (for example, imputed haplotypes) will be the result of detailed analysis which will take time, but will be released before publication."
The detailed map of human genetic variation will be used by many researchers seeking to relate genetic variation to particular diseases. Measuring progress and success of the project will be critical, adds Felsenfeld.
"We have a good quantitative statement of the overall goals: Find all the variants in the genome that exist at a one percent or greater frequency in the population, and down to 0.5 percent (or better) in gene regions, and place them in their proper haplotype contexts," says Felsenfeld. "But there are many intermediate metrics and more detailed goals that need to be established in order to best work towards the overall project goals. These detailed goals will emerge from the pilot phase, when we will really begin to understand machine performance in the context of this project, and the limits of our ability to use the data according to the models and preliminary evidence that guided the pilot. On a more basic level, a steering committee will monitor the output of the different participating centers on a continuous basis, to ensure that the data are being produced."
The project will map not only the single-letter differences in people's DNA but also structural variants, providing a deeper understanding of human genetic variation and opening the door to many other new findings of significance to both medicine and basic human biology.
"At 6 trillion DNA bases, the project will generate 60-fold more sequence data over its three-year course than have been deposited into public DNA databases over the past 25 years," says Gil McVean of the University of Oxford in England, one of the co-chairmen of the consortium's analysis group.
"This project reinforces our commitment to transform genomic information into tools that medical research can use to understand common disease," adds Jun Wang, associate director of BGI.
The project depends on large-scale implementation of several new sequencing platforms. Using standard DNA sequencing technologies, the effort would likely cost more than $500 million. Project leaders expect the costs to be lower—from $30 million to $50 million—because of the project's pioneering efforts to use new sequencing technologies. Felsenfeld said the cost of the project will be spread over three years, for both the pilot phase and full effort. DDN