Complete Genomics releases large sequencing dataset for global research community
MOUNTAIN VIEW, Calif.—Complete Genomics Inc. recently announced that it is providing the research community with access to 60 complete, high-coverage human genome sequences.
The data release builds upon the Yoruba trio dataset released by Complete Genomics on Jan. 6 and is also intended to complement other publicly available whole-genome datasets, such as the 1000 Genomes Project's recent publication of six high-coverage and 179 low-coverage human genomes.
Data for 40 genomes are currently available for download from Complete Genomics' corporate website at www.completegenomics.com/sequence-data/download-data and at the Bionimbus mirror site at www.bionimbus.org under "public data."
The remaining 20 genomes will be released by the end of March. The genomes were drawn from two resources housed at the Coriell Institute for Medical Research: the National Institute of General Medical Sciences (NIGMS) Human Genetic Repository and the NHGRI Sample Repository for Human Genetic Research. The sample set includes a 17-member, three-generation CEPH pedigree from the NIGMS Repository and ethnically diverse samples from the NHGRI Repository that represent nine different populations. The selected samples are unrelated, with the exception of the three-generation CEPH pedigree, a Yoruba trio and a Puerto Rican trio. The majority of these samples have previously been analyzed as part of the International HapMap Project or the 1000 Genomes Project.
On average, these genomes have more than 55-fold mapped read coverage, and the sequencing of the 60 genomes generated more than 12.2 terabases (Tb) of total mapped reads. In addition, 97 percent of each genome and 96 percent of each exome are called with high confidence. On average, more than 98.6 percent of each genome had coverage of 10-fold or higher. Genome-wide single nucleotide polymorphism (SNP) detection concordance with the high-quality Infinium subset of the International HapMap Project dataset averages 99.93 percent.
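As a rough sanity check on how the quoted totals relate to one another, the arithmetic can be sketched in a few lines of Python. The genome size used here (about 3.1 billion bases) is an assumption for illustration and does not come from the announcement, and the function names are hypothetical; the result is a back-of-the-envelope estimate, not the company's own coverage calculation.

```python
# Figures quoted in the article (both stated as "more than", so treat as lower bounds).
TOTAL_MAPPED_BASES = 12.2e12  # 12.2 terabases of mapped reads across the dataset
NUM_GENOMES = 60              # genomes in the full release

# Assumption (not from the article): approximate size of a human genome in bases.
ASSUMED_GENOME_SIZE_BASES = 3.1e9


def mean_mapped_bases_per_genome(total_bases: float, num_genomes: int) -> float:
    """Average number of mapped bases contributed by each genome."""
    return total_bases / num_genomes


def implied_mean_coverage(total_bases: float, num_genomes: int, genome_size: float) -> float:
    """Average fold coverage implied by the totals and the assumed genome size."""
    return mean_mapped_bases_per_genome(total_bases, num_genomes) / genome_size


per_genome = mean_mapped_bases_per_genome(TOTAL_MAPPED_BASES, NUM_GENOMES)
coverage = implied_mean_coverage(TOTAL_MAPPED_BASES, NUM_GENOMES, ASSUMED_GENOME_SIZE_BASES)

print(f"Mapped bases per genome: {per_genome / 1e9:.0f} gigabases")
print(f"Implied mean coverage:   {coverage:.0f}-fold")
```

Under that assumed genome size, the totals work out to roughly 203 gigabases of mapped reads per genome, or about 66-fold coverage, which is consistent with the stated figure of more than 55-fold.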
To date, Complete Genomics has sequenced and analyzed more than 1,000 high-coverage genomes for its customers, generating more than 230 Tb of mapped reads. The company completes more than 400 genomes per month and is working toward additional expansion of its genome sequencing capacity in the coming months.