Researchers identify primate-specific sequences in the human genome in comparative genomics study
In a comparative genomics research project using data that is already available in the public domain, researchers from Wayne State University and the Genome Institute of Singapore have identified primate-specific sequences in the human genome. According to the researchers, who summarized their findings in the July 6 online edition of the Proceedings of the National Academy of Sciences, the study has many implications for the field of genomics research.
DETROIT, Mich. - In a comparative genomics research project using data that is already available in the public domain, researchers from Wayne State University (WSU) and the Genome Institute of Singapore have identified primate-specific sequences in the human genome. According to the researchers, who summarized their findings in the July 6 online edition of the Proceedings of the National Academy of Sciences, the study has many implications for the field of genomics research.
According to the researchers, the study, "Global discovery of primate-specific genes in the human genome," offers an explanation for lineage-specific uniqueness that is based on something completely new in evolution, not on changes to old sequences or structures. Perhaps more importantly, the researchers believe the study itself provides an interesting critique of current genomic research methods.
The researchers began their quest to find primate-specific genes by noting that despite the increasing availability of genome and transcriptome sequence data, the genomic basis of primate phenotypic uniqueness remains obscure. According to Dr. Leonard Lipovich, assistant professor of the Center for Molecular Medicine and Genetics and Department of Neurology at WSU's School of Medicine and principal investigator of the study, this challenge is due to multiple factors.
First, searching for non-conserved genes isn't emphasized by any of the major players in genomics research, Lipovich says. Although factors such as segmental duplications and positive selection have received much attention as potential drivers of primate phenotypes, single-copy primate-specific genes are poorly characterized, he says.
"There is a seldom-challenged assumption in the genomics field that functional genes must be broadly evolutionarily conserved and protein-coding," Lipovich says. "You hear a lot about using genome and transcriptome data to look at conserved genes, but investigators tend to ignore genomic intervals outside of those known genes. The efforts that are out there are unimaginative and focus primarily on finding homologs of known protein-coding genes in additional species, not on non-conserved genes and their possible role in the genomic basis of interspecies distinctions."
A second challenge, Lipovich notes, is that too much genomic and transcriptome sequencing is being done without sufficient downstream efforts to analyze the sequence data.
"The fact that we have genome and transcriptome databases is not, by itself, helpful," he notes. "What might be helpful is developing new algorithmic approaches. In addition, these datasets frequently are not put together in a way that can help test specific hypotheses."
The Genome Institute of Singapore's Sen-Kwan Tay, who worked on the study as an extension of a dissertation for his M.Sc. degree in bioinformatics, adds that data on the genomes of humans and our nearest relative, the chimpanzee, show a 99 percent similarity in their sequences. Explanations for the substantial phenotypic differences between the two species include not only sequence differences, but also regulatory and genome structure differences and species-specific indels, Tay says.
"While the genome and transcriptome sequence data provide a lot of what we know about interspecies sequence and genomic structure differences, we still don't understand exactly how, mechanistically, these differences lead to phenotypic differences such as the uniquely higher cognitive capacity in humans, etc.," Tay says.
To address both of these concerns, the researchers screened a catalog of 38,037 human transcriptional units (TUs), compiled from EST and cDNA sequences in conjunction with the FANTOM3 transcriptome project, and interrogated the intersection of transcriptome data and multispecies genome alignments to search for primate-specific genes. The comparative study, using transcriptome sequencing and transcript-to-genome alignments, mapped the human transcripts from FANTOM against the genomes of a number of organisms, including the chimpanzee, to discover de novo gene genesis.
"We searched for new classes of interspecies differences, specifically entirely new genes in primates, because such genes might provide another explanation for lineage-specific uniqueness that is based on something completely new in evolution, not on changes to old sequences or structures," Tay explains.
The researchers identified 131 TUs from transcribed sequences residing within primate-specific insertions in nine-species sequence alignments and outside of segmental duplications. Exons of 120 (92 percent) of the TUs contained interspersed repeats, indicating that repeat insertions may have contributed to primate-specific gene genesis. Fifty-nine (46 percent) primate-specific TUs may encode proteins, the researchers also found. Although primate-specific TU transcript lengths were comparable to known human gene mRNA lengths overall, 92 (70 percent) primate-specific TUs were single-exon. Thirty-two (24 percent) primate-specific TUs were localized to subtelomeric and pericentromeric regions. Forty (31 percent) of the TUs were nested in introns of known genes, indicating that primate-specific TUs may arise within older, protein-coding regions. Primate-specific TUs were preferentially expressed in reproductive organs and tissues, consistent with the expectation that emergence of new, lineage-specific genes may accompany speciation or reproduction. Of the 33 primate-specific TUs with human Affymetrix microarray probe support, 21 were differentially expressed in human teratozoospermia.
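The screening criteria reported here (transcripts lying within primate-specific insertions and outside segmental duplications) can be pictured as a simple interval filter. The Python sketch below is only an illustration under stated assumptions, not the researchers' actual pipeline: the `Interval` type, the `screen_tus` function, and all coordinates are hypothetical toy values, and each TU is treated as a single genomic span rather than a set of exons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic span: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two spans share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def screen_tus(tus, insertions, seg_dups):
    """Keep TUs inside a primate-specific insertion and clear of segmental duplications."""
    kept = []
    for name, span in tus.items():
        in_insertion = any(contains(ins, span) for ins in insertions)
        in_seg_dup = any(overlaps(dup, span) for dup in seg_dups)
        if in_insertion and not in_seg_dup:
            kept.append(name)
    return sorted(kept)


# Toy data (assumed for illustration; not from the study).
tus = {
    "TU_A": Interval("chr1", 100, 200),  # inside an insertion, no duplication
    "TU_B": Interval("chr1", 500, 600),  # not inside any insertion
    "TU_C": Interval("chr2", 50, 150),   # inside an insertion but overlaps a duplication
}
insertions = [Interval("chr1", 50, 300), Interval("chr2", 0, 400)]
seg_dups = [Interval("chr2", 100, 1000)]

candidates = screen_tus(tus, insertions, seg_dups)
print(candidates)
```

In this toy run only `TU_A` passes both tests. A real analysis would derive the insertion and duplication intervals from multispecies alignments and genome annotation, and would likely use an indexed interval structure rather than the pairwise scan shown here.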
"This paper suggests that the emergence of primate-specific and functional transcripts is due to de novo insertions, not arising from duplication and subsequent accelerated sequence evolution," Tay says. "By excluding segmental duplications often synonymous with gene genesis, we have also shown that there exist single-copy transcripts which are also unique to primates and presented initial evidence for function for these transcripts. For example, 21 of our 131 primate-specific transcripts were found to be differentially expressed in a separate study on severe teratozoospermia in men. A comparison of our primate-specific transcripts with primate orphan genes identified in a recent paper (Toll-Riera, et al.) shows no overlap, an indication that the global primate-specific transcript catalog is far from saturated and many primate-specific genes are still to be discovered."
The broader implication of the study is that not all genes are necessarily conserved and protein-coding, Tay says.
"There are genes that are 'neither,' but they are interesting because of their recent origin and possibly functional roles in reproduction and behavior," he says. "Such genes need to be included in drug target screens, RNA structure analyses, etc. We need to understand the mechanisms underlying the birth of these insertions, especially their non-repetitive portions."
To accomplish that, researchers will now need to update the set of primate-specific transcripts as new data becomes available, Tay says. This will enable the researchers to confirm that such evolutionary novelties are expressed, he adds.
"Additionally, there is a set of human transcripts which are deleted in chimpanzees but conserved in the rhesus macaque, and possibly other primate genomes," Tay says. "Such gene loss in the chimpanzees may also contribute to the phenotypic differences between them and us."
The study may also serve as a paradigm of how research can be conducted differently, Lipovich says.
"This is such an underrepresented area of research, and one take-home message we have is that people should be looking at publicly available data more," Lipovich says. "We established a generally applicable paradigm for exploiting the union of two publicly available resources: genome-wide sequence alignments and transcriptome data. Our approach was unbiased in that it considered all publicly available human transcriptome data, not just transcriptome data supporting already-known genes. Mapping this transcriptome data onto multispecies genomic alignments enabled us to discover primate-specific genes outside of annotated known genes."