NEW YORK—It’s a longstanding question whether there can be too much of a good thing. When it comes to someone’s favorite foods or activities, the answer seems like an obvious ‘no.’ When it comes to information, however, too much of a good thing can be a real problem. As sequencing technologies grow more advanced and probe deeper into the genome, they generate more answers, but they also generate more “noise”: extraneous results beyond what a researcher is screening for.
A team of scientists from the Icahn School of Medicine at Mount Sinai and The Rockefeller University is applying “blacklisting” to streamline sequencing analysis and cut through this noise. Blacklisting is most commonly used to control spam or restrict access by blocking unwanted files or messages. In this method, it is used instead to filter out benign genetic variants, reducing the excess information that can distract from the search for pathogenic ones. The work was published in the Proceedings of the National Academy of Sciences in a paper titled “Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis.”
Whole-exome sequencing looks for variations in protein-coding genes to pinpoint the genetic origins of disease, but out of tens of thousands of genetic variants, only a handful will actually be linked to the disease in question. Current approaches require researchers to comb through their data and remove benign variants, which greatly delays analysis. The Mount Sinai and Rockefeller teams aim to speed up the process by blacklisting non-pathogenic variants. Based on this work, they have created the ReFiNE program and have also set up a webserver on which other research teams can build their own blacklists.
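The filtering idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ReFiNE implementation: the variant tuple format, the function names `build_blacklist` and `filter_variants`, and the cohort-frequency threshold are all hypothetical. It only captures the criterion named in the paper's title, namely blacklisting variants that are common in a private cohort but absent from public databases, and then dropping those variants from a patient's list.

```python
from collections import Counter

# Hypothetical variant representation: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def build_blacklist(cohort: list[set[Variant]],
                    public_db: set[Variant],
                    min_fraction: float = 0.05) -> set[Variant]:
    """Blacklist variants carried by at least `min_fraction` of cohort
    individuals that do NOT appear in the public database.
    The threshold is an illustrative assumption, not ReFiNE's value."""
    n = len(cohort)
    # Count each variant at most once per individual.
    counts = Counter(v for person in cohort for v in set(person))
    return {v for v, c in counts.items()
            if c / n >= min_fraction and v not in public_db}


def filter_variants(patient: list[Variant],
                    blacklist: set[Variant]) -> list[Variant]:
    """Remove blacklisted variants, keeping the original order."""
    return [v for v in patient if v not in blacklist]


if __name__ == "__main__":
    cohort = [
        {("1", 1000, "A", "G"), ("2", 500, "C", "T")},
        {("1", 1000, "A", "G")},
        {("1", 1000, "A", "G"), ("3", 42, "G", "A")},
        {("4", 7, "T", "C")},
    ]
    public_db = {("2", 500, "C", "T")}
    blacklist = build_blacklist(cohort, public_db, min_fraction=0.5)
    patient = [("1", 1000, "A", "G"), ("3", 42, "G", "A"), ("7", 99, "C", "A")]
    print(filter_variants(patient, blacklist))
```

With a 50% threshold, only the chromosome 1 variant (carried by three of four cohort members and missing from the public set) is blacklisted, so the script prints the two remaining patient variants. A real pipeline would apply further filters; this sketch shows only the blacklist step.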
“Until now, there has been no viable published method for filtering out non-pathogenic variants that are common in human genomes and absent from current genomic databases,” said Dr. Yuval Itan, an assistant professor of genetics and genomic sciences at the Icahn School of Medicine and senior author of the publication. “Using the blacklist, researchers will now be able to remove genetic ‘noise’ and focus on true disease-causing mutations.”
“The industry and academia both face the same problem when trying to go through exomes and genomes, which is false-positive variants: genetic variants that are detected as disease-causing or interesting while they are not, they’re just basically being noise in analysis,” Itan explains. “And this is usually the majority of the data that the industry would be dealing with in case they do not immediately find the disease-causing mutations in the exomes and genomes they are facing. So it’s really crucial for everyone dealing with human genomics to have such tools to remove as much noise as possible, but without removing true disease-causing mutations from the analysis.”
Itan tells DDNews that the team checked that the approach keeps false negatives to a minimum, using results from the Human Gene Mutation Database as well as data from 129 patients whose disease-causing mutations were already known. None of the known mutations in those 129 patients was flagged as a blacklisted variant, Itan says, noting that “It’s a very safe method to remove irrelevant variants while having a very low risk of removing the true mutations from the analysis.”
Itan adds that he expects the blacklisting method to work for other types of sequencing as well. Although the team tested it on only a few whole genomes, compared with thousands of exomes, he notes that “The majority of the genomic regions that are not covered by the exome are not pathogenic and have an even lower impact, so I expect the risk and performance in whole-genome to be even better, actually, than we show in whole-exome.” RNA sequencing will likely benefit from this tactic as well, he says.