A research paper newly published in PLoS Biology, a PLOS journal, calls for better data sharing to advance genomic research. The paper comes from two organizations based in Cambridge, U.K.: DNAdigest, a not-for-profit organization promoting best practices for data sharing in genomics, and Repositive, a software company developing novel tools to improve access to human genomic research data. It describes how the difficulty of finding and accessing data across research organizations impedes human genetics research and the promise of precision medicine.
The paper describes a range of benefits from improving access to public data and sharing data with research peers, including greater transparency and reproducibility of research, which is particularly important when developing new treatments. In addition, the availability of more complementary and reference datasets will enable early validation of results, saving time and resources and reducing duplicated effort.
“The increased data availability and accessibility is key to make breakthroughs in precision medicine and companion diagnostics. Medical research in genomics requires both specificity and sensitivity which is only possible by accessing and comparing large volumes of data,” said Fiona Nielsen, CEO of DNAdigest and Repositive, in a news release about the paper’s publication. “Easier data access will accelerate research and lower costs of making new discoveries, which will provide benefits for both clinicians and patients in the form of new and better treatments.”
On May 30, Lab Times Online discussed the published paper in a blog post, which reads in part:
“Whereas DNAdigest is characterised as a charity in the UK, working to engage the research community into exploring and solving the issue of human genomic data sharing, Repositive is a self-sustained business with its mission aligned to the social mission of the charity: ‘to facilitate efficient and ethical data sharing for genomics research.’
“In short, this entails building an online platform that provides a single-point entry to search public human genomic data repositories, free of charge. Why this is necessary? Simple: we tend to forget that sharing is important to let science progress. And to be fair, we are also afraid to expose potential sensitive personal information to the world. In the PLoS Biology article, the authors have a clear argument as to why we should breach that barrier: it’s against the data donor’s interests and expectations to not utilize their data in the best possible way within the given consent.”
As the PLoS Biology paper, “DNAdigest and Repositive: Connecting the World of Genomic Data,” itself notes: “Ironically, human genomic data is probably the most important data to share, since it lies at the heart of efforts to combat major health issues such as cancer, genetic diseases, and genetic predispositions for complex diseases like heart disease and diabetes. In particular, the promise of personalized medicine (in which treatment is tailored to the individual) is unlikely to be realized without widespread access to large amounts of genomic data.”
The paper points out that a number of online communities, including ResearchGate, Academia.edu and LinkedIn, offer networking and collaboration opportunities for scientists, letting researchers interact, build their online profiles and identify potential collaborators. Several existing projects are also tackling the data-sharing issue by providing open-access online repositories for storing and sharing research outputs; Figshare, Zenodo and Dryad are a few examples.
“All these platforms allow data storage, data sharing, and data annotation. The online communities and the data storage tools mentioned above all have a very broad coverage and are actively used by many researchers across very different fields of research,” note the authors of the PLoS Biology paper.
“However, there is currently no single point of entry for genomic datasets (like Uniprot for proteins or OMIM for genes),” they write. Repositive's platform aims to fill that gap, and the authors argue that real progress might best be achieved “by concentrating on one specific problem (in our case, the problem of finding and accessing human genomic research data) and supporting best practices for data annotation, accessibility and reuse.”
“There are multiple other problems that need to be addressed to make data sharing the default rather than the exception,” they continue. “These include the standardization of ethics committee approvals, normalizing file formats, defining suitable ontologies and metadata formats for describing data. Many of these issues are addressed by working groups within a number of international consortia, including the Research Data Alliance, BioSHaRE-EU, and the Global Alliance for Genomics and Health, of which both DNAdigest and Repositive are members.”