CAMBRIDGE, U.K.—RNAcentral, reportedly the first unified resource for all types of non-coding RNA data, was launched in September by the RNAcentral Consortium. RNAcentral brings together information from a federation of expert databases and provides tools for easy browsing. The RNAcentral consortium currently includes 24 RNA database resources.
The initial release of RNAcentral contains about 8 million sequences. With funding from the U.K.’s Biotechnology and Biological Sciences Research Council (BBSRC), partner institutes around the world came together to build what they see as a practical solution to a shared problem.
Since the 1950s, scientists have thought of RNA as an intermediate molecule that provides a link between stable DNA and proteins. However, in recent decades it has become clear that RNA plays a much wider range of roles in living organisms. Researchers have learned a great deal about the different types of RNA, but until now these data had not been gathered in one place.
“During the last decade, there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs,” explains Alex Bateman, head of Protein Sequence Resources at EMBL-EBI (the European Bioinformatics Institute). “There is also a large amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge.”
Before RNAcentral, finding the RNAs encoded by a specific genome required gathering information from several independent resources, such as miRBase for microRNAs and HAVANA for lncRNAs. “There is plenty of published data on noncoding RNAs, but each subtype is maintained separately,” according to Bateman. “This is the first time we have a central place where you can find it all: piRNAs, ribosomal RNAs, everything. A lot of that information has typically been locked up in supplementary materials, or referred to only by a non-standard gene name. RNAcentral is a big step towards making RNA sequence as easy to access for research as protein sequence.”
RNAcentral 1.0 gives researchers access to data from 10 different expert databases and provides stable accession numbers that can be used consistently in the literature, other molecular databases and search engines. The RNAcentral website features a faceted search that enables users to explore different RNA sequences according to source, species and molecular function. Further expert databases will be included in future releases.
The RNAcentral consortium has its roots in a workshop held on the Wellcome Genome Campus in 2010, where members of the RNA community came together to discuss the lack of centralized access to RNA data.
“It is really satisfying to see this project come to fruition,” said Sam Griffiths-Jones of the University of Manchester. “The growth in non-coding RNA sequence and functional information is phenomenal and shows no signs of slowing. There has never been a greater demand for a universal resource for these data. The collaboration of RNAcentral consortium members to produce this resource represents an enormous step forward for the RNA field.”
According to BBSRC Chief Executive Prof. Jackie Hunter, “Fundamental research into noncoding RNAs has many potential applications, including disease diagnostics, new therapies and biotechnology. With the abundance of data now available due to next-generation DNA sequencing, there is an urgent need for informatics tools to decipher it. RNAcentral is a vital resource that will aggregate and integrate information to unify the data landscape and improve the discoverability and use of data by researchers worldwide.”
The resource uses EMBL-EBI infrastructure, notably data-submission and cross-reference services provided by the European Nucleotide Archive. It takes advantage of the nightly, global synchronisation of data from the International Nucleotide Sequence Database Collaboration. Future versions of RNAcentral will include additional data types and information about RNA structure, modifications, molecular interactions and function. A paper describing RNAcentral tools and features in detail has been accepted for publication in the journal Nucleic Acids Research.
RNAcentral partners are EMBL-EBI; the University of Manchester; the Wellcome Trust Sanger Institute; the University of California Santa Cruz; the University of Texas; Auburn University; Sandia National Laboratories; the University of Oxford; the Garvan Institute of Medical Research; the International Institute of Molecular and Cell Biology Warsaw and Adam Mickiewicz University; Rockefeller University; the Chinese Academy of Sciences; the Peking Union Medical College and Taicang Institute of Life Sciences Information; Michigan State University; National Chiao Tung University; Stanford University; the University of Thessaly; the Institute of Bioinformatics and Systems Biology of the Department of Biological Science and Technology at National Chiao Tung University; and the National Center for Biotechnology Information.
“This resource will facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health,” Bateman concluded.