As evidence of their potential in drug discovery grows, artificial intelligence and machine learning are attracting increased interest, and companies are undertaking a variety of efforts to advance the field. Among them are Recursion Pharmaceuticals, which recently released a large, open-source dataset to support the development of machine-learning algorithms, and AstraZeneca and BenevolentAI, which are uniting their data and platforms in a collaboration to seek out treatments for chronic kidney disease and idiopathic pulmonary fibrosis.
“To answer fundamental questions facing biology and disease, and reimagine the drug discovery paradigm, we’re building the world’s largest, relatable, empirical biological dataset,” Dr. Chris Gibson, CEO of Recursion, said of the company’s recent news. “The RxRx1 dataset we’re announcing today represents an important resource for the machine learning community, with more than 100,000 images and 300-plus gigabytes of data representing diverse biological contexts. Yet despite the massive scale of this dataset, it represents just 0.4 percent of what we generate at Recursion on a weekly basis. We expect that the richness of this dataset, combined with the context surrounding the scale of our efforts, will inspire the world’s machine learning and AI community to help us in our mission to decode biology to radically improve lives.”
The dataset consists of images of human cells from more than 1,000 experimental conditions, with multiple biological replicates generated weeks or months apart. Each batch of data features unique experimental variations.
“At the highest level, these are fluorescent microscopy images of cells. The RxRx1 dataset includes data from 51 instances of the same experiment design executed in different experimental batches. Each batch represents a complete set of 1,108 genetic contexts in a single human cell type (from among four different cell types: HUVEC, RPE, HepG2 and U2OS). With different experimental batches comes experimental noise, and the challenge here is identifying the fundamental biology out of the experimental noise,” says Dr. Ron W. Alfa, senior vice president of translational discovery and chief evangelist at Recursion.
“AI and machine learning technologies are having a seismic impact on the industry, not just because these technologies now exist in ways that can be married with traditional drug discovery processes to better answer biological questions, but also because of the incredible financial cost associated with developing treatments (and the huge failure rate of the industry),” Alfa tells DDNews. “It is estimated to cost about $2.6 billion, on average, to bring a drug to market, which ultimately makes them more expensive for the payer and patient. At the same time, the vast majority of drug discovery projects fail even before reaching clinical testing. Imagine that across the entire pharmaceutical industry, only 59 drugs were approved in 2018, and that was something of a record year.
“Against the backdrop of the huge costs and the high failure rates of the industry, pharma is increasingly looking to technology, specifically AI and machine learning, as a way to bring new medicines to patients more quickly and at a fraction of the cost. We’re just getting started, but already we’re seeing evidence that these technologies are going to impact just about every step in the drug discovery and development process.”
In conjunction with this effort, Recursion has initiated a competition, available through the NeurIPS 2019 Competition Track and co-sponsored by NVIDIA and Google Cloud, to encourage the development of machine-learning approaches that can determine representations of biology from within RxRx1.
“Our hope is that by open-sourcing a slice of our own data, we encourage others to do the same. Data-sharing and knowledge-sharing will be key to advancing this sector,” Alfa remarks.
In just such a data-sharing effort, AstraZeneca announced that it would be collaborating with BenevolentAI to apply machine learning and AI to discover and develop treatments for chronic kidney disease and idiopathic pulmonary fibrosis.
Mene Pangalos, executive vice president and president, BioPharmaceuticals R&D, at AstraZeneca, said: “The vast amount of data available to research scientists is growing exponentially each year. By combining AstraZeneca’s disease area expertise and large, diverse datasets with BenevolentAI’s leading AI and machine learning capabilities, we can unlock the potential of this wealth of data to improve our understanding of complex disease biology and identify new targets that could treat debilitating diseases.”
AstraZeneca brings with it experience in genomics, chemistry and clinical data, while BenevolentAI offers its target identification platform and biomedical knowledge graph, which consists of contextualized scientific data such as genes, proteins, diseases and compounds, and the relationships between them. The Benevolent Platform, according to the company, “can be used by scientists to try to discover novel pathways and mechanisms important in the pathophysiology of disease.” Financial details for the deal were not disclosed.
“Millions of people today suffer from diseases that have no effective treatment. The future of drug discovery and development lies in bridging the gap between AI, data and biology,” said BenevolentAI CEO Joanna Shields. “We are thrilled to be joining forces with AstraZeneca to develop new insights and identify promising new treatments for chronic kidney disease and idiopathic pulmonary fibrosis.”