Decoding cell behavior and disease with AI and single-cell transcriptomics
Machine learning paired with single-cell analysis helps scientists identify disease-causing pathways and potential therapeutics.
Something had clearly gone awry in the patient’s lung. Cells that should have been protective ciliated and club cells had instead differentiated into goblet cells, spewing mucus all over the surface of the lung epithelium. To find out what went wrong, some scientists might focus on one or two genes or molecules that provoke improper cellular differentiation, but a team at Cellarity, a biotech company based in Cambridge, Massachusetts, pursues questions like this with a bird's-eye view: they observe the gene expression changes occurring in millions of single cells as the cells transition from healthy to diseased.
By combining single-cell transcriptomics with artificial intelligence and machine learning, the team identifies the gene expression changes that drive the healthy-to-diseased transition. They then apply machine-learning algorithms to predict potential therapeutics, allowing for an agnostic approach to treating disease.
“Diseases are very complex,” said Effie Tozzo, the Senior Vice President of Drug Development at Cellarity. “Some of these gene networks are drivers, and many are passengers. For us to understand the gene networks that we really need to focus on and to target as a cell behavior target, we need to understand which ones are the drivers.”
With single-cell transcriptomics, scientists can catch even small fluctuations in RNA expression that occur in a particular cell type as it changes from healthy to diseased. The difficulty with this approach, however, is the sheer amount of data that it yields.
“In current standard single-cell RNAseq data, we can generate up to several million cells simultaneously, so the scale is huge, and information is sparse,” said Qin Ma, a systems biologist at Ohio State University who is not associated with Cellarity. “How can we find a needle in a stack? That's very hard, and luckily, machine learning and deep-learning frameworks, they have the power.”
Using machine-learning algorithms, Cellarity’s scientists convert their single-cell gene expression data into digital maps, which they call “Cellarity Maps.” By applying additional machine-learning algorithms to these maps, the researchers identify the cellular behaviors driving disease.
In the case of the aberrant goblet cells found in the lung of a patient with chronic obstructive pulmonary disease (COPD), scientists used Cellarity Maps to investigate how the cells in the lung epithelial tissue (ciliated, club, goblet, and basal cells) altered their gene expression as they transitioned from healthy to diseased.
“We identified transitions that lead to a decrease of goblet cells and a concomitant increase in ciliated cells in COPD epithelial cells,” Tozzo said. While the company cannot yet reveal what those transcriptional transitions are, it is now applying its machine-learning algorithms to identify or design molecules that correct those cell behaviors.
“We'll start by predicting molecules that will have an effect. We need to test them, and if they're active, this is our start for chemistry,” Tozzo said. “We will never have to make 3000 molecules to make a clinical candidate just because we're really using machine learning to drive this.”
Dominic Grün, a systems biologist at the University of Würzburg who is not associated with Cellarity, said that when it comes to data analysis, this approach is the state of the art. “These neural networks capture a lot of information in the system without seeing each and every possible case,” Grün said. “This is really the focus of where Cellarity is. They're trying to bring that to perfection.”
Using this approach, Cellarity’s scientists hope to identify the drivers behind both rare and common complex diseases in four therapeutic areas: metabolic disease, respiratory disease, immuno-oncology, and hematological diseases. As the company grows, the team plans to add more data to its Cellarity Maps and to integrate information on cell-cell interactions alongside the data from individual cells.
By sifting through the noise of cellular changes to find the ones that cause disease, scientists will be one step closer to curing complex diseases, one misbehaving cell at a time.