BROOKINGS, S.D. – Linking individual genes, or sets of them, to a disease is often the first step toward understanding that disease, let alone treating it. But a trio of institutions is looking to go even deeper by exploring not just genes but how they are expressed, in hopes of learning more about how diseases advance.
This work is made possible by a four-year, $1.04-million National Institutes of Health Research Project grant, which is administered by the National Institute of General Medical Sciences. Four doctoral students will participate in this project each year. The grant, also known as an R01 grant, allows the recipients to apply for a continuation award in its final year.
South Dakota State University Assistant Professor Qin Ma, who holds a joint faculty position in mathematics and statistics and agronomy, horticulture and plant science, will lead the team. Ma is also working with South Dakota’s BioSystems Networks and Translational Research center.
“Most biological techniques collect 1,000 cells in a tissue and assume the way in which the genes are expressed is identical—that is not the case,” he remarked. “Each tissue contains multiple cell types and each has its own regulatory mechanism.”
This project has three main parts, and each participating university will take the lead on one segment. The first order of business is the establishment of a mathematical formula, which will be handled by Anru Zhang, a theoretical statistician and assistant professor of statistics at the University of Wisconsin–Madison. Once the formula is in hand, Ma will design algorithms that can be used to analyze data. And lastly, Chi Zhang, an assistant professor of medical and molecular genetics at the Indiana University School of Medicine, will apply the developed model—which will use single-cell RNA sequencing data—to cancer tissues.
As for what the South Dakota State University contingent will be responsible for, Ma says, “In this project, we propose the development of a computational infrastructure to derive gene signatures of cell-type specific TRSs from single-cell RNA-Seq data and decompose a tissue transcriptomic data to the contributions of TRSs in its component cells. Specifically, my lab will focus on the last two aims among the following three aims: (1) Mathematically model TRS and associated co-regulation gene modules through transcriptomic profiles of single cells; (2) Develop a novel bi-clustering algorithm for identifying condition/cell-type specific co-regulated gene modules in single-cell transcriptomic data; and (3) Identify and annotate the gene signatures for each TRS, and estimate the level of each TRS in independent tissue data. All the developed computational tools and derived knowledge will be maintained into a web server/database for public utilization.”
Given that there are roughly 20,000 genes in the human genome, and each cell’s expression of its genes affects the cell’s function, there’s a lot of information to unlock in this work, particularly since gene expression can also change due to disease.
“A gene’s expression in an individual cell is regulated by a set of transcriptional regulatory signals (TRSs) such as transcription factors, miRNAs, lncRNA and epigenomic regulators,” Ma explains. “Deciphering cell-type specific expression contribution is equivalent to identifying the true cell-type specific TRSs in different cell components of a tissue sample. Recent studies revealed the crucial impact of stromal and immune cells on the progression and metastasis of cancer. We will apply the computational methods to TCGA tissue expression and single-cell expression data from other sources, to quantitatively estimate the level of cell type-specific TRSs for different cell types within a cancer tissue.”
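To give a rough sense of what estimating cell-type contributions from bulk tissue data can look like, here is a minimal Python sketch. It is not the team's method: it uses simulated data, a hypothetical signature matrix of per-cell-type expression profiles, and ordinary least squares with clipping as a simplified stand-in for whatever model the project ultimately develops. All names (`signatures`, `bulk`, `estimate_fractions`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical signature matrix: 300 genes x 3 cell types (simulated values).
signatures = rng.gamma(shape=2.0, scale=1.0, size=(300, 3))

# Simulated bulk tissue profile: a known mixture plus small measurement noise.
true_fractions = np.array([0.6, 0.3, 0.1])
bulk = signatures @ true_fractions + rng.normal(scale=0.05, size=300)


def estimate_fractions(signatures, bulk):
    """Least-squares mixture estimate, clipped at zero and rescaled to sum to 1."""
    coef, *_ = np.linalg.lstsq(signatures, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum()


estimated = estimate_fractions(signatures, bulk)
print(np.round(estimated, 3))
```

In this toy setting the estimate should land close to the simulated mixture. Real tissue data would be noisier, and the signatures themselves would first have to be learned from single-cell measurements.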
“Considering that the highly diverse TRS types in mammalian cells cannot be simultaneously measured by current experimental methods, we will model and quantify cell-type specific TRSs via mathematically well-defined co-regulation modules of their regulated genes based on single-cell RNA-Seq data,” he continues. “We hypothesize that the genes co-regulated by a common TRS in multiple cells can be characterized in single-cell RNA-Seq data and form gene signatures of the TRS. Mathematically, such a problem can be formulated as detection of a submatrix in a single-cell expression matrix, where the genes share coherent expression patterns over certain single-cell samples.”
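The submatrix formulation Ma describes can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the bi-clustering algorithm the project will develop: it plants a block of elevated expression in simulated noise, assumes the block size is known in advance, and recovers it with a simple alternating search seeded by a singular value decomposition. The function name `find_coherent_submatrix` and all parameters are hypothetical.

```python
import numpy as np


def find_coherent_submatrix(X, n_genes, n_cells, n_iter=50):
    """Greedy search for an n_genes x n_cells block with a large mean."""
    # Spectral start: the leading right singular vector highlights the cells
    # that take part in the strongest rank-one pattern.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    cells = np.argsort(-np.abs(vt[0]))[:n_cells]
    genes = np.array([], dtype=int)
    for _ in range(n_iter):
        # Keep the genes with the largest total expression over the current cells.
        new_genes = np.argsort(-X[:, cells].sum(axis=1))[:n_genes]
        # Keep the cells with the largest total expression over those genes.
        new_cells = np.argsort(-X[new_genes, :].sum(axis=0))[:n_cells]
        if set(new_genes) == set(genes) and set(new_cells) == set(cells):
            break  # selection stopped changing
        genes, cells = new_genes, new_cells
    return np.sort(genes), np.sort(cells)


# Simulated expression matrix: 200 genes x 100 single cells of pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))

# Plant a co-regulated module: the first 20 genes are elevated in the first 15 cells.
X[np.ix_(np.arange(20), np.arange(15))] += 3.0

genes, cells = find_coherent_submatrix(X, n_genes=20, n_cells=15)
print("genes:", genes)
print("cells:", cells)
```

The search here looks only for a block with a high average value. Coherent patterns in real single-cell data can take other forms, such as genes rising and falling together, which is part of why a purpose-built algorithm is needed.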
The models in this project will be developed using stromal and immune cells, according to Ma. He tells DDNews that the approach could also prove useful in diseases other than cancer, noting, “The proposed computational techniques can be robustly applied to other diseases besides cancer where heterogeneous cell types exist in the tissue.”