Deep learning for drug design

HOUSTON—A deep learning-based technique created at Rice University’s Brown School of Engineering is designed to help pharmaceutical researchers anticipate how drugs in development will be metabolized in the human body. Computer scientist Lydia Kavraki and her team have introduced Metabolite Translator, a computational tool that predicts metabolites (the products formed when enzymes act on small molecules such as drugs) and helps researchers assess the safety of drug candidates. Kavraki is the Noah Harding Professor of Computer Science; a professor of bioengineering, mechanical engineering, and electrical and computer engineering; and director of Rice’s Ken Kennedy Institute.

As Kavraki explained, “Metabolite Translator predicts potential metabolites that can be formed in the human body. Drugs, as well as other chemicals that may end up in our bodies—such as pesticides, pollutants and cosmetics—may be metabolized through enzymatic reactions that alter their structure. The metabolites that are formed may raise safety concerns and, therefore, should be identified before a drug is released into the market.

“For example, metabolites may be toxic to the liver, where the big majority of drugs [are] being metabolized. Metabolite Translator offers an efficient computational method for predicting the structure of possible metabolites. Coupled with a tool that predicts toxicity or other activity of chemical compounds, it can provide insights on the safety of a potential drug.”
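The pairing Kavraki describes, a metabolite predictor feeding a toxicity predictor, can be sketched as a small screening loop. This is a minimal sketch under loud assumptions: predict_metabolites and is_flagged_toxic are hypothetical stand-ins backed by toy lookup tables, not Metabolite Translator or any real toxicity model. The single entry, paracetamol (acetaminophen) forming the liver-toxic metabolite NAPQI, is a textbook example chosen only for illustration.

```python
from typing import Callable, Dict, List

# HYPOTHETICAL stand-in for a metabolite predictor. A real tool would generate
# candidate metabolite SMILES; this toy version only looks them up.
TOY_METABOLITES: Dict[str, List[str]] = {
    # paracetamol -> N-acetyl-p-benzoquinone imine (NAPQI)
    "CC(=O)Nc1ccc(O)cc1": ["CC(=O)N=C1C=CC(=O)C=C1"],
}

# HYPOTHETICAL stand-in for a toxicity predictor: SMILES flagged as liver-toxic.
TOY_TOXIC = {"CC(=O)N=C1C=CC(=O)C=C1"}


def predict_metabolites(smiles: str) -> List[str]:
    """Toy metabolite predictor: unknown inputs yield no predictions."""
    return TOY_METABOLITES.get(smiles, [])


def is_flagged_toxic(smiles: str) -> bool:
    """Toy toxicity predictor."""
    return smiles in TOY_TOXIC


def screen_candidate(
    smiles: str,
    metabolite_fn: Callable[[str], List[str]] = predict_metabolites,
    toxicity_fn: Callable[[str], bool] = is_flagged_toxic,
) -> List[str]:
    """Return the predicted metabolites of `smiles` that the toxicity model flags."""
    return [m for m in metabolite_fn(smiles) if toxicity_fn(m)]


print(screen_candidate("CC(=O)Nc1ccc(O)cc1"))  # ['CC(=O)N=C1C=CC(=O)C=C1']
```

Passing the two predictors in as parameters keeps the screening logic independent of which metabolite or toxicity model is plugged in.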

Kavraki cited a case in which Metabolite Translator was used, along with other computational tools, to assess the safety of novel chemical compounds proposed as potential treatments for COVID-19. She explained that designing drug candidates for novel targets is especially challenging because a drug designed for a new target may also bind to other, undesired targets, potentially leading to toxicity.

She pointed out that identifying a drug’s metabolites with Metabolite Translator does not directly reveal how effective the drug is; rather, metabolites are identified to assess the drug’s safety.

Metabolite Translator is based on SMILES (simplified molecular-input line-entry system), a notation method that uses plain text rather than diagrams to represent chemical molecules. Kavraki explained that SMILES can be seen as a language for chemistry.
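To make the "language" analogy concrete, a SMILES string is usually split into tokens before a sequence model sees it: two-letter atoms such as Cl and Br and bracketed atoms such as [nH] become single tokens, while bonds, branches and ring-closure digits become tokens of their own. The sketch below uses a regular-expression tokenizer of a kind commonly paired with sequence-to-sequence chemistry models; it is an assumption for illustration, not necessarily the tokenization Metabolite Translator uses.

```python
import re
from typing import List

# One token per: bracketed atom, two-letter organic-subset halogen (Br, Cl),
# single-letter aliphatic or aromatic atom, bond/branch/other symbol,
# two-digit ring closure (%nn), or single-digit ring closure.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%\d{2}|\d)"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print(len(tokenize_smiles(aspirin)))  # 21
```

Listing Br and Cl before the single-letter atoms matters: regex alternation takes the first alternative that matches, so chlorine is not mis-read as carbon followed by an unknown "l".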

“The characters in a SMILES sequence indicate the atoms and their connectivity within a molecule. The SMILES sequence that describes a single molecule can be seen as a word,” she noted. “Such a representation offers multiple advantages, including the adoption of computational techniques that are developed for sequential data or even natural languages for solving chemical problems. Similarly, we have used a deep learning model that has been developed for translating natural languages in order to translate a molecule (such as a drug) into possible metabolites that can be formed in the human body.”
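In the translation framing Kavraki describes, each training example is a pair: the parent molecule’s SMILES as the "source sentence" and a metabolite’s SMILES as the "target sentence". A minimal sketch of turning such pairs into the integer sequences an encoder-decoder model consumes follows. The special tokens, the helper names and the per-character split are illustrative assumptions (a real pipeline would use a SMILES-aware tokenizer so that atoms such as Cl stay whole), and the one reaction shown, ethanol oxidized to acetaldehyde, is a textbook example rather than data from the study.

```python
from typing import Dict, List, Tuple

PAD, BOS, EOS = "<pad>", "<s>", "</s>"  # hypothetical special tokens


def build_vocab(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map every character seen in source or target SMILES to an integer id."""
    vocab = {PAD: 0, BOS: 1, EOS: 2}
    for source, target in pairs:
        for ch in source + target:
            vocab.setdefault(ch, len(vocab))  # new characters get the next free id
    return vocab


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Character-level encoding wrapped in start/end markers."""
    return [vocab[BOS]] + [vocab[ch] for ch in smiles] + [vocab[EOS]]


# (parent molecule, metabolite) pairs; ethanol -> acetaldehyde is a textbook example.
pairs = [("CCO", "CC=O")]
vocab = build_vocab(pairs)
src_ids = encode("CCO", vocab)
tgt_ids = encode("CC=O", vocab)
print(src_ids, tgt_ids)  # [1, 3, 3, 4, 2] [1, 3, 3, 5, 4, 2]
```

A sequence-to-sequence model is then trained to emit the target ids one at a time given the source ids; decoding several candidate outputs (for example with beam search) is one way such a model can propose more than one possible metabolite.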

Describing the system’s advantages, Kavraki said, “Previous methodologies predict metabolites through specific enzyme families. They are focused on the enzymes that metabolize the big majority of drugs. However, complications due to active metabolites can be caused even through other enzymes.”

“However, the nature of existing approaches makes it difficult to extend coverage on additional enzymes,” she added. “They rely on sets of reaction rules in order to predict the structure of potential metabolites. The rules explicitly encode the action of enzymes and the derivation of these rules involves manual work by experts. On the other side, Metabolite Translator attempts to bypass this step of defining reaction rules by training a machine learning model which captures the underlying mechanisms of metabolic reactions.

“This approach does not restrict to specific enzyme families. Indeed, Metabolite Translator has been trained on a diverse dataset of human metabolic reactions that covers a wide range of enzymes. The evaluation of our method showed that Metabolite Translator performs equally well with existing tools on the major enzyme families, and additionally, it is able to predict metabolites through enzymes that are not commonly involved in drug metabolism and [are] therefore missed by existing tools.”
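The contrast Kavraki draws between hand-written reaction rules and a trained model can be caricatured in a few lines. In this deliberately naive sketch, each hypothetical rule is a pattern over the SMILES text of tiny acyclic molecules; real rule-based tools encode rules over molecular graphs, typically as reaction templates. The point is only structural: a prediction exists solely where an expert has written a matching rule, so a molecule no rule covers yields nothing.

```python
import re
from typing import List, Tuple

# HYPOTHETICAL hand-written rules: (name, full-string SMILES pattern, rewrite template).
# Toy text patterns valid only for tiny acyclic molecules written in this exact form.
RULES: List[Tuple[str, str, str]] = [
    # straight-chain primary alcohol -> aldehyde, e.g. CCO -> CC=O
    ("primary alcohol oxidation", r"(C*)CO", r"\1C=O"),
    # N,N-dimethyl alkylamine -> N-methyl alkylamine, e.g. CN(C)CC -> CNCC
    ("N-demethylation", r"CN\(C\)(C+)", r"CN\1"),
]


def predict_with_rules(smiles: str) -> List[Tuple[str, str]]:
    """Apply every matching rule; return (rule name, predicted metabolite) pairs."""
    predictions = []
    for name, pattern, template in RULES:
        match = re.fullmatch(pattern, smiles)
        if match:
            predictions.append((name, match.expand(template)))
    return predictions


print(predict_with_rules("CCO"))       # [('primary alcohol oxidation', 'CC=O')]
print(predict_with_rules("c1ccccc1"))  # [] (no rule matches benzene)
```

Extending coverage in this style means writing more rules by hand, which is the manual step Kavraki says a trained model attempts to bypass.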

According to Kavraki, this methodology would not have been possible as little as two years ago.

“From a computational perspective, predicting metabolites of a given chemical molecule is a very challenging problem,” she remarked. “A suitable method for this problem should not only process chemical molecules, but also generate chemical molecules as output. For example, the problem is more complicated than associating molecules with chemical properties, a problem which has been extensively studied for several years. However, methodologies for generating structured data, such as chemical molecules, were only recently adopted in the field of cheminformatics and drug discovery.”
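Kavraki’s point that the method must generate molecules, not just score them, has a practical consequence: a model that writes SMILES one token at a time can emit strings that are not valid molecules. A rough first filter is a purely syntactic check, sketched below as an assumption for illustration. It verifies only that bracket atoms close, branches balance and ring-closure labels pair up. Full validation (valences, aromaticity) needs a cheminformatics toolkit; RDKit’s Chem.MolFromSmiles, for example, returns None for strings it cannot parse.

```python
from typing import Set

DIGITS = "0123456789"


def smiles_syntax_ok(smiles: str) -> bool:
    """Cheap syntactic screen for generated SMILES (not full chemical validation)."""
    depth = 0                  # current branch nesting from '(' and ')'
    open_rings: Set[int] = set()  # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Bracket atoms like [nH] or [NH4+] must close and must not nest;
            # digits inside them are isotopes or counts, not ring closures.
            end = smiles.find("]", i + 1)
            if end == -1 or "[" in smiles[i + 1:end]:
                return False
            i = end + 1
            continue
        if ch == "]":
            return False       # closing bracket with no opening bracket
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False   # more ')' than '('
        elif ch == "%":
            label = smiles[i + 1:i + 3]  # two-digit ring label, e.g. %12
            if len(label) != 2 or any(c not in DIGITS for c in label):
                return False
            open_rings ^= {int(label)}   # first occurrence opens, second closes
            i += 3
            continue
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not open_rings


print(smiles_syntax_ok("CC(=O)Oc1ccccc1C(=O)O"), smiles_syntax_ok("c1ccccc"))  # True False
```

Strings that pass this screen can still be chemically impossible, so a check like this would only serve as a fast pre-filter ahead of toolkit-based parsing.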

This work was performed in collaboration with Eleni Litsa, a graduate student at Rice University, and Dr. Payel Das of IBM Research. The article is available at https://pubs.rsc.org/en/content/articlelanding/2020/SC/D0SC02639E#!divAbstract