Machine learning improves tandem MS
Purdue researchers apply machine learning concepts to create a new mass spec method
WEST LAFAYETTE, Ind.—Researchers at Purdue University recently created a new method of applying machine learning concepts to the tandem mass spectrometry process, with the intent of improving the flow of information for new drug development. Their work has been published in Chemical Science.
DDN spoke with Gaurav Chopra, an assistant professor of analytical and physical chemistry in Purdue’s College of Science, and Hilkka Kenttämaa, the Frank Brown Distinguished Professor of Analytical Chemistry and Organic Chemistry at Purdue, to find out more about how this new method will benefit researchers involved in drug discovery.
“Mass spectrometry is essential for the drug discovery process,” say Kenttämaa and Chopra. “Tandem mass spectrometry involves two or more ion separation and analysis steps instead of just one, which makes it a very powerful analytical method and especially suitable for the characterization of complex mixtures containing unknown compounds. This can be used in several drug discovery processes to identify drug metabolites, reaction products, impurities, etc.”
“The specific tandem mass spectrometer method we used here is based on ion-molecule reactions. One can first protonate all analytes in a mixture inside the tandem mass spectrometer, then select just one protonated analyte in the first ion separation step by ejecting all other ions from the instrument, then allow the selected ion to undergo diagnostic reactions with a neutral reagent, and finally analyze all the product ions by using the second ion analysis step,” they continue. “This is a very powerful technique that is combined with machine learning for data interpretation, analysis and automation.
“Analysis of the results obtained in the experiments described above is very tedious, as one often uses many different neutral reagents and analyzes many different protonated analytes that display different reactivities toward the neutral reagents. Machine learning methods with bootstrapping will make data analysis very fast and much more reliable without specific human biases.”
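The article does not say exactly how the Purdue team applies bootstrapping, but the general idea is to resample a small dataset with replacement many times and see how much an estimate varies. The minimal Python sketch below illustrates that idea on invented data; the `bootstrap_ci` helper and the `observed` outcomes are hypothetical and are not taken from the published work.

```python
import random

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n items with replacement from the original sample.
        sample = rng.choices(values, k=n)
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(values) / n, lower, upper

# Hypothetical outcomes for 12 analytes (assumption, not real data):
# 1 = diagnostic product ion observed, 0 = not observed.
observed = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]

estimate, lower, upper = bootstrap_ci(observed)
print(f"reactive fraction = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

A wide interval signals that the dataset is too small to support a firm conclusion, which is one reason resampling is attractive when training data are scarce.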
Chopra pointed to two major problems with machine learning as it is applied in the chemical sciences: existing methods offer no chemical understanding of the decisions the algorithm makes, and new methods are rarely put through blind experimental tests to check whether the proposed models are accurate enough for use in a chemical laboratory.
“We have addressed both of these items for a methodology that is isomer-selective and extremely useful in chemical sciences to characterize complex mixtures, identify chemical reactions and drug metabolites, and in fields such as proteomics and metabolomics,” he noted.
The researchers constructed statistically robust machine learning models designed to work with limited training data. The models focus on a common neutral reagent, 2-methoxypropene (MOP), and predict how compounds will react with MOP in a tandem mass spectrometer, which in turn yields structural information about those compounds.
“This is the first time that machine learning has been coupled with diagnostic gas-phase ion-molecule reactions, and it is a very powerful combination, leading the way to completely automated mass spectrometric identification of organic compounds,” Kenttämaa reported. “We are now introducing many new reagents into this method.”
The Purdue team has also introduced chemical reactivity flowcharts to help chemists interpret the decisions made by the machine learning method, which should make it easier to extract structural information from the mass spectra.
“Chemical reactivity flowcharts are a representation of how the machine learning methods are making decisions. Specifically, identification of different parts of the structure is evaluated to see if the reactions have occurred. The identification of parts of the structure (or features) for a specific chemical reaction is done automatically by machine learning models,” Chopra and Kenttämaa explain. “Eventually, combining these features with specificity of the reactions occurring with each neutral analyte will automate the process towards the choice and specific order of reagents to be used to do the reactions.”
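One way to picture such a flowchart is as a decision tree that asks, one structural feature at a time, whether the feature is present and then follows the matching branch to a prediction. The Python sketch below is only an illustration of that reading: the `Node` and `Leaf` classes, the `trace` function, the placeholder names `feature_A` and `feature_B`, and the outcomes are all hypothetical and do not reproduce the Purdue models or any real chemistry.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

@dataclass
class Leaf:
    prediction: str  # outcome reported at the end of a branch

@dataclass
class Node:
    feature: str                        # structural feature tested at this step
    if_present: Union["Node", Leaf]     # branch taken when the feature is found
    if_absent: Union["Node", Leaf]      # branch taken when it is not

def trace(tree: Union[Node, Leaf], features: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Walk the flowchart and return the prediction plus each decision made."""
    path = []
    node = tree
    while isinstance(node, Node):
        present = features.get(node.feature, False)  # missing features count as absent
        path.append(f"{node.feature}: {'yes' if present else 'no'}")
        node = node.if_present if present else node.if_absent
    return node.prediction, path

# Hypothetical flowchart with placeholder features (assumption, not real chemistry).
FLOWCHART = Node(
    feature="feature_A",
    if_present=Node(
        feature="feature_B",
        if_present=Leaf("diagnostic product predicted"),
        if_absent=Leaf("diagnostic product not predicted"),
    ),
    if_absent=Leaf("diagnostic product not predicted"),
)

prediction, path = trace(FLOWCHART, {"feature_A": True, "feature_B": False})
print(prediction)
for step in path:
    print("  " + step)
```

Returning the path alongside the prediction is the point of the exercise: the list of yes/no decisions is what lets a chemist see why the model reached its answer.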
“The combination of tandem mass spectrometry and machine learning will enable us to completely automate the experiments with no human involvement,” they add. “Furthermore, machine learning will reveal how new types of analytes will react with specific reagents and what sort of new reagents might prove to be useful, thereby revealing new chemistry in the process.”
“Researchers will [now] be able to rapidly, reliably and automatically characterize structures of unknown compounds in complex mixtures. Eventually this will become easy and fast to address, where the models learn from their own mistakes iteratively and develop models on the fly in a reinforcement learning manner, thereby saving human effort not to chase dead-end leads. We believe that [combining] powerful machine learning methods with a powerful tandem mass spectrometry technique will solve very difficult problems faced in a wide variety of fields,” conclude Kenttämaa and Chopra.