A 3D illustration of protein molecules floating in a dark blue background

Credit: iStock.com/ Design Cells

Expert Advice: Fulfilling the potential of bottom-up proteomics

Achieving higher accuracy and depth in bottom-up proteomics involves navigating a multifaceted process with various steps and challenges.
Yuning Wang
| 7 min read

Lloyd Smith, a biochemist at the University of Wisconsin-Madison, known for his extensive expertise in mass spectrometry, began his research journey studying genetics. He invented the first automated DNA sequencer during his postdoctoral research and joined the University of Wisconsin-Madison in 1987, focusing on capillary electrophoresis. The emergence of matrix-assisted laser desorption/ionization and electrospray ionization in the 1990s soon caught his attention. These technologies allowed him to ionize and separate DNA molecules in a mass spectrometer instead of a gel.

“During that time, we ran into an issue with analyzing DNA mixtures,” Smith said. “We couldn’t detect larger fragments as well as we could detect smaller ones, which caused bias in the mass spectrometry data.” Recognizing this was caused by instrumental limitations, Smith developed a technique called charge reduction mass spectrometry that solved the issue. His technique was just as effective for analyzing proteins as for analyzing nucleic acids.

Lloyd Smith
Lloyd Smith, a biochemist at the University of Wisconsin-Madison, leads an interdisciplinary group of researchers to develop new mass spectrometric technologies for comprehensively identifying and quantifying proteoforms in biological systems.

Credit: Lloyd Smith

Since then, Smith has shifted his focus to proteomics, pioneering mass spectrometry-based methods that enable the identification of proteoforms: variations of a protein arising from a single gene that play crucial roles in health and disease but are challenging to detect with conventional methods. His team also led the development of advanced software tools to enhance the speed, accuracy, and depth of proteomic analysis. These tools have become instrumental resources for researchers to visualize and interpret complex proteomics data, offering valuable insights into biological systems.


Can you explain the basics of bottom-up proteomics?

In bottom-up proteomics, we usually want to analyze the whole proteomes of cells. We start by lysing the cells to release their contents, spinning down insoluble material, and isolating the proteins. We then digest these proteins into smaller peptides using enzymes, usually trypsin and sometimes others. This step converts the protein mixture into a complex peptide mixture. The next step is liquid chromatography-mass spectrometry to analyze the peptides. We may include intermediate steps like fractionation, where peptides pass through a chromatography column and emerge separated from one another. They are then ionized via electrospray ionization and enter the mass spectrometer.

Mass spectrometry occurs in two steps, known as tandem mass spectrometry. The first step, called MS1, determines the masses of the peptides as they elute from the column. The acquisition software then automatically selects the most intense peptide ions for further analysis. In the second step, called MS2, these selected peptide ions fragment inside the mass spectrometer, generating smaller ions that allow us to identify the peptides.
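The MS1-to-MS2 selection step can be pictured as a simple top-N filter over the peaks in an MS1 scan. This is a toy sketch, not any instrument's actual acquisition logic; real data-dependent acquisition also applies dynamic exclusion and charge-state rules:

```python
def select_precursors(ms1_peaks, top_n=10):
    """Pick the top_n most intense MS1 ions to send on for MS2
    fragmentation, as in data-dependent acquisition.
    ms1_peaks: list of (m/z, intensity) tuples from one MS1 scan."""
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, intensity in ranked[:top_n]]

# One hypothetical MS1 scan: four peptide ions with different intensities
scan = [(445.12, 3.0e6), (512.77, 9.1e5), (623.31, 5.5e6), (701.40, 2.2e5)]
targets = select_precursors(scan, top_n=2)  # m/z values selected for MS2
```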


What are the advantages and limitations of bottom-up proteomics compared to top-down proteomics?

Bottom-up proteomics is more widely used than top-down proteomics, which skips the digestion step, because peptides are less complex molecules than intact proteins. This means we get simpler spectra that are easier to generate, understand, and interpret. However, the downside of bottom-up proteomics is losing the context of what else was in the protein. For example, two different forms of a protein, called proteoforms, might produce the same peptide upon digestion, making it difficult to distinguish between them during analysis. This loss of context is a key difference between bottom-up and top-down proteomics; the latter maintains the intact protein, providing more detailed information about the protein’s molecular forms.

How should researchers select appropriate enzymes for protein digestion?

Having more enzymes is generally better. Cutting proteins with trypsin can generate both great peptides and less informative peptides that are too long, too short, or too hydrophobic. Using multiple enzymes can create more overlapping sets of peptides, where one enzyme may help reveal sequences missed by another. This overlap helps us stitch together a more complete sequence.


My student Rachel Miller developed a software tool to help researchers select enzymes for digesting their samples. It simulates the digestion process, predicts the peptide fragments generated, and assesses their length, hydrophobicity, and potential protein coverage. This helps researchers identify which enzymes will yield optimal results before conducting wet lab experiments, enhancing efficiency by avoiding unnecessary experimentation.
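A minimal sketch of what such a digestion simulator does, assuming the standard trypsin rule (cleave after lysine or arginine, but not before proline). The function names and length cutoffs here are illustrative, not the published tool's API:

```python
import re

def digest(sequence):
    """In silico trypsin digestion: cleave after K or R, but not before P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def usable(peptides, min_len=7, max_len=35):
    """Keep peptides in a length range that tends to ionize and fragment well;
    very short or very long peptides are less informative."""
    return [p for p in peptides if min_len <= len(p) <= max_len]

# A short hypothetical protein sequence as a demo input
seq = "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEEHFK"
peptides = digest(seq)
informative = usable(peptides)  # the subset worth targeting in the instrument
```

Running several enzymes' rules through the same simulation shows where their predicted peptides overlap, which is how one judges coverage before touching the wet lab.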

What else can help improve data quality during sample preparation?

One effective way to improve data quality is through multidimensional separation. For example, when dealing with a complex sample, using reverse-phase liquid chromatography alone generates many co-eluting peptides, which can lead to missed peptide identifications and errors. In mass spectrometry, more separations mean higher data quality, but they also take more time. So it’s always a trade-off between time and output, especially with complex samples.

What are the approaches for protein identification?

In bottom-up proteomics, there are two main approaches. The first and most common approach involves generating theoretical mass spectra of peptides from an in silico proteome and then matching these with experimental spectra. This helps identify the most probable theoretical peptides from the experimental data. John Yates’ group at Scripps Research developed this strategy and wrote a program called SEQUEST in the 1990s, which became the paradigm under which we operate. The other approach, called de novo sequencing, involves identifying proteins without pre-existing sequence knowledge. However, this method carries a higher risk of protein or peptide misidentification. With this approach, researchers often need to combine multiple enzymes with trypsin to gather more data.
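The matching idea behind this first approach can be sketched as a toy matched-peak score: count how many predicted fragment masses appear in the measured spectrum, and rank candidate peptides by that count. Real search engines use more sophisticated scores (SEQUEST, for instance, uses cross-correlation), so treat this only as an illustration of the concept:

```python
def match_score(theoretical_mz, experimental_mz, tol=0.02):
    """Count theoretical fragment m/z values found in the experimental
    spectrum within a mass tolerance; a toy peptide-spectrum match score."""
    return sum(
        any(abs(t - e) <= tol for e in experimental_mz)
        for t in theoretical_mz
    )

def best_match(candidates, experimental_mz):
    """Return the candidate peptide whose predicted fragments match best.
    candidates: dict mapping peptide sequence -> list of theoretical m/z."""
    return max(candidates,
               key=lambda pep: match_score(candidates[pep], experimental_mz))
```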


In bottom-up proteomics, the results are typically probabilistic rather than absolute. When analyzing a large number of molecules, we rarely get a definitive answer. Instead, we obtain results with a high probability of correctness, which we call a false discovery rate. This rate accounts for the likelihood that some identifications may be incorrect. Peptide identification methods allow researchers to make informed guesses about which proteoforms are present in the sample. However, these conclusions are based on the data at hand and are not definitive.
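The false discovery rate is typically estimated with the target-decoy approach: search the spectra against both the real database and a reversed or shuffled decoy database, then use the decoy hit rate to estimate how many target hits are wrong. A minimal sketch, where the `is_decoy` flag on each peptide-spectrum match is assumed to come from such a decoy search:

```python
def fdr_at_cutoff(psms, cutoff):
    """psms: list of (score, is_decoy) peptide-spectrum matches.
    Estimate the FDR among matches scoring at or above the cutoff
    as decoy hits divided by target hits."""
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    return decoys / targets if targets else 0.0

# Hypothetical scored matches; True marks a decoy hit
psms = [(95, False), (92, False), (90, False), (88, False),
        (85, True), (80, False), (75, True)]
strict = fdr_at_cutoff(psms, 91)  # high cutoff, low estimated FDR
loose = fdr_at_cutoff(psms, 84)   # lower cutoff admits a decoy hit
```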

What tools do you use to analyze mass spectrometry data?

We’ve developed an open-source search engine named MetaMorpheus, which we built based on a tool called Morpheus, created by Craig Wenger in the Joshua Coon group at the University of Wisconsin-Madison. Both MetaMorpheus and Morpheus use a scoring function to match experimental and theoretical spectra for peptide identification. It’s very important that software tools are open source because that allows everyone to understand exactly what they’re doing and how they’re doing it. With open-source code, others can look at the code and improve it without needing to reinvent it. We made all our source code available on GitHub, and we assist users who encounter problems. We have also implemented industrial-strength software robustness policies to test the code before new releases. Typically, a student leads a project to develop a new idea. They transform the idea into an algorithm and write the code to execute it. We then integrate this new capability into MetaMorpheus.


What are the applications of MetaMorpheus?

A particularly useful tool in MetaMorpheus is G-PTM-D, which stands for global post-translational modification discovery. This tool helps us identify new post-translational modifications (PTMs). Traditionally, researchers use variable modification searching to find PTMs. This approach requires creating a large database with all possible PTM sites and searching it for the best match, which increases analysis time and error rates because of the large number of incorrect entries in the database.

To address this, we implemented a method that creates smaller databases based on just the observed mass shifts. This allows us to search for all modifications simultaneously and identify PTMs without prior knowledge of what they are or where they are located. In a recent paper published in the Journal of Proteome Research, we used G-PTM-D to discover previously unknown, biologically meaningful modifications in the human immunodeficiency virus capsid and matrix proteins.
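The core idea, inferring modifications from recurring mass shifts, can be sketched like this. It is a toy version only: the actual G-PTM-D workflow uses the discovered shifts to augment the search database for a second-pass search, and the masses below are illustrative:

```python
from collections import Counter

def shift_histogram(observed_masses, peptide_masses):
    """Histogram the difference between observed precursor masses and
    the corresponding unmodified peptide masses. Recurring nonzero
    deltas hint at PTMs without specifying them in advance."""
    return Counter(round(obs - pep, 2)
                   for obs, pep in zip(observed_masses, peptide_masses))

# Phosphorylation adds ~79.966 Da; rounded to 79.97 here for binning
observed   = [879.97, 1200.50, 1579.97]
unmodified = [800.00, 1200.50, 1500.00]
hist = shift_histogram(observed, unmodified)
# the +79.97 Da bin occurring twice suggests phosphorylated peptides
```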

What are some recent enhancements you’ve made to your data analysis tools?

We’ve added many capabilities over time. For example, we’ve developed a tool called FlashLFQ, with LFQ standing for label-free quantitation. We integrated FlashLFQ into MetaMorpheus, allowing it to work with G-PTM-D to quantify modified peptides accurately. Currently, we’re making algorithmic modifications to improve the accuracy of these tools for single-cell proteomic data. This is important because single-cell proteomic data are often of poor quality, and standard software designed for high-quality data isn’t effective on low-quality, noisy data.
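At its simplest, label-free quantitation comes down to integrating each peptide's chromatographic peak. A minimal sketch of that step, assuming trapezoidal integration over an extracted ion chromatogram; FlashLFQ itself does considerably more, such as matching identifications between runs:

```python
def xic_area(retention_times, intensities):
    """Trapezoidal integration of an extracted ion chromatogram (XIC);
    the peak area serves as the peptide's label-free abundance."""
    return sum(
        (retention_times[i] - retention_times[i - 1])
        * (intensities[i] + intensities[i - 1]) / 2
        for i in range(1, len(retention_times))
    )

# A hypothetical peptide eluting over one minute, sampled every 0.2 min
rt = [10.0, 10.2, 10.4, 10.6, 10.8, 11.0]
intensity = [0, 2e5, 8e5, 8e5, 2e5, 0]
abundance = xic_area(rt, intensity)
```

Comparing these areas for the same peptide across samples gives the relative abundance that downstream analyses use.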

A picture of Sam Markovitch and Brian Frey in the Lloyd Smith laboratory operating a mass spectrometer
Sam Markovitch and Brian Frey in the Lloyd Smith laboratory conduct bottom-up proteomics to characterize the proteomes of cells.

Credit: Lloyd Smith

What advice would you give to researchers new to bottom-up proteomics?

One of the first decisions is choosing the chromatography method. Our group specializes in nanoflow chromatography. It’s very sensitive but also finicky. Alternatively, many facilities use microflow chromatography, which employs larger columns and is less sensitive but simpler to set up. The next critical step is mastering the instrument’s software and operation. There are a lot of decisions to make, such as choosing between data-dependent acquisition and data-independent acquisition, each with its advantages and weaknesses. Once data starts accumulating, the challenge shifts to data analysis. Converting spectra into protein lists requires specialized software, and understanding isotopic or label-free quantitation methods is essential for quantitative analysis. Finally, extracting biological insights from identified proteins involves using a whole host of tools for network and pathway analysis and protein-protein interaction studies. Each step requires attention and patience. For beginners, I recommend learning from colleagues or visiting a specialized lab to get hands-on experience. Using user-friendly software, such as MetaMorpheus, can also be beneficial.


What are the current challenges and future directions you see in proteomics?

One of the key challenges we face is the complexity of our samples. Often, multiple peptide variants co-elute from the chromatographic column and enter the mass spectrometer simultaneously, resulting in messy spectra. Current software typically identifies the most prominent peptide variants, leaving others unidentified. Our goal is to develop software capable of identifying all co-eluting peptide variants, even when they overlap, which could significantly increase the throughput of bottom-up proteomics. That’s an exciting opportunity to advance the field.

In broader terms, bottom-up proteomics is relatively mature, benefiting from substantial advancements in instrumentation and methodology over the years. The real frontier in proteomics is top-down proteomics, particularly in proteoform analysis. These areas are exciting and challenging, with many new things to do and a lot of room to improve.

This interview has been condensed and edited for clarity.

About the Author

  • Yuning Wang

    Yuning joined the custom content team at Drug Discovery News in June 2022. She earned her PhD in biochemistry from the University of Western Ontario, where she investigated how calcium sensor proteins regulate muscle cell membrane repair and cause muscular dystrophy. Yuning developed a passion for science communication during graduate school and began her career as a science writer in 2020. She enjoys reading, gardening, and trying new restaurants in Toronto.
