Technology Guide: Bottom-up proteomics
Advanced techniques, tools, and strategies can help researchers refine their proteomics workflows, improving downstream results.
For researchers unraveling the complex proteomes of biological systems, bottom-up proteomics offers the sensitivity and resolution needed to identify and quantify proteins, characterize post-translational modifications, and uncover disease-relevant biomarkers. However, each step of the workflow, from enzyme selection to mass spectrometry analysis and data interpretation, demands careful decisions.
Download the full Technology Guide to learn:
- How to optimize every stage of the bottom-up proteomics workflow
- Strategies to enhance proteome coverage and identify low-abundance or modified proteins
- Tools and expert insights for streamlining data analysis and improving quantitation accuracy
A TECHNOLOGY GUIDE FOR BOTTOM-UP PROTEOMICS

Bottom-up proteomics is a widely used mass spectrometry-based approach for identifying and quantifying proteins in biological samples. In contrast to top-down proteomics, which analyzes intact proteins, bottom-up proteomics breaks proteins down into smaller peptides before mass spectrometry analysis. A typical bottom-up proteomics workflow starts with extracting proteins from biological material and enzymatically digesting them into smaller peptides. Researchers then separate these peptides using liquid chromatography and introduce them into a mass spectrometer for analysis. Inside the mass spectrometer, peptides fragment into smaller ions, and their masses are measured to generate peptide spectra. Scientists use bioinformatics tools to match these spectra to theoretical spectra from protein databases, identifying the original proteins. They can also measure protein abundance across samples using quantitative techniques such as label-free quantification or stable isotope labeling.

Bottom-up proteomics offers a detailed analysis of the proteome’s complexity, dynamics, and post-translational modifications (PTMs) that are essential for understanding protein function and regulation. These capabilities make bottom-up proteomics a critical tool for biomarker discovery, clinical diagnostics, and studying cellular processes and disease mechanisms.

1. Protein extraction approach

Extracting a protein mixture from biological samples is the crucial first step of a bottom-up proteomics workflow. No universal method exists for this step, so researchers must customize their approach to the sample type. For cell samples, it is necessary to break down cell membranes, typically with a detergent-containing lysis buffer and sonication, and to prevent protease and phosphatase activity by adding enzyme inhibitors.
The lysed sample then undergoes protein extraction, typically via precipitation with organic solvents, followed by further purification methods such as chromatography and ultrafiltration (1). For biological fluid samples, high-abundance proteins, such as albumin, can interfere with the detection of less abundant proteins. To address this, immunoaffinity methods can selectively deplete these high-abundance proteins, enriching the sample for proteins of interest (1). Tissue samples require homogenization to break down their complex structures and release proteins. Homogenization methods include manual grinding, mechanical grinders, bead beating, sonication, liquid nitrogen pulverization, and pressure cycling homogenizers; researchers should choose a method based on tissue type and equipment availability (1). The homogenized tissue then undergoes protein extraction and purification, as for cell samples.

2. Choice of proteases

Trypsin is the most widely used protease in bottom-up proteomics due to its robustness, reliability, and cost effectiveness. It cleaves proteins after lysine or arginine residues, generating small peptides ideal for fragmentation in mass spectrometry. However, relying exclusively on trypsin can limit the view of the proteome, as it sometimes produces peptides that are too short for effective mass spectrometry analysis. Proteins with few lysine or arginine residues, such as membrane proteins, and certain PTMs can be challenging to characterize using tryptic digestion alone (2). To overcome these limitations, researchers can supplement trypsin with alternative enzymes to enhance peptide coverage and identification. For instance, chymotrypsin is suitable for proteins with long hydrophobic stretches, Lys-C generates longer peptides than trypsin, and Glu-C is effective for cleaving heavily glycosylated proteins. There are also proteases, such as Asp-N and Lys-N, that are compatible with detergents and high temperatures.
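To see how protease choice shapes the resulting peptides, trypsin’s cleavage rule (cut after lysine or arginine, with cleavage commonly suppressed before proline) can be simulated in a few lines of Python. This is a minimal sketch; the input sequence is a made-up toy, not a real protein:

```python
import re

def trypsin_digest(sequence, missed_cleavages=0):
    """In-silico tryptic digestion: cleave C-terminal to K or R,
    except when the next residue is proline (the common rule of thumb)."""
    # Split after K or R (lookbehind) unless followed by P (negative lookahead)
    peptides = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    results = list(peptides)
    # Optionally add peptides spanning up to N missed cleavage sites
    for n in range(1, missed_cleavages + 1):
        for i in range(len(peptides) - n):
            results.append("".join(peptides[i:i + n + 1]))
    return results

# Toy sequence for illustration only
print(trypsin_digest("MKWVTFISLLRGA"))  # → ['MK', 'WVTFISLLR', 'GA']
print(trypsin_digest("AKPLR"))          # → ['AKPLR'] (no cleavage before P)
```

Swapping in a different cleavage pattern (for example, cutting after glutamate to mimic Glu-C) shows how alternative enzymes generate overlapping peptide sets that improve sequence coverage.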
Additionally, Arg-C, which cleaves after arginine residues, is useful for characterizing PTMs and increasing proteome coverage (3). By combining these enzymes, researchers can uncover more protein sequences and modifications for a more comprehensive picture of the proteome.

3. Peptide fractionation and separation

Prior to mass spectrometry analysis, researchers typically need to reduce the complexity of peptide mixtures to enhance the depth and accuracy of protein identification and quantification. High-performance liquid chromatography (HPLC), which separates complex peptide mixtures into smaller, more manageable subsets based on their chemical properties, is an integral step in this process. HPLC methods include size-exclusion chromatography, ion-exchange chromatography, and reverse-phase HPLC, each separating peptides based on characteristics such as size, net charge, or hydrophobicity (4). These methods reduce sample complexity, increase the coverage of complex proteomes, and improve the detection of low-abundance peptides. When selecting and applying these methods, researchers should consider factors such as sample complexity, proteome coverage, instrument compatibility, throughput, and sensitivity requirements.

4. Optimizing mass spectrometry methods

Choosing the right mass spectrometer is crucial to ensure it meets specific experimental needs, such as sensitivity, resolution, mass range, and speed. High sensitivity is vital for detecting low-abundance peptides, while high resolution helps separate closely spaced peaks to enhance peptide identification. Researchers must also select data acquisition strategies that align with their experimental goals. Two common approaches are data-dependent acquisition (DDA) and data-independent acquisition (DIA). DDA dynamically selects the most abundant precursor ions for fragmentation and analysis, offering high sensitivity for detecting low-abundance peptides.
In contrast, DIA fragments all precursor ions across a wide mass range, enabling comprehensive data acquisition (5). Each approach has its advantages, requiring careful consideration and planning to optimize performance and achieve the desired outcomes. Additionally, researchers should employ appropriate fragmentation methods, such as collision-induced dissociation or higher-energy collisional dissociation, based on peptide characteristics and experimental objectives.

5. Analyzing proteomic data

Deriving meaningful biological insights from mass spectrometry data involves peptide and protein identification, quantification, and PTM characterization. Peptide and protein identification typically begins with database searching to convert mass spectra into peptide sequences. Various software tools compare mass spectral data against a protein sequence database to identify peptides and their corresponding proteins. Researchers then score the matches by applying algorithms that assess their quality and remove unreliable matches (4). Other software tools help detect PTMs, such as phosphorylation, acetylation, and glycosylation, by analyzing fragmentation patterns to localize these modifications (6). Additionally, specialized data analysis tools enable protein quantitation for both isotopic label-based and label-free methods, allowing researchers to determine relative or absolute protein abundance across samples (5).

Towards comprehensive biological insights

With ongoing technological advancements enhancing sensitivity, resolution, and throughput, bottom-up proteomics enables ever deeper insights into the proteome’s complexity and dynamics. Integrating bottom-up proteomics with other omics technologies, such as genomics and metabolomics, provides a more comprehensive understanding of biological systems for identifying novel biomarkers, drug targets, and therapeutic strategies.
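The core of database searching, matching an observed mass against theoretical peptide masses within a tolerance, can be sketched in Python. The residue masses below are standard monoisotopic values; the candidate peptides are hypothetical:

```python
# Monoisotopic amino acid residue masses (Da); peptide mass = sum + water
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added at the peptide termini

def peptide_mass(seq):
    """Theoretical monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def match_candidates(observed_mass, candidates, ppm_tol=10.0):
    """Return candidates whose theoretical mass lies within a ppm
    tolerance of the observed (neutral) precursor mass."""
    tol = observed_mass * ppm_tol / 1e6
    return [p for p in candidates if abs(peptide_mass(p) - observed_mass) <= tol]

candidates = ["PEPTIDE", "SAMPLER", "EDITPEP"]
obs = peptide_mass("PEPTIDE")  # pretend this came from an MS1 scan
print(match_candidates(obs, candidates))  # → ['PEPTIDE', 'EDITPEP']
```

Note that “EDITPEP”, an anagram of “PEPTIDE”, matches too: peptides with the same composition share a precursor mass, which is exactly why MS2 fragmentation, not precursor mass alone, is needed to pin down the sequence.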
Top considerations for bottom-up proteomics

BY YUNING WANG, PHD

When planning a bottom-up proteomics experiment, researchers should consider the aspects outlined in the numbered sections above.

REFERENCES
1. Duong, V-A. & Lee, H. Bottom-up proteomics: advancements in sample preparation. Int. J. Mol. Sci. 24, 5350 (2023).
2. Miller, R. M. & Smith, L. M. Overview and considerations in bottom-up proteomics. Analyst 148, 475–486 (2023).
3. Tsiatsiani, L. & Heck, A. J. R. Proteomics beyond trypsin. FEBS J. 282, 2612–2626 (2015).
4. Dupree, E. J. et al. A critical review of bottom-up proteomics: the good, the bad, and the future of this field. Proteomes 8, 14 (2020).
5. Jiang, Y. et al. Comprehensive overview of bottom-up proteomics using mass spectrometry. ACS Meas. Sci. Au (2024). doi:10.1021/acsmeasuresciau.3c00068
6. Chalkley, R. J. & Clauser, K. R. Modification site localization scoring: strategies and performance. Mol. Cell. Proteomics 11, 3–14 (2012).
7. Orsburn, B. C. Proteome Discoverer—a community enhanced data processing suite for protein informatics. Proteomes 9, 15 (2021).
8. FragPipe. FragPipe at
9. Röst, H. L. et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat. Methods 13, 741–748 (2016).
10. Farag, Y. M., Horro, C., Vaudel, M. & Barsnes, H. PeptideShaker online: a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data. J. Proteome Res. 20, 5419–5423 (2021).
11. Solntsev, S. K., Shortreed, M. R., Frey, B. L. & Smith, L. M. Enhanced global post-translational modification discovery with MetaMorpheus. J. Proteome Res. 17, 1844–1851 (2018).
12. Bern, M., Kil, Y. J. & Becker, C. Byonic: advanced peptide and protein identification software. Curr. Protoc. Bioinforma. 40, 13.20.1–13.20.14 (2012).

- Mascot: The Mascot search engine identifies proteins from mass spectrometry data by matching experimental data with sequence databases (4).
- SEQUEST: SEQUEST is a mass spectrometry data analysis program that matches experimental spectra to theoretical spectra generated from protein databases (4).
- MaxQuant: This software analyzes large-scale mass spectrometry data, providing tools for label-free and isobaric labeling-based quantitation and PTM analysis (4).
- Proteome Discoverer: This software integrates with various search engines for protein identification and quantification, offering customizable workflows and PTM analysis (7).
- PEAKS: PEAKS is a proteomics software program for protein identification, de novo protein sequencing, and quantification (4).
- FragPipe: This computational mass spectrometry data analysis platform supports peptide identification, label-free and label-based quantitation, and PTM analysis (8).
- OpenMS: OpenMS is an open-source software platform that supports common mass spectrometric data processing tasks (9).
- PeptideShaker: PeptideShaker is a web-based framework for identifying and visualizing mass spectrometry-based proteomics data from raw data (10).
- Skyline: Skyline is a targeted proteomics software for quantitative analysis, supporting raw data formats from multiple mass spectrometric vendors (5).
- MetaMorpheus: This open-source software program enables the comprehensive analysis of proteomic data, from PTM analysis and quantification to spectral annotation and data visualization (11).
- Byonic: The Byonic software allows researchers to identify peptides, proteins, and PTMs by tandem mass spectrometry (12).
- Scaffold: Scaffold is a software tool for visualizing and interpreting proteomics data, offering statistical analysis, quantification, and PTM analysis functionalities (5).
- Perseus: Perseus supports biological and biomedical researchers in interpreting protein quantification, interaction, and PTM data with a comprehensive suite of statistical tools (5).
Common bottom-up proteomics data analysis software

A typical bottom-up proteomics workflow: cells, tissue, or biological fluid undergo protein extraction; proteases enzymatically digest the resulting protein mixture into peptides, which are purified and fractionated before liquid chromatography-mass spectrometry. There, peptides are fragmented and analyzed (MS and MS/MS), and bioinformatics and database searching support protein identification, protein quantification, and PTM characterization. Bottom-up proteomics involves digesting complex protein mixtures into peptides, enabling efficient and sensitive analysis using mass spectrometry. This method facilitates protein identification and quantification, providing crucial insights into protein function, interactions, and modifications within cells and tissues.

EXPERT ADVICE: Fulfilling the potential of bottom-up proteomics

Achieving higher accuracy and depth in bottom-up proteomics involves navigating a multifaceted process with various steps and challenges.

INTERVIEWED BY YUNING WANG, PHD

Lloyd Smith, a biochemist at the University of Wisconsin-Madison known for his extensive expertise in mass spectrometry, began his research journey studying genetics. He invented the first automated DNA sequencer during his postdoctoral research and joined the University of Wisconsin-Madison in 1987, focusing on capillary electrophoresis. The emergence of matrix-assisted laser desorption/ionization and electrospray ionization in the 1990s soon caught his attention. These technologies allowed him to ionize and separate DNA molecules in a mass spectrometer instead of a gel. “During that time, we ran into an issue with analyzing DNA mixtures,” Smith said.
“We couldn’t detect larger fragments as well as we could detect smaller ones, which caused bias in the mass spectrometry data.” Recognizing that this was caused by instrumental limitations, Smith developed a technique called charge reduction mass spectrometry that solved the issue. His technique proved just as effective for analyzing proteins as for analyzing nucleic acids. Smith then shifted his focus to proteomics, pioneering mass spectrometry-based methods that enable the identification of proteoforms: variations of proteins arising from a single gene that play crucial roles in health and disease but are challenging to detect with conventional methods. His team also led the development of advanced software tools to enhance the speed, accuracy, and depth of proteomic analysis. These tools have become instrumental resources for researchers to visualize and interpret complex proteomics data, offering valuable insights into biological systems.

Can you explain the basics of bottom-up proteomics?

In bottom-up proteomics, we usually want to analyze the whole proteomes of cells. We start with lysing the cells to release their contents, spinning out insoluble materials, and isolating the proteins. We then digest these proteins into smaller peptides using enzymes, usually trypsin and sometimes other enzymes. This step converts the protein mixture into a complex peptide mixture. The next step involves liquid chromatography-mass spectrometry to analyze the peptides. We may include intermediate steps like fractionation, where peptides go through a chromatography column and come out separated from other peptides. They are then ionized via electrospray ionization and enter the mass spectrometer. Mass spectrometry occurs in two steps, known as tandem mass spectrometry. The first step is called MS1, which determines the masses of the peptides as they elute from the column. Specialized software then automatically selects the most intense peptide ions for further analysis.
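The MS1 precursor selection Smith describes can be sketched as a simple intensity ranking with a dynamic exclusion list. This is a minimal sketch with hypothetical peak values; real instrument logic also considers charge states, isolation windows, and exclusion timing:

```python
def select_precursors(ms1_peaks, top_n=3, excluded=frozenset(), mz_tol=0.01):
    """Pick the top-N most intense MS1 peaks for fragmentation,
    skipping m/z values on the dynamic exclusion list."""
    usable = [
        (mz, intensity) for mz, intensity in ms1_peaks
        if all(abs(mz - ex) > mz_tol for ex in excluded)
    ]
    # Rank remaining peaks by intensity, highest first
    usable.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in usable[:top_n]]

# Hypothetical (m/z, intensity) pairs from one MS1 scan
scan = [(445.12, 1e6), (512.30, 8e5), (623.84, 3e5), (702.41, 9e4)]
print(select_precursors(scan, top_n=2))  # → [445.12, 512.3]
```

Excluding an m/z that was just fragmented (dynamic exclusion) lets the instrument dig past the dominant peaks into lower-abundance precursors on subsequent scans.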
In the second step, called MS2, these selected peptide ions fragment inside the mass spectrometer, generating smaller ions that allow us to identify the peptides.

What are the advantages and limitations of bottom-up proteomics compared to top-down proteomics?

Bottom-up proteomics is more widely used than top-down proteomics, which skips the peptide digestion step, because the molecules are less complex. This means we get simpler spectra that are easier to generate, understand, and interpret. However, the downside of bottom-up proteomics is the loss of the context of what else was in the protein. For example, two different forms of a protein, called proteoforms, might produce the same peptide upon digestion, making it difficult to distinguish between them during the analysis. This loss of context is a key difference between bottom-up and top-down proteomics; the latter maintains the intact protein, providing more detailed information about the protein’s molecular forms.

Lloyd Smith, a biochemist at the University of Wisconsin-Madison, leads an interdisciplinary group of researchers developing new mass spectrometric technologies for comprehensively identifying and quantifying proteoforms in biological systems. CREDIT: LLOYD SMITH

How should researchers select appropriate enzymes for protein digestion?

Having more enzymes is generally better. Cutting proteins with trypsin can generate both great peptides and less informative peptides that are too long, too short, or too hydrophobic. Using multiple enzymes can create more overlapping sets of peptides, where one enzyme may help reveal sequences missed by another. This overlap helps us stitch together a more complete sequence. My student Rachel Miller developed a software tool to help researchers select which enzymes to use to digest their samples.
It simulates the digestion process, predicts the peptide fragments generated from the digestion, and assesses their length, hydrophobicity, and potential protein coverage. This tool helps researchers identify which enzymes will yield optimal results before conducting wet lab experiments, enhancing efficiency without unnecessary experimentation.

What else can help improve data quality during sample preparation?

One effective way to improve data quality is through multidimensional separation. For example, when dealing with a complex sample, using just reverse-phase liquid chromatography generates a lot of co-eluting peptides, which can lead to missed peptide identifications and errors. In mass spectrometry, more separation means higher data quality, but it also takes more time. So, it’s always a trade-off between time and output, especially with complex samples.

What are the approaches for protein identification?

In bottom-up proteomics, there are two main approaches. The first and most common approach involves generating theoretical mass spectra of peptides from an in silico proteome and then matching these with experimental spectra. This helps identify the most probable theoretical peptides from the experimental data. John Yates’ group at Scripps Research developed this strategy and wrote a program called SEQUEST in the 1990s, which became the paradigm under which we operate. The other approach, called de novo sequencing, involves identifying proteins without pre-existing sequence knowledge. However, this method carries a higher risk of protein or peptide misidentification. With this approach, researchers often need to combine multiple enzymes with trypsin to gather more data. In bottom-up proteomics, the results are typically probabilistic rather than absolute. When analyzing a large number of molecules, we rarely get a definitive answer. Instead, we obtain results with a high probability of correctness, which we quantify with a false discovery rate.
This rate accounts for the likelihood that some identifications may be incorrect. Peptide identification methods allow researchers to make informed guesses about which proteoforms are present in the sample. However, these conclusions are based on the data at hand and are not definitive.

What tools do you use to analyze mass spectrometry data?

We’ve developed an open-source search engine named MetaMorpheus, which we built based on a tool called Morpheus, created by Craig Wenger in the Joshua Coon group at the University of Wisconsin-Madison. Both MetaMorpheus and Morpheus use a scoring function to match experimental and theoretical spectra for peptide identification. It’s very important that software tools are open source because that allows everyone to understand exactly what they’re doing and how they’re doing it. With open-source code, others can look at the code and improve it without needing to reinvent it. We made all our source code available on GitHub, and we assist users who encounter problems. We have also implemented industrial-strength software robustness policies to test the code before new releases. Typically, a student leads a project to develop a new idea. They transform the idea into an algorithm and write the code to execute it. We then integrate this new capability into MetaMorpheus.

What are the applications of MetaMorpheus?

A particularly useful tool in MetaMorpheus is G-PTM-D, which stands for global post-translational modification discovery. This tool helps us identify new post-translational modifications (PTMs). Traditionally, researchers use a variable modification search to find PTMs. This approach requires creating a large database with all possible PTM sites and searching for the best match. This process increases analysis time and error rate due to the large number of incorrect entries in the database. To address this, we implemented a method that creates smaller databases based on just the observed mass shifts.
This allows us to search for all modifications simultaneously and identify PTMs without prior knowledge of what they are or where they are located. In a recent paper published in the Journal of Proteome Research, we used G-PTM-D to discover previously unknown, biologically meaningful modifications in the human immunodeficiency virus capsid and matrix proteins.

What are some recent enhancements you’ve made to your data analysis tools?

We’ve added many capabilities over time. For example, we’ve developed a tool called FlashLFQ, with LFQ standing for label-free quantitation. We integrated FlashLFQ into MetaMorpheus, allowing it to work with G-PTM-D to quantify modified peptides accurately. Currently, we’re making algorithmic modifications to improve the accuracy of these tools for single-cell proteomic data. This is important because single-cell proteomic data is often of poor quality, and standard software designed for high-quality data isn’t effective for low-quality, noisy data.

What advice would you give to researchers new to bottom-up proteomics?

One of the first decisions is choosing the chromatography method. Our group specializes in nanoflow chromatography. It’s very sensitive but also finicky. Alternatively, many facilities use microflow chromatography, which employs larger columns and is less sensitive but simpler to set up. The next critical step is mastering the instrument’s software and operation. There are a lot of decisions to make, such as choosing between data-dependent acquisition and data-independent acquisition, each with its advantages and weaknesses. Once data starts accumulating, the challenge shifts to data analysis. Converting spectra into protein lists requires specialized software, and understanding isotopic or label-free quantitation methods is essential for quantitative analysis.
Finally, extracting biological insights from identified proteins involves using a whole host of tools for network and pathway analysis and protein-protein interaction studies. Each step requires attention and patience. For beginners, I recommend learning from colleagues or visiting a specialized lab to get hands-on experience. Using user-friendly software, such as MetaMorpheus, can also be beneficial.

What are the current challenges and future directions you see in proteomics?

One of the key challenges we face is the complexity of our samples. Often, multiple peptide variants co-elute from the chromatographic column and enter the mass spectrometer simultaneously, resulting in messy spectra. Current software typically identifies the most prominent peptide variants, leaving others unidentified. Our goal is to develop software capable of identifying all co-eluting peptide variants, even when they overlap, which could significantly increase the throughput of bottom-up proteomics. That’s an exciting opportunity to advance the field. In broader terms, bottom-up proteomics is relatively mature, benefiting from substantial advancements in instrumentation and methodology over the years. The real frontier in proteomics is top-down proteomics, particularly proteoform analysis. These areas are exciting and challenging, with many new things to do and a lot of room to improve.

This interview has been condensed and edited for clarity.

Sam Markovitch and Brian Frey in the Lloyd Smith laboratory conduct bottom-up proteomics to characterize the proteomes of cells. CREDIT: LLOYD SMITH
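Smith’s point about probabilistic identification can be made concrete with a minimal target-decoy calculation: spectra are searched against both real (“target”) and scrambled or reversed (“decoy”) sequences, and the false discovery rate is estimated from how many decoys score above a cutoff. A sketch using hypothetical scores:

```python
def score_cutoff_at_fdr(psms, fdr_target=0.01):
    """psms: (score, is_decoy) pairs for peptide-spectrum matches.
    Walk down the score-sorted list and return the lowest score cutoff
    at which the estimated FDR (decoys / targets accepted) stays within
    the target rate."""
    cutoff = None
    decoys = targets = 0
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if decoys / max(targets, 1) <= fdr_target:
            cutoff = score
    return cutoff

# Hypothetical PSM scores; True marks decoy hits
psms = [(98, False), (95, False), (90, False), (88, True), (85, False), (80, True)]
print(score_cutoff_at_fdr(psms, fdr_target=0.25))  # → 85
```

Accepting matches with score ≥ 85 keeps four targets and one decoy, i.e., an estimated 25% of accepted identifications may be wrong; production search engines refine this basic idea with q-values and peptide- or protein-level FDR control.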
A TECHNOLOGY GUIDE FOR BOTTOM-UP PROTEOMICS Bottom-up proteomics is a widely utilized mass spectrometry-based approach for identifying and quantifying proteins in biological samples. In contrast to top-down proteomics, which involves analyzing intact proteins, bottom-up proteomics involves breaking proteins down into smaller peptides before mass spectrometry analysis. A typical bottom-up proteomics workflow starts with extracting proteins from biological material and enzymatically digesting them into smaller peptides. Researchers then separate these peptides using liquid chromatography and introduce them into a mass spectrometer for analysis. Inside the mass spectrometer, peptides fragment into smaller ions, and their masses are measured to generate peptide spectra. Scientists use bioinformatics tools to match these spectra to theoretical spectra from protein databases, identifying the original proteins. They can also measure protein abundance across samples using quantitative techniques like label-free quantification or stable isotope labeling. Bottom-up proteomics offers a detailed analysis of the proteome’s complexity, dynamics, and post-translational modifications (PTMs) that are essential for understanding protein function and regulation. These capabilities make bottom-up proteomics a critical tool for biomarker discovery, clinical diagnostics, and studying cellular processes and disease mechanisms. 1. Protein extraction approach Extracting a protein mixture from biological samples is the crucial first step of a bottom-up proteomics workflow. No universal methods exist for this step, requiring researchers to customize their approach depending on the sample type. For cell samples, it is necessary to break down cell membranes using lysis buffer containing detergents and sonication and prevent protease and phosphatase activity by adding enzyme inhibitors. 
The lysed sample then undergoes protein extraction, typically via precipitation using organic solvents and further purification methods like chromatography and ultrafiltration (1). For biological fluid samples, the presence of high-abundance proteins, like albumin, can interfere with the detection of less abundant proteins. To address this, immunoaffinity methods can selectively remove these high-abundance proteins, enriching the sample for proteins of interest (1). Tissue samples need homogenization to break down their complex structures and release proteins. Homogenization methods include manual grinding, mechanical grinders, bead-beating, sonication, liquid nitrogen pulverization, and pressure cycling homogenizers. Researchers should choose a method based on tissue type and equipment availability (1). The homogenized tissue is then subjected to protein extraction and purification, similar to cell samples. 2. Choice of proteases Trypsin is the most widely used protease in bottom-up proteomics due to its robustness, reliability, and cost effectiveness. It cleaves proteins after lysine or arginine residues, generating small peptides ideal for fragmentation in mass spectrometry. However, exclusively relying on trypsin can limit the view of the proteome as it sometimes produces peptides that are too short for effective mass spectrometry analysis. Proteins with few lysine or arginine residues, such as membrane proteins, and certain PTMs can be challenging to characterize using tryptic digestion alone (2). To overcome these limitations, researchers can supplement trypsin with alternative enzymes to enhance peptide coverage and identification. For instance, chymotrypsin is suitable for proteins with long hydrophobic sections, Lys-C generates longer peptides than trypsin, and Gluc-C is effective for cleaving heavily glycosylated proteins. There are also proteases like Asp-N and Lys-N that are compatible with detergents and high temperatures. 
Additionally, Arg-C, which cleaves after arginine residues, is useful for characterizing PTMs and increasing proteome coverage (3). By combining these enzymes, researchers can uncover more protein sequences and modifications to comprehensively understand the proteome. 3. Peptide fractionation and separation Prior to mass spectrometry analysis, researchers typically need to reduce the complexity of peptide mixtures to enhance the depth and accuracy of protein identification and quantification. Highperformance liquid chromatography (HPLC), which separates complex mixtures of peptides into smaller, more manageable subsets based on their chemical properties, is an integral step in this process. HPLC methods include size-exclusion chromatography, ion-exchange chromatography, and reverse-phase HPLC, each separating peptides based on characteristics such as peptide size, net charge, or hydrophobicity (4). These methods effectively reduce sample complexity, increase the coverage of complex proteomes, and improve the detection of low-abundance peptides. When selecting and applying these methods, researchers should consider factors such as sample complexity, proteome coverage, instrument compatability, throughput, and sensitivity requirements. 4. Optimizing mass spectrometry methods Choosing the right mass spectrometer is crucial to ensure it meets specific experiment needs, such as sensitivity, resolution, mass range, and speed. High sensitivity is vital for detecting lowabundance peptides, while resolution helps accurately separate closely spaced peaks to enhance peptide identification. Researchers must also select data acquisition strategies that align with their experimental goals. Two common approaches are data-dependent acquisition (DDA) and data-independent acquisition (DIA). DDA dynamically selects the most abundant precursor ions for fragmentation and analysis, offering high sensitivity for detecting low-abundance peptides. 
In contrast, DIA fragments all precursor ions across a wide mass range, enabling comprehensive data acquisition (5). Each approach has its advantages, requiring careful requiring careful consideration and planning to optimize performance and achieve the desired outcomes. Additionally, researchers should employ appropriate fragmentation methods, such as collision-induced dissociation or higher-energy collisional dissociation, based on peptide characteristics and experimental objectives. 5. Analyzing proteomic data Deriving meaningful biological insights from mass spectrometry data includes peptide and protein identification, quantification, and PTM characterization. Peptide and protein identification typically begin with database searching to convert mass spectra into peptide sequences. Various software tools compare mass spectra data against a protein sequence database to identify peptides and their corresponding proteins. Researchers then score the matches by applying algorithms to assess their quality and remove unreliable matches (4). Other software tools help detect PTMs, such as phosphorylation, acetylation, and glycosylation, by analyzing fragmentation patterns to identify the location of these modifications (6). Additionally, specialized data analysis tools enable protein quantitation for isotopic label-based and label-free quantitation methods, allowing researchers to determine the relative or absolute protein abundance across different samples (5). Towards comprehensive biological insights With ongoing technological advancements enhancing sensitivity, resolution, and throughput, bottom-up proteomics enables deeper insights into the proteome’s complexity and dynamics. Integrating bottom-up proteomics with other omics technologies, such as genomics and metabolomics, provides a more comprehensive understanding of biological systems for identifying novel biomarkers, drug targets, and therapeutic strategies. 
Top considerations for bottom-up proteomics

BY YUNING WANG, PHD

When planning a bottom-up proteomics experiment, researchers should consider the following aspects.

REFERENCES

1. Duong, V-A. & Lee, H. Bottom-up proteomics: advancements in sample preparation. Int. J. Mol. Sci. 24, 5350 (2023).
2. Miller, R. M. & Smith, L. M. Overview and considerations in bottom-up proteomics. Analyst 148, 475–486 (2023).
3. Tsiatsiani, L. & Heck, A. J. R. Proteomics beyond trypsin. FEBS J. 282, 2612–2626 (2015).
4. Dupree, E. J. et al. A critical review of bottom-up proteomics: the good, the bad, and the future of this field. Proteomes 8, 14 (2020).
5. Jiang, Y. et al. Comprehensive overview of bottom-up proteomics using mass spectrometry. ACS Meas. Sci. Au (2024). doi:10.1021/acsmeasuresciau.3c00068
6. Chalkley, R. J. & Clauser, K. R. Modification site localization scoring: strategies and performance. Mol. Cell. Proteomics 11, 3–14 (2012).
7. Orsburn, B. C. Proteome Discoverer—a community enhanced data processing suite for protein informatics. Proteomes 9, 15 (2021).
8. FragPipe.
9. Röst, H. L. et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat. Methods 13, 741–748 (2016).
10. Farag, Y. M., Horro, C., Vaudel, M. & Barsnes, H. PeptideShaker online: a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data. J. Proteome Res. 20, 5419–5423 (2021).
11. Solntsev, S. K., Shortreed, M. R., Frey, B. L. & Smith, L. M. Enhanced global post-translational modification discovery with MetaMorpheus. J. Proteome Res. 17, 1844–1851 (2018).
12. Bern, M., Kil, Y. J. & Becker, C. Byonic: advanced peptide and protein identification software. Curr. Protoc. Bioinforma. 40, 13.20.1-13.20.14 (2012).

Mascot: The Mascot search engine identifies proteins from mass spectrometry data by matching experimental data with sequence databases (4).
SEQUEST: SEQUEST is a mass spectrometry data analysis program that matches experimental spectra to theoretical spectra generated from protein databases (4).
MaxQuant: This software analyzes large-scale mass spectrometry data, providing tools for label-free and isobaric labeling-based quantitation and PTM analysis (4).
Proteome Discoverer: This software integrates with various search engines for protein identification and quantification, offering customizable workflows and PTM analysis (7).
PEAKS: PEAKS is a proteomics software program for protein identification, de novo protein sequencing, and quantification (4).
FragPipe: This computational mass spectrometry data analysis platform supports peptide identification, label-free and label-based quantitation, and PTM analysis (8).
OpenMS: OpenMS is an open-source software platform that supports common mass spectrometric data processing tasks (9).
PeptideShaker: PeptideShaker is a web-based framework for identifying and visualizing mass spectrometry-based proteomics data from raw data (10).
Skyline: Skyline is targeted proteomics software for quantitative analysis, supporting raw data formats from multiple mass spectrometry vendors (5).
MetaMorpheus: This open-source software program enables comprehensive analysis of proteomic data, from PTM analysis and quantification to spectral annotation and data visualization (11).
Byonic: The Byonic software allows researchers to identify peptides, proteins, and PTMs by tandem mass spectrometry (12).
Scaffold: Scaffold is a software tool for visualizing and interpreting proteomics data, offering statistical analysis, quantification, and PTM analysis functionalities (5).
Perseus: Perseus supports biological and biomedical researchers in interpreting protein quantification, interaction, and PTM data with a comprehensive suite of statistical tools (5).
Common bottom-up proteomics data analysis software

A typical bottom-up proteomics workflow: Bottom-up proteomics involves digesting complex protein mixtures into peptides, enabling efficient and sensitive analysis using mass spectrometry. This method facilitates protein identification and quantification, providing crucial insights into protein function, interactions, and modifications within cells and tissues. [Workflow diagram: cells, tissue, or biological fluid undergo protein extraction, enzymatic digestion with a protease, peptide purification and fractionation, liquid chromatography-mass spectrometry (MS and MS/MS) with fragmentation, and data analysis via bioinformatics and database searching for protein identification, protein quantification, and PTM characterization.]

EXPERT ADVICE: Fulfilling the potential of bottom-up proteomics

Achieving higher accuracy and depth in bottom-up proteomics involves navigating a multifaceted process with various steps and challenges.

INTERVIEWED BY YUNING WANG, PHD

Lloyd Smith, a biochemist at the University of Wisconsin-Madison, known for his extensive expertise in mass spectrometry, began his research journey studying genetics. He invented the first automated DNA sequencer during his postdoctoral research and joined the University of Wisconsin-Madison in 1987, focusing on capillary electrophoresis. The emergence of matrix-assisted laser desorption/ionization and electrospray ionization in the 1990s soon caught his attention. These technologies allowed him to ionize and separate DNA molecules in a mass spectrometer instead of a gel. “During that time, we ran into an issue with analyzing DNA mixtures,” Smith said.
“We couldn’t detect larger fragments as well as we could detect smaller ones, which caused bias in the mass spectrometry data.” Recognizing that this was caused by instrumental limitations, Smith developed a technique called charge reduction mass spectrometry that solved the issue. His technique was just as effective for analyzing proteins as for analyzing nucleic acids. Since then, Smith has shifted his focus to proteomics, pioneering mass spectrometry-based methods that enable the identification of proteoforms, variations of proteins arising from a single gene that play crucial roles in health and disease but are challenging to detect with conventional methods. His team also led the development of advanced software tools to enhance the speed, accuracy, and depth of proteomic analysis. These tools have become instrumental resources for researchers to visualize and interpret complex proteomics data, offering valuable insights into biological systems.

Can you explain the basics of bottom-up proteomics?

In bottom-up proteomics, we usually want to analyze the whole proteomes of cells. We start by lysing the cells to release their contents, spinning out insoluble materials, and isolating the proteins. We then digest these proteins into smaller peptides using enzymes, usually trypsin and sometimes other enzymes. This step converts the protein mixture into a complex peptide mixture. The next step involves liquid chromatography-mass spectrometry to analyze the peptides. We may include intermediate steps like fractionation, where peptides go through a chromatography column and come out separated from other peptides. They are then ionized via electrospray ionization and enter the mass spectrometer. Mass spectrometry occurs in two steps, known as tandem mass spectrometry. The first step is called MS1, which determines the masses of the peptides as they elute from the column. Software then automatically selects the most intense peptide ions for further analysis.
In the second step, called MS2, these selected peptide ions fragment inside the mass spectrometer, generating smaller ions that allow us to identify the peptides.

What are the advantages and limitations of bottom-up proteomics compared to top-down proteomics?

Bottom-up proteomics is more widely used than top-down proteomics, which skips the peptide digestion step, because the molecules are less complex. This means we get simpler spectra that are easier to understand, interpret, and generate. However, the downside of bottom-up proteomics is the loss of the context of what else was in the protein. For example, two different forms of a protein, called proteoforms, might produce the same peptide upon digestion, making it difficult to distinguish between them during the analysis. This loss of context is a key difference between bottom-up and top-down proteomics, where the latter maintains the intact protein, providing more detailed information about the protein’s molecular forms.

Lloyd Smith, a biochemist at the University of Wisconsin-Madison, leads an interdisciplinary group of researchers developing new mass spectrometric technologies for comprehensively identifying and quantifying proteoforms in biological systems. CREDIT: LLOYD SMITH

How should researchers select appropriate enzymes for protein digestion?

Having more enzymes is generally better. Cutting proteins with trypsin can generate both great peptides and less informative peptides that are too long, too short, or too hydrophobic. Using multiple enzymes can create more overlapping sets of peptides, where one enzyme may help reveal sequences missed by another. This overlap helps us stitch together a more complete sequence. My student Rachel Miller developed a software tool to help researchers select enzymes for digesting their samples.
It simulates the digestion process, predicts the peptide fragments generated from the digestion, and assesses their length, hydrophobicity, and potential protein coverage. This tool helps researchers identify which enzymes will yield optimal results before conducting wet lab experiments, enhancing efficiency without unnecessary experimentation.

What else can help improve data quality during sample preparation?

One effective way to improve data quality is through multidimensional separation. For example, when dealing with a complex sample, using just reverse-phase liquid chromatography generates a lot of co-eluting peptides, which could lead to missed peptide identifications and errors. In mass spectrometry, more separations mean higher data quality but also take more time. So, it’s always a trade-off between time and output, especially with complex samples.

What are the approaches for protein identification?

In bottom-up proteomics, there are two main approaches. The first and most common approach involves generating theoretical mass spectra of peptides from an in silico proteome and then matching these with experimental spectra. This helps identify the most probable theoretical peptides from the experimental data. John Yates’ group at Scripps Research developed this strategy and wrote a program called SEQUEST in the 1990s, which became the paradigm under which we operate. The other approach, called de novo sequencing, involves identifying proteins without pre-existing sequence knowledge. However, this method carries a higher risk of protein or peptide misidentification. With this approach, researchers often need to combine multiple enzymes with trypsin to gather more data. In bottom-up proteomics, the results are typically probabilistic rather than absolute. When analyzing a large number of molecules, we rarely get a definitive answer. Instead, we obtain results with a high probability of correctness, which we quantify with a false discovery rate.
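The false discovery rate concept can be illustrated with a toy target-decoy calculation: matches to decoy (for example, reversed) sequences estimate how many of the target matches above a score threshold are likely wrong. The function and all scores below are invented; FDR estimation in real search engines is more involved.

```python
# Toy target-decoy FDR sketch: at a given score threshold, estimate FDR as
# the number of decoy hits passing the threshold divided by the number of
# target hits passing it. All scores are hypothetical.

def fdr_at_threshold(target_scores, decoy_scores, threshold):
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    return decoys / targets if targets else 0.0

target_scores = [42, 35, 30, 28, 25, 12, 10]   # hypothetical target matches
decoy_scores = [26, 14, 11, 9, 8, 7, 5]        # hypothetical decoy matches

print(f"estimated FDR at score 20: {fdr_at_threshold(target_scores, decoy_scores, 20):.2f}")
```

Raising the threshold lowers the estimated FDR at the cost of fewer accepted identifications, which is the trade-off researchers tune in practice.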
This rate accounts for the likelihood that some identifications may be incorrect. Peptide identification methods allow researchers to make informed guesses about which proteoforms are present in the sample. However, these conclusions are based on the data at hand and are not definitive.

What tools do you use to analyze mass spectrometry data?

We’ve developed an open-source search engine named MetaMorpheus, which we built based on a tool called Morpheus, created by Craig Wenger in the Joshua Coon group at the University of Wisconsin-Madison. Both MetaMorpheus and Morpheus use a scoring function to match experimental and theoretical spectra for peptide identification. It’s very important that software tools are open source because that allows everyone to understand exactly what they’re doing and how they’re doing it. With open-source code, others can look at the code and improve it without needing to reinvent it. We made all our source code available on GitHub, and we assist users who encounter problems. We have also implemented industrial-strength software robustness policies to test the code before new releases. Typically, a student leads a project to develop a new idea. They transform the idea into an algorithm and write the code to execute it. We then integrate this new capability into MetaMorpheus.

What are the applications of MetaMorpheus?

A particularly useful tool in MetaMorpheus is G-PTM-D, which stands for global post-translational modification discovery. This tool helps us identify new post-translational modifications (PTMs). Traditionally, researchers use a variable modification search to find PTMs. This approach requires creating a large database with all possible PTM sites and searching for the best match. This process increases analysis time and error rates due to the large number of incorrect entries in the databases. To address this, we implemented a method that creates smaller databases based on just the observed mass shifts.
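A toy sketch of the observed-mass-shift idea follows. This is not the actual G-PTM-D implementation; the shift table, tolerance, and masses are illustrative assumptions, and phosphorylation and acetylation mass values are rounded.

```python
# Toy sketch of mass-shift annotation: compare an observed precursor mass to
# the mass of its best unmodified peptide match, and interpret the difference
# using a small table of approximate modification masses (in daltons).
# Not the G-PTM-D implementation; values are simplified for illustration.

KNOWN_SHIFTS = {79.966: "phosphorylation", 42.011: "acetylation"}

def annotate_shift(observed_mass, peptide_mass, tol=0.01):
    """Name the mass shift between observed and unmodified peptide mass."""
    delta = observed_mass - peptide_mass
    for shift, name in KNOWN_SHIFTS.items():
        if abs(delta - shift) <= tol:
            return name
    return "unknown shift" if abs(delta) > tol else "unmodified"

print(annotate_shift(1079.966, 1000.0))
```

The point of the approach described in the interview is that recurring, unexplained shifts can be added back to a small candidate database rather than enumerating every possible modification site up front.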
This allows us to search for all modifications simultaneously and identify PTMs without prior knowledge of what they are or where they are located. In a recent paper published in the Journal of Proteome Research, we used G-PTM-D to discover previously unknown, biologically meaningful modifications in the human immunodeficiency virus capsid and matrix proteins.

What are some recent enhancements you’ve made to your data analysis tools?

We’ve added many capabilities over time. For example, we’ve developed a tool called FlashLFQ, with LFQ standing for label-free quantitation. We integrated FlashLFQ into MetaMorpheus, allowing it to work with G-PTM-D to quantify modified peptides accurately. Currently, we’re making algorithmic modifications to improve the accuracy of these tools for single-cell proteomic data. This is important because single-cell proteomic data are often of poor quality, and standard software designed for high-quality data isn’t effective for low-quality, noisy data.

What advice would you give to researchers new to bottom-up proteomics?

One of the first decisions is choosing the chromatography method. Our group specializes in nanoflow chromatography. It’s very sensitive but also finicky. Alternatively, many facilities use microflow chromatography, which employs larger columns and is less sensitive but simpler to set up. The next critical step is mastering the instrument’s software and operation. There are a lot of decisions to make, such as choosing between data-dependent acquisition and data-independent acquisition, each with its advantages and weaknesses. Once data start accumulating, the challenge shifts to data analysis. Converting spectra into protein lists requires specialized software, and understanding isotopic or label-free quantitation methods is essential for quantitative analysis.
Finally, extracting biological insights from identified proteins involves using a whole host of tools for network and pathway analysis and protein-protein interaction studies. Each step requires attention and patience. For beginners, I recommend learning from colleagues or visiting a specialized lab to get hands-on experience. Using user-friendly software, such as MetaMorpheus, can also be beneficial.

What are the current challenges and future directions you see in proteomics?

One of the key challenges we face is the complexity of our samples. Often, multiple peptide variants co-elute from the chromatographic column and enter the mass spectrometer simultaneously, resulting in messy spectra. Current software typically identifies the most prominent peptide variants, leaving others unidentified. Our goal is to develop software capable of identifying all co-eluting peptide variants, even when they overlap, which could significantly increase the throughput of bottom-up proteomics. That’s an exciting opportunity to advance the field. In broader terms, bottom-up proteomics is relatively mature, benefiting from substantial advancements in instrumentation and methodology over the years. The real frontier in proteomics is top-down proteomics, particularly proteoform analysis. These areas are exciting and challenging, with many new things to do and a lot of room to improve.

This interview has been condensed and edited for clarity.

Sam Markovitch and Brian Frey in the Lloyd Smith laboratory conduct bottom-up proteomics to characterize the proteomes of cells. CREDIT: LLOYD SMITH