What is structure-based drug design?
Structure-based drug design involves designing and optimizing new therapeutic agents based on the 3D structures of their biological targets, primarily proteins. This approach seeks to understand the interactions between a drug candidate and its target at the molecular level, allowing for the rational design of drugs that precisely fit into the binding sites of target proteins with optimal binding affinity and specificity.
The structure-based drug design process begins with choosing a target protein, typically a key player in a disease that binds a small molecule to carry out its function. Researchers determine the 3D structure of the target protein using structural biology techniques or computational methods. Leveraging structural data, researchers employ computational techniques to predict how potential drug candidates might interact with the target site. These predictions guide researchers to synthesize new compounds and test them experimentally for their biological activity. Iterative cycles of design, synthesis, and testing refine the compounds to optimize their pharmacological properties. Structure-based drug design has been instrumental in developing a number of successful drugs and continues to be a powerful strategy for discovering new therapeutic agents.
Understanding protein structure
Primary structure
The primary structure of a protein is simply the linear amino acid sequence of the protein’s polypeptide chain. Each protein has its own set of amino acids linked together through peptide bonds in a particular order. The primary structure drives the folding and intramolecular bonding of the amino acid chain, ultimately determining the protein's unique 3D shape.
Secondary structure
The secondary structure of a protein represents the local folding of the polypeptide chain into specific patterns. Secondary structures arise mainly from hydrogen bonding between the amide (N-H) and carbonyl (C=O) groups of the polypeptide backbone.
The most common secondary structures are the α helix and the β sheet. An α helix is a right-handed helical structure in which the polypeptide chain coils around its own axis like a spiral staircase. Hydrogen bonds form between the oxygen atom of a carbonyl group in one amino acid and the hydrogen atom of an amide group in the amino acid four residues further along the chain, which sits in the adjacent turn of the helix. A β sheet is a pleated, sheet-like structure where multiple segments of a polypeptide chain lie side by side, connected by hydrogen bonds. This arrangement creates an extended sheet.
Tertiary structure
Tertiary structure refers to the 3D arrangement of the polypeptide chain. A protein’s tertiary structure involves the spatial coordination of its secondary structure elements and the interactions between amino acid residues far apart in the primary sequence. Interactions between amino acid side chains, including hydrophobic interactions, hydrogen bonds, ionic bonds, van der Waals forces, and disulfide bridges, stabilize the tertiary structure and give the protein its unique and complex fold.
Quaternary structure
Many proteins need more than one polypeptide chain to perform their functions. The spatial arrangement of all polypeptide chains within a protein is the quaternary structure, and each polypeptide chain is referred to as a subunit. Similar to the tertiary structure, the quaternary structure forms due to different noncovalent interactions and disulfide bonds. These bonds hold the subunits together, orchestrating their arrangement to create a larger, functional protein complex.
Protein domains and motifs
Protein domains are distinct structural and functional units within a protein. These units can fold independently and often exhibit specific functions, such as binding to other molecules or catalyzing chemical reactions. Domains can be thought of as building blocks that combine in various arrangements to form proteins with diverse functions. Examples of domains include the DNA-binding domains found in transcription factors and the kinase domains found in enzymes involved in cellular signaling pathways.
Protein motifs are short, conserved sequences of amino acids. Unlike domains, motifs are not necessarily independent folding units but rather recurring patterns found among different proteins. These motifs can play critical roles in protein structure, function, and interaction with other molecules. Examples of motifs include the helix-turn-helix motif found in DNA-binding proteins and the zinc finger motif involved in nucleic acid recognition and protein-protein interactions.
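Because sequence motifs are short, conserved patterns, they can be located with simple pattern matching. The sketch below is only an illustration: it scans a made-up sequence for the N-glycosylation sequon (N, then any residue except P, then S or T, then any residue except P), which is not one of the motifs named above but is a familiar example of a short sequence motif. The function name `find_motif` and the toy sequence are assumptions for this example.

```python
import re

# N-glycosylation sequon N-{P}-[ST]-{P}, written as a regular expression.
# The lookahead lets overlapping occurrences be reported as well.
N_GLYC_MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(sequence, pattern=N_GLYC_MOTIF):
    """Return (1-based start position, matched residues) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKNGSAPLNPTQNVTW"
hits = find_motif(toy_sequence)
for position, residues in hits:
    print(f"motif {residues} starts at residue {position}")
```

Structural motifs such as the helix-turn-helix are defined by 3D arrangement rather than by a fixed residue pattern, so a sequence scan like this one would not be enough to find them.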
Determining target protein structure
Accurately determining the structure of target proteins is a pivotal step in structure-based drug design. Various structural biology techniques allow scientists to uncover proteins’ 3D architectures, gaining essential insights into their functional mechanisms, binding sites, and interactions with potential drug candidates. These techniques include experimental methods, namely X-ray crystallography, cryogenic electron microscopy (cryo-EM), and nuclear magnetic resonance (NMR) spectroscopy, as well as computational structure modeling tools.
X-ray crystallography
Developed in 1912 (1), X-ray crystallography begins with protein crystallization. Scientists then expose the protein crystal to X-ray beams. The electrons in the crystal lattice scatter the X-rays, producing diffracted beams in various directions. By measuring the angles and intensities of these diffracted beams, scientists can produce an electron density map, which reveals the spatial arrangement of the protein’s atoms.
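The link between a measured diffraction angle and a distance inside the crystal is given by Bragg's law, n·λ = 2·d·sin(θ). The article does not spell this relation out, so the following is a minimal sketch for orientation only; the function name and the example values (a copper K-alpha wavelength of about 1.5418 Å and a reflection at 2θ = 45°) are illustrative assumptions.

```python
import math

def bragg_spacing(wavelength_angstrom, two_theta_deg, order=1):
    """Lattice-plane spacing d (in Å) from Bragg's law: n * lambda = 2 * d * sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    if not 0.0 < theta < math.pi / 2:
        raise ValueError("two_theta_deg must lie strictly between 0 and 180 degrees")
    return order * wavelength_angstrom / (2.0 * math.sin(theta))

# Illustrative values only: Cu K-alpha X-rays and a reflection observed at 2-theta = 45 degrees.
d = bragg_spacing(1.5418, 45.0)
print(f"plane spacing: {d:.3f} Å")
```

Reflections recorded at wider angles correspond to smaller spacings, which is why data collected to higher angles yield finer structural detail.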
As a well-established method, X-ray crystallography accounts for the majority of protein structures in the Protein Data Bank (PDB) archive. The average resolution for X-ray crystallography structures typically ranges from around 1.5 to 3.5 angstroms (Å) (2). X-ray crystallography also provides atomic details of incorporated ligands, inhibitors, ions, and other molecules within the crystal lattice.
However, protein crystallization can be challenging, as not all proteins form well-ordered crystals easily, and the process can be time consuming and unpredictable, often requiring scientists to screen hundreds to thousands of different crystallization conditions. Large biological macromolecules, such as membrane proteins, are more difficult to crystallize due to their large size and poor solubility. Additionally, X-ray crystallography only provides a static snapshot of a protein's structure. It does not capture dynamic movements or conformational changes that may be essential for fully understanding the protein's function (3).
Cryo-EM
Cryo-EM has recently become a popular technique to visualize proteins’ 3D structures at the near-atomic level. This approach involves rapidly freezing protein solutions, which causes the liquid in the solution to form vitreous ice, suspending proteins in their native state. Scientists then use an electron microscope to examine the frozen sample. As the electrons interact with the sample, they generate a series of 2D projection images of proteins from different angles. By using computational algorithms, scientists can reconstruct a 3D density map of the protein. The resolution limit of most cryo-EM maps was restricted to around 5 Å for years, but recent technical improvements in image acquisition and processing have pushed resolutions below 3 Å (4). In 2020, cryo-EM reached an atomic resolution of 1.25 Å (5).
Although the typical resolution of cryo-EM structures is lower than that of X-ray structures, cryo-EM has distinct advantages. It excels in visualizing large and complex proteins and protein assemblies that may be challenging to crystallize, such as membrane proteins and ribosomes. Cryo-EM is also well-suited for samples with flexibility and structural heterogeneity, where different protein molecules within a sample have different conformations. Cryo-EM can capture multiple conformations and dynamic states, and it can also reveal post-translational modifications.
Cryo-EM also comes with its own challenges. While effective for large proteins, achieving well-resolved structural details for proteins smaller than 100 kDa with cryo-EM can be difficult (6). Cryo-EM is a computationally intensive method, requiring researchers to process the large image datasets it generates with advanced computational resources. Additionally, the high cost of purchasing, running, and maintaining cryo-EM instruments limits widespread access to this technique.
NMR spectroscopy
Since the first NMR-determined protein structure in 1985 (7), NMR has become another major contributor to structural biology. NMR spectroscopy is a non-destructive analytical technique to determine the 3D structures of proteins in solution. Unlike X-ray crystallography and cryo-EM, which require protein crystallization or freezing, NMR spectroscopy evaluates proteins in their native state under physiological conditions.
In NMR spectroscopy, protein samples are placed in a strong magnetic field. When subjected to radiofrequency pulses, the nuclei of certain atoms within the protein, such as hydrogen and carbon, absorb and re-emit energy at distinct frequencies, which are converted into peaks in a spectrum. The positions and intensities of these peaks provide information about the chemical environment and interactions experienced by the nuclei. By analyzing the NMR spectrum, scientists can deduce the structural and dynamic properties of the proteins in the sample.
NMR spectroscopy can provide information on protein dynamics and flexibility, allowing scientists to study how proteins move and interact with other molecules in solution. This is particularly useful for understanding the movements and conformational changes that proteins undergo during their biological functions and for characterizing transient conformations implicated in disease onset (8). NMR is also suitable for studying protein-ligand interactions, making it an essential tool in studying targets’ interactions with potential drug candidates.
However, NMR spectroscopy is less suitable for proteins and protein complexes larger than 50 kDa, as the spectra can become crowded and difficult to interpret. Sample preparation and data acquisition can be time consuming, and NMR experiments are often sensitive to protein concentration, purity, and stability.
Comparison of main protein structure determination techniques
Aspect | X-ray crystallography | Cryo-EM | NMR spectroscopy
Resolution | High, can achieve resolutions below 3 Å | Variable, often around 3.5 Å, challenging below 3 Å | Medium to high (2.5–4.0 Å) |
Protein amount required | Requires large amounts of protein | Requires small amounts of protein | Requires large amounts of pure protein |
Sample preparation | Requires protein crystallization | Requires protein vitrification | Requires preparation of isotopically labeled samples |
Throughput | High throughput; can screen compounds rapidly | Lower throughput; slower screening process | Moderate throughput |
Suitable samples | Small to medium-sized crystallizable proteins and protein complexes | Large, dynamic proteins and protein complexes (>100 kDa), such as membrane proteins and ribosomes | Soluble proteins smaller than 50 kDa
Structural flexibility | Limited to static structures | Can study proteins in various conformations | Provides dynamic structural information |
Computational protein structure modeling
Computational protein structure prediction methods are particularly valuable when experimental structures of the target protein are unavailable or difficult to obtain. Leveraging these methods, researchers can identify potential drug targets, predict ligand binding sites, optimize lead compounds, predict off-target interactions, and conduct virtual screening to identify promising drug candidates.
Homology modeling
Homology modeling, also known as comparative modeling, predicts a protein’s 3D structure based on its amino acid sequence and the known structures of homologous proteins, that is, evolutionarily related proteins that share a common origin. Homology modeling relies on the principle that proteins with similar amino acid sequences tend to adopt similar 3D structures and perform similar biological functions.
In homology modeling, scientists first identify template structures similar to their target sequence from databases. They then align the amino acid sequence of the target protein with those of the templates, mapping corresponding residues using sequence alignment tools. Guided by this alignment, scientists then generate a 3D model for the target sequence by transferring atom coordinates from the template structure to the corresponding positions in the target sequence. This also involves building loops and regions of the target protein that are not present or conserved in the template structure. Various algorithms and software tools are available to perform this process efficiently (9).
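The alignment and residue-mapping step can be sketched with a basic global alignment. The code below is a toy Needleman-Wunsch implementation with a flat match/mismatch/gap scheme; real modeling pipelines use substitution matrices and more elaborate gap penalties, and the sequences, scoring values, and function names here are assumptions made for illustration.

```python
def needleman_wunsch(target, template, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_target, aligned_template, score)."""
    n, m = len(target), len(template)
    # score[i][j] holds the best score for aligning target[:i] with template[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if target[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if target[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a.append(target[i - 1]); b.append(template[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(target[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(template[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), score[n][m]

def residue_map(aligned_target, aligned_template):
    """Map 1-based target positions to 1-based template positions for aligned pairs."""
    mapping, t_pos, s_pos = {}, 0, 0
    for t, s in zip(aligned_target, aligned_template):
        t_pos += t != "-"
        s_pos += s != "-"
        if t != "-" and s != "-":
            mapping[t_pos] = s_pos
    return mapping

# Hypothetical toy sequences: the target has one extra residue (Y) relative to the template.
aligned_target, aligned_template, score = needleman_wunsch("MKTAYIAK", "MKTAIAK")
mapping = residue_map(aligned_target, aligned_template)
print(aligned_target)
print(aligned_template)
```

Target positions that are missing from `mapping` have no template counterpart. These are the regions, typically loops, that have to be built without template coordinates.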
After the initial protein model is built, it undergoes refinement to optimize its geometry and stereochemistry. This refinement typically includes energy minimization to remove errors in the model's atomic positions and improve its stability. Finally, scientists assess the quality of the homology model using validation tools that evaluate its stereochemical properties, structural integrity, and the compatibility of the model's 3D structure with its amino acid sequence (9).
Protein threading
Protein threading, also known as fold recognition, is a computational approach to model proteins that share the same fold as proteins with known structures. Unlike homology modeling, which relies on homologous protein structures available in databases, protein threading is used for proteins that do not have close homologs with known structures. This method utilizes statistical knowledge of the relationship between structures and sequences to predict protein structure.
In protein threading, scientists first curate a collection of structure templates from databases. They then assess the compatibility between the target sequence and each structure template using threading algorithms. These threading algorithms align the target sequence with the structure template by taking into account various factors such as sequence similarity, residue-residue interactions, and physicochemical properties, assigning a score for each alignment (10).
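A heavily simplified sketch of scoring sequence-to-template compatibility follows. It assumes each template is reduced to one environment label per position (buried or exposed) and rewards hydrophobic residues in buried positions and polar residues in exposed ones. Real threading algorithms allow gaps and use statistically derived potentials; the residue classification, template names, and scoring values below are all hypothetical.

```python
# Residues treated as hydrophobic in this toy classification (an assumption).
HYDROPHOBIC = set("AVILMFWC")

def threading_score(sequence, environments):
    """Score a gapless placement of `sequence` onto a template given as a string
    of environment labels: 'B' for buried positions, 'E' for exposed positions."""
    if len(sequence) != len(environments):
        raise ValueError("this toy scorer needs one environment label per residue")
    score = 0
    for residue, env in zip(sequence, environments):
        fits = (residue in HYDROPHOBIC) == (env == "B")
        score += 1 if fits else -1
    return score

def rank_templates(sequence, templates):
    """Return (score, template_name) pairs, best-scoring template first."""
    scored = [(threading_score(sequence, env), name) for name, env in templates.items()]
    return sorted(scored, reverse=True)

# Hypothetical target sequence and template library.
target = "LKVEAIDK"
templates = {
    "fold_A": "BEBEBBEE",
    "fold_B": "EEBBEEBB",
    "fold_C": "BBBBEEEE",
}
ranking = rank_templates(target, templates)
print(ranking)
```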
After finding a statistically acceptable alignment, scientists create a structure model for the target protein by placing its backbone atoms in line with their corresponding positions in the selected structure template. This preliminary model undergoes additional refinement and validation using experimental methods or additional computational analysis.
Ab initio modeling
Ab initio modeling, also known as de novo protein structure prediction, refers to the computational method that predicts a protein’s 3D structure from its amino acid sequence without using any template structure. The term "ab initio" is Latin for "from the beginning," reflecting the method’s approach of building models based on fundamental physical and chemical principles of proteins.
The ab initio modeling process starts with generating a vast number of possible conformations, known as structure decoys, from the primary sequence. Researchers build computational energy functions to model the interactions within the protein, including bonded terms (bond lengths, bond angles, and dihedral angles) and non-bonded terms such as van der Waals and electrostatic forces. These energy functions evaluate the stability of each structure decoy by calculating its energy, with the aim of identifying the conformation with the lowest energy, which is taken to be the most stable structure (11). The best candidate structures undergo further refinement through detailed energy calculations and molecular dynamics simulations. These simulations allow researchers to explore the protein folding process in greater detail, providing insights into the protein’s dynamic behavior and its final stable configuration.
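The idea of scoring decoys with an energy function and keeping the lowest-energy one can be sketched as follows. This toy model treats each residue as a single bead and includes only a harmonic bond term and a Lennard-Jones (van der Waals-like) term; angle, dihedral, and electrostatic terms are left out for brevity. All parameter values, decoy coordinates, and function names are illustrative assumptions, not values from any published force field.

```python
import math

def bond_energy(coords, r0=3.8, k=10.0):
    """Harmonic penalty when consecutive beads deviate from the ideal spacing r0 (Å)."""
    return sum(k * (math.dist(a, b) - r0) ** 2 for a, b in zip(coords, coords[1:]))

def lennard_jones(coords, epsilon=1.0, sigma=3.4):
    """12-6 Lennard-Jones energy over bead pairs at least two positions apart."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            ratio = sigma / math.dist(coords[i], coords[j])
            total += 4 * epsilon * (ratio ** 12 - ratio ** 6)
    return total

def total_energy(coords):
    """Toy energy: bonded term plus non-bonded term."""
    return bond_energy(coords) + lennard_jones(coords)

def best_decoy(decoys):
    """Return the name of the decoy with the lowest toy energy."""
    return min(decoys, key=lambda name: total_energy(decoys[name]))

# Three hypothetical four-bead decoys (coordinates in Å).
decoys = {
    "extended": [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)],
    "compact": [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)],
    "clashing": [(0, 0, 0), (3.8, 0, 0), (0, 1.0, 0), (3.8, 1.0, 0)],
}
for name, coords in decoys.items():
    print(f"{name}: {total_energy(coords):.3f}")
print("lowest-energy decoy:", best_decoy(decoys))
```

With these parameters the compact decoy scores best, because its beads sit near the favorable Lennard-Jones distance, while the clashing decoy is heavily penalized for overlapping beads.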
AI-based protein structure prediction
In recent years, AI algorithms, particularly those leveraging machine learning and deep learning, have emerged as effective tools for protein structure prediction with high speed and accuracy. These algorithms are trained on large datasets of known protein structures, allowing them to identify patterns and predict the most probable 3D protein configurations based on amino acid sequences.
A notable example is AlphaFold, developed by the AI research laboratory DeepMind. AlphaFold uses a neural network architecture to predict protein structures by integrating information about protein sequences and known structures (12). It compares the input sequence with sequences of known proteins from a large database to find patterns and similarities. The neural network processes these patterns to predict how the protein might fold. The system then refines the predicted structure iteratively, cycling through the neural network architecture to produce an accurate 3D model of the protein. The resulting 3D structures are often as accurate as those obtained from experimental methods like X-ray crystallography or cryo-EM (13).
Identifying ligand binding sites
After obtaining the structural information of target proteins, scientists identify and analyze their binding pockets, which are regions on the surface of proteins where small molecules, or ligands, can attach. The structure of these sites, including their shape, size, and chemical properties, dictates which molecules can bind to them and how tightly they bind. Understanding these binding sites is crucial for designing molecules that can selectively modulate protein function, either by enhancing or inhibiting it.
The most direct technique is visual inspection. Researchers examine the 3D structure of the protein in molecular graphics software to identify grooves, pockets, and cavities that could serve as potential binding sites. These software tools allow for detailed visualization and manipulation of molecular structures, making it easier to identify ligand binding regions.
In addition to visual inspection, computational methods are pivotal in identifying binding sites. Several structure-based prediction methods either take geometric measurements within protein structures to locate surface hollows or cavities, or place virtual probes on the protein surface to estimate interaction energies between the probes and the cavities. Other methods search databases for proteins with known ligand binding sites and use structure alignment algorithms to transfer these sites onto the query protein (14).
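A geometric cavity search can be illustrated with a toy grid scan. In this sketch the protein is reduced to a set of occupied voxels, and an empty voxel counts as part of a pocket when rays cast along the six axis directions hit protein in at least five of them. This is a simplified illustration of the general grid-scanning idea, not a reimplementation of any specific published tool; the voxel model, thresholds, and function names are assumptions.

```python
from itertools import product

# The six axis-aligned directions used to probe how enclosed a point is.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def buriedness(point, occupied, max_steps=4):
    """Count directions in which a ray from `point` hits an occupied voxel within max_steps."""
    count = 0
    for dx, dy, dz in DIRECTIONS:
        x, y, z = point
        for _ in range(max_steps):
            x, y, z = x + dx, y + dy, z + dz
            if (x, y, z) in occupied:
                count += 1
                break
    return count

def find_pocket_voxels(occupied, grid_size, min_buriedness=5):
    """Return the empty voxels that are enclosed by protein in most directions."""
    empty = (p for p in product(range(grid_size), repeat=3) if p not in occupied)
    return sorted(p for p in empty if buriedness(p, occupied) >= min_buriedness)

# Hypothetical "protein": a solid 5x5x5 block with a narrow well carved in from the top.
block = set(product(range(1, 6), repeat=3))
well = {(3, 3, 3), (3, 3, 4), (3, 3, 5)}
occupied = block - well
pocket = find_pocket_voxels(occupied, grid_size=7)
print(pocket)
```

Voxels outside the block see protein in at most one direction and are rejected, while the three voxels of the well are enclosed on five sides and are reported as the pocket.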
Hit discovery
After identifying the binding sites on target proteins, the next crucial step is to discover potential molecules that can interact with these sites effectively. This phase, known as hit discovery, involves a combination of computational and experimental techniques to screen and identify small molecules that can bind to the target proteins with high affinity and specificity.
Virtual screening
To identify potential drug candidates, researchers first curate large libraries of small molecules for screening against therapeutic targets. These libraries may be commercially available, custom-made, or derived from natural products. Creating a compound library involves selecting a diverse array of compounds with different chemical structures and properties to maximize the chances of finding a suitable ligand. Each compound in the library is represented by its 3D structure, and researchers generate multiple conformations of each compound to account for molecular flexibility.
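One common way to quantify chemical diversity is to compare molecular fingerprints with the Tanimoto coefficient and then pick compounds that are mutually dissimilar. The sketch below assumes fingerprints are already available as sets of "on" bit indices and uses a greedy MaxMin selection; the compound names and bit sets are invented for illustration, and generating real fingerprints would require a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def pick_diverse(fingerprints, n_pick, seed_id):
    """Greedy MaxMin selection: repeatedly add the compound whose most similar
    already-picked neighbour is the least similar among all candidates."""
    picked = [seed_id]
    while len(picked) < min(n_pick, len(fingerprints)):
        candidates = [c for c in fingerprints if c not in picked]

        def nearest_similarity(candidate):
            return max(tanimoto(fingerprints[candidate], fingerprints[p]) for p in picked)

        picked.append(min(candidates, key=nearest_similarity))
    return picked

# Hypothetical fingerprints (sets of on-bit indices).
library = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},
    "cpd_C": {10, 11, 12},
    "cpd_D": {1, 2, 10, 11},
}
subset = pick_diverse(library, n_pick=3, seed_id="cpd_A")
print(subset)
```

Here `cpd_B` is left out because it is a close analog of the seed compound, which is the behavior a diversity-oriented library selection aims for.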
The core of virtual screening involves molecular docking, where each compound in the library is virtually "docked" into the binding site of the target protein. Numerous docking programs facilitate this process by predicting the compound's orientation, position, and conformation within the binding site. These programs systematically explore potential conformations to generate low-energy structures and employ molecular dynamics simulations to optimize the ligand's orientation and assess the stability of the resulting ligand-protein complex (15).
Following the generation of potential binding poses, docking programs use scoring functions to estimate the free energy of the binding and score these poses. Researchers rank compounds based on docking scores and visually inspect top candidates to ensure reasonable interactions with the target protein (15). Based on the docking scores and binding modes, researchers select a shortlist of potential hits. They then synthesize or procure these selected hits for experimental validation through biochemical assays to confirm their binding affinity and biological activity against the target. The results from experimental validation provide feedback for further optimization.
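The ranking and shortlisting step can be sketched in a few lines. The example assumes docking scores are reported as estimated binding free energies in kcal/mol, where more negative is better, and converts a score to a rough dissociation constant with ΔG = RT·ln(Kd). Docking scores are crude approximations, so the converted values are indicative at best; the compound names and scores are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def estimated_kd(delta_g_kcal, temperature_k=298.15):
    """Rough dissociation constant (mol/L) implied by a binding free energy estimate."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

def shortlist(scores, top_n):
    """Rank compounds by score (more negative is better) and keep the top_n."""
    return sorted(scores.items(), key=lambda item: item[1])[:top_n]

# Hypothetical docking scores in kcal/mol.
docking_scores = {"cpd_1": -7.2, "cpd_2": -9.0, "cpd_3": -5.1, "cpd_4": -8.4}
hits = shortlist(docking_scores, top_n=2)
for name, score in hits:
    print(f"{name}: score {score} kcal/mol, estimated Kd ~ {estimated_kd(score):.1e} M")
```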
De novo drug design
De novo drug design involves designing entirely new molecules that fit specific binding sites on target proteins using computational algorithms without relying on pre-existing templates. This approach is valuable when existing compounds fail to meet requirements or when researchers aim to develop drugs with unique properties. Compared to virtual screening, de novo drug design enables researchers to explore a broader range of chemical structures efficiently.
Various types of de novo drug design algorithms help scientists generate potential molecular structures that can interact favorably with the target protein's binding site. For example, evolutionary algorithms mimic biological evolution to optimize populations of molecular structures. Each structure undergoes iterative cycles of mutation and crossover to create new generations of molecules. A fitness function evaluates each molecule's performance, and those with higher fitness are selected to produce the next generation. This iterative process continues until predefined termination criteria are met (16).
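The mutation, crossover, selection, and termination cycle can be sketched with a small genetic algorithm. To keep the example self-contained, a "molecule" is just a string of fragment letters and the fitness function counts matches to a made-up ideal string; in practice the fitness would come from docking scores or predicted properties. The fragment alphabet, target string, and all parameter values are hypothetical.

```python
import random

FRAGMENTS = "ABCDEFGH"   # hypothetical building blocks
IDEAL = "CAFEBAGD"       # hypothetical optimum standing in for a real scoring function

def fitness(molecule):
    """Toy fitness: number of positions that match the hypothetical ideal."""
    return sum(a == b for a, b in zip(molecule, IDEAL))

def mutate(molecule, rng, rate=0.1):
    """Replace each fragment with a random one with probability `rate`."""
    return "".join(rng.choice(FRAGMENTS) if rng.random() < rate else f for f in molecule)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover between two parents of equal length."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def evolve(pop_size=30, max_generations=200, seed=0):
    """Run the toy evolutionary loop; return (best molecule, best fitness per generation)."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(FRAGMENTS) for _ in IDEAL) for _ in range(pop_size)]
    history = []
    for _ in range(max_generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if history[-1] == len(IDEAL):          # termination criterion: optimum reached
            break
        parents = population[: pop_size // 2]  # selection: keep the fitter half as parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [population[0]] + children  # elitism: carry the best molecule forward
    return max(population, key=fitness), history

best, history = evolve()
print(best, fitness(best), "generations:", len(history))
```

Carrying the best molecule into each new generation guarantees that the best fitness never decreases from one generation to the next.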
In contrast, AI methods such as deep reinforcement learning employ data-driven approaches in drug design. Deep reinforcement learning works by training a neural network on data from existing molecules known to interact with biological targets. This network learns to optimize molecular designs based on feedback received through a predefined function, which evaluates properties crucial for drug efficacy and safety, such as binding affinity and drug likeness (16).
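The "predefined function" that provides feedback is often called a reward function. The sketch below shows only that piece, not the neural network or the training loop. It combines a predicted binding affinity with a simple drug-likeness term based on Lipinski's rule of five; the weights, the affinity normalization, and the candidate values are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    predicted_affinity_kcal: float  # more negative means tighter predicted binding
    mol_weight: float
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(c):
    """Count rule-of-five violations (MW > 500, logP > 5, donors > 5, acceptors > 10)."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_bond_donors > 5, c.h_bond_acceptors > 10])

def reward(c, affinity_weight=0.7, druglike_weight=0.3, best_affinity=-12.0):
    """Weighted reward in [0, 1]; weights and best_affinity are illustrative assumptions."""
    affinity_term = min(max(c.predicted_affinity_kcal / best_affinity, 0.0), 1.0)
    druglike_term = 1.0 - lipinski_violations(c) / 4
    return affinity_weight * affinity_term + druglike_weight * druglike_term

# Two hypothetical generated molecules.
balanced = Candidate(-9.0, 420.0, 3.1, 2, 6)
potent_but_greasy = Candidate(-10.0, 650.0, 6.2, 2, 6)
print(round(reward(balanced), 3), round(reward(potent_but_greasy), 3))
```

With these weights the slightly weaker but rule-compliant molecule earns the higher reward, which shows how the choice of reward steers the generator toward a balance of properties rather than affinity alone.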
By utilizing virtual screening and de novo drug design methods, researchers can identify promising drug candidates capable of modulating the function of target proteins. They then synthesize these candidates and conduct detailed biochemical and biophysical studies to confirm the candidates’ crucial characteristics, such as binding affinity, specificity, and mode of action against the target protein. Following this validation, researchers enter the lead optimization phase, where they iteratively refine the chemical structure of promising compounds to enhance their efficacy, selectivity, potency, and pharmacokinetic properties. This process integrates computational modeling with experimental validation to develop lead compounds that exhibit optimal efficacy and safety profiles, preparing them for nonclinical and clinical development.
What’s on the horizon for structure-based drug design?
Structure-based drug design is evolving rapidly with a wave of groundbreaking advancements. Cryo-EM is expanding the scope of structure-based drug design by enabling the visualization of complex protein structures, facilitating the design of drugs targeting previously inaccessible molecular targets. The integration of AI and machine learning is enhancing the prediction and optimization of drug-target interactions, altering how researchers identify and design potential therapeutics. Additionally, advances in computational methods, including molecular docking and dynamics simulations, are improving the accuracy and efficiency of structure-based drug design approaches. These developments are paving the way for more efficacious and precisely tailored therapies for a spectrum of diseases in the years ahead.
References
1. Bragg, W. L. The Specular Reflection of X-rays. Nature 90, 410–410 (1912).
2. Krishnan, V. & Rupp, B. in Encyclopedia of Life Sciences (John Wiley & Sons, Ltd, 2012). doi:10.1002/9780470015902.a0002716.pub2
3. Zheng, H. et al. X-ray crystallography over the past decade for novel drug discovery – where are we heading next? Expert Opin Drug Discov 10, 975–989 (2015).
4. Renaud, J.-P. et al. Cryo-EM in drug discovery: achievements, limitations and prospects. Nat Rev Drug Discov 17, 471–492 (2018).
5. Nakane, T. et al. Single-particle cryo-EM at atomic resolution. Nature 587, 152–156 (2020).
6. Kühlbrandt, W. The Resolution Revolution. Science 343, 1443–1444 (2014).
7. Williamson, M. P., Havel, T. F. & Wüthrich, K. Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. Journal of Molecular Biology 182, 295–315 (1985).
8. Alderson, T. R. & Kay, L. E. NMR spectroscopy captures the essential role of dynamics in regulating biomolecular function. Cell 184, 577–595 (2021).
9. Muhammed, M. T. & Aki-Yalcin, E. Homology modeling in drug discovery: Overview, current applications, and future perspectives. Chemical Biology & Drug Design 93, 12–20 (2019).
10. Majumder, P. in Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications (eds. Srinivasa, K. G., Siddesh, G. M. & Manisekhar, S. R.) 119–133 (Springer, 2020). doi:10.1007/978-981-15-2445-5_8
11. Lee, J., Freddolino, P. L. & Zhang, Y. in From Protein Structure to Function with Bioinformatics (ed. J. Rigden, D.) 3–35 (Springer Netherlands, 2017). doi:10.1007/978-94-024-1069-3_1
12. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
13. Callaway, E. ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature 588, 203–204 (2020).
14. Zhao, J., Cao, Y. & Zhang, L. Exploring the computational methods for protein-ligand binding site prediction. Computational and Structural Biotechnology Journal 18, 417–426 (2020).
15. Lionta, E., Spyrou, G., Vassilatis, D. K. & Cournia, Z. Structure-Based Virtual Screening for Drug Discovery: Principles, Applications and Recent Advances. Current Topics in Medicinal Chemistry 14, 1923–1938 (2014).
16. Mouchlis, V. D. et al. Advances in De Novo Drug Design: From Conventional to Machine Learning Methods. International Journal of Molecular Sciences 22, 1676 (2021).