URBANA, Ill.—While more than 4 million protein sequences have been identified in various genomes, the functions of more than half of these remain unknown or uncertain. Homology modeling provides some hints, but there are many examples where this method has led to false assumptions. Thus, researchers at the University of Illinois at Urbana-Champaign, University of California, San Francisco, and the Albert Einstein College of Medicine improved the odds by adding in silico docking.
As they describe in Nature Chemical Biology, the researchers examined an unknown protein with 35% sequence identity to the enolase family of enzymes. Using ESI-MS and polarimetry, they found that although the protein could bind the dipeptide targets of enolases, the kinetics were insufficient for the dipeptides to be the natural ligands. When they tested N-succinyl L-amino acids, however, the kinetics were much better.
Using Glide, they then docked a library of dipeptides and N-succinyl L-amino acids into an enolase-based homology model built with Prime, allowing sidechain flexibility because they weren't using an experimental structure. Consistent with the experimental results, the top-ranked ligands included N-succinyl L-arginine and N-succinyl L-lysine, suggesting that the unknown protein was an N-succinyl L-amino acid racemase, not an enolase.
The researchers suggest that, "subject to limitations including incompleteness of the ligand library and ligand-specific conformational changes, our approach should be useful for facilitating functional assignments of many unknown proteins for which only sequence information is available."