Status of GPCR modeling and docking as reflected by community wide GPCR Dock 2010 assessment. Structure, Volume 19, Issue 8, 10 August 2011, Pages 1108-1126.
Irina Kufareva1, Manuel Rueda1, Vsevolod Katritch1,2, GPCR Dock 2010 participants, Raymond C. Stevens3, Ruben Abagyan1,2.
1 UCSD Skaggs School of Pharmacy and Pharmaceutical Sciences, La Jolla, CA, 92039, USA.
2 San Diego Supercomputer Center, La Jolla, CA, 92039, USA.
3 Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
The work was partially supported by NIH grants R01 GM 071872 and U01 GM094612 to RA and U54 GM094618 to RCS.
The present round of the assessment, GPCR Dock 2010, represents three distinct classes and three levels of difficulty: (i) D3/eticlopride: a small molecule in a small pocket with two good templates; (ii) CXCR4/IT1t: a small molecule in a large pocket designed for peptide binding with more distant templates; and (iii) CXCR4/CVX15: the first peptide analogue in a pocket with relatively distant templates. For these three assessments, 117, 103, and 55 unique interpretable models were submitted by 32, 25, and 19 groups, respectively. The results of the three classes are shown as independent assessments.
This section contains a short description of the parameters displayed in the columns of the table of results. Please refer to the manuscript for a more comprehensive description.
Group ID: We referred to participating groups by the abbreviated name of their institution rather than by names. Because some institutions host more than one group, we developed a group nomenclature in which the institution name is followed by the 4-digit group ID assigned on registration.
Model: The rank of the model as assigned by its authors (from 1 to 5).
TM backbone RMSD: The protein molecule of each model was superimposed onto the backbone CA, C, and N atoms of the TM helices of the reference template. TM regions were defined by residue stretches 1.30-1.60, 2.37-2.66, 3.22-3.54, 4.38-4.61, 5.37-5.64, 6.28-6.60, and 7.31-7.55 in Ballesteros-Weinstein notation (in this notation, the single most conserved residue among the class A GPCRs on each helix is designated x.50, where x is the TM helix number; all other residues on that helix are numbered relative to this conserved position). Superimposition was performed using an adaptive algorithm that iteratively finds the region of higher similarity by assigning distance-dependent Gaussian weights to deviating fragments of the structure. Application of this algorithm ensured that the superimposition quality was not dominated by a single flexible and/or poorly predicted part, e.g. one deviating part of a helix.
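The Ballesteros-Weinstein numbering described above can be illustrated with a minimal Python sketch. The function names and the example receptor (in which residue 128 is taken to be position 3.50) are hypothetical and serve only to show the arithmetic; they are not part of the assessment software.

```python
def bw_label(helix: int, resnum: int, ref_resnum: int) -> str:
    """Return the Ballesteros-Weinstein label of a residue.

    helix      -- TM helix number (1 to 7)
    resnum     -- sequence number of the residue of interest
    ref_resnum -- sequence number of the most conserved residue (x.50) on that helix
    """
    return f"{helix}.{50 + resnum - ref_resnum}"


def in_tm_region(label: str, start: str, end: str) -> bool:
    """Check whether a label lies inside a TM stretch such as 3.22-3.54."""
    helix, pos = map(int, label.split("."))
    s_helix, s_pos = map(int, start.split("."))
    e_helix, e_pos = map(int, end.split("."))
    return helix == s_helix == e_helix and s_pos <= pos <= e_pos


# Hypothetical receptor in which residue 128 is the x.50 position of helix 3.
label = bw_label(3, 110, 128)
print(label)                                # 3.32
print(in_tm_region(label, "3.22", "3.54"))  # True
```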
Fraction TM superimposed: The fraction of TM bundle for which high-quality superimposition was found (< 2 Å RMSD) and the corresponding partial RMSD were also reported.
ECL2 backbone RMSD: With the same superimposition of TM helices, we calculated RMSD of backbone atoms of the model's ECL2 from that of the template. We chose to focus on ECL2 rather than on all extracellular parts of the protein because of its size and the critical role it plays in ligand binding for many GPCRs. ECL2 was defined by residues F171-N185 in D3 and by residues A174-E179 and R183-N192 in CXCR4. The tip of the ECL2 β-hairpin (residues A180, D181, and D182) was omitted from ECL2 comparison for CXCR4 because this region was disordered in the majority of reference templates, and was the most flexible in others as demonstrated by its structural variability and high B-factor values.
TM and pocket residue RMSD: Similarity of the pocket residue conformations was evaluated by measuring RMSD between the heavy atoms of the residues that constituted the binding pockets in the reference templates. The sets of reference pocket residues included:
- D3/eticlopride complex: F106, D110, V111, C114, I183, V189, S192, S193, W342, F345, F346, H349, Y365, T369, and Y373 (15 residues: 14 in TM domain and one in ECL2)
- CXCR4/IT1t complex: W94, D97, A98, W102, V112, H113, Y116, R183, I185, C186, D187, R188, and E288 (13 residues: 7 in TM domain and 6 in extracellular loops)
- CXCR4/CVX15 complex: P27, H113, Y116, T117, D171, S178, C186, D187, R188, F189, Y190, P191, N192, D193, V196, F199, Q200, Y255, D262, I265, L266, E277, H281, I284, S285, and E288 (12 TM domain residues and 14 extracellular loop residues)
The optimal superimposition of TM domains was performed prior to the binding pocket comparison as described above. Residue symmetry was taken into account when calculating pocket RMSD.
TM2 rotation: To assess the extent of rotation in helix II, the TM domain of each model was superimposed onto the template as described above; the model was then translated in space to ensure the optimal overlay of the helical axis of the top part of its helix II with the corresponding axis in the template. Two angles were then measured: one between the projections of W94 C atoms onto the plane perpendicular to the helical axes, and another between the projections of D97 C atoms.
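The angle measurement can be sketched in Python as follows. This is a minimal illustration that assumes the two helical axes have already been overlaid, so that a single axis (a point plus a direction) describes both structures. The function names and the sign convention (rotation from template to model about the axis direction) are assumptions, not details taken from the assessment.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def project_onto_plane(v, u):
    """Remove from v its component along the unit vector u."""
    d = dot(v, u)
    return tuple(x - d * y for x, y in zip(v, u))


def rotation_angle(atom_model, atom_ref, axis_point, axis_dir):
    """Signed angle (degrees) between the projections of two atoms onto
    the plane perpendicular to a shared helical axis."""
    u = unit(axis_dir)
    pm = project_onto_plane(sub(atom_model, axis_point), u)
    pr = project_onto_plane(sub(atom_ref, axis_point), u)
    return math.degrees(math.atan2(dot(cross(pr, pm), u), dot(pr, pm)))


# Toy coordinates: axis along z through the origin.
angle = rotation_angle((0.0, 1.0, 7.0), (1.0, 0.0, 5.0),
                       (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(round(angle, 6))  # 90.0
```

Because the atoms are projected before the angle is taken, any displacement along the helical axis (the differing z values in the toy example) does not affect the result.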
Fraction predicted pocket area: Similarity of the pocket residue content was assessed by calculating and comparing the residue backbone and side-chain surface areas that become solvent-inaccessible in the presence of the ligand in the reference structures and in the models.
Ligand heavy atom RMSD: RMSD of the ligand non-hydrogen atoms from their respective counterparts in the crystallographic structure was determined after superimposition of the model onto the reference template as described above. Internal ligand symmetry was taken into account for RMSD definition as well as other calculations. For example, for the isothiourea IT1t molecule co-crystallized with CXCR4, as many as 16 atom permutations are possible that result in exactly the same ligand covalent geometry and bond topology; all of these were tested and the one with the smallest RMSD to the model was chosen.
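The symmetry handling can be illustrated with a short Python sketch that takes the minimum RMSD over a list of equivalent atom permutations. The permutation list is assumed to be supplied by the caller (for example, from a graph automorphism search on the ligand), and the three-atom ligand below is a toy example rather than IT1t.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally ordered lists of (x, y, z) tuples."""
    sq = sum((xa - xb) ** 2
             for a, b in zip(coords_a, coords_b)
             for xa, xb in zip(a, b))
    return math.sqrt(sq / len(coords_a))


def symmetry_aware_rmsd(model, reference, permutations):
    """Smallest RMSD over all topologically equivalent atom orderings.

    Each permutation p is a list in which model atom p[i] is matched
    to reference atom i.
    """
    return min(rmsd([model[j] for j in p], reference) for p in permutations)


# Toy ligand: atoms 1 and 2 are equivalent and swapped in the model.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
perms = [[0, 1, 2], [0, 2, 1]]

print(round(rmsd(model, reference), 3))            # 1.633 (naive ordering)
print(symmetry_aware_rmsd(model, reference, perms))  # 0.0
```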
Atom (residue) contacts: The number of contacts that are the same between the reference template and the model is calculated and compared to the total number of ligand-protein contacts in the reference template (recall) or in the model (precision). As with ligand RMSD, calculation of atomic contacts requires enumeration of topologically equivalent atom permutations in the ligand; moreover, some amino acids also possess internal symmetry that should be taken into account. Treating side-chain symmetry in the same way as ligand symmetry is possible, but it quickly leads to a combinatorial explosion of the total number of permutations in the system. For this reason, and because the "wingspan" of symmetric groups in the protein side chains is limited to three heavy atoms, we accounted for side-chain symmetry by considering symmetric atoms as indistinguishable instead of explicitly enumerating them.
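A minimal Python sketch of the recall and precision calculation follows. Contacts are represented as (ligand atom, residue, protein atom) tuples, and symmetric side-chain atoms are made indistinguishable by mapping them to a shared name. The symmetry table is a small illustrative subset, and the ligand atom names and contact lists are invented for the example.

```python
# Illustrative subset of symmetric side-chain atoms, mapped to a shared name.
SYMMETRIC = {
    ("ASP", "OD1"): "OD*", ("ASP", "OD2"): "OD*",
    ("GLU", "OE1"): "OE*", ("GLU", "OE2"): "OE*",
    ("PHE", "CD1"): "CD*", ("PHE", "CD2"): "CD*",
    ("PHE", "CE1"): "CE*", ("PHE", "CE2"): "CE*",
    ("TYR", "CD1"): "CD*", ("TYR", "CD2"): "CD*",
    ("TYR", "CE1"): "CE*", ("TYR", "CE2"): "CE*",
}


def canonical(contact):
    """Replace a symmetric protein atom name by its shared label."""
    lig_atom, residue, prot_atom = contact
    resname = residue[:3]
    return (lig_atom, residue, SYMMETRIC.get((resname, prot_atom), prot_atom))


def contact_recall_precision(ref_contacts, model_contacts):
    ref = {canonical(c) for c in ref_contacts}
    mod = {canonical(c) for c in model_contacts}
    shared = ref & mod
    recall = len(shared) / len(ref) if ref else 0.0
    precision = len(shared) / len(mod) if mod else 0.0
    return recall, precision


# Invented contact lists for illustration only.
ref_contacts = [("N1", "ASP110", "OD1"), ("C5", "PHE345", "CE1"),
                ("O2", "SER192", "OG")]
model_contacts = [("N1", "ASP110", "OD2"), ("C5", "PHE345", "CE2"),
                  ("C7", "VAL111", "CG1"), ("C8", "HIS349", "NE2")]

recall, precision = contact_recall_precision(ref_contacts, model_contacts)
print(round(recall, 3), precision)  # 0.667 0.5
```

Ligand symmetry is not handled here; in a fuller version the calculation would be repeated over the equivalent ligand atom permutations, as for ligand RMSD.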
Contact strength: We refined the definition of an atomic contact in an attempt to make it more robust and continuous. Instead of using a "hard" distance cutoff and counting a contact as present (1) for interatomic distances below this cutoff, and as absent (0) for the distances above this cutoff, we designed a continuous contact strength function that gradually decreased from 1 to 0 within a specified distance margin.
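One possible form of such a function is sketched below in Python. The text above specifies only that the strength decreases gradually from 1 to 0 within a distance margin; the linear ramp, the 4.0 Å cutoff, and the 1.0 Å margin used here are assumptions chosen for illustration.

```python
def contact_strength(distance, cutoff=4.0, margin=1.0):
    """Continuous contact strength between 0 and 1.

    Returns 1 up to `cutoff`, 0 beyond `cutoff + margin`, and a linear
    ramp in between. The ramp shape and default values are assumptions.
    """
    if distance <= cutoff:
        return 1.0
    if distance >= cutoff + margin:
        return 0.0
    return 1.0 - (distance - cutoff) / margin


for d in (3.5, 4.5, 5.0):
    print(d, contact_strength(d))  # 1.0, 0.5, 0.0 respectively
```

With a continuous function, a small shift in an interatomic distance near the cutoff changes the contact score slightly instead of flipping it between 0 and 1.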
Z-scores: Z-scores were calculated in the spirit of the previous assessment, GPCR Dock 2008. Ligand RMSD values and fractions of correctly predicted ligand-protein contacts were independently converted into Z-scores (for RMSD, the negated Z-score was taken so that higher values correspond to better models in all cases); the two Z-scores were averaged. A new mean and standard deviation (SD) were then calculated excluding the low-scoring models that deviated from the old mean by more than two SDs, and new Z-scores were found using this corrected mean and SD. In the cases of CXCR4/IT1t and D3/eticlopride, for which multiple reference templates were available, the template resulting in the best Z-score was chosen for each model. A similar algorithm was used for assessment of protein prediction accuracy based on TM and ECL2 backbone RMSD.
For each model, we created an interactive HTML page on which the 3D structures of the complexes can be visualized with the help of ActiveICM (see Links section on top of the page).
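The outlier-corrected Z-score can be sketched in Python as below. This is one reading of the procedure, in which the correction is applied to each metric separately before the two Z-scores are averaged; that ordering, the use of the population standard deviation, and the input values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def corrected_z_scores(values):
    """Z-scores computed with a mean and SD that exclude low outliers.

    Higher values are assumed to be better. Values more than two SDs
    below the initial mean are left out when the mean and SD are
    recomputed, but every value still receives a Z-score.
    """
    m, s = mean(values), pstdev(values)
    kept = [v for v in values if v >= m - 2.0 * s]
    m2, s2 = mean(kept), pstdev(kept)
    if s2 == 0.0:
        return [0.0 for _ in values]
    return [(v - m2) / s2 for v in values]


def combined_z(rmsds, contact_fractions):
    """Average of the RMSD-based and contact-based corrected Z-scores."""
    z_rmsd = corrected_z_scores([-r for r in rmsds])  # negate: lower RMSD is better
    z_contacts = corrected_z_scores(contact_fractions)
    return [(a + b) / 2.0 for a, b in zip(z_rmsd, z_contacts)]


# Invented values: nine reasonable models and one gross outlier.
rmsds = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 20.0]
contacts = [0.8, 0.6, 0.4, 0.8, 0.6, 0.4, 0.8, 0.6, 0.4, 0.0]

scores = combined_z(rmsds, contacts)
print([round(z, 2) for z in scores])
```

Excluding the outlier from the mean and SD keeps a single very poor model from compressing the Z-scores of all the others.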