DOI: 10.1110/ps.036061.108. How well can the accuracy of comparative protein structure models be predicted?

Crossref journal-article

Wiley

Protein Science (311)

Abstract

Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Cα root‐mean‐squared deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model‐specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.
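The two error measures named in the abstract can be illustrated with a minimal sketch. It assumes that the model and native Cα coordinates are already paired and superposed (no fitting is done here), and that native overlap means the fraction of Cα atoms lying within 3.5 Å of their native positions; the abstract does not spell out that definition, so treat it as an assumption. The function names and toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def ca_rmsd(model, native):
    """Root-mean-squared deviation over paired C-alpha coordinates.

    Assumes both lists are the same length and already superposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(m, n) ** 2 for m, n in zip(model, native))
    return math.sqrt(squared / len(model))


def native_overlap(model, native, cutoff=3.5):
    """Fraction of C-alpha atoms within `cutoff` angstroms of the native position.

    The 3.5 angstrom default mirrors the NO3.5 label; the exact definition
    used in the paper is assumed, not quoted.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    close = sum(1 for m, n in zip(model, native) if math.dist(m, n) <= cutoff)
    return close / len(model)


# Toy four-residue example (hypothetical coordinates, in angstroms).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0), (11.4, 4.0, 0.0)]

rmsd = ca_rmsd(model, native)           # per-atom deviations are 0, 1, 2 and 4
overlap = native_overlap(model, native)  # three of four atoms fall within 3.5
print(f"RMSD = {rmsd:.2f}, native overlap = {overlap:.2f}")
```

Both functions need the native structure. The protocol described in the abstract instead predicts these quantities when the native structure is unavailable, so this sketch only shows what is being predicted, not how.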

Bibliography

Eramian, D., Eswar, N., Shen, M., & Sali, A. (2008). How well can the accuracy of comparative protein structure models be predicted? Protein Science, 17(11), 1881–1893. Portico.

Authors 4

David Eramian (first)
Narayanan Eswar (additional)
Min‐Yi Shen (additional)
Andrej Sali (additional)

References 71 Referenced 123

Albeck M.J. (1990). ROC‐curve analysis. A statistical method for the evaluation of diagnostic tests. Ugeskr. Laeger, 152, 1650.
10.1093/nar/25.17.3389
10.1093/nar/gkh039
10.1093/nar/gkh131
10.1126/science.1065659
10.1093/nar/28.1.235
10.1021/bi048252q
10.1126/science.1113801
10.1515/BC.2005.041
10.1016/j.str.2004.05.018
10.1093/protein/gzi019
10.1002/j.1460-2075.1986.tb04288.x
10.1016/j.jmb.2006.08.035
10.1210/me.2004-0435
10.1002/(SICI)1097-0134(1999)37:3 <112::AID-PROT15>3.0.CO;2-R
10.1110/ps.062095806
10.1093/nar/gkg543
Eswar N. (2007). Comparative protein structure modeling using MODELLER. Curr. Protoc. Protein Sci., Unit 2.9.
10.1110/ps.9.9.1753
10.1110/ps.072939707
10.1142/9781860949852_0003
10.1002/1097-0134(20001201)41:4<518::AID-PROT90>3.0.CO;2-6
10.1093/nar/gki327
10.1016/0022-2836(91)90027-4
10.1110/ps.4820102
Joachims T. (1999). Advances in kernel methods: Support vector learning.
10.1006/jmbi.1999.3091
10.1002/bip.360221211
10.1006/jmbi.1999.2685
10.1016/S0959-440X(00)00063-4
10.1016/0022-2836(71)90324-X
10.1016/j.jmb.2007.11.033
10.1093/protein/gzj005
10.1021/ci600485s
10.1146/annurev.biophys.29.1.291
10.1110/ps.03379804
10.1186/1471-2105-8-345
10.1093/bioinformatics/btn014
10.1093/bioinformatics/btg097
10.1006/jmbi.1996.0868
10.1006/jmbi.1998.1665
10.1110/ps.072895107
10.1002/pro.110430
10.1177/0272989X9801800118
10.1006/jmbi.1996.0114
10.1006/jmbi.1996.0256
10.1006/jmbi.1996.0809
10.1093/bioinformatics/bti540
10.1093/nar/gkj059
10.1002/prot.21809
10.1002/prot.20835
10.1093/protein/12.2.85
10.1006/jmbi.1993.1626
10.1110/ps.9.7.1399
10.1038/80776
10.1002/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7
10.1002/jcc.10124
10.1110/ps.062416606
10.1016/j.cplett.2005.02.029
10.1073/pnas.95.19.11158
10.1093/bioinformatics/16.9.776
10.1002/prot.340170404
10.1016/0022-2836(81)90087-5
10.1021/ci049924m
10.1002/prot.10015
10.1002/prot.10454
10.1110/ps.0236803
10.1110/ps.051799606
10.1038/nsmb885
10.1002/prot.20264
10.1110/ps.0217002

Dates

Type	When
Created	Oct. 1, 2008, 10:01 p.m.
Deposited	Sept. 28, 2023, 9:51 a.m.
Indexed	Aug. 26, 2025, 3:04 a.m.
Issued	Nov. 1, 2008
Published	Nov. 1, 2008
Published Online	Jan. 2, 2009
Published Print	Nov. 1, 2008

Funders 0

None

BibTeX

@article{Eramian_2008, title={How well can the accuracy of comparative protein structure models be predicted?}, volume={17}, ISSN={1469-896X}, url={http://dx.doi.org/10.1110/ps.036061.108}, DOI={10.1110/ps.036061.108}, number={11}, journal={Protein Science}, publisher={Wiley}, author={Eramian, David and Eswar, Narayanan and Shen, Min‐Yi and Sali, Andrej}, year={2008}, month=nov, pages={1881–1893} }

JSON

{
  "indexed": {
    "date-parts": [
      [
        2025,
        8,
        26
      ]
    ],
    "date-time": "2025-08-26T07:04:57Z",
    "timestamp": 1756191897385
  },
  "reference-count": 71,
  "publisher": "Wiley",
  "issue": "11",
  "license": [
    {
      "start": {
        "date-parts": [
          [
            2009,
            1,
            2
          ]
        ],
        "date-time": "2009-01-02T00:00:00Z",
        "timestamp": 1230854400000
      },
      "content-version": "vor",
      "delay-in-days": 62,
      "URL": "http://onlinelibrary.wiley.com/termsAndConditions#vor"
    }
  ],
  "content-domain": {
    "domain": [],
    "crossmark-restriction": false
  },
  "published-print": {
    "date-parts": [
      [
        2008,
        11
      ]
    ]
  },
  "abstract": "<jats:title>Abstract</jats:title><jats:p>Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the C\u03b1 root\u2010mean\u2010squared deviation (RMSD) and native overlap (NO3.5\u00c5) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model\u2010specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5\u00c5 errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (<jats:italic>r</jats:italic>) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.</jats:p>",
  "DOI": "10.1110/ps.036061.108",
  "type": "journal-article",
  "created": {
    "date-parts": [
      [
        2008,
        10,
        2
      ]
    ],
    "date-time": "2008-10-02T02:01:27Z",
    "timestamp": 1222912887000
  },
  "page": "1881-1893",
  "source": "Crossref",
  "is-referenced-by-count": 123,
  "title": "How well can the accuracy of comparative protein structure models be predicted?",
  "prefix": "10.1002",
  "volume": "17",
  "author": [
    {
      "given": "David",
      "family": "Eramian",
      "sequence": "first",
      "affiliation": []
    },
    {
      "given": "Narayanan",
      "family": "Eswar",
      "sequence": "additional",
      "affiliation": []
    },
    {
      "given": "Min\u2010Yi",
      "family": "Shen",
      "sequence": "additional",
      "affiliation": []
    },
    {
      "given": "Andrej",
      "family": "Sali",
      "sequence": "additional",
      "affiliation": []
    }
  ],
  "member": "311",
  "published-online": {
    "date-parts": [
      [
        2009,
        1,
        2
      ]
    ]
  },
  "reference": [
    {
      "key": "e_1_2_6_2_1",
      "first-page": "1650",
      "article-title": "ROC\u2010curve analysis. A statistical method for the evaluation of diagnostic tests",
      "volume": "152",
      "author": "Albeck M.J.",
      "year": "1990",
      "journal-title": "Ugeskr. Laeger"
    },
    {
      "key": "e_1_2_6_3_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/25.17.3389"
    },
    {
      "key": "e_1_2_6_4_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/gkh039"
    },
    {
      "key": "e_1_2_6_5_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/gkh131"
    },
    {
      "key": "e_1_2_6_6_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1126/science.1065659"
    },
    {
      "key": "e_1_2_6_7_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/28.1.235"
    },
    {
      "key": "e_1_2_6_8_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1021/bi048252q"
    },
    {
      "key": "e_1_2_6_9_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1126/science.1113801"
    },
    {
      "key": "e_1_2_6_10_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1515/BC.2005.041"
    },
    {
      "key": "e_1_2_6_11_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/j.str.2004.05.018"
    },
    {
      "key": "e_1_2_6_12_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/protein/gzi019"
    },
    {
      "key": "e_1_2_6_13_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/j.1460-2075.1986.tb04288.x"
    },
    {
      "key": "e_1_2_6_14_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/j.jmb.2006.08.035"
    },
    {
      "key": "e_1_2_6_15_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1210/me.2004-0435"
    },
    {
      "key": "e_1_2_6_16_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/(SICI)1097-0134(1999)37:3 <112::AID-PROT15>3.0.CO;2-R"
    },
    {
      "key": "e_1_2_6_17_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.062095806"
    },
    {
      "key": "e_1_2_6_18_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/gkg543"
    },
    {
      "key": "e_1_2_6_19_1",
      "first-page": "Unit 2.9",
      "article-title": "Comparative protein structure modeling using MODELLER",
      "author": "Eswar N.",
      "year": "2007",
      "journal-title": "Curr. Protoc. Protein Sci."
    },
    {
      "key": "e_1_2_6_20_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.9.9.1753"
    },
    {
      "key": "e_1_2_6_21_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.072939707"
    },
    {
      "key": "e_1_2_6_22_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1142/9781860949852_0003"
    },
    {
      "key": "e_1_2_6_23_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/1097-0134(20001201)41:4<518::AID-PROT90>3.0.CO;2-6"
    },
    {
      "key": "e_1_2_6_24_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/gki327"
    },
    {
      "key": "e_1_2_6_25_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0022-2836(91)90027-4"
    },
    {
      "key": "e_1_2_6_26_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.4820102"
    },
    {
      "key": "e_1_2_6_27_1",
      "volume-title": "Advances in kernel methods: Support vector learning",
      "author": "Joachims T.",
      "year": "1999"
    },
    {
      "key": "e_1_2_6_28_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1999.3091"
    },
    {
      "key": "e_1_2_6_29_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/bip.360221211"
    },
    {
      "key": "e_1_2_6_30_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1999.2685"
    },
    {
      "key": "e_1_2_6_31_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/S0959-440X(00)00063-4"
    },
    {
      "key": "e_1_2_6_32_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0022-2836(71)90324-X"
    },
    {
      "key": "e_1_2_6_33_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/j.jmb.2007.11.033"
    },
    {
      "key": "e_1_2_6_34_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/protein/gzj005"
    },
    {
      "key": "e_1_2_6_35_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1021/ci600485s"
    },
    {
      "key": "e_1_2_6_36_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1146/annurev.biophys.29.1.291"
    },
    {
      "key": "e_1_2_6_37_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.03379804"
    },
    {
      "key": "e_1_2_6_38_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1186/1471-2105-8-345"
    },
    {
      "key": "e_1_2_6_39_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/btn014"
    },
    {
      "key": "e_1_2_6_40_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/btg097"
    },
    {
      "key": "e_1_2_6_41_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1996.0868"
    },
    {
      "key": "e_1_2_6_42_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1998.1665"
    },
    {
      "key": "e_1_2_6_43_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.072895107"
    },
    {
      "key": "e_1_2_6_44_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.110430"
    },
    {
      "key": "e_1_2_6_45_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1177/0272989X9801800118"
    },
    {
      "key": "e_1_2_6_46_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1996.0114"
    },
    {
      "key": "e_1_2_6_47_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1996.0256"
    },
    {
      "key": "e_1_2_6_48_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1996.0809"
    },
    {
      "key": "e_1_2_6_49_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/bti540"
    },
    {
      "key": "e_1_2_6_50_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/gkj059"
    },
    {
      "key": "e_1_2_6_51_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/prot.21809"
    },
    {
      "key": "e_1_2_6_52_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/prot.20835"
    },
    {
      "key": "e_1_2_6_53_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/protein/12.2.85"
    },
    {
      "key": "e_1_2_6_54_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1993.1626"
    },
    {
      "key": "e_1_2_6_55_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.9.7.1399"
    },
    {
      "key": "e_1_2_6_56_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1038/80776"
    },
    {
      "key": "e_1_2_6_57_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7"
    },
    {
      "key": "e_1_2_6_58_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/jcc.10124"
    },
    {
      "key": "e_1_2_6_59_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.062416606"
    },
    {
      "key": "e_1_2_6_60_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/j.cplett.2005.02.029"
    },
    {
      "key": "e_1_2_6_61_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.95.19.11158"
    },
    {
      "key": "e_1_2_6_62_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/16.9.776"
    },
    {
      "key": "e_1_2_6_63_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/prot.340170404"
    },
    {
      "key": "e_1_2_6_64_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0022-2836(81)90087-5"
    },
    {
      "key": "e_1_2_6_65_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1021/ci049924m"
    },
    {
      "key": "e_1_2_6_66_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/prot.10015"
    },
    {
      "key": "e_1_2_6_67_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/prot.10454"
    },
    {
      "key": "e_1_2_6_68_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.0236803"
    },
    {
      "key": "e_1_2_6_69_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.051799606"
    },
    {
      "key": "e_1_2_6_70_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1038/nsmb885"
    },
    {
      "key": "e_1_2_6_71_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/prot.20264"
    },
    {
      "key": "e_1_2_6_72_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.0217002"
    }
  ],
  "container-title": "Protein Science",
  "original-title": [],
  "language": "en",
  "link": [
    {
      "URL": "https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1110%2Fps.036061.108",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "text-mining"
    },
    {
      "URL": "https://onlinelibrary.wiley.com/doi/pdf/10.1110/ps.036061.108",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "similarity-checking"
    }
  ],
  "deposited": {
    "date-parts": [
      [
        2023,
        9,
        28
      ]
    ],
    "date-time": "2023-09-28T13:51:48Z",
    "timestamp": 1695909108000
  },
  "score": 1,
  "resource": {
    "primary": {
      "URL": "https://onlinelibrary.wiley.com/doi/10.1110/ps.036061.108"
    }
  },
  "subtitle": [],
  "short-title": [],
  "issued": {
    "date-parts": [
      [
        2008,
        11
      ]
    ]
  },
  "references-count": 71,
  "journal-issue": {
    "issue": "11",
    "published-print": {
      "date-parts": [
        [
          2008,
          11
        ]
      ]
    }
  },
  "alternative-id": [
    "10.1110/ps.036061.108"
  ],
  "URL": "http://dx.doi.org/10.1110/ps.036061.108",
  "relation": {
    "has-review": [
      {
        "id-type": "doi",
        "id": "10.3410/f.1157427.621304",
        "asserted-by": "object"
      },
      {
        "id-type": "doi",
        "id": "10.3410/f.1157427.617585",
        "asserted-by": "object"
      }
    ]
  },
  "ISSN": [
    "0961-8368",
    "1469-896X"
  ],
  "subject": [],
  "container-title-short": "Protein Science",
  "published": {
    "date-parts": [
      [
        2008,
        11
      ]
    ]
  }
}
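
The `date-parts` arrays in the record above give a year, a year and month, or a full date. As a minimal sketch, the snippet below converts them to calendar dates and recomputes the interval between the issue date and the license start date. It assumes that a missing month or day defaults to 1, and that `delay-in-days` counts days from the issued date to the license start; the arithmetic agrees with the value 62 in this record, but that reading of the field is an assumption. The helper name `date_from_parts` is hypothetical, and the `record` dictionary is a hand-copied excerpt of the JSON, not a live API response.

```python
from datetime import date


def date_from_parts(parts):
    """Turn [year], [year, month] or [year, month, day] into a date.

    Missing month or day values default to 1 (an assumption of this sketch).
    """
    year, month, day = (list(parts) + [1, 1])[:3]
    return date(year, month, day)


# Excerpt of the record above, copied by hand.
record = {
    "issued": {"date-parts": [[2008, 11]]},
    "license": [
        {"start": {"date-parts": [[2009, 1, 2]]}, "delay-in-days": 62}
    ],
}

issued = date_from_parts(record["issued"]["date-parts"][0])
license_start = date_from_parts(record["license"][0]["start"]["date-parts"][0])
delay = (license_start - issued).days

print(issued.isoformat(), license_start.isoformat(), delay)
print("matches record:", delay == record["license"][0]["delay-in-days"])
```

Padding partial dates to the first of the month keeps the comparison simple. A stricter consumer might keep the precision of each date explicit instead.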