DOI: 10.1002/prot.10043. A study on protein sequence alignment quality

Crossref journal-article

Wiley

Proteins: Structure, Function, and Bioinformatics (311)

Abstract

One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, so far large‐scale benchmarks examining the quality of these alignments are scarce. On the other hand, recently several large‐scale studies of the capacity of different methods to identify related sequences has led to new insights about the performance of fold recognition methods. To increase our understanding about fold recognition methods, we present a large‐scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI‐BLAST, CLUSTALW, and threading methods. For most methods, the alignment quality increases significantly at about 20% sequence identity. The difference in alignment quality between different methods is quite small, and the main difference can be seen at the exact positioning of the sharp rise in alignment quality, that is, around 15–20% sequence identity. The alignments are improved by using structural information. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI‐BLAST. One interesting observation is that for different pairs many different methods create the best alignments. This finding implies that if a method that could select the best alignment method for each pair existed, a significant improvement of the alignment quality could be gained. Proteins 2002;46:330–339. © 2002 Wiley‐Liss, Inc.
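The abstract reports its results against percent sequence identity (the sharp rise around 15–20%). This record does not say how the paper computes that quantity, so the following is only a minimal sketch of one common convention: identical residues divided by the number of gap-free alignment columns. The function name `percent_identity`, the gap character, and the toy sequences are assumptions made for illustration, not taken from the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns. Other
    conventions divide by the shorter sequence length or by the full
    alignment length, and give different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: 4 gap-free columns, 3 of them identical.
print(percent_identity("ACD-E", "ACDWQ"))
```

Because the choice of denominator changes the value, any threshold such as "20% identity" is only comparable between studies that use the same convention.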

Bibliography

Elofsson, A. (2002). A study on protein sequence alignment quality. Proteins: Structure, Function, and Bioinformatics, 46(3), 330–339. Portico.

Authors 1

Arne Elofsson (first)

References 43 Referenced 53

CASP. The casp‐site. http://predictioncenter.llnl.gov/casp3/Casp3.html 1999.
10.1016/0022-2836(81)90087-5
10.1016/0022-2836(70)90057-4
10.1093/nar/22.22.4673
10.1073/pnas.84.13.4355
10.1093/nar/25.17.3389
10.1002/pro.5560050516
10.1006/jmbi.1997.1101
10.1006/jmbi.1997.0924
10.1038/358086a0
10.1073/pnas.95.23.13597
10.1002/pro.5560070204
10.1006/jmbi.1997.1287
10.1073/pnas.95.11.6073
10.1006/jmbi.1997.1288
10.1006/jmbi.1998.2221
10.1002/(SICI)1097-0134(1997)1 <123::AID-PROT16>3.0.CO;2-Q
10.1006/jmbi.1999.3377
10.1002/(SICI)1097-0134(1997)1 <192::AID-PROT25>3.0.CO;2-I
10.1006/jmbi.2000.3615
10.1006/jmbi.2000.3541
10.1110/ps.9.8.1487
10.1002/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7
10.1186/1471-2105-2-5
10.1093/nar/27.13.2682
10.1016/S1359-0278(96)00021-1
10.1002/pro.5560050711
Fischer D, Elofsson A, Rychlewski L, Pazos F, Valencia A, Rost B, Ortiz A, Dunbrack R. Cafasp2: the critical assessment of fully automated structure prediction methods. Submitted for publication.
10.1110/ps.40501
10.1093/bioinformatics/14.10.846
Eddy S. Hmmer‐hidden Markov model software. URL: http://genome.wustl.edu/eddy/hmmer.html 1997.
10.1110/ps.9.2.232
10.1006/jmbi.1999.3233
Lundström J. Pcons: a neural network based consensus predictor that improves fold recognition. Protein Sci.
10.1073/pnas.95.11.5913
10.1002/(SICI)1097-0134(1999)37:3 <22::AID-PROT5>3.0.CO;2-W
10.1002/(SICI)1097-0134(1999)37:3 <15::AID-PROT4>3.0.CO;2-Z
10.1093/bioinformatics/16.9.776
10.1016/S0022-2836(05)80134-2
10.1002/pro.5560010313
10.1093/bioinformatics/14.9.755
10.1093/bioinformatics/15.3.260
10.1006/jmbi.2000.3741

Dates

Type	When
Created	23 years ago (Aug. 25, 2002, 6:16 p.m.)
Deposited	1 year, 10 months ago (Oct. 15, 2023, 2:13 p.m.)
Indexed	2 months, 2 weeks ago (June 5, 2025, 12:24 p.m.)
Issued	23 years, 7 months ago (Jan. 8, 2002)
Published	23 years, 7 months ago (Jan. 8, 2002)
Published Online	23 years, 7 months ago (Jan. 8, 2002)
Published Print	23 years, 6 months ago (Feb. 15, 2002)

Funders 0

None

BibTeX

@article{Elofsson_2002, title={A study on protein sequence alignment quality}, volume={46}, ISSN={1097-0134}, url={http://dx.doi.org/10.1002/prot.10043}, DOI={10.1002/prot.10043}, number={3}, journal={Proteins: Structure, Function, and Bioinformatics}, publisher={Wiley}, author={Elofsson, Arne}, year={2002}, month=jan, pages={330–339} }

JSON

{
  "indexed": {
    "date-parts": [
      [
        2025,
        6,
        5
      ]
    ],
    "date-time": "2025-06-05T16:24:26Z",
    "timestamp": 1749140666954
  },
  "reference-count": 43,
  "publisher": "Wiley",
  "issue": "3",
  "license": [
    {
      "start": {
        "date-parts": [
          [
            2002,
            1,
            8
          ]
        ],
        "date-time": "2002-01-08T00:00:00Z",
        "timestamp": 1010448000000
      },
      "content-version": "vor",
      "delay-in-days": 0,
      "URL": "http://onlinelibrary.wiley.com/termsAndConditions#vor"
    }
  ],
  "content-domain": {
    "domain": [],
    "crossmark-restriction": false
  },
  "published-print": {
    "date-parts": [
      [
        2002,
        2,
        15
      ]
    ]
  },
  "abstract": "<jats:title>Abstract</jats:title><jats:p>One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, so far large\u2010scale benchmarks examining the quality of these alignments are scarce. On the other hand, recently several large\u2010scale studies of the capacity of different methods to identify related sequences has led to new insights about the performance of fold recognition methods. To increase our understanding about fold recognition methods, we present a large\u2010scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI\u2010BLAST, CLUSTALW, and threading methods. For most methods, the alignment quality increases significantly at about 20% sequence identity. The difference in alignment quality between different methods is quite small, and the main difference can be seen at the exact positioning of the sharp rise in alignment quality, that is, around 15\u201320% sequence identity. The alignments are improved by using structural information. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI\u2010BLAST. One interesting observation is that for different pairs many different methods create the best alignments. This finding implies that if a method that could select the best alignment method for each pair existed, a significant improvement of the alignment quality could be gained. Proteins 2002;46:330\u2013339. \u00a9 2002 Wiley\u2010Liss, Inc.</jats:p>",
  "DOI": "10.1002/prot.10043",
  "type": "journal-article",
  "created": {
    "date-parts": [
      [
        2002,
        8,
        25
      ]
    ],
    "date-time": "2002-08-25T22:16:03Z",
    "timestamp": 1030313763000
  },
  "page": "330-339",
  "source": "Crossref",
  "is-referenced-by-count": 53,
  "title": "A study on protein sequence alignment quality",
  "prefix": "10.1002",
  "volume": "46",
  "author": [
    {
      "given": "Arne",
      "family": "Elofsson",
      "sequence": "first",
      "affiliation": []
    }
  ],
  "member": "311",
  "published-online": {
    "date-parts": [
      [
        2002,
        1,
        8
      ]
    ]
  },
  "reference": [
    {
      "key": "e_1_2_8_2_2",
      "unstructured": "CASP. The casp\u2010site.http://predictioncenter.llnl.gov/casp3/Casp3.html 1999."
    },
    {
      "key": "e_1_2_8_3_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0022-2836(81)90087-5"
    },
    {
      "key": "e_1_2_8_4_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0022-2836(70)90057-4"
    },
    {
      "key": "e_1_2_8_5_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/22.22.4673"
    },
    {
      "key": "e_1_2_8_6_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.84.13.4355"
    },
    {
      "key": "e_1_2_8_7_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/25.17.3389"
    },
    {
      "key": "e_1_2_8_8_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.5560050516"
    },
    {
      "key": "e_1_2_8_9_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1997.1101"
    },
    {
      "key": "e_1_2_8_10_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1997.0924"
    },
    {
      "key": "e_1_2_8_11_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1038/358086a0"
    },
    {
      "key": "e_1_2_8_12_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.95.23.13597"
    },
    {
      "key": "e_1_2_8_13_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.5560070204"
    },
    {
      "key": "e_1_2_8_14_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1997.1287"
    },
    {
      "key": "e_1_2_8_15_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.95.11.6073"
    },
    {
      "key": "e_1_2_8_16_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1997.1288"
    },
    {
      "key": "e_1_2_8_17_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1998.2221"
    },
    {
      "key": "e_1_2_8_18_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/(SICI)1097-0134(1997)1 <123::AID-PROT16>3.0.CO;2-Q"
    },
    {
      "key": "e_1_2_8_19_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1999.3377"
    },
    {
      "key": "e_1_2_8_20_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/(SICI)1097-0134(1997)1 <192::AID-PROT25>3.0.CO;2-I"
    },
    {
      "key": "e_1_2_8_21_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.2000.3615"
    },
    {
      "key": "e_1_2_8_22_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.2000.3541"
    },
    {
      "key": "e_1_2_8_23_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.9.8.1487"
    },
    {
      "key": "e_1_2_8_24_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7"
    },
    {
      "key": "e_1_2_8_25_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1186/1471-2105-2-5"
    },
    {
      "key": "e_1_2_8_26_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/27.13.2682"
    },
    {
      "key": "e_1_2_8_27_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/S1359-0278(96)00021-1"
    },
    {
      "key": "e_1_2_8_28_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.5560050711"
    },
    {
      "key": "e_1_2_8_29_2",
      "unstructured": "FischerD ElofssonA RychlewskiL PazosF ValenciaA RostB OrtizA DunbrackR.Cafasp2: the critical assessment of fully automated structure prediction methods. Submitted for publication."
    },
    {
      "key": "e_1_2_8_30_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.40501"
    },
    {
      "key": "e_1_2_8_31_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/14.10.846"
    },
    {
      "key": "e_1_2_8_32_2",
      "unstructured": "EddyS. Hmmer\u2010hidden Markov model software url:http://genome.wustl.edu/eddy/hmmer.html 1997."
    },
    {
      "key": "e_1_2_8_33_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1110/ps.9.2.232"
    },
    {
      "key": "e_1_2_8_34_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.1999.3233"
    },
    {
      "key": "e_1_2_8_35_2",
      "article-title": "Pcons: a neural network based consensus predictor that improves fold recognition",
      "author": "Lundstr\u00f6m J",
      "journal-title": "Protein Sci"
    },
    {
      "key": "e_1_2_8_36_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.95.11.5913"
    },
    {
      "key": "e_1_2_8_37_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/(SICI)1097-0134(1999)37:3 <22::AID-PROT5>3.0.CO;2-W"
    },
    {
      "key": "e_1_2_8_38_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/(SICI)1097-0134(1999)37:3 <15::AID-PROT4>3.0.CO;2-Z"
    },
    {
      "key": "e_1_2_8_39_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/16.9.776"
    },
    {
      "key": "e_1_2_8_40_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/S0022-2836(05)80134-2"
    },
    {
      "key": "e_1_2_8_41_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.5560010313"
    },
    {
      "key": "e_1_2_8_42_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/14.9.755"
    },
    {
      "key": "e_1_2_8_43_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/bioinformatics/15.3.260"
    },
    {
      "key": "e_1_2_8_44_2",
      "doi-asserted-by": "publisher",
      "DOI": "10.1006/jmbi.2000.3741"
    }
  ],
  "container-title": "Proteins: Structure, Function, and Bioinformatics",
  "original-title": [],
  "language": "en",
  "link": [
    {
      "URL": "https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1002%2Fprot.10043",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "text-mining"
    },
    {
      "URL": "https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.10043",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "similarity-checking"
    }
  ],
  "deposited": {
    "date-parts": [
      [
        2023,
        10,
        15
      ]
    ],
    "date-time": "2023-10-15T18:13:12Z",
    "timestamp": 1697393592000
  },
  "score": 1,
  "resource": {
    "primary": {
      "URL": "https://onlinelibrary.wiley.com/doi/10.1002/prot.10043"
    }
  },
  "subtitle": [],
  "short-title": [],
  "issued": {
    "date-parts": [
      [
        2002,
        1,
        8
      ]
    ]
  },
  "references-count": 43,
  "journal-issue": {
    "issue": "3",
    "published-print": {
      "date-parts": [
        [
          2002,
          2,
          15
        ]
      ]
    }
  },
  "alternative-id": [
    "10.1002/prot.10043"
  ],
  "URL": "http://dx.doi.org/10.1002/prot.10043",
  "relation": {},
  "ISSN": [
    "0887-3585",
    "1097-0134"
  ],
  "subject": [],
  "container-title-short": "Proteins",
  "published": {
    "date-parts": [
      [
        2002,
        1,
        8
      ]
    ]
  }
}
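The JSON record above encodes dates in two ways (`date-parts` arrays and millisecond `timestamp` values) and mixes references that carry a `DOI` with ones that are only `unstructured` text or a bare `article-title`. As a rough sketch of how such a record might be read, the snippet below parses a trimmed fragment copied from the record. The helper names (`date_from_parts`, `ms_to_utc`, `cited_dois`) are hypothetical, and padding a missing month or day with 1 is an assumption of this sketch.

```python
import json
from datetime import date, datetime, timezone

# Trimmed fragment of the record above; field names are unchanged.
RECORD_FRAGMENT = """
{
  "DOI": "10.1002/prot.10043",
  "issued": {"date-parts": [[2002, 1, 8]]},
  "license": [{"start": {"timestamp": 1010448000000}}],
  "reference": [
    {"key": "e_1_2_8_3_2", "doi-asserted-by": "publisher",
     "DOI": "10.1016/0022-2836(81)90087-5"},
    {"key": "e_1_2_8_35_2",
     "article-title": "Pcons: a neural network based consensus predictor that improves fold recognition",
     "journal-title": "Protein Sci"}
  ]
}
"""


def date_from_parts(field: dict) -> date:
    """Turn a {"date-parts": [[y, m, d]]} field into a date.

    Assumption: a missing month or day is padded with 1.
    """
    parts = field["date-parts"][0]
    year, month, day = (list(parts) + [1, 1])[:3]
    return date(year, month, day)


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert a millisecond timestamp into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def cited_dois(record: dict) -> list:
    """Return the DOIs of references that have one, skipping the rest."""
    return [ref["DOI"] for ref in record.get("reference", []) if "DOI" in ref]


record = json.loads(RECORD_FRAGMENT)
print(date_from_parts(record["issued"]))
print(ms_to_utc(record["license"][0]["start"]["timestamp"]).isoformat())
print(cited_dois(record))
```

References without a `DOI` key are skipped rather than treated as errors, since several entries in this record are stored only as free text.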