Abstract
A new method is presented for identifying distantly related homologous proteins that are unrecognizable by conventional sequence comparison methods. The method combines information about functionally conserved sequence patterns with information about structure context. This information is encoded in stochastic discrete state‐space models (DSMs) that comprise a new family of hidden Markov models. The new models are called sequence‐pattern‐embedded DSMs (pDSMs). This method can identify distantly related protein family members with a high sensitivity and specificity. The method is illustrated with trypsin‐like serine proteases and globins. The strategy for building pDSMs is presented. The method has been validated using carefully constructed positive and negative control sets. In addition to the ability to recognize remote homologs, pDSM sequence analysis predicts secondary structures with higher sensitivity, specificity, and Q3 accuracy than DSM analysis, which omits information about conserved sequence patterns. The identification of trypsin‐like serine proteases in new genomes is discussed.
References: 53
Referenced: 19
Dates
Type | When |
---|---|
Created | Feb. 10, 2009, 1:16 a.m. |
Deposited | Oct. 28, 2023, 3:47 p.m. |
Indexed | Feb. 1, 2024, 1:20 p.m. |
Issued | Dec. 1, 1998 |
Published | Dec. 1, 1998 |
Published Online | Dec. 31, 2008 |
Published Print | Dec. 1, 1998 |
@article{Yu_1998, title={A homology identification method that combines protein sequence and structure information}, volume={7}, ISSN={1469-896X}, url={http://dx.doi.org/10.1002/pro.5560071203}, DOI={10.1002/pro.5560071203}, number={12}, journal={Protein Science}, publisher={Wiley}, author={Yu, Lihua and White, James V. and Smith, Temple F.}, year={1998}, month=dec, pages={2499–2510} }