Abstract
The more proteins have diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large‐scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than might have been expected, given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine-tuning the thresholds. Finally, we applied our results to the entire SWISS‐PROT database and five entirely sequenced eukaryotes.
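The abstract describes the HSSP distance only as a measure combining pairwise sequence identity and alignment length. As a minimal sketch, the function below assumes the length-dependent threshold curve commonly attributed to Rost (1999); the constants (480, 0.32, 19.5, and the length limits 11 and 450) are not given in this record, and the function names are illustrative only. The distance is the percentage identity minus the curve value at that alignment length, so positive values lie above the curve.

```python
import math


def hssp_threshold(alignment_length: int) -> float:
    """Percentage identity on the HSSP curve for a given alignment length.

    Assumption: piecewise curve as commonly cited from Rost (1999);
    the constants are not stated in the abstract above.
    """
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    if alignment_length <= 11:
        # Very short alignments: only full identity reaches the curve.
        return 100.0
    if alignment_length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-alignment_length / 1000.0))
        return 480.0 * alignment_length ** exponent
    # Long alignments: the curve is treated as flat.
    return 19.5


def hssp_distance(percent_identity: float, alignment_length: int) -> float:
    """Distance of an alignment from the HSSP curve, in percentage points."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be within [0, 100]")
    return percent_identity - hssp_threshold(alignment_length)


if __name__ == "__main__":
    # Hypothetical alignments: (percent identity, aligned residues).
    for pide, length in [(40.0, 100), (25.0, 100), (30.0, 500)]:
        print(pide, length, round(hssp_distance(pide, length), 1))
```

Any cutoff applied to this distance when transferring localization by homology would have to come from the paper itself; none is assumed here.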
References
74
Referenced
129
Dates
Type | When |
---|---|
Created | Nov. 19, 2002, 7:14 p.m. |
Deposited | Oct. 9, 2023, 10:42 a.m. |
Indexed | April 26, 2025, 4:02 p.m. |
Issued | Dec. 1, 2002 |
Published | Dec. 1, 2002 |
Published Online | April 13, 2009 |
Published Print | Dec. 1, 2002 |
@article{Nair_2002, title={Sequence conserved for subcellular localization}, volume={11}, ISSN={1469-896X}, url={http://dx.doi.org/10.1110/ps.0207402}, DOI={10.1110/ps.0207402}, number={12}, journal={Protein Science}, publisher={Wiley}, author={Nair, Rajesh and Rost, Burkhard}, year={2002}, month=dec, pages={2836–2847} }