Crossref journal-article
Wiley
Protein Science (311)
Abstract

The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large‐scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS‐PROT database and five entirely sequenced eukaryotes.
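
The abstract names the HSSP distance but does not state its formula. As a rough illustration only, the sketch below assumes the length-dependent identity threshold commonly attributed to the HSSP curve (100% for alignments of at most 11 residues, a power-law decay up to 450 residues, and a 19.5% plateau beyond that) and defines the distance as the percentage identity minus that threshold. The function names `hssp_threshold` and `hssp_distance` are hypothetical, and the constants are an assumption not taken from this record.

```python
import math


def hssp_threshold(length: int) -> float:
    """Assumed HSSP-curve identity threshold (in percent) for an alignment length."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length <= 11:
        # Very short alignments: assumed to require full identity.
        return 100.0
    if length <= 450:
        # Assumed power-law decay with a length-dependent exponent.
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    # Long alignments: assumed constant plateau.
    return 19.5


def hssp_distance(percent_identity: float, length: int) -> float:
    """Distance of an alignment from the assumed threshold curve.

    Positive values lie above the curve (more similar than the threshold),
    negative values lie below it.
    """
    return percent_identity - hssp_threshold(length)


if __name__ == "__main__":
    # An alignment of 500 residues at 30% identity sits 10.5 points above the plateau.
    print(hssp_distance(30.0, 500))
```

Under these assumptions a single number combines identity and alignment length, so one cutoff on the distance can stand in for a separate identity cutoff per length.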

Bibliography

Nair, R., & Rost, B. (2002). Sequence conserved for subcellular localization. Protein Science, 11(12), 2836–2847. http://dx.doi.org/10.1110/ps.0207402

Authors 2
  1. Rajesh Nair (first)
  2. Burkhard Rost (additional)
References 74 Referenced 129
  1. 10.1006/jmbi.1997.1287
  2. HICCS '98: Pacific symposium on biocomputing '98 by Alexandrov N.N. (1998), p. 463
  3. 10.1007/BF00160485
  4. 10.1016/S0076-6879(96)66029-7
  5. 10.1016/S0022-2836(05)80360-2
  6. 10.1093/nar/25.17.3389
  7. 10.1006/jmbi.1997.1498
  8. 10.1093/bioinformatics/15.5.391
  9. 10.1242/dev.120.7.2077 / Development / FlyBase–the Drosophila genetic database by Ashburner M. (1994)
  10. 10.1093/nar/28.1.45
  11. 10.1006/jmbi.2001.4495
  12. 10.1038/ng0498-313
  13. 10.1073/pnas.95.11.6073
  14. 10.1016/S0962-8924(00)01833-X
  15. 10.1038/376647a0
  16. 10.1002/j.1460-2075.1986.tb04288.x
  17. 10.1016/S0959-440X(97)80057-7
  18. 10.1093/embo-reports/kvd092
  19. 10.1002/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S
  20. 10.1016/S0168-9525(01)02348-4
  21. Of URFs and ORFs: A primer on how to analyze derived amino acid sequences by Doolittle R.F. (1986)
  22. 10.1006/jmbi.2000.3968
  23. 10.1016/S0962-8924(98)01226-4
  24. 10.1006/jmbi.2000.3903
  25. 10.1007/s004410000256
  26. 10.1038/sj.onc.1203132
  27. 10.1006/jmbi.1999.2661
  28. 10.1002/pro.5560010313
  29. 10.1093/bioinformatics/17.8.721
  30. 10.1110/ps.9.8.1487
  31. 10.1093/bioinformatics/14.9.753
  32. 10.1016/S0168-9525(99)01927-7
  33. 10.1002/j.1460-2075.1994.tb06287.x
  34. 10.1016/S0959-440X(00)00095-6
  35. 10.1016/S1388-1981(99)00098-0
  36. 10.1110/ps.10101
  37. 10.1146/annurev.biochem.67.1.265
  38. 10.1016/S0959-440X(98)80073-0
  39. 10.1006/jsbi.2001.4378
  40. 10.1093/protein/10.1.1
  41. 10.1093/protein/12.1.3
  42. 10.1016/0167-7799(96)10043-3
  43. 10.1006/jmbi.1998.2221
  44. Pac. Symp. Biocomput. / Sensitive sequence comparison as protein function predictor by Pawlowski K. (2000), vol. 8, p. 42
  45. 10.1002/(SICI)1097-4547(20000101)59:1<19::AID-JNR3>3.0.CO;2-Y
  46. 10.1002/pro.5560040613
  47. 10.1002/prot.10029
  48. Comput. Appl. Biosci. / A simple statistical significance test of window scores in large dot matrices obtained from protein or nucleic acid sequences by Reich J.G. (1987), vol. 3, p. 25
  49. 10.1093/nar/26.9.2230
  50. 10.1016/S1359-0278(97)00059-X
  51. 10.1016/S0969-2126(98)00029-X
  52. 10.1093/protein/12.2.85
  53. 10.1006/jsbi.2001.4336
  54. 10.1016/S0022-2836(02)00016-5
  55. 10.1002/prot.10051
  56. 10.1110/ps.8.3.614
  57. 10.1002/prot.340090107
  58. 10.1126/science.271.5255.1519
  59. Supercomputer 1996: Anwendungen, Architekturen, Trends by Schneider R. (1997), p. 109
  60. 10.1016/0005-2728(94)90125-2
  61. Fifth international conference on intelligent systems for molecular biology by Shah I. (1997), p. 276
  62. 10.1016/S0167-4838(99)00119-3
  63. 10.1093/bioinformatics/14.6.542
  64. Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplication and domain rearrangement by Teichmann S. (1998), p. 14658
  65. 10.1016/S0959-440X(99)80053-0
  66. 10.1006/jmbi.1999.3054
  67. 10.1006/jmbi.2001.4513
  68. 10.1038/88640
  69. 10.1006/jmbi.1995.0340
  70. Enzyme nomenclature 1992. Recommendations of the nomenclature committee of the International Union of Biochemistry and Molecular Biology. by Webb E.C. (1992)
  71. 10.1006/jmbi.2000.3550
  72. 10.1006/jmbi.1999.2972
  73. 10.1006/jmbi.2000.3974
  74. 10.1006/jmbi.2000.3975
Dates
Type When
Created Nov. 19, 2002, 7:14 p.m.
Deposited Oct. 9, 2023, 10:42 a.m.
Indexed April 26, 2025, 4:02 p.m.
Issued Dec. 1, 2002
Published Dec. 1, 2002
Published Online April 13, 2009
Published Print Dec. 1, 2002
Funders 0

None

@article{Nair_2002,
  title={Sequence conserved for subcellular localization},
  volume={11},
  ISSN={1469-896X},
  url={http://dx.doi.org/10.1110/ps.0207402},
  DOI={10.1110/ps.0207402},
  number={12},
  journal={Protein Science},
  publisher={Wiley},
  author={Nair, Rajesh and Rost, Burkhard},
  year={2002},
  month=dec,
  pages={2836--2847}
}