Crossref journal-article
Wiley
Protein Science (311)
Abstract

As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N‐terminal, middle, and C‐terminal. The N‐terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS‐PROT database. Through fivefold cross‐validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N‐terminal, middle, and C‐terminal parts is helpful to predict the subcellular location.
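
The splitting and local-composition steps described in the abstract might be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not give the part lengths, so `n_len` and `c_len` are hypothetical values, and "twin amino acids" is interpreted here as two identical adjacent residues, which is an assumption. The distance-frequency features are omitted for brevity.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def split_sequence(seq, n_len=40, c_len=20, n_regions=4):
    """Split a sequence into N-terminal regions, middle, and C-terminal.

    n_len and c_len are hypothetical lengths; the abstract does not state them.
    Returns n_regions N-terminal pieces followed by the middle and C-terminal parts.
    """
    n_len = min(n_len, len(seq))
    c_len = min(c_len, len(seq) - n_len)
    n_part = seq[:n_len]
    middle = seq[n_len:len(seq) - c_len]
    c_part = seq[len(seq) - c_len:]
    step = max(1, -(-len(n_part) // n_regions))  # ceiling division
    n_sub = [n_part[i * step:(i + 1) * step] for i in range(n_regions)]
    return n_sub + [middle, c_part]


def composition(region):
    """Fraction of each amino acid within one region."""
    counts = Counter(region)
    total = len(region) or 1  # avoid division by zero for empty regions
    return [counts[a] / total for a in AMINO_ACIDS]


def twin_composition(region):
    """Fraction of adjacent positions holding two identical residues (assumed meaning of 'twin')."""
    pairs = Counter(a for a, b in zip(region, region[1:]) if a == b)
    total = max(len(region) - 1, 1)
    return [pairs[a] / total for a in AMINO_ACIDS]


def featurize(seq):
    """Concatenate the local features of every region into one vector."""
    features = []
    for region in split_sequence(seq):
        features += composition(region) + twin_composition(region)
    return features
```

With these assumed settings, each sequence yields 6 regions × 40 values = 240 features, which could then be passed to any support vector machine implementation.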

Bibliography

Matsuda, S., Vert, J., Saigo, H., Ueda, N., Toh, H., & Akutsu, T. (2005). A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Science, 14(11), 2804–2813. https://doi.org/10.1110/ps.051597405

Authors 6
  1. Setsuro Matsuda (first)
  2. Jean‐Philippe Vert (additional)
  3. Hiroto Saigo (additional)
  4. Nobuhisa Ueda (additional)
  5. Hiroyuki Toh (additional)
  6. Tatsuya Akutsu (additional)
References 43 Referenced 123
  1. 10.1093/nar/29.1.37
  2. 10.1016/j.jmb.2004.05.028
  3. 10.1093/nar/gkh350
  4. 10.1093/bioinformatics/bti309
  5. Bruce B.D. (2000). Chloroplast transit peptides: Structure, function and evolution. Trends Biochem. Sci., 10, 440.
  6. 10.1093/bioinformatics/bth054
  7. 10.1006/jmbi.1996.0804
  8. 10.1002/prot.1035
  9. 10.1074/jbc.M204161200
  10. 10.1016/j.bbrc.2003.10.062
  11. 10.1002/jcb.10790
  12. 10.1006/bbrc.1998.9498
  13. 10.1093/protein/12.2.107
  14. 10.3109/10409239509083488
  15. 10.1016/S0955-0674(97)80023-3
  16. 10.1110/ps.8.5.978
  17. 10.1006/jmbi.2000.3903
  18. 10.1142/9781860947322_0012
  19. Horton P. (1997). Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, p. 147.
  20. 10.1093/bioinformatics/17.8.721
  21. Joachims T. (1999). Advances in kernel methods: Support vector learning, p. 41.
  22. 10.1105/tpc.11.4.557
  23. Kim J.K. (2004). Proceedings of the 3rd Annual Conference of the Korean Society for Bioinformatics, p. 158.
  24. 10.1093/nar/gkg101
  25. 10.1016/0005-2795(75)90109-9
  26. 10.1074/jbc.M300690200
  27. Myers E.W. (1988). Optimal alignments in linear space. Comput. Appl. Biosci., 4, 11.
  28. 10.1016/S0888-7543(05)80111-9
  29. 10.1016/0022-2836(70)90057-4
  30. 10.1146/annurev.biochem.66.1.863
  31. Nguyen M.N. (2003). Multi‐class support vector machines for protein secondary structure prediction. Genome Inform. Ser. Workshop Genome Inform., 14, 218.
  32. 10.1093/protein/10.1.1
  33. 10.1093/oxfordjournals.jbchem.a022036
  34. 10.1093/bioinformatics/btg222
  35. 10.1016/0076-6879(90)83007-V
  36. 10.1073/pnas.85.8.2444
  37. 10.1002/pmic.200300769
  38. 10.1093/nar/26.9.2230
  39. 10.1034/j.1600-0854.2001.1r010.x
  40. Schölkopf B. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond.
  41. 10.1016/0022-2836(81)90087-5
  42. 10.1007/BF01868635
  43. 10.1016/S0014-5793(99)00506-2
Dates
Type When
Created Oct. 26, 2005, 2:02 p.m.
Deposited Oct. 9, 2023, 4:16 p.m.
Indexed Aug. 27, 2024, 8:33 p.m.
Issued Nov. 1, 2005
Published Nov. 1, 2005
Published Online Jan. 1, 2009
Published Print Nov. 1, 2005
Funders 0

None

@article{Matsuda_2005,
  title={A novel representation of protein sequences for prediction of subcellular location using support vector machines},
  volume={14},
  ISSN={1469-896X},
  url={http://dx.doi.org/10.1110/ps.051597405},
  DOI={10.1110/ps.051597405},
  number={11},
  journal={Protein Science},
  publisher={Wiley},
  author={Matsuda, Setsuro and Vert, Jean‐Philippe and Saigo, Hiroto and Ueda, Nobuhisa and Toh, Hiroyuki and Akutsu, Tatsuya},
  year={2005},
  month=nov,
  pages={2804--2813}
}