Abstract
More than 30 organisms have been sequenced entirely. Here, we applied a variety of simple bioinformatics tools to analyze 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes, and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar among the three. We predicted that ∼15%–30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with seven transmembrane helices in eukaryotes and more with six and 12 transmembrane helices in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4%–5%), and we predicted ∼15%–25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homolog in current databases, and 30%–40% of the proteins fell into structural families with >100 members. A classification by cellular function verified that eukaryotes have a higher proportion of proteins for communication with the environment. Finally, we found at least one homolog of experimentally known structure for ∼20%–45% of all proteins; the regions with structural homology covered 20%–30% of all residues. These numbers may or may not suggest that there are 1200–2600 folds in the universe of protein structures. All predictions are available at http://cubic.bioc.columbia.edu/genomes.
References
76
Referenced
217
Dates
Type | When |
---|---|
Created | July 26, 2002, 7:59 p.m. |
Deposited | Oct. 9, 2023, 4:58 p.m. |
Indexed | July 7, 2025, 8:04 a.m. |
Issued | Oct. 1, 2001 |
Published | Oct. 1, 2001 |
Published Online | Dec. 31, 2008 |
Published Print | Oct. 1, 2001 |
@article{Liu_2001, title={Comparing function and structure between entire proteomes}, volume={10}, ISSN={1469-896X}, url={http://dx.doi.org/10.1110/ps.10101}, DOI={10.1110/ps.10101}, number={10}, journal={Protein Science}, publisher={Wiley}, author={Liu, Jinfeng and Rost, Burkhard}, year={2001}, month=oct, pages={1970–1979} }