DOI: 10.1002/pro.5560030314. Modular arrangement of proteins as inferred from analysis of homology

Crossref journal-article

Wiley (Crossref member 311)

Protein Science

Abstract

The structure of many proteins consists of a combination of discrete modules that have been shuffled during evolution. Such modules can frequently be recognized from the analysis of homology. Here we present a systematic analysis of the modular organization of all sequenced proteins. To achieve this we have developed an automatic method to identify protein domains from sequence comparisons. Homologous domains can then be clustered into consistent families. The method was applied to all 21,098 nonfragment protein sequences in SWISS‐PROT 21.0, which was automatically reorganized into a comprehensive protein domain database, ProDom. We have constructed multiple sequence alignments for each domain family in ProDom, from which consensus sequences were generated. These nonredundant domain consensuses are useful for fast homology searches. Domain organization in ProDom is exemplified for proteins of the phosphoenolpyruvate: sugar phosphotransferase system (PEP:PTS) and for bacterial 2‐component regulators. We provide 2 examples of previously unrecognized domain arrangements discovered with the help of ProDom.
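The abstract states that a consensus sequence was generated from each family's multiple alignment but does not say how. A minimal sketch of one common approach, a column-wise majority-rule consensus, follows. The function name `majority_consensus`, the 50% thresholds, the gap character `-`, and the use of `X` for ambiguous columns are all assumptions for illustration, not ProDom's documented procedure.

```python
from collections import Counter


def majority_consensus(alignment: list[str], gap: str = "-", min_fraction: float = 0.5) -> str:
    """Column-wise majority-rule consensus of a multiple sequence alignment.

    Hypothetical helper for illustration; the thresholds are assumptions,
    not the procedure used to build ProDom.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*alignment):
        # Columns where gaps are the majority are treated as insertions and dropped.
        if column.count(gap) * 2 > len(column):
            continue
        residues = Counter(c for c in column if c != gap)
        residue, count = residues.most_common(1)[0]
        # Keep the most frequent residue only if it covers enough rows, else emit 'X'.
        consensus.append(residue if count / len(column) >= min_fraction else "X")
    return "".join(consensus)


if __name__ == "__main__":
    toy_alignment = ["MKV-LA", "MKILLA", "MRV-LS"]
    print(majority_consensus(toy_alignment))
```

On the toy alignment this prints `MKVLA`: the fourth column is mostly gaps and is dropped, and every remaining column has a residue shared by at least two of the three rows. Stricter thresholds trade coverage for more `X` positions.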

Bibliography

Sonnhammer, E. L. L., & Kahn, D. (1994). Modular arrangement of proteins as inferred from analysis of homology. Protein Science, 3(3), 482–492. Portico.

Authors 2

Erik L.L. Sonnhammer (first)
Daniel Kahn (additional)

References 43 (referenced by 166)

10.1146/annurev.ge.23.120189.001523
10.1016/0022-2836(91)90193-A
10.1016/S0022-2836(05)80360-2
10.1016/0076-6879(90)83023-3
10.1111/j.1365-2958.1991.tb02093.x
10.1093/nar/20.suppl.2013
10.1016/0968-0004(91)90009-K
Bengio Y (1990). Efficient recognition of immunoglobulin domains from amino acid sequences using a neural network. Comput Appl Biosci 6:319.
10.1016/0014-5793(89)81818-6
10.1016/0014-5793(91)80937-X
10.1002/pro.5560011216
10.1073/pnas.89.20.9415
10.1038/357543a0
10.1093/nar/16.22.10881
10.1002/pro.5560010201
10.1126/science.2255907
10.1007/BF02603120
Ferran EA (1992). Clustering proteins into families using artificial neural networks. Comput Appl Biosci 8:39.
10.1073/pnas.84.13.4355
Harris NL (1992). Proceedings of the Tenth National Conference on Artificial Intelligence, p. 837.
10.1093/nar/19.23.6565
10.1016/0076-6879(90)83009-X
Higgins DG (1992). Sequence ordinates: A multivariate analysis approach to analysing large sequence data sets. Comput Appl Biosci 8:15.
10.1128/jb.174.4.1428-1431.1992
10.1007/BF00044154
10.1111/j.1365-2958.1991.tb00774.x
10.1073/pnas.87.6.2264
10.1146/annurev.bb.20.060191.001135
10.1016/0968-0004(91)90088-D
Kernighan BW (1988). The C programming language.
10.1016/S0959-437X(05)80113-3
10.1146/annurev.ge.26.120192.000443
10.1016/0959-440X(91)90033-P
10.1093/protein/6.4.391
10.1128/jb.174.5.1433-1438.1992
10.1002/prot.340140105
10.1016/S0021-9258(17)39227-X: Simms SA (1985). Multiple forms of the CheB methylesterase in bacterial sensing. J Biol Chem 260:10161.
10.1073/pnas.87.1.118
10.1016/0076-6879(90)83031-4
Utsumi R, Katayama S, Ikeda M, Igaki S, Nakagawa H, Miwa A, Taniguchi M, Noda M (1992). Cloning and sequence analysis of the evgAS genes involved in signal transduction of Escherichia coli K‐12. Nucleic Acids Symp Ser, pp. 149–150.
10.1016/0022-2836(91)90360-I
10.1002/pro.5560010512
10.1016/S0022-2836(05)80256-6

Dates

Type	When
Created	July 12, 2010, 1:35 a.m.
Deposited	Oct. 24, 2023, 8:04 a.m.
Indexed	July 7, 2025, 8:04 a.m.
Issued	March 1, 1994
Published	March 1, 1994
Published Online	Dec. 31, 2008
Published Print	March 1, 1994
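The clock times in the Dates table differ from the corresponding `date-time` values in the JSON record by exactly four hours (Created is shown as 1:35 a.m., while the JSON gives `2010-07-12T05:35:44Z`). This suggests the page renders UTC timestamps in a UTC-4 zone such as US Eastern Daylight Time; that zone is an inference, not something the record states. The sketch below converts the JSON values with a fixed UTC-4 offset; the helper name `to_display_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed display zone: a fixed UTC-4 offset (inferred from the data, not stated in the record).
DISPLAY_TZ = timezone(timedelta(hours=-4))


def to_display_time(iso_utc: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string as UTC and shift it to the assumed display zone."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return parsed.astimezone(DISPLAY_TZ)


if __name__ == "__main__":
    for label, stamp in [
        ("Created", "2010-07-12T05:35:44Z"),
        ("Deposited", "2023-10-24T12:04:35Z"),
        ("Indexed", "2025-07-07T12:04:49Z"),
    ]:
        print(label, to_display_time(stamp).strftime("%Y-%m-%d %H:%M"))
```

All three converted values (01:35, 08:04, 08:04) match the times shown in the table, which is what supports the UTC-4 reading.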

Funders 0

None

BibTeX

@article{Sonnhammer_1994, title={Modular arrangement of proteins as inferred from analysis of homology}, volume={3}, ISSN={1469-896X}, url={http://dx.doi.org/10.1002/pro.5560030314}, DOI={10.1002/pro.5560030314}, number={3}, journal={Protein Science}, publisher={Wiley}, author={Sonnhammer, Erik L.L. and Kahn, Daniel}, year={1994}, month=mar, pages={482–492} }

JSON

{
  "indexed": {
    "date-parts": [
      [
        2025,
        7,
        7
      ]
    ],
    "date-time": "2025-07-07T12:04:49Z",
    "timestamp": 1751889889072
  },
  "reference-count": 43,
  "publisher": "Wiley",
  "issue": "3",
  "license": [
    {
      "start": {
        "date-parts": [
          [
            2008,
            12,
            31
          ]
        ],
        "date-time": "2008-12-31T00:00:00Z",
        "timestamp": 1230681600000
      },
      "content-version": "vor",
      "delay-in-days": 5419,
      "URL": "http://onlinelibrary.wiley.com/termsAndConditions#vor"
    }
  ],
  "content-domain": {
    "domain": [],
    "crossmark-restriction": false
  },
  "published-print": {
    "date-parts": [
      [
        1994,
        3
      ]
    ]
  },
  "abstract": "<jats:title>Abstract</jats:title><jats:p>The structure of many proteins consists of a combination of discrete modules that have been shuffled during evolution. Such modules can frequently be recognized from the analysis of homology. Here we present a systematic analysis of the modular organization of all sequenced proteins. To achieve this we have developed an automatic method to identify protein domains from sequence comparisons. Homologous domains can then be clustered into consistent families. The method was applied to all 21,098 nonfragment protein sequences in SWISS\u2010PROT 21.0, which was automatically reorganized into a comprehensive protein domain database, ProDom. We have constructed multiple sequence alignments for each domain family in ProDom, from which consensus sequences were generated. These nonredundant domain consensuses are useful for fast homology searches. Domain organization in ProDom is exemplified for proteins of the phosphoenolpyruvate: sugar phosphotransferase system (PEP:PTS) and for bacterial 2\u2010component regulators. We provide 2 examples of previously unrecognized domain arrangements discovered with the help of, ProDom.</jats:p>",
  "DOI": "10.1002/pro.5560030314",
  "type": "journal-article",
  "created": {
    "date-parts": [
      [
        2010,
        7,
        12
      ]
    ],
    "date-time": "2010-07-12T05:35:44Z",
    "timestamp": 1278912944000
  },
  "page": "482-492",
  "source": "Crossref",
  "is-referenced-by-count": 166,
  "title": "Modular arrangement of proteins as inferred from analysis of homology",
  "prefix": "10.1002",
  "volume": "3",
  "author": [
    {
      "given": "Erik L.L.",
      "family": "Sonnhammer",
      "sequence": "first",
      "affiliation": []
    },
    {
      "given": "Daniel",
      "family": "Kahn",
      "sequence": "additional",
      "affiliation": []
    }
  ],
  "member": "311",
  "published-online": {
    "date-parts": [
      [
        2008,
        12,
        31
      ]
    ]
  },
  "reference": [
    {
      "key": "e_1_2_1_2_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1146/annurev.ge.23.120189.001523"
    },
    {
      "key": "e_1_2_1_3_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0022-2836(91)90193-A"
    },
    {
      "key": "e_1_2_1_4_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/S0022-2836(05)80360-2"
    },
    {
      "key": "e_1_2_1_5_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0076-6879(90)83023-3"
    },
    {
      "key": "e_1_2_1_6_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1111/j.1365-2958.1991.tb02093.x"
    },
    {
      "key": "e_1_2_1_7_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/20.suppl.2013"
    },
    {
      "key": "e_1_2_1_8_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0968-0004(91)90009-K"
    },
    {
      "key": "e_1_2_1_9_1",
      "first-page": "319",
      "article-title": "Efficient recognition of immunoglobulin domains from amino acid sequences using a neural network",
      "volume": "6",
      "author": "Bengio Y",
      "year": "1990",
      "journal-title": "Comput Appl Bio\u2010sci"
    },
    {
      "key": "e_1_2_1_10_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0014-5793(89)81818-6"
    },
    {
      "key": "e_1_2_1_11_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0014-5793(91)80937-X"
    },
    {
      "key": "e_1_2_1_12_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.5560011216"
    },
    {
      "key": "e_1_2_1_13_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.89.20.9415"
    },
    {
      "key": "e_1_2_1_14_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1038/357543a0"
    },
    {
      "key": "e_1_2_1_15_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/16.22.10881"
    },
    {
      "key": "e_1_2_1_16_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.5560010201"
    },
    {
      "key": "e_1_2_1_17_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1126/science.2255907"
    },
    {
      "key": "e_1_2_1_18_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1007/BF02603120"
    },
    {
      "key": "e_1_2_1_19_1",
      "first-page": "39",
      "article-title": "Clustering proteins into families using artificial neural networks",
      "volume": "8",
      "author": "Ferran EA",
      "year": "1992",
      "journal-title": "Comput Appl Biosci"
    },
    {
      "key": "e_1_2_1_20_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.84.13.4355"
    },
    {
      "key": "e_1_2_1_21_1",
      "first-page": "837",
      "volume-title": "Proceedings of the Tenth National Conference on Artificial Intelligence.",
      "author": "Harris NL",
      "year": "1992"
    },
    {
      "key": "e_1_2_1_22_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/nar/19.23.6565"
    },
    {
      "key": "e_1_2_1_23_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0076-6879(90)83009-X"
    },
    {
      "key": "e_1_2_1_24_1",
      "first-page": "15",
      "article-title": "Sequence ordinates: A multivariate analysis approach to analysing large sequence data sets",
      "volume": "8",
      "author": "Higgins DG",
      "year": "1992",
      "journal-title": "Comput Appl Biosci"
    },
    {
      "key": "e_1_2_1_25_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1128/jb.174.4.1428-1431.1992"
    },
    {
      "key": "e_1_2_1_26_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1007/BF00044154"
    },
    {
      "key": "e_1_2_1_27_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1111/j.1365-2958.1991.tb00774.x"
    },
    {
      "key": "e_1_2_1_28_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.87.6.2264"
    },
    {
      "key": "e_1_2_1_29_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1146/annurev.bb.20.060191.001135"
    },
    {
      "key": "e_1_2_1_30_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0968-0004(91)90088-D"
    },
    {
      "key": "e_1_2_1_31_1",
      "volume-title": "The C programming language.",
      "author": "Kernighan BW",
      "year": "1988"
    },
    {
      "key": "e_1_2_1_32_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/S0959-437X(05)80113-3"
    },
    {
      "key": "e_1_2_1_33_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1146/annurev.ge.26.120192.000443"
    },
    {
      "key": "e_1_2_1_34_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0959-440X(91)90033-P"
    },
    {
      "key": "e_1_2_1_35_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1093/protein/6.4.391"
    },
    {
      "key": "e_1_2_1_36_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1128/jb.174.5.1433-1438.1992"
    },
    {
      "key": "e_1_2_1_37_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/prot.340140105"
    },
    {
      "key": "e_1_2_1_38_1",
      "doi-asserted-by": "crossref",
      "first-page": "10161",
      "DOI": "10.1016/S0021-9258(17)39227-X",
      "article-title": "Multiple forms of the CheB methylesterase in bacterial sensing",
      "volume": "260",
      "author": "Simms SA",
      "year": "1985",
      "journal-title": "J Biol Chem"
    },
    {
      "key": "e_1_2_1_39_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1073/pnas.87.1.118"
    },
    {
      "key": "e_1_2_1_40_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0076-6879(90)83031-4"
    },
    {
      "key": "e_1_2_1_41_1",
      "unstructured": "UtsumiR KatayamaS IkedaM IgakiS NakagawaH MiwaA TaniguchiM NodaM.1992. Cloning and sequence analysis of the evgAS genes involved in signal transduction of Escherichia coli K\u201012.Nucleic Acids Symp Ser.pp149\u2013150."
    },
    {
      "key": "e_1_2_1_42_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/0022-2836(91)90360-I"
    },
    {
      "key": "e_1_2_1_43_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1002/pro.5560010512"
    },
    {
      "key": "e_1_2_1_44_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1016/S0022-2836(05)80256-6"
    }
  ],
  "container-title": "Protein Science",
  "original-title": [],
  "language": "en",
  "link": [
    {
      "URL": "https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1002%2Fpro.5560030314",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "text-mining"
    },
    {
      "URL": "https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1002%2Fpro.5560030314",
      "content-type": "application/pdf",
      "content-version": "vor",
      "intended-application": "text-mining"
    },
    {
      "URL": "https://onlinelibrary.wiley.com/doi/pdf/10.1002/pro.5560030314",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "similarity-checking"
    }
  ],
  "deposited": {
    "date-parts": [
      [
        2023,
        10,
        24
      ]
    ],
    "date-time": "2023-10-24T12:04:35Z",
    "timestamp": 1698149075000
  },
  "score": 1,
  "resource": {
    "primary": {
      "URL": "https://onlinelibrary.wiley.com/doi/10.1002/pro.5560030314"
    }
  },
  "subtitle": [],
  "short-title": [],
  "issued": {
    "date-parts": [
      [
        1994,
        3
      ]
    ]
  },
  "references-count": 43,
  "journal-issue": {
    "issue": "3",
    "published-print": {
      "date-parts": [
        [
          1994,
          3
        ]
      ]
    }
  },
  "alternative-id": [
    "10.1002/pro.5560030314"
  ],
  "URL": "http://dx.doi.org/10.1002/pro.5560030314",
  "relation": {},
  "ISSN": [
    "0961-8368",
    "1469-896X"
  ],
  "subject": [],
  "container-title-short": "Protein Science",
  "published": {
    "date-parts": [
      [
        1994,
        3
      ]
    ]
  }
}
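Two numeric fields in the record can be cross-checked by hand. The `timestamp` values are milliseconds since the Unix epoch, and the license `delay-in-days` of 5419 equals the number of days from 1 March 1994 (the print issue month, with the missing day assumed to be the 1st) to the license start date of 31 December 2008. The sketch below reproduces both. That Crossref computes the delay exactly this way is an inference from this record, and the helper names are hypothetical.

```python
from datetime import date, datetime, timezone


def parts_to_date(date_parts: list[int]) -> date:
    """Convert Crossref 'date-parts' such as [1994, 3] to a date; missing month/day default to 1."""
    year, month, day = (list(date_parts) + [1, 1])[:3]
    return date(year, month, day)


def delay_in_days(published_print: list[int], license_start: list[int]) -> int:
    """Days from the print publication date to the license start (assumed definition)."""
    return (parts_to_date(license_start) - parts_to_date(published_print)).days


def ms_timestamp_to_utc(ms: int) -> datetime:
    """Interpret a Crossref 'timestamp' as milliseconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    print(delay_in_days([1994, 3], [2008, 12, 31]))
    print(ms_timestamp_to_utc(1230681600000).isoformat())
```

Run as a script, this prints `5419` and `2008-12-31T00:00:00+00:00`, matching the license `delay-in-days` and its `date-time`. The same conversion maps the `created` timestamp 1278912944000 to `2010-07-12T05:35:44+00:00`.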