Abstract
Recent experimental studies indicate that recurrent neural networks initialized with “small” weights are inherently biased toward definite memory machines (Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). This article establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a squashing activation function is a contraction. We prove that recurrent networks with a contractive transition function can be approximated arbitrarily well, on input sequences of unbounded length, by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with a contractive transition function. Hence, initialization with small weights induces an architectural bias into learning with recurrent neural networks. This bias might have benefits from the point of view of statistical learning theory: it emphasizes one possible region of the weight space where generalization ability can be formally proved. It is well known that standard recurrent neural networks are not distribution-independent learnable in the probably approximately correct (PAC) sense if arbitrary precision and inputs are considered. We prove that recurrent networks whose transition function is contractive with a fixed contraction parameter fulfill the so-called distribution-independent uniform convergence of empirical distances property and hence, unlike general recurrent networks, are distribution-independent PAC learnable.
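To make the contraction argument concrete, here is a minimal numerical sketch (not the paper's construction; the network size, the tanh nonlinearity, and the spectral-norm test are illustrative assumptions). Because tanh is 1-Lipschitz, the transition x ↦ tanh(Wx + Vu + b) contracts in the state x whenever ‖W‖₂ < 1, so two trajectories whose inputs agree on the last d steps end up within ρ^d times the state-space diameter of each other, where ρ = ‖W‖₂. That is the definite-memory behavior described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_in = 8, 2
scale = 0.1  # "small" weight initialization

# Transition x -> tanh(W x + V u + b); tanh is 1-Lipschitz, so the map
# contracts in x with parameter rho = ||W||_2 whenever that norm is < 1.
W = scale * rng.standard_normal((n_state, n_state))
V = rng.standard_normal((n_state, n_in))
b = rng.standard_normal(n_state)
rho = np.linalg.norm(W, 2)  # spectral norm = contraction parameter

def run(x, inputs):
    for u in inputs:
        x = np.tanh(W @ x + V @ u + b)
    return x

# Two input sequences that agree only on the last `depth` steps; a definite
# memory machine of depth `depth` would treat them as identical.
T, depth = 50, 20
inputs_a = rng.standard_normal((T, n_in))
inputs_b = inputs_a.copy()
inputs_b[:-depth] = rng.standard_normal((T - depth, n_in))  # change distant past

gap = np.linalg.norm(run(np.zeros(n_state), inputs_a)
                     - run(np.zeros(n_state), inputs_b))
bound = rho**depth * 2 * np.sqrt(n_state)  # diameter of [-1,1]^n times rho^depth
print(f"rho = {rho:.3f}, state gap = {gap:.2e}, contraction bound = {bound:.2e}")
```

The gap shrinks geometrically in `depth`, so a finite input window of length roughly log ε / log ρ suffices for accuracy ε; this is the quantitative intuition behind the approximation claim in the abstract.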
References (32 cited; referenced by 42 works)
10.1162/neco.1989.1.1.151
10.1109/72.536317
10.1109/72.279181
10.1214/aos/1018031204
10.1109/69.917555
10.1207/s15516709cog2302_2
10.1109/72.623208
Frasconi P. (1995). IEEE Transactions on Knowledge and Data Engineering, 8(6), 313.
Funahashi K. (1993). Neural Networks, 12, 831.
10.1016/0893-6080(95)00041-0
10.1007/PL00009845
10.1109/69.917560
10.1016/0890-5401(92)90010-D
10.1162/neco.1997.9.8.1735
10.1016/S0893-6080(09)80018-X
10.1016/0893-6080(89)90020-8
10.1007/BF01000408
10.1162/089976698300017359
10.1162/089976699300016656
10.1126/science.267326
10.1109/TASSP.1984.1164378
10.1145/235809.235811
10.1162/neco.1996.8.4.675
10.1007/BF00114008
Sejnowski T. (1987). Complex Systems, 1, 145.
10.1109/18.705570
10.1016/0304-3975(94)90178-3
10.1006/jcss.1995.1013
10.1016/0022-0000(92)90039-L
10.1023/A:1010972803901
Tiňo P. (1995). Neural Computation, 4, 822.
10.1109/29.21701
Dates
Type | When |
---|---|
Created | July 16, 2003, 5:24 p.m. |
Deposited | March 12, 2021, 4:50 p.m. |
Indexed | April 15, 2025, 2:11 a.m. |
Issued | Aug. 1, 2003 |
Published | Aug. 1, 2003 |
Published Print | Aug. 1, 2003 |
@article{Hammer_2003, title={Recurrent Neural Networks with Small Weights Implement Definite Memory Machines}, volume={15}, ISSN={1530-888X}, url={http://dx.doi.org/10.1162/08997660360675080}, DOI={10.1162/08997660360675080}, number={8}, journal={Neural Computation}, publisher={MIT Press - Journals}, author={Hammer, Barbara and Tiňo, Peter}, year={2003}, month=aug, pages={1897–1929} }