DOI: 10.1145/2623330.2623612. Efficient mini-batch training for stochastic optimization

Crossref proceedings-article

ACM

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (320)

Bibliography

Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 661–670.

Authors 4

Mu Li (first)
Tong Zhang (additional)
Yuqiang Chen (additional)
Alexander J. Smola (additional)

References 27; referenced by 448

10.1111/j.2517-6161.1974.tb00999.x
10.1561/2200000016
R. Byrd, S. Hansen, J. Nocedal, and Y. Singer. A stochastic quasi-Newton method for large-scale optimization. arXiv preprint arXiv:1401.7020, 2014.
10.1007/s10107-012-0572-5
10.1145/2020408.2020517
A. Cotter, O. Shamir, N. Srebro, and K. Sridharan. Better mini-batch algorithms via accelerated gradient methods. In NIPS, volume 24, pages 1647–1655, 2011.
J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In Neural Information Processing Systems, 2012.
O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Technical report, http://arxiv.org/abs/1012.1367, 2010.
10.5555/1390681.1442794
10.1145/2020408.2020426
10.5555/1870568.1870593
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2nd edition, 2009. doi:10.1007/978-0-387-84858-7
R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, pages 315–323, 2013.
M. I. Jordan. An Introduction to Probabilistic Graphical Models. MIT Press, 2008. To appear.
10.1109/18.910572
B. Kulis and P. L. Bartlett. Implicit online learning. In Proc. Intl. Conf. Machine Learning, 2010.
10.1109/CVPR.2011.5995477
10.1007/BF01589116
D. Mahajan, S. S. Keerthi, S. Sundararajan, and L. Bottou. A parallel SGD method with strong convergence. arXiv preprint arXiv:1311.0636, 2013.
G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1231–1239, 2009.
10.1145/2339530.2339559
D. Mimno, M. Hoffman, and D. Blei. Sparse stochastic inference for latent Dirichlet allocation. In International Conference on Machine Learning, 2012.
S. Shalev-Shwartz and T. Zhang. Accelerated mini-batch stochastic dual coordinate ascent. In Advances in Neural Information Processing Systems, pages 378–385, 2013.
M. Takáč, A. Bijral, P. Richtárik, and N. Srebro. Mini-batch primal and dual methods for SVMs. arXiv preprint arXiv:1303.2314, 2013.
10.1145/1835804.1835910
M. Zinkevich. Online convex programming and generalised infinitesimal gradient ascent. In Proceedings of the International Conference on Machine Learning, pages 928–936, 2003.
M. Zinkevich, A. J. Smola, M. Weimer, and L. Li. Parallelized stochastic gradient descent. In Advances in Neural Information Processing Systems 23, pages 2595–2603, 2010.

Dates

Type	When
Created	Aug. 22, 2014, 3:38 p.m.
Deposited	June 18, 2025, 3:19 a.m.
Indexed	Sept. 3, 2025, 6:16 a.m.
Issued	Aug. 24, 2014
Published	Aug. 24, 2014
Published Online	Aug. 24, 2014
Published Print	Aug. 24, 2014

Funders 0

None

BibTeX

@inproceedings{Li_2014, series={KDD '14}, title={Efficient mini-batch training for stochastic optimization}, url={http://dx.doi.org/10.1145/2623330.2623612}, DOI={10.1145/2623330.2623612}, booktitle={Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining}, publisher={ACM}, author={Li, Mu and Zhang, Tong and Chen, Yuqiang and Smola, Alexander J.}, year={2014}, month=aug, pages={661--670}, collection={KDD '14} }

JSON

{
  "indexed": {
    "date-parts": [
      [
        2025,
        9,
        3
      ]
    ],
    "date-time": "2025-09-03T10:16:43Z",
    "timestamp": 1756894603844,
    "version": "3.41.0"
  },
  "publisher-location": "New York, NY, USA",
  "reference-count": 27,
  "publisher": "ACM",
  "license": [
    {
      "start": {
        "date-parts": [
          [
            2014,
            8,
            24
          ]
        ],
        "date-time": "2014-08-24T00:00:00Z",
        "timestamp": 1408838400000
      },
      "content-version": "vor",
      "delay-in-days": 0,
      "URL": "https://www.acm.org/publications/policies/copyright_policy#Background"
    }
  ],
  "content-domain": {
    "domain": [
      "dl.acm.org"
    ],
    "crossmark-restriction": true
  },
  "published-print": {
    "date-parts": [
      [
        2014,
        8,
        24
      ]
    ]
  },
  "DOI": "10.1145/2623330.2623612",
  "type": "proceedings-article",
  "created": {
    "date-parts": [
      [
        2014,
        8,
        22
      ]
    ],
    "date-time": "2014-08-22T19:38:46Z",
    "timestamp": 1408736326000
  },
  "page": "661-670",
  "update-policy": "https://doi.org/10.1145/crossmark-policy",
  "source": "Crossref",
  "is-referenced-by-count": 448,
  "title": "Efficient mini-batch training for stochastic optimization",
  "prefix": "10.1145",
  "author": [
    {
      "given": "Mu",
      "family": "Li",
      "sequence": "first",
      "affiliation": [
        {
          "name": "Carnegie Mellon University, Pittsburgh, PA, USA"
        }
      ]
    },
    {
      "given": "Tong",
      "family": "Zhang",
      "sequence": "additional",
      "affiliation": [
        {
          "name": "Baidu, Beijing, China"
        }
      ]
    },
    {
      "given": "Yuqiang",
      "family": "Chen",
      "sequence": "additional",
      "affiliation": [
        {
          "name": "Baidu, Beijing, China"
        }
      ]
    },
    {
      "given": "Alexander J.",
      "family": "Smola",
      "sequence": "additional",
      "affiliation": [
        {
          "name": "Carnegie Mellon University, Pittsburgh, PA, USA"
        }
      ]
    }
  ],
  "member": "320",
  "published-online": {
    "date-parts": [
      [
        2014,
        8,
        24
      ]
    ]
  },
  "reference": [
    {
      "key": "e_1_3_2_2_1_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1111/j.2517-6161.1974.tb00999.x"
    },
    {
      "key": "e_1_3_2_2_2_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1561/2200000016"
    },
    {
      "key": "e_1_3_2_2_3_1",
      "volume-title": "A stochastic quasi-newton method for large-scale optimization. arXiv preprint arXiv:1401.7020",
      "author": "Byrd R.",
      "year": "2014",
      "unstructured": "R. Byrd , S. Hansen , J. Nocedal , and Y. Singer . A stochastic quasi-newton method for large-scale optimization. arXiv preprint arXiv:1401.7020 , 2014 . R. Byrd, S. Hansen, J. Nocedal, and Y. Singer. A stochastic quasi-newton method for large-scale optimization. arXiv preprint arXiv:1401.7020, 2014."
    },
    {
      "key": "e_1_3_2_2_4_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1007/s10107-012-0572-5"
    },
    {
      "key": "e_1_3_2_2_5_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1145/2020408.2020517"
    },
    {
      "key": "e_1_3_2_2_6_1",
      "first-page": "1647",
      "volume-title": "NIPS",
      "volume": "24",
      "author": "Cotter A.",
      "year": "2011",
      "unstructured": "A. Cotter , O. Shamir , N. Srebro , and K. Sridharan . Better mini-batch algorithms via accelerated gradient methods . In NIPS , volume 24 , pages 1647 -- 1655 , 2011 . A. Cotter, O. Shamir, N. Srebro, and K. Sridharan. Better mini-batch algorithms via accelerated gradient methods. In NIPS, volume 24, pages 1647--1655, 2011."
    },
    {
      "key": "e_1_3_2_2_7_1",
      "volume-title": "Neural Information Processing Systems",
      "author": "Dean J.",
      "year": "2012",
      "unstructured": "J. Dean , G. Corrado , R. Monga , K. Chen , M. Devin , Q. Le , M. Mao , M. Ranzato , A. Senior , P. Tucker , K. Yang , and A. Ng . Large scale distributed deep networks . In Neural Information Processing Systems , 2012 . J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In Neural Information Processing Systems, 2012."
    },
    {
      "key": "e_1_3_2_2_8_1",
      "unstructured": "O. Dekel R. Gilad-Bachrach O. Shamir and L. Xiao. Optimal distributed online prediction using mini-batches. Technical report http://arxiv.org/abs/1012.1367 2010.  O. Dekel R. Gilad-Bachrach O. Shamir and L. Xiao. Optimal distributed online prediction using mini-batches. Technical report http://arxiv.org/abs/1012.1367 2010."
    },
    {
      "key": "e_1_3_2_2_9_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.5555/1390681.1442794"
    },
    {
      "key": "e_1_3_2_2_10_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1145/2020408.2020426"
    },
    {
      "key": "e_1_3_2_2_11_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.5555/1870568.1870593"
    },
    {
      "key": "e_1_3_2_2_12_1",
      "doi-asserted-by": "crossref",
      "DOI": "10.1007/978-0-387-84858-7",
      "volume-title": "The Elements of Statistical Learning",
      "author": "Hastie T.",
      "year": "2009",
      "unstructured": "T. Hastie , R. Tibshirani , and J. Friedman . The Elements of Statistical Learning . Springer , New York , 2 edition, 2009 . T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2 edition, 2009."
    },
    {
      "key": "e_1_3_2_2_13_1",
      "first-page": "315",
      "volume-title": "Advances in Neural Information Processing Systems",
      "author": "Johnson R.",
      "year": "2013",
      "unstructured": "R. Johnson and T. Zhang . Accelerating stochastic gradient descent using predictive variance reduction . In Advances in Neural Information Processing Systems , pages 315 -- 323 , 2013 . R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, pages 315--323, 2013."
    },
    {
      "key": "e_1_3_2_2_14_1",
      "volume-title": "An Introduction to Probabilistic Graphical Models",
      "author": "Jordan M. I.",
      "year": "2008",
      "unstructured": "M. I. Jordan . An Introduction to Probabilistic Graphical Models . MIT Press , 2008 . To Appear. M. I. Jordan. An Introduction to Probabilistic Graphical Models. MIT Press, 2008. To Appear."
    },
    {
      "key": "e_1_3_2_2_15_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1109/18.910572"
    },
    {
      "key": "e_1_3_2_2_16_1",
      "volume-title": "Proc.\\ Intl.\\ Conf.\\ Machine Learning",
      "author": "Kulis B.",
      "year": "2010",
      "unstructured": "B. Kulis and P. L. Bartlett . Implicit online learning . In Proc.\\ Intl.\\ Conf.\\ Machine Learning , 2010 . B. Kulis and P. L. Bartlett. Implicit online learning. In Proc.\\ Intl.\\ Conf.\\ Machine Learning, 2010."
    },
    {
      "key": "e_1_3_2_2_17_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1109/CVPR.2011.5995477"
    },
    {
      "key": "e_1_3_2_2_18_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1007/BF01589116"
    },
    {
      "key": "e_1_3_2_2_19_1",
      "volume-title": "A parallel sgd method with strong convergence. arXiv preprint arXiv:1311.0636",
      "author": "Mahajan D.",
      "year": "2013",
      "unstructured": "D. Mahajan , S. S. Keerthi , S. Sundararajan , and L. Bottou . A parallel sgd method with strong convergence. arXiv preprint arXiv:1311.0636 , 2013 . D. Mahajan, S. S. Keerthi, S. Sundararajan, and L. Bottou. A parallel sgd method with strong convergence. arXiv preprint arXiv:1311.0636, 2013."
    },
    {
      "key": "e_1_3_2_2_20_1",
      "first-page": "1231",
      "volume-title": "Advances in Neural Information Processing Systems 22",
      "author": "Mann G.",
      "year": "2009",
      "unstructured": "G. Mann , R. McDonald , M. Mohri , N. Silberman , and D. Walker . Efficient large-scale distributed training of conditional maximum entropy models. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors , Advances in Neural Information Processing Systems 22 , pages 1231 -- 1239 , 2009 . G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1231--1239, 2009."
    },
    {
      "key": "e_1_3_2_2_21_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1145/2339530.2339559"
    },
    {
      "key": "e_1_3_2_2_22_1",
      "volume-title": "International Conference on Machine Learning",
      "author": "Mimno D.",
      "year": "2012",
      "unstructured": "D. Mimno , M. Hoffman , and D. Blei . Sparse stochastic inference for latent dirichlet allocation . In International Conference on Machine Learning , 2012 . D. Mimno, M. Hoffman, and D. Blei. Sparse stochastic inference for latent dirichlet allocation. In International Conference on Machine Learning, 2012."
    },
    {
      "key": "e_1_3_2_2_23_1",
      "first-page": "378",
      "volume-title": "Advances in Neural Information Processing Systems",
      "author": "Shalev-Shwartz S.",
      "year": "2013",
      "unstructured": "S. Shalev-Shwartz and T. Zhang . Accelerated mini-batch stochastic dual coordinate ascent . In Advances in Neural Information Processing Systems , pages 378 -- 385 , 2013 . S. Shalev-Shwartz and T. Zhang. Accelerated mini-batch stochastic dual coordinate ascent. In Advances in Neural Information Processing Systems, pages 378--385, 2013."
    },
    {
      "key": "e_1_3_2_2_24_1",
      "volume-title": "Mini-batch primal and dual methods for svms. arXiv preprint arXiv:1303.2314",
      "author": "Tak\u00e1c M.",
      "year": "2013",
      "unstructured": "M. Tak\u00e1c , A. Bijral , P. Richt\u00e1rik , and N. Srebro . Mini-batch primal and dual methods for svms. arXiv preprint arXiv:1303.2314 , 2013 . M. Tak\u00e1c, A. Bijral, P. Richt\u00e1rik, and N. Srebro. Mini-batch primal and dual methods for svms. arXiv preprint arXiv:1303.2314, 2013."
    },
    {
      "key": "e_1_3_2_2_25_1",
      "doi-asserted-by": "publisher",
      "DOI": "10.1145/1835804.1835910"
    },
    {
      "key": "e_1_3_2_2_26_1",
      "first-page": "928",
      "volume-title": "Proceedings of the International Conference on Machine Learning",
      "author": "Zinkevich M.",
      "year": "2003",
      "unstructured": "M. Zinkevich . Online convex programming and generalised infinitesimal gradient ascent . In Proceedings of the International Conference on Machine Learning , pages 928 -- 936 , 2003 . M. Zinkevich. Online convex programming and generalised infinitesimal gradient ascent. In Proceedings of the International Conference on Machine Learning, pages 928--936, 2003."
    },
    {
      "key": "e_1_3_2_2_27_1",
      "first-page": "2595",
      "volume-title": "Parallelized stochastic gradient descent. In nips23e",
      "author": "Zinkevich M.",
      "year": "2010",
      "unstructured": "M. Zinkevich , A. J. Smola , M. Weimer , and L. Li . Parallelized stochastic gradient descent. In nips23e , editor, nips23, pages 2595 -- 2603 , 2010 . M. Zinkevich, A. J. Smola, M. Weimer, and L. Li. Parallelized stochastic gradient descent. In nips23e, editor, nips23, pages 2595--2603, 2010."
    }
  ],
  "event": "KDD '14: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
  "container-title": "Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining",
  "original-title": [],
  "link": [
    {
      "URL": "https://dl.acm.org/doi/10.1145/2623330.2623612",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "text-mining"
    },
    {
      "URL": "https://dl.acm.org/doi/pdf/10.1145/2623330.2623612",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "similarity-checking"
    }
  ],
  "deposited": {
    "date-parts": [
      [
        2025,
        6,
        18
      ]
    ],
    "date-time": "2025-06-18T07:19:35Z",
    "timestamp": 1750231175000
  },
  "score": 1,
  "resource": {
    "primary": {
      "URL": "https://dl.acm.org/doi/10.1145/2623330.2623612"
    }
  },
  "subtitle": [],
  "short-title": [],
  "issued": {
    "date-parts": [
      [
        2014,
        8,
        24
      ]
    ]
  },
  "references-count": 27,
  "alternative-id": [
    "10.1145/2623330.2623612",
    "10.1145/2623330"
  ],
  "URL": "http://dx.doi.org/10.1145/2623330.2623612",
  "relation": {},
  "subject": [],
  "published": {
    "date-parts": [
      [
        2014,
        8,
        24
      ]
    ]
  },
  "assertion": [
    {
      "value": "2014-08-24",
      "order": 2,
      "name": "published",
      "label": "Published",
      "group": {
        "name": "publication_history",
        "label": "Publication History"
      }
    }
  ]
}