Crossref proceedings-article
ACM
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (320)
Bibliography

Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 661–670.

Authors 4
  1. Mu Li (first)
  2. Tong Zhang (additional)
  3. Yuqiang Chen (additional)
  4. Alexander J. Smola (additional)
References 27 Referenced 448
  1. 10.1111/j.2517-6161.1974.tb00999.x
  2. 10.1561/2200000016
  3. R. Byrd, S. Hansen, J. Nocedal, and Y. Singer. A stochastic quasi-Newton method for large-scale optimization. arXiv preprint arXiv:1401.7020, 2014.
  4. 10.1007/s10107-012-0572-5
  5. 10.1145/2020408.2020517
  6. A. Cotter, O. Shamir, N. Srebro, and K. Sridharan. Better mini-batch algorithms via accelerated gradient methods. In NIPS, volume 24, pages 1647–1655, 2011.
  7. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In Neural Information Processing Systems, 2012.
  8. O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Technical report, http://arxiv.org/abs/1012.1367, 2010.
  9. 10.5555/1390681.1442794
  10. 10.1145/2020408.2020426
  11. 10.5555/1870568.1870593
  12. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2nd edition, 2009. (10.1007/978-0-387-84858-7)
  13. R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, pages 315–323, 2013.
  14. M. I. Jordan. An Introduction to Probabilistic Graphical Models. MIT Press, 2008. To appear.
  15. 10.1109/18.910572
  16. B. Kulis and P. L. Bartlett. Implicit online learning. In Proc. Intl. Conf. Machine Learning, 2010.
  17. 10.1109/CVPR.2011.5995477
  18. 10.1007/BF01589116
  19. D. Mahajan, S. S. Keerthi, S. Sundararajan, and L. Bottou. A parallel SGD method with strong convergence. arXiv preprint arXiv:1311.0636, 2013.
  20. G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1231–1239, 2009.
  21. 10.1145/2339530.2339559
  22. D. Mimno, M. Hoffman, and D. Blei. Sparse stochastic inference for latent Dirichlet allocation. In International Conference on Machine Learning, 2012.
  23. S. Shalev-Shwartz and T. Zhang. Accelerated mini-batch stochastic dual coordinate ascent. In Advances in Neural Information Processing Systems, pages 378–385, 2013.
  24. M. Takáč, A. Bijral, P. Richtárik, and N. Srebro. Mini-batch primal and dual methods for SVMs. arXiv preprint arXiv:1303.2314, 2013.
  25. 10.1145/1835804.1835910
  26. M. Zinkevich. Online convex programming and generalised infinitesimal gradient ascent. In Proceedings of the International Conference on Machine Learning, pages 928–936, 2003.
  27. M. Zinkevich, A. J. Smola, M. Weimer, and L. Li. Parallelized stochastic gradient descent. In Advances in Neural Information Processing Systems 23, pages 2595–2603, 2010.
Dates
Type When
Created Aug. 22, 2014, 3:38 p.m.
Deposited June 18, 2025, 3:19 a.m.
Indexed Sept. 3, 2025, 6:16 a.m.
Issued Aug. 24, 2014
Published Aug. 24, 2014
Published Online Aug. 24, 2014
Published Print Aug. 24, 2014
Funders 0

None

@inproceedings{Li_2014, series={KDD ’14}, title={Efficient mini-batch training for stochastic optimization}, url={http://dx.doi.org/10.1145/2623330.2623612}, DOI={10.1145/2623330.2623612}, booktitle={Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining}, publisher={ACM}, author={Li, Mu and Zhang, Tong and Chen, Yuqiang and Smola, Alexander J.}, year={2014}, month=aug, pages={661–670}, collection={KDD ’14} }