Bibliography
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., & Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359.
Authors (17)
- David Silver (first)
- Julian Schrittwieser (additional)
- Karen Simonyan (additional)
- Ioannis Antonoglou (additional)
- Aja Huang (additional)
- Arthur Guez (additional)
- Thomas Hubert (additional)
- Lucas Baker (additional)
- Matthew Lai (additional)
- Adrian Bolton (additional)
- Yutian Chen (additional)
- Timothy Lillicrap (additional)
- Fan Hui (additional)
- Laurent Sifre (additional)
- George van den Driessche (additional)
- Thore Graepel (additional)
- Demis Hassabis (additional)
References (69)
Referenced by: 6,041
- Friedman, J., Hastie, T. & Tibshirani, R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009). doi:10.1007/978-0-387-84858-7
- LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015). doi:10.1038/nature14539
- Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. In Adv. Neural Inf. Process. Syst. Vol. 25 (eds Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 1097–1105 (2012)
- He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 29th IEEE Conf. Comput. Vis. Pattern Recognit. 770–778 (2016). doi:10.1109/CVPR.2016.90
- Hayes-Roth, F., Waterman, D. & Lenat, D. Building Expert Systems (Addison-Wesley, 1984)
- Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015). doi:10.1038/nature14236
- Guo, X., Singh, S. P., Lee, H., Lewis, R. L. & Wang, X. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In Adv. Neural Inf. Process. Syst. Vol. 27 (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D. & Weinberger, K. Q.) 3338–3346 (2014)
- Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. 33rd Int. Conf. Mach. Learn. Vol. 48 (eds Balcan, M. F. & Weinberger, K. Q.) 1928–1937 (2016)
- Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. In 5th Int. Conf. Learn. Representations (2017)
- Dosovitskiy, A. & Koltun, V. Learning to act by predicting the future. In 5th Int. Conf. Learn. Representations (2017)
- Mańdziuk, J. in Challenges for Computational Intelligence (eds Duch, W. & Mańdziuk, J.) 407–442 (Springer, 2007). doi:10.1007/978-3-540-71984-7_15
- Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016). doi:10.1038/nature16961
- Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In 5th Int. Conf. Computers and Games (eds Ciancarini, P. & van den Herik, H. J.) 72–83 (2006). doi:10.1007/978-3-540-75538-8_7
- Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In 15th Eur. Conf. Mach. Learn. 282–293 (2006). doi:10.1007/11871842_29
- Browne, C. et al. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4, 1–49 (2012). doi:10.1109/TCIAIG.2012.2186810
- Fukushima, K. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193–202 (1980). doi:10.1007/BF00344251
- LeCun, Y. & Bengio, Y. in The Handbook of Brain Theory and Neural Networks Ch. 3 (ed. Arbib, M.) 276–278 (MIT Press, 1995)
- Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd Int. Conf. Mach. Learn. Vol. 37 448–456 (2015)
- Hahnloser, R. H. R., Sarpeshkar, R., Mahowald, M. A., Douglas, R. J. & Seung, H. S. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 947–951 (2000). doi:10.1038/35016072
- Howard, R. Dynamic Programming and Markov Processes (MIT Press, 1960)
- Sutton, R. & Barto, A. Reinforcement Learning: an Introduction (MIT Press, 1998). doi:10.1109/TNN.1998.712192
- Bertsekas, D. P. Approximate policy iteration: a survey and some new methods. J. Control Theory Appl. 9, 310–335 (2011). doi:10.1007/s11768-011-1005-3
- Scherrer, B. Approximate policy iteration schemes: a comparison. In Proc. 31st Int. Conf. Mach. Learn. Vol. 32 1314–1322 (2014)
- Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011). doi:10.1007/s10472-011-9258-6
- Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In Int. Conf. Comput. Games (eds van den Herik, H. J., Xu, X., Ma, Z. & Winands, M. H. M.) Vol. 5131 113–124 (Springer, 2008). doi:10.1007/978-3-540-87608-3_11
- Laurent, G. J., Matignon, L. & Le Fort-Piat, N. The world of independent learners is not Markovian. Int. J. Knowledge-Based Intelligent Engineering Systems 15, 55–64 (2011). doi:10.3233/KES-2010-0206
- Foerster, J. N. et al. Stabilising experience replay for deep multi-agent reinforcement learning. In Proc. 34th Int. Conf. Mach. Learn. Vol. 70 1146–1155 (2017)
- Heinrich, J. & Silver, D. Deep reinforcement learning from self-play in imperfect-information games. In NIPS Deep Reinforcement Learning Workshop (2016)
- Jouppi, N. P. et al. In-datacenter performance analysis of a Tensor Processing Unit. In Proc. 44th Annu. Int. Symp. Comp. Architecture Vol. 17 1–12 (2017)
- Maddison, C. J., Huang, A., Sutskever, I. & Silver, D. Move evaluation in Go using deep convolutional neural networks. In 3rd Int. Conf. Learn. Representations (2015)
- Clark, C. & Storkey, A. J. Training deep convolutional neural networks to play Go. In Proc. 32nd Int. Conf. Mach. Learn. Vol. 37 1766–1774 (2015)
- Tian, Y. & Zhu, Y. Better computer Go player with neural network and long-term prediction. In 4th Int. Conf. Learn. Representations (2016)
- Cazenave, T. Residual networks for computer Go. IEEE Trans. Comput. Intell. AI Games (2017). doi:10.1109/TCIAIG.2017.2681042
- Huang, A. AlphaGo Master online series of games. https://deepmind.com/research/AlphaGo/match-archive/master (2017)
- Barto, A. G. & Duff, M. Monte Carlo matrix inversion and reinforcement learning. Adv. Neural Inf. Process. Syst. 6, 687–694 (1994)
- Singh, S. P. & Sutton, R. S. Reinforcement learning with replacing eligibility traces. Mach. Learn. 22, 123–158 (1996)
- Lagoudakis, M. G. & Parr, R. Reinforcement learning as classification: leveraging modern classifiers. In Proc. 20th Int. Conf. Mach. Learn. 424–431 (2003)
- Scherrer, B., Ghavamzadeh, M., Gabillon, V., Lesner, B. & Geist, M. Approximate modified policy iteration and its application to the game of Tetris. J. Mach. Learn. Res. 16, 1629–1676 (2015)
- Littman, M. L. Markov games as a framework for multi-agent reinforcement learning. In Proc. 11th Int. Conf. Mach. Learn. 157–163 (1994). doi:10.1016/B978-1-55860-335-6.50027-1
- Enzenberger, M. The integration of a priori knowledge into a Go playing neural network. http://www.cgl.ucsf.edu/go/Programs/neurogo-html/neurogo.html (1996)
- Enzenberger, M. in Advances in Computer Games (eds van den Herik, H. J., Iida, H. & Heinz, E. A.) 97–108 (2003). doi:10.1007/978-0-387-35706-5_7
- Sutton, R. Learning to predict by the method of temporal differences. Mach. Learn. 3, 9–44 (1988)
- Schraudolph, N. N., Dayan, P. & Sejnowski, T. J. Temporal difference learning of position evaluation in the game of Go. Adv. Neural Inf. Process. Syst. 6, 817–824 (1994)
- Silver, D., Sutton, R. & Müller, M. Temporal-difference search in computer Go. Mach. Learn. 87, 183–219 (2012). doi:10.1007/s10994-012-5280-0
- Silver, D. Reinforcement Learning and Simulation-Based Search in Computer Go. PhD thesis, Univ. Alberta, Edmonton, Canada (2009)
- Gelly, S. & Silver, D. Monte-Carlo tree search and rapid action value estimation in computer Go. Artif. Intell. 175, 1856–1875 (2011). doi:10.1016/j.artint.2011.03.007
- Coulom, R. Computing Elo ratings of move patterns in the game of Go. Int. Comput. Games Assoc. J. 30, 198–208 (2007)
- Gelly, S., Wang, Y., Munos, R. & Teytaud, O. Modification of UCT with patterns in Monte-Carlo Go. Report No. 6062 (INRIA, 2006)
- Baxter, J., Tridgell, A. & Weaver, L. Learning to play chess using temporal differences. Mach. Learn. 40, 243–263 (2000). doi:10.1023/A:1007634325138
- Veness, J., Silver, D., Blair, A. & Uther, W. Bootstrapping from game tree search. In Adv. Neural Inf. Process. Syst. 1937–1945 (2009)
- Lai, M. Giraffe: Using Deep Reinforcement Learning to Play Chess. MSc thesis, Imperial College London (2015)
- Schaeffer, J., Hlynka, M. & Jussila, V. Temporal difference learning applied to a high-performance game-playing program. In Proc. 17th Int. Jt Conf. Artif. Intell. Vol. 1 529–534 (2001)
- Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994). doi:10.1162/neco.1994.6.2.215
- Buro, M. From simple features to sophisticated evaluation functions. In Proc. 1st Int. Conf. Comput. Games 126–145 (1999). doi:10.1007/3-540-48957-6_8
- Sheppard, B. World-championship-caliber Scrabble. Artif. Intell. 134, 241–275 (2002). doi:10.1016/S0004-3702(01)00166-7
- Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017). doi:10.1126/science.aam6960
- Tesauro, G. & Galperin, G. On-line policy improvement using Monte-Carlo search. In Adv. Neural Inf. Process. Syst. 1068–1074 (1996)
- Tesauro, G. Neurogammon: a neural-network backgammon program. In Proc. Int. Jt Conf. Neural Netw. Vol. 3, 33–39 (1990). doi:10.1109/IJCNN.1990.137821
- Samuel, A. L. Some studies in machine learning using the game of checkers II: recent progress. IBM J. Res. Develop. 11, 601–617 (1967). doi:10.1147/rd.116.0601
- Kober, J., Bagnell, J. A. & Peters, J. Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32, 1238–1274 (2013). doi:10.1177/0278364913495721
- Zhang, W. & Dietterich, T. G. A reinforcement learning approach to job-shop scheduling. In Proc. 14th Int. Jt Conf. Artif. Intell. 1114–1120 (1995)
- Cazenave, T., Balbo, F. & Pinson, S. Using a Monte-Carlo approach for bus regulation. In Int. IEEE Conf. Intell. Transport. Syst. 1–6 (2009). doi:10.1109/ITSC.2009.5309838
- Evans, R. & Gao, J. DeepMind AI reduces Google data centre cooling bill by 40%. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/ (2016)
- Abe, N. et al. Empirical comparison of various reinforcement learning strategies for sequential targeted marketing. In IEEE Int. Conf. Data Mining 3–10 (2002). doi:10.1109/ICDM.2002.1183879
- Silver, D., Newnham, L., Barker, D., Weller, S. & McFall, J. Concurrent reinforcement learning from customer interactions. In Proc. 30th Int. Conf. Mach. Learn. Vol. 28 924–932 (2013)
- Tromp, J. Tromp–Taylor rules. http://tromp.github.io/go.html (1995)
- Müller, M. Computer Go. Artif. Intell. 134, 145–179 (2002). doi:10.1016/S0004-3702(01)00121-7
- Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2016). doi:10.1109/JPROC.2015.2494218
- Segal, R. B. On the scalability of parallel UCT. Comput. Games 6515, 36–47 (2011). doi:10.1007/978-3-642-17928-0_4
Dates
Type | When |
---|---|
Created | Oct. 17, 2017, 12:13 p.m. |
Deposited | June 26, 2025, 11 a.m. |
Indexed | Aug. 23, 2025, 9:21 p.m. |
Issued | Oct. 1, 2017 |
Published | Oct. 1, 2017 |
Published Online | Oct. 19, 2017 |
Published Print | Oct. 1, 2017 |
@article{Silver_2017,
  title={Mastering the game of Go without human knowledge},
  volume={550},
  ISSN={1476-4687},
  url={http://dx.doi.org/10.1038/nature24270},
  DOI={10.1038/nature24270},
  number={7676},
  journal={Nature},
  publisher={Springer Science and Business Media LLC},
  author={Silver, David and Schrittwieser, Julian and Simonyan, Karen and Antonoglou, Ioannis and Huang, Aja and Guez, Arthur and Hubert, Thomas and Baker, Lucas and Lai, Matthew and Bolton, Adrian and Chen, Yutian and Lillicrap, Timothy and Hui, Fan and Sifre, Laurent and van den Driessche, George and Graepel, Thore and Hassabis, Demis},
  year={2017},
  month=oct,
  pages={354–359}
}