The Alpha of ‘Go’
What is Go?
The game of ‘Go’ is believed to be the oldest continuously played board game in the world. It was invented in China over 2,500 years ago and is still wildly popular today. A survey taken in 2016 found around 20 million Go players worldwide, most of them in Asia. The game is played by two players on a grid. The players take turns placing black or white “stones” on the board, one at a time. If player A surrounds one or more of player B’s stones, the surrounded stones are removed from the board and later count toward player A’s score. If a player encloses part of the board with a connected series of stones, the number of empty points inside makes up at least a portion of that player’s final score. The player with the most points wins.
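The capture rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the board is a dictionary of occupied points, the names `place_stone` and `group_and_liberties` are made up for this example, and real rules such as ko, suicide, and scoring are left out.

```python
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]  # (row, col) -> "B" or "W"; empty points are simply absent


def neighbors(p: Point, size: int) -> Iterator[Point]:
    """Yield the on-board points directly above, below, left and right of p."""
    r, c = p
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def group_and_liberties(board: Board, start: Point, size: int) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the connected group containing `start`.

    Returns the group's stones and its liberties (adjacent empty points).
    """
    color = board[start]
    group, liberties, stack = {start}, set(), [start]
    while stack:
        p = stack.pop()
        for n in neighbors(p, size):
            if n not in board:
                liberties.add(n)
            elif board[n] == color and n not in group:
                group.add(n)
                stack.append(n)
    return group, liberties


def place_stone(board: Board, p: Point, color: str, size: int) -> int:
    """Place a stone, remove opposing groups left with no liberties,
    and return how many stones were captured (ko and suicide are ignored)."""
    board[p] = color
    captured = 0
    for n in neighbors(p, size):
        if n in board and board[n] != color:
            group, liberties = group_and_liberties(board, n, size)
            if not liberties:
                for q in group:
                    del board[q]
                captured += len(group)
    return captured


# A lone white stone with black stones on three sides; black plays the fourth.
board: Board = {(2, 2): "W", (1, 2): "B", (3, 2): "B", (2, 1): "B"}
captured = place_stone(board, (2, 3), "B", size=5)
```

After the last line, the white stone at `(2, 2)` has no empty neighbors left, so it is removed from `board` and counted in `captured`.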
Of course, that explanation of the game was oversimplified, but the game itself does appear to be simple as well. In terms of the rules and the goal of the game, it is. However, the number of legal board positions exceeds the total number of atoms in the observable universe by roughly 90 orders of magnitude (Go: 2.1×10¹⁷⁰, atoms: about 1×10⁸⁰). That incomprehensible number of possibilities alone adds an extreme level of complexity to the game. On top of that, it was believed for a long time that excelling at Go required a certain level of human intuition. The program that has since beaten the world’s best human players challenges that assessment.
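To get a feel for how far apart those two figures are, here is a quick back-of-the-envelope check. It uses only the two numbers quoted above; the variable names are just for illustration.

```python
import math

go_positions = 2.1e170  # legal Go board positions, as quoted above
atoms = 1e80            # atoms in the observable universe, as quoted above

ratio = go_positions / atoms
orders_of_magnitude = math.log10(ratio)

print(f"Go positions per atom: about {ratio:.1e}")
print(f"Orders of magnitude apart: about {orders_of_magnitude:.0f}")
```

In other words, even if every atom in the universe contained a universe of its own, the atoms in all of them combined would still fall short of the number of Go positions.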
AlphaGo
Not too long ago, it was believed that a computer would never be able to beat a high-ranking human Go player, even though computers had already done so in similar games, most notably chess. In 1997, Deep Blue, a computer developed by IBM, beat Garry Kasparov, the reigning world chess champion, under standard tournament time controls. Deep Blue relied largely on brute force: it searched through an enormous number of possible moves and replies for both sides of the board before choosing the move that gave it the highest probability of winning. That was more of a big win for hardware; AlphaGo is something completely different.
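To make the brute-force idea concrete, here is a minimal sketch of exhaustive game-tree search on a toy game, not on chess and not Deep Blue's actual code. The game is an assumption chosen for brevity: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # cache results so each pile size is only solved once
def can_win(stones: int) -> bool:
    """True if the player to move can force a win with `stones` left.

    Every legal move is tried; a position is winning if at least one move
    leaves the opponent in a losing position. With 0 stones left the player
    to move has already lost, because the opponent took the last stone.
    """
    return any(not can_win(stones - take) for take in (1, 2, 3) if take <= stones)


def best_move(stones: int) -> int:
    """Return a winning number of stones to take, or 1 if every move loses.

    Assumes stones >= 1.
    """
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return 1


print(best_move(7))   # searches every line of play from a pile of 7
```

This works because the toy game's tree is tiny. The same exhaustive approach is hopeless for Go, where the number of positions is far too large to enumerate.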
AlphaGo, developed by the artificial intelligence research company DeepMind, is the result of combining machine learning with tree search techniques, specifically Monte Carlo tree search, along with extensive training. Its training consisted of studying games played by humans and playing games against itself. The decision making is carried out by deep neural networks: a “policy network”, which proposes promising moves, and a “value network”, which estimates the chance of winning from a given position. These two networks guide the search toward the tree branches worth traversing and away from those with a low probability of winning. This greatly reduces the amount of searching AlphaGo has to do, and the networks themselves get better with further training.
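The way the two networks steer the search can be sketched with the selection step of a Monte Carlo tree search. The snippet below is a simplified illustration, not DeepMind's implementation: the `Node` class, the move labels, the constant `c_puct = 1.5`, and the `+ 1` inside the square root are all assumptions made for this example. Each child stores a prior from the policy network and a running average of value-network estimates, and the search descends into the child with the best combined score.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    prior: float             # policy network's probability for the move leading here
    visits: int = 0          # how many times the search has passed through this node
    value_sum: float = 0.0   # sum of value estimates backed up through this node,
                             # from the point of view of the player choosing the move
    children: Dict[str, "Node"] = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 1.5) -> str:
    """Pick the move whose child maximises value estimate plus exploration bonus.

    The bonus is large for moves the policy network likes and that have been
    visited rarely, so unpromising branches are mostly ignored.
    """
    total_visits = sum(child.visits for child in node.children.values())

    def score(child: Node) -> float:
        bonus = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
        return child.mean_value() + bonus

    return max(node.children, key=lambda move: score(node.children[move]))


root = Node(prior=1.0)
root.children = {"a": Node(prior=0.6), "b": Node(prior=0.3), "c": Node(prior=0.1)}
first_choice = select_child(root)   # no visits yet, so the highest prior wins

# Pretend move "a" was explored 10 times and turned out badly.
root.children["a"].visits = 10
root.children["a"].value_sum = -5.0
second_choice = select_child(root)  # the search now prefers another move
```

A full search would repeat this selection down the tree, evaluate the new position with the networks, and back the value up along the path. Only the selection rule is shown here.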
After a couple of new versions, developed at DeepMind under Google’s ownership, AlphaGo was eventually succeeded by AlphaGo Zero. AlphaGo Zero differed from its predecessor by being completely self-taught. The earlier versions of AlphaGo were trained in part by showing them games played by humans. AlphaGo Zero, however, did not use any dataset fed to it by humans. Even though it started out pursuing its goal blindly, AlphaGo Zero learned and improved by playing against itself until it surpassed all versions of the original AlphaGo in a mere 40 days. Eventually, AlphaGo Zero was generalized into AlphaZero.
Conclusion
AlphaGo and the family of programs that succeeded it were a major breakthrough in the world of AI. The breakthrough was driven, of course, by the hard work of many developers, and because these programs can improve themselves, this is likely far from the ceiling of their full potential. AlphaGo Zero and AlphaZero take this further: because they do not rely on human-backed training, they raise the probability of a completely generalized AI algorithm, one that could be applied to many diverse situations and, over time, come to function at a level that easily outperforms humans.
Two more fun facts: along with Go and chess, MuZero, the successor to AlphaZero, is also capable of playing at least 57 different Atari games at a superhuman level. Additionally, the hardware cost of a single unit used for the AlphaGo Zero system was quoted at $25 million.
FURTHER READING:
- Chan, Dawn. “The AI That Has Nothing to Learn From Humans.” The Atlantic, 20 Oct. 2017, www.theatlantic.com/technology/archive/2017/10/alphago-zero-the-ai-that-taught-itself-go/543450.
- Greenemeier, Larry. “AI versus AI: Self-Taught AlphaGo Zero Vanquishes Its Predecessor.” Scientific American, 18 Oct. 2017, www.scientificamerican.com/article/ai-versus-ai-self-taught-alphago-zero-vanquishes-its-predecessor.
- Shorten, Connor. “The Evolution of AlphaGo to MuZero - Towards Data Science.” Medium, 18 Jan. 2020, towardsdatascience.com/the-evolution-of-alphago-to-muzero-c2c37306bf9.
- Silver, David. “Learning to Play Go from Scratch.” Nature, 19 Oct. 2017, www.nature.com/articles/nature24270.