The Secret of AlphaGo's Victory over the World Go Champion


Unlike chess and Chinese chess (xiangqi), Go has a very large board and an enormous variety of play: the number of possible board configurations exceeds the number of atoms in the universe. Go has therefore long been a barrier that game-playing AI could not overcome. With the rapid development of deep learning and of computer hardware, the AlphaGo program developed by the DeepMind team first swept a European professional 2-dan player 5:0, and then defeated a Korean professional 9-dan player, ranked fourth in the world, 4:1.

So, why is AlphaGo so strong? What is the algorithm behind it?

When we lift AlphaGo's veil of mystery, we find that it is, at heart, a machine learning project. As such, it contains at least three modules: feature extraction, model training, and model prediction. We can examine this "mysterious" machine learning project from these three angles.

For Go data, feature extraction means digitizing and structuring the board state at each move. The raw data AlphaGo used was taken from games between 6- and 9-dan players on the KGS Go server: about 160,000 games comprising nearly 30 million moves. Depending on the requirements of its different sub-models, AlphaGo extracted two kinds of feature data from these raw game records. The first and most intuitive board feature is stone placement. The Go board is 19 × 19, and each intersection can be in one of three states: black stone, white stone, or empty. The placement features of one sample are therefore three 19 × 19 matrices, each describing one of the three states across the board. Beyond placement, many other features can be extracted from the current board state, such as whose turn it is (black or white), how many turns ago each stone was played, the expected number of captures after a move, and whether a move is legal.
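The placement encoding described above can be sketched in a few lines. This is an illustrative toy, not AlphaGo's actual code; the board representation (0 = empty, 1 = black, 2 = white) is an assumption made for this example.

```python
import numpy as np

BOARD_SIZE = 19

def board_to_planes(board):
    """Encode a 19 x 19 board (0 = empty, 1 = black, 2 = white)
    as three binary feature planes: black, white, empty."""
    board = np.asarray(board)
    black = (board == 1).astype(np.float32)
    white = (board == 2).astype(np.float32)
    empty = (board == 0).astype(np.float32)
    return np.stack([black, white, empty])

board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=int)
board[3, 3] = 1     # a black stone
board[15, 15] = 2   # a white stone
planes = board_to_planes(board)
print(planes.shape)  # (3, 19, 19)
```

Because every intersection is in exactly one of the three states, the three planes sum to 1 at every position.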


AlphaGo actually has four sub-models: a convolutional policy network, a reinforcement-learning policy network, a softmax-regression rollout network, and a convolutional value network.
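To make the softmax-regression idea concrete, here is a minimal sketch of a rollout-style policy: a single linear layer over the flattened feature planes followed by a softmax over the 361 possible moves. The random weights and feature sizes here are assumptions for illustration; AlphaGo's actual rollout policy uses hand-crafted pattern features and trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 3 * 19 * 19   # the three feature planes, flattened
N_MOVES = 19 * 19          # one logit per board intersection

# Untrained weights for illustration; the real model learns these.
W = rng.normal(scale=0.01, size=(N_MOVES, N_FEATURES))

def move_probabilities(features):
    """Map a flattened feature vector to a probability for each of
    the 361 moves via softmax regression."""
    logits = W @ features
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

features = rng.random(N_FEATURES)
probs = move_probabilities(features)
```

The output is a proper probability distribution over moves, which is exactly what a rollout policy samples from when simulating games quickly.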

The convolutional policy network builds on convolutional neural networks, which excel at image recognition and image classification. If the convolutional policy network represents a stage of learning from and surpassing others, then the reinforcement-learning policy network is a stage of self-play: learning from and surpassing oneself. According to DeepMind's internal tests, the reinforcement-learning policy network's win rate against the convolutional policy network was as high as 80%!

Reinforcement learning is a machine learning method that discovers a good policy through constant trial and error. Take Go as an example: imagine two people who cannot play at all, placing stones at random until one side wins. The winner then adjusts its placement policy based on how it played, and the two play again, optimize again, and repeat the process until some stopping condition is met. In the end, two players who knew nothing about the game have learned which moves are good and which are bad.
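The play-then-reinforce loop above can be illustrated on a game far simpler than Go. This toy (entirely an assumption for illustration, not AlphaGo's method) uses a tiny Nim variant: 5 stones, each player takes 1 or 2, and whoever takes the last stone wins. Both players share one policy table; after each game, every move the winner made is reinforced.

```python
import random

random.seed(42)
N_STONES = 5
# policy[s][m] = weight of taking m stones when s stones remain
policy = {s: {1: 1.0, 2: 1.0} for s in range(1, N_STONES + 1)}

def choose(state):
    """Sample a legal move in proportion to its current weight."""
    moves = [m for m in (1, 2) if m <= state]
    weights = [policy[state][m] for m in moves]
    return random.choices(moves, weights=weights)[0]

for _ in range(5000):
    state, player = N_STONES, 0
    history = {0: [], 1: []}
    while state > 0:
        move = choose(state)
        history[player].append((state, move))
        state -= move
        winner = player          # whoever moved last took the final stone
        player = 1 - player
    for s, m in history[winner]: # reinforce only the winner's moves
        policy[s][m] += 1.0
```

After many self-play games, the weight of taking 2 stones from 5 (leaving the opponent the losing position of 3) tends to dominate, even though neither "player" was ever told the rules of good play.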

In conclusion, AlphaGo's algorithm is not especially complex. What makes it strong is training and learning on enormous datasets, through which it grows, step by step, into something remarkably intelligent.


Writer: Alfo