Limits of machine learning

Creativity isn’t necessarily required in the process of knowledge creation. Knowledge is created by evolution. Evolution requires a population of replicators subjected to variation and a selection process. When knowledge is created by biological evolution, the variation comes from mutation. When knowledge is created by the evolution of ideas, the variation comes from creativity.
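To make the claim concrete, here is a minimal sketch (my own toy example, not anything from AlphaZero) of that replicator/variation/selection loop: a population of candidate solutions, blind variation via mutation, and selection by a fitness function.

```python
import random

def evolve(population, fitness, mutate, generations=100):
    """Generic evolution: replicators + variation (mutate) + selection (fitness)."""
    for _ in range(generations):
        # Variation: each replicator produces a mutated copy of itself.
        offspring = [mutate(p) for p in population]
        # Selection: keep the fittest half of parents + offspring.
        pool = population + offspring
        pool.sort(key=fitness, reverse=True)
        population = pool[:len(population)]
    return population

# Toy problem: evolve numbers toward the target value 42.
random.seed(0)
best = evolve(
    population=[random.uniform(-100, 100) for _ in range(10)],
    fitness=lambda x: -abs(x - 42),          # closer to 42 = fitter
    mutate=lambda x: x + random.gauss(0, 1), # blind, undirected variation
)[0]
```

Note that nothing in the loop is “creative”: the variation is random, and the knowledge (a number near 42) emerges purely from selection.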

Disclaimer: I had never heard of AlphaZero, or even Monte Carlo tree search, before your post.

AlphaZero works by undergoing a training period in which it does “reinforcement learning” from games of self-play. It learns “tabula rasa” in the sense that it starts with only the rules of the game of chess; it is not provided with any “domain-specific human knowledge or data”. The initial parameters of the neural network are randomized, and these parameters are updated throughout the training period.

Here’s a Wikipedia explanation of MCTS:

“The application of Monte Carlo tree search in games is based on many playouts, also called roll-outs. In each playout, the game is played out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts.”
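The random-playout idea in that quote can be sketched in a few lines. This is my own toy illustration: it uses flat Monte Carlo evaluation (random playouts with no tree), a simpler cousin of full MCTS, on a small take-away game rather than chess; all names are mine.

```python
import random

# Toy game: a pile of stones; players alternately take 1-3 stones,
# and whoever takes the last stone wins.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def playout(pile, to_move):
    """Play uniformly random moves to the very end; return the winner (0 or 1)."""
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        if pile == 0:
            return to_move          # this player took the last stone
        to_move = 1 - to_move
    return 1 - to_move              # pile was already empty: previous mover won

def best_move(pile, player, n_playouts=2000):
    """Flat Monte Carlo: rate each legal move by its random-playout win rate."""
    win_rate = {}
    for m in legal_moves(pile):
        wins = sum(playout(pile - m, 1 - player) == player
                   for _ in range(n_playouts))
        win_rate[m] = wins / n_playouts
    return max(win_rate, key=win_rate.get)
```

From a pile of 5 this reliably picks “take 1” (leaving a losing pile of 4 for the opponent), even though every individual playout is random — the knowledge comes entirely from the win/loss statistics.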

My understanding is that AZ doesn’t just select moves at random. During self-play, moves are selected in proportion to the visit counts of the root’s children after the search; those visit counts live in the search tree, and the neural network’s policy head is trained to predict them. (This part I need to do more research on. I have a friend who works with neural networks I can ask; I can update after I talk with him.)
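As I understand it, the selection step amounts to sampling in proportion to visit counts, optionally sharpened by a temperature parameter. A hedged sketch of just that sampling step (the function and variable names are my own, not from the AlphaZero code):

```python
import random

def sample_move(visit_counts, temperature=1.0):
    """Sample a move in proportion to root visit counts ** (1/temperature).

    visit_counts: dict mapping move -> number of visits at the root
    after the search. Low temperature approaches greedy (argmax) selection.
    """
    moves = list(visit_counts)
    weights = [visit_counts[m] ** (1.0 / temperature) for m in moves]
    return random.choices(moves, weights=weights, k=1)[0]
```

So a move visited 90 times is nine times more likely to be played than one visited 10 times (at temperature 1), which keeps some exploration during self-play while still favoring moves the search judged to be good.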

In this case I am thinking of “good” or “winning” chess moves as the replicators. “Good” moves are moves identified during the training process as having a high probability of winning (or at least drawing). Preferences for these moves are stored in the neural network’s parameters and reused during gameplay.

There is also variation happening during the training period. From a given game state s, the search explores moves (not entirely at random, since they are guided by the network’s move probabilities and the accumulated visit counts). In vanilla MCTS each simulation is played out until an end state is reached (win, loss, or draw); AlphaZero instead truncates its simulations and uses the value network as the evaluation, but each completed self-play game still ends in a definite result. A score is then assigned based on that result (+1 for a win, 0 for a draw, -1 for a loss) and used to update the network. This is the selection process that allows AZ to “learn” good moves from an initially random distribution.
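The +1/0/-1 scoring described above can be sketched as a tiny labeling function: every position in a finished self-play game gets a value target from the perspective of the player to move. This is my own illustrative sketch, not the actual AlphaZero training code.

```python
def game_to_targets(players_to_move, outcome):
    """Label each position of a finished game with +1/0/-1.

    players_to_move: list of the player to move (0 or 1) at each position.
    outcome: the winner (0 or 1), or None for a draw.
    Returns the value target for each position, from the perspective
    of the player to move there.
    """
    if outcome is None:
        return [0] * len(players_to_move)   # draw: every position scores 0
    return [1 if p == outcome else -1 for p in players_to_move]
```

These targets are the selection signal: positions (and hence move choices) that led to a win are reinforced, while those that led to a loss are penalized.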

The end result of the training (variation and selection) process is a population of very good chess moves (replicators). Based on my understanding, it seems that AZ is creating knowledge through a process of evolution, but it is relying on random variation rather than creativity.

Happy to hear feedback on where I went wrong =) The parts I need to research more are how moves are selected based on the root visit counts, and whether good chess moves can really be considered replicators (I have made mistakes with replicators in the past).