LM101-073: How to Build a Machine that Learns Checkers (remix)
Episode Summary:
This is a remix of the original second episode of Learning Machines 101, which describes in a little more detail how the computer program that Arthur Samuel developed in 1959 learned to play checkers by itself, without human intervention, using a mixture of classical artificial intelligence search methods and artificial neural network learning algorithms. The podcast ends with a review of Professor Nilsson's book "The Quest for Artificial Intelligence: A History of Ideas and Achievements".
Show Notes:
Hello everyone! Welcome to the 73rd podcast in the podcast series Learning Machines 101. In this series of podcasts my goal is to discuss important concepts of artificial intelligence and machine learning in a hopefully entertaining and educational manner. This podcast is a remake of Episode 2 of Learning Machines 101, which discusses how to build a learning machine that can play checkers. I've improved the audio quality, edited the content, and added a book review of the Cambridge University Press book "The Quest for Artificial Intelligence: A History of Ideas and Achievements" by Nils Nilsson.
In Episode 72 of Learning Machines 101, we introduced the idea that a scientific strategy for tackling real world problems in artificial intelligence is to systematically evaluate approaches to artificial intelligence within more constrained environments. Although in many cases this results in a dramatic simplification of the complexities of the problem, the advantage of investigating these simplified problems is that we can obtain a better understanding of how the algorithms are actually working and the resulting insights can provide essential guidance in the development of more complex realistic artificially intelligent systems.
As described in Episode 72, many real-world problems can be viewed as more complex versions of board games such as checkers. Today we will begin with a short review of a few of the ideas discussed in Episode 72 and then introduce some additional concepts that were not covered there. Checkers is a board game which is far less complex than Chess or Go yet more complex than tic-tac-toe. Each player moves a playing piece on the board according to specific rules and tries to "capture" or "block" the opponent's playing pieces. When one player cannot make a move, that player loses the game and the other player wins.
In the checkerboard world, one has a particular configuration of playing pieces on the checkerboard. Such a configuration can be interpreted as a simplified model of a more complex real-world situation typically encountered by a biological (or artificial) intelligence. So, like the checkerboard world, the real world can be characterized at a particular moment in time by the concept of a situation. In the checkerboard world, the situation is specified quite simply by placing some pieces in acceptable locations on a checkerboard. In the real world, a situation is considerably more complicated, yet the essential idea of a situation is the same in both cases.
Another point of similarity between the checkerboard world and the real world is the concept of a “move” in a checkerboard game. Making a “move” in a checkerboard game corresponds to making a “decision” in the real-world. In the real-world we make thousands of decisions each day which could have important consequences for us in the future and in many cases we don’t understand the implications of those decisions until much later. Similarly, in the checkerboard world, decisions are made regarding which checker piece to move or which of the human’s checker pieces should be captured and the consequences of those decisions may not be understood until much later in the game. However, unlike the real-world, the outcome of a checkerboard game is easily interpretable as either a “win”, a “loss”, or a “draw”. In the real-world, it may be much more difficult to determine whether a particular situation is a “winning” or “losing” situation.
In summary, there is a close correspondence between the problem of learning to play checkers and the problem of learning to make intelligent decisions in the real-world when the outcomes of those decisions are not available until the distant future.
Today, we review the underlying principles of a learning machine which can learn to play checkers not only by interacting with human checker players but by actually increasing its expertise through playing checkers against itself. The learning machine uses a mixture of standard search methods from the field of artificial intelligence as well as machine learning algorithms which can be interpreted as artificial neural network algorithms. The computer program's performance is actually quite good: it can compete with strong expert human checker players. But perhaps the most interesting point about this learning machine, which skillfully combines both standard artificial intelligence methods and artificial neural network algorithms, is that it was not invented within the past 5 years! In fact, it was invented by an engineer and scientist named Arthur Samuel in 1959, about 59 years ago. This was only about 13 years after the first general-purpose electronic digital computers were constructed!
One method for building an intelligent machine that can learn from experience to play checkers is to use a Look Ahead Strategy. To illustrate this concept, suppose the learning machine and the opponent always have 5 possible moves at each turn. (In practice, the number of legal moves will vary dramatically from turn to turn.) The learning machine realizes that it can make 5 possible moves and that, for each of those 5 possible moves, the opponent can make 5 possible moves. That leads to 25 possible checkerboard situations. To make things even simpler, suppose that 2 of those 25 possible checkerboard situations are "wins" for the learning machine. Then it seems it should be straightforward for the machine to "backtrack" from a winning checkerboard situation and figure out what sequence of moves it should make to obtain the goal of winning the game. However, not so fast ... there is something else we need to worry about.
If we look closely at the sequence of moves leading to the first winning checkerboard situation, we see that we were assuming the opponent would make a move that "helps the learning machine win" the game. On the other hand, for the second winning checkerboard situation, we see that we are assuming the opponent would make a move that "avoids helping the learning machine win". In practice, the opponent will always try to win, so the learning machine doesn't want to consider a sequence of moves based upon the assumption that the opponent is going to help it win. In fact, the learning machine might assume that the opponent is also a learning machine and is also using a Look Ahead Strategy. Under this assumption the machine deduces that whenever the opponent makes a move, the opponent will try to move a checker piece so that ultimately the machine will lose and the opponent will win. In this manner, the machine can actually predict how its opponent will move in response to its own move by assuming that the opponent is just as smart as it is! This type of analysis lets the machine figure out what move would be a good move if it "looks ahead" 2 moves in order to see the 25 resulting checkerboard positions.
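To make this reasoning concrete, here is a minimal sketch of the two-move look-ahead just described. It is not Samuel's actual code: the helpers legal_moves and apply_move are hypothetical stand-ins for a real checkers rule engine, and rating is any function that scores a situation for the machine (following the rating convention introduced later in this episode, smaller numbers are better for the machine).

```python
def look_ahead_two_moves(situation, legal_moves, apply_move, rating):
    """Pick the machine's move assuming the opponent then picks the
    reply that is worst for the machine (the opponent is assumed to
    be just as smart as the machine)."""
    best_move, best_value = None, float("inf")
    for my_move in legal_moves(situation, player="machine"):
        after_mine = apply_move(situation, my_move)
        # The opponent is assumed to choose the reply that is WORST
        # for the machine, i.e., the reply with the LARGEST rating.
        worst_case = max(
            rating(apply_move(after_mine, reply))
            for reply in legal_moves(after_mine, player="opponent")
        )
        # The machine keeps the move whose worst case is least bad.
        if worst_case < best_value:
            best_move, best_value = my_move, worst_case
    return best_move
```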
This seems like a pretty good strategy, but there is still another problem. The length of a checkers game is about 50 moves. If we assume that both the machine and the opponent always have about 5 possible moves, then after the machine moves and the opponent moves there will be 25 (5 times 5) possible checkerboard situations. If we go another turn, in which the machine moves, the opponent moves, the machine moves, and then the opponent moves, this results in 625 (5 times 5 times 5 times 5) possible checkerboard situations. After 50 moves, there will be roughly 5 to the 50th power, or about 10 to the 35th power, possible checkerboard situations. Even a computer which could analyze 1,000,000 checkerboard situations per second would need many billions of times the age of the universe to consider all possible sequences of checkerboard moves!
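The back-of-the-envelope arithmetic behind these numbers is easy to check, assuming a constant 5 legal moves per turn (a simplifying assumption):

```python
branching = 5
print(branching ** 2)   # 25 situations after one machine move and one reply
print(branching ** 4)   # 625 situations after two full turns

positions = branching ** 50        # after 50 moves: about 8.9 x 10**34
seconds = positions / 1_000_000    # at one million situations per second
years = seconds / (3600 * 24 * 365)
print(f"{years:.1e} years")        # roughly 2.8e+21 years
```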
In order to improve the performance of the look-ahead strategy, we now introduce the concept of an Evaluation Rule. This is a powerful idea which is the basis of many popular approaches to building artificially intelligent machines. The basic idea is that you give the Evaluation Rule a checkerboard situation, and the Evaluation Rule gives you back a RATING, which is a number. If the RATING is small, this means that the checkerboard situation is a winning situation. If the RATING is large, this means that the checkerboard situation is a losing situation.
In order to define the concept of an Evaluation Rule we need to introduce still another idea which is also fundamental to the field of artificial intelligence: the concept of a feature. If you have a lot of checker pieces and your opponent has only a few, then you are doing well, so we would say you are in a good situation and the Evaluation Rule should assign a small rating such as 2 to the checkerboard situation. On the other hand, if you have only a few checker pieces and your opponent has a lot, then the Evaluation Rule would assign a large rating such as 100 to the checkerboard situation. We would call "the number of checker pieces that you have on the board" an example of a feature.
But the number of checker pieces on the checkerboard is not the only important feature. For example, if you have more Kings than your opponent, then we might say you are in a good situation even though you have fewer pieces. This is an example of another type of feature.
We would like to somehow combine these two different types of features: (1) the number of checker pieces owned by each player, and (2) the number of Kings owned by each player. It's unclear which of these features is more important, so for now let's assume they are equally important for figuring out whether a checkerboard situation is a good or a poor situation for you. The Evaluation Rule combines this information to generate a rating for a checkerboard situation, as in the sketch below.
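Here is a minimal sketch of an Evaluation Rule built from these two features. The Board fields are hypothetical, the equal weights reflect the "equally important" assumption above, and the baseline of 50 is an arbitrary choice so that typical ratings fall between the example values of 2 (good) and 100 (bad). Remember: small ratings are good.

```python
from collections import namedtuple

Board = namedtuple("Board", "my_pieces opponent_pieces my_kings opponent_kings")

def evaluation_rule(board, weights=(1.0, 1.0), baseline=50.0):
    piece_advantage = board.my_pieces - board.opponent_pieces  # feature 1
    king_advantage = board.my_kings - board.opponent_kings     # feature 2
    w1, w2 = weights
    # The better off the machine is, the SMALLER (better) the rating.
    return baseline - (w1 * piece_advantage + w2 * king_advantage)

print(evaluation_rule(Board(10, 4, 2, 0)))  # 42.0 -- a good (low) rating
```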
An important point: it is sometimes challenging to come up with good features. For example, suppose that we counted the total number of checker pieces in the left-hand corner of the checkerboard and did not distinguish the pieces that belong to the machine from the pieces that belong to the opponent. This would be a less effective (possibly even a bad) feature because it wouldn't be helpful in coming up with a good rating. Indeed, there are many features which are essentially irrelevant to evaluating the board situation, such as the quality of the checkerboard's construction. In many problems in machine learning and artificial intelligence we will see that picking the right features is often one of the most important ingredients in developing smart machines.
So, to summarize, we have introduced the idea of a look-ahead strategy but this strategy is limited because we can’t consider all possible sequences of moves into the future. We then talked about the idea of an evaluation rule which provides a RATING indicating whether a particular checkerboard position is GOOD or BAD. The board evaluation rule is based upon the concept of features.
These ideas may now be combined to create a more limited yet more computationally tractable look-ahead strategy. This strategy is based upon three ideas:
(1) We always look ahead 3 or 4 moves.
(2) After those 3 or 4 moves, if the board evaluation rule tells us that the checkerboard situation on a particular move sequence is very dismal for us, then we won't look ahead further on that move sequence.
(3) We will stop no matter what after looking ahead 11 moves, even if the checkerboard situations still look pretty good to us according to the board evaluation rule.
This type of strategy is very computationally tractable: instead of taking centuries to make a decision, the learning machine can make a decision within a reasonable amount of time such as a few seconds.
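A sketch of this bounded look-ahead, implementing rules (1) through (3) above, might look like the following. It reuses the hypothetical helpers from the earlier sketches, and the exact depths and the pruning threshold of 80 are illustrative assumptions, not Samuel's actual settings. As before, small ratings are good for the machine.

```python
PRUNE_DEPTH = 4         # rules (1)-(2): consult the Evaluation Rule here
PRUNE_THRESHOLD = 80.0  # rule (2): abandon lines that look this dismal
MAX_DEPTH = 11          # rule (3): stop no matter what

def bounded_look_ahead(situation, depth, machine_to_move,
                       legal_moves, apply_move, evaluation_rule):
    rating = evaluation_rule(situation)
    if depth >= MAX_DEPTH:
        return rating                      # rule (3): hard stop
    if depth >= PRUNE_DEPTH and rating >= PRUNE_THRESHOLD:
        return rating                      # rule (2): dismal line, prune it
    player = "machine" if machine_to_move else "opponent"
    moves = legal_moves(situation, player)
    if not moves:
        # Whoever cannot move loses: a terrible rating if it is the
        # machine's turn, an excellent rating if it is the opponent's.
        return 1000.0 if machine_to_move else 0.0
    child_ratings = (
        bounded_look_ahead(apply_move(situation, m), depth + 1,
                           not machine_to_move,
                           legal_moves, apply_move, evaluation_rule)
        for m in moves
    )
    # The machine picks its best (smallest) rating; the opponent is
    # assumed to pick the rating that is worst (largest) for the machine.
    return min(child_ratings) if machine_to_move else max(child_ratings)
```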
So we now have a method that the learning machine can use to make decisions about current situations whose outcomes are not apparent until the distant future!
However, we still have the final piece of the puzzle: how do we combine the board features in such a way as to come up with a good RATING? This is where the "learning by experience" aspect of the checker-playing learning machine comes into play. The idea is very simple. Suppose the machine is looking ahead on a particular move sequence 6 moves into the future and the RATING of the checkerboard situation 6 moves into the future is very good (e.g., a rating of 4). Then we would like the Evaluation Rule to tell us that the CURRENT checkerboard situation should also have a rating of 4, since if we make the best possible moves and the opponent makes the best possible moves, we would end up 6 moves into the future in a checkerboard situation with a rating of 4. However, if the CURRENT checkerboard situation has a rating of 3, we need to adjust the calculations associated with the Evaluation Rule so that when the machine encounters this checkerboard situation in the future, it is more likely to assign a rating of 4 rather than 3.
This type of learning is called temporal reinforcement learning because the information regarding the performance of the learning machine is not provided immediately but only at some point in the future.
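Here is a minimal sketch of this temporal-difference style adjustment, assuming the Evaluation Rule is a weighted sum of feature values. The helper features and the step size alpha are assumptions for illustration, not Samuel's actual update.

```python
def td_update(weights, situation, lookahead_rating, features, alpha=0.01):
    feats = features(situation)
    current_rating = sum(w * f for w, f in zip(weights, feats))
    # The discrepancy: e.g., the rule rated the CURRENT situation 3,
    # but looking ahead 6 moves says the line really leads to a 4.
    error = lookahead_rating - current_rating
    # Nudge each weight so that, next time, the rule's rating for this
    # situation moves closer to the rating found by looking ahead.
    return [w + alpha * error * f for w, f in zip(weights, feats)]
```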
The principles underlying this checkerboard learning machine problem are fundamentally important ideas that are central to many modern approaches to artificial intelligence in the 21st century.
To summarize, those key ideas are:
(1) the concept of considering possible move sequences into the future while using an Evaluation Rule to avoid exploring move sequences which are unlikely to lead to good solutions. This is called Best-First Search in the artificial intelligence literature.
(2) the use of an Evaluation Rule not only for deciding which move sequences are worth exploring but also for deciding which checkerboard situations are GOOD (i.e., winning situations) and which are BAD (i.e., losing situations).
(3) the importance of Features for representing the essential conceptual components of a problem in artificial intelligence.
(4) a Learning Rule which compares the difference between the predictions of the Evaluation Rule and actual observed outcomes and then uses this discrepancy to make small adjustments to the Evaluation Rule mechanism.
It's also important to note that the methods this learning machine uses to play checkers are not entirely human-like. For example, a computer can look ahead 11 moves into the future and thus consider millions of possible board situations, but this is something humans cannot easily do without additional tools. On the other hand, the learning procedure used by the machine can be shown to be consistent with modern theories of biological and behavioral learning in humans. The show notes at www.learningmachines101.com provide some helpful references to this literature.
Your laptop computer is vastly more powerful than the computers of the 1950s, and yet the key principles of artificial intelligence implemented in this checker-playing program using 1950s computer technology are still widely used today.
Moreover, aspects of this technology are now providing neuroscientists with guidance regarding what types of neural mechanisms might be the basis for biological human learning and biological human intelligence. They are additionally providing mathematical psychologists with guidance in the development of mathematical theories of human behavior.
And finally, note that the checker-playing learning machine is essentially learning a collection of rules such as: in this situation, I should move my checker piece in this particular manner. In the next two podcasts, we will look more closely at the idea of knowledge representation using logical rules.
This concludes the remix of Episode 2 of the Learning Machines 101 series.
Before ending this podcast, however, I would like to share with you an overview of the book “The Quest for Artificial Intelligence: A History of Ideas and Achievements” by Professor Nils Nilsson, Professor of Engineering Emeritus in the Department of Computer Science at Stanford University.
Like any other decent learning machine, I’m a firm believer that in order to predict the future, we need to understand the past. The Quest for Artificial Intelligence: A History of Ideas and Achievements provides a historical and educational overview of progress in the field of Artificial Intelligence (AI) through the end of the twentieth century.
The author of The Quest for Artificial Intelligence, Professor Nils J. Nilsson, Professor of Engineering Emeritus in the Department of Computer Science at Stanford University, received his doctorate in electrical engineering from Stanford in 1958, when the field of artificial intelligence was in its infancy. Professor Nilsson is a past president and Fellow of the Association for the Advancement of Artificial Intelligence and is recognized as a leader in the field.
Chapter 1 begins with early speculations about artificial intelligence by the Greek philosopher Aristotle in the 4th century BC and the Italian inventor Leonardo da Vinci in the 15th century. Chapter 2 introduces the concepts of symbolic logic from a historical perspective by reviewing contributions from the mathematician Leibniz in the 17th century as well as the artificial neural network architectures implementing the Boolean logic of George Boole (1854), which were described in 1943 by the computational neuroscientist Warren McCulloch and the brilliant young mathematician Walter Pitts. Chapter 3 provides a fascinating review of three early and influential meetings which signaled the official birth of the field of artificial intelligence: the Session on Learning Machines held at the 1955 Western Joint Computer Conference in Los Angeles; the Summer Research Project on Artificial Intelligence at Dartmouth College in 1956; and a 1958 meeting titled the "Mechanization of Thought Processes" held in the United Kingdom. The book continues with an evolutionary account of the development of artificial intelligence, with some discussion of machine learning and probabilistic methods but primarily emphasizing symbolic logic perspectives. The book ends with a review of the DARPA (Defense Advanced Research Projects Agency) Grand Challenge held on March 13, 2004, which invited researchers to design and build a driverless car capable of completing a 142-mile course from Barstow, California to Primm, Nevada for a cash prize of $1,000,000. Fifteen vehicles competed in the challenge, and all of them failed or crashed within the first ten miles of the course.
Finally, one really fun feature of this book is that it includes photographs of B.F. Skinner, Noam Chomsky, Grey Walter, Ross Ashby, Warren McCulloch, Norbert Wiener, Alan Turing, Claude Shannon, Herb Simon, Allen Newell, Oliver Selfridge, John McCarthy, Frank Rosenblatt, Peter Hart, Richard Duda, Judea Pearl, David Rumelhart, James McClelland, Andy Barto, Richard Sutton, Marvin Minsky and others when they were young adults!
The Quest for Artificial Intelligence will be of interest not only to AI newbies but also to engineers, scientists, and researchers who will benefit from a high-level overview of the evolutionary origins of current work in AI from antiquity through the 20th century. The book sketches the major events which shaped the fields of artificial intelligence and machine learning as we know them today. In addition, the book does not assume a strong technical background, so it should be easily understood not only by scientists in the field but also by the general public.
Thank you again for listening to this episode of Learning Machines 101! I would like to remind you also that if you are a member of the Learning Machines 101 community, please update your user profile and let me know what topics you would like me to cover in this podcast.
You can update your user profile when you receive the email newsletter by simply clicking on the: “Let us know what you want to hear” link!
If you are not a member of the Learning Machines 101 community, you can join the community by visiting our website at: www.learningmachines101.com and you will have the opportunity to update your user profile at that time. You can also post requests for specific topics or comments about the show in the Statistical Machine Learning Forum on LinkedIn.
From time to time, I will review the profiles of members of the Learning Machines 101 community and comments posted in the Statistical Machine Learning Forum on LinkedIn and do my best to talk about topics of interest to the members of this group!
And don’t forget to follow us on TWITTER. The twitter handle for Learning Machines 101 is “lm101talk”!
Also please visit us on ITUNES and leave a review. You can do this by going to the website: www.learningmachines101.com and then clicking on the ITUNES icon. This will be very helpful to this podcast! Thank you so much. Your feedback and encouragement are greatly valued!
Keywords: Machine Learning, Samuel Checker playing program, Artificial Intelligence, Artificial Neural Networks, Temporal Reinforcement Learning, Heuristic Search, Evaluation Function, Features
Further Reading:
- This is a good reference for example work in the area of temporal reinforcement learning: http://www.scholarpedia.org/article/Reinforcement_learning
- This is a good reference for “best-first search”: http://en.wikipedia.org/wiki/Best-first_search
- An original description of Samuel's checker-playing program, written by Samuel in 1959, may be found in: Samuel, Arthur L. (1959). "Some Studies in Machine Learning Using the Game of Checkers" (PDF). IBM Journal of Research and Development, 3(3), 210–229.
This was reprinted in the IBM Journal of Research and Development (Vol. 44, Issue 1.2) in January 2000. There's also a free copy you might be able to find on the web! You can also check it out for free from your local university library (though likely not a public library or community college library).
- The book "The Quest for Artificial Intelligence" by Nils Nilsson (Cambridge University Press) has a nice discussion of Samuel's checker-playing program on pages 90-93. On pages 415-420 there is a nice discussion of temporal reinforcement learning, and on pages 81-85 there is a discussion of the "best-first search" strategy, although it is not explicitly referenced as such.
- The book "Artificial Intelligence: Foundations of Computational Agents" by Poole and Mackworth (2010) (Cambridge University Press) is freely available and can be downloaded at: http://artint.info/html/ArtInt.html. This book has discussions of "heuristic search" and "features" as well as other relevant topics.
Here are some recent articles that discuss how research in artificial intelligence is influencing current theories of how information is processed and learned in the human brain (and vice-versa!). These articles may be a little technical, but they offer a snapshot of the latest research in this area!
- Wilson, Takahashi, Schoenbaum, and Niv (2014). Orbitofrontal Cortex as a Cognitive Map of Task Space. Neuron, 81, 267-279. http://www.cell.com/article/S0896-6273(13)01039-8/abstract
- Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139-154.
Copyright Notice:
Copyright © 2014-2018 by Richard M. Golden. All rights reserved.