Podcast: Play in new window | Download | Embed
Episode Summary:
In this episode we explore the concept of evolutionary learning machines. That is, learning machines that reproduce themselves in the hopes of evolving into more intelligent and smarter learning machines.
LM101-024: How to Use Genetic Algorithms to Breed Learning Machines
Show Notes:
Hello everyone! Welcome to the twenty-fourth podcast in the podcast series Learning Machines 101. In this series of podcasts my goal is to discuss important concepts of artificial intelligence and machine learning in hopefully an entertaining and educational manner.
In this episode we introduce the concept of learning machines that can self-evolve into more intelligent machines. One approach to developing such learning machines would be to have learning machines so smart they can actually design and implement smarter machines. Such an approach might be called self-engineering evolution.
For example, in the classic science fiction novel “Brave New World” written in 1931 by A. Huxley is set in the year AD 2540. In the year 2540 AD genetic engineering technology is used to create and decant children who are then raised in hatcheries. Genius children are just as easily designed as non-genius children. In the book “Brave New World”, the “Savage” asks the Resident World Controller why all children are not designed to be geniuses. The Resident World Controller patiently explains to the savage that this would result in a society which was dysfunctional and unorganized and virtually uncontrollable. The Resident World Controller continues to patiently explain to the Savage why it is important to specify exactly the right mixture of children with high IQs and children with moderate and low IQs if society is to function in an orderly and controllable manner. Another example of self-engineering evolution arises from the Terminator series where intelligent machines evolve and adapt and build more intelligent machines with special properties such as the ability to infiltrate infestations of human beings. Both of these cases are examples of what one could call the “creationist” perspective or the “artificial evolution” perspective.
But there is another approach to learning machine development which is based upon principles of natural rather than artificial evolution. We will begin by discussing theories of human evolution and then once we understand some of the issues associated with theories of human evolution we will explore their applicability to the problem of the design of machine learning algorithms.
We start with a discussion of one vision of biological evolution where simple organisms randomly mutate and evolve over multiple generations into more complex organisms. This view of evolutionary processes is not commonly accepted by scientists. Even though to a non-specialist it might appear to describe current scientific theories of evolution, this is not an appropriate description of well accepted scientific theories of evolution. Indeed, the chances that a totally random series of mutations would transform a single-celled organism into an organism of the complexity of a human being clearly would be astronomical especially taking into the fact that a random series of mutations would be just as likely or even more likely to be detrimental rather than beneficial for the evolutionary growth of a species.
An alternative vision of biological evolution is the Lamarckian evolutionary perspective which was proposed . The Larmarckian perspective is that experiences acquired by an organism during its lifetime can be passed down to the organisms’ offspring. Despite the fact that the Lamarckian Theory was proposed 200 years ago, this theory was held in disfavor by most scientists until relatively recently. The MIT Technology Review reported in an article published in 2009 entitled: “A Comeback for Larmarckian evolution”. The article describes an experiment involving rats who were genetically bred to have memory problems who were then exposed to an enriched environment which helped them overcome the genetically-induced memory problems. Moreover, the enriched learning environment which improved learning and memory for the genetically disadvantaged mice appeared to benefit not only their mice but also their offspring. This was an example of scientific evidence for the Larmarckian Theory that experiences acquired by an organism during it lifetime can be passed down to the organism’s offspring.
And finally, the third and most popular view of biological evolution is the theory of Natural Selection proposed by Darwin. The classic Darwinian perspective is that variety can be produced more efficiently through the genetic recombination. By genetic recombination, we mean that as a result of sexual reproduction the genetic code from two individuals is combined to create a new genetic code with a collection of genetic dispositions which are combinations of the genetic dispositions of the parents. In a large population of organisms which reproduce through genetic recombination, many offspring with a large variety of genetic dispositions can be generated.
Mutations, on the other hand, must occur with a very low probability. Any major alteration to the genetic code is likely to be fatal or detrimental to an organism rather than beneficial. On the other hand, genetic recombination has the potential to general tremendous variation in offspring within a population and these variations are generally expected to not be fatal or detrimental to offspring. This increases the chances that most of these offspring will have the opportunity to interact with their environments for an extended period of time before they actually mate with their partners. The offspring which are least likely to survive successfully for an extended period of time in their environments are less likely to have offspring of their own.
It is important to emphasize that the success of the organism in its environment over an extended time period so that it has time to reproduce is an essential characteristic of Darwinian Natural Selection Theory. The Mutation Rate is a factor influencing the organism’s genetic disposition which is typically not modified as a function of the organism’s behaviors. So this is a crucial, yet subtle distinction, between changes in a genetic disposition arising from genetic recombination compared with changes in genetic disposition due to mutation rates.
Classical Darwinian Natural Selection Theory deemphasizes the role of the Mutation Rate but there has been a growing interest in acknowledging that behavioral factors can have at least a partial influence on mutation rate as well. For example, there are studies which show that the consumption of alcohol during pregnancy can damage DNA handed down to one’s offspring. The well known molecular evolutionary biologist Masatoshi Nei has, in fact, argued contrary to popular scientific thought that mutation rather than genetic recombination is the most influential factor in Natural Selection. Our knowledge of the role of how molecular interactions are influenced by behavior and environment and how these molecular interactions can alter DNA has been steadily growing.
In 1848, the black peppered moth was first documented in England. Prior to this time period, peppered moths were always observed to be white. Then, by 1864, the black peppered moth was the most common form of the peppered moth and white peppered moths were very rare. About thirty years years, in 1896, J. W. Tutt proposed that the mechanism of change was a Darwinian Natural Selection process in which black peppered moths were better camouflaged from their predators than white peppered moths in the new black coal sky industrial environment of 19th century England. This hypothesis is consistent with recent observations that the appearance of white peppered moths has become more prevalent since the mid 20th century which is correlated with the movement away from a coal-driven economy.
Traditionally, it has been assumed that a Natural Selection Process based upon gene recombination is the best explanation of the story of the Peppered Moth Story. The dominant gene in the Peppered Moth specifies a Black Peppered Moth and the recessive gene in the Peppered Moth specifies a White Peppered Moth. This means that for a Peppered Moth to have a white appearance the Black Peppered Moth gene must be absent. However, more recently, scientists have identified a single mutation which also could be responsible for differences in the coloration of Peppered Moths.Thus, both the factors of mutation rate and genetic recombination could be relevant although initially only the mechanism of genetic recombination was proposed.
Regardless of whether mutation rate, genetic recombination, or some combination of these factors is the root cause of a Natural Selection Process, the fundamentally important point is that through either mutation rate or genetic recombination mechanisms a large variety of individuals is generated within a particular environment. The individuals which are most compatible or equivalently most successful in that environment will survive for a sufficiently long amount of time to reproduce and thus pass on their genetic dispositions to their offspring. Individuals which are not compatible or not successful in that environment will be less likely to survive and less likely to propagate their genetic dispositions. For this reason, the terminology Natural Selection will be used in the following discussion to refer to some combination of both mutation and genetic recombination processes.
Let’s now summarize the essential ideas of Natural Selection taking into account both the factors of mutation rate and genetic recombination. First, each organism in the population has a genetic disposition or specification which we will call the organism’s genotype. This genotype does not determine the organism’s behavior but it biases the organism to express some behaviors more than others. Second, the presence of a particular genotype provides a genetic disposition to behave in particular ways but the resulting behaviors are the result of complex interactions between the organism’s genotype and the organism’s environment. Although the terminology phenotype typically refers to the appearance of an organism as influenced by its genotype, the concept of a behavioral phenotype is very important since different behavioral phenotypes might influence different behavioral mating strategies and hence different genetic recombination strategies. Mutation and genetic recombination operators generate diversity in the population of organisms. Organisms which exhibit behavioral phenotypes which are less successful are less likely to survive long enough to mate and generate offspring.
Let’s now discuss the basic ingredients of an example Genetic Algorithm as it pertains to machine learning. Suppose we have a particular learning machine and the learning machine has a number of characteristics. Some characteristics include: (1) methods of how input patterns will be represented, (2) the types of feature detectors used by the learning machine, (3) the specific type of learning mechanism used by the learning machine, and (4) an initial guess for the parameters to be learned by the learning machine. This information can be represented as the learning machine’s genotype which is a sequence of “genes” where each “gene” in the sequence is a sequence of zeros and ones. Notice that, like a biological genotype, the learning machine genotype does not explicitly specify particular knowledge structures or behaviors but rather specifies a disposition towards learning particular types of knowledge structures or behaviors.
The representation of a learning machine’s genotype as a gene sequence is a key idea so we now illustrate this idea with a specific example. We consider the Perceptron architecture described in Episode 15 of this podcast series. The Perceptron consists of a collection of input units. Each input unit can either be active or inactive. An event in the environment causes some input units to be active and some input units to be inactive. We assume an activity level can take on the value of either ZERO or ONE. Thus, different events in the environment correspond to different input patterns consists of ZEROS and ONES. Next, a group of hidden units are used to extract features from the pattern of activity over the input units. Each hidden unit computes a weighted sum of the activity levels of the input units. This weighted sum is called the netinput to the hidden unit. If this netinput exceeds a threshold value, then the hidden unit becomes active and takes on the value of ONE; otherwise the hidden unit becomes inactive and takes on the value of ZERO. The output unit of the perceptron works in a similar manner. The output unit computes a weighted sum of the hidden unit activations and generates a response of ONE if that weighted sum exceeds the output unit’s threshold value; otherwise the output unit generates a ZERO.
During the learning process, the Perceptron is provided with an input pattern and a desired response for that input pattern. If the Perceptron’s response to the input pattern is correct, then no learning occurs. On the other hand, if the Perceptron’s response to an input pattern is incorrect, then the connections from the hidden units to the output unit are changed by a slight amount determined by the learning rate of the Perceptron so that in the future the Perceptron is more likely to generate the correct response. The famous Perceptron Convergence Theorem states that if the connections from the hidden to output unit are properly adjusted and an optimal set of connections exists, then the Perceptron is guaranteed to learn any desired response for any desired input pattern in a finite number of learning trials. Check out Episode 15 for additional details!! Note that the connections from the input to hidden units are randomly chosen.
We can represent the genotype of a Perceptron as follows. We assume that every Perceptron has no more than say 100 hidden units. We also assume that every Perceptron has exactly 1 output unit and 20 input units. This means that each hidden unit pattern of connections is specified by 21 numbers: the connection weights of that hidden unit to each of the 20 input units plus an additional number for the hidden unit’s threshold. A hidden unit which is not used is represented by simply setting the connection weights of that hidden unit equal to zero. So now the genotype consists of 2100 numbers corresponding to the 21 numbers for each of the 100 hidden units. We might want to have Perceptron’s learn at different rates so we add an additional number representing the learning rate to bring the grand total up to 2101 numbers in the genotype. But remember that we need to represent a genotype for a learning machine as a list of zeros and ones. This is not difficult to accomplish if we assume that a fixed precision and represent a decimal number as a sequence of binary digits. A binary code would be one way to represent a decimal number as a sequence of zeros and ones. Computer scientists call this Base 2 notation.
Once the format of a genotype is defined, it can be created or altered in two fundamental ways. First, it can be created by a mutation. In this case, first a learning machine is selected at random from the current population of learning machines. Next, with some probability a particular gene is either added to an existing genotype, deleted from an existing genotype, or replaced with another gene in an existing genotype.
In contrast to creating new genotypes from mutation, a new genotype can be created by recombination. This is implemented, for example, by selecting two learning machines at random, selecting portions of their genotypes, and then splicing the gene sequence corresponding to one learning machine’s genotype into that of the other learning machine’s genotype and vice-versa. For the case of recombination, it simplifies the mathematical analysis to assume that a pair of parents always produces exactly two children.
If we have a generation of learning machines then the set of the genotypes of all learning machines in that generation will be called the population genotype which we assume evolves and changes from one generation to the next according to the above probabilistic rules. The “fitness” of an individual learning machine can be defined as the performance of that learning machine is acquiring statistics from the environment.
For simplicity, we assume that the behavior of the learning machines do not alter the statistical environment. We also assume that the statistics of the environment do not change over successive generations of learning machines. These assumptions are not necessary but we make these assumptions for expository reasons.
A given species genotype is assigned a number which represents the fitness of that population as specified by the population genotype. The fitness of a particular population of learning machines is computed by averaging the fitness of all individual learning machines in the population. It will be convenient to assume that SMALLER values of the fitness measure correspond to individuals which are more fit. We can use the average prediction error of a learning machine based upon its interactions with its environment as the fitness measure for a learning machine. Thus, “fitness” is a complicated function of not only the learning machine’s genotype but also the specific experiences of the learning machine.
The dynamics of population growth assumptions in our population of learning machines are as follows. It is assumed that the learning machine’s live in a police state which enforces rigid restrictions on population growth!
It is assumed that whenever a new learning machine is created by mutation then a decision has to be made whether or not to either: (1) keep the new learning machine and eliminate the learning machine’s parent, or (2) eliminate the new learning machine and keep the learning machine’s parent. Similarly, in the case of recombination, any pair of parents is required to give birth to exactly two children and a decision needs to be made regarding whether or not the children should or should not replace their parents.
Given that our goal is the development of evolutionary strategies for evolving smart learning machines rather than biological modeling, we can take advantage of our knowledge of the Metropolis Algorithm revised in Episode 21 of this podcast series. The Metropolis Algorithm essentially states that under the assumptions that: (1) the offspring of a pair of parents can (in turn) generate offspring with the same genotypes as their parents, (2) the offspring of a pair of parents can have exactly the same genotypes as their parents, and (3) every possible genotype can eventually evolve into every other genotype. In the language of Monte Carlo Markov Chain modeling, these assumptions are called the: reversibility, aperiodic, and irreducible Markov Chain modeling assumptions. In addition, it is assumed that when offspring are generated that improve the population fitness measure, those offspring always survive. However, if offspring are generated which do not improve the population fitness measure, then the probability that the offspring are kept anyway is not zero but instead is equal to the exponential function evaluated at negative one multiplied by the increase in the population fitness measure. Given these assumption, then the Monte Carlo Markov Chain Sampling Theorem is applicable and we have the nice mathematical result that population genotypes which have the best fitness will be visited most frequently by this population dynamics scheme as the number of generations becomes large. Moreover, the convergence rate is geometric.
Note that this these calculations are computationally easy since they involve changes to the population fitness measure and we assume that at each instant in time we either have just one mutation or one recombination so this controls the computational complexity per iteration. One could have multiple births or mutations at the same time and although this would slightly increase the computational complexity per iteration, the increased births or mutations per iteration might result in a faster exploration of the genotype space.
And finally, note that the above discussion is really a discussion of a branch of mathematical statistics which is called “model selection”. Recent advances in this area have explored the use of Metropolis-Hastings algorithms such as described here as principled alternatives to ad hoc model selection procedures such as stepwise regression modeling. In particular, I encourage listeners with advanced training in mathematical statistics to check out the paper by George and McCulloch which was published in (1993) in the Journal of the American Statistical Association entitled Variable Selection Via Gibbs Sampling as well as the paper by King and Brooks which was published in 2002 in the journal Biometrics entitled Model Selection for Integrated Recovery/Recapture Data. The approaches described in these papers is similar to the approach advocated here for exploring genetic algorithms. These references and others may be found in the show notes for this episode which are located on the website: www.learningmachines101.com
Thank you again for joining me for another episode of Learning Machines 101!
If you are a member of the Learning Machines 101 community, please update your user profile.
You can update your user profile when you receive the email newsletter by simply clicking on the: “Let us know what you want to hear” link!
Or if you are not a member of the Learning Machines 101 community, when you join the community by visiting our website at: www.learningmachines101.com you will have the opportunity to update your user profile at that time.
From time to time, I will review the profiles of members of the Learning Machines 101 community and do my best to talk about topics of interest to the members of this group!
Keywords: Gibbs Sampler, Monte Carlo Markov Chain, Metropolis-Hastings algorithm, Metropolis Algorithm,Genetic Algorithm, Perceptron, Evolution, Darwin Natural Selection
Further Reading:
Wikipedia Entry: Genetic Algorithm http://en.wikipedia.org/wiki/Genetic_algorithm
Wikipedia Entry: Metropolis-Hastings Algorithm. Click here to visit!
Nei, M. (2013). Mutation-Driven Evolution. Oxford University Press
Huxley, A. (1931). Brave New World (Chapter 16)
Wikipedia Entry: Peppered Moth Evolution
Discover Magazine: Mutation Not Recombination Drives Natural Selection
MIT Technology Review: A Comeback for Larmarckian Evolution
Alcohol Consumption Influence on DNA (Alcohol Research & Health, 34, NIAAA).
National Geographic Article on the Peppered Moth Story of Natural Selection
Scientists pinpoint single gene mutation which turned peppered moths from white to black
George, E. I., and McCulloch, R. E. (1993). Variable Selection Via Gibbs Sampling, Journal of the American Statistical Association, 88, 881-889.
King, R. and Brooks, S. P. (2002). Model Selection for Integrated Recovery/Recapture Data. Biometrics, 58, 841-851.
Copyright © 2015 by Richard M. Golden. All rights reserved.