In this episode, Charlie suggests using an evolutionary algorithm to search for abnormal behavior after a murder occurs in a video game with a real-life component. Evolutionary algorithms imitate the biological process of natural selection to find optimal solutions. There is a notion of which potential solutions are "fitter" than others. The fittest solutions survive as less fit solutions are weeded out, hopefully resulting in very fit (optimal) solutions over time. On this page, we will see how a genetic algorithm (a particular kind of evolutionary algorithm) could conceivably be used with a satellite to find a point of maximal elevation on the Earth. Charlie explains:
"Think of a pond. A pond is a healthy ecosystem where predator and prey are in balance. However, if an outside predator is introduced, well then the ecosystem changes. An evolutionary algorithm highlights abnormal alterations, it discerns subtle patterns, and locates the source of a problem.''
From this description, one might confuse the uses of evolutionary algorithms with the metaphor on which they are based. As we will see, abnormal alterations (mutations) occur in the execution of such algorithms. Presumably, Charlie has in mind a way of comparing behavior patterns and determining which actions are "abnormal" for video game play. These will be the fittest for his purposes. In general, evolutionary algorithms can be potentially used anywhere there is a meaningful notion of "fitness". This has nothing to do with searching for abnormal behavior.
For instance, let's suppose we have a satellite that will tell us the elevation of the Earth at any specific longitude/latitude pair we ask of it. In this example, we are only allowed to ask the satellite to check one point at a time. One way of finding a high point is to ask the satellite to check (in some order) each point on a fine grid laid over the region, like a bitmap image. A genetic algorithm may be a more efficient approach. We define a longitude/latitude pair to be more "fit" than another if it has a higher elevation.
Various factors influence the natural selection process. Random mutations can occur in the genes of individual organisms. When these have any effect on fitness at all, it tends to be negative. Nonetheless, mutations can occasionally increase the fitness of an individual. Such mutations may become more prevalent in successive generations, by natural selection. Sexual reproduction, where the geneotype of offspring depends on the genotype of two parents, tends to lead to greater diversity of offspring than asexual reproduction, where offspring have a single parent. Population diversity, by virtue of a large number of gene combinations, may lead to the existence of some very fit individuals. (Diverse populations may also be better suited to surviving changes in how fitness is determined as environmental circumstances vary.)
Now let's take an initial "population." This is a collection of such strings as above, often chosen at random. For our particular genetic algorithm, we can choose to keep the population at a constant size, as follows. We will choose half the members of the population to reproduce by sexual reproduction, in pairs, each pair producing four offspring. At this point, the parents (and organisms that did not reproduce) "die", and will be removed from the population. Such choices were somewhat arbitrary, and there are many similar algorithms.
In order to choose which half of the population reproduces, we could pair off individuals at random and let the more fit of each pair survive a given (large) proportion of the time, and the less fit survive the remainder of the time. Alternatively, we may wish to sort our population in order of fitness, letting the top third reproduce, "killing" the bottom third, and giving individuals of the middle third a 50% chance of reproduction. If we never allow the less fit organisms to reproduce, we might risk having low population diversity.
Now that we have chosen which individuals "reproduce", we need to determine how their genetic information is combined. One easy way of doing this is based on an actual biological process: crossing over of chromosomes. For each pair of reproducing individuals, choose a random position in the genetic string. Now, swap the portions of the two parents' chromosomes after this position, and we have two offspring with genes based on, but (usually) not the same as either parent. Swapping at a different point, the same parents produce a second pair of offspring.
Finally, we would like to add some mutations to our population in hopes of potentially finding some fitter individuals. Choose some specific (small) number p. Now, look at each bit (0 or 1) of each string in the population, and change it to its opposite (1 or 0) with probability p. Now we have the next generation! Repeating the process of selection, reproduction, and mutation for many generations, the genetic algorithm will hopefully produce a population with high fitness, and give us good solutions to our problem.
For the random element, roll a six-sided die. When deciding which of two organisms will reproduce, let the more fit (higher) organism reproduce if the roll is 1-5, say, and the less fit organism reproduce if the roll is 6. Use dice to determine where crossovers and mutations occur as well.
Now, play around with the choices you made. Try changing population size. Try increasing or decreasing the probability for unfit individuals to reproduce. Change the rate of mutations. How do these choices affect the fitness of your population after a given number of generations? (You will want to run your program several times for each set of parameters, to see how much variation occurs due to the random factors.)