In this episode, Charlie uses Bayesian inference and Markov chains to help his brother figure out the cause of a prison bus crash that let two prisoners escape.
Bayesian inference is the process of adjusting your belief in the probability that some statement is true based on new events. As a silly example, you may believe that it is extremely unlikely that there are any purple cows in the world. However, if you actually see a purple cow, then you would obviously change your belief to the belief that there is at least one purple cow in the world. The tool that makes this precise is Bayes' rule: for two events A and B, the probability of A given that B happened is P(A|B) = P(B|A)*P(A)/P(B), where P(A) and P(B) are the probabilities of A and B on their own and P(B|A) is the probability of B given that A happened.
Let's say that Billy Bob has two boxes: box 1 contains one fair die, labelled with the numbers 1 through 6, and box 2 contains two fair dice, each labelled with the numbers 1 through 6. Let's say he picks one box at random and then rolls the dice in the box. Let's use Bayes' rule to figure out the probability that he picked the box with one die given that the total of the die (or dice) he rolled was 3. First we need to translate the words into the symbols we used in the previous paragraph. Let's use A to denote the event that Billy Bob picks the box with one die and B the event that he rolls a total of 3. Since he picks a box at random, P(A) = 1/2, and we can calculate P(B) = (1/2)*(1/6) + (1/2)*(2/36) = 4/36 = 1/9. This is because there's a 1/2 chance he'll pick box 1, and if he does there's a 1/6 chance he'll roll a 3. Also, there's a 1/2 chance he'll pick box 2, and then there's a 2/36 chance he'll roll a total of 3 (he can get a 1,2 or a 2,1). Now, since we want to figure out the probability that he picked box 1 (event A) given that he rolled a 3 (event B), we want to use Bayes' rule to compute P(A|B). To use the formula we also need to know P(B|A). This is the probability that he rolls a 3 given that he picks box 1, so P(B|A) = 1/6. Then P(A|B) = (1/6)*(1/2)/(1/9) = 3/4. This makes sense: if he picks the box with two dice he is unlikely to roll a total of 3, while if he picks the box with one die a 3 is more likely.
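If you'd rather check this with a computer than with algebra, here is a quick Monte Carlo sketch in Python (the function name simulate and the trial count are just illustrative choices, not part of the episode):

```python
import random

def simulate(trials=500_000):
    """Estimate P(one-die box | total is 3) by simulating Billy Bob's experiment."""
    picked_one_die_and_got_3 = 0
    got_3 = 0
    for _ in range(trials):
        one_die_box = random.random() < 0.5                      # pick a box at random
        if one_die_box:
            total = random.randint(1, 6)                         # roll the single die
        else:
            total = random.randint(1, 6) + random.randint(1, 6)  # roll both dice
        if total == 3:
            got_3 += 1
            picked_one_die_and_got_3 += one_die_box              # True counts as 1
    return picked_one_die_and_got_3 / got_3

print(simulate())  # should print something close to 3/4 = 0.75
```

Notice that the simulation only keeps the trials where the total was 3; that restriction is exactly the conditioning in P(A|B).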
A Markov chain is a series of random events where the outcome of event n depends only on the outcome of event n-1. For example, let's say there is a number line where each integer has a dot on it, and let's say that there is also an evil but useless robot standing on position 0. The robot has a quarter, and he flips it to determine whether he should go left or right (let's say right is the positive direction). If he gets heads, he'll move to position 1, and if he gets tails he'll move to position -1. This is the first random event. Now he repeats this process over and over to generate a series of random events, where each time if he gets heads he moves 1 step right from his current position and if he gets tails he moves 1 step left from his current position. This is a Markov chain because where the robot goes on his next step depends only on his current position and not on where he's been in the past. The outcome of each event is the robot's current position.
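Here is a minimal sketch of that walk in Python (the helper walk is just an illustrative name). Notice that the loop body only ever looks at the current position, which is the Markov property in action:

```python
import random

def walk(n_steps):
    """Simulate the robot's coin-flip walk and return its final position."""
    position = 0
    for _ in range(n_steps):
        # Heads: one step right (+1).  Tails: one step left (-1).
        position += 1 if random.random() < 0.5 else -1
    return position

print(walk(100))  # the robot's position after 100 coin flips
```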
Now let's study this Markov chain a little bit to see how it behaves. One question we might ask is: what is the average position of the robot after n steps? (A more technical way of phrasing this is to ask for the expected value of the robot's position after n steps.) To measure this we could make the robot take n steps, record its position, put it back on zero, repeat these steps many times, and then average the results. It's actually pretty easy to see that the average should be 0, independent of what n is. To prove this, we can use mathematical induction. Let's say n is 0. Then the robot never moves at all, so its average position is 0. Now let's assume that after n steps the average position of the robot is 0. Then on step n+1 the robot has a 1/2 chance of ending up one step to the right of wherever it was after n steps and a 1/2 chance of ending up one step to the left, and these two possibilities cancel out on average. Therefore the average position of the robot after n+1 steps is also 0. This proves that no matter what n is, the robot's average position after n steps is 0.
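We can also see this experimentally by running the recipe above (walk, record, reset, repeat) and averaging the recorded positions. A small sketch, assuming the same coin-flip walk as before and an arbitrary number of trials:

```python
import random

def average_position(n_steps, trials=100_000):
    """Average the robot's final position over many independent walks."""
    total = 0
    for _ in range(trials):
        position = 0
        for _ in range(n_steps):
            position += 1 if random.random() < 0.5 else -1
        total += position
    return total / trials

print(average_position(50))  # should be close to 0
```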
Another question we could ask about this Markov chain is what the robot's average distance from the zero position is after n steps. This question is a little trickier, but still doable. Let's let X_n be the robot's position after n steps (so X_n is a random variable) and let's define a new random variable Y_n = X_n^2, the squared distance from 0. Also let's write E(-) for the average value of -. Then we have the equations:

E(Y_(n+1)) = E(X_(n+1)^2) = (1/2)*E((X_n + 1)^2) + (1/2)*E((X_n - 1)^2) = E(X_n^2 + 1) = E(Y_n) + 1.

Here the second equality comes from the fact that half the time X_(n+1) = X_n + 1 and the other half of the time X_(n+1) = X_n - 1. Since the squared distance of the robot after 0 steps is 0, we see that E(Y_n) = n. In other words, the average squared distance of the robot from 0 after n steps is n, so its typical distance from 0 is about sqrt(n).
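Again, we can check the answer by simulation. This sketch estimates the average of Y_n = X_n^2 for a few values of n and compares its square root with sqrt(n) (the function name and trial count are, as before, just illustrative choices):

```python
import random
from math import sqrt

def mean_squared_distance(n_steps, trials=50_000):
    """Estimate E(X_n^2), the robot's average squared distance from 0 after n steps."""
    total = 0
    for _ in range(trials):
        position = 0
        for _ in range(n_steps):
            position += 1 if random.random() < 0.5 else -1
        total += position ** 2
    return total / trials

for n in (1, 4, 16, 64):
    # The last two columns should roughly agree, illustrating E(X_n^2) = n.
    print(n, sqrt(mean_squared_distance(n)), sqrt(n))
```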