Numb3rs 406: In Security

In this episode, Don knows that his actions led to the death of a protected witness, and he assumes that he therefore made the wrong decision about whether to interact with her. However, if the risk to the witness was low enough based on the information available to Don at the time, his decision may have been acceptable. To find out, Charlie uses Classification And Regression Tree (CART) analysis to examine Don's motivations and determine whether he was actually at fault. This method has applications in a variety of fields, including medicine.

Hindsight

Suppose you play a game with a 99% chance of winning $5, and a 1% chance of losing $5. After you play, the unthinkable happens and you lose $5. Did you make a bad decision?

The answer is no: if you were presented with the same choice again, you should again play the game, so you should feel no guilt about the decision. Similarly, Charlie was trying to determine whether Don's decision was a good one given the information he had at the time.
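
The reasoning above can be checked with a quick expected-value calculation. The sketch below is only an illustration: the function name and the assumption that declining to play leaves you with exactly $0 are ours, not part of the episode.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)

# The game from the text: 99% chance of winning $5, 1% chance of losing $5.
game = [(0.99, 5.0), (0.01, -5.0)]
# Assumption: declining to play changes nothing.
not_playing = [(1.0, 0.0)]

print(f"Play:       {expected_value(game):+.2f} dollars on average")
print(f"Don't play: {expected_value(not_playing):+.2f} dollars on average")
```

Playing has the higher average payoff, so it remains the better decision even on the rare occasions when it turns out badly.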

Most decisions that we make in day-to-day life are very complex. For example, Sara may need to decide whether she should read the news, play a game with her friends, or get some work done.

When Sara is deciding between these options, she probably weighs many conflicting pieces of information. For example, she might ask herself: Is there a good chance that something important happened in the news that she will want to know about? Are there other people around who would be interested in playing a game? Does she have some important tasks that she is supposed to be completing? She will likely have a hard time being certain that she has made the correct decision. If she wanted to make this decision using a CART analysis, she would take the following steps:


Step 1: Outcome Evaluation

First, CART analysis assumes that there was a "correct" decision to be made in each situation, but that the decision maker may not know it ahead of time. In our example, we will say there are three possible outcomes of the decision:

  1. News was correct: Sara should have read the news, since something really important happened.
  2. Game was correct: Sara should have played a game with her friends, since all her friends were leaving town in a few hours.
  3. Work was correct: Sara should have done something productive, because she had very difficult homework to turn in the next day.

Sara needs to decide how much each of the potential outcomes costs depending on her decision, and assign values to each of them. She might do it like this:


  • If I choose correctly, that does not cost me anything, so 0's go in those boxes.
  • If I was supposed to read the news, but I did something else, that is not so bad, so I will put 3's in the first row.
  • Work and Games take a long time, so if I choose one of those when I am supposed to be doing the other, I lose a lot of valuable time. I will put 10's in those boxes.
  • If I choose news when I am supposed to be doing something else, that doesn't take so long, so I will put 5's in those boxes.

Sara's Cost of Decisions

                          Sara's Decision
Correct Decision      News      Game      Work
News                     0         3         3
Game                     5         0        10
Work                     5        10         0
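
Sara's four rules can also be written down as a small function that generates the whole table. This is a minimal sketch; the names `CHOICES`, `cost`, and `COST` are hypothetical, and the numbers are simply the ones Sara chose above.

```python
CHOICES = ["News", "Game", "Work"]

def cost(correct: str, chosen: str) -> int:
    """Cost of making `chosen` when `correct` was the right decision."""
    if chosen == correct:
        return 0    # choosing correctly costs nothing
    if correct == "News":
        return 3    # missing the news is not so bad
    if chosen == "News":
        return 5    # reading the news by mistake does not take long
    return 10       # confusing Game and Work wastes a lot of time

# COST[correct][chosen]: rows are the correct decision, columns are Sara's decision.
COST = {correct: {chosen: cost(correct, chosen) for chosen in CHOICES}
        for correct in CHOICES}

for correct in CHOICES:
    print(correct, [COST[correct][chosen] for chosen in CHOICES])
```

Each printed row corresponds to one row of the cost table.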

Step 2: Predictor Variables

Sara now needs to put together a list of what kinds of information she has to help with her decision. In our example, she will have the following knowledge:

  • Whether her friends told her that they were leaving town.
  • Whether she remembers having an assignment due.
  • Whether her gut feeling says that something important happened in the news.

Each of these will have either a "yes" or "no" answer, and using these, she needs to make her decision. To do so, she has to evaluate which of these predictor variables is most accurate. She decides the following:

Being "90% sure" is not a very statistically precise notion. A CART program would use real statistics like correlation coefficients in place of these everyday terms, but the idea of the CART analysis can be understood without complete mathematical rigor.

Step 3: Splitting

Sara now needs to look at her three pieces of information and decide which of them is most reliable at splitting the outcomes into two groups of possibilities. In this case, she sees that her friends' knowledge is the most accurate "splitting variable": if her friends told her that they were leaving town, she has a high degree of certainty that she should choose "Game."

After this split, she only needs to deal with the case where her friends did not mention that they were leaving town. In that case, there is only a 10% chance that she needs to play a game immediately, so she needs to look at the rest of her information to make a decision. Her somewhat unreliable memory regarding her assigned work is her next most reliable source of information, so she needs to decide whether to use it.

To do so, she needs to weigh the costs of the various outcomes (from Step 1) against her certainty about which outcomes are correct (from Step 2). In each case, there is some probability that each choice is correct. Using the imprecise information from Step 2, we will conclude that these probabilities are:

  • If she remembers an assignment: News 30%, Game 10%, Work 60%.
  • If she remembers no assignment: News 50%, Game 10%, Work 40%.

This does not mean she should automatically choose Work in the first case and News in the second. Using the costs from Step 1, we can determine the expected cost of each choice, essentially weighing the costs against the probabilities:

Sara's Decision if she remembers an assignment

                                            News             Game              Work
Expected cost from "News" being correct:    30% * 0 = 0.0    30% * 3 = 0.9     30% * 3 = 0.9
Expected cost from "Game" being correct:    10% * 5 = 0.5    10% * 0 = 0.0     10% * 10 = 1.0
Expected cost from "Work" being correct:    60% * 5 = 3.0    60% * 10 = 6.0    60% * 0 = 0.0
Total expected cost:                        3.5              6.9               1.9

So, in this case, the lowest expected cost comes from trusting her knowledge of her assignment and choosing "Work."
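
The same weighing can be done mechanically. The sketch below is one possible way to do it, not how any particular CART program works: it takes the Step 1 cost table and the probabilities for the case where Sara remembers an assignment, and picks the choice with the lowest expected cost. The names `COST` and `expected_costs` are hypothetical.

```python
# Costs from Step 1, stored as COST[correct][chosen].
COST = {
    "News": {"News": 0, "Game": 3, "Work": 3},
    "Game": {"News": 5, "Game": 0, "Work": 10},
    "Work": {"News": 5, "Game": 10, "Work": 0},
}

def expected_costs(prob_correct):
    """Return {choice: expected cost}, given the probability that each decision is correct."""
    return {
        chosen: sum(p * COST[correct][chosen] for correct, p in prob_correct.items())
        for chosen in COST
    }

# Probabilities for the case where Sara remembers an assignment.
remembers = {"News": 0.30, "Game": 0.10, "Work": 0.60}
costs = expected_costs(remembers)
best = min(costs, key=costs.get)

for choice, value in costs.items():
    print(f"{choice}: {value:.1f}")
print("Lowest expected cost:", best)
```

Passing a different set of probabilities to `expected_costs` repeats the analysis for another branch of the tree.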

Sara's Decision if she does not remember an assignment

                                            News             Game              Work
Expected cost from "News" being correct:    50% * 0 = 0.0    50% * 3 = 1.5     50% * 3 = 1.5
Expected cost from "Game" being correct:    10% * 5 = 0.5    10% * 0 = 0.0     10% * 10 = 1.0
Expected cost from "Work" being correct:    40% * 5 = 2.0    40% * 10 = 4.0    40% * 0 = 0.0
Total expected cost:                        2.5              5.5               2.5

In this case, "Work" is tied with "News" for the lowest expected cost, so Sara can again choose to do some work. Even though it is more likely that "News" was the correct choice, the higher cost of neglecting an assignment cancels out that advantage.

This analysis could be repeated for the "Gut Feeling" information, and would yield the same result: regardless of her gut feeling about the news, she should choose to do some work in this situation.


Step 4: Building a Tree

From the splitting analysis done in Step 3, Sara can build a decision tree. In our example, the tree first asks whether her friends said they were leaving town: if so, Sara chooses "Game"; if not, she chooses "Work," whatever her memory or her gut feeling says.
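
A decision tree is just a sequence of yes/no questions, so it translates directly into code. The sketch below encodes the tree implied by Step 3; the function and parameter names are hypothetical, and the second parameter is kept only to show that its answer never changes the result.

```python
def sara_decision(friends_leaving: bool, remembers_assignment: bool) -> str:
    """Walk Sara's decision tree from the first split down to a choice."""
    if friends_leaving:
        return "Game"   # first split: her friends' information is the most reliable
    # The split on remembers_assignment is pruned, because Step 3 selects
    # "Work" on both of its branches.
    return "Work"

for friends in (True, False):
    for assignment in (True, False):
        print(friends, assignment, "->", sara_decision(friends, assignment))
```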


Summary

A CART analysis of Sara's decision helped her decide which pieces of information she should use, and which were not important. Interestingly, even though she had three pieces of information, only one of them was worth using, although that would not have been the case if Sara were more confident in her knowledge about assignment due dates. As a result, Sara never chooses to read the news: her uncertainty about her homework and her priorities regarding her friends will always lead her to choose one of the other two options.

In an actual application of CART analysis, there would be many more predictor variables, much more precise statistics underlying it, and many more outcomes. In the case of a medical patient, for example, a doctor could have hundreds of pieces of data regarding the patient's health, all of which are potential predictor variables, and there would be several possible choices to make regarding treatment, each with different costs (in terms of both health and money!). Normally a computer program would be used to analyze the probabilities and create an optimal decision tree.

In this Numb3rs episode, Charlie was trying to find out whether or not Don should have interacted with the witness. The possible outcomes were the safety of the witness and the death of the witness, and Don's possible choices were interacting with the witness or following protocol. In order to perform a CART analysis, Charlie would have had to answer some very difficult questions!

In Step 1, Charlie would need to assign numerical costs to each of the outcomes.

Don's Cost of Decisions

                                      Don's Decision
Correct Decision      Interact                   Don't Interact
Interact              (Enjoyment from dating)    (Missed dating opportunity)
Don't Interact        (Death of the witness)     (Neutral result)

Activity: Assigning Costs

Fill out the cost matrix with numbers! If the enjoyment from dating has a value of 1 (that is, a cost of -1), and a missed dating opportunity costs 1, how many cost units is the death of the witness? Charlie would need to decide this.

Don's Cost of Decisions

                            Don's Decision
Correct Decision      Interact      Don't Interact
Interact                    -1                   1
Don't Interact            ____                   0

After the CART analysis was completed, Charlie determined that Don was not at fault: given the information available to him, Don correctly weighed the safety of the witness against his potential enjoyment from dating, and the optimal decision tree would have pointed him toward interacting with the witness. This conclusion depends heavily on the numerical values assigned in Step 1!
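
To get a feel for how strongly the verdict depends on those values, the sketch below compares the expected cost of Don's two choices using the activity's cost matrix. Everything beyond that matrix is an assumption made for illustration: `p_danger` is the probability that "Don't Interact" was the correct decision, and the sample values of `death_cost` are arbitrary, not numbers from the episode.

```python
def expected_costs(p_danger: float, death_cost: float) -> dict:
    """Expected cost of each of Don's choices.

    p_danger   -- assumed probability that "Don't Interact" was the correct decision
    death_cost -- cost units assigned to the death of the witness (the blank cell)
    """
    p_safe = 1.0 - p_danger
    return {
        "Interact": p_safe * -1 + p_danger * death_cost,
        "Don't Interact": p_safe * 1 + p_danger * 0,
    }

def break_even_probability(death_cost: float) -> float:
    """Largest p_danger at which interacting is no worse than not interacting."""
    # Interact <= Don't Interact
    #   <=>  -(1 - p) + p * D <= (1 - p)
    #   <=>  p <= 2 / (D + 2)
    return 2.0 / (death_cost + 2.0)

for death_cost in (10, 1_000, 1_000_000):   # illustrative values only
    threshold = break_even_probability(death_cost)
    print(f"death cost {death_cost}: interact only if p_danger <= {threshold:.6f}")
```

The larger the cost assigned to the death of the witness, the smaller the risk Don could accept and still be making the optimal choice.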