In this episode, Charlie and the FBI team try to find a mole in the federal government that is leaking secrets to the Chinese.

| Apple # | Mass | Volume |
|---------|------|--------|
| 1 | 0.15 | 10.5 |
| 2 | 0.05 | 2.5 |
| 3 | 0.18 | 8.5 |
| 4 | 0.10 | 5.5 |
| 5 | 0.25 | 13.5 |
| 6 | 0.35 | 19.5 |
| 7 | 0.40 | 15.5 |
| 8 | 0.22 | 11.5 |

It is hard to notice any pattern in the data just by looking at the table. However, when the data is plotted on graph paper, it becomes clear that the two measurements are related; in fact, the points almost lie along a straight line:

Principal component analysis is a mathematical technique for finding these patterns automatically. To apply the technique, we use the following steps:

- Create a matrix *X* that contains all of the measurements. Each row of the matrix corresponds to a single observation and each column corresponds to a measurement.
- Compute the *mean training vector* by averaging together all of the rows of *X*. Call the mean vector *u*.
- Subtract *u* from every row of matrix *X*, to produce a new matrix *Y*.
- Compute the *covariance matrix C* of *Y*, using the formula *C* = *Y*^{T}*Y*.
- Now we solve the equation *Cx* = λ*x*, where *x* is a vector and λ is a number. In general there are many values of *x* and λ that make this equation true. The possible values of λ are called the *eigenvalues* of *C*, and the corresponding solutions for *x* are called its *eigenvectors*.

Then we compute the mean vector, which is *u* = (0.2125, 10.875).

Step 4 gives us the following covariance matrix: *C* = [ 0.0996 4.2225 ; 4.2225 207.875 ]

Finally, we compute the eigenvalues and eigenvectors. The eigenvalues are about 208 and 0.003, and the eigenvectors are (0.0181, 0.999) and (-0.999, 0.0181). The two eigenvectors give us a new set of coordinates that explain the data more compactly. That is, the results of PCA say that the data lies mostly along the vector (0.0181, 0.999), since the corresponding eigenvalue is so much larger than the other. In fact, if we extend the vector (0.0181, 0.999) and overlay it on the graph of training exemplars, we see that the majority of apples lie along this line:
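The computation above can be reproduced in a few lines of Python. This sketch assumes the covariance matrix is formed as *Y*^{T}*Y* without normalization, which reproduces the dominant eigenvalue of about 208; the smaller eigenvalue and the eigenvector entries come out slightly different from the rounded figures quoted in the text:

```python
import numpy as np

# Mass and volume measurements for the eight apples, one row per apple.
X = np.array([[0.15, 10.5], [0.05,  2.5], [0.18,  8.5], [0.10,  5.5],
              [0.25, 13.5], [0.35, 19.5], [0.40, 15.5], [0.22, 11.5]])

u = X.mean(axis=0)   # mean training vector: (0.2125, 10.875)
Y = X - u            # subtract the mean from every row
C = Y.T @ Y          # covariance matrix (unnormalized form)

# eigh returns eigenvalues in ascending order for a symmetric matrix.
vals, vecs = np.linalg.eigh(C)
print(u)             # mean vector
print(vals[-1])      # dominant eigenvalue, about 208
print(vecs[:, -1])   # dominant eigenvector, nearly parallel to the volume axis
```

The dominant eigenvector has a second component close to 1, confirming that the data varies mostly along one direction.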

To understand how eigenfaces work, we have to understand how images
are stored in a computer. A digital image is represented as a set
of *pixels* or points of light. For example, each of the face
images shown to the right is stored as a 40x40 grid of pixels, and
each pixel has a brightness value from 0 to 255. Mathematically, we
can think of an image as a vector of length 1600 (since 40*40=1600),
where each entry in the vector is in the range [0,255]. A specific
image is represented as a single point in this high dimensional space.

How would we go about comparing two face images to decide if they are
of the same person? One idea would be to compare the brightness
values of the two images pixel-by-pixel and to declare a match if all
of the pixel values were similar. We could accomplish this by
computing the *Euclidean distance* between the two points in the
high dimensional space. Unfortunately, this approach does not work,
because two pictures of the same person can be just different enough
that no two corresponding pixels actually have the same value. For
example, the four images to the right are of the same person, but the
pixel values in every one of them are different.
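The pixel-by-pixel comparison just described can be sketched as follows; the images here are random stand-ins for real 40x40 face scans:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical 40x40 grayscale images, flattened to length-1600 vectors.
img_a = rng.integers(0, 256, size=1600)
img_b = img_a.copy()
img_b[0] += 1 if img_b[0] < 255 else -1  # change one pixel by one brightness level

# Euclidean distance between the two points in 1600-dimensional space.
dist = np.linalg.norm(img_a - img_b)
print(dist)  # 1.0
```

Even a one-level change in a single pixel makes the two vectors unequal, which is why exact matching fails in practice.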

In fact, virtually any two images of the same person are going to be
at least slightly different and thus be represented as different
vectors. It turns out that the number of possible images of the same
person is unbelievably large. Suppose we either add or subtract one
brightness value from every pixel in one of the above images. This
would change the image very slightly but the face would still be
recognizable. How many possible ways are there to do this? The answer
is 2^{1600}, or about 10^{482}. This is an enormously large number; it
is vastly more than the estimated 10^{80} atoms in the observable
universe!
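The size of that count is easy to verify directly, since Python handles arbitrarily large integers:

```python
from math import log10

# Each of the 1600 pixels independently changes by +1 or -1,
# so there are 2**1600 distinct perturbed images.
n_images = 2 ** 1600

# 1600 * log10(2) gives the base-10 exponent of that count.
exponent = 1600 * log10(2)
print(exponent)            # about 481.6, i.e. roughly 10^482
print(len(str(n_images)))  # 482 decimal digits
```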

Our approach suffers from what is called the *curse of
dimensionality*: we are representing our images in a way that is too
complicated. Our data has too many degrees of freedom compared to what
we need to accurately represent a face.

The technique Charlie uses to address this problem is called
Eigenfaces. The idea is to transform a face image into a much lower
dimensional space that represents only the most important parts of a
face. To apply the Eigenface technique, we first collect a *training
set* of many (hundreds or thousands) of facial images. Each of the
training images is represented as a vector of length 1600, as we saw
before. Then we create a matrix *X* that contains all of the
training vectors, one per row. The resulting matrix has size N x 1600,
where N is the number of training images. Then we perform PCA on the
matrix using the steps given above.
Just like before, PCA is used to reduce the dimensionality of the face
data. The result is a set of eigenvalues and eigenvectors, as before,
but researchers have given eigenvectors of faces a special name:
eigenfaces. Usually only a small number (e.g. 10) of eigenfaces are
kept after PCA is performed. Each of these eigenfaces is a sort of
"prototype" of some general face characteristics. Here are what some Eigenfaces look like:

Any given face can be encoded as a combination of these eigenfaces. For example, your face might be 10% of the first eigenface, 30% of the second, 60% of the third, and 0% of the fourth. Eigenfaces thus let us represent a face as a point in a low-dimensional space, instead of the 1600-dimensional space required before.
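The encoding step can be sketched with synthetic data. The "training faces" here are random length-100 vectors standing in for real length-1600 images, just to keep the example fast; the steps are the same ones described above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: 50 face images, each a length-100 vector.
X = rng.random((50, 100))
u = X.mean(axis=0)
Y = X - u

# Eigenfaces are the eigenvectors of the covariance matrix.
# eigh sorts eigenvalues in ascending order, so reverse the columns
# to put the largest eigenvalues first, then keep the top 10.
vals, vecs = np.linalg.eigh(Y.T @ Y)
eigenfaces = vecs[:, ::-1][:, :10]

# Encode one face as 10 coefficients instead of 100 pixel values.
face = X[0]
coeffs = eigenfaces.T @ (face - u)
print(coeffs.shape)  # (10,)
```

The ten coefficients are the "percentages" of each eigenface mentioned above; two faces can now be compared by comparing their coefficient vectors.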

- Suppose you have a very simple camera that takes 10 x 10 pixel images, and every pixel can take on one of only 2 brightness values. How many possible images can your camera produce? If you were to take two pictures randomly, what is the probability that the two pictures would be exactly the same?
- Suppose you know that the eigenvalues of the following matrix are -1 and -2: [ 0 1 ; -2 -3] What are the corresponding eigenvectors?

Brachistochrone Problem. An upside down cycloid is the solution to the famous Brachistochrone problem (curve of fastest descent) between two points; that is, the curve between two points that is covered in the least time by a body that starts at the first point with zero speed, under the action of gravity and ignoring friction. This problem was posed by Johann Bernoulli in the *Acta Eruditorum*, the first scientific journal of Germany, in 1696. Solutions were given by four famous mathematicians: Isaac Newton, Jakob Bernoulli, Gottfried Leibniz, and Guillaume de l'Hôpital.

In a few words, it is the curve traced by a point on a circle as the circle rolls along a straight line. In the activity below we study the mathematical equations that describe this curve.

- Consider the figure to the right and argue that the distance between O and A is ra.
- Using the result from above, what is the value of x in terms of a and r? [Hint: Use trigonometric functions]
- What is the value of y? [Hint: Use trigonometry again]
- Compute the length of the segment of the curtate cycloid that has been drawn in the picture to the right.

- The problem is to minimize a function *f*(*x*) of variables *x*_{1}, *x*_{2}, ..., *x*_{n} over a region of feasible solutions *S*. That is, we are interested in finding min_{*x* ∈ *S*} *f*(*x*).
- The set *S* is usually determined by general conditions on the variables, such as requiring them to take values in the non-negative integers.

In our example, we can just go ahead and try every single element in *S* to find which pair maximizes the objective function *f*. Of course, there are many techniques for finding the optimal solution, depending on the particular problem. Branching and bounding algorithms constitute one such technique.

There exists a combinatorial optimization technique called *branching and bounding* (hereinafter called B&B), which basically consists of searching for the best solution among all possible solutions to a problem by dividing the set of solutions into subsets and using bounds on the function to be optimized. Since there is some enumeration of solutions involved in the process, this technique can perform very badly in the worst case; however, the integration of B&B algorithms with other optimization techniques has proved useful for a wide variety of problems.

Here is a description of the three main components involved in B&B algorithms for minimization problems (the case for maximization problems is dealt with similarly):

- A *bounding function* *g*(*x*) (upper or lower, depending on the case) for our objective function *f* that allows us to discard some subsets of *S*.
- A *branching rule* that tells us how to divide a subset of *S* when we cannot discard it as not containing the optimal solution we are looking for.
- A *strategy* for selecting the subset to be investigated in the current iteration.

B&B algorithms are an example of divide-and-conquer algorithms: the idea is to break the problem down into smaller problems that are easier to solve. This technique has proved useful in solving instances of the traveling salesman problem with 40 to 60 cities, as well as the graph partitioning problem.

We illustrate the technique with the following example.

**Example:** Maximize the function Z = 21x_{1} + 11x_{2} subject to 7x_{1} + 4x_{2} ≤ 13, with x_{1} and x_{2} non-negative integers.

**Solution:** The set *S* is described by the inequalities 7x_{1} + 4x_{2} ≤ 13, x_{1} ≥ 0, x_{2} ≥ 0. Since we cannot discard *S*, we subdivide it into smaller pieces, thus making the original problem smaller. For instance, let us focus on the subset of *S* where x_{2} = 0. Then Z attains its maximum when x_{1} = 13/7 (approx. 1.857). This first step was used as our branching rule, and now we focus on two subsets of *S*, *S*_{1} and *S*_{2}, as shown in the picture below. We easily see that there are no feasible solutions in *S*_{2}, and that the solution on *S*_{1} is Z = 37.5, which is attained with x_{1} = 1 and x_{2} = 1.5; this value of x_{2} is used for our next branching. We now consider the problem on two new subsets, *S*_{11} = {solutions with 0 ≤ x_{1} ≤ 1 and 0 ≤ x_{2} ≤ 1} and *S*_{12} = {solutions with 0 ≤ x_{1} ≤ 1 and x_{2} ≥ 2}. Solving the relaxed problem (maximizing Z) on *S*_{12} yields Z = 37, which is attained by x_{1} = 0.71 and x_{2} = 2, so we branch on x_{1} and consider the sets {x_{1} = 1, x_{2} ≥ 2} and {x_{1} = 0, x_{2} ≥ 2}. Solving the problem on *S*_{11} yields Z = 32, which is attained by x_{1} = x_{2} = 1. Continuing in this manner, we obtain the answer Z = 33, which is attained when x_{1} = 0 and x_{2} = 3. This answer is in the set *S*_{1211}.
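A minimal Python sketch of a B&B search for problems of this shape (maximize c·x subject to a·x ≤ b over non-negative integers, with positive coefficients). The bounding function here is a fractional relaxation of the constraint, one simple choice among many; the function name and structure are illustrative, not taken from the text:

```python
def branch_and_bound(c, a, b):
    """Maximize c.x subject to a.x <= b, x non-negative integers.

    Assumes all entries of c and a are positive. Uses a fractional
    (rate-based) upper bound to discard subsets that cannot contain
    the optimum.
    """
    n = len(c)
    best = {"value": float("-inf"), "x": None}

    def bound(i, value, cap):
        # Bounding function: fill the remaining capacity at the best
        # value-per-unit rate among the variables not yet fixed.
        rates = [c[j] / a[j] for j in range(i, n)]
        return value + (max(rates) * cap if rates else 0)

    def explore(i, x, value, cap):
        if i == n:
            if value > best["value"]:
                best["value"], best["x"] = value, list(x)
            return
        if bound(i, value, cap) <= best["value"]:
            return  # discard this subset: it cannot beat the incumbent
        # Branching rule: try every feasible integer value of x_i.
        for v in range(cap // a[i], -1, -1):
            explore(i + 1, x + [v], value + c[i] * v, cap - a[i] * v)

    explore(0, [], 0, b)
    return best["value"], best["x"]

# The example above: maximize 21x1 + 11x2 subject to 7x1 + 4x2 <= 13.
value, x = branch_and_bound([21, 11], [7, 4], 13)
print(value, x)  # 33 [0, 3]
```

The depth-first recursion plays the role of the selection strategy, and the `bound` check is what lets the search skip subsets without enumerating them.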

The picture above depicts the process used to solve our problem. We considered 9 subsets of *S*; that is, the number of vertices in the graph depicted above is 9.

### Activity 3

- Use the procedure just explained to maximize 2x + 3y, given that 10x + 3y ≤ 24, for non-negative integers x and y.
- Maximize -x + y, given that 12x + 11y ≤ 63, -22x + 4y ≤ -33, for non-negative integers x and y.
- \* Let n be a positive odd integer. Consider the following optimization problem: maximize -x_{0} subject to x_{0} + 2(x_{1} + ... + x_{n}) = n and 0 ≤ x_{j} ≤ 1 for j = 0, 1, ..., n, where all the variables are integers. Show that the number of subsets that need to be considered (the number of vertices in the graph) is at least 2^{(n-1)/2}.