After one of David's childhood friends, Ben Ellis, throws a deadly punch fighting in a mixed martial arts match, he becomes a murder suspect - compounded by a previous virtually identical incident and drugs found in the dead man's system. While things look bleak for Ben, David promises to ask the team to look into the case. They're racing against the clock on two ends - one, fighting to keep Ben out of jail and two, trying to find out who the real killer is before Ben fights in the All-Out ultimate fighting match!
Charlie uses some sophisticated graph theory and statistical techniques to analyze the win/loss ratios of the main fighters involved in the mixed martial arts league, and discovers a disturbing trend - the fighters rising to the top aren't necessarily those that are the best, just those that have had their schedules managed the most judiciously. One of the techniques Charlie uses, Kruskal's algorithm, we can also get our hands on!
Let's start with a more basic subject though - what is graph theory? To talk about the algorithm, we first have to know what Charlie even means when he's talking about "graphs"!
One of the main branches of mathematics is a field called "combinatorics" - essentially, this is the study of counting. Before you laugh at the idea that someone could devote their life to something as simple as "counting," try and figure out these quick questions and see how easy/hard they really are:
When we talk about a graph in this case, we don't mean the "graph of a function." We mean an entirely different type of graph.
Let's look at these for a minute. The first graph only has two edges, and one "isolated" vertex (meaning there's a vertex that's not connected to any other vertex by an edge.) The second graph has no isolated vertices - in fact, every possible edge is already there. The third graph has no edges (and hence, four isolated vertices.) We say that this is "totally disconnected." The fourth graph is just a loop of length four (we call these "cycles".)
How many cycles does the second graph have? Well, this is a bit of trick question. If we're allowed to go over the same edge more than once, then it has infinitely many. If we're not allowed to go over the same edge more than once, then there are only cycles of lengths three and four in the second graph. How many cycles are there total?
While there are hundreds of classes of graphs that combinatorialists (mathematicians that study counting, deep down) would find of interest, two classes of graphs in particular that are worth looking at are "trees" and "chordal graphs." A tree is a graph where there are no cycles - essentially, these are the graphs that if you put your pencil down on one of the vertices and start traveling along edges, if you never double back, you can NEVER get back to the same vertex, no matter which vertex you started with! These graph have no loops, but while they sound easy, they can get pretty complicated pretty quickly. Here are some examples:
Trees show up pretty often in the construction of networks, although most often it's in the context of taking a given graph G and trying to remove edges from it until we wind up with ONLY a bunch of trees (so we are trying to get rid of all of our cycles!) We'll get back to this in a bit when we talk about Kruskal's algorithm.
JPGGraphs show up just above everywhere - from networks and connections to travel planning, they're entirely ubiquitous. However, given an arbitrary graph, it might be really, really complicated. We'd like to look at subset of our graph that are a little more manageable.
Let's go back to looking at general graphs for a minute. Given an arbitary connected graph G (where connected just means that given any two vertices, we have some path we can walk along edges to go from one to the other,) we don't necessarily have a tree - in fact, trees are relatively rare among the class of all connected graphs. However, we CAN take our original graph, and gradually remove edges until we get down to a subgraph that is a tree - in fact, we can do this and end up with a graph that's still connected! Let's look at a particular graph and see how this works.
The red highlighted part of the graphs above are our spanning trees. There are lots of other spanning trees though. As a small exercise, find at least three more distinct spanning trees. We'll eventually want a more specific algorithm to pick out "nice" spanning trees, but for now, while we work with unweighted graphs, this is fine.
Before we move on, here are some examples of subgraphs of our original graph that aren't spanning trees. Look at each of them, and explain why.
So far, we've been talking about simple graphs G. The only data we have is which edges connect different vertices. However, there's another natural extension of simple graphs that arise constantly in scientific and commercial applications constantly. The graphs, called "weighted graphs," also encode information about how much each edge "costs" to cross or use.
Here's our earlier unweighted graph. Compare that to a weighted version of the same graph (we just assign positive values or "costs" to each edge.)
Easy enough, right? Weighted graphs are frequently used to measure cost to travel or transport goods from place to place, or to measure flow of information or commodities in a given grid. You've probably run into weighted graphs yourself - for example, on most paper maps either driving times or distances are given between exits or major cities. The places where roads intersect form the vertices, and the roads form the weighted edges. When MapQuest or Google Maps is calculating the best route to take from point A to point B, they're also using the weighted graph of roads and location hubs, then minimizing travel time using sophisticated search methods.
The map to the right is a little more sophisticated than a basic weighted graph - now we have two different "types" of edges, both express and local. However, we can safely ignore this complication when considering this map if we restrict our attention to the blue edges.
One question that you'd probably be interested in is how to construct "minimal" spanning trees for these graphs. Before, when we were constructing spanning trees for our graphs, we just kept picking edges until we couldn't pick any more - and at that point, we were done, possessing our very own spanning tree for our original graph. Now though, different spanning trees won't necessarily be "worth" the same amount - if we add up the values along the edges in two different spanning trees of our graph G, one might be cost way more than another to have goods or information flow along our edges.
Kruskal's algorithm gives us a way to compute minimal spanning trees of our weighted graph - and it's pretty easy to do too! Start with your original weighted graph G.
This algorithm will always terminate (we add an edge at each stage, and we only started with a finite set of edges) and will always produce a minimal spanning tree. However, we DO have some choice at each stage, so our spanning trees aren't necessarily unique. Here's a sample graph and a demonstration of the algorithm: