1. Introduction to Sudoku

2. Solving Strategy

3. Counting Solutions

4. Solution Symmetries

5. The 4×4 Case

6. Some More Interesting Facts

7. References

The Math Behind Sudoku

Solution Symmetries

The number of distinct Sudoku grids has been determined to be N=6670903752021072936960≈6.671×1021. But sometimes we can transform one filled-in grid to another in some hands-on way. For example, if we have a valid Sudoku grid, then rotating it by 90° results in another grid, which may be distinct from the first one, but is still valid. We can consider these grids to be essentially the same because the transformation applied still leaves the grid as a Sudoku solution. Similary, if we swap all the 3's with the 4's in the grid, we end up with another valid grid. We could also take a Sudoku grid, switch the places of the fifth and sixth rows, and end up with another valid grid. When we perform any of these operations on a valid grid, we preserve the property of it being valid.

These kinds of operations are called symmetries of a grid. A symmetry of an object is an operation that preserves a certain property of the object. It is interesting to note that if we perform one symmetry and then another, the resultant transformation is itself a symmetry. Also, included in the set of symmetries is the transformation that leaves the object unchanged. For any symmetry, there is another one that undoes what the first one does. Finally, performing one symmetry then a second then a third can be done by grouping either the first and second together or the second and third together, and the resultant transformation is the same either way.

These properties tell us that the set of symmetries of an object form a group. A group is a set G along with an operation · which satisfies the following properties:

  1. If g and h are elements of G, then so is g·h. (This is called closure under the group operation.)
  2. If g, h, and k are elements of G, then g·(h·k)=(g·h)·k. (This is called the associativity property of the group operation.)
  3. There is an element e in G such that g·e=e·g=g for all g in G. (This element e is called the identity element of G, since it leaves every element of G unchanged under the operation.)
  4. For any element g of G, there is another element h of G so that g·h=h·g=e, where e is the identity element. (The element h is called the inverse of g, and is denoted by g-1.)

Notice that we used multiplicative notation for the operation of a group in the above definition. We could also use additive notation, and in this case we would denote the inverse of g by -g.

Exercise: Check that the set of integers Z with the operation of addition is a group. You need to check that all four of the properties above are satisfied.

The group of integers under addition has the extra property that the order in which elements are added does not matter. That is, for any two integers a and b, a+b is the same integer as b+a. When a group operation satisfies this commutative property, the group is described as abelian.

Let's see an example of a group of symmetries. Consider the square with labeled vertices:


This geometric object has eight transformations that preserve the fact that it is a square. In other words, the square has eight symmetries. These consist of the following rotations and reflections, with the group operation being composition of transformations:

  1. Rotation by 0 degrees (the identity transformation).
  2. Rotation clockwise by 90 degrees.
  3. Rotation clockwise by 180 degrees.
  4. Rotation clockwise by 270 degrees.
  5. Reflection in the horizontal axis (through the center of the square).
  6. Reflection in the vertical axis (through the center of the square).
  7. Reflection in the diagonal from the bottom left to the upper right corner.
  8. Reflection in the diagonal from the upper left to the bottom right corner.

Exercise: Pick any two transformations from the above list and check that performing one and then the other results in a transformation that is already on the list. Can you find two transformations whose composition results in different transformations depending on the order in which they are composed?

Unlike the group of integers under addition, the group of symmetries of a square is not abelian.

The symmetry group G of a valid Sudoko grid consists of the above Euclidean transformations of the square along with some other transformations involving switching places of blocks, rows, and columns, and compositions of these transformations. The symmetry group is thus generated by all the transformations of the following types:

  1. Relabeling the nine digits.
  2. Permuting the three stacks.
  3. Permuting the three bands.
  4. Permuting the three columns within a stack.
  5. Permuting the three rows within a band.
  6. Any reflection or rotation (from the list of symmetries of a square).

It is important to note that this is not a list of all the elements of G; it is a list of the kinds of symmetries that are in the group and combine to form other elements of the group. The specific transformations of each of the above types, and any compositions obtained from them that are distinct from the above types, comprise the symmetry group G. For instance, one element of G is the swapping of the first and second rows, which is a specific transformation of type (5) above. Let X denote the set of valid Sudoku grids. We now know that this set has a finite number of elements. Since each element of G is a map that takes one valid grid to another, this tells us that G has only finitely many symmetries in it.

We call two Sudoku grids equivalent if we can get from one to the other by applying one or more of the symmetries in G. If there is no such sequence of symmetries taking one grid to another, the grids are called essentially different.

This relationship is indeed an equivalence relation in the formal mathematical sense, because it satisfies the following three properties:

  1. The grid A is equivalent to itself (this is called reflexivity).
  2. If A is equivalent to B, then B is equivalent to A (this is called symmetry).
  3. If A is equivalent to B and B is equivalent to C, then A is equivalent to C (this is called transitivity).

Given any valid Sudoku grid A, we can consider all the grids equivalent to A as being essentially the same as A. By grouping together grids that are equivalent to each other, we can actually partition the set of grids: that is, we can break X up into subsets, none of which share elements with each other. We call these subsets equivalence classes. Any two elements in one equivalence class are equivalent to each other by some symmetry in G. The set of equivalence classes is denoted by X/G, and read "X modulo G" or "X mod G."

In the previous page, we discussed the enumeration of distinct Sudoku grids, but it would also be interesting to find out how many essentially different grids exist. Based on our discussion above, the total number of equivalence classes of grids, or the number of elements in X/G, is exactly the number of essentially different Sudoku grids. We outline the method used to calculate this number by Ed Russell and Frazer Jarvis in early 2006.

First we disregard the relabeling of the digits and only look at symmetries that operate on the grid - on the whole grid, blocks, or cells. We consider these symmetries - of types (2) through (6) above - and their compositions. This gives us a group H, which Russell and Jarvis discovered contains 3359232 distinct symmetries. In other words, H is the group generated by symmetries of types (2) through (6).

Now our notion of the equivalence of two grids can be stated as follows: The grid A is equivalent to the grid B if A can be transformed by a symmetry in H to another grid C, and this grid C can be relabeled to obtain the grid B. We can say A is H-equivalent to C, and C is equivalent to B by relabeling. Note that H-equivalence and equivalence under relabeling are indeed equivalence relations (they both satisfy the three properties listed above).

We think of H as acting on the set X of valid Sudoku grids: Every h in H represents a mapping from X to X, which takes one grid in X to another (possibly the same) grid in X by applying the transformation h. In other words, h gives us a way to transform any valid grid to another valid grid, and every h in H gives us such a map.

For any h in H, we can look at the grids that are fixed by h, up to relabeling. This means all grids A such that the grid B to which h sends A is equivalent to A by relabeling. This takes care of the fact that we left relabeling out of the symmetry group H.

In order to count the number of essentially different Sudoku grids, we need a theorem from group theory called Burnside's Lemma.

Burnside's Lemma: Let G be a finite group that acts on a set X. For each g in G, let Xg denote the set of elements in X that are fixed by g. Then the number of elements of X/G is |X/G|=1/|G|Σg in G|Xg|, where |-| denotes the number of elements inside the set.

Exercise: Imagine coloring each of the edges of a square blue or green. Two colorings are called essentially the same if there is a symmetry of the square which takes one of the colorings to the other. They are essentially different if no symmetry can transform one to the other. Use Burnside's Lemma to determine how many essentially different colorings exist. (For each of the eight symmetries of the square, figure out how many colorings are fixed. It may help to draw all 24=16 colored squares and visualize the transformations. Burnside's Lemma tells you to add up the number of fixed colorings of each symmetry and divide the sum by the number of symmetries in order to get the answer.)

In our case, the finite group is H, and it acts on the set X of valid Sudoku grids. For each h in H, we want to find out how many elements of X are fixed by h, up to relabeling. Then we need to add up these numbers and divide by the number of elements in H to get our answer of how many essentially different Sudoku grids there are. This accounts to taking the average number of grids fixed (up to relabeling) by an element of H.

As an example, the following is a valid Sudoku grid that is a fixed point of the 90° clockwise rotation (up to relabeling):


Exercise: Figure out how to relabel the digits in the rotated grid so that you get the original grid back. Write down what digit 1 should be sent to, what digit 2 should be sent to, what digit 3 should be sent to, and so on. This confirms the claim that the orginal grid is a fixed point of the given rotation symmetry.

In order to use Burnside's Lemma to calculate the number of essentially different grids, we would need to determine the number of fixed points for each of the 3359232 elements of H. It turns out that some of these 3359232 transformations have the same number of fixed grids. Russell and Jarvis used a computer program called GAP to determine that there are 275 classes of symmetries such that any two symmetries in one of these classes have the same number of fixed grids, and every symmetry in H has the same number of fixed points as an element in one of these classes. So we only need to count the number of fixed points for one symmetry from each of these 275 classes, and knowing how many symmetries are in each class will enable us to find the average as in Burnside's Lemma.

Exercise: Consider the reflection of a Sudoku grid over the horizontal axis through its center, that is, over the fifth row. Note that the fifth row stays the same. Can you relabel the reflected grid in order to get the original one back? What does this tell you about the fixed grids of the reflection over the horizontal axis? Are there any fixed grids? (Think about the fact that since this is a Sudoku grid, all nine digits occur in the fifth row.)

As we see in the exercise above, there do exist symmetries in H that have no fixed points. Russell and Jarvis actually determined that 248 of the 275 classes contain symmetries which do not have any fixed grids. So they only needed to count the number of fixed grids for a symmetry from each of the remaining 27 classes and find out how many symmetries are in each of these classes before using Burnside's Lemma. They used computer programs in order to carry out the calculations and discovered that there are 5472730538≈5.473×109 essentially different Sudoku grids.