SectionIV: Test

Section IV: Test

How do we determine the quality of a generator? Or given a sequence, what can we say about how random it is? This is what we will talk about in this section -- test of the randomness.

Generally, there are two kinds of tests: theoretical tests and empirical tests. Theoretical tests study the properties of the generator and try to tell in advance (before using the generator to generate number sequence) how well the generator is. On the other hand, empirical test can tell us how random a sequence is. When used to determine whether a generator is good or not, first we need to use the generator to generate a number sequence, then apply the test to the sequence. Usually, we need to generete several sequences, and say the generator passes the test only if the several sequences all pass it.

The development of theoretical tests is quite difficult and the theory is very hard. So in this section we only introduce empirical tests.

There are many of them, we will introduce some easy and commonly used ones.

Each test is applied to a sequence

<U_n> = U₀, U₁, U₂,... (4.1)

of real numbers, which purports to be uniformly distributed between zero and one. Some of the tests are disigned primarily for integer-valued sequences, instead of the real-valued sequence (4.1). In this case, the auxiliary sequence

<Y_n> = Y₀, Y₁, Y₂,..., (4.2)

which is defined by the rule

Y_n = [dU_n],

is used in the test. Remember that [X] denotes the largest integer which is no greater than X.

This is an integer sequence which will be uniformly distributed between 0 and d-1(include 0 and d-1) if the sequence (4.1) is uniformly distributed between 0 and 1. (Pay attention to the slight difference of meanings of the two "uniformly distributed" here)

The symbols U_n, Y_n, d will have the meaning in the above fomulas in all the following tests.

Exercise 4.1: Notice that when talking about interval of real numbers, sometimes the statement is vague about whether the boundary of the interval is included. But unlike interval of integers, it does not really matter. Think about why.

A. Equidistribution test (Frequency test).

The first requirement about sequence (4.2) one would think about might be that the frequency of occurences of each integer between 0 and d-1 should be almost the same.

For each integer r, 0 ≤ r < d, count the number of times Y_j = r for 0 ≤ j < n, which we denote by r(n).

When n is very big, the quotient r(n)/n should be close to 1/d.

This last sentence will be made precise later in this section with statistical mathod.

B. Serial test.

This is a generalization of frequency test: we want pairs of successive numbers to be uniformly distributed in an independent manner. To make this test, count the number of times the pair (Y_2j, Y_2j+1) = (q, r) occurs, for 0 ≤ j <n, which is denoted by (q,r)(n). These counts are to be made for each pair of integers (q, r) with 0 ≤ q, r < d, and

when n is very big, the quotient (q,r)(n)/n should be close to 1/d².

Similar to method A, the last sentence requires more interpretation.

Clearly we can generalize this test to triples, quadruples, etc., instead of pairs.

C. Gap test.

This test is used to examine the length of "gaps" between occurences of U_j in a certain range. If u and v are two real numbers with 0 ≤ u < v ≤ 1, we want to consider the lengths of consecutive subsequences U_j, U_j+1,...,U_j+r in which U_j+r lies between u and v but the other U's do not. (This subsequence of r+1 numbers represents a gap of length r.)

For example, we set u=0.5, v=1 and want to count the number of gaps of length 0, 1 and the number of gaps of length ≥ 2 of the following sequence,until 10 gaps have been tabulated:

0.30, 0.27, 0.84, 0.70, 0.12, 0.56, 0.96, 0.28, 0.17, 0.65, 0.69, 0.85, 0.42,0.78, 0.13, 0.07, 0.51, 0.73, 0.03,...

Then we seperate the gaps with |:

0.30, 0.27, 0.84,| 0.70,| 0.12, 0.56,| 0.96,| 0.28, 0.17, 0.65,| 0.69,| 0.85,| 0.42,0.78,| 0.13, 0.07, 0.51,| 0.73|, 0.03,...

The number of gaps of lengh 0( notice the way we uesd to define the length of gap)is 5; The number of gaps of lengh 1 is 2; The number of gaps of length ≥ 2 is 3.

Generally, if we count number of gaps of length 0, 1, ..., t - 1 and the number of gaps of length ≥ t,until n gaps have been found, and we denote the number of gaps of of length i by i(n), 0 ≤ i < t, and number of gaps of length ≥ t by t(n).

When n is very big, i(n)/n will be close to p(1-p)ⁱ, 0 ≤ i < t. And t(n)/n will be close to (1-p)^t,

where p = v - u.

Exercise 4.2: Derive the above fomula.

D. Poker test(Partition test).

The "classical" poker test considers n groups of five successive integers, (Y_5j, Y_5j+1,..., Y_5j+4), 0 ≤ j < n. We observe which of the following seven patterns each quintuple matches:

All different: abcde	Full house: aaabb
One pair: aabcd	Four of a kind: aaaab
Two pairs: aabbc	Five of a kind: aaaaa
Three of a kind: aaabc

Exercise 4.3: Finish the part left out in poker test.
Exercise 4.4: Find another kind of test yourself.

In every one of the tests mentioned above, there is an unsolved problem in the end: How do we interpret the infomation collected from these tests? There are many statistical ways of doing so, here we will only introduce the one that is perhaps the best known of all statistical tests: "Chi-square" tests.

Suppose we take n independent observations and every observation can fall into one of k categories. In method A, the Frequency test, we made n oberservations( one observation is to tell what the integer is). And every observation fall into d categories( being 0,1,..., or d-1).

Suppose the ideal probobility of one observation falls into category s is p_s, then the ideal number of times would be np_s(in A, it would be n/d). Let Y_s be the number of observations which actually do fall into category s. We form the statistic

V = ∑_1≤s≤k(Y_s - np_s)²/np_s ,

which is called the "chi-square" statistic of the observed quantities Y₁, Y₂,..., Y_k. (in A, V = d*∑_0≤r≤d-1(r(n) - n/d)²/n ).

Before further discussion, let's take a look at the following table:

SELECTED VALUES OF THE CHI-SQUARE DISTRIBUTION
	P=99%	95%	P=75%	P=50%	P=25%	P=5%	1%
v=1	0.00016	0.00393	0.1015	0.4549	1.323	3.841	6.635
v=2	0.02010	0.1026	0.5753	1.386	2.773	5.991	9.210
v=3	0.1148	0.3518	1.213	2.366	4.108	7.815	11.34
v=4	0.2971	0.7107	1.923	3.357	5.385	9.488	13.28
v=5	0.5543	1.1455	2.675	4.351	6.626	11.07	15.09
v=7	1.239	2.167	4.255	6.346	9.037	14.07	18.48
v=10	2.558	3.940	6.737	9.342	12.55	18.31	23.21
v=15	5.229	7.261	11.04	14.34	18.25	25.00	30.58
v=30	14.95	18.49	24.48	29.34	34.80	43.77	50.89

The "v" in the table is called the number of "degrees of freedom" ,which is equal to k-1,i.e. one less than the number of categories.

To use this table, first find the row v = k - 1. If the table entry in row v under column p is x, it means, "If the hypothesis 'one observation falls into category s with probobility p_s' holds, the quantity V will be greater than x with probability p". If V is too big, e.g. greater than the number under column 1%, or too small, e.g. smaller than the number under comlumn 99%, we should reject the hypothesis.

Exercise 4.5: Think about why should we reject the hypothesis if "V is too small"?

In the test, a large n should be chosen. One rule is to take n large enough so that each of the expected values np_s is five or more; preferably, however, take n much larger than this, to get a more powerful test.

There is a good summery in Knuth's book [1] about chi-square test, which I cited here as the end of this section:

A fairly large number, n, of independent observations is made. We count the number of observations falling into each of k categories, and compute the quantity V. Then V is compared with the number in the table of values of chi-square distribution. If V is less than the 99-percent entry or greater than the one-percent entry, we reject the numbers as not sufficiently random. If V lies between 99 and 95 percent or between 5 and 1 percent, the number are "suspect"; if V lies between 95 and 90 percent, or 10 and 5 percent, the numbers might be "almost suspect." The chi-square tests is often done at least three times on different sets of data, and if at least two of the three results are suspect the numbers are regarded as not sufficiently random.

Exercise 4.6: Choose some linear congruential generator, produce a number sequence. Use one of the empirical tests to test the sequence and use chi-square test to give the final conclusion whether you've got a good generator.(of course, this is a quite rough conclusion)

Exercise 4.7: Use knowledge in this section to test the sequence got in exercise 1.1.