Upper Bound

Exercise 1: calculate the probability of leading digit being 1 for a number randomly chosen from 1 to 19. How about 1 to 29, 1 to 39, etc? So how should we solve the problem of having to specify a bound?

You may want to try other range such as 1 to 199. Then you see the probability keeps oscillating (between what?), never approaching a number and so you need some averaging process to get a single number. The key is to try to make the answer as independent of the bound as possible.

Answer: If we want to get rid of having to specify a bound, we can try looking at what happens as the bound gets bigger. The example above has probability 11/19, 11/29, 11/39, and we see it will oscillate between 1/9 and 5/9 but never approaching a single number. The next try is to average the probability for various upper bound: (1/1+1/2+...+1/9+2/10+3/11+...+10/18+11/19+11/20+...+11/99)/99=0.253 (this is quite difficult to calculate with a calculator, so for first try, I compute (11/19+11/29+...+11/99)/9=0.242. ). If we do this with more upper bounds, it will approach a single number around 24.1%. This is not the the desired answer(30.1%) but we are not far away. As explained in the literature, averaging once is not enough to get rid of the effects of having to specify the upper bound. If we take the data from the first average and average them again, it will become closer. To eliminate the upper bound effect completely, we have to repeatedly average the average and this will yield the answer, although we will not actually compute this limit here. For those interested, the result is due to Flehinger and one method of computing is presented here in one of the .ps file.

The iterated averaging scheme is only one of many methods because the main factor is how well we get rid of the effects of choosing an upper bound. For example, our first idea is to use a random number generator. Instead of choosing an upper bound, we will use a second random number generator with upper bound 900,000 to obtain the upper bound for the first generator. Or we can use a third generator to get the bound for the second one instead of specifying the upper bound of the second generator. Of course we will eventually need to make a choice of upper bound, this method works very well after chaining it a few times. In fact we can use other distributions (random number generator is uniform distribution), such as normal distribution (which has parameters to be specified as well). This explains why Benford's Law seems to work especially well if you combine data from different sources (such as numbers in the newspaper).

For a very detailed exposition of everything, click here

Next: equidistribution or if you are tired, go Home