Data Desk: First steps

 

Law of large numbers

1. Menu Manip, Generate random numbers

Choose Bernoulli, probability of success 0.5, 3 variables, 10 000 cases (or 100 000).

2. A window appears containing the variables BTrials1, ..., BTrials3.

3. Menu Manip, Transform, New derived variable

A window appears; enter a name (e.g. ave).

A second window appears; enter the formula:

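GetCase(CumSum('BTrials1'),'grid')/'grid'

(CumSum gives the running total of the trials, GetCase picks out that total at case number 'grid', and dividing by 'grid' turns it into the average of the first 'grid' trials.)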

4. Generate the variable 'grid':

Menu Manip, Generate patterned data.

Generate the numbers from 1 to 10000 in steps of 1.

A new variable appears; name it 'grid'.

5. Select the derived variable ave.

Menu Plot, Lineplots.

A plot appears showing how the cumulative average develops as the sample grows.

The picture resembles Brownian motion: wiggly at the beginning, it then settles down and converges to the probability of success (0.5).

6. Look at the picture in detail. Magnify it (magnifying lens), drag the region of interest back to the center, magnify again, and so on. The magnification works as follows: when the mouse cursor is in the center of the picture, a click zooms out; when it is in the outer region, a click zooms in. This way you can see the "monster" in great detail as well as as a whole. Each sample gives you a different monster.
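The Data Desk steps above can be mimicked outside the program to see exactly what is being plotted. The following is a minimal Python sketch, not Data Desk itself: the helper names `bernoulli_trials` and `running_average` are hypothetical, and Python's `random` module stands in for Manip, Generate random numbers. The loop computes the same quantity as GetCase(CumSum('BTrials1'),'grid')/'grid' for every value of 'grid'.

```python
import random

def bernoulli_trials(n, p, seed=None):
    """Return n independent 0/1 trials with success probability p
    (stands in for one BTrials variable)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def running_average(trials):
    """Average of the first n trials for n = 1, 2, ..., len(trials).

    Entry n-1 equals GetCase(CumSum(trials), n) / n, because Data Desk
    cases are numbered from 1 while Python lists start at 0."""
    averages = []
    total = 0
    for n, x in enumerate(trials, start=1):
        total += x                 # running sum, like CumSum
        averages.append(total / n)
    return averages

if __name__ == "__main__":
    ave = running_average(bernoulli_trials(10_000, 0.5, seed=1))
    # Print the cumulative average at a few sample sizes ('grid' values).
    for n in (10, 100, 1_000, 10_000):
        print(n, ave[n - 1])
```

Rerunning with a different seed gives a different path (a different "monster"), but every path ends near 0.5; the typical distance from 0.5 after n trials is about sqrt(p(1-p)/n), i.e. 0.005 for n = 10 000.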

 

  

Central limit theorem

(Unfortunately this does not work on the version of Data Desk in the Computer Lab.

The formula in step 8 below has to be changed in some way to calculate binomial probabilities.

But the procedure shows how to work with sliders.)

 

  1. Menu Data, New, Slider.
  2. A window appears; name the new slider variable p.

    Another window appears with a picture: a scale from 0 to 1.5,

    with the vertical axis at 1.

    (This will be our variable probability of success).

  3. Click the upper left corner of this window and choose "plot scale".
  4. Set the upper bound to 1 and the lower bound to 0, and close the dialog with OK.

  5. Repeat step 1 (get a new slider) and call it N.
  6. In the "plot scale" dialog, set the lower bound to 0, no upper bound, and an interval size of 500.

    (This will be our variable binomial sample size).

  7. Menu Manip, Transform, New derived variable. Call it binomprob.
  8. A window appears in which you can write a formula.

    Write

    N*BinomDistr(N*('grid'/10000),N,p)

    (This calculates the binomial probabilities.)

    Close this window by clicking its upper right corner.

  9. The variable just defined is still selected (its icon has a "Y"). Go to Menu Plot, Lineplots. A picture appears showing the binomial probabilities for the current N and p, plotted against 'grid' (1 to 10000). Click the upper right corner of this picture and select "turn on automatic update" and "freeze scale".
  10. Move the p slider and the N slider and watch the curve changing. This shows the varying shape of the binomial distribution.
  11. Get a new slider and call it ‘s’. For our purposes it is only a display parameter (it sets the scale of the picture). Set the lower bound to 0.
  12. Repeat step 7: get a new derived variable and call it ‘center’.
  13. Write the formula

    Sqrt(s*N)*BinomDistr((s*('grid'-5000)/10000)*(Sqrt(N*p*(1-p)))+N*p,N,p)

    This also calculates binomial probabilities, but in the center of the distribution (around Np with an appropriate scale).

    Close the formula window.

     

  14. Repeat step 9 for this newly defined variable.
  15. Repeat step 10 and watch the two diagrams change. The second picture (center) can also be enlarged (click the second field from the right in the upper right corner of its window).

Also change the s slider (which only sets the scale of the picture) to get a better view of the curve in ‘center’.
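To see what the two derived variables binomprob and center compute, here is a Python sketch that evaluates both formulas at a given value of 'grid'. It rests on an assumption: that Data Desk's BinomDistr(x, n, p) returns the binomial probability P(X = x), with non-integer x rounded down and values outside 0..n giving 0. That rounding rule is a guess, not documented Data Desk behavior, and `binom_pmf` is a hypothetical helper. Calling the functions with new N, p and s plays the role of moving the sliders.

```python
import math

def binom_pmf(x, n, p):
    """P(X = floor(x)) for X ~ Binomial(n, p).

    ASSUMPTION: stands in for Data Desk's BinomDistr; the floor rounding
    and the zero outside 0..n are guesses, not documented behavior."""
    k = math.floor(x)
    if k < 0 or k > n:
        return 0.0
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    # Work in log space so that large N (thousands) does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def binomprob(grid, N, p):
    """N*BinomDistr(N*('grid'/10000),N,p): 'grid'/10000 runs over (0, 1],
    so the horizontal axis is the success proportion k/N."""
    return N * binom_pmf(N * (grid / 10000), N, p)

def center(grid, N, p, s):
    """Sqrt(s*N)*BinomDistr((s*('grid'-5000)/10000)*Sqrt(N*p*(1-p))+N*p,N,p):
    the horizontal axis is centered on N*p, in units of Sqrt(N*p*(1-p)),
    and spans about s of these units in total."""
    x = (s * (grid - 5000) / 10000) * math.sqrt(N * p * (1 - p)) + N * p
    return math.sqrt(s * N) * binom_pmf(x, N, p)

if __name__ == "__main__":
    grid = range(1, 10_001)
    curve = [binomprob(g, 50, 0.3) for g in grid]   # picture from step 9
    bell = [center(g, 50, 0.3, 8) for g in grid]    # picture from step 14
    print(max(curve), max(bell))
```

Working with logarithms of the factorials via `math.lgamma` keeps the sketch usable for the large N values the N slider allows.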

The curve in ‘center’ approximates the normal density curve to a varying degree: the larger N is and the closer p is to ½, the better the approximation.

Watch the normal approximation break down as p approaches 0 or 1 or N becomes small.
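This rule of thumb can also be checked numerically. The sketch below compares the standardized binomial probabilities with the standard normal density: for each k = 0..N it computes sigma*P(X = k), where sigma = sqrt(Np(1-p)), and the normal density at z = (k - Np)/sigma, and reports the largest gap. It uses the standard scaling by sigma rather than the display factor Sqrt(s*N) in the ‘center’ formula, and the function name `max_density_gap` is hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), integer 0 <= k <= n, 0 < p < 1."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def normal_density(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def max_density_gap(N, p):
    """Largest |sigma*P(X=k) - phi((k - N*p)/sigma)| over k = 0..N."""
    if not 0 < p < 1 or N < 1:
        raise ValueError("need N >= 1 and 0 < p < 1")
    mu = N * p
    sigma = math.sqrt(N * p * (1 - p))
    return max(abs(sigma * binom_pmf(k, N, p) - normal_density((k - mu) / sigma))
               for k in range(N + 1))

if __name__ == "__main__":
    for N in (10, 100, 1000):
        for p in (0.5, 0.1, 0.02):
            print(f"N={N:5d} p={p:4.2f} gap={max_density_gap(N, p):.4f}")
```

Expect the gap to shrink as N grows and, at the same N, to be larger for p = 0.02 than for p = 0.5, where the binomial distribution is symmetric.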