Summaries and Histograms
One important goal in statistics is to summarize data.
Here we explore several important summary numbers as well as
graphical presentations which allow direct visual comparison of data sets.
Preparation:
Retrieve the Singers data set.
Read the Reference file in the data and then inspect the four DataDesk variables
Soprano, Alto, Tenor and Bass contained within the icon Heights.
Problem 1:
Make a histogram ‡ for the Alto variable (Plot → Histograms).
Estimate the median height of the altos graphically from the histogram.
Mark your estimate on the histogram.
Describe how you made this estimate.
Is the distribution skewed to the right or left?
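One way to reason about a graphical median estimate is to add up bar heights from the left until half of the cases are covered, then interpolate inside that bar. The Python sketch below illustrates this idea only; the bin edges and counts are made up for illustration and are not taken from the Singers data, and the function name median_from_histogram is hypothetical.

```python
# Hypothetical histogram (NOT the Singers data): bin edges in inches and bar counts.
bin_edges = [60, 62, 64, 66, 68, 70]
counts = [2, 5, 8, 4, 1]


def median_from_histogram(bin_edges, counts):
    """Estimate the median by interpolating inside the bar that holds the middle case."""
    total = sum(counts)
    half = total / 2
    cumulative = 0
    for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
        # The median lies in the first bar where the running total reaches half.
        if count > 0 and cumulative + count >= half:
            fraction = (half - cumulative) / count
            return left + fraction * (right - left)
        cumulative += count
    raise ValueError("histogram has no cases")


estimate = median_from_histogram(bin_edges, counts)
print(estimate)
```

The interpolation assumes the cases are spread evenly within each bar, so the result is only an estimate and will generally differ a little from the median computed from the raw data.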
Problem 2:
Collect the statistics ‡ to fill in the following table.
Use Calc → Calculation Options → Select Summary Statistics to customize which values will appear,
and then Calc → Summary → Reports to see the values.
| Part    | Standard Deviation | Interquartile Range | Mean | Median |
| Soprano |                    |                     |      |        |
| Alto    |                    |                     |      |        |
| Tenor   |                    |                     |      |        |
| Bass    |                    |                     |      |        |
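For readers who want to see what the four table columns measure, here is a minimal Python sketch using only the standard library. The group names and height values are invented for illustration and are not the Singers data; the helper name summarize is hypothetical. Note that software packages use different quartile conventions, so an interquartile range computed this way may not match the DataDesk report exactly.

```python
import statistics

# Hypothetical heights in inches (NOT the Singers data).
heights = {
    "Group A": [60, 62, 63, 65, 66, 68],
    "Group B": [64, 66, 67, 69, 70, 72, 74],
}


def summarize(values):
    """Return the four summary numbers asked for in the table."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "sd": statistics.stdev(values),      # sample standard deviation
        "iqr": q3 - q1,                      # interquartile range
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


summary = {name: summarize(values) for name, values in heights.items()}
for name, row in summary.items():
    print(name, row)
```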
Problem 3:
Plot ‡ the box plots side by side for the four groups. To get your box plots, append the data for each singing part as described below, select Heights there as y and Parts as x, and then choose Plot → boxplot y by x.
Note: Create an Append relation by selecting all four icons in the Heights folder at once, followed by Manip → Append and Make Group Variable. A new window opens with the icons data and group. Rename these to Heights and Parts. The boxplot can then be made with Heights selected as y and Parts as x.
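The Append step stacks the four variables into one long column and records, for each case, which group it came from. The Python sketch below illustrates that idea under the assumption that this is what the two new icons contain; the height values are invented and are not the Singers data.

```python
# Hypothetical heights in inches (NOT the Singers data); groups may differ in size.
heights_by_part = {
    "Soprano": [62, 64, 65],
    "Alto": [63, 66],
    "Tenor": [68, 69, 70, 71],
    "Bass": [70, 72, 73],
}

heights = []  # one long column of values (the renamed "data" icon)
parts = []    # matching group label for every value (the renamed "group" icon)

for part, values in heights_by_part.items():
    heights.extend(values)
    parts.extend([part] * len(values))

# Each case is now a (height, part) pair, so unequal group sizes are not a problem.
for height, part in zip(heights, parts):
    print(height, part)
```

Because every case carries its own group label, the stacked layout works even when the groups have different numbers of cases.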
Note: You don't need to do it on this problem, but you could also generate these boxplots individually by selecting one of the variables inside Heights (e.g. Soprano) and then choosing Plot → Boxplot Side by Side. However, you cannot use this command after selecting all four variables because the variables have different numbers of cases. That is why we used the approach above, an Append relation with Plot → boxplot y by x.
Problem 4:
- Classify these four groups of singers into two pairs
so that within each pair the mean heights are similar.
Would your conclusion be different if you were to use
the median instead of the mean?
- Using the standard deviation, compare the spreads of the four groups.
Now use the interquartile ranges to compare the spreads.
Explain whether or not these two measures of spread lead to consistent conclusions.
- Hypothesize about why the four groups of singers break up into
two pairs as you noted in the first part.
Hint: Read the Reference File again.
Note: Your textbook has further discussion of boxplots, medians and quartiles.
To Turn In:
- Please print out results marked with ‡.
- Turn in your work in lab next week.