Confidence Intervals
We often need to know things about populations, but can seldom examine
every individual. Instead we gather information from samples, often small, in
the hope that the sample will tell us something about the population. However, natural
variability among samples (called sampling error) makes any guess about the
population uncertain.
Confidence intervals allow us to make inferences about population
parameters based on sample statistics. We do not pretend that we will know an
exact value. Instead, we settle for being fairly confident that the population value
lies within a (hopefully small) range. In completing this assignment you will
examine how well a confidence interval succeeds, or doesn't. Here are the
step-by-step instructions. (Again, the ‡ marks things to include in your
Layout for printing.)
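For reference, when the population standard deviation σ is known (as it will be throughout this lab), the confidence interval for the population mean μ built from a random sample of size n with sample mean x̄ is the z-interval below. Here Z is a standard normal variable, C is the confidence level (0.75, 0.90, 0.95, ...), and the term z*σ/√n is the margin of error.

```latex
\bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}},
\qquad \text{where } P\left(-z^{*} < Z < z^{*}\right) = C
```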
1. Datafile
Open the HeartAttacks data file. Remember this from Lab 3? The variable
cost contains the dollar amount hospitals billed each heart attack patient in
New York State some years ago.
These are the data from every individual in the population.
Understand that you seldom have such a complete data set.
- Plot a Histogram ‡ for this population. Does it appear to be normally distributed?
- Calculate Summaries ‡. How many NY patients were there? What was the
population mean cost of treatment? What was the population standard deviation?
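Because part 1 gives the whole population, its standard deviation divides by N rather than N − 1. The Python sketch below shows both versions side by side. It assumes the cost values have been exported into a plain list; the helper name `population_summaries` is hypothetical, and the numbers in the example are made up, not the HeartAttacks data.

```python
import statistics

def population_summaries(costs):
    """Count, mean, and both SD formulas for a complete population (hypothetical helper)."""
    n = len(costs)
    mu = statistics.fmean(costs)     # population mean
    sigma = statistics.pstdev(costs) # divides by N: the population SD
    s = statistics.stdev(costs)      # divides by N - 1: the sample-SD formula
    return n, mu, sigma, s

# Made-up values for illustration only (NOT the real data file).
n, mu, sigma, s = population_summaries([2, 4, 4, 4, 5, 5, 7, 9])
print(n, mu, sigma, round(s, 3))
```

Many packages report the N − 1 version by default; with thousands of patients the two results differ negligibly, so either is a reasonable σ for the rest of the lab.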
2. Estimate
Now, pretend that you do not know the population mean. Instead,
suppose you need to estimate the typical patient cost for an insurance company or
a congressional committee drafting health care legislation. To produce this
estimate, you will collect data from a random sample of the patients. Let's
examine how well this process might work. We'll start with sample size n = 60.
- Select the cost icon as Y, then Manip → Sample. Remember that although
the sample size n is the value that matters here, DataDesk asks
you to specify the percentage of the population to be chosen.
Select 20 samples of size 60 (0.5%); do NOT create sample indices.
- Using your chosen samples, estimate the population mean by creating a 75%
confidence interval for each of these 20 samples. Use the population standard
deviation from part 1. (In DataDesk, use Calc → Estimate...; you can do all 20
samples at once if you select them all first. Be sure to select z-interval, enter the
standard deviation, and set the individual confidence level to 75%.) Print Results ‡.
- Examine the list of intervals. How many of them successfully capture the actual population
mean? Why does this sample-and-make-an-interval approach not always work?
- Is the sample size big enough for inference?
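The DataDesk steps above can be mimicked in a short Python simulation: draw 20 random samples of size 60 without replacement, build a 75% z-interval from each using the known σ, and count how many capture μ. This is a sketch under stated assumptions: the population is a made-up, right-skewed stand-in generated with `random.lognormvariate` (NOT the HeartAttacks data), and the names `z_interval` and `coverage_experiment` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def z_interval(sample, sigma, level):
    """Known-sigma z-interval for the mean: xbar +/- z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)  # two-sided critical value
    xbar = fmean(sample)
    margin = z_star * sigma / math.sqrt(len(sample))
    return xbar - margin, xbar + margin

def coverage_experiment(population, n=60, reps=20, level=0.75, seed=1):
    """Draw `reps` samples of size n without replacement; count intervals containing mu."""
    rng = random.Random(seed)
    mu = fmean(population)      # true mean, known only because we have everyone
    sigma = pstdev(population)  # population SD, treated as known, as in part 1
    intervals = [z_interval(rng.sample(population, n), sigma, level)
                 for _ in range(reps)]
    hits = sum(lo <= mu <= hi for lo, hi in intervals)
    return intervals, hits

# Stand-in population: 12,000 made-up, right-skewed "costs" (NOT the real data).
gen = random.Random(0)
population = [round(gen.lognormvariate(9, 0.5)) for _ in range(12000)]

intervals, hits = coverage_experiment(population)
for lo, hi in intervals:
    print(f"({lo:10.2f}, {hi:10.2f})")
print(f"{hits} of {len(intervals)} intervals captured the population mean")
```

Over many repetitions roughly 75% of such intervals should capture μ, so about 15 of 20 on average, but any single batch of 20 can land noticeably above or below that. The plain σ/√n formula ignores the finite-population correction, which is negligible when n is a tiny fraction of the population, as it is here.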
3. Estimate further
Using the same samples, create 90% confidence intervals ‡ and 95% confidence
intervals ‡. Compare the three sets of intervals. Which were most successful in
correctly estimating the population mean? When we request greater confidence, how
does the margin of error change?
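Since the margin of error is z*σ/√n, at a fixed sample size only the critical value z* changes with the confidence level. The Python sketch below prints z* and the margin for each level used so far. The function name `margin_of_error` is hypothetical, and `sigma = 1.0` is a placeholder so that margins come out in units of σ.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, level):
    """Half-width of a known-sigma z-interval at the given confidence level."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)  # two-sided critical value
    return z_star * sigma / math.sqrt(n)

sigma, n = 1.0, 60  # placeholder sigma: margins are in units of the population SD
for level in (0.75, 0.90, 0.95):
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    print(f"{level:.0%}: z* = {z_star:.3f}, margin = {margin_of_error(sigma, n, level):.3f} sigma")
```

The critical values come out near 1.150, 1.645 and 1.960; substituting the population SD from part 1 for `sigma` gives the margins in dollars.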
4. Estimate from a Larger Sample
Now reselect the initial population and create a new set of 95% confidence intervals ‡
based on much larger samples of about n = 600 (5%). Compare these to the old set of
95% confidence intervals. What is better about the new intervals?
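Because the margin of error is z*σ/√n, keeping the confidence level at 95% and σ fixed while increasing n from 60 to 600 shrinks the margin by a factor of √10. This assumes the samples are exactly those sizes; DataDesk's percentage-based selection gives only approximately these n.

```latex
\frac{\mathrm{ME}_{600}}{\mathrm{ME}_{60}}
  = \frac{z^{*}\sigma/\sqrt{600}}{z^{*}\sigma/\sqrt{60}}
  = \sqrt{\frac{60}{600}}
  = \frac{1}{\sqrt{10}} \approx 0.316
```

So each new interval should be roughly a third as wide as its n = 60 counterpart.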
5. Higher Confidence
We could create 99% confidence intervals. What are the advantages and
disadvantages of doing that? (Note that you do not need to actually create them.)
The End!
Your completed assignment is due in lab next week.