The modern Olympic Games, a modified revival of the
Ancient Greek Olympian Games, were inaugurated in 1896,
largely through the efforts of French sportsman and educator
Baron Pierre de Coubertin. Since then, the Games have been
held nearly every four years at various sites around the world,
and have become a major international athletic competition.
During the past century, performances in the Olympic events have improved
dramatically. In this assignment, you will examine the men's long jump.
You will look at historical trends, explain unusual results, construct a good linear model, and then make predictions and interpret the results. Step-by-step instructions follow. Include the items marked with ‡ in your
Layout for printing.
1. Datafile
Open the datafile. You should see these variables of interest:
- year - the years the Games were held, with 1900 = 0
(so the first modern Olympics have year = -4; the 2000 Olympics would have year = 100).
- dist - the gold-medal winning jumps, in inches.
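As a quick check on the coding, converting between calendar years and the year variable is a single subtraction. The function names below are hypothetical helpers for illustration; they are not part of Data Desk.

```python
def code_year(year: int) -> int:
    """Convert a calendar year to the lab's coded year (1900 = 0)."""
    return year - 1900


def calendar_year(coded_year: int) -> int:
    """Convert a coded year back to a calendar year."""
    return coded_year + 1900


print(code_year(1896))     # -4, the first modern Games
print(code_year(2000))     # 100
print(calendar_year(104))  # 2004
```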
2. History
First, look at the history of this event.
- Select dist as the response variable Y and year as the explanatory variable X.
- Plot the Scatterplot with the regression line ‡ (in the hypermenu, choose Add Regression Line).
Describe what you see about the association.
- Again using the hypermenu, do the Regression ‡ of dist vs year.
- In the hypermenu for the regression analysis, Compute the Residuals.
- Plot the Scatterplot ‡ of residual(Y) vs year(X), and Add Regression Line ‡.
- Note that the residuals plot shows some interesting patterns and gaps.
Drawing upon your knowledge of the history of the 20th Century, explain
why the residuals plot looks like that. Also point out any outliers.
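Data Desk does the arithmetic for you, but it may help to see what the Regression and Compute Residuals commands calculate. The sketch below assumes ordinary least squares (the standard method for a line of best fit) and uses plain Python rather than Data Desk; fit_line and residuals are hypothetical names. It returns the intercept b0 and slope b1 of y = b0 + b1*x, and each point's residual (observed minus predicted).

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # the line passes through (x_bar, y_bar)
    return b0, b1


def residuals(xs: list[float], ys: list[float], b0: float, b1: float) -> list[float]:
    """Residual = observed y minus the y predicted by the line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```

One consequence worth knowing: for a least-squares line with an intercept, the residuals always sum to zero and are uncorrelated with x, so the regression line you add to the residuals-vs-year plot will be flat at 0. The information in that plot lies in the patterns and gaps around the line, not in the line itself.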
3. Model Considerations
Your goal is to create a good model for predicting Olympic long jump performances for the near future.
Good models should be based on relevant data. Your historical analysis probably suggests
that it would be unwise to use all the data for the first 100 years of the modern Games to
predict the results for the early 21st century. Think about what part of the data you would
consider most relevant.
4. Model Building
Now create the model (find the equation of the line of best fit) based upon your chosen data points.
- Under Modify, show the Tools. Select the "lasso" from the
upper left corner of the tools palette. Holding the mouse button
down, draw a loop encircling the points you want to use. (If you
make a mistake, or want to change your mind later, you can
Modify Selection → Clear at any time.) Your chosen points
should now be highlighted.
- Under Modify Selection you want to Assign Selector. This
creates a new variable that indicates which years' data you will use.
A "selector button" will appear in the lower left corner of the DataDesk window.
Be sure it is "on" (highlighted black). Now reselect dist and year
as Y and X, and Plot a new Scatterplot from your chosen data.
- Do a regression analysis and make a residuals plot for your chosen data.
Do you think you have a good model? If so, continue with the rest of the
lab. If not, clear this selection and trash the selector, then make a different
choice of data and try again. You need to find a model you are satisfied
with, and be able to justify your choice of which data to use. On the one
hand is your desire to make the model really good by using only data that
accurately represents typical contemporary performances. On the other hand
is the scientifically indefensible practice of ignoring data just because you
don't like it. A model that works well must both look good and take into account
the reasonable variability and the trend seen in these performances. Be
assured that there is no "right answer". Different people will make different decisions.
- When you are satisfied, include your model's scatterplot (with line) ‡,
regression analysis ‡, and residuals plot ‡ in your Layout.
5. Model Justification
Justify and analyze your model.
- What data points did you choose to base your model on? Why?
- Do you think your model works well? Why?
- What is the model; that is, what is the equation of the line of best fit?
(The constant term and the slope for the equation of the line are the
coefficients displayed in the bottom left corner of the regression analysis.)
- What does the slope mean in this context?
- What does the value of R-Squared mean in this context?
- What does your model "predict" for the gold medal jump in the 2000 Sydney Games?
- The actual Sydney jump was 336.6 inches. Comment.
- What does your model "predict" for the gold medal jump in the 2004 Games?
- The actual jump was 338.2 inches. Comment.
- What does your model predict for the gold medal jump in the 2008 Games?
- Comment on your faith in this prediction.
- Predict the winning distance for the Olympics at the end of this century, in 2100.
- Comment on your faith in this prediction.
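Once you have read the constant term b0 and the slope b1 off the regression output, the predictions and comparisons in this section are simple arithmetic. Because year is coded with 1900 = 0, the years 2000, 2004, 2008, and 2100 correspond to year = 100, 104, 108, and 200. The sketch below uses hypothetical helper names in plain Python; it computes a prediction, the gap between an actual result and the prediction, and R-squared as 1 - SSE/SST for a fitted line.

```python
from statistics import mean


def predict(intercept: float, slope: float, calendar_year: int) -> float:
    """Predicted winning jump in inches; converts the year to the 1900 = 0 coding first."""
    return intercept + slope * (calendar_year - 1900)


def prediction_error(actual: float, predicted: float) -> float:
    """Actual minus predicted: positive means the winner jumped farther than predicted."""
    return actual - predicted


def r_squared(xs, ys, intercept, slope):
    """1 - SSE/SST: the share of the variation in ys accounted for by the line.
    Meaningful for a least-squares line evaluated on the points it was fitted to."""
    y_bar = mean(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("all y values are equal; R-squared is undefined")
    return 1 - sse / sst
```

With your own fitted b0 and b1, prediction_error(336.6, predict(b0, b1, 2000)) gives the Sydney comparison and prediction_error(338.2, predict(b0, b1, 2004)) the 2004 one. Since year is measured in years, the slope is in inches per year; multiply it by 4 for the expected change from one Olympiad to the next.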
The End!
Your completed assignment is due in lab next week.