To most of us, fuel economy is an important issue. In the United States, we
measure fuel economy in miles per gallon; in Europe and Canada, the standard
is liters per 100 kilometers. A sociologist might say that this difference is a
window into the national psyche. A European has to go from A to B and asks,
"How much gas will I need to get there?" An American says, "The gas tank is
full. Where can I go?"
Anyway, we seek a good way to predict a car's fuel economy. In completing this assignment you will examine several possible explanatory variables, choose the one you think is best, construct a linear model, then make a prediction and interpret the results. Here are the step-by-step instructions. The ‡ highlights things to include in your Layout for printing.
1. Data File
Open the Cars91 DataDesk datafile.
You should see these variables of interest:
- MPG - fuel economy ratings in miles per gallon;
- Weight - in pounds;
- Horsepo - engine horsepower;
- Eng.Dis - engine displacement (size) in cubic inches;
- Cylinders - number of cylinders (usually 4, 6, or 8);
- Drive r - drive ratio (how many revolutions the engine makes to rotate the wheels once).
2. Best Predictor
Let's look for the best predictor of fuel economy:
- Select the response variable MPG as Y.
- Holding the shift key down, select the other variables simultaneously as possible explanatory variables, X.
- Calculate Correlations (Pearson).
- The correlation matrix‡ shows you the strength and direction of the association between fuel economy and each of the other variables. Which variable seems to be most strongly correlated with MPG? Explain your decision.
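If you would like to see what the correlation matrix is computing, here is a minimal sketch of the Pearson calculation. The numbers are made up for illustration and are not taken from the Cars91 file, and the helper name `pearson_r` is just a label chosen here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values, NOT taken from the Cars91 file.
mpg = [33.0, 28.0, 25.0, 21.0, 18.0]
candidates = {
    "Weight": [2100, 2500, 2900, 3300, 3800],
    "Cylinders": [4, 4, 6, 6, 8],
}

# Correlation of MPG with each candidate explanatory variable.
correlations = {name: pearson_r(xs, mpg) for name, xs in candidates.items()}

# The strongest association has the largest absolute value of r;
# the sign only tells you the direction.
best = max(correlations, key=lambda name: abs(correlations[name]))
for name, r in correlations.items():
    print(f"{name:10s} r = {r:+.3f}")
print("Strongest:", best)
```

Note that the comparison uses the absolute value of r, so a strong negative correlation counts as a strong association.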
3. Plot Response
From now on, you will be working only with the response variable, MPG, and
whatever you just chose as the best explanatory variable.
- Select these variables as Y and X respectively.
- Plot the Scatterplot ‡.
- Is the pattern you see what the correlation led you to expect? Explain.
4. Create the Model
Now create the model (find the equation of the line of best fit).
- Using the scatterplot's hypermenu (click on the little triangle in the plot's title bar), Add Regression Line ‡ and do the Regression of MPG vs X ‡.
- You may find this flier helpful:
- How many cars is this analysis based on?
- What is the model; that is, what is the equation of the line of best fit?
(The constant term and the slope for the equation of the line are the coefficients
displayed in the bottom left corner of the regression analysis.)
- What does the slope mean in this context?
- What does the value of R-squared mean in this context?
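The quantities in these questions can be sketched in a few lines. This is a minimal least-squares calculation on made-up numbers (not the Cars91 data), assuming for illustration that Weight is the explanatory variable; the function names are chosen here and are not part of DataDesk.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Fraction of the variation in ys accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up illustrative values, NOT taken from the Cars91 file.
weight = [2100, 2500, 2900, 3300, 3800]   # pounds
mpg = [33.0, 28.0, 25.0, 21.0, 18.0]      # miles per gallon

intercept, slope = fit_line(weight, mpg)
print(f"MPG = {intercept:.2f} + ({slope:.5f}) * Weight")
print(f"n = {len(weight)} cars, R-squared = {r_squared(weight, mpg, intercept, slope):.3f}")
```

Here the slope is in miles per gallon per pound: it is the predicted change in MPG for each additional pound of weight.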
5. My Geo
I drive a Geo Prizm. It weighs 2728 pounds (with two occupants) and
has a 102 horsepower, 97 cubic inch, 4 cylinder engine with final drive ratio of 3.05.
- Use your model to estimate how many miles per gallon I should get.
- I actually average about 34.7 mpg. How much lower (-) or higher (+)
is that than your model predicted? This difference, or error, is called the residual.
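As a sketch of the arithmetic, assuming you chose Weight as the explanatory variable: the intercept and slope below are placeholders invented for illustration, so substitute the coefficients from your own regression output.

```python
# Placeholder coefficients for illustration only; replace them with the
# constant term and slope from your own regression analysis.
INTERCEPT = 48.0   # hypothetical constant term, in mpg
SLOPE = -0.008     # hypothetical slope, in mpg per pound

def predict_mpg(weight_lb):
    """Predicted fuel economy from the (hypothetical) linear model."""
    return INTERCEPT + SLOPE * weight_lb

def residual(actual, predicted):
    """Residual = actual - predicted; positive means the car beat the model."""
    return actual - predicted

predicted = predict_mpg(2728)       # the Geo Prizm's weight from the text
error = residual(34.7, predicted)   # the observed average from the text
print(f"predicted = {predicted:.2f} mpg, residual = {error:+.2f} mpg")
```

The sign convention matches the question: a positive residual means the actual value is higher than the model predicted.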
6. Model Success
Residuals provide an important look at how successful the model is. Use the
hypermenu to the left of the title bar for the regression analysis (not the
graph!) to create the Scatterplot‡ residuals vs predicted.
- In the scatterplot's hypermenu you can again Add regression line ‡.
- The regression line is now horizontal, indicating where residuals would
equal 0. See the actual residuals plotted for the cars data? Explain what
they represent.
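To see why that line comes out horizontal at 0, here is a small sketch on made-up numbers (not the Cars91 data). For a least-squares fit, the residuals add up to zero and have no linear relationship with the predicted values, so fitting a line to the residuals plot gives a slope of zero.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up illustrative values, NOT taken from the Cars91 file.
weight = [2100, 2500, 2900, 3300, 3800]
mpg = [33.0, 28.0, 25.0, 21.0, 18.0]

intercept, slope = fit_line(weight, mpg)
predicted = [intercept + slope * w for w in weight]
residuals = [y - p for y, p in zip(mpg, predicted)]

# Fit a second line to the residuals-vs-predicted plot.
resid_intercept, resid_slope = fit_line(predicted, residuals)

# Both values are zero up to floating-point rounding.
print(f"sum of residuals = {sum(residuals):.2e}")
print(f"slope of residuals vs predicted = {resid_slope:.2e}")
```

Each individual residual is still one car's vertical distance above or below the original line; only their overall trend is flat.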
7. Quest for a Better Model
If this model had successfully extracted all the meaning from the data, the
remaining error would be random. In that case, the residuals plot should
appear to contain no pattern in the scatter. Look at your residuals plot. Do
you see the hint of a curved pattern? That means you should be able to find
a better model. Try the European concept of fuel economy: gallons per 100 miles.
- Manip → Transform and create a New Derived Variable named GpHM
for "gallons per hundred miles".
- Enter the formula 100/MPG, then (from the hypermenu) Show Numbers.
You should see a list of values indicating how many gallons each car uses
for 100 miles. Do they look reasonable?
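The transformation itself is a one-line calculation. A minimal sketch, using made-up ratings rather than the Cars91 values:

```python
def gallons_per_hundred_miles(mpg):
    """Convert miles per gallon to gallons used per 100 miles."""
    return 100.0 / mpg

# Made-up illustrative ratings, NOT taken from the Cars91 file.
mpg_ratings = [20.0, 25.0, 40.0]
gphm = [gallons_per_hundred_miles(m) for m in mpg_ratings]

for m, g in zip(mpg_ratings, gphm):
    print(f"{m:5.1f} mpg -> {g:.2f} gallons per 100 miles")
```

As a sanity check, a car with a higher MPG rating should always show a smaller GpHM value.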
- To create a revised model using your new variable, drag the new icon
GpHM onto the MPG axis label of your old scatterplot. When you drop
the icon onto the label on the vertical axis, you replace
MPG as the response variable with GpHM. You now see the new scatterplot‡
and regression line‡. Do a new regression analysis‡ (just drag
the icon again; make sure GpHM replaces MPG and is not instead added
to the list of variables in the analysis) to create a revised model, and update the
residuals‡.
- Use the value of R-squared and what you see in the residuals plot to explain
why you think this new model is better (or worse) than the original.
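The comparison can be sketched as follows. The data here are invented and deliberately constructed so that GpHM is exactly linear in Weight; that is an assumption made only to show how a curved MPG relationship lowers R-squared, and your real data need not behave this way. The sketch uses the fact that, for a line with one explanatory variable, R-squared equals the square of the Pearson correlation.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line of ys on xs (Pearson r, squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Invented values, NOT from the Cars91 file: GpHM is built to be exactly
# linear in weight (0.5 + 0.001 * weight), so MPG = 100 / GpHM is curved.
weight = [2000, 2500, 3000, 3500, 4000]
gphm = [2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [100.0 / g for g in gphm]

r2_mpg = r_squared(weight, mpg)
r2_gphm = r_squared(weight, gphm)
print(f"R-squared, MPG  vs Weight: {r2_mpg:.4f}")
print(f"R-squared, GpHM vs Weight: {r2_gphm:.4f}")
```

A higher R-squared alone does not settle the question; the residuals plot for the revised model should also look more like patternless scatter.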
8. The End
Please turn in your completed assignment during lab next week.