Regression Inference Exercise: Math 171 Lab 12

Regression is a powerful way to summarize data and make predictions. However, the analysis is based on a sample, so random sampling variation affects not only the estimated regression parameters but also any predicted values we compute from them.

In this lab we quantify these uncertainties using methods analogous to the inference techniques for means.

The Data File:

Open the data file COL/ACC RATE. The college acceptance rate data are from approximately 1987. The data file contains four variables:

  1. AccRate = Percentage of applicants accepted.
  2. SAT = sum of average Math and Verbal SAT scores among students at the college.
  3. Tuition = yearly tuition charge in dollars.
  4. Case # = case number, from 1 to 109. Each number corresponds to a college. Cornell is #11.

Run a simple regression

  • ...using AccRate as a response variable and SAT as an explanatory variable.
  • Print ‡ the summary regression table from Data Desk.
  • Answer the following questions based on the regression table and other information in the data file.

Helpful:
RegressionCribSheet.pdf: Interpreting the Data Desk Regression Report

Computer Hint:

After selecting suitable Y and X variables, the menu entry Regression is available from the Calc menu. Note in the displays that s.e. stands for standard error. The menu entry Calculation Options on the Calc menu includes an entry Regression Options... which lets you select what is done when Regression is invoked. Options available include Calculate Residuals and Calculate Predicted Values. If these options are selected, then the Results folder will include a subfolder Regression containing tables of the requested values.
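
For readers who want to check the Data Desk report by hand, the quantities in a simple regression table can be sketched in a few lines of Python. This is only an illustrative sketch, not how Data Desk computes its report: the function name fit_line and the toy numbers are hypothetical, and the toy numbers are not the COL/ACC RATE data.

```python
from math import sqrt

def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x, with summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    sse = sum(r ** 2 for r in residuals)
    s = sqrt(sse / (n - 2))  # standard error about the line, n - 2 df
    return {
        "n": n, "x_bar": x_bar, "intercept": intercept, "slope": slope,
        "s": s, "se_slope": s / sqrt(sxx), "r_squared": 1 - sse / syy,
        "predicted": predicted, "residuals": residuals,
    }

# Toy numbers for illustration only; these are NOT the COL/ACC RATE data.
toy_x = [1, 2, 3, 4]
toy_y = [1, 3, 2, 4]
fit = fit_line(toy_x, toy_y)
print(fit["intercept"], fit["slope"], fit["r_squared"])
```

The returned predicted and residuals lists play the same role as the tables Data Desk places in the Regression subfolder when those options are selected.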

The Questions:

  1. What is the equation of the least squares line?
  2. What is the predicted ACCeptance RATE for Cornell? (Cornell corresponds to case #11)
  3. Print ‡ out a scatterplot of the residuals vs. predicted values.
    • Does the picture show any indication that the standard model assumptions may have been violated?
  4. What is the percentage of variability in the ACCeptance RATE that is explained by the regression?
  5. Conduct a test at significance level 0.05 of β = 0 vs. β ≠ 0, where β is the coefficient of SAT.
    • How big is the p-value?
    • Find a 95% confidence interval for β (beta).
    • Does it agree with common sense that β should be negative?
  6. In question 2 you computed the predicted ACCeptance RATE for Cornell and its corresponding residual. Use the formulas developed in the Appendix below and your Data Desk summary statistics, together with x* = Cornell's SAT value, to:
    • Produce a 95% confidence interval for μHat
    • Produce a 95% prediction interval for yHat
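
The slope test and interval in question 5 follow the same pattern as inference for a mean. A minimal sketch, assuming you read the slope and its s.e. off the regression table and look up the critical value in a t table with n - 2 degrees of freedom (107 if all 109 cases are used). The function name slope_inference and the toy inputs are hypothetical and are not the lab's numbers.

```python
def slope_inference(slope, se_slope, t_crit):
    """t statistic and confidence interval for a regression slope.

    t_crit is the two-sided critical value for n - 2 degrees of freedom,
    looked up in a t table; it is an input here, not computed.
    """
    t_stat = slope / se_slope
    margin = t_crit * se_slope
    ci = (slope - margin, slope + margin)
    reject_h0 = abs(t_stat) > t_crit  # same as asking whether 0 lies outside ci
    return t_stat, ci, reject_h0

# Toy values for illustration only (n = 4, so 2 degrees of freedom);
# 4.303 is the 0.975 quantile of t with 2 df. NOT the lab's numbers.
t_stat, ci, reject_h0 = slope_inference(slope=0.8, se_slope=0.4243, t_crit=4.303)
print(t_stat, ci, reject_h0)
```

The sketch compares the t statistic to a critical value rather than computing a p-value; the p-value itself is reported in the Data Desk regression table.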


Appendix: A Pair of Rhetorical Questions with Answers

In part 2. you computed the predicted ACCeptance RATE for Cornell and its corresponding residual. We can ask two more general questions about this predicted value:

Q1: (Rhetorical) What is the uncertainty associated with this predicted value at the level of a predicted mean value (across all schools with the same SAT score)?

Q2: (Rhetorical) What is the uncertainty associated with this predicted value at the level of a predicted actual value for any particular observation (school)?

Answers:

These uncertainties are characterized by the Standard Error of μHat and the Standard Error of yHat respectively. Just as it was often simpler to talk of variance, the square of standard deviation, we can shift our attention to the squares of these standard errors calling them error variances. When we do this, we find two elegant formulae which connect directly to our recurring idea of adding variances:

A1: Uncertainty in Predicted Mean

(SEμHat)^2 = s^2/n + (x* - xBar)^2 * (SEβ)^2
error variance of μHat = (error variance about the line)/n + (squared displacement of x* from xBar) * (error variance of slope)
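
As a quick check on the arithmetic, formula A1 can be sketched directly in Python. The function name se_mu_hat and the toy inputs are hypothetical; in the lab, s, n, xBar and SEβ come from your Data Desk output.

```python
from math import sqrt

def se_mu_hat(s, n, x_star, x_bar, se_slope):
    """Standard error of the predicted mean response at x_star (formula A1)."""
    error_variance = s ** 2 / n + (x_star - x_bar) ** 2 * se_slope ** 2
    return sqrt(error_variance)

# Toy inputs for illustration only; NOT the lab's numbers.
print(se_mu_hat(s=sqrt(0.9), n=4, x_star=3, x_bar=2.5, se_slope=sqrt(0.18)))
```

Note that the second term vanishes when x* = xBar, so the predicted mean is most precise at the center of the data and less precise as x* moves away from it.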

A2: Uncertainty in Predicted Actual Value

(SEyHat)^2 = s^2 + (SEμHat)^2
error variance of yHat = error variance about the line + error variance of μHat
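
Formula A2, together with the two intervals asked for in question 6, can be sketched as follows. The names se_y_hat and interval are hypothetical, the critical value is assumed to come from a t table with n - 2 degrees of freedom, and the toy inputs are not the lab's numbers.

```python
from math import sqrt

def se_y_hat(s, se_mu):
    """Standard error of a predicted actual value (formula A2)."""
    return sqrt(s ** 2 + se_mu ** 2)

def interval(center, se, t_crit):
    """Interval of the form center +/- t_crit * se."""
    margin = t_crit * se
    return (center - margin, center + margin)

# Toy inputs for illustration only (2 degrees of freedom, t_crit = 4.303).
s, se_mu, predicted_value, t_crit = sqrt(0.9), sqrt(0.27), 2.9, 4.303
ci_mean = interval(predicted_value, se_mu, t_crit)               # for μHat
pi_actual = interval(predicted_value, se_y_hat(s, se_mu), t_crit)  # for yHat
print(ci_mean, pi_actual)
```

Because (SEyHat)^2 always exceeds (SEμHat)^2 by s^2, the prediction interval is always wider than the confidence interval at the same x*.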

  • Recall that s was defined as the standard error about the regression line, so s^2 is an estimated variance about the regression line.
  • With a bit of algebraic discovery, you will find that these formulas are identical to the more detailed computation-oriented formulas in your textbook.
  • SEμHat and SEyHat can be used in the familiar fashion with t values (or, in the large-sample limit, z values) to create confidence intervals around μHat and yHat.
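
The algebra behind the second bullet can be sketched as follows. It assumes the standard expression for the standard error of the slope in simple regression; the symbol S_xx is notation introduced here, not in the lab, and your textbook may arrange the terms differently.

```latex
\begin{align*}
\mathrm{SE}_{\beta}^2 &= \frac{s^2}{S_{xx}}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\mathrm{SE}_{\hat{\mu}}^2 &= \frac{s^2}{n} + (x^* - \bar{x})^2 \, \frac{s^2}{S_{xx}}
  = s^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right) \\
\mathrm{SE}_{\hat{y}}^2 &= s^2 + \mathrm{SE}_{\hat{\mu}}^2
  = s^2 \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)
\end{align*}
```

The right-hand forms are the computation-oriented versions commonly printed in textbooks; the left-hand forms make the "adding variances" structure visible.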

The End

Revision: LabOnRegressionInference - r1.17 24 Nov 2006 - 04:55 - Dick Furnas