
Variance of Residuals in Simple Linear Regression

Allen Back

Suppose we use the usual denominator $ (n-1)$ in defining the sample variance and sample covariance for samples of size $ n$:

$\displaystyle {\rm Var}(x)=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$
$\displaystyle {\rm Cov}(x,y)=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})$

Of course the correlation coefficient $ r$ is related to this covariance by

$\displaystyle r=\frac{1}{s_x s_y}\,{\rm Cov}(x,y).$
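As a quick numerical sanity check (not part of the original note), these definitions translate directly into code. The sample data below is made up for illustration; the helper names `var`, `cov`, and `corr` are our own.

```python
# Sample variance, covariance, and correlation with the n-1 denominator,
# exactly as defined above. Data values are arbitrary illustrative numbers.

def mean(v):
    return sum(v) / len(v)

def var(v):
    # sample variance: (1/(n-1)) * sum (v_i - vbar)^2
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    # sample covariance: (1/(n-1)) * sum (x_i - xbar)(y_i - ybar)
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    # r = Cov(x, y) / (s_x * s_y)
    return cov(x, y) / (var(x) ** 0.5 * var(y) ** 0.5)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
# var(x) is 2.5 here: deviations (-2,-1,0,1,2), squares sum to 10, 10/4 = 2.5
print(var(x), cov(x, y), corr(x, y))
```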

Then, since covariance is bilinear, expanding just as in $ (a+b)^2=a^2+2ab+b^2$ gives

$\displaystyle {\rm Var}(a+b)={\rm Var}(a)+{\rm Var}(b)+2\,{\rm Cov}(a,b).$
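This identity can be verified on any paired sample; a minimal sketch, reusing the $ (n-1)$ definitions above on made-up data:

```python
# Check Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b) numerically,
# using the n-1 sample definitions. Data values are arbitrary.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

a = [1.0, 4.0, 2.0, 8.0, 5.0]
b = [3.0, 1.0, 7.0, 2.0, 6.0]
s = [ai + bi for ai, bi in zip(a, b)]   # the sum variable a + b

lhs = var(s)
rhs = var(a) + var(b) + 2 * cov(a, b)
assert abs(lhs - rhs) < 1e-12           # identity holds exactly (up to rounding)
```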

If we apply this to the usual simple linear regression setup, we obtain:

Proposition: The sample variance of the residuals $ d_i$ in a simple linear regression satisfies

$\displaystyle {\rm Var}(d_i)= (1-r^2)\,{\rm Var}(y_i)$

where $ {\rm Var}(y_i)$ is the sample variance of the original response variable.



Proof: The line of regression may be written as

$\displaystyle \hat{y}-\bar{y}=b_1(x-\bar{x})$

where $ b_1=\frac{rs_y}{s_x}$. The residual of a point $ (x_i,y_i)$ is $ d_i=y_i-\hat{y}_i$, so:
\begin{align*}
{\rm Var}(d_i) &= {\rm Var}(y_i-\hat{y}_i)\\
&= {\rm Var}\bigl(y_i-(\bar{y}+b_1(x_i-\bar{x}))\bigr)\\
&= {\rm Var}\bigl((y_i-\bar{y})-b_1(x_i-\bar{x})\bigr)\\
&= s_y^2 + b_1^2 s_x^2 - 2b_1\,{\rm Cov}(y_i-\bar{y},\,x_i-\bar{x})\\
&= s_y^2 + r^2\frac{s_y^2}{s_x^2}\,s_x^2 - 2r\frac{s_y}{s_x}\,(r s_x s_y)\\
&= s_y^2 + r^2 s_y^2 - 2r^2 s_y^2\\
&= (1-r^2)\,s_y^2.
\end{align*}
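The whole proposition is easy to confirm numerically: fit the regression line with slope $ b_1 = r s_y/s_x$ through $ (\bar{x},\bar{y})$ and compare the sample variance of the residuals with $ (1-r^2)\,{\rm Var}(y)$. A small sketch on made-up data:

```python
# Verify Var(residuals) = (1 - r^2) * Var(y) for a least-squares line
# fit by the formulas in the proof above. Data values are arbitrary.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 2.1, 4.4, 4.0, 6.1]

sx, sy = var(x) ** 0.5, var(y) ** 0.5
r = cov(x, y) / (sx * sy)
b1 = r * sy / sx                                  # slope b_1 = r s_y / s_x

# fitted values from yhat - ybar = b1 (x - xbar), then residuals d_i
yhat = [mean(y) + b1 * (xi - mean(x)) for xi in x]
d = [yi - yh for yi, yh in zip(y, yhat)]

assert abs(var(d) - (1 - r ** 2) * var(y)) < 1e-12
```

Note that the residuals of a least-squares line with an intercept sum to zero, so their sample mean vanishes and `var(d)` is exactly the quantity in the proposition.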




Allen Back 2007-11-01