
Variance of Residuals in Simple Linear Regression

Allen Back

Suppose we use the usual denominator $ (n-1)$ in defining the sample variance and sample covariance for samples of size $ n$:

$\displaystyle {\rm Var}(x)=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$
$\displaystyle {\rm Cov}(x,y)=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})$

Of course the correlation coefficient $ r$ is related to this covariance by

$\displaystyle r=\frac{1}{s_x s_y}\,{\rm Cov}(x,y).$
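As a quick numerical sanity check (not part of the original note), these definitions translate directly into code. The sample data below is made up for illustration; the helper names `var`, `cov`, and `corr` are our own.

```python
# Sample variance, covariance, and correlation with the n-1 denominator,
# exactly as defined above. Data values are arbitrary illustrative numbers.

def mean(v):
    return sum(v) / len(v)

def var(v):
    # sample variance: (1/(n-1)) * sum (v_i - vbar)^2
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    # sample covariance: (1/(n-1)) * sum (x_i - xbar)(y_i - ybar)
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    # r = Cov(x, y) / (s_x * s_y)
    return cov(x, y) / (var(x) ** 0.5 * var(y) ** 0.5)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
# var(x) is 2.5 here: deviations (-2,-1,0,1,2), squares sum to 10, 10/4 = 2.5
print(var(x), cov(x, y), corr(x, y))
```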

Then, since covariance is bilinear, expanding just as in $ (a+b)^2=a^2+2ab+b^2$ gives

$\displaystyle {\rm Var}(a+b)={\rm Var}(a)+{\rm Var}(b)+2\,{\rm Cov}(a,b).$
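This identity can be verified on any paired sample; a minimal sketch, reusing the $ (n-1)$ definitions above on made-up data:

```python
# Check Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b) numerically,
# using the n-1 sample definitions. Data values are arbitrary.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

a = [1.0, 4.0, 2.0, 8.0, 5.0]
b = [3.0, 1.0, 7.0, 2.0, 6.0]
s = [ai + bi for ai, bi in zip(a, b)]   # the sum variable a + b

lhs = var(s)
rhs = var(a) + var(b) + 2 * cov(a, b)
assert abs(lhs - rhs) < 1e-12           # identity holds exactly (up to rounding)
```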

If we apply this to the usual simple linear regression setup, we obtain:

Proposition: The sample variance of the residuals $ d_i$ in a simple linear regression satisfies

$\displaystyle {\rm Var}(d_i)= (1-r^2)\,{\rm Var}(y_i)$

where $ {\rm Var}(y_i)$ is the sample variance of the original response variable.



Proof: The line of regression may be written as

$\displaystyle \hat{y}-\bar{y}=b_1(x-\bar{x})$

where $ b_1=\frac{rs_y}{s_x}$. The residual of a point $ (x_i,y_i)$ is $ d_i=y_i-\hat{y}_i$, so:
\begin{align*}
{\rm Var}(d_i) &= {\rm Var}(y_i-\hat{y}_i)\\
&= {\rm Var}\bigl(y_i-(\bar{y}+b_1(x_i-\bar{x}))\bigr)\\
&= {\rm Var}\bigl((y_i-\bar{y})-b_1(x_i-\bar{x})\bigr)\\
&= s_y^2 + b_1^2 s_x^2 - 2b_1\,{\rm Cov}(y_i-\bar{y},\,x_i-\bar{x})\\
&= s_y^2 + r^2\frac{s_y^2}{s_x^2}\,s_x^2 - 2r\frac{s_y}{s_x}\,(r s_x s_y)\\
&= s_y^2 + r^2 s_y^2 - 2r^2 s_y^2\\
&= (1-r^2)\,s_y^2.
\end{align*}
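The whole proposition is easy to confirm numerically: fit the regression line with slope $ b_1 = r s_y/s_x$ through $ (\bar{x},\bar{y})$ and compare the sample variance of the residuals with $ (1-r^2)\,{\rm Var}(y)$. A small sketch on made-up data:

```python
# Verify Var(residuals) = (1 - r^2) * Var(y) for a least-squares line
# fit by the formulas in the proof above. Data values are arbitrary.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 2.1, 4.4, 4.0, 6.1]

sx, sy = var(x) ** 0.5, var(y) ** 0.5
r = cov(x, y) / (sx * sy)
b1 = r * sy / sx                                  # slope b_1 = r s_y / s_x

# fitted values from yhat - ybar = b1 (x - xbar), then residuals d_i
yhat = [mean(y) + b1 * (xi - mean(x)) for xi in x]
d = [yi - yh for yi, yh in zip(y, yhat)]

assert abs(var(d) - (1 - r ** 2) * var(y)) < 1e-12
```

Note that the residuals of a least-squares line with an intercept sum to zero, so their sample mean vanishes and `var(d)` is exactly the quantity in the proposition.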




Allen Back 2007-11-01