Module 2: Simple linear regression
By Pia Veldt Larsen


2.1 Introduction


This module reviews simple linear regression models, that is, regression models with just one explanatory variable, where the relationship between the response variable and the explanatory variable is a straight line. Although these models are simple, they are important for several reasons. Firstly, they are very common (you have already met several examples in Module 1), partly because non-linear relationships can often be approximated by straight lines over limited ranges. Secondly, when a scatterplot of the data displays a non-linear relationship between the response variable and the explanatory variable, it is sometimes possible to transform the data into a new pair of variables with a straight-line relationship. That is, we can transform a simple non-linear regression model into a simple linear regression model, and analyse the data using methodology from linear models. Lastly, the simplicity of these models makes them useful in providing an overview of the general methodology. Later in the course, we shall extend the results for simple linear regression models to more complex settings.


A formal definition of the simple linear regression model is given in Section 2.2. In Section 2.3, we discuss how to fit the model, and how to estimate the variation away from the line. Section 2.4 concerns inference on simple linear regression models.




2.2 Simple linear regression models


In most of the examples and exercises in Module 1, there was only one explanatory variable, and the relationship between this variable and the response variable was a straight line with some random fluctuation around the line.


Example 2.1        Mobility of elderly people

These data concern the relationship between two methods for measuring the mobility of elderly people: the TUG score ($ x$ ) and the Berg score ($ Y$ ). A scatterplot of the data is shown in Figure 2.1.

Figure 2.1: Berg score against TUG score
\includegraphics[width=0.75\textwidth]{fig/mobility}


The relationship between the variables could be described as a straight line plus some random fluctuation. Thus, we can use, as a model for the data, the model

$\displaystyle Y_{i}=\beta _{0}+\beta _{1}x_{i}+\varepsilon _{i},\hspace{1cm}i=1,\ldots ,16.$

This is an example of a simple linear regression model.


Further details on this dataset can be found here.


$ \diamondsuit$


Suppose that we have a response variable $ Y$ and an explanatory variable $ x$ . Then the simple linear regression model for $ Y$ on $ x$ is given by

$\displaystyle Y_{i}=\beta _{0}+\beta _{1}{x}_{i}+\varepsilon _{i},\hspace{1cm}i=1,\ldots ,n,$ (2.1)

where $ \beta _{0}$ and $ \beta _{1}$ are unknown parameters, and the $ \varepsilon _{i}$ s are independent random variables with zero mean and constant variance for all $ i=1,\ldots ,n$ .


The parameters $\beta_{0}$ and $\beta_{1}$ are called regression parameters (or regression coefficients), and the line $h\left( x\right) =\beta_{0}+\beta_{1}x$ is called the regression line or the linear predictor. (Recall that a general $h\left( \cdot \right)$ is called a regression curve.) The regression parameters $\beta_{0}$ and $\beta_{1}$ are unknown, non-random parameters. They are the intercept and the slope, respectively, of the straight line relating $Y$ to $x$.


The name simple linear regression model refers to the fact that the mean value of the response:

$\displaystyle \mathbb{E}[Y_i]=\beta _{0}+\beta _{1}{x}_{i}
$

is a linear function of the regression parameters $\beta_{0}$ and $\beta_{1}$. (Note that $\mathbb{E}[Y_i]$ is an affine function of the explanatory variable $x_{i}$.)


The terms $ \varepsilon _{i}$ in (2.1) are called random errors or random terms. The random error $ \varepsilon _{i}$ is the term which accounts for the variation of the $ i$ th response variable $ Y_{i}$ away from the linear predictor $ \beta _{0}+\beta _{1}x_{i}$ at the point $ x_{i}$ . That is,

$\displaystyle \varepsilon _{i}=Y_{i}-\beta _{0}-\beta _{1}x_{i},$  $\displaystyle i=1,\ldots ,n.$ (2.2)

The $ \varepsilon _{i}$ s are independent random variables with the same variance and zero mean. Hence, the response variables $ Y_{i}$ are independent with means $ \beta _{0}+\beta _{1}x_{i}$ , and constant variance equal to the variance of $ \varepsilon _{i}$ .
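To make the roles of the linear predictor and the random errors concrete, the following sketch draws one sample from model (2.1). It assumes normally distributed errors, which (2.1) itself does not require, and the design points and parameter values are hypothetical, chosen only for illustration.

```python
import random

def simulate_responses(x, beta0, beta1, sigma, seed=0):
    """Draw one sample Y_1, ..., Y_n from model (2.1).

    Assumption: the random errors are normal with mean 0 and
    standard deviation sigma (the model only asks for zero mean,
    constant variance and independence).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

# Hypothetical fixed design points and parameter values.
x = [5.0, 10.0, 15.0, 20.0, 25.0]
y = simulate_responses(x, beta0=60.0, beta1=-1.3, sigma=2.0)

# Equation (2.2): the random errors are the deviations of the
# responses from the true regression line.
errors = [yi - (60.0 - 1.3 * xi) for xi, yi in zip(x, y)]
```

Setting `sigma=0.0` removes the random errors, so every simulated response falls exactly on the regression line.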


Example 2.1 (continued) Mobility of elderly people

An interpretation of the regression parameters $ \beta _{0}$ and $ \beta _{1}$ is as follows:

$ \protect\beta _{0}$ :
The expected Berg score for a hypothetical patient with TUG score zero.

$ \protect\beta _{1}$ :
The expected change in the Berg score, when the TUG score is increased by one minute. Observe that the slope of the line is negative, implying that the Berg score decreases with increasing TUG score.

$ \diamondsuit$




2.3 Fitting the model


Having decided that a straight line might describe the relationship in the data well, the obvious question is now: which line fits the data best?


In Figure 2.2 four different lines are added to a scatterplot for the data on mobility of elderly people. One or two of the lines may look a little better than others, but it is difficult to decide which line is the best.

Figure 2.2: Mobility data: Four different regression lines
\includegraphics[width=0.75\textwidth]{fig/mobility1}


The most common criterion for estimating the best fitting line to data is the principle of least squares. This criterion is described in Subsection 2.3.1. Subsection 2.3.2 concerns a measure of the strength of the straight-line relationship. When we estimate the regression line, we effectively estimate the two regression parameters $\beta_{0}$ and $\beta_{1}$. That leaves one remaining parameter in the model: the common variance $\sigma^{2}$ of the response variables. We discuss how to estimate $\sigma^{2}$ in Subsection 2.3.3.




2.3.1 The principle of least squares


The principle of least squares is based on the residuals. For any line, the residuals are the deviations of the response variables $ Y_{i}$ away from the line. (Note that residuals always refer to a given line or curve.) The residuals are usually denoted by $ \varepsilon _{i}$ like the random errors in (2.2). The reason for this notation is that, if the line is the true regression line of the model, then the residuals are exactly the random errors $ \varepsilon _{i}$ in (2.2). For a given line $ \tilde{h}\left( x\right) =\tilde{\beta}_{0}+\tilde{\beta}_{1}x$ , the observed value of $ \varepsilon _{i}$ is the difference between the $ i$ th observation $ y_{i}$ and the linear predictor $ \tilde{\beta}_{0}+\tilde{\beta}_{1}x_{i}$ at the point $ x_{i}.$ That is,

$\displaystyle \varepsilon _{i}=y_{i}-\tilde{\beta}_{0}-\tilde{\beta}_{1}x_{i},$  $\displaystyle i=1,\ldots ,n.$ (2.3)

The observed values of $ \varepsilon _{i}$ are called observed residuals (or just residuals). In Figure 2.3, a possible regression line has been drawn in a scatterplot of the data on mobility of elderly people. The residuals are indicated as vertical lines in the plot.

Figure 2.3: Mobility data: the observed residuals
\includegraphics[width=0.75\textwidth]{fig/mobility2}


Note that the better the line fits the data, the smaller the residuals will be. Thus, we can use the `sizes' of the residuals as a measure of how well a proposed line fits the data. If we simply used the sum of the residuals, large positive and large negative values would cancel out; this problem can be avoided by using the sum of the squared residuals instead. If this measure, the sum of squared residuals, is small, the line explains the variation in the data well; if it is large, the line explains the variation in the data poorly. The principle of least squares is to estimate the regression line by the line which minimises the sum of squared residuals. Or, equivalently: estimate the regression parameters $ \beta _{0}$ and $ \beta _{1}$ by the values which minimise the sum of squared residuals.


The sum of squared residuals, or, as it is usually called, the residual sum of squares, is denoted by $RSS$ (or $RSS\left( \beta_{0},\beta_{1}\right)$ to emphasise that it is a function of $\beta_{0}$ and $\beta_{1}$), and is given by

$\displaystyle RSS=RSS\left( \beta _{0},\beta _{1}\right) =\sum_{i=1}^{n}\varepsilon _{i}^{2}=\sum_{i=1}^{n}\left( y_{i}-\beta _{0}-\beta _{1}x_{i}\right) ^{2}.$ (2.4)

(For simplicity, we omit the limits $ i=1$ and $ i=n$ on the summation symbols in the following.)
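As an illustration of how (2.4) ranks candidate lines, here is a minimal sketch that evaluates the residual sum of squares for three lines. The data and the candidate lines are hypothetical toy values, not the mobility data.

```python
def rss(beta0, beta1, x, y):
    """Residual sum of squares (2.4) for the line beta0 + beta1 * x."""
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]

# Three candidate lines, each given as (intercept, slope).
candidates = {"flat": (5.0, 0.0), "steep": (0.0, 3.0), "close": (0.0, 2.0)}
scores = {name: rss(b0, b1, x, y) for name, (b0, b1) in candidates.items()}

# The candidate with the smallest RSS fits these data best.
best = min(scores, key=scores.get)
```

Here the line labelled `close` has by far the smallest residual sum of squares, so it is the preferred candidate under the least squares criterion.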


In order to minimise $ RSS$ with respect to $ \beta _{0}$ and $ \beta _{1},$ we differentiate (2.4), and get

$\displaystyle \frac{\partial RSS}{\partial \beta _{0}}\left( \beta _{0},\beta _{1}\right) =-2\sum \left( y_{i}-\beta _{0}-\beta _{1}x_{i}\right) ,$
$\displaystyle \frac{\partial RSS}{\partial \beta _{1}}\left( \beta _{0},\beta _{1}\right) =-2\sum x_{i}\left( y_{i}-\beta _{0}-\beta _{1}x_{i}\right) .$

Putting the derivatives equal to zero and re-arranging the terms yields the following equations:

$\displaystyle \sum y_{i}=\beta _{0}n+\beta _{1}\sum x_{i},$
$\displaystyle \sum x_{i}y_{i}=\beta _{0}\sum x_{i}+\beta _{1}\sum x_{i}^{2}.$

Solving the equations for $\beta_{0}$ and $\beta_{1}$ provides the least squares estimates $\hat{\beta}_{0}$ (read `beta-naught-hat') and $\hat{\beta}_{1}$ (`beta-one-hat') of $\beta_{0}$ and $\beta_{1}$, respectively. They are given by

$\displaystyle \hat{\beta}_{0}=\overline{y}-\hat{\beta}_{1}\overline{x},$
$\displaystyle \hat{\beta}_{1}=\frac{\sum (x_{i}-\overline{x})(y_{i}-\overline{y})}{\sum \left( x_{i}-\overline{x}\right) ^{2}},$

where $ \overline{y}=\sum y_{i}/n$ and $ \overline{x}=\sum x_{i}/n$ denote the sample means of the response and explanatory variable, respectively.
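The step from the two equations to the estimates can be filled in as follows. This is a sketch of the standard algebra, using only $\sum x_{i}=n\overline{x}$ and $\sum y_{i}=n\overline{y}$.

```latex
\begin{align*}
\overline{y} &= \beta_{0}+\beta_{1}\overline{x}
  && \text{(first equation divided by $n$)} \\
\sum x_{i}y_{i} &= (\overline{y}-\beta_{1}\overline{x})\,n\overline{x}
  +\beta_{1}\sum x_{i}^{2}
  && \text{(substitute $\beta_{0}$ into the second equation)} \\
\beta_{1}\Bigl(\sum x_{i}^{2}-n\overline{x}^{2}\Bigr)
  &= \sum x_{i}y_{i}-n\overline{x}\,\overline{y}
  && \text{(collect the terms in $\beta_{1}$)} \\
\beta_{1}\sum (x_{i}-\overline{x})^{2}
  &= \sum (x_{i}-\overline{x})(y_{i}-\overline{y})
  && \text{(rewrite both sides as centred sums)}
\end{align*}
```

Dividing the last line by $\sum (x_{i}-\overline{x})^{2}$ gives the slope estimate, and the first line then gives the intercept estimate.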


The estimated regression line is called the least squares line or the fitted regression line and is given by

$\displaystyle \hat{y}=\hat{\beta}_{0}+\hat{\beta}_{1}x.$ (2.5)

The values $\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{i}$ are called the fitted values or the predicted values. The fitted value $\hat{y}_{i}$ is an estimate of the expected response for a given value $x_{i}$ of the explanatory variable. The residuals corresponding to the fitted regression line are called the fitted residuals, or simply the residuals. They are given by

$\displaystyle \hat{\varepsilon}_{i}=y_{i}-\hat{y}_{i}=y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i},\hspace{1cm}i=1,\ldots ,n.$ (2.6)

The fitted residuals can be thought of as observations of the random errors $ \varepsilon _{i}$ in the simple linear regression model (2.1).


It is convenient to use the following shorthand notation for the sums involved in the expressions for the parameter estimates (all summations are for $ i=1,\ldots ,n$ ):

$\displaystyle s_{xx}=\sum (x_{i}-\overline{x})^{2}=\sum x_{i}^{2}-\frac{(\sum x_{i})^{2}}{n},$
$\displaystyle s_{yy}=\sum (y_{i}-\overline{y})^{2}=\sum y_{i}^{2}-\frac{(\sum y_{i})^{2}}{n},$
$\displaystyle s_{xy}=s_{yx}=\sum (x_{i}-\overline{x})(y_{i}-\overline{y})=\sum x_{i}y_{i}-\frac{\sum y_{i}\sum x_{i}}{n}.$

The sums $s_{xx}$ and $s_{yy}$ are called corrected sums of squares, and the sums $s_{xy}$ and $s_{yx}$ are called corrected sums of cross products. (The corresponding sums involving the random variables $Y_{i}$ rather than the observations $y_{i}$ are denoted by upper-case letters: $S_{yy}$, $S_{xy}$ and $S_{yx}$.) In this notation, the least squares estimates of the slope $\beta_{1}$ and the intercept $\beta_{0}$ of the regression line are given by

$\displaystyle \hat{\beta}_{1}=\frac{s_{xy}}{s_{xx}},$ (2.7)

and

$\displaystyle \hat{\beta}_{0}=\overline{y}-\hat{\beta}_{1}\overline{x},$ (2.8)

respectively.


Note that the estimate $\hat{\beta}_{1}$ is undefined if $s_{xx}=0$ (division by zero). But this is not a problem in practice: if $s_{xx}=0$, the explanatory variable only takes one value, and there can be no best line. Note also that the least squares line passes through the centroid (the point $(\overline{x},\overline{y})$) of the data.
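The estimates (2.7) and (2.8) are straightforward to compute directly. The sketch below does so for hypothetical toy data (not one of the course datasets) and also forms the fitted values and fitted residuals; the function name `least_squares` is an illustrative choice.

```python
def least_squares(x, y):
    """Return (beta0_hat, beta1_hat) computed from (2.7) and (2.8)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        # All x-values are equal, so the slope is undefined.
        raise ValueError("s_xx = 0: the slope estimate is undefined")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]

b0, b1 = least_squares(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # fitted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # fitted residuals (2.6)
```

For these toy data the slope estimate is 1.98 and the intercept estimate is 0.05. The fitted residuals sum to zero (up to rounding), which is a consequence of the first of the two equations obtained by setting the derivatives to zero.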


Example 2.1 (continued) Mobility of elderly people

For the data on mobility of elderly people, the least squares estimates of the regression parameters are given by

$\displaystyle \hat{\beta}_{1}$ $\displaystyle =$ $\displaystyle -1.340$  
$\displaystyle \hat{\beta}_{0}$ $\displaystyle =$ $\displaystyle 61.314.$  

So, the fitted least squares line has equation

$\displaystyle \hat{y}=61.314-1.340~x.
$

The least squares line is shown in Figure 2.4. The line appears to fit the data reasonably well.

Figure 2.4: Mobility data; the least squares line
\includegraphics[width=0.75\textwidth]{fig/mobilityls}

$ \diamondsuit$


Example 2.2        Age and height of children

In the example from Module 1 on age and height of children from an Egyptian village, the interest was in the overall growth pattern of the children. The least squares line relating average height to age has the equation

$\displaystyle \hat{y}=64.928+0.635x.$

That is,

$\displaystyle \mbox{Height}=64.928+0.635\times \mbox{Age},$

where height is measured in cm, and age in months. Figure 2.5 shows the least squares line in a scatterplot of the data. You can see that the line fits the data very well.

Figure 2.5: Age and height data; the least squares line
\includegraphics[width=0.75\textwidth]{fig/ageheightls}


Further details on this dataset can be found here.


$ \diamondsuit$


The least squares principle is the traditional and most common method for estimating the regression parameters. But other estimation criteria exist: for example, estimating the parameters by the values that minimise the sum of absolute values of the residuals, or by the values that minimise the sum of orthogonal distances between the observed values and the fitted line. The principle of least squares has various advantages over the other methods. For example, it can be shown that, if the response variables are normally distributed (a common modelling assumption), the least squares estimates of the regression parameters are exactly the maximum likelihood estimates of the parameters.




2.3.2 Coefficient of determination


In the previous subsection we used the principle of least squares to fit the `best' straight line to data. But how well does the least squares line explain the variation in the data? In this subsection we describe a measure for roughly assessing how well a fitted line describes the variation in data: the coefficient of determination.


The coefficient of determination compares the amount of variation in the data away from the fitted line with the total amount of variation in the data. The argument is as follows: if we did not have the linear model, we would have to use the `naïve' model $\hat{y}=\bar{y}$ instead. The variation away from the naïve model is $S_{yy}=\sum_{i=1}^{n}\left( Y_{i}-\overline{Y}\right) ^{2}$: the total amount of variation in the data. However, if we use the least squares line (2.5) as the model, the variation away from the model is only

$\displaystyle RSS\left( \hat{\beta}_{0},\hat{\beta}_{1}\right) =\sum_{i=1}^{n}\left( Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i}\right) ^{2}=S_{yy}-\frac{S_{xy}^{2}}{s_{xx}}.$
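The identity for the residual sum of squares at the least squares line can be checked in a few lines. The following is a sketch of the algebra, using the upper-case versions of (2.7) and (2.8), that is, $\hat{\beta}_{1}=S_{xy}/s_{xx}$ and $\hat{\beta}_{0}=\overline{Y}-\hat{\beta}_{1}\overline{x}$.

```latex
\begin{align*}
RSS\left(\hat{\beta}_{0},\hat{\beta}_{1}\right)
 &= \sum \Bigl( (Y_{i}-\overline{Y})-\hat{\beta}_{1}(x_{i}-\overline{x}) \Bigr)^{2} \\
 &= S_{yy}-2\hat{\beta}_{1}S_{xy}+\hat{\beta}_{1}^{2}s_{xx} \\
 &= S_{yy}-2\,\frac{S_{xy}^{2}}{s_{xx}}+\frac{S_{xy}^{2}}{s_{xx}}
  = S_{yy}-\frac{S_{xy}^{2}}{s_{xx}}.
\end{align*}
```

The first line substitutes the expression for $\hat{\beta}_{0}$, the second expands the square, and the third substitutes the expression for $\hat{\beta}_{1}$.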


A measure of the strength of the linear relationship between $ Y$ and $ x$ is the coefficient of determination $ R^{2}$ : it is the proportional reduction in variation obtained by using the least squares line instead of the naïve model. That is, the reduction in variation away from the model $ (S_{yy}-RSS)$ as a proportion of the total variation $ (S_{yy})$ :

$\displaystyle R^{2}=\frac{S_{yy}-RSS}{S_{yy}}=\frac{S_{yy}-S_{yy}+S_{xy}^{2}/s_{xx}}{S_{yy}}=\frac{S_{xy}^{2}}{s_{xx}S_{yy}}.
$

The larger the value of $R^{2}$, the greater the reduction from $S_{yy}$ to $RSS$ relative to $S_{yy}$, and the stronger the relationship between $Y$ and $x$. An estimate of $R^{2}$ is found by replacing $S_{yy}$ and $S_{xy}$ by the observed sums $s_{yy}$ and $s_{xy}$, that is

$\displaystyle r^{2}=\frac{s_{xy}^{2}}{s_{xx}s_{yy}}.
$

Note that $r^{2}$ is exactly the square of the estimate $r$ from Module 1 of the Pearson correlation coefficient, $\rho$, between $x$ and $Y$ when $x$ is regarded as a random variable:

$\displaystyle r=\frac{s_{xy}/\left( n-1\right) }{s_{\left( x\right) }s_{\left( y\right) }},
$

where $s_{\left( x\right) }=\sqrt{s_{xx}/\left( n-1\right) }$ and $s_{\left( y\right) }=\sqrt{s_{yy}/\left( n-1\right) }$ are the standard deviations for $x$ and $y$, respectively.



The value of $ R^{2}$ will always lie between 0 and 1 (or, in percentage, between 0% and 100%). It is equal to 1 if $ \hat{\beta}_{1}\neq 0$ and $ RSS=0$ , that is, if all the data points lie precisely on the fitted straight line (i.e. when there is a `perfect' relationship between $ Y$ and $ x$ ). If the coefficient of determination is close to 1, it is an indication that the data points lie close to the least squares line. The value of $ R^{2} $ is zero if $ RSS=S_{yy}$ , that is, the fitted straight-line model offers no more information about the value of $ Y$ than the naïve model does.


It is tempting to use $ R^{2}$ as a measure of whether a model is good or not. This is not appropriate. Try and think of why for a moment before reading on.


The coefficient of determination is only a measure of how well a straight-line model describes the variation in the data compared to the naïve model, not compared to other models in general. Even if $R^{2}$ is close to 1 (i.e. a straight line explains a large proportion of the variation), it could easily be that a non-linear model explains the variation in the data much better than the linear model. Methods for assessing the appropriateness of the assumption of a straight-line relationship between $Y$ and $x$ will be discussed in Module 4.


Example 2.2 (continued) Age and height of children

The relevant summary statistics for these data are

$\displaystyle s_{xx}=143,\qquad s_{yy}=58.31,\qquad s_{xy}=90.8.$

The coefficient of determination is given by

$\displaystyle r^{2}=\frac{s_{xy}^{2}}{s_{xx}s_{yy}}=\frac{90.8^{2}}{143\times 58.31}=0.989=98.9\%.
$

Since the coefficient of determination is very high, the model seems to describe the variation in the data very well.

$ \diamondsuit$
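The calculation in Example 2.2 can be reproduced from the three summary statistics alone. The sketch below uses the values quoted in the example; the function name is an illustrative choice.

```python
import math

def r_squared(s_xx, s_yy, s_xy):
    """Observed coefficient of determination r^2 = s_xy^2 / (s_xx * s_yy)."""
    return s_xy ** 2 / (s_xx * s_yy)

# Summary statistics for the age and height data (Example 2.2).
s_xx, s_yy, s_xy = 143.0, 58.31, 90.8

r2 = r_squared(s_xx, s_yy, s_xy)

# The correlation estimate r has the same sign as s_xy (and hence as the slope).
r = math.copysign(math.sqrt(r2), s_xy)

print(round(r2, 3))  # 0.989
```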




2.3.3 Estimating the variance


In Subsection 2.3.1, we found that the principle of least squares can provide estimates of the regression parameters in a simple linear regression model. But, in order to fit the model, we also need an estimate for the common variance $\sigma^{2}$. Such an estimate is required for making statistical inferences about the true straight-line relationship between $x$ and $Y$. Since $\sigma^{2}$ is the common variance of the random errors $\varepsilon_{i}$, $i=1,\ldots ,n$, it would be natural to estimate it by the sample variance of the fitted residuals (2.6). That is, an estimate would be

$\displaystyle \sum (y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}/\left( n-1\right) =RSS/\left( n-1\right) ,$

where $ RSS=RSS\left( \hat{\beta}_{0},\hat{\beta}_{1}\right) $ . However, it can be shown that this is a biased estimate of $ \sigma ^{2}$ , that is, the corresponding estimator does not have the `correct' mean value: $ {\mathbb{E}}[RSS/\left( n-1\right) ]\neq \sigma ^{2}$ . An unbiased estimate of the common variance, $ \sigma ^{2}$ , is given by

$\displaystyle s^{2}=\frac{RSS\left( \hat{\beta}_{0},\hat{\beta}_{1}\right) }{n-2}=\left( s_{yy}-\frac{s_{xy}^{2}}{s_{xx}}\right) /\left( n-2\right) .$ (2.9)

The denominator in (2.9) is the residual degrees of freedom (d.f.), that is

   d.f. = number of observations - number of estimated parameters.

In particular, for simple linear regression models, we have $n$ observations and we have estimated the two regression parameters $\beta_{0}$ and $\beta_{1}$, so the residual d.f. is $n-2$.


Example 2.2 (continued) Age and height of children

The relevant summary statistics for these data are

$\displaystyle n=12,\qquad s_{xx}=143,\qquad s_{yy}=58.31,\qquad s_{xy}=90.8.$

An unbiased estimate of the common variance $ \sigma ^{2}$ is given by

$\displaystyle s^{2}=\left( s_{yy}-\frac{s_{xy}^{2}}{s_{xx}}\right) /\left( n-2\right) =\frac{0.6552}{10}=0.0655.
$


$ \diamondsuit$
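The arithmetic in this example follows directly from (2.9). The sketch below reproduces it from the quoted summary statistics, and also computes the biased alternative $RSS/(n-1)$ for comparison; the function name is an illustrative choice.

```python
def variance_estimate(n, s_xx, s_yy, s_xy):
    """Unbiased estimate (2.9) of the common variance: RSS / (n - 2)."""
    rss = s_yy - s_xy ** 2 / s_xx   # RSS at the least squares line
    return rss / (n - 2)

# Summary statistics for the age and height data (Example 2.2).
n, s_xx, s_yy, s_xy = 12, 143.0, 58.31, 90.8

s2 = variance_estimate(n, s_xx, s_yy, s_xy)

# The biased alternative divides the same RSS by n - 1 instead of n - 2.
biased = s2 * (n - 2) / (n - 1)

print(round(s2, 4))  # 0.0655
```

Because the biased version divides the same $RSS$ by a larger number, it is always smaller than $s^{2}$.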




2.4 Inference in simple linear regression


In Section 2.3 we produced an estimate of the straight line that best describes the variation in the data. However, since the estimated line is based on the particular sample of data we have observed, $x_{i}$ and $y_{i}$, $i=1,\ldots ,n$, we would almost certainly get a different line if we took a new sample of data and estimated the line on the basis of the new sample. For example, if we measured the heights and ages of children in the village neighbouring the one in Example 2.2, we would invariably get different measurements, and therefore a different least squares line. In other words: the least squares line is an observation of a random line which varies from one experiment to the next. Likewise, the least squares estimates $\hat{\beta}_{0}$ and $\hat{\beta}_{1}$ of the intercept and slope, respectively, of the least squares line, are both observations of random variables. These random variables are called the least squares estimators. (An estimate is non-random and is an observation of an estimator, which is a random variable.) The least squares estimators are given by


$\displaystyle \hat{\beta}_{1}$ $\displaystyle =$ $\displaystyle \frac{S_{xy}}{s_{xx}}=\frac{\sum (x_{i}-\overline{x})(Y_{i}-\overline{Y})}{\sum (x_{i}-\overline{x})^{2}},$ (2.10)
$\displaystyle \hat{\beta}_{0}$ $\displaystyle =$ $\displaystyle \overline{Y}-\hat{\beta}_{1}\overline{x},$ (2.11)

where $ \overline{Y}=\sum Y_{i}/n$ , and with all summations from $ i=1$ to $ n $ . By a similar argument we find that an unbiased estimator for the common variance $ \sigma ^{2}$ is given by
$\displaystyle S^{2}$ $\displaystyle =$ $\displaystyle \left( S_{yy}-\frac{S_{xy}^{2}}{s_{xx}}\right) /\left( n-2\right)$ (2.12)
  $\displaystyle =$ $\displaystyle \sum (Y_{i}-\hat{Y}_{i})^{2}/\left( n-2\right) ,$  

where $ \hat{Y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{i}$ , with $ \hat{\beta}_{0}$ and $ \hat{\beta}_{1}$ being the least squares estimators. Note that the randomness in the estimators is due to the response variables only, since the explanatory variables are non-random. In particular, it can be seen from (2.10) and (2.11) that $ \hat{\beta}_{0} $ and $ \hat{\beta}_{1}$ are linear combinations of the response variables.


It can be shown that the least squares estimators are unbiased, that is, that they have the `correct' mean values:

$\displaystyle {\mathbb{E}}[\hat{\beta}_{0}]=\beta _{0}$ and  $\displaystyle {\mathbb{E}}[\hat{\beta}_{1}]=\beta _{1}.$ (2.13)

Also, the estimator $ S^{2}$ is an unbiased estimator of the common variance $ \sigma ^{2}$ , that is

 $\displaystyle {\mathbb{E}}[S^{2}]=\sigma ^{2}.$ (2.14)
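Unbiasedness can be illustrated, though not proved, by simulation: if we repeatedly generate responses from a model with known parameters and average the resulting estimates, the averages should settle near the true values. The sketch below assumes normal errors and uses hypothetical parameter values and design points.

```python
import random

def fit(x, y):
    """Return (beta0_hat, beta1_hat, s2) for one sample."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, rss / (n - 2)

def average_estimates(beta0, beta1, sigma, x, reps=2000, seed=1):
    """Average the three estimates over `reps` simulated samples."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        # New responses at the same fixed x-values (normal errors assumed).
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        for k, value in enumerate(fit(x, y)):
            totals[k] += value
    return tuple(t / reps for t in totals)

# Hypothetical design and true parameter values.
x = [float(i) for i in range(20)]
mean_b0, mean_b1, mean_s2 = average_estimates(2.0, 0.5, 1.0, x)
```

With these settings the three averages should be close to the true values 2.0, 0.5 and 1.0, respectively. Only the responses are regenerated in each repetition, while the explanatory variable stays fixed.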


The variances of the estimators $ \hat{\beta}_{0}$ and $ \hat{\beta}_{1}$ can be found from standard results on variances (we shall not do it here). The variances are given by

$\displaystyle {\mbox{var}}[\hat{\beta}_{0}]=\frac{\sigma ^{2}}{n}+\frac{\overline{x}^{2}}{s_{xx}}\sigma ^{2},$ (2.15)
$\displaystyle {\mbox{var}}[\hat{\beta}_{1}]=\frac{\sigma ^{2}}{s_{xx}}.$ (2.16)

Note that both variances decrease when the sample size $ n$ increases. Also, the variances decrease if $ s_{xx}=\sum (x_{i}-\overline{x})^{2}$ is increased. (That is, if the $ x$ -values are widely dispersed.) In some studies, it is possible to design the experiment such that the value of $ s_{xx}$ is high, and hence the variances of the estimators are small. It is desirable to have small variances, as it improves the precision of results drawn from the analysis.
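The effect of the design on precision can be seen by evaluating (2.15) and (2.16) for two sets of $x$-values with the same size and mean but different spread. Both designs and the value of $\sigma^{2}$ below are hypothetical.

```python
def estimator_variances(x, sigma2):
    """Return (var[beta0_hat], var[beta1_hat]) from (2.15) and (2.16)."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    var_b0 = sigma2 / n + x_bar ** 2 / s_xx * sigma2
    var_b1 = sigma2 / s_xx
    return var_b0, var_b1

# Two hypothetical designs with n = 6 and mean 10.
narrow = [9.0, 10.0, 11.0, 9.0, 10.0, 11.0]   # s_xx = 4
wide = [0.0, 10.0, 20.0, 0.0, 10.0, 20.0]     # s_xx = 400

narrow_b0, narrow_b1 = estimator_variances(narrow, sigma2=1.0)
wide_b0, wide_b1 = estimator_variances(wide, sigma2=1.0)
```

Spreading the $x$-values ten times as widely multiplies $s_{xx}$ by 100, so the variance of the slope estimator drops from 0.25 to 0.0025 in this sketch, and the variance of the intercept estimator falls as well.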


In order to make inferences about the model, such as testing hypotheses and producing confidence intervals for the regression parameters, we need to make some assumption on the distribution of the random variables $Y_{i}$. The most common assumption, and the one we shall make here, is that the response variables $Y_{i}$ are normally distributed.


Module 4 concerns various methods for checking the assumptions of regression models. In this section, we shall simply assume the following about the response variables: the $ Y_{i}$ s are independent normally distributed random variables with equal variances and mean values depending linearly on $ x_{i}$ .




2.4.1 Inference on the regression parameters


To test hypotheses and construct confidence intervals for the regression parameters $ \beta _{0}$ and $ \beta _{1}$ , we need the distributions of the parameter estimators $ \hat{\beta}_{0}$ and $ \hat{\beta}_{1}$ . Recall from (2.10) and (2.11) that the least squares estimators $ \hat{\beta}_{0}$ and $ \hat{\beta}_{1}$ are linear combinations of the response variables $ Y_{i}$ . Standard theory on the normal distribution says that a linear combination of independent, normal random variables is normally distributed. Thus, since the $ Y_{i}$ s are independent, normal random variables, the estimators $ \hat{\beta}_{0}$ and $ \hat{\beta}_{1}$ are both normally distributed. In (2.13)-(2.16), we found the mean values and variances of the estimators. Putting everything together, we get that

$\displaystyle \hat{\beta}_{0}$ $\displaystyle \sim$ $\displaystyle N\left( \beta _{0},\sigma ^{2}\left( \frac{1}{n}+\frac{\overline{x}^{2}}{s_{xx}}\right) \right)$  
$\displaystyle \hat{\beta}_{1}$ $\displaystyle \sim$ $\displaystyle N\left( \beta _{1},\frac{\sigma ^{2}}{s_{xx}}\right) .$  

It can be shown that the distribution of the estimator $ S^{2}$ of the common variance $ \sigma ^{2}\ $ is given by

$\displaystyle S^{2}\sim ~\frac{\sigma ^{2}\chi _{n-2}^{2}}{n-2},
$

where $ \chi _{n-2}^{2}$ denotes a chi-square distribution with $ n-2$ degrees of freedom. Moreover, it can be shown that the estimator $ S^{2}$ is independent of the estimators $ \hat{\beta}_{0}$ and $ \hat{\beta}_{1}$ . (But the estimators $ \hat{\beta}_{0}$ and $ \hat{\beta}_{1}$ are not mutually independent.)


We can use these distributional results to test hypotheses on the regression parameters. Since both $\hat{\beta}_{0}$ and $\hat{\beta}_{1}$ have normal distributions with variances depending on the unknown quantity $\sigma^{2}$, we can apply standard results for normal random variables with unknown variances. Thus, in order to test $\beta_{i}$ equal to some value $\beta_{i}^{\ast}$, $i=0,1$, that is, to test hypotheses of the form $H_{0}:\beta_{i}=\beta_{i}^{\ast}$, for $i=0,1$, we can use the $t$-test statistic, given by

$\displaystyle t_{\hat{\beta}_{i}}(y)=\frac{\hat{\beta}_{i}-\beta _{i}^{\ast }}{{\mbox{se}}[\hat{\beta}_{i}]},\hspace{1cm}i=0,1,$ (2.17)

where ${\mbox{se}}[\hat{\beta}_{i}]$ denotes the estimated standard error of the estimator $\hat{\beta}_{i}$, obtained by replacing $\sigma^{2}$ by its estimate $s^{2}$ in the variances (2.15) and (2.16). That is

$\displaystyle {\mbox{se}}[\hat{\beta}_{0}]=\sqrt{s^{2}\left( \frac{1}{n}+\frac{\overline{x}^{2}}{s_{xx}}\right) }$

and

$\displaystyle {\mbox{se}}[\hat{\beta}_{1}]=\sqrt{s^{2}/s_{xx}}.$

It can be shown that both test statistics $ t_{\hat{\beta}_{0}}(y)$ and $ t_{\hat{\beta}_{1}}(y)$ have $ t$ -distributions with $ n-2$ degrees of freedom.


The test statistics in (2.17) can be used for testing the parameter $\beta_{i}$ $\left( i=0,1\right)$ equal to any value $\beta_{i}^{\ast}$. However, for the slope parameter $\beta_{1}$, one value is particularly important: if the hypothesis that $\beta_{1}$ is equal to zero cannot be rejected, the simple linear regression model simplifies to

$\displaystyle Y_{i}=\beta _{0}+\varepsilon _{i},$  $\displaystyle i=1,\ldots ,n.
$

That is, the value of $ Y_{i}$ does not depend on the value of $ x_{i}$ . In other words: the response variable and the explanatory variable are unrelated!


It is common, for instance in computer output, to present the estimates and standard errors of the least squares estimators in a table like the following.

Parameter      Estimate             Standard error                     $t$-statistic   $p$-value
$\beta_{0}$    $\hat{\beta}_{0}$    ${\mbox{se}}[\hat{\beta}_{0}]$
$\beta_{1}$    $\hat{\beta}_{1}$    ${\mbox{se}}[\hat{\beta}_{1}]$


The column `$ t$ -statistic' contains the $ t$ -test statistic (2.17) for testing the hypotheses $ H_{0}:\beta _{0}=0$ and $ H_{0}:\beta _{1}=0,$ respectively. (Should you wish to test a parameter equal to a different value, it is easy to produce the appropriate test statistic (2.17) from the table.) The column `$ p$ -value' contains the $ p$ -values corresponding to the $ t$ -test statistic in the same row.


Example 2.2 (continued) Age and height of children

For the data on age and height of Egyptian children, the table is given by

Parameter      Estimate   Standard error   $t$-statistic   $p$-value
$\beta_{0}$    64.9283    0.5084           127.7085        0.0000
$\beta_{1}$    0.6350     0.0214           29.6647         0.0000


Not surprisingly, the hypothesis that the parameter is equal to zero is rejected for both parameters. If, for some reason, we wished to test whether the slope parameter was equal to 0.58, say, the test statistic would be

$\displaystyle t_{\hat{\beta}_{1}}(y)=\frac{\hat{\beta}_{1}-0.58}{{\mbox{se}}[\hat{\beta}_{1}]}=\frac{0.635-0.58}{0.0214}=2.570.
$

Since $n=12$ in this example, the test statistic has a $t\left( 10\right)$-distribution under the hypothesis. The $p$-value for this test is 0.0279. Thus, on the basis of these data, we reject the hypothesis that the slope parameter is 0.58 at the 5% significance level.

$ \diamondsuit$
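The test in this example can be sketched as follows. Computing the $p$-value requires the $t$-distribution function, so this sketch instead compares the test statistic with the 0.975-quantile of the $t\left( 10\right)$-distribution, 2.2281, which is the value quoted later in this section. This is equivalent to a two-sided test at the 5% level.

```python
def t_statistic(estimate, hypothesised, std_error):
    """t-test statistic (2.17) for H0: parameter = hypothesised value."""
    return (estimate - hypothesised) / std_error

# Slope estimate and standard error from the table for Example 2.2.
beta1_hat, se_beta1 = 0.6350, 0.0214

t_slope = t_statistic(beta1_hat, 0.58, se_beta1)

# 0.975-quantile of the t(10)-distribution, as quoted in the text.
t_crit = 2.2281

# Two-sided test at the 5% level: reject when |t| exceeds the quantile.
reject = abs(t_slope) > t_crit

print(round(t_slope, 3), reject)  # 2.57 True
```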


A second practical use of the table is to provide confidence intervals for the regression parameters. The $1-\alpha$ confidence intervals for $\beta_{0}$ and $\beta_{1}$ are given by, respectively,

$\displaystyle \hat\beta _{0}\pm t_{1-\alpha /2}(n-2){\mbox{se}}[\hat\beta _{0}],
$

and

$\displaystyle \hat{\beta}_{1}\pm t_{1-\alpha /2}(n-2){\mbox{se}}[\hat{\beta}_{1}].
$

In order to construct the confidence intervals, all that is needed is the table and $ t_{1-\alpha /2}(n-2)$ : the $ \left( 1-\alpha /2\right) $ -quantile of a $ t\left( n-2\right) $ -distribution.


Example 2.2 (continued) Age and height of children

For the data on age and height of Egyptian children, the 95% confidence intervals for the regression parameters can be obtained from the table for these data and the 0.975-quantile of a $t\left( 10\right)$-distribution: $t_{0.975}\left( 10\right) =2.2281$. The confidence intervals for $\beta_{0}$ and $\beta_{1}$ are, respectively,

$\displaystyle \beta _{0}:(63.80,66.06)
$

and

$\displaystyle \beta _{1}:(0.587,0.683).
$


$ \diamondsuit$
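The intervals in this example follow from the table entries and the quoted quantile. A minimal sketch, with an illustrative function name:

```python
def confidence_interval(estimate, std_error, t_quantile):
    """Interval estimate +/- t_quantile * std_error."""
    half_width = t_quantile * std_error
    return estimate - half_width, estimate + half_width

# 0.975-quantile of the t(10)-distribution, as quoted in the text.
t_q = 2.2281

# Estimates and standard errors from the table for Example 2.2.
ci_b0 = confidence_interval(64.9283, 0.5084, t_q)
ci_b1 = confidence_interval(0.6350, 0.0214, t_q)

print([round(v, 2) for v in ci_b0])  # [63.8, 66.06]
print([round(v, 3) for v in ci_b1])  # [0.587, 0.683]
```

The interval for the slope does not contain 0.58, which agrees with the rejection of that value at the 5% level in the earlier test.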




2.5 Summary


In this module, the simple linear regression model has been discussed. We have described a method, based on the principle of least squares, for fitting simple linear regression models to data. The principle of least squares says to estimate the regression line by the line which minimises the sum of the squared deviations of the observed data away from the line. The intercept and slope of the fitted line are estimates of the regression parameters $ \beta _{0}$ and $ \beta _{1}$ , respectively. Further, an unbiased estimate of the common variance has been given. Under the assumption of normality of the response variables, we have tested hypotheses and constructed confidence intervals for the regression parameters.


Keywords: simple linear regression model, regression parameters, regression line, linear predictor, observed residuals, residuals, principle of least squares, residual sum of squares, least squares estimates, least squares line, fitted regression line, fitted values, predicted values, fitted residuals, $ R^{2}$ , coefficient of determination, bias corrected estimate of common variance, degrees of freedom, least squares estimators, distributions of least squares estimators, hypotheses on regression parameters, confidence intervals for regression parameters.



Last modified February 12, 2008.