3.1 Normal Model

Regression is a collection of statistical function-fitting techniques. These techniques are classified according to the form of the function being fit to the data. In linear regression a linear function is used to describe the relation between the independent variable or vector $x$ and the dependent variable $y$. This function has the form

\[ f(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p \quad (3.1) \]

where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the vector of unknown parameters to be estimated from the data. If we assume that $x$ has a zeroth element $x_0 = 1$, we may include the constant term $\beta_0$ in the parameter vector and write this function more conveniently as

\[ f(x) = x^T \beta \quad (3.2) \]

Because $f$ is linear, the parameters $\beta_1, \ldots, \beta_p$ are sometimes called the slope parameters, while $\beta_0$ is the offset at the origin. Due to random processes such as measurement errors, we assume that $y$ does not correspond perfectly to $f(x)$. For this reason a statistical model is created which accounts for randomness. For linear regression we will use the linear model

\[ y = x^T \beta + \epsilon \quad (3.3) \]

where the error term $\epsilon$ is a Normally-distributed random variable with zero mean and unknown variance $\sigma^2$, that is, $\epsilon \sim N(0, \sigma^2)$. Therefore the expected value of this linear model is

\[ E[y \,|\, x] = x^T \beta = f(x) \quad (3.4) \]

and for this reason $f$ is called the *regression function*.

Suppose $X$ is an $n \times (p+1)$ matrix whose rows $x_i^T$ represent $n$ experiments, each described by $p$ variables together with the constant element $x_0 = 1$. Let $y$ be an $n \times 1$ vector representing the outcome of each experiment in $X$. We wish to estimate values for the parameters $\beta$ such that the linear model of Equation 3.3 is, hopefully, a useful summarization of the data $X, y$. One common method of parameter estimation for linear regression is *least squares*. This method finds a vector $\hat{\beta}$ which minimizes the *residual sum of squares* (RSS), defined as

\[ \mathrm{RSS}(\hat{\beta}) = \sum_{i=1}^{n} (y_i - x_i^T \hat{\beta})^2 = (y - X\hat{\beta})^T (y - X\hat{\beta}) \quad (3.7) \]

where $x_i^T$ is the $i$th row of $X$. Note that $\beta$ is the ``true'' parameter vector, while $\hat{\beta}$ is an informed guess for $\beta$. Throughout this thesis a variable with a circumflex, such as $\hat{\beta}$, is an estimate of some quantity we cannot know, like $\beta$. The parameter vector $\hat{\beta}$ is sometimes called a *weight vector*.
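As a concrete illustration (a minimal sketch using NumPy with synthetic data; the variable names are not from the text), the RSS can be computed either as a sum over experiments or, equivalently, in matrix form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 2
# Design matrix with a constant zeroth column, so x_0 = 1 for every experiment
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n)
beta_hat = rng.normal(size=p + 1)  # an arbitrary guess for the true parameters

# RSS as a sum of squared residuals over the n experiments ...
rss_sum = sum((y[i] - X[i] @ beta_hat) ** 2 for i in range(n))

# ... and in matrix form, (y - X beta_hat)^T (y - X beta_hat)
residuals = y - X @ beta_hat
rss_mat = residuals @ residuals

print(rss_sum, rss_mat)
```

Both expressions evaluate to the same non-negative scalar; the matrix form is the one used below to derive the estimator.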

To minimize the RSS we compute the partial derivative of $\mathrm{RSS}(\hat{\beta})$ with respect to $\hat{\beta}_j$, for $j = 0, \ldots, p$, and set the partials equal to zero. The result is the *score equations* for linear regression, $X^T (y - X\hat{\beta}) = 0$, from which we compute the *least squares estimator*

\[ \hat{\beta} = (X^T X)^{-1} X^T y \quad (3.8) \]

The expected value of $\hat{\beta}$ is $\beta$, and hence $\hat{\beta}$ is unbiased. The covariance matrix of $\hat{\beta}$ is $\mathrm{cov}(\hat{\beta}) = (X^T X)^{-1} \sigma^2$. The variances along the diagonal of this matrix are the smallest possible variances for any unbiased estimate of $\beta$. These properties follow from our assumption that the errors $\epsilon_i$ are independent and Normally distributed with zero mean and constant variance $\sigma^2$ [30].
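The estimator of Equation 3.8 is a short computation in practice. The sketch below (NumPy, synthetic data; the chosen dimensions and noise level are illustrative assumptions) solves the score equations $X^T X \hat{\beta} = X^T y$ directly rather than forming the matrix inverse, which is numerically preferable:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # x_0 = 1
beta = np.array([2.0, 1.0, -1.0, 0.5])                      # "true" parameters
y = X @ beta + rng.normal(scale=0.1, size=n)                # epsilon ~ N(0, 0.01)

# Least squares estimator, beta_hat = (X^T X)^{-1} X^T y, computed by
# solving the linear system X^T X beta_hat = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

With moderate $n$ and small error variance, `beta_hat` lands close to the true `beta`, consistent with the unbiasedness and minimum-variance properties above.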

Another method of estimating $\beta$ is *maximum likelihood estimation*. In this method we evaluate the probability of encountering the outcomes $y$ for our data $X$ under the linear model of Equation 3.3 when $\beta = \hat{\beta}$. We will choose as our estimate of $\beta$ the value $\hat{\beta}$ which maximizes the *likelihood function*

\[ L(\hat{\beta}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - x_i^T \hat{\beta})^2}{2\sigma^2} \right) \quad (3.10) \]

over $\hat{\beta}$. We are interested in the maximization of $L(\hat{\beta})$ and not its actual value, which allows us to work with the more convenient log-transformation of the likelihood. Since we are maximizing over $\hat{\beta}$ we can drop factors and terms which are constant with respect to $\hat{\beta}$. Discarding constant factors and terms that will not affect maximization, the *log-likelihood* is

\[ \ln L(\hat{\beta}) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^T \hat{\beta})^2 \quad (3.11) \]

\[ \propto -\sum_{i=1}^{n} (y_i - x_i^T \hat{\beta})^2 \quad (3.12) \]

To maximize the log-likelihood function we need to minimize $\sum_{i=1}^{n} (y_i - x_i^T \hat{\beta})^2$. This is the same quantity minimized in Equation 3.7, and hence it has the same solution. In general one would differentiate the log-likelihood and set the result equal to zero; the result is again the linear regression score equations. In fact these equations are typically defined as the derivative of the log-likelihood function. We have shown that the maximum likelihood estimate (MLE) for $\hat{\beta}$ is identical to the least squares estimate under our assumptions that the errors are independent and Normally distributed with zero mean and constant variance $\sigma^2$.
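This equivalence can be checked numerically. The sketch below (assuming SciPy is available; the data, dimensions, and the fixed $\sigma^2$ are illustrative assumptions) minimizes the negative log-likelihood with a general-purpose optimizer and compares the result to the closed-form least squares estimator:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.2, size=n)

def neg_log_likelihood(b, sigma2=0.04):
    # Up to additive constants: (1 / (2 sigma^2)) * sum_i (y_i - x_i^T b)^2,
    # so maximizing the likelihood is minimizing the residual sum of squares.
    r = y - X @ b
    return (r @ r) / (2.0 * sigma2)

beta_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_mle, beta_ols)
```

The two estimates agree to within the optimizer's tolerance, as the derivation predicts.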

If the variance $\sigma_i^2$ for each outcome $y_i$ is different, but known and independent of the other experiments, a simple variation known as *weighted least squares* can be used. In this procedure a *weight matrix* $W = \mathrm{diag}(1/\sigma_1^2, \ldots, 1/\sigma_n^2)$ is used to standardize the unequal variances. The score equations become $X^T W (y - X\hat{\beta}) = 0$, and the *weighted least squares estimator* is

\[ \hat{\beta} = (X^T W X)^{-1} X^T W y \quad (3.13) \]

It is also possible to accommodate correlated errors when the covariance matrix is known [30].
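The weighted estimator of Equation 3.13 can be sketched in the same style as before (NumPy, synthetic data with per-experiment standard deviations drawn at random; all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 2.0])
sigma = rng.uniform(0.1, 1.0, size=n)         # known, unequal standard deviations
y = X @ beta + sigma * rng.normal(size=n)     # epsilon_i ~ N(0, sigma_i^2)

# W = diag(1 / sigma_1^2, ..., 1 / sigma_n^2) standardizes the unequal
# variances; beta_hat = (X^T W X)^{-1} X^T W y (Equation 3.13).
W = np.diag(1.0 / sigma**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_wls)
```

Equivalently, one can divide each row of $X$ and each $y_i$ by $\sigma_i$ and run ordinary least squares on the standardized problem, since the rescaled errors then share a common unit variance.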