Next: 4.4 Logistic Regression for Up: 4. Logistic Regression Previous: 4.2 Maximum Likelihood Estimation   Contents

# 4.3 Iteratively Re-weighted Least Squares

An alternative to numerically maximizing the LR maximum likelihood equations is the iteratively re-weighted least squares (IRLS) technique. This technique uses the Newton-Raphson algorithm to solve the LR score equations. This process is explained in more detail below.

To compute the LR score equations we first differentiate the log-likelihood function with respect to each parameter $\beta_j$:

$$
\begin{aligned}
\frac{\partial \ell(\beta)}{\partial \beta_j}
  &= \frac{\partial}{\partial \beta_j}
     \sum_{i=1}^{N} \Big( y_i \, \beta^T x_i - \ln\big(1 + e^{\beta^T x_i}\big) \Big)
     && (4.9) \\
  &= \sum_{i=1}^{N} \Big( y_i \, x_{ij} - \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}} \, x_{ij} \Big)
     && (4.10) \\
  &= \sum_{i=1}^{N} x_{ij} \big( y_i - \mu(x_i, \beta) \big)
     && (4.11)
\end{aligned}
$$

where $j = 1, \ldots, M$, $M$ is the number of parameters, and $N$ is the number of records. We then set each of these partial derivatives equal to zero. If we extend the definition of each $x_i$ such that $x_{i0} = 1$, we may write the score equations in matrix form as

$$
X^T (y - \mu) = 0, \qquad (4.12)
$$

where $X$ is the matrix whose $i$-th row is $x_i^T$, $y$ is the vector of outputs, and $\mu$ is the vector with elements $\mu_i = \mu(x_i, \beta)$.
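As a concrete illustration, the score vector $X^T(y - \mu)$ of Equation 4.12 takes only a few lines of NumPy. This is our own minimal sketch, not the thesis's implementation; the function names (`logistic`, `score`) and the toy data are ours.

```python
import numpy as np

def logistic(X, beta):
    # mu_i = e^{beta^T x_i} / (1 + e^{beta^T x_i}), vectorized over records
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def score(X, y, beta):
    # Equation 4.12: g(beta) = X^T (y - mu)
    return X.T @ (y - logistic(X, beta))

# Toy data: 4 records, 2 parameters; first column of ones plays the
# role of the extended coordinate x_{i0} = 1 (the intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

# At beta = 0 every mu_i = 1/2, so the score is X^T (y - 1/2)
print(score(X, y, np.zeros(2)))
```

The maximum likelihood estimate is the $\beta$ at which this vector vanishes.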

Define $g(\beta)$ as the vector of partial derivatives shown in Equation 4.11. Finding a solution to the LR score equations is equivalent to finding the zeros of $g$. Because the negative LR log-likelihood is convex [27], it has only one minimum and hence $g$ has exactly one zero. The Newton-Raphson algorithm finds this optimum through repeated linear approximations. After an initial guess $\hat\beta^{(0)}$ is chosen, each Newton-Raphson iteration updates its guess for $\hat\beta$ via the first-order approximation

$$
\hat\beta^{(t+1)} = \hat\beta^{(t)} - J^{-1} g\big(\hat\beta^{(t)}\big) \qquad (4.13)
$$

where $J$ is the Jacobian of $g$ evaluated at $\hat\beta^{(t)}$. The Jacobian is an $M \times M$ matrix, and each row is simply the transposed gradient of one component of $g$ evaluated at $\hat\beta^{(t)}$. Hence the $(j,k)$-element of $J$ is

$$
J_{jk} = \frac{\partial g_j}{\partial \beta_k}
       = -\sum_{i=1}^{N} x_{ij} \, x_{ik} \, \mu_i (1 - \mu_i).
$$

If we define $\mu_i = \mu(x_i, \hat\beta^{(t)})$ and $W = \operatorname{diag}\big(\mu_1(1-\mu_1), \ldots, \mu_N(1-\mu_N)\big)$, then the Jacobian may be written in matrix notation as $J = -X^T W X$. Using this fact and Equation 4.12 we may rewrite the Newton-Raphson update formula as

$$
\hat\beta^{(t+1)} = \hat\beta^{(t)} + \big(X^T W X\big)^{-1} X^T (y - \mu) \qquad (4.14)
$$
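A single iteration of Equation 4.14 can be sketched directly in NumPy. The sketch is ours, not the thesis's code; it solves the linear system rather than forming $(X^T W X)^{-1}$ explicitly, and the toy data are an assumption for illustration.

```python
import numpy as np

def newton_step(X, y, beta):
    # One Newton-Raphson iteration, Equation 4.14:
    #   beta <- beta + (X^T W X)^{-1} X^T (y - mu)
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # mu_i = mu(x_i, beta)
    w = mu * (1.0 - mu)                      # diagonal entries of W
    XtWX = X.T @ (w[:, None] * X)            # X^T W X without forming diag(w)
    # solve the M x M linear system instead of inverting X^T W X
    return beta + np.linalg.solve(XtWX, X.T @ (y - mu))

# Iterate from beta = 0 on a small non-separable example
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
beta = np.zeros(2)
for _ in range(8):
    beta = newton_step(X, y, beta)
```

After a handful of iterations the score $X^T(y - \mu)$ at `beta` is numerically zero, i.e. the score equations are satisfied.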

Since $\hat\beta^{(t)} = \big(X^T W X\big)^{-1} X^T W X \hat\beta^{(t)}$, we may rewrite Equation 4.14 as

$$
\begin{aligned}
\hat\beta^{(t+1)}
  &= \big(X^T W X\big)^{-1} X^T W \Big( X \hat\beta^{(t)} + W^{-1}(y - \mu) \Big)
     && (4.15) \\
  &= \big(X^T W X\big)^{-1} X^T W z
     && (4.16)
\end{aligned}
$$

where $z = X \hat\beta^{(t)} + W^{-1}(y - \mu)$ [25,30,10]. The elements of the vector $z$ are often called the adjusted dependent covariates, since we may view Equation 4.16 as the weighted least squares problem from Section 3.1 with dependent variables, or covariates, $z$. Newton-Raphson iterations may be stopped according to any of several criteria, such as convergence of $\hat\beta$ or of the LR log-likelihood from Equation 4.8.
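The full IRLS loop, phrased as repeated weighted least squares on the adjusted dependent covariates $z$, can be sketched as follows. This is a minimal illustration under our own naming (`irls`) and stopping rule; it omits the safeguard against weights $\mu_i(1-\mu_i)$ approaching zero that a production implementation would need.

```python
import numpy as np

def irls(X, y, max_iter=25, tol=1e-8):
    """IRLS for logistic regression via adjusted dependent covariates.

    Each iteration solves the weighted least squares problem of
    Equation 4.16: beta <- (X^T W X)^{-1} X^T W z.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                        # linear predictor X beta
        mu = 1.0 / (1.0 + np.exp(-eta))       # mu_i = mu(x_i, beta)
        w = mu * (1.0 - mu)                   # diagonal of W
        z = eta + (y - mu) / w                # adjusted dependent covariates
        XtW = X.T * w                         # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:   # convergence of beta
            return beta_new
        beta = beta_new
    return beta

# Fit the same small non-separable example used above
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
beta_hat = irls(X, y)
```

Algebraically this update is identical to Equation 4.14: expanding $X^T W z$ gives $X^T W X \hat\beta^{(t)} + X^T(y - \mu)$, which recovers the Newton-Raphson step.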

Solving Equation 4.16 requires a matrix inversion or Cholesky back-substitution, as well as several matrix-vector multiplications. As a result, IRLS as described has time complexity $O(M^3)$. We have commented several times in this thesis that $O(M^3)$ is unacceptably slow for our purposes. A simple alternative to the computations of Equation 4.16 will be presented in Chapter 5.
