

4.2 Maximum Likelihood Estimation

Recall that the outcome $y$ is a Bernoulli random variable with mean $\mu(\mathbf{x}, \boldsymbol{\beta})$ in the LR model. Therefore we may interpret the expectation function as the probability that $y = 1$, or equivalently that $\mathbf{x}_i$ belongs to the positive class. Thus we may compute the probability of the $i^{\text{th}}$ experiment and outcome in the dataset $\mathbf{X}, \mathbf{y}$ as

\begin{align}
P(\mathbf{x}_i, y_i \mid \boldsymbol{\beta})
  &= \begin{cases}
       \mu(\mathbf{x}_i, \boldsymbol{\beta}) & \text{if $y_i = 1$,}\\
       1 - \mu(\mathbf{x}_i, \boldsymbol{\beta}) & \text{if $y_i = 0$}
     \end{cases} \tag{4.5}\\
  &= \mu(\mathbf{x}_i, \boldsymbol{\beta})^{y_i} \,
     (1 - \mu(\mathbf{x}_i, \boldsymbol{\beta}))^{1 - y_i} \tag{4.6}
\end{align}

From this expression we may derive the likelihood and log-likelihood of the data $\mathbf{X}, \mathbf{y}$ under the LR model with parameters $\boldsymbol{\beta}$ as
\begin{align}
\mathbb{L}(\mathbf{X}, \mathbf{y}, \boldsymbol{\beta})
  &= \prod_{i=1}^{R} \mu(\mathbf{x}_i, \boldsymbol{\beta})^{y_i} \,
     (1 - \mu(\mathbf{x}_i, \boldsymbol{\beta}))^{1 - y_i} \tag{4.7}\\
\ln \mathbb{L}(\mathbf{X}, \mathbf{y}, \boldsymbol{\beta})
  &= \sum_{i=1}^{R} \Big( y_i \ln \mu(\mathbf{x}_i, \boldsymbol{\beta})
     + (1 - y_i) \ln(1 - \mu(\mathbf{x}_i, \boldsymbol{\beta})) \Big) \tag{4.8}
\end{align}
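As an illustration, the log-likelihood (4.8) can be computed directly. The sketch below assumes the logistic expectation function $\mu(\mathbf{x}, \boldsymbol{\beta}) = 1/(1 + e^{-\mathbf{x}^T\boldsymbol{\beta}})$ from the preceding section; the function names are the author's notation, not part of any library.

```python
import numpy as np

def mu(X, beta):
    """Logistic expectation function: mu(x_i, beta) = 1 / (1 + exp(-x_i . beta)),
    applied row-wise to the R-by-M design matrix X."""
    return 1.0 / (1.0 + np.exp(-X @ beta))

def log_likelihood(X, y, beta):
    """Log-likelihood (4.8): sum over i of
    y_i ln(mu(x_i, beta)) + (1 - y_i) ln(1 - mu(x_i, beta))."""
    m = mu(X, beta)
    return np.sum(y * np.log(m) + (1.0 - y) * np.log(1.0 - m))
```

For example, with $\boldsymbol{\beta} = \mathbf{0}$ every row has $\mu = 1/2$, so the log-likelihood is $R \ln \tfrac{1}{2}$ regardless of $\mathbf{y}$.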

The likelihood and log-likelihood functions are nonlinear in $\boldsymbol{\beta}$ and cannot be maximized analytically. Therefore numerical methods are typically used to find the MLE $\hat{\boldsymbol{\beta}}$. Conjugate gradient (CG) is a popular choice, and by some reports CG provides results as good as or better than any other numerical method tested to date for this task [27]. The time complexity of this approach is simply the time complexity of the numerical method used.
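A minimal sketch of this numerical approach, assuming SciPy's generic CG minimizer rather than the author's own implementation: we minimize the negative log-likelihood, supplying its gradient $\mathbf{X}^T(\boldsymbol{\mu} - \mathbf{y})$, which follows from differentiating (4.8).

```python
import numpy as np
from scipy.optimize import minimize

def fit_lr_cg(X, y, beta0=None):
    """Find the MLE beta-hat by CG minimization of the negative
    log-likelihood (4.8). X is R-by-M, y is a 0/1 vector of length R."""
    R, M = X.shape
    if beta0 is None:
        beta0 = np.zeros(M)

    def neg_ll(beta):
        m = 1.0 / (1.0 + np.exp(-X @ beta))
        m = np.clip(m, 1e-12, 1.0 - 1e-12)  # keep the logs finite
        return -np.sum(y * np.log(m) + (1.0 - y) * np.log(1.0 - m))

    def neg_grad(beta):
        m = 1.0 / (1.0 + np.exp(-X @ beta))
        return X.T @ (m - y)  # gradient of -ln L is X^T (mu - y)

    return minimize(neg_ll, beta0, jac=neg_grad, method="CG").x
```

Clipping $\mu$ away from 0 and 1 is a numerical safeguard, not part of the model; without it, extreme values of $\mathbf{x}^T\boldsymbol{\beta}$ can drive the logarithms to $-\infty$ during the line search.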


Copyright 2004 Paul Komarek, komarek@cmu.edu