
# 4.1 Logistic Model

Let $X$ be a dataset with binary outcomes. For each experiment $x_i$ in $X$ the outcome $y_i$ is either $1$ or $0$. Experiments with outcome $y_i = 1$ are said to belong to the positive class, while experiments with $y_i = 0$ belong to the negative class. We wish to create a regression model which allows classification of an experiment as positive or negative, that is, belonging to either the positive or the negative class. Though LR is applicable to datasets with outcomes in $[0,1]$, we will restrict our discussion to the binary case. We can think of an experiment $x_i$ in $X$ as a Bernoulli trial with mean parameter $\mu(x_i)$. Thus $y_i$ is a Bernoulli random variable with mean $\mu(x_i)$ and variance $\mu(x_i)(1 - \mu(x_i))$. It is important to note that the variance of $y_i$ depends on the mean, and hence on the experiment $x_i$. To model the relation between each experiment $x_i$ and the expected value of its outcome, we will use the logistic function. This function is written as

$$\mu(x_i, \beta) = \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \qquad (4.1)$$

where $\beta$ is the vector of parameters, and its shape may be seen in Figure 4.1. We assume that $x_{i,0} = 1$ so that $\beta_0$ is a constant (intercept) term, just as we did for linear regression in Section 3.1. Thus our regression model is

$$y_i = \mu(x_i, \beta) + \epsilon_i \qquad (4.2)$$
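As a concrete illustration, the mean function of Equation 4.1 can be computed directly. The following is a minimal sketch; the function name `logistic` and the plain-list representation of $x_i$ and $\beta$ are illustrative choices, not from the text:

```python
import math

def logistic(x, beta):
    """Logistic mean function mu(x, beta) of Equation 4.1.

    x and beta are equal-length lists of floats; x[0] is assumed
    to be the constant 1 so that beta[0] acts as the intercept.
    """
    z = sum(b * xj for b, xj in zip(beta, x))  # linear predictor beta^T x
    return 1.0 / (1.0 + math.exp(-z))  # same as exp(z) / (1 + exp(z))

# When beta^T x = 0 the mean is exactly 1/2, the midpoint of the curve.
print(logistic([1.0, 0.0], [0.0, 2.0]))  # 0.5
```

Note that $1/(1 + e^{-z})$ is algebraically identical to $e^z/(1 + e^z)$; the first form is used here only to avoid overflow for large positive $z$.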

where $\epsilon_i$ is our error term. It may be easily seen in Figure 4.2 that the error term can take only one of two values. If $y_i = 1$ then $\epsilon_i = 1 - \mu(x_i, \beta)$, otherwise $\epsilon_i = -\mu(x_i, \beta)$. Since $y_i$ is Bernoulli with mean $\mu(x_i)$ and variance $\mu(x_i)(1 - \mu(x_i))$, the error $\epsilon_i$ has zero mean and variance $\mu(x_i)(1 - \mu(x_i))$. This differs from the linear regression case, where the error and the outcome had constant variance independent of the mean.
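The moments of this two-valued error can be verified numerically. A small sketch, assuming only the Bernoulli distribution of $y_i$ (the helper name `error_moments` is hypothetical):

```python
def error_moments(mu):
    """Mean and variance of the error epsilon = y - mu for Bernoulli y.

    epsilon equals 1 - mu with probability mu (when y = 1),
    and -mu with probability 1 - mu (when y = 0).
    """
    mean = mu * (1.0 - mu) + (1.0 - mu) * (-mu)
    var = mu * (1.0 - mu) ** 2 + (1.0 - mu) * mu ** 2
    return mean, var

mean, var = error_moments(0.3)
# mean is 0, and var equals mu * (1 - mu) = 0.21 (up to float rounding)
```

The variance expression simplifies algebraically to $\mu(1 - \mu)$, matching the Bernoulli variance of $y_i$ itself, since subtracting the constant $\mu$ does not change the variance.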

Because the LR model is nonlinear in $\beta$, minimizing the RSS as defined for linear regression in Section 3.1 is not appropriate. Not only is it difficult to compute the minimum of the RSS, but the RSS minimizer will not correspond to maximizing the likelihood function. It is possible to transform the logistic model into one which is linear in its parameters using the logit function $\mathrm{logit}(p)$, defined as

$$\mathrm{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) \qquad (4.3)$$
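Equation 4.3 can be checked in a few lines; this sketch also confirms that the logit is the inverse of the logistic function (the names are illustrative):

```python
import math

def logit(p):
    """Log-odds transform of Equation 4.3; undefined at p = 0 and p = 1."""
    return math.log(p / (1.0 - p))

# The logit inverts the logistic curve: logit(1 / (1 + e^{-z})) == z.
z = 1.7
mu = 1.0 / (1.0 + math.exp(-z))
print(round(logit(mu), 6))  # 1.7
```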

We apply the logit to the outcome variable and the expectation function of the original model in Equation 4.2 to create the new model

$$\mathrm{logit}(y_i) = \beta^T x_i + \epsilon_i^* \qquad (4.4)$$

However, we cannot use linear regression least squares techniques for this new model, because the error is not Normally distributed and its variance is not independent of the mean. One might further observe that the logit transformation is not well defined for $y_i = 0$ or $y_i = 1$. Therefore we turn to parameter estimation methods such as maximum likelihood or iteratively re-weighted least squares.

It is important to notice that LR is a linear classifier, a result of the linear relation between the parameters $\beta$ and the components of the data $x_i$. This indicates that LR can be thought of as finding a hyperplane to separate the positive and negative data points: an experiment is classified as positive when $\mu(x_i, \beta) > 1/2$, which occurs exactly when $\beta^T x_i > 0$. In high-dimensional spaces, the common wisdom is that linear separators are almost always adequate to separate the classes.
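The hyperplane view corresponds to a very short decision rule. A sketch under the same list-based representation as before (`classify` is an illustrative name):

```python
def classify(x, beta):
    """Label an experiment by the side of the hyperplane beta^T x = 0.

    mu(x, beta) > 1/2 exactly when beta^T x > 0, so thresholding the
    linear predictor at zero is the same as thresholding mu at 1/2.
    """
    return 1 if sum(b * xj for b, xj in zip(beta, x)) > 0.0 else 0

# With beta = (-1, 1) the boundary in one input dimension is x_1 = 1.
print(classify([1.0, 3.0], [-1.0, 1.0]), classify([1.0, 0.5], [-1.0, 1.0]))  # 1 0
```

Note that the classification does not require evaluating the logistic function at all; only the sign of the linear predictor matters.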