
5.1.6 Scope

In the sections and subsections that follow, we explore many ways to improve the stability, accuracy and speed of LR computations. This presentation is broken into two sections for our two LR parameter estimation methods. Section 5.2 discusses variations on IRLS with CG. We will refer to this combination simply as IRLS. Section 5.3 discusses variations on MLE, where CG is the numerical method used to find the optimum parameter estimate. This combination will be called CG-MLE. We will be using the datasets, scoring method, and computing platform described above.

For the rest of this chapter, ``parameters'' will no longer refer to the LR parameters $\boldsymbol{\beta}$. The parameters discussed below are implementation parameters that control which variations of LR computations are being used or how the computations proceed. For example, the modelmax parameter makes an adjustment to the LR expectation function, while the cgeps parameter is an error bound used for termination of CG iterations. Our goal in exploring these variations is to choose an implementation which is stable, correct, fast and autonomous. Since an autonomous classifier cannot require humans to micro-manage run-time parameters, we will seek default settings which meet our stability, correctness and speed goals on a wide variety of datasets. The six real-world datasets described in Section 5.1.3 will be used to evaluate our implementation and support our decisions.
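To make the idea of an error bound that terminates CG iterations concrete, the following is a minimal sketch of linear conjugate gradient with such a bound. It is not the implementation evaluated in this chapter. The function name `conjugate_gradient`, the argument `eps` (standing in for a cgeps-like setting) and the choice of a relative residual-norm test are all assumptions made for illustration; the precise definition of cgeps is given where that parameter is discussed.

```python
from math import sqrt


def conjugate_gradient(A, b, eps=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    A is a list of rows and b is a list of floats. Iteration stops when
    the residual norm is at most eps * ||b|| (an assumed, illustrative
    termination rule) or after max_iter iterations.
    Returns the solution estimate and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)   # squared residual norm
    b_norm = sqrt(sum(v * v for v in b)) or 1.0
    iterations = 0

    while iterations < max_iter and sqrt(rs_old) > eps * b_norm:
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        beta = rs_new / rs_old       # CG direction update coefficient
        p = [r[i] + beta * p[i] for i in range(n)]
        rs_old = rs_new
        iterations += 1

    return x, iterations
```

A looser bound stops the iterations earlier and returns a less accurate estimate, which is why a termination parameter of this kind affects both the speed and the optimality of the final solution.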

In the IRLS and CG-MLE sections we divide the implementation parameters into three categories, according to their proposed purpose. These categories are

  1. controlling the stability of computations
  2. controlling termination and optimality of the final solution
  3. enhancing speed

Many of the parameters belong to multiple categories. For example, proper termination of CG requires numerically stable iterations, and hence depends on stability parameters. Each parameter will be discussed in the context that motivated its inclusion in our experiments.

The parameters in each category will be thoroughly tested for effectiveness. Parameters which consistently enhance performance for all of the datasets will have default values assigned. These defaults will be chosen after further empirical evaluation, with optimality of the AUC score preferred over speed. Each section ends with a summary of the useful techniques and the default values chosen for the corresponding parameters. Our final LR implementations will be characterized and compared in Chapter 6.
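One way to picture a selection rule in which AUC is preferred over speed is sketched below. This is only an illustration under stated assumptions, not the procedure used in this chapter: the function `pick_default`, the tolerance `auc_tol`, and the use of per-dataset means are all hypothetical choices.

```python
def pick_default(candidates, auc_tol=1e-3):
    """Choose a default setting, preferring AUC over speed.

    candidates maps a setting name to a list of (auc, seconds) pairs,
    one pair per dataset. Settings whose mean AUC is within auc_tol of
    the best mean AUC are treated as tied on accuracy (an assumed rule);
    among those, the setting with the lowest mean time is returned.
    """
    def mean(values):
        return sum(values) / len(values)

    # Summarize each setting by (mean AUC, mean time) across datasets.
    summary = {
        name: (mean([auc for auc, _ in runs]), mean([sec for _, sec in runs]))
        for name, runs in candidates.items()
    }

    best_auc = max(auc for auc, _ in summary.values())
    near_best = {
        name: stats for name, stats in summary.items()
        if best_auc - stats[0] <= auc_tol
    }
    # Speed only breaks ties among the settings with near-best AUC.
    return min(near_best, key=lambda name: near_best[name][1])
```

Under this rule a faster setting is chosen only if it gives up essentially no AUC; a setting that is much faster but clearly less accurate is never selected.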

Copyright 2004 Paul Komarek,