Next: 5.2.1.4 Basic Stability Test: Up: 5.2.1 Indirect (IRLS) Stability Previous: 5.2.1.2 Stability Parameter Tables   Contents


5.2.1.3 Basic Stability Test: ds1


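Table 5.5: IRLS stability experiments for ds1. binitmean is disabled and wmargin is 0. The first four columns (mm, mar, rrl, cgw) give the state of modelmin and modelmax, margin, rrlambda, and cgwindow and cgdecay, respectively; an x marks an enabled parameter and a - a disabled one. In the NaN columns, an x marks runs in which NaN values occurred.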
                | Loose Epsilon           | Moderate Epsilon        | Tight Epsilon
mm  mar rrl cgw | AUC    NaN  DEV   Time  | AUC    NaN  DEV   Time  | AUC    NaN  DEV   Time
-   -   -   -   | 0.897  -    3746  14    | 0.896  x    821   534   | 0.894  x    812   542
x   -   -   -   | 0.897  -    3746  16    | 0.896  x    821   534   | 0.894  x    812   560
-   x   -   -   | 0.897  -    3564  16    | 0.895  x    759   559   | 0.894  x    803   562
x   x   -   -   | 0.897  -    3564  15    | 0.895  x    759   558   | 0.894  x    803   570
-   -   x   -   | 0.897  -    3755  16    | 0.948  -    2087  111   | 0.948  -    2037  399
x   -   x   -   | 0.897  -    3755  16    | 0.948  -    2087  106   | 0.948  -    2037  400
-   x   x   -   | 0.897  -    3572  14    | 0.948  -    1990  110   | 0.948  -    1961  373
x   x   x   -   | 0.897  -    3572  16    | 0.948  -    1990  107   | 0.948  -    1961  374
-   -   -   x   | 0.897  -    3746  16    | 0.932  x    1417  79    | 0.932  x    1247  90
x   -   -   x   | 0.897  -    3746  16    | 0.932  x    1417  83    | 0.932  x    1247  89
-   x   -   x   | 0.897  -    3564  15    | 0.925  x    1214  96    | 0.926  x    1126  100
x   x   -   x   | 0.897  -    3564  16    | 0.925  x    1214  95    | 0.926  x    1126  101
-   -   x   x   | 0.897  -    3755  15    | 0.948  -    2087  80    | 0.949  -    2033  270
x   -   x   x   | 0.897  -    3755  16    | 0.948  -    2087  81    | 0.949  -    2033  271
-   x   x   x   | 0.897  -    3572  16    | 0.948  -    1991  85    | 0.948  -    1959  217
x   x   x   x   | 0.897  -    3572  16    | 0.948  -    1991  82    | 0.948  -    1959  217

The first table to consider is Table 5.5. For this set of experiments on ds1 there are several easy conclusions. All of the times and scores in the Loose Epsilon group are very close to one another, indicating little effect by the tested stability parameters. This is not true in the Moderate Epsilon or Tight Epsilon groups. Comparing row one with row two, row three with row four, and so on, we see that the modelmin and modelmax parameters had little or no effect. Comparing these pairs of rows to one another shows that the margin parameter reduces the average minimum deviance, in all but one case by less than ten percent; the exception is the Moderate Epsilon group with cgwindow and cgdecay enabled and rrlambda disabled, where the deviance falls from 1417 to 1214, about 14 percent. Recall that the deviance is the LR loss function, and hence smaller is better. Furthermore, margin made no significant change in LR's ability to correctly rank the test rows, judging by the small change in the AUC score. Recall from Section 5.1.4 that an AUC of one is the best possible score, and an AUC of zero is the worst possible score.
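The size of the margin effect can be checked directly from the DEV columns of Table 5.5. The following is a minimal sketch in Python; the dictionary layout and the helper name percent_reduction are illustrative choices, not part of the original experiments. Rows that differ only in modelmin and modelmax are collapsed, since their deviances are identical.

```python
# Average minimum deviance (DEV) copied from Table 5.5, keyed by
# (rrlambda enabled, cgwindow/cgdecay enabled, epsilon group).
# Each value is (DEV with margin disabled, DEV with margin enabled).
DEV = {
    (False, False, "loose"): (3746, 3564),
    (False, False, "moderate"): (821, 759),
    (False, False, "tight"): (812, 803),
    (True, False, "loose"): (3755, 3572),
    (True, False, "moderate"): (2087, 1990),
    (True, False, "tight"): (2037, 1961),
    (False, True, "loose"): (3746, 3564),
    (False, True, "moderate"): (1417, 1214),
    (False, True, "tight"): (1247, 1126),
    (True, True, "loose"): (3755, 3572),
    (True, True, "moderate"): (2087, 1991),
    (True, True, "tight"): (2033, 1959),
}


def percent_reduction(without_margin, with_margin):
    """Relative drop in deviance when margin is enabled, in percent."""
    return 100.0 * (without_margin - with_margin) / without_margin


reductions = {key: percent_reduction(*pair) for key, pair in DEV.items()}

# Print the configurations from smallest to largest reduction.
for key, pct in sorted(reductions.items(), key=lambda kv: kv[1]):
    print(key, f"{pct:.1f}%")
```

Sorting by reduction makes it easy to see which configurations sit near or above the ten percent mark.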

Comparing groups of four rows (rows one through four against five through eight, and nine through twelve against thirteen through sixteen) shows the effect of the ridge-regression weight parameter rrlambda. In the Moderate Epsilon and Tight Epsilon groups this parameter makes a significant difference in the AUC score, the presence of NaN values in computations, the average minimum deviance and the speed. Though the deviance went up, the AUC improved. This suggests that the penalty on large coefficients imposed by the rrlambda parameter is preventing over-fitting of the training data. This will be discussed in greater detail after all of the IRLS stability charts are presented.
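To illustrate why a ridge penalty can raise the deviance while discouraging over-fitting, here is a minimal Python sketch. It assumes the usual forms: deviance as minus two times the log-likelihood, and a penalty of rrlambda times the sum of squared coefficients. The toy data, the coefficient vectors, the value rrlambda = 1.0, and the choice to penalize every coefficient including the intercept are all assumptions made for illustration, not settings taken from the experiments.

```python
import math


def deviance(beta, rows, labels):
    """Binomial deviance, -2 * log-likelihood, of a logistic model."""
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(b * xi for b, xi in zip(beta, x))
        mu = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
        total += y * math.log(mu) + (1 - y) * math.log(1.0 - mu)
    return -2.0 * total


def ridge_objective(beta, rows, labels, rrlambda):
    """Deviance plus an (assumed) ridge penalty rrlambda * sum(beta_j^2)."""
    return deviance(beta, rows, labels) + rrlambda * sum(b * b for b in beta)


# Hypothetical, linearly separable toy data: [intercept term, feature].
rows = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]

small_beta = [0.0, 1.0]
large_beta = [0.0, 5.0]

for name, beta in (("small", small_beta), ("large", large_beta)):
    print(name,
          round(deviance(beta, rows, labels), 3),
          round(ridge_objective(beta, rows, labels, rrlambda=1.0), 3))
```

On separable data such as this, the unpenalized deviance keeps shrinking as the coefficients grow, so the larger coefficient vector wins on deviance alone. With the penalty added, the smaller vector has the lower objective even though its deviance is higher.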

Finally, we may compare the first and last halves of the table to see the effect of the cgwindow and cgdecay parameters. When rrlambda is used, the second half shows AUC scores, NaN occurrence, and deviance similar to those of the first half. However, a clear improvement has been made when rrlambda isn't used. This suggests that experiments with rrlambda active never needed the cgwindow or cgdecay protection. With cgwindow and cgdecay active, the non-rrlambda and rrlambda AUC scores are much closer than before, as are the times in the Moderate Epsilon group; in the Tight Epsilon group the non-rrlambda runs become the faster ones, with times of 89 to 101 against 217 to 271. The deviances still appear to dip too low without rrlambda, if our hypothesis of over-fitting is correct.
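The narrowing of the AUC gap can be read off the table with a few lines of Python. This is only a sketch: the dictionary layout and the helper name auc_gap are illustrative, and the values are the AUC entries of the rows with modelmin, modelmax and margin disabled.

```python
# AUC from Table 5.5 (modelmin/modelmax and margin disabled),
# keyed by (cgwindow/cgdecay enabled, rrlambda enabled).
MODERATE = {
    (False, False): 0.896,
    (False, True): 0.948,
    (True, False): 0.932,
    (True, True): 0.948,
}
TIGHT = {
    (False, False): 0.894,
    (False, True): 0.948,
    (True, False): 0.932,
    (True, True): 0.949,
}


def auc_gap(group, cg_enabled):
    """AUC gained by enabling rrlambda for a fixed cgwindow/cgdecay state."""
    return round(group[(cg_enabled, True)] - group[(cg_enabled, False)], 3)


for name, group in (("Moderate Epsilon", MODERATE), ("Tight Epsilon", TIGHT)):
    print(name,
          "gap without cgwindow/cgdecay:", auc_gap(group, False),
          "gap with cgwindow/cgdecay:", auc_gap(group, True))
```

In both epsilon groups the gap shrinks to roughly a third of its former size once cgwindow and cgdecay are enabled, but it does not vanish.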

Our conclusions from Table 5.5 are that modelmin, modelmax and margin aren't useful, while regularization through rrlambda and constant-improvement checks like cgwindow and cgdecay do appear useful. These conclusions apply only to experiments on ds1 with the wmargin and binitmean parameters disabled. We will continue our analysis on the remaining three sparse datasets, though more briefly than for this first example, and summarize our findings at the end.


Copyright 2004 Paul Komarek, komarek@cmu.edu