5.2.1.3 Basic Stability Test:

| mm | mar | rrl | cgw | Loose AUC | Loose NaN | Loose DEV | Loose Time | Moderate AUC | Moderate NaN | Moderate DEV | Moderate Time | Tight AUC | Tight NaN | Tight DEV | Tight Time |
|:-:|:-:|:-:|:-:|--:|:-:|--:|--:|--:|:-:|--:|--:|--:|:-:|--:|--:|
| - | - | - | - | 0.897 | - | 3746 | 14 | 0.896 | x | 821 | 534 | 0.894 | x | 812 | 542 |
| x | - | - | - | 0.897 | - | 3746 | 16 | 0.896 | x | 821 | 534 | 0.894 | x | 812 | 560 |
| - | x | - | - | 0.897 | - | 3564 | 16 | 0.895 | x | 759 | 559 | 0.894 | x | 803 | 562 |
| x | x | - | - | 0.897 | - | 3564 | 15 | 0.895 | x | 759 | 558 | 0.894 | x | 803 | 570 |
| - | - | x | - | 0.897 | - | 3755 | 16 | 0.948 | - | 2087 | 111 | 0.948 | - | 2037 | 399 |
| x | - | x | - | 0.897 | - | 3755 | 16 | 0.948 | - | 2087 | 106 | 0.948 | - | 2037 | 400 |
| - | x | x | - | 0.897 | - | 3572 | 14 | 0.948 | - | 1990 | 110 | 0.948 | - | 1961 | 373 |
| x | x | x | - | 0.897 | - | 3572 | 16 | 0.948 | - | 1990 | 107 | 0.948 | - | 1961 | 374 |
| - | - | - | x | 0.897 | - | 3746 | 16 | 0.932 | x | 1417 | 79 | 0.932 | x | 1247 | 90 |
| x | - | - | x | 0.897 | - | 3746 | 16 | 0.932 | x | 1417 | 83 | 0.932 | x | 1247 | 89 |
| - | x | - | x | 0.897 | - | 3564 | 15 | 0.925 | x | 1214 | 96 | 0.926 | x | 1126 | 100 |
| x | x | - | x | 0.897 | - | 3564 | 16 | 0.925 | x | 1214 | 95 | 0.926 | x | 1126 | 101 |
| - | - | x | x | 0.897 | - | 3755 | 15 | 0.948 | - | 2087 | 80 | 0.949 | - | 2033 | 270 |
| x | - | x | x | 0.897 | - | 3755 | 16 | 0.948 | - | 2087 | 81 | 0.949 | - | 2033 | 271 |
| - | x | x | x | 0.897 | - | 3572 | 16 | 0.948 | - | 1991 | 85 | 0.948 | - | 1959 | 217 |
| x | x | x | x | 0.897 | - | 3572 | 16 | 0.948 | - | 1991 | 82 | 0.948 | - | 1959 | 217 |

Columns mm, mar, rrl and cgw mark with x whether `modelmin`/`modelmax`, `margin`, `rrlambda` and `cgwindow`/`cgdecay` were enabled. Under each of the Loose, Moderate and Tight Epsilon settings, AUC is the test AUC score, NaN is x when NaN values appeared during computation, DEV is the average minimum deviance, and Time is the running time.

The first table to consider is Table 5.5. This set of experiments on
`ds1` supports several straightforward conclusions. All of the times and
scores in the Loose Epsilon group are very close to one another,
indicating that the tested stability parameters have little effect there.
This is not true in the Moderate Epsilon or Tight Epsilon groups. Comparing
rows one and two, three and four, and so on, shows that the
`modelmin` and `modelmax` parameters had little or no effect. Comparing
these pairs of rows to one another shows that the `margin` parameter
reduces the average minimum deviance, usually by less than ten percent;
the largest reduction, from 1417 to 1214 in the Moderate Epsilon group
with `cgwindow` and `cgdecay` active, is about fourteen percent.
Recall that the deviance is the LR loss function, and hence smaller is
better. Furthermore, `margin` made no significant change in LR's ability
to correctly rank the test rows, judging by the small change in the
AUC score. Recall from Section 5.1.4 that an AUC of
one is the best possible score, and an AUC of zero is the worst
possible score.
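The AUC can equivalently be read as the probability that a randomly chosen positive test row is scored above a randomly chosen negative one, which is why one is best, zero (a perfectly reversed ranking) is worst, and 0.5 corresponds to an uninformative ranking. The sketch below computes it that way, by counting row pairs, rather than by integrating an ROC curve; the function name `auc` is hypothetical and this is not the author's code.

```python
from itertools import product


def auc(scores, labels):
    """Rank-based AUC: the fraction of (positive, negative) row pairs in which
    the positive row gets the higher score. Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
print(auc([0.1, 0.2, 0.8, 0.9], labels))  # perfect ranking  -> 1.0
print(auc([0.9, 0.8, 0.2, 0.1], labels))  # reversed ranking -> 0.0
print(auc([0.5, 0.5, 0.5, 0.5], labels))  # no information   -> 0.5
print(auc([0.1, 0.7, 0.6, 0.9], labels))  # 3 of 4 pairs     -> 0.75
```

Because only the ordering of scores matters, two models with quite different deviances can share nearly the same AUC, which is how the table can show large deviance changes alongside small AUC changes.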

Comparing consecutive groups of four rows (rows one through four against
rows five through eight, and rows nine through twelve against rows
thirteen through sixteen) shows the effect of the ridge-regression
weight parameter `rrlambda`. In the Moderate Epsilon and Tight Epsilon
groups this parameter makes a significant difference in the AUC score,
the presence of NaN values in computations, the average minimum deviance
and the running time. Though the deviance went up, the AUC improved. This
suggests that the penalty on large coefficients imposed by `rrlambda` is
preventing over-fitting of the training data. This will be discussed in
greater detail after all of the IRLS stability charts are presented.
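The over-fitting hypothesis can be illustrated with a deliberately tiny sketch; it is not the author's implementation. We assume that `rrlambda` acts as an ordinary ridge penalty, subtracting (λ/2)β² from the log-likelihood so that each IRLS step solves (X^T W X + λI)β = X^T W z; the exact scaling used in the author's software is not given in this section. The names `irls_ridge_1d`, `deviance` and `lam`, and the four-row dataset, are hypothetical. On perfectly separable data the unpenalized coefficient keeps growing and the deviance heads toward zero, while the penalized coefficient settles at a finite value with a higher deviance, qualitatively like the deviance increase seen with `rrlambda` in the table.

```python
import math


def deviance(beta, xs, ys):
    """Deviance -2 * sum[y*log(mu) + (1-y)*log(1-mu)] of a one-feature,
    no-intercept LR model, written with log1p so no log(0) is taken."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        # -log(mu) = log(1 + e^-eta) when y = 1; -log(1 - mu) = log(1 + e^eta) when y = 0
        total += math.log1p(math.exp(-eta)) if y == 1 else math.log1p(math.exp(eta))
    return 2.0 * total


def irls_ridge_1d(xs, ys, lam, iters=15):
    """IRLS for one coefficient with an assumed ridge term lam * beta^2 / 2
    subtracted from the log-likelihood. Each step equals
    beta = (X'WX + lam)^-1 X'Wz, written in the equivalent Newton form so it
    never divides by weights mu * (1 - mu) that may underflow to zero."""
    beta = 0.0
    for _ in range(iters):
        mus = [1.0 / (1.0 + math.exp(-beta * x)) for x in xs]
        grad = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus)) - lam * beta
        hess = sum(mu * (1.0 - mu) * x * x for x, mu in zip(xs, mus)) + lam
        beta += grad / hess
    return beta


# Hypothetical, perfectly separable toy data: no finite unpenalized optimum exists.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
for lam in (0.0, 1.0):
    b = irls_ridge_1d(xs, ys, lam)
    print(f"lam={lam}: beta={b:.3f}, deviance={deviance(b, xs, ys):.6f}")
```

With `lam = 0.0` the coefficient grows by roughly one unit per iteration, so the result depends on how long IRLS is allowed to run; with `lam = 1.0` it converges to a value near one regardless of the iteration count.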

Finally, we may compare the first and last halves of the table to see
the effect of the `cgwindow` and `cgdecay` parameters. When `rrlambda` is
used, the second half shows AUC scores, NaN occurrence and deviances
similar to those in the first half. When `rrlambda` is not used, however,
there is a clear improvement in AUC score and running time. This suggests
that the experiments with `rrlambda` active never needed the protection of
`cgwindow` and `cgdecay`. With `cgwindow` and `cgdecay` active, the AUC
scores with and without `rrlambda` are much closer than before, as are the
times in the Moderate Epsilon group. If our hypothesis of over-fitting is
correct, the deviances still appear to dip too low without `rrlambda`.
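This section describes `cgwindow` and `cgdecay` only as checks that the solver keeps improving. The sketch below shows one way such checks could work; the exact rules, and whether the author's code monitors the CG residual norm or the deviance, are assumptions rather than facts from this section, and `should_stop` is a hypothetical name. It assumes the monitored value is nonnegative and lower is better, as for a residual norm or a deviance.

```python
def should_stop(history, cgwindow, cgdecay):
    """Hypothetical improvement check for an iterative solver such as CG.

    history:  non-empty list of the monitored value after each iteration
              (lower is better, assumed nonnegative)
    cgwindow: stop if none of the last `cgwindow` iterations beat the best
              value seen before them
    cgdecay:  stop if the latest value exceeds `cgdecay` times the best so far
    """
    best = min(history)
    if history[-1] > cgdecay * best:
        return True  # the solver has drifted well away from its best point
    if cgwindow >= 1 and len(history) > cgwindow:
        best_before = min(history[:-cgwindow])
        if min(history[-cgwindow:]) >= best_before:
            return True  # no progress during the whole window
    return False


print(should_stop([10.0, 8.0, 7.0, 6.0], cgwindow=2, cgdecay=2.0))  # still improving -> False
print(should_stop([10.0, 6.0, 7.0, 8.0], cgwindow=2, cgdecay=2.0))  # stalled window  -> True
print(should_stop([10.0, 5.0, 11.0], cgwindow=5, cgdecay=2.0))      # 11 > 2 * 5     -> True
```

A solver using such a check would typically also keep the best iterate seen so far, so that stopping after a bad stretch does not discard earlier progress.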

Our conclusions from Table 5.5 are that `modelmin`, `modelmax` and
`margin` are not useful, while regularization through `rrlambda` and
checks for continued improvement such as `cgwindow` and `cgdecay` do
appear useful. These conclusions apply only to experiments on `ds1` with
the `wmargin` and `binitmean` parameters disabled. We will continue our
analysis on the remaining three sparse datasets, though more briefly than
for this first example, and summarize our findings at the end.