[Table column groups: Loose Epsilon, Moderate Epsilon, Tight Epsilon]
The first table to consider is Table 5.5. For this set of experiments on ds1, several conclusions are easy to draw. All of the times and scores in the Loose Epsilon group are very close to one another, indicating that the tested stability parameters had little effect. This is not true in the Moderate Epsilon or Tight Epsilon groups. Comparing row one with row two, row three with row four, and so on, we see that the modelmin and modelmax parameters had little or no effect. Comparing these pairs of rows to one another shows that the margin parameter reduces the average minimum deviance, but by less than ten percent. Recall that the deviance is the LR loss function, and hence smaller is better. Furthermore, margin made no significant change in LR's ability to correctly rank the test rows, judging by the small change in the AUC score. Recall from Section 5.1.4 that an AUC of one is the best possible score, and an AUC of zero is the worst possible score.
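The two scores used throughout this comparison can be sketched in a few lines. The sketch below assumes the usual definitions: deviance as minus twice the log-likelihood of the labels, and AUC as the fraction of (positive, negative) row pairs that are ranked correctly. The function names `deviance` and `auc` and the toy data are illustrative only and are not taken from the experiments.

```python
import math


def deviance(y, p, eps=1e-12):
    """Binomial deviance: -2 * log-likelihood of 0/1 labels y under probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * total


def auc(y, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels with two sets of predicted probabilities (illustrative values only).
y = [1, 0, 1, 0]
p_soft = [0.8, 0.3, 0.6, 0.4]     # moderate confidence
p_sharp = [0.99, 0.01, 0.9, 0.2]  # same ordering, more extreme probabilities

print(deviance(y, p_soft), auc(y, p_soft))
print(deviance(y, p_sharp), auc(y, p_sharp))
```

Both sets of predictions order the rows identically, so they share the same AUC even though the sharper predictions have a lower deviance. This is one way a parameter can move the deviance without changing the ranking quality.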
Comparing one block of four rows with the next shows the effect of the ridge-regression weight parameter rrlambda. This parameter makes a significant difference in the AUC score, the presence of NaN values in computations, the average minimum deviance, and the speed. Though the deviance went up, the AUC improved. This suggests that the large coefficient penalty from the rrlambda parameter is preventing over-fitting of the training data. This will be discussed in greater detail after all of the IRLS stability charts are presented.
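The following sketch illustrates why a coefficient penalty can raise the training deviance while still being beneficial. It assumes a penalty of the form lambda times the squared coefficient added to the deviance, which is a common ridge formulation but is not spelled out in this section. The one-dimensional, perfectly separable toy data and the names `penalized` and `lam` are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def deviance(beta, xs, ys):
    """Deviance of a one-coefficient LR model with P(y = 1 | x) = sigmoid(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        # Negative log-likelihood of one row: softplus(-z) if y == 1, softplus(z) if y == 0.
        total += softplus((1 - 2 * y) * z)
    return 2.0 * total


def penalized(beta, xs, ys, lam):
    """Deviance plus an assumed ridge penalty lam * beta**2."""
    return deviance(beta, xs, ys) + lam * beta ** 2


# Perfectly separable toy data: every positive row has x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
betas = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

best_raw = min(betas, key=lambda b: deviance(b, xs, ys))
best_pen = min(betas, key=lambda b: penalized(b, xs, ys, lam=1.0))
print(best_raw, best_pen)
```

On separable data the unpenalized deviance keeps falling as the coefficient grows, so the largest candidate always wins. With the penalty, a moderate coefficient wins even though its raw deviance is higher, which matches the pattern of higher deviance with less over-fitting.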
Finally, we may compare the first and last halves of the table to see the effect of the cgwindow and cgdecay parameters. When rrlambda is used, the second half shows AUC scores, NaN occurrence, and deviance similar to those in the first half. However, a clear improvement has been made when rrlambda isn't used. This suggests that experiments with rrlambda active never needed the cgwindow or cgdecay protection. With cgwindow and cgdecay active, the non-rrlambda and rrlambda AUC scores are much closer than before, as are the times. The deviances still appear to dip too low without rrlambda, if our hypothesis of over-fitting is correct.
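A rough sketch of how improvement checks of this kind might behave is given below. The semantics are an assumption made for illustration, since this section does not define them: `window` stops the iterations after that many consecutive steps without a new best deviance, and `decay` stops them when the deviance exceeds the best value so far by that factor. The function name `run_with_checks` and the deviance traces are hypothetical.

```python
def run_with_checks(deviances, window=3, decay=1000.0):
    """Scan per-iteration deviances and return (best_index, stop_index).

    Assumed semantics, for illustration only:
      window: stop after this many consecutive iterations without a new best.
      decay:  stop if the deviance exceeds decay times the best deviance so far.
    """
    best_val = float("inf")
    best_idx = -1
    since_best = 0
    stop_idx = -1
    for i, d in enumerate(deviances):
        stop_idx = i
        if d < best_val:
            # New best iterate: remember it and reset the stall counter.
            best_val, best_idx, since_best = d, i, 0
        else:
            since_best += 1
            if since_best >= window or d > decay * best_val:
                break
    return best_idx, stop_idx


# A stalled trace: no improvement for three iterations after index 2.
print(run_with_checks([10.0, 8.0, 7.5, 7.6, 7.7, 7.8, 5.0]))

# A diverging trace: the deviance jumps by more than the decay factor.
print(run_with_checks([10.0, 8.0, 9000.0, 1.0]))
```

In both traces the scan stops early and reports the best iterate seen so far rather than the last one. This is the kind of protection that would matter most when no penalty keeps the coefficients bounded.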
Our conclusions from Table 5.5 are that modelmin, modelmax and margin aren't useful, while regularization through rrlambda and constant-improvement checks like cgwindow and cgdecay do appear useful. These conclusions apply only to experiments on ds1 with the wmargin and binitmean parameters disabled. We will continue our analysis on the remaining three sparse datasets, though more briefly than for this first example, and summarize our findings at the end.