In previous sections we saw the utility of enabling cgwindow and cgdecay together. Tables 5.15 and 5.16 show the effects of enabling only one of cgwindow or cgdecay, from which we hope to determine whether both parameters are necessary. These tables replace the single combined column of earlier tables with separate columns for cgwindow and cgdecay, labeled cgw and cgd, respectively. Moderate epsilons were used, binitmean was disabled, and wmargin=0.
Table 5.15 shows results for dataset ds1 with moderate epsilons. Many of the times are comparable to those of the Moderate Epsilon group in Table 5.5, but keep in mind that the rows are not equivalent. Enabling cgwindow alone is clearly more effective at preventing overfitting than enabling cgdecay alone. When rrlambda is enabled the deviances are the same, but the cgwindow experiments require less time. Comparing the cgwindow and cgdecay experiments to the lower half of the Moderate Epsilon group in Table 5.5 reveals discrepancies in the deviances obtained. This implies that cgwindow and cgdecay occasionally interact, rather than one always terminating CG before the other. Our conclusion from Table 5.15 is that cgwindow is superior to cgdecay and is effective by itself.
The experiments in Table 5.16 were run on the dense dataset ds1.100pca with moderate epsilons. There is less variation in deviance and AUC, but the times still indicate that cgwindow is terminating CG iterations at a more appropriate time than cgdecay. The slightly negative interaction with rrlambda, observed previously in Table 5.9, is still present. In contrast to the ds1 experiments we just analyzed, the DEV achieved when cgwindow is enabled exactly matches that of the combined cgwindow and cgdecay experiments in Table 5.9. This suggests that in those experiments there was little or no interaction between cgwindow and cgdecay. Again we conclude that cgdecay is unnecessary when cgwindow is enabled.
Different values for cgdecay might improve its performance. In the best case for cgdecay, however, cgwindow would merely allow a few extra CG iterations. For our datasets, those iterations are inexpensive relative to the total cost of the optimization. Extra CG iterations may contribute to overfitting if they reduce the training-data deviance too far; in this case neither cgwindow nor cgdecay will have any effect, since these parameters only detect deviance increases. Thus even a finely tuned cgdecay is unlikely to do much better than cgwindow. For this reason we will not explore cgdecay further.
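To make the distinction concrete, the two heuristics can be sketched as a single termination test over the history of training-data deviances. This is a minimal illustration, not the implementation used in these experiments: the function name, parameter defaults, and the exact semantics (cgwindow stops when no new best deviance has appeared within the last few iterations; cgdecay stops when the current deviance rises past a multiple of the best seen) are assumptions.

```python
def cg_should_stop(deviances, window=3, decay=1.0):
    """Hypothetical termination test for CG iterations.

    deviances: training-data deviance after each CG iteration (lower is better).
    window:    assumed cgwindow semantics -- stop if the best deviance
               occurred more than `window` iterations ago.
    decay:     assumed cgdecay semantics -- stop if the current deviance
               has grown past `decay` times the best deviance so far.
    """
    if not deviances:
        return False
    best = min(deviances)
    best_idx = deviances.index(best)
    # cgwindow-style check: no improvement within the recent window
    window_stop = (len(deviances) - 1 - best_idx) >= window
    # cgdecay-style check: deviance has increased too far above the best
    decay_stop = deviances[-1] > decay * best
    return window_stop or decay_stop
```

Both checks fire only after the deviance stops decreasing, which is why neither can prevent overfitting caused by driving the training deviance too low before any increase appears.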