Tables 5.5 through 5.8 summarize the majority of our stability experiments on sparse binary datasets. Each table organizes the results of ten-fold cross-validation experiments for combinations of modelmin and modelmax, margin, rrlambda, and cgwindow and cgdecay. Because the tests are cross-validations, the testing set for each fold comes from the same distribution as the training set for that fold. Each parameter can take one of two values, as shown in Table 5.4, where one value effectively disables the parameter and the other is chosen to illustrate its effect on computations. The asymmetry seen in the modelmin and modelmax ``on'' values is due to the asymmetry of the IEEE 754 floating point representation, in which denormalized values allow greater resolution near zero than is available near one. Note that cgwindow and cgdecay are disabled by making them very large. Unless stated otherwise, binitmean is disabled and wmargin is zero.
|Parameter(s)||``Off'' (-) values||``On'' (x) values|
|modelmin, modelmax||0.0, 1.0||1e-100, 0.99999998|
|cgwindow, cgdecay||1000, 1000||3, 2|
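The asymmetry between the two ``on'' bounds for modelmin and modelmax can be checked directly. The following is a minimal sketch that assumes double-precision arithmetic (Python floats), which the text does not state explicitly; the variable names are illustrative and the snippet is not part of the experiments.

```python
import sys

modelmin_on = 1e-100       # "on" value from Table 5.4
modelmax_on = 0.99999998   # "on" value from Table 5.4

# Near zero, doubles resolve extremely small magnitudes.
smallest_normal = sys.float_info.min                          # 2**-1022
smallest_subnormal = smallest_normal * sys.float_info.epsilon  # 2**-1074

# Just below one, adjacent doubles are 2**-53 apart (about 1.1e-16),
# so an upper bound mirrored as 1 - 1e-100 cannot be told apart from 1.0.
gap_below_one = sys.float_info.epsilon / 2
mirrored_bound = 1.0 - modelmin_on

print(mirrored_bound == 1.0)        # True: the mirrored bound collapses to 1.0
print(modelmax_on < 1.0)            # True: 0.99999998 is a usable bound
print(gap_below_one > modelmin_on)  # True: resolution near one is far coarser
```

Under this assumption, a lower bound of 1e-100 is comfortably representable, while the upper bound must sit many orders of magnitude further from one to have any effect.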
The columns of the stability experiment tables are arranged in four groups. The first group has ``-'' and ``x'' symbols for each of the binarized parameters, or pairs of parameters. A ``-'' indicates that the parameter or pair of parameters was set to its ``off'' state as defined in Table 5.4, while ``x'' indicates the ``on'' state from the same table. The mm column represents the state of the pair modelmin and modelmax, the mar column represents margin, rrl represents rrlambda, and cgw represents the pair cgwindow and cgdecay.
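The first group of columns can be thought of as a four-bit switch setting. The sketch below enumerates the settings and renders them as ``-''/``x'' labels; it assumes that every one of the sixteen combinations is tabulated, and the helper names (`row_label`, `settings_for`) are hypothetical. Only the parameter pairs whose values appear in Table 5.4 as reproduced above are mapped to concrete values; margin and rrlambda are left as switches.

```python
from itertools import product

# Column abbreviations from the first group of the results tables.
COLUMNS = ("mm", "mar", "rrl", "cgw")

# "Off" (False) and "on" (True) values taken from Table 5.4.
KNOWN_VALUES = {
    "mm": {
        False: {"modelmin": 0.0, "modelmax": 1.0},
        True: {"modelmin": 1e-100, "modelmax": 0.99999998},
    },
    "cgw": {
        False: {"cgwindow": 1000, "cgdecay": 1000},
        True: {"cgwindow": 3, "cgdecay": 2},
    },
}


def row_label(states):
    """Render one switch setting the way the tables do, e.g. '- x - x'."""
    return " ".join("x" if on else "-" for on in states)


def settings_for(states):
    """Map a switch setting to concrete values where Table 5.4 gives them."""
    settings = {}
    for name, on in zip(COLUMNS, states):
        if name in KNOWN_VALUES:
            settings.update(KNOWN_VALUES[name][on])
    return settings


# All on/off combinations of the four columns, from all "off" to all "on".
rows = list(product((False, True), repeat=len(COLUMNS)))

for states in rows:
    print(row_label(states), settings_for(states))
```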
The second, third and fourth groups of columns represent the performance attained when the stability parameters are set as indicated by the first group of columns. The title ``Loose Epsilon'' above the second group indicates that cgeps and lreps were set to the rather large values 0.1 and 0.5, respectively. The third group uses moderate epsilons, with cgeps set to 0.001 and lreps set to 0.1. The fourth group has ``tight'' epsilons, with cgeps set to 0.000001 and lreps set to 0.0001. The sub-columns of each group represent the AUC score, whether NaN values were encountered during computation, the minimum average deviance achieved during the ten folds of the cross-validation, and the number of real seconds elapsed during computation. We do not provide confidence intervals for the scores because the focus is on stability and not on optimal performance or speed. We take a good score and a good speed, judged against other results in the same table, as our indication of a stable computation.
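The three epsilon groups and the four sub-columns can be written down as a small configuration. This is only a sketch of how the table layout might be represented; the class and field names are assumptions introduced for illustration, and no results are filled in.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EpsilonGroup:
    """One group of result columns and the termination epsilons it uses."""
    name: str
    cgeps: float
    lreps: float


# Values as stated in the text for the second, third and fourth groups.
EPSILON_GROUPS = (
    EpsilonGroup("Loose Epsilon", cgeps=0.1, lreps=0.5),
    EpsilonGroup("Moderate Epsilon", cgeps=0.001, lreps=0.1),
    EpsilonGroup("Tight Epsilon", cgeps=0.000001, lreps=0.0001),
)


@dataclass
class Cell:
    """The four sub-columns reported for one row within one epsilon group."""
    auc: float               # AUC score
    saw_nan: bool            # whether NaN values were encountered
    min_avg_deviance: float  # minimum average deviance over the ten folds
    seconds: float           # real seconds elapsed


SUBCOLUMNS = tuple(f.name for f in fields(Cell))

for group in EPSILON_GROUPS:
    print(f"{group.name}: cgeps={group.cgeps}, lreps={group.lreps}")
print("sub-columns:", ", ".join(SUBCOLUMNS))
```

Each row of a stability table would then hold one `Cell` per epsilon group, so that a given parameter setting can be compared across loose, moderate and tight termination criteria.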
The purpose of the Loose Epsilon, Moderate Epsilon and Tight Epsilon groups is to explore how well the stability parameters compensate for different optimality criteria. Once we have analyzed the stability parameters using their binarized values, we can explore optimal settings. After this work with the stability parameters is finished, Section 5.2.2 will focus on finding widely applicable termination criteria which balance optimality and speed.