

5.2.1.15 Stability Tests: Default Parameters

In this section we present two tables which motivate our selection of default stability parameter values. Our goal is to find a single set of parameters that does well on all datasets, even though our choices might not be optimal for any one dataset. All later experiments use these default values, in the hope of supporting the idea that IRLS can be used as an autonomous classifier with good results.

To make the times in this section's experiments comparable, all experiments were run as the only job on the computing platform described in Section 5.1.5. The experiments are broken into two sets, one with moderate epsilons and one with tight epsilons. Loose epsilons were not tested because the loose epsilon columns provided little information in previous sections. Each of the six datasets used in previous stability experiments is used again here. The IRLS stability parameters in question are rrlambda and cgwindow. For rrlambda the values 1, 5, 10, 50, 100 and 500 were tested, and for cgwindow the values 1, 2, 3 and 5. These test values were chosen based on previous experience, the nature of the parameters, and the desire to avoid excessive computation.


Table 5.17: Moderate Epsilon
Dataset    cgw | Time, by rrlambda                   | AUC, by rrlambda
               |     1     5    10    50   100   500 |      1      5     10     50    100    500
ds1          1 |    37    35    38    28    28    20 |  0.944  0.945  0.945  0.939  0.932  0.898
             2 |    74    50    45    39    33    24 |  0.943  0.947  0.946  0.941  0.932  0.909
             3 |   105    93    81    72    58    32 |  0.939  0.947  0.948  0.941  0.934  0.906
             5 |   160   126   106    77    57    39 |  0.941  0.948  0.948  0.941  0.934  0.906
imdb         1 |   135   134   135   133   133   133 |  0.972  0.972  0.972  0.972  0.972  0.972
             2 |   135   134   134   133   135   135 |  0.972  0.972  0.972  0.972  0.972  0.972
             3 |   134   132   134   133   135   134 |  0.972  0.972  0.972  0.972  0.972  0.972
             5 |   131   134   135   134   133   136 |  0.972  0.972  0.972  0.972  0.972  0.972
citeseer     1 |    36    37    34    34    35    35 |  0.928  0.928  0.928  0.928  0.928  0.928
             2 |    37    35    36    34    36    36 |  0.928  0.928  0.928  0.928  0.928  0.928
             3 |    34    34    36    35    37    34 |  0.928  0.928  0.928  0.928  0.928  0.928
             5 |    34    34    34    34    34    36 |  0.928  0.928  0.928  0.928  0.928  0.928
ds2          1 |   663   656   655   688   705   670 |  0.682  0.677  0.681  0.689  0.687  0.668
             2 |   668   679   680   708   707   697 |  0.682  0.689  0.689  0.687  0.690  0.667
             3 |   671   679   681   704   710   697 |  0.682  0.689  0.689  0.689  0.690  0.667
             5 |   667   680   681   701   710   698 |  0.682  0.689  0.689  0.689  0.690  0.667
ds1.100pca   1 |    31    31    31    27    25    21 |  0.916  0.916  0.916  0.912  0.907  0.891
             2 |    48    47    50    31    32    28 |  0.916  0.917  0.917  0.911  0.910  0.894
             3 |    57    58    52    29    29    38 |  0.918  0.918  0.917  0.911  0.908  0.893
             5 |    56    54    48    42    45    41 |  0.917  0.917  0.915  0.912  0.909  0.893
ds1.10pca    1 |     4     5     4     5     4     4 |  0.808  0.808  0.808  0.809  0.809  0.815
             2 |     5     4     5     5     6     5 |  0.831  0.830  0.829  0.839  0.839  0.842
             3 |     7     7     6     5     6     6 |  0.841  0.841  0.841  0.841  0.840  0.840
             5 |     7     7     6     7     7     7 |  0.842  0.842  0.842  0.842  0.841  0.840


Table 5.18: Tight Epsilon
Dataset    cgw | Time, by rrlambda                   | AUC, by rrlambda
               |     1     5    10    50   100   500 |      1      5     10     50    100    500
ds1          1 |   128   137   137   155   124    46 |  0.944  0.946  0.946  0.940  0.932  0.907
             2 |   169   196   210   113    96    90 |  0.943  0.948  0.948  0.940  0.932  0.909
             3 |   119   281   234   155   211    66 |  0.941  0.948  0.949  0.940  0.932  0.904
             5 |   226   304   283   183   150    83 |  0.941  0.948  0.948  0.941  0.932  0.904
imdb         1 |  1398   688   580   434   395   291 |  0.980  0.982  0.983  0.983  0.983  0.983
             2 |   955   844   728   482   417   292 |  0.980  0.982  0.983  0.983  0.983  0.983
             3 |  1290   981   791   481   413   292 |  0.980  0.982  0.983  0.983  0.983  0.983
             5 |  1499   982   791   482   415   289 |  0.980  0.982  0.983  0.983  0.983  0.983
citeseer     1 |   184    98    82    71    65    60 |  0.934  0.945  0.945  0.946  0.946  0.946
             2 |   211   126    96    74    65    58 |  0.935  0.945  0.945  0.946  0.946  0.946
             3 |   213   125    94    71    65    60 |  0.935  0.945  0.945  0.946  0.946  0.946
             5 |   211   126    95    72    64    59 |  0.935  0.945  0.945  0.946  0.946  0.946
ds2          1 |  2199  1791  2979  3738  4139  3959 |  0.706  0.705  0.713  0.724  0.727  0.712
             2 |  2130  6023  8089  4278  3127  3023 |  0.705  0.715  0.720  0.723  0.727  0.717
             3 |  4090  6312  5534  3942  3488  2580 |  0.705  0.714  0.720  0.726  0.726  0.723
             5 |  7158  8771  7300  5543  4269  2573 |  0.705  0.714  0.720  0.726  0.726  0.723
ds1.100pca   1 |   152   149   133   129   107    50 |  0.918  0.918  0.918  0.913  0.909  0.895
             2 |   104   118   104   108   277   211 |  0.919  0.919  0.919  0.913  0.910  0.893
             3 |   141   135   132   275   135    75 |  0.919  0.919  0.919  0.914  0.909  0.893
             5 |   147   139   132   111    99    82 |  0.919  0.919  0.919  0.914  0.909  0.893
ds1.10pca    1 |    34    33    34    32    27    26 |  0.845  0.845  0.845  0.842  0.826  0.839
             2 |    27    31    35    38    38    10 |  0.846  0.846  0.846  0.846  0.845  0.842
             3 |    11    11    17    40    11    10 |  0.845  0.846  0.846  0.846  0.845  0.842
             5 |    12    10    11    11    11    10 |  0.846  0.846  0.846  0.846  0.845  0.842

Table 5.17 shows results for moderate epsilons, while Table 5.18 shows results for tight epsilons. Each table has two halves, with times on the left and AUC scores on the right. Within each half, the columns represent the six rrlambda values. Each table is also divided into six groups of rows, one group for each dataset. Within each group are four rows representing the four cgwindow values (abbreviated cgw). Labels for the rrlambda and cgwindow values are listed above or to the left of their columns or rows, respectively.

We will employ a single-elimination strategy for finding good values of rrlambda and cgwindow. Any parameter value which causes uniformly poor AUC scores for a dataset will be removed from consideration. Once all remaining AUC scores are adequate, the same procedure will be applied to the run times. With some luck, at least one combination of rrlambda and cgwindow will survive.

Examination of AUC scores in Tables 5.17 and 5.18 immediately eliminates rrlambda values 100 and 500. A closer look at the Tight Epsilon table eliminates rrlambda=1 because of citeseer and ds2. Similarly, ds1 eliminates rrlambda=50. Several rows with cgwindow=1 show poor scores in the remaining rrlambda columns, and in one case these experiments are slower as well, so cgwindow=1 is also removed from consideration.
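As a rough illustration of the AUC elimination step for rrlambda, the sketch below encodes the tight-epsilon AUC scores of Table 5.18 and applies one possible reading of "uniformly poor": a value is dropped if, on some dataset, even its best score over all cgwindow settings falls more than a fixed tolerance below the best score for that dataset. The tolerance of 0.007 and the function name `eliminated_rrlambda` are assumptions made for this example; the elimination in the text was done by inspection, not by a fixed threshold.

```python
# Tight-epsilon AUC scores from Table 5.18, stored in thousandths.
# Rows are cgwindow = 1, 2, 3, 5; columns are rrlambda = 1, 5, 10, 50, 100, 500.
RRLAMBDA = [1, 5, 10, 50, 100, 500]
AUC = {
    "ds1": [[944, 946, 946, 940, 932, 907],
            [943, 948, 948, 940, 932, 909],
            [941, 948, 949, 940, 932, 904],
            [941, 948, 948, 941, 932, 904]],
    # All four imdb rows are identical in the table.
    "imdb": [[980, 982, 983, 983, 983, 983]] * 4,
    "citeseer": [[934, 945, 945, 946, 946, 946],
                 [935, 945, 945, 946, 946, 946],
                 [935, 945, 945, 946, 946, 946],
                 [935, 945, 945, 946, 946, 946]],
    "ds2": [[706, 705, 713, 724, 727, 712],
            [705, 715, 720, 723, 727, 717],
            [705, 714, 720, 726, 726, 723],
            [705, 714, 720, 726, 726, 723]],
    "ds1.100pca": [[918, 918, 918, 913, 909, 895],
                   [919, 919, 919, 913, 910, 893],
                   [919, 919, 919, 914, 909, 893],
                   [919, 919, 919, 914, 909, 893]],
    "ds1.10pca": [[845, 845, 845, 842, 826, 839],
                  [846, 846, 846, 846, 845, 842],
                  [845, 846, 846, 846, 845, 842],
                  [846, 846, 846, 846, 845, 842]],
}

# Hypothetical tolerance in thousandths of AUC; not taken from the text.
TOLERANCE = 7


def eliminated_rrlambda(auc, tolerance):
    """Map each dropped rrlambda value to the datasets that rule it out.

    A value is ruled out by a dataset when its best score over all
    cgwindow rows is more than `tolerance` below that dataset's best score.
    """
    dropped = {}
    for name, grid in auc.items():
        best = max(max(row) for row in grid)
        for col, lam in enumerate(RRLAMBDA):
            best_for_lam = max(row[col] for row in grid)
            if best - best_for_lam > tolerance:
                dropped.setdefault(lam, []).append(name)
    return dropped


dropped = eliminated_rrlambda(AUC, TOLERANCE)
survivors = [lam for lam in RRLAMBDA if lam not in dropped]
print("dropped:", dropped)
print("survivors:", survivors)
```

With this tolerance the sketch drops rrlambda=1 because of citeseer and ds2 and rrlambda=50 because of ds1, as in the text, but it also drops rrlambda=5 because of ds2. The outcome depends on the assumed tolerance, so it should be read as an illustration of the procedure and not as a reproduction of the judgement made above.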

When times are considered we see that setting cgwindow=5 is no better than cgwindow=3, and sometimes much worse; for example, on ds2 with tight epsilons and rrlambda=5 the time rises from 6312 to 8771. Our candidates are now cgwindow=2 or cgwindow=3, and rrlambda=5 or rrlambda=10. Within these restrictions cgwindow=3 and rrlambda=10 produce consistently good AUC scores and times, while other combinations occasionally excel and occasionally plummet.

It is clear that we would benefit from tuning our parameters for every experiment, but we feel this is not reasonable. For all remaining IRLS experiments in this thesis we set rrlambda to 10 and cgwindow to 3. While this static assignment may handicap IRLS in experiments with alternative algorithms in Chapter 6, we believe a small penalty is less important than the benefit of eliminating experiment-by-experiment tuning.
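To give a sense of the size of this penalty on the stability datasets themselves, the sketch below compares the AUC at the default setting (cgwindow=3, rrlambda=10) with the best AUC anywhere in each dataset's grid, using the moderate-epsilon scores of Table 5.17. The helper name `shortfall` is an assumption made for this example, and the comparison covers AUC only; it says nothing about run time or about the experiments in Chapter 6.

```python
# Moderate-epsilon AUC scores from Table 5.17, stored in thousandths.
# Rows follow CGWINDOW, columns follow RRLAMBDA.
CGWINDOW = [1, 2, 3, 5]
RRLAMBDA = [1, 5, 10, 50, 100, 500]
AUC = {
    "ds1": [[944, 945, 945, 939, 932, 898],
            [943, 947, 946, 941, 932, 909],
            [939, 947, 948, 941, 934, 906],
            [941, 948, 948, 941, 934, 906]],
    # Every imdb and citeseer cell has the same score in the table.
    "imdb": [[972] * 6] * 4,
    "citeseer": [[928] * 6] * 4,
    "ds2": [[682, 677, 681, 689, 687, 668],
            [682, 689, 689, 687, 690, 667],
            [682, 689, 689, 689, 690, 667],
            [682, 689, 689, 689, 690, 667]],
    "ds1.100pca": [[916, 916, 916, 912, 907, 891],
                   [916, 917, 917, 911, 910, 894],
                   [918, 918, 917, 911, 908, 893],
                   [917, 917, 915, 912, 909, 893]],
    "ds1.10pca": [[808, 808, 808, 809, 809, 815],
                  [831, 830, 829, 839, 839, 842],
                  [841, 841, 841, 841, 840, 840],
                  [842, 842, 842, 842, 841, 840]],
}


def shortfall(grid, cgwindow=3, rrlambda=10):
    """AUC (in thousandths) lost by the fixed setting versus the grid's best cell."""
    fixed = grid[CGWINDOW.index(cgwindow)][RRLAMBDA.index(rrlambda)]
    best = max(max(row) for row in grid)
    return best - fixed


penalties = {name: shortfall(grid) for name, grid in AUC.items()}
for name, lost in penalties.items():
    print(f"{name:12s} {lost / 1000:.3f}")
```

On these scores the default setting is within 0.001 AUC of the best grid cell for every dataset. The same function can be pointed at the Table 5.18 scores to check the tight-epsilon case.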


Copyright 2004 Paul Komarek, komarek@cmu.edu