How Adaptive? - Jan Kloppenborg Møller

5.7 How Adaptive?

The previous sections treated adaptive models. This section addresses the question of how adaptive a model should be. To study this, three different models are examined. They use the same knot placement as the Basic models in Section 5.4, but three different updating procedures.

These models are now examined with different sizes of the bins. The models are referred to as

Bin 1: One gliding window.

Bin 2: Bins defined by the knots.

Bin 3: Bins defined by the knots and by the midpoint between each pair of neighbouring knots.
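The three updating procedures can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used here: it assumes that the knots passed in are interior knots on the explanatory variable, that each bin keeps its most recent observations in first-in-first-out order, and that the design matrix is built from the union of all bins. The names bin_edges, BinnedWindow and per_bin are hypothetical.

```python
from bisect import bisect_right
from collections import deque


def bin_edges(knots, scheme):
    """Return the interior bin edges for the three updating schemes."""
    knots = sorted(knots)
    if scheme == 1:   # Bin 1: one gliding window, hence no edges
        return []
    if scheme == 2:   # Bin 2: edges at the knots
        return list(knots)
    if scheme == 3:   # Bin 3: knots plus midpoints between neighbouring knots
        mids = [(a + b) / 2 for a, b in zip(knots, knots[1:])]
        return sorted(knots + mids)
    raise ValueError("scheme must be 1, 2 or 3")


class BinnedWindow:
    """Keep the most recent `per_bin` observations in every bin (assumed FIFO)."""

    def __init__(self, knots, scheme, per_bin):
        self.edges = bin_edges(knots, scheme)
        # One bounded queue per bin; the oldest observation is dropped when full.
        self.bins = [deque(maxlen=per_bin) for _ in range(len(self.edges) + 1)]

    def update(self, x, y):
        # bisect_right returns the index of the bin that x falls into.
        self.bins[bisect_right(self.edges, x)].append((x, y))

    def rows(self):
        # Observations currently available for building the design matrix.
        return [obs for b in self.bins for obs in b]


# A rare low value followed by many high values.
gliding = BinnedWindow([0.25, 0.5, 0.75], scheme=1, per_bin=3)
binned = BinnedWindow([0.25, 0.5, 0.75], scheme=2, per_bin=3)
for x in [0.1, 0.9, 0.9, 0.9, 0.9]:
    gliding.update(x, 0.0)
    binned.update(x, 0.0)
```

In this toy stream the gliding window has already forgotten the single observation at 0.1, while the binned version still holds it. This is the sense in which binned updating retains rare events.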

These models are now studied with respect to the performance parameters that we can visualize: reliability distance, skill score, crossings, sharpness, and resolution. The performance parameters can, however, only be visualized as overall measures. These measures are shown as a function of the number of elements in the design matrix at the end of the test period. The number of elements in the design matrix will not be constant over the test period if many observations are allowed in the bins. This is due to the limited size of the available data set.

Figure 5.10 shows the local reliability distance for the three updating procedures. The difference between them is small. The figure indicates that we should have quite few elements in the design matrix, about 2000. This corresponds to about one month in the gliding window case. In the other cases the time span will differ from bin to bin.

The reason why the updating procedures look so similar in the reliability distance is that the different updating strategies take care of rare events, and rare events do not affect the reliability distance. Model A1 was an extreme example of this.

Figure 5.10: Local reliability distance as a function of the number of elements in the design matrix X, for the 3 different updating procedures.

The effect of choosing different updating strategies is illustrated in Figure 5.11, where the number of crossings, the size of extreme crossings and the mean size of the crossings are plotted. This figure clearly shows unacceptable behavior when the design matrix is small. With few elements in the design matrix, the absolute size of the crossings is of the same magnitude as the possible interval of the forecast. The mean size of crossings and the size of extreme crossings stabilize earlier for Bin 2 and 3, so if we want few elements in the design matrix, an updating procedure with bins should be chosen. Both the mean size of crossings and the size of extreme crossings also display random behavior for small design matrices. This means that we could be misled by a good performance in this respect simply by chance.

Figure 5.11: Number of crossings, the most extreme crossing in the test set and the average size of the observed crossings as a function of the number of rows in the design matrix.

Figure 5.11 suggests that we should choose the number of elements in the bins such that the number of elements in the design matrix is about 5000. For Bin 3 this could be chosen somewhat smaller, maybe around 2500-3000. Comparing with the reliability plot, we should therefore choose Bin 3, with the number of elements in each bin such that the size of the design matrix becomes about 3000.
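The three crossing measures discussed above can be computed along the following lines. This is a sketch that assumes a crossing occurs whenever the forecast of the lower quantile exceeds the forecast of the upper quantile, and that its size is the difference between the two; the function name crossing_stats and the example numbers are hypothetical.

```python
def crossing_stats(q_low, q_high):
    """Summarise quantile crossings between two forecast series.

    A crossing is counted whenever the lower quantile forecast exceeds
    the upper one; its size is the positive difference between them.
    """
    sizes = [lo - hi for lo, hi in zip(q_low, q_high) if lo > hi]
    if not sizes:
        return {"count": 0, "max": 0.0, "mean": 0.0}
    return {
        "count": len(sizes),
        "max": max(sizes),
        "mean": sum(sizes) / len(sizes),
    }


# Toy forecasts on a [0, 1] scale; the second and fourth time steps cross.
stats = crossing_stats(
    q_low=[0.25, 0.5, 0.25, 1.0],
    q_high=[0.5, 0.25, 0.75, 0.5],
)
```

With only a handful of crossings in the test set, the maximum and the mean depend on very few values, which is consistent with the random behavior seen for small design matrices.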

Figure 5.12 shows the skill score for each of the quantiles. The interval score is not shown here, but it is just the sum of the quantile scores. The figure suggests choosing Bin 2 with about 4000 elements in the design matrix. This is also where the crossings begin to stabilize for this model, but the conclusion is quite different from the reliability plot in Figure 5.10. The skill score is higher for Bin 3, which has better performance with respect to crossings and reliability combined. That is, we can choose a smaller design matrix and still avoid extreme crossings.
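As an illustration of how a quantile score and its sum over two quantiles can be evaluated, the sketch below uses the standard check function loss of quantile regression. The exact definition and sign convention of the skill score used here are not reproduced, so this should be read as an assumption. The quantile levels 0.25 and 0.75 are also assumed, and all function names are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Check function loss for one observation y and quantile forecast q."""
    r = y - q
    return tau * r if r >= 0 else (tau - 1) * r


def mean_quantile_loss(ys, qs, tau):
    """Average check function loss over a test set for one quantile level."""
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, qs)) / len(ys)


def interval_loss(ys, q_low, q_high, tau_low=0.25, tau_high=0.75):
    """Sum of the two quantile losses (quantile levels are assumed here)."""
    return (mean_quantile_loss(ys, q_low, tau_low)
            + mean_quantile_loss(ys, q_high, tau_high))


# Toy observations and constant quantile forecasts.
ys = [0.0, 1.0]
loss_low = mean_quantile_loss(ys, [0.5, 0.5], tau=0.25)
loss_both = interval_loss(ys, [0.5, 0.5], [0.5, 0.5])
```

The loss is asymmetric: for the lower quantile, an observation below the forecast is penalised three times as heavily as one the same distance above it.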

Figure 5.12: Average loss on the test set for the 3 different updating procedures as a function of the number of rows in the design matrix.

Figure 5.13 shows sharpness and resolution for the three updating procedures. For small design matrices we cannot rely on the sharpness measure, since the large number and extreme size of the crossings give an unrealistic picture of the sharpness. Therefore we should disregard sharpness for design matrices with few elements. The only information we really get from sharpness is that we should choose Bin 2 or 3, with as few elements as the crossing analysis allows. Resolution also rewards extreme behavior of the crossings, but we see an extremum at around 4000 elements for Bin 2 and 3. This conclusion is thus similar to the conclusion from the skill score, and the behavior of the crossings is no longer extreme at that size.
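The way crossings distort these two measures can be seen in a small sketch. It assumes that sharpness is the mean width of the forecast interval and resolution is the standard deviation of that width; these definitions are an assumption and may differ in detail from the ones used here. The function name and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def sharpness_and_resolution(q_low, q_high):
    """Mean and standard deviation of the interval width (assumed definitions)."""
    widths = [hi - lo for lo, hi in zip(q_low, q_high)]
    return mean(widths), pstdev(widths)


# Four intervals of width 0.5, no crossings.
clean = sharpness_and_resolution([0.25] * 4, [0.75] * 4)

# Same forecasts, but the last interval is crossed (width -0.5).
crossed = sharpness_and_resolution(
    [0.25, 0.25, 0.25, 0.75],
    [0.75, 0.75, 0.75, 0.25],
)
```

The crossed interval contributes a negative width, so the mean width drops and the spread of the widths grows. Under these definitions, crossings therefore make both sharpness and resolution look better than they are.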

Figure 5.14 shows the average CPU time per time step and the mean number of simplex steps per time step; the average is also taken over the two quartiles. The timing is quite close to a linear function, and it is fast enough for a real-time implementation for all sizes of the design matrix. For both timing and number of simplex steps we see that Bin 2 and 3 perform better than Bin 1. Figure 5.15 shows the standard deviation of the number of simplex steps and of the CPU time.

The standard deviation of the time displays the same behavior as the mean time, while the standard deviation of the number of simplex steps becomes smaller as we get more elements in the design matrix X.
