
breaking with the requirements of WPPT if the parallelization should occur on a higher level. The optimal parallelization would be obtained by letting WPPT handle it, perhaps by changing the update predictor function call so that multiple instances can more easily be run in parallel.

8.2.3 Interior Point Method and Simplex

During the program development, there has been very little doubt that the simplex method performs much better than the interior point method in the case of continuous adaptive updates.

The comparative performance tests showed that the simplex method was indeed much faster than the interior point method, but at the cost of having to update the predictor each time new data enters the module. The tests were carried out under controlled circumstances, on an unloaded system and with identical setups.

The simplex method is faster because each iteration requires fewer computations than the interior point method, and because reusing the optimal solution from the previous run significantly lowers the number of iterations required to find the new optimal solution.

Even though the simplex method does not work very well with infrequent updates, it might be possible to obtain good results by calculating the vertex from the previous predictor rather than by relying on the vertex being delivered from the last run. This might potentially lead to even better performance, since the less computationally expensive simplex method could then be used in setups where the predictor is only updated weekly.
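To make the comparison concrete, the linear program underlying quantile regression can be sketched as below. This is an illustrative toy example using SciPy's `linprog` (an assumption for illustration; the thesis implementation is not based on SciPy), and note that the HiGHS backend does not expose the basis reuse that the adaptive simplex updates exploit, so only the LP formulation itself is shown:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(0, 0.1, n)
X = np.column_stack([np.ones(n), x])  # intercept + slope
tau = 0.5                             # quantile level (0.5 = median regression)
p = X.shape[1]

# Decision vector: [beta+ , beta- , u, v] with beta = beta+ - beta-.
# Minimise tau*sum(u) + (1-tau)*sum(v)  s.t.  X beta + u - v = y,  u, v >= 0.
c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
res = linprog(c, A_eq=A_eq, b_eq=y,
              bounds=[(0, None)] * (2 * p + 2 * n), method="highs")
beta = res.x[:p] - res.x[p:2 * p]
print(res.status, np.round(beta, 2))  # slope should recover roughly 2.0
```

A simplex solver that accepts the previous optimal basis would start from `res` when new observations arrive; an interior point solver restarts from scratch each time, which is exactly the trade-off discussed above.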

8.3 Evaluation of Quantile Quality

As part of the validation tests, the two measures, reliability and skill score, have been used in different ways to provide a picture of how well the calculated quantiles describe the active data in the system and how well the prediction describes the data to come.
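The two measures can be sketched as follows. This is a minimal numpy illustration of empirical coverage (reliability) and the pinball loss from which a skill score can be built, not the exact definitions used in the validation tests:

```python
import numpy as np

def reliability(y, q_pred, tau):
    """Empirical coverage: the fraction of observations falling below the
    tau-quantile prediction.  For a reliable predictor this is close to tau."""
    return np.mean(y <= q_pred)

def pinball_loss(y, q_pred, tau):
    """Pinball (check) loss; lower is better.  Averaging over several
    quantile levels gives a skill score for the whole uncertainty forecast."""
    d = y - q_pred
    return np.mean(np.maximum(tau * d, (tau - 1) * d))

rng = np.random.default_rng(1)
y = rng.normal(0, 1, 10_000)
q75 = np.quantile(y, 0.75)            # an in-sample 75% quantile
print(reliability(y, q75, 0.75))      # close to 0.75 for a reliable quantile
print(pinball_loss(y, q75, 0.75))
```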

8.3.1 Adaptivity and Quantile Reliability

Testing the reliability of the calculated quantiles has been an interesting exercise.

The study on the reliability of the data within the program throughout a time series showed that the reliability can be compromised by having too many zeros in the system. The distribution of predictions is far from uniform, and a large number of zero predictions exist. This is consistent with the theory of power prediction, but for estimating the prediction uncertainty using linear quantile regression and splines, the zeros cause the system to become unreliable.

The first coefficient of the splines is the intercept. If there is a large concentration of errors at zero in the system, the optimization algorithm will try to limit the loss function, which causes the intercept to be placed either where it does not correctly describe the behavior at zero, or where the offset makes it difficult to describe the rest of the distribution.
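The effect of a point mass at zero can be seen directly in the empirical quantiles. The following numpy sketch uses invented numbers purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Prediction errors: a large point mass at zero plus a continuous part,
# mimicking the many zero power predictions produced by the power-curve
# threshold (proportions are illustrative, not from the thesis data).
errors = np.concatenate([np.zeros(600), rng.normal(0.1, 0.05, 400)])

for tau in (0.25, 0.5, 0.75):
    print(tau, np.quantile(errors, tau))
# Every quantile level covered by the point mass lands exactly on zero,
# so an intercept fitted to those levels is pinned to the same value and
# cannot simultaneously describe the continuous part of the distribution.
```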

The problem with zeros led to an adjustment of the algorithm in which a penalty was assigned to duplicate points. This effectively cured the problem with reliability, but other methods might work just as well. The penalty-based replacement of old values is a general solution, which should perform well on data with frequently occurring inputs different from zero.

Another method for handling this would be to treat zero predictions as a special case using another model. The model would not need to be very advanced, but it would take the load off the rest of the system. The penalty-based model does, however, still present a significant improvement in the quality and robustness of the system and should not be removed unless something replaces it. The penalty rules might be slightly adjusted so point age would have a relatively larger importance, but this is highly application specific.
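One way such a penalty rule could be weighted between duplication and point age is sketched below. The scoring function, its parameters and the numbers are all hypothetical, intended only to show how the relative importance of age could be tuned:

```python
import numpy as np

def replacement_score(values, ages, dup_penalty=1.0, age_weight=0.1):
    """Hypothetical scoring rule: when new data arrives, the stored point
    with the highest score is replaced.  Duplicated values are penalised,
    and older points score higher via age_weight; raising age_weight
    relative to dup_penalty shifts importance toward point age."""
    values = np.asarray(values, dtype=float)
    counts = np.array([(values == v).sum() for v in values])
    return dup_penalty * (counts - 1) + age_weight * np.asarray(ages, dtype=float)

vals = [0.0, 0.0, 0.0, 0.3, 0.7]   # three duplicate zeros in the store
ages = [5, 1, 2, 10, 3]
scores = replacement_score(vals, ages)
print(scores, int(np.argmax(scores)))  # the oldest duplicate zero is replaced
```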

8.3.2 Predictions

The results from the tests where the predictor was used to predict the uncertainty showed a surprising lack of relation between predictor update frequency and quality of the probabilistic uncertainty. Even if the predictor was only updated once, the skill score was not significantly worse. This is very good news in terms of using the program as a module in WPPT, since the computational requirements can be overcome by updating the predictor less frequently.

A number of factors which may contribute to this lack of relation between predictor age and quantile quality have already been mentioned. If the prediction errors are fairly symmetrical around zero, the quantiles may show good performance in terms of reliability over the long run, due to the averaging nature of the test. The moving average skill score does, however, seem to contradict that this is caused by averaging effects, because the trend can be followed through most of the time series.
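A moving-average skill score of the kind referred to here can be sketched as follows, assuming a numpy environment and using a static 90% quantile of synthetic errors as a stand-in predictor:

```python
import numpy as np

def pinball(y, q, tau):
    """Per-observation pinball loss for quantile level tau."""
    d = y - q
    return np.maximum(tau * d, (tau - 1) * d)

rng = np.random.default_rng(5)
y = rng.normal(0, 1, 1000)           # synthetic prediction errors
q90 = 1.2816 * np.ones_like(y)       # static 90% quantile of N(0, 1)

loss = pinball(y, q90, 0.9)
window = 100
# Averaging only over a sliding window keeps local trends in the skill
# score visible, instead of washing them out over the whole series.
moving = np.convolve(loss, np.ones(window) / window, mode="valid")
print(loss.mean(), moving.min(), moving.max())
```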

Yet another possibility is that the relatively large uncertainty or standard deviation of the wind power prediction, which can be seen in Section 7.7, limits the precision of the predicted uncertainty quantiles and thus diminishes the need for very advanced updating schemes to catch minor variations.

In order to determine whether or not this is true, much larger data sets need to be analyzed. The data set for most of the tests consisted of approximately two years' worth of data, but it was shown that it took almost a year for the adaptive algorithm to settle and provide optimal uncertainty predictions. It would be very interesting to see if the uncertainty predictions could be improved by using other, more explanatory variables.

8.3.3 Selecting Other Explanatory Variables

If wind predictions had been used instead of power predictions, the problem with reliability might never have been identified. The problems encountered during evaluation of the reliability were mainly caused by the threshold in the power curve model. It would be interesting to observe if the uncertainty predictions could be improved if the variables available to WPPT were used instead of just the predicted power. This is indeed possible with the implemented algorithm, but there is a risk of data thinning if too many bins are defined in each explanatory variable. The penalty-based replacement model does remove some of the reasons for using bins, so perhaps tests with many explanatory variables can be performed easily by using only a few bins.
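The data thinning effect is easy to quantify: with a fixed number of bins per variable, the number of bin combinations grows exponentially with the number of explanatory variables. A small sketch with invented sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000            # illustrative number of data points
bins_per_var = 5    # illustrative bin count per explanatory variable

for n_vars in (1, 2, 3):
    data = rng.uniform(0, 1, (n, n_vars))
    idx = np.floor(data * bins_per_var).astype(int)
    # Mixed-radix key identifying each bin combination, then occupancy counts.
    keys = (idx * bins_per_var ** np.arange(n_vars)).sum(axis=1)
    counts = np.bincount(keys, minlength=bins_per_var ** n_vars)
    print(n_vars, "vars:", bins_per_var ** n_vars, "bins,",
          "mean points/bin =", round(counts.mean(), 1))
```

With three variables the same 2000 points are spread over 125 bins, so each local quantile model sees only a handful of points, which is the thinning risk mentioned above.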

The reason for using additional explanatory variables apart from the predicted power is that a correlation between the prediction error and some other factor cannot be captured if only the predicted power is used. For instance, if the power prediction reads zero because the wind prediction is just below a certain threshold, the chance of power actually being produced is larger than if the power prediction is zero due to zero wind. To a system looking only at the predicted power, this information is completely lost.
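The two kinds of zero can be made concrete with a simplified power curve. The cut-in and rated speeds below are invented illustrative numbers, not parameters from the thesis:

```python
import numpy as np

def power_curve(wind, cut_in=4.0, rated=12.0):
    """Simplified power curve: zero below the cut-in speed, then a smooth
    cubic ramp up to rated power (normalised to [0, 1]; numbers illustrative)."""
    w = np.clip((np.asarray(wind, dtype=float) - cut_in) / (rated - cut_in), 0, 1)
    return w ** 3

# Both wind predictions give a power prediction of exactly zero, but the
# uncertainty of the two situations is very different:
calm      = power_curve(0.5)   # far below cut-in: power will almost surely stay zero
threshold = power_curve(3.9)   # just below cut-in: power may well appear
print(calm, threshold)         # both zero - the distinction is lost to the model
```

A model that only sees the predicted power receives the same input in both cases; a model that also sees the wind prediction can assign them different uncertainties.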

Another very interesting parameter to take into account would be the uncertainty of weather predictions. If the company doing the weather forecasts had a system similar to the one implemented in this thesis for estimating probabilistic uncertainties in their predictions, these uncertainty numbers could be taken into account when estimating the power prediction uncertainty, and this should ultimately lead to more accurate results.

8.3.4 Treatment of Horizons

It is well known, and has also been shown in the skill score tests, that the uncertainty of power predictions is related to the horizon at which they are predicted. From the Matlab implementation, the idea of splitting the data set by horizon was carried over, but there are some basic problems with this when other explanatory variables are considered. Treating each horizon as a separate data set offers the possibility of calculating a unique quantile predictor for each horizon and thus adjusting for the increased uncertainty at longer prediction horizons.

There are some unfortunate limitations to this strategy, because the system, with its isolated data sets for each horizon, is oblivious to time-skewed events in the incoming data. If, for instance, the weather prediction agency continually had a one-hour time skew in wind predictions for a particular windmill farm location, the program would only be able to adjust for this as an unknown error, but if the system were structured substantially differently, it could potentially adjust the uncertainty estimates for this time skew.

Another event which goes completely unnoticed in the system is the relation between points in the time series - if a whole day with good power production is predicted, the program only sees this as a time series of independently "good power" predictions. It is clear that if there is a 50% power production at time t, and the predicted power at time t + 1 hours is also 50%, this prediction is more likely to be correct than a 50% prediction one hour after a period of 0% power production.

Changing the behavior of the program to incorporate capabilities for time skewed events and relations between the individual points in the predicted time series would require large rewrites of the code - in fact it would most likely be a completely different program.

Simply including the horizon as an explanatory variable could, however, easily be done and it might provide some additional information. If weather forecasts were used as explanatory variables, the age of these could also be interesting to include. If the strict separations between horizons were dropped, the problem with the need for a large training set might also be removed, because points from all horizons would enter the same model.
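The difference between strictly separated horizon models and a pooled model with the horizon as an explanatory variable can be sketched on synthetic data. The sizes and error model below are invented, and the linear fit is only a crude stand-in for adding the horizon to the quantile predictor:

```python
import numpy as np

rng = np.random.default_rng(4)
horizons = np.arange(1, 49)          # prediction horizons 1..48 hours
n_per_h = 30                         # few points per horizon: thin data

# Synthetic prediction errors whose spread grows with the horizon.
h = np.repeat(horizons, n_per_h)
err = rng.normal(0.0, 0.05 + 0.01 * h)

# Strict separation: one independent 90% error quantile per horizon,
# each estimated from only 30 points.
per_h = np.array([np.quantile(np.abs(err[h == k]), 0.9) for k in horizons])

# Pooled alternative: all 1440 points enter one model with the horizon
# as an explanatory variable (here a simple linear fit on |error|).
slope, intercept = np.polyfit(h, np.abs(err), 1)
pooled = intercept + slope * horizons

# Roughness from one horizon to the next: the pooled estimates vary
# smoothly, while the separated estimates jump around on thin data.
rough_sep = np.mean(np.abs(np.diff(per_h)))
rough_pool = np.mean(np.abs(np.diff(pooled)))
print(round(rough_sep, 3), round(rough_pool, 3))
```

This illustrates the point above: dropping the strict separation lets every horizon borrow strength from the others, easing the need for a large training set per horizon.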