8 Forecast evaluation

Where possible, accuracy evaluation should be handled by existing tidymodels tools such as yardstick, although some changes or extensions will likely be needed to fully support time series accuracy metrics.

8.1 Accuracy

The forecast package implements accuracy() as a function that is applied to a model (or its forecasts). Out-of-sample accuracy is computed by additionally providing a test set.

It is probably more transparent to compute accuracy metrics by directly providing actual response values and model predictions.
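Here is a minimal sketch of the data-centric approach, assuming plain numeric vectors. Given observed values `y` and predictions `y_hat`, the common point accuracy measures reduce to simple summaries of the errors. The function `point_accuracy()` is hypothetical and is not part of forecast, yardstick, or fable.

```r
# Hypothetical helper: point forecast accuracy computed directly from
# observed values (y) and predictions (y_hat), with no model object involved.
point_accuracy <- function(y, y_hat, na.rm = TRUE) {
  e <- y - y_hat                                  # forecast errors
  c(
    ME   = mean(e, na.rm = na.rm),                # mean error (bias)
    MAE  = mean(abs(e), na.rm = na.rm),           # mean absolute error
    RMSE = sqrt(mean(e^2, na.rm = na.rm)),        # root mean squared error
    MAPE = mean(abs(100 * e / y), na.rm = na.rm)  # mean absolute percentage error
  )
}

res <- point_accuracy(y = c(10, 12, 14), y_hat = c(11, 12, 12))
res
```

Because the inputs are just vectors, the same function works for in-sample fits and out-of-sample forecasts alike.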

8.2 Model vs data centric

forecast is model-centric: accuracy is computed from a model or forecast object.

# forecast
accuracy(f = forecast, x = new_ts)

yardstick is data-centric: metrics are computed from columns of a data frame (see https://github.com/r-lib/generics/pull/22).

# yardstick
fit_tbl %>%
  accuracy(truth = col1, estimate = col2)

8.3 Proposed fable API

8.3.1 Desirable functionality

By default, accuracy() should provide a basic set of measures of fit for both models (mdl_df) and forecasts (fbl_ts), similar to the forecast package (perhaps only MAE, RMSE/MSE, and MAPE).

It should also be flexible enough for analysts to calculate a wide variety of accuracy measures, including:

Point forecast accuracy measures
Interval accuracy measures
Distribution accuracy measures
User-specified accuracy measures
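To illustrate the interval and distribution categories, the sketch below implements two standard measures: the Winkler score for a central prediction interval, and the closed-form continuous ranked probability score (CRPS) of a normal forecast distribution. The function names and arguments are hypothetical. They assume that the interval bounds, or the normal mean and standard deviation, are already available as vectors.

```r
# Hypothetical interval measure: Winkler score of a central (level)% interval.
# Width of the interval, plus a penalty of 2/alpha times the distance by which
# an observation falls outside it.
winkler_score <- function(y, lower, upper, level = 95) {
  alpha <- 1 - level / 100
  score <- upper - lower
  score <- score + ifelse(y < lower, 2 / alpha * (lower - y), 0)
  score <- score + ifelse(y > upper, 2 / alpha * (y - upper), 0)
  mean(score)
}

# Hypothetical distribution measure: closed-form CRPS of N(mu, sigma^2),
# averaged over observations.
crps_normal <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  mean(sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi)))
}

winkler_score(y = c(5, 0, 12), lower = 2, upper = 10, level = 80)
crps_normal(y = 0, mu = 0, sigma = 1)
```

Both measures are lower for better forecasts. The Winkler score rewards narrow intervals but penalises misses heavily, and the CRPS grows as observations move away from the forecast mean.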

The user should be able to specify which measures they wish to compute, including measures exported by fablelite, measures from extension packages, and user-specified measures.

8.3.2 Proposed user interface

The accuracy measures to calculate are given to the measures argument as a list of accuracy measure functions. This list is flattened, so groups of accuracy measures (themselves lists of functions) can be defined and passed as single elements.
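The following is a minimal sketch of the flattening step, assuming measures are plain R functions held in (possibly nested) lists. The helper `flatten_measures()` and the toy `ME`, `MAE` and `RMSE` definitions are hypothetical and are not fablelite exports.

```r
# Hypothetical helper: recursively flatten nested lists of measure functions
# into a single named list of functions.
flatten_measures <- function(measures) {
  out <- list()
  for (i in seq_along(measures)) {
    m <- measures[[i]]
    nm <- names(measures)[i]
    if (is.function(m)) {
      out <- c(out, stats::setNames(list(m), nm))  # keep a single measure
    } else {
      out <- c(out, flatten_measures(m))           # descend into a group
    }
  }
  out
}

# Toy measures defined on a residual vector (illustrative only)
ME   <- function(.resid) mean(.resid)
MAE  <- function(.resid) mean(abs(.resid))
RMSE <- function(.resid) sqrt(mean(.resid^2))

point_measures <- list(ME = ME, MAE = MAE)  # a reusable group of measures
measures <- flatten_measures(list(point_measures, RMSE = RMSE))
names(measures)
```

Here `names(measures)` is `c("ME", "MAE", "RMSE")`, so the group `point_measures` behaves as if its members had been listed individually.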

The dots (...) provide additional arguments that are passed to every accuracy measure that supports them.
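One way to apply the dots only where they are supported is sketched below. It assumes that a measure "supports" an argument when that argument appears in its formals, or when the measure has its own `...`. The helper `call_with_supported()` and the toy measures are hypothetical.

```r
# Hypothetical helper: call a measure with its inputs plus only those extra
# arguments from `...` that the measure can accept.
call_with_supported <- function(fn, inputs, ...) {
  extra <- list(...)
  fmls <- names(formals(fn))
  if (!"..." %in% fmls) {
    extra <- extra[names(extra) %in% fmls]  # drop unsupported arguments
  }
  do.call(fn, c(inputs, extra))
}

MAE <- function(.resid, na.rm = FALSE) mean(abs(.resid), na.rm = na.rm)
ME  <- function(.resid) mean(.resid)        # has no na.rm argument

resid <- c(1, NA, -3)
mae <- call_with_supported(MAE, list(.resid = resid), na.rm = TRUE)  # 2
me  <- call_with_supported(ME,  list(.resid = resid), na.rm = TRUE)  # NA
```

Filtering avoids an "unused argument" error for `ME`, at the cost of silently ignoring arguments that a measure does not understand.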

For models (mdl_df), no additional inputs are required:

mbl %>% 
  accuracy(
    measures = list(MASE, MAE, ME),
    ...
  )

For forecasts (fbl_ts), the test set must be provided. The data used to train the model can also be provided (this interface is still under consideration) to extend the available inputs, which is required for scaled measures such as MASE:

fbl %>%
  accuracy(
    new_data,
    measures = list(MASE, MAE, ME),
    training_data = NULL,
    ...
  )

8.3.3 Implementation details

To achieve this, accuracy measure functions can expect a set of basic inputs from accuracy(). Each measure should declare the inputs it needs for its computation as formals (arguments) of the function. These inputs include (this list is not yet comprehensive and will be extended):

.resid: A vector of residuals from either the training (model accuracy) or test (forecast accuracy) data.
.resp: A vector of responses matching the residuals (for forecast accuracy, the original data must be provided).
.fitted: The fitted values from the model, or forecasted values from the forecast.
.dist: The distribution of fitted values from the model, or forecasted values from the forecast.
.period: The seasonal period of the data (defaulting to the ‘smallest’ seasonal period).
.expr_resp: An expression for the response variable.
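Here is a minimal sketch of formals-based dispatch. It covers only `.resid`, `.resp`, `.fitted` and `.period` from the list above. The inputs are assembled once, and each measure receives only the inputs its formals name. `compute_measures()` is a hypothetical stand-in for the internals of accuracy().

```r
# Hypothetical stand-in for accuracy(): build the basic inputs once, then
# give each measure only the inputs that appear in its formals.
compute_measures <- function(measures, .resp, .fitted, .period = 1) {
  inputs <- list(
    .resid  = .resp - .fitted,
    .resp   = .resp,
    .fitted = .fitted,
    .period = .period
  )
  vapply(measures, function(fn) {
    wanted <- intersect(names(formals(fn)), names(inputs))
    do.call(fn, inputs[wanted])
  }, numeric(1))
}

# Toy measures: each declares only the inputs it needs
RMSE <- function(.resid) sqrt(mean(.resid^2))
MAPE <- function(.resid, .resp) mean(abs(100 * .resid / .resp))

res <- compute_measures(list(RMSE = RMSE, MAPE = MAPE),
                        .resp = c(10, 20, 40), .fitted = c(12, 18, 40))
res
```

A measure written this way never needs to know how the inputs were obtained. The same `MAPE` serves model accuracy (training residuals) and forecast accuracy (test errors).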

If a measure accepts inputs beyond these, such as demeaning for MASE, these additional arguments are passed through the dots (...) of accuracy().
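The sketch below shows how such an extra argument might look for MASE. It assumes a hypothetical `.train` input holding the training response, and it scales by the mean absolute in-sample error of a (seasonal) naive forecast. It treats `demean = TRUE` as subtracting the mean of those naive errors before taking absolute values. This is one plausible reading of demeaning, not necessarily fablelite's definition.

```r
# Hypothetical MASE sketch: `.train` is an assumed extra input and `demean`
# is a measure-specific option that would arrive via the dots of accuracy().
MASE <- function(.resid, .train, .period = 1, demean = FALSE, na.rm = TRUE) {
  # In-sample errors of a (seasonal) naive forecast on the training data
  naive_err <- diff(.train, lag = .period)
  if (demean) naive_err <- naive_err - mean(naive_err, na.rm = na.rm)
  scale <- mean(abs(naive_err), na.rm = na.rm)
  mean(abs(.resid), na.rm = na.rm) / scale
}

train <- c(10, 12, 14, 15, 19)
m1 <- MASE(.resid = c(1, -2), .train = train)                 # plain scaling
m2 <- MASE(.resid = c(1, -2), .train = train, demean = TRUE)  # demeaned scaling
```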

8.4 Cross validation

CV(tsbl, mdl, h, window_type, ...)
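The window_type argument presumably distinguishes expanding windows, where training always starts at the first observation, from sliding windows, where a fixed-size training window moves forward. The sketch below generates the resulting train/test index splits. It assumes a hypothetical helper `cv_splits()` with an initial training size `init`, which is not part of the proposed signature.

```r
# Hypothetical helper: train/test indices for time series cross-validation
# on a series of length n, forecasting h steps ahead from each origin.
cv_splits <- function(n, h, init, window_type = c("expanding", "sliding")) {
  window_type <- match.arg(window_type)
  stopifnot(init <= n - h)                # need at least one full test window
  origins <- seq(init, n - h)             # last training index of each split
  lapply(origins, function(end) {
    start <- if (window_type == "expanding") 1 else end - init + 1
    list(train = start:end, test = (end + 1):(end + h))
  })
}

splits <- cv_splits(n = 8, h = 2, init = 5, window_type = "sliding")
length(splits)
```

Each split would then be fitted with `mdl` and evaluated on its test indices, with accuracy measures averaged over the splits.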

Tidy time series forecasting with fable