An Experimental Review of Nonstationary Time Series Transformation Methods

Most time series models assume that the series is stationary, i.e., that its statistical properties do not change over time. In most real-world datasets, however, stationarity is the exception rather than the rule. If overlooked, nonstationarity may lead to misleading statistical inferences and to poor or unexpected prediction results.
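As a minimal illustration of the definition above, the following base-R sketch simulates a random walk, whose mean level drifts over time, and shows that first-order differencing recovers stationary increments. The simulated series, the seed and the helper `half_means` are illustrative assumptions and are not part of TSPred.

```r
# Illustrative only: a simulated random walk, not one of the TSPred datasets.
set.seed(42)
noise <- rnorm(500)      # stationary white noise
walk  <- cumsum(noise)   # nonstationary: its variance grows with time

# Compare the mean of the first and second halves of each series.
half_means <- function(x) {
  n <- length(x)
  c(first = mean(x[1:(n %/% 2)]), second = mean(x[(n %/% 2 + 1):n]))
}
print(half_means(noise))
print(half_means(walk))

# First-order differencing recovers the stationary increments.
recovered <- diff(walk)
print(all.equal(recovered, noise[-1]))
```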

To address this issue, the TSPred R package provides, starting from version 4.0, functions for handling nonstationarity in time series. These functions implement some of the most researched transformation methods known to aid the prediction of nonstationary time series. Using ARMA as a baseline prediction model, TSPred enables a comparative analysis of the effect of each implemented transformation method on time series prediction. To support this comparison, the package also includes benchmark datasets from time series prediction competitions and real macroeconomic datasets.
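The idea of comparing a transformation against an untransformed baseline can be sketched with base R alone. The example below is only an approximation of the workflow: it uses `arima` from the stats package instead of the TSPred functions, the `AirPassengers` series shipped with R instead of the benchmark datasets, and an arbitrarily chosen model order and 12-point holdout, all of which are assumptions made for illustration.

```r
# Illustrative series and holdout size; not one of the TSPred datasets.
x     <- as.numeric(AirPassengers)
train <- x[1:132]
test  <- x[133:144]

mse <- function(actual, predicted) mean((actual - predicted)^2)

# Baseline: model fitted to the untransformed series.
fit_raw  <- arima(train, order = c(1, 1, 1))
pred_raw <- as.numeric(predict(fit_raw, n.ahead = length(test))$pred)

# Same model fitted after a logarithmic transform; the predictions are
# mapped back to the original scale with exp() before computing the error.
fit_log  <- arima(log(train), order = c(1, 1, 1))
pred_log <- exp(as.numeric(predict(fit_log, n.ahead = length(test))$pred))

# Rank the two alternatives by prediction error on the holdout.
errors <- c(none = mse(test, pred_raw), log = mse(test, pred_log))
print(sort(errors))
```

Reversing the transform before measuring the error keeps both alternatives on the same scale, which is what makes the ranking meaningful.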

In the following, we present the adopted datasets and show how to reproduce a thorough experimental review of nonstationary time series transformation methods using the functions available in TSPred, as described in Manuscript Number KNOSYS-D-18-01175. Results show that suitable transformation methods improved prediction accuracy by more than 30% for half of the evaluated time series, and by more than 95% for 10% of them.
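One plausible way to read such percentages is as the relative reduction in prediction error with respect to the untransformed baseline. The sketch below assumes that definition, which may differ from the one used in the manuscript, and uses hypothetical error values rather than results from the experiment.

```r
# Assumption: improvement is the relative reduction in error with respect to
# the baseline (no transformation); the manuscript may define it differently.
improvement_pct <- function(error_baseline, error_transformed) {
  100 * (error_baseline - error_transformed) / error_baseline
}

# Hypothetical MSE values for three series (not results from the experiment).
mse_none   <- c(s1 = 10.0, s2 = 4.0, s3 = 2.0)
mse_transf <- c(s1 = 6.0,  s2 = 3.9, s3 = 0.05)

gain <- improvement_pct(mse_none, mse_transf)
print(round(gain, 1))

# Share of series whose improvement exceeds a given threshold.
print(mean(gain > 30))
```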

Datasets

The datasets adopted for comparing the implemented nonstationary time series transformation methods originate from the time series prediction competitions CATS, NN3 and NN5, and from real macroeconomic observations collected by the Institute of Applied Economic Research of Brazil (Ipea).

These datasets comprise a reasonable number of time series with different types of nonstationarity and statistical properties. This variety supports a well-founded discussion of the effects of the implemented transformation methods on time series prediction.
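A quick way to probe one kind of nonstationarity, the presence of a unit root, is the Phillips-Perron test available in base R as `PP.test`. The sketch below applies it to two simulated series; it is an illustration under assumed inputs and does not reproduce the statistical tests used in the experiment.

```r
# Simulated series for illustration; not taken from the TSPred datasets.
set.seed(7)
noise <- rnorm(300)            # stationary white noise
walk  <- cumsum(rnorm(300))    # random walk, which contains a unit root

# Null hypothesis of PP.test: the series has a unit root.
# A small p-value is evidence against a unit root.
p_noise <- suppressWarnings(PP.test(noise)$p.value)
p_walk  <- suppressWarnings(PP.test(walk)$p.value)
print(c(noise = p_noise, walk = p_walk))
```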

A detailed description of the datasets is presented in the following pages:

The datasets can be loaded with the following code:

#Load TSPred package
> library("TSPred")

#Load the datasets CATS, NN3, NN5, Ipea_D, and Ipea_M 
> data(CATS,NN3.A,NN5.A,ipeadata_d,ipeadata_m)
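Once loaded, the datasets can be inspected before running any experiment. The sketch below assumes that each dataset is a data frame with one time series per column; `describe_dataset` is a hypothetical helper written for this example and is not a TSPred function.

```r
# Summarise a dataset stored as a data frame (or list) of series.
describe_dataset <- function(dataset) {
  lengths <- vapply(dataset, function(s) sum(!is.na(s)), integer(1))
  c(series = length(dataset), min_obs = min(lengths), max_obs = max(lengths))
}

# Only runs when TSPred is installed; otherwise nothing is printed.
if (requireNamespace("TSPred", quietly = TRUE)) {
  data(CATS, NN3.A, package = "TSPred")
  print(describe_dataset(CATS))
  print(describe_dataset(NN3.A))
}
```

Counting non-missing values per column is useful because series of different lengths stored side by side are usually padded with `NA`.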

Results and experiment reproducibility

All datasets, implemented functions and results/plots of the described comparative review of nonstationary time series transformation methods (Manuscript Number KNOSYS-D-18-01175) are available in the following RData files.

For the CATS dataset: Exp_Results_CATS.RData
For the NN3 dataset: Exp_Results_NN3.RData
For the NN5 dataset: Exp_Results_NN5.RData
For the Ipea_D dataset: Exp_Results_IpeaData_D.RData
For the Ipea_M dataset: Exp_Results_IpeaData_M.RData

After loading the R workspaces stored in these RData files, the experiment and its results can be reproduced with the following calls:

#Required packages for the experiment
> install.packages(c("TSPred","KFAS","car","forecast","wavelets","EMD","vars"))
#Required packages for statistical tests ("stats" ships with base R and does not need to be installed)
> install.packages(c("urca","tseries","lmtest","car","nortest","plyr"))
#Required packages for results analysis
> install.packages(c("TSPred","KFAS","MuMIn","openair","ggplot2","corrplot","devtools","Cairo","plyr"))
> devtools::install_github("vsimko/corrplot")

> library("TSPred")
> library("Cairo")
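The calls in the sections below rely on objects stored in the RData workspaces. As a minimal sketch, a workspace can first be loaded into a separate environment to see what it contains. The helper `load_workspace` is hypothetical, and the example assumes the file was downloaded to the current working directory.

```r
# Load an .RData workspace into its own environment and list its contents.
load_workspace <- function(path) {
  if (!file.exists(path)) {
    stop("Workspace file not found: ", path)
  }
  env <- new.env()
  loaded <- load(path, envir = env)
  message("Loaded ", length(loaded), " objects from ", basename(path))
  env
}

# Assumes Exp_Results_CATS.RData was downloaded to the working directory.
if (file.exists("Exp_Results_CATS.RData")) {
  ws <- load_workspace("Exp_Results_CATS.RData")
  print(ls(ws))
}
```

To run the calls that follow exactly as written, the workspace should instead be loaded into the global environment with `load("Exp_Results_CATS.RData")`.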

CATS: Reproducing the experiment with the CATS dataset:

#Experiment for the CATS dataset
#Statistical properties across all series
> statprop_CATS <- TSstats(CATS)
> View(statprop_CATS)

#Analysis of fitness and prediction of many series and Taylor diagram generation
> CATS.time <- system.time(
    results_CATS <- TransformsExp(CATS,CATS.cont,rank.by="MSE")
  )
  results_CATS <- remove_models(results_CATS)

#Overall statistics across all series for each metric: transform x transform (all series must be positive (or negative))
> stats_CATS <- TransformsExpStats(results_CATS,"none")
> View(stats_CATS$MSE)

#Plot Taylor diagrams for the transforms' predictions of the series
> TaylorDiag_CATS <- plotTaylorDiagrams(results_CATS,CATS.cont)

#plot barplot with the number of times each transform was in the top 5 results of the series
> plotwins_CATS <- plotTransformWins(results_CATS,top=5)
> plotwins_CATS$plot
> file_name <- "transformWins_CATS.pdf"
  CairoPDF(file_name,width=4,height=3)
  plotwins_CATS$plot
  dev.off()

#plot barplot with the number of times each transform had errors "statistically" smaller than other transforms
> plotwinsStats_CATS <- plotTransformWinsStats(stats_CATS,metric="MSE")
> plotwinsStats_CATS$plot
> file_name <- "transformWinsStats_CATS.pdf"
  CairoPDF(file_name,width=4,height=3)
  plotwinsStats_CATS$plot
  dev.off()

#plot scatter plot with the number of times each transform was in the top results of the series and
#also the number of times each transform had errors "statistically" smaller than other transforms
> plotAllwins_CATS <- plotAllWins(results=results_CATS,top=5,stats=stats_CATS,metric="MSE")
> plotAllwins_CATS$plot
> file_name <- "allTransformWins_CATS.pdf"
  CairoPDF(file_name,width=5.5,height=4.5)
  plotAllwins_CATS$plot
  dev.off()

#Plot "correlogram" with the p-values resulting from TransformsExpStats
> plotTransformStats(stats_CATS)
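The "top results" count used in the bar plots above can be illustrated with a small base-R sketch. It is not the implementation of `plotTransformWins`; the error matrix, the transform names and the helper `count_top_wins` are hypothetical and serve only to show the counting logic.

```r
# Hypothetical error matrix: rows are series, columns are transforms.
errors <- rbind(
  s1 = c(none = 5.0, log = 3.0, diff = 4.0, mas = 6.0),
  s2 = c(none = 2.0, log = 2.5, diff = 1.0, mas = 3.0),
  s3 = c(none = 9.0, log = 4.0, diff = 7.0, mas = 5.0)
)

count_top_wins <- function(errors, top = 2) {
  # For each series, flag the transforms with the `top` smallest errors.
  in_top <- t(apply(errors, 1, function(e) rank(e, ties.method = "min") <= top))
  # Count, per transform, the number of series in which it was flagged.
  colSums(in_top)
}

wins <- count_top_wins(errors, top = 2)
print(sort(wins, decreasing = TRUE))
```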

NN3: Reproducing the experiment with the NN3 dataset:

#Experiment for the NN3 dataset
#Statistical properties across all series
> statprop_NN3 <- TSstats(NN3.A)
> View(statprop_NN3)

#Analysis of fitness and prediction of many series and Taylor diagram generation
> NN3.time <- system.time(
    results_NN3 <- TransformsExp(NN3.A,NN3.A.cont,rank.by="MSE")
  )
  results_NN3 <- remove_models(results_NN3)

#Overall statistics across all series for each metric: transform x transform (all series must be positive (or negative))
> stats_NN3 <- TransformsExpStats(results_NN3,"none")
> View(stats_NN3$MSE)

#Plot Taylor diagrams for the transforms' predictions of the series
> TaylorDiag_NN3 <- plotTaylorDiagrams(results_NN3,NN3.A.cont)

#plot barplot with the number of times each transform was in the top 5 results of the series
> plotwins_NN3 <- plotTransformWins(results_NN3,top=5)
> plotwins_NN3$plot
> file_name <- "transformWins_NN3.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwins_NN3$plot
  dev.off()

#plot barplot with the number of times each transform had errors "statistically" smaller than other transforms
> plotwinsStats_NN3 <- plotTransformWinsStats(stats_NN3,metric="MSE")
> plotwinsStats_NN3$plot
> file_name <- "transformWinsStats_NN3.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwinsStats_NN3$plot
  dev.off()

#plot scatter plot with the number of times each transform was in the top results of the series and
#also the number of times each transform had errors "statistically" smaller than other transforms
> plotAllwins_NN3 <- plotAllWins(results=results_NN3,top=5,stats=stats_NN3,metric="MSE")
> plotAllwins_NN3$plot
> file_name <- "allTransformWins_NN3.pdf"
  CairoPDF(file_name,width=5.5,height=4.5)
  plotAllwins_NN3$plot
  dev.off()

#Plot and save "correlograms" with the p-values resulting from TransformsExpStats
> plotTransformStats(stats_NN3)
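Deciding whether one transform yields "statistically" smaller errors than another amounts to a paired comparison across series. The sketch below assumes a one-sided, paired Wilcoxon signed-rank test at a 5% level; the test actually used in the experiment may differ, and the error values are simulated for illustration.

```r
# Hypothetical per-series errors for two transforms (not experiment results).
set.seed(1)
err_a <- abs(rnorm(30, mean = 1.0, sd = 0.2))
err_b <- err_a + abs(rnorm(30, mean = 0.5, sd = 0.1))  # systematically larger

# Assumption: a paired, one-sided Wilcoxon signed-rank test; the manuscript
# may rely on a different test.
res <- wilcox.test(err_a, err_b, paired = TRUE, alternative = "less")
print(res$p.value)

# Transform A "wins" over B when its errors are significantly smaller.
a_is_better <- res$p.value < 0.05
print(a_is_better)
```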

NN5: Reproducing the experiment with the NN5 dataset:

#Experiment for the NN5 dataset
#Statistical properties across all series
> statprop_NN5 <- TSstats(NN5.A)
> View(statprop_NN5)

#Analysis of fitness and prediction of many series and Taylor diagram generation
> NN5.time <- system.time(
    results_NN5 <- TransformsExp(NN5.A,NN5.A.cont,rank.by="MSE")
  )
  results_NN5 <- remove_models(results_NN5)

#Overall statistics across all series for each metric: transform x transform (all series must be positive (or negative))
> stats_NN5 <- TransformsExpStats(results_NN5,"none")
> View(stats_NN5$MSE)

#Plot Taylor diagrams for the transforms' predictions of the series
> TaylorDiag_NN5 <- plotTaylorDiagrams(results_NN5,NN5.A.cont)

#plot barplot with the number of times each transform was in the top 5 results of the series
> plotwins_NN5 <- plotTransformWins(results_NN5,top=5)
> plotwins_NN5$plot
> file_name <- "transformWins_NN5.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwins_NN5$plot
  dev.off()

#plot barplot with the number of times each transform had errors "statistically" smaller than other transforms
> plotwinsStats_NN5 <- plotTransformWinsStats(stats_NN5,metric="MSE")
> plotwinsStats_NN5$plot
> file_name <- "transformWinsStats_NN5.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwinsStats_NN5$plot
  dev.off()

#plot scatter plot with the number of times each transform was in the top results of the series and
#also the number of times each transform had errors "statistically" smaller than other transforms
> plotAllwins_NN5 <- plotAllWins(results=results_NN5,top=5,stats=stats_NN5,metric="MSE")
> plotAllwins_NN5$plot
> file_name <- "allTransformWins_NN5.pdf"
  CairoPDF(file_name,width=5.5,height=4.5)
  plotAllwins_NN5$plot
  dev.off()

#Plot "correlogram" with the p-values resulting from TransformsExpStats
> plotTransformStats(stats_NN5)
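A Taylor diagram summarises how well predictions match observations through three related statistics: the standard deviations of both series, their correlation, and the centered root-mean-square difference. The base-R sketch below computes them for hypothetical values and checks the identity that links them; `taylor_stats` is a helper written for this example and is not the function behind `plotTaylorDiagrams`.

```r
taylor_stats <- function(observed, predicted) {
  # Population standard deviations (divide by n) so that the identity holds.
  pop_sd <- function(x) sqrt(mean((x - mean(x))^2))
  r <- cor(observed, predicted)
  crmsd <- sqrt(mean(((predicted - mean(predicted)) -
                      (observed - mean(observed)))^2))
  c(sd_obs = pop_sd(observed), sd_pred = pop_sd(predicted),
    correlation = r, centered_rmsd = crmsd)
}

# Hypothetical observed and predicted values.
observed  <- c(10, 12, 11, 15, 14, 18)
predicted <- c(11, 11, 12, 14, 15, 16)
s <- taylor_stats(observed, predicted)
print(round(s, 3))

# Law-of-cosines identity that underlies the diagram's geometry.
lhs <- s["centered_rmsd"]^2
rhs <- s["sd_obs"]^2 + s["sd_pred"]^2 -
  2 * s["sd_obs"] * s["sd_pred"] * s["correlation"]
print(all.equal(unname(lhs), unname(rhs)))
```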

Ipea_D: Reproducing the experiment with the Ipea_D dataset:

#Experiment for the Ipeadata dataset (ipeadata_d: daily, ipeadata_m: monthly)
#Experiment for the ipeadata_d dataset
#Statistical properties across all series
> statprop_ipeadata_d <- TSstats(ipeadata_d)
> View(statprop_ipeadata_d)

#Analysis of fitness and prediction of many series and Taylor diagram generation
> ipeadata_d.time <- system.time(
    results_ipeadata_d <- TransformsExp(ipeadata_d,ipeadata_d.cont,rank.by="MSE")
  )
  results_ipeadata_d <- remove_models(results_ipeadata_d)

#Overall statistics across all series for each metric: transform x transform (all series must be positive (or negative))
> stats_ipeadata_d <- TransformsExpStats(results_ipeadata_d,"none")
> View(stats_ipeadata_d$MSE)

#Plot Taylor diagrams for the transforms' predictions of the series
> TaylorDiag_ipeadata_d <- plotTaylorDiagrams(results_ipeadata_d,ipeadata_d.cont)

#plot barplot with the number of times each transform was in the top 5 results of the series
> plotwins_ipeadata_d <- plotTransformWins(results_ipeadata_d,top=5)
> plotwins_ipeadata_d$plot
> file_name <- "transformWins_ipeadata_d.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwins_ipeadata_d$plot
  dev.off()

#plot barplot with the number of times each transform had errors "statistically" smaller than other transforms
> plotwinsStats_ipeadata_d <- plotTransformWinsStats(stats_ipeadata_d,metric="MSE")
> plotwinsStats_ipeadata_d$plot
> file_name <- "transformWinsStats_ipeadata_d.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwinsStats_ipeadata_d$plot
  dev.off()

#plot scatter plot with the number of times each transform was in the top results of the series and
#also the number of times each transform had errors "statistically" smaller than other transforms
> plotAllwins_ipeadata_d <- plotAllWins(results=results_ipeadata_d,top=5,stats=stats_ipeadata_d,metric="MSE")
> plotAllwins_ipeadata_d$plot
> file_name <- "allTransformWins_ipeadata_d.pdf"
  CairoPDF(file_name,width=5.5,height=4.5)
  plotAllwins_ipeadata_d$plot
  dev.off()

#Plot "correlogram" with the p-values resulting from TransformsExpStats
> plotTransformStats(stats_ipeadata_d)

Ipea_M: Reproducing the experiment with the Ipea_M dataset:

#Experiment for the Ipeadata dataset (ipeadata_d: daily, ipeadata_m: monthly)
#Experiment for the ipeadata_m dataset
#Statistical properties across all series
> statprop_ipeadata_m <- TSstats(ipeadata_m)
> View(statprop_ipeadata_m)

#Analysis of fitness and prediction of many series and Taylor diagram generation
> ipeadata_m.time <- system.time(
    results_ipeadata_m <- TransformsExp(ipeadata_m,ipeadata_m.cont,rank.by="MSE")
  )
  results_ipeadata_m <- remove_models(results_ipeadata_m)

#Overall statistics across all series for each metric: transform x transform (all series must be positive (or negative))
> stats_ipeadata_m <- TransformsExpStats(results_ipeadata_m,"none")
> View(stats_ipeadata_m$MSE)

#Plot Taylor diagrams for the transforms' predictions of the series
> TaylorDiag_ipeadata_m <- plotTaylorDiagrams(results_ipeadata_m,ipeadata_m.cont)

#plot barplot with the number of times each transform was in the top 5 results of the series
> plotwins_ipeadata_m <- plotTransformWins(results_ipeadata_m,top=5)
> plotwins_ipeadata_m$plot
> file_name <- "transformWins_ipeadata_m.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwins_ipeadata_m$plot
  dev.off()

#plot barplot with the number of times each transform had errors "statistically" smaller than other transforms
> plotwinsStats_ipeadata_m <- plotTransformWinsStats(stats_ipeadata_m,metric="MSE")
> plotwinsStats_ipeadata_m$plot
> file_name <- "transformWinsStats_ipeadata_m.pdf"
  CairoPDF(file_name,width=7,height=6)
  plotwinsStats_ipeadata_m$plot
  dev.off()

#plot scatter plot with the number of times each transform was in the top results of the series and
#also the number of times each transform had errors "statistically" smaller than other transforms
> plotAllwins_ipeadata_m <- plotAllWins(results=results_ipeadata_m,top=5,stats=stats_ipeadata_m,metric="MSE")
> plotAllwins_ipeadata_m$plot
> file_name <- "allTransformWins_ipeadata_m.pdf"
  CairoPDF(file_name,width=5.5,height=4.5)
  plotAllwins_ipeadata_m$plot
  dev.off()

#Plot "correlogram" with the p-values resulting from TransformsExpStats
> plotTransformStats(stats_ipeadata_m)
