SantaFe - RebeccaSalles/TSPred GitHub Wiki
The Santa Fe Time Series Competition Experiment
The Santa Fe Time Series Competition (Weigend, 1993) was built around six time series collected from six very different sources. Its main benchmark is the univariate time series A, recorded from a far-infrared laser in a chaotic state.
Time series A is a clean, low-dimensional, nonlinear and stationary series with 1,000 observations, shown in Fig. 1.
Competitors were asked to correctly predict the next 100 observations.
Fig. 1 Santa Fe Competition - time series A
We have also used the competition's time series D, which is likewise univariate. It is a computer-generated, four-dimensional nonlinear time series with non-stationary properties. It contains 100,000 observations and is shown in Fig. 2.
Competitors were asked to correctly predict the next 500 observations of this time series.
Fig. 2 Santa Fe Competition - time series D
For both time series A and D, the Santa Fe Competition evaluated performance using the normalized mean squared error (NMSE) of the competitors' predictions.
Time series A is available in the TSPred R package as SantaFe.A, and time series D as SantaFe.D. The values to be predicted for these two time series are also available, as SantaFe.A.cont and SantaFe.D.cont, respectively.
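To make the error measure concrete, here is a minimal sketch of an NMSE computation. It assumes one common definition, in which the sum of squared prediction errors is divided by the sum of squared errors of a baseline that always predicts the mean of the training series. The function name `nmse_sketch` and the toy vectors are hypothetical and are not part of TSPred or DMwR; the exact normalization used by the competition or by `ts.eval` may differ.

```r
# Hypothetical helper (not from TSPred or DMwR): NMSE under the ASSUMED
# definition "squared error relative to a training-mean baseline".
nmse_sketch <- function(actual, predicted, train) {
  stopifnot(length(actual) == length(predicted), length(actual) > 0)
  squared_error  <- sum((actual - predicted)^2)
  baseline_error <- sum((actual - mean(train))^2)
  squared_error / baseline_error
}

# Synthetic toy values, not Santa Fe data
train  <- c(2, 2)
actual <- c(1, 3)

# A perfect forecast has zero error
perfect <- nmse_sketch(actual, actual, train)

# Predicting the training mean everywhere matches the baseline exactly
baseline <- nmse_sketch(actual, rep(mean(train), 2), train)
```

Under this definition, a value below 1 means the forecast beats the training-mean baseline, and a value above 1 means it does worse.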
Experiment R-Scripts
#Install DMwR package, used for calculating NMSE errors
> install.packages("DMwR")
#Load DMwR package
> library("DMwR")
#Load TSPred package, which provides the Santa Fe datasets and marimapred
> library("TSPred")
For time series A:
#Load the datasets SantaFe.A and SantaFe.A.cont
> data(SantaFe.A,SantaFe.A.cont)
#Automatically fits an ARIMA model to SantaFe.A and predicts the values in SantaFe.A.cont
#Also plots the predictions against SantaFe.A.cont
> pred <- marimapred(SantaFe.A,SantaFe.A.cont,plot=TRUE)
#Calculates the NMSE error of prediction between pred and SantaFe.A.cont
> ts.eval(ts(SantaFe.A.cont), ts(pred), stats = 'nmse', train.y = ts(SantaFe.A))
For time series D:
#Load the datasets SantaFe.D and SantaFe.D.cont
> data(SantaFe.D,SantaFe.D.cont)
#Automatically fits an ARIMA model to SantaFe.D and predicts the values in SantaFe.D.cont
#Also plots the predictions against SantaFe.D.cont
> pred <- marimapred(SantaFe.D,SantaFe.D.cont,plot=TRUE)
#Calculates the NMSE error of prediction between pred and SantaFe.D.cont
> NMSE <- ts.eval(ts(SantaFe.D.cont), ts(pred), stats = 'nmse', train.y = ts(SantaFe.D))
#Calculates the NMSE error of prediction between the first 15, 30 and 50 values of pred and SantaFe.D.cont
> NMSE15 <- ts.eval(ts(head(SantaFe.D.cont, 15)), ts(head(pred, 15)), stats = 'nmse', train.y = ts(SantaFe.D))
> NMSE30 <- ts.eval(ts(head(SantaFe.D.cont, 30)), ts(head(pred, 30)), stats = 'nmse', train.y = ts(SantaFe.D))
> NMSE50 <- ts.eval(ts(head(SantaFe.D.cont, 50)), ts(head(pred, 50)), stats = 'nmse', train.y = ts(SantaFe.D))
#Binds the NMSE errors into a single row, one column per horizon
> cbind(NMSE15=NMSE15,NMSE30=NMSE30,NMSE50=NMSE50,NMSE500=NMSE)
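The repeated `head(...)` calls above follow one pattern: evaluate only the first h predictions for several horizons h. A minimal sketch of that pattern as a loop is shown below. Both helper names (`nmse_sketch`, `nmse_at_horizons`) are hypothetical, the NMSE definition is an assumption (squared error relative to a training-mean baseline) and may not match `ts.eval` exactly, and the data are synthetic stand-ins rather than SantaFe.D.

```r
# Hypothetical helper: NMSE under an ASSUMED training-mean-baseline definition
nmse_sketch <- function(actual, predicted, train) {
  sum((actual - predicted)^2) / sum((actual - mean(train))^2)
}

# Hypothetical helper: NMSE over the first h values, for each horizon h
nmse_at_horizons <- function(actual, predicted, train, horizons) {
  stopifnot(length(actual) == length(predicted),
            all(horizons >= 1), all(horizons <= length(actual)))
  errors <- sapply(horizons, function(h) {
    nmse_sketch(head(actual, h), head(predicted, h), train)
  })
  names(errors) <- paste0("NMSE", horizons)
  errors
}

# Synthetic stand-ins, not Santa Fe data
train     <- c(1, 2, 3, 4, 5)
actual    <- c(4, 2, 5, 1)
predicted <- c(4, 2, 3, 3)

result <- nmse_at_horizons(actual, predicted, train, horizons = c(2, 4))
```

Comparing horizons this way shows how quickly forecast quality degrades as the prediction window grows, which is the purpose of reporting the 15, 30, 50 and 500 step errors side by side.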
Example of the resulting plot:
Fig. 3 ARIMA predictions (solid line) for the time series of dataset A of the Santa Fe Competition. The actual time series values are represented by the dashed line.
General R-Functions
For time series A:
> ARIMA.SantaFe.A <- function(TimeSeries, TimeSeriesCont, plot=FALSE){
    #Both series are required and must not be empty (length(NULL) is 0)
    if(length(TimeSeries) == 0) stop("TimeSeries is required and must have positive length")
    if(length(TimeSeriesCont) == 0) stop("TimeSeriesCont is required and must have positive length")
    #Fits an ARIMA model to TimeSeries and predicts the values in TimeSeriesCont
    Predictions <- marimapred(TimeSeries, TimeSeriesCont, plot=plot)
    #Calculates the NMSE error of prediction over the whole horizon
    NMSE <- ts.eval(ts(TimeSeriesCont), ts(Predictions), stats = 'nmse', train.y = ts(TimeSeries))
    return(cbind(NMSE=NMSE))
  }
Example:
> ARIMA.SantaFe.A(SantaFe.A,SantaFe.A.cont,plot=TRUE)
For time series D:
> ARIMA.SantaFe.D <- function(TimeSeries, TimeSeriesCont, plot=FALSE){
    #Both series are required and must not be empty (length(NULL) is 0)
    if(length(TimeSeries) == 0) stop("TimeSeries is required and must have positive length")
    if(length(TimeSeriesCont) == 0) stop("TimeSeriesCont is required and must have positive length")
    #Fits an ARIMA model to TimeSeries and predicts the values in TimeSeriesCont
    Predictions <- marimapred(TimeSeries, TimeSeriesCont, plot=plot)
    #Calculates the NMSE error over the whole horizon and over the first 15, 30 and 50 values
    NMSE <- ts.eval(ts(TimeSeriesCont), ts(Predictions), stats = 'nmse', train.y = ts(TimeSeries))
    NMSE15 <- ts.eval(ts(head(TimeSeriesCont, 15)), ts(head(Predictions, 15)), stats = 'nmse', train.y = ts(TimeSeries))
    NMSE30 <- ts.eval(ts(head(TimeSeriesCont, 30)), ts(head(Predictions, 30)), stats = 'nmse', train.y = ts(TimeSeries))
    NMSE50 <- ts.eval(ts(head(TimeSeriesCont, 50)), ts(head(Predictions, 50)), stats = 'nmse', train.y = ts(TimeSeries))
    return(cbind(NMSE15=NMSE15,NMSE30=NMSE30,NMSE50=NMSE50,NMSE500=NMSE))
  }
Example:
> ARIMA.SantaFe.D(SantaFe.D,SantaFe.D.cont,plot=TRUE)
References
A.S. Weigend, 1993, Time Series Prediction: Forecasting The Future And Understanding The Past. Reading, MA, Westview Press.