SantaFe - RebeccaSalles/TSPred GitHub Wiki

The Santa Fe Time Series Competition Experiment

The Santa Fe Time Series Competition (Weigend, 1993) used six time series drawn from six very different sources. Its main benchmark is the univariate time series A, which consists of data recorded from a far-infrared laser in a chaotic state.

Time series A is a clean, low-dimensional, nonlinear and stationary series of 1,000 observations, depicted in Fig. 1.

Competitors were asked to predict the next 100 observations.

img/SantaFe.DataSetA.png

Fig. 1 Santa Fe Competition - time series A

We also used the competition's time series D, which is likewise univariate. Time series D is a computer-generated, four-dimensional, nonlinear time series with nonstationary properties. It contains 100,000 observations and is depicted in Fig. 2.

Competitors were asked to predict the next 500 observations of this series.

img/SantaFe.DataSetD.png

Fig. 2 Santa Fe Competition - time series D

For both time series A and D, the Santa Fe Competition evaluated the competitors' predictions by their normalized mean squared error (NMSE).
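As a worked formula, and assuming that the 'nmse' statistic computed by ts.eval with train.y follows the same definition as DMwR's regr.eval, the NMSE over N predictions divides the squared prediction errors by the squared deviations of the true values from the mean of the training series. The competition's own normalization may differ in detail (Weigend, 1993).

```latex
\mathrm{NMSE} = \frac{\sum_{k=1}^{N} \left( x_k - \hat{x}_k \right)^2}{\sum_{k=1}^{N} \left( x_k - \bar{x}_{\mathrm{train}} \right)^2}
```

Here x_k are the true values, \hat{x}_k the predictions and \bar{x}_{train} the mean of the training series. An NMSE of 0 means perfect predictions, and an NMSE below 1 means the predictions beat the naive forecast that always predicts the training mean.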

The TSPred R package provides time series A as SantaFe.A and time series D as SantaFe.D. The values to be predicted for these two series are provided as SantaFe.A.cont and SantaFe.D.cont, respectively.

Experiment R-Scripts

#Install the DMwR package, used to calculate the NMSE
> install.packages("DMwR")

#Load the TSPred package (Santa Fe datasets and marimapred) and the DMwR package (ts.eval)
> library("TSPred")
> library("DMwR")

For time series A:

#Load the datasets SantaFe.A and SantaFe.A.cont
> data(SantaFe.A,SantaFe.A.cont)

#Automatically fits an ARIMA model to SantaFe.A and predicts the values in SantaFe.A.cont
#Also plots the predictions against SantaFe.A.cont
> pred <- marimapred(SantaFe.A,SantaFe.A.cont,plot=TRUE)

#Calculates the NMSE of the predictions in pred against SantaFe.A.cont
> ts.eval(ts(SantaFe.A.cont), ts(pred), stats = 'nmse', train.y = ts(SantaFe.A))
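The fit, forecast and evaluate cycle of the script above can be sketched with base R alone. This is an illustration, not marimapred's implementation: stats::arima with a fixed AR(1) order stands in for marimapred's automatic model fitting, a simulated series (hypothetical data) stands in for SantaFe.A and SantaFe.A.cont, and the NMSE line assumes normalization by the training mean.

```r
# Illustrative sketch using base R (package stats) only; not TSPred's marimapred.
set.seed(1)

# Hypothetical data: simulate an AR(1) series and split it into a training part
# and a held-out continuation, mimicking SantaFe.A / SantaFe.A.cont
full  <- as.numeric(arima.sim(model = list(ar = 0.7), n = 120))
train <- full[1:100]
cont  <- full[101:120]

# Fit an ARIMA(1,0,0) model with a fixed order
# (marimapred instead fits the ARIMA model automatically)
fit <- arima(train, order = c(1, 0, 0))

# Forecast as many steps ahead as there are held-out values
pred <- as.numeric(predict(fit, n.ahead = length(cont))$pred)

# NMSE: squared errors normalized by squared deviations from the training mean
nmse <- sum((cont - pred)^2) / sum((cont - mean(train))^2)
print(nmse)
```

Because the order is fixed here, the sketch skips the model-selection step; swapping in an automatic order search is what separates it from the one-line marimapred call.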

For time series D:

#Load the datasets SantaFe.D and SantaFe.D.cont
> data(SantaFe.D,SantaFe.D.cont)

#Automatically fits an ARIMA model to SantaFe.D and predicts the values in SantaFe.D.cont
#Also plots the predictions against SantaFe.D.cont
> pred <- marimapred(SantaFe.D,SantaFe.D.cont,plot=TRUE)

#Calculates the NMSE of all 500 predictions in pred against SantaFe.D.cont
> NMSE <- ts.eval(ts(SantaFe.D.cont), ts(pred), stats = 'nmse', train.y = ts(SantaFe.D))

#Calculates the NMSE of the first 15, 30 and 50 values of pred against SantaFe.D.cont
> NMSE15 <- ts.eval(ts(head(SantaFe.D.cont, 15)), ts(head(pred, 15)), stats = 'nmse', train.y = ts(SantaFe.D))
> NMSE30 <- ts.eval(ts(head(SantaFe.D.cont, 30)), ts(head(pred, 30)), stats = 'nmse', train.y = ts(SantaFe.D))
> NMSE50 <- ts.eval(ts(head(SantaFe.D.cont, 50)), ts(head(pred, 50)), stats = 'nmse', train.y = ts(SantaFe.D))

#Binds the NMSE values for 15, 30, 50 and 500 predictions into a one-row matrix
> cbind(NMSE15=NMSE15,NMSE30=NMSE30,NMSE50=NMSE50,NMSE500=NMSE)
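The four ts.eval calls for time series D differ only in the horizon, so a small helper can loop over the horizons instead. The sketch below is hypothetical (nmse_by_horizon is not part of TSPred or DMwR) and, to stay self-contained, computes the NMSE directly under the assumed training-mean normalization rather than calling ts.eval.

```r
# Hypothetical helpers, not part of TSPred or DMwR.

# Flatten a vector, ts object or one-column data frame to a plain numeric vector
as_vec <- function(x) as.numeric(unlist(x))

# NMSE: squared prediction errors normalized by the squared deviations
# of the true values from the training-series mean (assumed definition)
nmse <- function(trues, preds, train.y) {
  sum((trues - preds)^2) / sum((trues - mean(train.y))^2)
}

# NMSE over the first h values, for each horizon h, returned as a named vector
nmse_by_horizon <- function(trues, preds, train.y, horizons) {
  trues   <- as_vec(trues)
  preds   <- as_vec(preds)
  train.y <- as_vec(train.y)
  vals <- sapply(horizons, function(h) nmse(head(trues, h), head(preds, h), train.y))
  names(vals) <- paste0("NMSE", horizons)
  vals
}

# Toy example with made-up numbers: the training mean is 1
train.y <- c(0, 2)
trues   <- c(2, 0, 2, 0)
preds   <- c(1.5, 0.5, 2, 0)
res <- nmse_by_horizon(trues, preds, train.y, horizons = c(2, 4))
print(res)  # NMSE2 = 0.25, NMSE4 = 0.125
```

With the objects from the script, a call such as nmse_by_horizon(SantaFe.D.cont, pred, SantaFe.D, c(15, 30, 50, 500)) would return all four values at once; they match the ts.eval results only if ts.eval uses the same normalization.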

Example of the resulting plot:

img/DataSetA.png

Fig. 3 ARIMA predictions (solid line) for time series A of the Santa Fe Competition. The actual values are represented by the dashed line.

General R-Functions

For time series A:

> ARIMA.SantaFe.A <- function(TimeSeries, TimeSeriesCont, plot=FALSE){
    if(is.null(TimeSeries) || NROW(TimeSeries) == 0) stop("TimeSeries is required and must have positive length")
    if(is.null(TimeSeriesCont) || NROW(TimeSeriesCont) == 0) stop("TimeSeriesCont is required and must have positive length")
    
    Predictions <- marimapred(TimeSeries, TimeSeriesCont, plot=plot)
    
    NMSE <- ts.eval(ts(TimeSeriesCont), ts(Predictions), stats = 'nmse', train.y = ts(TimeSeries))

    return (cbind(NMSE=NMSE))
}

Example:

> ARIMA.SantaFe.A(SantaFe.A,SantaFe.A.cont,plot=TRUE)

For time series D:

> ARIMA.SantaFe.D <- function(TimeSeries, TimeSeriesCont, plot=FALSE){
    if(is.null(TimeSeries) || NROW(TimeSeries) == 0) stop("TimeSeries is required and must have positive length")
    if(is.null(TimeSeriesCont) || NROW(TimeSeriesCont) == 0) stop("TimeSeriesCont is required and must have positive length")
    
    Predictions <- marimapred(TimeSeries, TimeSeriesCont, plot=plot)
    
    NMSE <- ts.eval(ts(TimeSeriesCont), ts(Predictions), stats = 'nmse', train.y = ts(TimeSeries))
    NMSE15 <- ts.eval(ts(head(TimeSeriesCont, 15)), ts(head(Predictions, 15)), stats = 'nmse', train.y = ts(TimeSeries))
    NMSE30 <- ts.eval(ts(head(TimeSeriesCont, 30)), ts(head(Predictions, 30)), stats = 'nmse', train.y = ts(TimeSeries))
    NMSE50 <- ts.eval(ts(head(TimeSeriesCont, 50)), ts(head(Predictions, 50)), stats = 'nmse', train.y = ts(TimeSeries))
    
    return (cbind(NMSE15=NMSE15,NMSE30=NMSE30,NMSE50=NMSE50,NMSE500=NMSE))
}

Example:

> ARIMA.SantaFe.D(SantaFe.D,SantaFe.D.cont,plot=TRUE)

References

A. S. Weigend, 1993. Time Series Prediction: Forecasting the Future and Understanding the Past. Reading, MA: Westview Press.
