software_architecture - RebeccaSalles/TSPred GitHub Wiki
TSPred Architecture
The main classes that represent the concept and structure of the TSPred framework are depicted in Figure 1. The main class of the framework is named tspred, standing for "time series prediction". An instance of this class represents a particular time series prediction application.
The tspred class has three main parts: processing, modeling, and evaluating. The processing class corresponds to general preprocessing and transformation methods (represented by the prep class) and their respective reverse transformations (represented by the postp class). The modeling class corresponds to methods for training and predicting with a particular time series model (represented by the train and pred classes, respectively). Finally, the evaluating class corresponds to methods for evaluating the quality of predictions and model fitness. The prep, postp, train, pred, and evaluating classes all contain two main attributes: the function that implements their respective method (func) and a list of the parameters to be passed to that function as input (par).
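The func/par pairing can be illustrated with a few lines of base R. This is only a sketch of the idea: the names new_step and run_step are hypothetical and are not part of the TSPred API.

```r
# Hypothetical constructor (not TSPred code): an object that stores the
# function implementing a method (func) and its parameter list (par).
new_step <- function(func, par = list(), class_name) {
  structure(list(func = func, par = par), class = c(class_name, "step"))
}

# Run a step by passing the data plus the stored parameters to func.
run_step <- function(step, data) {
  do.call(step$func, c(list(data), step$par))
}

# A logarithmic transform as a "prep" step and its reverse as a "postp" step.
log_prep  <- new_step(function(x, base) log(x, base = base),
                      list(base = exp(1)), "prep")
log_postp <- new_step(function(x, base) base^x,
                      list(base = exp(1)), "postp")

x      <- c(1, 10, 100)
y      <- run_step(log_prep, x)    # transformed data
x_back <- run_step(log_postp, y)   # reverse transformation recovers x
```

Keeping the parameters next to the function means the reverse transformation can reuse exactly the values that were applied in the forward direction.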
Generally, the tspred class contains (i) one processing object for subsetting a time series into training and evaluation datasets (subsetting attribute); (ii) zero or more processing objects for preprocessing/transforming and postprocessing/reverse transforming time series data (processing attribute); (iii) one modeling object for modeling and predicting a time series (modeling attribute); and (iv) zero or more evaluating objects for evaluating the modeling and/or prediction of the given time series (evaluating attribute).
Besides these attributes, the tspred class also contains the input and output elements needed to perform the activities of time series prediction. The attribute data contains all the time series data concerning a particular application: the original time series, which is given as input to (i); the resulting training and evaluation datasets, which are given as input to (ii); and the resulting preprocessed/transformed datasets, which are given as input to (iii). The attribute model contains the fitted model object generated by (iii). The attributes n.ahead and onestep are also passed to (iii), setting the prediction horizon and the type of prediction to be performed (multistep-ahead or one-step-ahead), respectively. The attribute pred contains the predicted data: the predictions produced by (iii), which are given as input to (ii), and the resulting postprocessed/reverse transformed predictions, which are given as input to (iv). Finally, the attribute eval contains the evaluation metrics computed by (iv).
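As a rough picture of this layout, the following base R sketch builds a list with the attribute names mentioned above. The constructor new_tspred_sketch and the inner element names (raw, train, test, prep, postp) are assumptions made for illustration; the actual tspred constructor and its internal layout may differ.

```r
# Hypothetical container that only mirrors the attribute names from the text.
new_tspred_sketch <- function(subsetting = NULL, processing = list(),
                              modeling = NULL, evaluating = list(),
                              n.ahead = NULL, onestep = FALSE) {
  structure(
    list(
      subsetting = subsetting,   # (i)   one processing object
      processing = processing,   # (ii)  zero or more processing objects
      modeling   = modeling,     # (iii) one modeling object
      evaluating = evaluating,   # (iv)  zero or more evaluating objects
      n.ahead    = n.ahead,      # prediction horizon
      onestep    = onestep,      # one-step-ahead (TRUE) or multistep-ahead
      data  = list(raw = NULL, train = NULL, test = NULL, prep = NULL),
      model = NULL,              # fitted model produced by (iii)
      pred  = list(raw = NULL, postp = NULL),
      eval  = list()             # metrics computed by (iv)
    ),
    class = "tspred_sketch"
  )
}

app <- new_tspred_sketch(n.ahead = 5)
names(app)
```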
Software Functionalities
The main functionalities of the framework are implemented as methods of the tspred class. The constructor method (tspred) defines (instantiates) a particular time series prediction application. Nearly all class methods return the current tspred object with updated output elements and parameters. In particular, the subset and preprocess methods together perform the first activity of time series prediction, i.e., data preprocessing and sampling, while the train, predict, postprocess, and evaluate methods perform model training, model prediction, data postprocessing, and prediction/model evaluation, respectively. The workflow method encompasses all of these activities. Of particular note is the benchmark method, which ranks the time series prediction applications described by a list of tspred objects based on their collected evaluation metrics.
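The pattern of methods that each return the updated object, followed by a ranking step, can be sketched in base R as follows. All function names here (subset_step, run_sketch, benchmark_sketch, and so on) are hypothetical, and the drift model is a toy stand-in chosen only to keep the example short; none of this is TSPred code.

```r
# Each hypothetical step takes the application state and returns it updated.
subset_step <- function(app, test_len) {
  n <- length(app$data$raw)
  app$data$train <- app$data$raw[seq_len(n - test_len)]
  app$data$test  <- app$data$raw[(n - test_len + 1):n]
  app$n.ahead    <- test_len
  app
}
preprocess_step <- function(app, use_log) {
  app$use_log   <- use_log
  app$data$prep <- if (use_log) log(app$data$train) else app$data$train
  app
}
train_step <- function(app) {
  # Toy model: last value plus the average increment (a drift model).
  app$model <- list(last  = tail(app$data$prep, 1),
                    drift = mean(diff(app$data$prep)))
  app
}
predict_step <- function(app) {
  app$pred$raw <- app$model$last + app$model$drift * seq_len(app$n.ahead)
  app
}
postprocess_step <- function(app) {
  app$pred$postp <- if (app$use_log) exp(app$pred$raw) else app$pred$raw
  app
}
evaluate_step <- function(app) {
  app$eval$MSE <- mean((app$data$test - app$pred$postp)^2)
  app
}

# Counterpart of a "workflow" call: run every activity in order.
run_sketch <- function(series, test_len, use_log) {
  app <- list(data = list(raw = series), pred = list(), eval = list())
  app <- subset_step(app, test_len)
  app <- preprocess_step(app, use_log)
  app <- train_step(app)
  app <- predict_step(app)
  app <- postprocess_step(app)
  evaluate_step(app)
}

# Counterpart of a "benchmark" call: rank applications by a collected metric.
benchmark_sketch <- function(apps) {
  mse  <- vapply(apps, function(a) a$eval$MSE, numeric(1))
  rank <- data.frame(app = names(apps), MSE = unname(mse),
                     stringsAsFactors = FALSE)
  rank[order(rank$MSE), ]
}

series  <- 2^(1:12)   # exponential growth, linear after a log transform
apps    <- list(raw_drift = run_sketch(series, 3, FALSE),
                log_drift = run_sketch(series, 3, TRUE))
ranking <- benchmark_sketch(apps)
```

Because the series grows exponentially, the application that applies the log transform before fitting the drift model ranks first.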
The package provides several automated features that can be useful for any time series prediction application, including (i) transformation/model parameter selection; (ii) multistep-ahead or one-step-ahead prediction for both linear and machine learning models; and (iii) machine learning methodology tasks performed during training and prediction activities.
Implementation
Following the framework structure depicted in Figure 1, specific methods that help describe a time series prediction application (represented by the tspred class) can be implemented by extending the processing, modeling, and evaluating classes. This design allows the user to define and apply customized time series prediction methods as needed.
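Since the framework follows the S3 class system, extension of this kind amounts to prepending a new class name and, where needed, supplying a method for it. The sketch below shows the mechanism in base R; the generic preprocess_sketch and the classes processing_sketch and scale_by are hypothetical names, not TSPred identifiers.

```r
# Hypothetical generic for applying a processing object to data.
preprocess_sketch <- function(obj, data, ...) UseMethod("preprocess_sketch")

# Behaviour shared by every processing object: call func with par.
preprocess_sketch.processing_sketch <- function(obj, data, ...) {
  do.call(obj$func, c(list(data), obj$par))
}

# A user-defined subclass: multiply the series by a constant k.
scale_by <- function(k) {
  structure(list(func = function(x, k) x * k, par = list(k = k)),
            class = c("scale_by", "processing_sketch"))
}

# Optional override: validate the input, then defer to the parent method.
preprocess_sketch.scale_by <- function(obj, data, ...) {
  stopifnot(is.numeric(data))
  NextMethod()
}

scaled <- preprocess_sketch(scale_by(2), c(1, 2, 3))
```

A subclass that needs no special handling can skip the override entirely and inherit the parent method through S3 dispatch.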
Nonetheless, the framework includes implementations of the main nonstationary time series transformation methods, namely the logarithmic transform (LT), Box-Cox transform (BCT), percentage change transform (PCT), moving average smoother (MAS), simple differencing (DIF), empirical mode decomposition (EMD), and wavelet transform (WT). Furthermore, it implements other relevant preprocessing methods for time series prediction with machine learning models (MLM), including subsetting time series data into training and evaluation datasets; subsetting the time series data into sliding windows; handling of missing values; and data normalization via the min-max normalization (MM) and adaptive normalization (AN) methods.
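Three of these operations are simple enough to show with base R alone. The snippet below is a generic illustration of simple differencing, min-max normalization, and sliding windows together with their reverse steps where applicable; it uses only functions from the stats package and does not reproduce the TSPred implementations.

```r
x <- c(10, 12, 15, 19, 24)

# Simple differencing (DIF) and its reverse, which needs the first value.
d        <- diff(x)
x_from_d <- diffinv(d, xi = x[1])

# Min-max normalization (MM) to [0, 1] and its reverse.
lo        <- min(x)
hi        <- max(x)
mm        <- (x - lo) / (hi - lo)
x_from_mm <- mm * (hi - lo) + lo

# Sliding windows of length 3. embed() returns each window newest-first,
# so the columns are reversed to get chronological order.
sw <- embed(x, 3)[, 3:1]
```

Each row of sw is one window of consecutive observations, which is the usual input layout for training a regression-style machine learning model on a series.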
Methods for training and prediction were implemented by extending (generalizing) the modeling class. A second level of generalization was added in order to specify classes for statistical (linear) or machine learning (MLM) models. The MLM class, in particular, contains attributes with processing objects for performing any necessary machine learning methodology tasks during the training and prediction activities. The sw attribute corresponds to an object for coercing data into sliding windows, while the proc attribute corresponds to a list of processing objects for normalizing and/or transforming the data that serves as input for machine learning model training and prediction. The subclasses of the linear class represent the following models: autoregressive integrated moving average (ARIMA); Holt-Winters exponential smoothing (HW); theta forecasting (TF); and exponential smoothing state space model (ETS). The subclasses of the MLM class represent the models generated by feed-forward neural network (NNET); random forest regression (RFrst); radial basis function network (RBF); support vector machine (SVM); multilayer perceptron network (MLP); extreme learning machine network (ELM); convolutional neural network (CNN); and long short-term memory neural network (LSTM).
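The role of the sw and proc steps during training and prediction can be sketched with base R. In this illustration lm() stands in for a machine learning regressor, the window length of 4 is an arbitrary choice, and the variable names are hypothetical; it is not how TSPred implements its MLM classes.

```r
set.seed(1)
x     <- sin(seq_len(60) / 4) + rnorm(60, sd = 0.05)
train <- x[1:50]

# proc: min-max normalization with parameters learned on the training set.
lo     <- min(train)
hi     <- max(train)
norm   <- function(v) (v - lo) / (hi - lo)
denorm <- function(v) v * (hi - lo) + lo

# sw: windows of 4 values in chronological order; the first 3 are inputs
# and the last one is the target.
w  <- embed(norm(train), 4)[, 4:1]
df <- data.frame(y = w[, 4], x1 = w[, 1], x2 = w[, 2], x3 = w[, 3])

# lm() is only a stand-in for a machine learning model.
fit <- lm(y ~ x1 + x2 + x3, data = df)

# Multistep-ahead prediction: feed each prediction back as the newest input.
n_ahead <- 10
window  <- tail(norm(train), 3)
pred    <- numeric(n_ahead)
for (h in seq_len(n_ahead)) {
  nd      <- data.frame(x1 = window[1], x2 = window[2], x3 = window[3])
  pred[h] <- predict(fit, newdata = nd)
  window  <- c(window[-1], pred[h])
}

# Reverse the normalization to return predictions on the original scale.
pred <- denorm(pred)
```

A one-step-ahead variant would replace the fed-back prediction with the observed value at each step, which is the distinction the onestep setting refers to.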
The methods for generating metrics for evaluating nonstationary time series prediction and modeling were implemented by extending the evaluating class. A second level of generalization was added in order to specify classes for prediction accuracy measures (error) or model fitting criteria (fitness). The subclasses of the error class represent the prediction accuracy measures Mean Square Error (MSE); Normalized MSE (NMSE); Root Mean Square Error (RMSE); Mean Absolute Percentage Error (MAPE); symmetric MAPE (sMAPE); and Maximal Error (MAXError). The subclasses of the fitness class represent the model fitting criteria Akaike Information Criterion (AIC); corrected Akaike Information Criterion (AICc); Bayesian Information Criterion (BIC); and log-likelihood (logLik). A description of the implemented metrics can be found in the work of Davydenko and Fildes.
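For orientation, several of these metrics can be computed directly in base R. The formulas below are common textbook definitions and are given as an assumption; the exact definitions used by TSPred (for example the scaling of sMAPE or the normalization in NMSE) may differ and should be checked against the package documentation.

```r
actual <- c(100, 110, 120, 130)
pred   <- c( 98, 113, 118, 135)
err    <- actual - pred

# Prediction accuracy (error) measures.
mse       <- mean(err^2)
rmse      <- sqrt(mse)
mape      <- mean(abs(err / actual)) * 100
smape     <- mean(2 * abs(err) / (abs(actual) + abs(pred))) * 100
max_error <- max(abs(err))

# Model fitting (fitness) criteria for an AR(1) model fitted to the
# built-in lh series, written out from the log-likelihood.
fit  <- arima(lh, order = c(1, 0, 0))
ll   <- as.numeric(logLik(fit))
k    <- attr(logLik(fit), "df")   # number of estimated parameters
n    <- length(lh)
aic  <- -2 * ll + 2 * k
bic  <- -2 * ll + k * log(n)
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)
```

The error measures compare predictions against held-out observations, whereas the fitness criteria are computed from the fitted model itself and penalize the number of parameters.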
The framework structure depicted in Figure 1, along with all its subclasses, was implemented in R using its package resources. The classes of the framework follow the S3 class system. The framework is contained in version 5.0 of the R package TSPred for benchmarking nonstationary time series prediction. TSPred is freely available on CRAN, which hosts contributed packages for R from users worldwide. For more implementation details, please refer to the documentation of TSPred.