rTrawl: trawl processes in R - rstats-gsoc/gsoc2018 GitHub Wiki

Background

High-frequency financial price series exhibit three key characteristics: they are integer-valued, update in continuous time, and exhibit autocorrelation in the returns. These three features are captured by trawl processes, as utilised in Shephard & Yang (2017). This class of continuous-time stochastic processes is a subclass of the ambit processes introduced in Barndorff-Nielsen & Schmiegel (2007). The flexibility in modelling the autocorrelation, combined with the analytical tractability of Lévy processes, is one of the main reasons trawl processes are attractive in a variety of applications. Besides modelling intra-day price series, trawl processes have been applied in Barndorff-Nielsen et al. (2014) to the bid-ask spread, in Veraart (2018) to the stochastics of a limit order book, and in Noven et al. (2015) to several environmental time series.

So far, published applications have been descriptive. However, trawl processes are not limited to this. Once a trawl process has been estimated, intra-day realised variance can be estimated more accurately and decomposed into the contributions of the efficient price process and of market microstructure. This decomposition is important for short-term risk evaluation. In addition, the trawl characterises the full dependence structure, which can be utilised for short-term predictions. These predictions can, for example, serve as high-frequency trading signals.

Related work

The above-mentioned papers introduce and apply trawl processes in different subject areas. However, none of the code is publicly available. In fact, as far as we know, there is no open-source code replicating these papers, neither in R nor in any other programming language. Hence, researchers and practitioners wanting to explore this novel, exciting area have to implement everything from the ground up.

Details of the coding project

The main focus of this project will be on implementing and fully testing an R package for simulation and estimation of trawl processes. The first part of the project consists of building a flexible simulation framework. Built-in methods should handle a variety of parametric trawl functions and Lévy seeds, both integer- and real-valued. In addition, support for non-parametric trawls and Lévy seeds should allow users to supply functions that are not built in. The framework should be implemented in such a way that it is straightforward to add new trawls and Lévy seeds.

Once a flexible simulation framework is in place, the next requirement is the ability to estimate trawl processes. Currently, a two-step approach for pure trawl processes is proposed in Barndorff-Nielsen et al. (2014). When an independent Lévy component is added to the trawl process, a realised-variance-based procedure is given in Shephard & Yang (2017). In addition, it should also be possible to estimate such processes based on autocorrelations of the return process at different lags or observation frequencies. The second part of the project consists of implementing these methods and combining them in a user-friendly function with easy-to-use options. Since these estimators depend on tuning parameters, sensible default values have to be investigated and provided.

For simulation and estimation, the package requires some common statistics such as realised variance and autocorrelations at different lags. For regularly spaced time series, these are straightforward to compute. However, at a microsecond frequency, a regularly spaced series carrying the same information would consist mostly of zero returns, leading to memory inefficiency and slow computation. To deal with this, C functions optimised for irregularly spaced grids will be explored.

Due to the nature of the applications, computation needs to be as fast as possible. To this end, we will use C code to speed up bottlenecks, while programming the majority of the package efficiently in R. We are confident that it is possible to handle time series with a large number of observations efficiently, thereby improving research possibilities by means of faster Monte Carlo studies.

Milestones

Phase 1

The first phase consists of implementing the flexible simulation framework, including different parametric and non-parametric trawls and Lévy seeds.

Phase 2

Next, the aim is to provide a coherent, comprehensive estimation package. Hence, the different estimators discussed above need to be implemented.

Phase 3

In the last stage, the project is finalised by commenting the code and providing examples for the use of the newly implemented estimators. A vignette will be written that gives an overview of the different simulation options and estimation techniques and clearly explains the differences in their use and best application practices.

Expected impact

Currently, no open-source code is available. With this project, industry professionals as well as academic researchers wanting to explore this exciting area will have a reliable, comprehensible R package as a basis. This will lower the entry barrier, facilitating advancements in the field.

Mentors

Brian Peterson, primary author of PerformanceAnalytics.

Kris Boudt, Associate Professor of Finance and Econometrics, Vrije Universiteit Brussel and Vrije Universiteit Amsterdam.

Tests

Applicants have to be able to show that they have:

  • A good working knowledge of programming in R and C;
  • A good working knowledge of Roxygen for the documentation;
  • A good working knowledge of Rmarkdown/LaTeX for the vignette;
  • Familiarity with the construction of R packages;
  • Good coding standards (Google’s C and R style guides);
  • A good theoretical understanding of Lévy and trawl processes.

Students should show their motivation by completing the tasks below:

  • Easy: Create a function simulating a trawl process with Negative Binomial Lévy seed and inverse Gaussian trawl. The procedure should be reasonably fast for 50,000 observations;
  • Medium: Replicate the paper Shephard & Yang (2017). The estimators should be reasonably fast for 50,000 observations;
  • Hard: Implement a non-parametric estimation method for the trawl function based on realised variance or autocorrelation.

Solution to tests

Students, please post a link to your test results here.

References

Barndorff-Nielsen, O. E., Lunde, A., Shephard, N., & Veraart, A. E. (2014). Integer-valued Trawl Processes: A Class of Stationary Infinitely Divisible Processes. Scandinavian Journal of Statistics, 41(3), 693-724.

Barndorff-Nielsen, O. E., & Schmiegel, J. (2007). Ambit processes; with applications to turbulence and tumour growth. In Stochastic analysis and applications (pp. 93-124). Springer, Berlin, Heidelberg.

Shephard, N., & Yang, J. J. (2017). Continuous time analysis of fleeting discrete price moves. Journal of the American Statistical Association, 112(519), 1090-1106.

Veraart, A. (2018). Modelling, Simulation and Inference for Multivariate Time Series of Counts Using Trawl Processes. SSRN preprint: https://ssrn.com/abstract=3100076

Noven, R. C., Veraart, A. E., & Gandy, A. (2015). A latent trawl process model for extreme values. arXiv preprint arXiv:1511.08190.