Firedata Integrating Cloud Firestore - rstats-gsoc/gsoc2018 GitHub Wiki

Package Repo: https://github.com/Kohze/fireData

Background

Data science does not exist in isolation, but in a rich environment of shared datasets and user projects. While R already offers ways to easily prototype web applications with the Shiny framework, connecting them to dynamic databases that support user-specific settings or datasets remains highly demanding.

Firedata, in combination with Google Firebase, removes the bottleneck of user-specific access and data sharing. It also opens a new perspective in application development: using R to interactively modify the databases of real-time platforms. Firebase is a quickly developing framework that keeps adding new API routines and mechanisms to improve the user experience. Our challenge now is to reach a critical mass of open source developers to make Firedata a long-term, sustainable part of the R package environment.

Related work

With its API connection to Google Firebase, Firedata is unique in its ease of use and in the user experience it offers when interacting with shared databases. One of the main benefits of Firebase is that user registrations and logins are handled within Firebase, so no security-relevant personal data is ever exposed. This makes it a natural extension for many Shiny packages and newly developed projects.

Firedata is currently the only package in the R space that enables such projects. Beyond that, the R universe offers BigQuery and other cloud database packages, but none of them provides models for secure user registration, login, and maintenance.

Details of your coding project

The specific aim of this GSoC application is to bring the Firedata package up to the newest standards and developments that Google Firebase has introduced in recent months. Among those changes are the handling of HTTPS requests, the move towards streaming connections, and the newly introduced data structure of Cloud Firestore (as a replacement for the Realtime Database).
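
To make the change in data structure more concrete: the Realtime Database stores a single JSON tree, whereas the Cloud Firestore REST API represents each document as a map of typed fields (for example `stringValue` or `doubleValue`). The base R sketch below converts a named list into that typed layout. The helper names `to_firestore_value` and `to_firestore_fields` are hypothetical and not part of Firedata, and only four scalar types are handled.

```r
# Hypothetical helpers for illustration; not part of the fireData package.

# Wrap a single R scalar in the typed envelope used by the Firestore REST API.
to_firestore_value <- function(x) {
  stopifnot(length(x) == 1, !is.na(x))
  if (is.logical(x)) {
    list(booleanValue = x)
  } else if (is.integer(x)) {
    # 64-bit integers are encoded as strings in the REST JSON representation
    list(integerValue = as.character(x))
  } else if (is.numeric(x)) {
    list(doubleValue = x)
  } else if (is.character(x)) {
    list(stringValue = x)
  } else {
    stop("unsupported type: ", class(x)[1])
  }
}

# Convert a named list (one record) into a Firestore-style document body.
to_firestore_fields <- function(record) {
  stopifnot(is.list(record), !is.null(names(record)))
  list(fields = lapply(record, to_firestore_value))
}

car <- list(model = "Mazda RX4", mpg = 21, cyl = 6L, manual = TRUE)
doc <- to_firestore_fields(car)
str(doc)
```

Serializing `doc` with a JSON package, for instance `jsonlite::toJSON(doc, auto_unbox = TRUE)`, would then give a candidate request body. How Firedata should expose this mapping is exactly the kind of design question the project would need to settle.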

Expected impact

Data science is at the core of the process towards a fully connected world. R should not stop at reading and analysing web data, but rather be involved at the core of platform databases. With Firedata it is not only possible to create web databases, but also to actively modify entries in real time based on the rich prediction pipelines R can offer.

In its first year, Firedata is already used in at least six platforms, such as the Cambridge University systems biology platform SpatialMaps, the code marketplace r-codes.com, and other commercial platforms. As interest in Firedata has increased, other developers have started to help maintain and evolve the platform.

Over the coming months, we expect to reach a broader interest in the community and to see the creation of a variety of projects that utilize the cloud and its benefits.

Required Skills

Student applicants must be familiar with, or be willing to familiarize themselves with, the workings of:

  • REST APIs
  • JSON
  • serialization of objects
  • unit testing
  • roxygen/devtools R package development

Mentors

  • Robin Kohze is the author and maintainer of the Firedata package and was a GSoC student with the R organization in 2016 and 2017. He is also part of the https://r-codes.com/ open source project sharing platform.
  • Samuel Schmidt is a biochemist and statistician at Cornell University. He was a GSoC mentor with the R organization in 2016 and 2017.
  • Laurent Gatto is a data scientist and leads the Computational Proteomics Unit at Cambridge University. He was a GSoC mentor with the R organization for the mzR Bioconductor package.
  • Bert Jehoul works at the open knowledge organisation in Belgium and is interested in a wider adoption of the R fireData package and its integration into openCPU web applications.

Tests

  • Easy: Follow the GitHub instructions and use the download example to get the mtcars dataset.

  • Medium: Set up your own Firebase cloud, create your own user account via Firedata, and upload/download a sample dataset.

  • Hard: Add examples of the package's use to the Rd files in the following format:

\examples{

  # examples (executed in < 5 sec; for checks)
  \donttest{
    # further examples for users (not used for checks)
  }
  \dontrun{ # only if really needed
    # examples which should not be executed (e.g. long runtimes)
  }

}
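
As an illustration of the layout above, the three tiers might look as follows when written as roxygen2 comments, which roxygen2 turns into the examples section of the Rd file. The function `node_url` is a hypothetical helper invented for this sketch and is not a Firedata function. The project URL is a placeholder, and the `.json` suffix follows the Firebase Realtime Database REST convention.

```r
#' Build the REST endpoint for a database node
#'
#' Hypothetical helper for illustration; not part of fireData.
#'
#' @param project_url Base URL of the Firebase project.
#' @param directory Path of the node inside the database.
#' @return A character string with the endpoint URL.
#' @examples
#' # fast example, executed during checks
#' node_url("https://example-project.firebaseio.com", "main")
#' \donttest{
#' # further example for users
#' vapply(c("main", "users"), node_url, character(1),
#'        project_url = "https://example-project.firebaseio.com")
#' }
#' \dontrun{
#' # needs network access and a real project, so it should not be executed
#' readLines(node_url("https://example-project.firebaseio.com", "main"))
#' }
node_url <- function(project_url, directory) {
  # drop trailing slashes so the parts join with exactly one "/"
  base <- sub("/+$", "", project_url)
  paste0(base, "/", directory, ".json")
}

node_url("https://example-project.firebaseio.com/", "main")
```

Keeping the always-run example free of network access is one way to stay within the time limit for checks; anything that needs a live Firebase project fits better in the `\dontrun{}` tier.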

Solutions of tests

Jiasheng Zhu:

Paul Spende:

Sunny Bhadani: