GSoC 2017 - mablab/rpostgisLT GitHub Wiki
The rpostgisLT package started as a Google Summer of Code (GSoC) project in 2016. The following page summarizes my experience as a GSoC student and the main developer during GSoC 2017.
Mentoring organization of the Google Summer of Code 2017: R Project for Statistical Computing
Mentors:
- Mathieu Basille
- David Bucklin
- Clément Calenge
Developer: Balázs Dukai
During GSoC 2017 I developed the package, and the changes between v0.5.0 and v0.6.0 are my contribution. Release v0.6.0 summarizes my work during this development period. The list of my commits is available at https://github.com/mablab/rpostgisLT/commits/v0.6.0?author=balazsdukai
The main outcome of GSoC 2017 is the explorePgtraj function, which wraps a Shiny app. Several functions support the operation of this app; they are stored in the utils-shiny.R and createShinyViews.R scripts. Additionally, I cleaned up the unit tests with the testthat package, set up two continuous integration services (Travis CI and AppVeyor), and added code coverage reporting. In the section The MoSCoW deliverables below, the elements that I managed to complete are ticked.
My name is Balázs Dukai, and I am currently a student in the MSc Geomatics program at Delft University of Technology. I took a bit of a detour before ending up in the field of geomatics. First I studied and practiced landscape architecture; then I took several courses on data science and programming, with R being the first scripting language I learned. After some exposure to geographic information systems (GIS) and land surveying, I decided to pursue a career in spatial data acquisition, management, and analysis.
After a successful GSoC 2016, the mentors and I stayed in touch. We kept developing the package and started collaborating on scientific articles that showcase its use. While in 2016 we focused on data storage and management, it was clear that the package needed a front-end that allows the user to explore trajectories interactively. After some discussion we decided that a second round of GSoC would greatly boost development, so we gave it a go.
According to the project proposal, the expected output is a functional, interactive trajectory viewer, fully integrated with rpostgisLT, to create, edit, and analyze/explore pgtraj. This tool is expected to:
- have an interactive map which can load spatial data from the database or from the current data environment
- create pgtraj from loaded data sets/database tables
- edit pgtraj (cut trajectories into bursts, remove/add locations or steps, filter trajectories according to step attributes)
- analyze pgtraj trajectories with step-by-step control (as in adehabitatLT::trajdyn), providing step attribute information (length, time, angle, etc.)
- load and animate one to many pgtraj simultaneously
- extract data from other spatial data sets (points, polygons, lines, rasters) to trajectories and add it to the pgtraj 'infolocs'
As a first step, I asked my mentors to translate the above expectations into must-have, should-have, could-have, and won't-have (MoSCoW) deliverables. This approach helped tremendously in reaching a common understanding and focusing my efforts.
- Interactive editing of the infolocs of trajectories, by clicking on the trajectory and manually editing the desired field in the dedicated Infolocs GUI section.
- Display a single trajectory, either entirely or in parts (k steps at a time)
- Display other trajectories in the background
- Dynamic display: step by step, with the possibility to jump n steps at a time
- Dynamic display of several trajectories at a time
- Interaction with keyboard to allow for fast operation
- Display should be smooth and fast
- GIS capabilities: additional layers (geometries and raster), zoom, click on features to get more information
- Everything should also be feasible with the mouse
- Infolocs should be editable on the fly (using keyboard)
- Create a pgtraj or ltraj from points.
- Additional editing features (cutting into bursts, etc.)
- Editing the “path” of a trajectory by dragging/moving its relocations, for example as one would edit an object in a drawing program such as Inkscape.
Learning from the previous GSoC, where I often failed to estimate accurately how long a task would take (particularly one I had never done before), this year I adopted a more relaxed approach to project management. This involved less micro-planning and more prioritization based on the MoSCoW deliverables. Furthermore, I started using GitHub Issues to document features and bugs, which makes the project more transparent for future contributors.
We adopted the Git Flow branching model.
What we set out to do in GSoC 2017 was new not only for me but also for my mentors. We were not certain which technology or framework would serve our goals best, so it was essential to start the project with a research phase. The choice of technology obviously has a high impact on the outcome and maintainability of the package, so it was important to make an informed decision.
During the research phase I read a lot of documentation and built small proofs of concept to figure out how to proceed. I put together a simple requirements matrix to keep the evaluation objective.
Initially, I was very much in favor of developing a QGIS plugin for rpostgisLT instead of replicating some of QGIS's functionality in R. However, that would have required users to have QGIS installed and to be familiar with its operation. Additionally, QGIS plugins are written in Python, and attracting contributors could be very difficult considering the tiny user base of rpostgisLT + adehabitatLT.
Therefore, after several tests I set out to build a Shiny app with Leaflet.
We expected that providing smooth interaction with the map would be one of the main challenges, so I started working on it early on. In the traj data model, a step is composed of two consecutive relocations. If each step is modeled as a linestring, long trajectories can easily contain several hundred thousand or even millions of steps. Plotting that many linestrings with Leaflet is infeasible, and querying this amount of data in real time from a traj schema can become slow.
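The relationship between relocations and steps can be illustrated with a minimal Python sketch. It is not rpostgisLT code (the package itself is written in R on top of PostgreSQL); the function name steps_as_linestrings and the (x, y) tuple representation are hypothetical. Assuming the relocations of one burst are already ordered by time, every step becomes its own two-vertex linestring, so the number of map features grows with the number of relocations.

```python
from typing import List, Tuple

# Hypothetical representation: a relocation is an (x, y) coordinate pair.
Point = Tuple[float, float]
LineString = List[Point]


def steps_as_linestrings(relocations: List[Point]) -> List[LineString]:
    """Model every step as a two-vertex linestring.

    Assumes `relocations` belong to a single burst and are ordered by time,
    so each step joins a relocation to the one that follows it.
    """
    return [[start, end] for start, end in zip(relocations, relocations[1:])]


if __name__ == "__main__":
    burst = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (2.5, 3.0), (4.0, 3.5)]
    steps = steps_as_linestrings(burst)
    # n relocations give n - 1 steps, so the feature count grows linearly
    # with trajectory length: 5 relocations -> 4 linestrings here.
    print(len(steps))  # 4
```

With one linestring per step, a burst of a million relocations would mean just under a million features on the map.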
To overcome these difficulties, I decided to materialize the two PostgreSQL views used by the app, all_burst_summary and step_geometry_<pgtraj> (see the traj database model). Even though this duplicates data, it is necessary for achieving the desired performance, because the materialized views are indexed and the table joins no longer have to be computed at query time. To reduce the number of features for Leaflet, I introduced two modes of operation: Step Mode ON and Step Mode OFF. In Step Mode ON, each step is an individual linestring, which allows all step parameters to be displayed, at the cost of slower operation. In Step Mode OFF, the steps within the selected time window are united into a single linestring, so the number of linestrings on the map is very low. As a result, Step Mode OFF displays quickly but shows only minimal information (burst and animal name).
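The difference between the two modes can be sketched in Python, purely as an illustration under stated assumptions: features_for_window is a hypothetical helper, relocations are (timestamp, x, y) tuples ordered by time within one burst, and a step is kept only if both of its relocations fall inside the time window. The sketch only shows how many linestrings each mode would produce; it does not reproduce how the app queries or unites the geometries.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical representation: (timestamp, x, y), ordered by timestamp.
Relocation = Tuple[datetime, float, float]
Point = Tuple[float, float]
LineString = List[Point]


def features_for_window(
    relocations: List[Relocation],
    start: datetime,
    end: datetime,
    step_mode: bool,
) -> List[LineString]:
    """Return the linestrings drawn for one burst within [start, end].

    step_mode=True  -> one two-vertex linestring per step (Step Mode ON).
    step_mode=False -> all steps united into one linestring (Step Mode OFF).
    """
    # Keep only the relocations whose timestamp falls inside the window.
    points = [(x, y) for t, x, y in relocations if start <= t <= end]
    if len(points) < 2:
        return []  # fewer than two relocations means there is no step to draw
    if step_mode:
        return [[a, b] for a, b in zip(points, points[1:])]
    return [points]


if __name__ == "__main__":
    t0 = datetime(2017, 8, 1)
    # Ten hourly relocations; the window covers hours 0 to 5 inclusive.
    burst = [(t0 + timedelta(hours=h), float(h), float(h % 3)) for h in range(10)]
    end = t0 + timedelta(hours=5)
    print(len(features_for_window(burst, t0, end, step_mode=True)))   # 5
    print(len(features_for_window(burst, t0, end, step_mode=False)))  # 1
```

Both calls cover the same six relocations, but Step Mode ON yields five features where Step Mode OFF yields one; this ratio is what keeps the number of Leaflet features low in Step Mode OFF, at the price of losing per-step attributes.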
Developing Shiny apps was new territory for me. I had played with Shiny before but never had the chance to build something that is used by others. The Data explorer tab was a feature I wanted to see through, because I believe that being able to see what is in a traj schema would make rpostgisLT much more user-friendly. However, this feature was not originally planned, so it had a low priority for GSoC 2017.
You can find the development road map with the remaining features here.
The highlight of this project was the development of a Shiny app that allows the interactive exploration of spatial data in a database. I believe developing such applications is a valuable skill to have and I'm glad that I had the chance to learn it.
Secondly, database query optimization played an important role in the project, and it is a skill that has been on my bucket list for a while.
Unit tests and continuous integration are tools that help a developer write more robust and reliable software, so I'm happy to have them under my belt.