team integration and workflow - cccs-web/soc-maps GitHub Wiki

This discussion addresses CCCS' need to formalize appropriate communication protocols and team data management workflow.

Unifying Team Communications for Integrated Development Efforts

As introduced in our discussion of /cccs-web/soc-maps/ application development, CCCS' web application development efforts are currently occurring in two silos: one led by Kartoza and the other by Paul Whipp of CCCS.

Kartoza has been managing CCCS-related development via a private repository hosted on GitHub. CCCS and Kartoza agreed to this approach so that Kartoza could maintain regular use of their waffle.io 'SCRUM' workflow, while also allowing the Kartoza team to point to client project sites when discussing development needs without risking undue exposure of sensitive information. This approach made sense at the early stages of development, during which we were still trying to figure out how we would manage client projects in relation to development of the 'core' application. CCCS and Kartoza also agreed that CCCS could raise issues on /cccs-web/soc-maps/ or on the relevant wiki pages of our client projects and assign these to Kartoza (via @gubuntu, the only Kartoza team member who has accepted our invitation to the /cccs-web/soc-maps/ repository) for internal assignment and processing. Similarly, any client-specific requests, such as help with particular stylization and customization, would be raised and assigned within their respective repositories [hosted on CCCS' GitLab server] and re-assigned by Kartoza to individual team members.

In practice, the success of this approach has been limited. Although our team has been responding to issues raised on /cccs-web/soc-maps/, it can sometimes take a while before issues are found and addressed, and we have not done well at assigning and re-assigning issues back and forth between team members. Also, monitoring both /cccs-web/soc-maps/ and /kartoza/cccs/ to determine which issues are actively being addressed is difficult. It is possible, however, that future requests could be facilitated by Kartoza's 'soc-maps' waffle.io SCRUM interface once we reach a point where we're actively developing UIs for client-specific reporting, raising numerous specific work tasks, and establishing sprints.

On CCCS' side, our web application development has been occurring under the 'staging' branch of /cccs-web/core/ rather than /cccs-web/soc-maps/, which also contributes to the disconnect between our working teams. CCCS originally envisioned that the /cccs-web/soc-maps/ repository would be configured as a "module" that could be attached to the /cccs-web/core/ application in much the same way as our other application modules (e.g. /cccs-web/doc-meta/). Paul: I appreciate that your choice of developing on staging may have something to do with how you are structuring the /cccs-web/core/ architecture. Could you please explain more about this choice and recommend whether we should shift application development over to /cccs-web/core/ rather than continue with /cccs-web/soc-maps/? Please note, however, that it is CCCS' strong preference to retain /cccs-web/soc-maps/. We would like our web application to be deployable as a "core" infrastructure allowing for modular additions, each developed in its own designated repo.

Going forward, CCCS hopes to channel all our development discussions through /cccs-web/soc-maps for issues related to application development, and through our respective client project repositories (e.g. /abadi/esms-maps/) for client- and project-specific requests.

Thus, we will raise all coding and development requests and issues on /cccs-web/soc-maps, and extensions to application functionality will be cascaded down to client project sites.

This approach will keep all application development public, and the onus of keeping branch client applications and repositories up-to-date becomes a project-specific concern.

Data Management Workflow

As suggested in the discussion about data management, CCCS needs to establish a common workflow for obtaining, indexing, sharing, and version-controlling geospatial data. This workflow should function at the 'project' level, meaning that data management occurs within the context of a single client project. 'CCCS' is equivalent to any other 'client', with the important exception that any new functionality we develop for our web applications needs to be built on CCCS' infrastructure, which will serve as a "template" for deploying future client projects.

As discussed in the preceding section and in our note on data management, if we are to focus development of map application utilities using /cccs-web/soc-maps/ (unless there's a strong argument in favor of keeping the application in /cccs-web/core/, per the discussion above), then our actual geospatial data needs to be stored and shared in a separate and dedicated repository. The infrastructure that I see as best suited to CCCS' existing data management workflow involves:

  • Git for deployment of the core application architecture [initial set-up and config files]
  • Git or GeoGig for version control and sharing of vector data
  • Git-annex sitting on top of S3 for version control and sharing of raster data

This git-centric constellation of data management utilities should afford us fine-grained control over data access and allow us to retain detailed records of when and how data are changed and edited. This set-up would be our "canonical" point of truth for all data types. Should a project wish to change source data for their specific purposes without wishing for their changes to affect all other maps relying on those data, then the source data would need to be "forked" for that specific project.
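
As a rough illustration of the division of labor among the three tools listed above, the following Python sketch routes a file to a storage backend by its extension. The extension sets, the backend labels, and the function name `storage_backend` are assumptions made for illustration only; CCCS has not yet settled on these details, and the choice between Git and GeoGig for vector data remains open.

```python
from pathlib import PurePosixPath

# Hypothetical extension lists; extend these to match the data CCCS actually receives.
VECTOR_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".geojson"}
RASTER_EXTENSIONS = {".tif", ".tiff", ".img", ".jp2"}


def storage_backend(path: str) -> str:
    """Suggest which version-control tool should hold the given file."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in RASTER_EXTENSIONS:
        # Large binary rasters: tracked by git-annex, content stored on S3.
        return "git-annex (S3 remote)"
    if suffix in VECTOR_EXTENSIONS:
        # Vector data: plain Git or GeoGig, pending a team decision.
        return "git or geogig"
    # Everything else (set-up and config files) stays in the application repo.
    return "git"


if __name__ == "__main__":
    for example in ("source/landcover.TIF", "source/villages.shp", "config/settings.yml"):
        print(example, "->", storage_backend(example))
```

A lookup like this could be used by a data administrator's intake script to decide where a newly received file belongs before it is committed.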

Team Workflow Needs / Considerations for Entering Data into the VCS

Data relevant to geospatial analysis may come in a variety of formats, including spatial data (like shapefiles and geo-referenced images) as well as statistical data like demographic measurements and socio-economic indicator data.

Anyone who obtains new data should prioritize getting it loaded into a VCS-enabled repository ASAP.

When adding new data, it is imperative to 'track' the following metadata:

  • original file name
  • who supplied the data
  • who received the data
  • date data was received
  • initial location where data was entered into the CCCS file system [directory location / path]
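
One way to make the metadata fields listed above hard to forget is to record them in a small structured file committed alongside the data. The Python sketch below is only a suggestion: the class name `DataReceipt`, the field names, and the `.meta.json` sidecar convention are hypothetical and not an established CCCS format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class DataReceipt:
    """The five metadata items to track when new data enters the VCS."""
    original_file_name: str
    supplied_by: str
    received_by: str
    date_received: date
    initial_path: str  # directory location / path in the CCCS file system

    def to_json(self) -> str:
        record = asdict(self)
        # Dates are not JSON-serializable, so store them as ISO 8601 strings.
        record["date_received"] = self.date_received.isoformat()
        return json.dumps(record, indent=2, sort_keys=True)


def sidecar_name(original_file_name: str) -> str:
    """Name of the metadata file stored next to the data (assumed convention)."""
    return original_file_name + ".meta.json"


if __name__ == "__main__":
    receipt = DataReceipt(
        original_file_name="villages.shp",
        supplied_by="Example Supplier",      # placeholder values
        received_by="Example Team Member",
        date_received=date(2015, 3, 1),
        initial_path="source/example-project/villages.shp",
    )
    print(sidecar_name(receipt.original_file_name))
    print(receipt.to_json())
```

Committing the sidecar file in the same commit as the data keeps the provenance record under the same version history as the data itself.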

Supplying New Data as a Non-Technical User

Any CCCS team member who receives or obtains geospatial data can give it to a CCCS data administrator for loading into the appropriate repositories.

NOTE: If CCCS chooses to use Git (rather than GeoGig) to maintain a working directory of shapefile data, then non-technical users could upload data directly to the repository on their own. Using GeoGig may preclude this option depending on the level of technical complexity involved.

Supplying New Data as a Mapper

Mappers should also upload new data into a version-controlled repository [Git / GeoGig] in the format in which it was received from the data originator (i.e. indexing shapefiles in their original formats in a VCS 'source' directory).

If a mapper wishes to choose how and where data are structured in PostgreSQL / PostGIS, then the appropriate workflow is for that mapper to ensure that her or his local database is synced to the master, and then to write a custom loading script that can be executed on the server to load the source data set from the location where it was entered in the VCS.
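
A custom loading script of the kind described above might look like the following Python sketch. It assumes the PostGIS `shp2pgsql` loader and the `psql` client are installed on the server, which this page does not confirm; the shapefile path, table name, SRID, and database name in the example are placeholders.

```python
import subprocess
from typing import List, Tuple


def build_load_commands(
    shapefile: str, table: str, srid: int, dbname: str
) -> Tuple[List[str], List[str]]:
    """Return the shp2pgsql and psql argument lists for one shapefile."""
    shp2pgsql_cmd = [
        "shp2pgsql",
        "-s", str(srid),  # SRID of the source geometry
        "-I",             # create a spatial index on the geometry column
        shapefile,
        table,
    ]
    psql_cmd = [
        "psql",
        "-q",                     # quiet: suppress informational messages
        "-v", "ON_ERROR_STOP=1",  # stop at the first SQL error
        "-d", dbname,
    ]
    return shp2pgsql_cmd, psql_cmd


def load_shapefile(shapefile: str, table: str, srid: int, dbname: str) -> None:
    """Pipe shp2pgsql output into psql; requires both tools on PATH."""
    shp_cmd, psql_cmd = build_load_commands(shapefile, table, srid, dbname)
    producer = subprocess.Popen(shp_cmd, stdout=subprocess.PIPE)
    try:
        subprocess.run(psql_cmd, stdin=producer.stdout, check=True)
    finally:
        producer.stdout.close()
    if producer.wait() != 0:
        raise RuntimeError("shp2pgsql failed for " + shapefile)


if __name__ == "__main__":
    # Placeholder values; print the commands rather than touching a database.
    for cmd in build_load_commands("source/villages.shp", "public.villages", 4326, "example_db"):
        print(" ".join(cmd))
```

Keeping the script in the repository next to the source data means the load can be repeated on the server exactly as it was run locally.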

CAVEAT: CCCS does not yet have a clear understanding of how GeoGig either differentiates between, or combines, individual shapefiles and structured PostgreSQL databases. One email received from Kartoza (@gubuntu) suggests that GeoGig is exported into PostGIS on the server "such that the server database version that is supporting the web maps is always canonical". Could working through GeoGig (potentially via post-deployment hooks) decrease or eliminate direct interaction with the database when entering new materials into the repository?