Sandbox integration - AtlasOfLivingAustralia/profile-hub GitHub Wiki

Requirements

  1. As a taxonomist, I want to be able to upload occurrence data that is not ready for public consumption so that I can access that data while creating species treatments/profiles prior to 'publication'.
  2. As the owner of a private profiles collection, I want to be able to upload occurrence data for use within the collection without making that data publicly accessible, so that I can create profiles using sensitive (scientific, cultural, etc) data for use by authorised users.
  3. As the owner of 'private' occurrence data, I want to be able to 'publish' that data to the ALA directly from the Profiles application so that I can make my work publicly accessible without having to go through the upload process again.

Approach

The Sandbox application allows data to be uploaded and manipulated separately from the Biocache instance(s). Sandbox is essentially a simple upload interface to the Biocache: a Sandbox installation actually installs its own instance of the Biocache (biocache-service, ala-hub, solr, cassandra, biocache-cli).

Using the Sandbox and allowing Profiles to extract data from either Biocache or Sandbox would achieve requirement #1.

However, neither the Sandbox nor Biocache supports authorisation around data sets: once data is uploaded, anyone can view it. This does not satisfy requirement #2.

Therefore, we install the backend portion of the 'Sandbox' environment with Profiles, but do not expose any of the services. All requests to the underlying biocache service are made via the Profiles application, which will ensure that only data owned by the Collection's Data Resource is ever retrieved.
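As a rough illustration of what "only data owned by the Collection's Data Resource" could mean for a proxied query, the sketch below appends a data resource filter to a biocache search URL. It is not taken from the Profile Hub code: the function name, the host and the use of an fq=data_resource_uid:... filter are all assumptions made for the example.

```javascript
// Hypothetical helper, not the Profile Hub implementation.
// Assumption: the restriction is expressed as an extra Solr-style filter
// query (fq) on the data_resource_uid field.
function constrainToDataResource(biocacheUrl, dataResourceUid) {
  const url = new URL(biocacheUrl);
  // Filter queries are intersected, so appending one can only narrow the
  // result set, whatever filters the caller supplied.
  url.searchParams.append("fq", `data_resource_uid:${dataResourceUid}`);
  return url.toString();
}

// Example with made-up values:
const constrained = constrainToDataResource(
  "http://biocache.internal/ws/occurrences/search?q=Acacia&fq=year:2014",
  "dr123"
);
```

The point of doing this on the server side is that the client never gets to choose which data resource is queried.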

This is not a terribly robust solution, as it relies on the Sandbox code (its page structure and URL formats) staying the same as it was when this was implemented. However, this approach gives a much nicer user experience. It would probably be better still to convert Sandbox to a plugin and add it to Profile Hub.

Implementation

This solution uses Web Components to embed portions of the Sandbox data upload page into a Profiles view. This requires the IDs of the appropriate DIVs in the Sandbox page to remain constant.

The Sandbox upload form's AJAX calls are a mix of relative URLs hardcoded in the JS (on the copy & paste page) and Grails-generated URLs that are passed in (on the file upload page). The former are left as-is, and as such all go to the context root (profile-hub) rather than directly to the Sandbox. These URLs are then proxied via the SandboxProxyController in Profile Hub (URLMappings.groovy maps the URL formats used by the Sandbox page). The one exception is the upload service call, which goes to SandboxProxyController but is then sent directly to the biocache-service instance rather than to the Sandbox. This is because the Sandbox expects CAS to have authenticated the request, which doesn't happen when proxying via the Profile Hub. The Grails-generated URLs are rewritten in the DataController.js Angular controller into a format suitable for proxying them through the Profile Hub.
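The rewriting of Grails-generated URLs can be pictured with a small sketch like the one below. It is illustrative only and is not the code in DataController.js: the function name, the Sandbox base URL and the proxy prefix are hypothetical, since this page does not give the real values.

```javascript
// Hypothetical sketch: turn an absolute Sandbox URL into a path under the
// Profile Hub context root so that it is handled by the proxy controller.
function toProxyUrl(sandboxUrl, sandboxBase, proxyPrefix) {
  const target = new URL(sandboxUrl);
  const base = new URL(sandboxBase);
  // Normalise the base path so "/datacheck" does not match "/datacheckfoo".
  const basePath = base.pathname.replace(/\/+$/, "") + "/";
  const isSandboxUrl =
    target.origin === base.origin &&
    (target.pathname + "/").startsWith(basePath);
  if (!isSandboxUrl) {
    return sandboxUrl; // not a Sandbox URL, leave it untouched
  }
  const rest = target.pathname.slice(basePath.length);
  // Keep the query string; drop any trailing slash on the prefix.
  return `${proxyPrefix.replace(/\/+$/, "")}/${rest}${target.search}`;
}

// Example with made-up values:
const proxied = toProxyUrl(
  "http://sandbox.example.org/datacheck/upload/parseColumns?x=1",
  "http://sandbox.example.org/datacheck",
  "/profile-hub/sandbox"
);
```

With these made-up values, the call yields a context-root-relative path ending in upload/parseColumns?x=1, which a URL mapping could then route to the proxy.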

All URLs are modified in some way to include the opus id and/or profile id - some are modified using $.ajaxPrefilter, others by rewriting the URL to include the ids in the path. This allows us to enforce collection-specific authorisation on all requests.
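For the $.ajaxPrefilter case, a minimal sketch might look like the following. $.ajaxPrefilter is a real jQuery hook that runs before each request is sent, but everything else here is an assumption: the parameter names opusId and profileId, the context object and the choice to pass the ids as query parameters are invented for illustration.

```javascript
// Hypothetical helper: append the collection (opus) id, and the profile id
// when there is one, to a request URL as query parameters.
function addCollectionParams(url, opusId, profileId) {
  const separator = url.includes("?") ? "&" : "?";
  let result = `${url}${separator}opusId=${encodeURIComponent(opusId)}`;
  if (profileId) {
    result += `&profileId=${encodeURIComponent(profileId)}`;
  }
  return result;
}

// Hypothetical page context; the real application would supply these ids.
const context = { opusId: "opus1", profileId: "profile1" };

// Only wire up the prefilter when jQuery is actually present (in a browser).
if (typeof $ !== "undefined" && typeof $.ajaxPrefilter === "function") {
  $.ajaxPrefilter(function (options) {
    // Runs before every $.ajax call, including those made by the embedded
    // Sandbox scripts, so their hardcoded URLs do not need to be edited.
    options.url = addCollectionParams(
      options.url, context.opusId, context.profileId
    );
  });
}
```

A prefilter suits the hardcoded relative URLs because they cannot be changed at the call site, whereas URLs that are passed in can simply be rewritten before use.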

Requests to the biocache instance are also proxied in a similar manner, using the SandboxBiocacheProxyController.groovy class. These are simpler than the Sandbox URLs, as there is no need for manipulation on the server, so there is just a single action that proxies all GET URLs (there are no POSTs).
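The single pass-through action can be thought of as a plain mapping from the incoming proxy path to the hidden biocache-service URL. The sketch below shows that mapping in JavaScript purely for illustration (the real controller is a Groovy class); the function name, proxy prefix and internal base URL are hypothetical.

```javascript
// Hypothetical sketch of a GET-only pass-through mapping.
// Returns the internal URL to fetch, or null if the request is not proxied.
function toBiocacheTarget(method, requestPath, query, proxyPrefix, biocacheBase) {
  if (method.toUpperCase() !== "GET") {
    return null; // only GET requests are proxied
  }
  if (!requestPath.startsWith(proxyPrefix + "/")) {
    return null; // not under the proxy prefix
  }
  // Everything after the prefix is forwarded unchanged (keeps the leading "/").
  const rest = requestPath.slice(proxyPrefix.length);
  const base = biocacheBase.replace(/\/+$/, "");
  return base + rest + (query ? `?${query}` : "");
}

// Example with made-up values:
const biocacheTarget = toBiocacheTarget(
  "GET",
  "/profile-hub/opus/o1/biocache/occurrences/search",
  "q=Acacia",
  "/profile-hub/opus/o1/biocache",
  "http://localhost:8080/biocache-service/"
);
```

Because the path and query are forwarded untouched, one action is enough. Any collection check would be made using the ids carried in the proxy prefix before the request is forwarded.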

When the user first goes to the upload data page, the Sandbox's HTML and resources are requested directly from the publicly visible Sandbox UI. Only the required DIVs are actually rendered, and all CSS and third-party JS are dropped.

All subsequent requests to the server go via the Profile Hub. Because the initial page is loaded from it directly, the Sandbox UI has to be publicly visible, so someone could upload data to it directly. However, that data would then be essentially lost to them, as there is no Hub and the biocache-service is hidden behind the Profiles UI. The uploaded data would not be associated with any collection, and so would not be visible.