Data sharing and Policy matching - myantandco/RA-BitnobiPilotJuly2020 GitHub Wiki

Data Owner’s Perspective

A data sharing operation is created by attaching one or more Policies to a Workflow.

**Policy** sets "who" can access specific resources. **Workflow** sets "what" specific data is to be shared.

The workflow specifies which datasource, columns and rows can be shared. A workflow can also be used to hide sensitive data and create a unique identifier if needed.

The sections below provide more details on how the sharing works.

Note that workflows can also be used by the Data Consumer (e.g. researcher) to do additional filtering or do basic analysis with the shared data.

Sharing your Workflow Result Set

  • when you create a new workflow and run it, initially no one will have access to its result set except for the owner.
  • to share a workflow result set with other users, you must attach a policy to it. Any users matching the policy will be able to use it as a data source in their workflows or reports.

Policy Definition and Matching

  • for a policy to match a user, the user must have attribute/value pairs that exactly match every attribute/value in the policy. For example if the policy only has one attribute (e.g. organization = Bitnobi) then this will match all users that have an attribute organization = Bitnobi. If a policy has two attributes (e.g. organization = Bitnobi, department=IT) then all users that belong to the Bitnobi organization and are in the IT department will match.
  • each policy must have at least 1 attribute.
  • policies created by one user are visible only to the owner. For example every user that wishes to share workflow results with all users of the Bitnobi organization must create their own policy with organization = Bitnobi.
  • the admin user can create a policy that is "globally shared" with all Bitnobi users. This is controlled by a checkbox in the Policy editor. It allows the admin to "pre-populate" a set of policies that normal Bitnobi users can then use for Workflow access control.
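
The matching rule above can be illustrated with a minimal sketch. It assumes that policies and users are both plain attribute/value dictionaries. The function name `policy_matches` and the sample users are hypothetical and are not part of Bitnobi itself.

```python
def policy_matches(policy: dict, user: dict) -> bool:
    """Return True when the user carries every attribute/value pair in the policy."""
    if not policy:
        # Assumption: a policy must have at least 1 attribute, so a policy
        # with no attributes is treated as matching no one.
        return False
    # Every attribute in the policy must be present on the user with the same value.
    return all(user.get(name) == value for name, value in policy.items())


# Hypothetical policy and users, following the example in the text.
bitnobi_it = {"organization": "Bitnobi", "department": "IT"}
it_user = {"organization": "Bitnobi", "department": "IT", "projectA": "true"}
finance_user = {"organization": "Bitnobi", "department": "Finance"}

print(policy_matches(bitnobi_it, it_user))       # True
print(policy_matches(bitnobi_it, finance_user))  # False
print(policy_matches({"organization": "Bitnobi"}, finance_user))  # True
```

Extra attributes on the user do not prevent a match; only the attributes listed in the policy are checked.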

Data Transfer

  • By default, other users cannot download or transfer data that you have shared with them out of Bitnobi. Data Transfer access control allows you to selectively enable specific Data Transfer methods if necessary.
  • A Bitnobi user must have Data Transfer permission to use Data Transfer to download as .csv or upload to JupyterHub. The admin user controls this through policies and policy resources. If the Policy Resource for Data Transfer is disabled then the users that match the policy will not see the Data Transfer page in the Bitnobi UI.
  • If you create a workflow that uses only datasources created by you, then all Data Transfer types are allowed for you. The Data Transfer Type settings for a workflow do not apply to the owner.
  • If you share a workflow with another user, the Data Transfer Type settings on that workflow control which Data Transfer operations are allowed for other users. For example, if you restrict Data Transfer to Jupyter only, then other users will not be able to download your workflow results as a .csv file, nor download the results of any derived workflow.
  • If you create a workflow using multiple shared datasources, then the most restrictive Data Transfer settings of any datasource will apply to your workflow.
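
One way to picture the "most restrictive settings" rule is as a set intersection over the Data Transfer types allowed by each shared datasource. The sketch below is only an illustration under that assumption. The type names `csv` and `jupyterhub` and the function `effective_transfer_types` are hypothetical, and the admin-controlled Data Transfer permission is not modeled.

```python
# Hypothetical names for the two Data Transfer methods mentioned above.
ALL_TRANSFER_TYPES = frozenset({"csv", "jupyterhub"})


def effective_transfer_types(shared_datasource_settings):
    """Return the transfer types allowed for a workflow built on shared datasources.

    Each element of shared_datasource_settings is the set of transfer types
    that one shared datasource allows. Datasources you own impose no
    restriction, so they are simply left out of the list.
    """
    allowed = set(ALL_TRANSFER_TYPES)
    for settings in shared_datasource_settings:
        # Assumption: "most restrictive" means a type must be allowed by every datasource.
        allowed &= set(settings)
    return allowed


# Only your own datasources: nothing restricts you.
print(sorted(effective_transfer_types([])))  # ['csv', 'jupyterhub']

# One datasource allows everything, the other only JupyterHub.
print(sorted(effective_transfer_types([{"csv", "jupyterhub"}, {"jupyterhub"}])))  # ['jupyterhub']

# The datasources have no transfer type in common.
print(sorted(effective_transfer_types([{"csv"}, {"jupyterhub"}])))  # []
```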

Workflow Resultset Sharing Examples

First, let us define some users with the following attributes:

| User | Attributes |
| --- | --- |
| bitnobi_user_1 | organization=Bitnobi, projectA=true, projectC=true |
| bitnobi_user_2 | organization=Bitnobi, projectA=true, projectB=true, projectC=true |
| external_user_3 | organization=External, projectB=true |
| empty_user_4 | |
| data_owner | organization=Bitnobi |

Next we define some policies with attributes as below:

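| Policy | Attribute 1 | Attribute 2 | Attribute 3 | Attribute 4 | Users match count |
| --- | --- | --- | --- | --- | --- |
| Bitnobi | organization=Bitnobi | | | | 3 |
| projectA | projectA=true | | | | 2 |
| projectB | projectB=true | | | | 2 |
| External | organization=External | | | | 1 |
| BitnobiABC | organization=Bitnobi | projectA=true | projectB=true | projectC=true | 1 |
| empty | | | | | 0 |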

Now let us create some workflows, attach policies and see which users can access their result sets.

Note that attaching 2 or more policies to a workflow creates a logical OR condition for granting access. For example, for workflow5 below, any user that belongs to projectA or projectB can access the result set.

In contrast, when multiple attributes are set in a policy, this creates an AND condition for granting access. For example, the policy BitnobiABC requires that a user be part of the Bitnobi organization and be a member of projectA, projectB and projectC. Thus for workflow7, aside from the data_owner, only bitnobi_user_2 can access its result set.

| Workflow | Access control | Users able to access |
| --- | --- | --- |
| workflow1 | | data_owner |
| workflow2 | Bitnobi | data_owner, bitnobi_user_1, bitnobi_user_2 |
| workflow3 | projectA | data_owner, bitnobi_user_1, bitnobi_user_2 |
| workflow4 | projectB | data_owner, bitnobi_user_2, external_user_3 |
| workflow5 | projectA, projectB | data_owner, bitnobi_user_1, bitnobi_user_2, external_user_3 |
| workflow6 | External | data_owner, external_user_3 |
| workflow7 | BitnobiABC | data_owner, bitnobi_user_2 |
| workflow8 | empty | data_owner |
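
The access lists above can be reproduced with a short sketch that combines the two rules: AND across the attributes of one policy, OR across the policies attached to a workflow. The data structures and function names are hypothetical and only mirror the example users, policies and workflows. The sketch also assumes that the owner always has access and that a policy with no attributes matches no one.

```python
USERS = {
    "bitnobi_user_1": {"organization": "Bitnobi", "projectA": "true", "projectC": "true"},
    "bitnobi_user_2": {"organization": "Bitnobi", "projectA": "true",
                       "projectB": "true", "projectC": "true"},
    "external_user_3": {"organization": "External", "projectB": "true"},
    "empty_user_4": {},
    "data_owner": {"organization": "Bitnobi"},
}

POLICIES = {
    "Bitnobi": {"organization": "Bitnobi"},
    "projectA": {"projectA": "true"},
    "projectB": {"projectB": "true"},
    "External": {"organization": "External"},
    "BitnobiABC": {"organization": "Bitnobi", "projectA": "true",
                   "projectB": "true", "projectC": "true"},
    "empty": {},
}

# Policies attached to each workflow; every workflow is owned by data_owner.
WORKFLOWS = {
    "workflow1": [],
    "workflow2": ["Bitnobi"],
    "workflow3": ["projectA"],
    "workflow4": ["projectB"],
    "workflow5": ["projectA", "projectB"],
    "workflow6": ["External"],
    "workflow7": ["BitnobiABC"],
    "workflow8": ["empty"],
}
OWNER = "data_owner"


def policy_matches(policy, user):
    """AND rule: the user must carry every attribute/value pair in the policy."""
    # Assumption: an empty policy matches no one.
    return bool(policy) and all(user.get(k) == v for k, v in policy.items())


def users_with_access(workflow):
    """OR rule: matching any one attached policy grants access; the owner always has it."""
    attached = [POLICIES[name] for name in WORKFLOWS[workflow]]
    return sorted(
        name for name, attrs in USERS.items()
        if name == OWNER or any(policy_matches(p, attrs) for p in attached)
    )


for workflow in WORKFLOWS:
    print(workflow, users_with_access(workflow))
```

Running the loop prints one access list per workflow, which can be compared row by row with the table.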

Workflows that each user should have access to as datasources (for workflows, reports and data transfers):

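| User | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bitnobi_user_1 | | ✔️ | ✔️ | | ✔️ | | | |
| bitnobi_user_2 | | ✔️ | ✔️ | ✔️ | ✔️ | | ✔️ | |
| external_user_3 | | | | ✔️ | ✔️ | ✔️ | | |
| empty_user_4 | | | | | | | | |
| data_owner | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |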