(Detailed Design) Aleph2 security architecture

Overview

The purpose of this page is to describe the requirements for the Aleph2 security service, together with some thoughts on design and implementation.

Requirements

Top level requirements

  • Provide 2 different authorization routes:
    • For "internal" assets (such as buckets) authenticated via an internal security service
      • Example: can a user access a given bucket or shared library object?
      • (initially this will be backed by the v1 management database; eventually it will be backed by a plugin)
    • For other assets route directly to an external security service (eg LDAP/SAML)
      • Example: a harvest developer wishes to restrict access to certain parts of the file system, or to certain TCP ports, to specific users (user groups?) - he protects the desired action with a token that can then be duplicated in the back-end service (eg LDAP)
      • As a result of this requirement, the architecture should be "compliant" (whatever that means) with LDAP/SAML (?)
  • The system should (eventually) support federation (eg users on one cluster can be assigned permissions in a different cluster)
  • For buckets, access will be hierarchical, like a file system
    • By default (ie unless overridden) lower level buckets will inherit their access restrictions from their parent
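
As a very rough illustration of the two routes, the security service interface might look something like the sketch below. This is purely illustrative - the names and signatures are not the actual Aleph2 API:

```java
// Hypothetical sketch only - interface and method names are illustrative
public interface ISecurityService {

    /** Route 1: "internal" assets (buckets, shared libraries, etc),
     *  initially backed by the v1 management DB, later by a plugin.
     *  role is eg "r", "w", or "admin". */
    boolean hasAssetPermission(String user_id, String asset_id, String role);

    /** Route 2: "external" assets - an arbitrary token registered by
     *  (eg) a harvest developer, checked against LDAP/SAML or an
     *  internal token store. */
    boolean hasExternalPermission(String user_id, String token, String role);
}
```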

Lower level requirements

High priority, ie "phase 1":

  • Only administrators can upload libraries
    • (in the v1 context this means that the v1/v2 sync service will ignore otherwise valid buckets not uploaded by a v1 admin)
  • Users and buckets (via their owner ids) can only access shared libraries if they have read permission
    • (As noted above, write permission is only assigned to administrators.)
  • Internal assets can be given access rights from either v1 user groups or v1 communities/data groups
    • in the latter case, the groups are converted into user groups and users at check time (potentially with caching)
  • Bucket access (illustrated in the sketch after this list):
    • Users can only read a bucket if they have specific read access to it ... or the bucket has no specific permissions but the user has read access to the first parent bucket with specific read access
    • Users can only write to a bucket if they have specific write access to it ... or the bucket has no specific permissions but the user has write access to the first parent bucket with specific write access
    • Users can only create buckets if they have read permission all the way up the hierarchy and write permission in the immediate hierarchy
  • Integrated "External technologies" (ie things with Aleph2 contexts) will inherit their access rights from the bucket owner (ie all the assets available from the contexts -such as access to the underlying data services- will be wrapped in an authorization layer by default, though the developer will be able to override this)
  • An "analytics bucket" can only read from other buckets if the user that owns it has access rights to those buckets

Medium priority, ie "phase 2":

  • Developers of external modules can request an "external" security service check against an arbitrary string (or does it have to be more structured?)
    • When this code is executed, the security service then makes a check against the external assets, eg delegating out to LDAP (? see below) with the user credentials
    • (Caleb pointed out that a "hybrid" LDAP/internal security might be problematic - eg how does LDAP know about the internal tokens if only the user/group info is inside LDAP; I think Rob said you can sync LDAP with your own internal token information - but still a preferable alternative might be to build an access module with a very simple CRUD interface that lets you add tokens and associate them with user groups, and then use that access module's CRUD state for the lookup)
    • On assets - ideally these could be templated rules, eg "{port} > 1024", but it's not clear that's possible within (say) SAML, which we want to remain compatible with. You can probably work around it in any case ... eg have rules for "port<=1024" and "port>1024" and then an if statement to decide which to apply
  • All security lookups will use caches
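
To sketch the caching point, assuming Guava is available, a LoadingCache in front of the (hypothetical) group-membership lookup might look like this - repeated checks then avoid hitting the v1 DB/LDAP:

```java
import java.util.Set;
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

/** Illustrative only: caches user id -> user group memberships */
public class CachedMembershipLookup {

    private final LoadingCache<String, Set<String>> user_groups_cache =
        CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.MINUTES) // tolerate slightly stale permissions
            .build(new CacheLoader<String, Set<String>>() {
                @Override
                public Set<String> load(String user_id) {
                    return lookupGroupsFromBackend(user_id); // v1 DB or LDAP
                }
            });

    public Set<String> getGroups(String user_id) {
        return user_groups_cache.getUnchecked(user_id);
    }

    private Set<String> lookupGroupsFromBackend(String user_id) {
        throw new UnsupportedOperationException("stub - real lookup goes here");
    }
}
```

(The expiry time is a trade-off between how long revoked permissions linger and the load on the backing store.)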

Design/implementation thoughts

Note these are not intended to be binding, just my thoughts as I worked through these requirements:

v1/v2 integration

  • Accessing the underlying v1 database (the prototype used the Java driver, which uses the REST API, but I don't think that's desirable) provides the following lookup types:
    • userid -> data groups/communities and user groups, and global admin or not
    • user group id -> list of users
    • data group/community id -> list of (user,role) pairs and (user group,role) pairs
      • v1 roles are: member (read only), content_publisher (read/write), moderator/owner (read/write)
  • V1 sources (ie synchronized to buckets in v2) currently can be assigned to a single data group/community only (though of course that can map to multiple users or user groups)
  • V1 shares (ie synchronized to shared library beans in v2) can be assigned to multiple user groups or data groups or communities
  • Therefore in at most 2 hops you can get from a user id to the assets that need to be authorized
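
Expressed as a (purely illustrative) Java interface, those lookup types might look like:

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the v1 lookups listed above */
public interface IV1AuthLookups {

    /** user id -> data groups/communities, user groups, and admin flag */
    V1UserInfo getUserInfo(String user_id);

    /** user group id -> member user ids */
    Set<String> getUsersInGroup(String user_group_id);

    /** data group/community id -> (user, role) and (user group, role) pairs */
    List<MembershipRole> getDataGroupMembership(String data_group_id);

    class V1UserInfo {
        public Set<String> data_group_ids;
        public Set<String> user_group_ids;
        public boolean is_admin;
    }
    class MembershipRole {
        public String principal_id;  // user id or user group id
        public boolean is_user_group;
        public String role;          // eg "member", "content_publisher", "owner"
    }
}
```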

Therefore the three main v1-related workflows would be:

  • Can the user read/write/create a specific bucket (ignoring hierarchical authorization)?
    1. Take the user_id and the bucket.access_rights (which is a list of data groups) and pass them into the security service with an "r" role
    2. The security service accesses the v1 DB (or the cache) to map the data group to a list of (user,role)s and (user group,role)s
    3. If the user is present, done; else use the v1 DB (or the cache) to map the user groups with appropriate roles (this is more for "w") to a list of users, done.
  • Can a user upload a shared library object?
    1. Take the user id from the shared library bean and pass it to the security service with the "admin" role (no user groups/etc here)
    2. The security service checks the v1 database (or cache) and checks if the user is admin, done
      • (Oh one subtlety I've just thought of is that the "config area" for shared library beans should be optionally (and on by default) secure, ie you can only see them if you have read permission. That will let people put passwords and things in there.)
  • Can a bucket access a shared library object?
    1. Take the owner_id from the bucket, and the shared_library.access_rights, and pass them into the security service with an "r" role
    2. As for buckets, above; except there are multiple tokens instead of 1, and they can be user groups as well as data groups (the initial data group->user group/user is not required in this case)
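
Putting the first of these workflows together, a sketch of the bucket read check (reusing the hypothetical IV1AuthLookups interface above) might be:

```java
import java.util.List;

/** Illustrative implementation of workflow 1 (ignoring hierarchy) */
public class BucketReadCheck {

    private final IV1AuthLookups lookups; // backed by the v1 DB or a cache

    public BucketReadCheck(IV1AuthLookups lookups) { this.lookups = lookups; }

    /** user_id + bucket.access_rights (a list of data groups), role "r" */
    public boolean canRead(String user_id, List<String> bucket_data_groups) {
        for (String group_id : bucket_data_groups) {
            for (IV1AuthLookups.MembershipRole m : lookups.getDataGroupMembership(group_id)) {
                if (!roleGrantsRead(m.role)) continue;
                if (!m.is_user_group && m.principal_id.equals(user_id)) {
                    return true; // first hop: user listed directly
                }
                if (m.is_user_group
                        && lookups.getUsersInGroup(m.principal_id).contains(user_id)) {
                    return true; // second hop: user reached via a user group
                }
            }
        }
        return false;
    }

    private boolean roleGrantsRead(String role) {
        return true; // all v1 roles grant at least read; a "w" check would filter here
    }
}
```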

v2 security (still backed by the v1 management database)

There are quite a lot of different cases here; obviously if we need to do some re-architecting to make security more tractable then we will.

An initial outline:

  • Most non-core code interacts with the core via various context objects (IEnrichmentModuleContext, IAccessContext, IHarvestContext)
  • In most cases the context will be associated with a bucket (hence an owner_id)
    • (Or in the case of an access context, each REST call will generally have a user associated with it, ie via the v1 cookie)
  • These contexts provide a smallish number of context-specific calls, but also the ability to access any of the underlying data services
    • Currently the only generic interfaces common across all the data services are ICrudService and IManagementCrudService
      • Many top-level services allow access to CRUD services - eg the shared-library/bucket/bucket-status data stores from the CoreManagementDbService, or the per-bucket storage in ElasticsearchSearchIndex.
      • There are 2 types of CRUD service in practice:
        • A "pure" CRUD service (eg aleph2_crud_service_mongodb), which just provides direct access to the underlying technology ... normally a given user will have read/write permissions across an entire CRUD service
        • A "proxying" CRUD service (eg DataBucketCrudService in aleph2_management_db_service), where actions typically have side effects - in such cases a user will typically have access to some objects but not others
      • In the former case, it is necessary/sufficient to protect just the access to the CRUD service itself (eg ISearchIndexService.getCrudService).
        • These will probably have to be defended on a case-by-case basis in the data service implementation, though it would be nice to come up with a generic wrapper to do it. The good news is that there aren't that many cases outside of the CoreManagementDbService, so it shouldn't get too messy.
      • In the latter case, the individual calls must be authorized. I put in place 2 candidate ways of handling this:
        • There's a CRUD wrapper that enables you to attach lambdas to each different call (see the sketch after this list)
        • The CRUD interface includes a getFilteredRepo call that you can pass an AuthenticationBean into (this is currently an empty bean). What the CRUD service actually does with this depends on the CRUD implementation (currently MongoDB or Elasticsearch - both currently do nothing with it, obviously)
          • (the getFilteredRepo call also lets you pass in a ProjectBean - projects are a concept where read access to assets is restricted as if the user didn't have read permission, though actually it's just to filter unwanted data)
        • So the basic idea would be that calls handing out a CRUD service would wrap it and also pass in the AuthenticationBean (in case the underlying technology can make access faster/safer using that information - not something we'd worry about for now though)
      • I don't have any sort of plan for handling other "random" services/calls - perhaps it's not necessary because they won't typically be accessed so you can leave it up to the "external technology" developer to protect?
    • So the general idea would be something like: by default the contexts inherit the security of the bucket they are handling, plus an extra getFullAccess() call, which returns a clone of the context but with the security flag unset.
      • (NOTE: the point isn't to prevent the developer from making whichever calls they want, it's to make it easy for the developer to avoid accidentally opening up access in a way that would enable users to accidentally/deliberately subvert security)
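
As a sketch of the lambda-wrapper idea from the list above (the nested ICrud interface below is a minimal stand-in, not the real ICrudService):

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Illustrative wrapper that attaches authorization lambdas to each call */
public class SecuredCrudWrapper<T> {

    /** Minimal stand-in for a CRUD service interface */
    public interface ICrud<T> {
        void store(T obj);
        Optional<T> getById(String id);
    }

    private final ICrud<T> delegate;
    private final Predicate<T> write_check; // eg "is the caller an admin?"
    private final Predicate<T> read_filter; // eg "does the caller have 'r' on this object?"

    public SecuredCrudWrapper(ICrud<T> delegate, Predicate<T> write_check, Predicate<T> read_filter) {
        this.delegate = delegate;
        this.write_check = write_check;
        this.read_filter = read_filter;
    }

    public void store(T obj) {
        if (!write_check.test(obj)) throw new SecurityException("write not permitted");
        delegate.store(obj);
    }

    public Optional<T> getById(String id) {
        return delegate.getById(id).filter(read_filter); // hide unauthorized objects
    }
}
```

In this scheme the context's getFullAccess() call would simply hand back the unwrapped delegate.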

External asset protection

As described above, the idea here is that harvesters and analytic modules will make lots of different types of access to the underlying OS/hardware/etc, some of which the module developer will want to "secure". These should be handled generically so that the core doesn't have to be changed every time someone thinks of a new asset.

As noted above, ideally these could be complex rules, but (certainly initially) just string tokens (preferably human readable) should be fine.

So the general idea would be that in some "external database" (which could be LDAP, or as mentioned above could be a simple CRUD store in Aleph2), you'd have a DB of tokens and their associated (user group, role)s and then apply authorization as in the standard case - sketched below.
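
A minimal sketch of what such a token store might hold (bean and field names are illustrative):

```java
import java.util.List;

/** Hypothetical token bean stored in the external DB / CRUD store */
public class ExternalTokenBean {
    public String _id;                  // the token itself, eg "my_harvester.port_gt_1024"
    public List<GroupRole> permissions; // who may perform the protected action

    public static class GroupRole {
        public String user_group_id;
        public String role;             // eg "r" or "w"
    }
}
```

A module would then protect an action with something like a (hypothetical) securityService.hasExternalPermission(user_id, "my_harvester.port_gt_1024", "r") call, which looks the token up in this store (or LDAP) and resolves group membership as usual.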

Moving away from the v1 management DB

Longer term the desire is to remove all vestiges of the v1 management DB.

The only bits not covered by the existing core are users and user groups (communities/data groups are not part of v2 and will only be used as long as needed to ease integration with v1).

The idea (subject to change - I think Rob recommended keeping some of this internal) would then be to use the LDAP/SAML data model and store all the information there rather than in a v2 data store (except for in-memory caching etc).
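
For example, assuming a conventional LDAP schema (all hostnames, DNs, and attribute names below are placeholders), a group lookup via plain JNDI might look like:

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

/** Illustrative only: find the groups a user belongs to via JNDI/LDAP */
public class LdapGroupLookup {
    public static void main(String[] args) throws Exception {
        final Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // placeholder host
        final DirContext ctx = new InitialDirContext(env);
        try {
            final SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            controls.setReturningAttributes(new String[] { "cn" });

            // groups whose "member" attribute contains the user's DN (placeholder schema)
            final NamingEnumeration<SearchResult> results = ctx.search(
                "ou=groups,dc=example,dc=com",
                "(member=uid={0},ou=users,dc=example,dc=com)",
                new Object[] { "some_user" },
                controls);
            while (results.hasMore()) {
                System.out.println(results.next().getAttributes().get("cn"));
            }
        }
        finally {
            ctx.close();
        }
    }
}
```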

Prioritization

The priorities follow the order of the design/implementation notes above:

  • (Very high prio, ASAP) Get the v1/v2 sync service protected. Together with carefully-coded external technologies, this will result in a safe system.
  • (Medium prio) Protect the v2 context/data services - this will make external technology development much easier
  • (Lower prio) Add access to external assets - this is still quite important I think; the main SNC customer has mentioned this as a requirement.
  • (Not a priority at all) Ditching v1 and moving to a pure v2 environment.