PDP 46 (Read Only Permissions for Reading Data)

Status: Early Draft

Table of Contents:

Motivation
Analysis
Proposed Changes
References

Motivation

Today, reading data from a stream requires write (READ_UPDATE) permission.

Suppose you have a reader group application, PriceChangeCalculator, that reads data from the stream StockPriceUpdates in the MarketData scope. The application must run under an account that has write permissions for the following resources:

MarketData (the scope)
MarketData/StockPriceUpdates (the stream)
MarketData/_RGPriceChangeCalculator (the internal stream for the reader group)
MarketData/_MARKStockPriceUpdates (the internal stream where watermarks for the stream are emitted)

This is problematic in at least three respects.

Firstly, it is counterintuitive that reads require write, and not read, permissions.

Secondly, write permission grants the account more than reads alone warrant. It not only lets the account read data from the stream but also allows it to:

Update the stream's configuration
Delete the stream
Write data to the stream

So, granting the user write permission on the stream violates the "principle of least privilege".

Thirdly, reads require access not only to the scope and the stream but also to the internal streams (_RGPriceChangeCalculator and _MARKStockPriceUpdates in the earlier example) used for storing reader group state and watermarks. This exposes internal abstractions to the admin/operator. For example, when using the Password Auth Handler, the admin needs to configure the following Access Control List (ACL) for the user account:

MarketData,READ_UPDATE;MarketData/StockPriceUpdates,READ_UPDATE;
MarketData/_RGPriceChangeCalculator,READ_UPDATE;MarketData/_MARKStockPriceUpdates,READ_UPDATE

The admin/operator could use wildcard characters to simplify the ACL, as shown below, but this ACL grants even broader permissions to the user.

MarketData,READ_UPDATE;MarketData/*,READ_UPDATE;
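
To make the difference concrete, here is a minimal Python sketch that parses the two ACL strings above and checks which resources each one grants. The helper names (`parse_acl`, `grants`), the glob-style treatment of `*`, and the stream `MarketData/OrderBook` are assumptions made for illustration; the actual matching rules of the Password Auth Handler may differ.

```python
from fnmatch import fnmatchcase


def parse_acl(acl):
    """Split 'resource,PERMISSION;resource,PERMISSION' into (resource, permission) pairs."""
    entries = []
    for entry in acl.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        resource, permission = entry.rsplit(",", 1)
        entries.append((resource.strip(), permission.strip()))
    return entries


def grants(acl, resource, permission):
    """Return True if any ACL entry matches the resource with the given permission.

    Assumption: '*' is treated as a glob wildcard. This is an illustrative
    rule, not necessarily the one the real auth handler applies.
    """
    return any(
        fnmatchcase(resource, pattern) and perm == permission
        for pattern, perm in parse_acl(acl)
    )


explicit_acl = (
    "MarketData,READ_UPDATE;MarketData/StockPriceUpdates,READ_UPDATE;"
    "MarketData/_RGPriceChangeCalculator,READ_UPDATE;"
    "MarketData/_MARKStockPriceUpdates,READ_UPDATE"
)
wildcard_acl = "MarketData,READ_UPDATE;MarketData/*,READ_UPDATE;"

# Hypothetical unrelated stream in the same scope.
other_stream = "MarketData/OrderBook"

explicit_covers_other = grants(explicit_acl, other_stream, "READ_UPDATE")
wildcard_covers_other = grants(wildcard_acl, other_stream, "READ_UPDATE")
```

Under these assumptions, the explicit ACL does not cover the unrelated stream while the wildcard ACL does, which is the broadening described above.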

Analysis

Why does reading data from a stream require write permissions? Here are a few technical reasons:

S. No.	Resource	Example	Why is the permission needed for the resource?
1.	The scope	`MarketData`	Creating a reader group (RG) requires the creation of internal streams, which in turn require write access to the scope.
2.	The stream	`MarketData/StockPriceUpdates`	`getSegments` call to Controller requires write permissions. That call is invoked in the read path, for fetching the segments of the stream for reading. Fetching delegation tokens from the server via the Controller's `getDelegationToken` operation also requires this permission.
3.	The internal stream for the reader group	`MarketData/_RGPriceChangeCalculator`	`getCurrentSegments` call to Controller requires write permissions. In the read path, that call is invoked for this internal stream.
4.	The internal watermark stream for the stream	`MarketData/_MARKStockPriceUpdates`	`getCurrentSegments` call to Controller requires write permissions. In the read path, that call is invoked for this internal stream.

Proposed Changes

The following sub-sections describe the proposed changes.

I. Reduce permissions required for methods returning delegation tokens to `read`

Today, the following Controller gRPC operations require the caller to possess write (READ_UPDATE) permissions. All three return delegation tokens, and are invoked for both reads and writes.

getDelegationToken(StreamInfo) returns (DelegationToken)

getCurrentSegments(StreamInfo) returns (SegmentRanges)

message SegmentRanges {
    repeated SegmentRange segmentRanges = 1;
    string delegationToken = 2;
}

getSegments(GetSegmentsRequest) returns (SegmentsAtTime)

message SegmentsAtTime {
   message SegmentLocation {
      ...
   }
   repeated SegmentLocation segments = 1;
   string delegationToken = 2;
}

Reducing their permission to read will enable readers to read data with read-only permissions. As we'll see in the next sub-section, doing this throws up another problem.

II. Distinguish between required and target permissions when generating delegation tokens

Let's first distinguish between two different types of permission that apply to the three Controller gRPC operations mentioned in the previous section:

The "minimum required permission (MRP)": The minimum permission that is required for authorizing the call.
The "requested permission (RP)": The permission that the caller wants to be assigned on the delegation token issued by the server (Controller).

Sub-section I proposes changing the MRP to read. Currently, the RP is the same as the MRP for those operations, so reducing the MRP to read would also cause the delegation tokens to carry only read permission, which will not work for writes.

The solution is to distinguish the two types of permissions. Request messages of operations that return delegation tokens shall be enhanced to take RP as an optional input parameter, which will be used in the authorization and delegation token generation process as follows:

If the RP is not specified in the input, authorization is done solely based on the MRP. The delegation tokens the operations return will have the MRP assigned to the bearer.
If the RP is specified and it is the same as the MRP, the call is authorized based on the RP/MRP and the delegation tokens are assigned for the same.
If the RP is specified and it is different from the MRP, the call is authorized based on the MRP, but the server also checks that the caller possesses the requested permission before issuing a delegation token. The token itself contains the RP.
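
The three rules above can be sketched as follows. This is a minimal Python illustration, not the Controller's implementation: the `Permission` enum, its ordering (READ_UPDATE implies READ), and the function names are assumptions. `parse_requested` assumes that an empty string, the proto3 default for an unset string field, means "RP not specified".

```python
from enum import IntEnum


class Permission(IntEnum):
    # Assumed ordering: a higher value implies every lower permission.
    NONE = 0
    READ = 1
    READ_UPDATE = 2


class AuthorizationError(Exception):
    """Raised when the caller lacks a permission needed for the call or token."""


def parse_requested(value):
    """Map the wire value to a Permission; '' (unset field) means no RP was given."""
    return None if value == "" else Permission[value]


def token_permission(caller, minimum_required, requested=None):
    """Return the permission to place on the delegation token.

    caller           -- permission the caller holds on the resource
    minimum_required -- MRP of the operation being invoked
    requested        -- RP from the request, or None if not specified
    """
    # The call itself is always authorized against the MRP.
    if caller < minimum_required:
        raise AuthorizationError("caller lacks the minimum required permission")

    # Rule 1: no RP given, so the token carries the MRP.
    if requested is None:
        return minimum_required

    # Rules 2 and 3: the caller must actually hold the RP before it is
    # placed on the token. When RP == MRP this check is already satisfied.
    if caller < requested:
        raise AuthorizationError("caller lacks the requested permission")
    return requested
```

For example, a writer holding READ_UPDATE that calls an operation whose MRP is READ and asks for READ_UPDATE would receive a READ_UPDATE token, whereas a read-only caller making the same request would be rejected.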

The existing StreamInfo message can be modified to include the requested permission (RP). That covers the first two operations, getDelegationToken and getCurrentSegments.

message StreamInfo {
    string scope = 1;
    string stream = 2;

    // New parameter. Default value is empty string as per https://developers.google.com/protocol-buffers/docs/proto3#default.
    string requestedPermission = 3;
}

Since the GetSegmentsRequest request message includes StreamInfo internally, the third operation, getSegments, is also covered.

III. Do not require the same level of permissions for internal streams

Currently, internal streams created for reader groups and watermarking are authorized exactly the same way as normal streams in both read and write paths. For example, when a reader group is set up on the client-side, an internal stream of the form scopeName/_RGreaderGroupName is created. Stream creation requires write permissions on the scope, which is problematic for read-only clients. Similarly, the readers publish state to the _RG stream, which requires write permissions on the stream.

The solution is to treat these internal streams differently with respect to authorization. Operations that require write permissions for a normal stream should require only read permissions for internal streams. The downside, from a security perspective, is that internal streams can then be updated using a user account that has read-only permissions. That might not be a big problem for these reasons:

Client applications cannot work with internal streams directly. The client API prevents them from doing so. That doesn't entirely prevent all security attacks through the internal stream vector, but it does make the internal streams harder to compromise.
Even with read permissions, client applications must be authenticated and have explicit permissions on the internal streams to compromise them. Besides, audit logging can be used to identify potential attempts to compromise them.
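
The relaxed rule for internal streams could look like the following sketch. It is written in Python for brevity and is not the actual authorization code: the function names are hypothetical, and detecting internal streams by the `_RG` and `_MARK` name prefixes is an assumption based on the stream names used in this document.

```python
# Assumption: internal streams are recognizable by these name prefixes.
INTERNAL_STREAM_PREFIXES = ("_RG", "_MARK")


def is_internal_stream(stream_name):
    """Return True for reader group and watermark streams."""
    return stream_name.startswith(INTERNAL_STREAM_PREFIXES)


def required_permission(stream_name, normal_requirement):
    """Permission an operation needs on the given stream.

    normal_requirement is what the operation needs on a normal stream
    ("READ" or "READ_UPDATE"). For internal streams, a write requirement
    is relaxed to read, as proposed above.
    """
    if is_internal_stream(stream_name) and normal_requirement == "READ_UPDATE":
        return "READ"
    return normal_requirement
```

With this rule, a reader publishing state to `_RGPriceChangeCalculator` would need only read permission, while writing to `StockPriceUpdates` would still need READ_UPDATE.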

References

Versioning gRPC Services https://docs.microsoft.com/en-us/aspnet/core/grpc/versioning?view=aspnetcore-3.1