Documentation: Adding schema type checking support for OPA inputs and contextual data

Problem Statement:

We propose to augment Rego's type system to take into account the schemas for inputs and data documents. This will add precision to type checking and help prevent errors when writing Rego code. It will help users by rejecting erroneous code statically and improving developer productivity. We adopt JsonSchema as the format for providing schemas.

This work entails the following steps:

  1. Augmenting the OPA infrastructure to accept schemas in a variety of usage scenarios.
  2. Transformation of JsonSchema to Rego Object types. JsonSchemas can be complex descriptions with references, definitions, conditionals, and disjunctions. They need to be resolved before being transformed to Rego types. We leverage an existing open-source code base for schema resolution.
  3. Augmenting Rego with schema annotations on rules to address the issue of schema overloading. This happens when a rule can be applied to inputs having different schemas, or to schemas that are parametrized by another schema (e.g., a Kubernetes admission review that applies to resources of different types).
  4. Augmenting OPA type checking to integrate Rego types obtained from JsonSchemas.

Our development branch: https://github.com/aavarghese/opa/tree/schemaSupport

Usage Scenarios

We support various usage scenarios to accommodate different ways in which users may want to take advantage of this feature. When performing a query, the user always needs to upload input data, so we support uploading a schema together with that input in several ways. For example, the user can supply an input schema on the command line via a --schema flag when running opa eval or opa run.

Another supported usage scenario is to upload schemas as base documents using OPA's Data APIs. The schemas are then like any other base document and can be read from storage during type checking. In this scenario, the user relies on annotations in the Rego code to associate Rego expressions with the corresponding schema. This feature can be used to give a schema to the input, as well as to other data documents.

Below is a complete list of the supported usage scenarios.

OPA eval

We added a new command-line flag to opa eval to support uploading a single input schema JSON file:

-s, --schema string set schema file path

Example: Consider the envoy request schema provided at: https://github.com/aavarghese/opa-schema-examples/tree/main/envoy

opa eval data.envoy.authz.allow -i example/envoy/input.json -d example/envoy/policy.rego -s example/envoy/input-schema.json

Schema Demo

We also extended the data file argument (-d, --data string: set data file(s) or directory path(s)) to support uploading of data schema JSON files.

Example: opa eval data.kubernetes.admission -i example/kubernetes/admission-review.json -d example/kubernetes/pod.rego -d schemas.kubernetes.pod:example/kubernetes/pod-schema.json -d schemas.kubernetes.admission:example/kubernetes/admission-schema.json

Notice the schemas. prefix convention in the data file paths for data schema JSON files.

OPA run

We can pass an input schema to the OPA run REPL on the command line by prefixing the schema file path with repl.input.schema.

Example: opa run example.rego repl.input:input.json repl.input.schema:input-schema.json

OPA run (server)

There are three ways of uploading input/data schema information when running OPA as a server:

First, we added a new command-line flag to opa run to support uploading a single input schema JSON file (or a directory of schemas):

-S, --schema string                        set path of schema directory or file

Example: opa run --server --addr=localhost:8181 --diagnostic-addr=0.0.0.0:8282 --ignore=.* example/envoy/policy.rego --schema=example/envoy/input-schema.json


Second, an `input` schema can be wrapped in the same JSON object as the input and uploaded via the OPA Data APIs:
{
    "input": {<value>},
    "schema": {<value>}
}
curl localhost:8181/v1/data/envoy/authz/allow -d @v1-data-inputWithSchema.json -H 'Content-Type: application/json'

Finally, schemas can be uploaded to the server as data documents via OPA's Data APIs. They are assumed to be under the path data/schemas, which can have an arbitrary hierarchy underneath. The user employs annotations in the Rego code to associate schemas with various Rego expressions (see below). This feature can be used to give a schema to either the input or any data document.

http PUT localhost:8181/v1/data/somedata < pod.json
http PUT localhost:8181/v1/data/schemas/somedata < pod-schema.json

Schema Annotations

A rule can be annotated with a comment of the form:

#@rulesSchema=<expression>:data.schemas.<path-to-schema>,...,<expression>:data.schemas.<path-to-schema>

An expression is of the form <input|data>.field1. ... .fieldN

This annotation associates a schema (uploaded as a data document via OPA's Data APIs) with the corresponding expression. So it can be used to give a schema to the input or any data document. The type checker derives a Rego Object type for the schema and an appropriate entry is added to the type environment. This entry is removed upon exit from the rule.

Annotations allow overriding when a prefix of an expression has a type in the type environment (see example below).

Notice that currently the annotation needs to appear on the line immediately preceding the rule definition.

Examples:

#@rulesSchema=input.request:data.schemas.io.k8s.admission.v1.pod
#@rulesSchema=data.XYZ:data.schemas.ABC

A complete example:

package kubernetes.admission                                                

#@rulesSchema=input:data.schemas.kubernetes.admission,input.request.object:data.schemas.kubernetes.pod
deny[msg] {                                                              
  input.request.kind.kind == "Pod"                                          
  image := input.request.object.spec.containers[_].image                    
  not startswith(image, "hooli.com/")                                       
  msg := sprintf("image '%v' comes from untrusted registry", [image])       
}

The above rule annotation indicates that the input has a type derived from the admission schema and that, in addition, input.request.object has a type derived from the pod schema. The second annotation overrides the type in the first annotation for the path input.request.object. Notice that the order of annotations matters for overriding to work correctly.

Implementation

We leveraged an open-source JSON schema compiler (https://github.com/xeipuuv/gojsonschema) to resolve JsonSchema and obtain a structure that no longer has references. We then traverse this structure recursively to obtain a Rego Object type. This type is added to the type environment appropriately. The following document lists features of JsonSchema that will need special attention in the future: https://github.com/aavarghese/opa/blob/schemaSupport/example/UnderstandingJSONSchema.md

When a JsonSchema has a property with type object but no specified properties, we interpret it as the Rego type Any. Otherwise, the type system would not allow access to any fields of that property in the Rego code.
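For illustration, here is a minimal sketch of such a traversal (not the code on the development branch): it assumes the schema has already been resolved into a plain Go map (no $ref or combinators), handles only the basic JSON types, and uses the exported helpers of OPA's types package. The toRegoType name is ours, for illustration only.

package main

import (
    "fmt"

    "github.com/open-policy-agent/opa/types"
)

// toRegoType is a simplified, hypothetical converter: it assumes the schema has
// already been resolved (no $ref, no combinators) and handles only the basic
// JSON types. The traversal on the development branch covers more cases.
func toRegoType(schema map[string]interface{}) types.Type {
    switch schema["type"] {
    case "boolean":
        return types.B
    case "string":
        return types.S
    case "integer", "number":
        return types.N
    case "array":
        if items, ok := schema["items"].(map[string]interface{}); ok {
            return types.NewArray(nil, toRegoType(items))
        }
        return types.NewArray(nil, types.A)
    case "object":
        props, ok := schema["properties"].(map[string]interface{})
        if !ok || len(props) == 0 {
            // An object with no declared properties is interpreted as Any so
            // the type checker does not reject field accesses on it.
            return types.A
        }
        var static []*types.StaticProperty
        for name, sub := range props {
            static = append(static, types.NewStaticProperty(name, toRegoType(sub.(map[string]interface{}))))
        }
        return types.NewObject(static, nil)
    default:
        return types.A
    }
}

func main() {
    // A tiny, already-resolved schema: an object with a single string property.
    schema := map[string]interface{}{
        "type": "object",
        "properties": map[string]interface{}{
            "kind": map[string]interface{}{"type": "string"},
        },
    }
    fmt.Println(toRegoType(schema)) // prints the derived Rego type, e.g. object<kind: string>
}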

When an annotation includes an expression with a path, we take all the prefixes of that path and look them up in the type environment. If any prefix has an entry in the environment, then we override that entry instead of creating a new entry. This has the effect of overriding previous types corresponding to the same path. To enter a type in the environment, we use types.Refs to create a key for that path and Put the Rego Object type obtained from the schema under that key in the environment.
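As a rough illustration of that prefix lookup (again a hedged sketch, not the branch's code): below, the type environment is modeled as a plain map keyed by dotted paths, and overrideAt is a hypothetical helper standing in for however an existing entry would be rebuilt; only static object properties are handled.

package main

import (
    "fmt"
    "strings"

    "github.com/open-policy-agent/opa/types"
)

// env is a simplified stand-in for the type environment: dotted paths to types.
type env map[string]types.Type

// setSchemaType records the type derived from a schema for an annotated path.
// If a prefix of the path already has an entry (e.g. input when annotating
// input.request.object), that entry is overridden in place rather than adding
// a second, overlapping entry for the longer path.
func (e env) setSchemaType(path string, tpe types.Type) {
    parts := strings.Split(path, ".")
    for i := len(parts) - 1; i > 0; i-- {
        prefix := strings.Join(parts[:i], ".")
        if existing, ok := e[prefix]; ok {
            e[prefix] = overrideAt(existing, parts[i:], tpe)
            return
        }
    }
    e[path] = tpe
}

// overrideAt is a hypothetical helper: it rebuilds an object type so that the
// field reached by path carries tpe, leaving the other static properties alone.
func overrideAt(existing types.Type, path []string, tpe types.Type) types.Type {
    if len(path) == 0 {
        return tpe
    }
    obj, ok := existing.(*types.Object)
    if !ok {
        // Not an object (e.g. Any): build a fresh nested object for the rest of the path.
        return types.NewObject([]*types.StaticProperty{
            types.NewStaticProperty(path[0], overrideAt(types.A, path[1:], tpe)),
        }, nil)
    }
    var static []*types.StaticProperty
    found := false
    for _, p := range obj.StaticProperties() {
        if name, isStr := p.Key.(string); isStr && name == path[0] {
            static = append(static, types.NewStaticProperty(p.Key, overrideAt(p.Value, path[1:], tpe)))
            found = true
        } else {
            static = append(static, p)
        }
    }
    if !found {
        static = append(static, types.NewStaticProperty(path[0], overrideAt(types.A, path[1:], tpe)))
    }
    return types.NewObject(static, obj.DynamicProperties())
}

func main() {
    e := env{}
    // First annotation: input gets a type derived from the admission schema
    // (abbreviated here to a single property).
    e.setSchemaType("input", types.NewObject([]*types.StaticProperty{
        types.NewStaticProperty("request", types.A),
    }, nil))
    // Second annotation: input.request.object gets a type derived from the pod
    // schema (abbreviated here to a string). The existing input entry is
    // overridden rather than creating a second entry.
    e.setSchemaType("input.request.object", types.S)
    fmt.Println(e["input"]) // the overridden type for input
}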

Notice that the OPA Go library should work as-is but needs testing.

Use Cases

Envoy

Envoy's input schema: https://github.com/aavarghese/opa/blob/schemaSupport/example/envoy/input-schema.json

The application pod from the quick start tutorial (https://github.com/open-policy-agent/opa-envoy-plugin/blob/master/quick_start.yaml) was modified to use our custom-built OPA image (with type checking) as a sidecar with a --schema input argument, where the schema JSON file is loaded via a volume-mounted ConfigMap.

          image: avarghese23/opa:0.24.0-envoy-16
          securityContext:
            runAsUser: 1111
          volumeMounts:
            - readOnly: true
              mountPath: /policy
              name: opa-policy
            - readOnly: true
              mountPath: /schema
              name: opa-schema
            - readOnly: true
              mountPath: /config
              name: opa-envoy-config
          args:
            - "run"
            - "--server"
            - "--config-file=/config/config.yaml"
            - "--addr=localhost:8181"
            - "--schema=/schema/input-schema.json"
            - "--diagnostic-addr=0.0.0.0:8282"
            - "--ignore=.*"
            - "/policy/policy.rego"
# Example schema to type check the above policy
############################################################
apiVersion: v1
kind: ConfigMap
metadata:
  name: opa-schema
data:
  input-schema.json: |
    {
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "http://example.com/example.json",
    "type": "object",
    "title": "The root schema",
    "description": "The root schema comprises the entire JSON document.",
    "properties": {
        "attributes": {
...

"For production deployments, OPA Envoy recommends serving policy Bundles from a remote HTTP server." Hence, we will need to implement type checking support when schemas are part of a Bundle, instead of loading via a volume-mounted ConfigMap.

Future Work

  • Write test cases
  • Use cases: Terraform OPA https://www.openpolicyagent.org/docs/latest/terraform/ (also see Schematics)
  • Implementation change: Instead of Any for a JSON object type with no properties, we can have a Rego Object with no static properties but having dynamic property of type Any.
  • New Features:
    • Transitivity: What is the scope of a rulesSchema annotation when another rule is referenced inside the annotated rule? Do schema annotations apply transitively? This would be useful when a helper rule is called from another rule. We might need to union types if the schemas in scope are not consistent. This may require more changes to the type checker since it operates bottom-up.
    • opa eval support for a directory of schemas + Bundle plugin extension for schemas
    • Support for features of JsonSchema (enum, additionalProperties, anyOf, etc.)
      • enum: extend String type to have an enum. Use during type checking for equality expressions (unification?). Example: p["foo"]{true}, p[input.x]
      • additionalProperties: use Object type dynamic props of type Any
      • anyOf: use types.Or
    • Support other types of annotations (global, statement-level)
      • Associate schemas with virtual documents (imports, look into OPA’s httpSend plugin)
  • Extend VS Code plugin for OPA (under the hood uses opa eval). It should take a directory of schemas.
  • Annotation inference (Gatekeeper: use webhook configurations)

References:

  1. Discussion from the OPA community: https://github.com/open-policy-agent/opa/issues/1449
  2. Kubernetes JSON Schemas library: https://github.com/instrumenta/kubernetes-json-schema
  3. Envoy plugin example: https://github.com/open-policy-agent/opa-envoy-plugin/blob/master/quick_start.yaml
  4. User-friendly HTTP client: https://httpie.io/
  5. JSON to JSON Schema online tool: https://jsonschema.net/