k8s poc sow - noetl/noetl GitHub Wiki

NoETL Kubernetes Proof-of-Concept. Scope of work.

The proof-of-concept of NoETL on Kubernetes should consist of three parts:

  1. Orchestrator (back-end).
  2. Web user interface (front-end).
  3. Sample action.

1. Orchestrator (back-end)

The orchestrator is a Go application that must be statically (cross-)compiled for the linux/amd64 target. Static compilation is crucial because the application must be packaged into a Docker container built from the empty "scratch" image, which provides no shared libraries to link against at run time.

The application must write standard log messages to stdout, rather than to stderr. Only panic/fatal messages should be written to stderr. For this proof-of-concept, use Go's standard "log" package; do not import fancy custom loggers. The "log" package writes to stderr by default, so redirect it:

import "log"
import "os"

log.SetOutput(os.Stdout) // call once at the start of main()

The application shall expect the following environment variables to connect to an etcd cluster:

  • ETCD_HOSTS: comma-separated list of IP addresses or host names of the etcd cluster (example: 10.50.10.11, 10.50.10.12, 10.50.10.13).
  • ETCD_USERNAME: username to connect to the etcd cluster.
  • ETCD_PASSWORD: password of the user that will connect to the etcd cluster.

Upon start, the application shall retrieve the values of the environment variables above. If any of them is missing, it shall write an error message to stderr and panic. The panic shall terminate the application.
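The start-up check could be sketched as follows. This is only an illustration under assumptions: the EtcdConfig type and the loadEtcdConfig function are hypothetical names, an empty value is treated the same as a missing one, and main uses a demo map where the orchestrator would pass os.Getenv.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// EtcdConfig holds the connection settings read from the environment.
// The type and field names are illustrative, not part of the spec.
type EtcdConfig struct {
	Hosts    []string
	Username string
	Password string
}

// loadEtcdConfig reads the three ETCD_* variables through getenv and
// returns an error that names every variable that is missing or empty.
func loadEtcdConfig(getenv func(string) string) (EtcdConfig, error) {
	var missing []string
	get := func(name string) string {
		v := getenv(name)
		if strings.TrimSpace(v) == "" {
			missing = append(missing, name)
		}
		return v
	}
	rawHosts := get("ETCD_HOSTS")
	cfg := EtcdConfig{Username: get("ETCD_USERNAME"), Password: get("ETCD_PASSWORD")}
	if len(missing) > 0 {
		return EtcdConfig{}, errors.New("missing environment variables: " + strings.Join(missing, ", "))
	}
	// Split the comma-separated list and drop the spaces around each entry.
	for _, h := range strings.Split(rawHosts, ",") {
		if h = strings.TrimSpace(h); h != "" {
			cfg.Hosts = append(cfg.Hosts, h)
		}
	}
	if len(cfg.Hosts) == 0 {
		return EtcdConfig{}, errors.New("ETCD_HOSTS contains no host")
	}
	return cfg, nil
}

func main() {
	// Demo environment; the orchestrator would pass os.Getenv instead.
	env := map[string]string{
		"ETCD_HOSTS":    "10.50.10.11, 10.50.10.12, 10.50.10.13",
		"ETCD_USERNAME": "demo-user",
		"ETCD_PASSWORD": "demo-password",
	}
	cfg, err := loadEtcdConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Fprintln(os.Stderr, err) // the error message goes to stderr
		panic(err)                   // the panic terminates the application
	}
	fmt.Println(cfg.Hosts, cfg.Username)
}
```

Passing the lookup function as a parameter keeps the check testable without touching the real process environment.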

If all environment variables are found, try to connect to the etcd cluster at any of the IP addresses or host names given in the ETCD_HOSTS list. If the application cannot connect to any of the hosts, do not panic :), just sleep for 10 seconds and try again.
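A minimal sketch of that retry loop, using only the standard library. The dialFunc type is a placeholder for whatever etcd client call the orchestrator ends up using, and the function names are hypothetical. The demo in main uses a fake dialer and a 10 ms pause so that it finishes quickly; the orchestrator would pass 10*time.Second.

```go
package main

import (
	"errors"
	"log"
	"os"
	"time"
)

// dialFunc stands in for the real etcd connection attempt (an assumption;
// the spec does not name a client library).
type dialFunc func(host string) error

// connectAny tries every host once and returns the first one that answers.
func connectAny(hosts []string, dial dialFunc) (string, error) {
	for _, h := range hosts {
		if err := dial(h); err != nil {
			log.Printf("etcd host %s unavailable: %v", h, err)
			continue
		}
		return h, nil
	}
	return "", errors.New("no etcd host reachable")
}

// connectWithRetry loops until one host accepts the connection,
// sleeping for pause between rounds instead of panicking.
func connectWithRetry(hosts []string, dial dialFunc, pause time.Duration) string {
	for {
		h, err := connectAny(hosts, dial)
		if err == nil {
			return h
		}
		log.Printf("%v; retrying in %s", err, pause)
		time.Sleep(pause)
	}
}

func main() {
	log.SetOutput(os.Stdout)
	attempts := 0
	// Fake dialer for the demo: fails twice, then succeeds.
	dial := func(host string) error {
		attempts++
		if attempts < 3 {
			return errors.New("connection refused")
		}
		return nil
	}
	host := connectWithRetry([]string{"10.50.10.11", "10.50.10.12"}, dial, 10*time.Millisecond)
	log.Printf("connected to %s after %d attempts", host, attempts)
}
```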

Once connected to the etcd cluster, launch a web server for the front-end API on port 8080. TLS is not necessary yet.

In K8s, the application must be configured as a service with two pods, on a dedicated service IP and port 80. No TLS. Round-robin load balancing between the pods. No health check is necessary yet.

N.B.: Do NOT use any custom "extended" packages for the proof-of-concept! The standard Go library for logging and the HTTP server, bwmarrin's snowflake and the Kubernetes API would suffice.

2. Web user interface (front-end)

The front-end is a dynamic React application that executes in client browsers and interoperates with the back-end application over the API. The contents of the front-end must be packaged into a Docker container built from the official "nginx:stable-alpine" image. Using base images other than alpine is not recommended due to their larger sizes. The container should be configured to expose port 80. No TLS is necessary yet.

The front-end application must be configured as a K8s service with two pods and round-robin load balancing between them. The front-end service shall have its own dedicated service IP and shall serve on port 80.

3. Sample action

The only action the PoC accepts is to launch a K8s Job with two Pods (configured as "workers", see below). The action must be packaged as a Docker image accessible by name from within the K8s cluster. The orchestrator shall be able to launch the job programmatically via K8s API (kube-apiserver).
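As a rough illustration of what "launch the job via K8s API" involves, the sketch below builds a batch/v1 Job object with the standard library only and prints the REST path it would be posted to. Several things here are assumptions rather than part of the spec: the function names, the image name "noetl/sample-action:latest", the "default" namespace, and the mapping of "two Pods" to parallelism and completions of 2. Kubernetes object names cannot contain underscores, so the demo uses "sample-job-1" rather than "sample_job".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobManifest builds a batch/v1 Job object as a generic map so that the
// sketch depends only on the standard library.
func jobManifest(name, image string, workers int) map[string]any {
	return map[string]any{
		"apiVersion": "batch/v1",
		"kind":       "Job",
		"metadata":   map[string]any{"name": name},
		"spec": map[string]any{
			// Assumption: "two Pods" means two workers running in parallel.
			"parallelism": workers,
			"completions": workers,
			"template": map[string]any{
				"spec": map[string]any{
					"restartPolicy": "Never",
					"containers": []map[string]any{
						{"name": "worker", "image": image},
					},
				},
			},
		},
	}
}

// jobsEndpoint returns the kube-apiserver path for creating Jobs
// in the given namespace.
func jobsEndpoint(namespace string) string {
	return "/apis/batch/v1/namespaces/" + namespace + "/jobs"
}

func main() {
	// Hypothetical job and image names, for illustration only.
	body, err := json.MarshalIndent(jobManifest("sample-job-1", "noetl/sample-action:latest", 2), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println("POST", jobsEndpoint("default"))
	fmt.Println(string(body))
}
```

Sending the request from inside a pod also needs the API server address and the pod's service-account credentials; that part is left out of the sketch.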

3.1 Front-end action

When loaded, the front-end should display one big "Launch" button. When clicked, this button shall invoke the /submit_job method with the "sample_job" payload (see below). Upon a successful /submit_job (200 OK), the front-end must disable the "Launch" button and start requesting the job status every 10 seconds. The job status is requested with /list_jobs (see below).

3.2 Back-end action

When the /submit_job method is invoked, the back-end application must:

  • Create a new job record in etcd. Use a dedicated directory, for example "noetl/jobs", and create a job with a unique identity. Use bwmarrin's excellent snowflake package for generating unique identities as int64.
  • The new job must be written into etcd with key "id" under directory "noetl/jobs", where id is an int64 generated by snowflake.
  • Job's record value is a JSON object:
{
	"name":    "sample_job",
	"action":  "sample_action",
	"workers": 2,
	"status":  "created",
	"startTime": "YYYYMMDDhhmmss.sss",
	"endTime": ""
}
  • When the job record is created in etcd, the orchestrator shall start the sample_action job in the K8s cluster using the K8s API. Consult the Kubernetes API manual.

When the /list_jobs method is invoked, the orchestrator shall return the list of jobs stored in etcd under the "noetl/jobs" directory. If it is not possible to list all keys under the directory, maintain the list of jobs as an array of keys, updating it whenever a new job is created. For example, use key "list" under directory "noetl/jobs" with a JSON array as its value: ["snowflake_id1", "snowflake_id2"]. Note: ids must be written into JSON in double quotes, as JSON strings. Never ever use a JSON number for long integers!
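The reason for the string rule is that a browser client parses JSON numbers as 64-bit floats, which cannot represent every int64 exactly. The sketch below, with hypothetical function names, encodes and decodes the "list" value as strings and shows two neighbouring ids collapsing into the same float64.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// encodeJobList renders snowflake ids as a JSON array of strings.
func encodeJobList(ids []int64) ([]byte, error) {
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		out = append(out, strconv.FormatInt(id, 10))
	}
	return json.Marshal(out)
}

// decodeJobList is the inverse of encodeJobList.
func decodeJobList(data []byte) ([]int64, error) {
	var raw []string
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	ids := make([]int64, 0, len(raw))
	for _, s := range raw {
		id, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	// Two distinct ids standing in for generated snowflake values.
	ids := []int64{1 << 60, 1<<60 + 1}
	data, err := encodeJobList(ids)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
	// Pushed through float64, as a JSON number would be in a browser,
	// the two ids become indistinguishable.
	fmt.Println(float64(ids[0]) == float64(ids[1])) // prints "true"
}
```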

It would be great if /list_jobs also queried the job status in the K8s cluster.

4. PoC API

The proof-of-concept back-end and front-end should communicate over the simplest possible API, with only a few methods.

4.1 Submit Job

Request

POST /submit_job
application/json
{
	"job": {
		"name":    "sample_job",
		"action":  "sample_action",
		"workers": 2
	}
}

4.2 List Jobs

Request

GET /list_jobs

Response

200 OK
application/json
{
	"jobs": [ {
		"id": "snowflake_id",
		"name":    "sample_job",
		"action":  "sample_action",
		"workers": 2,
		"status":  "executing",
		"startTime": "YYYYMMDDhhmmss.sss",
		"endTime": ""
	}, {
		"id": "snowflake_id",
		"name":    "sample_job",
		"action":  "sample_action",
		"workers": 2,
		"status":  "completed",
		"startTime": "YYYYMMDDhhmmss.sss",
		"endTime": "YYYYMMDDhhmmss.sss"
	} ]
}

That's it. Do not expand the scope. This is enough for the first two-week exercise. You may change the API and other things if necessary, but do not expand the scope.