SAK Kubeflow Quickstart v.1 - Evanto/qna GitHub Wiki

Swiss Army Kube for Kubeflow: an open-source tool for easy setup and deployment of AWS EKS Kubeflow Kubernetes clusters with Terraform

Join us on Slack


Overview

SAK Kubeflow is a free IaC tool for easy setup and deployment of AWS EKS Kubernetes clusters pre-configured with Kubeflow, helping you quickly bring your ML to production. With SAK Kubeflow you get a production-ready Kubeflow cluster up and running on AWS in a fraction of the time it normally takes. In the process, you also create a GitOps workflow.

SAK Kubeflow uses Terraform for cluster configuration and deployment, ArgoCD to manage all Kubernetes resources (including ArgoCD itself), and Cognito to manage user authentication with user pools.

The SAK Kubeflow repository is a template of a cluster structure for your projects. Modify the .tfvars template to set up your cluster, deploy it on AWS with a couple of Terraform commands, and manage it with the ArgoCD UI/CLI. This simple yet powerful workflow allows you to configure and provision multiple dedicated ML-ready Kubeflow Kubernetes clusters (with different variable settings, networks, Kubernetes versions, etc.) in no time.

SAK Kubeflow is based on the main Swiss-Army-Kube repository; it is a modification of SAK for the Kubeflow setup, built from SAK's collection of modules.

Quickstart Contents

  1. Prerequisites
  2. Cluster Configuration
  3. Cluster Deployment
  4. Cluster Access and Management

1. Prerequisites

  • Fork and clone this repository
  • Install Terraform
  • Install AWS CLI

AWS Account

You should have an Amazon AWS account and a configured IAM user. If you don't have them yet, please follow this official guide.

Install AWS CLI

Install AWS CLI using this official guide.

Install Terraform

Install Terraform using this official guide.
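Once both tools are installed, you can sanity-check your setup with a few commands (a minimal sketch; the exact minimum versions depend on the repository's provider constraints, and `aws configure` assumes you are setting up a default profile):

```shell
# Check that both tools are on the PATH and report their versions
terraform version
aws --version

# Configure AWS credentials for your IAM user if you haven't already
aws configure

# Verify that the credentials resolve to the expected IAM identity
aws sts get-caller-identity
```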

2. Cluster Configuration

To set up your cluster, modify these two configuration files as needed:

  • backend.hcl
  • terraform.tfvars

Configure backend.hcl

Example configuration:

bucket         = "bucket-with-terraform-states"
key            = "some-key/kubeflow-sandbox"
region         = "region-where-bucket-placed"
dynamodb_table = "dynamodb-table-for-locks"

This backend configuration tells Terraform where to store its state file (the S3 bucket and key), which region the bucket is in, and which DynamoDB table to use for state locking.
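The S3 bucket and DynamoDB table must exist before you run `terraform init`. If you don't have them yet, here is a hedged sketch of creating them with the AWS CLI (bucket, table, and region names below are placeholders matching the example above; Terraform requires the lock table to have a string hash key named `LockID`):

```shell
# Create the state bucket (for regions other than us-east-1,
# add: --create-bucket-configuration LocationConstraint=<region>)
aws s3api create-bucket --bucket bucket-with-terraform-states --region us-east-1

# Enable versioning so previous state files can be recovered
aws s3api put-bucket-versioning --bucket bucket-with-terraform-states \
  --versioning-configuration Status=Enabled

# Create the lock table with the LockID string hash key Terraform expects
aws dynamodb create-table --table-name dynamodb-table-for-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```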

Configure terraform.tfvars

Example configuration:

# Main Route53 zone ID, if one exists.
mainzoneid = "id-of-route53-zone"

# Domain names to use for the cluster endpoints
domains = ["sandbox.some.domain.local"]

# ARNs of users who will have admin permissions.
admin_arns = [
  {
    userarn  = "arn:aws:iam::<aws-account-id>:user/<username>"
    username = "<username>"
    groups   = ["system:masters"]
  }
]

# Email used for Let's Encrypt notifications
cert_manager_email = "[email protected]"

# An optional list of users for Cognito Pool
cognito_users = [
  {
    email    = "[email protected]"
    username = "qa"
    group    = "masters"
  },
  {
    email    = "[email protected]"
    username = "developer"
  }
]

In most cases, you'll also need to override variables related to the GitHub repository (such as repository, branch, owner) in the terraform.tfvars file.
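For example (an illustrative sketch; the variable names follow the ones mentioned above, but check variables.tf in your fork for the exact names your repository defines):

```hcl
# Point ArgoCD at your fork instead of the upstream template
repository = "swiss-army-kube-kubeflow"
branch     = "main"
owner      = "<your-github-username>"
```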

3. Cluster Deployment

Deploy your configured cluster with the following three commands:

terraform init --backend-config backend.hcl
terraform apply
aws --region <region> eks update-kubeconfig --name <cluster-name>

These commands:

  • Initialize Terraform with the backend file and download all remote dependencies
  • Create a clean EKS cluster with all required AWS resources (IAM roles, Auto Scaling groups, S3 buckets, etc.)
  • Update your local kubeconfig file to access your newly created EKS cluster in the configured context

After that, you can manage your Kubernetes cluster with either the ArgoCD CLI/UI or kubectl (install and configure kubectl to manage the cluster from the command line).
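A quick way to confirm that your kubeconfig was updated and the cluster is reachable (the namespace names below are typical defaults and may differ in your setup):

```shell
# Nodes should show as Ready once the Auto Scaling groups have joined the cluster
kubectl get nodes

# Check ArgoCD and Kubeflow workloads; adjust namespaces if your setup differs
kubectl get pods -n argocd
kubectl get pods -n kubeflow
```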

4. Cluster Access and Management

Prepare to start using your cluster

The Terraform commands generate a few files in the default apps folder of the repository. Commit and push them to your GitHub repository to start deploying services to your EKS Kubernetes cluster.

Note that ArgoCD is pre-configured to watch the current repository. When new changes land in the apps folder, ArgoCD triggers a synchronization, and all objects declared in that folder are created.
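The commit-and-push step might look like this (a sketch; the branch name is an assumption, so use whichever branch you configured in terraform.tfvars):

```shell
# Commit the generated manifests so ArgoCD can pick them up
git add apps/
git commit -m "Add generated ArgoCD application manifests"
git push origin main
```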

Access Kubeflow and ArgoCD

By default, two endpoints for accessing services will be created:

  • ArgoCD https://argocd.some.domain.local
  • Kubeflow https://kubeflow.some.domain.local

To access these URLs, configure the Cognito User Pool whose name matches your cluster name.
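If you prefer the ArgoCD CLI over the UI, you can log in through the same Cognito-backed endpoint (a sketch; the hostname matches the example domain above, and the `--sso` flag opens a browser window for single sign-on):

```shell
# Authenticate against the ArgoCD endpoint via SSO, then list applications
argocd login argocd.some.domain.local --sso
argocd app list
```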

