SAK Kubeflow QUICKSTART - Evanto/qna GitHub Wiki

Quickstart: Deploy Kubeflow on AWS EKS with Terraform

README | Swiss Army Kube (umbrella repository) | Provectus

Overview

This repository is a template of a Kubeflow EKS cluster for your ML projects. Modify the main.tf file to set up a cluster, deploy it to AWS with Terraform commands, and manage it with the ArgoCD UI/CLI (or kubectl) and Terraform. This simple yet powerful workflow lets you quickly configure, provision, and replicate multiple dedicated ML-ready Kubeflow Kubernetes clusters (with different variable settings, networks, Kubernetes versions, etc.).

Quickstart Contents

  1. Prerequisites
  2. Cluster Configuration
  3. Cluster Deployment
  4. Cluster Access and Management

1. Install Prerequisites

First, fork and clone this repository. Next, create/install the following prerequisites.

Create an AWS Account and IAM User

  • If you don't have an AWS account and IAM user yet, please use this official guide.

Install AWS CLI

Install Terraform
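
Before moving on, it can help to confirm that both CLIs are on your PATH. The following is a minimal sketch of such a check; `check_tool` is a hypothetical helper name used only here, not something the repository provides.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper name used only in this sketch.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the two CLIs this quickstart relies on; keep going even if one is missing.
for tool in aws terraform; do
  check_tool "$tool" || true
done

# Once both are installed, these commands print the installed versions
# and the IAM identity the AWS CLI is currently using:
#   aws --version
#   terraform version
#   aws sts get-caller-identity
```

If `aws sts get-caller-identity` fails, the AWS CLI most likely has no credentials configured yet for your IAM user.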


2. Configure Your Cluster

To set up your cluster, modify the following configuration files as you need:

  • backend.hcl
  • main.tf

Configure backend.hcl

backend.hcl holds the settings for the Terraform backend: the S3 bucket and key where the Terraform state is stored, and the DynamoDB table used for state locking.

Example configuration of backend.hcl:

bucket         = "bucket-with-terraform-states"
key            = "some-key/kubeflow-sandbox"
region         = "region-where-bucket-placed"
dynamodb_table = "dynamodb-table-for-locks"
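
Terraform's S3 backend expects the bucket to exist already and, when dynamodb_table is set, a table whose partition key is a string attribute named LockID. If you still need to create them, the sketch below shows one way to do it with the AWS CLI. The resource names are the placeholders from the example above, the default region `eu-west-1` is an assumption, and the `run` helper with its `DRY_RUN` switch is specific to this sketch: by default it only prints the commands.

```shell
#!/bin/sh
# Placeholder values: keep them in sync with backend.hcl.
BUCKET="bucket-with-terraform-states"
TABLE="dynamodb-table-for-locks"
REGION="${REGION:-eu-west-1}"   # assumption: replace with the region you use
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = run them

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

create_state_bucket() {
  if [ "$REGION" = "us-east-1" ]; then
    # us-east-1 rejects an explicit LocationConstraint.
    run aws s3api create-bucket --bucket "$BUCKET" --region "$REGION"
  else
    run aws s3api create-bucket --bucket "$BUCKET" --region "$REGION" \
      --create-bucket-configuration "LocationConstraint=$REGION"
  fi
}

create_lock_table() {
  # The S3 backend locks state through a string partition key named LockID.
  run aws dynamodb create-table --table-name "$TABLE" --region "$REGION" \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
}

create_state_bucket
create_lock_table
```

Review the printed commands first, then rerun with `DRY_RUN=0` to create the resources.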

Configure main.tf

The minimal required set of variables you need to configure your Kubeflow EKS cluster is shown in the example below and consists of the following:

  • mainzoneid

  • domains

  • admin_arns

  • cert_manager_email

  • cognito_users

Example configuration of main.tf:

terraform {
  backend "s3" {}
}

module "sak_kubeflow" {
  source = "git::https://github.com/provectus/sak-kubeflow.git?ref=init"

  cluster_name = "simple"

  owner      = "github-repo-owner"
  repository = "github-repo-name"
  branch     = "branch-name"

  # Main Route 53 zone ID, if one exists (change it)
  mainzoneid = "id-of-route53-zone"

  # Domain names to use for the service endpoints
  domains = ["sandbox.some.domain.local"]

  # ARNs of users who will have admin permissions.
  admin_arns = [
    {
      userarn  = "arn:aws:iam::<aws-account-id>:user/<username>"
      username = "<username>"
      groups   = ["system:masters"]
    }
  ]

  # Email address used for Let's Encrypt notifications
  cert_manager_email = "<email-address>"

  # An optional list of users for Cognito Pool
  cognito_users = [
    {
      email    = "<email-address>"
      username = "qa"
      group    = "masters"
    },
    {
      email    = "<email-address>"
      username = "developer"
    }
  ]

  argo_path_prefix = "examples/simple/"
  argo_apps_dir    = "argocd-applications"
}

In most cases, you'll need to override variables related to the GitHub repository (such as repository, branch, owner) in the main.tfvars file.
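
If you are unsure which values to use for mainzoneid and admin_arns, the AWS CLI can look them up. The sketch below assumes the AWS CLI is already configured; the function names are hypothetical helpers, and `some.domain.local` is the placeholder domain from the example.

```shell
#!/bin/sh
# Route 53 returns zone IDs as "/hostedzone/<id>"; keep only the last path segment.
zone_id_from_path() {
  echo "${1##*/}"
}

# Look up the hosted zone ID for a domain (requires AWS credentials).
# The listing starts at the given name, so if no zone with that exact name
# exists, the first result is a different zone: check its Name before using it.
lookup_zone_id() {
  zone_path="$(aws route53 list-hosted-zones-by-name --dns-name "$1" \
    --query 'HostedZones[0].Id' --output text)"
  zone_id_from_path "$zone_path"
}

# Print the ARN of the identity the AWS CLI is currently using.
# For an assumed role this is a role session ARN, not an IAM user ARN.
current_user_arn() {
  aws sts get-caller-identity --query Arn --output text
}

# Example usage (uncomment once the AWS CLI is configured):
#   lookup_zone_id "some.domain.local"
#   current_user_arn
```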


3. Deploy Your Cluster to AWS

Deploy your configured cluster with the following commands:

terraform init -backend-config=backend.hcl
terraform apply
aws --region <region> eks update-kubeconfig --name <cluster-name>

What these commands do:

  • Initialize Terraform with the backend file and download all remote dependencies
  • Create a clean EKS cluster along with all required AWS resources (IAM roles, ASGs, S3 buckets, etc.)
  • Update your local kubeconfig file to access your newly created EKS cluster in the configured context

After that, you can manage your Kubernetes cluster with either ArgoCD CLI/UI or kubectl.

To use the kubectl Kubernetes CLI for cluster management, install and configure it using this official guide.
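
Once kubectl is installed, a few read-only commands are enough to confirm that the updated kubeconfig points at the new cluster. This is a minimal sketch; `verify_cluster_access` is a hypothetical helper name.

```shell
#!/bin/sh
# Run a few read-only checks against the cluster selected in kubeconfig.
verify_cluster_access() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl is not installed"
    return 1
  fi
  kubectl config current-context &&   # context written by update-kubeconfig
    kubectl get nodes &&              # worker nodes should report Ready
    kubectl get namespaces            # namespaces created so far
}

# Call it after `aws eks update-kubeconfig` has succeeded:
#   verify_cluster_access
```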


4. Cluster Access and Management

Prepare to start using your cluster

Terraform commands will generate a few files in the default apps folder of the repository. You need to commit them to Git and push them to your GitHub repository to start deploying services to your EKS Kubernetes cluster.

Note that ArgoCD is pre-configured to track changes of the current repository. When new changes come to the apps folder, it triggers the synchronization process and all objects placed in that folder get created.
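
The commit-and-push step described above can be sketched as follows. The folder name `apps` is the default mentioned in the text; if your main.tf sets a different path (for example through argo_path_prefix and argo_apps_dir), pass that path instead. The function name and the commit message are hypothetical.

```shell
#!/bin/sh
# Stage and commit everything under the generated folder.
# "apps" is the default folder named above; pass another path if yours differs.
commit_generated_apps() {
  apps_dir="${1:-apps}"
  git add -- "$apps_dir" &&
    git commit -m "Add generated ArgoCD applications"
}

# Typical use from the repository root, after `terraform apply`:
#   commit_generated_apps apps
#   git push origin <branch-name>   # the branch set as `branch` in main.tf
```

Pushing to the branch that ArgoCD tracks is what triggers the synchronization.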

Access Kubeflow and ArgoCD UI

By default, two endpoints for accessing services will be created:

  • ArgoCD https://argocd.some.domain.local
  • Kubeflow https://kubeflow.some.domain.local

To access these URLs, configure the Cognito User Pool whose name matches your cluster name. Your login credentials will be emailed to the addresses you set in cognito_users in main.tf.
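
To locate that user pool from the command line, the AWS CLI can list pools and filter by name. The sketch below assumes the pool name equals the cluster name, as stated above; `simple` is the cluster name from the example main.tf, and the function names are hypothetical.

```shell
#!/bin/sh
# Print the ID of the Cognito user pool with the given name.
# Only the first 20 pools are listed; raise --max-results (up to 60) if needed.
find_user_pool_id() {
  aws cognito-idp list-user-pools --max-results 20 \
    --query "UserPools[?Name=='$1'].Id" --output text
}

# Print the usernames registered in a user pool.
list_pool_users() {
  aws cognito-idp list-users --user-pool-id "$1" \
    --query 'Users[].Username' --output text
}

# Example usage (uncomment once the cluster has been deployed):
#   pool_id="$(find_user_pool_id simple)"
#   list_pool_users "$pool_id"
```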

To get started with Kubeflow and ArgoCD, please refer to their respective official documentation.