MinIO and Trino on Kubernetes Using Helm

Setting up Minio and Trino on Kubernetes

In this tutorial, we'll deploy a cohesive system that allows distributed SQL querying across large datasets stored in MinIO, with Trino leveraging metadata from Hive Metastore and table schemas from Redis.

Prerequisites:

  • kubectl: Kubernetes command-line tool.
  • helm: Helm package manager for Kubernetes.
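
Before starting, it can help to confirm that both tools are on your PATH. The following is a minimal sketch, assuming a Bash shell; the function name require_tools is illustrative and not part of the repository.

```shell
# Hypothetical helper: report any required CLI that is missing from PATH.
require_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  # Returns 0 when every tool was found, 1 otherwise.
  return "$missing"
}

# Usage: require_tools kubectl helm && echo "all prerequisites found"
```

The function only checks that the commands exist; it does not verify versions or cluster access.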

Initial setup

git clone https://github.com/r-scheele/trino-on-kubernetes.git
cd trino-on-kubernetes

If you'd rather not work through the full set of instructions, simply run:

bash scripts/up.sh

Here's the content of up.sh:

set -euo pipefail

BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_DIR="${BASE_DIR}/.."
(
cd "${REPO_DIR}"
kubectl create namespace trino --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret generic redis-table-definition --from-file=redis/test.json -n trino || true

# Adding Helm repos and ignoring errors if the repo already exists
helm repo add bitnami https://charts.bitnami.com/bitnami || true
helm repo add trino https://trinodb.github.io/charts/ || true


helm upgrade --install my-minio bitnami/minio -n trino -f minio/values.yaml
helm upgrade --install hive-metastore-postgresql bitnami/postgresql -n trino -f hive-metastore-postgresql/values.yaml
helm upgrade --install my-hive-metastore -n trino -f hive-metastore/values.yaml ./charts/hive-metastore
helm upgrade --install my-redis bitnami/redis -n trino -f redis/values.yaml
helm upgrade --install my-trino trino/trino --version 0.7.0 --namespace trino -f trino/values.yaml
)


You can skip to the testing section from here.

Components Overview:

  1. Minio:

    • Purpose: Minio can be used to store large datasets, like the ones typically analyzed by Trino.
  2. Hive Metastore:

    • Purpose: Hive Metastore is a service that stores metadata for Hive tables (like table schema). Trino can use Hive Metastore to determine the schema of tables when querying datasets.
  3. PostgreSQL for Hive Metastore:

    • Purpose: This is the database backend for the Hive Metastore. It's where the metadata is actually stored.
  4. Redis:

    • Purpose: In this setup, Redis stores table schemas for Trino.
  5. Trino:

    • Purpose: Trino (formerly known as Presto) is a high-performance, distributed SQL query engine. It allows querying data across various data sources like SQL databases, NoSQL databases, and even object storage like Minio.

Step-by-Step Guide:

  1. Create the Kubernetes Namespace:

    kubectl create namespace trino --dry-run=client -o yaml | kubectl apply -f -
  2. Create a Secret for Redis Table Definition: Store table schemas for Trino:

    kubectl create secret generic redis-table-definition --from-file=redis/test.json -n trino || true
  3. Add helm repositories:

    helm repo add bitnami https://charts.bitnami.com/bitnami || true
    helm repo add trino https://trinodb.github.io/charts/ || true
  4. Deploy Minio using the MinIO Operator's kubectl plugin (kubectl minio): Store large datasets for querying with Trino:

    kubectl minio init -n trino
    kubectl minio tenant create tenant-1 --servers 4 --volumes 4 --capacity 4Gi -n trino
  5. Set Up Hive Metastore with PostgreSQL: Store metadata for Hive tables, which Trino uses to determine table schema:

    helm upgrade --install hive-metastore-postgresql bitnami/postgresql -n trino -f hive-metastore-postgresql/values.yaml

    Deploy Hive Metastore:

    helm upgrade --install my-hive-metastore -n trino -f hive-metastore/values.yaml ./charts/hive-metastore
  6. Deploy Redis: Redis will store table schemas for Trino in this setup:

    helm upgrade --install my-redis bitnami/redis -n trino -f redis/values.yaml
  7. Deploy Trino: Deploy the distributed SQL query engine:

    helm upgrade --install my-trino trino/trino --version 0.7.0 --namespace trino -f trino/values.yaml
  8. Verify Your Deployments:

    kubectl get pods -n trino
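
Instead of polling kubectl get pods by hand, you can block until the pods are ready. This is a minimal sketch built on kubectl wait; the function name wait_for_pods and the 300s default timeout are assumptions made for illustration.

```shell
# Hypothetical helper: block until every pod in the namespace reports Ready.
# Arguments: namespace (default: trino), timeout (default: 300s).
wait_for_pods() {
  local namespace="${1:-trino}"
  local timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout="$timeout"
}

# Usage: wait_for_pods trino 600s
```

Note that pods belonging to completed Jobs never become Ready, so if any chart in your setup creates such pods, this wait will time out and a narrower label selector would be needed.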

Review and adjust the configurations as needed, especially from a security standpoint.

Optional: To disable TLS (and with it certificate checking) for S3 connections, add the following property to the additionalCatalogs section of your values.yaml file:

 hive.s3.ssl.enabled=false
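
After upgrading the release, you may want to confirm that the property actually reached the deployed values. The sketch below assumes the release name my-trino and the trino namespace used in this tutorial; the function name release_has_property is illustrative.

```shell
# Hypothetical helper: succeed if the deployed release's values contain the given text.
release_has_property() {
  local property="$1"
  # --all prints the computed values, including chart defaults.
  helm get values my-trino --namespace trino --all | grep -F -q -- "$property"
}

# Usage: release_has_property 'hive.s3.ssl.enabled=false' && echo "property is set"
```

This is a plain text match, so it does not check which catalog the property belongs to.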

Testing the connection between Minio and Trino

Port-forward to the MinIO service of the tenant:

   kubectl port-forward svc/minio -n trino 9443:443

Create an alias for the tenant and create a sample bucket, using the credentials from above:

   mc alias set my-minio https://localhost:9443/ minio_access_key minio_secret_key --insecure
   mc mb my-minio/tiny --insecure

To access the Trino UI in the browser:

   export POD_NAME=$(kubectl get pods --namespace trino -l "app=trino,release=my-trino,component=coordinator" -o jsonpath="{.items[0].metadata.name}")
   kubectl port-forward "$POD_NAME" 8080:8080 --namespace trino

Visit http://127.0.0.1:8080 to use the UI.
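
If you prefer to check from the terminal that the coordinator is up before opening the browser, the sketch below queries Trino's /v1/info endpoint through the port-forward. It assumes the response is compact JSON containing "starting":false once startup has finished; the function name trino_is_ready is illustrative.

```shell
# Hypothetical helper: succeed once the coordinator reports it has finished starting.
# Argument: base URL of the coordinator (default: the port-forward address above).
trino_is_ready() {
  local base_url="${1:-http://127.0.0.1:8080}"
  curl --silent --fail "$base_url/v1/info" | grep -q '"starting":false'
}

# Usage: until trino_is_ready; do sleep 5; done
```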

Or run the Trino shell:

kubectl exec -it deploy/my-trino-coordinator -n trino -- trino
SHOW CATALOGS;
SHOW SCHEMAS IN minio;
       Schema
--------------------
 default
 information_schema
CREATE SCHEMA minio.tiny
WITH (location = 's3a://tiny/');
CREATE TABLE minio.tiny.customer
WITH (
    format = 'ORC',
    external_location = 's3a://tiny/customer/'
) 
AS SELECT * FROM tpch.tiny.customer;
SELECT * FROM minio.tiny.customer LIMIT 50;
SHOW SCHEMAS IN minio;
       Schema
--------------------
 default
 information_schema
 tiny
(3 rows)
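
The same statements can also be run non-interactively, which is convenient in scripts. This is a minimal sketch using the Trino CLI's --execute option through kubectl exec; the function name trino_query is illustrative, and the deployment name is taken from the command above.

```shell
# Hypothetical helper: run one SQL statement through the Trino CLI in the coordinator pod.
trino_query() {
  kubectl exec deploy/my-trino-coordinator -n trino -- trino --execute "$1"
}

# Usage: trino_query "SELECT count(*) FROM minio.tiny.customer"
```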

Confirm the content of MinIO by running the following command:

   mc ls my-minio/tiny --insecure

Common Issues

If you encounter issues related to configurations, especially for security, review the respective values.yaml files for each component.

Optional: Connecting Trino to Minio Using HTTPS

  1. Extract MinIO's certificate using openssl:
   kubectl port-forward <POD NAME> PORT:PORT
   echo | openssl s_client -connect MINIO_ENDPOINT:MINIO_PORT | openssl x509 > minio-cert.pem

Replace MINIO_ENDPOINT and MINIO_PORT with the appropriate values.
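
Before importing the certificate, it is worth checking that the file really contains the certificate you expect. The sketch below prints its subject and validity dates with openssl x509; the function name show_cert is illustrative.

```shell
# Hypothetical helper: print the subject and validity window of a PEM certificate.
# Argument: path to the PEM file (default: minio-cert.pem, as created above).
show_cert() {
  openssl x509 -in "${1:-minio-cert.pem}" -noout -subject -dates
}

# Usage: show_cert minio-cert.pem
```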

  2. Create a Truststore for Trino: Import the MinIO certificate into a new Java truststore
   keytool -import -alias minio -file minio-cert.pem -keystore trino-truststore.jks

This will prompt you for a password. Remember it as you'll need it in the subsequent steps.
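
To confirm the import worked, you can list the entry stored under the minio alias. This is a minimal sketch; the function name truststore_has_alias is illustrative, and keytool will prompt for the truststore password.

```shell
# Hypothetical helper: list a single alias from a Java truststore.
# Arguments: keystore path (default: trino-truststore.jks), alias (default: minio).
truststore_has_alias() {
  local keystore="${1:-trino-truststore.jks}"
  local alias_name="${2:-minio}"
  keytool -list -keystore "$keystore" -alias "$alias_name"
}

# Usage: truststore_has_alias trino-truststore.jks minio
```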

  3. Create a Kubernetes Secret for the Truststore in the trino namespace, so that the Trino pods can mount it:
   kubectl create secret generic trino-truststore --from-file=trino-truststore.jks -n trino
  4. Update the values.yaml for Trino Helm Chart: Modify your values.yaml to mount the truststore and configure Trino to use it:
# ... (rest of your values.yaml content)

secretMounts:
  - name: trino-truststore
    path: "/etc/trino/truststore"
    secretName: trino-truststore

# ...

# Add to the 'config' or similar section in your helm chart:
config:
  http-server.https.enabled: "true"
  http-server.https.port: "8443"
  http-server.https.keystore.path: "/etc/trino/tls/trino.jks"
  http-server.https.keystore.key: "<password>"
  internal-communication.https.enabled: "true"
  internal-communication.https.keystore.path: "/etc/trino/tls/trino.jks"
  internal-communication.https.keystore.key: "<password>"
  internal-communication.https.truststore.path: "/etc/trino/truststore/trino-truststore.jks"
  internal-communication.https.truststore.key: "<truststore-password>"

# ... (rest of your values.yaml)

Replace <password> with the password you used when creating the JKS. Replace <truststore-password> with the password you set when creating the truststore.

  5. Upgrade Helm Release: Apply the changes:
   helm upgrade --install my-trino trino/trino --version 0.7.0 --namespace trino -f trino/values.yaml
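
After the upgrade, the coordinator restarts with the new configuration. The sketch below waits for that rollout to finish using kubectl rollout status; the function name wait_for_coordinator and the 300s default timeout are assumptions, and the deployment name is the one used earlier for the Trino shell.

```shell
# Hypothetical helper: wait until the coordinator deployment has finished rolling out.
# Argument: timeout (default: 300s).
wait_for_coordinator() {
  kubectl rollout status deployment/my-trino-coordinator \
    --namespace trino --timeout="${1:-300s}"
}

# Usage: wait_for_coordinator 600s
```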

This is a basic setup; expect to adjust the configuration to fit your environment.
