Deploying Texera on Amazon Web Services (AWS)

1. Create an EKS cluster on AWS

Go to the EKS Console and log in with your AWS account. Click "Create Cluster".


Use the default configuration to create your cluster, and give it a name of your choice. If "Cluster IAM role" and "Node IAM role" are empty, click "Create recommended role" and follow the guided steps. Then, click "Create".


The cluster will take about 15–20 minutes to be created and reach an Active state. You can refresh the dashboard to monitor the progress.


2. Install AWS CLI and Helm

To access the cluster, you have the following two options:

  1. Follow the AWS CLI installation guide to install the AWS CLI in your local environment. Once installed, create an access key by clicking your account name in the top-right corner → Security credentials.


    On the security credentials page, click "Create access key".


    Follow the prompts and copy the access key ID and secret access key.


    Then open a terminal on your computer, run aws configure, and paste the copied credentials when prompted. Also enter the default region shown on your cluster dashboard, and leave the "output format" at its default value (a sample configure session is sketched after this list).

  2. Use AWS CloudShell. Make sure CloudShell is running in the same region as your cluster.

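For reference, an aws configure session looks roughly like this (the key and secret shown are AWS's documented example credentials; substitute your own values and region):

aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Output format [None]: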

Once you have a terminal (either local or CloudShell) ready to run aws commands, set the environment variable:

EKS_CLUSTER_NAME=<your-cluster-name>

Update your kubeconfig to use the new cluster:

aws eks update-kubeconfig --name $EKS_CLUSTER_NAME
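
To confirm that kubectl now points at the new cluster, you can inspect the active context; for an EKS cluster it should resemble an ARN:

kubectl config current-context
# expected output is similar to:
# arn:aws:eks:<region>:<account-id>:cluster/<your-cluster-name>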

Verify the connection:

kubectl get all
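
On a freshly created cluster, this typically returns only the built-in kubernetes Service, along the lines of (values are illustrative):

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   10m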

By default, AWS will not provision load balancers for Services of type LoadBalancer unless the cluster's subnets are tagged appropriately. To enable both internet-facing and internal load balancers, run the following command to tag your subnets:

aws ec2 create-tags \
  --resources $(aws eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.resourcesVpcConfig.subnetIds" --output text) \
  --tags Key=kubernetes.io/role/elb,Value=1 Key=kubernetes.io/role/internal-elb,Value=1
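
To double-check that the tags were applied, you can list them back (a quick sanity check; the output format may vary):

aws ec2 describe-subnets \
  --subnet-ids $(aws eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.resourcesVpcConfig.subnetIds" --output text) \
  --query "Subnets[].Tags" --output table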

3. Create two NGINX ingress controllers for Texera

Install Helm by following the Helm installation guide. Then execute the following commands in your terminal:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm upgrade --install nginx-texera ingress-nginx/ingress-nginx \
  --namespace texera --create-namespace \
  --set controller.replicaCount=1 \
  --set controller.ingressClassResource.name=nginx \
  --set controller.ingressClassResource.controllerValue="k8s.io/ingress-nginx" \
  --set controller.ingressClass=nginx \
  --set controller.service.type=LoadBalancer \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-scheme"="internet-facing"

helm upgrade --install nginx-minio ingress-nginx/ingress-nginx \
  --namespace texera --create-namespace \
  --set controller.replicaCount=1 \
  --set controller.ingressClassResource.name=nginx-minio \
  --set controller.ingressClassResource.controllerValue="k8s.io/nginx-minio" \
  --set controller.ingressClass=nginx-minio \
  --set controller.service.type=LoadBalancer \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-scheme"="internet-facing"

Wait for 1–2 minutes, then run:

kubectl get svc -n texera

When the EXTERNAL-IP fields are populated, assign the hostnames to environment variables:

TEXERA_IP=$(kubectl get svc nginx-texera-ingress-nginx-controller -n texera -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
MINIO_IP=$(kubectl get svc nginx-minio-ingress-nginx-controller -n texera -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
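
Before moving on, confirm that both variables are non-empty; a minimal check:

echo "Texera LB: $TEXERA_IP"
echo "MinIO LB: $MINIO_IP"

If either line prints blank, wait a little longer and re-run the kubectl get svc command above.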

4. Create a StorageClass for Texera

Create a file named ebs-storage.yaml with the following content:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: auto-ebs-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.eks.amazonaws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
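
Note that the provisioner above, ebs.csi.eks.amazonaws.com, is the one used by EKS Auto Mode clusters. If your cluster instead runs the standard EBS CSI driver add-on, the provisioner name differs; a likely variant (verify against your cluster's installed driver):

provisioner: ebs.csi.aws.com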

Then apply it:

kubectl apply -f ebs-storage.yaml

Validate the StorageClass (StorageClasses are cluster-scoped, so no namespace flag is needed):

kubectl get storageclass

5. Prepare the Texera deployment

Execute the following bash commands.

curl -L -o texera.zip https://github.com/Texera/texera/releases/download/1.1.0/texera-cluster-1-1-0-release.zip
unzip texera.zip -d texera-cluster
rm texera.zip
helm dependency build texera-cluster
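
If the build succeeds, the chart's dependencies should all be resolved; a quick sanity check:

helm dependency list texera-cluster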

6. Deploy Texera

helm install texera texera-cluster --namespace texera --create-namespace \
  --set postgresql.primary.persistence.storageClass="auto-ebs-sc" \
  --set ingress-nginx.enabled=false \
  --set metrics-server.enabled=false \
  --set exampleDataLoader.enabled=false \
  --set minio.customIngress.enabled=true \
  --set minio.customIngress.ingressClassName=nginx-minio \
  --set minio.customIngress.texeraHostname="http://$TEXERA_IP" \
  --set minio.persistence.storageClass="auto-ebs-sc" \
  --set-string lakefs.lakefsConfig="$(cat <<EOF
database:
  type: postgres
blockstore:
  type: s3
  s3:
    endpoint: http://texera-minio:9000
    pre_signed_expiry: 15m
    pre_signed_endpoint: http://$MINIO_IP
    force_path_style: true
    credentials:
      access_key_id: texera_minio
      secret_access_key: password
EOF
)" \
  --set ingressPaths.hostname=""

Done!

  • It may take 10-15 minutes to fully launch the deployment.
  • During the process, you can periodically execute kubectl get pods -n texera to see the status of the deployed pods.
  • Once every pod is in a Running or Completed status, you can execute echo $TEXERA_IP to get the public address of the Texera WebUI (on AWS this is a load balancer DNS hostname rather than a raw IP).
  • Then you can access Texera at http://<TEXERA_IP>.
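
A quick way to test reachability from the terminal before opening a browser (assuming curl is installed):

curl -sI "http://$TEXERA_IP" | head -n 1
# a status line such as HTTP/1.1 200 OK indicates the WebUI is being served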

To remove the Texera deployment from your Kubernetes cluster, execute the following bash commands.

helm uninstall texera -n texera
helm uninstall nginx-texera -n texera
helm uninstall nginx-minio -n texera
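
helm uninstall typically leaves PersistentVolumeClaims (and their backing EBS volumes) behind. If you are certain you no longer need the data, you can delete them as well; note this is irreversible:

kubectl delete pvc --all -n texera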

Advanced Configuration

You can customize the deployment by adding the following --set flags to your helm install command. These flags allow you to configure authentication, resource limits, and the number of pods for the Texera deployment.

Texera Credentials

Texera relies on Postgres, MinIO, and LakeFS, all of which require credentials. You can change the default values to make your deployment more secure.

Default Texera Admin User

Texera ships with a built-in administrator account (username: texera, password: texera). To supply your own credentials during installation, pass the following Helm overrides:

# USER_SYS_ADMIN_USERNAME
--set texeraEnvVars[0].value="<user-name>" \
# USER_SYS_ADMIN_PASSWORD
--set texeraEnvVars[1].value="<password>" \
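
Note for zsh users: square brackets are glob characters in zsh, so index-based flags like texeraEnvVars[0] should be quoted, for example:

--set 'texeraEnvVars[0].value=<user-name>' \
--set 'texeraEnvVars[1].value=<password>' \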

MinIO Authentication

--set minio.auth.rootUser=texera_minio \
--set minio.auth.rootPassword=password \

PostgreSQL Authentication (username is always postgres)

--set postgresql.auth.postgresPassword=root_password \

💡 Note: If you change the PostgreSQL password, you also need to change the following and add it to the install command:

--set lakefs.secrets.databaseConnectionString="postgres://postgres:root_password@texera-postgresql:5432/texera_lakefs?sslmode=disable" \

LakeFS Authentication

--set lakefs.auth.username=texera-admin \
--set lakefs.auth.accessKey=AKIAIOSFOLKFSSAMPLES \
--set lakefs.auth.secretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
--set lakefs.secrets.authEncryptSecretKey=random_string_for_lakefs \
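
Rather than a hand-typed value, you may prefer a randomly generated encryption secret; one common approach (assuming openssl is available):

--set lakefs.secrets.authEncryptSecretKey="$(openssl rand -hex 32)" \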

Allocating Resources

If your cluster has more available resources, you can allocate additional CPU, memory, and disks to Texera to improve the performance.

Postgres

To allocate more CPU, memory, and disk to Postgres:

--set postgresql.primary.resources.requests.cpu=4 \
--set postgresql.primary.resources.requests.memory=4Gi \
--set postgresql.primary.persistence.size=50Gi \

MinIO

To increase the storage available for users' input datasets:

--set minio.persistence.size=100Gi

Computing Unit

To customize options for the computing unit, do:

# MAX_NUM_OF_RUNNING_COMPUTING_UNITS_PER_USER
--set texeraEnvVars[5].value="2" \
# CPU_OPTION_FOR_COMPUTING_UNIT
--set texeraEnvVars[6].value="1,2,4" \
# MEMORY_OPTION_FOR_COMPUTING_UNIT
--set texeraEnvVars[7].value="2Gi,4Gi,16Gi" \
# GPU_LIMIT_OPTIONS (allow 0 or 1 GPU to be allocated)
--set texeraEnvVars[8].value="0,1" \

Adjusting Number of Pods

Scale out individual services for high availability or increased performance:

--set webserver.numOfPods=2 \
--set workflowCompilingService.numOfPods=2 \
--set pythonLanguageServer.replicaCount=2 \

Retaining User Data

By default, all user data stored by Texera will be deleted when the cluster deployment is removed. Since user data is valuable, you can preserve all datasets and files even after uninstalling the cluster by setting:

--set persistence.removeAfterUninstall=false