Deploying Texera on Amazon Web Services (AWS)
Go to the EKS Console and log in with your AWS account. Click "Create Cluster".

Use the default configuration to create your cluster, and give it a name of your choice. If "Cluster IAM role" and "Node IAM role" are empty, click "Create recommended role" and follow the guided steps. Then, click "Create".

The cluster will take about 15–20 minutes to be created and reach an Active state. You can refresh the dashboard to monitor the progress.

To access the cluster, you have the following two options:

- Follow the AWS CLI installation guide to install the AWS CLI in your local environment. Once installed, create an access key by clicking your account name in the top-right corner → Security credentials. On the Security credentials page, click "Create access key", follow the prompts, and copy the access key ID and secret access key. Then open a terminal on your computer, run `aws configure`, and paste the copied credentials when prompted. Also enter the default region shown on your cluster dashboard, and leave the "output format" at its default value. (A quick way to verify the configuration is shown after this list.)
- Use AWS CloudShell. Make sure the CloudShell session is running in the same region as your cluster.
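Whichever option you choose, you can confirm that the terminal is authenticated against the intended account before continuing. This is a minimal check using a standard AWS CLI command:

```bash
# Should print the account ID and ARN of the identity you just configured.
aws sts get-caller-identity
```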
Once you have a terminal (either local or CloudShell) ready to run `aws` commands, set the environment variable:

```bash
EKS_CLUSTER_NAME=<your-cluster-name>
```

Update your kubeconfig to use the new cluster:

```bash
aws eks update-kubeconfig --name $EKS_CLUSTER_NAME
```

Verify the connection:

```bash
kubectl get all
```
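On a freshly created cluster with nothing deployed yet, the output should look roughly like the following (the cluster IP and age will differ); any response at all confirms that kubectl can reach the API server:

```
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   25m
```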
By default, AWS does not provision external load balancers for LoadBalancer services unless the subnets are properly tagged. To enable both public and internal load balancers, run the following command to tag your subnets:

```bash
aws ec2 create-tags \
  --resources $(aws eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.resourcesVpcConfig.subnetIds" --output text) \
  --tags Key=kubernetes.io/role/elb,Value=1 Key=kubernetes.io/role/internal-elb,Value=1
```
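If you want to double-check that the tags landed on every subnet in the cluster's VPC, the following read-only query lists each subnet with its tags (it reuses the same `aws eks describe-cluster` lookup as above):

```bash
# List the cluster's subnets together with their tags.
aws ec2 describe-subnets \
  --subnet-ids $(aws eks describe-cluster --name $EKS_CLUSTER_NAME \
      --query "cluster.resourcesVpcConfig.subnetIds" --output text) \
  --query "Subnets[].{Subnet:SubnetId,Tags:Tags}" \
  --output json
```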
Install Helm by following the Helm installation guide. Then execute the following commands in your terminal:

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm upgrade --install nginx-texera ingress-nginx/ingress-nginx \
  --namespace texera --create-namespace \
  --set controller.replicaCount=1 \
  --set controller.ingressClassResource.name=nginx \
  --set controller.ingressClassResource.controllerValue="k8s.io/ingress-nginx" \
  --set controller.ingressClass=nginx \
  --set controller.service.type=LoadBalancer \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-scheme"="internet-facing"
```
```bash
helm upgrade --install nginx-minio ingress-nginx/ingress-nginx \
  --namespace texera --create-namespace \
  --set controller.replicaCount=1 \
  --set controller.ingressClassResource.name=nginx-minio \
  --set controller.ingressClassResource.controllerValue="k8s.io/nginx-minio" \
  --set controller.ingressClass=nginx-minio \
  --set controller.service.type=LoadBalancer \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-scheme"="internet-facing"
```
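If you want to confirm that both ingress controllers started before checking their services, you can list their pods. The label selector below is the standard `app.kubernetes.io/name` label applied by the ingress-nginx chart; adjust it if your releases use different labels:

```bash
# Both controller pods should eventually report STATUS=Running.
kubectl get pods -n texera -l app.kubernetes.io/name=ingress-nginx
```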
Wait for 1-2 minutes, then run:

```bash
kubectl get svc -n texera
```

When the EXTERNAL-IP fields are populated, assign the hostnames to environment variables:

```bash
TEXERA_IP=$(kubectl get svc nginx-texera-ingress-nginx-controller -n texera -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
MINIO_IP=$(kubectl get svc nginx-minio-ingress-nginx-controller -n texera -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
```
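Before moving on, it is worth confirming that both variables actually captured a hostname; an empty value usually means the load balancer is still being provisioned:

```bash
# Both lines should print an ELB DNS name ending in elb.amazonaws.com.
echo "Texera load balancer: $TEXERA_IP"
echo "MinIO load balancer:  $MINIO_IP"
```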
Create a file named `ebs-storage.yaml` with the following content:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: auto-ebs-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.eks.amazonaws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
```
Then apply it:

```bash
kubectl apply -f ebs-storage.yaml
```

Validate the StorageClass (it is cluster-scoped, so no namespace is needed):

```bash
kubectl get storageclass
```
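The output should show the new class marked as the default. The exact columns and any pre-existing classes depend on your cluster, but it will look something like this:

```
NAME                    PROVISIONER                 RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
auto-ebs-sc (default)   ebs.csi.eks.amazonaws.com   Delete          WaitForFirstConsumer   false                  15s
```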
Execute the following bash commands:

```bash
curl -L -o texera.zip https://github.com/Texera/texera/releases/download/1.1.0/texera-cluster-1-1-0-release.zip
unzip texera.zip -d texera-cluster
rm texera.zip

helm dependency build texera-cluster

helm install texera texera-cluster --namespace texera --create-namespace \
  --set postgresql.primary.persistence.storageClass="auto-ebs-sc" \
  --set ingress-nginx.enabled=false \
  --set metrics-server.enabled=false \
  --set exampleDataLoader.enabled=false \
  --set minio.customIngress.enabled=true \
  --set minio.customIngress.ingressClassName=nginx-minio \
  --set minio.customIngress.texeraHostname="http://$TEXERA_IP" \
  --set minio.persistence.storageClass="auto-ebs-sc" \
  --set-string lakefs.lakefsConfig="$(cat <<EOF
database:
  type: postgres
blockstore:
  type: s3
  s3:
    endpoint: http://texera-minio:9000
    pre_signed_expiry: 15m
    pre_signed_endpoint: http://$MINIO_IP
    force_path_style: true
    credentials:
      access_key_id: texera_minio
      secret_access_key: password
EOF
)" \
  --set ingressPaths.hostname=""
```
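Once the command returns, Helm has created the release and the pods will start coming up. You can check the overall state of the release at any point with the standard `helm status` command:

```bash
helm status texera -n texera
```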
- It may take 10-15 minutes to fully launch the deployment.
- During the process, you can periodically execute `kubectl get pods -n texera` to see the status of the deployed pods.
- Once every pod is in a `Running` or `Completed` status, you can execute `echo $TEXERA_IP` to get the public hostname of the Texera WebUI.
- Then you can access Texera using `http://<TEXERA_IP>`.
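As a quick sanity check before opening the browser, you can hit the load balancer from the terminal; any HTTP response (even a redirect) indicates that the ingress controller is routing traffic to Texera:

```bash
# Print only the response headers from the Texera WebUI.
curl -I "http://$TEXERA_IP"
```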
To remove the Texera deployment from your Kubernetes cluster, execute the following bash commands:

```bash
helm uninstall texera -n texera
helm uninstall nginx-texera -n texera
helm uninstall nginx-minio -n texera
```
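Note that `helm uninstall` does not necessarily remove PersistentVolumeClaims created through StatefulSets (for example the PostgreSQL and MinIO volumes), so the backing EBS volumes may keep accruing cost. A minimal clean-up sketch, assuming everything lives in the `texera` namespace:

```bash
# List any claims left behind by the release.
kubectl get pvc -n texera

# Delete them (and their backing EBS volumes) only if you no longer need the data.
# kubectl delete pvc --all -n texera
```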
You can customize the deployment by adding the following `--set` flags to your `helm install` command. These flags allow you to configure authentication, resource limits, and the number of pods for the Texera deployment.

Texera relies on PostgreSQL, MinIO, and LakeFS, which require credentials. You can change the default values to make your deployment more secure.
Default Texera Admin User

Texera ships with a built-in administrator account (username: texera, password: texera). To supply your own credentials during installation, pass the following Helm overrides:

```bash
# USER_SYS_ADMIN_USERNAME
--set texeraEnvVars[0].value="<user-name>" \
# USER_SYS_ADMIN_PASSWORD
--set texeraEnvVars[1].value="<password>" \
```
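If you have already installed the release and only want to change the admin credentials afterwards, one option is a `helm upgrade` with `--reuse-values`, which keeps the overrides from the original install and merges in the new ones (the values below are placeholders):

```bash
# Update only the admin credentials on an existing release.
helm upgrade texera texera-cluster -n texera --reuse-values \
  --set texeraEnvVars[0].value="my-admin" \
  --set texeraEnvVars[1].value="a-strong-password"
```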
MinIO Authentication

```bash
--set minio.auth.rootUser=texera_minio \
--set minio.auth.rootPassword=password \
```
PostgreSQL Authentication (username is always postgres)

```bash
--set postgresql.auth.postgresPassword=root_password \
```

💡 Note: If you change the PostgreSQL password, you also need to update the LakeFS connection string to match and add it to the install command:

```bash
--set lakefs.secrets.databaseConnectionString="postgres://postgres:root_password@texera-postgresql:5432/texera_lakefs?sslmode=disable" \
```
LakeFS Authentication

```bash
--set lakefs.auth.username=texera-admin \
--set lakefs.auth.accessKey=AKIAIOSFOLKFSSAMPLES \
--set lakefs.auth.secretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
--set lakefs.secrets.authEncryptSecretKey=random_string_for_lakefs \
```
If your cluster has more available resources, you can allocate additional CPU, memory, and disk to Texera to improve performance.

Postgres

To allocate more CPU, memory, and disk to Postgres, do:

```bash
--set postgresql.primary.resources.requests.cpu=4 \
--set postgresql.primary.resources.requests.memory=4Gi \
--set postgresql.primary.persistence.size=50Gi \
```
MinIO

To increase the storage for users' input datasets, do:

```bash
--set minio.persistence.size=100Gi
```
Computing Unit

To customize options for the computing unit, do:

```bash
# MAX_NUM_OF_RUNNING_COMPUTING_UNITS_PER_USER
--set texeraEnvVars[5].value="2" \
# CPU_OPTION_FOR_COMPUTING_UNIT
--set texeraEnvVars[6].value="1,2,4" \
# MEMORY_OPTION_FOR_COMPUTING_UNIT
--set texeraEnvVars[7].value="2Gi,4Gi,16Gi" \
# GPU_LIMIT_OPTIONS (e.g., allow 0 or 1 GPU to be allocated)
--set texeraEnvVars[8].value="0,1" \
```
Scale out individual services for high availability or increased performance:

```bash
--set webserver.numOfPods=2 \
--set workflowCompilingService.numOfPods=2 \
--set pythonLanguageServer.replicaCount=2 \
```
By default, all user data stored by Texera will be deleted when the cluster deployment is removed. Since user data is valuable, you can preserve all datasets and files even after uninstalling the cluster by setting:

```bash
--set persistence.removeAfterUninstall=false
```