Deploying Texera on Google Cloud Platform (GCP)

Prerequisites: Check your quota

Your GCP account should be able to allocate at least 20 vCPUs and 1 TB of SSD storage. To check your quota, go to the GCP Quotas page, which shows a pre-populated query listing the CPUs and SSDs in the us-central1 region by default. If you plan to deploy Texera in another region, change the Dimensions part of the query.

If your quota is below 20 CPUs or 1 TB of SSD, request a quota increase by clicking the three-dot button on the right and choosing "Edit Quota".
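The regional quota can also be read from Cloud Shell. The following is a minimal sketch, assuming the gcloud CLI is authenticated for your project and that the CPUS and SSD_TOTAL_GB regional metrics correspond to the CPU and SSD entries on the Quotas page; the function name show_region_quota is illustrative and not part of Texera.

```shell
# Print metric, limit, and usage for the CPU and SSD quotas of one region.
# Assumption: CPUS and SSD_TOTAL_GB are the metrics behind the Quotas page entries.
show_region_quota() {
  region="$1"
  project="$2"
  gcloud compute regions describe "$region" --project "$project" \
    --flatten="quotas[]" \
    --format="value(quotas.metric,quotas.limit,quotas.usage)" |
    grep -Ew 'CPUS|SSD_TOTAL_GB'
}

# Usage: show_region_quota us-central1 "<YOUR_PROJECT>"
```

The -w flag keeps per-family metrics such as N2_CPUS out of the output, so only the two aggregate quotas are listed.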

1. Create an Autopilot GKE cluster

💡 Note: If you already have a GKE cluster and wish to use it for deploying Texera, you can skip this step and proceed directly to Step 2.

Navigate to GCP console -> Kubernetes Engine -> Clusters. Click on the create button.

💡 Note: You may need to enable the Kubernetes API if you haven't done so.

Use all default values to create the cluster, or customize it as needed. After 15-20 minutes, the cluster status should show a green checkmark, with 0 vCPUs and 0 memory usage. Click the three dots on the right and choose "Connect".

In the pop-up window, copy the project and region to your clipboard. Then click "Run in Cloud Shell" and press Enter to run the first command shown in the terminal.
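The command that Cloud Shell pre-fills typically fetches kubectl credentials for the cluster. If you connect from another terminal, a sketch like the following should be equivalent, assuming the gcloud CLI is installed and authenticated; the function name connect_to_cluster and the cluster name in the usage line are placeholders.

```shell
# Fetch kubectl credentials for a regional GKE cluster.
# Arguments: cluster name, region, project (all supplied by you).
connect_to_cluster() {
  cluster="$1"
  region="$2"
  project="$3"
  gcloud container clusters get-credentials "$cluster" \
    --region "$region" --project "$project"
}

# Usage (cluster name is hypothetical):
#   connect_to_cluster my-texera-cluster "<YOUR_REGION>" "<YOUR_PROJECT>"
```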

2. Reserve Two Static IPs (for Texera website and MinIO)

After accessing your cluster using Cloud Shell, define the following variables based on your region and project in Step 1.

REGION="<YOUR_REGION>"
PROJECT="<YOUR_PROJECT>"

Execute the following bash commands to reserve two Public IP addresses.

gcloud compute addresses create texera-ip --region=$REGION --project=$PROJECT
TEXERA_IP=$(gcloud compute addresses describe texera-ip --region=$REGION --project=$PROJECT --format="get(address)")
gcloud compute addresses create minio-ip --region=$REGION --project=$PROJECT
MINIO_IP=$(gcloud compute addresses describe minio-ip --region=$REGION --project=$PROJECT --format="get(address)")
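Later steps pass both variables to Helm, so an empty value (for example after a failed reservation or a new Cloud Shell session) would only surface much later. A small sanity check, sketched here with an illustrative helper named is_ipv4, can catch that early. It only checks the dotted-quad shape, not the range of each octet.

```shell
# Succeed only if the argument looks like a dotted IPv4 address.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Usage:
#   is_ipv4 "$TEXERA_IP" || echo "TEXERA_IP is not set correctly" >&2
#   is_ipv4 "$MINIO_IP"  || echo "MINIO_IP is not set correctly" >&2
```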

Execute the following bash commands to install two NGINX ingress controllers with Helm, one bound to each static IP.

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-texera ingress-nginx/ingress-nginx \
  --namespace texera --create-namespace \
  --set controller.ingressClassResource.name=nginx \
  --set controller.ingressClassResource.controllerValue="k8s.io/ingress-nginx" \
  --set controller.ingressClass=nginx \
  --set controller.service.loadBalancerIP=$TEXERA_IP \
  --set controller.service.annotations."cloud\.google\.com/load-balancer-type"="External" \
  --set rbac.create=true

helm install nginx-minio ingress-nginx/ingress-nginx \
  --namespace texera \
  --set controller.ingressClassResource.name=nginx-minio \
  --set controller.ingressClassResource.controllerValue="k8s.io/nginx-minio" \
  --set controller.ingressClass=nginx-minio \
  --set controller.service.loadBalancerIP=$MINIO_IP \
  --set controller.service.annotations."cloud\.google\.com/load-balancer-type"="External" \
  --set rbac.create=true

3. Prepare Texera Installation

Execute the following bash commands.

curl -L -o texera.zip https://github.com/Texera/texera/releases/download/1.1.0/texera-cluster-1-1-0-release.zip
unzip texera.zip -d texera-cluster
rm texera.zip
helm dependency build texera-cluster

4. Deploy Texera

Execute the following bash command.

helm install texera texera-cluster --namespace texera --create-namespace \
  --set postgresql.primary.persistence.storageClass=standard-rwo \
  --set ingress-nginx.enabled=false \
  --set metrics-server.enabled=false \
  --set exampleDataLoader.enabled=false \
  --set minio.customIngress.enabled=true \
  --set minio.customIngress.ingressClassName=nginx-minio \
  --set minio.customIngress.texeraHostname="http://$TEXERA_IP" \
  --set minio.persistence.storageClass=standard-rwo \
  --set-string lakefs.lakefsConfig="$(cat <<EOF
database:
  type: postgres
blockstore:
  type: s3
  s3:
    endpoint: http://texera-minio:9000
    pre_signed_expiry: 15m
    pre_signed_endpoint: http://$MINIO_IP
    force_path_style: true
    credentials:
      access_key_id: texera_minio
      secret_access_key: password
EOF
)" \
  --set ingressPaths.hostname=""

Done!

It may take 10-15 minutes to fully launch the deployment.
During the process, you can periodically execute kubectl get pods -n texera to see the status of the deployed pods.
Once every pod is in a Running or Completed status, you can execute echo $TEXERA_IP to get the public IP of the Texera WebUI.
Then you can access Texera using http://<TEXERA_IP>.
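Instead of reading the pod list by eye, the status column can be counted. This is a minimal sketch, assuming the default kubectl get pods column layout (NAME, READY, STATUS, ...); the helper name count_unfinished_pods is illustrative.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print how many
# pods are in a status other than Running or Completed.
count_unfinished_pods() {
  awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Usage:
#   kubectl get pods -n texera --no-headers | count_unfinished_pods
```

When the printed count reaches 0, every pod is in a Running or Completed status.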

To remove the Texera deployment from your Kubernetes cluster, execute the following bash commands.

helm uninstall texera -n texera
helm uninstall nginx-texera -n texera
helm uninstall nginx-minio -n texera

Note: You also need to release the two static IP addresses reserved in Step 2, because uninstalling the Helm releases does not remove them from GCP.
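The two addresses were created under the names texera-ip and minio-ip, so they can be released from Cloud Shell. The following is a sketch, assuming REGION and PROJECT hold the same values used when reserving them; the function name release_texera_ips is illustrative.

```shell
# Delete the two reserved static IPs without an interactive prompt.
release_texera_ips() {
  region="$1"
  project="$2"
  gcloud compute addresses delete texera-ip minio-ip \
    --region="$region" --project="$project" --quiet
}

# Usage: release_texera_ips "$REGION" "$PROJECT"
```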

Advanced Configuration

You can customize the deployment by adding the following --set flags to your helm install command. These flags allow you to configure authentication, resource limits, and the number of pods for Texera deployment.

Texera Credentials

Texera relies on Postgres, MinIO, and LakeFS, each of which requires credentials. You can change the default values to make your deployment more secure.

Default Texera Admin User

Texera ships with a built-in administrator account (username: texera, password: texera). To supply your own credentials during installation, pass the following Helm overrides:

# USER_SYS_ADMIN_USERNAME
--set texeraEnvVars[0].value="<user-name>" \
# USER_SYS_ADMIN_PASSWORD
--set texeraEnvVars[1].value="<password>" \

MinIO Authentication

--set minio.auth.rootUser=texera_minio \
--set minio.auth.rootPassword=password \

PostgreSQL Authentication (username is always postgres)

--set postgresql.auth.postgresPassword=root_password \

💡 Note: If you change the PostgreSQL password, you must also update the password embedded in the LakeFS database connection string and add it to the install command:

--set lakefs.secrets.databaseConnectionString="postgres://postgres:root_password@texera-postgresql:5432/texera_lakefs?sslmode=disable" \
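Because the password appears in two flags, deriving the connection string from a single variable keeps them in sync. This is a minimal sketch; the helper name lakefs_db_url is illustrative, and it assumes the password contains no URL-reserved characters (such as @, / or :) and no commas, which would need escaping.

```shell
# Build the LakeFS connection string for a given postgres password.
lakefs_db_url() {
  printf 'postgres://postgres:%s@texera-postgresql:5432/texera_lakefs?sslmode=disable\n' "$1"
}

# Usage:
#   PG_PASSWORD="<password>"
#   --set postgresql.auth.postgresPassword="$PG_PASSWORD" \
#   --set lakefs.secrets.databaseConnectionString="$(lakefs_db_url "$PG_PASSWORD")" \
```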

LakeFS Authentication

--set lakefs.auth.username=texera-admin \
--set lakefs.auth.accessKey=AKIAIOSFOLKFSSAMPLES \
--set lakefs.auth.secretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
--set lakefs.secrets.authEncryptSecretKey=random_string_for_lakefs \
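The values above are sample strings, so a real deployment would normally replace them with secrets of its own. One way to generate random hex strings is sketched below; the helper name random_hex is illustrative, and whether LakeFS imposes format constraints on any of these values is not covered here.

```shell
# Print N random bytes from /dev/urandom as lowercase hex (2*N characters).
random_hex() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Usage:
#   --set lakefs.secrets.authEncryptSecretKey="$(random_hex 32)" \
```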

Allocating Resources

If your cluster has more available resources, you can allocate additional CPU, memory, and disks to Texera to improve the performance.

Postgres

To allocate more CPU, memory, and disk to Postgres, add:

--set postgresql.primary.resources.requests.cpu=4 \
--set postgresql.primary.resources.requests.memory=4Gi \
--set postgresql.primary.persistence.size=50Gi \

MinIO

To increase the storage available for users' input datasets, add:

--set minio.persistence.size=100Gi

Computing Unit

To customize options for the computing unit, do:

# MAX_NUM_OF_RUNNING_COMPUTING_UNITS_PER_USER
--set texeraEnvVars[5].value="2" \
# CPU_OPTION_FOR_COMPUTING_UNIT (commas are escaped because Helm's --set treats a bare comma as a separator)
--set texeraEnvVars[6].value="1\,2\,4" \
# MEMORY_OPTION_FOR_COMPUTING_UNIT
--set texeraEnvVars[7].value="2Gi\,4Gi\,16Gi" \
# GPU_LIMIT_OPTIONS (allows 0 or 1 GPU to be allocated)
--set texeraEnvVars[8].value="0\,1" \
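Helm's --set parser reads an unescaped comma as the boundary between two assignments, so list-valued options such as the CPU and memory choices need each comma written as \, on the command line. A small helper, sketched here under the illustrative name helm_escape_commas, does the escaping for you.

```shell
# Escape every comma so the value survives Helm's --set parsing as one string.
helm_escape_commas() {
  printf '%s\n' "$1" | sed 's/,/\\,/g'
}

# Usage:
#   --set texeraEnvVars[6].value="$(helm_escape_commas '1,2,4')" \
```

Putting these overrides in a values file passed with -f is another option that avoids the escaping altogether.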

Adjusting Number of Pods

Scale out individual services for high availability or increased performance:

--set webserver.numOfPods=2 \
--set workflowCompilingService.numOfPods=2 \
--set pythonLanguageServer.replicaCount=2 \

Retaining User Data

By default, all user data stored by Texera is deleted when the deployment is removed from the cluster. To preserve all datasets and files after uninstalling, set:

--set persistence.removeAfterUninstall=false

Deploying Texera on Google Cloud Platform (GCP) - apache/texera GitHub Wiki
