Terraform and GKE - lago-morph/chiller GitHub Wiki

  • Broadly following the HashiCorp GKE tutorial
  • Start with the tutorial files from HashiCorp.
  • Created a project (chiller-load-test) using the Google Cloud console (Manage Resources page).
  • Chose the e2-medium machine type for the nodes: 2 vCPUs, 4 GB memory.
  • I should create a dedicated service account with gcloud iam service-accounts create and gcloud iam service-accounts keys create (see the sketch below). For now I will get things working with my admin account, then tighten it up later. I would never do this with a real gcloud account - but I'm doing a tutorial (and modifying it for other uses) on a free-tier account without billing activated, so my security posture can be a little relaxed.
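
A minimal sketch of what that service-account setup might look like later (the terraform-sa name, the key file name, and the container.admin role are placeholders I picked for illustration, not something the tutorial prescribes):

# create the service account in the project
gcloud iam service-accounts create terraform-sa --project=chiller-load-test --display-name="Terraform for chiller-load-test"
# give it enough rights to manage GKE resources
gcloud projects add-iam-policy-binding chiller-load-test --member="serviceAccount:terraform-sa@chiller-load-test.iam.gserviceaccount.com" --role="roles/container.admin"
# download a key for Terraform to authenticate with
gcloud iam service-accounts keys create terraform-sa-key.json --iam-account=terraform-sa@chiller-load-test.iam.gserviceaccount.com
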
  • Enabled three APIs on the project: Compute Engine, Kubernetes Engine, and Service Usage. A bunch of other APIs were apparently enabled by default (can see them here).
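
If doing that from the command line instead of the console, something like this should enable the same three APIs:

gcloud services enable compute.googleapis.com container.googleapis.com serviceusage.googleapis.com
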
  • Set the gcloud project (and quota-project) with gcloud config set project chiller-load-test and gcloud auth application-default set-quota-project chiller-load-test.
  • terraform apply to set up the initial GKE cluster. This takes a while (about 10 minutes).
    • Had to edit gke.tf to set the proper version prefix for the Kubernetes version. It looks like Google considers Kubernetes 1.29.x to be "stable", and you have to go to the "rapid" release channel to get 1.30.x. I'll stick with 1.29.x for now, as there are no particular features I need from 1.30.
    • Had to add a variable, "zone". I want a zonal cluster (all nodes in one gcloud zone), but the tutorial's resources are region-based, so I added another Terraform variable for this. Both the container cluster and the node pool have to have their location updated (I found out about the second one after waiting 10 minutes...). A sketch of the changes is below.
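
A sketch of the gke.tf / variables.tf changes described above, assuming the resource names from the HashiCorp tutorial files (primary and primary_nodes); the surrounding attributes are abbreviated, the us-central1-c default is just an example, and the exact layout may not match the tutorial:

# variables.tf - new variable for the zonal location
variable "zone" {
  description = "GCP zone for the zonal cluster"
  default     = "us-central1-c"
}

# gke.tf - pin the Kubernetes minor version and use the zone in BOTH resources
data "google_container_engine_versions" "gke_version" {
  location       = var.zone
  version_prefix = "1.29."
}

resource "google_container_cluster" "primary" {
  name                     = "${var.project_id}-gke"
  location                 = var.zone   # was var.region in the tutorial
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "${var.project_id}-node-pool"
  location   = var.zone                 # this one needs updating too
  cluster    = google_container_cluster.primary.name
  version    = data.google_container_engine_versions.gke_version.latest_node_version
  node_count = 2

  node_config {
    machine_type = "e2-medium"
  }
}
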
  • Interestingly, when it errored out on the node pool, it did not roll back the creation of the VPC or the container cluster. I don't know if that is a bug or a feature...
  • I need to import credentials so I can use kubectl on my local machine. This also demonstrates using the terraform outputs as inputs to another command, which I quite like.
    • First, the terraform outputs have to be modified to add the zone variable that I defined above (see the sketch after this list). Do that and run terraform apply again (it won't take 10 minutes this time, I hope).
    • Need to install a kubectl plugin so it can use the GKE credentials I'll be downloading. Done with sudo apt-get install google-cloud-cli-gke-gcloud-auth-plugin
    • Use the modified command from the tutorial (zone instead of region): gcloud container clusters get-credentials $(terraform output -raw kubernetes_cluster_name) --zone $(terraform output -raw zone)
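
The outputs change is tiny; a sketch of what gets added to outputs.tf (the kubernetes_cluster_name output already exists in the tutorial files):

output "zone" {
  description = "GCP zone the cluster runs in"
  value       = var.zone
}
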
  • And it is live. kubectl get all -A shows that there is a lot of stuff already there.
  • Turns out that GKE has a managed Prometheus service. I'm thinking that I want to install Prometheus and Grafana myself to learn about how to configure it and use it to determine the health of the application. If I use the managed service, I'll only know how to deal with it on a GKE cluster.
  • Continue following the tutorial to get the dashboard installed and working.
    • run kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
    • Unfortunately, that was not the right thing to do. So I get to undo it. Run the same thing, but replace "apply" with "delete".
    • run kubectl delete -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
    • One is supposed to use a Helm install for current versions of the dashboard. See the documentation:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
kubectl -n kubernetes-dashboard port-forward --address=192.168.88.130 svc/kubernetes-dashboard-web 8001:8000

A web browser pointed at 192.168.88.130:8001 now gives me the dashboard page (it wants a bearer token to authenticate). Next, create a service account and get the bearer token from it:

kubectl apply -f kubernetes-dashboard-admin.rbac.yaml
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep service-controller-token | awk '{print $1}')
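
As an aside, on newer clusters (Kubernetes 1.24+) token Secrets are no longer created automatically for service accounts, so here is a sketch of an alternative that mints a short-lived token directly (dashboard-admin is a placeholder name, not what my rbac yaml actually creates):

# create a service account and give it cluster-admin (fine for a throwaway tutorial cluster, not for real use)
kubectl -n kubernetes-dashboard create serviceaccount dashboard-admin
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
# print a short-lived bearer token for that account
kubectl -n kubernetes-dashboard create token dashboard-admin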

Copy and paste the big-ass token into the web page. And it doesn't work: apparently you can't authenticate this way over plain http, and the service I port-forwarded only supports http. If I instead forward the service that DOES support https, kubernetes-dashboard-kong-proxy, the connection breaks when I try to connect. I tried upgrading it with nginx enabled; that didn't break, but then I got a 404 Not Found. Going to just move on to my own app for this.
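
For the record, the https attempt was roughly this, assuming the chart's kong proxy service listens on port 443 (I believe that's the default), then browsing to https://192.168.88.130:8443:

kubectl -n kubernetes-dashboard port-forward --address=192.168.88.130 svc/kubernetes-dashboard-kong-proxy 8443:443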

And of course, after beating my head against the wall trying to get the damn dashboard to work, I deployed the Watch and Chill app and it worked the first time, no problems. I just had to port-forward to my local machine so I can get to it:

kubectl port-forward --address=192.168.88.130 svc/chiller-frontend 8080:80
  • Get rid of the cluster with terraform destroy. Will this work? I may need to update the config and allow the cluster to be destroyed first... No, I don't have to worry: for version 5.0+ of the Google provider I would have to set deletion_protection = false in the container cluster resource (sketch below), but the tutorial files I started with pin the provider at a 4.x version. It looks like it takes a long time to destroy a cluster too.
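
For anyone on provider 5.x, the destroy-blocking change would look something like this in the cluster resource (a sketch; I did not actually need it with the 4.x pin):

resource "google_container_cluster" "primary" {
  name                = "${var.project_id}-gke"
  location            = var.zone
  initial_node_count  = 1
  deletion_protection = false   # provider >= 5.0 defaults this to true and blocks terraform destroy
}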