DNSError in Operator version 4.5.2 - cniackz/public GitHub Wiki

How to repro:

  1. Create a kind cluster. Save the following configuration as kind-config.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: "127.0.0.1"
  apiServerPort: 6443
nodes:
  - role: control-plane
    extraPortMappings:
    - containerPort: 30080
      hostPort: 30080
      listenAddress: "127.0.0.1"
      protocol: TCP
  - role: worker
    extraPortMappings:
    - containerPort: 30081
      hostPort: 30081
      listenAddress: "127.0.0.1"
      protocol: TCP
  - role: worker
    extraPortMappings:
    - containerPort: 30082
      hostPort: 30082
      listenAddress: "127.0.0.1"
      protocol: TCP
  - role: worker
    extraPortMappings:
    - containerPort: 30083
      hostPort: 30083
      listenAddress: "127.0.0.1"
      protocol: TCP
  - role: worker
    extraPortMappings:
    - containerPort: 30084
      hostPort: 30084
      listenAddress: "127.0.0.1"
      protocol: TCP
Then delete any existing cluster and create a new one from the configuration above:
kind delete cluster
kind create cluster --config kind-config.yaml
  2. Install Operator v4.5.2 with Helm:
helm install \
     --namespace minio \
     --create-namespace \
     minio operator-4.5.2
  3. Create a Tenant with two pools from the UI.

  4. Check the logs of the MinIO pods for the error:

API: SYSTEM()
Time: 22:41:09 UTC 05/02/2023
Error: lookup cesar-pool-0-3.cesar-hl.minio.svc.cluster.local on 10.96.0.10:53: no such host (*net.DNSError)
       host="cesar-pool-0-3.cesar-hl.minio.svc.cluster.local"
       8: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
       7: internal/logger/logonce.go:149:logger.LogOnceIf()
       6: cmd/endpoint.go:430:cmd.hostResolveToLocalhost()
       5: cmd/endpoint.go:477:cmd.Endpoints.UpdateIsLocal()
       4: cmd/endpoint.go:675:cmd.CreateEndpoints()
       3: cmd/endpoint-ellipses.go:385:cmd.createServerEndpoints()
       2: cmd/server-main.go:232:cmd.serverHandleCmdArgs()
       1: cmd/server-main.go:508:cmd.serverMain()
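
The hostname in the error encodes which pod, headless Service, and namespace to inspect. The following is a minimal sketch of pulling those parts out of the log line, assuming the line has the same shape as the one above; the function name `unresolved_host` is hypothetical and not part of MinIO or the Operator.

```shell
# Hypothetical helper: read log lines on stdin and print the hostname
# from any "lookup <host> on <resolver>: no such host" error.
unresolved_host() {
  sed -n 's/^Error: lookup \([^ ]*\) on .*no such host.*/\1/p'
}

line='Error: lookup cesar-pool-0-3.cesar-hl.minio.svc.cluster.local on 10.96.0.10:53: no such host (*net.DNSError)'
host=$(printf '%s\n' "$line" | unresolved_host)

# A pod behind a headless Service resolves as <pod>.<service>.<namespace>.svc.<cluster domain>
pod=${host%%.*}          # first label
rest=${host#*.}
service=${rest%%.*}      # second label
rest=${rest#*.}
namespace=${rest%%.*}    # third label

printf 'pod=%s service=%s namespace=%s\n' "$pod" "$service" "$namespace"
```

With these values, `kubectl get pods -n minio` and `kubectl get endpoints cesar-hl -n minio` can help show whether the pod exists and whether the headless Service lists it.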

Root Cause:

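  • Operator v4.5.2 hardcodes the following startupProbe in the MinIO pod spec, and it should not. With periodSeconds: 1 and failureThreshold: 30, a container that does not answer on /minio/health/live within roughly 30 seconds is restarted: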
        startupProbe:
          failureThreshold: 30
          httpGet:
            path: /minio/health/live
            port: 9000
            scheme: HTTPS
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
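
To make the timing concrete, here is a rough sketch of the startup budget implied by the probe above, plus a way to check whether a StatefulSet still carries the probe. The helper name `show_startup_probe` is hypothetical. The StatefulSet name `cesar-pool-0` and namespace `minio` are inferred from the pod hostname in the error log.

```shell
# Values taken from the startupProbe above.
failure_threshold=30
period_seconds=1

# The kubelet restarts the container after failureThreshold consecutive
# startup-probe failures, so the budget is roughly threshold * period.
startup_budget=$(( failure_threshold * period_seconds ))
echo "startup budget: ${startup_budget}s"

# Hypothetical helper: print the startupProbe of the first container in a
# StatefulSet's pod template. Usage: show_startup_probe STATEFULSET NAMESPACE
show_startup_probe() {
  kubectl get statefulset "$1" --namespace "$2" \
    -o jsonpath='{.spec.template.spec.containers[0].startupProbe}'
}

# Example (needs cluster access):
# show_startup_probe cesar-pool-0 minio
```

An empty result from `show_startup_probe` would indicate that the StatefulSet no longer defines a startupProbe.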

How to fix it:

  • Upgrade the Operator to v4.5.3:
helm upgrade \
     --namespace minio \
     minio operator-4.5.3
  • Then delete the Tenant's StatefulSets and wait for Operator v4.5.3 to recreate them.

  • As a result, the startupProbe is removed and the pods start properly.
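
The StatefulSet deletion step could look like the sketch below. It assumes one StatefulSet per pool, named after the pool. `cesar-pool-0` is inferred from the pod name in the error log, while `cesar-pool-1` is an assumed name for the second pool, and the helper `delete_pool_statefulsets` is hypothetical.

```shell
# Hypothetical helper: delete the given StatefulSets in one namespace so the
# upgraded Operator can recreate them without the startupProbe.
# Usage: delete_pool_statefulsets NAMESPACE NAME...
delete_pool_statefulsets() {
  ns=$1
  shift
  kubectl delete statefulset "$@" --namespace "$ns"
}

# Example (needs cluster access; cesar-pool-1 is an assumed name):
# delete_pool_statefulsets minio cesar-pool-0 cesar-pool-1
```

Afterwards, `kubectl get pods -n minio -w` can be used to watch the new pods come up.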

Conclusion:

A Tenant with two pools cannot run on Operator v4.5.2 because of this DNSError. Upgrading to Operator v4.5.3 resolves it.