U1.61 Ubuntu Quick Start (QS): Kubernetes PostgreSQL HA Cluster on premises - chempkovsky/CS2WPF-and-CS2XAMARIN GitHub Wiki

Reading

We start with

Kubernetes cluster info:
yury@u2004s01:~$ kubectl get nodes -o wide
NAME       STATUS   ROLES                  AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
u2004s01   Ready    control-plane,master   14m     v1.23.2   192.168.100.61   <none>        Ubuntu 20.04.3 LTS   5.4.0-91-generic   docker://20.10.12
u2004s02   Ready    <none>                 9m42s   v1.23.2   192.168.100.62   <none>        Ubuntu 20.04.3 LTS   5.4.0-91-generic   docker://20.10.12
u2004s03   Ready    <none>                 8m2s    v1.23.2   192.168.100.63   <none>        Ubuntu 20.04.3 LTS   5.4.0-91-generic   docker://20.10.12
u2004s04   Ready    <none>                 7m      v1.23.2   192.168.100.64   <none>        Ubuntu 20.04.3 LTS   5.4.0-91-generic   docker://20.10.12

yury@u2004s01:~$ kubectl get pods -n second-local-path-storage
NAME                                                              READY   STATUS    RESTARTS   AGE
second-local-path-storage-local-path-provisioner-75958b75882bf8   1/1     Running   0          36s

Installing Zalando Postgres Operator

git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
kubectl create -f manifests/configmap.yaml
kubectl create -f manifests/operator-service-account-rbac.yaml
kubectl create -f manifests/postgres-operator.yaml
kubectl create -f manifests/api-service.yaml

# or, instead of the four create commands above, apply everything with kustomize:
kubectl apply -k github.com/zalando/postgres-operator/manifests

yury@u2004s01:~/postgres-operator$ kubectl get pods -n default -o wide
NAME                                 READY   STATUS    RESTARTS   AGE    IP              NODE       NOMINATED NODE   READINESS GATES
postgres-operator-849dddc998-gbhcg   1/1     Running   0          4m5s   10.32.121.129   u2004s04   <none>           <none>
The responses:
yury@u2004s01:~$ git clone https://github.com/zalando/postgres-operator.git
Cloning into 'postgres-operator'...
remote: Enumerating objects: 23247, done.
remote: Counting objects: 100% (366/366), done.
remote: Compressing objects: 100% (226/226), done.
remote: Total 23247 (delta 221), reused 228 (delta 123), pack-reused 22881
Receiving objects: 100% (23247/23247), 8.82 MiB | 9.20 MiB/s, done.
Resolving deltas: 100% (16633/16633), done.
yury@u2004s01:~$ cd postgres-operator
yury@u2004s01:~/postgres-operator$ sudo nano  manifests/configmap.yaml
[sudo] password for yury:
yury@u2004s01:~/postgres-operator$ kubectl create -f manifests/configmap.yaml
configmap/postgres-operator created
yury@u2004s01:~/postgres-operator$ kubectl create -f manifests/operator-service-account-rbac.yaml
serviceaccount/postgres-operator created
clusterrole.rbac.authorization.k8s.io/postgres-operator created
clusterrolebinding.rbac.authorization.k8s.io/postgres-operator created
clusterrole.rbac.authorization.k8s.io/postgres-pod created
yury@u2004s01:~/postgres-operator$ kubectl create -f manifests/postgres-operator.yaml
deployment.apps/postgres-operator created
yury@u2004s01:~/postgres-operator$ kubectl create -f manifests/api-service.yaml
service/postgres-operator created
yury@u2004s01:~/postgres-operator$ kubectl get pods -n default -o wide
NAME                                 READY   STATUS    RESTARTS   AGE    IP              NODE       NOMINATED NODE   READINESS GATES
postgres-operator-849dddc998-gbhcg   1/1     Running   0          4m5s   10.32.121.129   u2004s04   <none>           <none>
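
Before moving on, it can help to confirm that every pod in a listing like the one above is fully ready. The helper below is a minimal sketch, not part of the operator: `pods_all_ready` is a hypothetical name, and it only inspects the READY and STATUS columns of `kubectl get pods` output piped in on stdin.

```shell
# Hypothetical helper: succeed only if at least one pod row is read from
# stdin and every row shows READY as n/n and STATUS as Running.
pods_all_ready() {
  awk '
    NR > 1 && NF > 0 {
      seen = 1
      split($2, ready, "/")                 # READY column, e.g. 1/1
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { exit (bad || !seen) }
  '
}

# Usage against a live cluster (commented out because it needs kubectl):
# kubectl get pods -n default | pods_all_ready && echo "operator pod is ready"
```

The function returns a non-zero exit code for an empty listing as well, so it can be used in a retry loop while the deployment is still starting.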

Installing Zalando Postgres operator UI

kubectl apply -f ui/manifests/
The responses (WITH ERROR):
yury@u2004s01:~/postgres-operator$ kubectl apply -f ui/manifests/
deployment.apps/postgres-operator-ui created
ingress.networking.k8s.io/postgres-operator-ui created
service/postgres-operator-ui created
serviceaccount/postgres-operator-ui created
clusterrole.rbac.authorization.k8s.io/postgres-operator-ui created
clusterrolebinding.rbac.authorization.k8s.io/postgres-operator-ui created
error: unable to recognize "ui/manifests/kustomization.yaml": no matches for kind "Kustomization" in version "kustomize.config.k8s.io/v1beta1"

yury@u2004s01:~/postgres-operator$ kubectl get pods -n default -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP              NODE       NOMINATED NODE   READINESS GATES
postgres-operator-849dddc998-gbhcg      1/1     Running   0          49m   10.32.121.129   u2004s04   <none>           <none>
postgres-operator-ui-5889cfdc78-vx7zb   1/1     Running   0          20m   10.32.105.1     u2004s03   <none>           <none>
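
The `Kustomization` error in the responses appears because `kubectl apply -f` treats every YAML file in `ui/manifests/`, including `kustomization.yaml`, as a cluster resource. The other resources are still created, as the listing shows. A directory that carries a kustomization file is normally applied with `-k` instead. The sketch below picks the flag; `apply_flag_for` is a hypothetical helper name, and only the two common file names are checked.

```shell
# Hypothetical helper: print the kubectl apply flag suited to a manifest
# directory: -k when it holds a kustomization file, -f otherwise.
apply_flag_for() {
  dir="$1"
  if [ -f "$dir/kustomization.yaml" ] || [ -f "$dir/kustomization.yml" ]; then
    printf '%s\n' "-k"
  else
    printf '%s\n' "-f"
  fi
}

# Usage against a live cluster (commented out because it needs kubectl):
# kubectl apply "$(apply_flag_for ui/manifests)" ui/manifests
```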
  • for u2004s01
kubectl port-forward --address 0.0.0.0 svc/postgres-operator-ui 8081:80
  • or
kubectl port-forward --address 0.0.0.0 pod/postgres-operator-ui-5889cfdc78-vx7zb 8081:80
  • we could not connect to the pod from the browser
Log of the postgres-operator-ui pod:
operator_ui.spiloutils INFO     Common Cluster Label: {"application":"spilo"}
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 934, in emit
    self.socket.send(msg)
  File "/usr/lib/python3.8/site-packages/gevent/_socketcommon.py", line 722, in send
    return self._sock.send(data, flags)
  File "/usr/lib/python3.8/site-packages/gevent/_socket3.py", line 55, in _dummy
    raise OSError(EBADF, 'Bad file descriptor')
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 855, in _connect_unixsocket
    self.socket.connect(address)
  File "/usr/lib/python3.8/site-packages/gevent/_socketcommon.py", line 628, in connect
    raise _SocketError(result, strerror(result))
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 937, in emit
    self._connect_unixsocket(self.address)
  File "/usr/lib/python3.8/logging/handlers.py", line 866, in _connect_unixsocket
    self.socket.connect(address)
  File "/usr/lib/python3.8/site-packages/gevent/_socketcommon.py", line 628, in connect
    raise _SocketError(result, strerror(result))
FileNotFoundError: [Errno 2] No such file or directory
Call stack:
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/operator_ui/__main__.py", line 1, in <module>
    from .main import main
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/operator_ui/main.py", line 43, in <module>
    from .spiloutils import (
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/operator_ui/spiloutils.py", line 26, in <module>
    logger.info("Common Cluster Label: {}".format(COMMON_CLUSTER_LABEL))
Message: 'Common Cluster Label: {"application":"spilo"}'
Arguments: ()
operator_ui.spiloutils INFO     Common Pooler Label: {"application":"db-connection-pooler"}
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 934, in emit
    self.socket.send(msg)
  File "/usr/lib/python3.8/site-packages/gevent/_socketcommon.py", line 722, in send
    return self._sock.send(data, flags)
  File "/usr/lib/python3.8/site-packages/gevent/_socket3.py", line 55, in _dummy
    raise OSError(EBADF, 'Bad file descriptor')
OSError: [Errno 9] Bad file descriptor
...
  • for u2004s01
kubectl delete -f ui/manifests/

Create a Postgres cluster

First attempt

  • now we use manifests/minimal-postgres-manifest.yaml without changes
  • for u2004s01
kubectl apply -f- <<EOF
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
EOF
  • both pods are Running, but the cluster STATUS is CreateFailed
yury@u2004s01:~/postgres-operator$ kubectl get pods -n default -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP              NODE       NOMINATED NODE   READINESS GATES
acid-minimal-cluster-0               1/1     Running   0          5m45s   10.32.105.3     u2004s03   <none>           <none>
acid-minimal-cluster-1               1/1     Running   0          4m11s   10.32.121.131   u2004s04   <none>           <none>
postgres-operator-849dddc998-gbhcg   1/1     Running   0          109m    10.32.121.129   u2004s04   <none>           <none>

yury@u2004s01:~/postgres-operator$ kubectl get postgresql
NAME                   TEAM   VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE     STATUS
acid-minimal-cluster   acid   14        2      1Gi                                     7m20s   CreateFailed

yury@u2004s01:~/postgres-operator$ kubectl get svc -l application=spilo -L spilo-role
NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE     SPILO-ROLE
acid-minimal-cluster          ClusterIP   10.110.78.245    <none>        5432/TCP   8m4s    master
acid-minimal-cluster-config   ClusterIP   None             <none>        <none>     6m26s
acid-minimal-cluster-repl     ClusterIP   10.109.123.102   <none>        5432/TCP   8m4s    replica
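
The pods can be Running while the `postgresql` resource itself reports a failed status, as in the listings above. A small filter makes that mismatch easy to spot. This is only a sketch: `unhealthy_clusters` is a hypothetical helper, and treating `Running` as the only healthy STATUS value is an assumption about the operator's status names.

```shell
# Hypothetical helper: read `kubectl get postgresql` output on stdin and
# print "name status" for every cluster whose STATUS (last column) is
# not Running. The empty CPU-REQUEST and MEMORY-REQUEST columns do not
# matter because only the first and last fields are used.
unhealthy_clusters() {
  awk 'NR > 1 && NF > 0 && $NF != "Running" { print $1, $NF }'
}

# Usage against a live cluster (commented out because it needs kubectl):
# kubectl get postgresql | unhealthy_clusters
```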

Second attempt

  • first, we delete the cluster
    • for u2004s01
kubectl delete -f- <<EOF
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
EOF
  • second, we try to create the cluster with volume.size=2Gi
    • for u2004s01
kubectl create -f- <<EOF
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 2Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
EOF
  • Here is the result
  • the STATUS is CreateFailed again
yury@u2004s01:~/postgres-operator$ kubectl get pods -n default -o wide
NAME                                 READY   STATUS    RESTARTS   AGE    IP              NODE       NOMINATED NODE   READINESS GATES
acid-minimal-cluster-0               1/1     Running   0          33s    10.32.105.6     u2004s03   <none>           <none>
acid-minimal-cluster-1               1/1     Running   0          20s    10.32.121.134   u2004s04   <none>           <none>
postgres-operator-849dddc998-gbhcg   1/1     Running   0          115m   10.32.121.129   u2004s04   <none>           <none>

yury@u2004s01:~/postgres-operator$ kubectl get postgresql
NAME                   TEAM   VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE    STATUS
acid-minimal-cluster   acid   14        2      2Gi                                     9m3s   CreateFailed

yury@u2004s01:~/postgres-operator$ kubectl get postgresql
NAME                   TEAM   VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
acid-minimal-cluster   acid   14        2      2Gi                                     32m   CreateFailed
Log of acid-minimal-cluster-0 (container postgres):
2022-01-23 20:52:53,076 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2022-01-23 20:52:55,091 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2022-01-23 20:52:55,094 - bootstrapping - INFO - No meta-data available for this provider
2022-01-23 20:52:55,095 - bootstrapping - INFO - Looks like your running local
2022-01-23 20:52:55,160 - bootstrapping - INFO - Configuring pam-oauth2
2022-01-23 20:52:55,161 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2022-01-23 20:52:55,161 - bootstrapping - INFO - Configuring certificate
2022-01-23 20:52:55,161 - bootstrapping - INFO - Generating ssl self-signed certificate
2022-01-23 20:52:55,414 - bootstrapping - INFO - Configuring wal-e
2022-01-23 20:52:55,414 - bootstrapping - INFO - Configuring standby-cluster
2022-01-23 20:52:55,414 - bootstrapping - INFO - Configuring pgbouncer
2022-01-23 20:52:55,415 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2022-01-23 20:52:55,415 - bootstrapping - INFO - Configuring crontab
2022-01-23 20:52:55,415 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2022-01-23 20:52:55,416 - bootstrapping - INFO - Configuring bootstrap
2022-01-23 20:52:55,416 - bootstrapping - INFO - Configuring pgqd
2022-01-23 20:52:55,417 - bootstrapping - INFO - Configuring patroni
2022-01-23 20:52:55,441 - bootstrapping - INFO - Writing to file /run/postgres.yml
2022-01-23 20:52:55,450 - bootstrapping - INFO - Configuring log
2022-01-23 20:52:57,092 INFO: Selected new K8s API server endpoint https://192.168.100.61:6443
2022-01-23 20:52:57,143 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-01-23 20:52:57,160 INFO: Lock owner: None; I am acid-minimal-cluster-0
2022-01-23 20:52:57,391 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /home/postgres/pgdata/pgroot/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /usr/lib/postgresql/14/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data -l logfile start

2022-01-23 20:53:01,148 INFO: postmaster pid=80
/var/run/postgresql:5432 - no response
2022-01-23 20:53:01 UTC [80]: [1-1] 61edc02d.50 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2022-01-23 20:53:01 UTC [80]: [2-1] 61edc02d.50 0     LOG:  pg_stat_kcache.linux_hz is set to 250000
2022-01-23 20:53:01 UTC [80]: [3-1] 61edc02d.50 0     LOG:  redirecting log output to logging collector process
2022-01-23 20:53:01 UTC [80]: [4-1] 61edc02d.50 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2022-01-23 20:53:02,216 INFO: establishing a new patroni connection to the postgres cluster
2022-01-23 20:53:02,282 INFO: running post_bootstrap
DO
DO
DO
CREATE EXTENSION
NOTICE:  version "1.1" of extension "pg_auth_mon" is already installed
ALTER EXTENSION
GRANT
CREATE EXTENSION
NOTICE:  version "1.4" of extension "pg_cron" is already installed
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
GRANT
ERROR:  cannot change name of input parameter "job_name"
HINT:  Use DROP FUNCTION cron.schedule(text,text,text) first.
REVOKE
GRANT
REVOKE
GRANT
REVOKE
GRANT
GRANT
CREATE EXTENSION
DO
CREATE TABLE
GRANT
ALTER TABLE
ALTER TABLE
ALTER TABLE
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
RESET
SET
NOTICE:  schema "zmon_utils" does not exist, skipping
DROP SCHEMA
DO
NOTICE:  language "plpythonu" does not exist, skipping
DROP LANGUAGE
NOTICE:  function plpython_call_handler() does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_inline_handler(internal) does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_validator(oid) does not exist, skipping
DROP FUNCTION
CREATE SCHEMA
GRANT
SET
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
GRANT
You are now connected to database "postgres" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE:  version "3.0" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
GRANT
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
You are now connected to database "template1" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE:  version "3.0" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
GRANT
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
2022-01-23 20:53:10,277 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2022-01-23 20:53:10,541 INFO: initialized a new cluster
2022-01-23 20:53:22,031 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 20:53:23,829 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 20:53:34,336 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 20:53:44,483 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 20:53:54,292 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 20:53:56.590 35 LOG Starting pgqd 3.3
2022-01-23 20:53:56.591 35 LOG auto-detecting dbs ...
2022-01-23 20:54:04,418 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 20:54:14,327 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 20:54:24,404 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock

...

2022-01-23 21:34:27.791 35 LOG {ticks: 0, maint: 0, retry: 0}
2022-01-23 21:34:34,218 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 21:34:44,271 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-01-23 21:34:54,220 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
Log of acid-minimal-cluster-1 (container postgres):
2022-01-23 20:53:05,517 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2022-01-23 20:53:07,528 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2022-01-23 20:53:07,531 - bootstrapping - INFO - No meta-data available for this provider
2022-01-23 20:53:07,532 - bootstrapping - INFO - Looks like your running local
2022-01-23 20:53:07,639 - bootstrapping - INFO - Configuring log
2022-01-23 20:53:07,640 - bootstrapping - INFO - Configuring standby-cluster
2022-01-23 20:53:07,640 - bootstrapping - INFO - Configuring pgqd
2022-01-23 20:53:07,641 - bootstrapping - INFO - Configuring wal-e
2022-01-23 20:53:07,641 - bootstrapping - INFO - Configuring crontab
2022-01-23 20:53:07,642 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2022-01-23 20:53:07,643 - bootstrapping - INFO - Configuring pam-oauth2
2022-01-23 20:53:07,644 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2022-01-23 20:53:07,645 - bootstrapping - INFO - Configuring bootstrap
2022-01-23 20:53:07,645 - bootstrapping - INFO - Configuring patroni
2022-01-23 20:53:07,682 - bootstrapping - INFO - Writing to file /run/postgres.yml
2022-01-23 20:53:07,690 - bootstrapping - INFO - Configuring pgbouncer
2022-01-23 20:53:07,691 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2022-01-23 20:53:07,691 - bootstrapping - INFO - Configuring certificate
2022-01-23 20:53:07,691 - bootstrapping - INFO - Generating ssl self-signed certificate
2022-01-23 20:53:08,841 INFO: Selected new K8s API server endpoint https://192.168.100.61:6443
2022-01-23 20:53:08,897 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-01-23 20:53:08,903 INFO: Lock owner: None; I am acid-minimal-cluster-1
2022-01-23 20:53:09,103 INFO: waiting for leader to bootstrap
2022-01-23 20:53:10,424 INFO: Lock owner: acid-minimal-cluster-0; I am acid-minimal-cluster-1
2022-01-23 20:53:10,428 INFO: trying to bootstrap from leader 'acid-minimal-cluster-0'
2022-01-23 20:53:10,431 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-01-23 20:53:20,867 INFO: Lock owner: acid-minimal-cluster-0; I am acid-minimal-cluster-1
2022-01-23 20:53:21,035 INFO: bootstrap from leader 'acid-minimal-cluster-0' in progress
1024+0 records in
1024+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0411527 s, 408 MB/s
2022-01-23 20:53:23,684 INFO: Lock owner: acid-minimal-cluster-0; I am acid-minimal-cluster-1
2022-01-23 20:53:23,685 INFO: bootstrap from leader 'acid-minimal-cluster-0' in progress
NOTICE:  all required WAL segments have been archived
2022-01-23 20:53:25,265 INFO: replica has been created using basebackup_fast_xlog
2022-01-23 20:53:25,267 INFO: bootstrapped from leader 'acid-minimal-cluster-0'
2022-01-23 20:53:25,820 INFO: postmaster pid=94
/var/run/postgresql:5432 - no response
2022-01-23 20:53:25 UTC [94]: [1-1] 61edc045.5e 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2022-01-23 20:53:25 UTC [94]: [2-1] 61edc045.5e 0     LOG:  pg_stat_kcache.linux_hz is set to 333333
2022-01-23 20:53:25 UTC [94]: [3-1] 61edc045.5e 0     LOG:  redirecting log output to logging collector process
2022-01-23 20:53:25 UTC [94]: [4-1] 61edc045.5e 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - accepting connections
2022-01-23 20:53:27,920 INFO: Lock owner: acid-minimal-cluster-0; I am acid-minimal-cluster-1
2022-01-23 20:53:27,921 INFO: establishing a new patroni connection to the postgres cluster
2022-01-23 20:53:27,970 INFO: no action. I am a secondary (acid-minimal-cluster-1) and following a leader (acid-minimal-cluster-0)
2022-01-23 20:53:34,386 INFO: no action. I am a secondary (acid-minimal-cluster-1) and following a leader (acid-minimal-cluster-0)
...
2022-01-23 21:42:34,454 INFO: no action. I am a secondary (acid-minimal-cluster-1) and following a leader (acid-minimal-cluster-0)
2022-01-23 21:42:44,312 INFO: no action. I am a secondary (acid-minimal-cluster-1) and following a leader (acid-minimal-cluster-0)
  • after a while the STATUS changed to SyncFailed
yury@u2004s01:~/postgres-operator$ kubectl get postgresql
NAME                   TEAM   VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
acid-minimal-cluster   acid   14        2      2Gi                                     58m   SyncFailed

After restarting master pod

  • our cluster was deployed to the nodes as follows
    • u2004s03 (acid-minimal-cluster-0)
      • acid-minimal-cluster-0 had the master role
    • u2004s04 (acid-minimal-cluster-1)
      • acid-minimal-cluster-1 had the replica role
  • the u2004s03 machine was powered off
    • with the sudo poweroff command
    • after a while the u2004s03 machine was started again
  • Here is the result, which is what we expected
    • acid-minimal-cluster-1 has become the master:
yury@u2004s01:~$ kubectl get pods -l application=spilo -L spilo-role
NAME                     READY   STATUS    RESTARTS   AGE   SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          11h   replica
acid-minimal-cluster-1   1/1     Running   0          12h   master
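
The role check after a failover can be scripted. The following is a minimal sketch, assuming the `spilo-role` label column is the last one in the output, as in the listing above; `current_master` is a hypothetical helper name.

```shell
# Hypothetical helper: read the output of
#   kubectl get pods -l application=spilo -L spilo-role
# on stdin and print the name of each pod whose role is master.
current_master() {
  awk 'NR > 1 && $NF == "master" { print $1 }'
}

# Usage against a live cluster (commented out because it needs kubectl):
# kubectl get pods -l application=spilo -L spilo-role | current_master
```

Comparing the value before and after the node is powered off shows whether the master role has moved to another pod.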

Service resources

  • for u2004s01
yury@u2004s01:~$ kubectl get svc -l application=spilo -L spilo-role
NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE   SPILO-ROLE
acid-minimal-cluster          ClusterIP   10.104.184.254   <none>        5432/TCP   12h   master
acid-minimal-cluster-config   ClusterIP   None             <none>        <none>     12h
acid-minimal-cluster-repl     ClusterIP   10.105.27.187    <none>        5432/TCP   12h   replica

Node Affinity

Deletion script:
kubectl delete -f- <<EOF
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 2Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
EOF
  • next, we try to create the cluster with nodeAffinity
    • Note: the postgresql manifest does not use the affinity.nodeAffinity chain. Instead, nodeAffinity is set directly under spec.
  • for u2004s01
kubectl apply -f- <<EOF
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 2Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - u2004s02
          - u2004s03
EOF
The result:
yury@u2004s01:~$ kubectl get pods -n default -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP              NODE       NOMINATED NODE   READINESS GATES
acid-minimal-cluster-0               1/1     Running   0          13m   10.32.105.14    u2004s03   <none>           <none>
acid-minimal-cluster-1               1/1     Running   0          13m   10.32.27.195    u2004s02   <none>           <none>
postgres-operator-849dddc998-gbhcg   1/1     Running   0          15h   10.32.121.129   u2004s04   <none>           <none>

yury@u2004s01:~$ kubectl get pods -l application=spilo -L spilo-role -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES   SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          15m   10.32.105.14   u2004s03   <none>           <none>            master
acid-minimal-cluster-1   1/1     Running   0          15m   10.32.27.195   u2004s02   <none>           <none>            replica

yury@u2004s01:~$ kubectl get svc -l application=spilo -L spilo-role -o wide
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   SELECTOR                                                                 SPILO-ROLE
acid-minimal-cluster          ClusterIP   10.104.30.236   <none>        5432/TCP   16m   <none>                                                                   master
acid-minimal-cluster-config   ClusterIP   None            <none>        <none>     16m   <none>
acid-minimal-cluster-repl     ClusterIP   10.101.194.33   <none>        5432/TCP   16m   application=spilo,cluster-name=acid-minimal-cluster,spilo-role=replica   replica

yury@u2004s01:~$ kubectl get postgresql -o wide
NAME                   TEAM   VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
acid-minimal-cluster   acid   14        2      2Gi                                     17m   CreateFailed
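
To confirm that the nodeAffinity rule was honoured, the NODE column of the Spilo pods can be compared with the allowed host names. The sketch below is an illustration only: `pods_off_nodes` is a hypothetical helper, and it assumes NODE is the seventh column of `kubectl get pods -o wide`, which holds only while the RESTARTS column is a plain number.

```shell
# Hypothetical helper: read `kubectl get pods -o wide` output on stdin and
# print "pod node" for every pod that runs outside the allowed nodes.
# $1 is a space-separated list of allowed node names.
# Assumption: NODE is field 7 (RESTARTS has no "(... ago)" suffix).
pods_off_nodes() {
  awk -v allowed="$1" '
    BEGIN {
      n = split(allowed, names, " ")
      for (i = 1; i <= n; i++) ok[names[i]] = 1
    }
    NR > 1 && NF > 0 && !($7 in ok) { print $1, $7 }
  '
}

# Usage against a live cluster (commented out because it needs kubectl):
# kubectl get pods -l application=spilo -o wide | pods_off_nodes "u2004s02 u2004s03"
```

An empty result means every listed pod runs on one of the allowed nodes.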

Restarting master pod

  • for u2004s03
    • we do not use sudo reboot. Instead, we run the command below, wait for a while after the virtual machine has shut down, and then turn u2004s03 on again.
sudo poweroff
  • The cluster works as expected
    • acid-minimal-cluster-1 now has the master role
The result:
yury@u2004s01:~$ kubectl get pods -l application=spilo -L spilo-role -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES   SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          4m18s   10.32.105.16   u2004s03   <none>           <none>            replica
acid-minimal-cluster-1   1/1     Running   0          40m     10.32.27.195   u2004s02   <none>           <none>            master

Using Service Resources

  • acid-minimal-cluster should be used for SQL-write operations
  • acid-minimal-cluster-repl should be used for SQL-select operations
yury@u2004s01:~$ kubectl get svc -l application=spilo -L spilo-role
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   SPILO-ROLE
acid-minimal-cluster          ClusterIP   10.104.30.236   <none>        5432/TCP   48m   master
acid-minimal-cluster-config   ClusterIP   None            <none>        <none>     47m
acid-minimal-cluster-repl     ClusterIP   10.101.194.33   <none>        5432/TCP   48m   replica
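
Inside the cluster, a client would normally reach these services through their DNS names rather than the cluster IPs. The sketch below builds such a name from the cluster name, namespace and role. `pg_service_host` is a hypothetical helper, and the `cluster.local` suffix is an assumption that matches the default cluster domain and the host name that appears in the operator log on this page.

```shell
# Hypothetical helper: print the in-cluster DNS name of the service to use.
#   master  -> <cluster>.<namespace>.svc.cluster.local       (writes)
#   replica -> <cluster>-repl.<namespace>.svc.cluster.local  (selects)
# Assumption: the cluster domain is the default cluster.local.
pg_service_host() {
  cluster="$1"
  namespace="$2"
  role="$3"
  case "$role" in
    master)  printf '%s.%s.svc.cluster.local\n' "$cluster" "$namespace" ;;
    replica) printf '%s-repl.%s.svc.cluster.local\n' "$cluster" "$namespace" ;;
    *)       echo "unknown role: $role" >&2; return 1 ;;
  esac
}

# Usage from a pod inside the cluster (commented out, needs a running database):
# psql -h "$(pg_service_host acid-minimal-cluster default replica)" -U postgres
```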

Create Database

kubectl exec --stdin --tty acid-minimal-cluster-1 -- /bin/bash
psql -U postgres
\l
SELECT datname FROM pg_database;
create database mydb;
\quit
exit
  • Now we log in to the replica pod
kubectl exec --stdin --tty acid-minimal-cluster-0 -- /bin/bash
psql -U postgres
SELECT datname FROM pg_database;
\quit
exit
The responses:
yury@u2004s01:~$ kubectl exec --stdin --tty acid-minimal-cluster-0 -- /bin/bash

 ____        _ _
/ ___| _ __ (_) | ___
\___ \| '_ \| | |/ _ \
 ___) | |_) | | | (_) |
|____/| .__/|_|_|\___/
      |_|

This container is managed by runit, when stopping/starting services use sv

Examples:

sv stop cron
sv restart patroni

Current status: (sv status /etc/service/*)

run: /etc/service/patroni: (pid 27) 4047s
run: /etc/service/pgqd: (pid 28) 4047s
root@acid-minimal-cluster-0:/home/postgres# psql -U postgres
psql (14.0 (Ubuntu 14.0-1.pgdg18.04+1))
Type "help" for help.

postgres=# SELECT datname FROM pg_database;
  datname
-----------
 postgres
 mydb
 template1
 template0
(4 rows)

postgres=# \quit
root@acid-minimal-cluster-0:/home/postgres# exit
exit

SyncFailed Status

  • we tested replication and it works, so why is the STATUS == SyncFailed?
yury@u2004s01:~$ kubectl get postgresql -o wide
NAME                   TEAM   VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE    STATUS
acid-minimal-cluster   acid   14        2      2Gi                                     110m   SyncFailed
Fragment of the operator log:
time="2022-01-24T12:13:22Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:13:37Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:13:52Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:14:07Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:14:22Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:14:37Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:14:52Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:15:07Z" level=warning msg="could not connect to Postgres database: dial tcp: lookup acid-minimal-cluster.default.svc.cluster.local on 10.96.0.10:53: no such host" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:15:07Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2022-01-24T12:15:07Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=default/acid-minimal-cluster pkg=controller worker=0
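
The log fragment suggests that the operator itself cannot resolve the master service name through the cluster DNS at 10.96.0.10, which would explain a SyncFailed status while replication between the pods still works. To see at a glance which names fail to resolve, the warnings can be reduced to a list of hosts. This is a minimal sketch: `unresolved_hosts` is a hypothetical helper that only matches the "lookup ... no such host" wording shown above.

```shell
# Hypothetical helper: read operator log lines on stdin and print each
# distinct host name that appears in a "lookup <host> on <dns>: no such host"
# message.
unresolved_hosts() {
  sed -n 's/.*lookup \([^ ]*\) on [^ ]*: no such host.*/\1/p' | sort -u
}

# Usage against a live cluster (commented out because it needs kubectl):
# kubectl logs deployment/postgres-operator | unresolved_hosts
```

Any name printed here is one that the DNS server named in the log could not resolve from the operator pod.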

Cluster manifest reference
