Steps taken to test backup and restore

Create an umbrella for the backup

SCENARIO=chef-backend PLATFORM=ubuntu-18.04 INSTALL_VERSION=14.11.31 UPGRADE_VERSION=14.11.36 BACKEND_VERSION=2.2.0 ENABLE_IPV6=false ENABLE_ADDON_PUSH_JOBS=false ENABLE_GATHER_LOGS_TEST=false ENABLE_PEDANT_TEST=false ENABLE_PSQL_TEST=false ENABLE_SMOKE_TEST=false make apply

Add data

Log in to the front end (FE) and run:
chef-server-ctl user-create -f /tmp/admin.pem admin Admin User [email protected] password; chef-server-ctl org-create -f /tmp/test-validator.pem test Test; chef-server-ctl org-user-add test admin;
mkdir ~/.chef; cp /tmp/admin.pem ~/.chef/; vi ~/.chef/knife.rb
export PATH=$PATH:/opt/opscode/embedded/bin; knife ssl fetch; knife node create FOO -d; knife node create Foo -d; knife node create foo -d; knife node create bar -d;
chef-server-ctl user-list; chef-server-ctl org-list; knife node list;
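
The contents of ~/.chef/knife.rb are not shown in these notes. As a minimal sketch, assuming the admin user and test organization created above, the file could be generated as follows. The helper name write_knife_rb and the front-end hostname argument are hypothetical; node_name, client_key, and chef_server_url are standard knife settings.

```shell
# Sketch: write a minimal knife.rb for the "admin" user and "test" org.
# write_knife_rb is a hypothetical helper; pass the target directory and
# the front-end hostname (the hostname is an assumption, not from the notes).
write_knife_rb() {
  dir="$1"
  fe_host="$2"
  mkdir -p "$dir"
  cat > "$dir/knife.rb" <<EOF
node_name       'admin'
client_key      '$dir/admin.pem'
chef_server_url 'https://$fe_host/organizations/test'
EOF
}

# Example (run on the front end):
#   write_knife_rb "$HOME/.chef" "$(hostname -f)"
```

The key path matches the cp /tmp/admin.pem ~/.chef/ step, so the same directory holds both the key and the configuration.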

Take a backup on the follower

Log in to one of the followers.
Run chef-backend-ctl backup.
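
The restore step needs the path of the archive that the backup produced. A small sketch for finding it, assuming the archives land in /var/opt/chef-backup with the timestamped chef-backup-*.tgz names seen in the restore command in these notes. latest_backup is a hypothetical helper name.

```shell
# Sketch: print the newest backup archive in a directory.
# Relies on the timestamped names (chef-backup-YYYY-MM-DD-HH-MM-SS.tgz)
# sorting chronologically when sorted as plain text.
latest_backup() {
  dir="${1:-/var/opt/chef-backup}"
  ls "$dir"/chef-backup-*.tgz 2>/dev/null | sort | tail -n 1
}

# Example:
#   latest_backup /var/opt/chef-backup
```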

Do the restore - testing on the same instance.

Ran chef-backend-ctl restore /var/opt/chef-backup/chef-backup-2022-01-07-11-52-19.tgz, but it failed with the following error:

Would you like to proceed? (y/n)
y
 ✓ Verifying backup has required components
 ✓ Verifying backup has required components
 ✓ Unpacking backup to temporary directory
 ✓ Removing existing data directoriest node
 ✓ Rewriting configuration for current node
 ✓ Restoring configuration cluster
 ✗ Create new Chef Backend cluster
   Restoring PostgreSQL data
   Starting up Chef Backendles
 ✓ Cleaning Up Temporary Files

An error occurred during this operation:

Restore failed:

    Timed out waiting for cluster to be ready.
root@ip-10-0-10-189:~#

The status after the failed restore was as follows:

root@ip-10-0-10-189:~# 
root@ip-10-0-10-189:~# 
root@ip-10-0-10-189:~# chef-backend-ctl cluster-status
Name            IP           GUID                              Role    PG      ES      Blocked      Eligible
ip-10-0-10-189  10.0.10.189  e62b212424b293375261e5d5ce0bf81e  leader  leader  master  not_blocked  true    
root@ip-10-0-10-189:~# 
root@ip-10-0-10-189:~# 
root@ip-10-0-10-189:~# 
root@ip-10-0-10-189:~# chef-backend-ctl status
Service        Local Status         Time in State  Distributed Node Status     
leaderl        running (pid 10560)  0d 0h 6m 4s    Error: no cluster configured
epmd           running (pid 10410)  0d 0h 6m 15s   Error: no cluster configured
etcd           running (pid 10352)  0d 0h 6m 17s   Error: no cluster configured
postgresql     running (pid 10631)  0d 0h 6m 2s    Error: no cluster configured
elasticsearch  running (pid 10440)  0d 0h 6m 14s   Error: no cluster configured

System  Local Status                                          Distributed Node Status          
disks   /var/log/chef-backend: OK; /var/opt/chef-backend: OK  health: green; healthy nodes: 1/1
root@ip-10-0-10-189:~# 
root@ip-10-0-10-189:~#
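
The status output above shows every service running locally while the distributed status column reports "Error: no cluster configured". One way to catch that state in a script is to scan the status output for the message. This is only a sketch: cluster_configured is a hypothetical helper that reads the output of chef-backend-ctl status on standard input.

```shell
# Sketch: succeed only if no line of the status output reports
# "no cluster configured". Reads `chef-backend-ctl status` output on stdin.
cluster_configured() {
  ! grep -q 'no cluster configured'
}

# Example:
#   chef-backend-ctl status | cluster_configured || echo "cluster not configured"
```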

Tried cluster join from the other follower node

Copy /etc/chef-backend/chef-backend-secrets.json from the first node to /tmp/chef-backend-secrets.json on the joining node, then run the following on the joining node:
chef-backend-ctl cleanse
chef-backend-ctl join-cluster --accept-license --yes --quiet 10.0.10.189 -p 10.0.4.86 -s /tmp/chef-backend-secrets.json
The join was successful, and the status was as follows:

root@ip-10-0-4-86:~# 
root@ip-10-0-4-86:~# chef-backend-ctl cluster-status
Name            IP           GUID                              Role      PG        ES          Blocked      Eligible
ip-10-0-4-86    10.0.4.86    8d7db929361e812c3e0964f17b90096a  follower  follower  not_master  not_blocked  true    
ip-10-0-10-189  10.0.10.189  e62b212424b293375261e5d5ce0bf81e  leader    leader    master      not_blocked  true    
root@ip-10-0-4-86:~# 
root@ip-10-0-4-86:~# 
root@ip-10-0-4-86:~# 
root@ip-10-0-4-86:~# chef-backend-ctl status
Service        Local Status         Time in State  Distributed Node Status                     
leaderl        running (pid 13107)  0d 0h 0m 40s   leader: 1; waiting: 0; follower: 1; total: 2
epmd           running (pid 13083)  0d 0h 0m 41s   status: local-only                          
etcd           running (pid 12963)  0d 0h 1m 16s   health: green; healthy nodes: 2/2           
postgresql     running (pid 13246)  0d 0h 0m 36s   leader: 1; offline: 0; syncing: 0; synced: 1
elasticsearch  running (pid 13104)  0d 0h 0m 42s   state: green; nodes online: 2/2             

System  Local Status                                          Distributed Node Status          
disks   /var/log/chef-backend: OK; /var/opt/chef-backend: OK  health: green; healthy nodes: 2/2
root@ip-10-0-4-86:~#

Tried to connect to the other remaining node

This node was the leader of the previous cluster.
Tried the same steps as for the first follower. The first attempt failed because the wrong leader IP address was used, so I moved on to the next step below.
Tried again after doing the next step below and correcting the IP address (thank you, Prajaktha!), and the join was successful.
Status at the end:

root@ip-10-0-1-226:~# 
root@ip-10-0-1-226:~# chef-backend-ctl status
Service        Local Status         Time in State  Distributed Node Status                     
leaderl        running (pid 13273)  0d 0h 0m 50s   leader: 1; waiting: 0; follower: 2; total: 3
epmd           running (pid 13248)  0d 0h 0m 52s   status: local-only                          
etcd           running (pid 13128)  0d 0h 1m 25s   health: green; healthy nodes: 3/3           
postgresql     running (pid 13412)  0d 0h 0m 46s   leader: 1; offline: 0; syncing: 0; synced: 2
elasticsearch  running (pid 13270)  0d 0h 0m 52s   state: green; nodes online: 3/3             

System  Local Status                                          Distributed Node Status          
disks   /var/log/chef-backend: OK; /var/opt/chef-backend: OK  health: green; healthy nodes: 3/3
root@ip-10-0-1-226:~# 
root@ip-10-0-1-226:~# 
root@ip-10-0-1-226:~# chef-backend-ctl cluster-status
Name            IP           GUID                              Role      PG        ES          Blocked      Eligible
ip-10-0-1-226   10.0.1.226   37b13f086ea76a33cc635da13322888a  follower  follower  not_master  not_blocked  true    
ip-10-0-10-189  10.0.10.189  e62b212424b293375261e5d5ce0bf81e  leader    leader    master      not_blocked  true    
ip-10-0-4-86    10.0.4.86    8d7db929361e812c3e0964f17b90096a  follower  follower  not_master  not_blocked  true    
root@ip-10-0-1-226:~#
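
The first join attempt on this node failed because of a wrong leader IP address. One way to avoid mistyping it is to read the leader's IP from cluster-status on a node that is already in the cluster. A sketch, assuming the column layout shown in the output above (IP in the second column, Role in the fourth). leader_ip is a hypothetical helper.

```shell
# Sketch: print the IP of the node whose Role column is "leader".
# Reads `chef-backend-ctl cluster-status` output on stdin.
leader_ip() {
  awk '$4 == "leader" { print $2 }'
}

# Example (run on a node that is already in the cluster):
#   chef-backend-ctl cluster-status | leader_ip
```

The printed address is the value passed as the first argument to join-cluster in the commands used earlier.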

Status of the FE at this point

root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
 Internal Services 
-------------------
run: bookshelf: (pid 18182) 5093s; run: log: (pid 17978) 5140s
run: haproxy: (pid 18131) 5094s; run: log: (pid 3311) 5194s
run: nginx: (pid 2380) 4703s; run: log: (pid 18113) 5108s
run: oc_bifrost: (pid 18136) 5094s; run: log: (pid 17792) 5174s
run: oc_id: (pid 18159) 5093s; run: log: (pid 17825) 5166s
run: opscode-erchef: (pid 18265) 5092s; run: log: (pid 18075) 5136s
run: redis_lb: (pid 2099) 4757s; run: log: (pid 18321) 5091s
-------------------
 External Services 
-------------------

down: elasticsearch: failed to connect to http://127.0.0.1:9200: 404 "Not Found"

run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
WARN: Server returned error 503 for https://127.0.0.1/users, retrying 1/5 in 4s
WARN: Server returned error 503 for https://127.0.0.1/users, retrying 2/5 in 8s
^CTraceback (most recent call last):
	7: from /usr/bin/chef-server-ctl:180:in `<main>'
	6: from /usr/bin/chef-server-ctl:180:in `load'
	5: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/chef-server-ctl-1.1.0/bin/chef-server-ctl:337:in `<top (required)>'
	4: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:745:in `run'
	3: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:203:in `block in add_command_under_category'
	2: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/chef-server-ctl-1.1.0/plugins/wrap-knife-opc.rb:43:in `block (2 levels) in load_file'
	1: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:237:in `run_command'
/opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:237:in `system': Interrupt

root@ip-10-0-10-216:~#

Update the FE

Update chef_backend_members in /etc/opscode/chef-server.rb so that it lists only the working nodes of the new cluster.
Run chef-server-ctl reconfigure.
The status is as follows:

root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
 Internal Services 
-------------------
run: bookshelf: (pid 18182) 5372s; run: log: (pid 17978) 5419s
run: haproxy: (pid 18893) 129s; run: log: (pid 3311) 5473s
run: nginx: (pid 18896) 128s; run: log: (pid 18113) 5387s
run: oc_bifrost: (pid 18136) 5373s; run: log: (pid 17792) 5453s
run: oc_id: (pid 18159) 5372s; run: log: (pid 17825) 5445s
run: opscode-erchef: (pid 18265) 5371s; run: log: (pid 18075) 5415s
run: redis_lb: (pid 18888) 129s; run: log: (pid 18321) 5370s
-------------------
 External Services 
-------------------

run: elasticsearch: connected OK to http://127.0.0.1:9200

run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
ERROR: Failed to authenticate to https://127.0.0.1:443 as pivotal with key /tmp/latovip20220111-18917-ywu2fo
Response:  Failed to authenticate as 'pivotal'. Ensure that your node_name and client key are correct.
ERROR: Failed to authenticate to https://127.0.0.1:443 as pivotal with key /tmp/latovip20220111-18921-t26put
Response:  Failed to authenticate as 'pivotal'. Ensure that your node_name and client key are correct.
root@ip-10-0-10-216:~#
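
The chef_backend_members change described in this section can be scripted. The sketch below renders the setting from a list of backend IPs. render_backend_members is a hypothetical helper, and the array-of-strings form is an assumption, so compare the result with the existing line in /etc/opscode/chef-server.rb before replacing anything.

```shell
# Sketch: print a chef_backend_members line for the given backend IPs.
render_backend_members() {
  members=""
  for ip in "$@"; do
    # Separate entries with ", " after the first one.
    members="${members:+$members, }\"$ip\""
  done
  printf 'chef_backend_members [%s]\n' "$members"
}

# Example with the two nodes of the restored cluster:
#   render_backend_members 10.0.10.189 10.0.4.86
```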

Do a cleanse and reconfigure: run chef-server-ctl cleanse, make sure the proper chef-server.rb is in place, and then run chef-server-ctl reconfigure.
The status is fine, but the data is missing:

root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
 Internal Services 
-------------------
run: bookshelf: (pid 19206) 37s; run: log: (pid 18839) 210s
run: haproxy: (pid 19099) 39s; run: log: (pid 3694) 238s
run: nginx: (pid 19202) 37s; run: log: (pid 19068) 48s
run: oc_bifrost: (pid 19104) 39s; run: log: (pid 18659) 222s
run: oc_id: (pid 19138) 38s; run: log: (pid 18771) 216s
run: opscode-erchef: (pid 19214) 37s; run: log: (pid 18838) 210s
run: redis_lb: (pid 19094) 41s; run: log: (pid 19249) 36s
-------------------
 External Services 
-------------------

run: elasticsearch: connected OK to http://127.0.0.1:9200

run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
pivotal

root@ip-10-0-10-216:~#

Tried again after joining the third node: added the node's IP to chef-server.rb and reconfigured. The status at this point is the same, with the data still missing:

root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
 Internal Services 
-------------------
run: bookshelf: (pid 19206) 29546s; run: log: (pid 18839) 29719s
run: haproxy: (pid 3060) 429s; run: log: (pid 3694) 29747s
run: nginx: (pid 3063) 429s; run: log: (pid 19068) 29557s
run: oc_bifrost: (pid 19104) 29548s; run: log: (pid 18659) 29731s
run: oc_id: (pid 19138) 29547s; run: log: (pid 18771) 29725s
run: opscode-erchef: (pid 19214) 29546s; run: log: (pid 18838) 29719s
run: redis_lb: (pid 3055) 430s; run: log: (pid 19249) 29545s
-------------------
 External Services 
-------------------

run: elasticsearch: connected OK to http://127.0.0.1:9200

run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list; 
pivotal

root@ip-10-0-10-216:~#
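
The missing data shows up as user-list returning only pivotal and org-list returning nothing. A sketch for checking whether a user created in the "Add data" step survived the restore. has_user is a hypothetical helper that reads chef-server-ctl user-list output on standard input.

```shell
# Sketch: succeed if the given user name appears as a whole line
# in `chef-server-ctl user-list` output read from stdin.
has_user() {
  grep -qxF -- "$1"
}

# Example:
#   chef-server-ctl user-list | has_user admin || echo "admin user missing"
```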

Restoring backup on a single node

Spin up a bare-metal instance for the FE (install chef-server).
Spin up 2 instances for the backend (install chef-backend).

Chef backend ctl scenarios for back up and restore testing - chef/chef-server GitHub Wiki