Performing upgrades - penumbra-zone/penumbra GitHub Wiki

Chain upgrades in Penumbra

Occasionally the project will make consensus-breaking changes to the protocol. In order for these changes to land on the network, a chain upgrade must be performed.

Upgrade process overview

At a high level, the upgrade process consists of:

1. A governance proposal is submitted, specifying an explicit chain height at which the chain will halt.
2. The governance proposal passes.
3. The chain reaches the specified height n, and nodes stop generating blocks.
4. A manual upgrade is performed on each validator and fullnode:
   a. Export the state (as a backup) via pd export.
   b. Install the new version of pd.
   c. Upgrade the state via pd migrate.
   d. Copy a few files and directories into place, and clean up the CometBFT state.
   e. Restart the node.

Depending on the deployment environment, these steps can differ considerably.
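To make the ordering concrete, here is a minimal Python sketch that only assembles the pd invocations for one node; it does not run anything. The function name upgrade_plan and its parameters are hypothetical, and the flags are copied from the pd export and pd migrate commands in the k8s section below. Installing the new pd, moving files, and restarting are left out because they depend on the environment.

```python
from pathlib import Path
from typing import List


def upgrade_plan(pd_home: Path, work_dir: Path) -> List[List[str]]:
    """Return, in order, the pd commands one node runs during an upgrade.

    Nothing is executed. The flags mirror the pd export / pd migrate commands
    on this page; pd_home and work_dir are placeholders for your own paths.
    """
    exported = work_dir / "pd-exported-state"
    export_cmd = [
        "pd", "export",
        "--home", str(pd_home),
        "--export-directory", str(exported),
        "--export-archive", str(work_dir / "pd-exported-state.tar.gz"),
    ]
    migrate_cmd = [
        "pd", "migrate",
        "--target-directory", str(exported),
        "--migrate-archive", str(work_dir / "pd-migrated-state-archive.tar.gz"),
    ]
    # Run export_cmd with the OLD pd binary, install the new pd, then run
    # migrate_cmd with the NEW binary; both operate on the same exported dir.
    return [export_cmd, migrate_cmd]


if __name__ == "__main__":
    root = Path("/penumbra-config/penumbra-devnet-val")
    for argv in upgrade_plan(root / "node0" / "pd", root):
        print(" ".join(argv))
```

Each argument list could be handed to subprocess.run(argv, check=True), running the export before swapping binaries and the migration after.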

Testing an upgrade on local devnet

Here we document how to test upgrade behavior on your local workstation.

1. Run pd testnet generate --epoch-duration 50 --proposal-voting-blocks 100 to create the network. Shorter epochs mean stake bonds faster, and a shorter voting period means you don't have to wait the default 24h.
2. Stand up the network in your usual way.
3. List validators: pcli q validator list
4. Delegate: pcli tx delegate --to <VALIDATOR_ID> <AMOUNT>. Make sure your amount is a majority of the stake on the devnet, so that your single vote is enough to pass the proposal.
5. Wait for the stake to be bonded; monitor with pcli q chain info -v (look for an epoch rollover).
6. Create a proposal for the upgrade: pcli tx proposal template upgrade-plan > upgrade.toml. Edit that file to specify a height greater than the current height plus the voting period, but not so far in the future that you have to wait around for it.
7. Submit the proposal: pcli tx proposal submit --file upgrade.toml --deposit-amount 10000000 (n.b. the --deposit-amount flag doesn't parse denoms; see GH3455).
8. Vote yes: pcli tx vote yes --on 0 (we assume the proposal ID is 0 because this is a fresh devnet).
9. Wait for the proposal to pass; check with pcli q governance proposal 0 state.
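The height arithmetic from the proposal step can be sketched as follows. The only constraint taken from the text is that the halt height must exceed the current height plus the voting period, presumably so that voting finishes before the chain halts. The function name, the margin parameter, and its default of 20 blocks are hypothetical; the margin is meant to absorb the blocks produced while you edit, submit, and vote.

```python
def pick_upgrade_height(current_height: int, voting_blocks: int, margin: int = 20) -> int:
    """Suggest a halt height for an upgrade-plan proposal.

    The text requires a height greater than the current height plus the voting
    period. `margin` (hypothetical default: 20 blocks) covers the blocks produced
    between reading the current height and the proposal being submitted.
    """
    if current_height < 0:
        raise ValueError("current_height must be non-negative")
    if voting_blocks <= 0 or margin <= 0:
        raise ValueError("voting_blocks and margin must be positive")
    return current_height + voting_blocks + margin


if __name__ == "__main__":
    # Voting period from `--proposal-voting-blocks 100`; the current height is illustrative.
    print(pick_upgrade_height(current_height=250, voting_blocks=100))
```

With --proposal-voting-blocks 100 as above and a current height of 250 (as reported by pcli q chain info), the sketch suggests 370.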

Once the proposal passes, wait for the chain to halt; the pd logs should then show ERROR penumbra_app::app: chain is halted, refusing to restart! At that point, proceed with the migration steps.
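When the pd output is long, a small filter can spot the halt. This Python sketch assumes one log record per line and matches on the target and message quoted above; the timestamp and formatting around them may vary between pd versions and logging configurations, and find_halt_line is a hypothetical name.

```python
from typing import Iterable, Optional

HALT_TARGET = "penumbra_app::app"
HALT_MESSAGE = "chain is halted, refusing to restart!"


def find_halt_line(log_lines: Iterable[str]) -> Optional[str]:
    """Return the first log line reporting the halt, or None if there is none."""
    for line in log_lines:
        # Check the pieces separately so timestamps or extra spacing around
        # them don't matter.
        if "ERROR" in line and HALT_TARGET in line and HALT_MESSAGE in line:
            return line.rstrip("\n")
    return None
```

Any iterable of lines works, for example an open log file or the text of kubectl logs split with str.splitlines().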

Performing an upgrade on k8s deployment

Here we document how to perform the upgrade on the k8s setup used by Penumbra Labs. We assume that a governance proposal has already passed and that the chain has halted. At least one validator will be stuck in a crash loop (CrashLoopBackOff); check its logs for this message: ERROR penumbra_app::app: chain is halted, refusing to restart! Start the upgrade with the validators.

1. Set the deployment to maintenance mode, which runs sleep infinity in all containers, e.g. helmfile apply -f helmfile.d/penumbra-devnet.yaml --args --set=maintenanceMode=true. Then manually kill the pods associated with the statefulset, e.g. kubectl delete pod penumbra-devnet-val-0 penumbra-devnet-val-1. The pods will restart without running pd and simply wait.
2. Pick the pod with the highest ordinal, e.g. penumbra-devnet-val-1, and shell into the pd container within that pod, e.g. kubectl exec -it penumbra-devnet-val-1 -c pd -- bash
3. Run the export: pd export --home /penumbra-config/penumbra-devnet-val/node0/pd --export-directory /penumbra-config/penumbra-devnet-val/pd-exported-state --export-archive /penumbra-config/penumbra-devnet-val/pd-exported-state.tar.gz
4. Back up the old state directory: mv /penumbra-config/penumbra-devnet-val/node0/pd /penumbra-config/penumbra-devnet-val/pd-state-backup. Repeat steps 2-4 for all pods in the statefulset.
5. Now that the old state has been exported with the prior version of pd, upgrade pd in preparation for running the migration. Edit the statefulset directly, this time changing the tag on the container image to the new version, e.g. ghcr.io/penumbra-zone/penumbra:vX.Y.Z.
6. Shell into the pd container again and run the migration: pd migrate --target-directory /penumbra-config/penumbra-devnet-val/pd-exported-state/ --migrate-archive /penumbra-config/penumbra-devnet-val/pd-migrated-state-archive.tar.gz. Make a note of the genesis time that is logged: the other nodes must be upgraded with exactly the same genesis time. Look for a log message like pd::upgrade: no genesis time provided, detecting a testing setup now=2023-12-09T00:08:24.225277473Z and copy the value after now=.
7. Move the migrated state into place: mkdir /penumbra-config/penumbra-devnet-val/node0/pd && mv /penumbra-config/penumbra-devnet-val/pd-exported-state/rocksdb /penumbra-config/penumbra-devnet-val/node0/pd/
8. Copy the upgraded CometBFT state into place: cp /penumbra-config/penumbra-devnet-val/pd-exported-state/genesis.json /penumbra-config/penumbra-devnet-val/node0/cometbft/config/genesis.json && cp /penumbra-config/penumbra-devnet-val/pd-exported-state/priv_validator_state.json /penumbra-config/penumbra-devnet-val/node0/cometbft/data/priv_validator_state.json
9. Clean up the old CometBFT state by removing every subdirectory of the data directory (regular files, including the priv_validator_state.json just copied, are kept): find /penumbra-config/penumbra-devnet-val/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +
10. Fix permissions for the mounted volumes: chown -R 1000:1000 /penumbra-config/penumbra-devnet-val/node0/pd && chown -R 100:100 /penumbra-config/penumbra-devnet-val/node0/cometbft
11. Repeat steps 6-10 for the other pods in the statefulset you're working on.
12. Exit the shell and restore the working statefulset: helmfile apply -f helmfile.d/penumbra-devnet.yaml --args --set=maintenanceMode=false --set=image.tag=vX.Y.Z. Then run kubectl rollout restart statefulset penumbra-devnet-val and monitor the rollout.
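Copying the genesis time by hand is error-prone, so here is a minimal sketch that pulls the value after now= out of pd migrate output. It assumes the pd::upgrade log line has the shape quoted in the migration step and that the output is not colorized; find_genesis_time is a hypothetical helper. The timestamp is returned as the exact string rather than parsed, because Python's datetime only keeps microseconds and would drop the last three of the nine fractional digits.

```python
import re
from typing import Iterable, Optional

# The value after "now=" on the pd::upgrade line, e.g.
# "pd::upgrade: no genesis time provided, detecting a testing setup now=2023-12-09T00:08:24.225277473Z"
_GENESIS_TIME = re.compile(r"pd::upgrade:.*\bnow=(\S+)")


def find_genesis_time(log_lines: Iterable[str]) -> Optional[str]:
    """Return the genesis time printed by pd migrate, verbatim, or None."""
    for line in log_lines:
        match = _GENESIS_TIME.search(line)
        if match:
            # Keep the string as-is: datetime would truncate the nanoseconds.
            return match.group(1)
    return None
```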
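The file shuffling in the move, copy, and cleanup steps can also be expressed with Python's shutil, which may help when scripting several pods. This sketch assumes the same layout as the commands above, with root standing for a directory such as /penumbra-config/penumbra-devnet-val; install_migrated_state is a hypothetical name. It does not run chown, so the permission fix still has to be applied separately.

```python
import shutil
from pathlib import Path


def install_migrated_state(root: Path) -> None:
    """Put migrated pd and CometBFT state into place under `root`.

    Mirrors the mkdir/mv, cp, and find -exec rm commands, assuming the layout
    root/pd-exported-state, root/node0/pd, and root/node0/cometbft.
    """
    exported = root / "pd-exported-state"
    node = root / "node0"

    # Move the migrated RocksDB into a fresh pd home. mkdir() raises if the
    # directory already exists, just as `mkdir` without -p fails.
    pd_home = node / "pd"
    pd_home.mkdir()
    shutil.move(str(exported / "rocksdb"), str(pd_home / "rocksdb"))

    # Copy the upgraded genesis and the validator signing state into place.
    shutil.copy(exported / "genesis.json", node / "cometbft" / "config" / "genesis.json")
    data = node / "cometbft" / "data"
    shutil.copy(exported / "priv_validator_state.json", data / "priv_validator_state.json")

    # Like `find -mindepth 1 -maxdepth 1 -type d`, delete only real
    # subdirectories of data/; regular files and symlinks are left alone.
    for entry in data.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
```

As with the find command, the cleanup loop only deletes directories, which is why the freshly copied priv_validator_state.json survives.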

Finally, download the migration archive you created: kubectl cp penumbra-devnet-val-0:/penumbra-config/penumbra-devnet-val/pd-migrated-state-archive.tar.gz -c pd pd-migrated-state-archive.tar.gz. Then scp it to snapshots.penumbra.zone and place it in the /var/www/snapshots/testnet/ directory.