Performing upgrades - penumbra-zone/penumbra GitHub Wiki
Occasionally the project will make consensus-breaking changes to the protocol. In order for these changes to land on the network, a chain upgrade must be performed.
At a high level, the upgrade process consists of:
- Governance proposal submitted, specifying explicit chain height for halt to occur.
- Governance proposal passes.
- Chain reaches specified height `n`, nodes stop generating blocks.
- Manual upgrade is performed on each validator and fullnode:
  - Export state (as backup) via `pd export`.
  - Install the new version of pd.
  - Upgrade state via `pd migrate`.
  - Copy a few files and directories around, clean up CometBFT state.
  - Restart node.
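As a rough sketch of the manual part of that sequence, the function below prints (it does not execute) the per-node commands in order. The flags and file names are the ones used in the Kubernetes walkthrough on this page; the function name `print_upgrade_plan` and the assumption that every node keeps its data under `<base>/node0` are illustrative only, so adapt the layout to your own deployment. Installing the new pd, fixing permissions, and restarting the node are left out because they depend on how you deploy.

```shell
#!/bin/sh
# Print (do not run) the per-node upgrade commands for a node whose data
# lives under the base directory given as $1.
# ASSUMPTION: the <base>/node0 layout mirrors the k8s setup on this page.
print_upgrade_plan() {
    base="$1"
    node="$base/node0"
    exported="$base/pd-exported-state"

    # With the OLD pd: export state, then keep the old directory as a backup.
    echo "pd export --home $node/pd --export-directory $exported --export-archive $exported.tar.gz"
    echo "mv $node/pd $base/pd-state-backup"

    # With the NEW pd installed: migrate the exported state.
    echo "pd migrate --target-directory $exported/ --migrate-archive $base/pd-migrated-state-archive.tar.gz"

    # Move the migrated state and the CometBFT files into place.
    echo "mkdir $node/pd"
    echo "mv $exported/rocksdb $node/pd/"
    echo "cp $exported/genesis.json $node/cometbft/config/genesis.json"
    echo "cp $exported/priv_validator_state.json $node/cometbft/data/priv_validator_state.json"

    # Remove old CometBFT data directories (files directly in data/ are kept).
    echo "find $node/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +"
}
```

Calling `print_upgrade_plan /penumbra-config/penumbra-devnet-val` should reproduce the paths used in the Kubernetes steps, which makes it easy to review the plan before running anything by hand.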
Depending on the deployment environment, these steps can differ considerably.
Here we document how to test upgrade behavior on your local workstation.
- Run `pd testnet generate --epoch-duration 50 --proposal-voting-blocks 100` to create the network. Shorter epochs mean stake bonds faster, and shorter voting periods mean you don't have to wait the default 24h.
- Stand up the network in your usual way.
- List validators: `pcli q validator list`
- Delegate: `pcli tx delegate --to <VALIDATOR_ID> <AMOUNT>`. You want to make sure that your amount is a majority of stake on the devnet, so that your single vote is enough to pass the proposal.
- Wait for stake to be bonded; monitor with `pcli q chain info -v` (look for epoch rollover).
- Create proposal for an upgrade: `pcli tx proposal template upgrade-plan > upgrade.toml`. Edit that file to specify a height that's greater than the current height + the voting period, but not so far in the future that you get tired of waiting for it.
- Submit the proposal: `pcli tx proposal submit --file upgrade.toml --deposit-amount 10000000` (n.b. the `--deposit-amount` flag doesn't parse denoms; see GH3455).
- Vote yes: `pcli tx vote yes --on 0` (we assume id is 0 because this is a devnet).
- Wait for proposal to pass; check with `pcli q governance proposal 0 state`.
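Picking the upgrade height is simple arithmetic, but easy to get wrong under time pressure. The helper below is a minimal sketch: it adds the voting period and a small cushion to the current height. The function name `suggest_upgrade_height` and the default margin of 20 blocks are assumptions made for illustration, not project recommendations; read the current height yourself from `pcli q chain info -v`.

```shell
#!/bin/sh
# Suggest an upgrade height: current height + voting period + safety margin.
# All arguments are block counts. The margin (default 20) is a hypothetical
# cushion so the proposal has passed before the halt height arrives.
suggest_upgrade_height() {
    current="$1"
    voting_blocks="$2"
    margin="${3:-20}"
    echo "$((current + voting_blocks + margin))"
}
```

For example, with the chain at height 1200 and `--proposal-voting-blocks 100`, `suggest_upgrade_height 1200 100` prints `1320`.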
Once it passes, wait for the chain to halt. You should see in the pd logs: ERROR penumbra_app::app: chain is halted, refusing to restart!
Then proceed with the migration steps.
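If you would rather script the wait than watch the logs, a small filter like the following may help. It is only a sketch: the function name `chain_is_halted` is made up here, and it simply looks for the halt message quoted above in whatever log text is piped to it.

```shell
#!/bin/sh
# Exit 0 if the halt message appears anywhere in the log text on stdin,
# non-zero otherwise. The message is matched as a fixed string.
chain_is_halted() {
    grep -qF 'chain is halted, refusing to restart!'
}
```

Pipe pd's log output into it from wherever your setup writes logs, for instance `kubectl logs penumbra-devnet-val-0 -c pd | chain_is_halted && echo halted` on a Kubernetes deployment.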
Here we document how the upgrade should be performed on the k8s setup used by Penumbra Labs. We assume that a governance proposal has already been passed and the chain halted. At least one validator will CrashLoop. Check the logs for this message: ERROR penumbra_app::app: chain is halted, refusing to restart!
To perform an upgrade, start with the validators.
1. Set the deployment to maintenance mode, enabling `sleep infinity` for all containers, e.g. `helmfile apply -f helmfile.d/penumbra-devnet.yaml --args --set=maintenanceMode=true`. Then manually kill the pods associated with the statefulset, e.g. `kubectl delete pod penumbra-devnet-val-0 penumbra-devnet-val-1`. This will cause the pods to restart without running pd, just waiting.
2. Pick the pod with the highest ordinal, e.g. `penumbra-devnet-val-1`, and exec into the `pd` container within the pod, e.g. `kubectl exec -it penumbra-devnet-val-1 -c pd -- bash`
3. Run the export: `pd export --home /penumbra-config/penumbra-devnet-val/node0/pd --export-directory /penumbra-config/penumbra-devnet-val/pd-exported-state --export-archive /penumbra-config/penumbra-devnet-val/pd-exported-state.tar.gz`
4. Back up the dir: `mv /penumbra-config/penumbra-devnet-val/node0/pd /penumbra-config/penumbra-devnet-val/pd-state-backup`. Repeat steps 2-4 for all pods in the statefulset.
5. Now that we've exported the old state with the prior version of pd, it's time to upgrade the version of pd in preparation for running the migration. Edit the statefulset directly, this time modifying the tag on the container image to be the new version, e.g. `ghcr.io/penumbra-zone/penumbra:vX.Y.Z`.
6. Shell into the `pd` container again and run the migration: `pd migrate --target-directory /penumbra-config/penumbra-devnet-val/pd-exported-state/ --migrate-archive /penumbra-config/penumbra-devnet-val/pd-migrated-state-archive.tar.gz`. Make note of the genesis time logged. You must provide the exact genesis time for the other nodes to upgrade. Look for a log message like `pd::upgrade: no genesis time provided, detecting a testing setup now=2023-12-09T00:08:24.225277473Z`. Copy the value after `now=`.
7. Move the migrated state into place: `mkdir /penumbra-config/penumbra-devnet-val/node0/pd && mv /penumbra-config/penumbra-devnet-val/pd-exported-state/rocksdb /penumbra-config/penumbra-devnet-val/node0/pd/`
8. Move the upgraded CometBFT state into place: `cp /penumbra-config/penumbra-devnet-val/pd-exported-state/genesis.json /penumbra-config/penumbra-devnet-val/node0/cometbft/config/genesis.json && cp /penumbra-config/penumbra-devnet-val/pd-exported-state/priv_validator_state.json /penumbra-config/penumbra-devnet-val/node0/cometbft/data/priv_validator_state.json`
9. Then clean up the old CometBFT state: `find /penumbra-config/penumbra-devnet-val/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +`
10. Fix permissions for mounted volumes: `chown -R 1000:1000 /penumbra-config/penumbra-devnet-val/node0/pd && chown -R 100:100 /penumbra-config/penumbra-devnet-val/node0/cometbft`
11. Repeat steps 6-10 for the other pods in the statefulset you're working on.
12. Finally, exit the shell, and restore the working statefulset: `helmfile apply -f helmfile.d/penumbra-devnet.yaml --args --set=maintenanceMode=false --set=image.tag=vX.Y.Z`. Then run `kubectl rollout restart statefulset penumbra-devnet` and monitor the rollout.
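Copying the genesis time by hand from the migration output is error-prone, since every digit has to match on the other nodes. The filter below is a sketch of one way to pull it out of saved log text. It assumes the log line looks like the example quoted in the migration step (plain text, with the timestamp as the last `now=` field on the line); the function name `extract_genesis_time` is hypothetical, and real logs with color codes or a different layout would need a different pattern.

```shell
#!/bin/sh
# Read pd migrate log text on stdin and print the timestamp that follows
# "now=" on the "no genesis time provided" line.
# ASSUMPTION: plain-text logs shaped like the example line on this page.
extract_genesis_time() {
    grep -F 'no genesis time provided' \
        | sed -n 's/.*now=\([^[:space:]]*\).*/\1/p' \
        | tail -n 1
}
```

If you captured the migration output to a file, `extract_genesis_time < migrate.log` prints just the timestamp (the file name is an example), and prints nothing if no matching line is found.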
Finally, download the migration archive you created: `kubectl cp penumbra-devnet-val-0:/penumbra-config/penumbra-devnet-val/pd-migrated-state-archive.tar.gz -c pd pd-migrated-state-archive.tar.gz`, then `scp` it to snapshots.penumbra.zone and place it in the `/var/www/snapshots/testnet/` dir.
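Before uploading, it may be worth confirming that the archive survived the copy out of the pod. This check is not part of the documented procedure; it is a small optional sketch, and the function name `archive_is_readable` is made up for illustration. It only verifies that the file can be listed as a gzip-compressed tarball, not that its contents are a valid migrated state.

```shell
#!/bin/sh
# Exit 0 if the file given as $1 can be listed as a gzip-compressed tarball.
# tar -tzf lists the archive members without extracting anything.
archive_is_readable() {
    tar -tzf "$1" >/dev/null 2>&1
}
```

For example, `archive_is_readable pd-migrated-state-archive.tar.gz || echo "archive looks damaged"` flags a truncated or corrupted download before it reaches the snapshot server.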