Horcrux Signer Migration Runbook
This document describes the safe migration of a single Horcrux signer to a new server (new IP / new provider / new region) without resharing keys and without modifying the threshold.
The procedure is intentionally generic and does not reference any specific infrastructure.
Covered:
- Replace one cosigner with a new server
- Keep the same shard and ECIES keys
- Rebuild the Raft cluster cleanly
- Maintain threshold signing (2/3)

Not covered:
- Changing the threshold
- Adding or removing cosigners
- Re-sharding the validator key
- Rotating the validator key
- 3 cosigners
- threshold = 2
- Cosigner P2P port: 2222/tcp
- Sentry priv_validator_laddr port: 1234/tcp
- All cosigners can reach all sentries
- SSH access to all signer nodes
~/.horcrux/
├── config.yaml
├── ecies_keys/   (must match shardID)
├── shards/       (must match shardID + chain_id)
├── state/
│   ├── <chain>_priv_validator_state.json
│   └── <chain>_share_sign_state.json
└── raft/         (MUST NOT be copied; MUST be rebuilt)
Copy:
- config.yaml
- ecies_keys/
- shards/
- state/ (ONLY after the cluster is stopped)

Never copy:
- raft/
- horcrux.pid
- Install OS updates
- Create user
- Configure firewall (open 2222/tcp)
- DO NOT start Horcrux
Verify:
horcrux version
which horcrux
Version must match existing production cluster.
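The version check can be scripted so that a mismatch stops the migration early. This is a minimal sketch: `same_version` is a hypothetical helper, and it assumes that `horcrux version` prints identical text for identical builds, which may not hold if the output contains host-specific details.

```shell
# Hypothetical helper: succeed only if two captured `horcrux version`
# outputs are identical, ignoring carriage returns.
same_version() {
  a=$(printf '%s' "$1" | tr -d '\r')
  b=$(printf '%s' "$2" | tr -d '\r')
  [ "$a" = "$b" ]
}

# Example use (EXISTING_SIGNER is a placeholder host name):
#   same_version "$(ssh EXISTING_SIGNER horcrux version)" "$(horcrux version)" \
#     || echo "version mismatch: do not proceed"
```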
mkdir -p ~/.horcrux
chmod 700 ~/.horcrux
From the signer being replaced:
Copy ONLY:
~/.horcrux/config.yaml
~/.horcrux/ecies_keys/
~/.horcrux/shards/
DO NOT copy:
~/.horcrux/raft (never)
~/.horcrux/state (not yet; copy it only after all signers are stopped)
Verify structure on new server before proceeding.
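One way to verify the structure at this stage is a small check that the three copied items exist and that neither raft/ nor state/ has arrived early. The function name `check_pre_state_layout` is hypothetical; the paths come from the directory layout described in this runbook.

```shell
# Hypothetical helper: return 0 only if DIR holds exactly what should be
# present before the cluster is stopped.
check_pre_state_layout() {
  dir=$1
  rc=0
  [ -f "$dir/config.yaml" ] || { echo "missing: config.yaml"; rc=1; }
  [ -d "$dir/ecies_keys" ]  || { echo "missing: ecies_keys/"; rc=1; }
  [ -d "$dir/shards" ]      || { echo "missing: shards/"; rc=1; }
  # raft/ must never be copied; state/ is copied later.
  [ ! -e "$dir/raft" ]      || { echo "unexpected: raft/"; rc=1; }
  [ ! -e "$dir/state" ]     || { echo "unexpected: state/ (too early)"; rc=1; }
  return "$rc"
}

# Example use on the new server:
#   check_pre_state_layout "$HOME/.horcrux" && echo "layout ok"
```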
On ALL existing signers:
sudo systemctl stop horcrux
Verify stopped:
pgrep horcrux || echo "stopped"
This prevents:
- leader instability
- partial raft writes
- inconsistent state propagation
After all signers are stopped:
Copy:
~/.horcrux/state/
to the new signer.
Verify:
cat ~/.horcrux/state/<chain>_priv_validator_state.json
cat ~/.horcrux/state/<chain>_share_sign_state.json
Heights must look valid and non-zero.
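The height check can also be done mechanically instead of by eye. The sketch below assumes each state file is JSON with a `"height"` field whose value is an integer, quoted or unquoted; the helper names `state_height` and `check_state_height` are hypothetical.

```shell
# Hypothetical helper: print the value of the first "height" field found
# in a state file (accepts "height": "123" and "height": 123).
state_height() {
  sed -n 's/.*"height"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*/\1/p' "$1" | head -n 1
}

# Succeed only if a height was found and it is greater than zero.
check_state_height() {
  h=$(state_height "$1")
  [ -n "$h" ] && [ "$h" -gt 0 ]
}

# Example use (replace <chain> with the actual file prefix):
#   check_state_height ~/.horcrux/state/<chain>_priv_validator_state.json \
#     || echo "height missing or zero"
```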
On ALL remaining signers AND the new signer:
Update only the p2pAddr for the shard being replaced:
thresholdMode:
  threshold: 2
  cosigners:
    - shardID: X
      p2pAddr: tcp://NEW_IP:2222
DO NOT modify:
- shardID
- threshold
- keys
- chainNodes
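Because the same one-line change has to be applied identically on every signer, it may help to script it. The sketch below is not part of Horcrux: `update_p2p_addr` is a hypothetical helper, and it assumes each cosigner entry lists `shardID` before `p2pAddr`, as in the snippet above. It prints the edited config to stdout so the result can be reviewed with `diff` before replacing the file.

```shell
# Hypothetical helper: rewrite p2pAddr only for the cosigner entry whose
# shardID matches, leaving every other line untouched.
# usage: update_p2p_addr CONFIG SHARD_ID NEW_ADDR
update_p2p_addr() {
  awk -v id="$2" -v addr="$3" '
    /shardID:/ { current = $NF }               # remember the entry we are in
    /p2pAddr:/ && current == id {              # only touch the matching entry
      sub(/p2pAddr:.*/, "p2pAddr: " addr)
    }
    { print }
  ' "$1"
}

# Example use (shard 2 and the address are placeholders):
#   update_p2p_addr ~/.horcrux/config.yaml 2 tcp://NEW_IP:2222 > /tmp/config.new
#   diff ~/.horcrux/config.yaml /tmp/config.new   # expect exactly one changed line
```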
On ALL signers (old and new cluster members):
rm -rf ~/.horcrux/raft
Verify deletion:
test ! -d ~/.horcrux/raft && echo "raft removed"
Skipping this step may cause:
- height regression errors
- failed shard signing
- chain id cannot be empty
- persistent leader instability
Ensure the old signer does NOT rejoin:
sudo systemctl disable --now horcrux
Optionally shut down or destroy the old server.
Start Horcrux on all active signers:
sudo systemctl start horcrux
Check logs:
journalctl -u horcrux -f
Expected patterns:
- I am the leader
- Signed chain_id=...
- No repeated shard errors
- No height regression loops
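As a rough health check, the number of signing lines in a recent log window can be counted rather than watched live. This sketch assumes that successful signatures are logged with the word `Signed`, as in the expected patterns above; the exact log wording may differ between Horcrux versions, and `count_signed` is a hypothetical helper.

```shell
# Hypothetical helper: read log text on stdin and print how many lines
# contain "Signed". Prints 0 (and still succeeds) when there are none.
count_signed() {
  grep -c 'Signed' || true
}

# Example use on a signer, a few minutes after the restart:
#   journalctl -u horcrux --since "2 minutes ago" --no-pager | count_signed
```

A count that stays at zero on every signer a few minutes after the restart suggests the cluster is not signing and the logs need a closer look.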
horcrux leader
- Stop all signers
- Delete raft on all
- Restart all signers
sudo systemctl stop horcrux
rm -rf ~/.horcrux/raft
sudo systemctl start horcrux
- Always copy state AFTER stopping all signers.
- Never copy raft between servers.
- Minor block misses during migration are normal.
- If you observe continuous height regression, repeat raft cleanup.