Manual trigger of rebalance in MinIO

When a MinIO deployment is extended by adding a new server pool, it does not rebalance objects by default. Instead, MinIO writes new objects to the pool with relatively more free space. A manually triggered rebalance scans the whole deployment and then moves objects across the server pools (if needed) so that all pools end up at roughly the same free-space level. It is an expensive operation for a MinIO deployment and should be triggered judiciously; a rebalance can, however, be stopped and restarted at any point.
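
For reference, the mc client exposes the rebalance controls shown below; ALIAS stands for the alias configured later in this walkthrough.

mc admin rebalance start ALIAS     # kick off a rebalance across all server pools
mc admin rebalance status ALIAS    # check per-pool progress
mc admin rebalance stop ALIAS      # stop the rebalance; it can be started again later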

Peculiarities of simulating a rebalance

It is not easy to simulate a rebalance scenario in a standalone deployment, whether it is a kind-based cluster or a standalone setup using directories as drives. For a rebalance to be triggered, the existing pools need to be populated almost to capacity; only then, after adding a new server pool, can a rebalance be triggered manually. It is easier and cleaner to simulate this scenario using virtual machines instead.

LXD VMs to the rescue

For a quick and easy developer-style simulation of a rebalance, LXD (the Linux container and virtual machine manager) is a good option. This document lists the required settings and walks through the procedure to achieve this.

Let's get our hands dirty!!

Follow the steps below to simulate a rebalance in a MinIO deployment. For this we spin up a total of 8 LXD VMs running Ubuntu. We start the initial MinIO instance using the first 4 VMs and later extend it with the next 4 VMs. To limit the amount of data that can be loaded, we add a 1GiB virtual disk to each VM using loopback devices from the host. So, let's get going!!!

Install LXD on your host

LXD is available officially as a snap package, so install snapd first

sudo apt install snapd
sudo ln -s /var/lib/snapd/snap /snap
snap version
sudo systemctl restart snapd.service

Now install lxd

sudo snap install lxd

Verify the installation and start LXD

sudo snap enable lxd
sudo snap services lxd
sudo snap start lxd
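
Optionally, you can also confirm that the client can reach the daemon

lxc version    # prints both the client and the server version when the daemon is reachable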

Add your username to the lxd group

sudo usermod -a -G lxd <USERNAME>
id <USERNAME>
newgrp lxd

Initialize LXD

lxd init

This takes you through a set of questions; most of the defaults can be accepted

Would you like to use LXD clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]:
Name of the new storage pool [default=default]:
Name of the storage backend to use (btrfs, ceph, dir, lvm) [default=btrfs]:
Create a new BTRFS pool? (yes/no) [default=yes]:
Would you like to use an existing block device (yes/no) [default=no]:
Size in GB of the new block device (1GB minimum) (default=30GB):
Would you like to connect to a MAAS server (yes/no) [default=no]:
Would you like to create a new local network bridge (yes/no) [default=yes]:
What should new bridge be called (default=lxdbr0):
What IPv4 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]:
What IPv6 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: none
Would you like LXD to be available over the network? (yes/no) [default=no]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
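
If you need to repeat this setup later, the same answers can also be fed in non-interactively as a preseed; a minimal sketch, assuming the btrfs storage pool and lxdbr0 bridge chosen above:

cat <<'EOF' | lxd init --preseed
networks:
- name: lxdbr0
  type: bridge
  config:
    ipv4.address: auto
    ipv6.address: none
storage_pools:
- name: default
  driver: btrfs
profiles:
- name: default
  devices:
    root:
      path: /
      pool: default
      type: disk
    eth0:
      name: eth0
      network: lxdbr0
      type: nic
EOF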

Create VMs now

Create the VMs using the commands below

lxc init ubuntu:22.04 vm-01 --profile=default -c boot.autostart=true -c security.privileged=true -c security.syscalls.intercept.mount=true -c security.syscalls.intercept.mount.allowed=xfs -c limits.memory=1024MB -c limits.cpu.allowance=10%

lxc start vm-01
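
Creating the eight VMs one by one gets repetitive, so a small host-side loop can do it in one go (a sketch, assuming the names vm-01 through vm-08):

for i in $(seq 1 8); do
  lxc init ubuntu:22.04 "vm-0$i" --profile=default \
    -c boot.autostart=true -c security.privileged=true \
    -c security.syscalls.intercept.mount=true \
    -c security.syscalls.intercept.mount.allowed=xfs \
    -c limits.memory=1024MB -c limits.cpu.allowance=10%
  lxc start "vm-0$i"
done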

Repeat these commands (or run the loop above) for all 8 VMs. Now, to keep the VMs' IPv4 addresses in order, execute the commands below to reset their IPs

lxc stop vm-01
lxc network attach lxdbr0 vm-01 eth0 eth0
lxc config device set vm-01 eth0 ipv4.address 10.115.111.111
lxc start vm-01

Similarly, the IPs for the other VMs can be set to 10.115.111.112, 10.115.111.113, and so on.
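
Again, a loop covers the whole set (a sketch, assuming the vm-01 to vm-08 names and the 10.115.111.111-118 range used above):

for i in $(seq 1 8); do
  vm="vm-0$i"
  lxc stop "$vm"
  lxc network attach lxdbr0 "$vm" eth0 eth0
  lxc config device set "$vm" eth0 ipv4.address "10.115.111.$((110 + i))"
  lxc start "$vm"
done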

Create virtual disk images and mount

Get inside each VM now, create a 1GiB virtual disk image, and format it using mkfs.ext4. Also create a mount path for the loopback device from the host

lxc exec vm-01 bash
truncate -s 1GiB /media/disk.img
mkfs.ext4 /media/disk.img
mkdir /mnt/virtual-disk

Repeat this for all the VMs
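
From the host, the same can be done for every VM in one shot (a sketch, using the same vm-01 to vm-08 names):

for i in $(seq 1 8); do
  lxc exec "vm-0$i" -- bash -c \
    'truncate -s 1GiB /media/disk.img && mkfs.ext4 -q /media/disk.img && mkdir -p /mnt/virtual-disk'
done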

Attach loop devices from the host to the VM

Now we attach available loopback devices from the host to the VMs. These are the files listed as /dev/loop* on the host, and not all of them are necessarily free to use. If, while mounting inside a VM, you hit the error mount: /mnt/virtual-disk: failed to setup loop device for /media/disk.img, attach a different, free loopback device from the host and try again. Use the command sudo losetup -f to find the next free loopback device. Follow the steps below for attaching and mounting the loop device within the VMs

lxc config device add vm-01 loop-control unix-char path=/dev/loop-control
lxc config device add vm-01 loop4 unix-block path=/dev/loop4

Now mount it from inside the VM

lxc exec vm-01 bash
mount -t ext4 -o loop /media/disk.img /mnt/virtual-disk
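
To cover all eight VMs, something like the following can be run from the host (a sketch; it assumes /dev/loop4 through /dev/loop11 are free on the host, so adjust the numbers to whatever sudo losetup -f reports):

n=4
for i in $(seq 1 8); do
  vm="vm-0$i"
  lxc config device add "$vm" loop-control unix-char path=/dev/loop-control
  lxc config device add "$vm" "loop$n" unix-block path="/dev/loop$n"
  # if this mount fails, pass a different free loop device as described above
  lxc exec "$vm" -- mount -t ext4 -o loop /media/disk.img /mnt/virtual-disk
  n=$((n + 1))
done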

Repeat this process (or use the loop above) for all the VMs. To verify that the mount succeeded, run the following command within each VM

mount | grep virtual-disk
/media/disk.img on /mnt/virtual-disk type ext4 (rw,relatime)

Hurray!! Our VMs are ready now and we can start the MinIO deployment 😍

Prepare objects to be loaded to the instance

We need a number of objects created up front, which we will later push to a MinIO bucket. Execute the commands below on the host node

mkdir -p $HOME/test-minio-rebal
cd $HOME/test-minio-rebal
for index in {1..4500}; do truncate -s 1M file$index; done

This creates 4500 files of 1 MiB each.

Install MinIO and MinIO Client

Install the MinIO binary on all the VMs

wget -O minio https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
mv minio /usr/local/bin
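
As before, the binary can be pushed to all eight VMs from the host in one loop (a sketch):

for i in $(seq 1 8); do
  lxc exec "vm-0$i" -- bash -c \
    'wget -q -O /usr/local/bin/minio https://dl.min.io/server/minio/release/linux-amd64/minio && chmod +x /usr/local/bin/minio'
done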

The MinIO client can be installed on the host itself

wget -O mc https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin

Start MinIO instance and load objects

Follow the steps below.

First, get the list of all running LXC VMs and note their IPv4 addresses

$ lxc list
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| NAME  |  STATE  |        IPV4         |                     IPV6                     |   TYPE    | SNAPSHOTS |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-01 | RUNNING | 10.49.238.61 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef3:f0f (eth0)  | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-02 | RUNNING | 10.49.238.62 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe16:4d04 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-03 | RUNNING | 10.49.238.63 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe34:44cd (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-04 | RUNNING | 10.49.238.64 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef9:4262 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-05 | RUNNING | 10.49.238.65 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe16:2e02 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-06 | RUNNING | 10.49.238.66 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe94:4610 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-07 | RUNNING | 10.49.238.67 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef1:40f3 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-08 | RUNNING | 10.49.238.68 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef5:d909 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+

Now start a MinIO instance using the first four VMs as shown below. This command should be run inside each of the first 4 VMs.

minio server http://10.49.238.{61...64}/mnt/virtual-disk/disk{1...4}
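
The deployment below relies on the default minioadmin/minioadmin credentials. If you prefer explicit credentials, export them on every VM before starting the server; they must then match what you pass to mc alias set below. For example:

export MINIO_ROOT_USER=minioadmin
export MINIO_ROOT_PASSWORD=minioadmin
minio server http://10.49.238.{61...64}/mnt/virtual-disk/disk{1...4}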

Once the instance has stabilized, create an mc alias for the cluster as below

mc alias set ALIAS http://10.49.238.61:9000 minioadmin minioadmin

We are now ready to load objects into the cluster. Run the commands below

mc mb ALIAS/test-bucket
mc cp $HOME/test-minio-rebal/* ALIAS/test-bucket

You may see errors towards the end saying no more space is left on the disks; that is fine, as the cluster is now loaded to its limit with objects. Wait a few seconds and verify that the objects have been loaded to the pool.

mc admin info ALIAS --json | jq -r '.info.pools'
{
  "0": {
    "0": {
      "id": 0,
      "rawUsage": 3785478144,
      "rawCapacity": 3800956928,
      "usage": 1155530752,
      "objectsCount": 1102,
      "versionsCount": 0,
      "healDisks": 0
    }
  }
}

Note: The numbers are indicative and can differ in an actual run.

Extend the cluster and trigger rebalance

We are now ready to extend the cluster with the new set of nodes. Stop the MinIO processes on the first 4 VMs and then, from all 8 VMs, run the following command

minio server http://10.49.238.{61...64}/mnt/virtual-disk/disk{1...4} http://10.49.238.{65...68}/mnt/virtual-disk/disk{1...4}

Let the cluster stabilize and check that the new pool has been added

mc admin info ALIAS --json | jq -r '.info.pools'
{
  "0": {
    "0": {
      "id": 0,
      "rawUsage": 3785478144,
      "rawCapacity": 3800956928,
      "usage": 1155530752,
      "objectsCount": 1102,
      "versionsCount": 0,
      "healDisks": 0
    }
  },
  "1": {
    "0": {
      "id": 0,
      "rawUsage": 376832,
      "rawCapacity": 3800956928,
      "usage": 0,
      "objectsCount": 0,
      "versionsCount": 0,
      "healDisks": 0
    }
  }
}

Now you can safely run the rebalance on the cluster

mc admin rebalance start ALIAS

You can track the status of the running rebalance as below

mc admin rebalance status ALIAS
Per-pool usage:
┌─────────┬────────┐
│ Pool-0  │ Pool-1 │
│ 0.85% * │ 0.14%  │
└─────────┴────────┘
Summary:
Data: 390 MiB (195 objects, 195 versions)
Time: 11.40798155s (52.794879439s to completion)

Once the rebalance has completed and there are no more objects to move, you should be able to verify it as below

mc admin info ALIAS --json | jq -r '.info.pools'
{
  "0": {
    "0": {
      "id": 0,
      "rawUsage": 2029391872,
      "rawCapacity": 3800956928,
      "usage": 1394606080,
      "objectsCount": 1330,
      "versionsCount": 0,
      "healDisks": 0
    }
  },
  "1": {
    "0": {
      "id": 0,
      "rawUsage": 1756606464,
      "rawCapacity": 3800956928,
      "usage": 435159040,
      "objectsCount": 415,
      "versionsCount": 0,
      "healDisks": 0
    }
  }
}

Footnote

Sorry for such a long write-up. Happy hacking 😎
