Quickstart

Warning: developing balancers on a single node is difficult because the co-located daemons become laggy.

Prerequisites:

  1. you've compiled Ceph with Mantle

  2. you've installed mdtest or pulled the Docker image
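
If you go the Docker route, pulling the image used in step 5 of the walkthrough below is just:

    docker pull michaelsevilla/mdtest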

Mantle with vstart.sh

  1. Start Ceph and tune the logging so we can see migrations happen:

    ./vstart.sh -n -l
    for i in a b c; do 
      ./ceph --admin-daemon out/mds.$i.asok config set debug_ms 0
      ./ceph --admin-daemon out/mds.$i.asok config set debug_mds 0
      ./ceph --admin-daemon out/mds.$i.asok config set debug_mds_balancer 2
      ./ceph --admin-daemon out/mds.$i.asok config set mds_beacon_grace 1500
    done
    
  2. Put the balancer into RADOS:

    ./rados put --pool=cephfs_metadata_a greedyspill.lua mds/balancers/greedyspill.lua
    
  3. Activate Mantle (a quick sanity check for these settings is sketched right after this walkthrough):

    ./ceph mds set allow_multimds true --yes-i-really-mean-it
    ./ceph mds set max_mds 5
    ./ceph mds set balancer greedyspill.lua
    
  4. Mount CephFS in another window:

    ./ceph-fuse /cephfs -o allow_other &
    tail -f out/mds.a.log
    
  5. Run a simple benchmark. In our case, we use the mdtest Docker image to create load. Assuming that CephFS is mounted inside the ceph-dev container, we can share that mount and run 3 clients with:

    for i in 0 1 2; do
      docker run -d \
        --name=client$i \
        --volumes-from ceph-dev \
        michaelsevilla/mdtest \
        -F -C -n 100000 -d "/cephfs/client-test$i"
    done
    
  6. When you're done, you can kill all the clients with:

    for i in 0 1 2; do docker rm -f client$i; done
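
Once Mantle is activated (step 3), a quick sanity check is to confirm that the Lua object landed in the metadata pool and that the MDS map picked up the settings. This is only a sketch assuming the vstart defaults above (pool cephfs_metadata_a, commands run from the build directory); the exact fields printed by `mds dump` differ between Ceph versions:

    # the balancer object should be listed in the metadata pool
    ./rados ls --pool=cephfs_metadata_a | grep greedyspill.lua

    # max_mds and the balancer name should appear in the MDS map
    ./ceph mds dump | grep -E "max_mds|balancer"

    # all MDS daemons should be up
    ./ceph mds stat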
    

Output

Testing with vstart is difficult because all of the daemons run on the same node. Some daemons become laggy, so the balancer may not work as intended because heartbeats get lost. Here is an example of a typical vstart run:

2016-08-21 06:44:01.763930 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=1.35 > load=0.0
2016-08-21 06:44:01.763966 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=1.35 > load=0.0
2016-08-21 06:44:01.763982 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=1.35 > load=0.0
2016-08-21 06:44:01.764010 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=0.0 hisload=0.0
2016-08-21 06:44:01.764033 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:44:11.742084 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=1.14 > load=0.0
2016-08-21 06:44:11.742110 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=1.14 > load=0.0
2016-08-21 06:44:11.742122 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=1.14 > load=0.0
2016-08-21 06:44:11.742131 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=0.0 hisload=0.0
2016-08-21 06:44:11.742150 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:44:21.901355 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=4593.618288637 all.meta_load=1804.9266219747 req_rate=3163.0 queue_len=0.0 cpu_load_avg=1.19 > load=1804.9266219747
2016-08-21 06:44:21.901388 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=1.19 > load=0.0
2016-08-21 06:44:21.901403 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=1.19 > load=0.0
2016-08-21 06:44:21.901415 7fd03aaf7700  2 lua.balancer when: migrating! my_load=1804.9266219747 hisload=0.0
2016-08-21 06:44:21.901442 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={0=0,1=902.463,2=0}
2016-08-21 06:45:21.869994 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=5834.188908912 all.meta_load=1953.3492228857 req_rate=12591.0 queue_len=1075.0 cpu_load_avg=3.05 > load=1953.3492228857
2016-08-21 06:45:21.870017 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=3.05 > load=0.0
2016-08-21 06:45:21.870027 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=3.05 > load=0.0
2016-08-21 06:45:21.870034 7fd03aaf7700  2 lua.balancer when: migrating! my_load=1953.3492228857 hisload=0.0
2016-08-21 06:45:21.870050 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={0=0,1=976.675,2=0}
2016-08-21 06:45:21.870094 7fd03aaf7700  0 mds.0.bal    - exporting [0,0.52287 1.04574] 1030.88 to mds.1 [dir 100000006ab /client-test2/ [2,head] auth pv=33 v=32 cv=32/0 ap=2+3+4 state=1610612802|complete f(v0 m2016-08-21 06:44:20.366935 1=0+1) n(v2 rc2016-08-21 06:44:30.946816 3790=3788+2) hs=1+0,ss=0+0 dirty=1 | child=1 dirty=1 authpin=1 0x55d2762fd690]
2016-08-21 06:45:21.870151 7fd03aaf7700  0 mds.0.migrator nicely exporting to mds.1 [dir 100000006ab /client-test2/ [2,head] auth pv=33 v=32 cv=32/0 ap=2+3+4 state=1610612802|complete f(v0 m2016-08-21 06:44:20.366935 1=0+1) n(v2 rc2016-08-21 06:44:30.946816 3790=3788+2) hs=1+0,ss=0+0 dirty=1 | child=1 dirty=1 authpin=1 0x55d2762fd690]
2016-08-21 06:45:21.870785 7fd0381f1700  0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 50.224216 secs
2016-08-21 06:45:21.870806 7fd0381f1700  0 log_channel(cluster) log [WRN] : slow request 50.224216 seconds old, received at 2016-08-21 06:44:31.646441: client_request(client.4117:12590 create #100000003dd/file.mdtest.0.3905 2016-08-21 06:44:31.646352) currently submit entry: journal_and_reply
2016-08-21 06:45:21.870816 7fd0381f1700  0 log_channel(cluster) log [WRN] : slow request 50.223735 seconds old, received at 2016-08-21 06:44:31.646922: client_request(client.4117:12591 create #100000006b1/file.mdtest.0.4043 2016-08-21 06:44:31.646847) currently submit entry: journal_and_reply
2016-08-21 06:45:22.015865 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=0.91337977931428 all.meta_load=0.91337977931428 req_rate=12591.0 queue_len=5417.0 cpu_load_avg=4.72 > load=0.91337977931428
2016-08-21 06:45:22.015884 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=4.72 > load=0.0
2016-08-21 06:45:22.015892 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=0.0 queue_len=0.0 cpu_load_avg=4.72 > load=0.0
2016-08-21 06:45:22.015898 7fd03aaf7700  2 lua.balancer when: migrating! my_load=0.91337977931428 hisload=0.0
2016-08-21 06:45:22.015911 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={0=0,1=0.45669,2=0}
2016-08-21 06:45:36.919067 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=1.2604942010368 all.meta_load=1.2604942010368 req_rate=14099.0 queue_len=0.0 cpu_load_avg=5.38 > load=1.2604942010368
2016-08-21 06:45:36.919088 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.25471096737569 req_rate=2.0 queue_len=0.0 cpu_load_avg=5.38 > load=0.25471096737569
2016-08-21 06:45:36.919098 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=5.38 > load=0.0
2016-08-21 06:45:36.919104 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=1.2604942010368 hisload=0.25471096737569
2016-08-21 06:45:36.919117 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:45:43.630414 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=810.58304369097 all.meta_load=744.41534683744 req_rate=16703.0 queue_len=0.0 cpu_load_avg=6.27 > load=744.41534683744
2016-08-21 06:45:43.630450 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.063229062693058 req_rate=2.0 queue_len=0.0 cpu_load_avg=6.27 > load=0.063229062693058
2016-08-21 06:45:43.630465 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=6.27 > load=0.0
2016-08-21 06:45:43.630487 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=744.41534683744 hisload=0.063229062693058
2016-08-21 06:45:43.630506 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:45:54.242445 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=1345.6408819571 all.meta_load=533.80486360634 req_rate=24504.0 queue_len=0.0 cpu_load_avg=6.69 > load=533.80486360634
2016-08-21 06:45:54.242480 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.015776540816676 req_rate=2.0 queue_len=0.0 cpu_load_avg=6.69 > load=0.015776540816676
2016-08-21 06:45:54.242495 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=6.69 > load=0.0
2016-08-21 06:45:54.242507 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=533.80486360634 hisload=0.015776540816676
2016-08-21 06:45:54.242523 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:45:56.922132 7fd0381f1700  0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 34.727950 secs
2016-08-21 06:45:56.922144 7fd0381f1700  0 log_channel(cluster) log [WRN] : slow request 34.727950 seconds old, received at 2016-08-21 06:45:22.194124: slave_request(client.4117:12594.0 authpin) currently initiated
2016-08-21 06:45:56.922148 7fd0381f1700  0 log_channel(cluster) log [WRN] : slow request 34.595897 seconds old, received at 2016-08-21 06:45:22.326178: rejoin:client.4117:12596 currently initiated
2016-08-21 06:46:02.458610 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=1560.7909106678 all.meta_load=771.18863180669 req_rate=32159.0 queue_len=0.0 cpu_load_avg=7.35 > load=771.18863180669
2016-08-21 06:46:02.458632 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=2.0 queue_len=0.0 cpu_load_avg=7.35 > load=0.0
2016-08-21 06:46:02.458641 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=7.35 > load=0.0
2016-08-21 06:46:02.458649 7fd03aaf7700  2 lua.balancer when: migrating! my_load=771.18863180669 hisload=0.0
2016-08-21 06:46:02.458663 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={0=0,1=385.594,2=0}
2016-08-21 06:46:18.521698 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=1911.6809007051 all.meta_load=937.80486303514 req_rate=43410.0 queue_len=32.0 cpu_load_avg=9.74 > load=937.80486303514
2016-08-21 06:46:18.521726 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=740.45203236014 all.meta_load=563.72504572142 req_rate=4487.0 queue_len=0.0 cpu_load_avg=9.74 > load=563.72504572142
2016-08-21 06:46:18.521737 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.52749434660449 req_rate=1.0 queue_len=0.0 cpu_load_avg=9.74 > load=0.52749434660449
2016-08-21 06:46:18.521751 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=937.80486303514 hisload=563.72504572142
2016-08-21 06:46:18.521767 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:46:23.522569 7fd0381f1700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 61.196347 secs
2016-08-21 06:46:23.522582 7fd0381f1700  0 log_channel(cluster) log [WRN] : slow request 61.196347 seconds old, received at 2016-08-21 06:45:22.326178: rejoin:client.4117:12596 currently initiated
2016-08-21 06:46:23.592688 7fd0381f1700  0 log_channel(cluster) log [WRN] : client.4117 isn't responding to mclientcaps(revoke), ino 10000003121 pending pAsxLsXsxFscr issued pAsxLsXsxFsxcrwb, sent 61.266069 seconds ago
2016-08-21 06:46:26.974021 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=248.1158482066 all.meta_load=44.952877248594 req_rate=49307.0 queue_len=0.0 cpu_load_avg=11.09 > load=44.952877248594
2016-08-21 06:46:26.974045 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=1907.1096319387 all.meta_load=1163.5221067181 req_rate=9921.0 queue_len=0.0 cpu_load_avg=11.09 > load=1163.5221067181
2016-08-21 06:46:26.974057 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.11552373615274 req_rate=1.0 queue_len=0.0 cpu_load_avg=11.09 > load=0.11552373615274
2016-08-21 06:46:26.974066 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=44.952877248594 hisload=1163.5221067181
2016-08-21 06:46:26.974078 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:46:34.092200 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=189.33195157637 all.meta_load=118.79485827008 req_rate=54912.0 queue_len=0.0 cpu_load_avg=12.44 > load=118.79485827008
2016-08-21 06:46:34.092223 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=995.97360015722 all.meta_load=319.23241547619 req_rate=16070.0 queue_len=0.0 cpu_load_avg=12.44 > load=319.23241547619
2016-08-21 06:46:34.092234 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.02888056437053 req_rate=1.0 queue_len=0.0 cpu_load_avg=12.44 > load=0.02888056437053
2016-08-21 06:46:34.092242 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=118.79485827008 hisload=319.23241547619
2016-08-21 06:46:34.092273 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:46:52.595112 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=2810.6849530314 all.meta_load=879.21284949283 req_rate=73283.0 queue_len=1.0 cpu_load_avg=12.27 > load=879.21284949283
2016-08-21 06:46:52.595158 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=1700.9354519463 all.meta_load=720.7756378639 req_rate=24545.0 queue_len=0.0 cpu_load_avg=12.27 > load=720.7756378639
2016-08-21 06:46:52.597953 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=12.27 > load=0.0
2016-08-21 06:46:52.597985 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=879.21284949283 hisload=720.7756378639
2016-08-21 06:46:52.598020 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:47:01.147084 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=858.62263604989 all.meta_load=540.31341086379 req_rate=78797.0 queue_len=0.0 cpu_load_avg=12.49 > load=540.31341086379
2016-08-21 06:47:01.147112 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=42.579286795451 all.meta_load=42.579286795451 req_rate=27074.0 queue_len=0.0 cpu_load_avg=12.49 > load=42.579286795451
2016-08-21 06:47:01.147124 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=12.49 > load=0.0
2016-08-21 06:47:01.147135 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=540.31341086379 hisload=42.579286795451
2016-08-21 06:47:01.147149 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:47:10.210253 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=415.77414300449 all.meta_load=415.79000078186 req_rate=82813.0 queue_len=0.0 cpu_load_avg=11.97 > load=415.79000078186
2016-08-21 06:47:10.210277 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=228.72023977691 all.meta_load=186.5606496623 req_rate=28580.0 queue_len=0.0 cpu_load_avg=11.97 > load=186.5606496623
2016-08-21 06:47:10.210290 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=11.97 > load=0.0
2016-08-21 06:47:10.210298 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=415.79000078186 hisload=186.5606496623
2016-08-21 06:47:10.210311 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:47:19.844334 7fd03aaf7700  0 lua.balancer MDS0: < auth.meta_load=143.34243216393 all.meta_load=143.34243216393 req_rate=86005.0 queue_len=0.0 cpu_load_avg=12.04 > load=143.34243216393
2016-08-21 06:47:19.844364 7fd03aaf7700  0 lua.balancer MDS1: < auth.meta_load=833.62957934342 all.meta_load=408.50201738119 req_rate=31109.0 queue_len=0.0 cpu_load_avg=12.04 > load=408.50201738119
2016-08-21 06:47:19.844375 7fd03aaf7700  0 lua.balancer MDS2: < auth.meta_load=0.0 all.meta_load=0.0 req_rate=1.0 queue_len=0.0 cpu_load_avg=12.04 > load=0.0
2016-08-21 06:47:19.844392 7fd03aaf7700  2 lua.balancer when: not migrating! my_load=143.34243216393 hisload=408.50201738119
2016-08-21 06:47:19.844408 7fd03aaf7700  2 mds.0.bal  mantle decided that new targets={}
2016-08-21 06:47:22.600237 7fd0381f1700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 120.273988 secs
2016-08-21 06:47:22.600253 7fd0381f1700  0 log_channel(cluster) log [WRN] : slow request 120.273988 seconds old, received at 2016-08-21 06:45:22.326178: rejoin:client.4117:12596 currently initiated
2016-08-21 06:47:22.659926 7fd0381f1700  0 log_channel(cluster) log [WRN] : client.4117 isn't responding to mclientcaps(revoke), ino 10000003121 pending pAsxLsXsxFscr issued pAsxLsXsxFsxcrwb, sent 120.333336 seconds ago
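
One thing worth noticing in the decisions above: whenever mds.0 chooses to migrate, the target it assigns to mds.1 is half of its own load (e.g. my_load=1804.93 yields targets={0=0,1=902.463,2=0}, and 771.19 yields 385.594), which is consistent with a greedy policy that spills half of the local load to the next MDS.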

Special notes:

We use mdtest because we need to generate enough load to get over the MIN_OFFLOAD threshold that is arbitrarily set in MDBalancer.cc. For example, this does not create enough metadata load:

while true; do
  touch "/cephfs/blah-`date`"
done
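
To confirm that the benchmark is generating enough load for the balancer to act on, it helps to watch the balancer lines in the debug log (the same strings shown in the Output section above):

# watch the balancer's decisions; non-empty targets only show up once enough load builds up
tail -f out/mds.a.log | grep -E "lua.balancer|mantle decided"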

For reference, here are two condensed setup recipes. The first starts a cluster and installs the stock greedyspill.lua balancer with balancer logging turned off:

./vstart.sh -n -l
./ceph mds set allow_multimds true --yes-i-really-mean-it
./ceph mds set max_mds 5
for i in a b c; do
  ./ceph --admin-daemon out/mds.$i.asok config set debug_ms 0
  ./ceph --admin-daemon out/mds.$i.asok config set debug_mds_balancer 0
  ./ceph --admin-daemon out/mds.$i.asok config set debug_mds_locker 0
done
./rados put --pool=cephfs_metadata_a greedyspill.lua mds/balancers/greedyspill.lua
./ceph mds set balancer greedyspill.lua

The second installs a development balancer (here, mds/balancers/test.lua) with verbose balancer logging; the RADOS object name should match the name passed to `ceph mds set balancer`:

./ceph mds set allow_multimds true --yes-i-really-mean-it
./ceph mds set max_mds 5
for i in a b c; do
  ./ceph --admin-daemon out/mds.$i.asok config set debug_ms 0
  ./ceph --admin-daemon out/mds.$i.asok config set debug_mds_balancer 5
done
./rados put --pool=cephfs_metadata_a test.lua mds/balancers/test.lua
./ceph mds set balancer test.lua