Installing a Non-Release Fork of Ceph on CloudLab
The following instructions illustrate the steps for installing a non-release fork of Ceph (such as Skyhook) on a CloudLab instance. The examples use a 1-OSD image (`skyhook-ub18bio-g5-1osd`).
- Spin up a CloudLab Ubuntu 18 instance.
- On all nodes (from `~`):
```sh
# from ~ on all nodes :
sudo apt-get update ;
sudo apt-get install tmux ;
yes | sudo apt-get install vim ;
```
and add these lines to `~/.vimrc`:
```vim
set tabstop=2
set shiftwidth=2
set expandtab
```
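If you'd rather skip the manual edit, the same settings can be appended in one shot; a tiny sketch:
```sh
# append the vim settings non-interactively (equivalent to editing ~/.vimrc)
printf 'set tabstop=2\nset shiftwidth=2\nset expandtab\n' >> ~/.vimrc
```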
- On all nodes (from `~`), change the hostnames, e.g. on `client0`:
```sh
# from client0:~
sudo hostname client0 ;
HOSTNAME=client0 ;
```
e.g. on `osd0`:
```sh
# from osd0:~
sudo hostname osd0 ;
HOSTNAME=osd0 ;
```
- On all nodes (from `~`), simplify `/etc/hosts` with node identifiers.
```sh
# from ~ on all nodes :
sudo vi /etc/hosts ;
```
Example contents for a 1-OSD cluster:
```
127.0.0.1 localhost
10.10.1.2 client0
10.10.1.1 osd0
```
- On all nodes (from `~`), copy your PRIVATE CloudLab ssh key into `.ssh/id_rsa`.
```sh
# from ~ on all nodes :
vi .ssh/id_rsa ;
chmod 0600 .ssh/id_rsa ;
```
- Copy the following files into `~` on `client0` and create a `nodes.txt` file with all the node identifiers for the cluster:
```sh
# from client0:~
cp /proj/skyhook-PG0/projscripts/format-sd* . ;
cp /proj/skyhook-PG0/projscripts/mount-sd* . ;
cp /proj/skyhook-PG0/projscripts/cluster_setup_copy_ssh_keys.sh . ;
cp /proj/skyhook-PG0/projscripts/zap-sd* . ;
echo client0 >> nodes.txt ;
echo osd0 >> nodes.txt ;
```
- Set up the cluster ssh.
```sh
# from client0:~
sh cluster_setup_copy_ssh_keys.sh ;
```
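The script itself isn't reproduced here; a minimal sketch of what a key-distribution script like `cluster_setup_copy_ssh_keys.sh` might do, assuming it reads `nodes.txt` and that the private key from the previous step is in place (hypothetical reconstruction, not the actual script):
```sh
#!/bin/sh
# hypothetical sketch: derive the public key and push it to every node in
# nodes.txt so all nodes can ssh to each other without passwords
[ -f ~/.ssh/id_rsa.pub ] || ssh-keygen -y -f ~/.ssh/id_rsa > ~/.ssh/id_rsa.pub
for n in $(cat nodes.txt) ; do
  ssh-copy-id -o StrictHostKeyChecking=no "$n" ;
done
```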
- Format the 500GB SSD drive. This is important because Ceph is ~30GB in size and therefore may not fit on the default partition/device on CloudLab nodes. The `sd*` label will differ depending on your CloudLab node hardware; use `lsblk` to determine the label for your 500GB SSD drive. For the `c220g5` nodes in this example, the SSD is on `/dev/sda4`. Run the format script for the appropriate drive.
```sh
# from client0:~
sh format-sda4.sh ; # pass in your user name (e.g. sh format-sda4.sh kat)
```
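`format-sda4.sh` lives in the shared `projscripts/` directory and isn't reproduced here; a minimal sketch of what such a script might do, assuming it takes your user name, puts a filesystem on `/dev/sda4`, and mounts it at `/mnt/sda4` (hypothetical reconstruction):
```sh
#!/bin/sh
# hypothetical sketch: format /dev/sda4, mount it at /mnt/sda4, and hand
# ownership to the user name passed as $1
sudo mkfs.ext4 /dev/sda4 ;
sudo mkdir -p /mnt/sda4 ;
sudo mount /dev/sda4 /mnt/sda4 ;
sudo chown -R "$1" /mnt/sda4 ;
```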
- Go to the newly formatted drive and clone your Ceph fork for later (`skyhook-ceph` in this example).
```sh
# from client0:~
cd /mnt/sda4 ;
git clone https://github.com/uccross/skyhook-ceph.git ;
```
NOTE: you can stop here if you just want to dev on or use the Ceph vcluster on CloudLab.
- Go back to `~` and install `ceph-deploy`.
```sh
# from client0:~
cd ~ ;
sudo apt-get -f install ;
sudo apt-get install -y python-virtualenv ;
mkdir cluster ;
cd cluster ;
virtualenv env ;
env/bin/pip install ceph-deploy==1.5.38 ; # the release closest after luminous
```
- Install Ceph `luminous` on ALL nodes in the cluster.
```sh
# from client0:~/cluster
env/bin/ceph-deploy install --release=luminous client0 osd0 ;
```
- The above command pulls and installs the latest stable release of luminous on the specified nodes in the cluster. THIS DOES NOT REFLECT THE CODE IN YOUR FORK/BRANCH! (obvs) Installing your special copy of Ceph requires selectively replacing/installing files from a deb archive of your fork/branch in the appropriate locations on your cluster AFTER performing a clean luminous install.
In the general case, you will have to:
- know which `lib*.so` files encompass your code changes from vanilla Ceph
- know which daemon binaries are affected by your code changes
- generate the deb files for your fork/branch with `dpkg` (see Building Deb Files for Cloudlab Installs)
- copy the deb files somewhere in your CloudLab cluster instance
- extract the `lib*.so.1.0.0`(s) and daemon binary file(s) you're interested in
- examine the locations of the corresponding files in the Ceph (e.g. luminous) install for appropriate paths
- copy the `lib*.so.1.0.0`(s) and binary file(s) into the appropriate locations
For simplicity, let's assume your fork/branch represents a clean CLS extension of vanilla Ceph (e.g. Skyhook). Then you will search your deb files for the `libcls_<extension name>.so.1.0.0` encompassing the changes and the `ceph-osd` binary, and copy the files to the appropriate locations in the installation hierarchy created by the vanilla Ceph luminous install. For Skyhook, the CLS extension is `tabular`, so we'll be looking for `libcls_tabular.so.1.0.0`.
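As a compact roadmap of that flow (the deb name, extension name, and paths here are illustrative; the search steps below show how to find the real ones):
```sh
# illustrative roadmap only -- find the real deb names and install paths
# with the search steps below
cd <path to your deb files> ;
mkdir temp_pkg && cd temp_pkg ;
dpkg-deb -x ../<deb containing your files> . ;  # unpack without installing
# then, on EVERY node, copy the extracted library over the vanilla install
sudo cp usr/lib/rados-classes/libcls_<extension name>.so.1.0.0 \
  /usr/lib/x86_64-linux-gnu/rados-classes/ ;
```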
- First, figure out which deb files contain `ceph-osd` and `libcls_tabular.so.1.0.0` using the `search.sh` script. `search.sh` is currently located in the shared `projscripts/` directory.
```sh
# from client0:
cd <path to your deb files on this cloudlab instance> ;
cp /proj/skyhook-PG0/projscripts/search.sh . ;
sh search.sh ceph-osd ; # pass string-of-interest as the commandline argument
```
Example:
```
kat@osd0:/proj/skyhook-PG0/cephbits/katbits-ub18$ ls
ceph_12.2.7-1_amd64.deb librgw2-dbg_12.2.7-1_amd64.deb
ceph-base_12.2.7-1_amd64.deb librgw-dev_12.2.7-1_amd64.deb
ceph-base-dbg_12.2.7-1_amd64.deb python3-ceph-argparse_12.2.7-1_amd64.deb
ceph-common_12.2.7-1_amd64.deb python3-cephfs_12.2.7-1_amd64.deb
ceph-common-dbg_12.2.7-1_amd64.deb python3-cephfs-dbg_12.2.7-1_amd64.deb
ceph-fuse_12.2.7-1_amd64.deb python3-rados_12.2.7-1_amd64.deb
ceph-fuse-dbg_12.2.7-1_amd64.deb python3-rados-dbg_12.2.7-1_amd64.deb
ceph-mds_12.2.7-1_amd64.deb python3-rbd_12.2.7-1_amd64.deb
ceph-mds-dbg_12.2.7-1_amd64.deb python3-rbd-dbg_12.2.7-1_amd64.deb
ceph-mgr_12.2.7-1_amd64.deb python3-rgw_12.2.7-1_amd64.deb
ceph-mgr-dbg_12.2.7-1_amd64.deb python3-rgw-dbg_12.2.7-1_amd64.deb
ceph-mon_12.2.7-1_amd64.deb python-ceph_12.2.7-1_amd64.deb
ceph-mon-dbg_12.2.7-1_amd64.deb python-cephfs_12.2.7-1_amd64.deb
ceph-osd_12.2.7-1_amd64.deb python-cephfs-dbg_12.2.7-1_amd64.deb
ceph-osd-dbg_12.2.7-1_amd64.deb python-rados_12.2.7-1_amd64.deb
ceph-resource-agents_12.2.7-1_amd64.deb python-rados-dbg_12.2.7-1_amd64.deb
ceph-test_12.2.7-1_amd64.deb python-rbd_12.2.7-1_amd64.deb
ceph-test-dbg_12.2.7-1_amd64.deb python-rbd-dbg_12.2.7-1_amd64.deb
libcephfs2_12.2.7-1_amd64.deb python-rgw_12.2.7-1_amd64.deb
libcephfs2-dbg_12.2.7-1_amd64.deb python-rgw-dbg_12.2.7-1_amd64.deb
libcephfs-dev_12.2.7-1_amd64.deb radosgw_12.2.7-1_amd64.deb
libcephfs-java_12.2.7-1_all.deb radosgw-dbg_12.2.7-1_amd64.deb
libcephfs-jni_12.2.7-1_amd64.deb rados-objclass-dev_12.2.7-1_amd64.deb
librados2_12.2.7-1_amd64.deb rbd-fuse_12.2.7-1_amd64.deb
librados2-dbg_12.2.7-1_amd64.deb rbd-fuse-dbg_12.2.7-1_amd64.deb
librados-dev_12.2.7-1_amd64.deb rbd-mirror_12.2.7-1_amd64.deb
libradosstriper1_12.2.7-1_amd64.deb rbd-mirror-dbg_12.2.7-1_amd64.deb
libradosstriper1-dbg_12.2.7-1_amd64.deb rbd-nbd_12.2.7-1_amd64.deb
libradosstriper-dev_12.2.7-1_amd64.deb rbd-nbd-dbg_12.2.7-1_amd64.deb
librbd1_12.2.7-1_amd64.deb remove-local-deb-pkgs-via-dpkg.sh
librbd1-dbg_12.2.7-1_amd64.deb search.sh
librbd-dev_12.2.7-1_amd64.deb
librgw2_12.2.7-1_amd64.deb
kat@osd0:/proj/skyhook-PG0/cephbits/katbits-ub18$ sh search.sh ceph-osd
$1 is NOT empty
using search string 'ceph-osd'
<entries removed for clarity>
checking in ceph-osd-dbg_12.2.7-1_amd64.deb:
drwxr-xr-x root/root 0 2018-07-16 11:00 ./usr/share/doc/ceph-osd-dbg/
-rw-r--r-- root/root 2696 2018-07-16 11:00 ./usr/share/doc/ceph-osd-dbg/changelog.Debian.gz
-rw-r--r-- root/root 6142 2018-07-16 11:00 ./usr/share/doc/ceph-osd-dbg/copyright
checking in ceph-osd_12.2.7-1_amd64.deb:
-rw-r--r-- root/root 639 2018-07-16 11:00 ./etc/init/ceph-osd-all-starter.conf
-rw-r--r-- root/root 113 2018-07-16 11:00 ./etc/init/ceph-osd-all.conf
-rw-r--r-- root/root 722 2018-07-16 11:00 ./etc/init/ceph-osd.conf
-rw-r--r-- root/root 49 2018-07-16 11:00 ./etc/sysctl.d/30-ceph-osd.conf
-rw-r--r-- root/root 181 2018-07-16 11:00 ./lib/systemd/system/ceph-osd.target
-rw-r--r-- root/root 711 2018-07-16 11:00 ./lib/systemd/system/[email protected]
-rw-r--r-- root/root 9038 2018-07-16 11:00 ./lib/udev/rules.d/95-ceph-osd.rules
-rwxr-xr-x root/root 19631048 2018-07-16 11:00 ./usr/bin/ceph-osd
-rwxr-xr-x root/root 4149008 2018-07-16 11:00 ./usr/bin/ceph-osdomap-tool
-rwxr-xr-x root/root 1251 2018-07-16 11:00 ./usr/lib/ceph/ceph-osd-prestart.sh
drwxr-xr-x root/root 0 2018-07-16 11:00 ./usr/share/doc/ceph-osd/
-rw-r--r-- root/root 2696 2018-07-16 11:00 ./usr/share/doc/ceph-osd/changelog.Debian.gz
-rw-r--r-- root/root 6142 2018-07-16 11:00 ./usr/share/doc/ceph-osd/copyright
-rw-r--r-- root/root 1830 2018-07-16 11:00 ./usr/share/man/man8/ceph-osd.8.gz
checking in ceph-resource-agents_12.2.7-1_amd64.deb:
checking in ceph-test-dbg_12.2.7-1_amd64.deb:
checking in ceph-test_12.2.7-1_amd64.deb:
<truncated for clarity>
kat@osd0:/proj/skyhook-PG0/cephbits/katbits-ub18$
```
So, `ceph-osd_12.2.7-1_amd64.deb` contains the `ceph-osd` binary in `./usr/bin`.
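The contents of `search.sh` aren't shown in this walkthrough; a minimal sketch of what such a helper might look like, assuming it just lists each local deb's contents and greps for the argument (hypothetical reconstruction, not the actual script):
```sh
#!/bin/sh
# hypothetical sketch of a search.sh-style helper: grep every local deb's
# file listing for the string passed as $1
if [ -z "$1" ] ; then
  echo "usage: sh search.sh <string-of-interest>" ;
  exit 1 ;
fi
echo "using search string '$1'"
for deb in *.deb ; do
  echo "checking in $deb:" ;
  dpkg-deb -c "$deb" | grep "$1" ;  # -c lists the archive contents
done
```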
- Do the same search for `libcls_tabular`:
```
kat@osd0:/proj/skyhook-PG0/cephbits/katbits-ub18$ sh search.sh libcls_tabular
$1 is NOT empty
using search string 'libcls_tabular'
checking in ceph-base-dbg_12.2.7-1_amd64.deb:
checking in ceph-base_12.2.7-1_amd64.deb:
-rw-r--r-- root/root 617808 2018-07-16 11:00 ./usr/lib/rados-classes/libcls_tabular.so.1.0.0
lrwxrwxrwx root/root 0 2018-07-16 11:00 ./usr/lib/rados-classes/libcls_tabular.so -> libcls_tabular.so.1
lrwxrwxrwx root/root 0 2018-07-16 11:00 ./usr/lib/rados-classes/libcls_tabular.so.1 -> libcls_tabular.so.1.0.0
checking in ceph-common-dbg_12.2.7-1_amd64.deb:
checking in ceph-common_12.2.7-1_amd64.deb:
<truncated for clarity>
kat@osd0:/proj/skyhook-PG0/cephbits/katbits-ub18$
```
So, the files relevant for `tabular` are in `ceph-base_12.2.7-1_amd64.deb` at `./usr/lib/rados-classes/`.
- Next, figure out where `ceph-osd` and the `libcls` files live in the vanilla Ceph install. Note the `find` will take a bit of time.
```
kat@osd0:~$ which ceph-osd
/usr/bin/ceph-osd
kat@osd0:~$ time sudo find / -name libcls*
<entries removed for clarity>
/usr/lib/x86_64-linux-gnu/rados-classes/libcls_kvs.so.1.0.0
/usr/lib/x86_64-linux-gnu/rados-classes/libcls_journal.so
/usr/lib/x86_64-linux-gnu/rados-classes/libcls_timeindex.so.1.0.0
/usr/lib/x86_64-linux-gnu/rados-classes/libcls_lock.so
/usr/lib/x86_64-linux-gnu/rados-classes/libcls_lua.so
<entries removed for clarity>
kat@osd0:~$
```
So, `ceph-osd` is in `/usr/bin` and `libcls` files go in `/usr/lib/x86_64-linux-gnu/rados-classes/`.
- Unpack `ceph-osd` from your deb archives.
```sh
# from client0 (or any other node since this is the shared dir)
cd <path to your deb files>/ ;
mkdir temp_cephosd ;
cd temp_cephosd ;
dpkg-deb -x ../ceph-osd_12.2.7-1_amd64.deb . ;
```
- Unpack the special `libcls_tabular` shared object file from your deb archives.
```sh
# from client0 (or any other node since this is the shared dir)
cd <path to your deb files>/ ;
mkdir temp_tabular ;
cd temp_tabular ;
dpkg-deb -x ../ceph-base_12.2.7-1_amd64.deb . ;
```
- Copy over the `libcls` file(s) on ALL NODES (the `libcls_tabular.so.1.0.0` file ONLY in this example); a loop sketch covering all nodes follows the double check below.
```sh
cd /usr/lib/x86_64-linux-gnu/rados-classes/ ;
sudo cp /proj/skyhook-PG0/cephbits/katbits-ub18/temp_tabular/usr/lib/rados-classes/libcls_tabular.so.1.0.0 . ;
```
- Match the symlink convention (on ALL NODES!!!).
```sh
sudo ln -s libcls_tabular.so.1.0.0 libcls_tabular.so.1 ;
sudo ln -s libcls_tabular.so.1 libcls_tabular.so ;
```
Always double check:
```
kat@client0:/usr/lib/x86_64-linux-gnu/rados-classes$ ls -alh | grep tab
lrwxrwxrwx 1 root root 19 Jun 1 20:50 libcls_tabular.so -> libcls_tabular.so.1
lrwxrwxrwx 1 root root 23 Jun 1 20:50 libcls_tabular.so.1 -> libcls_tabular.so.1.0.0
-rw-r--r-- 1 root root 604K Jun 1 20:48 libcls_tabular.so.1.0.0
kat@client0:/usr/lib/x86_64-linux-gnu/rados-classes$
```
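Since the copy and the symlinks must be repeated on every node, a loop over `nodes.txt` can save time; a minimal sketch, assuming passwordless ssh/sudo between nodes and the shared source path used above:
```sh
# hypothetical helper: install the CLS library and its symlinks on all nodes
LIB=/proj/skyhook-PG0/cephbits/katbits-ub18/temp_tabular/usr/lib/rados-classes/libcls_tabular.so.1.0.0
for n in $(cat ~/nodes.txt) ; do
  ssh "$n" "cd /usr/lib/x86_64-linux-gnu/rados-classes/ && \
    sudo cp $LIB . && \
    sudo ln -s libcls_tabular.so.1.0.0 libcls_tabular.so.1 && \
    sudo ln -s libcls_tabular.so.1 libcls_tabular.so"
done
```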
- If your fork/branch needs modifications to `ceph-osd`, then copy that over to the appropriate directory too. Skyhook doesn't require `ceph-osd` changes, so we're not doing the copy.
- Spin up a new Ceph cluster on `client0`.
```sh
# from client0
cd ~/cluster ;
env/bin/ceph-deploy new client0 ;
```
- Add these lines (and any other configs you want) to `~/cluster/ceph.conf`:
```ini
osd pool default size = 1
osd pool default min size = 1
osd crush chooseleaf type = 0
osd pool default pg num = 128
osd pool default pgp num = 128
mon_allow_pool_delete = true
osd_class_load_list = *
osd_class_default_list = *
objecter_inflight_op_bytes = 2147483648
enable experimental unrecoverable data corrupting features = *

[osd]
osd max write size = 128 # the maximum size of a write, in megabytes
osd max object size = 256000000 # the maximum size of a RADOS object, in bytes
debug ms = 1
debug osd = 25
debug objecter = 20
debug monc = 20
debug mgrc = 20
debug journal = 20
debug filestore = 20
debug bluestore = 30
debug bluefs = 20
debug rocksdb = 10
debug bdev = 20
debug rgw = 20
debug reserver = 10
debug objclass = 20
```
- Spin up a new monitor.
```sh
# from client0:~/cluster
env/bin/ceph-deploy mon create client0 ;
```
- Gather keys as the last step in preparation for adding OSDs.
```sh
# from client0:~/cluster
env/bin/ceph-deploy gatherkeys client0 ;
```
- Provision OSDs.
!!! IMPORTANT !!! Figure out which device on the OSD nodes will host the OSD storage. On the `c220g5`s, `sda4` would not work, presumably because it is a partition and Ceph wants total control of a whole device. So for the `c220g5`s we're using `sdb`, the 1TB HDD. Some CloudLab nodes have the 500GB SSD as a whole device; in that case, try using the SSD instead.
```sh
# from client0:~/cluster
# do the following for every OSD node in the cluster
env/bin/ceph-deploy osd create osd0:/dev/sdb ; # if you're working with more than 1 OSD, list them here like osd0:<device path> osd1:<device path> ... osdN:<device path>
```
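For a larger cluster, a loop can stand in for listing every node by hand; a minimal sketch, assuming the OSD nodes appear as `osd*` entries in `nodes.txt` and all use the same device:
```sh
# from client0:~/cluster
# hypothetical loop: provision one OSD per osd* entry in nodes.txt
for n in $(grep '^osd' ~/nodes.txt) ; do
  env/bin/ceph-deploy osd create "$n:/dev/sdb" ;
done
```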
- Spin up an admin.
```sh
# from client0:~/cluster
env/bin/ceph-deploy admin client0 ;
sudo chmod a+r /etc/ceph/ceph.client.admin.keyring ;
ceph osd set noscrub ; # optional; this will trigger status warnings
ceph osd set nodeep-scrub ; # optional; this will trigger status warnings
```
- Make sure the OSDs have the config files.
```sh
# from client0:~/cluster
env/bin/ceph-deploy config push osd0 ; # if you're working with more than 1 OSD, list them here like osd0 osd1 ... osdN
```
- Spin up a mgr to get rid of status warnings.
```sh
# from client0:~/cluster
env/bin/ceph-deploy mgr create client0 ;
```
- Create a test pool with the appropriate number of placement groups (see https://ceph.com/pgcalc/).
```sh
# from client0:~/cluster
ceph osd pool create testpool 64 64 replicated ; # or rados mkpool testpool
ceph osd pool set testpool size 1 ; # need to set the size
```
- Associate an application with every pool to get rid of status warnings.
```sh
ceph osd pool application enable testpool mytest ;
```
- Try adding objects to the pool.
```sh
rados -p testpool put testobj0 /proj/skyhook-PG0/kat_stuff/dataset_obj_v_file_100mb.txt ;
rados -p testpool put testobj1 /proj/skyhook-PG0/kat_stuff/dataset_obj_v_file_100mb.txt ;
rados -p testpool put testobj2 /proj/skyhook-PG0/kat_stuff/dataset_obj_v_file_100mb.txt ;
```
Example results:
```
kat@client0:~/cluster$ ceph -s
2019-06-02 17:07:45.143254 7f52d45f0700 -1 WARNING: all dangerous and experimental features are enabled.
2019-06-02 17:07:45.163293 7f52d45f0700 -1 WARNING: all dangerous and experimental features are enabled.
  cluster:
    id: 6839bbb0-ac95-497c-a5d0-43d4621320de
    health: HEALTH_OK
  services:
    mon: 1 daemons, quorum client0
    mgr: client0(active)
    osd: 1 osds: 1 up, 1 in
  data:
    pools: 1 pools, 64 pgs
    objects: 3 objects, 298MiB
    usage: 1.29GiB used, 1.09TiB / 1.09TiB avail
    pgs: 64 active+clean
  io:
    client: 16.4MiB/s wr, 0op/s rd, 4op/s wr
kat@client0:~/cluster$
```
- Build your fork drivers and test. For Skyhook:
```sh
cd /mnt/sda4 ;
cd skyhook-ceph ;
git checkout <your branch> ;
./install-deps.sh ;
./do_cmake.sh ;
cd build ;
make run-query ;
rados mkpool tpchflatbuf ;
ceph osd pool set tpchflatbuf size 1 ;
cp /proj/skyhook-PG0/kat_stuff/skyhookdb.testdb.lineitem.oid* . ;
yes | PATH=$PATH:bin ../src/progly/rados-store-glob.sh tpchflatbuf skyhookdb.testdb.lineitem.oid.* ;
bin/run-query --num-objs 2 --pool tpchflatbuf --wthreads 1 --qdepth 10 --query flatbuf --select "orderkey,lt,5;linenumber,gt,5;tax,leq,1.01;" --project-cols orderkey,linenumber,tax,comment --use-cls ;
```
Grep the OSD log to make sure CLS calls are working:
```sh
root@osd0:/var/log/ceph# grep "tabular" ./ceph-osd.0.log
```
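With more than one OSD node, the same check can be looped from `client0`; a minimal sketch, assuming passwordless ssh/sudo and the default log location on each node:
```sh
# hypothetical helper: count tabular-related lines in each OSD node's logs
for n in $(grep '^osd' ~/nodes.txt) ; do
  echo "$n:" ;
  ssh "$n" 'sudo grep -c tabular /var/log/ceph/ceph-osd.*.log' ;
done
```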