HOWTO: Install Ceph
My Ceph installation is useful by itself, but its higher purpose is to serve as the storage layer for OpenStack. As such, this HOWTO is written to be used in conjunction with the Kolla-Ansible OpenStack Deployment HOWTO. Some requirements for Ceph won't be listed here because they're already outlined there; I'll note and cross-reference them where I can.
The OpenStack HOWTO outlines some issues which necessitate having local Docker registries, and they apply to the Ceph install as well:
HOWTO: Setup Docker Registries For OpenStack
There is one difference with ceph-ansible, though, which alleviates some of those problems: the ceph-ansible containers are version-tagged at points in time, so a pull-through container cache may be sufficient if you're only deploying Ceph.
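For illustration, a minimal pull-through cache can be stood up with the stock registry image. This is only a sketch (hostname and port are placeholders); my actual setup is in the registry HOWTO linked above.

# Local registry that proxies and caches docker.io, run on an always-up host
docker run -d --restart=always --name registry-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

Point ceph_docker_registry at that host:port in all.yml and the ceph/daemon pulls get cached locally; the clients also need to trust the registry (TLS, or an insecure-registries entry in daemon.json).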
CentOS 8 doesn't support Docker CE, and the ceph-ansible maintainers take this to heart: their Ansible scripts categorically assume podman on CentOS 8 systems. Some modification of the upstream scripts is required to use Docker. These scripts are updated frequently, so be ready to manage and merge...
At some point in the future I will revisit this and switch over to podman.
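For reference, getting Docker CE onto CentOS 8 boils down to something like this. A sketch only; in my setup this prep is handled by the OpenStack HOWTO, and --nobest works around the containerd.io packaging lag on EL8.

sudo dnf remove -y podman buildah     # these conflict with docker-ce's dependencies
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce --nobest
sudo systemctl enable --now docker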
Requirements already handled in the OpenStack HOWTO include, at least: users, python, ansible, venvs, firewall, sshpass, podman/buildah removal, network bond setup...
I'm using the stack user for both kolla-ansible and ceph-ansible deployments.
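As an example of that prep, the per-node stack user setup amounts to roughly the following (a sketch; the OpenStack HOWTO is the authoritative version):

# Deployment user with passwordless sudo and the controller's SSH key
sudo useradd -m stack
echo "stack ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/stack
sudo mkdir -p ~stack/.ssh
cat ~/.ssh/id_rsa.pub | sudo tee -a ~stack/.ssh/authorized_keys
sudo chown -R stack:stack ~stack/.ssh && sudo chmod 700 ~stack/.ssh && sudo chmod 600 ~stack/.ssh/authorized_keys

With that prep in place, check out ceph-ansible and install its requirements: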
VERSION=4.0   # Nautilus
mkdir ~/CODE/ceph && cd ~/CODE/ceph
git clone https://github.com/ceph/ceph-ansible.git
cd ceph-ansible
git checkout stable-$VERSION
pip install -U pip
pip install -r requirements.txt
I recommend doing your work in a virtual environment dedicated to ceph-ansible.
First, set it up:
mkdir -p ~/CODE/venvs/ceph-ansible
python3 -m venv ~/CODE/venvs/ceph-ansible
source ~/CODE/venvs/ceph-ansible/bin/activate
From here, all ceph-ansible commands should be run after sourcing the ceph-ansible venv:
source ~/CODE/venvs/ceph-ansible/bin/activate
I'm deploying 3 monitor/manager nodes and 5 OSD nodes. I'm also deploying the RADOS Gateway, the Metadata Server (for CephFS), and Grafana with Prometheus.
My inventory file (~/CODE/ceph/ceph-ansible/hosts) looks like:
[mons]
strange-api ansible_user=stack ansible_become=true
merlin-api ansible_user=stack ansible_become=true
gandalf-api ansible_user=stack ansible_become=true

[osds]
kerrigan-api ansible_user=stack ansible_become=true
neo-api ansible_user=stack ansible_become=true
bowman-api ansible_user=stack ansible_become=true
lawnmowerman-api ansible_user=stack ansible_become=true
manhattan-api ansible_user=stack ansible_become=true

[grafana-server]
strange-api ansible_user=stack ansible_become=true

[mgrs]
strange-api ansible_user=stack ansible_become=true
merlin-api ansible_user=stack ansible_become=true
gandalf-api ansible_user=stack ansible_become=true

[rgws]
strange-api ansible_user=stack ansible_become=true
merlin-api ansible_user=stack ansible_become=true
gandalf-api ansible_user=stack ansible_become=true

[mdss]
strange-api ansible_user=stack ansible_become=true
merlin-api ansible_user=stack ansible_become=true
gandalf-api ansible_user=stack ansible_become=true

[clients]
kerrigan-api ansible_user=stack ansible_become=true
neo-api ansible_user=stack ansible_become=true
bowman-api ansible_user=stack ansible_become=true
lawnmowerman-api ansible_user=stack ansible_become=true
manhattan-api ansible_user=stack ansible_become=true
strange-api ansible_user=stack ansible_become=true
merlin-api ansible_user=stack ansible_become=true
gandalf-api ansible_user=stack ansible_become=true
dumbledore ansible_user=stack ansible_become=true
Edit and place $CEPH_CODE_DIR/group_vars/all.yml.
For me, this configuration includes:
- Ceph is pinned to a point-in-time version tag, so I can decide when I'm ready to deal with pulling and testing updates.
- I've created named bonds for reliable interface identification across servers (see the Kolla-Ansible OpenStack HOWTO)
- public_network matches the radosgw/monitor interface
- cluster_network matches the cluster_interface
- configure desired dashboard parameters
- ceph_docker_image and ceph_docker_registry can be changed to support a localized and possibly customized container image
- Ceph can make some optimizations if you intend to run other workloads, e.g. OpenStack Compute, on the same hosts (HCI = Hyper-Converged Infrastructure)
- cephx requires considerable additional configuration in OpenStack, and some corrections too
% diff ceph-all.yml ceph-all-sample.yml
< ceph_docker_image_tag: master-86da1a4-nautilus-centos-7
<
< radosgw_interface: bond4
< monitor_interface: bond4
< cluster_interface: bond5
<
< public_network: 172.18.0.0/24
< cluster_network: 172.19.0.0/24
<
< dashboard_enabled: True
< dashboard_protocol: https
< dashboard_frontend_vip: '172.18.0.10'
< dashboard_port: 8443
< dashboard_admin_password: s3cr3t
< grafana_admin_password: s3cr3t
< dashboard_crt: ''
< dashboard_key: ''
<
< ceph_docker_image: "ceph/daemon"
< containerized_deployment: true
< ceph_docker_registry: docker.io
<
< is_hci: true
<
< cephx: true
> # Dummy variable to avoid error because ansible does not recognize the
> # file as a good configuration file when no variable in it.
> dummy:
For compatibility with my other automation, I've decided to use Docker. The ceph-ansible team has decided they won't support Docker on CentOS 8.
I have to override the 'podman' logic in $CEPH_CHECKOUT_DIR/site-docker.yml:
#container_binary: "{{ 'podman' if (podman_binary.stat.exists and ansible_facts['distribution'] == 'Fedora') or (ansible_facts['os_family'] == 'RedHat' and ansible_facts['distribution_major_version'] == '8') else 'docker' }}"
container_binary: "{{ 'docker' }}"
And also in $CEPH_CHECKOUT_DIR/roles/ceph-facts/tasks/container_binary.yml:
#container_binary: "{{ 'podman' if (podman_binary.stat.exists and ansible_facts['distribution'] == 'Fedora') or (ansible_facts['os_family'] == 'RedHat' and ansible_facts['distribution_major_version'] == '8') else 'docker' }}"
container_binary: 'docker'
And in $CEPH_CHECKOUT_DIR/roles/ceph-container-engine/tasks/pre_requisites/prerequisites.yml, I comment out everything except:
- name: include specific variables
  include_vars: "{{ item }}"
  with_first_found:
    - "{{ ansible_facts['distribution'] }}-{{ ansible_facts['distribution_major_version'] }}.yml"
    - "{{ ansible_facts['os_family'] }}.yml"

- name: debian based systems tasks
  include_tasks: debian_prerequisites.yml
  when:
    - ansible_facts['os_family'] == 'Debian'
  tags: with_pkg
Right now I've got a single OSD volume in each OSD server. Each OSD volume consists of a spinning drive for data and an SSD for the DB and WAL. It's possible to specify a separate WAL device for each OSD, but it defaults to the DB device if not provided; both DB and WAL land on the data drive if no DB device is provided. Sizing recommendations I've seen call for a DB around 4% of the data size - I'm doing 240GB against 4TB, about 6%.
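As a quick sanity check on that ratio (sizes below are mine; substitute your own):

DATA_GB=4000   # 4TB spinning data drive
DB_GB=240      # 240GB SSD for DB/WAL
echo "4% guideline: $(( DATA_GB * 4 / 100 ))GB minimum DB"   # 160GB
echo "actual ratio: $(( DB_GB * 100 / DATA_GB ))%"           # 6%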
I'm using Bluestore because it's the only sensible choice now (vs Filestore):
- Elimination of the unnecessary filesystem intermediate layer is huge, for performance and architectural correctness.
- Block-level checksumming provides an entirely different category of data reliability.
- Bluestore allows erasure coded pools to serve CephFS and RBD (see the sketch after this list).
- Maintaining rollback information for data overwrites is too expensive through the old OSD-on-Filesystem model.
- Bluestore provides additional primitives to make this feasible.
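Here's the sketch referenced above: a minimal erasure coded data pool usable behind RBD or CephFS. The pool name and PG counts are placeholders, not part of my deployment.

# EC pool for bulk data; overwrites must be enabled before RBD or CephFS can use it
ceph osd pool create ec_data 32 32 erasure
ceph osd pool set ec_data allow_ec_overwrites true
# Metadata still lives in a replicated pool; the EC pool only holds data,
# e.g. rbd create volumes/my_image --size 10G --data-pool ec_data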
I'm using LVM OSDs for one reason: predictable device identification via LVM names. The other way would be to specify the raw devices, but even my ProLiant servers don't give me 100% reliable device naming, and udev still doesn't support renaming block devices.
$CEPH_CHECKOUT_DIR/group_vars/osds.yml:
osd_scenario: lvm
osd_objectstore: bluestore
dmcrypt: false
#devices: []
lvm_volumes:
  - data: OSD_1_data
    data_vg: OSD_1_data
    db: OSD_1_db
    db_vg: OSD_1_db
In addition to using LVM for reliable OSD identification by ceph-ansible, I'm also putting the LVM PV inside a GPT partition rather than on the raw disk. I do this so that my OSD management scripts can easily identify my OSD devices by partition name, and I can rebuild those devices without ever risking losing the identifying handle.
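Once the partitions are named (see the prep commands below), the devices can be found by GPT partition label regardless of how the kernel enumerates the disks, for example:

# udev publishes GPT partition names as stable symlinks
ls -l /dev/disk/by-partlabel/OSD_1_data /dev/disk/by-partlabel/OSD_1_db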
To prepare these devices, I run on each OSD server:
# ASSUMING BLANK DRIVES
DATA_VG=OSD_1_data
DATA_DEV=/dev/sdX
DB_VG=OSD_1_db
DB_DEV=/dev/sdY

parted --script $DATA_DEV mklabel gpt mkpart primary 1MiB 100% name 1 $DATA_VG set 1 lvm on
pvcreate -y ${DATA_DEV}1; vgcreate -y $DATA_VG ${DATA_DEV}1; lvcreate -y -n $DATA_VG -l 100%FREE $DATA_VG

parted --script $DB_DEV mklabel gpt mkpart primary 1MiB 100% name 1 $DB_VG set 1 lvm on
pvcreate -y ${DB_DEV}1; vgcreate -y $DB_VG ${DB_DEV}1; lvcreate -y -n $DB_VG -l 100%FREE $DB_VG
It can be problematic to reuse drives which may already be interacting with the OS. For example, existing LVM state will already be loaded and activated, even if you haven't done anything with the drives. It's advised to wipe them before doing the above step; dd followed by a reboot will do the trick. Or, if you're reinstalling a system with existing OSDs, do this:
for VG in $DATA_VG $DB_VG; do
  lvremove -y $VG
  vgremove -y $VG
done
for DEV in $DATA_DEV $DB_DEV; do
  pvremove -y ${DEV}1
  dd bs=1M count=1024 conv=sync if=/dev/zero of=${DEV}1
done
Finally, we can wrap this up! Nope, stick around, buddy.
ansible-playbook $CEPH_CHECKOUT_DIR/site-docker.yml -i $INVENTORY -e container_package_name=docker-ce
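Once the playbook completes, a quick sanity check from one of the mon hosts doesn't hurt. The container name below follows ceph-ansible's ceph-mon-<hostname> convention; adjust if yours differs.

docker exec ceph-mon-$(hostname -s) ceph -s         # overall health, mon/mgr/osd/rgw counts
docker exec ceph-mon-$(hostname -s) ceph osd tree   # every OSD should be up and in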
Ceph and Kolla-Ansible require substantial additional configuration to work together.
(More client auth details can be found here: https://docs.ceph.com/en/latest/rbd/rbd-openstack/)
My configuration enables glance, cinder, nova, and gnocchi to use ceph-backed storage. This gives me all the storage functionality I need for images, volumes, provisioning instances, and stack telemetry.
I tried enabling manila, encountered problems, and will revisit as time allows. It would be very nice to leverage my ceph cluster to provision fileservers. With block-level checksumming and fragment-parity data resilience, we finally have a viable free multi-node alternative to single-node ZFS.
NOTE: It can be helpful to disable cephx to debug integration issues, and enable after everything's working...
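For reference, the knobs involved are the cephx variable in ceph-ansible's all.yml (cluster side) and the auth settings handed to the OpenStack clients (kolla side). A sketch of the client-side portion, to be reverted once everything works:

# Hand out no-auth settings instead of the cephx-required block below
echo "auth_cluster_required = none
auth_service_required = none
auth_client_required = none" | sudo tee -a /etc/ceph/ceph.conf

# and in kolla-ansible's globals.yml:
# external_ceph_cephx_enabled: "no"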
Enable desired systems in kolla-ansible's globals.yml:
enable_cinder: "yes"
enable_gnocchi: "yes"
enable_grafana: "yes"
gnocchi_backend_storage: "rbd"
glance_backend_ceph: "yes"
cinder_backend_ceph: "yes"
nova_backend_ceph: "yes"
external_ceph_cephx_enabled: "yes"
On the ansible controller, set cephx to required in kolla-ansible's ceph.conf for all containers:
echo "auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx" | sudo tee -a /etc/ceph/ceph.conf
On the ansible controller, configure ceph mons into kolla-ansible's ceph.conf for all containers:
# Specify ceph_mon hosts
MON_HOSTS="strange-api,merlin-api,gandalf-api"
echo "mon initial members = $MON_HOSTS" | sudo tee -a /etc/ceph/ceph.conf
From within a ceph_mon container, adjust the placement group numbers on the default pools:
# Default Pools Have Too Few Placement Groups
ceph osd pool set cephfs_data pg_num 16
ceph osd pool set cephfs_metadata pg_num 16
From a ceph_mon container, set up pools and client auth for the OpenStack services we'll be using.
# CLIENT GLANCE
ceph osd pool create images 32
rbd pool init images
ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' -o /etc/ceph/ceph.client.glance.keyring
ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
ceph auth get-or-create client.glance -o /etc/ceph/ceph.client.glance.keyring

# CLIENT CINDER
ceph osd pool create volumes 32
rbd pool init volumes
ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes' -o /etc/ceph/ceph.client.cinder.keyring
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
ceph auth get-or-create client.cinder -o /etc/ceph/ceph.client.cinder.keyring

# CLIENT CINDER-BACKUP
ceph osd pool create backups 32
rbd pool init backups
ceph auth get-or-create client.cinder-backup mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=backups' -o /etc/ceph/ceph.client.cinder-backup.keyring
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'
ceph auth get-or-create client.cinder-backup -o /etc/ceph/ceph.client.cinder-backup.keyring

# CLIENT NOVA
ceph osd pool create vms 32
rbd pool init vms
ceph auth get-or-create client.nova mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=vms' -o /etc/ceph/ceph.client.nova.keyring
ceph auth get-or-create client.nova mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
ceph auth get-or-create client.nova -o /etc/ceph/ceph.client.nova.keyring

# CLIENT GNOCCHI
ceph osd pool create metrics 32
ceph auth get-or-create client.gnocchi mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=metrics' -o /etc/ceph/ceph.client.gnocchi.keyring
ceph auth get-or-create client.gnocchi mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=metrics'
ceph auth get-or-create client.gnocchi -o /etc/ceph/ceph.client.gnocchi.keyring
From the same ceph mon host, sync ceph client credentials to the ansible controller, to be used by kolla-ansible:
cd /etc/ceph
rsync ceph.client.glance.keyring ceph.client.cinder.keyring ceph.client.cinder-backup.keyring \
  ceph.client.nova.keyring ceph.client.gnocchi.keyring root@$ANSIBLE_CONTROLLER:/etc/ceph
Kolla-ansible's globals.yml needs a correction for the nova client:
ceph_nova_keyring: "ceph.client.nova.keyring"
On the ansible controller, place the credentials for service deployment:
cd /etc/kolla/config
sudo mkdir -p cinder/cinder-backup/ cinder/cinder-volume/ glance/ nova/ gnocchi/
sudo cp /etc/ceph/ceph.client.glance.keyring /etc/kolla/config/glance/
sudo cp /etc/ceph/ceph.client.cinder-backup.keyring /etc/kolla/config/cinder/cinder-backup/
sudo cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-backup/
sudo cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-volume/
sudo cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/nova/
sudo cp /etc/ceph/ceph.client.nova.keyring /etc/kolla/config/nova/
sudo cp /etc/ceph/ceph.client.gnocchi.keyring /etc/kolla/config/gnocchi/
On the ansible controller, place the following files to configure services via kolla-ansible deploy:
/etc/kolla/config/cinder/cinder-backup.conf
[DEFAULT]
backup_ceph_conf=/etc/ceph/ceph.conf
backup_ceph_user=cinder-backup
backup_ceph_chunk_size = 134217728
backup_ceph_pool=backups
backup_driver = cinder.backup.drivers.ceph.CephBackupDriver
backup_ceph_stripe_unit = 0
backup_ceph_stripe_count = 0
restore_discard_excess_bytes = true
/etc/kolla/config/cinder/cinder-volume.conf
[DEFAULT]
enabled_backends=rbd-1

[rbd-1]
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=cinder
backend_host=rbd:volumes
rbd_pool=volumes
volume_backend_name=rbd-1
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_secret_uuid = {{ cinder_rbd_secret_uuid }}
rados_connect_timeout = -1
/etc/kolla/config/glance/glance-api.conf
[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

# Enable copy-on-write
show_image_direct_url = True

# Disable cache management
[paste_deploy]
flavor = keystone
/etc/kolla/config/nova/nova-compute.conf
[libvirt]
images_rbd_pool=vms
images_type=rbd
images_rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=nova
/etc/kolla/config/gnocchi/gnocchi.conf
[storage]
driver = ceph
ceph_username = gnocchi
ceph_keyring = /etc/ceph/ceph.client.gnocchi.keyring
ceph_conffile = /etc/ceph/ceph.conf
Stage the generic config to all client services:
sudo cp /etc/ceph/ceph.conf /etc/kolla/config/cinder/
sudo cp /etc/ceph/ceph.conf /etc/kolla/config/glance/
sudo cp /etc/ceph/ceph.conf /etc/kolla/config/nova/
sudo cp /etc/ceph/ceph.conf /etc/kolla/config/gnocchi/
And add a little more configuration for nova:
echo "
[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true
admin_socket = /var/run/ceph/guests/\$cluster-\$type.\$id.\$pid.\$cctid.asok
log_file = /var/log/qemu/qemu-guest-\$pid.log
rbd_concurrent_management_ops = 20" | sudo tee -a /etc/kolla/config/nova/ceph.conf
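With all of that staged under /etc/kolla/config, kolla-ansible merges it into the service configs on the next run. A minimal sketch, assuming an inventory file named multinode and the kolla venv already active:

kolla-ansible -i ~/multinode deploy        # fresh deployment
kolla-ansible -i ~/multinode reconfigure   # or push config changes into an existing cloud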