IT: HOWTO: Install Ceph - feralcoder/shared GitHub Wiki

Up Links

Public

feralcoder public Home
feralcoder IT
Living Room Data Center

FeralStack

My Private Cloud
Kolla-Ansible OpenStack Deployment

HOWTOS

HOWTO: Install Kolla-Ansible
HOWTO: Setup Docker Registries For OpenStack
HOWTO: Kolla-Ansible Container Management
HOWTO: Setup Octavia LBAAS

Discussion

Preface

My Ceph installation is useful by itself, but its higher purpose is to serve as the storage layer for OpenStack. As such, this HOWTO is written to be used in conjunction with the Kolla-Ansible OpenStack Deployment HOWTO. Some requirements for Ceph aren't repeated here because they're already outlined there; I'll try to note them and point back where relevant.

Environment

Docker Registry

The OpenStack HOWTO outlines some issues which necessitate having local Docker registries, and they apply to the Ceph install as well.
HOWTO: Setup Docker Registries For OpenStack

One difference with ceph-ansible alleviates some of those problems, though: its containers are version-tagged at points in time, so if you're only deploying Ceph, a pull-thru container cache may be sufficient.
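For reference, the simplest pull-thru cache I know of is a stock registry:2 container running in proxy mode, with each node's Docker daemon pointed at it. A minimal sketch - the mirror host and port here are placeholders, and the daemon.json line assumes you don't already have a file to merge into:

# On a host reachable by every node: a pull-thru proxy for docker.io
docker run -d --restart=always --name registry-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io registry:2

# On each node: point Docker at the mirror, then restart the daemon
echo '{ "registry-mirrors": ["http://mirror-host:5000"] }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker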

Docker vs Podman

CentOS 8 doesn't support Docker CE, and the ceph-ansible maintainers take this to heart: their playbooks categorically assume podman on CentOS 8 systems. Some modification of the upstream scripts is required to use Docker, and since those scripts are updated frequently, be ready to manage and merge your changes...

At some point in the future I will revisit this and switch over to podman.

OpenStack Overlap

Requirements already handled in the OpenStack HOWTO include, at least: users, python, ansible, venvs, firewall, sshpass, podman/buildah removal, network bond setup...

I'm using the stack user for both kolla-ansible and ceph-ansible deployments.

Setup

Get Ceph

VERSION=4.0 # Nautilus

mkdir -p ~/CODE/ceph && cd ~/CODE/ceph
git clone https://github.com/ceph/ceph-ansible.git
cd ceph-ansible
git checkout stable-$VERSION
pip install -U pip
pip install -r requirements.txt

Set up Virtual Environment

I recommend doing your work in a virtual environment dedicated to ceph-ansible (the pip installs above should also be run from inside it, so they land in the venv rather than in the system Python).

First, set it up:

mkdir -p ~/CODE/venvs/ceph-ansible
python3 -m venv ~/CODE/venvs/ceph-ansible
source ~/CODE/venvs/ceph-ansible/bin/activate

From here, all ceph-ansible commands should be run after sourcing the ceph-ansible venv:

source ~/CODE/venvs/ceph-ansible/bin/activate

Inventory

I'm deploying 3 monitor/manager nodes and 5 OSD nodes. I'm also deploying the RADOS Gateway, the Metadata Server (for CephFS), and Grafana with Prometheus.

My inventory file (~/CODE/ceph/ceph-ansible/hosts) looks like:

[mons]
strange-api        ansible_user=stack ansible_become=true
merlin-api         ansible_user=stack ansible_become=true
gandalf-api        ansible_user=stack ansible_become=true

[osds]
kerrigan-api        ansible_user=stack ansible_become=true
neo-api             ansible_user=stack ansible_become=true
bowman-api          ansible_user=stack ansible_become=true
lawnmowerman-api    ansible_user=stack ansible_become=true
manhattan-api       ansible_user=stack ansible_become=true

[grafana-server]
strange-api        ansible_user=stack ansible_become=true

[mgrs]
strange-api        ansible_user=stack ansible_become=true
merlin-api         ansible_user=stack ansible_become=true
gandalf-api        ansible_user=stack ansible_become=true

[rgws]
strange-api        ansible_user=stack ansible_become=true
merlin-api         ansible_user=stack ansible_become=true
gandalf-api        ansible_user=stack ansible_become=true

[mdss]
strange-api        ansible_user=stack ansible_become=true
merlin-api         ansible_user=stack ansible_become=true
gandalf-api        ansible_user=stack ansible_become=true

[clients]
kerrigan-api        ansible_user=stack ansible_become=true
neo-api             ansible_user=stack ansible_become=true
bowman-api          ansible_user=stack ansible_become=true
lawnmowerman-api    ansible_user=stack ansible_become=true
manhattan-api       ansible_user=stack ansible_become=true
strange-api        ansible_user=stack ansible_become=true
merlin-api         ansible_user=stack ansible_become=true
gandalf-api        ansible_user=stack ansible_become=true
dumbledore         ansible_user=stack ansible_become=true

Global Configuration

Edit and place $CEPH_CHECKOUT_DIR/group_vars/all.yml.

For me, this configuration includes:

  • Ceph is pinned to a point-in-time version tag, so I can decide when I'm ready to deal with pulling and testing updates.
  • I've created named bonds for reliable interface identification across servers (see the Kolla-Ansible OpenStack HOWTO)
  • public_network matches the radosgw/monitor interface
  • cluster_network matches the cluster_interface
  • configure desired dashboard parameters
  • ceph_docker_image and ceph_docker_registry can be changed to support a localized and possibly customized container image
  • Ceph can make some optimizations if you intend to run other workloads, e.g. OpenStack Compute, on the same hosts (HCI, hyper-converged infrastructure)
  • cephx requires considerable additional configuration in OpenStack, and some corrections too
% diff ceph-all.yml ceph-all-sample.yml

< ceph_docker_image_tag: master-86da1a4-nautilus-centos-7
<
< radosgw_interface: bond4
< monitor_interface: bond4
< cluster_interface: bond5
<
< public_network: 172.18.0.0/24
< cluster_network: 172.19.0.0/24
<
< dashboard_enabled: True
< dashboard_protocol: https
< dashboard_frontend_vip: '172.18.0.10'
< dashboard_port: 8443
< dashboard_admin_password: s3cr3t
< grafana_admin_password: s3cr3t
< dashboard_crt: ''
< dashboard_key: ''
<
< ceph_docker_image: "ceph/daemon"
< containerized_deployment: true
< ceph_docker_registry: docker.io
<
< is_hci: true
<
< cephx: true

> # Dummy variable to avoid error because ansible does not recognize the
> # file as a good configuration file when no variable in it.
> dummy:
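If you're starting from a fresh checkout, the shipped samples can be copied into place first and then edited per the diff above (file names as found in my stable-4.0 checkout):

cd $CEPH_CHECKOUT_DIR
cp group_vars/all.yml.sample group_vars/all.yml
cp group_vars/osds.yml.sample group_vars/osds.yml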

Force Docker Over Podman

For compatibility with my other automation, I've decided to use Docker. The ceph-ansible team has decided they won't support Docker on CentOS 8.

I have to override the 'podman' logic in $CEPH_CHECKOUT_DIR/site-docker.yml:

#container_binary: "{{ 'podman' if (podman_binary.stat.exists and ansible_facts['distribution'] == 'Fedora') or (ansible_facts['os_family'] == 'RedHat' and ansible_facts['distribution_major_version'] == '8') else 'docker' }}"
container_binary: "{{ 'docker' }}"

And also in $CEPH_CHECKOUT_DIR/roles/ceph-facts/tasks/container_binary.yml:

#container_binary: "{{ 'podman' if (podman_binary.stat.exists and ansible_facts['distribution'] == 'Fedora') or (ansible_facts['os_family'] == 'RedHat' and ansible_facts['distribution_major_version'] == '8') else 'docker' }}"
container_binary: 'docker'

And in $CEPH_CHECKOUT_DIR/roles/ceph-container-engine/tasks/pre_requisites/prerequisites.yml, I comment out everything except:

- name: include specific variables
  include_vars: "{{ item }}"
  with_first_found:
    - "{{ ansible_facts['distribution'] }}-{{ ansible_facts['distribution_major_version'] }}.yml"
    - "{{ ansible_facts['os_family'] }}.yml"

- name: debian based systems tasks
  include_tasks: debian_prerequisites.yml
  when:
    - ansible_facts['os_family'] == 'Debian'
  tags: with_pkg
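Because these upstream files change frequently and I'm carrying local edits, a syntax check after every merge is cheap insurance:

source ~/CODE/venvs/ceph-ansible/bin/activate
ansible-playbook --syntax-check -i $CEPH_CHECKOUT_DIR/hosts $CEPH_CHECKOUT_DIR/site-docker.yml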

OSDs

Right now I've got a single OSD in each OSD server. Each OSD consists of a spinning drive for data and an SSD for the DB and WAL. It's possible to specify a separate WAL device per OSD, but the WAL defaults to the DB device if none is given, and both DB and WAL land on the data drive if no DB device is given. The sizing recommendation I've seen is a DB around 4% of the data size - I'm doing 240GB against 4TB, about 6%.

I'm using Bluestore because it's now the only sensible choice (vs Filestore):

  • Elimination of the unnecessary filesystem intermediate layer is huge, for performance and architectural correctness.
  • Block-level checksumming provides an entirely different category of data reliability.
  • Bluestore allows erasure coded pools to serve CephFS and RBD.
    • Maintaining rollback information for data overwrites is too expensive through the old OSD-on-Filesystem model.
    • Bluestore provides additional primitives to make this feasible.

I'm using LVM OSDs for one reason: predictable device identification via LVM names. The alternative is to specify raw devices, but even my ProLiant servers don't give me 100% reliable device naming, and udev still doesn't support renaming block devices.

$CEPH_CHECKOUT_DIR/group_vars/osds.yml:

osd_scenario: lvm
osd_objectstore: bluestore
dmcrypt: false
#devices: []
lvm_volumes:
  - data: OSD_1_data
    data_vg: OSD_1_data
    db: OSD_1_db
    db_vg: OSD_1_db

In addition to using LVM for reliable OSD identification by ceph-ansible, I'm also putting the LVM PV into a GPT partition rather than on the raw disk. That way my OSD management scripts can easily identify OSD devices by partition name, and I can rebuild those devices without ever risking losing the identifying handle.

To prepare these devices, I run on each OSD server:

# ASSUMING BLANK DRIVES
DATA_VG=OSD_1_data   DATA_DEV=/dev/sdX
DB_VG=OSD_1_db       DB_DEV=/dev/sdY

# Partition, then build PV/VG/LV named to match lvm_volumes in osds.yml
parted --script $DATA_DEV mklabel gpt mkpart primary 1MiB 100% name 1 $DATA_VG set 1 lvm on
pvcreate -y ${DATA_DEV}1; vgcreate -y $DATA_VG ${DATA_DEV}1; lvcreate -y -n $DATA_VG -l 100%FREE $DATA_VG

parted --script $DB_DEV mklabel gpt mkpart primary 1MiB 100% name 1 $DB_VG set 1 lvm on
pvcreate -y ${DB_DEV}1; vgcreate -y $DB_VG ${DB_DEV}1; lvcreate -y -n $DB_VG -l 100%FREE $DB_VG
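Before handing these to ceph-ansible, I sanity-check that the partition, PV, VG, and LV all look the way osds.yml expects:

parted --script $DATA_DEV print
parted --script $DB_DEV print
pvs; vgs
lvs -o lv_name,vg_name,lv_size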

Reusing drives can be problematic, because the OS may already be interacting with them - for example, existing LVM state will already be loaded and activated even if you haven't touched the drives. It's advisable to wipe them before the step above; dd followed by a reboot will do the trick. Or, if you're reinstalling a system with existing OSDs, do this:

for VG in $DATA_VG $DB_VG; do
  lvremove -y $VG
  vgremove -y $VG
done
for DEV in $DATA_DEV $DB_DEV; do
  pvremove -y ${DEV}1
  dd bs=1M count=1024 conv=sync if=/dev/zero of=${DEV}1
done
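Another option for stubborn leftovers (stale partition tables, old LVM or bluestore signatures) is wipefs, which clears the signatures without a full dd. A sketch - double-check the device names before running it:

for DEV in $DATA_DEV $DB_DEV; do
  wipefs --all ${DEV}1 2>/dev/null   # the partition may not exist; ignore errors
  wipefs --all $DEV
done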

Deploy

Finally, we can wrap this up! Nope, stick around, buddy.

ansible-playbook $CEPH_CHECKOUT_DIR/site-docker.yml -i $INVENTORY -e container_package_name=docker-ce
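Once the play finishes, I confirm the cluster actually converged before moving on to the OpenStack side. On my deployment the mon container follows ceph-ansible's usual ceph-mon-<hostname> naming; adjust if yours differs:

ssh stack@strange-api 'sudo docker ps --filter name=ceph'
ssh stack@strange-api 'sudo docker exec ceph-mon-$(hostname -s) ceph -s'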

Ceph / OpenStack Integration

Ceph and Kolla-Ansible require substantial additional configuration to work together.
(More client auth details can be found here: https://docs.ceph.com/en/latest/rbd/rbd-openstack/)

My configuration enables glance, cinder, nova, and gnocchi to use ceph-backed storage. This gives me all the storage functionality I need for images, volumes, provisioning instances, and stack telemetry.

I tried enabling manila, encountered problems, and will revisit as time allows. It would be very nice to leverage my ceph cluster to provision fileservers. With block-level checksumming and fragment-parity data resilience, we finally have a viable free multi-node alternative to single-node ZFS.

Kolla-Ansible - Ceph Configuration

NOTE: It can be helpful to disable cephx while debugging integration issues, and re-enable it after everything's working...

Enable desired systems in kolla-ansible's globals.yml:

enable_cinder: "yes"
enable_gnocchi: "yes"
enable_grafana: "yes"

gnocchi_backend_storage: "rbd"
glance_backend_ceph: "yes"
cinder_backend_ceph: "yes"
nova_backend_ceph: "yes"

external_ceph_cephx_enabled: "yes"

On the ansible controller, set cephx to required in kolla-ansible's ceph.conf for all containers:

echo "auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx" | sudo tee -a /etc/ceph/ceph.conf

Kolla-Ansible's External Ceph

On the ansible controller, configure ceph mons into kolla-ansible's ceph.conf for all containers:

# Specify ceph_mon hosts
MON_HOSTS="strange-api,merlin-api,gandalf-api"
echo "mon initial members = $MON_HOSTS" | sudo tee -a /etc/ceph/ceph.conf

Pool Setup

From within a ceph_mon container, adjust the placement group (PG) counts on the default pools:

# Default Pools Have Too Few Placement Groups
ceph osd pool set cephfs_data pg_num 16
ceph osd pool set cephfs_metadata pg_num 16
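The new values can be confirmed in place:

ceph osd pool get cephfs_data pg_num
ceph osd pool get cephfs_metadata pg_num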

From a ceph_mon container, set up pools and client auth for the OpenStack services we'll be using.

# CLIENT GLANCE
ceph osd pool create images 32
rbd pool init images
ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
ceph auth get-or-create client.glance -o /etc/ceph/ceph.client.glance.keyring

# CLIENT CINDER
ceph osd pool create volumes 32
rbd pool init volumes
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
ceph auth get-or-create client.cinder -o /etc/ceph/ceph.client.cinder.keyring

# CLIENT CINDER-BACKUP
ceph osd pool create backups 32
rbd pool init backups
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'
ceph auth get-or-create client.cinder-backup -o /etc/ceph/ceph.client.cinder-backup.keyring

# CLIENT NOVA
ceph osd pool create vms 32
rbd pool init vms
ceph auth get-or-create client.nova mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
ceph auth get-or-create client.nova -o /etc/ceph/ceph.client.nova.keyring

# CLIENT GNOCCHI
ceph osd pool create metrics 32
ceph auth get-or-create client.gnocchi mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=metrics'
ceph auth get-or-create client.gnocchi -o /etc/ceph/ceph.client.gnocchi.keyring
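Before leaving the mon container, a quick look confirms the pools and client keys all exist as intended:

ceph osd lspools
ceph auth ls | grep '^client'
ls -l /etc/ceph/ceph.client.*.keyring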

Ceph Client Authentication

From the same ceph mon host, sync ceph client credentials to the ansible controller, to be used by kolla-ansible:

cd /etc/ceph
rsync ceph.client.glance.keyring ceph.client.cinder.keyring ceph.client.cinder-backup.keyring \
      ceph.client.nova.keyring ceph.client.gnocchi.keyring \
      root@$ANSIBLE_CONTROLLER:/etc/ceph/

Kolla-ansible's globals.yml needs an override so nova uses its own client keyring:

ceph_nova_keyring: "ceph.client.nova.keyring"

On the ansible controller place the credentials for service deployment:

cd /etc/kolla/config
sudo mkdir -p cinder/cinder-backup/ cinder/cinder-volume/ glance/ nova/ gnocchi/
sudo cp /etc/ceph/ceph.client.glance.keyring /etc/kolla/config/glance/
sudo cp /etc/ceph/ceph.client.cinder-backup.keyring /etc/kolla/config/cinder/cinder-backup/
sudo cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-backup/
sudo cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-volume/
sudo cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/nova/
sudo cp /etc/ceph/ceph.client.nova.keyring /etc/kolla/config/nova/
sudo cp /etc/ceph/ceph.client.gnocchi.keyring /etc/kolla/config/gnocchi/

Configure Ceph Client Services

On the ansible controller, place the following files to configure services via kolla-ansible deploy:

/etc/kolla/config/cinder/cinder-backup.conf

[DEFAULT]
backup_ceph_conf=/etc/ceph/ceph.conf
backup_ceph_user=cinder-backup
backup_ceph_chunk_size = 134217728
backup_ceph_pool=backups
backup_driver = cinder.backup.drivers.ceph.CephBackupDriver
backup_ceph_stripe_unit = 0
backup_ceph_stripe_count = 0
restore_discard_excess_bytes = true

/etc/kolla/config/cinder/cinder-volume.conf

[DEFAULT]
enabled_backends=rbd-1

[rbd-1]
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=cinder
backend_host=rbd:volumes
rbd_pool=volumes
volume_backend_name=rbd-1
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_secret_uuid = {{ cinder_rbd_secret_uuid }}
rados_connect_timeout = -1

/etc/kolla/config/glance/glance-api.conf

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

# Enable copy-on-write
show_image_direct_url = True


# Disable cache management
[paste_deploy]
flavor = keystone

/etc/kolla/config/nova/nova-compute.conf

[libvirt]
images_rbd_pool=vms
images_type=rbd
images_rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=nova

/etc/kolla/config/gnocchi/gnocchi.conf

[storage]
driver = ceph
ceph_username = gnocchi
ceph_keyring = /etc/ceph/ceph.client.gnocchi.keyring
ceph_conffile = /etc/ceph/ceph.conf

Stage the generic config to all client services:

sudo cp /etc/ceph/ceph.conf /etc/kolla/config/cinder/
sudo cp /etc/ceph/ceph.conf /etc/kolla/config/glance/
sudo cp /etc/ceph/ceph.conf /etc/kolla/config/nova/
sudo cp /etc/ceph/ceph.conf /etc/kolla/config/gnocchi/

And add a little more configuration for nova:

echo "
[client]
  rbd_cache = true
  rbd_cache_writethrough_until_flush = true
  admin_socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
  log_file = /var/log/qemu/qemu-guest-$pid.log
  rbd_concurrent_management_ops = 20" | sudo tee -a /etc/kolla/config/nova/ceph.conf
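With everything staged under /etc/kolla/config, kolla-ansible still has to push it out; on my setup that's a deploy or reconfigure run against the affected services, roughly as follows (inventory path per the Kolla-Ansible HOWTO):

kolla-ansible -i $KOLLA_INVENTORY reconfigure --tags glance,cinder,nova,gnocchi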