Installing Rook Ceph on Kubernetes
- At least 1 master node and 2 worker nodes;
- The 2 worker nodes must each have a second, empty disk (no filesystem on it);
Per the recommendations, the kernel must be 4.17 or newer, so Ubuntu 20.04 is suggested! Ideally the cluster has 1 master and 3 workers; otherwise Rook (Ceph) warns because there are fewer than 3 nodes.
Following the reference documentation, the cluster here consists of 3 nodes in total: 1 master and 2 workers.
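A quick way to check these prerequisites on each worker node (a sketch; /dev/sdb is only an example name for the second disk):
$ uname -r    # kernel should be 4.17 or newer
$ lsblk -f    # the second disk (e.g. /dev/sdb) must show an empty FSTYPE column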
Download the latest release branch from the source repository, for example release-1.3. After extracting it, copy the three files common.yaml, operator.yaml, and cluster.yaml from the cluster/examples/kubernetes/ceph directory of the source. In cluster.yaml, change the mon count to 2.
The number of mons should match the number of worker nodes; by default the master node is not used.
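A sketch of fetching the manifests and making that change (the clone location is arbitrary; the path below is the Rook 1.3 source layout mentioned above):
$ git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
$ cd rook/cluster/examples/kubernetes/ceph
$ cp common.yaml operator.yaml cluster.yaml ~/    # the three manifests to apply
Then edit cluster.yaml and set spec.mon.count to 2 (one mon per worker node).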
Wipe the disks with the cleanup script shown later on this page to finish the preparation!
Copy the three files common.yaml, operator.yaml, and cluster.yaml to the master node, then run:
$ kubectl create -f common.yaml
$ kubectl create -f operator.yaml
$ kubectl -n rook-ceph get pod
Confirm that rook-ceph-operator is in the Running state, then run:
$ kubectl create -f cluster.yaml
Wait a while, then verify that the deployment succeeded:
u@nm:~$ kubectl -n rook-ceph get pod
NAME                                            READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-9pw95                          3/3     Running     0          3m20s
csi-cephfsplugin-bp49p                          3/3     Running     0          3m20s
csi-cephfsplugin-provisioner-75f4cb8c76-2jxlm   5/5     Running     0          3m19s
csi-cephfsplugin-provisioner-75f4cb8c76-mq7t6   5/5     Running     0          3m19s
csi-rbdplugin-5xqj7                             3/3     Running     0          3m21s
csi-rbdplugin-p78mn                             3/3     Running     0          3m21s
csi-rbdplugin-provisioner-6cfb8565c4-4w9zj      6/6     Running     0          3m20s
csi-rbdplugin-provisioner-6cfb8565c4-llzl9      6/6     Running     0          3m20s
rook-ceph-crashcollector-n1-776c848c9b-z6fp8    1/1     Running     0          3m1s
rook-ceph-crashcollector-n2-db4bf74cf-m7m77     1/1     Running     0          2m20s
rook-ceph-mgr-a-5bd5b87b8b-dwmdb                1/1     Running     0          2m36s
rook-ceph-mon-a-5b4c8d684-sn74v                 1/1     Running     0          3m1s
rook-ceph-mon-b-5fb78ff889-q4k5l                1/1     Running     0          2m49s
rook-ceph-operator-5698b8bd78-tbqlp             1/1     Running     0          3m43s
rook-ceph-osd-0-67886b5585-k5q62                1/1     Running     0          2m21s
rook-ceph-osd-1-687b4bcf65-b4zvm                1/1     Running     0          2m20s
rook-ceph-osd-prepare-n1-z72xm                  0/1     Completed   0          2m34s
rook-ceph-osd-prepare-n2-h8fkw                  0/1     Completed   0          2m33s
rook-discover-75dcs                             1/1     Running     0          3m42s
rook-discover-nl8qf                             1/1     Running     0          3m42s
Note: the two images this depends on total roughly 1.6 GB, so the wait can be long. Make sure the rook/ceph and ceph/ceph images have finished pulling on the worker nodes!
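One way to check pull progress on a worker node (a sketch assuming a Docker runtime; with containerd, sudo crictl images serves the same purpose):
$ sudo docker images | grep -E 'rook/ceph|ceph/ceph'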
The output above must contain at least rook-ceph-mgr and rook-ceph-osd-0 for the installation to count as successful! If the OSD pods are missing, check the /var/lib/rook/rook-ceph/log/ceph-volume.log log on the worker nodes to find out why.
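For example, on a worker node (the line count is arbitrary):
$ sudo tail -n 100 /var/lib/rook/rook-ceph/log/ceph-volume.log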
If the installation failed, tear it down before retrying. Delete the cluster and the operator first:
$ kubectl delete -f cluster.yaml
$ kubectl delete -f operator.yaml
Keep running $ kubectl -n rook-ceph get pod until it reports No resources found, then run:
$ kubectl delete -f common.yaml
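While waiting for the pods in the previous step to terminate, a simple polling loop helps (sketch):
$ watch -n 5 kubectl -n rook-ceph get pod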
On each worker node, save the following script as a .sh file:
#!/usr/bin/env bash
DISK="/dev/sda"
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
sgdisk --zap-all $DISK
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)
rm -rf /dev/ceph-*
# remove old data
rm -rf /var/lib/rook/*
Run the script above with sudo to wipe the disk.
Note: change the DISK variable to your actual disk!
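For example, assuming the script was saved as clean-disk.sh (the file name is arbitrary):
$ sudo bash clean-disk.sh    # clean-disk.sh is just an example name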
After a successful installation, check the CephCluster status on the master node (HEALTH_WARN is expected here, given the fewer-than-3-nodes warning mentioned earlier):
$ kubectl get -n rook-ceph cephcluster rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE    PHASE   MESSAGE                        HEALTH
rook-ceph   /var/lib/rook     2          118m   Ready   Cluster created successfully   HEALTH_WARN
After completing the basic installation by following the official documentation, this error appeared when using the storage: subvolume group 'csi' does not exist.
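The ceph commands below are normally run inside the Rook toolbox pod; a sketch of opening a shell there, assuming the standard toolbox.yaml from the same examples directory (sf below is this cluster's CephFS volume name as reported by ceph fs volume ls; substitute your own):
$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash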
# ceph fs volume ls
# ceph fs subvolumegroup create sf csi
# exit
Installing again after this, the problem was resolved.
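To confirm the group now exists (again inside the toolbox; sf is the filesystem name from above):
# ceph fs subvolumegroup ls sf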