OSD Journal mapping mechanism in bigfin - skyrings/bigfin GitHub Wiki

Bigfin automatically assigns journal disks during OSD creation for a Ceph cluster. Following the guidelines provided by Ceph, it determines, out of the disks available on a host, which should be used as journal disks and which as data disks.

The guidelines for using journals are:

  • Prefer SSDs as journal disks
  • Each SSD can serve as journal for at most 4 OSDs
  • A rotational disk can serve as journal for at most 1 OSD
  • If only rotational disks are available, the smaller disks are preferred as journals (as long as enough space is available, based on the journal size selected for the cluster)
  • The default journal size for Ceph clusters is 5GB
  • The journal disk for an OSD must reside on the same host as the OSD

Algorithm

The algorithm that maps journals to OSDs works as follows.

Case-1: If all the disks on a host are rotational

  • Sort the disks in descending order of size
  • If the number of disks is odd, leave out the last entry in the sorted list, as journal mapping cannot be done for all the disks
  • Walk the sorted list from the head, mapping journals from the tail
  • For each OSD selected from the head of the list, select one journal disk from the tail of the list
  • Keep moving until the middle of the list is reached; by then, every OSD to be created has a journal disk assigned
  • With an odd number of disks, the last disk is left out; it can be utilized as a journal when new disks are added to the host at a later stage

![Journal mapping for case-1](images/journal_mapping1.png)
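The case-1 pairing can be sketched as below. This is a minimal illustration, not bigfin's actual implementation; the `name`/`size` disk fields are assumptions for the example.

```python
def map_rotational_journals(disks):
    """Pair size-sorted rotational disks: the largest become OSD data
    disks, the smallest become their journals (1 OSD per journal)."""
    disks = sorted(disks, key=lambda d: d["size"], reverse=True)
    mapping = []                      # (data disk, journal disk) pairs
    head, tail = 0, len(disks) - 1
    while head < tail:
        mapping.append((disks[head]["name"], disks[tail]["name"]))
        head += 1
        tail -= 1
    # With an odd number of disks, the middle entry is left out
    leftover = [disks[head]["name"]] if head == tail else []
    return mapping, leftover
```

With 7 disks this yields 3 (data, journal) pairs plus 1 leftover disk, which matches use case 1 below.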

Case-2: If all the disks on a host are SSDs

  • Sort the disks in descending order of size
  • Walk the sorted list from the head, mapping journals from the tail of the list
  • Map a maximum of 4 disks from the head of the list to the same journal disk at the tail (provided the SSD has enough space left, based on the journal size)
  • If not enough space is left on the current journal SSD, move to the previous disk from the tail, even if fewer than 4 OSDs have been mapped to this SSD
  • Continue until no more disks are left at the head to be mapped to journals

![Journal mapping for case-2](images/journal_mapping2.png)
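The case-2 walk can be sketched as follows. Again a minimal illustration under assumed disk fields, with the 5GB default journal size and the 4-OSDs-per-SSD limit from the guidelines above.

```python
JOURNAL_SIZE = 5        # GB, the default journal size noted above
MAX_OSDS_PER_SSD = 4

def map_ssd_journals(ssds, journal_size=JOURNAL_SIZE):
    """Map SSD data disks from the head of a size-sorted list onto SSD
    journal disks taken from the tail, at most 4 OSDs per journal SSD."""
    ssds = sorted(ssds, key=lambda d: d["size"], reverse=True)
    mapping = []
    head, tail = 0, len(ssds) - 1
    used = 0                             # OSDs on the current journal SSD
    free = ssds[tail]["size"] if ssds else 0
    while head < tail:
        if used == MAX_OSDS_PER_SSD or free < journal_size:
            tail -= 1                    # current journal SSD is exhausted
            used, free = 0, ssds[tail]["size"]
            continue
        mapping.append((ssds[head]["name"], ssds[tail]["name"]))
        head += 1
        used += 1
        free -= journal_size
    return mapping
```

For 10 equally sized SSDs this produces 8 OSDs on 2 journal SSDs, as in use case 3 below.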

Case-3: If some of the disks are rotational and some are SSDs

  • Segregate the rotational and SSD disks into two separate lists
  • Loop through the list of rotational disks, mapping them to SSDs from the second list
  • Map a maximum of 4 rotational disks to a single SSD used as a journal (as long as enough space is available on the SSD)
  • If no more space is available on the SSD, even before the maximum of 4 is reached, move to the next SSD for mapping journals
  • Once either the rotational or the SSD list is exhausted, one of the two scenarios below is possible

Case-3A: Some SSDs are still left

In this scenario all the rotational disks have been exhausted and only SSDs remain.

  • Sort the remaining SSDs in descending order of size
  • Follow the case-2 logic to map journals within the SSDs
  • DONE!

Case-3B: Some rotational disks are still left

In this scenario all the SSDs have been exhausted and only rotational disks remain.

  • Sort the remaining rotational disks in descending order of size
  • Follow the case-1 logic to map journals within the rotational disks
  • DONE!

Case-4: Expand the cluster by adding new disks to an existing host

  • Check whether some SSD that is already used as a journal for other OSDs still has enough space available for another journal
  • If so, map this SSD as the journal for the newly added disks
  • Keep in mind that a maximum of 4 OSDs can use this SSD as a journal, and only as long as space is available based on the journal size for the cluster
  • If disks are still left after utilizing the existing SSDs as journals, perform the mapping among the remaining disks based on their type
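The capacity check for an already-used journal SSD boils down to two bounds, the 4-OSD limit and the remaining free space. A small sketch with an assumed SSD record (`osd_count`, `free` in GB):

```python
JOURNAL_SIZE = 5   # GB
MAX_OSDS_PER_SSD = 4

def reusable_journal_slots(ssd):
    """How many more OSDs an already-used journal SSD can still serve,
    bounded by both the 4-OSD limit and the remaining free space."""
    by_count = MAX_OSDS_PER_SSD - ssd["osd_count"]
    by_space = ssd["free"] // JOURNAL_SIZE
    return max(0, min(by_count, by_space))
```

An SSD already journaling 2 OSDs with 12GB free can take 2 more journals; one already at 4 OSDs takes none, however much space is left.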

Case-5: Expand the cluster by adding a new host with a set of disks

This scenario is straightforward: the new node falls under case-1, case-2 or case-3.

A few sample use cases

Use Case-1:

An OSD host with 7 rotational disks should result in 3 OSDs (one disk is left out)

1-A:

Add a disk to the above host after cluster creation; the previously left-out disk should be used as its journal, and 1 more OSD should be created

Use Case-2:

An OSD host with 6 rotational disks should result in 3 OSDs

Use Case-3:

An OSD host with 10 SSD disks should result in 8 OSDs in total

Use Case-4:

An OSD host with 8 rotational + 1 SSD disks should result in 6 OSDs in total

Use Case-5:

An OSD host with 8 rotational + 2 SSD disks should result in 8 OSDs in total

Use Case-6:

An OSD host with 6 rotational + 5 SSD disks should result in 8 OSDs in total
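Assuming every SSD has ample journal space, the expected OSD counts above follow from a simple counting model of cases 1–3. This is a sanity-check sketch, not bigfin code:

```python
from math import ceil

MAX_OSDS_PER_SSD = 4

def expected_osd_count(rot, ssd):
    """OSD count for a host with `rot` rotational and `ssd` SSD disks,
    assuming every SSD has ample space for journals."""
    journals_needed = ceil(rot / MAX_OSDS_PER_SSD)
    if journals_needed <= ssd:
        # Every rotational disk becomes an OSD; the remaining SSDs pair
        # among themselves (case-3A / case-2): each journal SSD serves up
        # to 4 data SSDs, so n leftover SSDs yield n - ceil(n/5) OSDs.
        left = ssd - journals_needed
        return rot + left - ceil(left / (MAX_OSDS_PER_SSD + 1))
    # SSDs exhausted (case-3B / case-1): each SSD journals 4 rotational
    # OSDs, and the remaining rotational disks pair up 2-by-2.
    osds = MAX_OSDS_PER_SSD * ssd
    return osds + (rot - osds) // 2
```

Plugging in the mixes above reproduces every expected count, e.g. `expected_osd_count(6, 5)` gives 8 for use case 6.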