Masakari API Design - ntt-sic/masakari GitHub Wiki

(This is a work in progress and a proposal for the next release.)

API Use case

Failover Segment

A system can be zoned top-down into Regions, Availability Zones and Host Aggregates (or Cells). Within those zones, one or more pacemaker/pacemaker-remote clusters may exist. In addition to those boundaries, the shared storage boundary is also important when deciding the optimal host for failover. OpenStack zone boundaries (such as Regions, Availability Zones and Host Aggregates) can be managed by the nova scheduler. However, shared storage boundaries are difficult to manage. Moreover, the operator may want to use other types of boundary, such as rack layout and power supply. Therefore, the operator may want to define segments of hypervisor hosts and assign one or more failover hosts to each of them. Those segments can be defined based on the shared storage boundaries or on any other limitation that may be critical for selecting the failover host.

Interfaces:

  • (mandatory) CRUD of failover segments

Approach:

  • The operator can define failover segments and the member hosts of each segment.

  • The operator can define the failover policy for each segment, such as using a particular failover host, using the nova scheduler, or a combination of both methods.
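
The per-segment failover policy described above can be sketched as an ordered list of host selectors. This is a minimal illustration, not the Masakari implementation: the mode names are taken from the recovery_method values proposed in the API Design section, while `select_failover_host` and the two selector callables are hypothetical names introduced here.

```python
from typing import Callable, Optional

# Mode names follow the recovery_method values proposed in the API design.
AUTO = "Auto"
RH = "RH"
AUTO_PRIORITIZE = "AutoPrioritize"
RH_PRIORITIZE = "RHPrioritize"

# A selector returns a destination host name, or None if it cannot find one.
# Both selectors are hypothetical stand-ins for the real selection logic.
Selector = Callable[[], Optional[str]]


def select_failover_host(
    method: str, by_scheduler: Selector, by_reserved_host: Selector
) -> Optional[str]:
    """Try the selectors in the order implied by the segment's recovery method."""
    order = {
        AUTO: [by_scheduler],
        RH: [by_reserved_host],
        AUTO_PRIORITIZE: [by_scheduler, by_reserved_host],
        RH_PRIORITIZE: [by_reserved_host, by_scheduler],
    }
    if method not in order:
        raise ValueError(f"unknown recovery_method: {method}")
    for selector in order[method]:
        host = selector()
        if host is not None:
            return host
    # Every selector allowed by the policy failed to find a host.
    return None
```

Keeping the policy as data (a mapping from mode to selector order) means a new mode only needs a new entry in the mapping rather than new branching logic.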

Capacity Reservation

A service provider who guarantees the uptime of VM instances to its customers needs to make sure that a certain amount of host capacity is reserved in preparation for a failure event. If the host capacity of the system is full when a host failure happens, the VMs on the failed host cannot be evacuated to another host. The system capacity is typically fragmented into segments due to the scalability of the underlying components, and each segment has a limited capacity. To increase resource efficiency, high utilization of host capacity is preferred. However, because users consume resources on demand, the host capacity of each segment tends to fill up unless the system provides operators with a way to reserve a portion of the host capacity. Therefore, a function to reserve host capacity for failover events is important to ensure the high availability of VM instances.

Interfaces:

  • (mandatory) CRUD of reserved hosts

Approach:

  • The operator is assumed to manage capacity by changing the nova-compute service status of each host. A host registered as a reserved host is excluded from the nova scheduler by setting its nova-compute status to service-disabled.

  • In case of a failure event, the function changes the nova-compute status to service-enabled to make the host available, and specifies the host as the destination of the VM instance evacuation operation.
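
The reserve-then-release flow above could look roughly like the following sketch. It deliberately avoids any real OpenStack client: `FakeCompute`, `set_service_enabled` and `evacuate` are hypothetical stand-ins that only record what a real compute API call would do, and the list of reserved hosts stands in for the DB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeCompute:
    """Hypothetical stand-in for the compute API; records calls only."""

    enabled: Dict[str, bool] = field(default_factory=dict)
    evacuations: List[Tuple[str, str]] = field(default_factory=list)

    def set_service_enabled(self, host: str, enabled: bool) -> None:
        self.enabled[host] = enabled

    def evacuate(self, instance_id: str, target_host: str) -> None:
        self.evacuations.append((instance_id, target_host))


def register_reserved_host(compute: FakeCompute, host: str) -> None:
    """Hide the host from the scheduler by disabling its compute service."""
    compute.set_service_enabled(host, False)


def fail_over_to_reserved_host(
    compute: FakeCompute, reserved_hosts: List[str], instances: List[str]
) -> Optional[str]:
    """Enable one reserved host and evacuate the given instances to it."""
    if not reserved_hosts:
        return None  # no reserved capacity left in this segment
    target = reserved_hosts.pop(0)  # the host is no longer reserved
    compute.set_service_enabled(target, True)
    for instance_id in instances:
        compute.evacuate(instance_id, target)
    return target
```

The sketch takes the first reserved host without checking whether it can hold all the instances; a real implementation would need some capacity check, which this proposal does not specify.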

Host Maintenance

A host sometimes has to be temporarily and safely removed from the system for maintenance, for example because of a hardware failure or a firmware update. During the maintenance, the monitoring function on the host should be disabled, and any monitoring alert from the host should be ignored so that it does not trigger a recovery action for VM instances still running on the host. The host should be excluded from the reserved hosts as well.

Interfaces:

  • (mandatory) Change (enable/disable) the maintenance mode of a host.

Approach:

  • When the maintenance mode is changed, the function checks whether there are any on-going recovery tasks on the host. It changes the status of the host only when no task conflicts with the change. The status of a host is managed in the DB.

  • The function calls a pacemaker command or interface, such as “crm node standby” and “crm node online”, to change the state of the underlying monitoring and fencing component.
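
A minimal sketch of the two steps above, under stated assumptions: the dictionary `host_status` stands in for the DB, `hosts_in_recovery` stands in for the lookup of on-going recovery tasks, and the function names are hypothetical. Only the `crm node standby` and `crm node online` commands come from the text. The command runner is injectable so that the logic can be exercised without a pacemaker cluster.

```python
import subprocess
from typing import Callable, Dict, List, Set


def crm_command(host: str, on_maintenance: bool) -> List[str]:
    """Build the crm shell command for the requested maintenance mode."""
    action = "standby" if on_maintenance else "online"
    return ["crm", "node", action, host]


def set_maintenance(
    host: str,
    on_maintenance: bool,
    host_status: Dict[str, bool],  # stand-in for the host status kept in the DB
    hosts_in_recovery: Set[str],   # hosts that have on-going recovery tasks
    run: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> bool:
    """Change the maintenance mode unless a recovery task conflicts with it."""
    if host in hosts_in_recovery:
        return False  # refuse the change while recovery is in progress
    # Update the cluster first; record the new status only if that succeeds.
    run(crm_command(host, on_maintenance))
    host_status[host] = on_maintenance
    return True
```

Calling `run` before updating `host_status` means a failed crm command (which raises with the default runner) leaves the recorded status unchanged.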

Event History

Knowledge of past events, such as process failures, VM failures and host failures, is useful for determining the maintenance the hosts require. In addition, easy tracking of past events saves time in system failure diagnosis. These APIs could be used to automatically generate health reports or SLA reports for the highly available VMs.

Interfaces:

  • Get the list of past failure events and recovery actions per host/VM by host name or VM uuid.

  • Get the details of a single event by event ID.

Approach:

  • All failure events and recovery action details are stored in the DB. The API queries them and composes the response to the above API requests.
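
The query side of this approach can be sketched with an in-memory SQLite database. The `events` table and its columns are assumptions made for illustration only; they are not the schema from the DB Design section.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    host_name TEXT NOT NULL,
    vm_uuid TEXT,              -- NULL for host-level failures
    event_type TEXT NOT NULL,  -- e.g. process, VM or host failure
    recovery_action TEXT
)
"""

COLUMNS = "id, host_name, vm_uuid, event_type, recovery_action"


def list_events(conn, host_name=None, vm_uuid=None):
    """List past events, optionally filtered by host name and/or VM uuid."""
    sql = f"SELECT {COLUMNS} FROM events"
    clauses, params = [], []
    if host_name is not None:
        clauses.append("host_name = ?")
        params.append(host_name)
    if vm_uuid is not None:
        clauses.append("vm_uuid = ?")
        params.append(vm_uuid)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()


def get_event(conn, event_id):
    """Return a single event row by ID, or None if it does not exist."""
    sql = f"SELECT {COLUMNS} FROM events WHERE id = ?"
    return conn.execute(sql, (event_id,)).fetchone()
```

Filter values are always passed as bound parameters rather than interpolated into the SQL string, which matters because they arrive from API requests.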

API Design

Failover Segments

| Method | URI | Description | Parameter | Style | Type | Parameter description |
|---|---|---|---|---|---|---|
| POST | /segments | Create a segment to define a failover boundary | name | plain | xsd:string | Name of the segment |
| | | | description (optional) | plain | xsd:string | Description of the segment |
| | | | recovery_method | plain | xsd:string | Set the failover method. Available modes are: 1. Auto (use the nova scheduler to select the host), 2. RH (use ReservedHost), 3. AutoPrioritize (first use Auto; if it fails, use RH), 4. RHPrioritize (first use RH; if it fails, use Auto) |
| GET | /segments | List segment ids | | | | |
| GET | /segments/{id} | Show details for a segment | id | plain | xsd:string | id of the segment |
| PUT | /segments/{id} | Update the segment | name | plain | xsd:string | Name of the segment |
| | | | description (optional) | plain | xsd:string | Description of the segment |
| | | | recovery_method | plain | xsd:string | Set the failover method. Available modes are the same as for POST /segments |
| DELETE | /segments/{id} | Delete a segment | id | plain | xsd:string | id of the segment |

Hosts

| Method | URI | Description | Parameter | Style | Type | Parameter description |
|---|---|---|---|---|---|---|
| POST | /hosts | Add a host | name | plain | xsd:string | Name of the host |
| | | | type | plain | xsd:string | Cluster type |
| | | | controll_attributes | plain | xsd:dict | Attributes to control the cluster status of the host. Depends on the cluster type. |
| GET | /hosts | List host ids | | | | |
| GET | /hosts/{id} | Show details for a host | id | plain | xsd:string | id of the host |
| PUT | /hosts/{id} | Update a host | reserved | plain | xsd:boolean | Whether the host is a reserved host or not |
| | | | on_maintenance | plain | xsd:boolean | Set the host to maintenance mode |
| | | | failover_segment_id | plain | xsd:string | id of the failover segment |
| DELETE | /hosts/{id} | Delete a host | id | plain | xsd:string | id of the host |

Event History

| Method | URI | Description | Parameter | Style | Type | Parameter description |
|---|---|---|---|---|---|---|
| GET | /events | List notifications | q | query | xsd:list | Filters the response by one or more arguments, for example ?q.field=Foo&q.value=my_text. The filter could be the uuid of a VM, a host name, a time duration, etc. |
| GET | /events/{notification_id} | Show details for a notification | notification_id | plain | xsd:string | ID of the notification |
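
As an illustration of how a client might use these endpoints, the sketch below builds a POST /segments request and a filtered GET /events URL. The proposal does not state the wire format, so the flat JSON body is an assumption, and `create_segment_request` and `list_events_url` are hypothetical helper names. The paths, parameter names, mode names and the q.field/q.value filter style come from the API design above.

```python
import json
from urllib.parse import urlencode

# Recovery modes listed for the recovery_method parameter.
RECOVERY_METHODS = ("Auto", "RH", "AutoPrioritize", "RHPrioritize")


def create_segment_request(name, recovery_method, description=None):
    """Return (method, path, body) for creating a failover segment.

    The flat JSON body is an assumption; the proposal only names the
    parameters, not how they are serialized.
    """
    if recovery_method not in RECOVERY_METHODS:
        raise ValueError(f"unknown recovery_method: {recovery_method}")
    body = {"name": name, "recovery_method": recovery_method}
    if description is not None:  # description is optional
        body["description"] = description
    return "POST", "/segments", json.dumps(body, sort_keys=True)


def list_events_url(field, value):
    """Build a GET /events URL using the q.field / q.value filter style."""
    return "/events?" + urlencode({"q.field": field, "q.value": value})
```

For example, `list_events_url("host_name", "compute-1")` yields `/events?q.field=host_name&q.value=compute-1`, where `host_name` is a hypothetical filter field.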

DB Design

[Figure: masakari DB design]
