storage pools - dianaclarke/openstack-notes GitHub Wiki

What's a storage pool?

A storage pool is a quantity of storage set aside by an administrator, often a dedicated storage administrator, for use by virtual machines. Storage pools are divided into storage volumes either by the storage administrator or the system administrator, and the volumes are assigned to VMs as block devices. --https://libvirt.org/storage.html
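A toy in-memory model can make the relationship between pools, volumes and VMs concrete. This is not libvirt's API (libvirt exposes the real objects through its virStoragePool and virStorageVol interfaces). The StoragePool and Volume classes and their methods below are hypothetical and only sketch the bookkeeping described in the quote.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Volume:
    """A slice of a pool, handed to a VM as a block device (hypothetical)."""
    name: str
    capacity_bytes: int
    attached_to: Optional[str] = None  # name of the VM using it, if any


@dataclass
class StoragePool:
    """A quantity of storage set aside by an administrator (hypothetical)."""
    name: str
    capacity_bytes: int
    volumes: Dict[str, Volume] = field(default_factory=dict)

    def allocated_bytes(self) -> int:
        return sum(v.capacity_bytes for v in self.volumes.values())

    def create_volume(self, name: str, capacity_bytes: int) -> Volume:
        # Divide the pool into volumes, refusing to overcommit it.
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if self.allocated_bytes() + capacity_bytes > self.capacity_bytes:
            raise ValueError("not enough free space in pool")
        vol = Volume(name, capacity_bytes)
        self.volumes[name] = vol
        return vol

    def assign(self, volume_name: str, vm_name: str) -> Volume:
        # Hand a volume to exactly one VM, as a block device would be.
        vol = self.volumes[volume_name]
        if vol.attached_to is not None:
            raise ValueError(f"volume {volume_name!r} already in use")
        vol.attached_to = vm_name
        return vol


GiB = 1024 ** 3
pool = StoragePool("default", 100 * GiB)
pool.create_volume("vm1-root", 20 * GiB)
pool.assign("vm1-root", "vm1")
```

Keeping allocation and assignment as separate steps mirrors the quote: an administrator carves the pool into volumes, and only then are volumes given to VMs.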
Review storage pool specs & related patches:
The plan - three stages (from Live Migration: Austin summit update):
- Refactor the instance storage code
- Adapt to use storage pools for the instance storage
- Use storage pools to drive resize/migration
Current code:
- Image.cache(): https://github.com/openstack/nova/blob/8185dcb57e55f7579b60040649fcd0588177d714/nova/virt/libvirt/imagebackend.py#L217
  - Split the cache() method into two separate methods:
    - create_from_image(image_id)
    - create_from_func(format, size)
- BlockDeviceMapping: https://github.com/openstack/nova/blob/8185dcb57e55f7579b60040649fcd0588177d714/nova/objects/block_device.py#L43
  - Add a driver_info field to the BlockDeviceMapping object
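A minimal sketch of what the split might look like, assuming a Python abstract base class. DiskBackend, RecordingBackend and create_disks are hypothetical names; only create_from_image(image_id) and create_from_func(format, size) come from the notes above. The notes don't say how the template function itself is supplied, so this sketch leaves it out, and "ext4" for the ephemeral disk is just an example value.

```python
import abc


class DiskBackend(abc.ABC):
    """Hypothetical backend interface with cache() split in two."""

    @abc.abstractmethod
    def create_from_image(self, image_id):
        """Create a disk from the glance image identified by image_id."""

    @abc.abstractmethod
    def create_from_func(self, format, size):
        """Create a generated disk (ephemeral, swap) of known format and size."""


class RecordingBackend(DiskBackend):
    """Fake backend that only records what it was asked to create."""

    def __init__(self):
        self.calls = []

    def create_from_image(self, image_id):
        # A real backend would fetch image_id from glance here.
        self.calls.append(("image", image_id))

    def create_from_func(self, format, size):
        # A real backend would create and format a blank disk here.
        self.calls.append(("blank", format, size))


def create_disks(backend, root_image_id, ephemeral_gb, swap_mb):
    # The caller knows which kind of disk it wants, so the backend
    # no longer has to infer it from loosely typed arguments.
    backend.create_from_image(root_image_id)
    if ephemeral_gb:
        backend.create_from_func("ext4", ephemeral_gb * 1024 ** 3)
    if swap_mb:
        backend.create_from_func("swap", swap_mb * 1024 ** 2)


backend = RecordingBackend()
create_disks(backend, "img-1", 10, 512)
```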
"Image" vs "Disk":

An 'image' in this code refers either to the thing which glance stores, or an instance's disk. This is confusing in any case, and especially confusing when the word has both meanings in the same block of code, for example in create_image (which downloads a glance image and creates an instance disk, which it also calls an image). Image in the latter context has come about because that's what libvirt calls it. However, as it's overloaded in nova we should never have used it. At some point I'd like to rename these all to 'disk'. --https://review.openstack.org/#/c/270998/

Read this commit to understand the motivation:

The libvirt driver was calling images.convert_image during snapshot to convert snapshots to the intended output format. However, this function does not take the input format as an argument, meaning it implicitly does format detection. This opened an exploit for setups using raw storage on the backend, including raw on filesystem, LVM, and RBD (Ceph). An authenticated user could write a qcow2 header to their instance's disk which specified an arbitrary backing file on the host. When convert_image ran during snapshot, this would then write the contents of the backing file to glance, which is then available to the user. If the setup uses an LVM backend this conversion runs as root, meaning the user can exfiltrate any file on the host, including raw disks. --Fix format conversion in libvirt snapshot
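To see why implicit format detection is dangerous, consider how a probe decides that a file is qcow2: by its first four bytes, the magic b"QFI\xfb". On a raw backend the guest controls those bytes. The sketch below uses a hypothetical probe_format() helper (far simpler than qemu's real probing) and a hypothetical convert_cmd() helper that builds a qemu-img convert command line. Passing the known source format with -f, which is what the commit message implies the fix requires, means qemu-img never probes the contents.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"  # first 4 bytes of a qcow/qcow2 file


def probe_format(header: bytes) -> str:
    """Naive content-based detection, standing in for implicit probing."""
    if header[:4] == QCOW2_MAGIC:
        return "qcow2"
    return "raw"


def convert_cmd(src, dst, out_format, in_format=None):
    """Build a qemu-img convert argv; in_format pins the input format."""
    cmd = ["qemu-img", "convert"]
    if in_format is not None:
        # With -f, qemu-img trusts the given format instead of probing.
        cmd += ["-f", in_format]
    cmd += ["-O", out_format, src, dst]
    return cmd


# A guest controls every byte of its raw disk, including the first four.
# Magic, then a big-endian version number, then padding to 512 bytes.
guest_written = QCOW2_MAGIC + struct.pack(">I", 3) + b"\x00" * 504
print(probe_format(guest_written))  # qcow2 (wrong: the disk is raw)

safe = convert_cmd("/dev/vg/disk", "snap.qcow2", "qcow2", in_format="raw")
```

The point of the design is that the host already knows the backend's real format, so nothing derived from guest-writable data should be allowed to override it.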

Links
Understanding the goal

The cache() interface does not provide the backend with any metadata about the disk image it is being given to import. Consequently, the backend must either infer that metadata heuristically or inspect the disk itself. Both methods are prone to error and potential security bugs. The replacement for cache() must allow the backend to determine in advance the format and size of the disk it is importing.

The imagebackend code uses a single method, cache(), to create both disks from glance images and disks from templates (i.e. blank filesystems or swap disks). These are then handled differently by different backends. Writing to the image cache is done by the individual backends, which use the image cache differently due to their different natures. To do this, backends must differentiate between glance images and templates, but the interface does not permit them to do this directly. The Raw backend greps 'image_id' from the argument passed to its template function. The LVM backend uses 'ephemeral_size'. The Ploop backend uses 'context' and 'image_id', and independently fetches glance metadata. The cache() interface needs to be changed to reflect its usage.
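The heuristics described above can be sketched as two hypothetical predicates, loosely modeled on the Raw and LVM behavior in the paragraph (the exact checks in nova may differ). Fed the same swap request, they reach opposite conclusions, which is the ambiguity the rewrite aims to remove. DiskSource is an illustrative, hypothetical stand-in for "metadata known in advance", not a nova object.

```python
from dataclasses import dataclass
from typing import Optional


# Today: each backend guesses the disk kind from whatever keyword
# arguments the caller happened to pass through cache().
def raw_thinks_blank(kwargs):
    # Raw-style check: no image_id means "generated, not from glance".
    return "image_id" not in kwargs


def lvm_thinks_blank(kwargs):
    # LVM-style check: an ephemeral_size keyword means "generated".
    return "ephemeral_size" in kwargs


# Direction of the rewrite: the caller states kind, format and size up front.
@dataclass(frozen=True)
class DiskSource:
    kind: str                       # "image" or "blank" (hypothetical values)
    format: str                     # e.g. "raw", "qcow2", "swap", "ext4"
    size: int                       # bytes
    image_id: Optional[str] = None  # set only when kind == "image"


swap_kwargs = {"swap_mb": 512}
print(raw_thinks_blank(swap_kwargs), lvm_thinks_blank(swap_kwargs))  # True False

swap = DiskSource("blank", "swap", 512 * 1024 ** 2)
```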
- Image.cache()
  - def cache(self, fetch_func, filename, size=None, *args, **kwargs):
- Image.cache() calls self.create_image()
- Each backend implements create_image()
- The libvirt driver (driver.py) is the only consumer of Image.cache()
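The call flow above is a template-method pattern: cache() does the shared image-cache bookkeeping and delegates the backend-specific work to create_image(). Below is a much-simplified, hypothetical re-creation built only on the cache() signature quoted above. The real method also serializes template fetches with a lock and performs more checks, and the toy Raw subclass here only stands in for nova's real backend.

```python
import os
import shutil
import tempfile


class Image:
    """Much-simplified stand-in for nova's imagebackend.Image (hypothetical)."""

    def __init__(self, path, cache_dir):
        self.path = path            # where this instance disk lives
        self.cache_dir = cache_dir  # shared directory of cached base images

    def exists(self):
        return os.path.exists(self.path)

    def cache(self, fetch_func, filename, size=None, *args, **kwargs):
        os.makedirs(self.cache_dir, exist_ok=True)
        base = os.path.join(self.cache_dir, filename)

        def fetch_once(*a, target, **kw):
            # Populate the cache entry only if it is not there yet.
            # (The real code holds a lock around this; omitted here.)
            if not os.path.exists(target):
                fetch_func(*a, target=target, **kw)

        if not self.exists() or not os.path.exists(base):
            # Template method: the subclass decides how to build the disk.
            self.create_image(fetch_once, base, size, *args, **kwargs)

    def create_image(self, prepare_template, base, size, *args, **kwargs):
        raise NotImplementedError


class Raw(Image):
    """Toy file-backed backend: copy the cached base, then grow it."""

    def create_image(self, prepare_template, base, size, *args, **kwargs):
        prepare_template(*args, target=base, **kwargs)  # fill the cache
        if not self.exists():
            shutil.copyfile(base, self.path)
        if size is not None and os.path.getsize(self.path) < size:
            with open(self.path, "r+b") as f:
                f.truncate(size)  # extend; zero-filled on common platforms


calls = []


def write_template(target, content=b"hello"):
    calls.append(target)
    with open(target, "wb") as f:
        f.write(content)


tmp = tempfile.mkdtemp()
first = Raw(os.path.join(tmp, "disk"), os.path.join(tmp, "_base"))
first.cache(write_template, "base-hello", size=1024)
second = Raw(os.path.join(tmp, "disk2"), os.path.join(tmp, "_base"))
second.cache(write_template, "base-hello", size=1024)
```

The second instance reuses the cached base, so the template function runs once. Note that nothing in this flow tells create_image() whether fetch_func downloads a glance image or generates a blank disk, which is the gap described above.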
Root (and ephemeral) disks

Each instance needs at least one root disk (that contains the bootloader and core operating system files), and may have optional ephemeral disk (per the definition of the flavor selected at instance creation time). The content for the root disk either comes from an image stored within the Glance repository (and copied to storage attached to the destination hypervisor) or from a persistent block storage volume (via Cinder). For more information on the root disk strategies available during instance creation, refer to the section called “Root Disk Choices When Booting Nova Instances”. -- http://netapp.github.io/openstack-deploy-ops-guide/icehouse/content/section_nova-key-concepts.html
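For the common image-backed case, the local disks created on the hypervisor follow from the flavor. The sketch below assumes a simplified Flavor with root_gb, ephemeral_gb and swap_mb fields (nova's flavor calls the last one swap, in MB) and a hypothetical local_disks() helper. It ignores details such as root_gb == 0, which I believe nova treats as "size the root disk to the image".

```python
from dataclasses import dataclass
from typing import List, Tuple

GiB, MiB = 1024 ** 3, 1024 ** 2


@dataclass
class Flavor:
    """Simplified flavor (hypothetical field names for this sketch)."""
    root_gb: int
    ephemeral_gb: int
    swap_mb: int


def local_disks(flavor: Flavor, image_id: str,
                boot_from_volume: bool) -> List[Tuple[str, str, int]]:
    """Return (role, source, size_bytes) for each disk made on the host."""
    disks = []
    if not boot_from_volume:
        # Root content is copied from the glance image to local storage.
        disks.append(("root", f"glance:{image_id}", flavor.root_gb * GiB))
    # With boot-from-volume the root lives in Cinder and is not listed here.
    if flavor.ephemeral_gb:
        disks.append(("ephemeral", "blank", flavor.ephemeral_gb * GiB))
    if flavor.swap_mb:
        disks.append(("swap", "blank", flavor.swap_mb * MiB))
    return disks


m1 = Flavor(root_gb=20, ephemeral_gb=10, swap_mb=512)
image_backed = local_disks(m1, "abc", boot_from_volume=False)
volume_backed = local_disks(m1, "abc", boot_from_volume=True)
```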

Missing local root disk

Reader beware: unlike BDMs, block_device_info does not represent all disks that an instance might have. Significantly, it will not contain any representation of an image-backed local disk, i.e. the root disk of a typical instance which isn't boot-from-volume. Other representations used by the libvirt driver explicitly reconstruct this missing disk. I assume other drivers must do the same. -- http://lists.openstack.org/pipermail/openstack-dev/2016-June/097529.html
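One way a driver might reconstruct the missing disk, as a hypothetical sketch: if no volume in block_device_info is mounted at the root device name, assume the root is a local image-backed disk. The dict keys (root_device_name, block_device_mapping, ephemerals, swap, mount_device, device_name) follow my understanding of nova's block_device_info layout and should be treated as an assumption, as should the all_disks() helper itself.

```python
def all_disks(block_device_info):
    """List every disk, adding back the image-backed root if it is missing."""
    root = block_device_info.get("root_device_name")
    volumes = block_device_info.get("block_device_mapping") or []
    disks = [("volume", v["mount_device"]) for v in volumes]
    disks += [("ephemeral", e["device_name"])
              for e in block_device_info.get("ephemerals") or []]
    swap = block_device_info.get("swap")
    if swap and swap.get("device_name"):
        disks.append(("swap", swap["device_name"]))
    # No volume at the root device means the root is a local image-backed
    # disk, which block_device_info does not represent at all.
    if root and not any(v["mount_device"] == root for v in volumes):
        disks.insert(0, ("local-image-root", root))
    return disks


image_backed = {
    "root_device_name": "/dev/vda",
    "ephemerals": [{"device_name": "/dev/vdb", "size": 10}],
    "block_device_mapping": [{"mount_device": "/dev/vdc"}],
    "swap": None,
}
boot_from_volume = {
    "root_device_name": "/dev/vda",
    "ephemerals": [],
    "block_device_mapping": [{"mount_device": "/dev/vda"}],
    "swap": None,
}
```

Real code would also need to cope with device names stored with or without the /dev/ prefix; this sketch compares them as plain strings.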