Adding a new cloud provider - sul-dlss/preservation_catalog GitHub Wiki

Overview

Your mileage may vary, but this document describes the steps we took to set up a new archive endpoint in the IBM cloud, and it should get you started adding a new cloud provider to preservation_catalog.

That said, you will need to make changes to three repositories for the following reasons:

  • shared_configs to set up a new pool of workers and the new endpoint
  • puppet to set up credentials and a service for the worker pool
  • preservation_catalog to set up the application's interface to the cloud, additional tooling for deployment and CI, and tests

To dig a little deeper ...

Shared Configs

We use resque-pool to manage workers, and you will want a pool of workers to deliver DruidVersionZips to your new endpoint. You therefore need to specify the name of your pool and the number of workers it contains in a configuration file for resque-pool to consume. You will also need to add configuration for the new zip endpoint. For a general example, see the sample config file in preservation_catalog. For the IBM endpoint, see the appropriate branch of shared_configs, particularly the config/resque-pool-south.yml file and the zip_endpoints block in config/settings/production.yml. Note for future reference that the name of the pool must match the name you pass to queue_as in the ActiveJob class, as seen here, so that the application hands jobs off to the right workers.
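The naming requirement above can be checked mechanically. The following is a minimal sketch, not code from the application: the queue name ibm_us_south_delivery, the worker count, and the IbmSouthDeliveryJob stand-in class are all hypothetical, and the YAML assumes the usual resque-pool layout in which each key is a comma-separated list of queue names and each value is a worker count.

```ruby
require 'yaml'

# Hypothetical contents of a resque-pool config file: queue name => worker count.
POOL_CONFIG = <<~YAML
  ibm_us_south_delivery: 3
YAML

# Plain-Ruby stand-in for an ActiveJob class (hypothetical). In the real
# application this would be a job class that declares
# `queue_as :ibm_us_south_delivery`.
class IbmSouthDeliveryJob
  def self.queue_name
    'ibm_us_south_delivery'
  end
end

# Returns every queue name the pool config mentions. A single key may list
# several queues separated by commas.
def pool_queues(yaml_text)
  YAML.safe_load(yaml_text).keys.flat_map { |key| key.to_s.split(',') }.map(&:strip)
end

# True when some worker in the pool listens on the job's queue.
def workers_listen_for?(job_class, yaml_text)
  pool_queues(yaml_text).include?(job_class.queue_name)
end
```

A check along these lines could run in a spec, so that a renamed queue fails a test rather than leaving jobs sitting on a queue no worker listens to.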

Puppet

The workers live on a set of virtual machines that use puppet to manage credentials for cloud providers and a service for starting and stopping worker pools. See here for an example that sets up the puppet role necessary for doing those things in our infrastructure.

Preservation Catalog

Finally, see here for an example of a pull request that sets up the application code, tests, and configuration for the IBM cloud. To walk through the changes:

  (1) In .travis.yml we define encrypted environment variables in a matrix, with separate environment variables defining AWS credentials and a custom environment variable for each cloud provider. The custom environment variable is used in spec/spec_helper.rb to tell CI which endpoint to run a set of "live" tests against.
  (2) The heart of the changes is a new PreservationCatalog::Ibm module, which differs from PreservationCatalog::S3 in that the Aws::S3::Resource requires an explicitly specified endpoint. We therefore expect that each additional cloud provider will need its own PreservationCatalog::<YourCloudProviderHere> module, with an associated audit class and specs.
  (3) The ZipEndpoint class will also need a new delivery_class entry.
  (4) Capistrano will need a number of changes: to link to the additional shared_configs file for the new pool of workers, to set up the bucket name, and to stop and start the pools on deployment.
  (5) Once you add a new zip endpoint to the test settings, you will likely uncover test failures that point to a coupling between the specs and the number of zip endpoints. You will need to modify existing tests to account for the additional zip endpoint.
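To illustrate the difference between the two provider modules, here is a minimal sketch of the options each would hand to Aws::S3::Resource. The ProviderSketch module, its method names, the region, the endpoint URL, and the credentials are all hypothetical placeholders, not the application's actual code or settings. The sketch only builds option hashes, so it runs without the aws-sdk-s3 gem.

```ruby
# Sketch only: module and method names are illustrative, not the
# application's actual code.
module ProviderSketch
  # Options common to any S3-compatible provider.
  def self.base_options(region:, access_key_id:, secret_access_key:)
    { region: region, access_key_id: access_key_id, secret_access_key: secret_access_key }
  end

  # AWS: no endpoint is given, so the SDK derives it from the region.
  def self.aws_options(**creds)
    base_options(**creds)
  end

  # IBM: the same options plus an explicit endpoint.
  def self.ibm_options(endpoint:, **creds)
    base_options(**creds).merge(endpoint: endpoint)
  end
end

# Placeholder values; real credentials would come from settings or the environment.
creds = { region: 'us-south', access_key_id: 'placeholder-key', secret_access_key: 'placeholder-secret' }
ibm = ProviderSketch.ibm_options(endpoint: 'https://ibm-endpoint.example.com', **creds)

# With the aws-sdk-s3 gem loaded, the hash would be passed on as:
#   Aws::S3::Resource.new(**ibm)
```

Keeping the endpoint as the only provider-specific option suggests that a further S3-compatible provider could follow the same pattern, though providers with a different API would need more than this.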