Adding a new cloud provider - sul-dlss/preservation_catalog GitHub Wiki
Overview
Your mileage may vary, but this record of the steps we took to set up a new archive endpoint in the IBM cloud should get you started on adding a new cloud provider to the preservation_catalog.
You will need to make changes to three repositories, for the following reasons:
- shared_configs to set up a new pool of workers and the new endpoint
- puppet to set up credentials and a service for the worker pool
- preservation_catalog to set up the application's interface to the cloud, additional tooling for deployment and CI, and tests
To dig a little deeper ...
Shared Configs
Since we use resque-pool to manage workers, and you want a pool of workers to deliver DruidVersionZips to your new endpoint, you will need to specify the name of your pool and the number of workers it contains in a configuration file for resque-pool to consume. You will also need to add configuration for the new zip endpoint. For a general example, see the sample config file in the preservation_catalog. In the case of adding the IBM endpoint, see the appropriate branch of shared_configs, particularly the config/resque-pool-south.yml file and the zip_endpoints block in config/settings/production.yml. Note for future reference that the name of the pool must match the queue name passed to queue_as in the ActiveJob class, as seen here, so that the application hands jobs off to the right workers.
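To make the naming constraint concrete, here is a minimal Ruby sketch that checks a resque-pool style config against a job's queue name. The queue name ibm_us_south_delivery, the worker count, and the helper pool_queues are all hypothetical and chosen for illustration; the real values live in shared_configs and in the job class. The sketch also assumes a flat config with no per-environment nesting.

```ruby
require 'yaml'

# Hypothetical resque-pool config: each key is a comma-separated list of
# queue names, each value is the number of workers serving that list.
# Assumption: a flat file with no per-environment sections.
POOL_CONFIG = <<~CONFIG
  ibm_us_south_delivery: 3
CONFIG

# Hypothetical queue name; in the application this would be the value the
# delivery job class passes to queue_as.
JOB_QUEUE = 'ibm_us_south_delivery'

# Returns every queue name that at least one pool entry listens on.
def pool_queues(yaml_text)
  YAML.safe_load(yaml_text).keys.flat_map do |key|
    key.to_s.split(',').map(&:strip)
  end
end

# Fail loudly if no worker would ever pick up jobs from the job's queue.
unless pool_queues(POOL_CONFIG).include?(JOB_QUEUE)
  raise "no worker pool listens on #{JOB_QUEUE}"
end

puts "#{JOB_QUEUE} is served by the pool"
```

A check like this could run in a spec so that a renamed queue or pool is caught before deployment rather than by jobs that silently never run.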
Puppet
The workers live on a set of virtual machines that use puppet to manage credentials for cloud providers and a service for starting and stopping worker pools. See here for an example that sets up the puppet role necessary for doing those things in our infrastructure.
Preservation Catalog
Finally, see here for an example of a pull request that sets up the application code, tests, and configuration for the IBM cloud. To walk through the changes:
(1) In .travis.yml we define encrypted environment variables in a matrix, with separate environment variables for the AWS credentials and a custom environment variable for each cloud provider. The custom environment variable is used in spec/spec_helper.rb to tell CI which endpoint to run a set of "live" tests against.
(2) The heart of the change is a new PreservationCatalog::Ibm module, which differs from PreservationCatalog::S3 in that its Aws::S3::Resource requires an explicit endpoint. We therefore expect that each additional cloud provider will need a new PreservationCatalog::<YourCloudProviderHere> module, with an associated audit class and specs.
(3) The ZipEndpoint class will also need a new delivery_class entry.
(4) capistrano will need a number of changes: to link to the additional shared_configs file for the new pool of workers, to set up the bucket name, and to stop and start the pools on deployment.
(5) Once you add a new zip endpoint to the test settings, you will likely uncover test failures that point to a coupling between the specs and the number of zip endpoints. You will need to modify those tests to account for the new zip endpoint.