Set up RDS - VertebrateResequencing/vr-pipe GitHub Wiki

VRPipe (currently) requires a MySQL database to function. If you already have such a database and are able to connect to it from an EC2 instance, or if you know how to administer a MySQL database and can set one up on the instance that will run VRPipe, it will be cheapest to do so.

However, if you want your database administration (including backups) handled for you, and want the convenience of terminating all your EC2 instances when work completes and launching new ones when there is more to do, without fear of losing your database, Amazon provides a managed Relational Database Service (RDS): a persistent MySQL server in the cloud. There is a continuous monthly charge for storage, and an hourly charge while your RDS instance is running.

It is possible to use all-default settings when setting up RDS, and this will be fine for small VRPipe deployments (tens of instances). For large clusters of thousands of CPU cores, however, it may be beneficial to optimize settings, as described in the optional parameter group step below.

Starting from your console home, click the RDS icon. (Then make sure that the correct region is selected in the top right menu; it should be the same region as your EC2 instances.)
(optional) For large deployments, create a new parameter group with the recommended parameters listed in the VRPipe README file.
Go to the "Parameter Groups" pane and click the "Create DB Parameter Group" button:

Choose the mysql5.5 family, enter a name and description and click "Yes, Create":

Select the group you just created, then click "Edit Parameters":

Go through the list of parameters and set the following values before clicking "Save Changes":
binlog_cache_size = 33554432
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_size = {DBInstanceClassMemory*3/4}
innodb_change_buffering = all
innodb_commit_concurrency = 50
innodb_concurrency_tickets = 5000
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 0
innodb_log_buffer_size = 33554432
max_connections = {DBInstanceClassMemory/12582880}
tx_isolation = REPEATABLE-READ
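
Two of the values above are formulas rather than fixed numbers. As a rough illustration of what they resolve to, here is a minimal Python sketch. It assumes `DBInstanceClassMemory` can be treated as a plain byte count and that the result is rounded down; the memory RDS actually substitutes is somewhat less than the instance's nominal RAM, so real values will be a little lower. The function name `evaluate_formulas` is made up for this example.

```python
# Sketch only: evaluates the two formula-based parameters for a given
# amount of memory. ASSUMPTION: DBInstanceClassMemory is a byte count and
# the formulas round down; RDS may evaluate them slightly differently.

def evaluate_formulas(db_instance_class_memory: int) -> dict:
    """Return the values the two formula parameters would resolve to."""
    return {
        # {DBInstanceClassMemory*3/4}
        "innodb_buffer_pool_size": db_instance_class_memory * 3 // 4,
        # {DBInstanceClassMemory/12582880}
        "max_connections": db_instance_class_memory // 12582880,
    }


if __name__ == "__main__":
    # Hypothetical memory sizes, chosen only to show the scale of the results.
    for gib in (1, 15):
        memory_bytes = gib * 1024 ** 3
        print(f"{gib} GiB -> {evaluate_formulas(memory_bytes)}")
```

With 1 GiB of memory this gives a buffer pool of 805306368 bytes and 85 connections; with 15 GiB it gives 1280 connections.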
Go to the "Instances" pane and click the "Launch DB Instance" button.
Select MySQL.
Select a version of MySQL 5.5. Choose an instance class based on how many connections it can handle (see here for a guide; it is based on the amount of memory the instance has available, accounting for the OS etc.) versus how many submissions you will be running at once in VRPipe (roughly the number of cores in your maximum number of EC2 instances divided by 3). For example, if you will have tens of instances, you may be able to use the cheapest t1.micro instance class. For thousands of cores, you should use an xlarge. Choose 'No' for Multi-AZ Deployment and 'No' for "Auto Minor Version Upgrade". Depending on how much work you do with VRPipe, you could end up needing tens of GB of storage for the database, but you can start with the minimum storage allocation of 5GB and increase it later if necessary. Finally, choose an identifier, username and password, then click Continue.
Fill in the name for your VRPipe production database, make sure it won't be publicly accessible, set the availability zone to match where your EC2 instances are (or will be placed), choose the parameter group you made in the optional parameter group step (or the default group if you skipped it), select the security group your EC2 instances will use, and then click Continue.
Take advantage of automatic backups (which will typically be free).
Click the "Launch DB Instance" button.
Wait for the instance to become available.
Select the instance and choose "See Details" from the "Instance Actions" menu.
From the details page, note down the Endpoint, which is the "host" and "port" you will need when configuring VRPipe or connecting to the database. Also note the username and password you chose when launching the instance. Finally note the Availability Zone: use the same one when launching EC2 instances.
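
Two pieces of the steps above can be sketched in a few lines of Python: the rule of thumb for choosing an instance class (connections the instance can handle versus cores / ~3), and splitting the Endpoint into its host and port. This is only an illustration under stated assumptions: the connection capacity is estimated with the same `DBInstanceClassMemory/12582880` formula used for `max_connections`, the memory figures and endpoint string are made-up values, and the function names are hypothetical.

```python
import math

# Divisor from the max_connections formula {DBInstanceClassMemory/12582880}.
BYTES_PER_CONNECTION = 12582880


def estimated_submissions(total_cores: int, cores_per_submission: float = 3.0) -> int:
    """Rule of thumb from the text: simultaneous submissions ~= cores / 3."""
    return math.ceil(total_cores / cores_per_submission)


def instance_can_cope(memory_bytes: int, total_cores: int) -> bool:
    """True if the estimated connection capacity covers the expected submissions.

    ASSUMPTION: memory_bytes stands in for DBInstanceClassMemory, which in
    practice is somewhat less than the instance's nominal RAM.
    """
    return memory_bytes // BYTES_PER_CONNECTION >= estimated_submissions(total_cores)


def split_endpoint(endpoint: str) -> tuple:
    """Split an Endpoint of the form 'host:port' into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


if __name__ == "__main__":
    small_memory = 600 * 1024 ** 2  # hypothetical small instance
    print(instance_can_cope(small_memory, total_cores=40))
    print(instance_can_cope(small_memory, total_cores=4000))
    # Hypothetical endpoint string, not a real host.
    print(split_endpoint("example-db.example.invalid:3306"))
```

The check is deliberately coarse: it only tells you whether an instance class is plausibly large enough, not which class is the most economical.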