Running Serratus - ababaian/serratus GitHub Wiki
Serratus is under active development. These steps are the most recent but are subject to change.
To run Serratus with fewer issues, use a tagged version, which we can confirm is operational.
0) Dependencies
AWS account
- Sign up for an AWS account (you can use the free tier)
- Create an IAM Admin User with Access Key. For Access type, use Programmatic access.
- Note the Access Key ID and Secret values.
- Create an EC2 keypair in the us-east-1 region. Retain the name of the keypair and the .pem file. Configure your ssh for easy AWS access (change serratus.pem to your identity file).
Add these lines to ~/.ssh/config:
Host *.compute.amazonaws.com *.compute-1.amazonaws.com aws_*
User ec2-user
IdentityFile ~/.ssh/serratus.pem
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
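If you set up several machines, the block above can be appended from a script instead of by hand. The following is a minimal sketch: add_serratus_ssh_block is a hypothetical helper, not part of the Serratus repository, and it assumes the Host line is used nowhere else in your config. It skips the append when the Host line is already present, so it is safe to run twice.

```shell
# add_serratus_ssh_block CONFIG_FILE IDENTITY_FILE
# Hypothetical helper: appends the Serratus host block to CONFIG_FILE once.
add_serratus_ssh_block() {
  config_file="$1"
  identity_file="$2"
  marker='Host *.compute.amazonaws.com *.compute-1.amazonaws.com aws_*'

  # Create the config file if needed and keep it private to the user.
  mkdir -p "$(dirname "$config_file")"
  touch "$config_file"
  chmod 600 "$config_file"

  # -F matches the marker literally, -x requires a whole-line match.
  if grep -qxF "$marker" "$config_file"; then
    return 0
  fi

  cat >> "$config_file" <<EOF
$marker
  User ec2-user
  IdentityFile $identity_file
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
EOF
}

# Example call (commented out so that sourcing this file changes nothing):
# add_serratus_ssh_block "$HOME/.ssh/config" "~/.ssh/serratus.pem"
```

ssh also refuses private keys that other users can read, so restrict the .pem file itself, for example with chmod 400 ~/.ssh/serratus.pem.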
Packer
- Download Packer as a binary. Extract it to a PATH directory (~/.local/bin).
Terraform
- Download Terraform (>= v0.12.24) as a binary. Extract it to a PATH directory (~/.local/bin).
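Before moving on, it can help to confirm that both binaries are on your PATH and that Terraform meets the minimum version. This is a sketch only: check_tool and version_ge are hypothetical helper names, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# check_tool NAME: report whether NAME resolves on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH" >&2
    return 1
  fi
}

# version_ge A B: succeed when version A >= version B.
# The smaller of the two versions sorts first, so A >= B exactly when B sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only inspect Terraform if it is installed; the first line of
# `terraform version` has the form "Terraform v0.12.24".
if command -v terraform >/dev/null 2>&1; then
  tf_version="$(terraform version | head -n 1 | sed 's/^Terraform v//')"
  version_ge "$tf_version" 0.12.24 ||
    echo "Terraform $tf_version is older than v0.12.24" >&2
fi
```

Running check_tool packer && check_tool terraform afterwards prints where each binary was found, or names the one that is missing.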
1) Build Serratus AMIs with Packer
Pass AWS credentials to the pipeline via environment variables
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
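A build started without these variables fails only after Packer has begun, so a quick pre-check can save a round trip. The sketch below uses a hypothetical helper, require_env, which relies on printenv so that a variable that is set but not exported is also reported.

```shell
# require_env NAME...: fail if any named variable is missing from the
# exported environment or is empty. Hypothetical helper, not part of Serratus.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes receive.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or unexported variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use before the build step (commented out so nothing runs on source):
# require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY && /path/to/packer build docker-ami.json
```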
Use packer to build the serratus instance image (AMI)
cd serratus/packer
/path/to/packer build docker-ami.json
cd ../..
This will start up a t3.nano, build the AMI, and then terminate the instance. Currently this takes about 2 minutes, which should cost well under a penny. The final line of STDOUT will be the region and AMI. Retain this information.
Current stable AMI: us-east-1: ami-01aa4fbd5cad1f0c1
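If you save the Packer output to a log, the region and AMI ID can be pulled from its last line. The sketch below assumes that line has the same shape as the stable AMI entry above ("region: ami-id"); parse_ami_line and packer.log are hypothetical names.

```shell
# parse_ami_line LINE: print "<region> <ami-id>" when LINE looks like
# "us-east-1: ami-0123456789abcdef0"; print nothing otherwise.
parse_ami_line() {
  printf '%s\n' "$1" |
    sed -n 's/^\([a-z0-9-]*\): *\(ami-[0-9a-f]*\) *$/\1 \2/p'
}

# Example use (commented out so nothing runs on source):
# /path/to/packer build docker-ami.json | tee packer.log
# parse_ami_line "$(tail -n 1 packer.log)"
```

If the function prints nothing, the last line did not match the assumed shape and the AMI ID should be read from the log by hand.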
2) Build Serratus resources with Terraform
Set Terraform variables
Open terraform/main/terraform.tfvars in a text editor. Set these variables:
- dev_cidrs: Your public IP, followed by "/32". Use: curl ipecho.net/plain; echo
- key_name: Your EC2 key pair name
- dockerhub_account: (optional) Change this to your Docker Hub account to build your own images. Default images are in the serratusbio organization.
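An IP lookup service can return an error page instead of an address, and pasting that into dev_cidrs would lock you out of the scheduler. A small shape check before appending "/32" guards against this. This is a sketch: to_dev_cidr is a hypothetical helper, and it checks only the dotted-quad shape, not that each octet is at most 255.

```shell
# to_dev_cidr IP: print IP with a /32 suffix if it looks like an IPv4 address.
to_dev_cidr() {
  if printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    printf '%s/32\n' "$1"
  else
    echo "not an IPv4 address: $1" >&2
    return 1
  fi
}

# Example use (commented out so no network request is made on source):
# to_dev_cidr "$(curl -s ipecho.net/plain)"
```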
Create Serratus resources
Navigate to the top-level module and run terraform initialization and apply. Retain the scheduler DNS address (last output line).
cd terraform/main
terraform init
terraform apply
cd ../..
At the time of writing, this will create:
- a t3.nano, for the scheduler, with an Elastic IP
- an S3 bucket, to store intermediates
- an ASG for serratus-dl, using c5.large with 50 GB of gp2
- an ASG for serratus-align, using c5.large
- an ASG for serratus-merge, using t3.small
- security groups and IAM roles to tie it all together
All ASGs have a max size of 1. This can all be reconfigured in terraform/main/main.tf.
At the end of terraform apply, Terraform outputs the scheduler's DNS address. Keep this for later.
3) Open SSH tunnel to the scheduler
The scheduler exposes ports 3000, 8000, and 9090. These ports are not exposed to the public internet, so you will need to create an SSH tunnel to allow your local web browser and terminal to connect.
./create_tunnel.sh
Open a web browser for the UI:
- Status Page: http://localhost:8000/jobs/
- Grafana: http://localhost:3000
- Prometheus: http://localhost:9090
May take a few minutes to boot. Make tea.
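For orientation, a tunnel of this kind can be built with plain ssh port forwarding. The contents of create_tunnel.sh are not shown here, so the following is only a sketch of the general technique, not the script itself: build_tunnel_cmd and wait_for_url are hypothetical helpers, the ec2-user login is taken from the ssh config in step 0, and the services are assumed to listen on the scheduler's own localhost.

```shell
# build_tunnel_cmd HOST: print an ssh command that forwards the three
# scheduler ports (3000, 8000, 9090) from HOST to the same local ports.
build_tunnel_cmd() {
  printf 'ssh -N -L 3000:localhost:3000 -L 8000:localhost:8000 -L 9090:localhost:9090 ec2-user@%s\n' "$1"
}

# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]: poll URL until it answers
# with a successful HTTP status, or give up after ATTEMPTS tries.
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s silences progress output, -f turns HTTP errors into a non-zero exit.
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example use (commented out so nothing connects on source):
# build_tunnel_cmd "<scheduler DNS address>"
# wait_for_url http://localhost:8000/jobs/ && echo "scheduler is up"
```

With the defaults, wait_for_url retries for about five minutes, which covers the boot time mentioned above.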
5) Loading SRA Accessions into Serratus
Once the scheduler is online, you can load SRA accession data in the form of a SraRunInfo.csv file (NCBI SRA > Send to: File) with the upload script:
./uploadSRA.sh my_SraRunInfo.csv
This should respond with a short JSON indicating the number of rows inserted, and the total number in the scheduler.
In your web browser, refresh the status page. You should now see a list of accessions by state. If ASGs are online, they should start processing immediately. In a few seconds, the first entry will switch to "splitting" state, which means it's working.
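To cross-check the row count in the JSON response, you can count the accessions in the file yourself. The sketch below assumes the RunInfo export begins with a header whose first column is Run; count_sra_runs is a hypothetical helper and does not handle Windows line endings.

```shell
# count_sra_runs FILE: verify the header starts with "Run," and print the
# number of data rows that carry a run accession in the first column.
count_sra_runs() {
  file="$1"
  header="$(head -n 1 "$file")"
  case "$header" in
    Run,*) ;;
    *) echo "unexpected header in $file" >&2; return 1 ;;
  esac
  # Skip the first line, then ignore blank rows and any repeated header rows.
  tail -n +2 "$file" |
    awk -F, '$1 != "" && $1 != "Run" { n++ } END { print n + 0 }'
}

# Example use (commented out so nothing runs on source):
# count_sra_runs my_SraRunInfo.csv
```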
6) Launch cluster nodes
With data loaded into the scheduler, get the config.json which controls cluster size via the scheduler:
curl localhost:8000/config | jq . > serratus-config.json
You can control the number of serratus-dl, serratus-align and serratus-merge instances, and the rate at which they are created, with this file. Once you have updated it with the parameters you would like to run, upload it back into the scheduler:
# Re-upload config file
curl -T serratus-config.json localhost:8000/config
Monitor the performance and throughput of the cluster with the Grafana interface (localhost:3000). You can adjust cluster metrics with the config file and re-upload it to update the cluster.
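A hand-edited config file can easily pick up a stray comma or a missing brace. Checking that the file still parses before re-uploading it is a cheap safeguard. This is a sketch: validate_config is a hypothetical helper, and it checks only that the file is valid JSON, not that the keys or values make sense to the scheduler.

```shell
# validate_config FILE: succeed only if FILE parses as JSON.
# `jq empty` reads the input and produces no output; it exits non-zero on a parse error.
validate_config() {
  if jq empty "$1" >/dev/null 2>&1; then
    echo "$1: valid JSON"
  else
    echo "$1: not valid JSON, not uploading" >&2
    return 1
  fi
}

# Example use (commented out so nothing is uploaded on source):
# validate_config serratus-config.json && curl -T serratus-config.json localhost:8000/config
```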
Example
- Example run template is here: Run template
- Example run with data is here: Phase 1 clean-up Run