Architecture Overview - avalonmediasystem/avalon-aws GitHub Wiki
Architecture and technical description of Avalon in AWS

Overview
Avalon in AWS is an attempt to build a fast, redundant, scalable Avalon solution that allows for rapid development and deployment. Our goal is to remove as many servers as possible and rely on AWS Services whenever and wherever they make sense. When we have to host server instances, we will rely on Elastic Beanstalk.
Core Services used
Elastic Transcoder is a transcoding pipeline. It allows us to efficiently transcode video. Each transcode request is an API call that returns a transcoded derivative, which allows us to scale up rapidly.
Amazon CloudFront is an edge cache and streaming solution for static media files. Using CloudFront as part of our Avalon solution means we do not have to host or maintain a separate Wowza service. In addition, as an edge cache it moves data closer to users, providing a better experience. CloudFront streams both HLS and RTMP derivatives.
Elastic Beanstalk provides an auto-scaled set of EC2 instances and load balancers. Using Elastic Beanstalk, we are able to scale out at will to compensate for load and spin down when demand drops to conserve money. This means a considerably better experience for end users.
RDS is Amazon's Relational Database Service. It is a fully managed solution, requiring only that we give it a schema. In addition, we are able to scale it at will to compensate for load, should we need to.
Amazon ElastiCache is a managed, drop-in replacement for Redis. Avalon uses Redis as a job queue.
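The queue pattern above can be sketched as follows. This is a minimal in-memory stand-in for illustration: in the running system the queue lives in Redis on the ElastiCache endpoint (pushed with LPUSH, popped with BRPOP by the worker), and the job class and file key below are hypothetical names.

```python
import json
from collections import deque

# In-memory stand-in for the Redis list backing the job queue.
queue = deque()

def enqueue(job_class, *args):
    """Webapp side: serialize a job and push it onto the queue
    (in Redis: LPUSH <queue> <payload>)."""
    queue.appendleft(json.dumps({"class": job_class, "args": list(args)}))

def work_one():
    """Worker side: pop the oldest job and deserialize it
    (in Redis: BRPOP <queue>)."""
    return json.loads(queue.pop())

enqueue("TranscodeJob", "masterfiles/lecture01.mp4")
job = work_one()
```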
Overview of Architecture Diagram
Inside the VPC
-
ZooKeeper
- Manages Solr configs, etc.
- Avalon connects to ZooKeeper and hands over its configs
- ZooKeeper tells the Solr nodes to share the Avalon core, handles redundancy, etc., and will alert when issues arise
- ZooKeeper allows us to scale out Solr when needed
-
Fedora
- Stores metadata in RDS Postgres
- Stores binaries in an S3 bucket
-
Avalon
- Configurations are stored as environment variables
- The webapp pushes jobs into the queue; the worker picks up jobs so that the webapp doesn't slow down
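Environment-variable configuration can be sketched like this. The variable names and defaults below are illustrative only, not Avalon's actual keys; the point is that an Elastic Beanstalk instance reads everything from its environment rather than from config files baked into the image.

```python
import os

def load_config(env=os.environ):
    """Pull settings from the environment, falling back to local
    development defaults. Keys here are hypothetical examples."""
    return {
        "database_url": env.get("DATABASE_URL", "postgres://localhost/avalon"),
        "redis_host": env.get("REDIS_HOST", "localhost"),
        "masterfile_bucket": env.get("MASTERFILE_BUCKET", "avalon-masterfiles"),
    }

# Simulate the environment an instance would see
config = load_config({"REDIS_HOST": "cache.example.internal"})
```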
Outside the VPC
-
S3 buckets
- Masterfiles (stores originals and uploaded "dropbox" items)
- When a user puts an item in the bucket, it kicks off a Lambda function
- The Lambda takes the event JSON and pushes it into the notification service
- Batches: the worker picks them up from the queue and processes them
- Batch ingest works on demand rather than on a cron schedule
- Derivatives (stores transcoded items for streaming)
- Fedora Binary Storage (stores Fedora binaries)
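The Lambda step above can be sketched as follows. The handler unpacks the S3 put event and forwards a message toward the notification service; the `publish` callable stands in for the real client call (e.g. an SNS publish via boto3), and the bucket/key names are hypothetical.

```python
import json
import urllib.parse

def handler(event, context, publish=print):
    """Fired on S3 object-created events: extract bucket/key and
    forward a JSON message to the notification service."""
    messages = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (spaces become '+')
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        message = {"bucket": bucket, "key": key, "action": "ingest"}
        publish(json.dumps(message))   # in production: sns.publish(...)
        messages.append(message)
    return messages

# Minimal event shaped like a real S3 put notification
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "avalon-masterfiles"},
    "object": {"key": "dropbox/lecture+01.mp4"},
}}]}
result = handler(sample_event, None, publish=lambda m: None)
```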
-
Elastic Transcoder
- Consists of an input bucket and an output bucket
- Kicks off jobs based on the input and output
- Currently, files are left sitting in the buckets after transcode
- Currently we produce 6 derivatives
- Moving forward, should we just do MPEG-DASH with automatic bitrate switching, and cut down to low/high derivatives?
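A transcode job can be sketched as the parameters passed to a single API call: one input key from the master files bucket, one output per derivative. The pipeline ID and preset IDs below are placeholders; real values come from the Elastic Transcoder console or a `list_presets` call.

```python
def build_job(pipeline_id, input_key, presets):
    """Assemble the request body for one Elastic Transcoder job with
    one output per derivative preset."""
    return {
        "PipelineId": pipeline_id,
        "Input": {"Key": input_key},
        "Outputs": [
            {"Key": f"{input_key}.{name}.ts", "PresetId": preset_id}
            for name, preset_id in presets.items()
        ],
    }

job = build_job(
    "1111111111111-abcde1",                      # placeholder pipeline ID
    "dropbox/lecture01.mp4",
    {"low": "preset-low", "medium": "preset-med", "high": "preset-high"},
)
# In the running system this dict is passed straight to the API:
# boto3.client("elastictranscoder").create_job(**job)
```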
-
Streaming
- CloudFront
- Presigned URLs for authorization
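A CloudFront signed URL under the "canned" policy can be sketched as below. The policy JSON and the URL-safe base64 alphabet follow CloudFront's documented format; the domain, key-pair ID, and the stub signer are placeholders, since real URLs must be RSA-SHA1-signed with the account's CloudFront private key (botocore's `CloudFrontSigner` does that part in practice).

```python
import base64
import json

def aws_safe_b64(data: bytes) -> str:
    """CloudFront's URL-safe base64: '+' -> '-', '=' -> '_', '/' -> '~'."""
    return base64.b64encode(data).decode().translate(str.maketrans("+=/", "-_~"))

def signed_url(resource_url, expires_epoch, key_pair_id, rsa_sign):
    """Build a canned-policy signed URL. `rsa_sign` must sign the policy
    bytes with the CloudFront key pair's private key."""
    policy = json.dumps({
        "Statement": [{
            "Resource": resource_url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
        }]
    }, separators=(",", ":"))
    signature = aws_safe_b64(rsa_sign(policy.encode()))
    return (f"{resource_url}?Expires={expires_epoch}"
            f"&Signature={signature}&Key-Pair-Id={key_pair_id}")

# Stub signer for illustration only; never use a fake signature in production.
url = signed_url("https://d111111abcdef8.cloudfront.net/derivatives/low.m3u8",
                 1735689600, "APKAEXAMPLE", lambda policy: b"stub-signature")
```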
-
Code Pipeline
- This is how the Elastic Beanstalk systems get updated
- The CodePipeline watches an S3 bucket; when avalon.zip updates, it redeploys the code