AWS EC2 setup: lots of tips - lmmx/devnotes GitHub Wiki

Tips from the wblinks post "AWS Tips I Wish I'd Known Before I Started", itself inspired by a post by Sehrope Sarkuni

Application development

  • Look at docs on Instance Metadata and User Data

  • a change in thinking: stop caring about each individual host and instead exploit the 'cloud' advantages of automated, disposable provisioning

  • store no application state on your servers: keep session data in a database or on S3 instead

    • if a server gets killed you won't lose state
    • uploads can also go to S3
  • use Debian not Ubuntu for outward-facing servers (the first two people I asked use Debian on their servers)

    Debian provides security support for all 10k+ packages. Ubuntu provides security support for 1k tops. i would never let a single package from Universe touch an outward facing server.


    14.04 is really old right now. Debian 8 has much newer software. A new Ubuntu LTS will be out in April 2016.

    (at the time of writing Ubuntu 16.04 is out, but 14.04 is still on the front AMI page)

  • AMI = Amazon Machine Image, "a template that contains the software configuration (operating system, application server, and applications) required to launch your instance. You can select an AMI provided by AWS, our user community, or the AWS Marketplace; or you can select one of your own AMIs."

  • NB: I have heard differently...

  • Store more info in logs:

    Log lines normally have information like timestamp, pid, etc. You'll also probably want to add instance-id, region, availability-zone and environment (staging, production, etc), as these will help debugging considerably. You can get this information from the instance metadata service. The method I use is to grab this information as part of my bootstrap scripts, and store it in files on the filesystem (/env/az, /env/region, etc). This way I'm not constantly querying the metadata service for the information. You should make sure this information gets updated properly when your instances reboot, as you don't want to save an AMI and have the same data persist, as it will then be incorrect.

    • "bootstrap" scripts are commands to be run on startup (Dockerfiles follow a similar logic in setting up an environment)

      • rather than just being a matter of reproducibility, AWS bootstrap scripts are intended to pass user data into the instance. From the AWS docs:

      You can also pass this data into the launch wizard as plain text, as a file (this is useful for launching instances via the command line tools), or as base64-encoded text (for API calls).

    • from the CLI: use the --user-data parameter with the run-instances command.

      • for Amazon EC2 CLI, make that ec2-run-instances, and for the Query API use the UserData parameter with the RunInstances command.
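As a concrete sketch of that last point: the RunInstances API expects the UserData parameter as base64-encoded text (the CLI's --user-data flag handles encoding for you), which is easy to prepare yourself. The bootstrap script below is a hypothetical example.

```python
import base64

# Hypothetical bootstrap script to run at first boot: cache instance
# metadata into files so apps don't query the metadata service repeatedly.
bootstrap = """#!/bin/bash
mkdir -p /env
curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone > /env/az
"""

# The Query API's UserData parameter takes base64-encoded text.
encoded = base64.b64encode(bootstrap.encode("utf-8")).decode("ascii")
```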
  • use an SDK: {Android, browser, iOS, Java, .NET, Node.js, PHP, Python, Ruby, Go, C++ (developer preview)}

  • final recommendation on development from wblinks is to have a sysadmin tool, syslog viewer, or similar for viewing current real-time log info without needing to SSH into a running instance. Logs should be centralised elsewhere anyway.
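A minimal sketch of the log-enrichment idea above, assuming a bootstrap script has already cached metadata into files like /env/az (the file layout and both helper names here are hypothetical):

```python
import os

def read_env(name, default="unknown", env_dir=None):
    """Read one cached metadata value (e.g. /env/region), falling back
    to a default so logging still works off-EC2 or before bootstrap."""
    env_dir = env_dir or os.environ.get("ENV_DIR", "/env")
    try:
        with open(os.path.join(env_dir, name)) as f:
            return f.read().strip()
    except OSError:
        return default

def log_line(message, env_dir=None):
    """Prefix a message with instance context so centralised logs can be
    filtered by instance, AZ and environment."""
    fields = [read_env(n, env_dir=env_dir)
              for n in ("instance-id", "az", "environment")]
    return " ".join(fields + [message])
```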

Operations

If you have to SSH into your servers, then your automation has failed

"Disable SSH access to all servers" at the firewall level, rather than on the servers themselves, to "transition to this mindset".

It forced me to get my automation into a decent state, but it might not be for everyone.

  • See: discussion of this tip on HN

  • Don't care about servers, just about the service (servers will fail, so what).

  • Don't give servers static/elastic IPs. Put things behind a load balancer

  • Get your alerts to become notifications: if recovery is automated, an alert is just information that the system healed itself

  • Catch runaway billing early with granular billing alerts

Security

  • use EC2 roles, don't give applications an IAM account

    • "If you give an application AWS credentials, you're doing it wrong".
    • specify permissions per role
    • very easy to use SDK with these - will receive temporary credentials
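To illustrate why those temporary credentials mean the application never stores keys: the instance metadata service serves a JSON document per role, which the SDKs fetch and refresh automatically. The payload below is a made-up example in that shape.

```python
import json
from datetime import datetime

# Fake payload in the shape returned by
# http://169.254.169.254/latest/meta-data/iam/security-credentials/<role>
SAMPLE = """{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLEKEY",
  "SecretAccessKey": "not-a-real-secret",
  "Token": "not-a-real-session-token",
  "Expiration": "2016-05-01T12:00:00Z"
}"""

def parse_role_credentials(payload):
    """Extract the pieces an SDK signs requests with; the SDK re-fetches
    the document shortly before Expiration."""
    creds = json.loads(payload)
    expiry = datetime.strptime(creds["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    return creds["AccessKeyId"], creds["SecretAccessKey"], creds["Token"], expiry
```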
  • Assign permissions to groups, not users

  • Don't log in with the master (root) account; create IAM users instead

  • Multi-factor auth

  • Check for changes in security settings, prevent against intrusions (video, presentation)

  • use CloudTrail to keep an audit log; it records every action performed via the API or web console to an S3 bucket

    • enable versioning on the bucket so the audit log cannot be quietly modified (you hope you'll never need to use it)

S3

  • Use - not . in bucket names so SSL doesn't throw mismatch errors (the name cannot be changed once created)
  • Avoid filesystem mounts
  • You don't need to use CloudFront in front of S3 (but it can help if you need speed over scale)
    • CDN that copies your content to edge locations
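The reason dots break SSL: bucket-style HTTPS URLs are covered by a certificate for *.s3.amazonaws.com, and a TLS wildcard matches exactly one DNS label. A toy matcher (the function is mine, a simplified sketch of hostname checking) makes this visible:

```python
def wildcard_matches(hostname, pattern):
    """Minimal RFC 6125-style check: '*' covers exactly one DNS label."""
    pat = pattern.split(".")
    host = hostname.split(".")
    if len(pat) != len(host):
        return False  # a dotted bucket name adds labels, so lengths differ
    return all(p == "*" or p == h for p, h in zip(pat, host))

wildcard_matches("my-logs.s3.amazonaws.com", "*.s3.amazonaws.com")
wildcard_matches("my.logs.s3.amazonaws.com", "*.s3.amazonaws.com")
```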

This seems like a strange idea, but one of the implementation details of S3 is that Amazon use the object key to determine where a file is physically placed in S3. So files with the same prefix might end up on the same hard disk for example. By randomising your key prefixes, you end up with a better distribution of your object files. (Source: S3 Performance Tips & Tricks)
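One common way to randomise prefixes is to prepend a short, deterministic hash of the key; the md5-based scheme and four-character prefix below are just one possible choice, not a prescribed one:

```python
import hashlib

def randomised_key(key, prefix_len=4):
    """Prepend a short hex hash so keys that sort together
    (e.g. date-based paths) spread across S3 partitions."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_len]
    return "{}/{}".format(prefix, key)

randomised_key("2016/05/01/photo-0001.jpg")
randomised_key("2016/05/01/photo-0002.jpg")
```

Because the hash is deterministic, the full object key can always be recomputed from the original name when reading back.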


  • AWS has a CLI tool here