AWS EMR EC2 S3 tips - mikec964/chelmbigstock GitHub Wiki

Your first MapReduce step by step

Link to AWS tutorial
Link to materials for the hands-on session we had

Install s3cmd on EC2

To access your S3 storage easily from EC2, install s3cmd on your EC2 instance.

  1. From a bash shell on an EC2 VM, run this command:
    $ curl -L -O http://sourceforge.net/projects/s3tools/files/s3cmd/1.5.0-rc1/s3cmd-1.5.0-rc1.tar.gz
    This creates s3cmd-1.5.0-rc1.tar.gz in the current directory.
  2. To extract the tar.gz, run
    $ tar -zxvf s3cmd-1.5.0-rc1.tar.gz
    This creates a new directory s3cmd-1.5.0-rc1.
  3. Go to the directory
    $ cd s3cmd-1.5.0-rc1
  4. Install s3cmd
    $ sudo python setup.py install
  5. Configure s3cmd
    $ s3cmd --configure
    s3cmd asks some questions:
      - Access Key: enter your AWSAccessKeyId
      - Secret Key: enter your AWSSecretKey
      - Encryption password: whatever you like
      - You can skip the rest of the questions by pressing Enter.
  6. All set!
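
The download, extract, and install steps above can be sketched as one script, with a small check to run after the interactive configuration step. This is a minimal sketch, not a tested installer: the `RUN_INSTALL` switch and the function names `install_s3cmd` and `config_has_keys` are made up for illustration, and the check assumes that `s3cmd --configure` saves its answers to `~/.s3cfg` with `access_key` and `secret_key` entries.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-4 above, plus a sanity check to run after step 5.
set -e

VERSION="1.5.0-rc1"            # version used in the steps above
DIR="s3cmd-${VERSION}"         # directory created by tar
ARCHIVE="${DIR}.tar.gz"        # file created by curl
URL="http://sourceforge.net/projects/s3tools/files/s3cmd/${VERSION}/${ARCHIVE}"

# Steps 1-4: download, extract, install (hypothetical helper name).
install_s3cmd() {
    curl -L -O "$URL"
    tar -zxvf "$ARCHIVE"
    # Subshell, so the caller's working directory is left unchanged.
    ( cd "$DIR" && sudo python setup.py install )
}

# After step 5: succeed only if the config file exists and has both keys.
# Assumption: the file is ~/.s3cfg unless a path is passed as $1.
config_has_keys() {
    cfg="${1:-$HOME/.s3cfg}"
    [ -f "$cfg" ] && grep -q '^access_key' "$cfg" && grep -q '^secret_key' "$cfg"
}

# Hypothetical switch: nothing is downloaded unless RUN_INSTALL=1 is set.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    install_s3cmd
fi
```

Saved under a hypothetical name such as `install_s3cmd.sh`, the script would be run as `RUN_INSTALL=1 bash install_s3cmd.sh`. The configuration step is left interactive on purpose, so the keys are typed at the prompt rather than stored in the script.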