Operator Tips & Tricks - StormSurgeLive/asgs GitHub Wiki

S3 Storage

Automated file transfers often use scp, but s3cmd is increasingly common due to the prevalence of cloud storage with an S3-compatible interface (a protocol originally developed by Amazon for their cloud services, but now an industry standard in its own right). The DigitalOcean Spaces storage that we use for ASGS is also S3-compatible, hence this document.

At some point, it would be great for ASGS to install and set up s3cmd by default. But until then, this document will help us remember some of the details so we can set it up and use it on an as-needed basis.

Download: S3cmd
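
If s3cmd is not already available on the target machine, it can usually be installed from PyPI (assuming pip is available in the loaded Python environment):

pip install --user s3cmd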

Configure

By default, s3cmd stores its configuration file, .s3cfg, in the home directory of the user that ran the configuration command. .s3cfg is a plain text file of key/value pairs which can be edited directly once it has been created.
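
For reference, the entries in .s3cfg most relevant to this setup look roughly like the following (a partial sketch; the real file contains many more key/value pairs):

[default]
access_key = <cut/paste accesskey>
secret_key = <cut/paste secretkey>
host_base = sfo2.digitaloceanspaces.com
host_bucket = %(bucket)s.sfo2.digitaloceanspaces.com
use_https = True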

s3cmd uses the options set in its default configuration file when you run commands. You can specify a different configuration by appending -c ~/path/to/config/file to each command you run.
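
For example, to run a one-off listing against an alternate configuration (the path ~/.s3cfg-digitalocean is hypothetical):

s3cmd -c ~/.s3cfg-digitalocean ls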

If DigitalOcean is the main or only provider you’ll connect to with s3cmd and you don’t want to specify its configuration file every time you use s3cmd, configure the default ~/.s3cfg file with the following command:

s3cmd --configure

Here is how it played out on mike:

[username@mike1 ~]$ module load python
[username@mike1 ~]$ module list
Currently Loaded Modulefiles:
 1) intel/2021.5.0   2) intel-mpi/2021.5.1   3) python/3.9.7-anaconda  
[username@mike1 ~]$ s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: <cut/paste accesskey>
Secret Key: <cut/paste secretkey>
Default Region [US]: <just hit enter>

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: sfo2.digitaloceanspaces.com

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: %(bucket)s.sfo2.digitaloceanspaces.com                                                                

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: <just hit enter>
Path to GPG program [/usr/bin/gpg]: <just hit enter>

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: <just hit enter>

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name: <just hit enter>

New settings:
  Access Key: <cut/pasted accesskey>
  Secret Key: <cut/pasted secretkey>
  Default Region: US
  S3 Endpoint: sfo2.digitaloceanspaces.com
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.sfo2.digitaloceanspaces.com
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] <just hit enter>
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '/home/username/.s3cfg'
[username@mike1 ~]$ s3cmd ls
2019-03-12 21:06  s3://asgs-static-assets
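
As a quick smoke test of the new configuration, you can round-trip a file through an existing bucket (test.txt is a hypothetical file; output not shown):

s3cmd put test.txt s3://asgs-static-assets/test.txt
s3cmd ls s3://asgs-static-assets
s3cmd del s3://asgs-static-assets/test.txt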

Usage

Create Buckets: Use the command mb, short for “make bucket”, to create a new bucket:

s3cmd mb s3://spacename s3://secondspace

List Buckets: s3cmd ls

List Buckets and Contents: s3cmd ls s3://spacename s3://secondspace

List all objects in all buckets: s3cmd la --recursive
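
Putting these together, creating a bucket and then listing its (empty) contents might look like this (spacename is a placeholder; output is approximate):

[username@mike1 ~]$ s3cmd mb s3://spacename
Bucket 's3://spacename/' created
[username@mike1 ~]$ s3cmd ls s3://spacename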

Upload

Use the put command to copy files from your local machine to a bucket. Pay attention to the trailing slash on the destination: when you include it, as in the example below, the original file name is appended to the destination path. If you omit the slash, the file is copied to the bucket under the destination name itself, i.e., as an object named path.

s3cmd put file.txt s3://spacename/path/
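
For contrast, a minimal sketch of the no-trailing-slash behavior (spacename and path are placeholders):

s3cmd put file.txt s3://spacename/path    # stored as the object s3://spacename/path, not path/file.txt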

New name: You can change the name of a file as you put it in a bucket by typing the new name at the end of the path as follows:

s3cmd put file.txt s3://spacename/newname.txt

Multiple Files:

s3cmd put file1.txt file2.txt path/to/file3.txt s3://spacename/path/

Using * with the put command will copy everything in the current working directory into your bucket; the shell expands the wildcard, and the --recursive flag tells s3cmd to descend into any directories it matches:

s3cmd put * s3://spacename/path/ --recursive

You can set public permissions for all files at once by adding --acl-public, and you can similarly set metadata with --add-header (like --add-header=Cache-Control:max-age=86400):

s3cmd put * s3://spacename/path/ --acl-public --add-header=Cache-Control:max-age=86400 --recursive
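
To confirm that the ACL and header changes took effect on a given object, the info command can be used (file.txt is a placeholder):

s3cmd info s3://spacename/path/file.txt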

Download

The command get copies files from a bucket to your local computer.

One File: s3cmd get s3://spacename/path/to/file.txt

All Files in a Directory: To get multiple files, the S3 address must end with a trailing slash, and the command requires the --recursive flag:

s3cmd get s3://spacename/path/ --recursive
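
You can also name a local destination directory for the download; a minimal sketch, where localdir/ is a placeholder for an existing local directory:

s3cmd get s3://spacename/path/ localdir/ --recursive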

New Name: Like the put command, the get command allows you to give the file a different name.

s3cmd get s3://spacename/file.txt newfilename.txt

Permissions

s3cmd only provides output when the command you issue actually changes access levels. For example, when you change the ACL from private to public, you’ll see output like s3://spacename/: ACL set to Public. If the ACL is already public, s3cmd will return silently to the command prompt.

Enable directory listings

s3cmd setacl s3://spacename/ --acl-public

Disable directory listings

s3cmd setacl s3://spacename/ --acl-private

Using the setacl command, files can be made private, so that only someone connecting with a valid key pair can read them, or public, so that anyone can read them with either an S3-compatible client or via HTTPS.

Make a file public

s3cmd setacl s3://spacename/file.txt --acl-public
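
Once a file is public, it can be fetched over plain HTTPS with no credentials. With the DNS-style bucket template configured above, the URL would look something like this (spacename and file.txt are placeholders):

curl https://spacename.sfo2.digitaloceanspaces.com/file.txt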

Make all the files at a path public: Use the --recursive flag to apply the change to every file under the path:

s3cmd setacl s3://spacename/path/to/files/ --acl-public --recursive

Make a file private

s3cmd setacl s3://spacename/file.txt --acl-private

Make all the files at a path private: Use the --recursive flag to apply the change to every file under the path:

s3cmd setacl s3://spacename/path/to/files/ --acl-private --recursive

COLDSTARTDATE

Traditionally, the Operator has been required to compute the COLDSTARTDATE datetime by counting back HINDCASTLENGTH days from the date at which the first nowcast or forecast is to start (often on or just before the current datetime). For example:

HINDCASTLENGTH=30
COLDSTARTDATE=...<<Operator calculates 30 days prior to midnight on the current date in their head>>

This is cumbersome and error prone. So we have developed a way for the Operator to set the HINDCASTLENGTH and then use the GNU date command in bash to automatically calculate the COLDSTARTDATE from the date on which the Operator would like the tidal/river spinup to end, which is much easier. For reference on date math with the date command, see https://stackoverflow.com/questions/18180581/subtract-days-from-a-date-in-bash

For example:

HINDCASTLENGTH=30
HINDCASTENDDATE=$(date +%Y%m%d)  # e.g., 20210505
COLDSTARTDATE=$(date --date="${HINDCASTENDDATE} -${HINDCASTLENGTH} days" +%Y%m%d%H)
# e.g., date --date="20210505 -30 days" +%Y%m%d%H  yields 2021040500

The date on which the tide+river initialization should end is given in YYYYmmdd format; the date command then subtracts HINDCASTLENGTH days to produce the COLDSTARTDATE, relieving the Operator of doing this manually or with a more elaborate helper script. Because the end date carries no time of day, the computed cold start time falls at midnight (hour 00). This approach does require the Operator to make sure that HINDCASTLENGTH is specified before the COLDSTARTDATE code in the ASGS configuration file.
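
Note that the --date option is specific to GNU date, which is standard on Linux HPC systems. If you need to run the same calculation on BSD/macOS, a rough equivalent sketch (assuming the stock BSD date) would be:

HINDCASTLENGTH=30
HINDCASTENDDATE=20210505
# BSD date: -j avoids setting the system clock, -v adjusts by -30 days, and -f
# parses the input; midnight is given explicitly because unparsed fields
# default to the current time rather than zero
COLDSTARTDATE=$(date -j -v-${HINDCASTLENGTH}d -f "%Y%m%d %H%M%S" "${HINDCASTENDDATE} 000000" +%Y%m%d%H)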