# Shared File Systems
## Staging input and reference data
You will likely need to download your input data and software.
Ensure that inputs are accessible to all nodes by placing them in the `${SHARED_DIR}` folder.
If you specified `--file-system-type` as `efs` (the default) in your `start_cluster.py` command, then the `SHARED_DIR` environment variable will be set to `/efs`. Alternatively, if `--file-system-type` is set to `fsx`, then the `SHARED_DIR` environment variable will be set to `/fsx`.
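
A quick way to confirm the mount from any node is a sketch like the following (assuming `SHARED_DIR` is exported by the cluster's bootstrap, as described above):

```bash
# Minimal sketch: confirm the shared file system is mounted on this node.
# Assumes SHARED_DIR was exported by the cluster bootstrap (either /efs or /fsx).
if [[ -d "${SHARED_DIR:-}" ]]; then
  echo "Shared file system available at ${SHARED_DIR}"
  df -h "${SHARED_DIR}"   # show the backing mount (EFS or FSx)
else
  echo "SHARED_DIR is unset or not a directory" >&2
fi
```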
By default you will have read-only access to the S3 buckets linked to your AWS account.
Use `sbatch --wrap "aws s3 sync s3://<bucket_path> \"${SHARED_DIR}/local_path\""` to download data into the shared file system.
The compute nodes have much higher bandwidth than the head node, which is why the command above is wrapped in an sbatch script.
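
As a concrete illustration, a staging run might look like the following; the bucket name and paths here are placeholders, not real resources:

```bash
# Hypothetical example: stage a reference bundle onto the shared file system.
# "my-bucket" and both paths are placeholders for illustration only.
sbatch --wrap "aws s3 sync s3://my-bucket/refdata/hg38 \"${SHARED_DIR}/refdata/hg38\""

# Check on the job from the head node.
squeue -u "${USER}"
```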
## Uploading data back to S3
This assumes `yawsso` is in your PATH; `yawsso` is installed in the pcluster conda env.
By default, the parallel cluster does not have write access to S3 buckets.
A workaround is to take your short-term local SSO credentials and import them into the parallel cluster.
To do this you must have the following:

- Logged in to AWS on your local computer via SSO
- Your parallel cluster environment activated, or at least `aws2-wrap` or `yawsso` in your PATH
- The `ssm_run` function sourced from [this GitHub repo][alexiswl_bashrc]
From your local computer run:
```bash
master_instance_id="<master_ec2_instance_id>"
shared_fs_path="</path/to/outputs>"
path_to_s3_bucket="<s3://bucket>"

export_env_vars="$(yawsso --export-vars --profile "${AWS_PROFILE}" | \
  sed 's/export //g' | \
  tr '\n' ',' | \
  sed 's/,$//')"

echo " sbatch \
  --partition=\"copy\" \
  --export \"${export_env_vars},ALL\" \
  --wrap \"aws s3 sync \\\"${shared_fs_path}\\\" \\\"${path_to_s3_bucket}\\\" \"" | \
ssm_run \
  --instance-id "${master_instance_id}"
```
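
After submitting, you can verify the job from the head node; a minimal check (the `copy` partition name comes from the command above):

```bash
# From the master/head node: confirm the wrapped copy job is queued or running.
squeue --partition=copy

# Inspect recent job states once it finishes (requires Slurm accounting).
sacct --format=JobID,JobName,Partition,State,Elapsed | tail -n 5
```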
The space before the `sbatch` command in the snippet above is there for security reasons.
Be aware that you are running a command on a shared parallel cluster with your personal access tokens.
Prefixing the command with a space prevents the tokens from being exposed in the ec2-user's bash history.
Please note that this is not a foolproof method.
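
For context, the leading-space behaviour is controlled by bash's `HISTCONTROL` variable; a quick interactive sketch to confirm it on the master node:

```bash
# Run these interactively: the leading space suppresses history only when
# HISTCONTROL contains "ignorespace" (or "ignoreboth").
echo "${HISTCONTROL}"    # expect: ignoreboth or ignorespace
 echo "this command starts with a space and is skipped by history"
history | tail -n 3      # the spaced command should not appear
```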