Data Management - uwsph/hpcusers GitHub Wiki
The cluster's storage system is independent of other storage systems within SPH. It's kept isolated to improve security and performance, and it's optimized for a cluster computing environment. As a result, data must be transferred to and from the system. The most common ways to move data are SFTP and OnDemand's file manager.
Storage Locations
Home Directory (/home/user/YOURNETID)
Every user is given a home directory for storing their personal files. This directory is good for storing work that isn't part of a larger group effort. It's also the location where applications like OnDemand store their user-specific temporary data and log files.
Group or Project (/projects/YOURPROJECT)
Groups can purchase storage for their use on the cluster. This space is for sharing files between group members. Internally, each group should develop a method for organizing its data and code. If projects share some kind of base data, like genome data, it should be stored in a way that avoids duplication.
Scratch
Scratch is temporary storage on the cluster. Its purpose is for holding data files that are either intermediate data or will only exist on the cluster for the duration of a job. You should not store data here long-term, as it'll be removed automatically after a period of inactivity.
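Because scratch files are removed after a period of inactivity, it can help to check which files have gone untouched for a while. The following is a minimal sketch, not an official tool: the function name `list_stale_files` is hypothetical, the 30-day default is an assumed value (the actual purge window isn't stated here), and it assumes inactivity is measured by modification time.

```shell
# list_stale_files DIR [DAYS]
# Print regular files under DIR that have not been modified in more than
# DAYS days. DAYS defaults to 30, which is an assumption; check the
# cluster's actual scratch policy before relying on it.
list_stale_files() {
  dir="$1"
  days="${2:-30}"
  find "$dir" -type f -mtime +"$days" -print
}
```

For example, `list_stale_files /path/to/your/scratch 30` lists candidates to copy to your home or project directory before they are removed. The scratch path here is a placeholder.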
Transferring Data
When transferring data, you can establish the connection either from the cluster itself or from a remote system into the cluster. If you're transferring files to or from another server, it may be best to establish the connection from the cluster. However, if the data is on your desktop or laptop computer, it's best to establish the connection from that computer to the cluster.
Transferring from the Cluster
SFTP (Command Line)
When establishing an SFTP session from the cluster, it's best to do so from the login node. For SFTP connections, you can use the built-in "sftp" command line client. Here's how:
(Start Here) Connecting to a remote host:
sftp USERNAME@HOST
| Command | What it does |
|---|---|
| `cd /go/to/dir` | Change remote directory (change which directory you are in on the remote host) |
| `lcd /go/to/dir` | Change local directory (change which directory you are in on the cluster) |
| `get file.txt` | Download the file |
| `get -r dir_i_want` | Download a directory (recursive download) |
| `put file.txt` | Upload a file |
| `put -r dir_i_want` | Upload a directory (recursive upload) |
| `quit` | Exit the SFTP session |
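The same commands can also be run non-interactively by putting them in a batch file and passing it to `sftp -b`. The sketch below only writes the batch file; the helper name `write_sftp_batch` and the file name `fetch.batch` are hypothetical, and the paths are the same placeholders used in the table. Batch mode generally needs non-interactive authentication such as SSH keys, which may or may not be available on the remote host.

```shell
# write_sftp_batch OUTFILE REMOTE_DIR LOCAL_DIR FILE
# Write an sftp batch file that downloads FILE from REMOTE_DIR on the
# remote host into LOCAL_DIR on the cluster.
# Note: arguments containing spaces are not quoted in this sketch.
write_sftp_batch() {
  outfile="$1"
  remote_dir="$2"
  local_dir="$3"
  file="$4"
  {
    printf 'cd %s\n' "$remote_dir"
    printf 'lcd %s\n' "$local_dir"
    printf 'get %s\n' "$file"
    printf 'quit\n'
  } > "$outfile"
}

# Example with placeholder values (not run here):
#   write_sftp_batch fetch.batch /go/to/dir /go/to/dir file.txt
#   sftp -b fetch.batch USERNAME@HOST
```

Keeping the commands in a file makes a transfer repeatable and easy to review before it runs.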
Transferring to the Cluster
Using the SFTP client of your choice (we typically use WinSCP or Cyberduck), establish an SFTP connection to the login node. Most desktop SFTP clients work the same way, with a simple drag-and-drop interface, much like Windows Explorer. You'll find the connection settings below:
- Protocol: SFTP
- Host (or Server): login.hpc.sph.washington.edu
- Port: 22
- Username: UW NetID
- Password: UW NetID Password
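If your desktop or laptop has a command line `sftp` client, the settings above map onto a single command. The sketch below only prints that command rather than connecting; the helper name `cluster_sftp_command` is hypothetical, and `YOURNETID` is a placeholder for your UW NetID.

```shell
# cluster_sftp_command NETID
# Print (without running) the sftp command that matches the connection
# settings listed above: SFTP protocol, the login host, and port 22.
cluster_sftp_command() {
  netid="$1"
  printf 'sftp -P 22 %s@login.hpc.sph.washington.edu\n' "$netid"
}

cluster_sftp_command YOURNETID
```

Running the printed command should prompt for your UW NetID password. Since 22 is the default port, the `-P 22` option can normally be left out.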
Below are some tutorials / guides for common desktop SFTP clients: