Data Management - uwsph/hpcusers GitHub Wiki

The cluster's storage system is independent of other storage systems within SPH. It is kept isolated to improve security and performance, and it is optimized for a cluster computing environment. As a result, data must be transferred to and from the system. The most common ways to move data are SFTP and OnDemand's file manager.

Storage Locations

Home Directory (/home/user/YOURNETID)

Every user is given a home directory for storing their personal files. This directory is a good place for work that isn't part of a larger group effort. It's also where applications like OnDemand store their user-specific temporary data and log files.

Group or Project (/projects/YOURPROJECT)

Groups can purchase storage for their use on the cluster. This space is for sharing files between group members. Internally, each group should develop a method for organizing its data and code. If projects share some kind of base data, like genome data, it should be stored in a way that avoids duplication.
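One way to avoid duplicating shared base data is to keep a single copy and link to it from each analysis directory. The following is only a minimal sketch: the `shared/genomes` and `analysis1` names are hypothetical, and a temporary directory stands in for /projects/YOURPROJECT so the example can be run anywhere.

```shell
#!/bin/sh
# Illustration only: a temporary directory stands in for /projects/YOURPROJECT.
project_root=$(mktemp -d)

# Keep one shared copy of the base data (hypothetical layout and file name).
mkdir -p "$project_root/shared/genomes" "$project_root/analysis1"
echo "ACGT" > "$project_root/shared/genomes/ref.fa"

# Link to the shared copy from an analysis directory instead of copying it.
ln -s "$project_root/shared/genomes" "$project_root/analysis1/genomes"

# Reading through the link returns the contents of the single shared copy.
cat "$project_root/analysis1/genomes/ref.fa"
```

Your group may prefer a different layout; the point is that every analysis reads from the same copy.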

Scratch

Scratch is temporary storage on the cluster. It is meant for intermediate data files and for files that only need to exist on the cluster for the duration of a job. You should not store data here long-term, as it'll be removed automatically after a period of inactivity.
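A typical pattern for using scratch inside a job might look like the sketch below. This page does not give the scratch path, so temporary directories stand in for both the scratch and project locations, and the file names are placeholders.

```shell
#!/bin/sh
# Hypothetical locations: temporary directories stand in for the real
# scratch and project paths, which are not specified on this page.
scratch_root=$(mktemp -d)
project_dir=$(mktemp -d)

# Create a per-job working directory in scratch.
work_dir=$(mktemp -d "$scratch_root/job.XXXXXX")

# Write intermediate and final files in scratch (placeholder commands).
echo "intermediate" > "$work_dir/tmp.dat"
echo "final result" > "$work_dir/result.txt"

# Copy only what should be kept to long-term storage, then clean up scratch.
cp "$work_dir/result.txt" "$project_dir/"
rm -rf "$work_dir"
```

Copying results out before the job ends means nothing important depends on how long files survive in scratch.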

Transferring Data

When transferring data, you can establish the connection either from the cluster itself or from a remote system into the cluster. If you're transferring files to another server, it may be best to establish the connection from the cluster. However, if the data is on your desktop or laptop computer, it is best to establish the connection from that computer to the cluster.

Transferring from the Cluster

SFTP (Command Line)

When establishing an SFTP session from the cluster, it's best to do so from the login node. For SFTP connections, you can use the built-in "sftp" command-line client. Here's how:

(Start Here) Connecting to a remote host:

sftp USERNAME@HOST

Command              What it does
cd /go/to/dir        Change remote directory (change which directory you are in on the remote host)
lcd /go/to/dir       Change local directory (change which directory you are in on the cluster)
get file.txt         Download the file
get -r dir_i_want    Download a directory (recursive download)
put file.txt         Upload a file
put -r dir_i_want    Upload a directory (recursive upload)
quit                 Exit the SFTP session
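The interactive commands above can also be collected in a file and run non-interactively with sftp's batch mode (`-b`). This is a sketch rather than a tested recipe for any particular remote host: the paths and file names are placeholders, and because batch mode cannot prompt for a password, it assumes key-based authentication is already set up with the remote host.

```shell
#!/bin/sh
# Write a batch file containing the same commands you would type interactively.
# All paths and file names here are placeholders.
batch_file=$(mktemp)
cat > "$batch_file" <<'EOF'
cd /go/to/dir
lcd /go/to/dir
get file.txt
put -r dir_i_want
quit
EOF

# Run the batch against the remote host (uncomment and fill in USERNAME and HOST).
# Assumes non-interactive (key-based) authentication, since batch mode cannot
# prompt for a password.
# sftp -b "$batch_file" USERNAME@HOST
```

The sftp line is left commented out so the batch file can be reviewed before anything is transferred.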

Transferring to the Cluster

Using the SFTP client of your choice (we typically use WinSCP or Cyberduck), establish an SFTP connection to the login node. Most desktop SFTP clients work the same way, with a simple drag-and-drop interface, much like Windows Explorer. You'll find the connection settings below:

  • Protocol: SFTP
  • Host (or Server): login.hpc.sph.washington.edu
  • Port: 22
  • Username: UW NetID
  • Password: UW NetID Password
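If your desktop or laptop has the OpenSSH "sftp" command-line client installed, the same settings can be used without a graphical client. A minimal sketch, where YOURNETID is a placeholder for your UW NetID; the command is printed rather than run so you can check it first:

```shell
#!/bin/sh
# Connection settings from the list above; YOURNETID is a placeholder.
host="login.hpc.sph.washington.edu"
port=22
netid="YOURNETID"

# Print the command instead of running it. -P sets the port for sftp.
echo "sftp -P $port $netid@$host"
```

Running the printed command prompts for your UW NetID password and then opens an SFTP session on the login node.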

Below are some tutorials / guides for common desktop SFTP clients: