MARCC - BenLangmead/jhu-compute GitHub Wiki
The Maryland Advanced Research Computing Center (MARCC) is a facility in east Baltimore on the Bayview campus of JHU. Bluecrab, the cluster housed at MARCC, is a large compute cluster that serves the schools of JHU as well as UMD. "MARCC functions are to generate, store, backup, analyze or visualize large datasets for members of the Hopkins and University of Maryland at College Park communities, and to provide the necessary infrastructure to transfer and share data at high bandwidths (100GB/s), storage and powerful computing processing resources."
The computing platform comprises approximately 19,000 cores and 96 GPUs for accelerated computing, with a combined peak performance of over 900 Teraflops. A Lustre parallel file system provides over 2 Petabytes of disk storage, and 56 Gb/s Infiniband connectivity is used for all parallel applications messaging and I/O. A second ZFS file system with approximately 14 Petabytes of capacity is available for storing and processing big data. MARCC is a shared system and will initially be used by 5 schools within Hopkins and UMCP.
MARCC is separate from HHPC and all other clusters currently at JHU. It is not replacing any of them.
Request an account: https://www.marcc.jhu.edu/request-access/request-an-account/
See also: MARCC running jobs
- 13 TB under `/scratch/groups/blangme2` and `/scratch/users/*` combined; Lustre (distributed) scratch for larger jobs
  - Not backed up
  - In theory, can be increased to 50 TB
- 50 TB of ZFS storage on `/work-zfs`; for large files that are not frequently read/written
  - 50 TB is the theoretical limit, but the filesystem is journaled and the journals can take a large chunk (up to 10 TB) of this
- About 400 GB of local scratch in `/tmp` on each node, but it's shared among all users of the node
- 2-to-3-day time limit on most queues (`shared`, `parallel`, `lrgmem`); shorter on others (`scavenger`, `gpup100`)
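The tiers above suggest a common job pattern: do I/O-intensive work in node-local `/tmp`, then copy the finished outputs to Lustre scratch. A minimal sketch, assuming the `blangme2` group path from the notes above; the fallback directory and the file name `result.txt` are illustrative so the script also runs off-cluster:

```shell
#!/bin/bash
set -euo pipefail

# Group Lustre scratch (not backed up); fall back to a temp dir when
# running somewhere the MARCC path does not exist.
SCRATCH_DIR="/scratch/groups/blangme2/${USER:-nobody}"
mkdir -p "$SCRATCH_DIR" 2>/dev/null || SCRATCH_DIR="$(mktemp -d)"

# Node-local scratch: ~400 GB in /tmp, shared with other users of the node.
LOCAL_TMP="$(mktemp -d /tmp/myjob.XXXXXX)"
trap 'rm -rf "$LOCAL_TMP"' EXIT

# ... I/O-intensive work happens against $LOCAL_TMP ...
echo "example result" > "$LOCAL_TMP/result.txt"

# Copy finished outputs back to Lustre scratch in one pass.
cp "$LOCAL_TMP/result.txt" "$SCRATCH_DIR/"
echo "results in $SCRATCH_DIR"
```

Staging through `/tmp` keeps small, frequent writes off the shared Lustre filesystem, which handles large sequential I/O much better.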
There are a number of potentially useful partition/queues defined in MARCC, a few of which are:
- lrgmem (up to 1TB memory)
- parallel (can ask for exclusive access)
- shared
- express (limited to <=4 cores, <=14 GB memory, <=12 hours)
- skylake (these are hit/miss, but are the newer architecture as the name implies)
The express queue is for small, short jobs, but uses the newer skylake architecture and tends to be idle.
The skylake queue itself comprises a number of skylake-architecture nodes, but several of them lack network access (compute0685, compute0686, compute0688, compute0689, compute0694, compute0704), and compute0702 appears to have memory problems. These nodes also tend to be idle. Their configured maximum memory is ~88 GB.
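One way to steer around the problem nodes listed above is SLURM's `--exclude` flag. A sketch, not runnable off-cluster; `myjob.sh` is a placeholder and the node list is as of this writing, so check it before relying on it:

```shell
# Hypothetical submission to skylake that avoids the nodes without
# network access plus the one with apparent memory problems:
sbatch -p skylake \
  --exclude=compute0685,compute0686,compute0688,compute0689,compute0694,compute0702,compute0704 \
  myjob.sh
```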
Also, MARCC, as of 10/31/2017, added the notion of constraints to allow for choices of architecture, e.g. Intel Broadwell vs. Intel Haswell as well as GPUs, within a partition:
https://www.marcc.jhu.edu/gpu-driver-updates-completed-for-cuda-9-updated-partitions/
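Per that announcement, a constraint can be combined with a partition to pin a job to a particular architecture. A sketch of a batch-script header, assuming the constraint names follow the linked post; the partition and resource numbers are illustrative:

```shell
#!/bin/bash
#SBATCH -p shared
#SBATCH --constraint=haswell   # or broadwell; see the linked post for GPU constraints
#SBATCH --time=2:00:00
#SBATCH --mem=8g

# ... job commands ...
```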
The Lustre and ZFS partitions are discussed above. Besides these, we also have a lab-specific 66-TB disk array accessible from all MARCC machines at `/net/langmead-bigmem-ib.bluecrab.cluster/storage`.
If we run out of allocation, or just want to save it while testing, we can use the "scavenger" queue, on which jobs can be preempted and are limited to 12 hours.
To do this, add the following to (or use it to replace the normal queue designation in) your sbatch command:
-p scavenger --qos=scavenger
As of 9/18/2015, use of the scavenger queue still deducts from the PI's quota of core-hours, but we're also allowed to "go negative", though it's not clear whether that applies just to scavenger or to all queues.
A standard compute node (C6320) consists of:
- Two Intel Haswell E5-2680v3 12-core 2.5 GHz CPUs
- 128 GB RAM
- Single-port Infiniband card and Infiniband cables
- One port (out of 18) on an Infiniband switch (SX6025F)
- Access to the Infiniband director switch (SX6518) and IB cards
- Two ports on 1 Gbps management nodes (7048R) and cables
- Slots on PDU and power whips
- Slot on rack
- Warranty for 5 years
You can SSH into any compute node; however, your SSH session and any child processes will be killed in ~5-10 minutes. This happens even if you are running a job on the node via SLURM and SSH into it in addition.
Screen/nohup/disown does not get around this.
- [langmead-bigmem](MARCC bigmem)
Your login should be your university ID @ university.edu (where university ID is your JHED ID or Directory ID), for example [email protected] or [email protected].
Your password should have been set during your account request. If it does not work, passwords can be reset at https://password.marcc.jhu.edu/
If you need any assistance, please contact [email protected]
SLURM <> SGE Command Mapping:
http://slurm.schedmd.com/rosetta.pdf
Basic interactive mode:
salloc -J interact -N min#nodes-max#nodes --ntasks-per-node=1 --cpus-per-task=#cpus --time=DD-HH:MM:SS --mem=#g -p queuename srun --pty bash
ex.:
salloc -J interact -N 1-1 --ntasks-per-node=1 --cpus-per-task=1 --time=1:00:00 --mem=4g -p debug srun --pty bash
To check your running/queued jobs:
squeue -l -u <userlogin>
ex:
squeue -l -u [email protected]
To get statistics on completed/running jobs for a user:
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
ex.:
sacct -u [email protected] --format=JobID,JobName,MaxRSS,Elapsed
Refer to the manual at https://computing.llnl.gov/linux/slurm/man_index.html for more details on Slurm commands.
James Taylor says basic rsync is really slow.
The data transfer nodes should be used instead:
dtn4.marcc.jhu.edu
dtn5.marcc.jhu.edu
Any files/filesets larger than a few hundred MB should be copied to `/scratch/groups/blangme2` rather than the home directory, as the disk quota for home directories is quite low.
While an individual rsync tops out at ~10 megabytes/s, you can always manually split the file list up and start many parallel rsync jobs. This worked for me to transfer part of the geuvadis set from HHPC to MARCC, as well as from HHPC to JHPCE.
Globus is available, but probably requires setup at both the source and the destination.