Cineca HPC

ℹ️ Suspension: at the moment, the Cineca HPC services are not used by AImageLab, due to a lack of compatibility between the two infrastructures.

AImageLab regularly uses systems at Cineca HPC. Unlike aimagelab-srv, Cineca HPC is an external service and does not share any part of the AImageLab filesystem.

At present, Cineca HPC is mainly used:

  • by the AImageLab staff, to execute long-running, non-critical jobs
  • by AImageLab master students, to execute experiments of any kind

Requested projects

Access to Cineca HPC facilities is subject to the activation of appropriate ISCRA projects. The AImageLab staff usually requests a new project shortly before the currently active project (marked with ✔) expires.

| Project number | PI | Name/Account | Grant type | HPC Platform | Budget (hrs) | Starts | Expires | Status |
|---|---|---|---|---|---|---|---|---|
| IsC38 | Cucchiara | DeepVid | Iscra C | Galileo | 6 000 | 9 February 2016 | 9 November 2016 | |
| IsC47 | Cucchiara | DeepVD | Iscra C | Galileo | 13 000 | 7 December 2016 | 7 September 2017 | |
| IsC55 | Cucchiara | DeepCA | Iscra C | New Galileo | 50 000 | 12 October 2017 | 12 July 2018 | |
| IsC62 | Baraldi | DeepVS | Iscra C | New Galileo/DAVIDE | 10 000 | 8 June 2018 | 8 March 2019 | |

Legend: ✘ Expired ✔ Active ⌛ Pending review/activation

Become a user

Getting a personal username on Cineca HPC requires a few easy steps:

  1. Register on the UserDB portal. After creating an account, upload your identity card to your profile and wait for the Cineca HPC staff to validate it (expect a delay of a few days). Once that is done, your document will appear as "verified" on your profile.
  2. Open an issue on the aimagelab-srv repository and ask to be associated with a valid project as a collaborator.
  3. You will be given the access information (login name and password) via e-mail.

Access to the New Galileo

The New Galileo can be accessed from login.galileo.cineca.it. Use the credentials received from Cineca HPC.
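
For example, assuming access is via standard SSH (the placeholder below stands for your personal login name):

ssh <username>@login.galileo.cineca.it    # <username> is the login name received from Cineca HPC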

Accounting

HPC resources at Cineca can be used on a "pay for use" basis.

Currently, the cost is based on the elapsed time and the number of cores reserved by the batch jobs (reserved, not actually used!).

The command saldo -b shows all the accounts defined for your username, together with the available budgets. A more detailed output can be obtained by running saldo -r, which also reports the amount of resources consumed by each user of the project.
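
For example, from a login shell:

saldo -b    # list your accounts and the remaining budgets
saldo -r    # more detailed report, including per-user consumption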

The elapsed time spent by the users' batch jobs, multiplied by the number of reserved cores (core-h), is used to decrease the account budget; the update occurs once a day, at midnight, taking into account all jobs completed in the past 24 hours.
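
As a purely illustrative example: a job that runs for 10 hours on 36 reserved cores consumes 10 × 36 = 360 core-h, which will be deducted from the project budget at the next nightly update.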

If you are a student, you should ask your manager for a quota (in terms of core-h) and monitor the cost of your jobs accordingly. Since the amount of available resources is limited, we take this quite seriously. There is a wiki page which lists all users along with their quotas.

Storage areas

There are two different storage areas that you might use.

$HOME: permanent/backed up, user specific, local

This is a local area where you are placed after the login procedure. This area is conceived to store programs and small personal data. Files are never deleted from this area; moreover, they are protected by daily backups.

$WORK: permanent, project specific, local

This is a scratch area for collaborative work within a given project. File retention is tied to the life of the project: files in $WORK are kept for up to 6 months after the end of the project, then they are deleted. Please note that there is no backup for this area.

This area is conceived for hosting large working data files, since it is hosted on a high-bandwidth parallel file system. It performs very well when I/O is done on large blocks of data, while it is not well suited for frequent, small I/O operations. This is the main area for keeping scratch files produced by batch processing.

The owner of the main directory is the PI (Principal Investigator) of the project, who is usually either Prof. Cucchiara or a member of the AImageLab staff. All users are allowed to read/write in there, and are advised to create a personal directory in $WORK for storing their personal files, as shown below.
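
A minimal sketch of this, assuming $WORK is already set by the system and using your username as the directory name (any name works):

mkdir -p "$WORK/$USER"    # personal sub-directory inside the project area
cd "$WORK/$USER"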

You can transfer data to/from Cineca by opening an SFTP connection to the login node.
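
For example, with a standard command-line client (file names are placeholders):

sftp <username>@login.galileo.cineca.it
# inside the sftp session:
#   put local_results.tar.gz     # upload a file
#   get remote_results.tar.gz    # download a file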

Environment Modules

All software programs installed on the CINECA machines are available as modules.

In order to have a list of available modules and select a specific one you have to use the module command. The following table contains its basic options:

Command                Action
----------------------------------------------------------------------------------------------
module avail ......... show the modules available on the machine
module load <appl> ... load the module <appl> in the current shell session, preparing the environment for the application
module help <appl> ... show specific information and basic help on the application
module list .......... show the modules currently loaded in the shell session
module purge ......... unload all the loaded modules
module unload <appl> . unload a specific module
----------------------------------------------------------------------------------------------

As you will see by typing "module avail", the software modules are collected in different profiles (base, advanced, ...) and organized by functional category (compilers, libraries, tools, applications, ...).
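
For instance (the module name below is just an example; the tool prints its listing on standard error):

module avail 2>&1 | less    # browse the available modules by profile and category
module help cuda/9.0        # basic information about a specific module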

Feel free to load all the modules you might need. Here is a (non-exhaustive) list of modules that might be useful for most deep-learning applications:

module load python/3.6.4
module load profile/deeplrn
module load cuda/9.0
module load gnu/4.9.2
module load cudnn/7.0--cuda--9.0

Remember that you can always compile/install the software you need in your user space. For instance, if you need to use PyTorch, you can just install it in your home folder:

pip3 install --user numpy          # prevent compatibility issues
pip3 install --user torch
pip3 install --user torchvision
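
A quick way to check that the user-space installation is picked up (a sketch, assuming the python and cuda modules listed above are loaded; on a login node without GPUs the second value will simply be False):

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"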

Using the SLURM scheduler

The New Galileo and Marconi use the SLURM scheduler, just like aimagelab-srv. If you are not familiar with SLURM, see this wiki page for a short introduction.

When working on the New Galileo, you should always use the partition gll_usr_gpuprod, which contains GPU-equipped nodes.

For example, to have an interactive shell with one GPU, just run:

srun --partition gll_usr_gpuprod --gres=gpu:1 --pty bash

You can also, of course, create batch jobs.
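
Below is a minimal sketch of a batch script, assuming the partition above and placeholder values for the job name, walltime, project account, and script name; save it as, e.g., job.sh and submit it with sbatch job.sh:

#!/bin/bash
#SBATCH --job-name=example             # job name (placeholder)
#SBATCH --partition=gll_usr_gpuprod    # GPU partition on the New Galileo
#SBATCH --gres=gpu:1                   # request one GPU
#SBATCH --time=04:00:00                # walltime limit (example value)
#SBATCH --account=<project>            # ISCRA project to charge (placeholder)

module load profile/deeplrn
module load python/3.6.4 cuda/9.0 cudnn/7.0--cuda--9.0
python3 my_experiment.py               # placeholder for your own script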
