3. Storage, and navigating the HPC area
EI currently has approximately 18.5 petabytes (PB) of storage in various forms, although a significant portion of this supports our multiple backup processes and our archive system. The primary storage area used by researchers (often called our Isilon storage) is smaller, at 5 PB. This space is divided into the public area (/ei/public/), which is accessible to all users, and several other areas where access permissions are rigorously managed. These include legacy group areas, individual user spaces, and larger project-based allocations, which we use to facilitate access management across the Norwich Research Park (NRP).
‘Projects’ in this sense are nominally distinct pieces of work that vary in size and often (but not always) relate directly to a grant. Each project is assigned a universally unique identifier (UUID), such as 'b5b1d71e-3528-49eb-b757-fc61984d2b79', and is allocated a specific directory within /ei/projects/. This directory includes a 'data' area, which is regularly snapshotted, and a 'scratch' area, which is not. Snapshots are a feature of the Isilon system that periodically capture the current state of files, allowing users to revert changes and restore files to previous versions if needed. The 'data' and 'scratch' areas of a project have tiered size quotas:
- A hard (maximum) quota, at which further writes are denied.
- A soft quota, at 90% of the hard quota, which denies further writes if it is exceeded continuously for 30 days.
- An advisory quota, at 80% of the hard quota, which triggers an email to your Data Champion (DC) and PI.
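As a worked illustration of the tiers above, the following sketch classifies a project's usage against its hard quota. It is not the storage system's own logic: the `quota_status` function and `QuotaStatus` fields are hypothetical, usage figures are supplied by the caller, and it assumes "exceeded" means strictly above the 80% and 90% thresholds. For a 10 TB hard quota, the advisory threshold is 8 TB and the soft quota is 9 TB.

```python
from dataclasses import dataclass

# Tier thresholds from the text, as percentages of the hard quota.
ADVISORY_PERCENT = 80
SOFT_PERCENT = 90
SOFT_GRACE_DAYS = 30


@dataclass
class QuotaStatus:
    advisory_exceeded: bool  # would trigger an email to the Data Champion and PI
    soft_exceeded: bool      # usage is above 90% of the hard quota
    writes_denied: bool      # hard quota reached, or soft quota exceeded for 30+ days


def quota_status(used: int, hard_quota: int, days_over_soft: int = 0) -> QuotaStatus:
    """Hypothetical helper: classify usage against the tiered quotas.

    `used` and `hard_quota` may be in any consistent unit (bytes, GB, and so on).
    `days_over_soft` is the number of consecutive days usage has been above
    the soft quota, supplied by the caller.
    """
    if hard_quota <= 0:
        raise ValueError("hard_quota must be positive")
    # Integer comparisons avoid floating-point rounding at the thresholds.
    advisory = used * 100 > ADVISORY_PERCENT * hard_quota
    soft = used * 100 > SOFT_PERCENT * hard_quota
    denied = used >= hard_quota or (soft and days_over_soft >= SOFT_GRACE_DAYS)
    return QuotaStatus(advisory, soft, denied)


if __name__ == "__main__":
    # A 10 TB hard quota expressed in GB: advisory above 8 TB, soft above 9 TB.
    print(quota_status(8_500, 10_000))
    print(quota_status(9_500, 10_000, days_over_soft=30))
```

With these numbers, 8.5 TB of usage crosses only the advisory tier, while 9.5 TB sustained for 30 days over the soft quota means writes are denied, just as if the hard quota had been reached.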
To request a new project or a quota increase, contact your group's Principal Investigator (PI) and Data Champion. Temporary or intermediate files written to the data area will unavoidably be captured by snapshots, so please use the data area primarily for files that are relatively static or time-consuming to reproduce; accidental changes to, or deletions of, these files can then be restored from a snapshot. Ongoing work is best carried out in the scratch area, which is not snapshotted.
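The guidance on choosing between the two areas can be sketched as a small helper that validates a project UUID and returns the matching directory. This is an illustration under stated assumptions: the `data` and `scratch` subdirectory names directly under `/ei/projects/<UUID>/`, and the `project_area` function itself, are hypothetical, so check the layout of your own project directory before relying on it.

```python
import uuid
from pathlib import PurePosixPath

# Assumed layout: /ei/projects/<UUID>/data and /ei/projects/<UUID>/scratch.
# The subdirectory names are illustrative, not confirmed.
PROJECTS_ROOT = PurePosixPath("/ei/projects")


def project_area(project_id: str, easy_to_reproduce: bool) -> PurePosixPath:
    """Hypothetical helper: choose the data or scratch area for a file.

    Files that are static or costly to reproduce belong in the snapshotted
    data area; temporary or intermediate files belong in scratch.
    """
    # uuid.UUID raises ValueError if the identifier is malformed, and
    # str() returns the canonical lowercase, hyphenated form.
    canonical = str(uuid.UUID(project_id))
    area = "scratch" if easy_to_reproduce else "data"
    return PROJECTS_ROOT / canonical / area


if __name__ == "__main__":
    pid = "b5b1d71e-3528-49eb-b757-fc61984d2b79"
    print(project_area(pid, easy_to_reproduce=True))   # intermediate output
    print(project_area(pid, easy_to_reproduce=False))  # final results
```

Using Python's standard `uuid` module means a mistyped identifier raises `ValueError` instead of silently producing a wrong path. `PurePosixPath` only builds the path string; it does not check that the directory exists.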
Reports on current disk usage are available at https://reporting.researchcomputing.nbi.ac.uk/. These give PIs and Data Champions easy visibility of usage in project areas, along with personal usage details.
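The reporting site is the place to check usage. For a quick, rough cross-check of a single directory from the command line, a sketch like the following sums the apparent sizes of the files beneath it. The `apparent_usage_bytes` helper is hypothetical, and its total may not match the figures used for quota enforcement, since the storage system may account for usage differently (for example, through block allocation or snapshots). On a large directory tree the walk can also take a long time.

```python
import os


def apparent_usage_bytes(root: str) -> int:
    """Hypothetical helper: sum the apparent sizes of files under `root`.

    A rough local estimate only; it is not the storage system's quota figure.
    """
    total = 0
    # os.walk does not follow directory symlinks by default and silently
    # skips directories it cannot list.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip file symlinks so linked files are not counted twice.
            if os.path.islink(path):
                continue
            try:
                total += os.stat(path).st_size
            except OSError:
                # File vanished or is inaccessible; skip it.
                continue
    return total


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{apparent_usage_bytes(target) / 1e12:.3f} TB (apparent size)")
```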
How to mount HPC storage areas for direct access from a local system varies by operating system; this is covered in detail in a later lesson.
For further documentation and support, visit the Research Computing (RC) documentation site and use the RC ticketing system (https://researchcomputing.nbi.ac.uk/).