1.3 File Storage, Check Space, Recover Lost File - bu-rcs/SA-Biostatistics GitHub Wiki

Where Should you Store your Files?

Users on the SCC are automatically granted several locations to store their files. Our overall file storage system is described here. Most users will be primarily storing files in three areas, all of which are generally accessible from all of the login and compute nodes; the exception is that the /restricted/ partitions are only accessible from the scc4.bu.edu login node and all of the compute nodes:

  1. Home Directory – This directory is entirely controlled by you and the default permissions are that nobody else can see or otherwise access your files. Home directories have a quota of 10 GB and this will generally not be increased. You will naturally store files directly related to your account here, such as dotfiles. It is also commonly used to store personal files, such as email or personal images. Although it is possible to do work in your home directory if it fits within the 10 GB limit, we recommend you use Project Disk Space in case you end up needing more space than you anticipate. Home directories are both protected by Snapshots and backed up off site.

  2. Backed Up Project Disk Space – Projects are by default granted 50 GB of space under /restricted/project/project_name/ for most BUMC projects. This can be increased to a maximum of 200 GB at the request of the project leader(s), but not beyond that. This data is both protected by Snapshots and backed up off site. Depending on the workflow of the project, a reasonable approach is to keep code and files you hand-edit in /restricted/project/ and files downloaded or generated by code or applications in /restricted/projectnb/.

  3. Not Backed Up Project Disk Space – Projects are by default granted 50 GB of space under /restricted/projectnb/project_name/ for most BUMC projects. This can be increased at no cost up to a maximum total allocation of 1000 GB of free disk space; beyond that, additional Not Backed Up space can be purchased through either Buy-In or Storage-as-a-Service. Despite its name, this space is protected by both hardware RAID (protecting against disk failures) and daily Snapshots (protecting against accidental deletion of files). Use this space for any large quantities of data you have. We have guidelines for what data should be stored in each partition.

Checking How Much Space you are Using

  • You can easily check your quota on SCC OnDemand.

  • Use the command pquota to see your quota and usage:
[aaa@scc4 ~] pquota -u animate
                                      quota      usage     usage
project space                          (GB)       (GB)   (files)
-----------------------------------  ------  ---------  --------
/project/animate                         50       0.00         1

/projectnb/animate                       50       3.45      4328
    15407                                         0.09        80
    73043                                         0.25        61
    82363                                         0.11       243
    dcornell                                      0.29       104
    laura                                         1.02      2114
    rcrnl                                         1.68      1723
    root                                          0.00         3

The -u option asks for a breakdown of usage by the users on the project, in addition to the default project totals. Information on quota (in GB), usage (in GB), and number of files is given for each partition the selected project group has access to. If there are any numbers instead of login names in the list, as in the example above, they refer to files owned by users who had accounts on the system long ago. Note that the pquota data on most filesystems is updated every five minutes, so if you delete some files, you will need to wait a few minutes to see the change reflected by the command.
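
As a rough illustration, the project totals in pquota output can be turned into a percent-of-quota figure. The sketch below assumes the column layout shown in the example above (path, quota in GB, usage in GB, file count); usage_percent is a hypothetical helper name, not a command provided on the SCC.

```shell
# usage_percent: read pquota-style output on stdin and print
# "<partition path> <percent of quota used>" for each project total row.
# Assumption: total rows start with "/" and have four columns;
# per-user rows (which do not start with "/") are skipped.
usage_percent() {
    awk '$1 ~ /^\// && NF >= 4 && $2 > 0 {
        printf "%s %.1f%%\n", $1, 100 * $3 / $2
    }'
}
```

If the real output matches this layout, running pquota -u animate | usage_percent on the example above would print 0.0% for /project/animate and 6.9% for /projectnb/animate.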

  • For home directories, the command quota -s shows quota (10 GB for almost all users) and usage:
[aaa@scc4 ~] quota -s
Home Directory Usage and Quota:
Name           GB    quota    limit in_doubt    grace |    files    quota    limit in_doubt    grace
adftest2  0.00212     10.0     11.0      0.0     none |      287        0        0        0     none
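
A similar check can flag a home directory that is close to its quota. This is only a sketch: it assumes the quota -s layout shown above (name, GB used, quota, ...), and both the helper name home_over_threshold and the 90% default are hypothetical choices made for illustration.

```shell
# home_over_threshold: read `quota -s`-style output on stdin and print a
# line for every row whose usage exceeds the given fraction of its quota.
home_over_threshold() {
    local frac="${1:-0.9}"   # assumed default: warn above 90% of quota
    awk -v frac="$frac" '
        # Data rows only: fields 2 and 3 are numeric (GB used, quota in GB).
        $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ && $3 > 0 {
            if ($2 / $3 > frac)
                printf "%s: %.2f of %.1f GB used\n", $1, $2, $3
        }'
}
```

For example, quota -s | home_over_threshold 0.8 would print nothing for the account shown above, since it uses far less than 80% of its 10 GB quota.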

Retrieving Lost Files from Snapshots

SCC home directories and Project Disk Space have a feature called snapshots. Snapshots are copies of files stored within the file system, which makes them useful and convenient for retrieving files that are accidentally deleted, but not useful in the event of catastrophic failure. You have access to the snapshots of your files. Each directory contains a hidden subdirectory called .snapshots. It is not visible in an ls -a listing, but you can cd into it:

[aaa@scc4 ~] cd .snapshots
[aaa@scc4 .snapshots] ls
160514/ 160515/ 160516/ 160517/

The directory names use the form YYMMDD to represent the day that the snapshot was created. Snapshots are taken at 12:01am every day. Regardless of the permissions that appear to be set on the snapshots of your files, you cannot overwrite or remove them. You can, however, copy them to your directory in the main file system. The snapshots in no way count against your allocation; you can ignore how much file space they take up.

In short, you can recover lost files by:

  1. cd .snapshots
  2. ls to see which daily snapshots (YYMMDD directories) are available
  3. cd YYMMDD and ls to list the files within that snapshot directory
  4. cp file1 /yourdestinationfolder/ to recover the lost file1

Note that cp may not work as expected if the destination folder already contains a file with the same name as the one you want to copy. Either delete the existing file in the destination folder or rename it before running cp.
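
The recovery steps above can be sketched as a small shell function. This is an illustrative sketch, not an SCC-provided tool: recover_from_snapshot and its arguments are hypothetical names, and it assumes the .snapshots/YYMMDD layout described in this section. An existing file of the same name in the destination is renamed with a .bak suffix before the copy.

```shell
# recover_from_snapshot DIR YYMMDD FILE DEST
# Copy FILE from DIR/.snapshots/YYMMDD into DEST.
recover_from_snapshot() {
    local dir="$1"    # directory that used to hold the lost file
    local day="$2"    # snapshot directory name in YYMMDD form, e.g. 160517
    local name="$3"   # name of the file to recover
    local dest="$4"   # destination folder in the main file system
    local src="$dir/.snapshots/$day/$name"

    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    # Rename any existing file in the destination so it is not lost.
    if [ -e "$dest/$name" ]; then
        mv "$dest/$name" "$dest/$name.bak"
    fi
    # -p keeps the original timestamps on the recovered copy.
    cp -p "$src" "$dest/"
}
```

A call such as recover_from_snapshot "$HOME" 160517 file1 "$HOME" would copy file1 from the 160517 snapshot back into the home directory, if that snapshot exists and contains the file.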






Ref: https://www.bu.edu/tech/support/research/system-usage/using-scc/managing-files/
