Storage & Backups - psu-libraries/scholarsphere GitHub Wiki

File Storage & Backup Requirements

  • The Libraries must be able to permanently delete all copies and backups of a file if required by law or University policy.
  • If a file is deleted by a user (for example, a file attached to a deleted draft work version), the file should be recoverable for a minimum of 30 days. The only exceptions are files the Libraries are required by law or University policy to immediately and permanently delete.
  • Once a work version is published, the associated files cannot be deleted. (Repository managers may, however, withdraw the work version and delete the files if necessary.)
  • Files of published work versions are retained and made accessible (in accordance with the visibility settings of the work) for the lifetime of the repository. They should be replicated in at least two geographic regions and, additionally, backed up on storage at University Park.
  • Files of withdrawn work versions are generally retained, though access is limited to the depositor and repository managers. Repository managers may permanently delete files associated with withdrawn work versions.

Implementation Details

Files deposited to ScholarSphere are uploaded to an AWS S3 bucket (the primary bucket) in us-east-1 (Northern Virginia). The primary bucket is replicated to us-west-2 (Oregon). In addition, files are archived (or "transitioned") to AWS S3 Glacier for long-term backup. It can take up to 24 hours for deposited files to be backed up to S3 Glacier storage.
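As an illustration of what a Glacier "transition" can look like, the sketch below builds an S3 lifecycle configuration in Python. It is not ScholarSphere's actual configuration: the rule ID, the zero-day threshold, and the empty prefix are assumptions made for the example, and this page does not say which bucket the transition rule is attached to.

```python
def glacier_transition_rule(days_before_transition=0, prefix=""):
    """Build a lifecycle configuration that transitions objects to S3 Glacier.

    All values here are illustrative assumptions, not ScholarSphere's settings.
    """
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {
                        "Days": days_before_transition,
                        "StorageClass": "GLACIER",
                    }
                ],
            }
        ]
    }
```

With boto3, a dictionary like this could be applied with `s3.put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=glacier_transition_rule())`, where `example-bucket` is a placeholder name. Lifecycle rules are applied in the background rather than at upload time, which would be consistent with the delay described above.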

Both the primary bucket and the replication bucket have versioning enabled. This means, in part, that deleted files are marked as deleted but remain recoverable. When versioning is enabled and a file is deleted, S3 inserts a "delete marker" as the current version of the file; the file can be recovered by deleting the delete marker.
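The recovery step can be sketched in Python as follows. This is a minimal example rather than a tool ScholarSphere provides: the function name `restore_deleted_object` is hypothetical, and `s3` is assumed to be a boto3 S3 client (for example, `boto3.client("s3")`) with permission to list and delete object versions.

```python
def restore_deleted_object(s3, bucket, key):
    """Remove the current delete marker for `key`, if there is one.

    `s3` is assumed to be a boto3 S3 client. Returns the version ID of the
    removed delete marker, or None if the key is not currently marked deleted.
    """
    # Prefix matching can return other keys that share the prefix,
    # so each marker's key is compared exactly below.
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            # Deleting the marker by version ID makes the previous
            # version of the object current again.
            s3.delete_object(
                Bucket=bucket, Key=key, VersionId=marker["VersionId"]
            )
            return marker["VersionId"]
    return None
```

The sketch reads only the first page of results from `list_object_versions`; a key with a very long version history, or many keys sharing the same prefix, would need pagination.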