CU Cloud Storage - earthlab/earth-lab-operations GitHub Wiki

Options for cloud storage at CU

  • AWS (commercial: Amazon)
  • Google Drive (commercial: Google)
  • OneDrive (commercial: Microsoft)
  • PetaLibrary (internal: CU Research Computing)

Note that each of these has pros, cons, and limits:

  • AWS - has tiers of storage on a temperature gradient from cold ("Glacier") to hot ("S3"). The temperature reflects how often you expect to access the data and how quickly it can be retrieved. For example, Glacier is cheap but slow to access, because archived data has to be restored before you can read it. Use it for large data archives that you rarely access. S3 is great for data you are constantly and actively working with, but it is quite costly, so it doesn't make sense for longer-term storage.
  • Google Drive - has limited storage at CU. The current storage limit is 5 GB per user. That seems archaically small given that free accounts get 15 GB, but it is a contractual limitation for CU based on Google constraints. Google Drive is best suited for ACTIVE COLLABORATIONS. It comes with the G Suite office applications, such as Google Docs, Sheets, and Slides, which are fantastic at version control and real-time collaborative writing/editing. Google outpaces any other commercial provider in real-time collaboration and version control. It's great to use this resource for this purpose, but once the active collaboration ends, you will want to zip your drive and move it to OneDrive (see next bullet).
  • OneDrive - has unlimited storage. This is great for longer-term personal data storage for projects.
  • PetaLibrary - is a CU Research Computing resource for data storage and shared access within the CU network, as well as with external users through Globus. This is a paid service, for which Earth Lab has an account, but it is much cheaper than any commercial option. Please work with the Analytics Hub staff to use this option.

Best Practices for getting the most out of Google Drive and OneDrive

  • Create a folder on Google Drive > My Drive > ACTIVE COLLABORATIONS for work that is in Google Docs, Slides, Sheets, etc. and is actively worked on by big teams of people
  • Create a folder on OneDrive > My Drive > ARCHIVED COLLABORATIONS for data, videos, and past projects where files are more or less “locked” and not dynamic
  • Maintain the same folder and organizational structure in both places, and archive when a project is done (you can easily export a whole folder by zipping it)

Here is an example: [screenshot, 2022-02-18]
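If you already have a finished collaboration folder on your local machine (for example, after downloading it from Google Drive), the "archive when done" step could be scripted roughly as below. This is only a sketch: the function name `archive_folder` and its arguments are hypothetical, and it assumes the standard `zip` command-line tool is installed.

```shell
#!/usr/bin/env bash
# Sketch: zip a finished collaboration folder into a single file for archiving.
# archive_folder is a hypothetical helper name, not an Earth Lab tool.

archive_folder() {
  local src="${1%/}"        # folder to archive (trailing slash removed)
  local dest_dir="${2:-.}"  # where to write the archive (default: current directory)

  if [ ! -d "$src" ]; then
    echo "archive_folder: '$src' is not a directory" >&2
    return 1
  fi
  if [ ! -d "$dest_dir" ]; then
    echo "archive_folder: '$dest_dir' is not a directory" >&2
    return 1
  fi

  local name out
  name="$(basename "$src")"
  out="$(cd "$dest_dir" && pwd)/${name}.zip"

  # Run zip from the parent directory so paths inside the archive start
  # at the folder name and the folder structure is preserved as-is.
  (cd "$(dirname "$src")" && zip -r -q "$out" "$name") || return 1

  # Print the path of the archive that was written.
  echo "$out"
}
```

For example, `archive_folder "ACTIVE COLLABORATIONS/my-project" ~/Desktop` would write `my-project.zip` to the Desktop, ready to upload to the ARCHIVED COLLABORATIONS folder. The project name here is made up.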

Getting set up and accessing data on PetaLibrary

The Earth Lab PetaLibrary identifier is "earthlab".

You can access that allocation by logging into the RC system, then navigating to that directory:

$ cd /pl/active/earthlab/
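If you go to the allocation often, a small helper in your shell profile on the RC system can save some typing and tell you when the path is not visible. This is a minimal sketch, not an RC-provided tool: the function name `goto_earthlab`, the `EARTHLAB_PL` variable, and the optional subfolder argument are all hypothetical. Only the path itself comes from this page.

```shell
#!/usr/bin/env bash
# Sketch: jump into the Earth Lab PetaLibrary allocation once you are
# logged in to the RC system. Names below are illustrative assumptions.

# Allocation path from this page; overridable for trying the helper elsewhere.
EARTHLAB_PL="${EARTHLAB_PL:-/pl/active/earthlab}"

goto_earthlab() {
  local subdir="${1:-}"                           # optional subfolder (hypothetical)
  local target="${EARTHLAB_PL}${subdir:+/$subdir}"

  # PetaLibrary is only mounted on RC systems, so fail with a clear message elsewhere.
  if [ ! -d "$target" ]; then
    echo "goto_earthlab: $target not found; are you logged in to the RC system?" >&2
    return 1
  fi

  # Change directory and print where we ended up.
  cd "$target" && pwd
}
```

Calling `goto_earthlab` with no argument is equivalent to the `cd` command above. Passing a subfolder name changes into that subfolder if it exists.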

You can reach the same location from Globus in much the same way. Log into Globus using your CU credentials, connect to the "CU Boulder Research Computing" collection, log into that RC collection, then enter "/pl/active/earthlab/" in the Path input.

What to do when you leave CU?

The best part of following the above best practices is that it should be easy for you to port your 5 GB of Google proprietary document formats over to Microsoft formats and organize accordingly. Then you can simply copy that OneDrive folder to an external hard drive or port it to PetaLibrary.

For any files that are still in active collaboration and on Google Drive, please open a CIRES IT ticket to transfer ownership of that folder to: [email protected]. Then notify Analytics Hub staff so that they know to maintain that account's data storage and ensure that it is organized in a way that makes sense for provenance at Earth Lab.