gcp Cloud Storage

Maximum object size is 5 TiB (resumable uploads are the recommended method for uploading large objects).
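With the google-cloud-storage Python client, setting a chunk size makes the upload chunked and resumable; a minimal sketch (bucket and file names are placeholders):

from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("big-file.bin")
blob.chunk_size = 8 * 1024 * 1024  # must be a multiple of 256 KiB; forces a chunked, resumable upload
blob.upload_from_filename("big-file.bin")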

Google Cloud provides the following storage options:

  • Zonal standard persistent disk and zonal SSD persistent disk
  • Regional standard persistent disk and regional SSD persistent disk
  • Local SSD for high-performance local block storage
  • Cloud Storage buckets: Object storage.
  • Filestore: High-performance file storage
gsutil mb -p PROJECT_ID gs://BUCKET_NAME
gsutil mb -p PROJECT_ID -c STORAGE_CLASS -l BUCKET_LOCATION -b on gs://BUCKET_NAME
gsutil cp gs://[BUCKET_NAME]/[OBJECT_NAME] [SAVE_TO_LOCAL_LOCATION]  # download an object (note: cp's -p flag preserves ACLs, it does not set a project)
gsutil cp [LOCAL_OBJECT_LOCATION] gs://[DESTINATION_BUCKET_NAME]/    # upload an object
gsutil stat gs://BUCKET_NAME/OBJECT_NAME                             # show object metadata (stat works on objects, not buckets)
gcloud storage mv gs://SOURCE_BUCKET_NAME/SOURCE_OBJECT_NAME gs://DESTINATION_BUCKET_NAME/DESTINATION_OBJECT_NAME

gsutil Flags:

  • -p: Set the project ID for your new bucket
  • -c: Set the default storage class of your bucket
  • -l: Set the location of your bucket. For example, US-EAST1.
  • -b: Enable uniform bucket-level access for your bucket.
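The same bucket creation can be done with the Python client library; a sketch mirroring the flags above (all names are placeholders):

from google.cloud import storage

client = storage.Client(project="PROJECT_ID")
bucket = storage.Bucket(client, name="BUCKET_NAME")
bucket.storage_class = "COLDLINE"                                    # like -c
bucket.iam_configuration.uniform_bucket_level_access_enabled = True  # like -b on
client.create_bucket(bucket, location="US-EAST1")                    # like -l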

Cloud Storage (GCS) stores a different form of data known as unstructured data: it does not structure your data into rows and tables. It only stores a sequence of bytes, exactly the way you store it.

Object-based storage is accessible programmatically versus block storage which can only be accessed via an operating system. Object-based storage is also flat in nature; it does not have the hierarchical structure of file systems.

Objects are stored in buckets.

Object Lifecycle Management (OLM) allows you to define conditions for your data that, when met, automatically move objects to a lower storage class to help reduce monthly bills (see the sketch under the Object Lifecycle Management heading below).

Storage Classes

| Storage Class | gsutil name | Min duration | Availability (multi-region / region) | Description |
| --- | --- | --- | --- | --- |
| Standard | STANDARD | none | >99.99% / 99.99% | frequently accessed data |
| Nearline | NEARLINE | 30 days | 99.95% / 99.9% | accessed at most once a month; backups |
| Coldline | COLDLINE | 90 days | 99.95% / 99.9% | accessed at most once a year; archiving and disaster recovery |
| Archive | ARCHIVE | 365 days | 99.95% / 99.9% | rarely accessed; long-term archive |

Object Lifecycle Management
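A lifecycle configuration can also be managed from the Python client library; a minimal sketch with illustrative rule values (move objects to Nearline after 30 days, delete after a year):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("BUCKET_NAME")
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)  # downgrade storage class after 30 days
bucket.add_lifecycle_delete_rule(age=365)                        # delete objects older than a year
bucket.patch()                                                   # persist the rules on the bucket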

Availability

  • multi-region
  • dual-region
  • regional

Access Control

  • Cloud IAM
  • ACL: controls who can access objects in a bucket (max 100 ACL entries per object). Permission: Owner, Writer, Reader. Scope: a user or group, e.g. AllUsers, AllAuthenticatedUsers. ACL entry = scope + permission.
  • Signed URL (time-limited): gsutil signurl -d 10m path/to/privatekey.p12 gs://bucket/object; the permitted operation (HTTP GET, PUT, DELETE) is embedded in the URL (see the sketch after this list).
  • Signed policy document
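Signed URLs can also be generated programmatically; a sketch using the Python client (requires service-account credentials with a private key; bucket and object names are placeholders):

from datetime import timedelta
from google.cloud import storage

client = storage.Client()
blob = client.bucket("BUCKET_NAME").blob("OBJECT_NAME")
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=10),  # like gsutil signurl -d 10m
    method="GET",                      # the only operation the URL permits
)
print(url)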

Object Versioning

Object Versioning cannot be enabled on a bucket that currently has a retention policy.

gsutil versioning set STATE gs://BUCKET_NAME  # STATE: on or off
gsutil versioning get gs://BUCKET_NAME
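The equivalent with the Python client, plus listing all noncurrent versions (like gsutil ls -a below); bucket name is a placeholder:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("BUCKET_NAME")
bucket.versioning_enabled = True   # False suspends versioning
bucket.patch()
for blob in client.list_blobs("BUCKET_NAME", versions=True):
    print(blob.name, blob.generation)  # each generation is a separate version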

Retention policies

Retention periods

Retention periods are measured in seconds

Retention policy locks

Locking a retention policy on a bucket prevents the policy from ever being removed and the retention period from ever being reduced (although you can still increase the retention period). If you try to remove the policy, or reduce its duration, on a locked bucket, you get a 400 BadRequestException error. Once a retention policy is locked, you cannot delete the bucket until every object in the bucket has met the retention period.

Locking a retention policy is irreversible: the only way to "remove" the policy is to delete the entire bucket, after all objects in it have fulfilled their retention period.

gsutil retention set 2678400s gs://BUCKET_NAME  # set retention policy to 2678400 s (31 days)
gsutil retention clear gs://BUCKET_NAME  # remove a retention policy while it is not locked
gsutil retention lock gs://BUCKET_NAME   # permanently lock the retention policy
gsutil retention get gs://BUCKET_NAME    # show the retention policy
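The same retention operations via the Python client; a sketch (remember that locking is irreversible):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("BUCKET_NAME")
bucket.retention_period = 2678400   # seconds, like gsutil retention set
bucket.patch()
bucket.lock_retention_policy()      # permanent: cannot be undone
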
  • Object Change Notification: webhook-based
  • Pub/Sub Object Change Notification: preferred (faster and more cost-effective)

Strong Global Consistency

  • Read-after-write
  • Read-after-metadata-update
  • Read-after-update
  • Read-after-delete
  • Bucket listing
  • Object listing
  • Granting access to resources

gcloud config set project PROJECT
gsutil cp gs://[BUCKET_NAME]/[OBJECT_NAME] [SAVE_TO_LOCAL_LOCATION] # download the object to a local location

Object holds

Cloud Storage offers the following types of holds:

  • Event-based holds: use in conjunction with retention policies to control retention based on the occurrence of some event, such as holding loan documents for a certain period after a loan is paid off.
  • Temporary holds: used for regulatory or legal purposes, such as holding trading documents during a legal investigation.

gsutil retention event-default STATE gs://BUCKET_NAME # STATE: "set" to use default event-based holds, "release" to not use them
gsutil retention HOLD_TYPE STATE gs://BUCKET_NAME/OBJECT_NAME # HOLD_TYPE: event or temp; STATE: set or release
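Holds can also be set per object with the Python client; a minimal sketch (bucket and object names are placeholders):

from google.cloud import storage

client = storage.Client()
blob = client.bucket("BUCKET_NAME").get_blob("OBJECT_NAME")
blob.event_based_hold = True   # or blob.temporary_hold = True
blob.patch()                   # release later by setting the flag to False and patching again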

Recursively copy all your objects from the source bucket to the destination bucket

gsutil cp -r gs://SOURCE_BUCKET/* gs://DESTINATION_BUCKET
gsutil -m cp -r gs://gcs-bucket-name/batch gcs-bucket-name-local

-m option - perform a parallel multi-threaded/multi-processing copy
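A comparable parallel download in Python, sketched with the transfer_manager helper (available in newer google-cloud-storage releases; bucket and directory names mirror the example above):

from google.cloud import storage
from google.cloud.storage import transfer_manager

client = storage.Client()
bucket = client.bucket("gcs-bucket-name")
blob_names = [b.name for b in client.list_blobs(bucket, prefix="batch/")]
transfer_manager.download_many_to_path(
    bucket, blob_names,
    destination_directory="gcs-bucket-name-local",
    max_workers=8,  # parallel workers, like gsutil -m
)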

List buckets

gsutil ls

Viewing the IAM policy for a bucket

gsutil iam get gs://BUCKET_NAME
gcloud projects get-iam-policy my_project
gcloud projects add-iam-policy-binding xxx --member "serviceAccount:[email protected]" --role "roles/storage.objectViewer"
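The same role binding can be applied at bucket level with the Python client; a sketch (the service-account email is a hypothetical placeholder):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("BUCKET_NAME")
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)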

Versioning

gsutil versioning set (on|off) gs://<bucket_name>...  # enable or suspend versioning
gsutil versioning get gs://<bucket_name>...

# When Bucket versioning is enabled, we can get all versions of a file by typing
gsutil ls -a gs://path/to/file

Public access

gsutil acl ch -u AllUsers:R gs://my-bucket-akepka-9/index.html
cd ~
curl https://raw.githubusercontent.com/hashicorp/learn-terraform-modules/master/modules/aws-s3-static-website-bucket/www/index.html > index.html
curl https://raw.githubusercontent.com/hashicorp/learn-terraform-modules/master/modules/aws-s3-static-website-bucket/www/error.html > error.html
gsutil cp *.html gs://YOUR-BUCKET-NAME
https://storage.cloud.google.com/YOUR-BUCKET-NAME/index.html
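The ACL change can also be made with the Python client; a sketch (make_public fails on buckets with uniform bucket-level access, where IAM must be used instead):

from google.cloud import storage

client = storage.Client()
blob = client.bucket("YOUR-BUCKET-NAME").blob("index.html")
blob.make_public()       # equivalent to acl ch -u AllUsers:R
print(blob.public_url)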

Storage notification

gsutil notification create -f (json|none) [-p <prefix>] [-t <topic>] \
    [-m <key>:<value>]... [-e <eventType>]... gs://<bucket_name>
gsutil notification delete (<notificationConfigName>|gs://<bucket_name>)...
gsutil notification list gs://<bucket_name>...

gsutil notification watchbucket [-i <id>] [-t <token>] <app_url> gs://<bucket_name>
gsutil notification stopchannel <channel_id> <resource_id>

Event types:

  • OBJECT_FINALIZE - An object has been created.
  • OBJECT_METADATA_UPDATE - The metadata of an object has changed.
  • OBJECT_DELETE - An object has been permanently deleted.
  • OBJECT_ARCHIVE - A live version of an object has become a noncurrent version.

Example

gsutil mb gs://BUCKET_NAME
gcloud pubsub topics create TOPIC_NAME
gcloud pubsub subscriptions create SUBSCRIPTION_NAME --topic TOPIC_NAME
gsutil notification create -f json -t <TOPIC_NAME> gs://<BUCKET_NAME>
gsutil cp FILE_NAME gs://BUCKET_NAME
gcloud pubsub subscriptions pull SUBSCRIPTION_NAME --auto-ack
gsutil rm gs://BUCKET_NAME/FILE_NAME
gcloud pubsub subscriptions pull SUBSCRIPTION_NAME --auto-ack
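To process these notifications in code, a sketch using the google-cloud-pubsub package; the eventType attribute carries the values listed above (project and subscription names are placeholders):

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("PROJECT_ID", "SUBSCRIPTION_NAME")
response = subscriber.pull(request={"subscription": sub_path, "max_messages": 10})
for msg in response.received_messages:
    attrs = msg.message.attributes
    print(attrs["eventType"], attrs["objectId"])  # e.g. OBJECT_FINALIZE my-file.txt
if response.received_messages:
    subscriber.acknowledge(request={
        "subscription": sub_path,
        "ack_ids": [m.ack_id for m in response.received_messages],
    })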

Grep on bucket

gsutil cat gs://bucket/* | grep "what you want to grep"
$ python bucket_grep.py bucket_name pattern directory_if_any

$ cat bucket_grep.py
import re
import sys

from google.cloud import storage

client = storage.Client()
BUCKET_NAME = sys.argv[1]
PATTERN = sys.argv[2]
PREFIX = sys.argv[3] if len(sys.argv) > 3 else ""  # optional "directory" prefix

def walk(bucket_name, prefix=""):
    """Yield every object under the prefix, skipping directory placeholders."""
    bucket = client.bucket(bucket_name)
    for blob in bucket.list_blobs(prefix=prefix):
        if not blob.name.endswith("/"):
            yield blob

# Print the name of every object whose text content matches the pattern.
for blob in walk(BUCKET_NAME, prefix=PREFIX):
    content = blob.download_as_bytes().decode("utf-8")
    if re.search(PATTERN, content):
        print(blob.name)