Storage - bobbae/gcp GitHub Wiki

https://en.wikipedia.org/wiki/Computer_data_storage

Types of Storage

Google Cloud Storage Products

https://cloud.google.com/products/storage

Google Cloud Compute Engine Storage options

Compute Engine offers several types of storage options for your instances. Each of the following storage options has unique price and performance characteristics:

Increasing disk space of VM instance without down time

https://gridpane.com/kb/how-to-increase-the-disk-space-of-your-google-vm-instance-with-no-downtime/

File Systems

A file system is a hierarchical method for organizing and storing data on a computer system.

Kubernetes Storage

By default, storage inside a Kubernetes pod is ephemeral and does not outlive the pod, but Kubernetes offers several mechanisms, such as PersistentVolumes and PersistentVolumeClaims, to provide persistent storage.

https://www.cncf.io/blog/2020/04/28/a-complete-storage-guide-for-your-kubernetes-storage-problems/

Block Storage

Discuss what kinds of block storage devices you have in your computers.

Differences between block and file storage

https://www.atlantic.net/dedicated-server-hosting/what-is-block-storage/

GCP Block storage resources

Block storage resources have different performance characteristics. Consider your storage size and performance requirements to help you determine the correct block storage type for your instances.

SCSI

Small Computer System Interface (SCSI) is a set of standards for physically connecting and transferring data between computers and peripheral devices. The standards define commands, protocols, and electrical, optical, and logical interfaces, as well as command sets for specific peripheral device types.

https://www.lifewire.com/small-computer-system-interface-scsi-2626002

SCSI Hard drives

https://www.servermonkey.com/blog/servers-101-hdd-interface-comparison-sata-vs-scsi-vs-sas.html

Disk arrays

https://en.wikipedia.org/wiki/Disk_array

Types of Hard drives

https://ttrdatarecovery.com/types-of-hard-drives-user-guide/

Disk scheduling

Elevator Algorithm

https://wikipedia.org/wiki/Elevator_algorithm

SCAN

https://www.geeksforgeeks.org/scan-elevator-disk-scheduling-algorithms/

C-SCAN

https://www.geeksforgeeks.org/c-scan-disk-scheduling-algorithm/
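As a rough illustration of the two algorithms linked above, the following Python sketch computes the service order and total head movement for SCAN and C-SCAN. The function names, the request queue, the starting head position (50), and the disk size (cylinders 0 to 199) are illustrative assumptions, and both sweeps are assumed to start toward higher cylinder numbers. The C-SCAN total counts the return jump to cylinder 0 as head movement, which some textbooks omit.

```python
def head_movement(start, path):
    """Sum the distance the head travels visiting each cylinder in path."""
    moved, position = 0, start
    for cylinder in path:
        moved += abs(cylinder - position)
        position = cylinder
    return moved


def scan(requests, head, max_cylinder):
    """SCAN: sweep upward to the disk edge, then reverse and sweep downward."""
    upper = sorted(r for r in requests if r >= head)
    lower = sorted((r for r in requests if r < head), reverse=True)
    path = list(upper)
    if lower:
        # Classic SCAN travels to the last cylinder before reversing.
        if not upper or upper[-1] != max_cylinder:
            path.append(max_cylinder)
        path += lower
    return upper + lower, head_movement(head, path)


def cscan(requests, head, max_cylinder):
    """C-SCAN: sweep upward, jump back to cylinder 0, sweep upward again."""
    upper = sorted(r for r in requests if r >= head)
    lower = sorted(r for r in requests if r < head)
    path = list(upper)
    if lower:
        if not upper or upper[-1] != max_cylinder:
            path.append(max_cylinder)
        if lower[0] != 0:
            path.append(0)  # return jump, serviced nothing on the way
        path += lower
    return upper + lower, head_movement(head, path)


queue = [82, 170, 43, 140, 24, 16, 190]  # hypothetical pending requests
scan_result = scan(queue, head=50, max_cylinder=199)
cscan_result = cscan(queue, head=50, max_cylinder=199)
print(scan_result)   # ([82, 140, 170, 190, 43, 24, 16], 332)
print(cscan_result)  # ([82, 140, 170, 190, 16, 24, 43], 391)
```

SCAN services the low requests in descending order on the way back, while C-SCAN services them in ascending order after the jump, which gives more uniform wait times at the cost of extra travel.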

RAID

Redundant Array of Inexpensive Disks (RAID), also expanded as Redundant Array of Independent Disks, is a data storage virtualization technology that combines multiple physical disk drives into one or more logical units for data redundancy, performance improvement, or both.

Discuss RAID-5 parity.

https://www.open-e.com/blog/how-does-raid-5-work/
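As a starting point for the parity discussion, here is a minimal Python sketch of the XOR idea behind RAID-5. The helper name `xor_blocks` and the sample blocks are illustrative assumptions. A real array also rotates the parity block across the drives from stripe to stripe, which this sketch does not model.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each living on a different drive.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]: XOR-ing the surviving
# data blocks with the parity block reproduces the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, any single missing block in a stripe can be recomputed from the others, which is why RAID-5 tolerates one drive failure but not two.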

Linux software RAID

https://www.thomas-krenn.com/en/wiki/Linux_Software_RAID_Information

SSD

A solid-state drive (SSD) is a solid-state storage device that uses integrated circuit assemblies to store data persistently, typically using flash memory, and functioning as secondary storage in the hierarchy of computer storage. SSDs lack the physical spinning disks and movable read–write heads used in hard disk drives (HDDs) and floppy disks.

SSD vs HDD

https://www.pcmag.com/news/ssd-vs-hdd-whats-the-difference

Compare various Filesystems for SSD

https://www.linux.org/threads/comparison-of-file-systems-for-an-ssd.28780/

Linux Block devices

https://www.dell.com/support/kbdoc/en-us/000132092/ubuntu-linux-terms-for-your-hard-drive-and-devices-explained

/dev/sda9

/dev/ is the part of the Unix directory tree that contains the "device" files. Unix traditionally treats just about everything you can access as a file to read from or write to.

sd originally identified a SCSI disk, but as USB and other removable data carriers proliferated, it became a catch-all for any block device (another Unix term; in this context, anything capable of carrying data) that was not already accessible via IDE. When SATA came around, the developers found it easier and more convenient to add it to the existing framework than to write a whole new one.

/dev/sda9 means the ninth partition on the first drive: "a" is the first drive the system detected, and "9" is the partition number.
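The naming scheme can be sketched as a small Python parser. The function `parse_sd_device` is a hypothetical helper written for illustration, not part of any library, and it only handles the sd family of names. The drive letter reflects detection order, which is not guaranteed to stay the same across reboots.

```python
import re

# /dev/sd + drive letters + optional partition number
_SD_NAME = re.compile(r"^/dev/sd([a-z]+)(\d*)$")


def parse_sd_device(path):
    """Return (drive_index, partition) for a name such as /dev/sda9.

    drive_index is 1-based: a -> 1, b -> 2, ..., z -> 26, aa -> 27.
    partition is None when the name refers to the whole drive.
    """
    match = _SD_NAME.match(path)
    if match is None:
        raise ValueError(f"not an sd device name: {path!r}")
    letters, digits = match.groups()
    drive_index = 0
    for ch in letters:
        drive_index = drive_index * 26 + (ord(ch) - ord("a") + 1)
    partition = int(digits) if digits else None
    return drive_index, partition


print(parse_sd_device("/dev/sda9"))  # (1, 9)
print(parse_sd_device("/dev/sdb"))   # (2, None)
```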

Linux Logical Volume Manager (LVM)

LVM makes it easy to manage disk space, especially when resizing volumes or adding another hard drive to the system. A plain LVM logical volume has no redundancy; redundancy requires creating a mirrored or RAID logical volume, or placing LVM on top of a RAID device.

https://opensource.com/business/16/9/linux-users-guide-lvm

GCP Persistent Disk

Persistent Disks provide reliable, high-performance block storage for virtual machine instances.

They store data redundantly to ensure data integrity. Persistent disk performance scales automatically with size, so you can resize your existing persistent disks or add more persistent disks to an instance to meet your performance and storage space requirements.

GCP Local SSD

Local SSD is physically attached to the server that hosts your VM instance. This tight coupling offers superior performance, very high input/output operations per second (IOPS), and very low latency compared to other block storage options. Local SSDs are designed for temporary storage use cases such as caches or scratch processing space, which makes them suitable for workloads like media rendering, data analytics, or high-performance computing.

https://cloud.google.com/blog/products/storage-data-transfer/n2-vms-run-low-latency-io-intensive-workloads-with-9tb-ssd

GCSFUSE

https://cloud.google.com/storage/docs/gcs-fuse

https://github.com/GoogleCloudPlatform/gcsfuse

Storage Protocols

https://www.sciencedirect.com/topics/computer-science/storage-protocol

NFS

Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems.

Filestore

https://cloud.google.com/filestore

NAS

Network-attached storage (NAS) is a file-level (as opposed to block-level storage) computer data storage server connected to a computer network providing data access to a heterogeneous group of clients. NAS is specialized for serving files either by its hardware, software, or configuration. It is often manufactured as a computer appliance – a purpose-built specialized computer.

Google Filestore is a managed NAS service: it provides file shares that clients mount over NFS.

NFS Appliance

https://www.netapp.com/atg/publications/publications-file-system-design-for-an-nfs-file-server-appliance-20000024/

WAFL

https://www.netapp.com/atg/publications/publications-file-system-design-for-an-nfs-file-server-appliance-20000024/

SAN

A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN).

https://www.snia.org/education/storage_networking_primer/san/what_san

Types of SAN

https://en.wikipedia.org/wiki/Storage_area_network#Network_protocols

Object Storage

Object storage uses a flat structure instead of a folder hierarchy. Data is stored as discrete units called objects, each typically made up of the data itself, its metadata, and a unique identifier. Objects are kept in a single repository rather than as files in folders or as blocks on servers, and the storage system may spread them across many pieces of hardware.
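To make the flat namespace concrete, here is a toy in-memory sketch in Python. The class `FlatObjectStore` and its methods are hypothetical and are not the API of Cloud Storage or any other product. The point is that a key such as `photos/2021/cat.jpg` is a single string, and "folders" are only a prefix filter applied when listing.

```python
class FlatObjectStore:
    """Toy object store: one flat mapping from key to (data, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # The whole key is the identifier; slashes have no special meaning.
        self._objects[key] = (bytes(data), dict(metadata))

    def get(self, key):
        return self._objects[key][0]

    def metadata(self, key):
        return self._objects[key][1]

    def list(self, prefix=""):
        # "Browsing a folder" is just filtering keys by a common prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("photos/2021/cat.jpg", b"\xff\xd8", content_type="image/jpeg")
store.put("photos/2022/dog.jpg", b"\xff\xd8", content_type="image/jpeg")
store.put("notes.txt", b"hello", content_type="text/plain")

print(store.list("photos/"))  # ['photos/2021/cat.jpg', 'photos/2022/dog.jpg']
print(store.get("notes.txt"))  # b'hello'
```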

Ceph

Ceph implements object, block and file storage.

Google Compute Engine Disk Options

By default, each Compute Engine instance has a single boot persistent disk (PD) that contains the operating system. When your apps require additional storage space, you can add one or more additional storage options to your instance.

Google Compute Engine Storage products

Overview and comparison of various Storage services and products on GCP.

Companies have a wide range of options to choose from when storing data and selecting a database in the cloud. Listen to an overview of data storage options, and discuss why you would choose one over another.

Discuss how traditional storage models compare to the cloud counterparts.

Cloud Volume

https://cloud.google.com/architecture/partners/netapp-cloud-volumes/

Cloud Volume Service

https://cloud.netapp.com/cloud-volumes-service-for-gcp

Security

Protecting stored data requires a range of security precautions.

De-identification of storage content

https://cloud.google.com/blog/products/identity-security/announcing-easier-de-identification-of-google-cloud-storage-data

Healthcare data de-identification

https://cloud.google.com/healthcare-api/docs/concepts/de-identification

Data governance

Every organization should treat data management and data governance as a top priority across the whole data lifecycle, from the moment data is ingested to the point where it yields valuable insights.

Encryption at Rest

Google Cloud encrypts all customer content stored at rest, without any action required from the customer, using one or more encryption mechanisms.

Disaster Recovery and Backup

Filestore Enterprise

https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-announces-filestore-enterprise-for-business-critical-apps

Backup for GKE

https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-launches-backups-for-gke

GCS and rsync

https://cloud.google.com/storage/docs/gsutil/commands/rsync

Rclone

Similar to gsutil rsync, Rclone ("rsync for cloud storage") is a command-line program that syncs files and directories to and from different cloud storage providers.

GCS and Rclone

https://rclone.org/googlecloudstorage/

GCS sync using Cloud Scheduler and Cloud Run

https://github.com/salrashid123/gcp_rclone

Case study ACME Corp

https://cloud.google.com/blog/products/storage-data-transfer/dr-in-google-cloud-with-vmware-engine-actifio-and-zerto

Secure Tertiary Data Backup (STDB)

https://cloud.google.com/blog/topics/financial-services/stdb-on-google-cloud

Folders vs. buckets and generation gap

https://futurism.com/the-byte/gen-z-kids-file-systems

Backblaze Storage Pods

https://www.backblaze.com/b2/storage-pod.html