Multi cloud
Backup and Archival
- Backups are taken to recover from failures quickly, whereas archiving is done to recover from disasters; restoring from an archive takes more time.
- Cloud is all about using someone else's hardware
- AWS and Azure offer Backup and Archival solutions
What are the different storage needs of an organization?
- Disk Storage: A disk typically attached to a single server
- Network Storage: A disk that is typically mounted on multiple systems within a network
- Blob Storage: This storage refers to accessing data over the internet via URLs
Disks
- A disk typically needs a filesystem to be identified by the OS. Windows supports NTFS and Linux has many filesystems (ext4, xfs, ...)
- In Windows, a disk can be partitioned and drives (C, D, ...) can be created
- In Linux, a disk can be partitioned and mounted
Hardware Types: Disks have the following hardware types
- Magnetic Disks
- HDD (Hard disk Drive)
- SSD (Solid State Drives)
- To measure hard disk performance we have two units
  - IOPS (I/O operations per second)
  - Throughput
- 1 KiB (kibibyte) = 1024 bytes
- 1 kB (kilobyte) = 1000 bytes
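- Worked example: a disk advertised as 500 GB (500 × 10⁹ bytes) shows up as roughly 465.7 GiB, because 500 × 10⁹ / 1024³ ≈ 465.66.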
- When you create storage in the cloud we get (KiB, MiB, GiB, TiB) options
Storage Migration is of two types
Online:
- Agent Based
- Dedicated Hardware (Regular Data Transfers)
Offline: (One-time)
- Azure Data Box
Explore virtualization and find out what VMware, KVM and Hyper-V are
- Virtualization is a technology that allows multiple virtual machines (VMs) to run on a single physical hardware system by creating an abstraction layer over the hardware. This enables more efficient use of resources and is foundational for cloud computing. Three prominent virtualization technologies are VMware, KVM (Kernel-based Virtual Machine), and Hyper-V.
VMware
- Overview: VMware is a leading provider of virtualization and cloud computing software, founded in 1998 and now owned by Broadcom. Its primary product is the VMware hypervisor, which allows multiple VMs to operate on a single physical server, each capable of running its own operating system.
KVM (Kernel-based Virtual Machine)
- Overview: KVM is an open-source virtualization technology integrated into the Linux kernel. It transforms Linux into a Type 1 hypervisor, allowing it to host multiple isolated VMs.
Hyper-V
- Overview: Hyper-V is Microsoft's hardware virtualization product that allows users to create and manage VMs. It is included in Windows Server and some client versions of Windows.
- Storage Services offered by AWS
- In addition to the above services we also have EBS (Elastic Block Store) which is used by virtual machines (EC2 instances)
- Storage Services offered by Azure
- Disk Storage (Block Storage)
- File Shares (Network Storage)
- Blob Storage
- Hybrid Storage Solutions
- CDN
- Managing Storage
- Managing Backups & Archivals
- Full Backups
- Incremental Backups
- Snapshot: This is a copy of a disk at a particular point in time
- Snapshots can be implemented using full or incremental backups
- Terms
- Latency
- Region
- Availability Zone
- Edge Locations
- PoP (Point of Presence) locations
- 5G Zone
- Local Zones (AWS)
- This is a file storage where
  - you need not deal with filesystems
  - you can access the files over HTTP(S)
  - you can use it for any file type
  - you can treat it as unlimited storage
  - individual file size restrictions apply (5 TB)
Why do we need this?
- Google Drive, OneDrive, and iCloud are all internally blob storages.
- All streaming/OTT platforms use blob storage
- Data lakes are built on blob storage
How do clouds charge for Blob Storage?
- Size
- Data transfer costs
To fine-tune costs we have pricing models:
- Frequently accessed data:
  - Storage cost will be high
  - Data transfer cost will be the least
- Infrequently accessed data:
  - Storage cost will be less
  - Data transfer cost will be high
- Archival storage:
  - Storage cost will be the least
  - Data transfer (direct access) might not be allowed
- Blob storages have options to enable them as data lakes
- Blob storages will have integration with Content Delivery Networks (CDN)
- Blob storages can serve web applications (static webpages: HTML, CSS, JavaScript)
- Region: This is a geographical area which has Availability Zones (AZs)
- AZ: This is a location/site within a region where AWS hosts datacenters
- Simple Storage Service (S3) is a service offered by AWS for Blob Storages.
- To use S3 we need to create a bucket (S3 Bucket)
- A Bucket will have a unique name across AWS.
- We can consider a bucket as unlimited storage with the restriction that an individual file cannot be greater than 5 TB
- Bucket will have permissions of who can access the data
- Buckets support versioning.
- Bucket: A bucket can have objects or folders or both
- Object: This represents a file
- Permission
- I will be creating an S3 bucket in the Hyderabad region with public read access enabled
- While creating the bucket ensure ACLs are enabled, and uncheck "Block public access"
- Navigate into the S3 bucket
Upload any file (png or mp3 or mp4 or pdf)
- S3 buckets have to be empty before deletion
- Note: Watch classroom video for steps
- Create an S3 bucket in any region (a CLI sketch of these steps follows the list)
- Upload 3 files
- Ensure you can access them over HTTP(S)
- Delete the buckets
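If you prefer the CLI over the console, a minimal sketch of the same exercise with the AWS CLI might look like the following; the bucket name is a placeholder and must be globally unique, and public read additionally requires the ACL/public-access settings described above.

```bash
# create a bucket (names are globally unique across AWS)
aws s3 mb s3://kk-demo-bucket-2025 --region ap-south-1

# upload three files
aws s3 cp ./photo.png s3://kk-demo-bucket-2025/
aws s3 cp ./song.mp3  s3://kk-demo-bucket-2025/
aws s3 cp ./notes.pdf s3://kk-demo-bucket-2025/

# files are then addressable as
# https://kk-demo-bucket-2025.s3.ap-south-1.amazonaws.com/photo.png

# buckets must be empty before deletion
aws s3 rm s3://kk-demo-bucket-2025 --recursive
aws s3 rb s3://kk-demo-bucket-2025
```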
- When we upload data into an S3 bucket, it creates multiple copies and stores them in multiple locations
- This is done to address two major factors
  - Durability: The chance of data getting lost or corrupted
  - Availability: How much of the time the data is available for access
- Frequently accessed data
  - Standard: This is the default storage class. As part of free-tier accounts we can use 5 GB of Standard storage for free.
    - Availability: 99.99%
    - Durability: 99.999999999% (eleven 9s)
    - High storage cost and low access cost
  - S3 Express One Zone
  - Reduced Redundancy Storage
- Infrequently accessed data
  - S3 Standard-IA
  - S3 One Zone-IA
- Rarely accessed data
  - S3 Glacier Instant Retrieval (GLACIER_IR)
  - S3 Glacier Flexible Retrieval (GLACIER)
  - S3 Glacier Deep Archive (DEEP_ARCHIVE)
Storage Class | Use Cases | Durability | Availability | Latency | Minimum Storage Duration | Retrieval Charges |
---|---|---|---|---|---|---|
S3 Standard | Frequently accessed data (e.g., websites, big data) | 99.999999999% (11 nines) | 99.99% | Milliseconds | None | None |
S3 Intelligent-Tiering | Data with unknown or changing access patterns | 99.999999999% (11 nines) | 99.9% | Milliseconds | None | None |
S3 Standard-IA | Infrequently accessed data needing rapid access | 99.999999999% (11 nines) | 99.9% | Milliseconds | 30 days | Per GB retrieved |
S3 One Zone-IA | Re-creatable infrequently accessed data stored in a single zone | 99.999999999% (11 nines) | 99.5% | Milliseconds | 30 days | Per GB retrieved |
S3 Glacier Instant Retrieval | Archive data that needs immediate access | 99.999999999% (11 nines) | 99.9% | Milliseconds | 90 days | Per GB retrieved |
S3 Glacier Flexible Retrieval | Rarely accessed long-term archive data | 99.999999999% (11 nines) | 99.9% | Minutes to hours | 90 days | Per GB retrieved |
S3 Glacier Deep Archive | Long-term archive data accessed once or twice per year | 99.999999999% (11 nines) | 99.9% | Hours | 180 days | Per GB retrieved |
- Intelligent-Tiering: Depending on your access patterns, the storage class is chosen automatically.
- Create an S3 bucket with ACLs enabled and "Block public access" unchecked
- Upload any file and add public-read access
- Navigate to the properties section
- Once the file is uploaded, verify you can access it over HTTP(S)
- AWS offers lifecycle transition rules (a CLI sketch follows below)
- If the access patterns are not predictable then use Intelligent-Tiering.
- Note: For screenshots watch recording
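As a sketch, a lifecycle transition rule can also be applied from the AWS CLI; the bucket name, prefix, and day thresholds below are placeholder assumptions.

```bash
# move objects under logs/ to Standard-IA after 30 days, then to Glacier after 90
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket kk-demo-bucket-2025 \
  --lifecycle-configuration file://lifecycle.json
```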
- S3 bucket versioning can be enabled and suspended (a CLI sketch follows below)
- Navigate to the Bucket -> Properties -> Bucket Versioning -> Enable
- Generally it is recommended to enable versioning; we can write a lifecycle rule to delete older versions or move them to cheaper storage
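For reference, the same enable/suspend toggle is available via the CLI (the bucket name is a placeholder):

```bash
# enable versioning on the bucket
aws s3api put-bucket-versioning \
  --bucket kk-demo-bucket-2025 \
  --versioning-configuration Status=Enabled

# suspending reuses the same call with Status=Suspended
```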
- The buckets which we have created so far are called General Purpose Buckets
- Each blob's name along with its folder path is called a prefix (see the listing example below)
- S3 charges for storage and access
- I can have an S3 bucket and enable "requester pays", i.e. the storage cost will be paid by me and the request cost will be paid by the requester (generally another AWS account)
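To make the prefix idea concrete, here is a hypothetical listing that filters objects by a shared key prefix (bucket and prefix names are placeholders):

```bash
# objects are stored flat; "folders" are just shared prefixes in object keys
aws s3api list-objects-v2 \
  --bucket kk-demo-bucket-2025 \
  --prefix "invoices/2025/" \
  --query "Contents[].Key"
```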
- Since S3 serves content over URLs, we can configure S3 to host a website
- A static website refers to a website developed in HTML, CSS, and JavaScript
- Upload website from here
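A minimal sketch of hosting a static site from the CLI, assuming a placeholder bucket that already allows public reads:

```bash
# upload the site folder and turn on static website hosting
aws s3 sync ./site s3://kk-demo-website-2025
aws s3 website s3://kk-demo-website-2025/ \
  --index-document index.html --error-document error.html
```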
- Two new types of buckets have been added:
- Directory Buckets: They use file paths rather than prefixes; this is useful for data lakes
- Table Buckets: Based on Apache Iceberg, these store data in tabular formats for quick retrieval
  - They are used for storing semi-structured data in large-volume data pipelines
- When data is added to one bucket, either all content or selected content (based on prefixes) can be synced to buckets in other regions
- Navigate to Management -> Replication Rules
- Azure Storage Account is a service offered by Azure which handles
  - Blob Storage
  - Table Storage
  - File Shares
  - Queue Storage
- In Azure, Blob Storage has 3 types
  - Page Blob: hard disks
  - Block Blob: blobs (like S3 objects)
  - Append Blob: for log files
- Azure Global Infra:
  - Azure has two types of regions
    - regions
    - regions with zones
- Resources can be created only in Resource Groups
- Creating storage account
- Pattern for storage account blob url
https://<storage-acc-name>.blob.core.windows.net/<container>/<blob-path>
https://qtstorageaccount27march.blob.core.windows.net/documents/1.txt
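A sketch of producing such a URL with the Azure CLI; the resource group, account, and file names are placeholder assumptions (account names must be globally unique and lowercase):

```bash
az storage account create -n kkstorage2025 -g demo-rg -l centralindia --sku Standard_LRS
az storage container create --account-name kkstorage2025 -n documents
az storage blob upload --account-name kkstorage2025 -c documents -f ./1.txt -n 1.txt
# resulting blob URL:
# https://kkstorage2025.blob.core.windows.net/documents/1.txt
```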
- Enable versions in Azure Storage Account
- Host a static website
- Lifecycle Transitions/Lifecycle policy: Move blobs from one access tier to another based on rules
- Rehydrate: an archived blob must be rehydrated (moved back to an online tier) before it can be read
- A virtual disk in a cloud caters to one virtual machine generally.
- Cloud offerings generally charge for hardware utilization.
- If the disk and virtual machine are created from the same physical server
  - the disk will have temporary/ephemeral storage, i.e. shutting down the VM will erase data
- If the disk and virtual machine are created from different physical servers
  - the disks will have persistent/non-ephemeral storage, i.e. shutting down the VM will not erase data
- In AWS, non-ephemeral/persistent disks are called EBS Volumes
- In AWS, ephemeral/temporary disks are called instance stores
- In Azure, non-ephemeral/persistent disks are called Managed Disks
- In Azure, ephemeral/temporary disks are called Temp/Local Disks
- In both clouds the OS disk (root volume) has to be persistent
- No cloud allows you to reduce disk size; they support only increasing it
- VM size has an impact on disk performance
- The compressed copy of the disk contents is called a snapshot (a backup of a disk)
- Snapshots are generated either incrementally or as part of full backups
- AWS has two types of disks
- EBS Volumes:
  - Default storage type for all root volumes (the disk with the OS in it)
  - Supported by all EC2 instance types
  - Size is flexible
  - The max number of EBS Volumes has nothing to do with the instance type
- Instance Storage:
  - Supported only by a few instance types
  - Sizes are fixed
  - The max number of instance stores is defined by the instance type
- Disk hardware types supported by AWS
  - SSD
  - HDD
  - Magnetic
- Azure has two types of disks
  - Persistent
    - OS Disk: persistent and is supposed to be the boot disk
    - Data Disk: additional persistent disks; the number of data disks depends on the VM size
  - Temporary
    - Local Disk: this is a temporary disk and is more common in Azure
      - The size of the Local Disk is fixed and depends on the VM size
- Disk hardware types supported by Azure
  - SSD
  - HDD
Cloud + DevOps
Engineer (0-2)
CI/CD (Git, GitHubActions/AzureDevOps/Jenkins)
Terraform
Containers
Docker
Kubernetes
Python (scripting)
Bash
Cloud:
Compute (VM)
Networking
Storage
CI/CD (Git, GitHubActions/AzureDevOps/Jenkins)
Terraform
Containers
Docker
Kubernetes (RBAC, ARGO CD, Istio)
Python API, Serverless (AWS Lambda)
Bash
Cloud: (Add Automation with Terraform/Cloud Formation/Azure Bicep)
Compute (VM, App Services, Serverless)
Networking (LB, VPN Networking)
Storage (Access)
Observability
Design (Case studies)
- Activities:
- Creating the disks and attaching the disks to vms (vm disks)
- Creating the network disks and attaching the disks to vms
- Third party disk
- File systems
- Performance and sizing
- Increasing disk sizes and making them usable in virtual machines
- Backups and backup retentions
- Replicating or recreating disks in other regions
- Disk Encryptions
- Create an EC2 instance with Ubuntu, with only the root disk
- In AWS, the disk and EC2 instance should belong to the same zone
- Let's create a disk of size 1 GB in the same zone as the EC2 instance
- Now attach the disk to the EC2 instance
- To effectively deal with mounting, go through the following Linux topics (see the sketch after this list)
- lsblk
- mkfs
- mount
- fstab
- disk partitions
- extending partition
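Putting those topics together, here is a minimal sketch of formatting and mounting the attached 1 GB disk on Ubuntu; the device name /dev/xvdf is an assumption (on Nitro instances it may appear as /dev/nvme1n1, so check lsblk first):

```bash
# list block devices and find the new, unformatted disk
lsblk

# create an ext4 filesystem (this erases anything already on the disk)
sudo mkfs -t ext4 /dev/xvdf

# mount it on a folder
sudo mkdir -p /data
sudo mount /dev/xvdf /data

# persist the mount across reboots using the UUID in /etc/fstab
echo "UUID=$(sudo blkid -s UUID -o value /dev/xvdf) /data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab

df -h /data
```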
- General Purpose 2 (gp2)
- General Purpose 3 (gp3)
- Provisioned IOPS (io1)
- Provisioned IOPS (io2)
- Cold HDD (sc1)
- Throughput Optimized HDD (st1)
- Magnetic (standard)
Here is a comparison of the various AWS EBS volume types with respect to IOPS, Throughput, Minimum Size, and Maximum Size:
Volume Type | IOPS | Throughput | Min Size | Max Size |
---|---|---|---|---|
General Purpose SSD (gp2) | 100 to 16,000 IOPS (3 IOPS per GiB, burst up to 3,000 IOPS for smaller volumes) | Up to 250 MiB/s (burst) | 1 GiB | 16 TiB |
General Purpose SSD (gp3) | Baseline of 3,000 IOPS, provisionable up to 16,000 IOPS | Baseline of 125 MiB/s, provisionable up to 1,000 MiB/s | 1 GiB | 16 TiB |
Provisioned IOPS SSD (io1) | Up to 64,000 IOPS depending on instance type and size | Up to 1,000 MiB/s depending on configuration | 4 GiB | 16 TiB |
Provisioned IOPS SSD (io2) | Up to 256,000 IOPS depending on instance type and size | Up to 4,000 MiB/s depending on configuration | 4 GiB | 16 TiB |
Throughput Optimized HDD (st1) | Up to 500 IOPS (with 1 MiB I/O size) | Baseline: 40 MiB/s per TiB; burst up to 500 MiB/s | 125 GiB | 16 TiB |
Cold HDD (sc1) | Up to 250 IOPS (with 1 MiB I/O size) | Baseline: 12 MiB/s per TiB; burst up to 250 MiB/s | 125 GiB | 16 TiB |
Magnetic (Standard) | ~100 IOPS on average; inconsistent performance | Limited throughput; not recommended for new workloads | 1 GiB | 1 TiB |
- SSD-backed volumes (gp2, gp3, io1, io2) are optimized for high IOPS and consistent performance. They are ideal for transactional workloads.
- HDD-backed volumes (st1, sc1) are optimized for throughput-intensive workloads. They perform best with large sequential I/O operations.
- Magnetic disks are legacy storage options and are not recommended for new workloads due to their limited performance capabilities.
- Let's attach an additional disk of 2 GB to a Windows server
- This additional disk will be partitioned into two volumes of 1 GB each
- Now let's increase the disk size to 3 GB and increase the size of the first partition to 2 GB
- Watch classroom recording
- In Linux, we have disks
- A disk can be partitioned (optional)
- Partitioned disks can be mounted to a folder
- Refer Here for steps to partition and mount disks
- Attaching disks to an AWS EC2 Linux instance (Ubuntu)
- Create partitions
- Extend partitions (see the sketch below)
- Repeat the same activity on Azure
- Watch Classroom Recording.
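As a sketch of the extend step on Ubuntu, assuming the disk was resized in the console and the data partition is /dev/xvdf1 (adjust the names to your lsblk output):

```bash
# growpart comes from cloud-guest-utils on Ubuntu
sudo apt-get -y install cloud-guest-utils

# grow partition 1 of /dev/xvdf to fill the enlarged disk
sudo growpart /dev/xvdf 1

# grow the filesystem: resize2fs for ext4, xfs_growfs for xfs
sudo resize2fs /dev/xvdf1
# sudo xfs_growfs /mount-point    # use this instead if the filesystem is xfs
```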
- In the cloud, we use the term snapshot to refer to a backup of a disk
- Creating a disk backup manually in AWS
- AWS snapshots are incremental in nature
- Creating a disk backup manually in Azure (a CLI sketch for both clouds follows this section)
- Azure gives incremental as well as full backup options
- Using snapshots:
  - we can create new disks with the same content
  - if the snapshot has an OS in it, we can create an AMI/Azure VM Image
  - we can copy the snapshot to other regions
  - we can share the snapshot with other accounts
- Automating snapshot creation:
- Use Snapshot policies
- Azure Backup/AWS Backup
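For a one-off manual snapshot, both CLIs have a single command; the volume/disk names below are placeholders:

```bash
# AWS: snapshot an EBS volume
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "pre-upgrade backup"

# Azure: incremental snapshot of a managed disk
az snapshot create -g demo-rg -n mydisk-snap \
  --source mydisk --incremental true
```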
- Overview
- Mounting happens over the network
- Here is a comparison of different network storage options in tabular format, covering their types, features, and use cases:
Type | Description | Key Features | Ideal Use Cases |
---|---|---|---|
Direct Attached Storage (DAS) | Storage directly attached to a server or workstation. | Simple setup; low cost; limited scalability; no network sharing | Small networks, single-server environments |
Network Attached Storage (NAS) | Dedicated storage device connected to a network, providing file-based data access. | Centralized storage; easy file sharing; supports NFS/SMB protocols; RAID for redundancy | Small to medium businesses, file sharing |
Storage Area Network (SAN) | High-speed network that provides block-level storage access to multiple servers. | High performance; scalable; uses Fibre Channel or iSCSI; suitable for mission-critical apps | Large enterprises, data centers, high-speed needs |
Cloud Storage | Storage solutions offered by cloud providers like AWS, Azure, and Google Cloud. | Pay-as-you-go model; scalable on demand; accessible globally; managed services available | Backup, disaster recovery, remote collaboration |
- Examples of Products for Each Type
Type | Description | Key Features | Ideal Use Cases | Examples |
---|---|---|---|---|
Direct Attached Storage (DAS) | Storage directly attached to a server or workstation. | Simple setup; low cost; limited scalability; no network sharing | Small networks, single-server environments | JBOD (Just a Bunch of Disks), RAID arrays |
Network Attached Storage (NAS) | Dedicated storage device connected to a network, providing file-based data access. | Centralized storage; easy file sharing; supports NFS/SMB protocols; RAID for redundancy | Small to medium businesses, file sharing | Synology DiskStation DS1522+, QNAP TS-233-US, TerraMaster F4-423 |
Storage Area Network (SAN) | High-speed network that provides block-level storage access to multiple servers. | High performance; scalable; uses Fibre Channel or iSCSI; suitable for mission-critical apps | Large enterprises, data centers, high-speed needs | Dell EMC PowerMax, HPE Primera, IBM FlashSystem |
Cloud Storage | Storage solutions offered by cloud providers like AWS, Azure, and Google Cloud. | Pay-as-you-go model; scalable on demand; accessible globally; managed services available | Backup, disaster recovery, remote collaboration | Amazon S3, Google Filestore, Microsoft Azure Blob Storage |
- EFS (Elastic File System)
- FSx (managed file shares for third-party filesystems)
- EFS Overview
- Create two Ubuntu systems in two different zones
- Now create an EFS as discussed in the class
- Refer Here for instructions on how to mount
- On Ubuntu install the NFS client
sudo apt update
sudo apt-get -y install nfs-common
- Create a folder called /projects
- Now mount using instructions
- We have mounted the EFS filesystem to /projects
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-07760194a1be46edf.efs.ap-south-1.amazonaws.com:/ /projects
- Now execute df -h to verify the mount
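To make the EFS mount survive reboots, an /etc/fstab entry along these lines should work (it reuses the filesystem ID from the mount command above; _netdev delays mounting until the network is up):

```bash
echo "fs-07760194a1be46edf.efs.ap-south-1.amazonaws.com:/ /projects nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0" | sudo tee -a /etc/fstab
sudo mount -a   # verify the entry mounts cleanly
```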
- Note: Watch classroom video for steps
- I will create a network and security group
- Exercise: Repeat the exact same steps with RedHat instances
- Refer Here for Azure file share
- Creating file shares
- Azure File Share Tiers
- Watch the classroom video for mounting file shares on Linux machines
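A minimal sketch of mounting an Azure file share over SMB on Ubuntu; the account and share names are placeholders, and the key comes from `az storage account keys list`:

```bash
sudo apt-get -y install cifs-utils
sudo mkdir -p /mnt/myshare

# mount the share using the storage account name as the username
# and the storage account key as the password
sudo mount -t cifs //kkstorage2025.file.core.windows.net/myshare /mnt/myshare \
  -o vers=3.0,username=kkstorage2025,password='<storage-account-key>',serverino
```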
- AWS offers a service called FSx, which is a file share supporting third-party vendors (Windows File Server, NetApp ONTAP, Lustre, OpenZFS)
- Using EFS with 100 TB with different options
- Using FSx with 100 TB with different options
- Using Azure File Share with 100 TB
- Using Azure NetApp Files with 100 TB
- Note: Shared Disks and Multi-Attach EBS are used primarily for systems to quickly recover from failures, i.e. failover scenarios.
- Database as a service is offered by Cloud Providers
- Watch classroom video to understand what database as a service is.
- In this category of databases, data is stored in a database organized into tables.
- Each table will have columns which represent fields and rows which represent records
- Tables will have relationships between them
- We need a DBMS to manage databases; software products which offer this are
- Microsoft SQL Server
- Oracle
- MySQL
- PostgreSQL
- DB2
- AWS has a service called RDS where the following database engines are offered as a service
  - SQL Server
  - Oracle
  - MySQL/MariaDB
  - PostgreSQL
  - Aurora:
    - PostgreSQL
    - MySQL
- Azure offers Database as a service for
  - SQL Server
  - MySQL/MariaDB
  - PostgreSQL
- Overview
- RDS is created in a network (VPC) and requires at least two subnets (a DB Subnet Group) to be provided during creation.
- RDS instances can be public or private
- RDS instances will have pricing similar to ec2 i.e. hourly billing and storage charges
- RDS instances support disks with autoscaling (Start from 20 GB and AWS will automatically increase disk size when needed)
- AWS generally gives 3 ways of creating a database instance
- Single AZ
- Multi AZ
- Multi Az Cluster
- Single-AZ MySQL
  - network: default
  - subnet group: default
  - access: public
  - free tier
- Watch classroom video for screen shots
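The same Single-AZ MySQL instance can be sketched with the AWS CLI; the identifier and password are placeholders, and db.t3.micro matches the free tier:

```bash
aws rds create-db-instance \
  --db-instance-identifier demo-mysql \
  --engine mysql \
  --db-instance-class db.t3.micro \
  --allocated-storage 20 \
  --master-username admin \
  --master-user-password '<choose-a-password>' \
  --publicly-accessible \
  --no-multi-az
```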
- Terms to understand
- Replication
- Failover
- Exercise: Try creating
- Single Database with mysql in Azure
- Single Database with Postgres in AWS
- Overview
- We will have an RDS instance which is the primary instance; we can add a read replica which helps in
  - distributing the load
  - reducing downtime
- Overview
- We will have a master and a standby instance, and failover is automatic
- Overview
- Amazon offers Aurora databases for MySQL and PostgreSQL
- Amazon claims Aurora delivers up to 5x the performance of MySQL and 3x that of PostgreSQL without requiring changes to most applications
- Azure offers Microsoft SQL Server in 3 services
- Azure SQL
- Azure SQL VM
- Azure SQL Managed Instances
- Purchasing models
- DTU (Database Transaction Unit):
- Basic
- Standard
- Premium
- vCore:
- General purpose
- Hyperscale
- Business Critical