Compute - bobbae/gcp GitHub Wiki

https://en.wikipedia.org/wiki/Google_Compute_Engine

The GCP compute stack contains Google Compute Engine (GCE), Google Kubernetes Engine (GKE, formerly Container Engine), Google App Engine (GAE), and Google Cloud Functions (GCF).

There are many hosting options that address different requirements.

Concepts

https://cloud.google.com/compute/docs/concepts

Best Practices

https://cloud.google.com/compute/docs/tutorials/robustsystems

Using GCE

Google Compute Engine is a computing and hosting service that lets you create and run virtual machines on Google infrastructure.

Machine Types

A machine type is a set of virtualized hardware resources available to a virtual machine (VM) instance, including the system memory size, virtual CPU (vCPU) count, and persistent disk limits. You must choose a machine type when you create an instance. You can select from a number of predefined machine types in each machine type family. If the predefined machine types do not meet your needs, you can create your own custom machine types. To compare machine type performance, see CPU platforms, GPU platforms and accelerator-optimized machine family.
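As a rough illustration of custom machine types, the sketch below builds a machine type name from a vCPU count and a memory size. The helper name `custom_machine_type` is hypothetical, and the `FAMILY-custom-VCPUS-MEMORY_MB` pattern with memory in multiples of 256 MB is assumed to apply to families such as N2 and E2. Family-specific limits, such as the allowed memory per vCPU, are not checked here.

```python
def custom_machine_type(vcpus: int, memory_mb: int, family: str = "n2") -> str:
    """Build a custom machine type name such as 'n2-custom-4-8192'.

    Hypothetical helper: it only checks the general shape of the request,
    not the per-family vCPU and memory limits.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    if memory_mb <= 0 or memory_mb % 256 != 0:
        raise ValueError("memory_mb must be a positive multiple of 256")
    return f"{family}-custom-{vcpus}-{memory_mb}"


if __name__ == "__main__":
    # 4 vCPUs with 8 GB of memory in the N2 family.
    print(custom_machine_type(4, 8192))
```

The resulting string is the kind of value that would be passed as the machine type when creating an instance.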

TPU

Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads.

From a Data Center Professional's Point of View

Data centers use server virtualization, in which multiple virtual Unix/Linux or Windows servers can be run on a single physical machine. These virtual machines are created, provisioned, and managed through a software suite such as VMware vSphere or XenCenter.

Compute Engine uses this model as well, combining server virtualization and related management tools into an integrated suite. On Compute Engine, virtual machines are called virtual machine (VM) instances.

Machine Images

https://cloud.google.com/compute/docs/machine-images

Quickstart using a Linux VM

https://cloud.google.com/compute/docs/quickstart-linux

VM Metadata

https://cloud.google.com/compute/docs/metadata/overview

How to create a VM on GCP

https://www.youtube.com/watch?v=1FpMe8na64A

https://cloud.google.com/compute/docs/instances/create-start-instance

Backing up persistent disks using snapshots

https://cloud.google.com/compute/docs/disks/create-snapshots

Create and Manage custom images

https://cloud.google.com/compute/docs/images/create-delete-deprecate-private-images

Snapshots and Images

https://diana-moraa.medium.com/snapshots-and-images-in-google-cloud-platform-406b23224e9f

Instance Templates

https://cloud.google.com/compute/docs/instance-templates/create-instance-templates

Audit Logging

https://cloud.google.com/compute/docs/logging/audit-logging

Naming Resources

https://cloud.google.com/compute/docs/naming-resources

Labels

https://cloud.google.com/compute/docs/labeling-resources

Managing Access to Resources

https://cloud.google.com/compute/docs/access/managing-access-to-resources

Auto shutdown

https://cloud.google.com/compute/docs/shutdownscript

https://medium.com/geekculture/stop-burning-money-by-leaving-your-vms-on-add-an-auto-shutdown-script-4b3e801fd249

Sole-tenant nodes

Sole-tenancy lets you have exclusive access to a sole-tenant node, which is a physical Compute Engine server that is dedicated to hosting only your project's VMs. Use sole-tenant nodes to keep your VMs physically separated from VMs in other projects, or to group your VMs together on the same host hardware.

Preemptible VM

A preemptible instance is an instance you can create and run at a much lower price than normal instances. However, Compute Engine might stop (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances will always stop after 24 hours.
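The 24-hour limit can be worked out with simple date arithmetic. The sketch below is only an illustration of that rule: the function names are hypothetical, and it does not call any Compute Engine API. An instance can also be preempted well before this deadline.

```python
from datetime import datetime, timedelta, timezone

# Preemptible instances are stopped after at most 24 hours of running.
MAX_PREEMPTIBLE_RUNTIME = timedelta(hours=24)


def latest_stop_time(started_at: datetime) -> datetime:
    """Return the latest moment the instance can still be running."""
    return started_at + MAX_PREEMPTIBLE_RUNTIME


def remaining_runtime(started_at: datetime, now: datetime) -> timedelta:
    """Return the time left before the 24-hour limit, never negative."""
    remaining = latest_stop_time(started_at) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(latest_stop_time(started))
    print(remaining_runtime(started, started + timedelta(hours=20)))
```

A batch job could use a value like this to decide whether to start another unit of work or to checkpoint first.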

Reservations

Reservations provide a very high level of assurance in obtaining capacity for Compute Engine zonal resources. For example, use reservations to help ensure that your project has resources for future increases in demand, including: planned or unplanned spikes, migrating a large number of virtual machine (VM) instances, backup and disaster recovery, or planned growth and buffer.

Creating machine images

https://cloud.google.com/compute/docs/machine-images/create-machine-images

Adding & Resizing Persistent Disks

https://cloud.google.com/compute/docs/disks/add-persistent-disk

Extending Persistent Disks

https://wizzard-harshit.medium.com/extend-disk-on-gcp-in-e8cdd1d0fe34

Creating customized boot disks

https://cloud.google.com/compute/docs/disks/create-root-persistent-disks

Detach and reattach boot disks

https://cloud.google.com/compute/docs/disks/detach-reattach-boot-disk

Importing and Exporting VM images

You can share virtual machine (VM) instances, virtual disk files, and machine images from other cloud environments or from your on-premises environment by importing and exporting images from Cloud Storage.

Dynamic Resource Management

https://cloud.google.com/compute/docs/dynamic-resource-management

VM Instance life cycle

https://cloud.google.com/compute/docs/instances/instance-life-cycle

Load balancing and scaling

https://cloud.google.com/compute/docs/load-balancing-and-autoscaling

Containers on compute engine

https://cloud.google.com/compute/docs/containers

GCP HPC Toolkit

https://cloud.google.com/blog/products/compute/new-google-cloud-hpc-toolkit

Slurm-GCP

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

https://cloud.google.com/compute/docs/instances/create-intel-select-solution-hpc-clusters#create_intel_select_solution_verified_clusters_using_slurm-gcp

https://cloud.google.com/blog/products/compute/introducing-the-latest-slurm-on-google-cloud-scripts

https://github.com/SchedMD/slurm-gcp

Run the WRF Weather Forecasting Model with Fluid Numerics' Slurm-GCP

https://codelabs.developers.google.com/codelabs/wrf-on-slurm-gcp#0

Public & Custom Images

https://cloud.google.com/compute/docs/images/

https://cloud.google.com/compute/docs/autoscaler/predictive-autoscaling

Instance Groups

A managed instance group (MIG) is a group of virtual machine (VM) instances that you control as a single entity. MIGs support features such as autohealing, load balancing, autoscaling, auto-updating, and stateful workloads.

Predictive Autoscaling

https://cloud.google.com/blog/products/compute/introducing-compute-engine-predictive-autoscaling

Regional MIGs

You can create regional MIGs or zonal MIGs. Regional MIGs provide higher availability than zonal MIGs because the instances in a regional MIG are spread across multiple zones in a single region. Regional MIGs also have additional options and considerations compared with zonal MIGs.
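To make the idea of spreading instances across zones concrete, here is a minimal sketch that divides a target size evenly over a list of zones. It only models an even distribution; actual placement is decided by Compute Engine, and the function name `spread_evenly` and the zone names are illustrative assumptions.

```python
from typing import Dict, List


def spread_evenly(target_size: int, zones: List[str]) -> Dict[str, int]:
    """Split target_size instances across zones so counts differ by at most 1."""
    if target_size < 0:
        raise ValueError("target_size must not be negative")
    if not zones:
        raise ValueError("at least one zone is required")
    base, extra = divmod(target_size, len(zones))
    # The first `extra` zones each take one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


if __name__ == "__main__":
    zones = ["us-central1-a", "us-central1-b", "us-central1-c"]
    print(spread_evenly(7, zones))
```

With three zones, losing one zone in this model removes only about a third of the instances, which is the availability benefit described above.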

Instance Templates & Groups

https://cloud.google.com/compute/docs/instance-templates

Compute Engine IAM roles and permissions

https://cloud.google.com/compute/docs/access/iam

Operating System Details

https://cloud.google.com/compute/docs/images/os-details

Confidential Computing with N2D and C2D VMs

https://cloud.google.com/blog/products/identity-security/introducing-confidential-computing-with-n2d-and-c2d-vms

Securely connecting to VMs

https://cloud.google.com/solutions/connecting-securely

SSH connections to Linux VMs

https://cloud.google.com/compute/docs/instances/ssh

Managing SSH keys

https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys

SSH protocol and public key authentication method

https://www.ssh.com/academy/ssh/public-key-authentication

VPC firewall rules for VMs

https://cloud.google.com/vpc/docs/special-configurations

Firewall Rules

Each firewall rule applies to either incoming (ingress) or outgoing (egress) connections, not both. Firewall rules support IPv4 connections; IPv6 connections are also supported in VPC networks that have IPv6 enabled. Each firewall rule's action is either allow or deny. When you create a firewall rule, you must select a VPC network.
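The constraints above (one direction, one action, one network per rule) can be sketched as a small data class. This is an illustrative model, not a client for the Compute Engine API: the class name `FirewallRule`, the rule name, and the network path are assumptions. The dictionary keys (`direction`, `allowed`, `denied`, `IPProtocol`, `ports`) are intended to mirror the fields of the firewall resource in the Compute Engine REST API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FirewallRule:
    name: str
    network: str      # every rule belongs to exactly one VPC network
    direction: str    # "INGRESS" or "EGRESS", never both
    action: str       # "allow" or "deny"
    protocol: str = "tcp"
    ports: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.direction not in ("INGRESS", "EGRESS"):
            raise ValueError("direction must be INGRESS or EGRESS")
        if self.action not in ("allow", "deny"):
            raise ValueError("action must be allow or deny")

    def to_body(self) -> Dict[str, object]:
        """Return a dict shaped like a firewall resource request body."""
        entry: Dict[str, object] = {"IPProtocol": self.protocol}
        if self.ports:
            entry["ports"] = list(self.ports)
        # An allow rule fills "allowed"; a deny rule fills "denied".
        key = "allowed" if self.action == "allow" else "denied"
        return {
            "name": self.name,
            "network": self.network,
            "direction": self.direction,
            key: [entry],
        }


if __name__ == "__main__":
    rule = FirewallRule("allow-ssh", "global/networks/default", "INGRESS", "allow", ports=["22"])
    print(rule.to_body())
```

Validating direction and action up front mirrors the rule that a single firewall rule cannot mix directions or actions.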

Using Network tags

https://cloud.google.com/vpc/docs/add-remove-network-tags

Creating snapshots

https://cloud.google.com/compute/docs/disks/create-snapshots

Snapshot Best Practices

https://cloud.google.com/compute/docs/disks/snapshot-best-practices

Linux Command Cheat Sheet

https://www.linuxtrainingacademy.com/linux-commands-cheat-sheet/

GPUs and TPUs

TPU Types and Zones

Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed from the ground up with the benefit of Google’s deep experience and leadership in machine learning.

Available Zones:

https://cloud.google.com/tpu/docs/types-zones

GPU

Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machine (VM) instances. You can use these GPUs to accelerate specific workloads on your instances such as machine learning and data processing.

Compute Engine provides NVIDIA® GPUs for your instances in passthrough mode so that your virtual machine instances have direct control over the GPUs and their associated memory.

GPU regions and zones

https://cloud.google.com/compute/docs/gpus/gpu-regions-zones

Creating VMs with attached GPUs

https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus

Running TensorFlow inference workloads with TensorRT5 and NVIDIA T4 GPU

https://cloud.google.com/compute/docs/tutorials/ml-inference-t4

Windows VM Instances

Creating Windows Server Failover Clustering

https://cloud.google.com/compute/docs/tutorials/running-windows-server-failover-clustering

Creating Disaster Recovery Plan for SQL Server

https://cloud.google.com/solutions/sql-server-disaster-recovery-plan-compute-engine

Performing an automated in-place upgrade

https://cloud.google.com/compute/docs/tutorials/performing-an-automated-in-place-upgrade-windows-server

Mainframe

Micro Focus Enterprise Server

https://cloud.google.com/blog/topics/partners/micro-focus-enterprise-server-google-cloud-blueprint

Where to run my code? Deciding between GCE, GKE, App Engine

https://www.youtube.com/watch?v=2tLXKCgqwLY

VM Manager

VM Manager is a suite of tools for managing operating systems across large virtual machine (VM) fleets running Windows and Linux on Compute Engine. It helps drive efficiency through automation and reduces the operational burden of maintaining these fleets.

VM Manager supports projects in VPC Service Controls service perimeters.

OS Patch Management

https://blog.searce.com/patching-gce-vms-using-gcp-vm-manager-os-patch-management-a27eba7d356f

GCE Networking

IP Addresses

https://cloud.google.com/compute/docs/ip-addresses

Internal DNS

https://cloud.google.com/compute/docs/internal-dns

Committed Use discounts

https://cloud.google.com/compute/docs/instances/committed-use-discounts-overview

https://cloud.google.com/blog/products/compute/save-money-with-the-new-compute-engine-flexible-cuds/

Auto Healing

https://cloud.google.com/compute/docs/tutorials/high-availability-autohealing

Data Center hardware

Open Compute Project

Open source hardware design for data centers.

https://www.opencompute.org/products

Computer servers

A computer server is a piece of computer hardware or software (a computer program) that provides functionality for other programs or devices, called "clients". This architecture is called the client–server model.
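The client–server model can be illustrated with a few lines of Python from the standard library. In this minimal sketch, a server on the local machine offers one function (uppercasing a line of text) and a client connects to use it. The handler and function names are hypothetical, and the server listens on the loopback address only.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Server side: read one line from the client and send it back uppercased."""

    def handle(self) -> None:
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def ask_server(host: str, port: int, message: str) -> str:
    """Client side: send one line to the server and return its reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(message.encode() + b"\n")
        with conn.makefile("rb") as reader:
            return reader.readline().decode().strip()


def demo() -> str:
    # Port 0 asks the operating system for any free port.
    with socketserver.TCPServer(("127.0.0.1", 0), UppercaseHandler) as server:
        host, port = server.server_address
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            return ask_server(host, port, "hello")
        finally:
            server.shutdown()


if __name__ == "__main__":
    print(demo())
```

The same split applies at data center scale: the server owns the resource or function, and any number of clients reach it over the network.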