5. Monitor & Backup Azure resources - GlennVandenborre/AZ-104-Azure-Administrators GitHub Wiki

5.1 File and folder backups

5.1.1 Azure Backup

Azure Backup

Azure Backup is the Azure-based service you can use to back up (or protect) and restore your data in the Microsoft cloud. It offers multiple components that you download and deploy on the appropriate computer, server, or in the cloud. The component, or agent, that you deploy depends on what you want to protect. All Azure Backup components (no matter whether you're protecting data on-premises or in the cloud) can be used to back up data to a Recovery Services vault in Azure.

Benefits of Azure Backup

| Benefit | Description |
| --- | --- |
| Offload on-prem backup | Back up the on-prem environment without deploying other backup solutions |
| Back up Azure VMs | Optimized backups and easy restore of resources |
| Get unlimited data transfer | No limit and no charge for inbound/outbound data |
| Keep data secure | Data encryption key/phrase stored locally and not in the cloud |
| Get app-consistent backups | Recovery point of all required data to restore the backup copy |
| Retain short and long-term data | Azure Recovery Services vault used for short-term or long-term data retention |
| Automatic storage management | No costs for implementing on-prem storage devices |
| Multiple storage options | Replication and high availability (LRS/GRS) |

5.1.2 Backup Center for Azure Backup

Backup Center

  • Range of capabilities
  • Data source-centric management
  • Connected experiences
  • Supported scenarios

5.1.3 Azure Recovery Services vault backup

Recovery Services Vault

The Recovery Services vault is a storage entity in Azure that stores data. Recovery Services vaults make it easy to organize your backup data, while minimizing management overhead.

  • Backup Azure Files file share or on-prem files and folders.
  • Various Azure services: VMs, Azure SQL, ...
  • Supports System Center Data Protection Manager, Windows Server, Azure Backup Server,...

Azure Backup storage replication

  • No configuration for Azure Files file shares (snapshot-based).
  • Three storage replication options: GRS, LRS and ZRS.
  • Enable Cross Region Restore: restore data in secondary Azure paired region.

5.1.4 Microsoft Azure Recovery Services (MARS) agent

MARS Agent

The MARS agent is used to back up files, folders and system data from your on-premises machines and Azure VMs. The MARS agent is a full-featured agent that offers many benefits for both backing up and restoring your data.

  • MARS agent needs to be installed on Windows Client/Server for backup of files and folders.
  • Backs up data on the machine where the MARS agent is installed.
  • Back up files and folders on Windows VMs or physical machines (on-prem or Azure).
  • No separate backup server needed for MARS agent.
  • MARS agent restores files and folders from backups or performs a volume-level restore. It is not application-aware.

5.1.5 On-premises file and folder backups

MARS Agent for Azure Backup

  • Create Recovery Services vault.
  • Download MARS agent and credentials file.
  • Install and register MARS agent.
  • Configure backups.

5.2 VM Backups

5.2.1 Protect VM data

Backup options for VMs

| Backup option | Configuration | Description |
| --- | --- | --- |
| Azure Backup | Back up production VMs | Snapshot that is used as a recovery point in GRS vaults, restore files/folders or entire VM |
| Azure Site Recovery | Recover specific applications | Protection against major issues or disasters |
| Azure managed disks - snapshot | Back up VMs that use managed disks | Read-only full copy of managed disk, full copy can be used to create a new managed disk |
| Azure managed disks - image | Custom VHD in Azure Storage to create hundreds of VMs | Managed custom image (OS & data disks) |

Images vs snapshots

  • Images: managed custom image (OS + data disks), bulk create same VMs.
  • Snapshots: Copy of disk at point in time of snapshot. Only 1 disk.
  • Operating system disk backups: snapshot or image of the disk. Create a VM from a snapshot of the disk.

5.2.2 VM snapshots in Azure Backup

Azure Backup job

An Azure Backup job creates a snapshot for your VM in two phases.

  • Take snapshot of VM data.
  • Transfer snapshot to Azure Recovery Services vault.

Snapshots and recovery points

  • Snapshots are retained for 2 days by default to reduce backup and restore times.
  • Snapshot retention can be set from 1 to a maximum of 5 days.
  • Incremental snapshots stored as Azure page Blobs (Azure Disks).
  • Recovery point only available if backup job has executed both phases.
  • Recovery points are labeled with recovery point type.
  • First snapshot identifies with the snapshot recovery point type.
  • Recovery point type changes after transfer to Azure Recovery Services vault.
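
The two phases and the change of recovery point type can be sketched as a small state model. This is only an illustration, not an Azure SDK type: `BackupJob` and `RecoveryPointType` are hypothetical names, and the labels "snapshot" and "snapshot and vault" are assumed to match what the portal shows.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryPointType(Enum):
    NONE = "none"                              # no phase finished yet
    SNAPSHOT = "snapshot"                      # phase 1 done
    SNAPSHOT_AND_VAULT = "snapshot and vault"  # phase 2 done


@dataclass
class BackupJob:
    """Hypothetical model of one VM backup job."""
    vm_name: str
    point_type: RecoveryPointType = RecoveryPointType.NONE

    def take_snapshot(self) -> None:
        # Phase 1: take a snapshot of the VM data.
        self.point_type = RecoveryPointType.SNAPSHOT

    def transfer_to_vault(self) -> None:
        # Phase 2: only possible once a snapshot exists.
        if self.point_type is not RecoveryPointType.SNAPSHOT:
            raise RuntimeError("take a snapshot before transferring to the vault")
        self.point_type = RecoveryPointType.SNAPSHOT_AND_VAULT


job = BackupJob("vm1")
job.take_snapshot()
job.transfer_to_vault()
print(job.point_type.value)
```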

5.2.3 Azure Recovery Services vault

What is Azure Recovery Services vault

An Azure Recovery Services vault is a storage entity in Azure that houses data. The data is typically copies of data, or configuration information for virtual machines, workloads, servers, or workstations. The vault helps you organize backup data and minimize management overhead.

5.2.4 Backup VMs

5.2.5 Restore VMs

How to restore VMs

Recovery points are stored in your Recovery Services vault.

  • Select recovery points for your VM snapshots.
  • Azure creates backup jobs to track the restore operation.
  • Temporarily displays notifications about restore operation.
  • Track restore operation by monitoring job notifications.

5.2.6 System Center Data Protection Manager and Microsoft Azure Backup Server

System Center DPM and Azure Backup Server

  • Local disk for short-term storage.
  • Azure for online protection.
  • Protecting on-prem workloads? The DPM or MABS instance must be located on-premises.
  • Protecting workloads on Azure VMs? The MABS instance must run on an Azure VM.
  • Protection agent installed on every machine you want to protect.
  • Must be added to System Center DPM protection group.

Advantages

  • Optimized app-aware backups.
  • Simplified backups for on-prem machines.
  • Flexibility and scheduling.
  • Consolidated management.

5.2.7 MARS Agent vs Azure Backup Server

| Component | Benefits | Limits | Data protected | Backups stored |
| --- | --- | --- | --- | --- |
| MARS Agent | Files & folders on VM or physical Windows machine, no separate backup server | 3 backups/day, not application aware, file/folder/volume-level restore only, no Linux | Files and folders | Azure Recovery Services vault |
| Azure Backup Server | App-aware snapshots, full flexibility, recovery granularity, Linux support on Hyper-V and VMware, backup and restore VMware VMs, no System Center license | Need an active Azure subscription, no backups for Oracle, no support for tape backup | Files, folders, volumes, VMs, applications and workloads | Azure Recovery Services vault or locally attached disk |

5.2.8 Soft deletion for VMs

Soft deletion

Azure Storage now offers the soft delete option for Azure Blob objects. You can more easily recover your data when it's modified or deleted. It protects backups of your virtual machines from unintended deletion and keeps the backups in soft delete state for 14 days.

  • Stop backup job.
  • Apply soft-delete state.
  • View soft-delete data in the vault.
  • Undelete backup items.
  • Restore items.
  • Resume backups.
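
The 14-day soft-delete window is simple date arithmetic. The sketch below is a minimal illustration with hypothetical helper names; treating the 14th day as the last day on which an item can be undeleted is an assumption made for the example, not documented behavior.

```python
from datetime import date, timedelta

SOFT_DELETE_RETENTION_DAYS = 14  # retention stated in the notes above


def purge_date(deleted_on: date) -> date:
    """First day on which the soft-deleted backup is no longer kept."""
    return deleted_on + timedelta(days=SOFT_DELETE_RETENTION_DAYS)


def can_undelete(deleted_on: date, today: date) -> bool:
    """True while the backup item is still in the soft-delete state."""
    return deleted_on <= today < purge_date(deleted_on)


deleted = date(2024, 3, 1)
print(purge_date(deleted))                      # 2024-03-15
print(can_undelete(deleted, date(2024, 3, 10)))  # True
```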

5.2.9 Azure Site Recovery

What is Azure Site Recovery

Azure Site Recovery is a service that helps ensure business continuity by replicating workloads from a primary site to a secondary location. It enables failover from region A to region B.

  • Azure VM replication from region A to region B.
  • On-prem VMware VM replication, Hyper-V machines, physical servers (Windows and Linux) and Azure Stack VMs to Azure.
  • AWS Windows instances replication to Azure.
  • VMs managed by System Center VMM to secondary site.

Features of Azure Site Recovery

| Feature | Description |
| --- | --- |
| Consolidated management | Replication, failover and failback management in 1 place |
| Reduced cost and complexity | Replication to Azure eliminates cost of secondary datacenter |
| Replication resilience | Resilience of Azure Storage, no interruption of app data |
| Continuous replication | Continuous replication of Azure, VMware VMs |
| Snapshot recovery points | Replication by using recovery points with app-consistent snapshots |
| Failover and easy failback | Planned failover with zero data loss, unplanned failover with minimum data loss |
| Integration | Application network management in Azure, reserving IP addresses, load balancers, Azure Traffic Manager |

5.3 Azure Monitor

5.3.1 Key capabilities of Azure Monitor

Azure Monitor

Azure Monitor provides you with a comprehensive solution for collecting, analyzing, and responding to telemetry data from your on-premises and cloud environments.

Features and capabilities in 3 areas

  • Monitor and visualize metrics.
  • Query and analyze logs.
  • Set up alerts and actions.

5.3.2 Components of Azure Monitor

Monitoring strategy

An effective monitoring strategy helps you understand the detailed operation of the components of your applications. Monitoring also helps you increase your uptime by proactively notifying you of critical issues.

Azure monitoring

Monitoring is the act of collecting and analyzing data. The data can be used to determine the performance, health, and availability of your business applications and the resources they depend on.

  • Monitoring categories: Core, Application, Infrastructure and Shared Capabilities.
  • Data stores: Azure Monitor Metrics & Azure Monitor Logs.
  • Various monitoring sources: Azure subscription and tenant, service instances, Azure resources, ...

Azure Monitor Insights

Performs different functions with the collected data, including analysis, alerting, and streaming to external systems.

  • Get insights
  • Visualize
  • Analyze
  • Respond
  • Integrate

5.3.3 Metrics and logs

Metrics

Metrics are numerical values that describe some aspect of a system at a particular point in time. Metrics are lightweight and capable of supporting near real-time scenarios.

  • Metrics are collected and displayed on the Overview page of Azure resources.
  • View metrics on the metrics explorer in Azure Monitor.
  • View and use Metric charts interactively.

Logs

Logs contain different kinds of data organized into records with different sets of properties for each type. Data like events and traces are stored as logs along with performance data so all the data can be combined for analysis.

  • Log data stored in Log Analytics.
  • Rich query language (KQL) for retrieving, consolidating and analyzing collected data.
  • Create and test queries in Log Analytics, save queries, visualize data, create rule alerts.
  • Data Explorer query language for simple or advanced queries.

5.3.4 Monitoring data and tiers

Data Collection

  • Data collection starts as soon as you create an Azure subscription and add resources.
  • Resource creation and modification events are stored in Azure Monitor activity logs.
  • Performance data and amount of resources consumed stored in Azure Monitor metrics.
  • Add Azure Monitor Agent to compute resources and extend data collection by enabling diagnostics.
  • Azure Monitor Agent used for collecting logs and metrics of different data sources from Windows and Linux OS.
  • Collect data from REST clients using Data Collector API (custom monitoring).

Monitoring data tiers

| Data Tier | Description |
| --- | --- |
| Application | Performance and functionality of application code |
| Guest OS | Data about operating system (on-prem, Azure, other cloud) |
| Azure resource | Azure resource utilization, including consumption details |
| Azure subscription | Operation and management of Azure subscription, health and operation of Azure itself |
| Azure Tenant | Tenant-level Azure services, MS Entra ID |

5.3.5 Activity log events

Activity log

The Azure Monitor activity log is a subscription log that provides insight into subscription-level events that occur in Azure.

  • Understand status of resource operations and other properties.
  • What, who and when.
  • Activity logs kept for 90 days.
  • Query any range of dates in activity log (max 90 days).
  • Retrieve events from activity logs via Azure Portal, CLI, PowerShell and Azure Monitor REST API.

5.3.6 Query activity log

Activity log filters

  • Subscription
  • Timespan
  • Event severity
  • Resource group
  • Resource
  • Resource type
  • Operation name
  • Event initiated by
  • Text string in search box

Event categories

| Event category | Event Data | Examples |
| --- | --- | --- |
| Administrative | Create, update, delete and action operations & changes to RBAC | Creating VMs, deleting NSG |
| Service Health | All Service health events | SQL Azure in EAST US has downtime, scheduled maintenance complete |
| Resource health | Resource health events | VM health status changed to unavailable |
| Alert | Activations of Azure alerts | CPU% on VM1 over 80% for more than 3 mins |
| Auto scale | Operation of auto scale engine | Auto scale up action failed |
| Recommendation | Recommendation events | Recommendation about better utilizing your resources |
| Security | Alerts generated by Defender for Cloud | Suspicious double extension file executed |
| Policy | Action operations by Azure Policy | Audit and Deny |

5.4 Azure alerts

5.4.1 Azure Monitor alerts

Alerting in Azure Monitor

  • Azure Monitor: capture telemetry data.
  • Create alerts.
  • An alert rule combines the target resources, the signals or telemetry to watch, and the conditions to match.
  • Action groups define the responsive steps.
  • The alert rule monitors telemetry and captures changes to resources.
  • The alert rule captures the signal and checks whether it matches the condition criteria.
  • When the conditions are met, the alert fires and triggers its action groups.
  • Conditions and alerts triggered are evaluated separately.

Benefits of Azure alerts

| Benefits | Description |
| --- | --- |
| Improved notification system | Action groups for newer alerts |
| Unified authoring experience | Alert creation in 1 place, Azure Monitor, Log Analytics and Azure Application Insights |
| Combined view for Log Analytics alerts | Monitor Log Analytics alerts for subscriptions, separate portal for Azure Monitor (Log Analytics) |
| Separation of active alerts and alert rules | Separate operational and config views of alert, alert rules and actions |
| Better workflow | Discover and define settings and conditions to trigger alerts |

5.4.2 Alert management in Azure Monitor

Alert types

  • Metric alerts.
  • Log alerts.
  • Activity log events.
  • Smart detection alerts.

Alert states

  • New
  • Acknowledged: review
  • Closed

Alert state and Azure Monitor condition

  • When an alert first triggers, its state is New; an administrator changes the alert state afterward.
  • The system sets and updates the Azure Monitor condition.
  • The Azure Monitor condition changes to Fired when the alert triggers.
  • When the underlying issue clears, the condition changes to Resolved.

Stateless and stateful alerts

  • Stateless alerts: fire each time your alert rule condition matches your data, even if the same alert already exists.
  • Stateful alerts: fire when the condition is met, then don't trigger any more actions until the current alert rule conditions clear.
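
The difference is easiest to see by counting how often actions run over a series of rule evaluations. The function below is a simplified model, not Azure's implementation; `actions_fired` and its inputs are hypothetical.

```python
def actions_fired(condition_history, stateful):
    """Count how many times the action group would run.

    condition_history: one boolean per rule evaluation (True = condition met).
    """
    fired = 0
    alert_active = False
    for met in condition_history:
        if not met:
            alert_active = False  # condition cleared, a stateful alert resolves
            continue
        if stateful and alert_active:
            continue              # already fired, stay quiet until it clears
        fired += 1
        alert_active = True
    return fired


history = [True, True, True, False, True]
print(actions_fired(history, stateful=False))  # 4
print(actions_fired(history, stateful=True))   # 2
```

With the same history, the stateless rule runs its actions on every matching evaluation, while the stateful rule runs them once per incident.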

5.4.3 Creating alert rules

Alert rules

The alert rules consist of resources, action groups, and monitor conditions that represent the target and criteria for your alert operation.

  • Several key attributes: target resource, alert signal, rule criteria, issue severity, name and description.
  • Target resource defines scope and signals for your alert operation.
  • The alert signal is based on the selected target resource type (Metric, Activity log, Application Insights or Log).
  • Criteria are defined for the alert rule and applied to the target resource.
  • Severity level for the alert rule, from 0 to 4.
  • When the rule fires, the system invokes its actions (the responsive steps).
  • A new alert rule is enabled by default; disable it manually if you don't want the alert rule to trigger.
  • Alert can only be triggered when alert rule is in enabled state.
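
The key attributes and the enabled/disabled behavior can be sketched as a small data structure. `AlertRule` and its field names are hypothetical and only mirror the attributes listed above; they are not the Azure SDK model.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical alert rule holding the key attributes from the notes."""
    name: str
    target_resource: str
    signal: str
    severity: int         # 0 to 4
    enabled: bool = True  # new rules are enabled by default

    def __post_init__(self) -> None:
        if not 0 <= self.severity <= 4:
            raise ValueError("severity must be between 0 and 4")

    def can_trigger(self, condition_met: bool) -> bool:
        # An alert can only trigger while the rule is enabled.
        return self.enabled and condition_met


rule = AlertRule("high-cpu", "vm1", "Percentage CPU", severity=2)
print(rule.can_trigger(True))   # True
rule.enabled = False
print(rule.can_trigger(True))   # False
```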

5.4.4 Creating action groups

Action Groups

An action group is a collection of notification preferences that you define as an Azure subscription owner.

  • Multiple alerts can use the same action group.
  • Notifications define how users are notified when the action group triggers.
  • Actions define what runs when the action group triggers.

Notifications

  • Email Azure Resource Manager Role.
  • Email/SMS message/Push/Voice.

Actions

  • Automation runbook
  • Azure Functions
  • ITSM
  • Logic Apps
  • Webhook

5.5 Log Analytics

5.5.1 Log Analytics

What is Log Analytics

Log Analytics is a tool for Azure Monitor. Edit and run log queries for the data collected in Azure Monitor Logs.

  • Query features and tools that help you answer virtually any question about your monitored configuration.
  • Supports Kusto Query Language (KQL).
  • Use Log Analytics to perform detailed analysis and problem solving.

5.5.2 Log Analytics Workspace

What is Log Analytics Workspace (LAW)

Azure stores the collected information in a Log Analytics workspace. It is the basic management environment for Azure Monitor Logs.

  • Name
  • Subscription
  • Resource Group
  • Region
  • Pricing

5.5.3 Kusto (KQL) queries

KQL

The KQL syntax helps you quickly and easily create simple or complex queries to retrieve and consolidate your monitoring data in the repository.

KQL Concepts

  • View table data in the Azure Monitor Logs repository.
  • Create simple and complex queries.
  • Filter and summarize search results.
  • Add visualizations for search results.

5.5.4 Structure Log Analytics queries

KQL Query structure

Each of your selected data sources and solution stores its data in dedicated tables in your Log Analytics workspace. Documentation for each data source and solution includes the name of the data type that it creates and a description of each of its properties. The basic structure of a query is a source table followed by a series of commands (referred to as operators). A query can have a chain of multiple operators to refine your data and perform advanced functions. Each operator in a query chain begins with a pipe character "|". Many queries require data from a single table only, but other queries can use various options and include data from multiple tables.
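
The "source table followed by piped operators" structure can be sketched by assembling a query string. This is a minimal illustration: `build_query` is a hypothetical helper, and the example assumes the workspace contains the `Heartbeat` table with its `TimeGenerated` and `Computer` columns.

```python
def build_query(table, operators):
    """Join a source table and its operators into one KQL query string."""
    lines = [table] + [f"| {op}" for op in operators]
    return "\n".join(lines)


query = build_query(
    "Heartbeat",
    [
        "where TimeGenerated > ago(1h)",
        "summarize LastSeen = max(TimeGenerated) by Computer",
        "order by LastSeen desc",
    ],
)
print(query)
```

The printed text can be pasted into Log Analytics: the first line names the table, and each following line starts with a pipe and refines the result of the line before it.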


5.6 Network Watcher

5.6.1 Network Watcher

Azure Network Watcher

Azure Network Watcher provides a suite of tools to monitor, diagnose, view metrics and enable or disable logs for Azure IaaS resources. It enables you to monitor and repair the network health of IaaS services like VMs, Vnets, application gateways, load balancers, etc. It offers three major sets of tools: Monitoring, Network diagnostic tools and Traffic.

| Feature | Tool | Description |
| --- | --- | --- |
| Topology | Monitoring | A visualization of your entire network to understand your network configuration. Interactive interface to view resources and their relationships in Azure across multiple subscriptions, resource groups and locations. |
| Connection Monitor | Monitoring | End-to-end connection monitoring for Azure and hybrid endpoints. Understand network performance between various endpoints in your network infrastructure. |
| IP flow verify | Network diagnostic tools | Detect traffic filtering issues at a VM level. Checks if a packet is allowed or denied to or from an IP address. Checks which security rule allowed or denied the traffic. |
| NSG diagnostics | Network diagnostic tools | Checks if a packet is allowed or denied to or from an IP address, IP prefix or a service tag. Understand network performance between various endpoints in your network infrastructure. Checks which security rule allowed or denied the traffic. Add a new security rule with a higher priority to allow or deny traffic. |
| Next hop | Network diagnostic tools | Detect routing issues. Checks if traffic is routed correctly to intended destination. Provides information about Next hop type, IP address and Route table ID for specific destination IP address. |
| Effective security rules | Network diagnostic tools | View effective security rules applied to network interface. Shows all security rules applied to the network interface, the subnet the network interface is in and the aggregate of both. |
| Connection Troubleshoot | Network diagnostic tools | Test connection between VM, VM scale set, application gateway or a Bastion and a VM, FQDN, URI or IPv4 address. Test the connection at a point in time instead of monitoring it over time (different compared to Connection Monitor). |
| Packet capture | Network diagnostic tools | Remotely create packet capture sessions to track traffic to and from a VM or VM scale set. |
| VPN Troubleshoot | Network diagnostic tools | Troubleshoot virtual network gateways and their connections. |
| Flow logs | Traffic | Log information about Azure IP traffic and stores data in Azure storage. Log IP traffic flowing through a network security group or Azure virtual network. |
| Traffic analytics | Traffic | Rich visualizations of flow logs data. |

5.6.2 IP Flow verify diagnostics

What is IP Flow verify

Checks connectivity from or to the internet, and from or to your on-premises environment. This feature helps you identify if a security rule is blocking traffic to or from your virtual machine or the internet.

Functionality of IP Flow verify

  • Configure with following properties: VM and network interface, source port number, destination IP address and remote port number, TCP or UDP and traffic direction (inbound or outbound).
  • Communication with machine succeeds or fails.
  • Returns the name of the security rule if the packet is denied because of an NSG.
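
The check can be pictured as evaluating security rules in priority order and reporting the first match. The sketch below is a simplified local model, not a call to Network Watcher; `Rule`, `ip_flow_verify` and the sample rules are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int       # lower number is evaluated first
    direction: str      # "Inbound" or "Outbound"
    protocol: str       # "TCP", "UDP" or "*"
    remote_prefix: str  # CIDR range of the remote address
    local_port: str     # port number as text, or "*"
    access: str         # "Allow" or "Deny"


def ip_flow_verify(rules, direction, protocol, remote_ip, local_port):
    """Return (access, rule name) of the first matching rule by priority."""
    addr = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction != direction:
            continue
        if rule.protocol not in ("*", protocol):
            continue
        if addr not in ipaddress.ip_network(rule.remote_prefix):
            continue
        if rule.local_port not in ("*", str(local_port)):
            continue
        return rule.access, rule.name
    return "Deny", None  # nothing matched in this simplified model


rules = [
    Rule("allow-https", 100, "Inbound", "TCP", "0.0.0.0/0", "443", "Allow"),
    Rule("deny-all-inbound", 4096, "Inbound", "*", "0.0.0.0/0", "*", "Deny"),
]
print(ip_flow_verify(rules, "Inbound", "TCP", "203.0.113.10", 22))
```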

5.6.3 Next hop diagnostics

What is next hop

Checks if traffic is being directed to the intended destination. Next hop tests the communication between the source and destination, and reports the type of next hop in the traffic route.

Next hop configuration properties

  • Properties: subscription and resource group, VM and network interface, source IP address, Destination IP address.
  • Test next connection point in your network route configuration.
  • Next hop test returns 3 items: next hop type, IP address of next hop and route table for next hop.
  • Next hop examples: Internet, Virtual Network and Virtual Network Service Endpoint.
  • If next hop is UDR, process returns UDR route, otherwise system route is returned.
  • If the next hop type is None, no next hop exists to route the traffic to the target.
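
A rough model of how a next hop result is chosen: the most specific matching prefix wins, and a user-defined route (UDR) beats a system route with the same prefix. The route dictionaries, the `next_hop` helper and the route table name are hypothetical; real route selection also involves BGP routes, which are left out here.

```python
import ipaddress


def next_hop(routes, destination_ip):
    """Pick the route for destination_ip: longest prefix first, UDR wins ties."""
    addr = ipaddress.ip_address(destination_ip)
    matches = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return {"type": "None", "ip": None, "route_table": None}
    best = max(
        matches,
        key=lambda r: (ipaddress.ip_network(r["prefix"]).prefixlen,
                       r["source"] == "User"),
    )
    return {
        "type": best["type"],
        "ip": best.get("ip"),
        # System routes have no route table of their own in this model.
        "route_table": best.get("route_table", "System Route"),
    }


routes = [
    {"prefix": "10.0.0.0/16", "source": "Default", "type": "VirtualNetwork"},
    {"prefix": "0.0.0.0/0", "source": "Default", "type": "Internet"},
    {"prefix": "0.0.0.0/0", "source": "User", "type": "VirtualAppliance",
     "ip": "10.0.2.4", "route_table": "rt-firewall"},
]
print(next_hop(routes, "10.0.1.5"))
print(next_hop(routes, "8.8.8.8"))
```

The result carries the same three items the notes list: next hop type, IP address of the next hop, and the route table.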

5.6.4 Visualize network topology

What is network topology

Azure Network Watcher provides a network monitoring topology tool to help administrators visualize and understand infrastructure.

  • Visual diagram of resources in a Vnet.
  • Shows resources in network, interconnections and relationships with each other.
  • View subnets, VMs, network interfaces, public IP addresses, NSGs, route tables, etc.
  • Need a Network Watcher in same region as Vnet to generate topology.

5.7 Improving incident response with alerting in Azure

5.7.1 Data types in Azure Monitor

A data type can be a metric, a log, or both a metric and a log.

  • Metric-based data types: numerical time-sensitive values that represent some aspect of the target resource.
  • Log-based data types: querying of content data held in structured, record-based log files that are relevant to the target resource.
  • Metric alerts
  • Activity log alerts
  • Log alerts

Composition of an alert rule

  • Resource
  • Condition
  • Actions
  • Alert Details

Scope of alert rules

  • Metric values.
  • Log search queries.
  • Activity log events.
  • Health of the underlying Azure platform.
  • Tests for website availability.
  • Manage alert rules.
  • Enable or disable alert rules as needed.

Alert summary view

Summary of all alerts. Apply filters using categories: subscriptions, alert condition, severity or time ranges.

Alert condition

System sets alert condition, Fired or Resolved.

5.7.2 Metric alerts for performance issues

When to use metric alerts

Regular threshold monitoring of Azure resources. You can receive alerts when your server CPU utilization is reaching a critical threshold of 90%, database storage is getting low or when network latency is reaching unacceptable levels.

Metric alert composition

Define the type of statistical analysis (static or dynamic), define the period of data to be assessed and define the frequency for the alert condition to check.

Use static threshold metric alerts

With static metrics, you specify the threshold that's used to trigger the alert or notification.

Use dynamic threshold metric alerts

Dynamic metric alerts use machine-learning tools to automatically improve the accuracy of the thresholds defined by the initial rule.

  • Look-back period.
  • Number of violations.
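
The look-back period and number of violations work together: the rule fires only when enough of the most recent evaluation periods breach the threshold. The sketch below shows just that counting step with a fixed threshold; the machine-learned threshold itself is not modeled, and `should_fire` is a hypothetical helper.

```python
def should_fire(values, threshold, lookback_periods, min_violations):
    """Fire when at least min_violations of the last lookback_periods
    values exceed the threshold."""
    if lookback_periods < 1:
        raise ValueError("lookback_periods must be at least 1")
    recent = values[-lookback_periods:]
    violations = sum(1 for v in recent if v > threshold)
    return violations >= min_violations


cpu_percent = [55, 92, 60, 95, 97]
print(should_fire(cpu_percent, 90, lookback_periods=4, min_violations=3))  # True
print(should_fire(cpu_percent, 90, lookback_periods=4, min_violations=4))  # False
```

Requiring several violations inside the look-back window keeps a single spike from triggering the alert.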

Understand dimensions

Enable monitoring data to be supplied from multiple target instances. Use dimensions to define a metric alert rule and let it apply to multiple instances.

Scale metric alerts

Scaling a metric alert rule lets a single rule monitor multiple resources; you select all the resources that you want to monitor.

5.7.3 Log alerts on events in your application

When to use log alerts

Log alerts use log data to assess the rule logic and, if necessary, trigger an alert. This data can come from any Azure resource: server logs, application server logs, or application logs.

How do log alerts work

The first part of a log alert defines the log search rule. When a log search evaluates as positive, it creates an alert record and triggers any associated actions.

Log search rules composition

  • Log query
  • Time period
  • Frequency
  • Threshold

Number of records

This type of log search returns a single alert when the number of records in a search result reaches or exceeds the value for the number of records (threshold). (syslog and web-app responses)

Metric measurement

  • Aggregate function
  • Group field
  • Interval
  • Threshold
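
A metric measurement search can be pictured as grouping records, aggregating a value per group, and comparing each aggregate with the threshold. The sketch below uses an average as the aggregate function and omits the interval for brevity; `metric_measurement` and the sample records are hypothetical.

```python
from collections import defaultdict


def metric_measurement(records, group_field, value_field, threshold):
    """Return the groups whose average value exceeds the threshold."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for rec in records:
        bucket = totals[rec[group_field]]
        bucket[0] += rec[value_field]
        bucket[1] += 1
    return sorted(g for g, (total, n) in totals.items() if total / n > threshold)


records = [
    {"Computer": "vm1", "cpu": 95},
    {"Computer": "vm1", "cpu": 91},
    {"Computer": "vm2", "cpu": 40},
]
print(metric_measurement(records, "Computer", "cpu", threshold=90))  # ['vm1']
```

Unlike a number-of-records search, this produces one result per group, so each computer that breaches the threshold can raise its own alert.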

Stateless nature of log alerts

A stateless log alert will generate new alerts every time the rule criteria are triggered, regardless of whether the alert was previously recorded.

5.7.4 Activity log alerts for events in your Azure infrastructure

Activity log alerts

Activity log alerts allow you to be notified when a specific event happens on some Azure resource. It can also include alerts for Azure service health. Activity log alerts are based on events.

When to use activity log alerts

Activity log alerts are designed to work with Azure resources. Typically, you'd create this type of alert to receive notifications when specific changes occur on a resource within your Azure subscription.

  • Specific operations.
  • Service health events.

Activity log alert composition

  • Category
  • Scope
  • Resource group
  • Resource type
  • Operation name
  • Level
  • Status
  • Event initiated by

5.7.5 Action groups and alert processing rules

Action groups

An action group is a collection of notification preferences and actions that are executed when the alert is fired. You can run one or more actions for each triggered alert.

Alert processing rules

Override the normal behavior of a fired alert by adding or suppressing an action group.

  • Suppress notifications during a planned maintenance window.
  • Implement management at scale.
  • Add action group to all alert types.
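
Suppression during a maintenance window can be sketched as a filter applied when an alert fires. This is a simplified model with hypothetical names (`effective_action_groups`, `ops-email`); it only covers the suppression case, not adding action groups.

```python
from datetime import datetime


def effective_action_groups(alert_time, action_groups, suppression_windows):
    """Return the action groups that still run for an alert fired at alert_time."""
    for start, end in suppression_windows:
        if start <= alert_time < end:
            return []  # inside a maintenance window: suppress all actions
    return list(action_groups)


maintenance = [(datetime(2024, 5, 1, 22, 0), datetime(2024, 5, 2, 2, 0))]
print(effective_action_groups(datetime(2024, 5, 1, 23, 30), ["ops-email"], maintenance))  # []
print(effective_action_groups(datetime(2024, 5, 2, 3, 0), ["ops-email"], maintenance))
```

The alert rule itself is untouched; only the actions attached to the fired alert change.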

5.8 Analyze Azure infrastructure using Azure Monitor logs

5.8.1 Azure Monitor Logs

Azure Monitor

Azure Monitor is a service for collecting and analyzing telemetry. It helps you get maximum performance and availability for your cloud applications and for your on-premises resources and applications.

Data collection in Azure Monitor

Azure Monitor collects metrics and logs. Metrics show how a resource is performing. Logs show when resources are created or modified.

  • Application data
  • Operating-system data
  • Azure resource data
  • Azure subscription
  • Azure tenant data

Extend data that Azure Monitor collects

  • Enabling diagnostics
  • Adding an agent

Logs

Azure Monitor stores log data in a Log Analytics workspace.

Metrics

Metrics are numerical values that describe some aspect of a system at a point in time. Azure Monitor can capture metrics in near-real time. They are stored in a time-series database.

Analyzing logs by using Kusto query language

To retrieve, consolidate, and analyze data, you can specify a query to run in Azure Monitor logs. You can write a log query with the Kusto query language.


5.9 Monitor Azure VMs with Azure Monitor