Challenge 4: Monitoring your solution - alan9259/animal-adoption-portal GitHub Wiki

https://azuresprintcommon.blob.core.windows.net/content/githubdevops/challenges/teams-based/challenge4/README.html

TAA has received some feedback and would like to increase the reliability of the web portal. You have been tasked with reducing the resolution time of outages on the website. Proper tooling is key to timely resolution and also enables automatic rollback when problems are detected.

In this challenge you will implement Application Insights, set up Azure Monitor, and integrate both with GitHub Actions.

Exercise 1

Now you will create an Application Insights resource in Azure, then integrate it with the web portal project so that requests and errors are logged.

NOTE: You will have to use Visual Studio to complete this exercise as Visual Studio Code does not have the required functionality.

1. Create an Application Insights resource in the production Resource Group.
2. Add Application Insights Telemetry support to the AnimalAdoption.Web.Portal project through the context menu. When configuring the dependency, make sure that you select Azure Application Insights, not Application Insights SDK (Local), and leave the rest of the settings at their defaults.
3. Commit your changes and wait for your workflow to complete.
4. Add the Application Insights Instrumentation Key as a repository secret, then update your production App Settings step to set APPINSIGHTS_INSTRUMENTATIONKEY to the value of this secret; otherwise telemetry will not come through.
5. Open the Application Insights Overview blade in a new tab. In a separate tab, load the production TAA web portal and navigate between the various pages.
6. Ensure that telemetry is being displayed in Application Insights. Data may take up to 5 minutes to appear in Azure.
7. Update the SimulatedFailureChance App Setting and reload the TAA web portal until you have seen a couple of failed requests come through.
8. Check back in Application Insights and drill down into the failed requests.
9. Set SimulatedFailureChance back to 0.

Hints

- The sample code includes an environment variable SimulatedFailureChance that can be set in each service to simulate errors. You can use this to see Application Insights in action without changing the code! Set it to a value between 0 and 100, as it is a percentage.
- Application Insights has extensions for Visual Studio that can make it easier to view alerts inline with your code; however, most of the functionality is available in the Azure Portal without requiring Visual Studio.

Helpful resources

- Start monitoring your ASP.NET Core Application
- Application Insights auto-instrumentation (this is currently NOT supported for .NET Core App Services on Linux App Service Plans)
- Diagnosing failures using the Azure portal

Exercise 2

It is important to know if external users are experiencing issues such as slow loading times or errors when loading the web portal. Application Insights provides availability tests to test uptime and responsiveness. It also makes it easy to quickly turn errors into issues within GitHub.

1. Set up Application Insights to monitor the availability of the production web portal.
2. Stop the web portal and ensure that an error was generated in the availability plot. Data may take up to 5 minutes to appear in Azure.
3. Try changing the view mode of the availability plot and interacting with the points.
4. Use the Application Insights interface to create a GitHub Issue from a failed availability test.
5. Start the web portal again and ensure that the availability tests start to return successfully.

Hints

- URL Ping Tests don't use Internet Control Message Protocol (ICMP) for checking availability; they will work with any website that responds to HTTP/HTTPS.
- Make sure to select the region that your resources are deployed to as one of the Test locations.
- It is recommended to set the Test frequency and Test timeout parameters to the smallest possible values to avoid long wait times.

Helpful resources

- Monitor Web App Availability
- Application Insights: Work item integration with GitHub
- Mastering Issues

Exercise 3

There are situations where critical errors need to be immediately surfaced to the team rather than requiring someone to view a dashboard in Application Insights. Using Azure Monitor, you can configure rules that alert users to issues within your environment.

1. Set up an alert that sends an email when errors occur above a threshold for five minutes.
2. Stop the web portal and ensure that an alert activated email comes through.
3. Start the web portal and ensure that an alert deactivated email comes through.

Hints

- Azure Monitor can be used to monitor multiple resources within Azure, not just Application Insights. Have a look to see what other metrics are available for the other resources you have deployed.
- You will need to create Action Groups for alert notifications.
- You can create alert rules directly from the Availability page by using the context menu for an availability test.
- You can view the currently active alerts on the Alerts page.

Helpful resources

- Overview of alerts in Microsoft Azure
- Create Azure Monitor Alerts with Application Insights
- Record runtime exceptions with Azure Monitor
- Availability Alerts

Exercise 4

Now that metrics are flowing into Azure Monitor, GitHub Actions can be integrated to ensure that a new deployment did not cause an increase in failed traffic. Set up GitHub Actions and Azure Monitor to fail a release and switch slots back to the previous state if errors occur. For this to work you will need to set up an Alert Rule that uses the availability test you created earlier.

1. In the Monitoring > Alerts section of your Application Insights resource, create a new alert rule that will send an email when the average availability of the production web portal's production slot is below 60% in a 5 minute period.
   - You can reuse your existing action group for email notifications.
   - While configuring the alert, you can use Split by dimensions to target a specific test.
2. You will need a new job in your GitHub Actions workflow that authenticates with Azure, then uses Azure PowerShell to check the status of your alert rule. This should:
   - Come after the release job in the release workflow.
   - Only run if the release job succeeds.
   - Have the enable-AzPSSession property of the Azure Login step set to true.
   - Check whether your new alert rule has generated alerts within the last hour at the severity used by the alert rule you created above.
   - Use the Azure PowerShell action to install any required modules, check whether there are any unresolved alerts for the production slot alert rule in the last hour, then save whether or not there are unresolved alerts using a combination of PowerShell variables and GitHub Actions output parameters. Set azPSVersion to latest, errorActionPreference to stop and failOnStandardError to true.
3. Add a second new job to your workflow. It should authenticate with Azure, then use the Azure CLI to switch slots. This should:
   - Come after the Azure Monitor job.
   - Only run if the release job succeeds AND the production slot alert output variable is set to true.
   - Use the Azure CLI to swap the App Service from the staging (source) slot to the production (target) slot. This assumes that, since production has errors, you want to roll back to what is currently deployed on the staging slot.
   - Set the App Settings correctly for the production slot (during a swap, App Settings are copied over, so the value for APPINSIGHTS_INSTRUMENTATIONKEY will be missing). You should also make use of the slot-name property of the appservice-settings action inside this job.
4. Test that your workflow runs as expected in sequence (the final job shouldn't be triggered yet).
5. Ensure that you are deploying to the production slot in the release job.
6. Ensure that the App Settings for the staging slot match those configured for the production slot (as App Settings are copied between slots when a swap occurs). Then remove the App Settings from the production slot (to "break" it).
7. Restart the production App Service and test that you encounter missing environment variable errors when trying to load the TAA web portal (you may need to hard refresh).
8. Take the production App Service offline.
9. Wait until the alert email comes through for the alert you have configured in your check azure monitor for errors step (you may get multiple emails for different alerts depending on what you have configured). Read the steps below while you wait. You can also check the Alerts section of your Application Insights resource.
10. Load up the production App Service Overview page in Azure and your workflow in separate tabs. Note: the next steps need to be run quickly in sequence, otherwise the availability test will run and show as online by the time your workflow gets to checking Azure Monitor for alerts.
11. Immediately: run your workflow.
12. As soon as the release to production job starts: bring the production App Service back online (so that deployment and switching the slots will work).
13. Ensure that the rollback/switch slots job runs. This might fail if your workflow run has to wait in a queue before being processed. To avoid this, increase the Test frequency value for the availability test to 10 minutes (or a value that is larger than the time it takes for your release to production job and the login to azure step of the check for azure monitor job), then start again from step 8.
14. Ensure that the production TAA web portal now works.
15. Discuss with your team whether you want to add workflow variables for the names of your web app slots to make future updates easier. If so, add the variables and replace all slot-related usages of "staging" and "production" with these variables.

Hints

- You can test a failure by stopping the App Service.
- Ensure that the new jobs have the needs and if properties correctly set so that they run in the right order and under the correct circumstances.
- To get the currently active alerts you can use the az rest command along with the getall method inside Azure Cloud Shell or the Azure CLI on your local machine to quickly test queries. NOTE: In your GitHub workflow you will have to use the Azure PowerShell action with the Az.AlertManagement module to retrieve the alert status.
- If you view all of the alerts by clicking the table entries in Application Insights > Alerts, you will be able to set filters that align with some of the URI parameters, which will help you determine which values to use.
- Since the Get-AzAlert command returns a PSAlert object, you can use a null check on the result to decide what value to set the output parameter to.
- In your if checks you will need to surround boolean values in single quotes for them to work correctly, e.g. if: needs.check-for-azure-monitor-errors.outputs.productionSlotHasActiveAlerts == 'true'.
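The job ordering, conditions, and slot variables described above could be wired together roughly as in the following workflow fragment. This is a minimal sketch under stated assumptions, not the reference solution: the release job is assumed to be named release, and the rollback job name, the step id, the env values, the secret names, and the @v1 action versions are all hypothetical placeholders. The alert query itself is replaced by a placeholder step that always reports false.

```yaml
# Fragment only: triggers and the existing build/release jobs are omitted.
env:
  RESOURCE_GROUP: my-production-rg     # hypothetical resource group name
  WEB_APP_NAME: my-taa-web-portal      # hypothetical App Service name
  STAGING_SLOT: staging                # slot names kept as workflow variables
  PRODUCTION_SLOT: production

jobs:
  # The existing release job is assumed to be named "release".

  check-for-azure-monitor-errors:
    needs: release                     # run after the release job
    if: success()                      # only when the release job succeeded
    runs-on: ubuntu-latest
    outputs:
      # Expose the step output so that later jobs can read it through "needs".
      productionSlotHasActiveAlerts: ${{ steps.check-alerts.outputs.productionSlotHasActiveAlerts }}
    steps:
      # The Azure Login and Azure PowerShell steps are omitted in this sketch.
      - name: Check Azure Monitor for errors (placeholder)
        id: check-alerts
        shell: pwsh
        run: |
          # Placeholder result; the real step would query the alert rule.
          "productionSlotHasActiveAlerts=false" >> $env:GITHUB_OUTPUT

  rollback-production:                 # hypothetical job name
    needs: [release, check-for-azure-monitor-errors]
    # Job outputs are strings, so compare against the quoted value 'true'.
    if: success() && needs.check-for-azure-monitor-errors.outputs.productionSlotHasActiveAlerts == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Login to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}   # hypothetical secret name

      - name: Swap staging into production
        run: |
          az webapp deployment slot swap \
            --resource-group "$RESOURCE_GROUP" \
            --name "$WEB_APP_NAME" \
            --slot "$STAGING_SLOT" \
            --target-slot "$PRODUCTION_SLOT"

      - name: Restore production App Settings
        uses: azure/appservice-settings@v1
        with:
          app-name: ${{ env.WEB_APP_NAME }}
          slot-name: ${{ env.PRODUCTION_SLOT }}
          app-settings-json: |
            [
              {
                "name": "APPINSIGHTS_INSTRUMENTATIONKEY",
                "value": "${{ secrets.APPINSIGHTS_INSTRUMENTATIONKEY }}",
                "slotSetting": false
              }
            ]
```

Keeping the slot names in workflow-level env means a renamed slot only needs to be changed in one place. Note that the env context is not available in a job-level if expression, which is why the condition above relies only on needs.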
Helpful resources

- Create a log alert rule with the Azure portal
- Azure Login
- Azure PowerShell Action
- Workflow syntax for GitHub Actions
- Azure Alert Management REST API
- Azure PowerShell Alert Management Module
- Automatically assign command output to a variable in Powershell
- PowerShell - If Else Statement
- Setting an output parameter
- GitHub Action for Azure CLI
- Deployment Slots for Web Apps using the Azure CLI
- az webapp deployment slot swap reference
- What happens during a swap

Extra reading

- 7 best practices for continuous monitoring with Azure Monitor
- Martin Fowler on Blue Green Deployments

Spoilers

Read through the steps above again and see if they make more sense the second time around. Otherwise, talk to your coach before looking at these; all the information you require is provided in the instructions and links above.

- az rest examples
- Get-AzAlert examples
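As an illustration of the kind of alert check the exercise describes, a single Azure PowerShell step might look like the sketch below. It is not the content of the linked examples. It assumes an earlier Azure Login step with enable-AzPSSession set to true; the step id, the secret name PRODUCTION_ALERT_RULE_ID, and the Sev1 severity are hypothetical, and the Get-AzAlert parameter names and values shown should be treated as assumptions to verify against the Az.AlertManagement documentation.

```yaml
# Sketch of one step inside the alert-check job.
- name: Check Azure Monitor for errors
  id: check-alerts                     # hypothetical step id
  uses: azure/powershell@v1
  with:
    azPSVersion: latest
    errorActionPreference: stop
    failOnStandardError: true
    inlineScript: |
      # Install the module that provides Get-AzAlert.
      Install-Module -Name Az.AlertManagement -Force

      # Ask for fired (unresolved) alerts from one alert rule in the last hour.
      # Assumption: parameter names match the installed module version, and
      # Sev1 matches the severity chosen when the alert rule was created.
      $alerts = Get-AzAlert `
        -AlertRuleId '${{ secrets.PRODUCTION_ALERT_RULE_ID }}' `
        -MonitorCondition Fired `
        -Severity Sev1 `
        -TimeRange 1h

      # Null check: no matching alerts means nothing is currently firing.
      $hasAlerts = if ($null -ne $alerts) { 'true' } else { 'false' }

      # Publish the result as a step output so the job can expose it.
      "productionSlotHasActiveAlerts=$hasAlerts" >> $env:GITHUB_OUTPUT
```

The output is written as the string 'true' or 'false', which is why a later if check has to compare against a quoted value.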