Creating a Self Healing System Using PoShMon - HiltonGiesenow/PoShMon GitHub Wiki

Note: The information on this page is out of date - PoShMon now includes a "Self-repair" framework that you can hook into (see the Invoke-Repairs function) - documentation still to be updated.

While PoShMon itself only performs monitoring and notification, one of its important design decisions is to return the full output of the monitoring operation. This output, combined with the information in your PoShMonConfiguration instance, can be used to review any failing items and run scripts that correct them automatically. This means, for instance, that instead of just calling

Invoke-OSMonitoring -PoShMonConfiguration $poShMonConfiguration

for example, you should store the resulting output from the monitoring tests and use it together with the key structural information contained in the PoShMonConfiguration. To do so, change the above example to capture the resulting output, as follows:

$monitoringOutput = Invoke-OSMonitoring -PoShMonConfiguration $poShMonConfiguration

You can now pass $monitoringOutput and $poShMonConfiguration on to something that can review and hopefully correct any issues.
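
Before writing a repair routine, it helps to see how the output can be walked. The snippet below is a minimal sketch, not PoShMon's own code: it builds a hand-made stand-in for $monitoringOutput and lists the groups (servers) that have entries in a given section. The property names (SectionHeader, OutputValues, GroupName, GroupOutputValues) are the ones the repair function further down this page relies on; the sample values and the helper name Get-GroupsWithEntries are hypothetical, and the real output may carry additional fields.

```powershell
# Hand-built stand-in for the output of Invoke-OSMonitoring (sample values are made up).
$sampleOutput = @(
    [pscustomobject]@{
        SectionHeader = 'Error Event Log Issues'
        OutputValues  = @(
            [pscustomobject]@{
                GroupName         = 'SERVER01'
                GroupOutputValues = @(
                    [pscustomobject]@{
                        EventID = 6398
                        Message = "The Execute method of job definition 'Microsoft.Office.Server.Search.Administration.CustomDictionaryDeploymentJobDefinition'(ID [guid]) threw an exception."
                    }
                )
            },
            [pscustomobject]@{
                GroupName         = 'SERVER02'
                GroupOutputValues = @()
            }
        )
    }
)

# Hypothetical helper: returns the names of the groups in one section that have at least one entry.
function Get-GroupsWithEntries
{
    param(
        [object[]]$PoShMonOutputValues,
        [string]$SectionHeader
    )

    $section = $PoShMonOutputValues | Where-Object { $_.SectionHeader -eq $SectionHeader }

    foreach ($group in $section.OutputValues)
    {
        # @() guards against a single entry or $null being returned instead of an array
        if (@($group.GroupOutputValues).Count -gt 0) { $group.GroupName }
    }
}

$groupsWithEntries = @(Get-GroupsWithEntries -PoShMonOutputValues $sampleOutput -SectionHeader 'Error Event Log Issues')
```

With the sample data above, only SERVER01 has an entry, so it is the only name collected in $groupsWithEntries.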

As an example, one SharePoint system I've started administering recently has a quirk that the SharePoint "Configuration Cache" gets stale (I'm still working out the cause :>). This manifests as an error in the event log during a Search crawl, with EventId 6398 and message similar to

The Execute method of job definition 'Microsoft.Office.Server.Search.Administration.CustomDictionaryDeploymentJobDefinition'(ID [guid]) threw an exception. More information is included below.

In this case, I know that resetting the Config cache fixes the error, so why do it by hand every time? Instead, I can create a PowerShell script like the one below:

Function Repair-SP
{
    [CmdletBinding()]
    Param(
        [hashtable]$poShMonConfiguration,
        [object[]]$PoShMonOutputValues
    )

    Repair-SPConfigCache @PSBoundParameters
}

Where Repair-SPConfigCache looks similar to this:

Function Repair-SPConfigCache
{
    [CmdletBinding()]
    Param(
        [hashtable]$poShMonConfiguration,
        [object[]]$PoShMonOutputValues
    )

    Write-Host "Testing for 'CustomDictionaryDeploymentJobDefinition' Event Log entries"

    $errorEventLogs = $PoShMonOutputValues | Where SectionHeader -eq 'Error Event Log Issues'

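    # Where-Object's simplified syntax cannot combine conditions with -and, so use a script block.
    # Note that -like takes * as its wildcard, not %.
    $isConfigCacheFailure = { $_.EventID -eq 6398 -and $_.Message -like '*CustomDictionaryDeploymentJobDefinition*' }

    $failures = @($errorEventLogs.OutputValues.GroupOutputValues | Where-Object $isConfigCacheFailure)
    if ($failures.Count -gt 0)
    {
        # A generic List supports Add(); a plain @() array is fixed-size and would throw
        $servers = New-Object 'System.Collections.Generic.List[string]'

        foreach ($groupOutputValue in $errorEventLogs.OutputValues)
        {
            if (@($groupOutputValue.GroupOutputValues | Where-Object $isConfigCacheFailure).Count -gt 0)
            {
                if (!$servers.Contains($groupOutputValue.GroupName))
                    { $servers.Add($groupOutputValue.GroupName) }
            }
        }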

        . [path to file]\Clear-SPConfigCacheForRemoteServer.ps1

        foreach ($server in $servers)
        {
            $remoteSession = Connect-RemoteSession -ServerName $server -ConfigurationName $poShMonConfiguration.General.ConfigurationName

            Clear-SPConfigCacheForRemoteServer $remoteSession
        }
    } else {
        Write-Host "`tNone found"
    }
}

The final monitoring code would therefore look like:

$monitoringOutput = Invoke-OSMonitoring -PoShMonConfiguration $poShMonConfiguration

Repair-SP $poShMonConfiguration $monitoringOutput

Of course, I could have as many 'Repair-SPConfigCache'-type functions as I need, and voilà, I have a self-healing system that automatically corrects commonly occurring issues in my environment!
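
One way to plug in several repair functions is to have Repair-SP call each of them by name. The following is only a sketch under assumptions: the -RepairFunctionNames parameter and the three Repair-SPExample* stand-ins are hypothetical and exist purely so the snippet runs on its own. It is not the "Self-repair" framework that PoShMon now ships.

```powershell
# Sketch of a Repair-SP variant that runs several repair functions in turn.
# The -RepairFunctionNames parameter is an assumption made for this example.
function Repair-SP
{
    [CmdletBinding()]
    param(
        [hashtable]$PoShMonConfiguration,
        [object[]]$PoShMonOutputValues,
        [string[]]$RepairFunctionNames = @('Repair-SPConfigCache')
    )

    foreach ($name in $RepairFunctionNames)
    {
        try
        {
            # Invoke each repair by name, passing the same two arguments along
            & $name -PoShMonConfiguration $PoShMonConfiguration -PoShMonOutputValues $PoShMonOutputValues
        }
        catch
        {
            # One failing repair should not stop the remaining ones from running
            Write-Warning "Repair '$name' failed: $($_.Exception.Message)"
        }
    }
}

# Hypothetical stand-in repairs; real ones would inspect the output and fix something.
function Repair-SPExampleOne   { param($PoShMonConfiguration, $PoShMonOutputValues) 'one ran' }
function Repair-SPExampleTwo   { param($PoShMonConfiguration, $PoShMonOutputValues) throw 'simulated failure' }
function Repair-SPExampleThree { param($PoShMonConfiguration, $PoShMonOutputValues) 'three ran' }

$results = @(Repair-SP -PoShMonConfiguration @{} -PoShMonOutputValues @() `
    -RepairFunctionNames 'Repair-SPExampleOne', 'Repair-SPExampleTwo', 'Repair-SPExampleThree' `
    -WarningAction SilentlyContinue)
```

Wrapping each call in try/catch means the simulated failure in the second stand-in is reported as a warning while the third still runs, which is usually what you want when the repairs are independent of each other.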