【Azure 环境】 介绍两种常规的方法来监视Window系统的CPU高时的进程信息: Performance Monitor 和 Powershell Get Counter - LuBu0505/My-Code GitHub Wiki

问题描述

部署在Azure上的VM资源,偶尔CPU飙高,但是发现的时候已经恢复,无法判断当时High CPU原因。

在Windows系统中,有什么方式能记录CPU被进程占用情况,查找出当时是排名前列的进程信息,用于后期分析。 

问题解答

方式一:Performance Monitor

可以通过Windows系统自带的 Performance Monitor 来获取高CPU的情况。(缺点:Performance Monitor需要长期运行,对于没有规律且短时间无法重现的情况,不太适用) image.png

图形版的操作步骤可以参考博文: 【Azure微服务 Service Fabric 】在SF节点中开启Performance Monitor及设置抓取进程的方式

也可以通过CMD直接调用Performance Monitor的进程(Logman.exe )创建Monitor和开启,停止

第一步:创建Performance Monitor 指标 (Counter)

| Logman.exe create counter Perf-1min -f bin -max 500 -c "\LogicalDisk()*" "\Memory*" "\Network Interface()*" "\Paging File()*" "\PhysicalDisk()*" "\Server*" "\System*" "\Process()*" "\Processor()*" "\Cache*" "\GPU Adapter Memory()*" "\GPU Engine()*" "\GPU Local Adapter Memory()*" "\GPU Non Local Adapter Memory()*" "\GPU Process Memory(*)*" -si 00:01:00 -cnf 24:00:00 -v mmddhhmm -o C:\PerfMonLogs\Perf-1min.blg |

***注:*counter的名称,-si 间隔时间, -cnf 固定的时间间隔, -o文件输出路径都可以自定义修改。

第二步:开启

Logman start Perf-1min

第三步:停止

Logman stop Perf-1min

CMD运行效果:

image.png

方式二:Powershell Get-Counter 

The Get-Counter cmdlet gets performance counter data directly from the performance monitoring instrumentation in the Windows family of operating systems.  Gets performance counter data from local and remote computers.

Get-Counter 直接从 Windows 系列操作系统中的性能监视检测中获取性能计数器数据。

You can use the Get-Counter parameters to specify one or more computers, list the performance counter sets and the instances they contain, set the sample intervals, and specify the maximum number of samples. Without parameters, Get-Counter gets performance counter data for a set of system counters.

可以使用 Get-Counter 参数指定一台或多台计算机,列出性能计数器集及其包含的实例,设置采样间隔,以及指定最大样本数。如果不带参数,Get-Counter 将获取一组系统计数器的性能计数器数据。

Many counter sets are protected by access control lists (ACL). To see all counter sets, open PowerShell with the Run as administrator option.

许多计数器集受访问控制列表 (ACL) 的保护。若要查看所有计数器集,请使用“以管理员身份运行”选项打开 PowerShell。

本文中示例为:(保存脚本为 getcpu.ps1,直接运行输入间隔时间(秒)和CPU阈值,脚本会长时间运行)


#获取总的cpu
function all_cpu(){
$total = Get-Counter "\Process(*)\% Processor Time" -ErrorAction SilentlyContinue | select -ExpandProperty CounterSamples | where InstanceName -eq _total
$idle = Get-Counter "\Process(*)\% Processor Time" -ErrorAction SilentlyContinue | select -ExpandProperty CounterSamples | where InstanceName -eq idle
$cpu_total = ($total.cookedvalue-$idle.cookedvalue)/100/$env:NUMBER_OF_PROCESSORS
return $cpu_total.tostring("P")
}


#获取前五cpu占用
function get_top_5(){
Get-Counter "\Process(*)\% Processor Time" -ErrorAction SilentlyContinue `
  | select -ExpandProperty CounterSamples `
  | where {$_.Status -eq 0 -and $_.instancename -notin "_total", "idle"} `
  | sort CookedValue -Descending `
  | select TimeStamp,
    @{N="Name";E={
        $friendlyName = $_.InstanceName
        try {
            $procId = [System.Diagnostics.Process]::GetProcessesByName($_.InstanceName)[0].Id
            $proc = Get-WmiObject -Query "SELECT ProcessId, ExecutablePath FROM Win32_Process WHERE ProcessId=$procId"
            $procPath = ($proc | where { $_.ExecutablePath } | select -First 1).ExecutablePath
            $friendlyName = [System.Diagnostics.FileVersionInfo]::GetVersionInfo($procPath).FileDescription
        } catch { }
        $friendlyName
    }}, 
    @{N="ID";E={
        $friendlyName = $_.InstanceName
        [System.Diagnostics.Process]::GetProcessesByName($_.InstanceName)[0].Id}} ,
    @{N="CPU";E={($_.CookedValue/100/$env:NUMBER_OF_PROCESSORS).ToString("P")}} -First 5 `
 | ft -a 
}



#主函数
#入参,间隔时间, CPU阈值
function main{
 param(
        [parameter(Mandatory=$true)]
        [ValidateNotNullOrEmpty()][int] $sleeptime,
    [parameter(Mandatory=$true)]
        [ValidateNotNullOrEmpty()][int] $cpu_set
    )
$count = 0
while(1){
$count += 1
echo "Check CPU times : $count"
if ($(all_cpu)*100 -gt $cpu_set){
echo "===== ===== Start Logging ====== =====" >> C:\checkcpu_all_cpu.log
echo "CPU : $(all_cpu)" >>  C:\checkcpu_all_cpu.log
echo $(get_top_5) >>  C:\checkcpu_top_5.log
}


#每隔多少秒运行一次
start-sleep -s $sleeptime
}

}

#执行主函数
main 

执行效果: image.png

参考资料

Get-Counterhttps://docs.microsoft.com/zh-cn/powershell/module/microsoft.powershell.diagnostics/get-counter?view=powershell-7.2

[END]

当在复杂的环境中面临问题,格物之道需:浊而静之徐清,安以动之徐生。 云中,恰是如此!

分类: 【Azure 环境】【Azure Developer】

标签: Azure 环境VM Windows Performance MonitorPowershell Get-Counter