Tasklet IRQs - HewlettPackard/LinuxKI GitHub Wiki

LinuxKI Warning

ksoftirqd using 100% CPU on RHEL 7.1 and 7.2
03/01/2016

Problem

The ksoftirqd daemon may be observed using a large amount of CPU time after upgrading to RHEL 7.1 or RHEL 7.2.

Investigation

The LinuxKI Toolset was used to collect additional detail about the softirqs by including the irq subsystem in the KI dump collection (runki -s irq or runki -e all). In the KI data, the CPU/RunQ Analysis Report (kiinfo -kirunq) shows CPU 0 spending most of its time processing softirqs (72.68% in the softirq_user column):

Global CPU Counters

cpu node     Total Busy          sys          usr         idle  hardirq_sys hardirq_user hardirq_idle  softirq_sys softirq_user softirq_idle
  0 [ 0] :      100.00%        0.00%       27.32%        0.00%        0.00%        0.00%        0.00%        0.00%       72.68%        0.00%
  1 [ 0] :       29.42%       27.75%        1.00%       69.29%        0.07%        0.00%        0.11%        0.59%        0.01%        1.17%
  2 [ 0] :        3.13%        1.64%        1.43%       96.50%        0.01%        0.01%        0.03%        0.02%        0.03%        0.34%
  3 [ 0] :        3.92%        1.55%        2.30%       95.67%        0.01%        0.01%        0.03%        0.02%        0.03%        0.38%
  4 [ 0] :       12.57%        5.43%        7.09%       87.16%        0.00%        0.00%        0.03%        0.02%        0.02%        0.24%
  5 [ 0] :        3.68%        2.24%        1.34%       94.51%        0.01%        0.01%        0.35%        0.07%        0.03%        1.45%
Total            25.42%        6.44%        6.74%       73.89%        0.01%        0.00%        0.09%        0.12%       12.11%        0.60%
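
As a rough illustration of how this table can be checked programmatically, the sketch below parses the per-CPU rows and reports CPUs whose combined softirq time crosses a threshold. The function names, the column keys, and the 50% threshold are assumptions made for this example; they are not part of the LinuxKI Toolset.

```python
# Rows copied from the Global CPU Counters report above.
REPORT = """\
cpu node     Total Busy          sys          usr         idle  hardirq_sys hardirq_user hardirq_idle  softirq_sys softirq_user softirq_idle
  0 [ 0] :      100.00%        0.00%       27.32%        0.00%        0.00%        0.00%        0.00%        0.00%       72.68%        0.00%
  1 [ 0] :       29.42%       27.75%        1.00%       69.29%        0.07%        0.00%        0.11%        0.59%        0.01%        1.17%
  2 [ 0] :        3.13%        1.64%        1.43%       96.50%        0.01%        0.01%        0.03%        0.02%        0.03%        0.34%
  3 [ 0] :        3.92%        1.55%        2.30%       95.67%        0.01%        0.01%        0.03%        0.02%        0.03%        0.38%
  4 [ 0] :       12.57%        5.43%        7.09%       87.16%        0.00%        0.00%        0.03%        0.02%        0.02%        0.24%
  5 [ 0] :        3.68%        2.24%        1.34%       94.51%        0.01%        0.01%        0.35%        0.07%        0.03%        1.45%
Total            25.42%        6.44%        6.74%       73.89%        0.01%        0.00%        0.09%        0.12%       12.11%        0.60%
"""

# Hypothetical keys for the ten percentage columns, in report order.
COLUMNS = [
    "busy", "sys", "usr", "idle",
    "hardirq_sys", "hardirq_user", "hardirq_idle",
    "softirq_sys", "softirq_user", "softirq_idle",
]


def parse_cpu_rows(text):
    """Map each CPU number to a dict of its percentage columns."""
    rows = {}
    for line in text.splitlines():
        head, sep, tail = line.partition(":")
        if not sep:
            continue  # header and Total lines contain no ':' separator
        fields = head.split()
        if not fields or not fields[0].isdigit():
            continue
        values = [float(v.rstrip("%")) for v in tail.split()]
        if len(values) == len(COLUMNS):
            rows[int(fields[0])] = dict(zip(COLUMNS, values))
    return rows


def softirq_heavy_cpus(rows, threshold=50.0):
    """Return (cpu, softirq %) pairs where the three softirq columns sum to >= threshold."""
    heavy = []
    for cpu, row in sorted(rows.items()):
        total = row["softirq_sys"] + row["softirq_user"] + row["softirq_idle"]
        if total >= threshold:
            heavy.append((cpu, total))
    return heavy


if __name__ == "__main__":
    for cpu, pct in softirq_heavy_cpus(parse_cpu_rows(REPORT)):
        print(f"CPU {cpu}: {pct:.2f}% of time in softirq processing")
```

With the sample rows above, only CPU 0 is reported; every other CPU spends less than 2% of its time in softirq processing.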

Further analysis of the CPU/RunQ Analysis Report shows that the TASKLET softirq is the primary cause of the high CPU usage:

Soft IRQ events
===============
IRQ Name                Count      ElpTime
  6 TASKLET          66396240    14.497936
  3 NET_RX             175353     0.659607
  4 BLOCK               41437     0.214926
  1 TIMER               28060     0.010501
  7 SCHED                 566     0.002296
  9 RCU                   526     0.000309
  8 HRTIMER               131     0.000159
  2 NET_TX                 18     0.000046
    Total:           66642331    15.385780
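
To make the TASKLET share explicit, the following sketch parses the Soft IRQ events rows and computes each softirq's percentage of the reported elapsed time. The helper names are hypothetical and chosen only for this example.

```python
# Rows copied from the Soft IRQ events report above.
SOFTIRQ_REPORT = """\
IRQ Name                Count      ElpTime
  6 TASKLET          66396240    14.497936
  3 NET_RX             175353     0.659607
  4 BLOCK               41437     0.214926
  1 TIMER               28060     0.010501
  7 SCHED                 566     0.002296
  9 RCU                   526     0.000309
  8 HRTIMER               131     0.000159
  2 NET_TX                 18     0.000046
    Total:           66642331    15.385780
"""


def parse_softirqs(text):
    """Map each softirq name to (count, elapsed time) from the report rows."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have four fields: IRQ number, name, count, elapsed time.
        # The header and the Total line do not match and are skipped.
        if len(parts) == 4 and parts[0].isdigit():
            events[parts[1]] = (int(parts[2]), float(parts[3]))
    return events


def time_shares(events):
    """Return each softirq's percentage of the summed elapsed time."""
    total = sum(elapsed for _, elapsed in events.values())
    return {name: 100.0 * elapsed / total for name, (_, elapsed) in events.items()}


if __name__ == "__main__":
    shares = time_shares(parse_softirqs(SOFTIRQ_REPORT))
    for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{name:<8} {pct:6.2f}%")
```

For the data above, TASKLET accounts for roughly 94% of the softirq elapsed time (14.497936 of 15.385780) and more than 99% of the softirq events.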

The high TASKLET softirq load is likely due to a defect in the ioatdma module in RHEL 7.1 and RHEL 7.2. For more information, please review the following Red Hat document:

RHEL 7.1 - ksoftirqd thread reports high CPU utilization due to bug in ioatdma driver
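
Because the suspected defect is in the ioatdma module, it can help to confirm that the module is loaded on the affected system. The sketch below is one possible check, assuming the standard /proc/modules format in which the module name is the first field on each line. The function names are hypothetical.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_ioatdma_loaded(path="/proc/modules"):
    """Report whether ioatdma appears in the kernel's loaded module list."""
    return "ioatdma" in loaded_modules(Path(path).read_text())


if __name__ == "__main__":
    print("ioatdma loaded:", is_ioatdma_loaded())
```

A loaded module alone does not confirm the defect; it only indicates that the system may be exposed to it.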

Solution

To resolve the issue, please upgrade the kernel to one of the following versions:

  • Red Hat Enterprise Linux 7.1: Upgrade to kernel-3.10.0-229.14.1.el7 from RHSA-2015-1778 or later
  • Red Hat Enterprise Linux 7.2: Upgrade to kernel-3.10.0-327.el7 from RHSA-2015-2152
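
The running kernel can be compared against the versions listed above. The sketch below is a simplified check, not a full RPM version comparison: it assumes the release string follows the usual `3.10.0-<release>.el7` pattern and compares only the numeric components before `.el`. The function names are hypothetical.

```python
import platform
import re

# Numeric components of kernel-3.10.0-229.14.1.el7, the RHEL 7.1 kernel listed above.
# kernel-3.10.0-327.el7 (RHEL 7.2) compares higher, so one threshold covers both.
FIXED_KERNEL = (3, 10, 0, 229, 14, 1)


def kernel_tuple(release):
    """Convert '3.10.0-229.14.1.el7.x86_64' into (3, 10, 0, 229, 14, 1)."""
    numeric_part = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


def at_or_above_fixed_kernel(release):
    """True if the release compares at or above the first listed fixed kernel."""
    return kernel_tuple(release) >= FIXED_KERNEL


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "->", at_or_above_fixed_kernel(running))
```

Tuple comparison treats a shorter release such as 3.10.0-229.el7 as older than 3.10.0-229.14.1.el7, which matches the intent here. The result is only meaningful for RHEL 7 kernels.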