Memory Technologies Production Ready Hardware PMC - antimetal/system-agent GitHub Wiki
Hardware Performance Counters (PMCs) are CPU-level monitoring registers that provide near-zero-overhead insight into system behavior. They are built directly into modern processors and count hundreds of microarchitectural events (cache misses, TLB misses, retired loads, and more) in hardware; software only has to program the counters and read them periodically.
For memory leak detection, PMCs offer unique advantages:
- Near-zero-overhead monitoring in counting mode - Events are counted in hardware; the only costs are periodic reads and occasional multiplexing
- Track LLC misses, TLB pressure, memory bandwidth - Indirect indicators of memory access and allocation patterns
- Exact per-interval event counts - Fine-grained data for anomaly detection (scaled estimates when counters are multiplexed)
- Requires expertise to interpret - Complex signals need sophisticated analysis
PMCs detect memory allocation anomalies only through indirect signals such as cache miss patterns, TLB pressure, and memory bandwidth utilization; they cannot inspect the heap directly.
Metric | Rating | Details |
---|---|---|
Overhead | ~0% (counting mode) | Counting happens in hardware; reads and multiplexing add a small software cost |
Accuracy | Medium | Indirect signals require pattern analysis |
False Positives | Medium | Workload changes can mimic leak signatures |
Production Ready | Limited | Requires deep expertise to interpret |
Platform Support | Intel/AMD x86_64, ARM | CPU-specific event sets |
Deployment Considerations:
- Minimal system impact makes it suitable for continuous monitoring
- Requires kernel perf subsystem support (CONFIG_PERF_EVENTS)
- Limited counter availability necessitates event multiplexing
- Virtualization environments may restrict PMC access
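```c
// perf_event_open() system call usage (glibc has no wrapper, so call syscall())
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int open_llc_miss_counter(int cpu) {
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_RAW;
    pe.size = sizeof(pe);
    pe.config = 0x412e;   // LONGEST_LAT_CACHE.MISS on Intel (umask 0x41, event 0x2e)
    pe.disabled = 1;      // start disabled; enable later with PERF_EVENT_IOC_ENABLE
    pe.exclude_kernel = 0;
    pe.exclude_hv = 1;
    // pid = -1, cpu = N: all tasks on one CPU (needs CAP_PERFMON/root or paranoid <= 0)
    return (int)perf_event_open(&pe, -1, cpu, -1, 0);
}
```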
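```cpp
// Intel Performance Counter Monitor (PCM) integration
#include "cpucounters.h"
#include <stdexcept>
#include <vector>
using namespace pcm;  // recent PCM releases place the API in namespace pcm

class MemoryLeakDetector {
private:
    PCM* m_pcm = nullptr;
    SystemCounterState before_system;
    std::vector<SocketCounterState> before_sockets;
    std::vector<CoreCounterState> before_cores;
public:
    void initialize() {
        m_pcm = PCM::getInstance();
        if (m_pcm->program() != PCM::Success) {
            throw std::runtime_error("PMC initialization failed");
        }
    }
    // Snapshot all counters; PCM metrics are computed from a (before, after) pair
    void collect_baseline() {
        m_pcm->getAllCounterStates(before_system, before_sockets, before_cores);
    }
    // Per-core LLC misses and system-wide memory-controller reads since the baseline
    void collect_deltas(std::vector<uint64>& llc_misses, uint64& mc_bytes_read) {
        SystemCounterState after_system;
        std::vector<SocketCounterState> after_sockets;
        std::vector<CoreCounterState> after_cores;
        m_pcm->getAllCounterStates(after_system, after_sockets, after_cores);
        llc_misses.clear();
        for (size_t i = 0; i < after_cores.size(); ++i)
            llc_misses.push_back(getL3CacheMisses(before_cores[i], after_cores[i]));
        mc_bytes_read = getBytesReadFromMC(before_system, after_system);
    }
};
```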
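Intel Haswell+ Events (names and encodings differ between generations; check the Intel SDM or the perfmon event files for your CPU):
- MEM_LOAD_RETIRED.L3_MISS (0x20d1) - Retired loads that missed L3 (named MEM_LOAD_UOPS_RETIRED.L3_MISS on Haswell/Broadwell)
- DTLB_LOAD_MISSES.WALK_CYCLES (0x1008) - Cycles spent in DTLB-miss page walks
- OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_MISS - Demand data reads that missed L3 (requires the offcore response MSR)
- PAGE_WALKER_LOADS.DTLB_L1 (0x11bc) - Page walker loads for DTLB misses that hit in L1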
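The hex codes used on this page are Intel raw event encodings as accepted by `PERF_TYPE_RAW`: the low byte is the event select and the next byte is the unit mask, so `0x412e` is event `0x2e` with umask `0x41`. A minimal sketch of building and splitting such codes, assuming only the event select, unit mask, and counter mask (cmask, bits 24-31) are needed and the edge/inv bits stay zero:

```python
def intel_raw_config(event: int, umask: int, cmask: int = 0) -> int:
    """Build a PERF_TYPE_RAW config for an Intel core PMU event.

    Layout follows IA32_PERFEVTSELx: bits 0-7 event select,
    bits 8-15 unit mask, bits 24-31 counter mask (cmask).
    """
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF and 0 <= cmask <= 0xFF):
        raise ValueError("event, umask and cmask must each fit in one byte")
    return (cmask << 24) | (umask << 8) | event


def decode_intel_raw_config(config: int) -> dict:
    """Split a raw config back into event select, unit mask and counter mask."""
    return {
        "event": config & 0xFF,
        "umask": (config >> 8) & 0xFF,
        "cmask": (config >> 24) & 0xFF,
    }


print(hex(intel_raw_config(0x2E, 0x41)))  # prints 0x412e (LONGEST_LAT_CACHE.MISS)
print(decode_intel_raw_config(0x20D1))
```

Working in (event, umask) pairs makes it easier to cross-check codes against vendor event tables, which list the two fields separately.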
```python
import numpy as np


class PMCMemoryLeakDetector:
    def __init__(self):
        self.llc_miss_history = []
        self.llc_miss_baseline = None
        self.tlb_miss_baseline = None
        self.memory_bw_baseline = None

    def detect_llc_miss_divergence(self, current_misses, window_size=300):
        """
        LLC miss rate divergence detection.
        A sustained increase in LLC misses suggests a growing working set.
        """
        self.llc_miss_history.append(current_misses)
        self.llc_miss_history = self.llc_miss_history[-window_size:]  # bound memory use
        if len(self.llc_miss_history) < window_size:
            return False
        half = window_size // 2
        recent_avg = np.mean(self.llc_miss_history[-half:])
        historical_avg = np.mean(self.llc_miss_history[-window_size:-half])
        if historical_avg == 0:
            return False
        # Simple ratio threshold (not a statistical significance test)
        divergence_ratio = recent_avg / historical_avg
        return divergence_ratio > 1.3  # 30% increase threshold

    def analyze_tlb_pressure(self, tlb_misses, memory_accesses):
        """
        TLB pressure analysis: a rising TLB miss rate suggests a larger or
        more fragmented working set.
        """
        if memory_accesses == 0:
            return False
        tlb_miss_rate = tlb_misses / memory_accesses
        if not self.tlb_miss_baseline:
            self.tlb_miss_baseline = tlb_miss_rate
            return False
        pressure_increase = tlb_miss_rate / self.tlb_miss_baseline
        return pressure_increase > 2.0  # TLB miss rate has doubled (100% increase)
```
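Memory leaks that keep growing data structures in active use (for example, an unbounded cache or a list that is still traversed) expand the working set beyond CPU cache capacity; leaked memory that is never touched again does not show up this way. The expansion manifests as: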
- Sustained LLC miss rate increase - Growing heap data structures exceed L3 cache
- Memory access pattern changes - Shift from cache-friendly to memory-bound behavior
- Temporal correlation - LLC miss increases correlate with allocation-heavy code paths
Detection Algorithm:
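```python
import numpy as np

def detect_working_set_expansion(miss_rate_history, llc_misses, memory_requests, time_window=60):
    # Z-score anomaly detection against the last `time_window` miss-rate samples
    if memory_requests == 0 or len(miss_rate_history) < time_window:
        return False
    miss_rate = llc_misses / memory_requests
    baseline = np.asarray(miss_rate_history[-time_window:])
    if baseline.std() == 0:
        return False
    z_score = (miss_rate - baseline.mean()) / baseline.std()
    return z_score > 3.0  # 3-sigma threshold
```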
Translation Lookaside Buffer pressure indicates memory fragmentation and large working sets:
- DTLB miss cycles increase - More time spent in page table walks
- Page walker activity - Hardware page table traversal increases
- Virtual memory pressure - Large virtual address space consumption
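One way to condense these signals into a single number is the fraction of core cycles spent in page walks: DTLB walk cycles divided by total cycles, the same ratio used by the `TLB_Pressure_High` Prometheus rule later on this page. A sketch over cumulative counter snapshots taken at two points in time; the `CounterSnapshot` type is hypothetical and the 5% threshold simply mirrors that rule:

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    """Cumulative counter values read at one point in time (hypothetical type)."""
    dtlb_walk_cycles: int
    core_cycles: int


def page_walk_cycle_fraction(prev: CounterSnapshot, curr: CounterSnapshot) -> float:
    """Share of cycles spent walking page tables between two snapshots."""
    walk = curr.dtlb_walk_cycles - prev.dtlb_walk_cycles
    cycles = curr.core_cycles - prev.core_cycles
    if cycles <= 0:
        return 0.0  # counter reset or no time elapsed
    return walk / cycles


def tlb_pressure_high(prev: CounterSnapshot, curr: CounterSnapshot,
                      threshold: float = 0.05) -> bool:
    return page_walk_cycle_fraction(prev, curr) > threshold


a = CounterSnapshot(dtlb_walk_cycles=1_000_000, core_cycles=100_000_000)
b = CounterSnapshot(dtlb_walk_cycles=9_000_000, core_cycles=200_000_000)
print(page_walk_cycle_fraction(a, b))  # 0.08
```

Working on deltas rather than raw cumulative values keeps the ratio meaningful over long uptimes, where the totals would otherwise hide recent changes.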
Memory allocation patterns create distinctive bandwidth signatures:
- Read/Write ratio changes - Allocation-heavy workloads show increased writes
- NUMA traffic imbalance - Memory leaks can skew cross-socket traffic
- Burst vs. sustained patterns - Leaks create sustained memory pressure
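```python
def analyze_memory_bandwidth_fingerprint(read_bw, write_bw, numa_traffic):
    """
    Memory bandwidth fingerprinting for leak detection (heuristic).
    read_bw/write_bw in bytes/s; numa_traffic: per-node traffic values.
    """
    if read_bw <= 0 or not numa_traffic or min(numa_traffic) <= 0:
        return False  # not enough signal to compute the ratios
    rw_ratio = write_bw / read_bw
    numa_imbalance = max(numa_traffic) / min(numa_traffic)
    # Leak signature: high write ratio + NUMA imbalance (weights are heuristic)
    leak_score = (rw_ratio * 0.6) + (numa_imbalance * 0.4)
    return leak_score > 1.5
```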
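The burst-versus-sustained distinction from the list above can be made concrete by asking what share of recent intervals ran above the baseline bandwidth. A sketch, assuming per-interval bandwidth samples and a previously learned baseline; the 1.2x margin and 80% share are illustrative values, not calibrated ones:

```python
def classify_bandwidth_pattern(samples, baseline, margin=1.2, sustained_share=0.8):
    """Classify bandwidth samples (bytes/s) as 'normal', 'burst' or 'sustained'.

    A leak-like pattern keeps bandwidth elevated in most intervals, whereas a
    burst elevates only a few of them.
    """
    if not samples or baseline <= 0:
        return "normal"
    elevated = sum(1 for s in samples if s > baseline * margin)
    share = elevated / len(samples)
    if share >= sustained_share:
        return "sustained"
    if elevated > 0:
        return "burst"
    return "normal"


print(classify_bandwidth_pattern([10, 30, 10, 10, 10], baseline=10))  # burst
print(classify_bandwidth_pattern([13, 14, 15, 15, 16], baseline=10))  # sustained
```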
Non-uniform memory access patterns can indicate memory management issues:
- Cross-socket memory traffic - Allocations on wrong NUMA node
- Remote memory access increases - Growing working set exceeds local NUMA capacity
- Memory controller utilization skew - Uneven distribution across memory controllers
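On Linux, per-node allocation counters are exposed in `/sys/devices/system/node/node*/numastat` (fields such as `numa_hit`, `numa_miss`, `local_node`, `other_node`). They count page allocations rather than memory accesses, so they complement the PMC view instead of replacing it, and they are cumulative since boot, so a detector would compare two readings. A sketch that computes, per node, the share of pages handed to processes running on another node (`other_node`); the 10% threshold is an assumption:

```python
import glob
import os


def parse_numastat(text):
    """Parse the 'key value' lines of a numastat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def remote_allocation_share(stats):
    """Fraction of this node's pages given to processes running on other nodes."""
    local, remote = stats.get("local_node", 0), stats.get("other_node", 0)
    total = local + remote
    return remote / total if total else 0.0


def skewed_nodes(threshold=0.10, root="/sys/devices/system/node"):
    """Return {node: share} for nodes whose remote share exceeds threshold."""
    result = {}
    for path in sorted(glob.glob(os.path.join(root, "node*", "numastat"))):
        with open(path) as f:
            share = remote_allocation_share(parse_numastat(f.read()))
        if share > threshold:
            result[os.path.basename(os.path.dirname(path))] = share
    return result
```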
- "Anomaly Detection Using Hardware Performance Counters" (Garcia-Serrano et al., 2015)
  - IEEE Transactions on Computers
  - Establishes PMC-based anomaly detection methodology
  - Demonstrates statistical significance testing for counter anomalies
  - Proposes multivariate analysis of counter correlations
- "Low-Overhead Memory Leak Detection Using Adaptive Statistical Profiling" (Chilimbi & Hauswirth, Microsoft Research; describes the SWAT tool)
  - ASPLOS 2004
  - Statistical sampling approaches for production systems
  - Adaptive profiling techniques to minimize overhead
- "Hardware-Performance-Counters-based Anomaly Detection" (Zhang et al., 2020)
  - IEEE Transactions on Reliability
  - Machine learning approaches for PMC anomaly detection
  - Feature selection from hundreds of available counters
  - Real-time classification algorithms for production deployment
- Intel Memory Bandwidth Monitoring Documentation
  - Intel 64 and IA-32 Architectures Software Developer's Manual
  - Cache monitoring (Cache QoS) capabilities
  - Memory bandwidth monitoring and allocation (MBM/MBA)
  - Per-core memory bandwidth tracking techniques
- "FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack" (Yarom & Falkner, USENIX Security 2014)
- "Precise and Scalable Detection of Double-Fetch Bugs in OS Kernels" (Xu et al., IEEE S&P (Oakland) 2018)
- "PMU-Events: A Library for Accessing Performance Monitoring Unit Events" (PAPI Project)
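```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Comprehensive PMC event monitoring (Intel raw encodings: umask << 8 | event)
struct pmc_event {
    const char* name;
    uint64_t config;
    int fd;
    uint64_t value;  // count scaled for multiplexing
};

static struct pmc_event memory_events[] = {
    {"LLC_MISSES", 0x412e, -1, 0},              // LONGEST_LAT_CACHE.MISS
    {"DTLB_LOAD_MISSES", 0x1008, -1, 0},        // DTLB load page-walk cycles
    {"PAGE_WALKER_LOADS", 0x11bc, -1, 0},       // PAGE_WALKER_LOADS.DTLB_L1 (Haswell)
    {"MEM_INST_RETIRED_LOADS", 0x81d0, -1, 0},  // all retired loads
    {NULL, 0, -1, 0}
};

// glibc provides no wrapper for this system call
static long perf_event_open(struct perf_event_attr* attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int setup_pmc_monitoring(int cpu) {
    struct perf_event_attr pe;
    for (int i = 0; memory_events[i].name; i++) {
        memset(&pe, 0, sizeof(pe));
        pe.type = PERF_TYPE_RAW;
        pe.size = sizeof(pe);
        pe.config = memory_events[i].config;
        pe.disabled = 1;
        pe.exclude_kernel = 0;
        pe.exclude_hv = 1;
        pe.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                         PERF_FORMAT_TOTAL_TIME_RUNNING;
        memory_events[i].fd = (int)perf_event_open(&pe, -1, cpu, -1, 0);
        if (memory_events[i].fd == -1) {
            perror("perf_event_open");
            return -1;
        }
        // Counters are created disabled; reset and start them explicitly
        ioctl(memory_events[i].fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(memory_events[i].fd, PERF_EVENT_IOC_ENABLE, 0);
    }
    return 0;
}

void read_pmc_values(void) {
    // With TOTAL_TIME_ENABLED | TOTAL_TIME_RUNNING each read returns three u64s;
    // a smaller buffer makes read() fail with ENOSPC
    struct { uint64_t value, time_enabled, time_running; } buf;
    for (int i = 0; memory_events[i].name; i++) {
        if (read(memory_events[i].fd, &buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("read");
            continue;
        }
        // Scale for multiplexing: estimate the count over the full enabled time
        memory_events[i].value = buf.time_running
            ? (uint64_t)((double)buf.value * buf.time_enabled / buf.time_running)
            : 0;
    }
}
```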
```cpp
#include "cpucounters.h"
#include <iostream>
#include <vector>
using namespace pcm;  // recent PCM releases place the API in namespace pcm

class IntelPCMMemoryMonitor {
private:
    PCM* m_pcm = nullptr;
    SystemCounterState before_system, after_system;
    std::vector<SocketCounterState> before_sockets, after_sockets;
    std::vector<CoreCounterState> before_state, after_state;
public:
    bool initialize() {
        m_pcm = PCM::getInstance();
        if (m_pcm->program() != PCM::Success) {
            std::cerr << "Error: Cannot access performance counters" << std::endl;
            return false;
        }
        std::cout << "PMC monitoring initialized for "
                  << m_pcm->getNumCores() << " cores" << std::endl;
        return true;
    }
    void start_monitoring() {
        m_pcm->getAllCounterStates(before_system, before_sockets, before_state);
    }
    struct MemoryMetrics {
        uint64 llc_misses = 0;
        uint64 tlb_misses = 0;  // not in PCM's default core events; fill via perf_event_open
        double instructions_per_cycle = 0.0;
        double llc_miss_rate = 0.0;
    };
    // Memory-controller reads are an uncore (system/socket) metric, not a per-core one
    uint64 get_bytes_read_from_mc() const {
        return getBytesReadFromMC(before_system, after_system);
    }
    std::vector<MemoryMetrics> get_memory_metrics() {
        m_pcm->getAllCounterStates(after_system, after_sockets, after_state);
        std::vector<MemoryMetrics> metrics;
        for (size_t i = 0; i < after_state.size(); ++i) {
            MemoryMetrics core_metrics;
            core_metrics.llc_misses = getL3CacheMisses(before_state[i], after_state[i]);
            core_metrics.instructions_per_cycle = getIPC(before_state[i], after_state[i]);
            // LLC miss rate = misses / (hits + misses)
            uint64 total_requests = getL3CacheHits(before_state[i], after_state[i]) +
                                    core_metrics.llc_misses;
            core_metrics.llc_miss_rate = total_requests > 0 ?
                (double)core_metrics.llc_misses / total_requests : 0.0;
            metrics.push_back(core_metrics);
        }
        return metrics;
    }
};
```
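```python
import ctypes
import fcntl
import os
import struct
from ctypes import c_uint32, c_uint64

# Python wrapper for perf_event_open (x86_64 Linux only: syscall number 298)
class PerfEventAttr(ctypes.Structure):
    # Leading fields of struct perf_event_attr up to config2 (PERF_ATTR_SIZE_VER1, 72 bytes)
    _fields_ = [
        ("type", c_uint32),
        ("size", c_uint32),
        ("config", c_uint64),
        ("sample_period", c_uint64),
        ("sample_type", c_uint64),
        ("read_format", c_uint64),
        ("flags", c_uint64),  # bitfield: bit 0 disabled, bit 5 exclude_kernel, bit 6 exclude_hv
        ("wakeup_events", c_uint32),
        ("bp_type", c_uint32),
        ("config1", c_uint64),
        ("config2", c_uint64),
    ]

PERF_TYPE_RAW = 4
PERF_FORMAT_TOTAL_TIME_ENABLED = 1 << 0
PERF_FORMAT_TOTAL_TIME_RUNNING = 1 << 1
PERF_EVENT_IOC_ENABLE = 0x2400   # _IO('$', 0)
PERF_EVENT_IOC_DISABLE = 0x2401  # _IO('$', 1)
SYS_perf_event_open = 298        # x86_64

class PMCMonitor:
    # Intel raw event codes (umask << 8 | event); verify for your microarchitecture
    EVENTS = {
        'LLC_MISSES': 0x412e,                  # LONGEST_LAT_CACHE.MISS
        'DTLB_LOAD_MISSES_WALK_CYCLES': 0x1008,
        'PAGE_WALKER_LOADS': 0x11bc,           # PAGE_WALKER_LOADS.DTLB_L1 (Haswell)
        'MEM_LOAD_RETIRED_L3_MISS': 0x20d1,
        'MEM_INST_RETIRED_ALL_LOADS': 0x81d0,  # retired loads, not DRAM reads
    }

    def __init__(self):
        self.event_fds = {}
        self.libc = ctypes.CDLL("libc.so.6", use_errno=True)
        self.libc.syscall.restype = ctypes.c_long

    def perf_event_open(self, attr, pid, cpu, group_fd, flags):
        # Pass every argument as a full-width C long to the variadic syscall()
        return self.libc.syscall(ctypes.c_long(SYS_perf_event_open), ctypes.byref(attr),
                                 ctypes.c_long(pid), ctypes.c_long(cpu),
                                 ctypes.c_long(group_fd), ctypes.c_ulong(flags))

    def setup_event(self, event_name, pid=0, cpu=-1):
        # pid=0, cpu=-1: this process on any CPU (pid=-1 with cpu=-1 is invalid)
        attr = PerfEventAttr()
        attr.type = PERF_TYPE_RAW
        attr.size = ctypes.sizeof(PerfEventAttr)
        attr.config = self.EVENTS[event_name]
        attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
        # disabled, exclude_kernel (user space only, allowed at perf_event_paranoid=2), exclude_hv
        attr.flags = (1 << 0) | (1 << 5) | (1 << 6)
        fd = self.perf_event_open(attr, pid, cpu, -1, 0)
        if fd < 0:
            err = ctypes.get_errno()
            raise OSError(err, f"Failed to open {event_name} event: {os.strerror(err)}")
        self.event_fds[event_name] = fd
        return fd

    def read_counter(self, event_name):
        data = os.read(self.event_fds[event_name], 24)  # value, time_enabled, time_running
        value, enabled, running = struct.unpack('QQQ', data)
        return {'value': value, 'time_enabled': enabled, 'time_running': running}

    def enable_counting(self):
        for fd in self.event_fds.values():
            fcntl.ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)

    def disable_counting(self):
        for fd in self.event_fds.values():
            fcntl.ioctl(fd, PERF_EVENT_IOC_DISABLE, 0)

# Usage example
monitor = PMCMonitor()
monitor.setup_event('LLC_MISSES')
monitor.setup_event('DTLB_LOAD_MISSES_WALK_CYCLES')
monitor.enable_counting()
# ... run workload ...
monitor.disable_counting()
llc_data = monitor.read_counter('LLC_MISSES')
print(f"LLC Misses: {llc_data['value']}")
```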
```python
from collections import deque

import numpy as np


class LLCMissRateAnalyzer:
    def __init__(self, baseline_samples=100):
        self.baseline_samples = baseline_samples
        # Rolling window of recent miss rates; old samples drop off automatically
        self.llc_miss_history = deque(maxlen=baseline_samples)

    def calculate_miss_rate(self, llc_misses, total_memory_accesses):
        if total_memory_accesses == 0:
            return 0.0
        return llc_misses / total_memory_accesses

    def update_baseline(self, llc_misses, memory_accesses):
        self.llc_miss_history.append(self.calculate_miss_rate(llc_misses, memory_accesses))

    def detect_anomaly(self, current_llc_misses, current_memory_accesses, threshold_std=2.5):
        # Require a full baseline window before flagging anything
        if len(self.llc_miss_history) < self.baseline_samples:
            return False
        current_miss_rate = self.calculate_miss_rate(current_llc_misses, current_memory_accesses)
        baseline_mean = np.mean(self.llc_miss_history)
        baseline_std = np.std(self.llc_miss_history)
        if baseline_std == 0:
            return False
        z_score = (current_miss_rate - baseline_mean) / baseline_std
        return z_score > threshold_std

    def get_statistics(self):
        if not self.llc_miss_history:
            return None
        return {
            'mean': np.mean(self.llc_miss_history),
            'std': np.std(self.llc_miss_history),
            'samples': len(self.llc_miss_history),
            'min': min(self.llc_miss_history),
            'max': max(self.llc_miss_history),
        }
```
Intel Events:

Event Name | Event Code | Description | Memory Leak Relevance |
---|---|---|---|
MEM_LOAD_RETIRED.L3_MISS | 0x20d1 | Retired load instructions that missed L3 cache (MEM_LOAD_UOPS_RETIRED.L3_MISS on Haswell/Broadwell) | Primary indicator of working set expansion |
DTLB_LOAD_MISSES.WALK_CYCLES | 0x1008 | Cycles spent in DTLB miss page walks (WALK_DURATION on Haswell/Broadwell) | Memory fragmentation and large working sets |
PAGE_WALKER_LOADS.DTLB_L1 | 0x11bc | Page walker loads for DTLB misses that hit in L1 | Virtual memory pressure indication |
LONGEST_LAT_CACHE.MISS | 0x412e | Core-originated cacheable demand requests missed LLC | Cache pressure from growing heap |
MEM_INST_RETIRED.ALL_LOADS | 0x81d0 | All retired load instructions (MEM_UOPS_RETIRED.ALL_LOADS on Haswell/Broadwell) | Baseline for miss rate calculations |
OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_MISS | 0x01b7 + MSR | Data read requests that missed L3 cache (response type set in the offcore response MSR) | Memory bandwidth utilization |
AMD Events:

Event Name | Event Code | Description |
---|---|---|
L3_CACHE_MISSES | 0x04 | L3 cache misses |
DATA_TLB_MISSES | 0x45 | Data TLB misses |
L3_LOOKUP_DATA_READ | 0x04:0x03 | L3 lookups for data reads |
ARM Events:

Event Name | Event Code | Description |
---|---|---|
L2D_CACHE_REFILL | 0x17 | Level 2 data cache refill |
DTLB_REFILL | 0x05 | Data TLB refill |
MEM_ACCESS | 0x13 | Data memory access |
```python
class PMCEventSelector:
    def __init__(self, cpu_vendor, cpu_model):
        self.cpu_vendor = cpu_vendor
        self.cpu_model = cpu_model  # microarchitecture name, e.g. "Skylake"

    def get_optimal_events(self, max_counters=4):
        """
        Select PMC events for memory leak detection based on CPU generation
        """
        if self.cpu_vendor == "Intel":
            if "Haswell" in self.cpu_model or "Broadwell" in self.cpu_model:
                # Haswell/Broadwell use the older *_UOPS_* event names
                return [
                    "MEM_LOAD_UOPS_RETIRED.L3_MISS",
                    "DTLB_LOAD_MISSES.WALK_DURATION",
                    "LONGEST_LAT_CACHE.MISS",
                    "MEM_UOPS_RETIRED.ALL_LOADS"
                ][:max_counters]
            elif "Skylake" in self.cpu_model or "Ice Lake" in self.cpu_model:
                return [
                    "MEM_LOAD_RETIRED.L3_MISS",
                    "DTLB_LOAD_MISSES.WALK_COMPLETED",
                    "L2_RQSTS.ALL_DEMAND_MISS",
                    "MEM_INST_RETIRED.ALL_LOADS"
                ][:max_counters]
        elif self.cpu_vendor == "AMD":
            return [
                "L3_CACHE_MISSES",
                "DATA_TLB_MISSES",
                "L3_LOOKUP_DATA_READ",
                "RETIRED_INSTRUCTIONS"
            ][:max_counters]
        # Default fallback (also used for unrecognized Intel generations)
        return ["CACHE_MISSES", "TLB_MISSES", "MEMORY_ACCESSES"][:max_counters]
```
High-Performance Computing centers extensively use PMCs for performance optimization:
```bash
#!/bin/bash
# Example: SLURM job with PMC monitoring
#SBATCH --job-name=pmc_monitoring
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Unprivileged PMC access depends on kernel.perf_event_paranoid. Changing it
# requires root, so it is normally set by an administrator (e.g. via sysctl in
# a node prolog) rather than from inside the job; here we only report it.
echo "perf_event_paranoid=$(cat /proc/sys/kernel/perf_event_paranoid)"

# Launch application with PMC monitoring
perf stat -e LLC-load-misses,dTLB-load-misses,cache-misses \
    -o performance_counters.log \
    ./memory_intensive_application

# Analyze results
python analyze_pmc_results.py performance_counters.log
```
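The `analyze_pmc_results.py` script is not shown on this page. If the job instead passed `-x,` to `perf stat` for CSV output, a minimal parser could look like the sketch below. It assumes the first three CSV fields are the counter value, its unit, and the event name, which holds when `-I`, `-A`, and per-core/per-socket aggregation are not used; `#` header lines (perf writes one with `-o`) and blank lines are skipped, and `<not counted>`/`<not supported>` values map to `None`. The sample input is illustrative only:

```python
import csv
import io


def parse_perf_stat_csv(text):
    """Map event name -> count (float), or None if perf could not count it."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#") or len(row) < 3:
            continue
        value, _unit, event = row[0], row[1], row[2]
        try:
            counts[event] = float(value)
        except ValueError:
            counts[event] = None  # e.g. "<not counted>" or "<not supported>"
    return counts


sample = """# started on Mon Jan  1 00:00:00 2024

1523344,,LLC-load-misses,1000000000,100.00,,
88211,,dTLB-load-misses,1000000000,100.00,,
<not supported>,,cache-misses,0,100.00,,
"""
print(parse_perf_stat_csv(sample))
```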
Real-world HPC deployments:
- NERSC (National Energy Research Scientific Computing Center) - Uses PMCs for job performance analysis
- TACC (Texas Advanced Computing Center) - PMC-based workload characterization
- CERN Computing - Memory usage optimization in physics simulations
AWS EC2 Instance Performance Monitoring:
```python
class CloudPMCMonitor:
    def __init__(self, instance_type):
        self.instance_type = instance_type
        self.supports_pmc = self.check_pmc_support()

    def check_pmc_support(self):
        """
        Rough classification of EC2 PMC access by instance type
        - Bare metal instances: Full PMC access
        - Nitro system (e.g. m5/c5/r5 families): Limited PMC access
        - Xen-based: No PMC access
        The family list below is an illustrative subset, not exhaustive.
        """
        if 'metal' in self.instance_type:
            return 'full'
        elif self.instance_type.startswith(('m5', 'c5', 'r5')):
            return 'limited'
        else:
            return 'none'

    def get_available_counters(self):
        if self.supports_pmc == 'full':
            return 8  # typical general-purpose counters per core on recent Intel CPUs (SMT off)
        elif self.supports_pmc == 'limited':
            return 4  # conservative assumption for virtualized access
        else:
            return 0
```
Google Cloud Platform Usage:
- GCE Custom Machine Types - PMC monitoring for memory optimization
- GKE Node Performance - Container memory leak detection
- Compute Engine Sole-tenant Nodes - Full PMC access for detailed analysis
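SWAT (Microsoft Research, Chilimbi & Hauswirth):
```python
import time


class SWATMemoryLeakDetector:
    """
    Inspired by the adaptive statistical profiling idea behind SWAT;
    SWAT itself samples heap accesses rather than PMCs.
    """
    def __init__(self, sampling_rate=1000, max_samples=1000):  # sample every Nth allocation
        self.sampling_rate = sampling_rate
        self.max_samples = max_samples
        self.pmc_samples = []

    def adaptive_sampling(self, allocation_rate):
        """
        Adjust the sampling interval based on allocation frequency: hot
        allocation paths are sampled less often to keep overhead bounded.
        """
        if allocation_rate > 10000:    # high allocation rate
            self.sampling_rate = 2000  # sample less often
        elif allocation_rate < 100:    # low allocation rate
            self.sampling_rate = 500   # sample more often

    def get_allocation_context(self):
        # Hypothetical hook: a real detector would record the allocating call site
        return None

    def analyze_trend(self, window=100, min_growth=1.2):
        # Hypothetical trend test: mean LLC misses in the newer half of the
        # window exceed the older half by at least min_growth
        recent = [s['llc_misses'] for s in self.pmc_samples[-window:]]
        half = len(recent) // 2
        older = sum(recent[:half]) / half
        newer = sum(recent[half:]) / (len(recent) - half)
        return older > 0 and newer / older >= min_growth

    def collect_pmc_sample(self, llc_misses, tlb_misses):
        sample = {
            'timestamp': time.time(),
            'llc_misses': llc_misses,
            'tlb_misses': tlb_misses,
            'allocation_context': self.get_allocation_context()
        }
        self.pmc_samples.append(sample)
        del self.pmc_samples[:-self.max_samples]  # bound memory use
        # Statistical analysis for leak detection
        if len(self.pmc_samples) > 100:
            return self.analyze_trend()
        return False
```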
```yaml
# Prometheus alerting rules for PMC-based memory leak detection
groups:
  - name: pmc_memory_leak_detection
    rules:
      - alert: LLC_Miss_Rate_Anomaly
        expr: |
          (
            increase(pmc_llc_misses_total[5m]) /
            increase(pmc_memory_accesses_total[5m])
          ) > 0.1
        for: 10m
        labels:
          severity: warning
          component: memory_management
        annotations:
          summary: "High LLC miss rate detected on {{ $labels.instance }}"
          description: |
            LLC miss rate of {{ $value | humanizePercentage }} indicates potential
            memory leak or working set expansion on {{ $labels.instance }}.
      - alert: TLB_Pressure_High
        expr: |
          increase(pmc_dtlb_miss_cycles_total[5m]) /
          increase(pmc_cpu_cycles_total[5m]) > 0.05
        for: 15m
        labels:
          severity: critical
          component: memory_management
        annotations:
          summary: "High TLB pressure on {{ $labels.instance }}"
          description: |
            TLB miss cycles represent {{ $value | humanizePercentage }} of total
            CPU cycles, indicating a large or fragmented working set.
```
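These rules assume an exporter that publishes cumulative counters named `pmc_llc_misses_total`, `pmc_memory_accesses_total`, `pmc_dtlb_miss_cycles_total` and `pmc_cpu_cycles_total`; the exporter itself is not part of this page. A minimal sketch of rendering such counters in the Prometheus text exposition format (a `# TYPE` line followed by `name{labels} value` samples), where the per-CPU `cpu` label is an assumption:

```python
def render_prometheus_counters(per_cpu_counts):
    """Render {metric_name: {cpu: value}} in Prometheus text exposition format."""
    lines = []
    for metric in sorted(per_cpu_counts):
        lines.append(f"# TYPE {metric} counter")
        for cpu, value in sorted(per_cpu_counts[metric].items()):
            lines.append(f'{metric}{{cpu="{cpu}"}} {value}')
    return "\n".join(lines) + "\n"


print(render_prometheus_counters({
    "pmc_llc_misses_total": {0: 1523344, 1: 1400021},
    "pmc_memory_accesses_total": {0: 91000000, 1: 88000000},
}), end="")
```

Because both series in each ratio carry the same `cpu` label, the division in the alert expressions matches samples per CPU; aggregate with `sum by (instance)` first if a per-host alert is preferred.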
```python
from collections import deque

import numpy as np


class TLBEfficiencyMonitor:
    def __init__(self):
        self.efficiency_history = deque(maxlen=100)  # keep only the last 100 samples
        self.alert_threshold = 0.95  # 95% efficiency minimum

    def calculate_tlb_efficiency(self, tlb_hits, tlb_misses):
        # PMUs usually count misses directly; hits are typically derived as accesses - misses
        total_tlb_accesses = tlb_hits + tlb_misses
        if total_tlb_accesses == 0:
            return 1.0
        return tlb_hits / total_tlb_accesses

    def evaluate_efficiency(self, current_efficiency):
        self.efficiency_history.append(current_efficiency)
        # Alert on the 10-sample moving average to smooth out single-interval spikes
        if len(self.efficiency_history) >= 10:
            recent_avg = float(np.mean(list(self.efficiency_history)[-10:]))
            if recent_avg < self.alert_threshold:
                return {
                    'status': 'critical',
                    'efficiency': recent_avg,
                    'message': f'TLB efficiency dropped to {recent_avg:.2%}'
                }
        return {'status': 'normal', 'efficiency': current_efficiency}
```
```python
class MemoryBandwidthMonitor:
    def __init__(self):
        self.bandwidth_baseline = None
        self.anomaly_threshold = 2.0  # 2x baseline

    def analyze_bandwidth_anomaly(self, read_bw, write_bw, timestamp):
        """read_bw/write_bw in bytes per second."""
        total_bw = read_bw + write_bw
        if not self.bandwidth_baseline:
            # The first non-zero sample seeds the baseline
            self.bandwidth_baseline = total_bw or None
            return {'anomaly_detected': False}
        bandwidth_ratio = total_bw / self.bandwidth_baseline
        if bandwidth_ratio > self.anomaly_threshold:
            # Anomalous samples are deliberately not folded into the baseline
            return {
                'anomaly_detected': True,
                'bandwidth_increase': bandwidth_ratio,
                'read_bw_gb_s': read_bw / (1024**3),   # GiB/s
                'write_bw_gb_s': write_bw / (1024**3),  # GiB/s
                'timestamp': timestamp
            }
        # Update baseline with exponential moving average
        alpha = 0.1
        self.bandwidth_baseline = (alpha * total_bw) + ((1 - alpha) * self.bandwidth_baseline)
        return {'anomaly_detected': False}
```
Modern CPUs typically provide 4-8 programmable PMCs per core, but hundreds of events are available. This creates a resource allocation challenge:
```python
class PMCMultiplexer:
    # Priority tiers for memory leak detection
    HIGH_PRIORITY = ['MEM_LOAD_RETIRED.L3_MISS', 'DTLB_LOAD_MISSES.WALK_CYCLES']
    MEDIUM_PRIORITY = ['PAGE_WALKER_LOADS.DTLB_L1', 'LONGEST_LAT_CACHE.MISS']
    LOW_PRIORITY = ['MEM_INST_RETIRED.ALL_LOADS', 'OFFCORE_RESPONSE.DEMAND_DATA_RD']

    def __init__(self, available_counters=4):
        self.available_counters = available_counters
        self.event_groups = []
        self.current_group = 0

    def create_event_groups(self, desired_events):
        """
        Group events for time-multiplexed monitoring.
        Requested high-priority events are pinned into every group; the
        remaining counters rotate through the other events in priority order.
        """
        order = self.HIGH_PRIORITY + self.MEDIUM_PRIORITY + self.LOW_PRIORITY
        rank = {name: i for i, name in enumerate(order)}
        pinned = [e for e in self.HIGH_PRIORITY if e in desired_events][:self.available_counters]
        rotating = sorted((e for e in desired_events if e not in pinned),
                          key=lambda e: rank.get(e, len(order)))
        slots = self.available_counters - len(pinned)
        self.event_groups = []
        if slots == 0 or not rotating:
            self.event_groups.append(pinned)
        else:
            for i in range(0, len(rotating), slots):
                self.event_groups.append(pinned + rotating[i:i + slots])
        self.current_group = 0
        return self.event_groups

    def rotate_groups(self):
        """
        Advance to the next group; call this from a timer (e.g. every 10 s).
        """
        if not self.event_groups:
            return []
        self.current_group = (self.current_group + 1) % len(self.event_groups)
        return self.event_groups[self.current_group]
```
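When the kernel multiplexes more events than there are hardware counters, each event only counts for part of the time. Reading with `PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING`, as the C and Python examples on this page do, returns `time_enabled` and `time_running` alongside the raw count, and the usual estimate scales the count by their ratio. A sketch of that arithmetic:

```python
def scale_multiplexed_count(value, time_enabled, time_running):
    """Estimate the full-interval count of a multiplexed perf event.

    estimate = value * time_enabled / time_running. This assumes the event
    rate was uniform while the counter was not scheduled on hardware.
    """
    if time_running == 0:
        return None  # never scheduled on a hardware counter
    return value * time_enabled / time_running


def running_share(time_enabled, time_running):
    """Fraction of the enabled time the event was actually counting."""
    return time_running / time_enabled if time_enabled else 0.0


# Counter ran for 25 ms out of 100 ms enabled and saw 1,000 LLC misses
print(scale_multiplexed_count(1000, 100_000_000, 25_000_000))  # 4000.0
```

A low running share means the scaled value is an extrapolation from little data, which is another reason to keep the highest-priority leak indicators on dedicated counters.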
PMC access depends on kernel support and version-specific features:
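```
# Required kernel configuration
CONFIG_PERF_EVENTS=y
# Selected automatically by architectures with PMU support (not set by hand)
CONFIG_HAVE_PERF_EVENTS=y
# Not required; some architectures select it for the perf ring buffer
CONFIG_PERF_USE_VMALLOC=y
```
```bash
# Check kernel PMC support
$ cat /proc/sys/kernel/perf_event_paranoid
# -1: Allow use of (almost) all events by all users
# >=0: Disallow ftrace function tracepoint and raw tracepoint access without CAP_PERFMON
# >=1: Disallow CPU-wide events without CAP_PERFMON
# >=2: Disallow kernel profiling without CAP_PERFMON
# (CAP_PERFMON exists since Linux 5.8; older kernels check CAP_SYS_ADMIN)
```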
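A small helper can read the setting and report what an unprivileged monitor (one without `CAP_PERFMON` or `CAP_SYS_ADMIN`) may do at that level, following the mainline levels listed above. Values above 2 are not part of that list (some distribution kernels accept them as stricter settings), so this sketch assumes they may block even user-space counting:

```python
PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def unprivileged_perf_capabilities(level: int) -> dict:
    """What a user without CAP_PERFMON/CAP_SYS_ADMIN may do at a given level."""
    return {
        "ftrace_and_raw_tracepoints": level < 0,
        "cpu_wide_events": level < 1,   # perf_event_open(pid=-1, cpu=N)
        "kernel_profiling": level < 2,  # counting with exclude_kernel=0
        # Levels above 2 are outside the mainline list; assume they may block this too
        "own_process_user_space": level <= 2,
    }


def read_paranoid_level(path: str = PARANOID_PATH):
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # perf_events unavailable or the file is unreadable


level = read_paranoid_level()
if level is not None:
    print(level, unprivileged_perf_capabilities(level))
```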
Version-specific challenges:
- Linux 2.6.31+ - Basic perf_events support (merged as "Performance Counters for Linux")
- Linux 3.7+ - Improved overflow handling
- Linux 4.1+ - Better virtualization support
- Linux 5.0+ - Intel PMU v4 support
Virtualized environments present significant challenges for PMC access:
Virtualization Type | PMC Access Level | Limitations |
---|---|---|
Bare Metal | Full | None |
KVM with CPU Passthrough | Near-full | Some events may be filtered |
VMware vSphere | Limited | Only basic events available |
Xen PV | None | No direct PMC access |
Docker/LXC | Host-dependent | Inherits host limitations |
AWS Nitro | Limited | Subset of events available |
Google Cloud | Varies | Instance-type dependent |
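```python
import os
import subprocess


def detect_virtualization_pmc_support():
    """
    Detect virtualization environment and likely PMC capabilities
    (a heuristic; confirm by actually opening a counter)
    """
    try:
        # systemd-detect-virt prints "none" (and exits non-zero) on bare metal
        result = subprocess.run(['systemd-detect-virt'],
                                capture_output=True, text=True)
        virt_type = result.stdout.strip()
        if virt_type == 'none':
            return 'bare_metal', 'full_pmc_access'
        elif virt_type in ['kvm', 'qemu']:
            return 'kvm', 'limited_pmc_access'
        elif virt_type == 'vmware':
            return 'vmware', 'basic_pmc_access'
        elif virt_type in ['docker', 'podman', 'lxc', 'lxc-libvirt', 'systemd-nspawn']:
            # Containers share the host kernel's PMU, so access depends on the host
            return virt_type, 'host_dependent_pmc_access'
        else:
            return virt_type, 'no_pmc_access'
    except FileNotFoundError:
        # Fallback detection when systemd-detect-virt is not installed
        if os.path.exists('/proc/xen'):
            return 'xen', 'no_pmc_access'
        elif os.path.exists('/proc/vz'):
            return 'openvz', 'no_pmc_access'
        else:
            return 'unknown', 'unknown_pmc_access'
```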
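`systemd-detect-virt` identifies the hypervisor but not whether a virtual PMU is exposed. A complementary hint, assuming an Intel x86 guest, comes from the CPU flags in `/proc/cpuinfo`: `hypervisor` is set inside a VM, and `arch_perfmon` indicates that Intel architectural performance monitoring is advertised to the OS; it is commonly missing when the hypervisor hides the PMU. Treat the result as a hint and confirm by opening a counter:

```python
def pmu_hints_from_cpuinfo(cpuinfo_text: str) -> dict:
    """Extract PMU-related hints from the first 'flags' line of /proc/cpuinfo (x86)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break
    return {
        "virtualized": "hypervisor" in flags,
        "intel_arch_perfmon": "arch_perfmon" in flags,
    }


def read_pmu_hints(path: str = "/proc/cpuinfo") -> dict:
    try:
        with open(path) as f:
            return pmu_hints_from_cpuinfo(f.read())
    except OSError:
        return {"virtualized": None, "intel_arch_perfmon": None}
```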
PMC access requires careful security consideration:
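```
# Secure PMC access configuration
# /etc/security/limits.conf (applies to PAM login sessions, not systemd services)
@perf_users soft memlock unlimited
@perf_users hard memlock unlimited
```
```ini
# Systemd service for PMC monitoring
[Unit]
Description=PMC Memory Leak Detection
After=network.target

[Service]
Type=simple
User=pmc_monitor
Group=perf_users
ExecStart=/usr/local/bin/pmc_leak_detector
Restart=always
RestartSec=10
# CPU-wide counters need CAP_PERFMON (Linux 5.8+, and a systemd that knows the
# name) unless perf_event_paranoid <= 0
AmbientCapabilities=CAP_PERFMON
CapabilityBoundingSet=CAP_PERFMON
# limits.conf is not applied to services; set the locked-memory limit here
LimitMEMLOCK=infinity
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/pmc

[Install]
WantedBy=multi-user.target
```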
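To confirm that the detector process actually received the capability, its effective capability set can be decoded from `/proc/self/status`, where the `CapEff` line is a hexadecimal bitmask. `CAP_IPC_LOCK` is bit 14, `CAP_SYS_ADMIN` bit 21, and `CAP_PERFMON` bit 38 (Linux 5.8+). A sketch:

```python
CAP_IPC_LOCK = 14
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38  # Linux 5.8+


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def can_monitor_system_wide(cap_mask: int) -> bool:
    """CAP_PERFMON, or CAP_SYS_ADMIN on older kernels, permits CPU-wide events."""
    return bool(cap_mask & (1 << CAP_PERFMON) or cap_mask & (1 << CAP_SYS_ADMIN))


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        mask = effective_caps(f.read())
    print(f"CapEff={mask:#x} system-wide PMC access: {can_monitor_system_wide(mask)}")
```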
This comprehensive documentation provides the foundation for implementing Hardware Performance Counter-based memory leak detection systems, covering both theoretical background and practical implementation details for production deployment.