
Disk Collector

Overview

The Disk Collector is a performance monitoring component of the Antimetal System Agent that collects disk I/O statistics from Linux systems. It reads raw counter values from /proc/diskstats to provide detailed metrics about disk operations, throughput, and queue performance. This collector is essential for:

  • Performance Monitoring: Track disk I/O patterns and identify bottlenecks
  • Capacity Planning: Understand disk utilization and throughput requirements
  • Troubleshooting: Identify disks with high latency or queue depths
  • SLA Compliance: Monitor disk performance against service level objectives

The collector reports statistics for whole disk devices only, filtering out partitions to avoid duplicate metrics.

Technical Details

MetricType

MetricTypeDisk

Data Source

  • Primary: /proc/diskstats - Linux kernel disk statistics interface
  • Format: Space-separated values; 14 fields per device on older kernels (kernels 4.18+ append discard and flush counters, but the first 14 fields remain stable)
  • Documentation: Linux kernel iostats documentation (Documentation/admin-guide/iostats.rst in the kernel tree)
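
For reference, a single /proc/diskstats line looks like the following (values illustrative). The first three fields identify the device (major, minor, name); the remaining 11 fields map, in order, to the counters listed under Collected Metrics below:

   8       0 sda 123456 567 890123 4567 890 123 456789 1234 0 5678 9012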

Capabilities

CollectorCapabilities{
    SupportsOneShot:    true,
    SupportsContinuous: false,  // Wrapped by ContinuousPointCollector
    RequiresRoot:       false,
    RequiresEBPF:       false,
    MinKernelVersion:   "2.6.0",
}

Registration

The collector is registered as a continuous collector that wraps the point collector:

performance.Register(
    performance.MetricTypeDisk, 
    performance.PartialNewContinuousPointCollector(...)
)
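
The wiki does not show the wrapper internals, but conceptually a continuous point collector polls the one-shot collector on a fixed interval and streams each snapshot. A minimal sketch, with illustrative (not actual) type and function names:

// PointCollector is an illustrative stand-in for the agent's
// one-shot collector interface; names here are assumptions.
type PointCollector interface {
    Collect(ctx context.Context) (any, error)
}

// runContinuous polls the point collector on each tick and streams
// results until the context is cancelled.
func runContinuous(ctx context.Context, c PointCollector, interval time.Duration, out chan<- any) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            snapshot, err := c.Collect(ctx)
            if err != nil {
                continue // a real implementation would log and keep going
            }
            out <- snapshot
        }
    }
}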

Collected Metrics

The collector returns a []*performance.DiskStats slice with the following metrics for each disk:

| Field | Type | Description | Unit |
|-------|------|-------------|------|
| Device | string | Device name (e.g., sda, nvme0n1) | - |
| Major | uint32 | Major device number | - |
| Minor | uint32 | Minor device number | - |
| ReadsCompleted | uint64 | Number of reads completed successfully | count |
| ReadsMerged | uint64 | Number of reads merged before queuing | count |
| SectorsRead | uint64 | Total sectors read (×512 for bytes) | sectors |
| ReadTime | uint64 | Total time spent reading | milliseconds |
| WritesCompleted | uint64 | Number of writes completed successfully | count |
| WritesMerged | uint64 | Number of writes merged before queuing | count |
| SectorsWritten | uint64 | Total sectors written (×512 for bytes) | sectors |
| WriteTime | uint64 | Total time spent writing | milliseconds |
| IOsInProgress | uint64 | Current number of I/Os in progress | count |
| IOTime | uint64 | Total time spent doing I/Os | milliseconds |
| WeightedIOTime | uint64 | Weighted time spent doing I/Os | milliseconds |

Calculated Fields (Not Populated by Point Collector)

The following fields exist in the DiskStats structure but are set to zero by this collector:

  • IOPS - I/O operations per second
  • ReadBytesPerSec - Read throughput
  • WriteBytesPerSec - Write throughput
  • Utilization - Disk utilization percentage
  • AvgQueueSize - Average queue size
  • AvgReadLatency - Average read latency
  • AvgWriteLatency - Average write latency

These fields are intended for rate calculation by continuous collectors or downstream processors.
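
As an illustration of that downstream calculation, the hypothetical helper below derives IOPS, throughput, and utilization from two DiskStats snapshots taken a known interval apart. It relies only on the fields in the table above; /proc/diskstats sectors are always 512 bytes regardless of the device's physical sector size.

// computeRates is a hypothetical helper, not part of the agent.
// It assumes curr was sampled `interval` after prev and that the
// cumulative counters have not wrapped in between.
func computeRates(prev, curr *performance.DiskStats, interval time.Duration) (iops, readBps, writeBps, utilPct float64) {
    secs := interval.Seconds()
    const sectorSize = 512 // /proc/diskstats always reports 512-byte sectors

    reads := float64(curr.ReadsCompleted - prev.ReadsCompleted)
    writes := float64(curr.WritesCompleted - prev.WritesCompleted)
    iops = (reads + writes) / secs

    readBps = float64(curr.SectorsRead-prev.SectorsRead) * sectorSize / secs
    writeBps = float64(curr.SectorsWritten-prev.SectorsWritten) * sectorSize / secs

    // IOTime counts milliseconds the device was busy, so the busy
    // fraction over the interval approximates utilization.
    utilPct = float64(curr.IOTime-prev.IOTime) / (secs * 1000) * 100

    return iops, readBps, writeBps, utilPct
}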

Data Structure

The implementation is located at: pkg/performance/collectors/disk.go

The data structure is defined in: pkg/performance/types.go

Configuration

The collector requires minimal configuration:

config := performance.CollectionConfig{
    HostProcPath: "/proc",  // Must be absolute path
}

collector, err := collectors.NewDiskCollector(logger, config)
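
A usage sketch follows. The Collect method shape (context in, result and error out) is an assumption based on the one-shot capability above; consult pkg/performance for the actual point-collector interface.

if err != nil {
    log.Fatal(err)
}

// Assumed signature: Collect(ctx) (any, error); verify against pkg/performance.
result, err := collector.Collect(context.Background())
if err != nil {
    log.Fatal(err)
}
for _, d := range result.([]*performance.DiskStats) {
    fmt.Printf("%s: reads=%d writes=%d\n", d.Device, d.ReadsCompleted, d.WritesCompleted)
}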

Container Environments

When running in containers, ensure the host's /proc filesystem is mounted:

volumes:
  - name: host-proc
    hostPath:
      path: /proc
      type: Directory
volumeMounts:
  - name: host-proc
    mountPath: /host/proc
    readOnly: true

Then configure with:

config.HostProcPath = "/host/proc"

Platform Considerations

Linux Kernel Requirements

  • Minimum Version: 2.6.0 (when /proc/diskstats was introduced)
  • Required Files: /proc/diskstats must be available and readable
  • Permissions: No root privileges required

Device Filtering

The collector automatically filters out partitions to report only whole disk devices (a sketch of the heuristic follows the list):

  • Standard disks: Filters devices ending with digits (sda1, sdb2)
  • NVMe devices: Filters devices with 'pN' suffix (nvme0n1p1)
  • MMC devices: Filters devices with 'pN' suffix (mmcblk0p1)
  • Special devices: Includes loop and device-mapper devices (loop0, dm-0)
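
A simplified Go sketch of this heuristic; the real logic lives in pkg/performance/collectors/disk.go, and isPartition is an illustrative name:

var nvmeMMCPartition = regexp.MustCompile(`p[0-9]+$`)

// isPartition approximates the collector's filtering rules described above.
func isPartition(device string) bool {
    // NVMe and MMC partitions carry a 'pN' suffix: nvme0n1p1, mmcblk0p2.
    if strings.HasPrefix(device, "nvme") || strings.HasPrefix(device, "mmcblk") {
        return nvmeMMCPartition.MatchString(device)
    }
    // loop and device-mapper names end in digits but are whole devices.
    if strings.HasPrefix(device, "loop") || strings.HasPrefix(device, "dm-") {
        return false
    }
    // For sd/hd/vd-style names a trailing digit marks a partition (sda1, sdb2).
    return len(device) > 0 && device[len(device)-1] >= '0' && device[len(device)-1] <= '9'
}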

Container Considerations

  • Must mount host /proc filesystem
  • Use HostProcPath configuration to specify mount point
  • Read-only mount is sufficient

Common Issues

1. Missing /proc/diskstats

Error: failed to open /proc/diskstats: no such file or directory

  • Cause: Running on non-Linux system or /proc not mounted
  • Solution: Ensure running on Linux with /proc filesystem available

2. Empty Results

Symptom: Collector returns empty array

  • Cause: All devices filtered as partitions or no block devices present
  • Solution: Check system has block devices with lsblk

3. Parse Errors

Symptom: Some devices have zero values for all metrics

  • Cause: Malformed lines in /proc/diskstats or kernel format changes
  • Solution: Check kernel version compatibility and file format

4. Container Path Issues

Error: HostProcPath must be an absolute path

  • Cause: Relative path provided in configuration
  • Solution: Use an absolute path like /host/proc, not ./proc
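
That check presumably reduces to a filepath.IsAbs test, roughly as follows (illustrative, not the agent's actual code):

// Illustrative validation matching the error message above.
if !filepath.IsAbs(config.HostProcPath) {
    return nil, fmt.Errorf("HostProcPath must be an absolute path, got %q", config.HostProcPath)
}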

Examples

Sample Output

[
  {
    "Device": "sda",
    "Major": 8,
    "Minor": 0,
    "ReadsCompleted": 123456,
    "ReadsMerged": 567,
    "SectorsRead": 890123,
    "ReadTime": 4567,
    "WritesCompleted": 890,
    "WritesMerged": 123,
    "SectorsWritten": 456789,
    "WriteTime": 1234,
    "IOsInProgress": 0,
    "IOTime": 5678,
    "WeightedIOTime": 9012
  },
  {
    "Device": "nvme0n1",
    "Major": 259,
    "Minor": 0,
    "ReadsCompleted": 345678,
    "ReadsMerged": 789,
    "SectorsRead": 1234567,
    "ReadTime": 6789,
    "WritesCompleted": 1234,
    "WritesMerged": 567,
    "SectorsWritten": 890123,
    "WriteTime": 3456,
    "IOsInProgress": 2,
    "IOTime": 7890,
    "WeightedIOTime": 11234
  }
]

Performance Impact

The Disk Collector has minimal performance impact:

  • CPU Usage: Negligible - simple file parsing
  • Memory Usage: O(n), where n is the number of block devices
  • I/O Operations: Single read of /proc/diskstats per collection
  • Collection Time: Typically < 1ms for systems with < 100 disks

Optimization Notes

  • Partitions are filtered early to reduce memory allocation
  • No system calls beyond file reading
  • Efficient line-by-line parsing without loading entire file
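
A simplified sketch of that parsing loop, reusing the hypothetical isPartition helper from the Device Filtering section; the real parser fills in all 14 fields:

// parseDiskStats is a simplified sketch; see
// pkg/performance/collectors/disk.go for the real implementation.
func parseDiskStats(path string) ([]*performance.DiskStats, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, fmt.Errorf("failed to open %s: %w", path, err)
    }
    defer f.Close()

    var stats []*performance.DiskStats
    scanner := bufio.NewScanner(f) // streams line by line; never loads the whole file
    for scanner.Scan() {
        fields := strings.Fields(scanner.Text())
        if len(fields) < 14 {
            continue // skip malformed lines instead of failing the collection
        }
        if isPartition(fields[2]) {
            continue // filter partitions early, before allocating
        }
        reads, _ := strconv.ParseUint(fields[3], 10, 64)
        writes, _ := strconv.ParseUint(fields[7], 10, 64)
        stats = append(stats, &performance.DiskStats{
            Device:          fields[2],
            ReadsCompleted:  reads,
            WritesCompleted: writes,
            // ...remaining counters parsed the same way from fields[4:14]
        })
    }
    return stats, scanner.Err()
}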

Related Collectors

Collector Relationships

  • Disk Collector: Runtime I/O statistics (dynamic)
  • Disk Info Collector: Hardware configuration (static)
  • Together they provide complete disk monitoring coverage

Troubleshooting Tips

  1. Verify Data Source:

    cat /proc/diskstats
    
  2. Check Device Filtering:

    # List all block devices
    lsblk
    
    # Approximate the collector's partition filter (heuristic, see Device Filtering)
    awk '$3 !~ /^(sd|hd|vd|xvd)[a-z]+[0-9]+$/ && $3 !~ /^(nvme|mmcblk).*p[0-9]+$/ {print $3}' /proc/diskstats
    
  3. Monitor Collection:

    # Watch diskstats changes for specific whole-disk devices
    watch -n 1 'grep -E " (sda|nvme0n1) " /proc/diskstats'
    
  4. Calculate Rates Manually:

    # IOPS for sda: sample reads+writes completed one second apart, then subtract
    a=$(awk '$3 == "sda" {print $4 + $8}' /proc/diskstats)
    sleep 1
    b=$(awk '$3 == "sda" {print $4 + $8}' /proc/diskstats)
    echo "$((b - a)) IOPS"