TCP Collector

Overview

The TCP Collector monitors TCP connection statistics and health metrics in the Antimetal System Agent. It provides critical insights into network performance, connection states, and potential issues by collecting data from multiple Linux kernel interfaces.

Why TCP Monitoring is Important

  • Connection Health: Track active, passive, and failed connection attempts
  • Performance Issues: Identify retransmissions, timeouts, and packet errors
  • Security Monitoring: Detect SYN flood attacks via SYN cookie metrics and listen queue overflows
  • Capacity Planning: Monitor connection counts by state to understand system load
  • Troubleshooting: Diagnose network issues through error counters and connection state distribution

Technical Details

MetricType

performance.MetricTypeTCP

Data Sources

The collector reads from multiple /proc filesystem files:

  1. /proc/net/snmp: Basic TCP statistics (RFC 1213 MIB-II)
  2. /proc/net/netstat: Extended TCP statistics (Linux-specific)
  3. /proc/net/tcp: IPv4 TCP connection states
  4. /proc/net/tcp6: IPv6 TCP connection states
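
Both /proc/net/snmp and /proc/net/netstat use the same layout: a header line listing field names followed by a value line with the corresponding counters, each prefixed with the protocol name (Tcp: or TcpExt:). The sketch below is a minimal, standalone illustration of reading one such section; it is not the agent's implementation and assumes the default /proc path rather than the configurable host_proc_path.

package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

// readSNMPSection returns the named counters for a prefix such as "Tcp"
// (in /proc/net/snmp) or "TcpExt" (in /proc/net/netstat), where a header
// line of field names is followed by a value line with the same prefix.
func readSNMPSection(path, prefix string) (map[string]uint64, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    var names []string
    stats := make(map[string]uint64)
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        fields := strings.Fields(scanner.Text())
        if len(fields) == 0 || fields[0] != prefix+":" {
            continue
        }
        if names == nil {
            names = fields[1:] // first matching line holds the field names
            continue
        }
        for i, v := range fields[1:] {
            if i >= len(names) {
                break
            }
            n, err := strconv.ParseUint(v, 10, 64)
            if err != nil {
                continue // fields such as MaxConn can be negative; skip them
            }
            stats[names[i]] = n
        }
        break // the value line directly follows the header line
    }
    return stats, scanner.Err()
}

func main() {
    tcp, err := readSNMPSection("/proc/net/snmp", "Tcp")
    if err != nil {
        panic(err)
    }
    fmt.Println("RetransSegs:", tcp["RetransSegs"])
}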

Capabilities

{
    SupportsOneShot:    true,
    SupportsContinuous: false,
    RequiresRoot:       false,
    RequiresEBPF:       false,
    MinKernelVersion:   "2.6.0",
}

Collected Metrics

Basic TCP Statistics (from /proc/net/snmp)

Metric | Type | Description
ActiveOpens | uint64 | Number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state
PassiveOpens | uint64 | Number of times TCP connections have made a direct transition to the SYN-RECV state from the LISTEN state
AttemptFails | uint64 | Number of times TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RECV state
EstabResets | uint64 | Number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state
CurrEstab | uint64 | Current number of TCP connections in ESTABLISHED or CLOSE-WAIT state
InSegs | uint64 | Total number of segments received, including those received in error
OutSegs | uint64 | Total number of segments sent, excluding those containing only retransmitted octets
RetransSegs | uint64 | Total number of segments retransmitted
InErrs | uint64 | Total number of segments received in error (bad checksums, etc.)
OutRsts | uint64 | Number of TCP segments sent containing the RST flag
InCsumErrors | uint64 | Number of TCP segments received with checksum errors
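
A common way to use these counters is to derive a retransmission rate from the change in RetransSegs relative to the change in OutSegs between two consecutive collections. The helper below is a hypothetical illustration, not part of the agent, and the ratio is approximate because OutSegs excludes segments that contain only retransmitted octets.

// retransRate returns the approximate fraction of sent segments that were
// retransmissions over an interval, given RetransSegs and OutSegs sampled
// at the start (prev) and end (curr) of the interval. Hypothetical helper.
func retransRate(prevRetrans, currRetrans, prevOut, currOut uint64) float64 {
    sent := currOut - prevOut
    if sent == 0 {
        return 0
    }
    return float64(currRetrans-prevRetrans) / float64(sent)
}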

Extended TCP Statistics (from /proc/net/netstat)

Metric | Type | Description
SyncookiesSent | uint64 | Number of SYN cookies sent (helps defend against SYN flood attacks)
SyncookiesRecv | uint64 | Number of SYN cookies received and validated
SyncookiesFailed | uint64 | Number of invalid SYN cookies received
ListenOverflows | uint64 | Times the listen queue of a socket overflowed
ListenDrops | uint64 | SYNs to LISTEN sockets dropped due to overflow
TCPLostRetransmit | uint64 | Retransmissions lost (indicating severe network issues)
TCPFastRetrans | uint64 | Segments retransmitted using Fast Retransmit algorithm
TCPSlowStartRetrans | uint64 | Segments retransmitted in slow start
TCPTimeouts | uint64 | Total number of TCP timeout events
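
These are the counters the overview refers to for SYN flood detection: sustained growth in SyncookiesSent, ListenOverflows, or ListenDrops between collections suggests the accept/listen queues are saturated. The check below is a hypothetical sketch; the pared-down struct and the threshold are assumptions, not agent behavior.

// extStats holds only the fields needed for this example; the agent's
// actual struct is the performance.TCPStats type shown later on this page.
type extStats struct {
    SyncookiesSent  uint64
    ListenOverflows uint64
    ListenDrops     uint64
}

// synFloodSuspected reports whether any SYN-flood-related counter grew by
// more than threshold between two consecutive collections.
func synFloodSuspected(prev, curr extStats, threshold uint64) bool {
    return curr.SyncookiesSent-prev.SyncookiesSent > threshold ||
        curr.ListenOverflows-prev.ListenOverflows > threshold ||
        curr.ListenDrops-prev.ListenDrops > threshold
}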

Connection States (from /proc/net/tcp*)

State | Description
ESTABLISHED | Connection has been established
SYN_SENT | Actively attempting to establish a connection
SYN_RECV | Initial synchronization of the connection underway
FIN_WAIT1 | Connection terminating, waiting for ACK or FIN
FIN_WAIT2 | Connection terminating, waiting for FIN
TIME_WAIT | Connection closed but waiting for late packets
CLOSE | Connection is closed
CLOSE_WAIT | Remote shutdown; waiting for local application to close
LAST_ACK | Remote shutdown and local shutdown; awaiting ACK
LISTEN | Listening for incoming connections
CLOSING | Both sides simultaneously closed
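
In /proc/net/tcp and /proc/net/tcp6 the state appears as a hexadecimal code in the st column rather than by name. The mapping below follows the kernel's include/net/tcp_states.h; the counting function is an illustrative sketch (assuming the standard library strings package is imported), not the agent's parser.

// tcpStateNames maps the hexadecimal codes in the "st" column of
// /proc/net/tcp* to the symbolic state names listed above.
var tcpStateNames = map[string]string{
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}

// countStates tallies connections by state from the lines of a
// /proc/net/tcp* file; the state code is the fourth whitespace-separated
// field on each data line.
func countStates(lines []string) map[string]uint64 {
    counts := make(map[string]uint64)
    for i, line := range lines {
        if i == 0 {
            continue // skip the header line
        }
        fields := strings.Fields(line)
        if len(fields) < 4 {
            continue
        }
        if name, ok := tcpStateNames[fields[3]]; ok {
            counts[name]++
        }
    }
    return counts
}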

Data Structure

The collector returns a performance.TCPStats struct containing all metrics:

type TCPStats struct {
    // Basic TCP stats from /proc/net/snmp
    ActiveOpens  uint64
    PassiveOpens uint64
    AttemptFails uint64
    EstabResets  uint64
    CurrEstab    uint64
    InSegs       uint64
    OutSegs      uint64
    RetransSegs  uint64
    InErrs       uint64
    OutRsts      uint64
    InCsumErrors uint64
    
    // Extended stats from /proc/net/netstat
    SyncookiesSent      uint64
    SyncookiesRecv      uint64
    SyncookiesFailed    uint64
    ListenOverflows     uint64
    ListenDrops         uint64
    TCPLostRetransmit   uint64
    TCPFastRetrans      uint64
    TCPSlowStartRetrans uint64
    TCPTimeouts         uint64
    
    // Connection states from /proc/net/tcp*
    ConnectionsByState map[string]uint64
}
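
As a usage illustration, CurrEstab (from /proc/net/snmp) should roughly agree with the ESTABLISHED and CLOSE_WAIT entries in ConnectionsByState, since the two sources are read moments apart. The helper below is hypothetical and assumes a populated TCPStats value as defined above.

// establishedDrift returns the difference between the SNMP CurrEstab gauge
// and the ESTABLISHED + CLOSE_WAIT counts parsed from /proc/net/tcp*; a
// small drift is expected because the files are read at slightly different
// moments. Illustrative helper, not part of the agent.
func establishedDrift(stats TCPStats) int64 {
    fromStates := stats.ConnectionsByState["ESTABLISHED"] + stats.ConnectionsByState["CLOSE_WAIT"]
    return int64(stats.CurrEstab) - int64(fromStates)
}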

Configuration

The TCP Collector requires minimal configuration:

performance:
  collection_config:
    host_proc_path: "/proc"  # Path to proc filesystem (default: /proc)
  enabled_collectors:
    tcp: true

Container Configuration

When running in a container, ensure the host's /proc filesystem is mounted:

volumes:
  - name: proc
    hostPath:
      path: /proc
      type: Directory
volumeMounts:
  - name: proc
    mountPath: /host/proc
    readOnly: true

Then configure the collector:

performance:
  collection_config:
    host_proc_path: "/host/proc"

Platform Considerations

Linux Kernel Requirements

  • Minimum kernel version: 2.6.0
  • /proc/net/snmp has been available since early Linux versions
  • Extended statistics in /proc/net/netstat require kernel 2.6.37+
  • Some metrics may not be available on older kernels (gracefully handled)

Container Considerations

  • No special privileges required (runs as non-root)
  • Requires read access to /proc/net/* files
  • Host /proc filesystem must be mounted into container
  • Network namespace affects visible connections (container sees only its own connections unless using host network)

File Format Compatibility

The collector handles various format variations:

  • Missing extended statistics (older kernels)
  • Different field orderings
  • Additional unknown fields (newer kernels)
  • IPv4-only systems (no /proc/net/tcp6)

Common Issues

Issue: No TCP Statistics Found

Symptom: Error "TCP statistics not found in /proc/net/snmp"

Causes:

  • /proc filesystem not mounted correctly
  • Running in restricted container without proc access
  • Non-Linux system

Solution:

  • Verify /proc/net/snmp exists and is readable
  • Check container volume mounts
  • Ensure host_proc_path configuration is correct

Issue: Zero Connection Counts

Symptom: All connection state counts show 0

Causes:

  • Container network namespace isolation
  • Missing /proc/net/tcp or /proc/net/tcp6 files
  • Permission issues

Solution:

  • Use host network mode if system-wide monitoring needed
  • Verify file permissions
  • Check for SELinux/AppArmor restrictions

Issue: Missing Extended Statistics

Symptom: SYN cookie and timeout metrics all show 0

Causes:

  • Older kernel without extended TCP statistics
  • /proc/net/netstat not available

Solution:

  • This is normal on kernels < 2.6.37
  • Collector continues with basic statistics only
  • Upgrade kernel for full metrics

Examples

Sample Output

{
  "tcp": {
    "active_opens": 125847,
    "passive_opens": 89234,
    "attempt_fails": 234,
    "estab_resets": 156,
    "curr_estab": 342,
    "in_segs": 98765432,
    "out_segs": 87654321,
    "retrans_segs": 12345,
    "in_errs": 23,
    "out_rsts": 456,
    "in_csum_errors": 5,
    "syncookies_sent": 123,
    "syncookies_recv": 98,
    "syncookies_failed": 2,
    "listen_overflows": 15,
    "listen_drops": 10,
    "tcp_lost_retransmit": 8,
    "tcp_fast_retrans": 234,
    "tcp_slow_start_retrans": 56,
    "tcp_timeouts": 78,
    "connections_by_state": {
      "ESTABLISHED": 285,
      "SYN_SENT": 3,
      "SYN_RECV": 2,
      "FIN_WAIT1": 5,
      "FIN_WAIT2": 8,
      "TIME_WAIT": 25,
      "CLOSE": 0,
      "CLOSE_WAIT": 4,
      "LAST_ACK": 2,
      "LISTEN": 8,
      "CLOSING": 0
    }
  }
}

Performance Impact

The TCP Collector has minimal performance impact:

  • CPU Usage: Negligible - simple file parsing
  • Memory Usage: < 1MB - temporary buffers for file reading
  • I/O Operations: 4 file reads per collection
  • Collection Time: Typically < 5ms

Scalability Considerations

  • Connection-state counting scales linearly with the number of connections, since /proc/net/tcp and /proc/net/tcp6 contain one line per socket
  • Large servers with thousands of connections may see slightly longer collection times
  • High connection churn does not degrade collection, because the counters in /proc/net/snmp and /proc/net/netstat are aggregates rather than per-connection state

Related Collectors

  • Network Collector: Provides interface-level statistics (bytes, packets, errors)
  • Load Collector: System load can correlate with network activity
  • CPU Collector: High CPU usage may indicate network processing overhead
  • eBPF Network Collector (planned): Will provide per-connection bandwidth and latency metrics
