TCP Collector
Overview
The TCP Collector monitors TCP connection statistics and health metrics in the Antimetal System Agent. It provides critical insights into network performance, connection states, and potential issues by collecting data from multiple Linux kernel interfaces.
Why TCP Monitoring is Important
- Connection Health: Track active, passive, and failed connection attempts
- Performance Issues: Identify retransmissions, timeouts, and packet errors
- Security Monitoring: Detect SYN flood attacks via SYN cookie metrics and listen queue overflows
- Capacity Planning: Monitor connection counts by state to understand system load
- Troubleshooting: Diagnose network issues through error counters and connection state distribution
Technical Details
MetricType
performance.MetricTypeTCP
Data Sources
The collector reads from multiple /proc filesystem files:
- /proc/net/snmp: Basic TCP statistics (RFC 1213 MIB-II)
- /proc/net/netstat: Extended TCP statistics (Linux-specific)
- /proc/net/tcp: IPv4 TCP connection states
- /proc/net/tcp6: IPv6 TCP connection states
Capabilities
{
    SupportsOneShot:    true,
    SupportsContinuous: false,
    RequiresRoot:       false,
    RequiresEBPF:       false,
    MinKernelVersion:   "2.6.0",
}
Source Code
- Implementation: pkg/performance/collectors/tcp.go
- Tests: pkg/performance/collectors/tcp_test.go
Collected Metrics
Basic TCP Statistics (from /proc/net/snmp)
Metric | Type | Description |
---|---|---|
ActiveOpens | uint64 | Number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state |
PassiveOpens | uint64 | Number of times TCP connections have made a direct transition to the SYN-RECV state from the LISTEN state |
AttemptFails | uint64 | Number of times TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RECV state |
EstabResets | uint64 | Number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state |
CurrEstab | uint64 | Current number of TCP connections in ESTABLISHED or CLOSE-WAIT state |
InSegs | uint64 | Total number of segments received, including those received in error |
OutSegs | uint64 | Total number of segments sent, excluding those containing only retransmitted octets |
RetransSegs | uint64 | Total number of segments retransmitted |
InErrs | uint64 | Total number of segments received in error (bad checksums, etc.) |
OutRsts | uint64 | Number of TCP segments sent containing the RST flag |
InCsumErrors | uint64 | Number of TCP segments received with checksum errors |
Extended TCP Statistics (from /proc/net/netstat)
Metric | Type | Description |
---|---|---|
SyncookiesSent | uint64 | Number of SYN cookies sent (helps defend against SYN flood attacks) |
SyncookiesRecv | uint64 | Number of SYN cookies received and validated |
SyncookiesFailed | uint64 | Number of invalid SYN cookies received |
ListenOverflows | uint64 | Times the listen queue of a socket overflowed |
ListenDrops | uint64 | SYNs to LISTEN sockets dropped due to overflow |
TCPLostRetransmit | uint64 | Retransmissions lost (indicating severe network issues) |
TCPFastRetrans | uint64 | Segments retransmitted using Fast Retransmit algorithm |
TCPSlowStartRetrans | uint64 | Segments retransmitted in slow start |
TCPTimeouts | uint64 | Total number of TCP timeout events |
Connection States (from /proc/net/tcp*)
State | Description |
---|---|
ESTABLISHED | Connection has been established |
SYN_SENT | Actively attempting to establish a connection |
SYN_RECV | Initial synchronization of the connection underway |
FIN_WAIT1 | Connection terminating, waiting for ACK or FIN |
FIN_WAIT2 | Connection terminating, waiting for FIN |
TIME_WAIT | Connection closed but waiting for late packets |
CLOSE | Connection is closed |
CLOSE_WAIT | Remote shutdown; waiting for local application to close |
LAST_ACK | Remote shutdown and local shutdown; awaiting ACK |
LISTEN | Listening for incoming connections |
CLOSING | Both sides simultaneously closed |
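The per-state counts come from the hexadecimal `st` column of /proc/net/tcp and /proc/net/tcp6. The following is a minimal sketch of that decoding, not the collector's actual code; `tcpStateNames` and `countStates` are hypothetical names chosen for illustration, and the sample input is shortened to the first few columns.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tcpStateNames maps the kernel's hexadecimal state codes to the state
// names listed in the table above. (Hypothetical helper, for illustration.)
var tcpStateNames = map[string]string{
	"01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
	"04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
	"07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
	"0A": "LISTEN", "0B": "CLOSING",
}

// countStates tallies connections by state from the text of /proc/net/tcp
// or /proc/net/tcp6. Unknown state codes and short lines are ignored.
func countStates(content string) map[string]uint64 {
	counts := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	sc.Scan() // skip the column header line
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 {
			continue
		}
		// fields: [0]=slot, [1]=local address, [2]=remote address, [3]=state
		if name, ok := tcpStateNames[strings.ToUpper(fields[3])]; ok {
			counts[name]++
		}
	}
	return counts
}

func main() {
	sample := "  sl  local_address rem_address   st tx_queue rx_queue\n" +
		"   0: 0100007F:0035 00000000:0000 0A 00000000:00000000\n" +
		"   1: 0100007F:1F90 0100007F:C350 01 00000000:00000000\n" +
		"   2: 0100007F:C350 0100007F:1F90 01 00000000:00000000\n"
	counts := countStates(sample)
	fmt.Println(counts["ESTABLISHED"], counts["LISTEN"]) // prints: 2 1
}
```

Calling such a function once per file and summing the two maps would give combined IPv4 and IPv6 counts.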
Data Structure
The collector returns a performance.TCPStats struct containing all metrics:
type TCPStats struct {
    // Basic TCP stats from /proc/net/snmp
    ActiveOpens  uint64
    PassiveOpens uint64
    AttemptFails uint64
    EstabResets  uint64
    CurrEstab    uint64
    InSegs       uint64
    OutSegs      uint64
    RetransSegs  uint64
    InErrs       uint64
    OutRsts      uint64
    InCsumErrors uint64

    // Extended stats from /proc/net/netstat
    SyncookiesSent      uint64
    SyncookiesRecv      uint64
    SyncookiesFailed    uint64
    ListenOverflows     uint64
    ListenDrops         uint64
    TCPLostRetransmit   uint64
    TCPFastRetrans      uint64
    TCPSlowStartRetrans uint64
    TCPTimeouts         uint64

    // Connection states from /proc/net/tcp*
    ConnectionsByState map[string]uint64
}
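Most of these fields are counters that the kernel accumulates since boot (CurrEstab and the per-state counts are point-in-time values), so a consumer usually works with the difference between two samples. The sketch below shows one way to derive a retransmission ratio from two collections. `tcpCounters` is a local stand-in for the two relevant fields of performance.TCPStats and `retransRatio` is a hypothetical helper; neither is part of the agent.

```go
package main

import "fmt"

// tcpCounters is a local stand-in for two cumulative counters from
// performance.TCPStats; it is not the agent's own type.
type tcpCounters struct {
	OutSegs     uint64
	RetransSegs uint64
}

// retransRatio returns the segments retransmitted between two samples
// relative to the segments sent in the same interval. It returns 0 when
// nothing was sent or when a counter went backwards (for example after a
// reboot between samples).
func retransRatio(prev, curr tcpCounters) float64 {
	if curr.OutSegs <= prev.OutSegs || curr.RetransSegs < prev.RetransSegs {
		return 0
	}
	sent := curr.OutSegs - prev.OutSegs
	retrans := curr.RetransSegs - prev.RetransSegs
	return float64(retrans) / float64(sent)
}

func main() {
	prev := tcpCounters{OutSegs: 1000, RetransSegs: 10}
	curr := tcpCounters{OutSegs: 3000, RetransSegs: 510}
	fmt.Printf("%.2f\n", retransRatio(prev, curr)) // prints: 0.25
}
```

Because OutSegs excludes segments that contain only retransmitted octets, this ratio is an approximation rather than an exact share of all transmitted segments.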
Configuration
The TCP Collector requires minimal configuration:
performance:
  collection_config:
    host_proc_path: "/proc" # Path to proc filesystem (default: /proc)
    enabled_collectors:
      tcp: true
Container Configuration
When running in a container, ensure the host's /proc filesystem is mounted:
volumes:
  - name: proc
    hostPath:
      path: /proc
      type: Directory
volumeMounts:
  - name: proc
    mountPath: /host/proc
    readOnly: true
Then configure the collector:
performance:
  collection_config:
    host_proc_path: "/host/proc"
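To make the effect of host_proc_path concrete, the sketch below resolves the four data source files against a configured root. The function name `tcpProcFiles` and the fallback to /proc for an empty value are assumptions for illustration; the collector's own path handling may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// tcpProcFiles returns the four files listed under Data Sources, resolved
// against the configured host_proc_path. An empty value falls back to
// "/proc" (an assumption mirroring the documented default).
func tcpProcFiles(hostProcPath string) []string {
	if hostProcPath == "" {
		hostProcPath = "/proc"
	}
	relative := []string{"net/snmp", "net/netstat", "net/tcp", "net/tcp6"}
	files := make([]string, 0, len(relative))
	for _, rel := range relative {
		files = append(files, filepath.Join(hostProcPath, rel))
	}
	return files
}

func main() {
	for _, f := range tcpProcFiles("/host/proc") {
		fmt.Println(f)
	}
}
```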
Platform Considerations
Linux Kernel Requirements
- Minimum kernel version: 2.6.0
- /proc/net/snmp has been available since early Linux versions
- Extended statistics in /proc/net/netstat require kernel 2.6.37+
- Some metrics may not be available on older kernels (gracefully handled)
Container Considerations
- No special privileges required (runs as non-root)
- Requires read access to /proc/net/* files
- Host /proc filesystem must be mounted into container
- Network namespace affects visible connections (container sees only its own connections unless using host network)
File Format Compatibility
The collector handles various format variations:
- Missing extended statistics (older kernels)
- Different field orderings
- Additional unknown fields (newer kernels)
- IPv4-only systems (no /proc/net/tcp6)
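One way to tolerate reordered, missing, and unknown fields is to key values by the names in the header line instead of by column position. /proc/net/snmp and /proc/net/netstat both use pairs of lines with the same prefix: the first carries field names, the second carries values. The sketch below illustrates the idea; `parsePairedLines` is a hypothetical name, and this is not the collector's actual parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parsePairedLines parses the header/value line pairs used by
// /proc/net/snmp and /proc/net/netstat and returns the counters for one
// prefix (for example "Tcp:" or "TcpExt:") keyed by field name.
func parsePairedLines(content, prefix string) (map[string]uint64, error) {
	var header []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != prefix {
			continue
		}
		if header == nil {
			header = fields[1:] // first matching line carries the names
			continue
		}
		values := fields[1:]
		out := make(map[string]uint64, len(header))
		for i, name := range header {
			if i >= len(values) {
				break // fewer values than names: keep what is there
			}
			v, err := strconv.ParseUint(values[i], 10, 64)
			if err != nil {
				continue // e.g. MaxConn is reported as -1
			}
			out[name] = v
		}
		return out, nil
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return nil, fmt.Errorf("%s statistics not found", prefix)
}

func main() {
	sample := "Ip: Forwarding DefaultTTL\nIp: 1 64\n" +
		"Tcp: RtoAlgorithm MaxConn ActiveOpens RetransSegs NewField\n" +
		"Tcp: 1 -1 125847 12345 7\n"
	stats, err := parsePairedLines(sample, "Tcp:")
	if err != nil {
		panic(err)
	}
	// InCsumErrors is absent from the sample, so the lookup yields 0.
	fmt.Println(stats["ActiveOpens"], stats["RetransSegs"], stats["InCsumErrors"])
	// prints: 125847 12345 0
}
```

A caller that looks up only the names it knows ignores new fields automatically, and fields missing on older kernels read as zero.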
Common Issues
Issue: No TCP Statistics Found
Symptom: Error "TCP statistics not found in /proc/net/snmp"
Causes:
- /proc filesystem not mounted correctly
- Running in restricted container without proc access
- Non-Linux system
Solution:
- Verify /proc/net/snmp exists and is readable
- Check container volume mounts
- Ensure host_proc_path configuration is correct
Issue: Zero Connection Counts
Symptom: All connection state counts show 0
Causes:
- Container network namespace isolation
- Missing /proc/net/tcp or /proc/net/tcp6 files
- Permission issues
Solution:
- Use host network mode if system-wide monitoring needed
- Verify file permissions
- Check for SELinux/AppArmor restrictions
Issue: Missing Extended Statistics
Symptom: SYN cookie and timeout metrics all show 0
Causes:
- Older kernel without extended TCP statistics
- /proc/net/netstat not available
Solution:
- This is normal on kernels < 2.6.37
- Collector continues with basic statistics only
- Upgrade kernel for full metrics
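Continuing with basic statistics when an optional file is absent can be sketched as follows. `readOptional` is a hypothetical helper, not the agent's API: it treats a missing file as a normal condition and still reports other failures, such as permission errors.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// readOptional reads a file that may legitimately be absent, such as
// /proc/net/netstat on an old kernel or /proc/net/tcp6 on an IPv4-only
// host. A missing file yields ok == false with a nil error; any other
// failure (for example permission denied) is returned to the caller.
func readOptional(path string) (string, bool, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return "", false, nil
	}
	if err != nil {
		return "", false, err
	}
	return string(data), true, nil
}

func main() {
	_, ok, err := readOptional("/proc/net/netstat")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	if !ok {
		fmt.Println("extended statistics unavailable; reporting zeros")
		return
	}
	fmt.Println("extended statistics available")
}
```

Distinguishing "not present" from "not readable" keeps the Missing Extended Statistics case separate from genuine permission problems.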
Examples
Sample Output
{
  "tcp": {
    "active_opens": 125847,
    "passive_opens": 89234,
    "attempt_fails": 234,
    "estab_resets": 156,
    "curr_estab": 342,
    "in_segs": 98765432,
    "out_segs": 87654321,
    "retrans_segs": 12345,
    "in_errs": 23,
    "out_rsts": 456,
    "in_csum_errors": 5,
    "syncookies_sent": 123,
    "syncookies_recv": 98,
    "syncookies_failed": 2,
    "listen_overflows": 15,
    "listen_drops": 10,
    "tcp_lost_retransmit": 8,
    "tcp_fast_retrans": 234,
    "tcp_slow_start_retrans": 56,
    "tcp_timeouts": 78,
    "connections_by_state": {
      "ESTABLISHED": 285,
      "SYN_SENT": 3,
      "SYN_RECV": 2,
      "FIN_WAIT1": 5,
      "FIN_WAIT2": 8,
      "TIME_WAIT": 25,
      "CLOSE": 0,
      "CLOSE_WAIT": 4,
      "LAST_ACK": 2,
      "LISTEN": 8,
      "CLOSING": 0
    }
  }
}
Performance Impact
The TCP Collector has minimal performance impact:
- CPU Usage: Negligible - simple file parsing
- Memory Usage: < 1MB - temporary buffers for file reading
- I/O Operations: 4 file reads per collection
- Collection Time: Typically < 5ms
Scalability Considerations
- Connection counting scales linearly with number of connections
- Large servers with thousands of connections may see slightly longer collection times
- No performance degradation with high connection churn rates
Related Collectors
- Network Collector: Provides interface-level statistics (bytes, packets, errors)
- Load Collector: System load can correlate with network activity
- CPU Collector: High CPU usage may indicate network processing overhead
- eBPF Network Collector (planned): Will provide per-connection bandwidth and latency metrics