Memory Technologies Platform Specific Pixie - antimetal/system-agent GitHub Wiki
Pixie
Overview
Pixie is an open-source Kubernetes-native observability platform that provides instant, zero-instrumentation monitoring and debugging capabilities. Originally developed by Pixie Labs and later acquired by New Relic in 2020, Pixie was contributed to the Cloud Native Computing Foundation (CNCF) as a Sandbox project in June 2021.
Key characteristics:
- Kubernetes-native observability platform built specifically for container environments
- Automatic eBPF instrumentation with no code changes required
- Zero manual instrumentation - automatically instruments applications as soon as they start
- Language agnostic - works with any programming language or framework
- Local data processing - all telemetry data remains within the cluster
- Now part of New Relic with both open-source and managed editions available
Performance Characteristics
- Overhead: 2-5% CPU typical, often under 2%
- Memory Requirements: Minimum 1GiB per node, 2GiB recommended
- Accuracy: High - captures full-body requests and responses
- False Positives: Low - eBPF provides accurate kernel-level data
- Production Ready: Yes - designed for production environments
- Platform: Kubernetes only
- Java Profiler Overhead: Ultra-low < 0.1% for continuous profiling
Architecture
Pixie employs a unique edge computing architecture that processes data locally within Kubernetes clusters:
Core Components
- Vizier (Control Plane)
  - Manages Pixie Edge Modules (PEMs)
  - Handles query orchestration and metadata management
  - Coordinates data collection and processing
- Pixie Edge Modules (PEMs)
  - Deployed as a DaemonSet on each node
  - Collect telemetry data using eBPF probes
  - Process and store data locally in memory
  - Default memory allocation: 2GiB per PEM
- eBPF Probes
  - Kernel-level instrumentation
  - Automatic application discovery and monitoring
  - Capture network traffic, system calls, and application metrics
  - No application code changes required
Data Processing Pipeline
- Collection: eBPF probes capture telemetry at kernel level
- Processing: Edge modules process data locally on each node
- Storage: In-memory data tables with configurable retention
- Querying: PxL scripts execute distributed queries across the cluster
Data Retention Model
- Local Storage: All data stored in-memory on cluster nodes
- No External Dependencies: No data sent outside the cluster by default
- Memory Allocation: 60% for data storage, 40% for collection
- Retention Period: Configurable, typically minutes to hours
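The 60/40 allocation above can be sketched numerically. A minimal Python sketch of the split for a given PEM limit (the exact ratio is an internal detail and may change between releases):

```python
def pem_memory_budget(pem_limit_mib: int) -> dict:
    """Split a PEM's memory limit into its documented roles.

    Pixie allocates roughly 60% of PEM memory to the in-memory data
    tables and the remaining 40% to collection and processing.
    """
    data_mib = int(pem_limit_mib * 0.60)    # in-memory telemetry tables
    collect_mib = pem_limit_mib - data_mib  # eBPF collection + processing
    return {"data_mib": data_mib, "collection_mib": collect_mib}

# With the default 2GiB (2048MiB) PEM limit:
print(pem_memory_budget(2048))  # {'data_mib': 1228, 'collection_mib': 820}
```

In practice this means roughly 1.2GiB of a default PEM is available for retained data, which is why retention is measured in minutes to hours rather than days.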
System-Agent Implementation Plan
Kubernetes Cluster Requirements
Minimum Requirements:
- Kubernetes 1.16+
- Linux kernel 4.14+ (for eBPF support)
- At least 1GiB memory per node
- CPU architecture: x86_64 or ARM64
Recommended Configuration:
- Kubernetes 1.20+
- Linux kernel 5.4+ (optimal eBPF features)
- 2GiB+ memory per node
- Nodes should have < 25% memory utilization before Pixie installation
Deployment Options
1. CLI Installation (Recommended)
```shell
# Install the Pixie CLI
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"

# Deploy to the cluster
px deploy --cluster_name=my-cluster
```
2. Helm Chart Deployment
```shell
# Add the Pixie Helm repo
helm repo add pixie https://pixie-operator-charts.storage.googleapis.com

# Install Pixie
helm install pixie pixie/pixie-chart \
  --set deployKey=$PIXIE_DEPLOY_KEY \
  --set clusterName=my-cluster
```
3. Kubectl Manifest
```shell
# Apply the Pixie operator
kubectl apply -f https://pixie-operator-charts.storage.googleapis.com/latest/pixie_operator.yaml

# Create the Vizier custom resource
kubectl apply -f pixie-vizier.yaml
```
Resource Requirements
Per Node Requirements:
- CPU: 100-200m reserved, up to 1000m limit
- Memory: 1-2Gi limit (2Gi recommended for production)
- Storage: Minimal - uses in-memory storage
- Network: Access to Pixie cloud services (for managed version)
Cluster-Level Resources:
- Vizier: 1Gi memory, 1000m CPU
- PEMs: Scale with node count
- Total Overhead: ~2-5% of cluster resources
Key Features
Automatic Service Mapping
- Zero-configuration service discovery across the cluster
- Real-time service topology visualization
- Dependency mapping between microservices
- Traffic flow analysis with request/response patterns
Request Tracing
- Full-body request/response capture for supported protocols
- Unsampled distributed tracing without instrumentation
- Protocol support: HTTP/HTTPS, gRPC, DNS, MySQL, PostgreSQL, Redis, Kafka, Cassandra, AMQP
- Real-time traffic inspection with filtering capabilities
CPU and Memory Profiling
- Continuous profiling with flame graphs
- Zero-instrumentation profiling for all languages
- CPU hotspot identification without recompilation
- Memory usage tracking and leak detection capabilities
- Call stack analysis with line-level precision
Network Monitoring
- Layer 7 protocol analysis without sidecars
- Network policy validation and traffic visualization
- Ingress/egress traffic monitoring with full payload capture
- DNS query analysis and resolution tracking
No Sidecars Needed
- eBPF-based collection eliminates sidecar containers
- Reduced resource overhead compared to proxy-based solutions
- Simplified deployment model with DaemonSet architecture
- Automatic protocol detection and parsing
Production Deployments
Used by Major Companies
Pixie is deployed in production by organizations including:
- Technology companies for microservices debugging
- Financial services for real-time transaction monitoring
- E-commerce platforms for performance optimization
- Media companies for streaming service analysis
Kubernetes-Specific Advantages
- Native Kubernetes integration with CRD-based management
- Pod-aware monitoring with automatic service discovery
- Namespace isolation and multi-tenancy support
- RBAC integration for secure access control
- Helm chart deployment for GitOps workflows
Scale Considerations
- Linear scaling with node count
- Memory-bound scaling based on traffic volume
- Query performance optimized for distributed execution
- Data locality ensures consistent performance
Success Stories
- 99.9% uptime achieved with proactive monitoring
- 50% reduction in mean time to resolution (MTTR)
- Zero application changes required for comprehensive observability
- Cross-team collaboration improved with shared debugging interface
Installation
CLI Installation
```shell
# Install the Pixie CLI
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"

# Authenticate (for the managed version)
px auth login

# Deploy to the current kubectl context
px deploy --cluster_name=production-cluster

# Verify the installation
px get viziers
```
Helm Chart Deployment
```yaml
# values.yaml
deployKey: "your-deploy-key-here"
clusterName: "production-cluster"
vizier:
  pemMemoryLimit: "2Gi"
  dataAccess: "Full"
operator:
  image:
    tag: "latest"
```

```shell
helm install pixie pixie/pixie-chart -f values.yaml
```
Resource Requirements Planning
```shell
# Memory: (number of nodes) × (2Gi per PEM) + 1Gi (Vizier)
# Example, 10-node cluster: (10 × 2Gi) + 1Gi = 21Gi total

# CPU: (number of nodes) × (200m baseline), with per-PEM spikes up to 1000m
```
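The sizing formulas in the comments above translate directly into a small helper. A sketch, using the 2GiB-per-PEM and 200m-per-node defaults documented earlier:

```python
def estimate_cluster_overhead(node_count: int,
                              pem_memory_gib: float = 2.0,
                              vizier_memory_gib: float = 1.0,
                              pem_cpu_millicores: int = 200) -> dict:
    """Rough Pixie footprint for a cluster of `node_count` nodes."""
    return {
        # One PEM per node, plus the Vizier control plane
        "memory_gib": node_count * pem_memory_gib + vizier_memory_gib,
        # Baseline CPU reservation; individual PEMs can spike to ~1000m
        "baseline_cpu_millicores": node_count * pem_cpu_millicores,
    }

# The 10-node example from the comment above:
print(estimate_cluster_overhead(10))
# {'memory_gib': 21.0, 'baseline_cpu_millicores': 2000}
```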
Security Considerations
- Network policies to restrict Pixie component communication
- RBAC configuration for user access control
- TLS encryption for all inter-component communication
- Data residency - all telemetry stays within cluster
- Audit logging integration with Kubernetes audit system
PxL Scripts
PxL (Pixie Language) is a domain-specific language based on Python/Pandas syntax for querying and analyzing telemetry data.
Memory Leak Detection Scripts
Basic Memory Usage Monitoring
```python
import px

# Query memory usage over time
df = px.DataFrame(table='process_stats', start_time='-5m')

# Filter for a specific service
df = df[df.ctx['service'] == 'my-service']

# Aggregate memory usage by pod; aggregate results cannot be
# subtracted inside agg(), so growth is derived afterwards
memory_stats = df.groupby('pod').agg(
    avg_memory_mb=('vsize_mb', px.mean),
    min_memory_mb=('vsize_mb', px.min),
    max_memory_mb=('vsize_mb', px.max),
)
memory_stats.memory_growth_mb = memory_stats.max_memory_mb - memory_stats.min_memory_mb

px.display(memory_stats)
```
Memory Growth Detection
```python
import px

# Detect memory growth trends over a longer window
df = px.DataFrame(table='process_stats', start_time='-30m')

# PxL has no row-wise shift() like pandas, so growth is measured
# across the whole window: the minimum serves as a proxy for
# starting usage, the maximum for peak usage
growth = df.groupby(['service', 'pod']).agg(
    start_memory_mb=('vsize_mb', px.min),
    peak_memory_mb=('vsize_mb', px.max),
)

# Flag potential memory leaks (>5% growth over the window)
growth.growth_rate = (growth.peak_memory_mb - growth.start_memory_mb) / growth.start_memory_mb
leak_summary = growth[growth.growth_rate > 0.05]

px.display(leak_summary)
```
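The same >5% growth-rate heuristic can be prototyped offline in plain Python against sampled values before committing it to a PxL script. A sketch (the sample series are illustrative):

```python
def flag_memory_leak(samples_mb, growth_threshold=0.05):
    """Return True if any sample-to-sample growth rate exceeds the threshold."""
    rates = [
        (curr - prev) / prev
        for prev, curr in zip(samples_mb, samples_mb[1:])
        if prev > 0
    ]
    return any(r > growth_threshold for r in rates)

steady = [100, 101, 100, 102, 101]    # normal jitter around a flat baseline
leaking = [100, 110, 121, 133, 146]   # ~10% growth per sample
print(flag_memory_leak(steady), flag_memory_leak(leaking))
```

A per-step threshold like this is noisy on real workloads; sustained growth across several consecutive windows is usually a stronger leak signal than any single spike.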
Custom Monitoring Scripts
Service Health Dashboard
```python
import px

# Multi-dimensional service health from HTTP traffic
df = px.DataFrame(table='http_events', start_time='-10m')

# Mark server errors (5xx) so the error rate is a simple mean
df.is_error = df.resp_status >= 500

health_metrics = df.groupby('service').agg(
    request_count=('latency_ns', px.count),
    avg_latency_ns=('latency_ns', px.mean),
    error_rate=('is_error', px.mean),
)
health_metrics.avg_latency_ms = health_metrics.avg_latency_ns / 1000000

# Memory usage lives in a separate table; query process_stats
# independently and merge on 'service' if a combined view is needed
px.display(health_metrics)
```
Data Export Scripts
```python
import px

def export_metrics(service_name, duration='-1h'):
    """Export comprehensive metrics for a service."""
    # HTTP metrics
    http_df = px.DataFrame(table='http_events', start_time=duration)
    http_df = http_df[http_df.ctx['service'] == service_name]

    # Process metrics
    proc_df = px.DataFrame(table='process_stats', start_time=duration)
    proc_df = proc_df[proc_df.ctx['service'] == service_name]

    # Network metrics
    net_df = px.DataFrame(table='conn_stats', start_time=duration)
    net_df = net_df[net_df.ctx['service'] == service_name]

    return {
        'http_metrics': http_df,
        'process_metrics': proc_df,
        'network_metrics': net_df,
    }
```
Code Examples
API Usage
```python
import pxapi

# Connect to Pixie Cloud with an API token, then to a specific cluster
client = pxapi.Client(token="YOUR_PIXIE_API_TOKEN")
conn = client.connect_to_cluster("YOUR_CLUSTER_ID")

# The PxL script must display a named table so results stream back
PXL_SCRIPT = """
import px
df = px.DataFrame(table='http_events', start_time='-5m')
df = df.groupby('service').agg(
    request_count=('latency_ns', px.count),
    avg_latency=('latency_ns', px.mean),
)
px.display(df, 'service_stats')
"""

script = conn.prepare_script(PXL_SCRIPT)
for row in script.results("service_stats"):
    print(row)
```
Integration Patterns
```yaml
# Kubernetes CronJob for periodic analysis
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pixie-memory-analysis
spec:
  schedule: "*/15 * * * *"  # every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure  # required for Job pod templates
          containers:
            - name: analyzer
              image: pixie-analyzer:latest
              command: ["python", "memory_leak_detector.py"]
              env:
                - name: PIXIE_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: pixie-credentials
                      key: api-key
```
Automated Monitoring
```python
#!/usr/bin/env python3
"""Automated Pixie memory monitoring with alerting."""
import time

import pxapi

# PxL executed on every poll; column names follow the examples above.
# An absolute threshold is used because process_stats does not expose
# per-container memory limits.
MEMORY_PXL = """
import px
df = px.DataFrame(table='process_stats', start_time='-5m')
memory_stats = df.groupby(['service', 'pod']).agg(
    peak_memory_mb=('vsize_mb', px.max),
)
px.display(memory_stats, 'memory_stats')
"""


class PixieMemoryMonitor:
    def __init__(self, cluster_id, api_token):
        client = pxapi.Client(token=api_token)
        self.conn = client.connect_to_cluster(cluster_id)
        self.memory_threshold_mb = 2048  # alert above this peak

    def check_memory_usage(self):
        script = self.conn.prepare_script(MEMORY_PXL)
        return [row for row in script.results("memory_stats")
                if row["peak_memory_mb"] > self.memory_threshold_mb]

    def send_alert(self, high_memory_rows):
        # Alert logic here (email, Slack webhook, PagerDuty, ...)
        pass

    def run_monitoring_loop(self):
        while True:
            offenders = self.check_memory_usage()
            if offenders:
                self.send_alert(offenders)
            time.sleep(300)  # 5-minute intervals


if __name__ == "__main__":
    monitor = PixieMemoryMonitor("prod-cluster", "YOUR_PIXIE_API_TOKEN")
    monitor.run_monitoring_loop()
```
Monitoring & Alerting
Memory Growth Patterns
```python
# PxL script for identifying memory growth patterns
import px

def detect_memory_patterns(service_name, lookback='-2h'):
    """Detect memory allocation patterns and potential leaks."""
    df = px.DataFrame(table='process_stats', start_time=lookback)
    df = df[df.ctx['service'] == service_name]

    # PxL has no pandas-style rolling(); bucket samples into
    # 5-minute windows with px.bin instead
    df.window = px.bin(df.time_, 5 * 60 * 1000 * 1000 * 1000)  # 5 min in ns

    per_window = df.groupby(['pod', 'window']).agg(
        window_memory_mb=('vsize_mb', px.mean),
    )

    # Memory leak indicators, summarized per pod
    leak_indicators = per_window.groupby('pod').agg(
        max_memory_mb=('window_memory_mb', px.max),
        min_memory_mb=('window_memory_mb', px.min),
        memory_variance=('window_memory_mb', px.var),
    )
    leak_indicators.sustained_growth_mb = (
        leak_indicators.max_memory_mb - leak_indicators.min_memory_mb
    )
    return leak_indicators[leak_indicators.sustained_growth_mb > 50]  # illustrative threshold
```
Service-Level Monitoring
```python
# Comprehensive service health monitoring
import px

def service_health_check(service_filter=''):
    """Generate a comprehensive service health report."""
    # HTTP performance metrics
    http_df = px.DataFrame(table='http_events', start_time='-15m')
    if service_filter:
        http_df = http_df[px.contains(http_df.ctx['service'], service_filter)]

    http_df.is_error = http_df.resp_status >= 500
    http_metrics = http_df.groupby('service').agg(
        request_count=('latency_ns', px.count),
        avg_latency_ns=('latency_ns', px.mean),
        latency_quantiles=('latency_ns', px.quantiles),  # agg takes 2-tuples
        error_rate=('is_error', px.mean),
    )
    http_metrics.avg_latency_ms = http_metrics.avg_latency_ns / 1000000
    http_metrics.p99_latency_ms = (
        px.pluck_float64(http_metrics.latency_quantiles, 'p99') / 1000000
    )

    # Memory and CPU metrics
    proc_df = px.DataFrame(table='process_stats', start_time='-15m')
    if service_filter:
        proc_df = proc_df[px.contains(proc_df.ctx['service'], service_filter)]

    memory_metrics = proc_df.groupby('service').agg(
        avg_memory_mb=('vsize_mb', px.mean),
        max_memory_mb=('vsize_mb', px.max),
        cpu_usage_pct=('cpu_usage_pct', px.mean),
    )

    # Combine the two views
    combined = http_metrics.merge(
        memory_metrics, how='outer',
        left_on='service', right_on='service',
        suffixes=['', '_proc'],
    )
    px.display(combined)
```
Alert Integration
Prometheus Integration
```yaml
# ServiceMonitor for Pixie metrics export
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pixie-metrics
spec:
  selector:
    matchLabels:
      app: pixie-prometheus-exporter
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
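Pixie does not ship an official Prometheus exporter, so the `pixie-prometheus-exporter` label above assumes a small bridge service that runs a PxL query on an interval and republishes the results. The metric-rendering half of such a bridge can be sketched with the standard library alone (metric names and sample values are illustrative):

```python
def format_prometheus(metrics: dict) -> str:
    """Render {(name, labels): value} pairs in Prometheus text exposition format."""
    lines = []
    for (name, labels), value in sorted(metrics.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Values a PxL query over process_stats might yield (illustrative):
sample = {
    ("pixie_service_memory_mb", (("service", "cart"),)): 256.0,
    ("pixie_service_memory_mb", (("service", "checkout"),)): 512.0,
}
print(format_prometheus(sample), end="")
```

A real bridge would serve this string on the `/metrics` path that the ServiceMonitor scrapes; the `prometheus_client` library offers the same functionality with counter/gauge types and a built-in HTTP server.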
Custom Webhook Alerts
```python
import requests

def send_pixie_alert(service, memory_usage, threshold):
    """Send an alert to a Slack-compatible webhook endpoint."""
    alert_data = {
        "text": f"Memory Alert: {service}",
        "attachments": [{
            "color": "danger",
            "fields": [
                {"title": "Service", "value": service, "short": True},
                {
                    "title": "Memory Usage",
                    "value": f"{memory_usage}MB ({memory_usage / threshold * 100:.1f}% of threshold)",
                    "short": True,
                },
            ],
        }],
    }
    requests.post(
        "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
        json=alert_data,
        timeout=10,
    )
```
Baseline Establishment
```python
# Establish performance baselines
import px

def establish_baseline(service_name, days_back=7):
    """Establish a performance baseline over a historical period.

    Note: with the default in-memory retention (minutes to hours), a
    multi-day lookback only works if telemetry is exported through
    Pixie's plugin/export pipeline to long-term storage.
    """
    lookback = f'-{days_back}d'
    df = px.DataFrame(table='process_stats', start_time=lookback)
    df = df[df.ctx['service'] == service_name]

    # process_stats has no 'hour' column; derive hourly buckets so the
    # baseline captures time-of-day patterns
    df.hour = px.bin(df.time_, 60 * 60 * 1000 * 1000 * 1000)  # 1 hour in ns

    baseline = df.groupby(['service', 'hour']).agg(
        baseline_memory_mb=('vsize_mb', px.mean),
        memory_quantiles=('vsize_mb', px.quantiles),
        memory_stddev=('vsize_mb', px.std),
        baseline_cpu_pct=('cpu_usage_pct', px.mean),
    )
    return baseline
```
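Once a baseline exists, deviation checks reduce to simple statistics on the client side. A sketch using a z-score test (the 3-standard-deviation cutoff is an arbitrary starting point, not a Pixie default):

```python
from statistics import mean, stdev

def deviates_from_baseline(current_mb, history_mb, z=3.0):
    """Flag a reading more than `z` standard deviations above the baseline mean."""
    mu = mean(history_mb)
    sigma = stdev(history_mb)
    return sigma > 0 and (current_mb - mu) / sigma > z

# Hourly baseline samples for one service (illustrative values):
history = [500, 510, 495, 505, 502, 498]
print(deviates_from_baseline(504, history), deviates_from_baseline(560, history))
```

Percentile comparisons (e.g. current usage vs. the baseline p95) are a useful alternative when memory usage is not normally distributed.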
Comparison with Alternatives
vs Parca: Kubernetes-Specific Features
| Feature | Pixie | Parca |
|---|---|---|
| Scope | Full observability platform | Continuous profiling focused |
| Data Coverage | Metrics, traces, logs, profiles | CPU/memory profiling only |
| Protocol Support | HTTP, gRPC, DNS, MySQL, etc. | Not applicable |
| Service Discovery | Automatic, Kubernetes-native | Manual configuration |
| Query Language | PxL (Pythonic) | PromQL-style queries |
| Storage | In-memory, ephemeral | Persistent storage |
| Deployment | DaemonSet + Operator | Single binary deployment |
When to Choose Pixie:
- Need comprehensive observability beyond profiling
- Want zero-instrumentation application monitoring
- Require real-time debugging capabilities
- Need service topology and dependency mapping
When to Choose Parca:
- Focus specifically on continuous profiling
- Need long-term profile data retention
- Want lightweight profiling-only solution
- Require detailed code-level analysis
vs Traditional APM: No Instrumentation Advantage
Pixie Advantages:
- Zero code changes required for deployment
- Language agnostic - works with any runtime
- Real-time insights without sampling
- Full request/response capture for debugging
- No performance impact from instrumentation libraries
Traditional APM Limitations:
- Requires SDK integration and code changes
- Language-specific instrumentation overhead
- Sampling can miss critical events
- Limited visibility into system-level interactions
- Deployment complexity with legacy applications
Advantages in Kubernetes Environments
- Native Integration
  - Built specifically for the Kubernetes architecture
  - Understands pods, services, and namespaces natively
  - Automatic service discovery and mapping
- eBPF Capabilities
  - Kernel-level visibility without application changes
  - Network traffic analysis at Layer 7
  - System call and resource monitoring
- Edge Computing Architecture
  - Data processing happens locally in the cluster
  - No external dependencies for basic functionality
  - Reduced latency and improved security
- Developer Experience
  - Instant debugging without redeployment
  - Interactive query interface (Live UI)
  - Collaborative debugging with team access
Repository & Documentation
Primary Resources
- Official Website: px.dev
- Main Documentation: docs.px.dev
- GitHub Repository: pixie-io/pixie
New Relic Integration
- New Relic Pixie Platform: newrelic.com/platform/kubernetes-pixie
- Integration Documentation: New Relic Docs - Pixie
- Guided Installation: Available through New Relic CLI
Community Resources
- PxL Script Repository: GitHub - pxl_scripts
- Community Scripts: Available in the Pixie Live UI under the `px/` namespace
- Tutorials: docs.px.dev/tutorials
- API Documentation: docs.px.dev/reference
Available Editions
- Pixie Core (Open Source)
  - Self-managed deployment
  - Full observability capabilities
  - Community support
- Pixie by New Relic (Managed)
  - Fully managed service
  - New Relic One integration
  - Enterprise support
- Pixie Enterprise Edition
  - Industry-specific compliance features
  - Advanced security controls
  - Professional services support
Getting Started Resources
- Quick Start Guide: docs.px.dev/installing-pixie
- PxL Language Tutorial: docs.px.dev/tutorials/pxl-scripts
- API Client Libraries: Available for Python, Go, and JavaScript
- Grafana Plugin: Pixie Grafana Datasource
Last updated: 2024