Pixie

Overview

Pixie is an open-source Kubernetes-native observability platform that provides instant, zero-instrumentation monitoring and debugging capabilities. Originally developed by Pixie Labs and later acquired by New Relic in 2020, Pixie was contributed to the Cloud Native Computing Foundation (CNCF) as a Sandbox project in June 2021.

Key characteristics:

  • Kubernetes-native observability platform built specifically for container environments
  • Automatic eBPF instrumentation with no code changes required
  • Zero manual instrumentation - automatically instruments applications as soon as they start
  • Language agnostic - works with any programming language or framework
  • Local data processing - all telemetry data remains within the cluster
  • Now part of New Relic with both open-source and managed editions available

Performance Characteristics

  • Overhead: typically 2-5% CPU, and often under 2% in practice
  • Memory Requirements: Minimum 1GiB per node, 2GiB recommended
  • Accuracy: High - captures full-body requests and responses
  • False Positives: Low - eBPF provides accurate kernel-level data
  • Production Ready: Yes - designed for production environments
  • Platform: Kubernetes only
  • Java Profiler Overhead: Ultra-low < 0.1% for continuous profiling

Architecture

Pixie employs a unique edge computing architecture that processes data locally within Kubernetes clusters:

Core Components

  1. Vizier (Control Plane)

    • Manages Pixie Edge Modules (PEMs)
    • Handles query orchestration and metadata management
    • Coordinates data collection and processing
  2. Pixie Edge Modules (PEMs)

    • Deployed as DaemonSet on each node
    • Collect telemetry data using eBPF probes
    • Process and store data locally in-memory
    • Default memory allocation: 2GiB per PEM
  3. eBPF Probes

    • Kernel-level instrumentation
    • Automatic application discovery and monitoring
    • Captures network traffic, system calls, and application metrics
    • No application code changes required

Data Processing Pipeline

  • Collection: eBPF probes capture telemetry at kernel level
  • Processing: Edge modules process data locally on each node
  • Storage: In-memory data tables with configurable retention
  • Querying: PxL scripts execute distributed queries across the cluster
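
As a minimal illustration of the query step, the following PxL sketch counts requests per service from the in-memory http_events table (table and column names follow the examples later on this page):

import px

# HTTP traffic captured by the PEMs over the last five minutes
df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']  # service name resolved from pod metadata

# The groupby/agg runs on each node and the partial results are merged by Vizier
per_service = df.groupby('service').agg(
    request_count=('latency_ns', px.count)
)

px.display(per_service)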

Data Retention Model

  • Local Storage: All data stored in-memory on cluster nodes
  • No External Dependencies: No data sent outside the cluster by default
  • Memory Allocation: 60% for data storage, 40% for collection
  • Retention Period: Configurable, typically minutes to hours
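
A rough, illustrative estimate of how long data can be retained per node, assuming the 60% storage share above and a hypothetical ingest rate (plain arithmetic, not a Pixie API):

def estimate_retention_minutes(pem_memory_gib=2.0,
                               storage_fraction=0.60,    # 60% of PEM memory holds data tables
                               ingest_mib_per_min=50.0): # assumed telemetry volume per node
    """Back-of-the-envelope retention estimate for a single PEM."""
    storage_mib = pem_memory_gib * 1024 * storage_fraction
    return storage_mib / ingest_mib_per_min

print(f"~{estimate_retention_minutes():.0f} minutes of data per node")  # ~25 minutes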

System-Agent Implementation Plan

Kubernetes Cluster Requirements

Minimum Requirements:

  • Kubernetes 1.16+
  • Linux kernel 4.14+ (for eBPF support)
  • At least 1GiB memory per node
  • CPU architecture: x86_64 or ARM64

Recommended Configuration:

  • Kubernetes 1.20+
  • Linux kernel 5.4+ (optimal eBPF features)
  • 2GiB+ memory per node
  • Nodes should have < 25% memory utilization before Pixie installation
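
A quick pre-install check against these requirements might look like the sketch below; it assumes kubectl is installed and pointed at the target cluster, and that nodes report allocatable memory in Ki (the usual kubelet format).

#!/usr/bin/env python3
"""Rough preflight check for the node requirements listed above."""
import json
import subprocess

MIN_KERNEL = (4, 14)      # minimum kernel for eBPF support
MIN_MEM_KI = 1024 * 1024  # 1GiB expressed in Ki

nodes = json.loads(subprocess.check_output(
    ["kubectl", "get", "nodes", "-o", "json"]))["items"]

for node in nodes:
    name = node["metadata"]["name"]
    kernel = node["status"]["nodeInfo"]["kernelVersion"]   # e.g. "5.15.0-91-generic"
    major_minor = tuple(int(x) for x in kernel.split("-")[0].split(".")[:2])
    mem_ki = int(node["status"]["allocatable"]["memory"].rstrip("Ki"))
    ok = major_minor >= MIN_KERNEL and mem_ki >= MIN_MEM_KI
    print(f"{name}: kernel {kernel}, allocatable {mem_ki // 1024}Mi -> {'OK' if ok else 'CHECK'}")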

Deployment Options

1. CLI Installation (Recommended)

# Install Pixie CLI
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"

# Deploy to cluster
px deploy --cluster_name=my-cluster

2. Helm Chart Deployment

# Add Pixie Helm repo
helm repo add pixie https://pixie-operator-charts.storage.googleapis.com

# Install Pixie
helm install pixie pixie/pixie-chart \
  --set deployKey=$PIXIE_DEPLOY_KEY \
  --set clusterName=my-cluster

3. Kubectl Manifest

# Apply Pixie operator
kubectl apply -f https://pixie-operator-charts.storage.googleapis.com/latest/pixie_operator.yaml

# Create Vizier custom resource
kubectl apply -f pixie-vizier.yaml

Resource Requirements

Per Node Requirements:

  • CPU: 100-200m reserved, up to 1000m limit
  • Memory: 1-2Gi limit (2Gi recommended for production)
  • Storage: Minimal - uses in-memory storage
  • Network: Access to Pixie cloud services (for managed version)

Cluster-Level Resources:

  • Vizier: 1Gi memory, 1000m CPU
  • PEMs: Scale with node count
  • Total Overhead: ~2-5% of cluster resources

Key Features

Automatic Service Mapping

  • Zero-configuration service discovery across the cluster
  • Real-time service topology visualization
  • Dependency mapping between microservices
  • Traffic flow analysis with request/response patterns
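
The topology can also be queried directly. The sketch below is loosely based on the bundled px/net_flow_graph script; the conn_stats columns and the px.ip_to_pod_id / px.pod_id_to_service_name helpers are assumptions that may differ between Pixie releases.

import px

# Connection-level statistics captured by the eBPF probes
df = px.DataFrame(table='conn_stats', start_time='-5m')

# Source service from pod metadata; destination resolved from the remote IP
df.src_service = df.ctx['service']
df.dst_service = px.pod_id_to_service_name(px.ip_to_pod_id(df.remote_addr))

edges = df.groupby(['src_service', 'dst_service']).agg(
    bytes_sent=('bytes_sent', px.sum),
    bytes_recv=('bytes_recv', px.sum)
)

px.display(edges)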

Request Tracing

  • Full-body request/response capture for supported protocols
  • Unsampled distributed tracing without instrumentation
  • Protocol support: HTTP/HTTPS, gRPC, DNS, MySQL, PostgreSQL, Redis, Kafka, Cassandra, AMQP
  • Real-time traffic inspection with filtering capabilities
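
For example, full request and response bodies for a single endpoint can be pulled straight from the http_events table (the service name and path below are hypothetical; column names follow the examples elsewhere on this page):

import px

# Inspect raw request/response payloads for one endpoint over the last minute
df = px.DataFrame(table='http_events', start_time='-1m')
df = df[df.ctx['service'] == 'checkout']   # hypothetical service
df = df[df.req_path == '/api/v1/orders']   # hypothetical endpoint

px.display(df[['time_', 'req_method', 'req_path', 'req_body',
               'resp_status', 'resp_body', 'latency_ns']])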

CPU and Memory Profiling

  • Continuous profiling with flame graphs
  • Zero-instrumentation profiling for all languages
  • CPU hotspot identification without recompilation
  • Memory usage tracking and leak detection capabilities
  • Call stack analysis with line-level precision
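
Profiling data is exposed through the same query interface. The sketch below aggregates sampled stack traces per pod; the table and column names are assumed from the bundled px/perf_flamegraph script (historically stack_traces.beta) and may vary by release.

import px

# Sampled stack traces collected by the continuous profiler
df = px.DataFrame(table='stack_traces.beta', start_time='-2m')
df.pod = df.ctx['pod']

profile = df.groupby(['pod', 'stack_trace']).agg(
    samples=('count', px.sum)   # sample counts feed the flame graph
)

px.display(profile)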

Network Monitoring

  • Layer 7 protocol analysis without sidecars
  • Network policy validation and traffic visualization
  • Ingress/egress traffic monitoring with full payload capture
  • DNS query analysis and resolution tracking
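
DNS activity, for instance, can be summarized per pod from the dns_events table (a hedged sketch; consult the bundled px/dns_data script for the exact schema in your release):

import px

# DNS lookups captured at Layer 7 by the eBPF probes
df = px.DataFrame(table='dns_events', start_time='-5m')
df.pod = df.ctx['pod']

dns_summary = df.groupby(['pod', 'req_body']).agg(
    lookups=('req_body', px.count)
)

px.display(dns_summary)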

No Sidecars Needed

  • eBPF-based collection eliminates sidecar containers
  • Reduced resource overhead compared to proxy-based solutions
  • Simplified deployment model with DaemonSet architecture
  • Automatic protocol detection and parsing

Production Deployments

Used by Major Companies

Pixie is deployed in production by organizations including:

  • Technology companies for microservices debugging
  • Financial services for real-time transaction monitoring
  • E-commerce platforms for performance optimization
  • Media companies for streaming service analysis

Kubernetes-Specific Advantages

  • Native Kubernetes integration with CRD-based management
  • Pod-aware monitoring with automatic service discovery
  • Namespace isolation and multi-tenancy support
  • RBAC integration for secure access control
  • Helm chart deployment for GitOps workflows

Scale Considerations

  • Linear scaling with node count
  • Memory-bound scaling based on traffic volume
  • Query performance optimized for distributed execution
  • Data locality ensures consistent performance

Success Stories

Outcomes commonly reported by adopters include:

  • 99.9% uptime achieved with proactive monitoring
  • 50% reduction in mean time to resolution (MTTR)
  • Zero application changes required for comprehensive observability
  • Cross-team collaboration improved with shared debugging interface

Installation

CLI Installation

# Install Pixie CLI
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"

# Authenticate (for managed version)
px auth login

# Deploy to current kubectl context
px deploy --cluster_name=production-cluster

# Verify installation
px get viziers

Helm Chart Deployment

# values.yaml
deployKey: "your-deploy-key-here"
clusterName: "production-cluster"

vizier:
  pemMemoryLimit: "2Gi"
  dataAccess: "Full"

operator:
  image:
    tag: "latest"

# Install with the custom values
helm install pixie pixie/pixie-chart -f values.yaml

Resource Requirements Planning

# Calculate memory requirements
# Formula: (Number of nodes) × (2Gi per PEM) + 1Gi (Vizier)
# Example 10-node cluster: (10 × 2Gi) + 1Gi = 21Gi total

# CPU requirements
# Formula: (Number of nodes) × (200m baseline) + spikes up to 1000m
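
The same arithmetic as a small helper, using the per-PEM and Vizier figures quoted above:

def pixie_memory_gib(node_count, pem_gib=2, vizier_gib=1):
    """Total memory Pixie reserves across the cluster."""
    return node_count * pem_gib + vizier_gib

def pixie_cpu_baseline_millicores(node_count, pem_millicores=200):
    """Steady-state CPU reservation; individual PEMs may spike to ~1000m."""
    return node_count * pem_millicores

print(pixie_memory_gib(10))              # 21 (Gi) for the 10-node example above
print(pixie_cpu_baseline_millicores(10)) # 2000 (m)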

Security Considerations

  • Network policies to restrict Pixie component communication
  • RBAC configuration for user access control
  • TLS encryption for all inter-component communication
  • Data residency - all telemetry stays within cluster
  • Audit logging integration with Kubernetes audit system

PxL Scripts

PxL (Pixie Language) is a domain-specific language based on Python/Pandas syntax for querying and analyzing telemetry data.

Memory Leak Detection Scripts

Basic Memory Usage Monitoring

import px

# Query memory usage over time
df = px.DataFrame(table='process_stats', start_time='-5m')

# Filter for a specific service and materialize the pod name from metadata
df = df[df.ctx['service'] == 'my-service']
df.pod = df.ctx['pod']

# Aggregate memory usage by pod
memory_stats = df.groupby(['pod']).agg(
    avg_memory_mb=('vsize_mb', px.mean),
    max_memory_mb=('vsize_mb', px.max),
    first_memory_mb=('vsize_mb', px.first),
    last_memory_mb=('vsize_mb', px.last)
)

# Growth over the window (aggregate expressions cannot be subtracted inline)
memory_stats.memory_growth = memory_stats.last_memory_mb - memory_stats.first_memory_mb

px.display(memory_stats)

Memory Growth Detection

import px

# Detect memory growth trends
df = px.DataFrame(table='process_stats', start_time='-30m')
df.service = df.ctx['service']
df.pod = df.ctx['pod']

# Compare memory at the start and end of the window for each pod
# (window growth via first/last aggregates instead of a pandas-style shift())
growth = df.groupby(['service', 'pod']).agg(
    first_memory_mb=('vsize_mb', px.first),
    last_memory_mb=('vsize_mb', px.last),
    peak_memory_mb=('vsize_mb', px.max)
)

# Relative growth over the window
growth.memory_growth_rate = (
    growth.last_memory_mb - growth.first_memory_mb
) / growth.first_memory_mb

# Flag potential memory leaks (>5% growth over the 30-minute window)
leak_summary = growth[growth.memory_growth_rate > 0.05]

px.display(leak_summary)

Custom Monitoring Scripts

Service Health Dashboard

import px

# Multi-dimensional service health
df = px.DataFrame(table='http_events', start_time='-10m')
df.service = df.ctx['service']
df.failure = df.resp_status >= 500   # server-side errors

health_metrics = df.groupby('service').agg(
    request_count=('latency_ns', px.count),
    avg_latency_ns=('latency_ns', px.mean),
    error_rate=('failure', px.mean)
)
health_metrics.avg_latency_ms = health_metrics.avg_latency_ns / 1000000

# Memory usage lives in a separate table and is joined on the service name
proc = px.DataFrame(table='process_stats', start_time='-10m')
proc.service = proc.ctx['service']
memory = proc.groupby('service').agg(memory_usage_mb=('vsize_mb', px.mean))

health_metrics = health_metrics.merge(memory, how='left',
                                      left_on='service', right_on='service',
                                      suffixes=['', '_mem'])

px.display(health_metrics)

Data Export Scripts

import px

# Export metrics for external analysis
def export_metrics(service_name, duration='-1h'):
    """Export comprehensive metrics for a service"""
    
    # HTTP metrics
    http_df = px.DataFrame(table='http_events', start_time=duration)
    http_df = http_df[http_df.ctx['service'] == service_name]
    
    # Process metrics  
    proc_df = px.DataFrame(table='process_stats', start_time=duration)
    proc_df = proc_df[proc_df.ctx['service'] == service_name]
    
    # Network metrics
    net_df = px.DataFrame(table='conn_stats', start_time=duration)
    net_df = net_df[net_df.ctx['service'] == service_name]
    
    return {
        'http_metrics': http_df,
        'process_metrics': proc_df,  
        'network_metrics': net_df
    }

Code Examples

API Usage

import os

import pxapi

# Connect to Pixie; the API key and cluster ID come from the Pixie UI
# (or `px auth login` / `px get viziers` with the CLI)
client = pxapi.Client(token=os.environ["PIXIE_API_KEY"])
conn = client.connect_to_cluster(os.environ["PIXIE_CLUSTER_ID"])

# PxL script; px.display() names the output table for the API consumer
script = """
import px
df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df = df.groupby('service').agg(
    request_count=('latency_ns', px.count),
    avg_latency=('latency_ns', px.mean)
)
px.display(df, 'service_stats')
"""

# Execute the script and stream the rows of the named table
for row in conn.prepare_script(script).results("service_stats"):
    print(row["service"], row["request_count"], row["avg_latency"])

Integration Patterns

# Kubernetes CronJob for periodic analysis
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pixie-memory-analysis
spec:
  schedule: "*/15 * * * *"  # Every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: analyzer
            image: pixie-analyzer:latest
            command: ["python", "memory_leak_detector.py"]
            env:
            - name: PIXIE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: pixie-credentials
                  key: api-key

Automated Monitoring

#!/usr/bin/env python3
"""
Automated Pixie memory monitoring with alerting
"""
import os
import time

import pxapi

# PxL query: pods using more than 80% of their memory limit.
# Column names (vsize_mb, memory_limit_bytes) follow the examples above.
MEMORY_PXL = """
import px
df = px.DataFrame(table='process_stats', start_time='-5m')
df.service = df.ctx['service']
df.pod = df.ctx['pod']

memory_stats = df.groupby(['service', 'pod']).agg(
    current_memory_mb=('vsize_mb', px.last),
    memory_limit_bytes=('memory_limit_bytes', px.last)
)
memory_stats.memory_limit_mb = memory_stats.memory_limit_bytes / 1024 / 1024
memory_stats.memory_usage_pct = (
    memory_stats.current_memory_mb / memory_stats.memory_limit_mb * 100
)

px.display(memory_stats[memory_stats.memory_usage_pct > 80], 'high_memory')
"""


class PixieMemoryMonitor:
    def __init__(self, cluster_id):
        client = pxapi.Client(token=os.environ["PIXIE_API_KEY"])
        self.conn = client.connect_to_cluster(cluster_id)
        self.memory_threshold = 90  # Percent

    def check_memory_usage(self):
        # Run the PxL query and collect the rows of the named output table
        script = self.conn.prepare_script(MEMORY_PXL)
        return list(script.results("high_memory"))

    def send_alert(self, high_memory_pods):
        # Alert logic here (email, Slack webhook, PagerDuty, ...)
        pass

    def run_monitoring_loop(self):
        while True:
            results = self.check_memory_usage()
            if len(results) > 0:
                self.send_alert(results)
            time.sleep(300)  # 5-minute intervals


if __name__ == "__main__":
    monitor = PixieMemoryMonitor("prod-cluster")
    monitor.run_monitoring_loop()

Monitoring & Alerting

Memory Growth Patterns

# PxL script for identifying memory growth patterns
import px

def detect_memory_patterns(service_name, lookback='-2h'):
    """Detect memory allocation patterns and potential leaks.

    PxL has no pandas-style rolling() helper, so samples are bucketed into
    5-minute windows with px.bin() and growth is measured across windows.
    """
    df = px.DataFrame(table='process_stats', start_time=lookback)
    df = df[df.ctx['service'] == service_name]
    df.pod = df.ctx['pod']

    # Bucket samples into 5-minute windows
    window_ns = 5 * 60 * 1000 * 1000 * 1000
    df.window = px.bin(df.time_, window_ns)

    per_window = df.groupby(['pod', 'window']).agg(
        window_memory_mb=('vsize_mb', px.mean)
    )

    # Memory leak indicators: growth between the first and last window
    leak_indicators = per_window.groupby('pod').agg(
        start_memory_mb=('window_memory_mb', px.first),
        end_memory_mb=('window_memory_mb', px.last),
        max_memory_mb=('window_memory_mb', px.max)
    )
    leak_indicators.growth_mb = (
        leak_indicators.end_memory_mb - leak_indicators.start_memory_mb
    )

    # Threshold is illustrative; tune it to the service's footprint
    return leak_indicators[leak_indicators.growth_mb > 50]

Service-Level Monitoring

# Comprehensive service health monitoring
import px

def service_health_check(service_filter=''):
    """Generate a comprehensive service health report."""

    # HTTP performance metrics
    http_df = px.DataFrame(table='http_events', start_time='-15m')
    http_df.service = http_df.ctx['service']
    if service_filter:
        http_df = http_df[px.contains(http_df.service, service_filter)]

    http_df.failure = http_df.resp_status >= 500
    http_metrics = http_df.groupby('service').agg(
        request_count=('latency_ns', px.count),
        avg_latency_ns=('latency_ns', px.mean),
        latency_quantiles=('latency_ns', px.quantiles),
        error_rate=('failure', px.mean)
    )
    http_metrics.avg_latency_ms = http_metrics.avg_latency_ns / 1000000
    http_metrics.p99_latency_ms = px.pluck_float64(
        http_metrics.latency_quantiles, 'p99') / 1000000

    # Memory metrics
    proc_df = px.DataFrame(table='process_stats', start_time='-15m')
    proc_df.service = proc_df.ctx['service']
    if service_filter:
        proc_df = proc_df[px.contains(proc_df.service, service_filter)]

    memory_metrics = proc_df.groupby('service').agg(
        avg_memory_mb=('vsize_mb', px.mean),
        max_memory_mb=('vsize_mb', px.max),
        cpu_usage_pct=('cpu_usage_pct', px.mean)
    )

    # Combine metrics
    combined = http_metrics.merge(memory_metrics, how='outer',
                                  left_on='service', right_on='service',
                                  suffixes=['', '_mem'])

    px.display(combined)
Alert Integration

Prometheus Integration

# ServiceMonitor for Pixie metrics export
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pixie-metrics
spec:
  selector:
    matchLabels:
      app: pixie-prometheus-exporter
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
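
Pixie does not ship the exporter referenced by the app: pixie-prometheus-exporter label, so one has to be provided. The sketch below shows one possible shape, reusing the pxapi pattern from the API Usage example and the prometheus_client library; the metric name and scrape port are arbitrary choices.

#!/usr/bin/env python3
"""Minimal sketch of a Pixie-to-Prometheus exporter (not an official component)."""
import os
import time

import pxapi
from prometheus_client import Gauge, start_http_server

# PxL query; column names follow the examples earlier on this page
PXL = """
import px
df = px.DataFrame(table='process_stats', start_time='-1m')
df.service = df.ctx['service']
df = df.groupby('service').agg(avg_memory_mb=('vsize_mb', px.mean))
px.display(df, 'memory')
"""

memory_gauge = Gauge('pixie_service_memory_mb',
                     'Average memory per service reported by Pixie', ['service'])

client = pxapi.Client(token=os.environ["PIXIE_API_KEY"])
conn = client.connect_to_cluster(os.environ["PIXIE_CLUSTER_ID"])

start_http_server(9090)  # exposes /metrics for the ServiceMonitor above
while True:
    for row in conn.prepare_script(PXL).results("memory"):
        memory_gauge.labels(service=row["service"]).set(row["avg_memory_mb"])
    time.sleep(30)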

Custom Webhook Alerts

import requests
import json

def send_pixie_alert(service, memory_usage, threshold):
    """Send alert to webhook endpoint"""
    
    alert_data = {
        "text": f"Memory Alert: {service}",
        "attachments": [{
            "color": "danger",
            "fields": [{
                "title": "Service",
                "value": service,
                "short": True
            }, {
                "title": "Memory Usage",
                "value": f"{memory_usage}MB ({memory_usage/threshold*100:.1f}%)",
                "short": True
            }]
        }]
    }
    
    requests.post(
        "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
        json=alert_data
    )
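
Wired to the monitoring loop above, a call might look like this (the values are hypothetical):

# Hypothetical values: ~1.8GiB in use against a 2GiB limit
send_pixie_alert("checkout", memory_usage=1843, threshold=2048)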

Baseline Establishment

# Establish performance baselines
import px

def establish_baseline(service_name, days_back=7):
    """Establish a performance baseline over a historical period.

    Note: PEMs keep data in memory for a short window, so multi-day
    baselines require long-term export (e.g. via a Pixie export plugin).
    """
    # Query historical data
    lookback = f'-{days_back}d'
    df = px.DataFrame(table='process_stats', start_time=lookback)
    df.service = df.ctx['service']
    df = df[df.service == service_name]

    # Bucket samples into one-hour windows
    hour_ns = 60 * 60 * 1000 * 1000 * 1000
    df.hour = px.bin(df.time_, hour_ns)

    # Calculate baseline metrics (px.quantiles exposes p50/p90/p99,
    # so p99 stands in for the original p95/stddev figures)
    baseline = df.groupby(['service', 'hour']).agg(
        baseline_memory_mb=('vsize_mb', px.mean),
        memory_quantiles=('vsize_mb', px.quantiles),
        baseline_cpu_pct=('cpu_usage_pct', px.mean)
    )
    baseline.memory_p99 = px.pluck_float64(baseline.memory_quantiles, 'p99')

    return baseline

Comparison with Alternatives

vs Parca: Kubernetes-Specific Features

| Feature           | Pixie                           | Parca                        |
|-------------------|---------------------------------|------------------------------|
| Scope             | Full observability platform     | Continuous profiling focused |
| Data Coverage     | Metrics, traces, logs, profiles | CPU/memory profiling only    |
| Protocol Support  | HTTP, gRPC, DNS, MySQL, etc.    | Not applicable               |
| Service Discovery | Automatic Kubernetes-native     | Manual configuration         |
| Query Language    | PxL (Pythonic)                  | PromQL-style queries         |
| Storage           | In-memory, ephemeral            | Persistent storage           |
| Deployment        | DaemonSet + Operator            | Single binary deployment     |

When to Choose Pixie:

  • Need comprehensive observability beyond profiling
  • Want zero-instrumentation application monitoring
  • Require real-time debugging capabilities
  • Need service topology and dependency mapping

When to Choose Parca:

  • Focus specifically on continuous profiling
  • Need long-term profile data retention
  • Want lightweight profiling-only solution
  • Require detailed code-level analysis

vs Traditional APM: No Instrumentation Advantage

Pixie Advantages:

  • Zero code changes required for deployment
  • Language agnostic - works with any runtime
  • Real-time insights without sampling
  • Full request/response capture for debugging
  • No performance impact from instrumentation libraries

Traditional APM Limitations:

  • Requires SDK integration and code changes
  • Language-specific instrumentation overhead
  • Sampling can miss critical events
  • Limited visibility into system-level interactions
  • Deployment complexity with legacy applications

Advantages in Kubernetes Environments

  1. Native Integration

    • Built specifically for Kubernetes architecture
    • Understands pods, services, and namespaces natively
    • Automatic service discovery and mapping
  2. eBPF Capabilities

    • Kernel-level visibility without application changes
    • Network traffic analysis at Layer 7
    • System call and resource monitoring
  3. Edge Computing Architecture

    • Data processing happens locally in cluster
    • No external dependencies for basic functionality
    • Reduced latency and improved security
  4. Developer Experience

    • Instant debugging without redeployment
    • Interactive query interface (Live UI)
    • Collaborative debugging with team access

Repository & Documentation

Primary Resources

  • GitHub repository: https://github.com/pixie-io/pixie
  • Documentation: https://docs.px.dev
  • Project website: https://px.dev

New Relic Integration

Community Resources

Available Editions

  1. Pixie Core (Open Source)

    • Self-managed deployment
    • Full observability capabilities
    • Community support
  2. Pixie by New Relic (Managed)

    • Fully managed service
    • New Relic One integration
    • Enterprise support
  3. Pixie Enterprise Edition

    • Industry-specific compliance features
    • Advanced security controls
    • Professional services support

Getting Started Resources


Last updated: 2024