CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a documentation wiki repository for the Antimetal System Agent, a Kubernetes controller that monitors infrastructure and streams data to the Antimetal platform. The repository contains comprehensive documentation in Markdown format covering architecture, deployment, API reference, and development guides.

Project: https://github.com/antimetal/system-agent
Wiki: https://github.com/antimetal/system-agent/wiki

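Important: This is a documentation-only repository. It contains no System Agent source code or build system; the only executable components are the documentation maintenance tools described under Documentation Maintenance Tools.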

See Home.md for complete navigation and organization of all documentation.

Development Workflow

Since this is a documentation repository, development consists of:

  1. Content Updates: Editing existing .md files
  2. New Documentation: Adding new .md files for features
  3. Cross-References: Maintaining internal links between documents
  4. Consistency: Following established documentation patterns

Wiki Conventions

  • Wiki structure follows GitHub Wiki conventions with .md files
  • Internal wiki links should NOT include .md extension (use [Architecture Overview](Architecture-Overview) format)
  • Wiki is published at: https://github.com/antimetal/system-agent/wiki

Common Tasks

Editing Documentation

  • All files are written in GitHub Flavored Markdown
  • Use consistent heading hierarchy (H1 for titles, H2 for major sections)
  • Include code examples with proper syntax highlighting
  • Cross-reference related documentation using relative links

Adding New Documentation

  • Follow naming convention: Component-Name.md or Feature-Name.md
  • Add entry to Home.md navigation structure
  • Include the standard sections: Overview, Technical Details, Examples, Configuration (where applicable), Next Steps

Internal Navigation

  • Use relative links: [Architecture Overview](Architecture-Overview)
  • Maintain bidirectional links between related documents
  • Update Home.md when adding new major documentation

Documentation Patterns

Standard Structure

# Title

## Overview
Brief description and purpose

## Technical Details
Implementation specifics, interfaces, data structures

## Examples
Code samples and usage patterns

## Configuration
Available settings and options

## Next Steps
Links to related documentation

Code Examples

Use proper language tags for syntax highlighting:

  • go for Go code
  • yaml for YAML configuration
  • bash for shell commands
  • protobuf for Protocol Buffer definitions

Cross-References

  • Link to specific pages: [Custom Collectors](Custom-Collectors)
  • Reference line numbers when applicable: architecture.go:125
  • Use descriptive link text, avoid "click here"
  • For GitHub source code references, use full URLs: https://github.com/antimetal/system-agent/blob/main/pkg/performance/collectors/cpu.go
  • When referencing specific line numbers in source code, use GitHub's line number format: https://github.com/antimetal/system-agent/blob/main/pkg/performance/types.go#L408-L447
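
The two URL formats above can be produced mechanically. The following is a minimal sketch, assuming the main branch and a hypothetical helper named sourceURL; it is not part of the wiki tooling.

```go
package main

import "fmt"

// repoBlobBase is the blob URL prefix used by the examples in this guide.
const repoBlobBase = "https://github.com/antimetal/system-agent/blob/main/"

// sourceURL builds a GitHub link to a file, optionally with a line range.
// Pass start = 0 for a whole-file link; pass end <= start for a single line.
// (Hypothetical helper for illustration only.)
func sourceURL(path string, start, end int) string {
	url := repoBlobBase + path
	switch {
	case start <= 0:
		return url
	case end <= start:
		return fmt.Sprintf("%s#L%d", url, start)
	default:
		return fmt.Sprintf("%s#L%d-L%d", url, start, end)
	}
}

func main() {
	fmt.Println(sourceURL("pkg/performance/collectors/cpu.go", 0, 0))
	fmt.Println(sourceURL("pkg/performance/types.go", 408, 447))
}
```

Links to main can drift as the code changes, so a variant that takes a tag or commit instead of the fixed branch may be preferable for long-lived references.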

System Agent Architecture (Documentation Context)

The documented system follows these key patterns:

Component Architecture

  • Event-Driven: Components communicate through channels and events
  • Interface-Based: Clean abstractions (Provider, Collector interfaces)
  • Pluggable: Extensible collector and provider systems
  • Cloud-Agnostic: Supports EKS, GKE, AKS, KIND

Key Interfaces

// Cloud Provider Interface
type Provider interface {
    Name() string
    ClusterName(ctx context.Context) (string, error)
    Region(ctx context.Context) (string, error)
}

// Collector Interface
type Collector interface {
    Name() string
    Collect(ctx context.Context) (any, error)
}

Performance Collectors

  • PointCollector: One-shot data collection from /proc, /sys
  • ContinuousCollector: Streaming collection with lifecycle management
  • Data Sources: Linux kernel interfaces, eBPF programs
  • Metrics: CPU, Memory, Network, Disk, NUMA, Process stats

Data Flow

Kubernetes API --> Controller --> Resource Store --> Intake Worker --> gRPC --> Platform
Linux Kernel --> Collectors --> Performance Manager --> Metrics Store
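
The second flow (Collectors to Performance Manager to Metrics Store) can be pictured as stages connected by a channel. This is a simplified sketch only: the sample, metricsStore, collect, and manage names are hypothetical and do not reflect the actual System Agent types.

```go
package main

import (
	"fmt"
	"sync"
)

// sample is a hypothetical metric reading passed between stages.
type sample struct {
	Metric string
	Value  float64
}

// metricsStore is a stand-in for the "Metrics Store" stage.
type metricsStore struct {
	mu   sync.Mutex
	data map[string][]float64
}

func newMetricsStore() *metricsStore {
	return &metricsStore{data: make(map[string][]float64)}
}

func (s *metricsStore) add(sm sample) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[sm.Metric] = append(s.data[sm.Metric], sm.Value)
}

// collect plays the "Collectors" stage: it emits readings on a channel
// from its own goroutine and closes the channel when done.
func collect(readings []sample) <-chan sample {
	out := make(chan sample)
	go func() {
		defer close(out)
		for _, r := range readings {
			out <- r
		}
	}()
	return out
}

// manage plays the "Performance Manager" stage: it drains the channel
// into the store and returns once the channel is closed.
func manage(in <-chan sample, store *metricsStore) {
	for sm := range in {
		store.add(sm)
	}
}

func main() {
	store := newMetricsStore()
	manage(collect([]sample{{"cpu", 12.5}, {"memory", 61.0}, {"cpu", 14.0}}), store)
	fmt.Println(len(store.data["cpu"]), len(store.data["memory"]))
}
```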

Documentation Quality Guidelines

  • Accuracy: Keep technical details current with actual implementation
  • Completeness: Cover all configuration options and use cases
  • Clarity: Use clear, concise language with practical examples
  • Navigation: Maintain logical flow between related topics
  • Updates: Timestamp significant changes and maintain version references

When updating documentation, ensure consistency with the established patterns and maintain the comprehensive cross-reference structure that makes this wiki navigable and useful for developers and operators.

Documentation Maintenance Tools

This repository includes automated tools for maintaining documentation quality. Use these tools to ensure consistency and catch issues early.

Available Tools

The following tools are configured and ready to use:

  1. markdown-link-check - Validates internal and external links
  2. markdownlint - Enforces consistent formatting
  3. Vale - Style guide and terminology checking (requires initial setup)
  4. Custom Wiki Analyzer - Structure validation and broken link detection (see details below)

Wiki Analyzer Tool

The custom wiki analyzer (scripts/analyze-wiki.js) provides comprehensive wiki health analysis:

Features:

  • Detects broken internal links (missing target pages)
  • Identifies orphaned pages (not linked from anywhere)
  • Provides backlink analysis
  • Generates actionable recommendations

Output includes:

  • Total file count and link statistics
  • List of broken links with source locations
  • Orphaned files that need linking
  • Specific recommendations for improvements
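
To make the broken-link and orphan checks concrete, here is a minimal sketch of the underlying idea in Go. It is not the implementation in scripts/analyze-wiki.js; the analyze function and its input shape (page name mapped to outgoing link targets) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// analyze takes a map of page name -> outgoing internal link targets and
// returns broken links ("source -> target") and orphaned pages.
// home is never reported as orphaned because it is the navigation root.
func analyze(pages map[string][]string, home string) (broken, orphaned []string) {
	inbound := make(map[string]int, len(pages))
	for source, targets := range pages {
		for _, target := range targets {
			if _, ok := pages[target]; !ok {
				broken = append(broken, source+" -> "+target)
				continue
			}
			if target != source { // self-links do not count as backlinks
				inbound[target]++
			}
		}
	}
	for page := range pages {
		if page != home && inbound[page] == 0 {
			orphaned = append(orphaned, page)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(broken)
	sort.Strings(orphaned)
	return broken, orphaned
}

func main() {
	pages := map[string][]string{
		"Home":                  {"Architecture-Overview", "CPU-Collector"},
		"Architecture-Overview": {"Home", "Custom-Collectors"},
		"CPU-Collector":         {"Architecture-Overview"},
		"Memory-Collector":      {"CPU-Collector"},
	}
	broken, orphaned := analyze(pages, "Home")
	fmt.Println("broken:", broken)
	fmt.Println("orphaned:", orphaned)
}
```

In this sample data, Custom-Collectors is linked but has no page, and Memory-Collector exists but nothing links to it, so each check reports one item.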

Best Practices for Tool Usage

Before Making Changes

# Run comprehensive analysis to understand current state
npm run analyze:structure

# Check existing issues
npm run check:all

During Development

# Fix formatting issues automatically as you work
npm run lint:markdown:fix

# Check links after adding new ones
npm run check:links

# Validate Mermaid diagrams after adding/modifying
npm run check:mermaid  # All diagrams
npm run check:mermaid:single filename.md  # Single file

Before Committing

# Final validation - all checks must pass
npm run check:all

# If issues found, fix them before committing
npm run lint:markdown:fix  # Auto-fix formatting
# Manually fix broken links and style issues

Rules for Tool Integration

1. Always Use Tools Before Committing

  • Rule: Run npm run check:all before every commit
  • Rationale: Prevents broken links and formatting issues from entering the repository
  • Exception: Emergency fixes can be committed with known issues, but those issues must be fixed in a follow-up commit

2. Fix Formatting Issues Automatically

  • Rule: Use npm run lint:markdown:fix to auto-resolve formatting
  • Rationale: Maintains consistency without manual effort
  • Exception: Review auto-fixes to ensure they don't break intentional formatting

3. Validate New Links Immediately

  • Rule: After adding any internal link, run npm run check:links:single filename.md
  • Rationale: Catches typos and missing files before they propagate
  • Exception: External links may be temporarily unreachable

4. Address Broken Links Systematically

  • Rule: When adding new pages referenced elsewhere, create placeholder content immediately
  • Rationale: Prevents accumulation of broken links
  • Template: Use minimal structure with "TODO" sections rather than empty files

5. Use Structure Analysis for Major Changes

  • Rule: Run npm run analyze:structure before reorganizing navigation or adding multiple pages
  • Rationale: Provides comprehensive view of wiki health and relationships
  • Action: Review orphaned files and missing references before proceeding

6. Terminology Consistency

  • Rule: Follow Vale suggestions for technical terminology
  • Standard Terms:
    • "Kubernetes" not "k8s" or "K8S"
    • "gRPC" not "grpc" or "GRPC"
    • "System Agent" not "system-agent"
    • "eBPF" not "ebpf" or "EBPF"
  • Override: Technical accuracy takes precedence over style rules
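
Vale is the tool that enforces these terms. Purely to illustrate the kind of check involved, the sketch below flags the discouraged spellings with regular expressions. The termRule type and findTermIssues function are hypothetical, and the "system-agent" rule is left out because that string legitimately appears in repository URLs.

```go
package main

import (
	"fmt"
	"regexp"
)

// termRule pairs a pattern for discouraged spellings with the preferred term.
type termRule struct {
	pattern   *regexp.Regexp
	preferred string
}

// rules covers three of the standard terms listed above.
var rules = []termRule{
	{regexp.MustCompile(`\b[kK]8[sS]\b`), "Kubernetes"},
	{regexp.MustCompile(`\b(grpc|GRPC)\b`), "gRPC"},
	{regexp.MustCompile(`\b(ebpf|EBPF)\b`), "eBPF"},
}

// findTermIssues returns one message per discouraged spelling found in text,
// grouped in rule order.
func findTermIssues(text string) []string {
	var issues []string
	for _, r := range rules {
		for _, m := range r.pattern.FindAllString(text, -1) {
			issues = append(issues, fmt.Sprintf("use %q instead of %q", r.preferred, m))
		}
	}
	return issues
}

func main() {
	for _, issue := range findTermIssues("Deploy to k8s over GRPC with eBPF") {
		fmt.Println(issue)
	}
}
```

A plain regular-expression pass like this cannot tell prose from code or URLs, which is one reason the Override rule above matters.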

7. Link Format Standards

  • Rule: Internal wiki links must not include .md extension
  • Format: Use actual page names, like [CPU Collector](CPU-Collector) without .md extension
  • Rationale: GitHub Wiki requires this format for proper navigation
  • Common Mistakes:
    • Wrong: [Link](./CPU-Collector.md) - Don't use file paths
    • Wrong: [Link](CPU-Collector.md) - Don't include .md extension
    • Right: [Link](CPU-Collector) - Just the page name
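
The common mistakes above are easy to detect mechanically. The following sketch, with hypothetical badWikiLinks and fixWikiLink helpers, scans Markdown text for internal link targets that use a file path or a .md extension; it is a simplified illustration and does not handle targets that carry a #fragment.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// linkPattern captures the target of a Markdown inline link: [text](target).
var linkPattern = regexp.MustCompile(`\[[^\]]*\]\(([^)\s]+)\)`)

// badWikiLinks returns internal link targets that use "./" or end in ".md".
func badWikiLinks(markdown string) []string {
	var bad []string
	for _, m := range linkPattern.FindAllStringSubmatch(markdown, -1) {
		target := m[1]
		if strings.Contains(target, "://") || strings.HasPrefix(target, "#") {
			continue // external URLs and in-page anchors are out of scope
		}
		if strings.HasPrefix(target, "./") || strings.HasSuffix(target, ".md") {
			bad = append(bad, target)
		}
	}
	return bad
}

// fixWikiLink converts a target such as "./CPU-Collector.md" to "CPU-Collector".
func fixWikiLink(target string) string {
	return strings.TrimSuffix(strings.TrimPrefix(target, "./"), ".md")
}

func main() {
	text := "See [A](./CPU-Collector.md), [B](CPU-Collector.md) and [C](CPU-Collector)."
	for _, target := range badWikiLinks(text) {
		fmt.Printf("%s -> %s\n", target, fixWikiLink(target))
	}
}
```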

8. Handle Tool Conflicts Gracefully

  • Rule: If tools conflict (e.g., line length vs readability), document exceptions
  • Process: Add exceptions to configuration files rather than ignoring tools
  • Documentation: Comment exceptions in config files with rationale

9. Validate Mermaid Diagrams

  • Rule: All Mermaid diagrams must be valid and render correctly
  • Tool: Use npm run check:mermaid to validate syntax
  • Common Issues:
    • Missing quotes around node labels with special characters
    • Incorrect arrow syntax (use --> not ->)
    • Unbalanced brackets or parentheses
  • Testing: Preview diagrams locally before committing

Workflow Integration

Local Development Workflow

# 1. Before starting work
npm run check:all                    # Understand current state

# 2. During development
npm run lint:markdown:fix            # Fix formatting as you go
npm run check:links:single file.md   # Validate new links

# 3. Before committing
npm run check:all                    # Comprehensive validation
npm run analyze:structure            # Check overall wiki health

# 4. Address any issues found
npm run lint:markdown:fix            # Auto-fix what you can
# Manually resolve broken links, style issues, structure problems

GitHub Actions Integration

  • Automatic: All tools run on every push/PR via .github/workflows/docs.yml
  • Enforcement: PRs should not be merged with failing documentation checks
  • Reports: Check Actions tab for detailed results and artifacts

Tool Configuration Management

  • Customization: Modify .markdownlint.json, .vale.ini, etc. as wiki evolves
  • Version Control: All tool configurations are version controlled
  • Documentation: Changes to tool behavior should be documented in this file

Troubleshooting Common Issues

Vale Setup

Vale requires both the Vale CLI and style guides to be installed:

# Install Vale CLI (choose one method):
# macOS with Homebrew:
brew install vale

# Or download from: https://github.com/errata-ai/vale/releases

# Install style guides (automated):
./scripts/install-vale-styles.sh

# Or install the styles with Vale's package manager
# (needs a Vale release with package support and a Packages entry in .vale.ini):
vale sync
# OR clone them manually:
cd .vale/styles
git clone https://github.com/errata-ai/Microsoft.git
git clone https://github.com/errata-ai/write-good.git

Note: Vale is optional for local development but is used in CI/CD.

False Positive Links

Common scenarios and solutions:

Temporary External URLs:

Add these ignore patterns to .markdown-link-check.json:

{
  "ignorePatterns": [
    {"pattern": "^http://localhost"},
    {"pattern": "^https://internal.company.com"}
  ]
}

Dynamic or Protected URLs:

Add timeout and retry settings to .markdown-link-check.json:

{
  "timeout": "20s",
  "retryOn429": true,
  "aliveStatusCodes": [200, 206, 301, 302, 403, 999]
}

Wiki-Internal Links:

  • Ensure no .md extension in links
  • Use exact page names (case-sensitive)
  • Check for special characters in page names

Style Guide Conflicts

# Disable specific Vale rules in .vale.ini
write-good.TooWordy = NO

Formatting Conflicts

Adjust markdownlint rules in .markdownlint.json. For example, to raise the MD013 line-length limit:

{
  "MD013": {"line_length": 120}
}

When to Override Tools

Acceptable Overrides

  1. Technical Accuracy: Code examples that violate formatting for clarity
  2. External Dependencies: Links to third-party services that may be temporarily down
  3. Legacy Content: Existing documentation that would require major restructuring

Unacceptable Overrides

  1. Convenience: Skipping checks because they take time
  2. Disagreement: Ignoring style rules due to personal preference
  3. Accumulation: Allowing multiple violations to build up over time

Tool Maintenance

Regular Tasks

  • Monthly: Review and update Vale style guides
  • Quarterly: Update tool versions in package.json
  • As Needed: Adjust configurations based on false positives

When Adding New Tools

  1. Add configuration files to repository
  2. Update GitHub Actions workflow
  3. Add npm scripts to package.json
  4. Document usage in this section
  5. Test thoroughly before deploying

Pre-commit Hook Setup (Optional)

To automatically run documentation checks before committing:

# Install husky for git hooks
# (pinned to v8: the "husky install" and "husky add" commands below were deprecated in v9)
npm install --save-dev husky@8

# Initialize husky
npx husky install

# Add pre-commit hook
npx husky add .husky/pre-commit "npm run check:all"

# Make hook executable
chmod +x .husky/pre-commit

Note: Pre-commit hooks can slow down commits. Consider using:

  • npm run lint:markdown only for faster checks
  • --no-verify flag to skip hooks when needed
  • GitHub Actions for comprehensive validation

This systematic approach ensures documentation quality while maintaining development velocity.

Additional Documentation Best Practices

Creating New Documentation Pages

When creating new documentation pages, especially those referenced by existing pages:

  1. Immediate Placeholder Creation: If a page is referenced but doesn't exist yet, create it immediately with a "Work in Progress" warning

  2. Standard WIP Header: Use this format for pages under development:

    > **⚠️ Work in Progress**: This documentation is currently being developed and may be incomplete or subject to change.
    
  3. Minimal Structure: Even WIP pages should have basic structure (Overview, sections planned)

  4. Track in Linear/GitHub: Create issues for documentation work to ensure completion

Common Documentation Patterns for System Agent

Based on the existing documentation structure, follow these patterns:

For Collector Documentation

# [Collector Name] Collector

## Overview
Brief description of what the collector does and why it's important

## Technical Details
- MetricType value
- Data sources (/proc, /sys files)
- Collection mode (one-shot vs continuous)

## Collected Metrics
Table of fields with types and descriptions

## Configuration
How to configure the collector

## Platform Considerations
Linux kernel requirements, container considerations

## Common Issues
Troubleshooting guide

## Examples
Sample output and usage patterns

## Related Collectors
Links to similar or complementary collectors

For Architecture Documentation

  • Use Mermaid diagrams for system architecture
  • Include both high-level and detailed component views
  • Show data flow with clear directional indicators
  • Reference actual code paths and interfaces

Link Validation Workflow

When working with links:

  1. Check Before Adding: Verify target pages exist before creating links
  2. Batch Creation: If adding multiple cross-references, create all target pages first
  3. Use Relative Links: For wiki internal links, always use relative format without .md
  4. External Links: For source code, use full GitHub URLs with specific commits/tags when possible

Documentation Review Checklist

Before committing documentation changes:

  • All internal links are wiki-format (no .md extension)
  • All referenced pages exist (even if just placeholders)
  • Code examples have proper syntax highlighting
  • Formatting passes markdownlint checks
  • No broken external links
  • Consistent terminology throughout
  • Navigation updated in Home.md if needed
  • Related pages have bidirectional links
  • Diagrams render correctly (if using Mermaid)

API Changes and Versioning Guidelines

When documenting API changes or version-specific features:

1. Version References

  • Always specify the System Agent version when documenting features
  • Use semantic versioning format: v1.2.3
  • Example: "Available in System Agent v2.1.0 and later"
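
A version string in the v1.2.3 form can be checked before it goes into a page. This is a minimal sketch with hypothetical isValidVersion and availabilityNote helpers; it deliberately rejects pre-release and build suffixes, which the guideline above does not mention.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionPattern matches vMAJOR.MINOR.PATCH with no leading zeros and
// no pre-release or build metadata suffix.
var versionPattern = regexp.MustCompile(`^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// isValidVersion reports whether s is in the vMAJOR.MINOR.PATCH form.
func isValidVersion(s string) bool { return versionPattern.MatchString(s) }

// availabilityNote renders the phrasing suggested above, or returns an
// error when the version string is malformed.
func availabilityNote(version string) (string, error) {
	if !isValidVersion(version) {
		return "", fmt.Errorf("version %q is not in vMAJOR.MINOR.PATCH form", version)
	}
	return "Available in System Agent " + version + " and later", nil
}

func main() {
	for _, v := range []string{"v2.1.0", "2.1", "v2.01.0"} {
		note, err := availabilityNote(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(note)
	}
}
```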

2. Breaking Changes

When documenting breaking changes:

> **⚠️ Breaking Change in v2.0.0**: The `cpu_usage` field has been renamed to `cpu_percent`.
> 
> **Migration Guide:**
> - Update all references from `cpu_usage` to `cpu_percent`
> - The value range remains 0-100

3. API Evolution

  • Keep historical documentation for deprecated features
  • Mark deprecated items clearly with version info
  • Provide migration paths for deprecated functionality
  • Example:
### CPU Usage Field (Deprecated)

> **Deprecated in v2.0.0**: Use `cpu_percent` instead.

This field will be removed in v3.0.0.

4. Compatibility Matrix

For significant changes, maintain a compatibility matrix:

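| Feature | v1.x | v2.x | v3.x |
|---|---|---|---|
| cpu_usage field | Supported | Deprecated | Removed |
| cpu_percent field | Not available | Supported | Supported |
| eBPF collectors | | | |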

5. Changelog References

  • Link to relevant changelog entries
  • Reference GitHub releases for detailed changes
  • Keep API documentation in sync with actual releases