AI Content Generation - Chris-Cullins/wiki_bot GitHub Wiki

AI Content Generation

Overview

The AI Content Generation area turns repository structure and source code into wiki documentation. It drives the full generation process: it sends prompts to an LLM (Claude by default, through one of several provider options), processes the responses, and enforces output quality through validation and templating.

Core Responsibilities:

  • Generate wiki pages (Home, Architecture, Area-specific docs)
  • Manage AI query execution across different LLM providers
  • Process and normalize AI responses into structured Markdown
  • Apply templates and ensure content quality
  • Handle incremental documentation updates

Key Components

WikiGenerator (src/wiki-generator.ts)

The central orchestrator for all documentation generation. Manages the complete lifecycle of AI-powered wiki creation, from prompting to response processing to template application.

Key capabilities:

  • Multi-page generation (home, architecture, areas)
  • Response collection from streaming/non-streaming APIs
  • Content validation and normalization
  • Architectural area extraction and file identification
  • Integration with template rendering system

PromptLoader (src/prompt-loader.ts)

Simple utility for loading and hydrating prompt templates from markdown files.

Responsibilities:

  • Load prompt files from src/prompts/
  • Variable interpolation using {{variableName}} syntax
  • Clean separation of prompt content from generation logic
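
As an illustration of the {{variableName}} interpolation listed above, here is a minimal sketch. The name hydratePromptSketch and the choice to leave unknown placeholders untouched are assumptions for this example, not the actual PromptLoader code.

```typescript
// Hypothetical helper: replaces {{name}} placeholders with values from `vars`.
// Placeholders without a matching key are left as-is (an assumption).
function hydratePromptSketch(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : undefined;
    // Returning from a function keeps "$&"-style sequences in the value literal.
    return value !== undefined ? value : match;
  });
}

const hydratedPrompt = hydratePromptSketch("Document {{area}} in {{repoPath}}.", {
  area: "Templates",
  repoPath: "/repo",
});
console.log(hydratedPrompt);
```

Using a replacer function rather than a replacement string means file contents containing sequences such as $& are inserted verbatim instead of being interpreted as replacement patterns.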

TemplateRenderer (src/template-renderer.ts)

Post-processing engine that applies customizable templates to AI-generated content.

Features:

  • Multi-directory template search (custom + default)
  • Variant-based template selection
  • Template caching for performance
  • Variable interpolation in templates

QueryFactory (src/query-factory.ts)

Abstraction layer for LLM provider selection and query execution.

Supported Providers:

  • agent-sdk: Anthropic's official Claude Agent SDK
  • claude-cli: Claude CLI tool with system prompt injection
  • codex-cli: Codex CLI with JSON response parsing
  • mock: Test mode with synthetic responses
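
Provider selection could look roughly like the sketch below. It assumes that config.testMode takes precedence over config.llmProvider and that agent-sdk is the default; the function name resolveProviderSketch and the config shape are hypothetical.

```typescript
type ProviderSketch = "agent-sdk" | "claude-cli" | "codex-cli" | "mock";

interface ProviderConfigSketch {
  llmProvider?: ProviderSketch;
  testMode?: boolean;
}

// Hypothetical resolver: test mode forces the mock provider (assumed precedence),
// otherwise the configured provider is used, falling back to agent-sdk.
function resolveProviderSketch(config: ProviderConfigSketch): ProviderSketch {
  if (config.testMode === true) return "mock";
  return config.llmProvider !== undefined ? config.llmProvider : "agent-sdk";
}

console.log(resolveProviderSketch({ llmProvider: "codex-cli" }));
```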

MockAgentSDK (src/mock-agent-sdk.ts)

Testing utility that simulates AI responses without API calls.

Capabilities:

  • Context-aware mock responses based on prompt content
  • JSON/Markdown response type handling
  • Supports all generation workflow types

How It Works

Documentation Generation Flow

  1. Initialization

    const generator = new WikiGenerator(queryFn, config, logger);
    
    • Query function selected based on llmProvider config
    • Debug logging configured
    • Template renderer initialized
  2. Page Generation

    User Request → Load Prompt → Inject Variables → Execute Query → 
    Collect Response → Strip Wrappers → Normalize Content → 
    Apply Template → Validate Output
    
  3. Response Processing Pipeline

    • collectResponseText(): Streams events and extracts text from various message types
    • stripFenceWrappers(): Removes markdown code fences that LLMs often add
    • stripLeadingCommentary(): Removes meta-commentary before actual content
    • ensureHeading(): Guarantees proper H1 heading with title
    • hasMeaningfulBody(): Validates content substance
  4. Quality Gates

    • Meta-description detection: Filters out AI responses that describe what they're doing rather than actual content
    • Meaningful body check: Ensures pages have substantive content beyond just headings
    • Fallback to existing: Preserves prior valid content when regeneration fails
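
The heading and meaningful-body checks might be approximated as follows. This is a sketch under assumptions: the real ensureHeading() and hasMeaningfulBody() may use different rules, and the Sketch-suffixed names are hypothetical.

```typescript
// Prepends "# title" unless the content already starts with an H1 heading.
function ensureHeadingSketch(content: string, title: string): string {
  const trimmed = content.trim();
  return /^#\s+\S/.test(trimmed) ? trimmed : `# ${title}\n\n${trimmed}`;
}

// Treats a page as meaningful when at least one non-empty line is not a heading.
function hasMeaningfulBodySketch(content: string): boolean {
  return content.split("\n").some((line) => {
    const text = line.trim();
    return text.length > 0 && !/^#{1,6}\s/.test(text);
  });
}

const candidatePage = ensureHeadingSketch("wiki_bot generates documentation.", "Home");
console.log(hasMeaningfulBodySketch(candidatePage));
```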

Architectural Area Discovery

Architecture Overview → Extract Areas (JSON) → For Each Area:
  → Identify Relevant Files (JSON) → Read File Contents → 
  Generate Area Documentation

The system uses a two-pass approach:

  1. Extract area names from architecture overview (JSON response)
  2. For each area, identify relevant files from full file list (JSON response)

Provider Abstraction

The QueryFunction type provides a unified interface:

type QueryFunction = (params: { prompt: string; options?: any }) => Query;

Each provider implementation:

  • agent-sdk: Direct async iteration over SDK messages
  • claude-cli: Spawns process, captures stdout, wraps in iterator
  • codex-cli: Spawns process, parses JSON lines, extracts agent_message types
  • mock: Generates contextual responses based on prompt patterns
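
To show how a provider that returns one complete response can still satisfy an iterator-based interface, here is a minimal sketch. The message shape, the type names, and wrapCompleteTextSketch are assumptions for illustration; the real Query type comes from the SDK and is richer.

```typescript
// Assumed, simplified message shape.
interface SketchMessage {
  type: "assistant";
  content: string;
}
type SketchQuery = AsyncIterable<SketchMessage>;
type SketchQueryFunction = (params: { prompt: string; options?: unknown }) => SketchQuery;

// Wraps a "run to completion" function (for example, a CLI call that returns
// its full stdout) in an async iterator that yields a single assistant message.
function wrapCompleteTextSketch(run: (prompt: string) => Promise<string>): SketchQueryFunction {
  return ({ prompt }) => ({
    async *[Symbol.asyncIterator](): AsyncGenerator<SketchMessage> {
      const content = await run(prompt);
      yield { type: "assistant", content };
    },
  });
}

// Usage: a stand-in provider that echoes the prompt.
const echoQuerySketch = wrapCompleteTextSketch(async (prompt) => `Echo: ${prompt}`);
```

Callers can then consume every provider the same way, with for await over the returned query.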

Important Functions/Classes

WikiGenerator Core Methods

generateHomePage(repoStructure, existingDoc?)

Creates the wiki home page with project overview, features, and getting started guide.

Process:

  • Selects generate-home-page or update-home-page prompt
  • Injects repository structure and root path
  • Ensures proper heading and meaningful content
  • Applies home template

generateArchitecturalOverview(repoStructure, existingDoc?)

Produces structured architecture documentation with sections, diagram, and areas.

Special handling:

  • ensureArchitectureOutline(): Enforces standardized section structure
  • Extracts/normalizes Mermaid diagrams
  • Adds TODO placeholders for missing sections
  • Guarantees required sections: Summary, Pattern, Directories, Areas, Interactions, Data Flow, Diagram

extractArchitecturalAreas(architecturalOverview)

Parses JSON array of area names from architecture content.

Returns: string[] of area names or empty array on parse failure
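
The "empty array on parse failure" behaviour can be sketched as below. Tolerating prose around the JSON array is an assumption, and parseAreaNamesSketch is a hypothetical name.

```typescript
// Extracts a JSON array of strings from a model response.
// Returns [] when no array is found or the JSON is invalid.
function parseAreaNamesSketch(responseText: string): string[] {
  const start = responseText.indexOf("[");
  const end = responseText.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed: unknown = JSON.parse(responseText.slice(start, end + 1));
    if (!Array.isArray(parsed)) return [];
    // Keep only non-empty strings; anything else the model returned is dropped.
    return parsed.filter(
      (item): item is string => typeof item === "string" && item.trim() !== "",
    );
  } catch {
    return [];
  }
}

console.log(parseAreaNamesSketch('Areas: ["AI Content Generation", "Templates"]'));
```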

identifyRelevantFiles(area, allFiles, repoStructure)

Determines which files belong to a specific architectural area.

Validation:

  • Filters non-existent paths
  • Deduplicates results
  • Logs warnings for invalid suggestions
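
A minimal sketch of these validation steps, assuming the model's suggestions are checked against the known file list. The name validateSuggestedFilesSketch and the warn callback are hypothetical.

```typescript
// Keeps only suggested paths that exist in `allFiles`, removes duplicates,
// and reports each rejected suggestion through `warn`.
function validateSuggestedFilesSketch(
  suggested: string[],
  allFiles: string[],
  warn: (message: string) => void = (message) => console.warn(message),
): string[] {
  const known = new Set(allFiles);
  const accepted = new Set<string>();
  for (const path of suggested) {
    if (known.has(path)) {
      accepted.add(path); // Set membership handles deduplication
    } else {
      warn(`Ignoring unknown path suggested by the model: ${path}`);
    }
  }
  return Array.from(accepted);
}
```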

generateAreaDocumentation(area, relevantFiles, existingDoc?)

Creates detailed documentation for a single architectural area.

Features:

  • Reads all relevant file contents
  • Formats as --- filepath ---\ncontent blocks
  • Applies depth instruction
  • Validates against meta-description patterns
  • Supports variant-specific templates via slugify(area)
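
The file block format and the slug used for variant templates could be sketched like this. The blank line between blocks and the exact slug rules are assumptions; the Sketch-suffixed names are hypothetical.

```typescript
interface SourceFileSketch {
  path: string;
  content: string;
}

// Renders each file as "--- filepath ---" followed by its content.
function formatFileBlocksSketch(files: SourceFileSketch[]): string {
  return files.map((file) => `--- ${file.path} ---\n${file.content}`).join("\n\n");
}

// Lowercases the area name and joins alphanumeric runs with hyphens.
function slugifySketch(area: string): string {
  return area
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(slugifySketch("Config Management"));
```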

Response Processing

collectResponseText(query)

Unified response collector supporting multiple message formats:

  • Stream events (content_block_delta, content_block_start)
  • SDK assistant messages with content blocks
  • Mock assistant messages (simple string content)

Priority: stream > SDK message > mock
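
One way to read the priority rule above is sketched below: streamed deltas win, and a complete assistant message is used only when nothing was streamed. All message shapes here are simplified assumptions, and the Sketch-suffixed names are hypothetical; content_block_start events are ignored in this sketch.

```typescript
interface TextBlockSketch { type: "text"; text: string }
interface StreamEventSketch {
  type: "stream_event";
  event: { type: string; delta?: { type: string; text?: string } };
}
interface AssistantSketch {
  type: "assistant";
  content: string | Array<{ type: string; text?: string }>;
}
type MessageSketch = StreamEventSketch | AssistantSketch;

function isTextBlockSketch(block: { type: string; text?: string }): block is TextBlockSketch {
  return block.type === "text" && typeof block.text === "string";
}

// Pulls the text out of a single message, or "" if it carries none.
function extractTextSketch(message: MessageSketch): string {
  if (message.type === "stream_event") {
    const delta = message.event.delta;
    if (message.event.type === "content_block_delta" && delta && typeof delta.text === "string") {
      return delta.text;
    }
    return "";
  }
  if (typeof message.content === "string") return message.content; // mock-style message
  return message.content.filter(isTextBlockSketch).map((block) => block.text).join("");
}

// Prefers streamed text so a final assistant message does not duplicate it.
async function collectTextSketch(query: AsyncIterable<MessageSketch>): Promise<string> {
  let streamed = "";
  let complete = "";
  for await (const message of query) {
    if (message.type === "stream_event") streamed += extractTextSketch(message);
    else complete = extractTextSketch(message) || complete;
  }
  return streamed !== "" ? streamed : complete;
}
```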

stripFenceWrappers(content)

Removes markdown code fences while preserving language hints:

    ```markdown
    # Content
    ```

→ Returns: # Content
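
A minimal sketch of unwrapping a response that is entirely enclosed in one code fence. It assumes only the outermost fence is removed and that the language hint on the opening fence is discarded; how the real stripFenceWrappers() treats language hints is not shown here. The name stripOuterFenceSketch is hypothetical.

```typescript
// Built at runtime so this sample does not contain a literal fence.
const FENCE_SKETCH = "`".repeat(3);

// Removes one enclosing code fence (with optional language hint) if the whole
// response is wrapped in it; otherwise returns the trimmed input unchanged.
function stripOuterFenceSketch(content: string): string {
  const trimmed = content.trim();
  const pattern = new RegExp(
    "^" + FENCE_SKETCH + "[^\\n]*\\n([\\s\\S]*?)\\n?" + FENCE_SKETCH + "$",
  );
  const match = pattern.exec(trimmed);
  const inner = match ? match[1] : undefined;
  return (inner !== undefined ? inner : trimmed).trim();
}

console.log(stripOuterFenceSketch(FENCE_SKETCH + "markdown\n# Content\n" + FENCE_SKETCH));
```

Because the pattern is anchored at both ends, fenced code samples inside an otherwise unwrapped page are left alone.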

isMetaDescription(content)

Detects when the AI describes what it is doing rather than providing actual documentation.

Trigger patterns:

  • "I've created/provided/assembled this documentation..."
  • "This documentation includes/covers/contains..."
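
The trigger patterns above could be expressed as regular expressions along these lines. This is a sketch: the real isMetaDescription() may match more phrasings, and checking only the first non-heading line is an assumption. The names are hypothetical.

```typescript
const META_PATTERNS_SKETCH: RegExp[] = [
  /^\s*I(?:'ve| have)\s+(?:created|provided|assembled)\b/i,
  /^\s*This documentation\s+(?:includes|covers|contains)\b/i,
];

// Checks the first non-empty, non-heading line against the trigger patterns.
function looksLikeMetaDescriptionSketch(content: string): boolean {
  const firstBodyLine = content
    .split("\n")
    .map((line) => line.trim())
    .find((line) => line !== "" && !line.startsWith("#"));
  if (firstBodyLine === undefined) return false;
  return META_PATTERNS_SKETCH.some((pattern) => pattern.test(firstBodyLine));
}

console.log(looksLikeMetaDescriptionSketch("# Home\n\nI've created this documentation for you."));
```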

Template System

TemplateRenderer.render(templateName, context, options?)

Applies templates with variant support.

Search order:

  1. {variantSubdir}/{variant}.md (if both specified)
  2. {templateName}-{variant}.md (if variant)
  3. {variant}.md (if variant)
  4. {templateName}.md (base)

Context variables are replaced via {{key}} syntax.
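
The search order can be sketched as a function that lists candidate file names from most to least specific. The options shape and the name templateCandidatesSketch are assumptions; the real renderer also searches the custom directory before the default one.

```typescript
interface RenderOptionsSketch {
  variant?: string;
  variantSubdir?: string;
}

// Returns candidate template file names in lookup order.
function templateCandidatesSketch(
  templateName: string,
  options: RenderOptionsSketch = {},
): string[] {
  const { variant, variantSubdir } = options;
  const candidates: string[] = [];
  if (variant && variantSubdir) candidates.push(`${variantSubdir}/${variant}.md`);
  if (variant) {
    candidates.push(`${templateName}-${variant}.md`);
    candidates.push(`${variant}.md`);
  }
  candidates.push(`${templateName}.md`); // base template is always the last resort
  return candidates;
}

console.log(templateCandidatesSketch("area", { variant: "config-management" }));
```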

Developer Notes

Critical Gotchas

  1. Escaping Bug in Template Regex

    // CURRENT (WRONG): every special character is replaced with literal text
    return value.replace(/[.*+?^${}()|[\]\\]/g, '\\{{fileContentText}}');

    // SHOULD BE: '$&' re-inserts the matched character after the backslash
    return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    The literal {{fileContentText}} replacement breaks regex escaping.

  2. Provider Command Errors

    • CLI providers (claude-cli, codex-cli) require external binaries
    • ENOENT errors indicate missing installation
    • Usage limit detection only works for stdout messages
  3. Response Type Detection

    • Order matters: check mock → stream → SDK messages
    • Missing type guards cause silent failures
    • isTextBlock() and isTextDelta() validate content structure
  4. Incremental Updates

    • Controlled by config.incrementalDocs flag
    • Switches between generate-* and update-* prompts
    • Falls back to existing content when regeneration produces empty/meta results
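
The switch between generate-* and update-* prompts might look like the sketch below. Requiring a non-empty existing document before choosing the update prompt is an assumption, and selectPromptNameSketch is a hypothetical name.

```typescript
// Chooses "update-<pageType>" only when incremental docs are enabled and
// there is existing content to update; otherwise "generate-<pageType>".
function selectPromptNameSketch(
  pageType: string,
  incrementalDocs: boolean,
  existingDoc?: string,
): string {
  const hasExisting = existingDoc !== undefined && existingDoc.trim() !== "";
  const action = incrementalDocs && hasExisting ? "update" : "generate";
  return `${action}-${pageType}`;
}

console.log(selectPromptNameSketch("home-page", true, "# Home\n\nExisting text"));
```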

Best Practices

When adding new page types:

  1. Create prompt in src/prompts/{action}-{pagetype}.md
  2. Add generation method to WikiGenerator
  3. Implement validation (meaningful body check)
  4. Create template in src/templates/{pagetype}.md
  5. Add fallback behavior for empty responses

When supporting new providers:

  1. Add provider type to Config.llmProvider
  2. Implement in createQueryFunction()
  3. Return async iterator matching Query type
  4. Handle both streaming and complete responses
  5. Add error handling for command/API failures

Quality validation flow:

const raw = await collectResponseText(query);
const stripped = stripFenceWrappers(raw);
const withHeading = ensureHeading(stripped, title);

if (isMetaDescription(withHeading) || !hasMeaningfulBody(withHeading, title)) {
  return existingDoc || fallback;
}

return await templates.render(templateType, { content: withHeading });

Configuration Interaction

  • config.debug: Enables detailed logging
  • config.promptLoggingEnabled: Writes prompts/responses to disk
  • config.documentationDepth: Affects getDepthInstruction() output
  • config.templateDir: Custom template directory (falls back to defaults)
  • config.testMode: Activates mock provider

Usage Examples

Basic Generation

import { WikiGenerator } from './wiki-generator.js';
import { createQueryFunction } from './query-factory.js';

const queryFn = createQueryFunction(config, repoPath);
const generator = new WikiGenerator(queryFn, config);

// Generate home page
const homePage = await generator.generateHomePage(repoStructure);

// Generate architecture
const architecture = await generator.generateArchitecturalOverview(repoStructure);

// Extract and document areas
const areas = await generator.extractArchitecturalAreas(architecture);
for (const area of areas) {
  const files = await generator.identifyRelevantFiles(area, allFiles, repoStructure);
  const doc = await generator.generateAreaDocumentation(area, files);
}

With Incremental Updates

const config = {
  incrementalDocs: true,
  // ... other config
};

const generator = new WikiGenerator(queryFn, config);

// Updates existing content or generates new
const updatedHome = await generator.generateHomePage(
  repoStructure,
  existingHomePage // Pass existing content
);

Using Different Providers

// Agent SDK (default)
const sdkQuery = createQueryFunction({ llmProvider: 'agent-sdk' }, repoPath);

// Claude CLI
const cliQuery = createQueryFunction({ llmProvider: 'claude-cli' }, repoPath);

// Test mode
const mockQuery = createQueryFunction({ testMode: true }, repoPath);

Custom Templates

const config = {
  templateDir: '/path/to/custom/templates',
  // ...
};

// Will search: custom dir → default dir
// Supports variants: area-config-management.md, config-management.md, area.md