Git Wiki Publishing - Chris-Cullins/wiki_bot GitHub Wiki

Git/Wiki Publishing

Overview

The Git/Wiki Publishing area handles all aspects of generating documentation content from repository analysis and persisting it to GitHub Wiki repositories. This system orchestrates the complete documentation pipeline: crawling the repository structure, generating wiki pages via LLM prompts, managing git operations for the wiki repository, and ensuring changes are committed and pushed correctly.

Key Responsibilities:

  • Managing git repository cloning, updates, and state tracking
  • Writing generated documentation to GitHub wiki repositories
  • Orchestrating the full documentation generation workflow
  • Handling incremental updates and selective regeneration
  • Providing configurable repository modes (fresh, incremental, reuse-or-clone)

Key Components

GitRepositoryManager (src/github/git-repository-manager.ts)

Core git operations handler that encapsulates all repository state management. Supports three operational modes:

  • fresh: Always removes and re-clones the repository
  • incremental: Updates existing repository or clones if missing
  • reuse-or-clone: Uses existing repository as-is, only clones if absent

Key Features:

  • Authenticated URL construction with token injection
  • Repository status tracking (branch, commits ahead/behind, uncommitted changes)
  • Safe update operations with uncommitted change detection
  • Credential sanitization in error messages

GitHubWikiWriter (src/github/github-wiki-writer.ts)

High-level interface for wiki documentation persistence. Manages the full lifecycle of wiki page generation and publication.

Core Capabilities:

  • Page name normalization and file mapping
  • Sidebar generation with intelligent page ordering (Home → Architecture → alphabetized areas)
  • Content change detection to avoid unnecessary commits
  • Cleanup of existing markdown files in fresh mode
  • Lazy repository preparation
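
The sidebar ordering listed above (Home, then Architecture, then the remaining pages alphabetically) could be sketched roughly as follows. The names `orderSidebarPages` and `renderSidebar` and the `[[Page]]` link format are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical helper: orders page names Home -> Architecture -> alphabetized rest.
function orderSidebarPages(pageNames: string[]): string[] {
  const pinned = ["Home", "Architecture"];
  // Keep pinned pages in their fixed order, but only if they actually exist.
  const head = pinned.filter((name) => pageNames.indexOf(name) !== -1);
  const rest = pageNames
    .filter((name) => pinned.indexOf(name) === -1)
    .sort((a, b) => a.localeCompare(b));
  return [...head, ...rest];
}

// Renders the sidebar as a markdown bullet list of wiki links (assumed format).
function renderSidebar(pageNames: string[]): string {
  return orderSidebarPages(pageNames)
    .map((name) => `- [[${name}]]`)
    .join("\n") + "\n";
}
```

Pinning the two overview pages first keeps navigation stable even as area names change between runs.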

WikiGenerator (src/wiki-generator.ts)

Orchestrates LLM-powered documentation generation across all wiki pages. Handles prompt construction, response parsing, and content normalization.

Generated Page Types:

  1. Home Page: Repository overview with structure summary
  2. Architecture Overview: High-level architectural patterns and area identification
  3. Area Documentation: Deep-dive into specific architectural areas with file analysis

Content Processing Pipeline:

  • Strips markdown fence wrappers from LLM responses
  • Ensures proper heading hierarchy
  • Guards against meta-descriptive content (responses about the documentation process itself)
  • Template rendering for customizable output formats
  • Depth-aware content generation (summary/standard/deep)
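
As an illustration of the first two pipeline steps, the sketch below strips a single outer fence from a response and prepends a heading when none is present. Both function names and the exact matching rules are assumptions; the real implementation may treat edge cases differently.

```typescript
// Hypothetical helper: removes one outer ``` fence that wraps the entire response.
function stripFenceWrapper(response: string): string {
  const trimmed = response.trim();
  // Opening fence with optional language tag, body, then a closing fence at the very end.
  const match = /^```[\w-]*\r?\n([\s\S]*?)\r?\n```$/.exec(trimmed);
  return match ? match[1].trim() : trimmed;
}

// Hypothetical helper: prepends a level-1 heading when the content lacks one.
function ensureHeading(content: string, title: string): string {
  return /^#\s/.test(content) ? content : `# ${title}\n\n${content}`;
}
```

Because the pattern is anchored to the start and end of the string, fenced code blocks inside the page body are left alone.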

Main Application Flow (src/index.ts)

Entry point that coordinates the complete documentation workflow:

  1. Configuration & Setup: Load config, parse CLI args, initialize logger
  2. Repository Crawling: Scan repository structure and enumerate files
  3. Target Resolution: Match CLI --target-file arguments to actual paths
  4. Documentation Generation:
    • Home page generation
    • Architectural overview
    • Area extraction from overview
    • Per-area file identification and documentation
  5. Wiki Persistence: Commit and push all generated pages

Selective Regeneration Mode:
When --target-file is specified, only areas touching those files are regenerated. Existing pages are reused for untouched areas.

Configuration System (src/config.ts)

Environment-driven configuration supporting multiple LLM providers (agent-sdk, claude-cli, codex-cli) and repository modes. Key configuration options:

  • WIKI_REPO_MODE: Controls repository management strategy
  • INCREMENTAL_DOCS: Enables reuse of existing wiki content
  • WIKI_FRESH_CLEAN: Removes existing markdown files in fresh mode
  • DOC_DEPTH: Controls documentation verbosity (summary/standard/deep)
  • PROMPT_LOG_ENABLED: Persists prompt/response transcripts for debugging

Template System (src/template-renderer.ts)

Flexible template loader supporting variant-specific overrides:

  • Searches custom template directory first, then built-in defaults
  • Supports variant subdirectories (e.g., templates/areas/{area-slug}.md)
  • Simple {{placeholder}} interpolation
  • Gracefully falls back to raw content if templates are missing
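
A minimal sketch of `{{placeholder}}` interpolation with the raw-content fallback is shown below. The function names are hypothetical, and leaving unknown placeholders untouched is an assumption rather than documented behavior.

```typescript
// Hypothetical renderer: replaces {{name}} placeholders with values from `vars`.
// Leaving unknown placeholders untouched is an assumption, not confirmed behavior.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : whole
  );
}

// Falls back to the raw content when no template was found.
function applyTemplate(
  template: string | undefined,
  vars: { content: string } & Record<string, string>
): string {
  return template === undefined ? vars.content : renderTemplate(template, vars);
}
```

Using a replacer function instead of a replacement string avoids surprises when generated content contains sequences such as `$&`.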

How It Works

Documentation Generation Workflow

1. Repository Analysis
   └─> RepoCrawler scans file tree
   
2. Home Page Generation
   └─> Prompt includes full repository structure
   └─> LLM generates overview markdown
   
3. Architecture Overview
   └─> LLM analyzes structure for architectural patterns
   └─> Identifies major areas (e.g., "CLI", "Git/Wiki Publishing")
   
4. Area Extraction
   └─> Parse architecture overview for area names
   └─> Returns JSON array of area strings
   
5. Per-Area Documentation
   ├─> Identify relevant files for each area
   ├─> Read file contents
   ├─> Generate documentation from source code
   └─> Apply depth-specific instructions
   
6. Wiki Publication
   ├─> Write pages to local wiki checkout
   ├─> Generate sidebar with ordered page links
   ├─> Commit changes
   └─> Push to remote wiki repository
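
Step 4 expects the LLM to return a JSON array of area names. A defensive parser for that step might look like the sketch below; `parseAreaList` is a hypothetical name, and tolerating extra prose around the array is an assumption about how forgiving the parser needs to be.

```typescript
// Hypothetical parser: extracts a JSON array of area names from an LLM response.
function parseAreaList(responseText: string): string[] {
  // Tolerate surrounding prose by slicing from the first "[" to the last "]".
  const start = responseText.indexOf("[");
  const end = responseText.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed: unknown = JSON.parse(responseText.slice(start, end + 1));
    if (!Array.isArray(parsed)) return [];
    // Keep only non-empty strings and trim them.
    return parsed
      .filter((item): item is string => typeof item === "string" && item.trim() !== "")
      .map((item) => item.trim());
  } catch {
    // Malformed JSON yields no areas rather than an exception.
    return [];
  }
}
```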

Repository State Management

GitRepositoryManager tracks repository state through the RepositoryStatus interface:

interface RepositoryStatus {
  exists: boolean;           // Local .git directory present
  clean: boolean;            // No uncommitted changes
  branch: string;            // Current branch name
  ahead: number;             // Commits ahead of remote
  behind: number;            // Commits behind remote
  uncommittedChanges: string[]; // Porcelain status lines
}

Update Safety: The update() method throws if uncommitted changes are detected, preventing accidental data loss when resetting to remote state.
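
The update guard can be pictured as below. This is a sketch only: `WorkingTreeState`, `parsePorcelain`, and `assertSafeToUpdate` are hypothetical names, and the input is assumed to be the text printed by `git status --porcelain`.

```typescript
// Subset of the status fields needed for the safety check (hypothetical type).
interface WorkingTreeState {
  clean: boolean;
  uncommittedChanges: string[];
}

// Splits `git status --porcelain` output into its non-empty lines.
function parsePorcelain(output: string): WorkingTreeState {
  const lines = output.split(/\r?\n/).filter((line) => line.trim() !== "");
  return { clean: lines.length === 0, uncommittedChanges: lines };
}

// Hypothetical guard: refuses to proceed when the working tree is dirty.
function assertSafeToUpdate(state: WorkingTreeState): void {
  if (!state.clean) {
    throw new Error(
      `Refusing to update: ${state.uncommittedChanges.length} uncommitted change(s)`
    );
  }
}
```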

Content Change Detection

GitHubWikiWriter.hasContentChanged() normalizes content before comparison:

  1. Ensure trailing newline
  2. Normalize line endings (CRLF → LF)
  3. Compare normalized strings

This prevents spurious commits when the only differences are line endings or a missing trailing newline.
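
The comparison can be sketched as a standalone function. The signature below, including the `null` case for a page that does not exist yet, is an assumption and may not match the actual method.

```typescript
// Normalizes line endings to LF and guarantees exactly one check for a trailing newline.
function normalizeContent(content: string): string {
  const lf = content.replace(/\r\n/g, "\n");
  return lf.charAt(lf.length - 1) === "\n" ? lf : lf + "\n";
}

// Sketch of the change check: `existing` is null when the page file is absent (assumed).
function hasContentChanged(existing: string | null, next: string): boolean {
  if (existing === null) return true;
  return normalizeContent(existing) !== normalizeContent(next);
}
```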

Meta-Description Filtering

WikiGenerator.isMetaDescription() detects when the LLM responds with commentary about the documentation process rather than actual documentation content. Triggers include patterns like:

  • "I've created this documentation..."
  • "This wiki covers..."
  • "The documentation includes..."

When detected, the system falls back to existing documentation or returns a placeholder.
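
A pattern-based check along these lines is sketched below. The pattern list covers only the three trigger phrases quoted above, and checking just the first line of the response is an assumption; the real detector may use a different or longer list.

```typescript
// Illustrative patterns built from the trigger phrases quoted in this section.
const META_PATTERNS: RegExp[] = [
  /^i['\u2019]ve created\b/i,
  /^this wiki covers\b/i,
  /^the documentation includes\b/i,
];

// Sketch: flags a response whose first line talks about the documentation itself.
function isMetaDescription(content: string): boolean {
  const firstLine = content.trim().split("\n")[0] || "";
  return META_PATTERNS.some((pattern) => pattern.test(firstLine));
}
```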

Important Functions/Classes

GitRepositoryManager.prepare()

Orchestrates repository setup based on configured mode:

  • fresh: Removes existing repo → clones fresh
  • incremental: Updates if exists, otherwise clones
  • reuse-or-clone: Only clones if missing

Location: src/github/git-repository-manager.ts:52
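
The mode dispatch can be summarized as a small decision function. `planPrepare` and the action names are hypothetical; the sketch only decides which steps to run and performs no git operations itself.

```typescript
type RepoMode = "fresh" | "incremental" | "reuse-or-clone";
type RepoAction = "remove" | "clone" | "update";

// Hypothetical planner: maps (mode, local repo present?) to the steps to run.
function planPrepare(mode: RepoMode, repoExists: boolean): RepoAction[] {
  switch (mode) {
    case "fresh":
      // Always start from a clean clone.
      return repoExists ? ["remove", "clone"] : ["clone"];
    case "incremental":
      // Update in place when possible.
      return repoExists ? ["update"] : ["clone"];
    case "reuse-or-clone":
      // Leave an existing checkout untouched.
      return repoExists ? [] : ["clone"];
  }
}
```

Separating the decision from the side effects makes each mode easy to unit test without touching the filesystem.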

GitHubWikiWriter.writeDocumentation(pages: Map<string, string>)

Main entry point for wiki persistence. Handles the complete write workflow:

  1. Ensure repository is prepared
  2. Check for uncommitted changes (warns but continues)
  3. Write all pages with change detection
  4. Generate sidebar
  5. Commit and push if changes detected

Location: src/github/github-wiki-writer.ts:54

WikiGenerator.generateAreaDocumentation(area, relevantFiles, existingDoc?)

Generates documentation for a specific architectural area:

  1. Reads all relevant file contents
  2. Constructs prompt with file contents and depth instructions
  3. Collects LLM response
  4. Strips fence wrappers and ensures heading
  5. Checks for meta-descriptions
  6. Applies templates
  7. Falls back to existing doc if generation produces empty content

Location: src/wiki-generator.ts:449

WikiGenerator.ensureArchitectureOutline(content)

Normalizes architecture page structure to ensure consistent sections:

  • Summary
  • Architectural Pattern
  • Key Directories
  • Architectural Areas
  • Component Interactions
  • Data Flow
  • Diagram (Mermaid)

Extracts any stray Mermaid diagrams and places them in the Diagram section.

Location: src/wiki-generator.ts:148

resolveTargetFiles(inputs, filePaths, repoPath)

Matches CLI --target-file arguments to actual repository file paths. Handles:

  • Relative paths (e.g., ./src/index.ts)
  • Absolute paths
  • Paths relative to current working directory
  • Cross-platform path normalization (forward slashes)

Location: src/index.ts:78
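
A simplified, string-only sketch of this matching is shown below. It handles `./` prefixes, backslashes, and absolute paths under the repository root, but deliberately omits resolution against the current working directory; the function name and the returned shape are assumptions.

```typescript
// Converts backslashes to forward slashes and drops a leading "./".
function normalizeSlashes(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

// Hypothetical matcher: returns repo-relative matches and the inputs that matched nothing.
function resolveTargetFilesSketch(
  inputs: string[],
  filePaths: string[],
  repoPath: string
): { matched: string[]; unmatched: string[] } {
  const root = normalizeSlashes(repoPath).replace(/\/+$/, "") + "/";
  const known = filePaths.map(normalizeSlashes);
  const matched: string[] = [];
  const unmatched: string[] = [];
  for (const input of inputs) {
    let candidate = normalizeSlashes(input);
    // Absolute paths inside the repository become repo-relative.
    if (candidate.indexOf(root) === 0) candidate = candidate.slice(root.length);
    if (known.indexOf(candidate) !== -1) {
      if (matched.indexOf(candidate) === -1) matched.push(candidate);
    } else {
      unmatched.push(input);
    }
  }
  return { matched, unmatched };
}
```

Returning the unmatched inputs separately lets the caller log a warning for them without halting the run.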

Developer Notes

Git Authentication

The system injects tokens into repository URLs for HTTPS authentication:

// Input:  https://github.com/user/repo.wiki.git
// Output: https://x-access-token:<token>@github.com/user/repo.wiki.git

Security: Git commands with URLs are sanitized in error messages (<redacted-url>).
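
Token injection and message sanitization could be sketched as follows. Both function names are hypothetical, only `https://` URLs are handled, and redacting every URL in a message (rather than only the authenticated one) is an assumption.

```typescript
// Hypothetical helper: embeds the token as basic-auth credentials in an HTTPS URL.
function buildAuthenticatedUrl(repoUrl: string, token: string): string {
  const prefix = "https://";
  if (repoUrl.indexOf(prefix) !== 0) {
    throw new Error("Expected an https:// repository URL");
  }
  // Percent-encode the token so special characters cannot break the URL.
  return `${prefix}x-access-token:${encodeURIComponent(token)}@${repoUrl.slice(prefix.length)}`;
}

// Hypothetical helper: replaces any URL in an error message with a placeholder.
function sanitizeMessage(message: string): string {
  return message.replace(/https?:\/\/\S+/g, "<redacted-url>");
}
```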

Shallow Clones

Enable shallow: true for faster initial clones when full git history isn't needed. The system adds --depth 1 to the clone command.

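Trade-off: A shallow clone contains only the most recent commit, so operations that depend on older history (for example, merging branches that diverged earlier) can fail. Use it only with simple linear workflows.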

Incremental Documentation

When INCREMENTAL_DOCS=true, the system attempts to load existing wiki pages before generation. The LLM receives existing content in the prompt and can produce minimal updates.

Fallback Logic: If generation produces empty/meta-descriptive content, the original page is preserved.

Template Override System

Custom templates are searched before built-in defaults. For area-specific templates:

  1. {custom}/areas/{area-slug}.md
  2. {custom}/area-{area-slug}.md
  3. {custom}/{area-slug}.md
  4. {custom}/area.md
  5. {default}/area.md

The first match wins. If no template exists, raw content is used.
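
The search order above can be expressed as a list of candidate paths plus a first-match lookup. In this sketch, `areaTemplateCandidates` and `pickTemplate` are hypothetical names, and the `exists` callback stands in for whatever filesystem check the loader actually performs.

```typescript
// Builds the candidate template paths in priority order (custom first, then default).
function areaTemplateCandidates(
  customDir: string | undefined,
  defaultDir: string,
  areaSlug: string
): string[] {
  const custom = customDir
    ? [
        `${customDir}/areas/${areaSlug}.md`,
        `${customDir}/area-${areaSlug}.md`,
        `${customDir}/${areaSlug}.md`,
        `${customDir}/area.md`,
      ]
    : [];
  return [...custom, `${defaultDir}/area.md`];
}

// Returns the first candidate that exists, or undefined so the caller can use raw content.
function pickTemplate(
  candidates: string[],
  exists: (path: string) => boolean
): string | undefined {
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```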

Selective Regeneration

CLI flag --target-file src/foo.ts enables targeted updates:

  • Only areas containing src/foo.ts are regenerated
  • Other areas reuse existing documentation
  • Automatically enables incrementalDocs mode
  • Home and Architecture pages are reused if they exist

Warning Behavior: Unmatched target files trigger a warning but don't halt execution.
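
The selection rule (regenerate an area only if it contains at least one target file) can be sketched as below. The function name and the plain-object shape of `areaFiles` are assumptions for illustration.

```typescript
// Hypothetical selector: returns the areas whose file lists include any target file.
function selectAreasToRegenerate(
  areaFiles: Record<string, string[]>,
  targetFiles: string[]
): string[] {
  return Object.keys(areaFiles).filter((area) =>
    areaFiles[area].some((file) => targetFiles.indexOf(file) !== -1)
  );
}
```

Areas not returned by the selector would keep their existing pages.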

LLM Response Parsing

collectResponseText() handles three message types:

  • Mock messages (test mode): { type: 'assistant', content: string }
  • SDK messages: { type: 'assistant', message: { content: [...] } }
  • Stream events: { type: 'stream_event', event: { ... } }

Stream events are prioritized, falling back to SDK messages, then mock content.
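
The priority order can be illustrated with the sketch below. The inner shapes are assumptions: the source elides the stream event payload and the SDK content items, so `event.delta.text` and `{ type: "text", text }` blocks are hypothetical stand-ins chosen for this example.

```typescript
type MockMessage = { type: "assistant"; content: string };
// Assumed shape of SDK content blocks (not confirmed by the source).
type SdkMessage = {
  type: "assistant";
  message: { content: Array<{ type: string; text?: string }> };
};
// Assumed shape of a stream event payload (not confirmed by the source).
type StreamEvent = { type: "stream_event"; event: { delta?: { text?: string } } };
type LlmMessage = MockMessage | SdkMessage | StreamEvent;

// Sketch: accumulates text per source, then prefers stream > SDK > mock.
function collectResponseText(messages: LlmMessage[]): string {
  let streamed = "";
  let sdk = "";
  let mock = "";
  for (const message of messages) {
    if (message.type === "stream_event") {
      streamed += message.event.delta?.text ?? "";
    } else if ("message" in message) {
      for (const block of message.message.content) {
        if (block.type === "text" && block.text) sdk += block.text;
      }
    } else {
      mock += message.content;
    }
  }
  return streamed || sdk || mock;
}
```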

Page Name Sanitization

sanitizePageName() transforms area names into wiki-friendly page names:

  • Trims whitespace
  • Converts path separators to spaces
  • Removes special characters (keeps alphanumeric, spaces, hyphens, underscores)
  • Collapses multiple spaces
  • Preserves "Home" capitalization

Example: "CLI / Configuration" → "CLI Configuration"
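
The listed steps map onto a short chain of string operations. The sketch below is one possible reading; in particular, treating "preserves Home capitalization" as a case-insensitive match that returns `Home` is an assumption.

```typescript
// Sketch of the sanitization steps; the real function may differ in details.
function sanitizePageName(name: string): string {
  const cleaned = name
    .trim()
    .replace(/[\\/]+/g, " ")          // path separators become spaces
    .replace(/[^A-Za-z0-9 _-]/g, "")  // drop everything except allowed characters
    .replace(/\s+/g, " ")             // collapse runs of whitespace
    .trim();
  // Assumed interpretation: any casing of "home" maps to the canonical "Home".
  return cleaned.toLowerCase() === "home" ? "Home" : cleaned;
}
```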

Usage Examples

Generating Fresh Documentation

# Full fresh generation with cleanup
WIKI_REPO_MODE=fresh \
WIKI_FRESH_CLEAN=true \
GITHUB_WIKI_URL=https://github.com/user/repo.wiki.git \
GITHUB_TOKEN=ghp_xxx \
npm start

Incremental Updates

# Update only changed areas
WIKI_REPO_MODE=incremental \
INCREMENTAL_DOCS=true \
GITHUB_WIKI_URL=https://github.com/user/repo.wiki.git \
GITHUB_TOKEN=ghp_xxx \
npm start

Selective Regeneration

# Regenerate only areas touching specific files
npm start -- --target-file src/index.ts --target-file src/config.ts

Deep Documentation Depth

# Generate exhaustive documentation
DOC_DEPTH=deep \
GITHUB_WIKI_URL=https://github.com/user/repo.wiki.git \
GITHUB_TOKEN=ghp_xxx \
npm start

Using Custom Templates

# Override default templates
TEMPLATE_DIR=./custom-templates \
GITHUB_WIKI_URL=https://github.com/user/repo.wiki.git \
GITHUB_TOKEN=ghp_xxx \
npm start

Create ./custom-templates/area.md:

# {{title}}

**Area:** {{area}}

---

{{content}}

---

*Generated with custom template*