DeepWiki Open - chunhualiao/public-docs GitHub Wiki

RAG

https://github.com/AsyncFuncAI/deepwiki-open

When generating content for a wiki page, DeepWiki includes specific instructions in its prompt to the AI model to create Mermaid diagrams: page.tsx:255-262

These instructions explicitly ask the AI to include Mermaid diagrams to visualize:

  • Component relationships
  • Data flow
  • Architecture
  • Processes or workflows
  • Class hierarchies
  • State transitions

The system provides detailed formatting guidelines to ensure the diagrams are properly structured: page.tsx:263-271

Importantly, DeepWiki instructs the AI to create vertically-oriented diagrams using "graph TD" (top-down) format to ensure better readability: page.tsx:264-267

The system even provides a template example of how the Mermaid diagram should be formatted: page.tsx:273-284
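The formatting rules that the prompt gives the model can also be checked mechanically. The following is a minimal sketch, not part of DeepWiki: `check_mermaid` is a hypothetical helper, and it only tests two of the guidelines (a `graph TD` header and `-->` arrows).

```python
import re
from typing import List


def check_mermaid(chart: str) -> List[str]:
    """Return guideline violations for a Mermaid chart (hypothetical helper)."""
    lines = [ln.strip() for ln in chart.strip().splitlines() if ln.strip()]
    if not lines:
        return ["empty chart"]

    problems = []
    header = lines[0]
    # The prompt prefers a vertical layout declared exactly as "graph TD".
    if header.startswith("graph") and header != "graph TD":
        problems.append(f"header is {header!r}, expected 'graph TD'")

    for ln in lines[1:]:
        if ln.startswith("%%"):  # Mermaid comment line
            continue
        # Drop bracketed labels so hyphens inside labels are not flagged.
        bare = re.sub(r"\[[^\]]*\]", "", ln)
        # A single dash between two node ids (A-B) is not a valid arrow.
        if "--" not in bare and re.search(r"\w+\s*-\s*\w+", bare):
            problems.append(f"use '-->' for arrows: {ln!r}")
    return problems
```

An empty result means that no violation of these two rules was found; it does not guarantee that the chart renders.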

AI Models Used

DeepWiki can use multiple models, depending on the configuration:

Embedding Model:

  • text-embedding-3-small (when using OpenAI)
  • nomic-embed-text (when using Ollama)

Generator Models:

  • gemini-2.0-flash (Google Gemini)
  • Any model from OpenAI, Anthropic, Google, Meta, Mistral via OpenRouter
  • qwen3:1.7b (Ollama)

So at least two models are always in use: one embedding model and one generator model. With OpenRouter, the number of possible generator models is much higher.
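The embedder/generator split can be pictured as a small lookup table. This is an illustrative sketch only: the model names come from the lists above, but `ModelPair`, `MODEL_PAIRS`, and the way providers are paired are assumptions, not DeepWiki's actual configuration format.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class ModelPair:
    embedder: str   # model that turns text chunks into vectors
    generator: str  # model that writes the wiki content


# Hypothetical pairing; the source does not say which embedder goes with which generator.
MODEL_PAIRS = {
    "google": ModelPair(embedder="text-embedding-3-small", generator="gemini-2.0-flash"),
    "ollama": ModelPair(embedder="nomic-embed-text", generator="qwen3:1.7b"),
}


def models_in_use(provider: str) -> Set[str]:
    """Return the distinct models one configuration relies on."""
    pair = MODEL_PAIRS[provider]
    return {pair.embedder, pair.generator}
```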

Outputs

| DeepWiki Mermaid Diagram Type | SysML Diagram Type | Description |
| --- | --- | --- |
| graph TD (top-down graph) | Block Definition Diagram (BDD) | Used for component relationships and architecture in DeepWiki; maps to SysML's BDD which shows system structure and hierarchies |
| sequenceDiagram | Sequence Diagram | Both represent interactions between components over time, showing message flows |
| classDiagram | Block Definition Diagram (BDD) | DeepWiki uses this for class hierarchies; maps to SysML's BDD which can show class structures |
| stateDiagram | State Machine Diagram | Both represent state transitions and system behavior |
| Data flow diagrams (graph TD) | Internal Block Diagram (IBD) | DeepWiki uses top-down graphs for data flow; maps to SysML's IBD which shows internal structure and flows |
| Process workflows (graph TD) | Activity Diagram | DeepWiki uses top-down graphs for workflows; maps to SysML's Activity Diagram which shows processes |

The mapping isn't one-to-one because SysML has more specialized diagram types than Mermaid, but the core concepts align: DeepWiki uses Mermaid's general-purpose diagram types to represent concepts that SysML covers with specialized diagrams.

Inputs and Prompts

DeepWiki-open scans all folders and files in a repository, but it employs several strategies to keep this process efficient and manageable.

The system prioritizes code files, excludes common large directories, filters out extremely large files, and implements caching to avoid unnecessary rescanning. For very large repositories, the documentation suggests starting with smaller repositories first.
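A rough sketch of such a scan is shown below. Everything in it is an assumption made for illustration: `select_files`, the excluded directory names, the extension list, and the size limit are hypothetical and are not taken from DeepWiki's code. Caching is left out.

```python
from pathlib import Path
from typing import List

# Illustrative values only, not DeepWiki's actual settings.
EXCLUDED_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}
CODE_EXTENSIONS = {".py", ".ts", ".tsx", ".js", ".go", ".rs", ".java"}
MAX_FILE_BYTES = 1_000_000


def select_files(root: Path) -> List[Path]:
    """Return files worth indexing, with code files listed first."""
    kept = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Skip anything that lives under an excluded directory.
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in EXCLUDED_DIRS for part in parent_parts):
            continue
        # Skip extremely large files.
        if path.stat().st_size > MAX_FILE_BYTES:
            continue
        kept.append(path)
    # Code files sort ahead of everything else (False sorts before True).
    return sorted(kept, key=lambda p: (p.suffix not in CODE_EXTENSIONS, str(p)))
```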

| Prompt Type | Purpose | Location | Key Components |
| --- | --- | --- | --- |
| Wiki Structure Generation | Determines the overall wiki structure based on repository analysis | src/app/page.tsx:386-442 | Repository file tree, README content, XML output format |
| Page Content Generation | Creates detailed content for each wiki page | src/app/page.tsx:244-299 | File paths, markdown formatting, diagram instructions |
| System Prompt (Backend) | Sets the tone and guidelines for AI responses | api/api.py:140-176 | Role definition, formatting guidelines, style instructions |
| RAG System Prompt | Guides the RAG system for context-aware responses | api/rag.py:53-76 | Language detection, markdown formatting rules |
| Mermaid Diagram Error Correction | Fixes broken Mermaid diagrams | src/app/page.tsx:90-97 | Error message, original chart, correction instructions |
| RAG Template | Structures the context for retrieval-augmented generation | api/rag.py:79-105 | System prompt, conversation history, context, user prompt |
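The RAG Template entry names four components: system prompt, conversation history, context, and user prompt. A minimal sketch of how such a template might assemble them follows. The function `build_rag_prompt` and the tag names are hypothetical and do not reproduce the template in api/rag.py.

```python
from typing import Sequence, Tuple


def build_rag_prompt(
    system_prompt: str,
    history: Sequence[Tuple[str, str]],  # (user message, assistant reply) pairs
    contexts: Sequence[str],             # retrieved document chunks
    query: str,
) -> str:
    """Join the four components into one prompt string (illustrative layout)."""
    parts = [f"<system>\n{system_prompt.strip()}\n</system>"]
    if history:
        turns = "\n".join(f"user: {q}\nassistant: {a}" for q, a in history)
        parts.append(f"<conversation_history>\n{turns}\n</conversation_history>")
    if contexts:
        docs = "\n\n".join(contexts)
        parts.append(f"<context>\n{docs}\n</context>")
    # The user's question goes last so it sits closest to the model's answer.
    parts.append(f"<query>\n{query}\n</query>")
    return "\n\n".join(parts)
```

Empty sections are omitted in this sketch, so a first-turn question with no retrieved chunks produces only the system and query blocks.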

Wiki Structure Generation Prompt

content: `Analyze this GitHub repository ${owner}/${repo} and create a wiki structure for it.

1. The complete file tree of the project:
<file_tree>
${fileTree}
</file_tree>

2. The README file of the project:
<readme>
${readme}
</readme>

Wiki Page Content Generation Prompt

            repo_url: `https://github.com/${owner}/${repo}`,
            messages: [{
              role: 'user',
              content: `Generate comprehensive wiki page content for "${page.title}" in the repository ${owner}/${repo}.

This page should focus on the following files:
${filePaths.map(path => `- ${path}`).join('\n')}

The wiki page should:
1. Provide a detailed explanation of the purpose and functionality
2. Include code examples with explanations where appropriate
3. Explain how this component/feature fits into the overall architecture
4. Include any setup or usage instructions if applicable
5. Be formatted in Markdown for easy reading
6. IMPORTANT: Use Mermaid diagrams where appropriate to visualize:
   - Component relationships
   - Data flow
   - Architecture
   - Processes or workflows
   - Class hierarchies
   - State transitions
MERMAID DIAGRAM INSTRUCTIONS:
- Include at least one mermaid diagram if relevant to this topic
- IMPORTANT!!: Please orient and draw the diagram as vertically as possible. You must avoid long horizontal lists of nodes and sections!
- Use "graph TD" (top-down) for most diagrams to ensure vertical orientation
- Use proper formatting to avoid syntax errors:
  - Always have a space after "graph TD"
  - Use double dashes for arrows: A --> B (not A-B)
  - For node labels with spaces, use brackets: A[Node Label]
  - Keep diagrams simple and focused - don't try to show everything in one diagram

- Use the following format for mermaid diagrams:
\`\`\`mermaid
graph TD
  A[Start] --> B[Process]
  B --> C[End Result]
  
  %% You can use subgraphs to group related nodes
  subgraph Component
    B --> D[Helper Function]
    D --> B
  end
\`\`\`

- Common diagram types to consider:
  - graph TD (top-down graph) - PREFERRED for most cases
  - sequenceDiagram (sequence diagram)
  - classDiagram (class diagram)
  - stateDiagram (state diagram)

IMPORTANT FORMATTING INSTRUCTIONS:
- Return ONLY the markdown content itself
- DO NOT include \`\`\`markdown at the beginning or \`\`\` at the end
- DO NOT wrap content in any code blocks or other delimiters
- Start directly with the content (typically a heading)
- Just provide the raw markdown content with no preamble or conclusion

Return ONLY the raw markdown content for the wiki page.`
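The prompt above asks the model not to wrap its answer in a markdown fence. A model may still do so, so a caller could strip the wrapper defensively. The sketch below is an assumption about how that might look; `strip_wrapper_fence` is a hypothetical helper, and the source does not show DeepWiki doing this.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches a reply that is wrapped as a whole in a markdown/md fence.
_WRAPPER_RE = re.compile(
    r"\A\s*" + FENCE + r"(?:markdown|md)[ \t]*\n(.*?)\n" + FENCE + r"\s*\Z",
    re.DOTALL,
)


def strip_wrapper_fence(text: str) -> str:
    """Remove an outer markdown fence if present; otherwise return the text trimmed."""
    match = _WRAPPER_RE.match(text)
    return match.group(1) if match else text.strip()
```

Only a fence labelled `markdown` or `md` around the whole reply is removed, so ordinary code blocks inside the page content are left alone.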

Backend System Prompt

        system_prompt = f"""<role>
You are an expert code analyst examining the GitHub repository: {repo_url} ({repo_name}).
You provide direct, concise, and accurate information about code repositories.
You NEVER start responses with markdown headers or code fences.
</role>

<guidelines>
- Answer the user's question directly without ANY preamble or filler phrases
- DO NOT start with preambles like "Okay, here's a breakdown" or "Here's an explanation"
- DO NOT start with markdown headers like "## Analysis of..." or any file path references
- DO NOT start with ```markdown code fences
- DO NOT end your response with ``` closing fences
- DO NOT start by repeating or acknowledging the question
- JUST START with the direct answer to the question
<example_of_what_not_to_do>

## Analysis of `adalflow/adalflow/datasets/gsm8k.py`

This file contains...

</example_of_what_not_to_do>

- Format your response with proper markdown including headings, lists, and code blocks WITHIN your answer
- For code analysis, organize your response with clear sections
- Think step by step and structure your answer logically
- Start with the most relevant information that directly addresses the user's query
- Be precise and technical when discussing code
</guidelines>
<style>
- Use concise, direct language
- Prioritize accuracy over verbosity
- When showing code, include line numbers and file paths when relevant
- Use markdown formatting to improve readability
</style>
"""

RAG System Prompt

system_prompt = r"""
You are a code assistant which answers user questions on a Github Repo.
You will receive user query, relevant context, and past conversation history.

LANGUAGE DETECTION AND RESPONSE:
- Detect the language of the user's query
- Respond in the SAME language as the user's query

FORMAT YOUR RESPONSE USING MARKDOWN:
- Use proper markdown syntax for all formatting
- For code blocks, use triple backticks with language specification (```python, ```javascript, etc.)
- Use ## headings for major sections
- Use bullet points or numbered lists where appropriate
- Format tables using markdown table syntax when presenting structured data
- Use **bold** and *italic* for emphasis
- When referencing file paths, use `inline code` formatting
IMPORTANT FORMATTING RULES:
1. DO NOT include ```markdown fences at the beginning or end of your answer
2. Start your response directly with the content
3. The content will already be rendered as markdown, so just provide the raw markdown content

Think step by step and ensure your answer is well-structured and visually organized.
"""

Mermaid Diagram Error Correction Prompt

const retryPrompt = `The following Mermaid diagram code failed to render with the error: "${errorMessage}"

Original Mermaid Code:
\`\`\`mermaid
${originalChart}
\`\`\`

Please regenerate the diagram from scratch and return ONLY the corrected Mermaid code block itself, starting with \`\`\`mermaid and ending with \`\`\`. Do not include any other text, explanation, or markdown formatting outside the code block. Fix the error: "${errorMessage}". Avoid horizontal layouts if possible, prefer "graph TD".`;
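
The correction prompt asks for a reply that contains only a Mermaid code block. A retry loop built around that contract could look like the sketch below. It is written in Python for illustration, while the original lives in page.tsx. `extract_mermaid`, `repair_chart`, `ask_model`, and `validate` are hypothetical names, and the shortened prompt text is not the one DeepWiki sends.

```python
import re
from typing import Callable, Optional

FENCE = "`" * 3  # three backticks
_MERMAID_RE = re.compile(FENCE + r"mermaid[ \t]*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_mermaid(reply: str) -> Optional[str]:
    """Return the body of the first mermaid code block in a reply, if any."""
    match = _MERMAID_RE.search(reply)
    return match.group(1).strip() if match else None


def repair_chart(
    chart: str,
    error: str,
    ask_model: Callable[[str], str],            # sends a prompt, returns the reply text
    validate: Callable[[str], Optional[str]],   # returns an error message, or None if valid
    max_attempts: int = 2,
) -> Optional[str]:
    """Ask the model for a corrected chart until it validates or attempts run out."""
    for _ in range(max_attempts):
        prompt = (
            f'Diagram failed to render with the error: "{error}"\n\n'
            f"{FENCE}mermaid\n{chart}\n{FENCE}"
        )
        candidate = extract_mermaid(ask_model(prompt))
        if candidate is None:
            continue  # reply had no mermaid block; try again
        error = validate(candidate)
        if error is None:
            return candidate
        chart = candidate  # feed the newest attempt and its error back in
    return None
```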