Hallucination
Quick checklist
- Provide complete input (code, callees)
- Ask LLMs to check the input and refuse the request when it is insufficient; apply self- or third-party critique and repair
- Use and test generated docs: generate unit tests from the docs, then run them against the original functions
- Cross-check with static analysis
- Human-in-the-loop: add labels or ratings as feedback to fine-tune the models
Generating documentation for C++ (or any other language) using large language models (LLMs) is a powerful technique, but ensuring correctness and avoiding hallucinations (i.e., plausible-sounding but incorrect outputs) can be challenging. Below are several strategies and best practices:
1. Provide High-Quality, Context-Rich Input
- Give the model the complete source or relevant snippets (see the prompt-assembly sketch after this list):
  - If the model doesn’t see all necessary classes, functions, or usage examples, it is more likely to produce incorrect or incomplete documentation.
  - Ensure that all dependencies or relevant parts of the codebase are visible to the model—either through direct input (if feasible) or via a retrieval method that surfaces relevant snippets or references.
- Include relevant comments and docstrings:
  - If your code already contains comments or partial documentation, feed them to the model as well.
  - The model can then refine existing docs instead of inventing them from scratch.
- Provide usage context:
  - If a function is part of a library or framework with established conventions, share that context.
  - For example, if your class is an adapter pattern for a known library, mention that explicitly.
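
As a concrete (if simplified) illustration, the sketch below assembles one prompt from a target function, its callees, and any existing comments or usage notes. It is a minimal sketch in Python; the helper name `build_doc_prompt` and its parameters are hypothetical, and how you actually collect the snippets depends on your own tooling.

```python
def build_doc_prompt(function_src: str,
                     callee_srcs: list[str],
                     existing_comments: str = "",
                     usage_context: str = "") -> str:
    """Assemble a context-rich prompt for documenting one C++ function."""
    parts = [
        "You are documenting C++ code. Use ONLY the material below; "
        "if something is not shown, say so instead of guessing.",
        "\n## Function to document\n" + function_src,
    ]
    if callee_srcs:
        parts.append("\n## Callees (for reference)\n" + "\n\n".join(callee_srcs))
    if existing_comments:
        parts.append("\n## Existing comments / partial docs\n" + existing_comments)
    if usage_context:
        parts.append("\n## Known usage context\n" + usage_context)
    parts.append("\nWrite the documentation as: Summary, Parameters, "
                 "Return Value, Exceptions, Example usage.")
    return "\n".join(parts)
```
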
2. Use a Chunking and Retrieval Approach
- Chunk the source code:
  - Large codebases can exceed token limits if fed in full. Break the code into logical chunks (e.g., classes, modules, or namespaces).
  - For each chunk, consider generating or refining documentation separately.
- Retrieval-Augmented Generation (RAG), sketched after this list:
  - Before calling the LLM, use a retrieval system (e.g., a vector database) to find the most relevant code snippets or references.
  - Feed those into the model as context so that the LLM can ground its output in the actual codebase.
  - This approach reduces the chance of the model hallucinating because it is “reminded” of real code snippets.
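
Here is a minimal sketch of the chunk-then-retrieve idea, with two loudly stated simplifications: chunking is done with a crude regex (a real pipeline would use a proper parser such as libclang), and keyword overlap stands in for embedding similarity against a vector database. Both function names are illustrative.

```python
import re

def chunk_cpp(source: str) -> list[str]:
    """Crude chunking: split where a line looks like the start of a top-level
    class/struct/namespace or function definition."""
    starts = [m.start() for m in re.finditer(
        r"^(?:class|struct|namespace|template|[A-Za-z_][\w:<>,*& ]*\s[\w:~]+\s*\()",
        source, re.M)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    starts.append(len(source))
    return [source[a:b].strip() for a, b in zip(starts, starts[1:]) if source[a:b].strip()]

def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Keyword-overlap scoring as a stand-in for an embedding/vector-DB lookup."""
    q = set(re.findall(r"\w+", query.lower()))
    ranked = sorted(chunks,
                    key=lambda c: len(q & set(re.findall(r"\w+", c.lower()))),
                    reverse=True)
    return ranked[:k]
```

The retrieved chunks can then be passed to a prompt builder like the one sketched in Section 1.
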
3. Ground the Model in Established References
- Reference library documentation or language specs (see the sketch after this list):
  - Provide authoritative sources (e.g., cppreference.com sections) relevant to the code at hand.
  - If a function uses specific standard library constructs, feed the model the official documentation excerpt to anchor its explanations.
- Link to your project’s style guides or design docs:
  - If your project has internal references—like a design doc or style guide—feed that text to the model.
  - This provides a “truth” baseline that the LLM can rely on when generating documentation.
- Leverage code comments as “mini ground-truth”:
  - If certain behaviors are well documented in comments (e.g., performance constraints, multi-threading assumptions), highlighting those for the LLM reduces the odds of misinterpretation.
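
As a small illustration of the grounding step, the hypothetical helper below appends authoritative excerpts (say, a cppreference paragraph, a style-guide rule, or a key code comment) to the prompt and tells the model to treat them as ground truth; `add_references` and the prompt wording are assumptions, not an established API.

```python
def add_references(prompt: str, references: dict[str, str]) -> str:
    """Append authoritative excerpts and instruct the model to stay within them."""
    blocks = [prompt, "\n## Authoritative references (treat as ground truth)"]
    for title, text in references.items():
        blocks.append(f"\n### {title}\n{text}")
    blocks.append("\nIf the code appears to contradict a reference, point out the "
                  "conflict explicitly rather than papering over it.")
    return "\n".join(blocks)
```
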
4. Apply Verification and Testing Loops
- Use a human-in-the-loop approach:
  - Have developers or domain experts review and refine the generated documentation.
  - A quick pass can catch glaring errors or omissions before they make it into a final docs set.
- Automate tests to confirm correctness (sketched after this list):
  - For example, generate usage examples from the documentation snippets, then compile and run them.
  - Confirm that the sample code compiles, executes as documented, and produces expected outputs.
- Cross-check with static analysis or compiler warnings:
  - Tools like Clang-Tidy or static analyzers can reveal any mismatch between the documentation’s claims and the code’s actual behavior.
  - If the doc claims “no exceptions thrown,” but the function is annotated to throw an exception, that’s a red flag.
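
One way to automate the “compile the examples” check is sketched below. It assumes the generated docs embed C++ samples in cpp-tagged fenced code blocks and that g++ is on the PATH; the function name and the extraction regex are illustrative.

```python
import pathlib
import re
import subprocess
import tempfile

def check_doc_examples(doc_markdown: str) -> list[str]:
    """Extract C++ code blocks from generated docs and verify that each one at
    least passes a syntax-only compile. Returns compiler output for failures."""
    failures = []
    blocks = re.findall(r"```(?:cpp|c\+\+)\n(.*?)```", doc_markdown, re.S)
    for i, code in enumerate(blocks):
        with tempfile.TemporaryDirectory() as tmp:
            src = pathlib.Path(tmp) / f"example_{i}.cpp"
            src.write_text(code)
            result = subprocess.run(
                ["g++", "-std=c++17", "-fsyntax-only", str(src)],
                capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(f"example {i}:\n{result.stderr}")
    return failures
```

Anything this check flags can be fed back to the model for repair or routed to a human reviewer before the docs ship.
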
5. Use Model Selection and Fine-Tuning Strategies
- Pick domain-specific or fine-tuned models where possible:
  - General-purpose LLMs can be good for broad tasks, but if you can fine-tune or customize a model using your project’s own code-and-doc pairs, you’ll reduce hallucinations and increase accuracy.
- Parameter selection and prompt engineering (see the sketch after this list):
  - Experiment with temperature (lower values yield more deterministic, less “creative” output).
  - Use system prompts or special instructions that emphasize factual correctness over creativity.
- Continuously refine with iterative feedback:
  - If you find repeated errors—say, misunderstandings about a certain library or pattern—update your training data or prompt instructions to clarify those points in subsequent generations.
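
For example, a low-temperature call with a correctness-focused system prompt might look like the following sketch. It uses the OpenAI Python client purely as an illustration; the model name, temperature value, and prompt wording are placeholders to adapt to whichever provider or fine-tuned model you actually use.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You document C++ code. Be factual and conservative: describe only behavior "
    "visible in the provided code and references, and label anything uncertain "
    "as an Assumption. Do not invent parameters, exceptions, or side effects."
)

def generate_docs(prompt: str, model: str = "gpt-4o") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0.1,  # low temperature: more deterministic, less "creative"
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
```
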
6. Enforce Transparency in the Output
- Encourage the model to cite code references (see the check sketched after this list):
  - Ask the LLM to reference line numbers or file paths.
  - This reduces the likelihood of purely invented statements and makes it easier to verify the doc’s claims.
- Include disclaimers for uncertain outputs:
  - If the model is not sure about a particular function’s use case or edge cases, encourage it to mark the statement explicitly as an “Assumption” or “Potential Issue.”
  - This signals to human reviewers where additional verification is needed.
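
A lightweight post-check can confirm that these transparency conventions are actually followed, for example that the generated text cites file:line locations and marks its assumptions. The citation regex and report shape below are assumptions about how your docs reference code.

```python
import re

CITATION = re.compile(r"\b[\w./-]+\.(?:h|hpp|cc|cpp):\d+\b")  # e.g. src/parser.cpp:142

def transparency_report(doc_text: str) -> dict:
    """Summarize how well a generated doc supports verification by reviewers."""
    return {
        "citations": CITATION.findall(doc_text),
        "has_citations": bool(CITATION.search(doc_text)),
        "assumptions_flagged": len(re.findall(r"\b(Assumption|Potential Issue)\b", doc_text)),
    }
```
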
7. Post-Processing and Analysis
- Linter or rule-based checks on the generated docs (sketched after this list):
  - Simple rule-based systems can check for contradictory statements (e.g., “This function returns void” vs. “It returns an integer”).
  - Highlight and correct these conflicts automatically or send them for human review.
- Document structure templates:
  - Provide a standard doc structure (e.g., “Summary,” “Parameters,” “Return Value,” “Exceptions,” “Example usage,” etc.).
  - Using a consistent format helps both the LLM and your reviewers quickly identify missing or incorrect sections.
- Monitor doc usage and user feedback:
  - If users frequently ask about certain documented functions or report confusion, you know the doc might need refinement.
  - Collect feedback to iteratively improve the generation pipeline.
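
A rule-based doc linter along the lines of the first bullet might look like the sketch below: it checks that the template’s required sections are present and flags one simple contradiction (a void signature paired with prose describing a return value). The section names and the contradiction heuristic are assumptions to adapt to your own template.

```python
import re

REQUIRED_SECTIONS = ["Summary", "Parameters", "Return Value", "Exceptions", "Example usage"]

def lint_doc(doc_text: str, cpp_signature: str) -> list[str]:
    """Rule-based sanity checks on a single generated doc entry."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS
                if s.lower() not in doc_text.lower()]
    # Crude contradiction check: signature returns void, but the doc describes a value.
    if re.match(r"\s*void\b", cpp_signature) and re.search(
            r"returns\s+(an?\s+)?(int|bool|string|value|pointer|reference)",
            doc_text, re.I):
        problems.append("doc describes a return value, but the signature returns void")
    return problems
```
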
Putting It All Together
By combining high-quality context, retrieval methods, human reviews, and iterative refinement, you can significantly improve the accuracy of LLM-generated C++ documentation and reduce hallucinations. Always view LLM output as a draft that needs validation and testing—especially for complex or business-critical APIs. With these strategies in place, LLMs can become powerful documentation aids that save time and ensure codebases remain well-documented and understandable.