hallucination - chunhualiao/public-docs GitHub Wiki

Summary of Strategies

  • Provide complete input (code, callees)
  • Ask LLMs to check whether the input is sufficient and to refuse the request if it is not; use self- or third-party critique and repair
  • Use and test the generated docs: generate unit tests from the documentation, then run them against the original functions
  • Cross-check with static analysis
  • Human-in-the-loop: collect labels or ratings as feedback to fine-tune the models

Generating documentation for C++ (or any other language) using large language models (LLMs) is a powerful technique, but ensuring correctness and avoiding hallucinations (i.e., plausible-sounding but incorrect outputs) can be challenging. Below are several strategies and best practices:


1. Provide High-Quality, Context-Rich Input

  1. Give the model the complete source or relevant snippets (a prompt-assembly sketch follows this list):

    • If the model doesn’t see all necessary classes, functions, or usage examples, it is more likely to produce incorrect or incomplete documentation.
    • Ensure that all dependencies or relevant parts of the codebase are visible to the model—either through direct input (if feasible) or via a retrieval method that surfaces relevant snippets or references.
  2. Include relevant comments and docstrings:

    • If your code already contains comments or partial documentation, feed them to the model as well.
    • The model can then refine existing docs instead of inventing them from scratch.
  3. Provide usage context:

    • If a function is part of a library or framework with established conventions, share that context.
    • For example, if your class is an adapter pattern for a known library, mention that explicitly.
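
To make this concrete, here is a minimal Python sketch of assembling such a prompt: the target function, its callees, and any existing comments are spliced into one request that tells the model to rely only on the code shown. The function names and comment text are invented placeholders; a real pipeline would extract them from the codebase (for example with libclang or ctags).

```python
# Minimal sketch of assembling a context-rich prompt for one C++ function.
# All code snippets below are placeholder strings; in a real pipeline they
# would be extracted from the codebase (e.g., with libclang or ctags).

def build_doc_prompt(target_fn: str, callees: list[str], existing_comments: str) -> str:
    parts = [
        "You are documenting C++ code. Use ONLY the code shown below.",
        "If information is missing, say so instead of guessing.",
        "\n--- Target function ---\n" + target_fn,
    ]
    for i, callee in enumerate(callees, 1):
        parts.append(f"\n--- Callee {i} ---\n" + callee)
    if existing_comments:
        parts.append("\n--- Existing comments to refine ---\n" + existing_comments)
    parts.append("\nWrite Doxygen-style documentation for the target function.")
    return "\n".join(parts)

# Example usage with invented snippets:
prompt = build_doc_prompt(
    target_fn="int clamp_add(int a, int b);  // declared in math_utils.h",
    callees=["bool would_overflow(int a, int b);"],
    existing_comments="// Saturating addition used by the fixed-point layer.",
)
print(prompt)
```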

2. Use a Chunking and Retrieval Approach

  1. Chunk the source code:

    • Large codebases can exceed token limits if fed in full. Break the code into logical chunks (e.g., classes, modules, or namespaces).
    • For each chunk, consider generating or refining documentation separately.
  2. Retrieval-Augmented Generation (RAG):

    • Before calling the LLM, use a retrieval system (e.g., a vector database) to find the most relevant code snippets or references.
    • Feed those into the model as context so that the LLM can ground its output in the actual codebase.
    • This approach reduces the chance of the model hallucinating because it is “reminded” of real code snippets; a toy retrieval sketch follows this list.
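
The sketch below uses TF-IDF similarity from scikit-learn as a stand-in for an embedding model and vector database. The code chunks and the query are invented placeholders; a real pipeline would chunk by class or namespace and retrieve with proper embeddings.

```python
# Toy retrieval sketch: rank code chunks by TF-IDF similarity to a query and
# splice the top hits into the prompt. The chunks are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "class RingBuffer { void push(int v); int pop(); bool empty() const; };",
    "namespace net { int open_socket(const char* host, int port); }",
    "template <typename T> T clamp(T v, T lo, T hi);",
]
query = "document the RingBuffer::push function"

matrix = TfidfVectorizer().fit_transform(chunks + [query])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

top_k = scores.argsort()[::-1][:2]            # two most relevant chunks
context = "\n\n".join(chunks[i] for i in top_k)
prompt = f"Relevant code:\n{context}\n\nTask: {query}"
print(prompt)
```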

3. Ground the Model in Established References

  1. Reference library documentation or language specs:

    • Provide authoritative sources (e.g., cppreference.com sections) relevant to the code at hand.
    • If a function uses specific standard library constructs, feed the model the official documentation excerpt to anchor its explanations.
  2. Link to your project’s style guides or design docs:

    • If your project has internal references—like a design doc or style guide—feed that text to the model.
    • This provides a “truth” baseline that the LLM can rely on when generating documentation.
  3. Leverage code comments as “mini ground-truth”:

    • If certain behaviors are well documented in comments (e.g., performance constraints, multi-threading assumptions), highlighting those for the LLM reduces the odds of misinterpretation; a small comment-harvesting sketch follows this list.
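
As one way to exploit those comments, the sketch below harvests line comments that sit directly above a declaration so they can be passed to the model as ground truth. The header text is an invented example and the regex is deliberately crude; a production tool would pair comments with declarations via libclang.

```python
# Sketch: harvest existing comments as "mini ground-truth" to anchor the model.
# Matches runs of // comments that sit directly above a declaration.
import re

header = """
// Thread-safe; may block up to timeout_ms.
int fetch(int key, int timeout_ms);

// Does NOT throw; returns -1 on error.
int try_fetch(int key);
"""

pattern = re.compile(r"((?://[^\n]*\n)+)\s*([^\n;{]+[;{])")
ground_truth = [
    {"comment": m.group(1).strip(), "declaration": m.group(2).strip()}
    for m in pattern.finditer(header)
]

for item in ground_truth:
    print(item["declaration"], "->", item["comment"])
```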

4. Apply Verification and Testing Loops

  1. Use a human-in-the-loop approach:

    • Have developers or domain experts review and refine the generated documentation.
    • A quick pass can catch glaring errors or omissions before they make it into a final docs set.
  2. Automate tests to confirm correctness:

    • For example, generate usage examples from the documentation snippets, then compile and run them.
    • Confirm that the sample code compiles, executes as documented, and produces the expected outputs; a compile-check sketch follows this list.
  3. Cross-check with static analysis or compiler warnings:

    • Tools like Clang-Tidy or static analyzers can reveal any mismatch between the documentation’s claims and the code’s actual behavior.
    • If the doc claims “no exceptions are thrown” but the function is not declared noexcept (or visibly throws), that’s a red flag.
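
Here is a rough sketch of the compile-check idea from point 2: pull the C++ usage examples out of a generated doc and verify that they at least compile. It assumes a C++ compiler is available on PATH as c++; running the binaries and diffing their output against the documented behavior would be the natural next step.

```python
# Sketch: extract C++ usage examples from a generated doc and check that they
# compile. Assumes a compiler named "c++" is on PATH (clang++ or g++ both work).
import pathlib
import re
import subprocess
import tempfile

FENCE = "`" * 3  # triple backtick, spelled out to avoid nesting code fences here

generated_doc = (
    "Example usage:\n"
    f"{FENCE}cpp\n"
    "#include <algorithm>\n"
    "#include <iostream>\n"
    'int main() { std::cout << std::clamp(7, 0, 5) << "\\n"; }\n'
    f"{FENCE}\n"
)

examples = re.findall(FENCE + r"cpp\n(.*?)" + FENCE, generated_doc, flags=re.S)

for i, code in enumerate(examples):
    src = pathlib.Path(tempfile.mkdtemp()) / f"example_{i}.cpp"
    src.write_text(code)
    result = subprocess.run(
        ["c++", "-std=c++17", "-fsyntax-only", str(src)],
        capture_output=True,
        text=True,
    )
    status = "OK" if result.returncode == 0 else "FAILED"
    print(f"example {i}: {status}")
    if result.stderr:
        print(result.stderr)
```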

5. Use Model Selection and Fine-Tuning Strategies

  1. Pick domain-specific or fine-tuned models where possible:

    • General-purpose LLMs can be good for broad tasks, but if you can fine-tune or customize a model using your project’s own code-and-doc pairs, you’ll reduce hallucinations and increase accuracy.
  2. Parameter selection and prompt engineering:

    • Experiment with temperature (lower values yield more deterministic, less “creative” output).
    • Use system prompts or special instructions that emphasize factual correctness over creativity; a short sketch follows this list.
  3. Continuously refine with iterative feedback:

    • If you find repeated errors—say, misunderstandings about a certain library or pattern—update your training data or prompt instructions to clarify those points in subsequent generations.
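
The sketch below illustrates points 1 and 2 using the OpenAI Python SDK (version 1.x) purely as an example; any chat-completion API with a system prompt and a temperature parameter works the same way. The model name is illustrative, and an API key is assumed to be set in the environment.

```python
# Sketch of prompt and parameter choices that bias toward factual output.
# The SDK, model name, and wording are illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "You document C++ code. Base every statement on the code provided. "
    "If something cannot be determined from the code, write 'UNKNOWN' "
    "rather than guessing."
)

def document(code: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",      # illustrative model name
        temperature=0.1,          # low temperature: more deterministic, less "creative"
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Document this function:\n{code}"},
        ],
    )
    return response.choices[0].message.content
```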

6. Enforce Transparency in the Output

  1. Encourage the model to cite code references:

    • Ask the LLM to reference line numbers or file paths.
    • This reduces the likelihood of purely invented statements and makes it easier to verify the doc’s claims.
  2. Include disclaimers for uncertain outputs:

    • If the model is not sure about a particular function’s use case or edge cases, encourage it to flag the statement explicitly as an “Assumption” or “Potential Issue.”
    • This signals to human reviewers where additional verification is needed; a small prompt-and-check sketch follows this list.
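
One lightweight way to enforce this is sketched below: an instruction block that asks for [file:line] citations and explicit assumption markers, plus a small check that routes docs with no citations (or with assumptions) to human review. The citation format and marker text are arbitrary conventions you would define for your own pipeline.

```python
# Sketch: ask the model to cite code references and mark assumptions, then
# flag generated docs that lack citations or contain assumption markers.
import re

TRANSPARENCY_INSTRUCTIONS = (
    "For every claim, cite the source as [file:line]. "
    "Prefix anything you cannot verify from the code with 'ASSUMPTION:'."
)

CITATION = re.compile(r"\[[\w./-]+:\d+\]")   # e.g. [ring_buffer.h:42]

def needs_review(doc_text: str) -> bool:
    """Flag docs with no citations or with explicit assumption markers."""
    return not CITATION.search(doc_text) or "ASSUMPTION:" in doc_text

print(needs_review("push() appends a value [ring_buffer.h:42]."))  # False
print(needs_review("ASSUMPTION: push() is thread-safe."))           # True
```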

7. Post-Processing and Analysis

  1. Linter or rule-based checks on the generated docs:

    • Simple rule-based systems can check for contradictory statements (e.g., “This function returns void” vs. “It returns an integer”).
    • Highlight and correct these conflicts automatically or send them for human review; a consistency-check sketch follows this list.
  2. Document structure templates:

    • Provide a standard doc structure (e.g., “Summary,” “Parameters,” “Return Value,” “Exceptions,” “Example usage,” etc.).
    • Using a consistent format helps both the LLM and your reviewers quickly identify missing or incorrect sections.
  3. Monitor doc usage and user feedback:

    • If users frequently ask about certain documented functions or report confusion, you know the doc might need refinement.
    • Collect feedback to iteratively improve the generation pipeline.
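
As an example of a rule-based check from point 1, the sketch below compares the declared return type of a C++ function against what the generated “Return Value” section claims. The declaration, doc text, and regex are simplified illustrations; real checks would cover more claim types (exceptions, thread safety, parameter counts).

```python
# Sketch of a rule-based consistency check: does the doc's return-value claim
# contradict the declared return type?
import re

def declared_return_type(declaration: str) -> str:
    # Crude: take the first token of the declaration ("int foo(...)" -> "int").
    return declaration.strip().split()[0]

def doc_claims_void(doc: str) -> bool:
    return bool(re.search(r"returns\s+void|does not return a value", doc, re.I))

declaration = "int count_items(const std::vector<int>& v);"
doc = "Return Value: This function returns void."

if doc_claims_void(doc) and declared_return_type(declaration) != "void":
    print("Contradiction: doc says void, declaration says",
          declared_return_type(declaration))
```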

Putting It All Together

By combining high-quality context, retrieval methods, human reviews, and iterative refinement, you can significantly improve the accuracy of LLM-generated C++ documentation and reduce hallucinations. Always view LLM output as a draft that needs validation and testing—especially for complex or business-critical APIs. With these strategies in place, LLMs can become powerful documentation aids that save time and ensure codebases remain well-documented and understandable.