HN Companion Wiki Home - hncompanion/browser-extension GitHub Wiki
This page covers two topics: summarizing threaded discussions with LLMs, and a performance analysis of various models.
We explored various methods to effectively capture the essence of threaded discussions using Large Language Models (LLMs). Our successful approach involves structured comment representation and specialized prompting.
Each comment is represented using a hierarchical path notation:
[discussion_hierarchy] Author Name: <comment>
where `discussion_hierarchy` is a dot-separated numeric path describing parent-child relationships. For example:
[1] author1: First reply to the post
[1.1] author2: First reply to [1]
[1.1.1] author3: Second-level reply to [1.1]
[1.2] author4: Second reply to [1]
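The path notation can be produced with a short depth-first walk over a nested comment tree. A minimal Python sketch, assuming a hypothetical `Comment` structure (the field names are illustrative, not the extension's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Comment:
    # Hypothetical comment structure; field names are illustrative.
    author: str
    text: str
    replies: list["Comment"] = field(default_factory=list)

def format_thread(comments: list[Comment], prefix: str = "") -> list[str]:
    """Render comments as '[path] author: text' lines, depth-first."""
    lines = []
    for i, c in enumerate(comments, start=1):
        path = f"{prefix}.{i}" if prefix else str(i)
        lines.append(f"[{path}] {c.author}: {c.text}")
        lines.extend(format_thread(c.replies, path))
    return lines

# Reproducing the example above:
thread = [
    Comment("author1", "First reply to the post", [
        Comment("author2", "First reply to [1]", [
            Comment("author3", "Second-level reply to [1.1]"),
        ]),
        Comment("author4", "Second reply to [1]"),
    ]),
]
for line in format_thread(thread):
    print(line)
# [1] author1: First reply to the post
# [1.1] author2: First reply to [1]
# [1.1.1] author3: Second-level reply to [1.1]
# [1.2] author4: Second reply to [1]
```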
We use a carefully crafted system prompt that guides the LLM to provide structured, meaningful summaries:
You are an AI assistant specialized in summarizing Hacker News discussions. Your task is to provide concise, meaningful summaries that capture the essence of the thread without losing important details. Follow these guidelines:
1. Identify and highlight the main topics and key arguments.
2. Capture diverse viewpoints and notable opinions.
3. Analyze the hierarchical structure of the conversation, paying close attention to the path numbers (e.g., [1], [1.1], [1.1.1]) to track reply relationships.
4. Note where significant conversation shifts occur.
5. Include brief, relevant quotes to support main points.
6. Maintain a neutral, objective tone.
7. Aim for a summary length of 150-300 words, adjusting based on thread complexity.
Input Format:
The conversation will be provided as text with path-based identifiers showing the hierarchical structure of the comments: [path_id] Author: Comment
This list is sorted based on relevance and engagement, with the most active and engaging branches at the top.
Example:
[1] author1: First reply to the post
[1.1] author2: First reply to [1]
[1.1.1] author3: Second-level reply to [1.1]
[1.2] author4: Second reply to [1]
Your output should be well-structured, informative, and easily digestible for someone who hasn't read the original thread. Use markdown formatting for clarity and readability.
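Putting the pieces together: the system prompt and the formatted thread become a standard chat-completion message list, and the input-format note about ordering branches by engagement can be approximated with a simple proxy such as branch size. A sketch under those assumptions (dict-based comments with a `replies` field are illustrative, and branch size is just one possible engagement score):

```python
def branch_size(comment: dict) -> int:
    """Engagement proxy: total comments in a branch (self + all descendants)."""
    return 1 + sum(branch_size(r) for r in comment.get("replies", []))

def rank_branches(top_level: list[dict]) -> list[dict]:
    """Order top-level branches most-active-first, as the prompt expects."""
    return sorted(top_level, key=branch_size, reverse=True)

def build_messages(system_prompt: str, thread_text: str) -> list[dict]:
    """Standard chat-completion payload: system prompt + formatted thread."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": thread_text},
    ]
```

The resulting message list can be sent to any chat-style LLM API; richer engagement scores (votes, recency) would slot into `rank_branches` the same way.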
We tested multiple LLM providers and observed distinct characteristics:
Provider 1:
- Best understanding of thread structure
- Most accurate at following the system prompt
- Consistent inclusion of contextual back-references

Provider 2:
- Generates more concise summaries
- Fewer back-references to original comments
- Good balance of accuracy and brevity

Provider 3 (preferred model: Llama 3.2):

Advantages:
- Fast processing speed
- Good for quick summaries

Limitations:
- Less comprehensive back-referencing
- May miss some nuanced relationships

Provider 4:

Best Use Cases:
- Individual comments
- Brief discussion threads

Limitations:
- Struggles with full comment pages
- Prone to hallucination with complex threads
- Limited context window
Based on this analysis, our best practices are:
- Match model selection to content length and complexity
- Use hierarchical paths consistently for better thread tracking
- Consider performance vs accuracy tradeoffs when choosing models
- Monitor and validate back-references for accuracy
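The last point can be automated: every valid path appears in the input thread, so any path cited in a summary but absent from the input is a hallucinated back-reference. A minimal sketch, assuming the bracketed `[1.2.3]` notation shown above:

```python
import re

# Matches bracketed path IDs like [1], [1.1], [2.3.1]
PATH_RE = re.compile(r"\[(\d+(?:\.\d+)*)\]")

def invalid_backrefs(thread_text: str, summary: str) -> set[str]:
    """Return path IDs cited in the summary that never appear in the thread."""
    valid = set(PATH_RE.findall(thread_text))
    cited = set(PATH_RE.findall(summary))
    return cited - valid

thread = "[1] a: hi\n[1.1] b: reply"
summary = "Commenters agreed ([1], [1.1]), though [2.3] raised doubts."
print(invalid_backrefs(thread, summary))  # {'2.3'}
```

An empty result means every back-reference in the summary points at a real comment; anything returned can be flagged or stripped before the summary is shown.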