AI: Training AI Model - RyanL2004/teamlyse GitHub Wiki
Is SAMSum the Best Dataset for Training?
The SAMSum dataset is one of the best available for conversational summarization, but it's not a perfect fit for every aspect of your AI. Here's why:
Best for:
- Dialogue-based meetings (since it's built from human conversations).
- Short summaries (great for summarizing meeting points).
Limitations:
- No structured meeting minutes.
- Lacks focus on action items, insights, or solutions.
- Mostly text-based chat conversations, not real spoken meetings.
What We Can Do Next
To make your AI do more than summarize, we should:
Fine-tune with a custom dataset that includes:
- Meeting summaries with key takeaways.
- Real-world corporate meeting transcripts.
- Data from decision-making discussions.
Combine SAMSum with another dataset:
- AMI Corpus (annotated meeting transcripts): corporate discussions.
- ICSI Corpus (meeting recordings): long-form meeting insights.
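
Combining the corpora mostly comes down to mapping each one onto a shared record shape before training. The sketch below is one possible way to do that in plain Python. SAMSum records do carry `dialogue` and `summary` fields, but the meeting-corpus fields used here (`turns`, `abstract`) and all function names are hypothetical placeholders, since the actual layout depends on how the AMI or ICSI data is obtained.

```python
import random


def from_samsum(record):
    # SAMSum records carry "dialogue" and "summary" text fields.
    return {
        "dialogue": record["dialogue"],
        "summary": record["summary"],
        "source": "samsum",
    }


def from_meeting_corpus(record, source):
    # ASSUMPTION: "turns" is a list of (speaker, text) pairs and "abstract"
    # is the reference summary. These field names are hypothetical; adapt
    # them to the real AMI/ICSI export format.
    dialogue = "\n".join(f"{speaker}: {text}" for speaker, text in record["turns"])
    return {"dialogue": dialogue, "summary": record["abstract"], "source": source}


def combine(samsum_records, meeting_records, meeting_source="ami", seed=0):
    # Map both corpora onto the same schema, then shuffle so that training
    # batches mix chat-style and meeting-style examples.
    combined = [from_samsum(r) for r in samsum_records]
    combined += [from_meeting_corpus(r, meeting_source) for r in meeting_records]
    random.Random(seed).shuffle(combined)
    return combined
```

Keeping a `source` field on every record makes it possible to measure later whether the model does worse on meeting transcripts than on chat dialogues.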
Will the Model Provide Real-Time Insights & Solutions?
Right Now: Focusing on Summarization
For now, our fine-tuned model will:
- Summarize meetings at the end.
- Extract the most important points.
- Identify who said what.
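
For the "who said what" part, SAMSum-style dialogues already put the speaker name before a colon on each line, so attribution can start from simple parsing rather than from the model. The following is a minimal sketch under that assumption; the function names are illustrative, and real meeting transcripts would likely need speaker diarization output instead.

```python
from collections import defaultdict


def split_turns(dialogue):
    # Parse "Speaker: utterance" lines into (speaker, text) pairs.
    # Limitation: a continuation line that itself contains a colon
    # will be mistaken for a new speaker turn.
    turns = []
    for line in dialogue.splitlines():
        line = line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip():
            turns.append((speaker.strip(), text.strip()))
        elif turns:
            # No colon found: treat the line as a continuation of the last turn.
            prev_speaker, prev_text = turns[-1]
            turns[-1] = (prev_speaker, f"{prev_text} {line}".strip())
    return turns


def utterances_by_speaker(dialogue):
    # Group every utterance under the speaker who said it, in order.
    grouped = defaultdict(list)
    for speaker, text in split_turns(dialogue):
        grouped[speaker].append(text)
    return dict(grouped)
```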
Future Goal: AI That Gives Real-Time Insights
To make your AI more intelligent and interactive, we need additional models and fine-tuning to:
- Analyze statements and problems during a meeting.
- Provide recommendations and solutions.
- Recognize decision-making patterns and best practices.
How Can We Achieve This?
- Combine BART with NLP reasoning models (like DeepSeek or OpenAI GPT-3.5).
- Fine-tune it on corporate decision-making datasets.
- Use real-time NLP processing with AI agents.
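
One way to picture the combination of a summarizer with a reasoning model is a rolling window over the live transcript: every few utterances, the recent context is summarized and the summary is handed to a second model for recommendations. The sketch below only shows that control flow. The class name and the `summarize` and `advise` callables are hypothetical stand-ins for the fine-tuned BART model and a reasoning model; no real model API is called here.

```python
from collections import deque


class RollingMeetingAnalyzer:
    """Summarize the most recent utterances at a fixed interval."""

    def __init__(self, summarize, advise, window=20, every=5):
        # summarize: callable(str) -> str, e.g. a wrapper around BART (assumed).
        # advise: callable(str) -> str, e.g. a wrapper around a reasoning model (assumed).
        self.summarize = summarize
        self.advise = advise
        self.buffer = deque(maxlen=window)  # keeps only the last `window` utterances
        self.every = every
        self.count = 0

    def add_utterance(self, speaker, text):
        # Record the utterance; run the models only on every `every`-th call.
        self.buffer.append(f"{speaker}: {text}")
        self.count += 1
        if self.count % self.every != 0:
            return None
        summary = self.summarize("\n".join(self.buffer))
        return {"summary": summary, "advice": self.advise(summary)}
```

Passing the models in as plain callables keeps the loop testable with stubs and makes it easy to swap the reasoning model later.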
What's Our Next Step?
For now, we will:
- Fine-tune BART on SAMSum for summarization.
- Test how well it summarizes meetings.
- Later, add real-time intelligence.
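
To test how well the fine-tuned model summarizes, generated summaries can be compared against reference summaries with a word-overlap score. The sketch below is a simplified unigram-overlap F1 in the spirit of ROUGE-1, written in plain Python for illustration. It is not the official ROUGE implementation (no stemming, a naive tokenizer), and the function names are assumptions; an established ROUGE package would be the better choice for reported results.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a deliberately naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate, reference):
    # Unigram overlap between a generated summary and a reference summary.
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_rouge1(pairs):
    # Average score over (generated, reference) summary pairs.
    scores = [rouge1_f1(c, r) for c, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

A score near 1.0 means the generated summary uses almost the same words as the reference; a low score is a signal to inspect the outputs by hand rather than proof that the summary is wrong.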
So to Summarize:
- SAMSum is great for summarization, but we will improve on it by adding better datasets.
- Right now, we focus on summarization.
- **Later,