AI: Integration - RyanL2004/teamlyse GitHub Wiki
Steps to Create and Implement AI for Meeting Companion Using DeepThink and OpenAI Whisper
This document provides a step-by-step guide for creating and integrating AI into the Meeting Companion project. It leverages DeepThink for advanced summarization and reasoning, and OpenAI Whisper for transcription. It also explains how TensorFlow or PyTorch can be used to train or fine-tune any additional models needed to enhance the AI functionality.
Before starting, ensure the following:
- A Node.js backend to handle real-time communication and integrate APIs.
- A Python environment for AI-related tasks (e.g., summarization and transcription).
- Libraries installed: `openai`, `transformers`, `torch`, `tensorflow`, `pandas`, `numpy`, etc.
- Access to DeepThink APIs and OpenAI Whisper models.
- Speech-to-Text with OpenAI Whisper:
  - Transcribe audio input from meetings into text.
  - OpenAI Whisper is open source and offers high accuracy.
- Summarization with DeepThink:
  - Use pre-trained DeepThink summarization models for abstractive summaries.
  - Handles reasoning and contextual understanding to generate actionable insights.
- TensorFlow or PyTorch:
  - Fine-tune any additional summarization or NLP models if necessary.
  - Use for custom tasks like sentiment analysis, topic modeling, or enhancing summarization accuracy.
| Feature | Technology Used |
| --- | --- |
| Speech-to-Text (Transcription) | OpenAI Whisper |
| Summarization (Text Processing) | Hugging Face Transformers (T5, Pegasus) |
| Frameworks for Model Handling | TensorFlow, PyTorch |
| Fine-Tuning & Customization (Future Enhancement) | PyTorch or TensorFlow (if training is needed) |
We will use Hugging Face Transformers to handle summarization efficiently. DeepThink can still be an option for reasoning-based text processing if needed in the future.
| Category | Tool/Framework | Purpose |
| --- | --- | --- |
| Backend API | Flask (Python) | Handles /transcription & /summarization APIs |
| Speech-to-Text | OpenAI Whisper | Converts meetings into text |
| Summarization Model | Hugging Face Transformers | Uses T5/Pegasus for text summarization |
| ML Framework | PyTorch/TensorFlow | Supports model fine-tuning in the future |
| Frontend Integration | React (via API calls) | Displays summaries in the UI |
| Database (If Needed) | MongoDB/PostgreSQL | Stores transcriptions & summaries |
| Deployment | AWS/GCP/Azure | Deploys the AI backend |
- Integrate Hugging Face Transformers for summarization in Flask (see the sketch after this list).
- Enhance the `/transcription` endpoint to include automatic summarization.
- Expose a standalone `/summarization` endpoint for external text input.
- Optimize performance using TensorFlow/PyTorch for potential fine-tuning.
- Prepare for frontend integration via API calls to return summaries.
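To illustrate the Flask + Hugging Face integration described above, here is a minimal sketch of the standalone `/summarization` endpoint. It assumes Flask and `transformers` are installed; the `facebook/bart-large-cnn` checkpoint is one reasonable choice, not a fixed requirement.

```python
# Minimal sketch: a standalone /summarization endpoint backed by a Hugging Face pipeline.
# Assumptions: Flask and transformers are installed; the checkpoint name is illustrative.
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

@app.route("/summarization", methods=["POST"])
def summarization():
    text = request.json.get("text", "")
    summary = summarizer(text, max_length=150, min_length=40, do_sample=False)[0]["summary_text"]
    return jsonify({"summary": summary})

if __name__ == "__main__":
    app.run(debug=True)
```

The same pipeline object can be reused inside the `/transcription` endpoint to return a summary alongside the transcript.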
- Install Required Libraries:

  ```bash
  pip install openai torch torchvision transformers tensorflow
  pip install pydub openai-whisper
  ```

  Note that the Whisper package on PyPI is `openai-whisper` (the unrelated `whisper` package is a different project).
- Set Up Python Virtual Environment:

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```
- Integrate Whisper and DeepThink:
  - Clone the Whisper repository for local transcription:

    ```bash
    git clone https://github.com/openai/whisper.git
    cd whisper
    pip install -e .
    ```

  - Obtain access to DeepThink’s API or download their latest models.
- Load the Whisper Model:

  ```python
  import whisper

  # Load pre-trained Whisper model
  model = whisper.load_model("base")

  # Transcribe audio input
  result = model.transcribe("path_to_audio_file.mp3")
  print("Transcription:", result['text'])
  ```
- Real-Time Audio Input (Optional):
  - Use libraries like `pydub` or `speech_recognition` for real-time audio input (see the sketch below).
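  A minimal sketch of the `speech_recognition` route, assuming PyAudio is installed (required by `sr.Microphone`) and that `model` is the Whisper model loaded earlier; the output file name and capture length are illustrative.

  ```python
  # Minimal sketch: capture a snippet of microphone audio with speech_recognition,
  # write it to a WAV file, and transcribe it with the already-loaded Whisper model.
  # Assumptions: PyAudio is installed; `model` is the Whisper model loaded above.
  import speech_recognition as sr

  recognizer = sr.Recognizer()
  with sr.Microphone() as source:
      print("Listening...")
      audio = recognizer.listen(source, phrase_time_limit=30)  # capture up to 30 seconds

  # Write the captured audio to a WAV file that Whisper can read
  with open("live_chunk.wav", "wb") as f:
      f.write(audio.get_wav_data())

  result = model.transcribe("live_chunk.wav")
  print("Live transcription:", result["text"])
  ```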
- Integrate DeepThink’s API:
  - Example API Request:

    ```python
    import requests

    url = "https://api.deepthink.ai/summarize"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}
    payload = {"text": result['text'], "max_length": 100}

    response = requests.post(url, headers=headers, json=payload)
    summary = response.json().get("summary")
    print("Summary:", summary)
    ```
- Batch Processing for Long Meetings:
  - Split long transcriptions into smaller chunks:

    ```python
    def chunk_text(text, max_chars=500):
        return [text[i:i+max_chars] for i in range(0, len(text), max_chars)]

    chunks = chunk_text(result['text'])
    summaries = [requests.post(url, headers=headers, json={"text": chunk}).json().get("summary") for chunk in chunks]
    print("Combined Summary:", " ".join(summaries))
    ```
- Fine-Tune Models (Optional):
  - If DeepThink or Whisper’s outputs require customization, fine-tune using TensorFlow or PyTorch.
- Example with Hugging Face and PyTorch:

  ```python
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

  tokenizer = AutoTokenizer.from_pretrained("t5-small")
  model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

  text = "Meeting transcription input text here."
  inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
  summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
  print("Custom Summary:", tokenizer.decode(summary_ids[0], skip_special_tokens=True))
  ```
- TensorFlow Example:

  ```python
  import tensorflow as tf
  from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer

  model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
  tokenizer = AutoTokenizer.from_pretrained("t5-small")

  inputs = tokenizer("summarize: Meeting transcription input", return_tensors="tf", max_length=512, truncation=True)
  outputs = model.generate(inputs.input_ids, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
  print("Custom Summary:", tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
- Expose Python Scripts as APIs:
  - Use Flask or FastAPI to expose Whisper and DeepThink functionalities.

    ```python
    import tempfile

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route('/transcribe', methods=['POST'])
    def transcribe():
        # Whisper expects a file path, so save the upload to a temporary file first
        audio_file = request.files['audio']
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
            audio_file.save(tmp.name)
            result = model.transcribe(tmp.name)
        return jsonify(result['text'])

    @app.route('/summarize', methods=['POST'])
    def summarize():
        text = request.json.get("text")
        summary = requests.post(url, headers=headers, json={"text": text}).json().get("summary")
        return jsonify(summary)

    if __name__ == '__main__':
        app.run(debug=True)
    ```
- Call Python APIs from Node.js:
  - Use `axios` in Node.js to send requests to the Flask/FastAPI endpoints.
- Host Backend APIs:
  - Deploy the Flask/FastAPI app on AWS, Google Cloud, or Heroku.
- Optimize Model Performance:
  - Use TensorFlow Lite or ONNX Runtime for faster inference if required (see the sketch after this list).
- Test the Full Workflow:
  - Run end-to-end tests to validate transcription, summarization, and integration with meeting platforms.
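As referenced in the optimization step above, here is a hedged sketch of running the summarization model through ONNX Runtime via Hugging Face Optimum. It assumes `pip install optimum[onnxruntime]`; the `export=True` flag triggers on-the-fly export in recent Optimum releases, and older versions use a different argument.

```python
# Hedged sketch: serve the summarization model through ONNX Runtime via Hugging Face Optimum.
# Assumptions: optimum[onnxruntime] is installed; export=True performs the ONNX export
# (older Optimum releases used a different flag for this).
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

model = ORTModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn", export=True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

onnx_summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
print(onnx_summarizer("Meeting transcription input text here.", max_length=150, min_length=40))
```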
By leveraging OpenAI Whisper for transcription and DeepThink for summarization, along with TensorFlow or PyTorch for customization, you can build a robust and cost-effective AI Meeting Companion. This setup supports scalability, straightforward integration, and solid performance while keeping costs low.
Now that transcription and summarization are working, we can enhance the AI Meeting Companion with new features. Here are some next steps:
Right now, the summary provides a general overview.
➡️ We can extract key insights from meetings using NLP, such as:
- Decisions made
- Actionable tasks
- Deadlines and follow-ups
🔹 Solution: Use Hugging Face’s `text2text-generation` models for action item extraction.
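A minimal sketch of that idea, assuming an instruction-following checkpoint such as `google/flan-t5-base`; the checkpoint choice and prompt wording are illustrative, not a tested configuration.

```python
# Minimal sketch: prompt a text2text-generation model to pull action items from a transcript.
# Assumptions: google/flan-t5-base is an illustrative checkpoint; the prompt wording is untested.
from transformers import pipeline

extractor = pipeline("text2text-generation", model="google/flan-t5-base")

transcript = (
    "Alice will send the budget report by Friday. "
    "The team agreed to move the launch to June. "
    "Bob should follow up with the vendor next week."
)

prompt = "Extract the decisions, action items, and deadlines from this meeting transcript: " + transcript
print(extractor(prompt, max_length=128)[0]["generated_text"])
```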
Instead of processing recorded files only, we can:
- Integrate with live meeting platforms (Zoom, Teams)
- Transcribe in real-time while the meeting is happening
🔹 Solution: Implement WebSockets to handle live streams.
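A minimal sketch of the WebSocket idea, assuming the client streams 16 kHz mono 16-bit PCM chunks and that `model` is the Whisper model loaded earlier; it uses the `websockets` package, whose handler signature differs slightly across versions.

```python
# Minimal sketch: receive raw PCM audio chunks over a WebSocket and transcribe them in batches.
# Assumptions: the client sends 16 kHz mono 16-bit PCM; `model` is the Whisper model loaded earlier.
import asyncio
import numpy as np
import websockets

SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
WINDOW_SECONDS = 30  # transcribe roughly every 30 seconds of audio

async def handle_stream(websocket):  # older websockets versions also pass a `path` argument
    buffer = bytearray()
    async for chunk in websocket:
        buffer.extend(chunk)
        if len(buffer) >= SAMPLE_RATE * BYTES_PER_SAMPLE * WINDOW_SECONDS:
            # Convert PCM bytes to the float32 array format Whisper expects
            audio = np.frombuffer(bytes(buffer), dtype=np.int16).astype(np.float32) / 32768.0
            result = model.transcribe(audio)
            await websocket.send(result["text"])
            buffer.clear()

async def main():
    async with websockets.serve(handle_stream, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```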
Some users might have multilingual meetings.
➡️ Next step: Support real-time translation alongside transcription.
🔹 Solution: Use Hugging Face `mT5` or OpenAI models for multi-language summarization.
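One option that needs no extra model: Whisper itself can translate non-English speech into English text by setting `task="translate"`. A minimal sketch, with an illustrative file path:

```python
# Minimal sketch: Whisper translates non-English speech directly into English text.
# The file path is illustrative.
import whisper

model = whisper.load_model("base")
result = model.transcribe("path_to_audio_file.mp3", task="translate")
print("English translation:", result["text"])
```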
Right now, everything runs via APIs.
The next step is to integrate with the React frontend to display:
✅ Live transcriptions
✅ Summaries in a structured format
✅ User customization (e.g., adjust summarization length, languages, etc.)
Before we fine-tune the model, we need to install the necessary libraries. Ensure your virtual environment (`venv`) is activated and install the required dependencies:

```bash
pip install torch transformers datasets accelerate evaluate rouge_score
```
- `torch` → Deep learning framework for training
- `transformers` → Pre-trained NLP models (Hugging Face)
- `datasets` → Managing large datasets for NLP
- `accelerate` → Optimizes training for fast performance
- `evaluate` → Metrics for evaluating model performance
- `rouge_score` → Measures summarization accuracy
To fine-tune our summarization model, we need a high-quality dataset. Hugging Face provides several options:
| Dataset | Description |
| --- | --- |
| CNN/DailyMail | News summarization dataset (long-form summarization) |
| XSum | Single-sentence extreme summarization |
| SAMSum | Summarization of dialogue (best for conversational summaries) |

Since your AI Meeting Companion focuses on summarizing long-form transcriptions, we will start with CNN/DailyMail.
We'll use the Hugging Face Datasets library to load `cnn_dailymail`:

```python
from datasets import load_dataset

# Load the CNN/DailyMail dataset for training
dataset = load_dataset("cnn_dailymail", "3.0.0")

# Split the dataset into train and validation sets
train_data = dataset["train"]
val_data = dataset["validation"]

# Sample
print(train_data[0])
```
Since our dataset contains long-form text, we need to preprocess it by tokenizing it into input/output sequences.
```python
from transformers import BartTokenizer

# Load BART tokenizer
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")

# Define max input & output length
MAX_INPUT = 1024
MAX_TARGET = 128

# Function to tokenize input and summaries
def preprocess_data(data):
    model_inputs = tokenizer(data["article"], max_length=MAX_INPUT, truncation=True, padding="max_length")

    # Tokenize summaries
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(data["highlights"], max_length=MAX_TARGET, truncation=True, padding="max_length")

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Apply processing
tokenized_dataset = dataset.map(preprocess_data, batched=True)
```
Now, let's set up the training configuration and fine-tune `facebook/bart-large-cnn`.

```python
from transformers import BartForConditionalGeneration, TrainingArguments, Trainer

# Load BART model
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

# Training parameters
training_args = TrainingArguments(
    output_dir="./bart-summarization",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_dir="./logs",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,  # Adjust based on performance
    learning_rate=5e-5,
    weight_decay=0.01,
    save_total_limit=2,
    push_to_hub=False
)

# Trainer setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
)

# Train the model
trainer.train()
```
After fine-tuning, we need to evaluate how well the model performs using ROUGE Score, which is the standard metric for summarization.
```python
from evaluate import load

# Load ROUGE evaluator
rouge = load("rouge")

# Generate model summaries on the validation set
def compute_metrics(eval_pred):
    preds, labels = eval_pred
    preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return rouge.compute(predictions=preds, references=labels)

# Run evaluation
metrics = trainer.evaluate()
print(metrics)
```
Once satisfied with the performance, save the fine-tuned model.
```python
# Save the fine-tuned model and tokenizer
model.save_pretrained("./fine-tuned-bart")
tokenizer.save_pretrained("./fine-tuned-bart")
```

This will allow us to load it in our Flask API instead of the default `facebook/bart-large-cnn`.
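A minimal sketch of that swap, assuming the checkpoint was saved to `./fine-tuned-bart` as above and the API serves summaries through a Hugging Face summarization pipeline:

```python
# Minimal sketch: point the summarization pipeline at the fine-tuned checkpoint
# instead of the default facebook/bart-large-cnn.
from transformers import pipeline

summarizer = pipeline("summarization", model="./fine-tuned-bart", tokenizer="./fine-tuned-bart")
print(summarizer("Meeting transcription text here.", max_length=150, min_length=40)[0]["summary_text"])
```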
- 🛠️ Integrate the fine-tuned model into your AI Meeting Companion API.
- 📈 Test it with real meetings and analyze performance.
- 💡 Optimize hyperparameters to improve results.
- 🔥 Deploy the model in production (consider Hugging Face Spaces or AWS Lambda).