🏠 Transcription Aide Platform (TAP) - fuhui14/SWEN90017-2024-TAP GitHub Wiki

Transcription Aide Platform (TAP)

📘 Project Overview

🔹 Introduction

This project aims to develop a transcription platform that operates in a local environment using OpenAI's Whisper software. The platform is designed for a team working within a secure local area network (LAN), allowing team members to upload audio files for transcription without requiring user login.

🧩 Key Components

1. Web Interface

🎧 File Upload: Simple drag-and-drop upload through a web interface.
🔐 User Simplicity: No account creation or login is required, streamlining the workflow for internal team members.
✉️ Email Input: Users can enter an email address to receive transcription results.
🕓 History View & Expiry: View past files and their expiry dates (WIP).
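
A minimal sketch of the kind of server-side check the upload form might rely on, assuming only the .wav and .mp3 formats mentioned in the usage flow are accepted. The names `ALLOWED_EXTENSIONS` and `validate_upload` are hypothetical and are not taken from the project's code; the email check is deliberately loose.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumption: only these formats are accepted; the real platform may allow more.
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def validate_upload(filename: str, email: Optional[str] = None) -> Tuple[bool, str]:
    """Return (ok, message) for an uploaded file name and an optional email."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return False, f"Unsupported file type: {suffix or '(none)'}"
    # The email is optional; only a very loose shape check is applied here.
    if email:
        local, sep, domain = email.strip().partition("@")
        if not (local and sep and "." in domain):
            return False, "Email address looks malformed"
    return True, "ok"


print(validate_upload("meeting.WAV"))
print(validate_upload("notes.txt"))
```

Returning a message alongside the boolean lets the web interface show the reason for a rejection without raising an exception.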

2. Local Machine Execution

🏗️ Transcription Engine: The local machine hosts and runs OpenAI's Whisper software to transcribe uploaded audio files.
🗣️ Speaker Identification: Supports diarisation to differentiate multiple speakers.
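
To illustrate how transcription and diarisation output might be combined, the sketch below labels each transcript segment with the speaker whose turn overlaps it most. Whisper's transcription result does include a `segments` list with `start`, `end`, and `text` fields; the `(start, end, speaker)` tuple shape for diarisation turns and the function name `assign_speakers` are assumptions, not the project's actual module.

```python
from typing import Dict, List, Tuple

# Assumed diarisation output: (start_seconds, end_seconds, speaker_label)
Turn = Tuple[float, float, str]


def assign_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Label each transcript segment with the speaker of greatest time overlap."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for start, end, speaker in turns:
            # Length of the intersection of the two time intervals (negative if disjoint).
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labelled.append({**seg, "speaker": best_speaker})
    return labelled


# Hypothetical sample data in the shapes described above.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Hello everyone."},
    {"start": 4.0, "end": 9.0, "text": "Thanks for joining."},
]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]
for seg in assign_speakers(segments, turns):
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Segments that overlap no diarisation turn keep the `UNKNOWN` label rather than being guessed.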

⚙️ Technical Overview

🧱 System Architecture

The platform is built with:

Frontend: ReactJS (for user interactions)
Backend: Django (REST API to handle upload and processing)
Speech Processing: Whisper for transcription, plus a diarisation module for speaker identification

🚀 Usage Flow

1. User accesses the platform on the local network.
2. User uploads an audio file (e.g., .wav or .mp3).
3. Optionally, the user enters an email address.
4. The backend receives the audio and sends it to Whisper for transcription.
5. Once complete:
   - The transcription result is saved locally.
   - If an email was provided, the result is sent via email.
6. The interface shows a history of previously uploaded and processed files.
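
The usage flow can be sketched as a single function with its collaborators passed in. This is an illustration only: `Job`, `process_upload`, and the stub functions are hypothetical names, the `transcribe` callable stands in for the Whisper call, and the in-memory `history` list stands in for whatever local storage the backend uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    filename: str
    email: Optional[str] = None
    transcript: Optional[str] = None
    emailed: bool = False


def process_upload(
    job: Job,
    transcribe: Callable[[str], str],
    send_email: Callable[[str, str], None],
    history: List[Job],
) -> Job:
    """Transcribe the file, email the result if requested, and record the job."""
    job.transcript = transcribe(job.filename)  # stands in for the Whisper call
    if job.email:
        send_email(job.email, job.transcript)
        job.emailed = True
    history.append(job)  # stands in for saving the result locally
    return job


# Stub collaborators so the sketch runs without Whisper or a mail server.
sent = []
history: List[Job] = []


def fake_transcribe(path: str) -> str:
    return f"transcript of {path}"


def fake_send(to: str, body: str) -> None:
    sent.append((to, body))


process_upload(Job("standup.mp3", "lead@example.com"), fake_transcribe, fake_send, history)
process_upload(Job("retro.wav"), fake_transcribe, fake_send, history)
print([(j.filename, j.emailed) for j in history])
```

Passing the transcription and email functions as arguments keeps the flow testable without the real services.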

🧪 Current Features (MVP Scope)

✅ Audio file upload
✅ Transcription using Whisper
✅ Optional email notification with result
✅ Basic UI (no login)
🕓 History view (partially implemented)
⌛ File expiry logic (to be developed)
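
Since file expiry is still to be developed, the following is only one possible shape for it. The 30-day retention window and the names `RETENTION`, `expiry_date`, and `is_expired` are assumptions made for illustration; the project does not state a retention period.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a 30-day retention window; the real value is not specified.
RETENTION = timedelta(days=30)


def expiry_date(uploaded_at: datetime, retention: timedelta = RETENTION) -> datetime:
    """Return the moment at which an uploaded file should expire."""
    return uploaded_at + retention


def is_expired(uploaded_at: datetime, now: datetime, retention: timedelta = RETENTION) -> bool:
    """True once the current time has reached the expiry date."""
    return now >= expiry_date(uploaded_at, retention)


uploaded = datetime(2024, 9, 1, tzinfo=timezone.utc)
print(expiry_date(uploaded).date())
print(is_expired(uploaded, datetime(2024, 9, 15, tzinfo=timezone.utc)))
```

The same `expiry_date` value could feed the history view, so the date shown to users and the date used for clean-up cannot drift apart.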

🧠 Changelog (Release Notes)

Sprint 1

Initialize the file structure
Add meeting minutes for Sprint 1
Create personas with explanatory documentation
Implement user stories
Design the motivation model
Add Acceptance Criteria
Build low-fidelity and high-fidelity prototypes with explanatory documentation
Make technology selection

Sprint 2

Create system architecture diagrams
- Class diagram
- Use case diagram
- Sequence diagram
- Component diagram
- Domain diagram
- Deployment diagram
- Activity diagram
- ER diagram
Acknowledge speaker library
Build development environment configuration
- Front End
- Back End
Add risk management document
Create communication plan
Add mood board for high fidelity prototype

Sprint 3

Develop user stories for the priority of "Must Have"
Separate development of front end and back end
Relevant tests
Integrate back end and front end

Sprint 4

Complete remaining Must Have user stories.
Start development on Should Have & Could Have user stories.
Ensure early front-end and back-end integration to prevent last-minute issues.
Improve performance of identifying different speakers.
Begin testing for completed features.

Sprint 5

Complete all remaining Should Have and Could Have user stories.
Finalize admin portal and history features.
Complete testing framework and ensure full integration.
Optimize database, API, and file management system.
Deliver feature-complete system for Sprint 6 validation.

Sprint 6

Complete regression and acceptance testing
Finalize technical documentation, user manual, and test report
Conduct client acceptance testing and address final feedback
Prepare materials for the Endeavour Exhibition (poster, demo script, risk assessment)
Perform system handover and final delivery

🔗 Related Links

For technical setup and deployment instructions, please see: Setup Guide