The Joy of Discovery - doraithodla/notes GitHub Wiki
Finally, I feel that I understand the entire flow of a chatbot, thanks to a Rasa article. I will write it up properly after I get some code written, but here are some of the steps:
Actors:
- User
- Bot
- Model
- Vector Database
Building the model
- Read training data
- Clean data (remove punctuation)
- Tokenize sentences
- Word tokenize (preserving word order)
- Vectorize
- Store the text as a set of vectors (and their semantic links)
- Create word collocations
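
A minimal sketch of the build steps above, using only the Python standard library. Everything here is an assumption for illustration: the function names are made up, the sentence splitter is a naive regex, the vectors are plain bag-of-words counts (so word order survives only in the token lists, not in the vectors), and "collocations" are taken to mean adjacent word pairs. Sentences are split before punctuation is removed, because the splitter needs the punctuation. Semantic links between vectors are not modeled.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace (naive, not a real sentence tokenizer)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def clean(sentence):
    """Lowercase the sentence and remove punctuation."""
    return re.sub(r"[^\w\s]", "", sentence.lower())


def word_tokenize(sentence):
    """Split on whitespace, preserving word order."""
    return sentence.split()


def build_vocab(token_lists):
    """Map each distinct word to a column index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def collocations(token_lists):
    """Count adjacent word pairs (bigrams) as a simple form of collocation."""
    pairs = Counter()
    for tokens in token_lists:
        pairs.update(zip(tokens, tokens[1:]))
    return pairs


def build_model(text):
    """Run the whole pipeline and return the pieces in one dictionary."""
    sentences = split_sentences(text)
    token_lists = [word_tokenize(clean(s)) for s in sentences]
    vocab = build_vocab(token_lists)
    return {
        "sentences": sentences,
        "tokens": token_lists,
        "vocab": vocab,
        "vectors": [vectorize(t, vocab) for t in token_lists],
        "collocations": collocations(token_lists),
    }


model = build_model("The bot answers questions. The bot learns!")
```

A real version would likely swap the count vectors for embeddings and the dictionary for an actual vector database.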
Train
- Update the model with training data (add, delete). Perhaps in the first version, training is the same as building the model.
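
One way to read "train is the same as building a model" is: apply the additions and deletions to the training data, then rebuild from scratch. The sketch below assumes exactly that. `build_model` is a hypothetical stand-in (a word-to-sentence index), not the real build step, and `train` is a made-up name.

```python
def build_model(sentences):
    """Hypothetical stand-in for the build step: map each word to the indices of the sentences containing it."""
    index = {}
    for i, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            index.setdefault(word, set()).add(i)
    return index


def train(sentences, add=(), delete=()):
    """Apply additions and deletions to the training data, then rebuild the model from scratch."""
    removed = set(delete)
    updated = [s for s in sentences if s not in removed] + list(add)
    return updated, build_model(updated)


corpus = ["hello there", "goodbye now"]
corpus, model = train(corpus, add=["hello again"], delete=["goodbye now"])
```

Rebuilding every time is wasteful for large data sets, but it keeps the first version simple and makes deletes trivial.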
Test
- Test questions and answers
- Correct the weights
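
A small test harness might look like the sketch below. It only covers the first bullet: it runs test questions through the bot and collects the mismatches. How the weights get corrected depends on the model, so that part is left out; the failures list is simply the input such a correction step would need. `evaluate` and the toy `answer` function are assumptions for illustration.

```python
def evaluate(answer_fn, test_cases):
    """Run each test question through answer_fn and collect the mismatches."""
    failures = []
    for question, expected in test_cases:
        actual = answer_fn(question)
        if actual != expected:
            failures.append((question, expected, actual))
    accuracy = 1 - len(failures) / len(test_cases) if test_cases else 0.0
    return accuracy, failures


# Toy bot for demonstration: a fixed lookup table with a fallback reply.
canned = {"hi": "hello"}


def answer(question):
    return canned.get(question, "I don't know")


accuracy, failures = evaluate(answer, [("hi", "hello"), ("bye", "goodbye")])
```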
Use
- User question (prompt). Initially, only take questions (ignore prompt processing).
- Go through the process: clean, tokenize, find matches in the vector database, find the highest-probability answer, and display it.
- RLHF
  - Simulate several RLHF sessions
  - Update the model
- Version-1: Start with the LM and the chatbot
- Version-2: RLHF
- Version-3: Build, train, test the model
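
The question-answering flow under "Use" can be sketched as follows. This is an illustration under several assumptions: the "vector database" is an in-memory list, the vectors are word counts, and cosine similarity stands in for the "probability" of an answer (it is a similarity score, not a true probability). `VectorStore`, `to_vector`, and `cosine` are made-up names. RLHF is not covered.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Clean (lowercase, strip punctuation), tokenize, and count words."""
    return Counter(re.sub(r"[^\w\s]", "", text.lower()).split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for the vector database."""

    def __init__(self):
        self.entries = []  # list of (question vector, answer text)

    def add(self, question, answer):
        self.entries.append((to_vector(question), answer))

    def best_answer(self, question):
        """Return the answer whose stored question is most similar, with its score."""
        if not self.entries:
            return None, 0.0
        query = to_vector(question)
        score, answer = max(
            ((cosine(query, vec), ans) for vec, ans in self.entries),
            key=lambda pair: pair[0],
        )
        return answer, score


store = VectorStore()
store.add("What is a chatbot?", "A program that converses with users.")
store.add("How do I train the model?", "Feed it training data.")

answer, score = store.best_answer("how to train a model")
print(answer)
```

With a score available, the bot could also refuse to answer when the best match falls below some threshold instead of always displaying the top result.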