The Joy of Discovery - doraithodla/notes GitHub Wiki

I finally feel that I understand the entire flow of a chatbot, thanks to a Rasa article. I will write it up properly after I get some code written, but here are some of the steps.

Actors:

User
Bot
Model
Vector Database

Building the model

Read training data
Clean data (remove punctuation)
Tokenize sentences
Word tokenize (preserving word order)
Vectorize
Store the text as a set of vectors (and their semantic links)
Create word collocations
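
The build steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not what Rasa does internally. All function names are hypothetical, the "vectors" are plain bag-of-words counts, and collocations are approximated by counting adjacent word pairs. One deliberate change from the order listed above: sentences are split before punctuation is removed, because the naive splitter relies on the punctuation to find sentence boundaries.

```python
import re
import string
from collections import Counter

def split_sentences(text):
    # Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def clean(sentence):
    # Lowercase and remove ASCII punctuation.
    return sentence.lower().translate(str.maketrans("", "", string.punctuation))

def word_tokenize(sentence):
    # A whitespace split keeps the words in their original order.
    return sentence.split()

def build_vocabulary(tokenized_sentences):
    # Map each distinct word to a column index, in order of first appearance.
    vocab = {}
    for tokens in tokenized_sentences:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(tokens, vocab):
    # Bag-of-words count vector (an assumption; real systems often use embeddings).
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

def collocations(tokenized_sentences):
    # Count adjacent word pairs as a simple stand-in for collocations.
    counts = Counter()
    for tokens in tokenized_sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts

def build_model(text):
    sentences = split_sentences(text)
    tokenized = [word_tokenize(clean(s)) for s in sentences]
    vocab = build_vocabulary(tokenized)
    return {
        "sentences": sentences,
        "vocab": vocab,
        "vectors": [vectorize(t, vocab) for t in tokenized],
        "collocations": collocations(tokenized),
    }

model = build_model("The bot answers questions. The bot stores vectors!")
```

Here `model["vectors"]` holds one count vector per sentence, and `model["collocations"]` counts pairs such as `("the", "bot")`.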

Train

Update the model with training data (add, delete). Perhaps in the first version, training is the same as building the model.
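
A small sketch of what "add, delete" could look like, assuming the model is nothing more than an in-memory mapping from questions to answers. The `QAModel` class and its method names are hypothetical. Under this assumption, building the model is just training an empty one, which matches the idea that the two are the same in a first version.

```python
class QAModel:
    """Hypothetical in-memory model: maps a question to its answer."""

    def __init__(self):
        self.entries = {}

    def add(self, question, answer):
        # Adding an existing question overwrites its answer.
        self.entries[question] = answer

    def delete(self, question):
        # Deleting an unknown question is a no-op.
        self.entries.pop(question, None)

    def train(self, additions=(), deletions=()):
        # Apply deletions first, then additions; return the new entry count.
        for question in deletions:
            self.delete(question)
        for question, answer in additions:
            self.add(question, answer)
        return len(self.entries)

# Building from scratch is training an empty model.
qa = QAModel()
qa.train(additions=[("What is a bot?", "A program that chats.")])
```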

Test

Test questions and answers
Correct the weights

Use

User question (prompt). Initially, only take questions (ignore prompt processing)
Go through the process: clean, tokenize, find matches in the vector database, find the highest-probability answer, and display it
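
The question-answering flow above might look like the sketch below. Several things here are assumptions for illustration: the "vector database" is a small in-memory list, the vectors are word counts, and the match score is cosine similarity, which is a similarity score rather than a true probability. The names `to_vector`, `cosine`, `answer`, and the `min_score` threshold are all hypothetical.

```python
import math
import string
from collections import Counter

def to_vector(text):
    # Clean (lowercase, strip punctuation), tokenize, and count words.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Stand-in for the vector database: (known question, answer) pairs.
KNOWLEDGE = [
    ("What is a chatbot?", "A program that answers questions in conversation."),
    ("How do I train the model?", "Feed it question and answer pairs."),
]
INDEX = [(to_vector(q), a) for q, a in KNOWLEDGE]

def answer(question, min_score=0.2):
    # Score every stored question against the query and keep the best one.
    query = to_vector(question)
    score, best = max(
        ((cosine(query, vec), ans) for vec, ans in INDEX),
        key=lambda pair: pair[0],
    )
    return best if score >= min_score else "Sorry, I don't know."

print(answer("How to train a model"))
```

A linear scan is fine for a handful of entries; a real vector database would replace `INDEX` and the `max` call with an approximate nearest-neighbour lookup.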

RLHF

Simulate several RLHF sessions
Update the model
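
One way to simulate feedback sessions is sketched below. Note that this is only a toy stand-in: real RLHF trains a reward model from human preferences and then fine-tunes the language model against it, whereas this loop simply nudges a weight per candidate answer up or down. The function `simulate_sessions`, the `rate_fn` callback (a simulated human returning +1 or -1), and the learning rate `lr` are all hypothetical.

```python
import random

def simulate_sessions(candidates, rate_fn, sessions=200, lr=0.1, seed=0):
    # Every candidate answer starts with the same weight.
    rng = random.Random(seed)
    weights = {c: 1.0 for c in candidates}
    for _ in range(sessions):
        # The bot picks an answer with probability proportional to its weight.
        choice = rng.choices(candidates, weights=[weights[c] for c in candidates])[0]
        # The simulated human rates it: +1 (good) or -1 (bad).
        reward = rate_fn(choice)
        # Update the model: scale the weight up or down, with a small floor.
        weights[choice] = max(0.01, weights[choice] * (1 + lr * reward))
    return weights

def rater(answer_text):
    # Simulated human feedback for this example only.
    return 1 if answer_text.startswith("good") else -1

weights = simulate_sessions(["good answer", "bad answer"], rater)
```

After the sessions, the answer that was rated well carries more weight than the one that was rated badly, so it is picked more often.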

Version-1: Start with the LM and the chatbot
Version-2: RLHF
Version-3: Build, train, test the model