The Joy of Discovery

Finally, I feel that I understand the entire flow of a chatbot, thanks to a RASA article. I will write it up properly after I get some code written, but here are some of the steps:

Actors:

  • User
  • Bot
  • Model
  • Vector Database

Building the model

  • Read training data
  • Clean data (remove punctuation)
  • Tokenize sentences
  • Word tokenize (preserving word order)
  • Vectorize
  • Store the text as a set of vectors (and their semantic links)
  • Create word collocations
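
A minimal sketch of the build steps above, using only the Python standard library. Everything here is an assumption for illustration: the function names are hypothetical, bag-of-words counts stand in for real embeddings, a plain list stands in for the vector database, and bigram counts stand in for collocations. Sentences are split before punctuation is removed, because the splitter needs the sentence-ending marks. Semantic links between vectors are not modeled.

```python
import re
import string
from collections import Counter

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def clean(sentence):
    # Lowercase and remove ASCII punctuation.
    return sentence.lower().translate(str.maketrans("", "", string.punctuation))

def word_tokenize(sentence):
    # A whitespace split keeps the original word order.
    return sentence.split()

def vectorize(tokens):
    # Bag-of-words counts; a stand-in for a real embedding.
    return Counter(tokens)

def collocations(tokens):
    # Count adjacent word pairs (bigrams).
    return Counter(zip(tokens, tokens[1:]))

def build_model(text):
    store = []         # stand-in "vector database": one record per sentence
    pairs = Counter()  # collocation counts across all sentences
    for sentence in split_sentences(text):
        tokens = word_tokenize(clean(sentence))
        store.append({"text": sentence, "tokens": tokens, "vector": vectorize(tokens)})
        pairs.update(collocations(tokens))
    return {"store": store, "collocations": pairs}

model = build_model("The bot reads data. The bot answers questions!")
print(model["store"][0]["tokens"])
print(model["collocations"].most_common(1))
```

A real version would likely swap `vectorize` for an embedding model and `store` for an actual vector database; the shape of the pipeline stays the same.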

Train

  • Update the model with training data (add, delete). Perhaps in the first version, training is the same as building the model
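
One way to picture "add, delete" is as two operations on the stored vectors. The sketch below is only that: `VectorStore` and `train` are hypothetical names, the store is an in-memory dict, and the vectors are word counts rather than real embeddings.

```python
from collections import Counter

class VectorStore:
    """In-memory stand-in for a vector database (an assumption, not a real library)."""

    def __init__(self):
        self.records = {}  # record id -> (text, vector)
        self.next_id = 0

    def add(self, text):
        # Store the text with a bag-of-words vector and return its id.
        record_id = self.next_id
        self.records[record_id] = (text, Counter(text.lower().split()))
        self.next_id += 1
        return record_id

    def delete(self, record_id):
        # Return True if a record was removed, False if the id was unknown.
        return self.records.pop(record_id, None) is not None

def train(store, additions=(), deletions=()):
    # Apply deletions first, then add the new training texts.
    for record_id in deletions:
        store.delete(record_id)
    return [store.add(text) for text in additions]

store = VectorStore()
ids = train(store, additions=["hello world", "good bye"])
train(store, deletions=[ids[0]])
print(sorted(store.records))
```

With this shape, "train is the same as building a model" in the first version just means calling `train` once with all of the training data as additions.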

Test

  • Test questions and answers
  • Correct the weights

Use

  • User question (prompt). Initially, only take questions (ignore prompt processing)

  • Go through the process: clean, tokenize, find matches in the vector database, find the highest-probability answer, and display it

  • RLHF

  • simulate several RLHF sessions

  • Update the model

  • Version-1: Start with the LM and the chatbot

  • Version-2: RLHF

  • Version-3: Build, train, test the model
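
The query flow from the Use steps (clean, tokenize, match against the stored vectors, pick the best answer, display it) can be sketched roughly as below. The assumptions: cosine similarity over word counts stands in for a real probability, a list of question/answer records stands in for the vector database, and all names are hypothetical. The `apply_feedback` function is a crude stand-in for the simulated RLHF sessions: it only nudges a per-answer bias, whereas actual RLHF trains a reward model and fine-tunes the language model.

```python
import math
import string
from collections import Counter

def to_vector(text):
    # Clean (lowercase, strip punctuation), tokenize, and count words.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_answer(question, entries, bias=None):
    # Score every stored question against the user question, add any
    # feedback bias for its answer, and return the (score, answer) winner.
    bias = bias or {}
    q = to_vector(question)
    scored = [
        (cosine(q, to_vector(e["question"])) + bias.get(e["answer"], 0.0), e["answer"])
        for e in entries
    ]
    return max(scored)

def apply_feedback(bias, answer, reward, rate=0.1):
    # reward is +1 (thumbs up) or -1 (thumbs down) from a simulated session.
    bias[answer] = bias.get(answer, 0.0) + rate * reward

entries = [
    {"question": "What is a chatbot?", "answer": "A program that converses with users."},
    {"question": "What is a vector database?", "answer": "A store for vectors."},
]

score, reply = best_answer("Tell me about the vector database", entries)
print(reply)

# Simulate three thumbs-down sessions on the chatbot answer.
bias = {}
before = best_answer("what is a bot", entries, bias)[1]
for _ in range(3):
    apply_feedback(bias, before, reward=-1)
after = best_answer("what is a bot", entries, bias)[1]
print(before, "->", after)
```

In terms of the versions listed above, `best_answer` alone roughly corresponds to Version-1, and the feedback loop is a placeholder for what Version-2 would replace with real RLHF.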