MLOps - dtoinagn/flyingbird.github.io GitHub Wiki

ML Ops Architecture Diagram based on Kubeflow


  • LLMOps Expertise: Develop pipelines specifically tailored for managing large language models, including fine-tuning, version control, and automated deployments. Implement monitoring systems to track model performance, latency, drift, and data quality.
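As a minimal illustration of the monitoring idea, the sketch below flags drift when a tracked metric (latency here; the same pattern applies to quality scores) shifts away from its baseline distribution. The metric values and threshold are hypothetical; production systems would use tooling such as Prometheus or Evidently rather than this hand-rolled check.

```python
# Minimal drift-detection sketch (hypothetical numbers and threshold).
import statistics

def detect_drift(baseline, current, threshold=0.25):
    """Flag drift when the mean of a tracked metric (e.g. request
    latency in seconds) shifts by more than `threshold` baseline
    standard deviations away from the baseline mean."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    shift = abs(statistics.mean(current) - base_mean)
    return shift > threshold * base_std

baseline_latency = [0.41, 0.39, 0.42, 0.40, 0.43]
current_latency = [0.55, 0.58, 0.54, 0.57, 0.56]
print(detect_drift(baseline_latency, current_latency))  # prints True: latency has drifted
```

The same function can be pointed at any per-request metric the monitoring system collects, which is why drift, latency, and quality tracking often share one code path.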

  • MLOps Pipeline Development: Build scalable pipelines for model training, evaluation, and deployment, leveraging tools such as MLflow, Kubeflow, or Airflow. Ensure reproducibility and traceability of experiments and models.
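One building block of reproducibility and traceability is giving every experiment a deterministic identity derived from its inputs. MLflow and Kubeflow provide this natively; the sketch below (config keys and data are made up for illustration) just shows the underlying idea.

```python
# Sketch of experiment traceability: derive a deterministic run ID
# from the training config plus the raw training data, so the same
# inputs can always be traced back to the same experiment.
import hashlib
import json

def run_fingerprint(config: dict, data_bytes: bytes) -> str:
    payload = json.dumps(config, sort_keys=True).encode() + data_bytes
    return hashlib.sha256(payload).hexdigest()[:12]

config = {"model": "llama-3-8b", "lr": 2e-5, "epochs": 3}
fingerprint = run_fingerprint(config, b"training-data-v1")
print(fingerprint)  # identical config + data always yields the same ID
```

Any change to a hyperparameter or to the data produces a new fingerprint, which is what lets a tracking server distinguish reruns from genuinely new experiments.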

Machine Learning Workflow


Kubeflow Components involved at Experimental phase


Kubeflow Components involved at Production phase


Breakdown of Kubeflow core components


Meta Llama 3 Instruct model performance

Sample use case: increasing the accuracy of a Llama 3 8B model at generating SQL queries.


What are hallucinations?

  • Incorrect or fabricated information generated by the model
  • The LLM treats "slightly right" as right
  • Detrimental with facts and wherever precision matters (APIs, IDs, etc.). For example, an LLM generating SQL might produce a function that is not supported by the target database.
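The "unsupported function" failure mode can be caught mechanically: dry-run the generated query against the target engine before executing it for real. Below is a minimal sketch using SQLite's `EXPLAIN`, which parses and plans the query without running it (the table schema is hypothetical).

```python
# Validate LLM-generated SQL against the target engine before use.
# SQLite rejects unknown functions at prepare time, so EXPLAIN is
# enough to catch a hallucinated function name.
import sqlite3

def sql_is_valid(query: str) -> bool:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    try:
        conn.execute(f"EXPLAIN {query}")  # parse/plan only, no side effects
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(sql_is_valid("SELECT name FROM users"))               # True
print(sql_is_valid("SELECT DATEDIFF(id, id) FROM users"))   # False: SQLite has no DATEDIFF
```

A guard like this does not fix the model, but it turns a silent wrong answer into a detectable error that a pipeline can retry or escalate.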

The following common approaches help, but are not enough on their own:

  • Prompt Engineering -> 26%
  • Self-reflection -> 26-40%
  • Retrieval Augmented Generation (RAG) -> 50%
  • Instruction Fine-tuning -> 40-60%
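To make the RAG entry above concrete, the sketch below grounds the prompt in retrieved schema documentation instead of relying on the model's memory. The retriever here is naive word overlap and the table docs are invented for illustration; production systems use embedding search over a vector database.

```python
# Minimal sketch of the Retrieval-Augmented Generation idea for
# SQL generation: retrieve relevant schema docs, then build a
# grounded prompt. Retriever is naive word overlap, for illustration.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, docs: list, k: int = 1) -> list:
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

docs = [
    "Table orders: columns id, user_id, total, created_at",
    "Table users: columns id, name, email",
]
question = "What is the total of orders per user?"
context = retrieve(question, docs)[0]
prompt = f"Context:\n{context}\n\nQuestion: {question}\nSQL:"
print(prompt)
```

Because the schema is supplied in the prompt, the model no longer has to recall column names from training data, which is why RAG cuts hallucinations on grounded facts, though it cannot stop the model from misusing the context it is given.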

How can fine-tuning reduce hallucinations?

  • Embed facts into the model.
  • However, instruction fine-tuning isn't the right tool for removing hallucinations (and can be costly).
  • Memory Tuning (invented by Lamini) allows the model to recall a lot of facts precisely without compromising generalization & instruction-following.
  • Memory Tuning -> 95%