MLOps - dtoinagn/flyingbird.github.io GitHub Wiki

ML Ops Architecture Diagram based on Kubeflow


  • LLMOps Expertise: Develop pipelines specifically tailored for managing large language models, including fine-tuning, version control, and automated deployments. Implement monitoring systems to track model performance, latency, drift, and data quality.
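As a minimal illustration of the monitoring idea, the sketch below flags drift when a tracked metric (latency here; the same pattern applies to quality scores) shifts away from its baseline distribution. The metric values and threshold are hypothetical; production systems would use tooling such as Prometheus or Evidently rather than this hand-rolled check.

```python
# Minimal drift-detection sketch (hypothetical numbers and threshold).
import statistics

def detect_drift(baseline, current, threshold=0.25):
    """Flag drift when the mean of a tracked metric (e.g. request
    latency in seconds) shifts by more than `threshold` baseline
    standard deviations away from the baseline mean."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    shift = abs(statistics.mean(current) - base_mean)
    return shift > threshold * base_std

baseline_latency = [0.41, 0.39, 0.42, 0.40, 0.43]
current_latency = [0.55, 0.58, 0.54, 0.57, 0.56]
print(detect_drift(baseline_latency, current_latency))  # prints True: latency has drifted
```

The same function can be pointed at any per-request metric the monitoring system collects, which is why drift, latency, and quality tracking often share one code path.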

  • MLOps Pipeline Development: Build scalable pipelines for model training, evaluation, and deployment, leveraging tools such as MLflow, Kubeflow, or Airflow. Ensure reproducibility and traceability of experiments and models.
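One building block of reproducibility and traceability is giving every experiment a deterministic identity derived from its inputs. MLflow and Kubeflow provide this natively; the sketch below (config keys and data are made up for illustration) just shows the underlying idea.

```python
# Sketch of experiment traceability: derive a deterministic run ID
# from the training config plus the raw training data, so the same
# inputs can always be traced back to the same experiment.
import hashlib
import json

def run_fingerprint(config: dict, data_bytes: bytes) -> str:
    payload = json.dumps(config, sort_keys=True).encode() + data_bytes
    return hashlib.sha256(payload).hexdigest()[:12]

config = {"model": "llama-3-8b", "lr": 2e-5, "epochs": 3}
fingerprint = run_fingerprint(config, b"training-data-v1")
print(fingerprint)  # identical config + data always yields the same ID
```

Any change to a hyperparameter or to the data produces a new fingerprint, which is what lets a tracking server distinguish reruns from genuinely new experiments.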

Machine Learning Workflow


Kubeflow Components involved at Experimental phase


Kubeflow Components involved at Production phase


Breakdown of Kubeflow core components


Meta Llama 3 Instruct model performance

Sample use case: increasing the accuracy of a Llama 3 8B model at generating SQL queries.


What are hallucinations?

  • Incorrect or fabricated information generated by the model
  • The LLM treats "slightly right" as right
  • Detrimental with facts and wherever precision matters (APIs, IDs, etc.). For example, an LLM generating SQL might produce a function that is not supported by the target database.
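The "unsupported function" failure mode can be caught mechanically: dry-run the generated query against the target engine before executing it for real. Below is a minimal sketch using SQLite's `EXPLAIN`, which parses and plans the query without running it (the table schema is hypothetical).

```python
# Validate LLM-generated SQL against the target engine before use.
# SQLite rejects unknown functions at prepare time, so EXPLAIN is
# enough to catch a hallucinated function name.
import sqlite3

def sql_is_valid(query: str) -> bool:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    try:
        conn.execute(f"EXPLAIN {query}")  # parse/plan only, no side effects
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(sql_is_valid("SELECT name FROM users"))               # True
print(sql_is_valid("SELECT DATEDIFF(id, id) FROM users"))   # False: SQLite has no DATEDIFF
```

A guard like this does not fix the model, but it turns a silent wrong answer into a detectable error that a pipeline can retry or escalate.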

The following common approaches help, but are not enough on their own:

  • Prompt Engineering -> 26%
  • Self-reflection -> 26-40%
  • Retrieval Augmented Generation (RAG) -> 50%
  • Instruction Fine-tuning -> 40-60%
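To make the RAG entry above concrete, the sketch below grounds the prompt in retrieved schema documentation instead of relying on the model's memory. The retriever here is naive word overlap and the table docs are invented for illustration; production systems use embedding search over a vector database.

```python
# Minimal sketch of the Retrieval-Augmented Generation idea for
# SQL generation: retrieve relevant schema docs, then build a
# grounded prompt. Retriever is naive word overlap, for illustration.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, docs: list, k: int = 1) -> list:
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

docs = [
    "Table orders: columns id, user_id, total, created_at",
    "Table users: columns id, name, email",
]
question = "What is the total of orders per user?"
context = retrieve(question, docs)[0]
prompt = f"Context:\n{context}\n\nQuestion: {question}\nSQL:"
print(prompt)
```

Because the schema is supplied in the prompt, the model no longer has to recall column names from training data, which is why RAG cuts hallucinations on grounded facts, though it cannot stop the model from misusing the context it is given.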

How can fine-tuning reduce hallucinations?

  • Embed facts into the model.
  • However, instruction fine-tuning isn't the right tool for removing hallucinations (and can be costly).
  • Memory Tuning (invented by Lamini) allows the model to recall a lot of facts precisely without compromising generalization & instruction-following.
  • Memory Tuning -> 95%