# Fine-tune DeepSeek-R1 locally

The following description originates from @_avichawla on X.

Prerequisites:

- UnslothAI
- Ollama
1. Load the model

```python
# pip install unsloth
from unsloth import FastLanguageModel
import torch

MODEL = "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
```
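
Optionally, you can sanity-check that the 4-bit model loads and generates before any fine-tuning. This is a minimal sketch (not part of the original walkthrough) using Unsloth's `FastLanguageModel.for_inference` helper:

```python
# Optional: quick generation test before fine-tuning.
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
inputs = tokenizer("Hello, how are you?", return_tensors = "pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens = 32)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))
```

If you run this check, call `FastLanguageModel.for_training(model)` afterwards to switch the model back into training mode.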
2. Define LoRA config

Use an efficient technique like LoRA to avoid fine-tuning all of the model's weights.

In the following code, we use Unsloth's PEFT support by specifying:

- the model
- the LoRA low rank (r)
- the modules to fine-tune
- and a few more parameters.
```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 4,
    target_modules = ["q_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing = "unsloth",
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_rslora = False,
    loftq_config = None,
)
```
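
As a quick sanity check (an addition to the original walkthrough), the object returned by `get_peft_model` behaves like a standard peft `PeftModel`, so you can print how few parameters are actually trainable:

```python
# With r=4 and only three target modules, just a tiny fraction of the
# 8B parameters should be marked as trainable.
model.print_trainable_parameters()
```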
3. Prepare dataset

Next, we use the Alpaca dataset to prepare a conversational dataset. The conversation_extension parameter defines the number of user messages in a single conversation.

```python
from datasets import load_dataset
from unsloth import to_sharegpt
from unsloth import standardize_sharegpt

dataset = load_dataset("vicgalle/alpaca-gpt4", split = "train")

dataset = to_sharegpt(
    dataset,
    merged_prompt = "{instruction}[[\nYour input is:\n{input}]]",
    output_column_name = "output",
    conversation_extension = 3,
)

dataset = standardize_sharegpt(dataset)
```
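
To verify the conversion, it can help to print one converted record. Assuming the processed dataset keeps its turns in a `conversations` column (an assumption; column names can vary between Unsloth versions), a minimal check looks like:

```python
# Inspect one record: a list of role/content turns, with up to
# three user messages per conversation (conversation_extension = 3).
print(dataset[0]["conversations"])
```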
4. Define trainer

Here, we create a trainer object by specifying the training configuration, such as the learning rate, the model, the tokenizer, and more.

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    ...
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        ...
        optim = "adamw_8bit",
        weight_decay = 0.01,
    ),
)
```
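
Before launching training, it can be useful to confirm the GPU has headroom, since the 4-bit 8B weights plus the LoRA optimizer state must fit in memory. A small sketch using plain PyTorch:

```python
import torch

# Report the GPU's total and currently reserved memory.
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB total")
print(f"Reserved so far: {torch.cuda.memory_reserved(0) / 1024**3:.1f} GiB")
```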
5. Train

```python
trainer_stats = trainer.train()
```
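
`trainer.train()` returns a standard Transformers `TrainOutput`, so basic statistics can be read back afterwards:

```python
# TrainOutput exposes a metrics dict with runtime and loss.
print(trainer_stats.metrics["train_runtime"])  # seconds
print(trainer_stats.metrics["train_loss"])     # average training loss
```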
6. Export to Ollama

```bash
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
```

```python
# save the model and tokenizer in GGUF format
model.save_pretrained_gguf("model", tokenizer)
```

```bash
# create a fine-tuned model
ollama create deepseek_finetuned_model -f ./model/Modelfile
```
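
The exported GGUF file can be large. Per Unsloth's documentation, `save_pretrained_gguf` also accepts a `quantization_method` argument for smaller quantized exports, e.g.:

```python
# Optional: a smaller 4-bit quantized GGUF export (trades some quality for size).
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```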
7. Interact

We now have a fine-tuned DeepSeek (distilled Llama) model.

We can interact with it like any other model running on Ollama, using:

- the CLI
- Ollama's Python package
- Ollama's LlamaIndex integration, etc.
```python
from IPython.display import Markdown
import ollama

response = ollama.chat(
    model = "deepseek_finetuned_model",
    messages = [{"role": "user",
                 "content": "How to add a chart to a document?"}],
)
Markdown(response.message.content)
```
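
The same model is also available from the terminal via the Ollama CLI:

```bash
ollama run deepseek_finetuned_model "How to add a chart to a document?"
```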