# Fine-tune DeepSeek-R1 locally
The following description originates from @_avichawla on X.
Prerequisites:
- UnslothAI
- Ollama
- Load the model
```python
# pip install unsloth
from unsloth import FastLanguageModel
import torch

# 4-bit quantized DeepSeek-R1 distill, prepared by Unsloth
MODEL = "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 2048,
    dtype = None,         # auto-detect (float16/bfloat16)
    load_in_4bit = True,  # load quantized weights to save VRAM
)
```
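Optionally, you can sanity-check that the base model loads and generates before fine-tuning. This is a minimal sketch added here for illustration (not part of the original post); it assumes a CUDA device, and the prompt is arbitrary:

```python
# Optional sanity check: one generation pass with the base model.
# Assumes a CUDA device is available; the prompt is illustrative.
FastLanguageModel.for_inference(model)   # switch Unsloth to inference mode
inputs = tokenizer("What is 2 + 2?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
FastLanguageModel.for_training(model)    # switch back before fine-tuning
```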
- Define the LoRA config

Use an efficient technique like LoRA to avoid fine-tuning all of the model's weights.
In the following code, we use Unsloth's PEFT by specifying:
- The model
- LoRA low-rank (r)
- Modules for fine-tuning
- and a few more parameters.
```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 4,                                           # LoRA low-rank dimension
    target_modules = ["q_proj", "v_proj", "o_proj"], # attention projections to adapt
    use_gradient_checkpointing = "unsloth",          # Unsloth's memory-efficient checkpointing
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_rslora = False,
    loftq_config = None,
)
```
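To see how little of the model LoRA actually trains, you can count trainable parameters. This check is an addition to the walkthrough and uses plain PyTorch, no extra helpers:

```python
# Count trainable vs. total parameters to confirm LoRA's small footprint
# (an added check, not part of the original post).
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```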
- Prepare dataset
Next, we use the Alpaca dataset to prepare a conversation dataset. The conversation_extension parameter defines the number of user messages in a single conversation.
```python
from datasets import load_dataset
from unsloth import to_sharegpt, standardize_sharegpt

dataset = load_dataset("vicgalle/alpaca-gpt4", split = "train")

dataset = to_sharegpt(
    dataset,
    # [[...]] marks the optional "input" section, dropped when the field is empty
    merged_prompt = "{instruction}[[\nYour input is:\n{input}]]",
    output_column_name = "output",
    conversation_extension = 3,  # merge up to 3 user turns into one conversation
)
dataset = standardize_sharegpt(dataset)
```
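It can help to inspect one converted example to verify the conversation structure before training; this quick peek is an addition to the original post, and the exact field names may vary across Unsloth versions:

```python
# Inspect one converted example: expect a "conversations" list of
# role/content turns (field names may differ by Unsloth version).
print(dataset[0]["conversations"])
```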
- Define trainer
Here, we create a Trainer object by specifying the training config like learning rate, model, tokenizer, and more.
```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    # ... (further trainer arguments elided in the original post)
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size of 8
        max_steps = 60,
        learning_rate = 2e-4,
        # ... (further training arguments elided in the original post)
        optim = "adamw_8bit",
        weight_decay = 0.01,
    ),
)
```
- Train
```python
trainer_stats = trainer.train()
```
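`trainer.train()` returns a TrainOutput object; its metrics dict is a quick way to check runtime and final loss (an added check, not part of the original post):

```python
# TrainOutput.metrics holds aggregate statistics from the training run
print(trainer_stats.metrics.get("train_runtime"), "seconds")
print(trainer_stats.metrics.get("train_loss"))
```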
- Export to Ollama
```bash
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
```

```python
# save the fine-tuned model and tokenizer in GGUF format
model.save_pretrained_gguf("model", tokenizer)
```

```bash
# create a fine-tuned model from the generated Modelfile
ollama create deepseek_finetuned_model -f ./model/Modelfile
```
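Once created, you can verify the model straight from the CLI; the model name matches the `ollama create` step above, and the prompt is illustrative:

```bash
# quick smoke test of the freshly created model
ollama run deepseek_finetuned_model "Summarize LoRA fine-tuning in one sentence."
```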
- Interact

We now have a fine-tuned DeepSeek (distilled Llama) model and can interact with it like any other model running on Ollama using:
- the CLI
- Ollama's Python package
- Ollama's LLamaIndex integration, etc.
```python
from IPython.display import Markdown
import ollama

response = ollama.chat(
    model = "deepseek_finetuned_model",
    messages = [{"role": "user",
                 "content": "How to add a chart to a document?"}],
)
Markdown(response.message.content)
```
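If you prefer token-by-token output, the ollama Python package also supports streaming; this variant is an addition to the original walkthrough:

```python
# Streaming variant (added for illustration): iterate over response
# chunks instead of waiting for the complete reply.
import ollama

stream = ollama.chat(
    model = "deepseek_finetuned_model",
    messages = [{"role": "user", "content": "How to add a chart to a document?"}],
    stream = True,
)
for chunk in stream:
    print(chunk.message.content, end="", flush=True)
```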