
LoRA (Training Type)

LoRA (Low-Rank Adaptation) adds trainable low-rank “adapter” matrices to a frozen model’s layers, drastically reducing the number of trainable parameters. In practice, only a small rank‑r weight update is learned, which can later be merged into the base weight matrix. This makes tuning far more memory- and compute-efficient, so LoRA is ideal for fine-tuning very large models or when resources are limited.
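
Concretely, for a frozen weight matrix $W \in \mathbb{R}^{d \times k}$, the learned update has the standard low-rank form from the LoRA paper (α is a scaling hyperparameter; the exact scaling constant used by mlx-lm-lora may differ):

$$W' = W + \frac{\alpha}{r}\,BA, \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k)$$

Because r is small, BA contributes far fewer trainable parameters than W itself.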

  • Typical use cases: Instruction tuning, domain adaptation, or any supervised fine-tuning where full-parameter training is too expensive. LoRA is the default mode for supervised tasks.
  • How it works: We freeze the base model and learn two low‑rank matrices, A (down-projection) and B (up-projection), so that during the forward pass the weight update BA is added to the frozen weight. Inference uses the merged weights, so there is no extra latency. See the sketch after this list.
  • Command-line: mlx_lm_lora.train runs LoRA by default; you can also pass --train-type lora to set it explicitly.
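
The sketch below shows the idea in MLX-style Python. It is a minimal illustration, not the package’s actual implementation: the class name LoRALinear, the zero initialization of B, and the α/r scaling follow common LoRA conventions and are assumptions here.

import math
import mlx.core as mx
import mlx.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        self.linear.freeze()  # the base weight receives no gradients
        out_dims, in_dims = linear.weight.shape
        bound = 1.0 / math.sqrt(in_dims)
        # A: down-projection (in_dims -> r), small random init
        self.lora_a = mx.random.uniform(low=-bound, high=bound, shape=(in_dims, r))
        # B: up-projection (r -> out_dims), zero init so the update starts at zero
        self.lora_b = mx.zeros((r, out_dims))
        self.scale = alpha / r

    def __call__(self, x):
        # Frozen path plus the scaled low-rank update (x A) B
        return self.linear(x) + self.scale * ((x @ self.lora_a) @ self.lora_b)

Only lora_a and lora_b are trained, so gradient and optimizer memory scale with r rather than with the full weight matrix.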

Example:

mlx_lm_lora.train \
  --model mlx-community/Josiefied-Qwen3-8B-abliterated-v1-4bit \
  --train \
  --data mlx-community/wikisql \
  --iters 800

This fine-tunes my Josiefied-Qwen3-8B-abliterated-v1-4bit model on the wikisql dataset with LoRA. The trained adapters are saved under adapters/ by default. You can later fuse the adapters into the base model or continue training with MLX-LM, as shown below.
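
For example, the adapters from the run above can be fused with MLX-LM’s mlx_lm.fuse command (flag names follow mlx-lm; check mlx_lm.fuse --help on your installed version):

mlx_lm.fuse \
  --model mlx-community/Josiefied-Qwen3-8B-abliterated-v1-4bit \
  --adapter-path adapters \
  --save-path fused_model

The fused model in fused_model/ then behaves like any other MLX model, with no adapter overhead at inference.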