Low Rank Adaptation (LoRA) - Goekdeniz-Guelmez/mlx-lm-lora GitHub Wiki
LoRA (Training Type)
LoRA (Low-Rank Adaptation) adds trainable low-rank “adapter” matrices alongside a frozen model’s layers, drastically reducing the number of trainable parameters. In practice, only a small rank-r weight update is learned; it can optionally be merged into the base weight matrix afterwards. This makes tuning much more memory- and compute-efficient. LoRA is ideal for fine-tuning very large models or when resources are limited.
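To make the parameter savings concrete, here is a rough count for a single projection layer; the 4096×4096 size and rank 8 are hypothetical values chosen for illustration, not defaults of mlx-lm-lora:

```python
# Rough trainable-parameter comparison for one hypothetical
# 4096x4096 projection layer, full fine-tuning vs rank-8 LoRA.
d = 4096
r = 8

full = d * d            # every weight trainable in full fine-tuning
lora = r * d + d * r    # only the A (r x d) and B (d x r) adapters train

print(f"LoRA trains {lora / full:.2%} of the layer's parameters")
# -> LoRA trains 0.39% of the layer's parameters
```

The savings grow with layer size, since LoRA's cost scales linearly in d while full fine-tuning scales quadratically.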
- Typical use cases: Instruction tuning, domain adaptation, or any supervised fine-tuning where full fine-tuning is too expensive. LoRA is the default mode for supervised tasks.
- How it works: The base model is frozen and two low-rank matrices are learned, A (down-projection) and B (up-projection), so that during the forward pass the weight update BA is added to the frozen weight W. For inference the update can be merged into W, so there is no extra latency.
- Command-line: mlx_lm_lora.train runs LoRA by default; you can also pass --train-type lora explicitly.
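The forward pass described above can be sketched in a few lines of NumPy; this is illustrative only, not mlx-lm-lora's actual implementation, and all sizes are hypothetical:

```python
# Minimal LoRA forward-pass sketch (illustrative, not mlx-lm-lora's code).
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 16, 4        # hypothetical layer dimensions and LoRA rank

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
# B is typically initialized to zeros so training starts from the base
# model; here it is random, as if training had already happened.
B = rng.normal(size=(d_out, r)) * 0.01   # trainable up-projection
alpha = 16.0                             # common LoRA scaling hyperparameter

def lora_forward(x):
    """Base path plus the scaled low-rank update (alpha / r) * B @ A @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

# After training, the update can be merged so inference adds no latency:
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W_merged @ x)
```

The assertion at the end shows why merged inference is free: applying the adapters during the forward pass and folding BA into W give identical outputs.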
Example:
```shell
mlx_lm_lora.train \
  --model mlx-community/Josiefied-Qwen3-8B-abliterated-v1-4bit \
  --train \
  --data mlx-community/wikisql \
  --iters 800
```
This fine-tunes the Josiefied-Qwen3-8B-abliterated-v1-4bit model on the wikisql dataset with LoRA. Results are saved under adapters/ by default. You can later fuse the adapters into the base model or continue training using MLX-LM.
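As a follow-up, the trained adapters can be merged into the base weights with mlx-lm's fuse command. The flags below assume mlx_lm.fuse's standard interface and the default adapters/ output path from the run above; check your installed mlx-lm version for the exact options:

```shell
# Merge the trained LoRA adapters back into the base model weights
# (paths assume the defaults from the training run above)
mlx_lm.fuse \
  --model mlx-community/Josiefied-Qwen3-8B-abliterated-v1-4bit \
  --adapter-path adapters \
  --save-path fused-model
```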