Examples - mostlygeek/llama-swap GitHub Wiki
Guides and Configuration
| Contributor | Link | OS | Server | Model | VRAM | Description |
|---|---|---|---|---|---|---|
| @mostlygeek | view | Linux | llama.cpp | Llama 3.3 70B | 52.5GB over 3 GPUs | 13 to 20 tok/sec with speculative decoding |
| @mostlygeek | view | Linux | llama.cpp | Qwen3-30B-A3B | 24GB | Running the latest Qwen3 models with thinking and no-thinking modes |
| @mostlygeek | view | Linux | llama.cpp | various VLMs | 8GB to 24GB | Running various VLMs (vision language models) with llama-server |
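For orientation, the guides above all revolve around a llama-swap YAML config that maps model names to `llama-server` commands, which llama-swap starts and stops on demand. Below is a minimal sketch of that shape; the model paths, GGUF filenames, and `llama-server` flags are illustrative assumptions, not taken from any of the linked guides.

```yaml
# Hypothetical llama-swap config sketch: two swappable models.
# llama-swap substitutes ${PORT} with the port it proxies to.
models:
  "qwen3-30b":
    # Assumed local path and quantization; adjust to your setup.
    cmd: >
      llama-server --port ${PORT}
      -m /models/Qwen3-30B-A3B-Q4_K_M.gguf
      --ctx-size 8192

  "llama-3.3-70b":
    cmd: >
      llama-server --port ${PORT}
      -m /models/Llama-3.3-70B-Q4_K_M.gguf
      --ctx-size 4096
```

Requests to the OpenAI-compatible endpoint then select a model by name (e.g. `"model": "qwen3-30b"`), and llama-swap swaps the running server process accordingly.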