Examples - mostlygeek/llama-swap GitHub Wiki
Guides and Configuration
| Contributor | Link | OS | Server | Model | VRAM | Description |
|---|---|---|---|---|---|---|
| @mostlygeek | view | Linux | llama.cpp | Llama 3.3 70B | 52.5GB over 3 GPUs | 13 to 20 tok/sec with speculative decoding |
| @mostlygeek | view | Linux | llama.cpp | Qwen3-30B-A3B | 24GB | Running the latest Qwen3 models with thinking and no-thinking modes |
| @mostlygeek | view | Linux | llama.cpp | various VLMs | 8GB to 24GB | Running various VLMs (vision language models) with llama-server |
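For orientation, the guides above all revolve around a llama-swap YAML config that maps model names to `llama-server` commands, which llama-swap starts and stops on demand. Below is a minimal sketch of that shape; the model paths, GGUF filenames, and `llama-server` flags are illustrative assumptions, not taken from any of the linked guides.

```yaml
# Hypothetical llama-swap config sketch: two swappable models.
# llama-swap substitutes ${PORT} with the port it proxies to.
models:
  "qwen3-30b":
    # Assumed local path and quantization; adjust to your setup.
    cmd: >
      llama-server --port ${PORT}
      -m /models/Qwen3-30B-A3B-Q4_K_M.gguf
      --ctx-size 8192

  "llama-3.3-70b":
    cmd: >
      llama-server --port ${PORT}
      -m /models/Llama-3.3-70B-Q4_K_M.gguf
      --ctx-size 4096
```

Requests to the OpenAI-compatible endpoint then select a model by name (e.g. `"model": "qwen3-30b"`), and llama-swap swaps the running server process accordingly.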