# 🐧 Local LLM - Linux Setup (Ollama)
Running a local LLM keeps everything private, offline, and free (beyond electricity and hardware). Think of it as the difference between running a game on your own rig vs. streaming it from the cloud - lower latency, no subscriptions, no one snooping on your loadout.
It requires Ollama and a capable GPU.
## Minimum Hardware
To run Elite Dangerous and the LLM on the same machine, you need at minimum an NVIDIA RTX 3060 with 12 GB VRAM. That's the floor - it'll run, but don't expect headroom to spare.
Tip: You can point Elite Intel at an Ollama instance running on a separate PC on your network. If you have a home lab or a spare box with a good GPU, that's a great option - the game PC doesn't carry the load at all.
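If you go the remote-box route, note that Ollama only listens on localhost by default. On the server you'd let it accept LAN connections with an `OLLAMA_HOST` override (a minimal sketch, using the same systemd drop-in mechanism covered in Step 3):

```ini
[Service]
# Default bind is 127.0.0.1; 0.0.0.0 accepts connections from the game PC
Environment="OLLAMA_HOST=0.0.0.0"
```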
## Recommended Model
| Model | VRAM Required | Notes |
|---|---|---|
| Tulu-3.1-8B-SuperNova-Q4_K_M | ~5 GB | ✅ Recommended. Reliable for commands and queries. |
| qwen3 8B | ~8 GB | Experimental. Expect occasional missed commands and hallucinations. |
Note: If you want the fastest local inference, consider LM Studio with `matrixportalx/tulu-3.1-8b-supernova`. In testing, it's noticeably faster than Ollama on the same hardware with the same model class.
## Step 1 - Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Ollama installs as a systemd service and starts automatically.
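Before moving on, it's worth confirming the service is actually up. These are standard systemd and Ollama checks; 11434 is Ollama's default port:

```bash
# Service should report active (running)
systemctl status ollama --no-pager

# The API answers with the installed version
curl http://localhost:11434/api/version
```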
## Step 2 - Pull a Recommended Model
```bash
ollama pull hf.co/matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF
```
Or the experimental alternative:
```bash
ollama pull qwen3:8b
```
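After the pull finishes, list what's installed. The name and tag shown here are exactly what you'll paste into Elite Intel in Step 4:

```bash
ollama ls
```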
## Step 3 - (Optional) Tune the Ollama Service
Out of the box Ollama works fine, but if you want to share VRAM more carefully with Elite Dangerous, this is where you do it.
```bash
# The drop-in directory may not exist yet on a fresh install
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo nano /etc/systemd/system/ollama.service.d/override.conf
```
Paste this in:
```ini
[Service]
Environment="OLLAMA_MAX_VRAM=14000000000"
Environment="OLLAMA_DEBUG=0"
Environment="OLLAMA_NUM_PARALLEL=3"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=-1"
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=5
```
Then reload and restart:
```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
```
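To confirm the override took effect, inspect the merged service environment and watch the logs as the service comes back up (standard systemd tooling, nothing Ollama-specific):

```bash
# Should list the Environment= lines from override.conf
systemctl show ollama --property=Environment

# Tail the service log to see it restart cleanly
journalctl -u ollama --since "1 min ago" --no-pager
```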
### What these settings do
- `OLLAMA_MAX_VRAM` - hard cap on the VRAM Ollama can use, in bytes. `14000000000` = 14 GB, leaving the rest for Elite Dangerous. Adjust based on your GPU and how much the game needs; on a 12 GB card like the minimum-spec RTX 3060, set it well below 12 GB.
- `OLLAMA_NUM_PARALLEL` - how many requests Ollama handles simultaneously. Elite Intel makes async calls, so setting this too low causes failures. `3` covers the typical command + query overlap without over-allocating.
- `OLLAMA_MAX_LOADED_MODELS` - keeps only one model in VRAM at a time. No reason to keep stale models loaded.
- `OLLAMA_FLASH_ATTENTION` - enables Flash Attention, which reduces memory bandwidth usage during inference. Generally faster, especially for repeated requests.
- `OLLAMA_KEEP_ALIVE=-1` - keeps the model loaded in VRAM indefinitely. Without this, Ollama may unload the model after a period of inactivity and you pay a reload penalty on the next request.
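As a worked example (hypothetical numbers - size the split for your own card and graphics settings): on a 16 GB GPU where Elite Dangerous needs roughly 8 GB, you'd cap Ollama at the remaining 8 GB:

```ini
# 8 GB in decimal bytes: 8 * 1000^3 = 8000000000
Environment="OLLAMA_MAX_VRAM=8000000000"
```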
## Step 4 - Wire It Up in Elite Intel
Head to the Settings tab in Elite Intel:
- Leave the LLM Key field blank (local Ollama doesn't need one).
- LLM Address defaults to `http://localhost:11434/api/chat`. If Ollama is on another machine, replace `localhost` with that machine's IP. You can sanity-check the endpoint with the curl command after this list.
- Command LLM - set to `hf.co/matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF:latest` (or whatever name `ollama ls` reveals).
- Query LLM - set to `hf.co/matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF:latest` (or whatever name `ollama ls` reveals).
- Hit Stop → Start on the AI tab to apply changes.
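If commands stop getting answered, test the endpoint directly before digging into Elite Intel. This uses Ollama's standard chat API; substitute whatever model name `ollama ls` showed you:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF:latest",
  "messages": [{ "role": "user", "content": "Report status." }],
  "stream": false
}'
```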
Community 👉Matrix👈