Why Tulu 3.1 SuperNova specifically?

EliteIntel is not your ship's parrot. Its primary functions are command parsing and data analysis. This imposes very specific requirements: generating random banter is not enough. The model has to correctly infer an action from the voice input and correctly analyze data, and it must return the results as formatted JSON, not an essay in markup or HTML. Far from every model of this size can do this reliably.
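
For illustration, here is the kind of contract involved. The schema and action names below are invented for this example, not EliteIntel's actual format:

```python
# Hypothetical illustration of the contract; the schema and action names
# are invented for this example, not EliteIntel's actual format.
import json

voice_input = "retract landing gear and request docking"

# What the parser needs back: machine-readable, deterministic JSON.
good = '{"actions": ["RETRACT_LANDING_GEAR", "REQUEST_DOCKING"]}'
print(json.loads(good)["actions"])  # ['RETRACT_LANDING_GEAR', 'REQUEST_DOCKING']

# What weaker models tend to return instead: prose a parser cannot use.
bad = "Sure! To retract your landing gear, you would typically..."
```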

Tulu 3 (the base training recipe) is genuinely exceptional

Most instruct models are trained with RLHF, which uses a learned reward model to judge outputs. That reward model is itself a neural network, so it inherits all the usual biases and inconsistencies. Tulu 3 replaced this with RLVR (Reinforcement Learning with Verifiable Rewards), where instead of a learned reward model the training uses a deterministic scoring function: the answer is either correct or it isn't. Binary, no bias. This is particularly impactful for instruction-following tasks, where the reward signal is objective.
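
A toy sketch of the idea (not the actual Tulu 3 scoring code): for a verifiable task such as a math problem, the reward is a deterministic check against the known answer.

```python
# Toy sketch of an RLVR-style reward: deterministic and binary, with no
# learned judge. The verifiable task here is a math problem with a known answer.
def reward(model_answer: str, ground_truth: str) -> float:
    # Exact-match check. Real verifiers are task-specific (math, code,
    # constraint checks), but the principle is the same: right or wrong.
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

print(reward("42", "42"))        # 1.0
print(reward("about 40", "42"))  # 0.0
```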

The training pipeline also isn't just RLVR; it is a four-stage approach: data curation targeting core skills, supervised fine-tuning, Direct Preference Optimization, and then RLVR on top to sharpen verifiable task performance. Each stage builds on the last. That's why Tulu 3 on the 8B Llama base achieves results surpassing the instruct versions of Llama 3.1, Qwen 2.5, Mistral, and even closed models like GPT-4o-mini and Claude 3.5 Haiku.

Why does this matter specifically for EliteIntel? The classifier stage is essentially an instruction-following task with verifiable correct answers (JSON action X vs. action Y). That's precisely what RLVR hammers on. The model was literally trained to be accurate at exactly this kind of deterministic structured output.
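
In the same toy style, grading the classifier stage reduces to an exact comparison (again, the field and action names are hypothetical):

```python
# Toy grader for the classifier stage: the correct answer is a known action
# label, so scoring is an exact comparison. Field/action names are hypothetical.
import json

def grade(model_output: str, expected_action: str) -> bool:
    try:
        return json.loads(model_output).get("action") == expected_action
    except (json.JSONDecodeError, AttributeError):
        return False

print(grade('{"action": "JUMP_TO_SYSTEM"}', "JUMP_TO_SYSTEM"))   # True
print(grade('{"action": "OPEN_GALAXY_MAP"}', "JUMP_TO_SYSTEM"))  # False
```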

So why is "Supernova" variant specifically?

This variant isn't stock Tulu 3. Tulu-3.1-8B-SuperNova is created via a linear merge of three models using mergekit: Llama-3.1-MedIT-SUN-8B (medical/reasoning), Llama-3.1-Tulu-3-8B (instruction following), and Llama-3.1-SuperNova-Lite (Arcee's distilled model), each contributing equally at weight 1.0.

The SuperNova-Lite parent brings something extra: it is distilled from a much larger Arcee base model, meaning it carries knowledge density beyond what a vanilla 8B would have. The linear merge then averages the weight tensors directly, so the combined "knowledge" gets baked into the weights without any additional training compute. The result is particularly strong on instruction-following tasks, as demonstrated by its IFEval score.
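
Conceptually, the linear merge is just a weighted average of corresponding tensors. The sketch below illustrates the computation mergekit's linear method performs; it is not mergekit itself, and the variable names are placeholders:

```python
# Illustrative sketch of a linear merge: each parameter becomes the weighted
# average of the parents' corresponding tensors. This shows the computation
# mergekit's "linear" method performs; it is not mergekit itself.
import torch

def linear_merge(state_dicts: list[dict], weights: list[float]) -> dict:
    total = sum(weights)
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts)) / total
        for key in state_dicts[0]
    }

# Three same-architecture parents, each at weight 1.0 (placeholder names):
# merged = linear_merge([medit_sun, tulu3, supernova_lite], [1.0, 1.0, 1.0])
```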

Why is it fast? It is still just an 8B Llama architecture. On a 3090 (24 GB) at Q4_K_M quantization, it fits comfortably in VRAM alongside the game with room to spare, so it runs at maximum throughput without any CPU-offload penalties. The Qwen models it is often compared against have different attention-head configurations (e.g., Qwen2.5's GQA ratio differs), which can run slightly slower in llama.cpp's GGUF backend. It also works well on a 12 GB card IF nothing else is using that GPU (i.e., the game runs on a different GPU or a different machine).
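
A back-of-the-envelope estimate of why it fits (the bits-per-weight figure is an approximation; exact GGUF sizes vary by build):

```python
# Back-of-the-envelope VRAM estimate for an 8B model at Q4_K_M.
# The bits-per-weight figure is an approximation; exact GGUF sizes vary.
params = 8.0e9
bits_per_weight = 4.85  # rough Q4_K_M average (assumption)
weights_gib = params * bits_per_weight / 8 / 1024**3
print(f"weights: ~{weights_gib:.1f} GiB")  # ~4.5 GiB

# KV cache and compute buffers are context-length dependent; 1-3 GiB is typical.
kv_and_buffers_gib = 2.0  # assumption
print(f"total:   ~{weights_gib + kv_and_buffers_gib:.1f} GiB")  # ~6.5 GiB
# Comfortable on 24 GiB next to the game; workable on a dedicated 12 GiB card.
```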

Can I use a different model?

You can try, but you will probably not get the speed and inference accuracy that Tulu 3.1 SuperNova gives you.

The most common problem (other than speed) is incorrect responses: the model returns a markup essay instead of the inferred action or data analysis.
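
If you do experiment with other models, it is worth guarding against exactly that failure mode. A minimal sketch (function and field names are illustrative, not EliteIntel's actual API):

```python
# Sketch of guarding against the "markup essay instead of JSON" failure mode.
# Function and field names are illustrative, not EliteIntel's actual API.
import json

def parse_action(raw: str) -> dict | None:
    """Return the parsed action dict, or None if the model went off-script."""
    raw = raw.strip()
    # Some models wrap JSON in markdown fences; strip those before parsing.
    if raw.startswith("```"):
        raw = raw.strip("`").removeprefix("json").strip()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # essay/markup response: reject, re-prompt, or fall back
    return data if isinstance(data, dict) and "action" in data else None
```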