# Model Converter (tensor list builder)
`tensor_list_builder.py` is a utility script that analyzes a GGUF model file and, using a set of JSON quantization rules, suggests the optimal quantization type for each tensor/layer in the model. This is especially useful for precision-adaptive quantization (e.g., ultra-low-bit quantization), where different layers may benefit from different quantization levels. It is typically used as part of the quantization pipeline to generate `--tensor-type` arguments for llama.cpp quantization tools.
## What It Does

- Reads a GGUF model file and extracts the quantization type for each tensor/layer.
- Loads quantization rules from a JSON file (e.g., `quant_rules.json`).
- For each tensor/layer:
  - Determines its layer order (e.g., `blk.27.attn_k_norm` → layer 27; see the sketch after this list).
  - Normalizes the layer order (for rules that depend on position in the model).
  - Applies the quantization rules to suggest a new quantization type if needed.
  - Explains the reason for any suggested change.
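As an illustration, here is a minimal sketch of how the layer order could be extracted and normalized. The function names are hypothetical, not the script's actual API:

```python
import re

def layer_order(tensor_name: str) -> int | None:
    """Extract the block index from a tensor name, e.g. 'blk.27.attn_k_norm' -> 27."""
    match = re.match(r"blk\.(\d+)\.", tensor_name)
    return int(match.group(1)) if match else None

def normalize_order(order: int, total_layers: int) -> float:
    """Map a layer index to [0, 1] so rules can target relative positions in the model."""
    return order / max(total_layers - 1, 1)

print(layer_order("blk.27.attn_k_norm"))   # 27
print(normalize_order(27, 32))             # ~0.87 (late in a 32-layer model)
```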
Outputs:

- A list of suggested quantization changes (with reasons).
- A copy-paste ready string of `--tensor-type` arguments for use with quantization tools.
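The second output is straightforward to assemble from the suggestions; a sketch with made-up data:

```python
# Hypothetical (tensor, suggested type) pairs produced by the analysis step.
suggestions = [
    ("blk.27.attn_k_norm", "IQ2_XXS"),
    ("blk.28.attn_k_norm", "IQ2_XXS"),
]

# Join them into the copy-paste string shown in the example output below.
args = " ".join(f"--tensor-type {name}={qtype}" for name, qtype in suggestions)
print(args)
# --tensor-type blk.27.attn_k_norm=IQ2_XXS --tensor-type blk.28.attn_k_norm=IQ2_XXS
```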
## Usage

```
python tensor_list_builder.py <gguf_file> <quant_rules.json> <target_type> [--moe]
```
- `<gguf_file>`: Path to the GGUF model file to analyze.
- `<quant_rules.json>`: Path to the JSON file with quantization rules.
- `<target_type>`: The default quantization type to use (e.g., `IQ2_XXS`, `Q4_K`).
- `--moe`: (Optional) Indicates the model is a Mixture of Experts (MoE).
Example:

```
python tensor_list_builder.py ./llama-3-8b-bf16.gguf quant_rules.json IQ2_XXS --moe
```
## Example Output

For each tensor/layer where a different quantization is suggested, it prints:

```
Tensor: blk.27.attn_k_norm
Current: Q4_K
Suggested: IQ2_XXS
Reason: Bumped from Q4_K by -2 levels for blk.27.attn_k_norm (Layer order bump: -2)
```
At the end, it prints a single line you can copy-paste into your quantization command:

```
--tensor-type blk.27.attn_k_norm=IQ2_XXS --tensor-type blk.28.attn_k_norm=IQ2_XXS ...
```
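For context, those arguments slot into a `llama-quantize` invocation roughly like this (paths, filenames, and the exact flag set depend on your llama.cpp build, so treat this as a sketch):

```
./llama-quantize --tensor-type blk.27.attn_k_norm=IQ2_XXS --tensor-type blk.28.attn_k_norm=IQ2_XXS \
    ./llama-3-8b-bf16.gguf ./llama-3-8b-IQ2_XXS.gguf IQ2_XXS
```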
## When to Use

- Before quantizing a model with llama.cpp, when you want per-tensor quantization for better accuracy/efficiency.
- When developing or testing new quantization strategies (e.g., ultra-low-bit, MoE-aware quantization).
- To generate arguments for quantization scripts that support per-tensor quantization.
## How It Works

- Extracts tensor quantization info by calling a helper script (`get_gguf_tensor_info.py`).
- Parses quantization rules (supports wildcards, layer order, MoE-specific rules, etc.).
- Determines whether a quantization "bump" (change) should be applied for each tensor (see the sketch after this list).
- Prints suggestions and generates command-line arguments for use in quantization.
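A simplified sketch of the bump logic follows. The rule keys (`pattern`, `bump`) and the quantization ladder are illustrative assumptions; the real schema lives in `quant_rules.json`:

```python
import fnmatch

# Illustrative quantization ladder, lowest to highest precision (assumption).
QUANT_LEVELS = ["IQ1_S", "IQ2_XXS", "IQ3_XXS", "Q4_K", "Q5_K", "Q6_K"]

def apply_bump(current: str, bump: int) -> str:
    """Shift a quant type up or down the ladder, clamping at the ends."""
    idx = QUANT_LEVELS.index(current)
    return QUANT_LEVELS[max(0, min(idx + bump, len(QUANT_LEVELS) - 1))]

def suggest(tensor: str, current: str, rules: list[dict]) -> tuple[str, str] | None:
    """Return (new_type, reason) for the first wildcard rule that matches, if any."""
    for rule in rules:
        if fnmatch.fnmatch(tensor, rule["pattern"]):
            new_type = apply_bump(current, rule["bump"])
            if new_type != current:
                return new_type, (
                    f"Bumped from {current} by {rule['bump']} levels for {tensor}"
                )
    return None

rules = [{"pattern": "blk.*.attn_k_norm", "bump": -2}]
print(suggest("blk.27.attn_k_norm", "Q4_K", rules))
# ('IQ2_XXS', 'Bumped from Q4_K by -2 levels for blk.27.attn_k_norm')
```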
## Integration

- Used by: `make_files.py` (for advanced quantization).
- Output is typically passed to `llama-quantize` as `--tensor-type` arguments (see the sketch after this list).
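A caller such as `make_files.py` might consume the output like this. The assumption that the `--tensor-type` string is the last stdout line is mine, made for illustration:

```python
import shlex
import subprocess

# Run the builder and capture its stdout.
result = subprocess.run(
    ["python", "tensor_list_builder.py", "model.gguf", "quant_rules.json", "IQ2_XXS"],
    capture_output=True, text=True, check=True,
)

# Assume the final line holds the --tensor-type arguments; split it shell-style.
tensor_args = shlex.split(result.stdout.strip().splitlines()[-1])

# Forward them to llama-quantize along with input/output paths and the default type.
subprocess.run(
    ["./llama-quantize", *tensor_args, "model.gguf", "model-IQ2_XXS.gguf", "IQ2_XXS"],
    check=True,
)
```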
## Summary

| Step | What it does |
|---|---|
| Load GGUF file | Reads tensor names and current quantization types |
| Load rules | Reads quantization rules from JSON |
| Analyze tensors | Applies rules to suggest per-tensor quantization |
| Output suggestions | Prints changes and `--tensor-type` arguments |