Model Converter (tensor list builder) - Mungert69/GGUFModelBuilder GitHub Wiki

What is tensor_list_builder.py?

tensor_list_builder.py is a utility script that analyzes a GGUF model file and, using a set of JSON quantization rules, suggests the optimal quantization type for each tensor/layer in the model. This is especially useful for precision-adaptive quantization (e.g., ultra-low-bit quantization) where different layers may benefit from different quantization levels.

It is typically used as part of the quantization pipeline to generate --tensor-type arguments for llama.cpp quantization tools.


What Does It Do?

  • Reads a GGUF model file and extracts the quantization type for each tensor/layer.
  • Loads quantization rules from a JSON file (e.g., quant_rules.json).
  • For each tensor/layer:
    • Determines its layer order (e.g., blk.27.attn_k_norm → layer 27).
    • Normalizes the layer order (for rules that depend on position in the model).
    • Applies the quantization rules to suggest a new quantization type if needed (a sketch follows this list).
    • Explains the reason for any suggested change.
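
A minimal sketch of that per-tensor pass (the helper names and the precision "ladder" below are illustrative assumptions, not the script's actual internals):

import re

# Illustrative precision ladder, ordered roughly by bits per weight.
# The real script's ordering and type set may differ.
QUANT_LADDER = ["IQ1_S", "IQ2_XXS", "Q2_K", "IQ3_XXS", "Q3_K", "Q4_K", "Q5_K", "Q6_K", "Q8_0"]

def layer_order(tensor_name):
    """Extract the block index: 'blk.27.attn_k_norm' -> 27."""
    m = re.match(r"blk\.(\d+)\.", tensor_name)
    return int(m.group(1)) if m else None

def normalized_order(order, total_layers):
    """Map a block index to [0.0, 1.0] so rules can target early/middle/late layers."""
    return order / max(total_layers - 1, 1)

def bump(current_type, levels):
    """Move a quant type up or down the ladder, clamping at either end."""
    i = QUANT_LADDER.index(current_type)
    return QUANT_LADDER[max(0, min(i + levels, len(QUANT_LADDER) - 1))]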

Outputs:

  • A list of suggested quantization changes (with reasons).
  • A copy-paste ready string of --tensor-type arguments for use with quantization tools.

Usage

Command-Line

python tensor_list_builder.py <gguf_file> <quant_rules.json> <target_type> [--moe]
  • <gguf_file>: Path to the GGUF model file to analyze.
  • <quant_rules.json>: Path to the JSON file with quantization rules.
  • <target_type>: The default quantization type you want to use (e.g., IQ2_XXS, Q4_K, etc.).
  • --moe: (Optional) Indicates the model is a Mixture of Experts (MoE).

Example:

python tensor_list_builder.py ./llama-3-8b-bf16.gguf quant_rules.json IQ2_XXS --moe

Output

Quantization Suggestions:

For each tensor/layer where a different quantization is suggested, it prints the tensor name, its current type, the suggested type, and the rule that triggered the change (a negative level bump moves the tensor toward a lower-precision type):

Tensor: blk.27.attn_k_norm
  Current: Q4_K
  Suggested: IQ2_XXS
  Reason: Bumped from Q4_K by -2 levels for blk.27.attn_k_norm (Layer order bump: -2)

Suggested --tensor-type Arguments:

At the end, it prints a single line you can copy-paste into your quantization command:

--tensor-type blk.27.attn_k_norm=IQ2_XXS --tensor-type blk.28.attn_k_norm=IQ2_XXS ...
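
Assembling that line is essentially a join over the accepted suggestions; a minimal sketch, assuming a suggestions dict mapping tensor names to quant types:

args = " ".join(f"--tensor-type {name}={qtype}" for name, qtype in suggestions.items())
print(args)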

When Should You Use It?

  • Before quantizing a model with llama.cpp, when you want per-tensor quantization for better accuracy/efficiency.
  • When developing or testing new quantization strategies (e.g., ultra-low-bit, MoE-aware quantization).
  • To generate arguments for quantization scripts that support per-tensor quantization.

How Does It Work?

  • Extracts tensor quantization info by calling a helper script (get_gguf_tensor_info.py).
  • Parses quantization rules (supports wildcards, layer order, MoE-specific rules, etc.); a hypothetical example follows this list.
  • Determines if a quantization "bump" (change) should be applied for each tensor.
  • Prints suggestions and generates command-line arguments for use in quantization.
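
The actual schema is whatever your quant_rules.json defines; purely as a hypothetical illustration of the kinds of fields such rules might carry (name patterns, layer-order bumps, MoE flags):

{
  "rules": [
    { "pattern": "blk.*.attn_k_norm", "bump": -2, "reason": "Layer order bump: -2" },
    { "pattern": "*.ffn_down_exps.*", "moe_only": true, "bump": 1 }
  ]
}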

Example Flow

GGUF file → extract tensor info (get_gguf_tensor_info.py) → load quant_rules.json → apply rules per tensor → print suggestions → emit --tensor-type arguments


Integration

  • Used by: make_files.py (for advanced quantization).
  • Output is typically passed to llama-quantize as --tensor-type arguments.
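
A typical end-to-end invocation might look like this (paths and output name are illustrative; per-tensor --tensor-type overrides require a sufficiently recent llama.cpp build):

./llama-quantize --tensor-type blk.27.attn_k_norm=IQ2_XXS --tensor-type blk.28.attn_k_norm=IQ2_XXS ./llama-3-8b-bf16.gguf ./llama-3-8b-IQ2_XXS.gguf IQ2_XXS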

Summary Table

| Step | What it does |
|------|--------------|
| Load GGUF file | Reads tensor names and current quantization types |
| Load rules | Reads quantization rules from JSON |
| Analyze tensors | Applies rules to suggest per-tensor quantization |
| Output suggestions | Prints changes and --tensor-type arguments |
