BitNet Converter (`bitnet-converter`)

A robust, streaming-friendly Rust tool for converting standard model weights (e.g., Hugging Face safetensors) into the optimized, quantized format required by the BitNet engine.

Purpose
Features
Conversion Pipeline
CLI Usage
Output Format
Error Handling
Parallelism & Performance
Test Coverage
Implementation Notes

Purpose

Convert model weights from common formats (e.g., Hugging Face safetensors) to BitNet's custom, quantized, streaming-friendly format
Apply quantization, permutation, packing, and interleaving steps
Ensure compatibility and performance for BitNet inference

Features

Burn-free, pure Rust: No dependency on the burn framework for conversion
Streaming output: Each model block and top-level module is saved as a separate file for efficient loading
Parallelized: Layer conversion and file writing are parallelized for speed
Robust loader: Handles 1D/2D tensors, shape promotion, and errors gracefully
Comprehensive error handling: Clear, actionable errors for missing tensors, shape mismatches, and more
Extensive tests: Serialization, streaming, loader, and error cases are all covered

Conversion Pipeline

Load config and safetensors from disk
Parse tensors using a minimal, robust loader (BF16 → f32, shape promotion)
Quantize and pack weights using SIMD-optimized and fallback routines
Structure the model into serializable records (embedding, blocks, norms, lm_head, metadata)
Serialize each part to a separate .bin file in the output directory
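
The parse step above (BF16 → f32, shape promotion) can be sketched in plain Rust. This is a minimal illustration, not the converter's loader: the function names are hypothetical, and promoting a 1D shape `[n]` to `[1, n]` is an assumption about what "shape promotion" means here. The bit manipulation itself is standard: a BF16 value is the upper 16 bits of an IEEE 754 f32, and safetensors stores tensor data little-endian.

```rust
/// Widen one bfloat16 value (given as raw bits) to f32.
/// BF16 is the upper half of an f32, so the conversion is a 16-bit shift.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian BF16 byte buffer to f32 values.
/// Returns None when the buffer length is odd (truncated or corrupt data).
fn decode_bf16_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| bf16_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
            .collect(),
    )
}

/// Assumed promotion rule: 1D [n] becomes [1, n], 2D passes through,
/// anything else is reported as unsupported.
fn promote_shape(shape: &[usize]) -> Option<[usize; 2]> {
    match shape {
        [n] => Some([1, *n]),
        [rows, cols] => Some([*rows, *cols]),
        _ => None,
    }
}

fn main() {
    // 0x3F80 encodes 1.0 and 0xC000 encodes -2.0 in BF16.
    let raw = [0x80, 0x3F, 0x00, 0xC0];
    let values = decode_bf16_le(&raw).expect("even-length buffer");
    println!("{:?} with shape {:?}", values, promote_shape(&[values.len()]));
}
```

Returning `Option` keeps the sketch short; a real loader would more likely map these cases onto its own error type.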

CLI Usage

cargo run -p bitnet-converter -- [--input-dir <input_dir>] [--output-dir <output_dir>]

Both flags are optional. If a flag is omitted, the converter falls back to its default: the official BitNet-2B model subdirectory under the workspace's models folder.
Logs are written to logs/bitnet-converter-<timestamp>.txt.

Example:

cargo run -p bitnet-converter -- --input-dir models/Original/microsoft/bitnet-b1.58-2B-4T-bf16 --output-dir models/Converted/microsoft/bitnet-b1.58-2B-4T-bf16

Output Format

Per-block files: Each transformer block is saved as block_<i>.bin
Top-level files: embedding.bin, norm.bin, lm_head.bin
Metadata: Each output includes a ModelMetadata struct (layer count, vocab size, hidden size, timestamp)
Format: All files are serialized using bincode for fast, compact storage
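
As a rough illustration of the layout above, the following sketch lists the files a run would be expected to produce. The helper name `planned_output_files` is hypothetical, and numbering blocks from 0 is an assumption; only the file names themselves come from this page. A loader could use a list like this to check that a converted directory is complete before streaming blocks from it.

```rust
use std::path::{Path, PathBuf};

/// List the expected output files for a model with `num_blocks` transformer
/// blocks: three top-level files plus one file per block.
/// Assumption: block indices start at 0.
fn planned_output_files(output_dir: &Path, num_blocks: usize) -> Vec<PathBuf> {
    let mut files: Vec<PathBuf> = ["embedding.bin", "norm.bin", "lm_head.bin"]
        .iter()
        .map(|name| output_dir.join(name))
        .collect();
    files.extend((0..num_blocks).map(|i| output_dir.join(format!("block_{i}.bin"))));
    files
}

fn main() {
    // The directory name is only an example.
    for path in planned_output_files(Path::new("models/Converted/example"), 2) {
        println!("{}", path.display());
    }
}
```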

Error Handling

Uses a custom ConversionError enum for clear, actionable errors
Handles missing tensors, shape mismatches, and invalid data robustly
Loader gracefully skips unsupported shapes and corrupt files
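
An error type along these lines might look like the sketch below. The variant names, fields, messages, and tensor names are illustrative assumptions; the real ConversionError in the crate may be shaped differently. The point is that each failure case carries enough context (tensor name, expected and found shapes) to be actionable.

```rust
use std::fmt;

/// Hypothetical error type; the variants mirror the failure cases listed above.
#[derive(Debug, PartialEq)]
enum ConversionError {
    MissingTensor(String),
    ShapeMismatch { name: String, expected: Vec<usize>, found: Vec<usize> },
    InvalidData(String),
}

impl fmt::Display for ConversionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConversionError::MissingTensor(name) => write!(f, "missing tensor `{name}`"),
            ConversionError::ShapeMismatch { name, expected, found } => write!(
                f,
                "tensor `{name}` has shape {found:?}, expected {expected:?}"
            ),
            ConversionError::InvalidData(reason) => write!(f, "invalid data: {reason}"),
        }
    }
}

impl std::error::Error for ConversionError {}

/// Compare a tensor's shape against the expected one.
fn check_shape(name: &str, expected: &[usize], found: &[usize]) -> Result<(), ConversionError> {
    if expected == found {
        Ok(())
    } else {
        Err(ConversionError::ShapeMismatch {
            name: name.to_string(),
            expected: expected.to_vec(),
            found: found.to_vec(),
        })
    }
}

fn main() {
    // Tensor names here are placeholders, not taken from the converter.
    let errors = [
        ConversionError::MissingTensor("model.norm.weight".into()),
        ConversionError::InvalidData("byte length is not a multiple of 2".into()),
    ];
    for err in &errors {
        eprintln!("{err}");
    }
    if let Err(err) = check_shape("lm_head.weight", &[4, 8], &[8, 4]) {
        eprintln!("{err}");
    }
}
```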

Parallelism & Performance

Uses rayon for parallel layer processing and file writing
SIMD quantization is used where available, with fallback for other platforms
Loader is minimal and efficient
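
The idea of converting layers in parallel can be shown with the standard library alone. The sketch below uses `std::thread::scope` as a stand-in for rayon so that it runs without extra dependencies; `convert_layer` is a hypothetical placeholder for the real per-layer work (quantize, pack, serialize), and spawning one thread per layer is a simplification that a work-stealing pool avoids.

```rust
use std::thread;

/// Placeholder for per-layer work: counts the non-zero weights.
fn convert_layer(weights: &[f32]) -> usize {
    weights.iter().filter(|w| **w != 0.0).count()
}

/// Convert every layer on its own scoped thread.
/// Results come back in layer order because handles are joined in order.
fn convert_all(layers: &[Vec<f32>]) -> Vec<usize> {
    thread::scope(|scope| {
        let handles: Vec<_> = layers
            .iter()
            .map(|layer| scope.spawn(move || convert_layer(layer)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("layer worker panicked"))
            .collect()
    })
}

fn main() {
    let layers = vec![vec![0.5, 0.0, -1.0], vec![0.0, 0.0], vec![2.0]];
    println!("{:?}", convert_all(&layers));
}
```

With rayon, the body of `convert_all` would typically shrink to a `par_iter().map(...).collect()` chain, which also preserves input order.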

Test Coverage

Serialization/deserialization of full models and individual blocks
Streaming (per-block) output and reassembly
Loader correctness for 1D/2D tensors, error cases, and shape promotion
Error handling for corrupt files and unsupported shapes

Implementation Notes

Designed for extensibility to support new formats and quantization schemes
See the project plan for details on the conversion pipeline
For more details on the packing format and quantization, see code comments in src/packer.rs
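
For orientation, here is a minimal scalar sketch of ternary quantization and 2-bit packing. It rests on several assumptions: absmean scaling as described for BitNet b1.58, a code of `weight + 1` per value, and four values per byte with the lowest bits first. The authoritative encoding, including any permutation and interleaving, is the one documented in src/packer.rs and may differ from this. All function names are hypothetical.

```rust
/// Quantize weights to {-1, 0, 1} using absmean scaling (assumed scheme).
/// Returns the ternary values and the scale needed to dequantize.
fn quantize_absmean(weights: &[f32]) -> (Vec<i8>, f32) {
    let mean_abs = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len().max(1) as f32;
    let scale = mean_abs.max(f32::EPSILON); // guard against division by zero
    let ternary = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (ternary, scale)
}

/// Pack ternary weights, four per byte, two bits each (assumed layout).
/// Any value outside {-1, 0, 1} is rejected with an error.
fn pack_ternary(weights: &[i8]) -> Result<Vec<u8>, String> {
    let mut packed = vec![0u8; (weights.len() + 3) / 4];
    for (i, &w) in weights.iter().enumerate() {
        if !(-1..=1).contains(&w) {
            return Err(format!("invalid ternary weight {w} at index {i}"));
        }
        let code = (w + 1) as u8; // -1 -> 0, 0 -> 1, 1 -> 2
        packed[i / 4] |= code << ((i % 4) * 2);
    }
    Ok(packed)
}

/// Decode the first `count` weights; `count` must not exceed `packed.len() * 4`.
/// Padding slots in the final byte are ignored because of `count`.
fn unpack_ternary(packed: &[u8], count: usize) -> Vec<i8> {
    (0..count)
        .map(|i| ((packed[i / 4] >> ((i % 4) * 2)) & 0b11) as i8 - 1)
        .collect()
}

fn main() {
    let (ternary, scale) = quantize_absmean(&[0.9, -1.1, 0.05, 2.0]);
    println!("ternary = {ternary:?}, scale = {scale}");

    let original: [i8; 16] = [-1, 0, 1, 0, 1, 1, 0, -1, -1, -1, 0, 0, 1, 1, 0, 1];
    let packed = pack_ternary(&original).expect("all values are ternary");
    assert_eq!(unpack_ternary(&packed, original.len()), original);
    println!("{} weights packed into {} bytes", original.len(), packed.len());
}
```

The round trip in `main` follows the same pattern as a packing-decoding symmetry check: decode what was packed and compare it with the original weights.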

For questions or contributions, see the main project README or open an issue.

KERNEL_TESTS Test Report

Generated on: 2025-06-24 13:38:14

Test Results

No.	Test Name	Status	Time Taken
1	Cross Device Consistency Test	✅ Pass	24.69 sec
2	Kernel All Minus One Weights Test	✅ Pass	349.00 ms
3	Kernel All Minus One Weights Test Warm	✅ Pass	3.00 ms
4	Kernel All Plus One Weights Test	✅ Pass	358.00 ms
5	Kernel All Plus One Weights Test Warm	✅ Pass	3.00 ms
6	Kernel All Zero Test	✅ Pass	343.00 ms
7	Kernel All Zero Test Warm	✅ Pass	3.00 ms
8	Kernel Large Batch Test	✅ Pass	351.00 ms
9	Kernel Large Batch Test Warm	✅ Pass	3.00 ms
10	Kernel Non Divisible Batch Test	✅ Pass	356.00 ms
11	Kernel Non Divisible Batch Test Warm	✅ Pass	3.00 ms
12	Low Level Kernel Correctness Test	✅ Pass	370.00 ms
13	Low Level Kernel Correctness Test Warm	✅ Pass	5.00 ms
14	Memory Safety Buffer Overflow Test	✅ Pass	368.00 ms
15	Memory Safety Buffer Overflow Test Warm	✅ Pass	0.00 ms
16	Memory Safety Hardcoded Large Allocation Test Warm	✅ Pass	0.00 ms
17	Performance Benchmark GPU Vs Scalar	✅ Pass	521.00 ms
18	Performance Benchmark GPU Vs Scalar Warm	✅ Pass	173.00 ms
19	Precision Test Fp Edge Cases	✅ Pass	363.00 ms
20	Precision Test Fp Edge Cases Warm	✅ Pass	2.00 ms
21	Streaming Load Test	✅ Pass	417.00 ms
22	Streaming Load Test Warm	✅ Pass	16.00 ms
23	Stress Test Maximum Dimension Support Warm	✅ Pass	1.92 sec
24	Basic GPU Buffer Operations	✅ Pass	568.00 ms
25	Bitlinear Layer Forward Pass	✅ Pass	698.00 ms
26	Bitlinear Layer Forward Pass Warm	✅ Pass	344.00 ms
27	GPU Kernel Dimensions	✅ Pass	355.00 ms
28	GPU Kernel Dimensions Warm	✅ Pass	1.00 ms
29	Matmul Quantized Scalar	✅ Pass	2.00 ms
30	Matmul Quantized Scalar Warm	✅ Pass	0.00 ms
31	Scalar Packing Decoding Symmetry	✅ Pass	0.00 ms
32	Unit Test Calculate Weight Scales	✅ Pass	1.00 ms
33	Unit Test Calculate Weight Scales Warm	✅ Pass	0.00 ms
34	Unit Test Pack Ternary Weights	✅ Pass	2.00 ms
35	Unit Test Pack Ternary Weights Warm	✅ Pass	0.00 ms

Full Log Dump

[2025-06-24, 13:37:38.618] -> Running unit_test_pack_ternary_weights...
[2025-06-24, 13:37:38.620] -> unit_test_pack_ternary_weights passed.
[2025-06-24, 13:37:38.622] -> Running unit_test_calculate_weight_scales...
[2025-06-24, 13:37:38.624] -> unit_test_calculate_weight_scales passed.
[2025-06-24, 13:37:38.625] -> Starting test_matmul_quantized_scalar...
[2025-06-24, 13:37:38.627] -> test_matmul_quantized_scalar passed.
[2025-06-24, 13:37:38.630] -> Testing basic GPU operations...
[2025-06-24, 13:37:39.196] -> Test data: [1.0, 2.0, 3.0, 4.0]
[2025-06-24, 13:37:39.198] -> Read-back data: [1.0, 2.0, 3.0, 4.0]
[2025-06-24, 13:37:39.198] -> Basic GPU operations test passed!
[2025-06-24, 13:37:39.593] -> Running correctness logic with dims: batch=4, in=16, out=8
[2025-06-24, 13:37:39.595] -> [Profile] Buffer Setup: 2.09ms
[2025-06-24, 13:37:39.596] -> [Profile] Bind Group Setup: 622.00µs
[2025-06-24, 13:37:39.597] -> [Profile] Dispatch & Submit: 628.70µs
[2025-06-24, 13:37:39.597] -> [Profile] Readback (map/poll/copy): 175.70µs
[2025-06-24, 13:37:39.597] -> [Profile] Total launch_gpu_kernel Time: 4.13ms
[2025-06-24, 13:37:39.598] -> Correctness test comparison: GPU[..4]=[2.855981, 1.343083, -0.007718868, -2.215315], Scalar[..4]=[2.855981, 1.343083, -0.007718868, -2.215315]
[2025-06-24, 13:37:39.598] -> low_level_kernel_correctness_test passed.
[2025-06-24, 13:38:09.617] -> Running correctness logic with dims: batch=4, in=16, out=8
[2025-06-24, 13:38:09.620] -> [Profile] Buffer Setup: 2.88ms
[2025-06-24, 13:38:09.620] -> [Profile] Bind Group Setup: 168.80µs
[2025-06-24, 13:38:09.621] -> [Profile] Dispatch & Submit: 1.04ms
[2025-06-24, 13:38:09.622] -> [Profile] Readback (map/poll/copy): 100.40µs
[2025-06-24, 13:38:09.622] -> [Profile] Total launch_gpu_kernel Time: 4.93ms
[2025-06-24, 13:38:09.622] -> Correctness test comparison: GPU[..4]=[2.855981, 1.343083, -0.007718868, -2.215315], Scalar[..4]=[2.855981, 1.343083, -0.007718868, -2.215315]
[2025-06-24, 13:37:39.601] -> Running test_gpu_kernel_dimensions...
[2025-06-24, 13:37:39.954] -> [Profile] Buffer Setup: 1.39ms
[2025-06-24, 13:37:39.954] -> [Profile] Bind Group Setup: 225.00µs
[2025-06-24, 13:37:39.955] -> [Profile] Dispatch & Submit: 763.60µs
[2025-06-24, 13:37:39.955] -> [Profile] Readback (map/poll/copy): 209.70µs
[2025-06-24, 13:37:39.956] -> [Profile] Total launch_gpu_kernel Time: 3.34ms
[2025-06-24, 13:37:39.956] -> test_gpu_kernel_dimensions passed.
[2025-06-24, 13:38:09.623] -> [Profile] Buffer Setup: 184.00µs
[2025-06-24, 13:38:09.623] -> [Profile] Bind Group Setup: 85.80µs
[2025-06-24, 13:38:09.623] -> [Profile] Dispatch & Submit: 285.40µs
[2025-06-24, 13:38:09.624] -> [Profile] Readback (map/poll/copy): 167.80µs
[2025-06-24, 13:38:09.624] -> [Profile] Total launch_gpu_kernel Time: 1.25ms
[2025-06-24, 13:37:39.959] -> Running kernel_large_batch_test...
[2025-06-24, 13:37:40.308] -> [Profile] Buffer Setup: 1.38ms
[2025-06-24, 13:37:40.308] -> [Profile] Bind Group Setup: 157.00µs
[2025-06-24, 13:37:40.309] -> [Profile] Dispatch & Submit: 852.20µs
[2025-06-24, 13:37:40.309] -> [Profile] Readback (map/poll/copy): 194.40µs
[2025-06-24, 13:37:40.310] -> [Profile] Total launch_gpu_kernel Time: 3.27ms
[2025-06-24, 13:37:40.310] -> kernel_large_batch_test passed.
[2025-06-24, 13:38:09.626] -> [Profile] Buffer Setup: 845.90µs
[2025-06-24, 13:38:09.627] -> [Profile] Bind Group Setup: 58.40µs
[2025-06-24, 13:38:09.627] -> [Profile] Dispatch & Submit: 247.00µs
[2025-06-24, 13:38:09.627] -> [Profile] Readback (map/poll/copy): 73.50µs
[2025-06-24, 13:38:09.627] -> [Profile] Total launch_gpu_kernel Time: 1.83ms
[2025-06-24, 13:37:40.313] -> Running kernel_all_zero_test...
[2025-06-24, 13:37:40.654] -> [Profile] Buffer Setup: 1.45ms
[2025-06-24, 13:37:40.654] -> [Profile] Bind Group Setup: 172.50µs
[2025-06-24, 13:37:40.655] -> [Profile] Dispatch & Submit: 665.40µs
[2025-06-24, 13:37:40.656] -> [Profile] Readback (map/poll/copy): 108.00µs
[2025-06-24, 13:37:40.656] -> [Profile] Total launch_gpu_kernel Time: 3.23ms
[2025-06-24, 13:37:40.657] -> kernel_all_zero_test passed.
[2025-06-24, 13:38:09.630] -> [Profile] Buffer Setup: 991.30µs
[2025-06-24, 13:38:09.630] -> [Profile] Bind Group Setup: 63.50µs
[2025-06-24, 13:38:09.631] -> [Profile] Dispatch & Submit: 416.60µs
[2025-06-24, 13:38:09.631] -> [Profile] Readback (map/poll/copy): 136.90µs
[2025-06-24, 13:38:09.631] -> [Profile] Total launch_gpu_kernel Time: 2.15ms
[2025-06-24, 13:37:40.660] -> Running kernel_all_plus_one_weights_test...
[2025-06-24, 13:37:41.015] -> [Profile] Buffer Setup: 1.40ms
[2025-06-24, 13:37:41.016] -> [Profile] Bind Group Setup: 156.40µs
[2025-06-24, 13:37:41.016] -> [Profile] Dispatch & Submit: 579.50µs
[2025-06-24, 13:37:41.017] -> [Profile] Readback (map/poll/copy): 219.50µs
[2025-06-24, 13:37:41.017] -> [Profile] Total launch_gpu_kernel Time: 3.07ms
[2025-06-24, 13:37:41.017] -> kernel_all_plus_one_weights_test passed.
[2025-06-24, 13:38:09.633] -> [Profile] Buffer Setup: 866.60µs
[2025-06-24, 13:38:09.633] -> [Profile] Bind Group Setup: 111.70µs
[2025-06-24, 13:38:09.634] -> [Profile] Dispatch & Submit: 316.60µs
[2025-06-24, 13:38:09.634] -> [Profile] Readback (map/poll/copy): 84.20µs
[2025-06-24, 13:38:09.634] -> [Profile] Total launch_gpu_kernel Time: 2.01ms
[2025-06-24, 13:37:41.020] -> Running kernel_all_minus_one_weights_test...
[2025-06-24, 13:37:41.367] -> [Profile] Buffer Setup: 1.49ms
[2025-06-24, 13:37:41.368] -> [Profile] Bind Group Setup: 155.90µs
[2025-06-24, 13:37:41.369] -> [Profile] Dispatch & Submit: 711.80µs
[2025-06-24, 13:37:41.369] -> [Profile] Readback (map/poll/copy): 138.20µs
[2025-06-24, 13:37:41.369] -> [Profile] Total launch_gpu_kernel Time: 3.17ms
[2025-06-24, 13:37:41.370] -> kernel_all_minus_one_weights_test passed.
[2025-06-24, 13:38:09.636] -> [Profile] Buffer Setup: 806.90µs
[2025-06-24, 13:38:09.637] -> [Profile] Bind Group Setup: 178.40µs
[2025-06-24, 13:38:09.637] -> [Profile] Dispatch & Submit: 366.30µs
[2025-06-24, 13:38:09.638] -> [Profile] Readback (map/poll/copy): 280.30µs
[2025-06-24, 13:38:09.638] -> [Profile] Total launch_gpu_kernel Time: 2.26ms
[2025-06-24, 13:37:41.373] -> Running kernel_non_divisible_batch_test...
[2025-06-24, 13:37:41.727] -> [Profile] Buffer Setup: 1.32ms
[2025-06-24, 13:37:41.727] -> [Profile] Bind Group Setup: 151.60µs
[2025-06-24, 13:37:41.728] -> [Profile] Dispatch & Submit: 607.80µs
[2025-06-24, 13:37:41.728] -> [Profile] Readback (map/poll/copy): 156.40µs
[2025-06-24, 13:37:41.728] -> [Profile] Total launch_gpu_kernel Time: 2.95ms
[2025-06-24, 13:37:41.729] -> kernel_non_divisible_batch_test passed.
[2025-06-24, 13:38:09.640] -> [Profile] Buffer Setup: 904.10µs
[2025-06-24, 13:38:09.641] -> [Profile] Bind Group Setup: 78.00µs
[2025-06-24, 13:38:09.641] -> [Profile] Dispatch & Submit: 265.80µs
[2025-06-24, 13:38:09.641] -> [Profile] Readback (map/poll/copy): 77.10µs
[2025-06-24, 13:38:09.641] -> [Profile] Total launch_gpu_kernel Time: 1.91ms
[2025-06-24, 13:37:41.732] -> Running test_bitlinear_layer_forward_pass...
[2025-06-24, 13:37:42.430] -> test_bitlinear_layer_forward_pass passed.
[2025-06-24, 13:37:42.434] -> Running performance_benchmark_gpu_vs_scalar...
[2025-06-24, 13:37:42.955] -> performance_benchmark_gpu_vs_scalar passed.
[2025-06-24, 13:37:43.326] -> Starting cross-device consistency test...
[2025-06-24, 13:37:43.327] -> Calculating scalar reference result...
[2025-06-24, 13:37:43.327] -> Scalar reference calculation complete.
[2025-06-24, 13:37:43.569] -> Found 5 adapters. Running per-device subtests.
[2025-06-24, 13:37:43.569] -> SUBTEST: Running on "NVIDIA GeForce RTX 2070 SUPER" ("Vulkan")
[2025-06-24, 13:37:43.641] -> [Profile] Buffer Setup: 1.27ms
[2025-06-24, 13:37:43.642] -> [Profile] Bind Group Setup: 245.20µs
[2025-06-24, 13:37:43.643] -> [Profile] Dispatch & Submit: 665.40µs
[2025-06-24, 13:37:43.643] -> [Profile] Readback (map/poll/copy): 158.70µs
[2025-06-24, 13:37:43.643] -> [Profile] Total launch_gpu_kernel Time: 3.08ms
[2025-06-24, 13:37:43.669] -> PASS: Kernel correctness on "NVIDIA GeForce RTX 2070 SUPER" ("Vulkan")
[2025-06-24, 13:37:43.670] -> SUBTEST: Running on "NVIDIA GeForce RTX 2070 SUPER" ("Dx12")
[2025-06-24, 13:37:55.731] -> [Profile] Buffer Setup: 17.42ms
[2025-06-24, 13:37:55.731] -> [Profile] Bind Group Setup: 175.40µs
[2025-06-24, 13:37:55.735] -> [Profile] Dispatch & Submit: 3.31ms
[2025-06-24, 13:37:55.737] -> [Profile] Readback (map/poll/copy): 1.76ms
[2025-06-24, 13:37:55.737] -> [Profile] Total launch_gpu_kernel Time: 23.36ms
[2025-06-24, 13:37:55.770] -> PASS: Kernel correctness on "NVIDIA GeForce RTX 2070 SUPER" ("Dx12")
[2025-06-24, 13:37:55.770] -> SUBTEST: Running on "Microsoft Basic Render Driver" ("Dx12")
[2025-06-24, 13:37:55.771] -> SKIPPING: Microsoft Basic Render Driver ("Dx12")
[2025-06-24, 13:37:55.771] -> SUBTEST: Running on "NVIDIA GeForce RTX 2070 SUPER" ("Dx12")
[2025-06-24, 13:38:07.945] -> [Profile] Buffer Setup: 10.57ms
[2025-06-24, 13:38:07.945] -> [Profile] Bind Group Setup: 198.60µs
[2025-06-24, 13:38:07.949] -> [Profile] Dispatch & Submit: 3.25ms
[2025-06-24, 13:38:07.950] -> [Profile] Readback (map/poll/copy): 1.56ms
[2025-06-24, 13:38:07.951] -> [Profile] Total launch_gpu_kernel Time: 16.26ms
[2025-06-24, 13:38:07.987] -> PASS: Kernel correctness on "NVIDIA GeForce RTX 2070 SUPER" ("Dx12")
[2025-06-24, 13:38:07.987] -> SUBTEST: Running on "NVIDIA GeForce RTX 2070 SUPER/PCIe/SSE2" ("OpenGL")
[2025-06-24, 13:38:08.004] -> [Profile] Buffer Setup: 1.82ms
[2025-06-24, 13:38:08.005] -> [Profile] Bind Group Setup: 117.20µs
[2025-06-24, 13:38:08.007] -> [Profile] Dispatch & Submit: 2.21ms
[2025-06-24, 13:38:08.013] -> [Profile] Readback (map/poll/copy): 5.64ms
[2025-06-24, 13:38:08.013] -> [Profile] Total launch_gpu_kernel Time: 10.53ms
[2025-06-24, 13:38:08.015] -> PASS: Kernel correctness on "NVIDIA GeForce RTX 2070 SUPER/PCIe/SSE2" ("OpenGL")
[2025-06-24, 13:38:08.015] -> Cross-device consistency test PASSED on all tested devices.
[2025-06-24, 13:38:08.084] -> Running streaming_load_test...
[2025-06-24, 13:38:08.502] -> streaming_load_test passed.
[2025-06-24, 13:37:42.959] -> Running precision_test_fp_edge_cases...
[2025-06-24, 13:37:43.320] -> [Profile] Buffer Setup: 1.22ms
[2025-06-24, 13:37:43.320] -> [Profile] Bind Group Setup: 251.80µs
[2025-06-24, 13:37:43.321] -> [Profile] Dispatch & Submit: 810.50µs
[2025-06-24, 13:37:43.322] -> [Profile] Readback (map/poll/copy): 242.20µs
[2025-06-24, 13:37:43.322] -> [Profile] Total launch_gpu_kernel Time: 3.45ms
[2025-06-24, 13:37:43.322] -> precision_test_fp_edge_cases passed.
[2025-06-24, 13:38:10.161] -> [Profile] Buffer Setup: 771.60µs
[2025-06-24, 13:38:10.161] -> [Profile] Bind Group Setup: 96.90µs
[2025-06-24, 13:38:10.162] -> [Profile] Dispatch & Submit: 321.20µs
[2025-06-24, 13:38:10.162] -> [Profile] Readback (map/poll/copy): 144.40µs
[2025-06-24, 13:38:10.162] -> [Profile] Total launch_gpu_kernel Time: 1.85ms
[2025-06-24, 13:38:08.880] -> Successfully caught invalid weight value (2) as a Result::Err.
[2025-06-24, 13:38:08.881] -> Successfully caught invalid weight value (-2) as a Result::Err.
[2025-06-24, 13:38:08.881] -> Running memory_safety_buffer_overflow_test...
[2025-06-24, 13:38:09.250] -> memory_safety_buffer_overflow_test passed.
[2025-06-24, 13:38:10.180] -> [WARM] Successfully caught expected error: Requested buffer size (268435520 bytes) exceeds device limits.
[2025-06-24, 13:38:08.850] -> WGPU context creation succeeded unexpectedly with impossible limits.
[2025-06-24, 13:38:08.850] -> Requested limits: Limits { max_texture_dimension_1d: 8192, max_texture_dimension_2d: 8192, max_texture_dimension_3d: 2048, max_texture_array_layers: 256, max_bind_groups: 4, max_bindings_per_bind_group: 1000, max_dynamic_uniform_buffers_per_pipeline_layout: 8, max_dynamic_storage_buffers_per_pipeline_layout: 4, max_sampled_textures_per_shader_stage: 16, max_samplers_per_shader_stage: 16, max_storage_buffers_per_shader_stage: 8, max_storage_textures_per_shader_stage: 4, max_uniform_buffers_per_shader_stage: 12, max_uniform_buffer_binding_size: 65536, max_storage_buffer_binding_size: 134217728, max_vertex_buffers: 8, max_buffer_size: 1, max_vertex_attributes: 16, max_vertex_buffer_array_stride: 2048, min_uniform_buffer_offset_alignment: 256, min_storage_buffer_offset_alignment: 256, max_inter_stage_shader_components: 60, max_color_attachments: 8, max_color_attachment_bytes_per_sample: 32, max_compute_workgroup_storage_size: 16384, max_compute_invocations_per_workgroup: 256, max_compute_workgroup_size_x: 256, max_compute_workgroup_size_y: 256, max_compute_workgroup_size_z: 64, max_compute_workgroups_per_dimension: 65535, min_subgroup_size: 0, max_subgroup_size: 0, max_push_constant_size: 0, max_non_sampler_bindings: 1000000 }
[2025-06-24, 13:38:08.851] -> Actual device limits returned: Limits { max_texture_dimension_1d: 8192, max_texture_dimension_2d: 8192, max_texture_dimension_3d: 2048, max_texture_array_layers: 256, max_bind_groups: 4, max_bindings_per_bind_group: 1000, max_dynamic_uniform_buffers_per_pipeline_layout: 8, max_dynamic_storage_buffers_per_pipeline_layout: 4, max_sampled_textures_per_shader_stage: 16, max_samplers_per_shader_stage: 16, max_storage_buffers_per_shader_stage: 8, max_storage_textures_per_shader_stage: 4, max_uniform_buffers_per_shader_stage: 12, max_uniform_buffer_binding_size: 65536, max_storage_buffer_binding_size: 134217728, max_vertex_buffers: 8, max_buffer_size: 1, max_vertex_attributes: 16, max_vertex_buffer_array_stride: 2048, min_uniform_buffer_offset_alignment: 256, min_storage_buffer_offset_alignment: 256, max_inter_stage_shader_components: 60, max_color_attachments: 8, max_color_attachment_bytes_per_sample: 32, max_compute_workgroup_storage_size: 16384, max_compute_invocations_per_workgroup: 256, max_compute_workgroup_size_x: 256, max_compute_workgroup_size_y: 256, max_compute_workgroup_size_z: 64, max_compute_workgroups_per_dimension: 65535, min_subgroup_size: 0, max_subgroup_size: 0, max_push_constant_size: 0, max_non_sampler_bindings: 1000000 }
[2025-06-24, 13:38:10.923] -> [Profile] Buffer Setup: 10.36ms
[2025-06-24, 13:38:10.924] -> [Profile] Bind Group Setup: 189.10µs
[2025-06-24, 13:38:10.925] -> [Profile] Dispatch & Submit: 549.20µs
[2025-06-24, 13:38:11.007] -> [Profile] Readback (map/poll/copy): 82.13ms
[2025-06-24, 13:38:11.007] -> [Profile] Total launch_gpu_kernel Time: 94.23ms
[2025-06-24, 13:38:10.180] -> [WARM] Successfully caught expected error for 10GB allocation: Requested buffer size (10737418240 bytes) exceeds device limits.
[2025-06-24, 13:38:09.252] -> Testing scalar packing-decoding symmetry...
[2025-06-24, 13:38:09.252] -> Original weights:  [-1, 0, 1, 0, 1, 1, 0, -1, -1, -1, 0, 0, 1, 1, 0, 1]
[2025-06-24, 13:38:09.253] -> Decoded weights:   [-1, 0, 1, 0, 1, 1, 0, -1, -1, -1, 0, 0, 1, 1, 0, 1]
[2025-06-24, 13:38:09.253] -> Scalar packing-decoding symmetry test passed.
[2025-06-24, 13:37:38.617] -> STARTING KERNEL TEST SUITE
[2025-06-24, 13:37:38.618] -> --- STARTING COLD RUN (INDIVIDUAL TESTS) ---
[2025-06-24, 13:38:09.253] -> --- STARTING WARM RUN (SHARED CONTEXT) ---
[2025-06-24, 13:38:09.616] -> [WARM] unit_test_pack_ternary_weights passed.
[2025-06-24, 13:38:09.622] -> [WARM] low_level_kernel_correctness_test passed.
[2025-06-24, 13:38:09.624] -> [WARM] test_gpu_kernel_dimensions passed.
[2025-06-24, 13:38:09.628] -> [WARM] kernel_large_batch_test passed.
[2025-06-24, 13:37:40.656] -> kernel_all_zero_test passed.
[2025-06-24, 13:38:09.632] -> [WARM] kernel_all_zero_test passed.
[2025-06-24, 13:38:09.616] -> [WARM] unit_test_calculate_weight_scales passed.
[2025-06-24, 13:38:09.617] -> [WARM] test_matmul_quantized_scalar passed.
[2025-06-24, 13:37:41.017] -> kernel_all_plus_one_weights_test passed.
[2025-06-24, 13:38:09.635] -> [WARM] kernel_all_plus_one_weights_test passed.
[2025-06-24, 13:37:41.370] -> kernel_all_minus_one_weights_test passed.
[2025-06-24, 13:38:09.638] -> [WARM] kernel_all_minus_one_weights_test passed.
[2025-06-24, 13:37:41.729] -> kernel_non_divisible_batch_test passed.
[2025-06-24, 13:38:09.642] -> [WARM] kernel_non_divisible_batch_test passed.
[2025-06-24, 13:38:09.987] -> [WARM] test_bitlinear_layer_forward_pass passed.
[2025-06-24, 13:38:10.160] -> [WARM] Performance Benchmark (100 iterations, 64 batch, 32 in, 16 out):
  GPU (Wall Time):    Avg: 1.602ms    | Total: 160.235ms 
  Scalar (CPU Time):  Avg: 112.948µs  | Total: 11.295ms  
Speedup (Wall vs Scalar):   0.07x
[2025-06-24, 13:38:10.160] -> [WARM] performance_benchmark_gpu_vs_scalar passed.
[2025-06-24, 13:38:10.163] -> [WARM] precision_test_fp_edge_cases passed.
[2025-06-24, 13:38:10.179] -> [WARM] Streaming Load Test (10 streams): Avg Latency: 1.558ms
[2025-06-24, 13:38:12.102] -> [WARM] stress_test_maximum_dimension_support passed.

Summary

Test Statistics

Total Tests: 35
Passed: 35
Failed: 0

Timing Information

Total Time: 32.60 sec
Average Time: 931.00 ms

Status

✅ All tests passed successfully!

Report generated by BitNet Test Framework
