BitNet Core - ocentra/bitnet.rs GitHub Wiki
BitNet Core (bitnet-core)
A pure Rust, streaming-friendly core engine for BitNet models, focused on high-performance inference, quantization, and kernel dispatch. It includes all performance-critical logic, model definitions, and backend implementations for both CPU and GPU (WGSL).
Table of Contents
- Purpose
- Main Modules
- Architecture
- How to Use
- Features
- Kernel & Quantization
- Test Coverage
- Implementation Notes
Purpose
- Serve as the backend engine for BitNet inference (and planned training)
- Provide modular, extensible components for model architecture, quantization, and kernel dispatch
- Support both CPU (SIMD) and GPU (WGSL) backends
- Enable streaming-friendly, per-block model loading and execution
Main Modules
- model.rs: Pure Rust Transformer model architecture (no burn dependency)
- attention.rs, feed_forward.rs, rms_norm.rs: Core model submodules (pure Rust)
- bitnet_linear.rs: BitLinear quantized layer, packing, and quantization utilities
- kernels/: CPU/GPU kernel implementations (WGSL, SIMD)
- settings.rs: Inference and generation settings
- embedding.rs: Embedding layer
- tokenizer.rs: Tokenizer and chat template logic
- error.rs: Error types and handling
- gui/: (Optional) Core-level visualization and debugging UI for developers (feature-gated)
- training.rs, visualization.rs: (Planned) Training and logging/metrics hooks
Architecture
- Pure Rust, burn-free: All core logic is implemented in Rust, with no dependency on the burn framework for inference
- Streaming-friendly: Model weights are loaded per-block, supporting large models and efficient memory usage
- Quantized & packed: Uses ternary quantization and efficient packing for weights and activations
- GPU kernel integration: Includes WGSL kernels for high-performance inference on modern GPUs
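To make the streaming point concrete, here is a minimal sketch of per-block loading. It is not taken from the crate: the `BlockReader` type, the `run_streaming` function, and the length-prefixed block layout are all assumptions chosen for illustration. The idea shown is only that each block is read, used, and dropped before the next one is loaded, so peak memory tracks the largest block rather than the whole model.

```rust
use std::io::{self, Read};

/// Hypothetical on-disk layout (an assumption, not the crate's format):
/// a sequence of blocks, each a little-endian u32 byte length followed
/// by that many payload bytes.
pub struct BlockReader<R: Read> {
    inner: R,
}

impl<R: Read> BlockReader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner }
    }
}

impl<R: Read> Iterator for BlockReader<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut len_buf = [0u8; 4];
        match self.inner.read_exact(&mut len_buf) {
            Ok(()) => {}
            // End of stream. For brevity, a truncated header is also
            // treated as the end rather than reported as an error.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return None,
            Err(e) => return Some(Err(e)),
        }
        let len = u32::from_le_bytes(len_buf) as usize;
        let mut block = vec![0u8; len];
        // A truncated payload is reported as an error.
        Some(self.inner.read_exact(&mut block).map(|()| block))
    }
}

/// Calls `apply` on each block in order and returns the block count.
/// Each block is dropped before the next one is read.
pub fn run_streaming<R: Read>(
    reader: R,
    mut apply: impl FnMut(usize, &[u8]),
) -> io::Result<usize> {
    let mut count = 0;
    for block in BlockReader::new(reader) {
        let block = block?;
        apply(count, &block);
        count += 1;
    }
    Ok(count)
}
```

In a real model loader, `apply` would decode the block into layer weights and run that layer; here it is left as a caller-supplied closure.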
How to Use
Add to your Cargo.toml:
bitnet-core = { path = "../bitnet-core" }
Then in your code:
use bitnet_core::model::Transformer;
// ...
Features
- Modular, extensible design
- Optional GPU and core-gui features (feature flags)
- Designed for correctness, performance, and portability
- Streaming-friendly model loading and execution
- Robust error handling and test coverage
Kernel & Quantization
- WGSL GPU kernel: See src/kernels/bitnet_kernel.wgsl for the main ternary matmul kernel
- Packing utilities: See src/kernels.rs for pure Rust packing and scale calculation
- Quantization: Scalar and SIMD quantization utilities for activations and weights
- Tested against scalar reference: All kernels are validated against pure Rust reference implementations
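The sketch below illustrates ternary quantization, 2-bit packing, and a scalar reference matvec of the kind a GPU kernel could be checked against. It does not reproduce the code in src/kernels.rs. The function names, the absmean scale, and the bit encoding (-1 as 0b00, 0 as 0b01, +1 as 0b10, sixteen values per u32) are all assumptions made for illustration, and the crate's actual layout may differ.

```rust
/// Hypothetical helper: quantize f32 weights to ternary {-1, 0, +1}.
/// The scale is the mean absolute value (an assumed "absmean" scheme).
pub fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    let n = weights.len().max(1) as f32;
    let scale = (weights.iter().map(|w| w.abs()).sum::<f32>() / n).max(1e-8);
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Pack ternary values at 2 bits each, 16 per u32 (assumed encoding:
/// stored code = value + 1, lowest bits first).
pub fn pack_ternary(values: &[i8]) -> Vec<u32> {
    values
        .chunks(16)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u32, |acc, (i, &v)| acc | (((v + 1) as u32) << (2 * i)))
        })
        .collect()
}

/// Inverse of `pack_ternary`; `len` is the number of original values.
pub fn unpack_ternary(packed: &[u32], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| ((packed[i / 16] >> (2 * (i % 16))) & 0b11) as i8 - 1)
        .collect()
}

/// Scalar reference: y[r] = scale * sum over c of w[r][c] * x[c],
/// with row-major ternary weights. Ternary weights need no multiplies,
/// only adds and subtracts.
pub fn ternary_matvec_ref(
    w: &[i8],
    rows: usize,
    cols: usize,
    x: &[f32],
    scale: f32,
) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            let acc: f32 = w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(&wv, &xv)| match wv {
                    1 => xv,
                    -1 => -xv,
                    _ => 0.0,
                })
                .sum();
            acc * scale
        })
        .collect()
}
```

A validation test in this style would run the optimized kernel and `ternary_matvec_ref` on the same unpacked weights and compare the outputs within a small tolerance.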
Test Coverage
- Unit tests for packing, quantization, and kernel correctness
- Direct wgpu kernel launch tests (no burn dependency)
- End-to-end model pipeline validation (see tests/pipeline_validation.rs)
- Streaming and per-block model loading tests
- Optional Stress Test: A long-running stress test (stress_test_maximum_dimension_support) is available but ignored by default. To run it, set the RUN_STRESS_TESTS environment variable:
  - PowerShell: $env:RUN_STRESS_TESTS="1"; cargo test --package bitnet-core --test kernel_tests -- --nocapture
  - Linux/macOS: RUN_STRESS_TESTS=1 cargo test --package bitnet-core --test kernel_tests -- --nocapture
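One common way to gate a long-running test on an environment variable is sketched below. This is an assumption about the mechanism, not the crate's actual test code: the `stress_enabled` helper is hypothetical, and treating an empty value or "0" as disabled is an illustrative choice.

```rust
/// Hypothetical helper: decide whether stress tests should run, given the
/// raw value of RUN_STRESS_TESTS (None when the variable is unset).
/// Assumed policy: any non-empty value other than "0" enables them.
pub fn stress_enabled(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

#[test]
fn stress_test_sketch() {
    let value = std::env::var("RUN_STRESS_TESTS").ok();
    if !stress_enabled(value.as_deref()) {
        // Return early so the default `cargo test` run stays fast.
        eprintln!("skipping stress test: set RUN_STRESS_TESTS=1 to run it");
        return;
    }
    // The long-running body would go here.
}
```

Keeping the decision in a pure function makes the gating logic itself easy to unit test without touching the process environment.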
Implementation Notes
- See the project plan for architecture and validation strategies
- Use feature flags to enable GPU or core-gui modules
- For kernel and quantization details, see code comments in src/kernels.rs and src/kernels/bitnet_kernel.wgsl
For questions or contributions, see the main project README or open an issue.