Betti Extractor Trace - kennetholsenatm-gif/q_mini_wasm_v2 GitHub Wiki
This document traces the complete execution path from the high-level BettiExtractor API down to the SYCL hardware acceleration layer for XPU (GPU/FPGA) execution.
File: q_mini_wasm_v2/core/qgnn/betti_extractor.hpp
class BettiExtractor {
public:
// User-facing API entry point
BettiNumbers compute_betti() const;
// Energy-aware variant
BettiNumbers compute_betti_with_budget(EnergyTrit max_energy) const;
// Core rank calculation (calls StabilizerTableau)
uint32_t calculate_gf3_rank() const;
// ... (truncated)
// See source for complete codeCall Flow:
- User calls
compute_betti() - Calls
calculate_gf3_rank() - Accesses
tableau_->get_element()for matrix data - Uses
gf3_mult_lut_for GF(3) operations
File: q_mini_wasm_v2/core/stabilizer/tableau.hpp
class StabilizerTableau {
public:
// Gate operations used by BettiExtractor
void apply_pauli_x(size_t j);
void apply_pauli_z(size_t j);
// Matrix access for rank calculation
uint8_t get_element(size_t row, size_t col) const noexcept;
uint32_t calculate_gf3_rank() const;
private:
// ... (truncated)
// See source for complete codeImplementation: q_mini_wasm_v2/core/stabilizer/tableau.cpp
uint8_t StabilizerTableau::get_element(size_t row, size_t col) const {
// CSR lookup - O(log E) per access
const size_t start = row_ptr_[row];
const size_t end = row_ptr_[row + 1];
for (size_t i = start; i < end; ++i) {
if (col_idx_[i] == col) return values_[i];
}
return 0;
}
// ... (truncated)
// See source for complete codeFile: q_mini_wasm_v2/core/qgnn/message_passing.hpp
class MessagePassingKernel {
public:
// Discrete unitary edge operator
struct TernaryTreeEdge {
size_t source_node;
size_t target_node;
int8_t weight; // GF(3): {-1, 0, +1}
};
// Forward pass through graph
std::span<const stabilizer::StabilizerTableau> forward_pass(
// ... (truncated)
// See source for complete codeFile: q_mini_wasm_v2/core/qgnn/message_passing_sycl.hpp
class MessagePassingSycl {
public:
explicit MessagePassingSycl(sycl::queue& queue, memory::MemoryArena& arena);
// GPU/FPGA accelerated forward pass
std::span<const stabilizer::StabilizerTableau> forward_pass(
std::span<const int8_t> graph_adjacency,
std::span<const stabilizer::StabilizerTableau> node_states
) noexcept;
// Batch symplectic attention on XPU
// ... (truncated)
// See source for complete codeFile: q_mini_wasm_v2/core/qgnn/message_passing_sycl.cpp
void MessagePassingSycl::launch_message_passing_kernel(
const int8_t* adjacency,
const stabilizer::StabilizerTableau* node_states,
stabilizer::StabilizerTableau* output_states,
size_t node_count
) noexcept {
queue_.submit([&](sycl::handler& cgh) {
// Allocate local (shared) memory for L1 cache
sycl::local_accessor<stabilizer::StabilizerTableau, 1> local_nodes(
sycl::range<1>(LOCAL_MEM_NODES), cgh
);
// ... (truncated)
// See source for complete codeMemory Arena (Unified Memory):
// File: q_mini_wasm_v2/core/memory/arena.hpp
class MemoryArena {
// USM (Unified Shared Memory) for zero-copy GPU/CPU access
void* allocate_usm(size_t size); // SYCL malloc_device/malloc_shared
// Trit packing for 5 trits/byte
TritPack5* allocate_trits(size_t count);
};SYCL Queue Management:
// SYCL queue targets GPU, FPGA, or host
sycl::queue queue{sycl::default_selector_v};
// Possible backends:
// - Intel GPU (Level Zero)
// - NVIDIA GPU (CUDA backend)
// - AMD GPU (ROCm backend)
// - FPGA (Intel oneAPI)
// - Host (fallback)User Application
โ
BettiExtractor::compute_betti() [betti_extractor.cpp:138]
โ
BettiExtractor::calculate_gf3_rank() [betti_extractor.cpp:190]
โ
tableau_->get_element(row, col) [tableau.cpp:54] // CSR lookup
โ (optional acceleration path)
MessagePassingKernel::forward_pass() [message_passing.hpp:44]
โ
MessagePassingSycl::forward_pass() [message_passing_sycl.cpp:11]
// ... (truncated)
// See source for complete code
- Input: SimplicialComplex (edges as TritPack5, vertices, faces)
- BettiExtractor converts to StabilizerTableau (CSR sparse matrix)
- Tableau stores GF(3) values {0,1,2} in compressed format
- Rank calculation uses LUT-based GF(3) arithmetic (<0.5 pJ/op)
-
Optional SYCL path batches operations on XPU:
- Node states copied to local memory (L1)
- Symplectic attention computed in parallel
- apply_csum updates via XPU ALUs
- Output: BettiNumbers (ฮฒโ, ฮฒโ, ฮฒโ)
| Layer | Complexity | Energy/Op | Implementation |
|---|---|---|---|
| BettiExtractor API | O(nยฒ) | <0.5 pJ | C++ LUT-based |
| StabilizerTableau | O(E) storage | ~1.5 pJ | CSR sparse |
| MessagePassingKernel | O(NยทE) | ~2.0 pJ | CPU loop |
| MessagePassingSycl | O(NยทE/work_group) | ~0.3 pJ | XPU parallel |
| SYCL Kernel | O(1) per thread | ~0.1 pJ | Hardware ALU |
All layers maintain GF(3) constraints:
- โ No floating-point operations
- โ All values in {0, 1, 2} (or {-1, 0, +1})
- โ
Modular arithmetic:
value % 3 - โ
LUT-based ops:
gf3_add_lut_[a*3+b] - โ Gottesman-Knill: Only H, S, CSUM, Pauli X/Z gates
Trace Complete: BettiExtractor โ SYCL XPU Layer