Challenge #25
Overview
In this portion of the project, the focus shifts away from standalone hardware accelerators and toward the integration of a functional communication interface between hardware and software systems. Specifically, this week's task centers on designing, simulating, and benchmarking a Serial Peripheral Interface (SPI) link that connects Python-based software drivers with Verilog-based hardware accelerators. The objective is to construct a complete SPI protocol stack, whether from existing IP, from scratch, or via AI-assisted design prompts. The interface is then co-simulated using cocotb, a Python-based co-simulation framework, to validate correctness and measure performance characteristics such as throughput and latency.
Cocotb Testbench - Python
The cocotb_testbench.py script implements a fully automated SPI communication benchmark that evaluates both the functional correctness and the performance of the SPI interface connected to the hardware accelerator. The testbench validates transaction integrity across a range of packet sizes while collecting precise latency, throughput, and efficiency measurements for deeper system-level analysis.
Key components and notes include:
- SPI Master Class: Implements the SPI master protocol in software, driving and monitoring the following hardware ports: sck (serial clock output), mosi (Master Out, Slave In), miso (Master In, Slave Out), and cs (chip select, active low). Its transfer logic serializes each byte one bit at a time, waits half a clock period between transitions, samples an incoming bit from miso after every rising edge of sck, and returns the received byte stream (see the sketch after this list).
- Coroutine (test_spi_throughput_latency): The main cocotb test routine. It drives the internal design clock (dut.clk) at 5 MHz (clk_period_ns = 200 ns) and instantiates the SpiMaster with a 1 ns clock resolution (clk_period = 1000 ps).
- Testing Procedure: A sequence of increasing payload sizes is tested: 1, 8, 16, 32, 64, 128, 256, 512, and 1024 bytes. Each transaction sends the command byte 0x02 followed by a payload of repeating 0x55 bytes, and both transmission and reception are timed.
- Timing Measurements: Captures elapsed latency (µs), throughput (kbps), protocol overhead (µs), and efficiency (%) for each payload size and displays them as a table.
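The following is a minimal sketch of that structure, assuming cocotb 1.x APIs; the class and test names mirror the description above, but the exact signatures and timing details in cocotb_testbench.py may differ.

```python
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer
from cocotb.utils import get_sim_time

class SpiMaster:
    """Software SPI master that bit-bangs the DUT pins."""
    def __init__(self, dut, clk_period_ps=1000):
        self.dut = dut
        self.half_ps = clk_period_ps // 2   # half-period wait between edges
        dut.cs.value = 1                    # chip select idles high (active low)
        dut.sck.value = 0

    async def xfer(self, data: bytes) -> bytes:
        """Full-duplex transfer: shift each byte out MSB-first on mosi,
        sampling miso after every rising sck edge."""
        rx = bytearray()
        self.dut.cs.value = 0
        for byte in data:
            rx_byte = 0
            for i in range(7, -1, -1):
                self.dut.mosi.value = (byte >> i) & 1
                await Timer(self.half_ps, units="ps")
                self.dut.sck.value = 1                      # rising edge
                await Timer(self.half_ps, units="ps")
                rx_byte = (rx_byte << 1) | int(self.dut.miso.value)
                self.dut.sck.value = 0
            rx.append(rx_byte)
        self.dut.cs.value = 1
        return bytes(rx)

@cocotb.test()
async def test_spi_throughput_latency(dut):
    # Drive the internal design clock at 5 MHz (200 ns period).
    cocotb.start_soon(Clock(dut.clk, 200, units="ns").start())
    master = SpiMaster(dut, clk_period_ps=1000)

    for size in (1, 8, 16, 32, 64, 128, 256, 512, 1024):
        tx = bytes([0x02]) + bytes([0x55] * size)   # command + payload
        t0 = get_sim_time(units="us")
        await master.xfer(tx)
        elapsed_us = get_sim_time(units="us") - t0
        kbps = size * 8 / elapsed_us * 1000          # bits per µs -> kbps
        dut._log.info(f"{size:5d} B  {elapsed_us:10.2f} us  {kbps:8.2f} kbps")
```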
Verilog Top Module
The qr_spi_top module implements the hardware slave side of the Serial Peripheral Interface (SPI), allowing external software (the Python cocotb testbench) to communicate with the hardware accelerators. The module receives incoming SPI data packets, extracts commands and payloads, buffers the data internally, and transmits data back over SPI upon request. It connects to the hardware Verilog modules (qr_hw_accelerators_v2.v) through a wrapper (qr_wrapper.v).
This module contains three main functional blocks:
- SPI Shift Logic: Implements bit-serial shift registers to receive (rx_shift) and transmit (tx_shift) data bytes over SPI. Data shifts on sck edges while cs is low, and a byte completes after 8 sck cycles. bit_cnt counts the SPI bits shifted, spi_byte_done is asserted after each complete byte transfer, and miso_reg drives the miso output.
- Input/Output Buffers: Implements two internal memory arrays: input_buffer[NSYM], which receives incoming data payloads from the SPI master, and output_buffer[NSYM], which stores response data to be transmitted back over SPI. byte_index tracks which byte is being loaded into the input buffer, and out_index tracks which byte is being read from the output buffer for SPI transmission.
- Command Processing Logic: The first byte received is interpreted as the command byte (command), and its arrival sets the processing flag. Once all payload bytes are received, the send_bytes flag is set automatically to begin transmission of the response data (a behavioral Python model of this logic follows the list).
Protocol behavior: the SPI master sends a command byte followed by a multi-byte data payload. After the command and full payload are received, the internal processing flag triggers internal computations or state transitions, the output buffer is prepared and loaded with data for transmission, and the SPI slave begins streaming that data back to the master over SPI. A short usage example follows.
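For illustration, one transaction under this protocol might look like the following from the master's side, reusing the hypothetical xfer method from the testbench sketch above. Dummy bytes are clocked out to read back the slave's response, a common SPI idiom; the real testbench may read the response differently.

```python
# Send the 0x02 command byte plus a 32-byte payload of 0x55.
payload = bytes([0x55] * 32)
await master.xfer(bytes([0x02]) + payload)

# The slave has now set send_bytes; clock out dummy bytes so it can
# stream its buffered response back over miso.
response = await master.xfer(bytes(len(payload)))
```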
The top module uses a fixed-size parameterization, but the payload size can be extended by changing the parameter. Additionally, while the design does not use an explicit FSM, state transitions occur through three signals: processing marks the active data-processing period, send_bytes indicates readiness to transmit the buffered response data, and cs deassertion resets the SPI state and bit counters for the next transaction. In terms of timing, the SPI transfer logic is edge-sensitive to the external sck and fully decoupled from clk, while the internal buffers, flags, and control logic run synchronously to clk. This allows the SPI interface to operate independently of the system clock domain.
Results
The following results are documented from the final benchmark run, captured in SPI-benchmark3.png:
- Average Latency: 107,281.42 µs (0.10728142 seconds)
- Maximum throughput: 0.02 kbps (2e-5 Mbps = 2.5e-6 MB/s)
- Protocol overhead: 1326.46 µs (0.00132646 seconds)
Figure 1: Table summarizing throughput and efficiency for each packet size.
Figure 2: SPI-benchmark3.png, the final SPI benchmark of the cocotb testbench with the hardware/software codesign.
Some observations from these results:
- Throughput saturates quickly at 0.02 kbps once packet sizes exceed ~16 bytes, showing the system is bottlenecked by protocol overhead rather than the SPI clock rate.
- Protocol efficiency remains roughly stable (around 0.33–0.35%) across all packet sizes (see the metric sketch after this list).
- The observed protocol overhead (1326.46 µs) is substantial, primarily due to software-driven SPI implementation, single-byte state transitions, and simulated latencies inside the cocotb loop.
- Latency remains high due to cocotb’s software-RTL interaction and bit-serial operation emulation, and is probably not reflective of hardware performance.
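For reference, the reported metrics are presumably derived along the following lines. This is a hedged reconstruction, since the exact formulas live in cocotb_testbench.py; in particular, the sck_hz parameter (the raw SPI bit rate used as the ideal baseline) is an assumption.

```python
def spi_metrics(payload_bytes: int, elapsed_us: float, sck_hz: float):
    """Derive throughput, protocol overhead, and efficiency from one
    measured transaction (assumed definitions, not the script's own)."""
    bits = payload_bytes * 8
    throughput_kbps = bits / elapsed_us * 1000      # bits per µs -> kbps
    ideal_us = bits / sck_hz * 1e6                  # wire time at the raw SCK rate
    overhead_us = elapsed_us - ideal_us             # everything that isn't payload bits
    efficiency_pct = ideal_us / elapsed_us * 100    # achieved vs. ideal transfer time
    return throughput_kbps, overhead_us, efficiency_pct
```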
LLM Inquiries
The following are the LLM inquiries made to get set up for the cocotb test. The files uploaded were from previous iterations (Challenges 18-23) in their respective directories.
Figure 3: First LLM inquiry, figuring out if GPT could help support developing a cocotb SPI test.
Figure 4: Second LLM inquiry, determining if provided project source files can be used.
Figure 5: Third LLM inquiry, further delving into creating a framework for the cocotb test.
Figure 6: Fourth LLM inquiry, exploring how Python functions offloaded to hardware can be wrapped for a HW/SW codesign test.
Figure 7: Fifth LLM inquiry, developing the wrapper functions for the python functions, as well as the testbench.
Figure 8: Sixth LLM inquiry, getting an example of how to test in a full SPI framework.
Figure 9: Seventh LLM inquiry to get a framework of how to get the hardware portion setup for a combined test.
Figure 10: Eighth LLM inquiry to get an example/vibe-coded Verilog file for the SPI top testing module.
Figure 11: Ninth LLM inquiry to show an example cocotb testbench.
Figure 12: Tenth LLM inquiry to show example makefile for testing.