Benchmarking bladeRF RX speed (FX3 throughput) - MyPublicForks/bladeRF GitHub Wiki
| | Description |
|---|---|
| System 1 | Desktop, Core 2 Duo E8400 3 GHz, Windows 7 64-bit, Renesas uPD720201 |
| System 2 | Laptop, i7-4700MQ, Windows 8.1 64-bit, Intel 8 Series/C220 Series USB 3.0 xHCI |
| Libusb async setting | Value |
|---|---|
| # Buffers | 32 |
| Buffer size | 32768 samples |
| Simultaneous transfers | 16 |
- No file saving
- Receive only
- Sample rate: not applicable (the FPGA supplies data to the FX3 continuously, regardless of LMS sample availability)
- 1,000,000,000 samples were transferred for benchmarking
With the default FX3 settings:
USB burst length = 16
DMA buffer size = 2048
DMA buffer count = 22
| | Speed (Msps) | Equivalent KB/s |
|---|---|---|
| System 1 | 31.1 | 121,000 |
| System 2 | 47.5 | 186,000 |
Based on "Optimizing USB 3.0 Throughput with EZ-USB® FX3™" (Cypress AN86947), let's try this buffer configuration:
USB Burst length = 8
DMA buffer size = 16384
DMA buffer count = 2
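In FX3 firmware terms (using the same `CY_FX_*` macro names as Cypress's GpifToUsb example firmware; this is a sketch of where the knobs live, not a verified patch), the AN86947-style configuration corresponds to:

```c
/* AN86947-style configuration: fewer, larger DMA buffers.
 * Macro names as in Cypress's GpifToUsb example firmware. */
#define CY_FX_EP_BURST_LENGTH (8)      /* bursts of 8 x 1024-byte packets */
#define CY_FX_DMA_BUF_SIZE    (16384)  /* 16 KB per DMA buffer */
#define CY_FX_DMA_BUF_COUNT   (2)      /* two buffers (ping-pong) */
```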
| | Speed (Msps) | Equivalent KB/s |
|---|---|---|
| System 1 | 42.0 | 164,000 |
| System 2 | 47.9 | 187,000 |
Using the default pre-built GpifToUsb.img:

```c
#define CY_FX_EP_BURST_LENGTH (8)
#define CY_FX_DMA_BUF_SIZE    (16384)
#define CY_FX_DMA_BUF_COUNT   (4)
```
| | Throughput (KB/s) |
|---|---|
| System 1 | 185,000 |
| System 2 | 388,000 |
CPU utilization is much better, but throughput is still capped at 187,200 KB/s. Very puzzling: DMA_TX_EN is definitely off (only DMA_RX_EN is enabled), so the speed must be capped by FX3 DMA buffering or by the GPIF.
Found the cause of the speed cap on System 2: the HDL feeding the FX3 GPIF wasn't doing so at the fastest possible rate (a bug in the debug_line_speed_rx implementation). The test will be repeated.
| | Speed (Msps) | Equivalent KB/s | Notes |
|---|---|---|---|
| System 1 | 42.0 | 164,000 | Total CPU utilization is 50%, split 30/70 between 2 cores |
| System 2 | 60.5 | 236,000 | One core at 100% CPU utilization |
Changing the CLI settings to:

| Libusb async setting | Value |
|---|---|
| # Buffers | 6 |
| Buffer size | 1048576 samples |
| Simultaneous transfers | 4 |
| | Speed (Msps) | Equivalent KB/s |
|---|---|---|
| System 1 | 43.7 | 171,000 |
| System 2 | 67.7 | 264,000 |
App settings:

- Packets per Xfer: 64 (i.e. 64 × 8192 bytes)
- Xfers to queue: 4
| | KB/s | Notes |
|---|---|---|
| System 1 | 187,000 | CPU utilization very low, fluctuating between 1-10%, average 5% |
| System 2 | 368,000 | CPU utilization very low: total 5%, 2 cores at about 10% each |
Why do I get only half the performance level quoted in Cypress AN86947?

- Is libusb the bottleneck? Would I get better throughput using the Cypress driver?
- Should/can the reading task be split into two threads?
- Is the single DMA channel in the FX3 the bottleneck?
- Why is the CPU utilization relatively high?