Benchmarking bladeRF RX speed (FX3 throughput) - MyPublicForks/bladeRF GitHub Wiki
| | Description |
|---|---|
| System 1 | Desktop, Core 2 Duo E8400 3 GHz, Windows 7 64-bit, Renesas uPD720201 |
| System 2 | Laptop, i7-4700MQ, Windows 8.1 64-bit, Intel 8 Series/C220 Series USB 3.0 xHCI |
| Libusb async setting | Value |
|---|---|
| # Buffers | 32 |
| Buffer size | 32768 samples |
| Simultaneous transfers | 16 |
- No file saving
- Receive only
- Sample rate: not applicable (the FPGA supplies data to the FX3 continuously, regardless of LMS sample availability)
- 1,000,000,000 samples were transferred for benchmarking
With the default FX3 settings:
USB burst length = 16
DMA buffer size = 2048
DMA buffer count = 22
| | Speed (Msps) | Equivalent KB/s |
|---|---|---|
| System 1 | 31.1 | 121,000 |
| System 2 | 47.5 | 186,000 |
Based on "Optimizing USB 3.0 Throughput with EZ-USB® FX3™" (Cypress AN86947), let's try this buffer configuration:
USB Burst length = 8
DMA buffer size = 16384
DMA buffer count = 2
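In FX3 firmware terms (using the same `CY_FX_*` macro names as Cypress's GpifToUsb example firmware; this is a sketch of where the knobs live, not a verified patch), the AN86947-style configuration corresponds to:

```c
/* AN86947-style configuration: fewer, larger DMA buffers.
 * Macro names as in Cypress's GpifToUsb example firmware. */
#define CY_FX_EP_BURST_LENGTH (8)      /* bursts of 8 x 1024-byte packets */
#define CY_FX_DMA_BUF_SIZE    (16384)  /* 16 KB per DMA buffer */
#define CY_FX_DMA_BUF_COUNT   (2)      /* two buffers (ping-pong) */
```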
| | Speed (Msps) | Equivalent KB/s |
|---|---|---|
| System 1 | 42.0 | 164,000 |
| System 2 | 47.9 | 187,000 |
Using the default pre-built GpifToUsb.img:

```c
#define CY_FX_EP_BURST_LENGTH (8)
#define CY_FX_DMA_BUF_SIZE    (16384)
#define CY_FX_DMA_BUF_COUNT   (4)
```
| | Throughput (KB/s) |
|---|---|
| System 1 | 185,000 |
| System 2 | 388,000 |
CPU utilization is much better, but throughput is still capped at 187,200 KB/s. Very puzzling: DMA_TX_EN is definitely off (only DMA_RX_EN is enabled), so the speed must be capped by FX3 DMA buffering or by the GPIF.
Found the cause of the speed cap on System 2: the HDL feeding the FX3 GPIF wasn't doing so at the fastest possible rate (a bug in the debug_line_speed_rx implementation). The test will be repeated.
| | Speed (Msps) | Equivalent KB/s | Notes |
|---|---|---|---|
| System 1 | 42.0 | 164,000 | Total CPU utilization is 50%, split 30/70 between 2 cores |
| System 2 | 60.5 | 236,000 | One core at 100% CPU utilization |
Changing the CLI settings to:

| Libusb async setting | Value |
|---|---|
| # Buffers | 6 |
| Buffer size | 1048576 samples |
| Simultaneous transfers | 4 |
| | Speed (Msps) | Equivalent KB/s |
|---|---|---|
| System 1 | 43.7 | 171,000 |
| System 2 | 67.7 | 264,000 |
App settings:

- Packets per Xfer: 64 (i.e. 64 × 8192 bytes)
- Xfers to queue: 4
| | KB/s | Notes |
|---|---|---|
| System 1 | 187,000 | CPU utilization very low, fluctuating between 1-10%, average 5% |
| System 2 | 368,000 | CPU utilization very low: total 5%, 2 cores at about 10% each |
Why do I get only half the performance level quoted in Cypress AN86947?

- Is libusb the bottleneck? Would I get better throughput using the Cypress driver?
- Should/can the reading task be split into two threads?
- Is the single DMA channel in the FX3 the bottleneck?
- Why is the CPU utilization relatively high?