Performance - notro/gud GitHub Wiki

Performance

There are several factors that affect performance:

Display resolution

1920x1080 = 2025k pixels, 800x600 = 468k pixels (k = 1024)

Pixel format

XRGB8888 (4 bytes per pixel) has a framebuffer twice the size of RGB565 (2 bytes per pixel)

USB speed

3.0 = 5Gbps, 2.0 = 480Mbps, 1.1 = 12Mbps

Compression

lz4 works very well for desktop application use, but not so well for fullscreen movies.

CPU speed

Mainly affects decompression

RAM speed

The received buffer is memcpy’ed into the framebuffer

Buffer size

If the work/decompression buffer on the host or device is smaller than the framebuffer, the transfer is split up, which can slow it down.

Partial update

If the graphics application is smart, it tells the kernel which part of the framebuffer has been updated, often resulting in a (much) smaller USB transfer.
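A rough way to see how these factors combine is to estimate how many bytes one frame update puts on the USB bus. The Python sketch below is not taken from GUD: `estimate_transfer` is a hypothetical helper, and it assumes that the compressed size is simply the raw size divided by the compression ratio, and that an update is split into chunks whenever it is larger than the work buffer.

```python
import math

# Bytes per pixel for the two formats discussed above
BYTES_PER_PIXEL = {"RGB565": 2, "XRGB8888": 4}


def estimate_transfer(width, height, fmt="RGB565", ratio=1.0,
                      damage=None, buffer_size=None):
    """Estimate (bytes sent over USB, number of chunks) for one update.

    Hypothetical helper. Assumptions:
    - damage is an optional (w, h) size of the updated rectangle
      (partial update); None means the whole framebuffer changed;
    - compressed size is the raw size divided by ratio;
    - the update is split into chunks no larger than buffer_size.
    """
    w, h = damage if damage is not None else (width, height)
    raw = w * h * BYTES_PER_PIXEL[fmt]
    on_wire = math.ceil(raw / ratio)
    chunks = 1 if buffer_size is None else math.ceil(raw / buffer_size)
    return on_wire, chunks


print(estimate_transfer(1920, 1080))                      # (4147200, 1)
print(estimate_transfer(1920, 1080, fmt="XRGB8888"))      # (8294400, 1)
print(estimate_transfer(1920, 1080, ratio=2.5))           # (1658880, 1)
print(estimate_transfer(1920, 1080, damage=(200, 100)))   # (40000, 1)
print(estimate_transfer(1920, 1080, fmt="XRGB8888",
                        buffer_size=4 * 1024 * 1024))     # (8294400, 2)
```

With an lz4 ratio of 2.5 (the value shown in the debugfs stats below), a full 1080p RGB565 frame drops from 4147200 to 1658880 bytes, while a small partial update is tiny no matter which pixel format is used.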

Table 1. Compare hardware factors
Board | USB | CPU speed | RAM speed
--- | --- | --- | ---
Rock Pi 4 | 3.0 | 1.8/1.4 GHz | LPDDR4-3200
Raspberry Pi 4 | 2.0 | 1.5 GHz | LPDDR4-3200
Raspberry Pi Zero | 2.0 | 1.0 GHz | LPDDR2-450

The average compression ratio is available in debugfs (this is after playing Big Buck Bunny in HD):

pi@pi4:~ $ sudo cat /sys/kernel/debug/dri/0/stats
Max buffer size: 8.00 MiB
Number of errors:  0
Compression:       lz4
Compression ratio: 2.5
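To collect this number from a script, note that the stats file is plain `Key: value` text. The minimal Python sketch below makes three assumptions: the field names are copied from the output above, the path assumes the GUD device is DRM minor 0, and other driver versions may print different fields.

```python
from pathlib import Path

# Path from the example above; assumes the GUD device is DRM minor 0
STATS_PATH = Path("/sys/kernel/debug/dri/0/stats")


def parse_stats(text):
    """Turn 'Key: value' lines into a dict of stripped strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            stats[key.strip()] = value.strip()
    return stats


def compression_ratio(text):
    """Return 'Compression ratio' as a float, or None if it is not present."""
    value = parse_stats(text).get("Compression ratio")
    return float(value) if value is not None else None


def read_compression_ratio(path=STATS_PATH):
    """Read the stats file (debugfs needs root, e.g. run the script with sudo)."""
    return compression_ratio(Path(path).read_text())


sample = """Max buffer size: 8.00 MiB
Number of errors:  0
Compression:       lz4
Compression ratio: 2.5
"""
print(compression_ratio(sample))  # 2.5
```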

Measure throughput using perf-direct.py and perf-kms.py

There are two scripts that can be used to measure performance: perf-direct.py drives the device directly using libusb, while perf-kms.py goes through the host driver.

perf-direct.py runs these tests:
  • No compression

  • x0: Fully random image that fails to compress into the max buffer size, so it falls back to no compression; this test therefore includes the cost of first trying to compress.

  • x1: Random image with just enough zeroes to compress to the same size as the uncompressed image.

  • x2,3,4,8,16: Fill image with zeroes until it compresses to the desired ratio.
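The idea behind these test images can be sketched as follows. Random bytes are essentially incompressible and runs of zeroes compress to almost nothing, so a random prefix of size/ratio bytes followed by zeroes lands close to the target ratio. This is a hypothetical illustration, not the code in perf-direct.py, and it uses Python's built-in zlib as a stand-in for lz4, so the exact ratios will differ from what lz4 achieves.

```python
import os
import zlib


def make_test_image(size, ratio):
    """Hypothetical test buffer that should compress roughly `ratio` times.

    A random (incompressible) prefix of size/ratio bytes is followed by
    zeroes (which compress to almost nothing). ratio=1 gives a fully
    random buffer, similar in spirit to the x0 case.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    random_len = int(size / ratio)
    return os.urandom(random_len) + bytes(size - random_len)


def measured_ratio(buf):
    """Compression ratio achieved by zlib (a stand-in for lz4 here)."""
    return len(buf) / len(zlib.compress(buf))


FRAME_SIZE = 1920 * 1080 * 2  # one RGB565 1080p frame
for r in (1, 2, 4, 8, 16):
    buf = make_test_image(FRAME_SIZE, r)
    print(f"target x{r}: zlib gives x{measured_ratio(buf):.2f}")
```

With zlib, the fully random buffer comes out slightly larger than the input (a ratio just below 1). This mirrors the x0 case, where compression is attempted and then abandoned.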

Table 2. Frames per second with format RGB565
Resolution | Board | No* | x2 | x4
--- | --- | --- | --- | ---
1920x1080 | Rock Pi 4 | 19 fps | 25 fps | 41 fps
1920x1080 | Raspberry Pi 4 | 7 fps | 11 fps | 22 fps
1920x1080 | Raspberry Pi Zero | 6 fps | 9 fps | 12 fps
1024x768 | Rock Pi 4 | 49 fps | 61 fps | 60 fps
1024x768 | Raspberry Pi 4 | 18 fps | 29 fps | 63 fps
1024x768 | Raspberry Pi Zero | 16 fps | 24 fps | 34 fps
800x600 | Rock Pi 4 | 60 fps | 61 fps | 61 fps
800x600 | Raspberry Pi 4 | 29 fps | 48 fps | 99 fps
800x600 | Raspberry Pi Zero | 27 fps | 39 fps | 55 fps
640x480 | Rock Pi 4 | n/a | n/a | n/a
640x480 | Raspberry Pi 4 | 50 fps | 79 fps | 148 fps
640x480 | Raspberry Pi Zero | 45 fps | 63 fps | 85 fps
320x240 | Raspberry Pi Pico | 6 fps | 9 fps | 15 fps
240x135 | Raspberry Pi Pico | 14 fps | 23 fps | 37 fps

(* No compression)
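To compare the uncompressed column with raw USB numbers, a frame rate can be converted to the amount of pixel data moved per second. The sketch below assumes every frame is a full-screen RGB565 update (2 bytes per pixel) and reports MiB/s (1024 × 1024 bytes), matching the arithmetic used elsewhere on this page. The helper name is made up for this illustration.

```python
MIB = 1024 * 1024


def pixel_throughput_mib_s(width, height, fps, bytes_per_pixel=2):
    """MiB of uncompressed pixel data per second, assuming full-frame updates."""
    return width * height * bytes_per_pixel * fps / MIB


# Uncompressed 1920x1080 results from Table 2
for board, fps in [("Rock Pi 4", 19), ("Raspberry Pi 4", 7),
                   ("Raspberry Pi Zero", 6)]:
    print(f"{board}: {pixel_throughput_mib_s(1920, 1080, fps):.1f} MiB/s")
```

The results can then be set against the usbtest figures in Table 5.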

SPI panels on Raspberry Pi

If the SPI display ends up as a DRM minor other than zero, override which one GUD uses in /boot/cmdline.txt, for example: drm_dev=1

Table 3. Frames per second with format RGB565
Resolution | Board | SPI speed | No* | x2 | x4 | Max*
--- | --- | --- | --- | --- | --- | ---
320x240 | Raspberry Pi 4 | 62.5 MHz | 43 fps | 43 fps | 43 fps | 50 fps
320x240 | Raspberry Pi Zero | 66.6 MHz | 25 fps | 24 fps | 24 fps | 54 fps
320x480 | Raspberry Pi 4 | 62.5 MHz | 20 fps | 21 fps | 20 fps | 25 fps
320x480 | Raspberry Pi Zero | 66.6 MHz | 12 fps | 12 fps | 12 fps | 27 fps

(No*: no compression)
(Max*: theoretical maximum if we could continuously push only the pixel data from a static buffer and SPI were the only limiting factor)
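The Max* column follows from the SPI clock alone. At 16 bits per pixel, one frame needs width × height × 16 clock cycles. This sketch assumes one pixel-data bit per SPI clock and ignores command bytes and gaps between transfers. Truncated to whole frames, it gives 50, 54, 25 and 27 fps, the Max* values in Table 3.

```python
def spi_max_fps(width, height, spi_hz, bits_per_pixel=16):
    """Upper bound on frame rate if SPI were the only limit.

    Assumes every SPI clock cycle carries one pixel-data bit, with no
    command bytes, gaps between transfers or other overhead.
    """
    return spi_hz / (width * height * bits_per_pixel)


for (w, h), hz in [((320, 240), 62.5e6), ((320, 240), 66.6e6),
                   ((320, 480), 62.5e6), ((320, 480), 66.6e6)]:
    print(f"{w}x{h} @ {hz / 1e6} MHz: {int(spi_max_fps(w, h, hz))} fps")
```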

Why does the Zero only get half the speed? This doesn’t make sense, since almost all of the time should be spent on the SPI bus transfer. Running modetest on the device itself, and thus driving the display directly, shows that GUD is not to blame:

Table 4. Frames per second with format RGB565 using modetest
Resolution | Board | Refresh rate
--- | --- | ---
320x240 | Raspberry Pi 4 | 42 Hz
320x240 | Raspberry Pi Zero | 25 Hz
320x480 | Raspberry Pi 4 | 21 Hz
320x480 | Raspberry Pi Zero | 13 Hz

(Figure: perf-direct.py and modetest results)

I tried to track down what’s going on here but gave up (details). I don’t have an SPI analyzer, so I can’t see what actually happens on the bus.

It turns out that the problem is the VPU clock changing: See https://github.com/raspberrypi/linux/issues/3381

Maximum theoretical USB bulk throughput

  • USB 3.0: 500 MB/s (5 Gbps line rate with 8b/10b encoding leaves 4 Gbps = 500 MB/s, before protocol overhead)

  • USB 2.0: 13 packets of 512 bytes per microframe (1/8ms): 13*512*8*1000/1024/1024 = 50 MB/s

  • USB 1.1: 19 packets of 64 bytes per frame (1ms): 19*64*1000/1024/1024 = 1.2 MB/s
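Here is the same arithmetic in Python, for checking the numbers or trying other packet counts. The USB 2.0 and 1.1 lines reproduce the formulas above. Dividing by 1024 × 1024 makes these MiB/s. The USB 3.0 line is an assumption: it takes the 5 Gbps line rate, applies 8b/10b encoding (8 data bits per 10 bits on the wire) and ignores protocol overhead, giving decimal MB/s.

```python
MIB = 1024 * 1024


def bulk_mib_s(packets_per_interval, packet_size, intervals_per_second):
    """Maximum bulk payload per second in MiB/s (1 MiB = 1024 * 1024 bytes)."""
    return packets_per_interval * packet_size * intervals_per_second / MIB


usb2 = bulk_mib_s(13, 512, 8000)   # 13 x 512-byte packets per 125 us microframe
usb1 = bulk_mib_s(19, 64, 1000)    # 19 x 64-byte packets per 1 ms frame

# USB 3.0 (assumption): 5 Gbit/s line rate with 8b/10b encoding,
# ignoring protocol overhead; result in decimal MB/s.
usb3_mb_s = 5e9 * 8 / 10 / 8 / 1e6

print(f"USB 3.0: {usb3_mb_s:.0f} MB/s")   # USB 3.0: 500 MB/s
print(f"USB 2.0: {usb2:.1f} MiB/s")       # USB 2.0: 50.8 MiB/s
print(f"USB 1.1: {usb1:.2f} MiB/s")       # USB 1.1: 1.16 MiB/s
```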

Measure USB bulk throughput using usbtest

Linux has some built-in tools for testing the USB stack:
  • testusb: Userspace tool to control usbtest

  • usbtest: Kernel module that runs the tests

  • f_sourcesink: A source/sink USB gadget function whose sink accepts the bulk OUT requests

Device:

## Stop the display gadget
# /etc/init.d/S70gud stop

## Start the source/sink USB gadget function configured like the g_zero legacy gadget (but only source/sink)
# g_zero start

Host:

# Match g_zero setup: write a 4MB buffer (~= 1920*1080*2), queue up 1 request, do it 73 times
$ sudo ~/testusb -a -t 27 -s 4194304 -g 1 -c 73
unknown speed   /dev/bus/usb/001/026    0
/dev/bus/usb/001/040 test 27,   10.054837 secs

$ dmesg
[357713.208084] usbtest 1-1.4:1.0: TEST 27: bulk write 292Mbytes

# raw USB throughput: 292/10.05 = 29.0MB/s, 73/10.05 = 7.2 fps
Table 5. usbtest results
Board | USB | MB/s | fps*
--- | --- | --- | ---
Rock Pi 4 | 3.0 | 74.6 | 18.6
Raspberry Pi 4 | 2.0 | 29.0 | 7.2
Raspberry Pi Zero | 2.0 | 20.9 | 5.2
Raspberry Pi Pico | 1.1 | 0.97 | n/a

(fps*: 1920x1080-RGB565 no compression framerate)
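The MB/s and fps figures come from the testusb parameters and the elapsed time. The sketch below assumes, as the dmesg line above suggests, that test 27 writes length × iterations bytes when one request is queued (-g 1). It counts each iteration as one frame. The helper name is hypothetical.

```python
MIB = 1024 * 1024


def usbtest_summary(length, iterations, seconds):
    """Return (MiB written, MiB/s, frames per second) for a test 27 run.

    Hypothetical helper. Assumes one queued request (-g 1), so the total
    written is length * iterations bytes, and one iteration == one frame.
    """
    total_mib = length * iterations / MIB
    return total_mib, total_mib / seconds, iterations / seconds


# Values from the Raspberry Pi 4 run above: -s 4194304 -c 73, 10.054837 secs
mib, mib_s, fps = usbtest_summary(4194304, 73, 10.054837)
print(f"{mib:.0f} MiB written, {mib_s:.1f} MiB/s, {fps:.2f} fps")
```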
