WeIO timings - nodesign/weio GitHub Wiki

Summary

In the first release, we've noticed that the transmissions with the co-processor are slow. This page summarize all the measures I made to understand why it is so slow, and where the waste of time is.

The transmission process looks like this:

Measure of the whole chain

This measure use the WeIO API (A) to send a command to the LPC, which is processed in (E).

Here's the test script I used:

from weioLib.weio import *

def setup():
    attach.process(myProcess)
    
def myProcess():
    pinMode(0, OUTPUT)
    while True :
        digitalWrite(0,0)
        digitalWrite(0,1)

The result is a square wave on the PIN0. This square wave has a frequency of ~500Hz, meaning each command takes ~1ms to be sent and processed. (the square wave is generated with two commands).

Measure of the firmware part

The firmware part (D and E) looks like that:

First, the data received through the CDC stream are read, then the SFP frame is decoded, and finally, the command is executed. This flow is part of the main loop of the LPC. The time needed to execute the main loop is 3.6us when no data is received. The time needed to check if there's data is about 2.1us, and the other tasks of the main loop takes about 1.5us.

When data are received, the time needed to decode and execute the command is about 180us (for a command of 6 bytes - digitalWrite(0, 0)). Theorically, we could send up to 5.5k commands per seconds. (1/180us).

Once a command has been sent and processed, there's no incoming data for ~760us. This delay probably comes from the python processing, either in IoTPy or in weioAPI. To be sure, I tried to send 10 times the command via IoTPy (https://github.com/nodesign/weio/blob/next/IoTPy/pyuper/ioboard.py#L208). When this line is repeated 10 times, the delay between the command is reduced to 3.2us. This prove that this delay doesn't come from the firmware nor the CDC transmission, but by the software layer.

SFP encoding

I've found that encode_sfp (https://github.com/nodesign/weio/blob/next/IoTPy/pyuper/ioboard.py#L130-L148) can be optimized (https://github.com/nodesign/weio/blob/next/IoTPy/pyuper/ioboard.py#L146). Even if string.join should be used instead of string += string, it is slower. It's probably caused by the memory allocation, well explained here (https://paolobernardi.wordpress.com/2012/11/06/python-string-concatenation-vs-list-join/). Anyway, writing string += string instead of string.join allow to reduce the latency between commands from 760us to 530us.

Measure of the function stack

The goal is to test the latency between commands depending from where it is sent. The function stack, for digitalWrite, is the following:

In every case, the command sent is digitalWrite(0, 0), which is the SFP command \xd4\x00\x03\x04\x14\x00. This command takes 176us to be executed by the LPC.

When the commands are sent via a user program, the delay between the commands is ~816us.

(4) When the commands are hardcoded in encode_sfp, the delay is reduced to ~400us.

(3.1) When gpio.write is modified like this:

        self.board.uper_io(0, '\xd4\x00\x03\x04\x14\x00')

in IoTPy/pyuper/gpio.py at line 113, the delay between commands is reduced to ~184us

(3.2) When gpio.write is modified like this:

        self.board.ser.write('\xd4\x00\x03\x04\x14\x00')

in IoTPy/pyuper/gpio.py at line 113, the delay between commands is reduced to ~152us

(2) When digitalWrite is modified like this:

        gpio.board.ser.write('\xd4\x00\x03\x04\x14\x00')

in weioLib/weioGpio.py at line 123, the delay between commands is reduced to ~120us

(1) When digitalWrite is modified like this:

        return gpio.mainGpio[pin].board.ser.write('\xd4\x00\x03\x04\x14\x00')

in weioLib/weioIO.py at line 103, the delay between commands is reduced to ~64us.

  • Note (1): abeb560. With this change, the delay between commands is about 100us, and the GPIO frequency (digitalWrite) is ~3.57kHz)

To summarize all these measurments:

  • Time needed to encode a SFP command is about 416us (816us - 400us)
  • Time needed to execute the uper_io function is about 216us (400us - 184us)
  • Time needed to execute the write function is about 32us (184us - 152us)
  • Time needed to execute the digitalWrite function in weioGpio.py is about 32us (152us - 120us)
  • Time needed to execute the digitalWrite function in weioIO.py is about 56us (120us - 64us)

digitalWrite and write only wrap the commands. There's not much processing in these functions. So it's strange to measure an average of 40us only to jump from a function to another. This time seems to be the time needed for the stack management. It's also strange to measure 216us only for the execution of uper_io, since this function basically only has to send data on the serial port…

About the baudrate in IoTPy

The LPC communicates with the AR9331 through a USB CDC stream. In the code, it looks like a serial port, and a baudrate is defined. (https://github.com/nodesign/weio/blob/next/IoTPy/pyuper/ioboard.py#L57). In the case of a CDC stream, this value is useless and meaningless. Practically, the communication will be as higher as possible. Both AR9331 and LPC are USB 2.0 full speed compliant. That means that the USB bus speed is 1.5MBps. The CDC stream uses a bulk transfert type, which has a maximum allowed bandwidth of 1.2MBps. (bandwidth and latency not guaranteed). 1.2MBps is then theorically the maximum transfert rate. Bulk transfert are known to be really fast, especially when there's no other devices (or no other transfert) on the bus.

Some usefull links

http://processors.wiki.ti.com/index.php/USB_CDC_Benchmarking http://christopherpeplin.com/2012/02/bulk-usb-throughput https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow