Response Tokenisation - acaprojects/ruby-engine GitHub Wiki

Tokenisation is extremely important. When working with a stream of data, as provided by TCP and UDP, you most certainly want to:

Wait for a complete response before processing
Process only one response at a time

Whilst this might seem to occur naturally most of the time, network contention, network errors and high data rates will eventually trip you up. CBus is one system where I've often seen back-to-back messages returned in a single IO read.
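To make the problem concrete, here is a minimal sketch of delimiter-based buffering in plain Ruby. It is not Engine's implementation: the class name `SimpleBuffer` and its `extract` method are hypothetical and exist only to show how a single read can carry two messages plus a fragment of a third.

```ruby
# Hypothetical helper for illustration only, not part of Engine.
class SimpleBuffer
  def initialize(delimiter)
    @delimiter = delimiter.b
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  # Append the data from one IO read and return every complete message
  # that is now available. Any trailing fragment stays in the buffer.
  def extract(data)
    @buffer << data.b
    messages = []
    while (index = @buffer.index(@delimiter))
      # Remove the message and its delimiter, keep only the message body.
      messages << @buffer.slice!(0, index + @delimiter.bytesize)[0...index]
    end
    messages
  end
end

buffer = SimpleBuffer.new("\r")
first  = buffer.extract("level 1\rlevel 2\rlev") # two messages and a fragment
second = buffer.extract("el 3\r")                 # the fragment is completed
```

Here `first` holds the two complete messages and `second` holds the third, which only becomes available once the rest of it arrives in a later read.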

Engine ships with two tokenisers to help you break up the incoming data: the default buffered tokeniser and the more advanced abstract tokeniser.

Default Tokeniser

Usage:

class Clipsal::CBus
    # Device driver helper
    tokenize delimiter: "\x0D"
end

Options:

Option	Description
delimiter	sequence to detect the end of a message. Supports strings and regexes
indicator	sequence to detect the start of a message
msg_length	can be used with an indicator if messages are always a fixed length
size_limit	prevents buffering from using all your memory if the end of a message is never detected
min_length	can help prevent false positives
encoding	defaults to ASCII-8BIT to avoid invalid characters when dealing with binary data

Example:

tokenize indicator: "\x02", delimiter: "\x03"

With the data "yu\x03\x02hello\x03\x02world\x03\x02how", this would have the following result:

yu\x03 would be discarded
hello would be returned
world would be returned
\x02how would be buffered
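The behaviour in the example above can be reproduced with a small standalone sketch. This is a simplified illustration, not Engine's actual tokeniser: the class `IndicatorTokeniser` and its methods are hypothetical, and it only handles string indicators and delimiters.

```ruby
# Hypothetical sketch for illustration only, not Engine's implementation.
class IndicatorTokeniser
  attr_reader :buffer

  def initialize(indicator:, delimiter:)
    @indicator = indicator.b
    @delimiter = delimiter.b
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  # Append new data and return the bodies of all complete messages.
  def extract(data)
    @buffer << data.b
    messages = []
    loop do
      start = @buffer.index(@indicator)
      if start.nil?
        @buffer.clear # no start marker anywhere, so nothing is worth keeping
        break
      end
      @buffer.slice!(0, start) # discard anything before the start marker
      body_start = @indicator.bytesize
      finish = @buffer.index(@delimiter, body_start)
      break if finish.nil? # incomplete message stays buffered
      messages << @buffer[body_start...finish]
      @buffer.slice!(0, finish + @delimiter.bytesize)
    end
    messages
  end
end

tokeniser = IndicatorTokeniser.new(indicator: "\x02", delimiter: "\x03")
messages = tokeniser.extract("yu\x03\x02hello\x03\x02world\x03\x02how")
```

After this call `messages` contains `hello` and `world`, the leading `yu\x03` has been dropped, and `tokeniser.buffer` still holds `\x02how` waiting for its delimiter.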

Abstract Tokeniser

The primary use case for this tokeniser is variable length messages, where the length can be determined from the message contents (commonly a length field in the header).
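As a rough illustration of length-based framing, the sketch below splits a stream where each message starts with a one-byte length field. Both the frame layout and the class `LengthPrefixedTokeniser` are assumptions made for this example; it does not use or describe the abstract tokeniser's own API.

```ruby
# Hypothetical frame layout, assumed for illustration only:
#   byte 0       payload length in bytes
#   bytes 1..n   payload
class LengthPrefixedTokeniser
  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  # Append new data and return the payload of every complete frame.
  def extract(data)
    @buffer << data.b
    messages = []
    while @buffer.bytesize >= 1
      length = @buffer.getbyte(0)
      break if @buffer.bytesize < 1 + length # wait for the rest of the payload
      frame = @buffer.slice!(0, 1 + length)
      messages << frame.byteslice(1, length)
    end
    messages
  end
end

tokeniser = LengthPrefixedTokeniser.new
first  = tokeniser.extract("\x03abc\x02d") # one full frame, one partial frame
second = tokeniser.extract("e")            # completes the second frame
```

A delimiter cannot be used here because the payload may legitimately contain any byte value, so the header is the only reliable way to know where a message ends.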