Data format v1 - larpix/larpix-control GitHub Wiki
We've gone back and forth on ideas for packet storage. Here's my proposed specification based on conversations with Dan. It uses HDF5, which is a somewhat-restrictive choice, but it's more flexible and portable than ROOT (and less crash-y), yet easier to work with than a custom binary format. It also has room for metadata so files can be self-descriptive.
Comments
- not sure if chain (aka daisy chain) is the right way to organize this data
- not sure if log data should be in separate datasets or a single dataset of compound type
- not sure if C-type log messages belong in the log dataset or in a separate dataset in the config group
- not sure if the 63-byte configuration should be part of the specification (as opposed to a larger shape that wouldn't need to be modified if LArPix is upgraded with more configuration registers)
- should there be an association from the packet dataset to the log? Right now there is no way to figure out which log message(s) go with a particular packet in /DAQ/packets/chain_x
- should there be a "contact name" and "contact info" field in the root group and/or the DAQ group?
One file per run. The exact meaning of what constitutes a "run" is not part of the specification.
In HDF5, the basic units are the "group" (aka directory) and "dataset" (aka data table/array). Both groups and datasets can also have attributes (aka dict, mapping strings to strings, ints, or other simple types).
Each file has a /DAQ group and an /analysis group. The contents of /DAQ are laid out in this specification. The contents of /analysis are analysis-specific.
- / (root group)
- /DAQ
- /DAQ/packets
- /DAQ/packets/chain_x (packet data)
- /DAQ/log/chain_x (communications log data)
- /DAQ/log/chain_x/type
- /DAQ/log/chain_x/timestamp
- /DAQ/log/chain_x/message
- /DAQ/log/chain_x/reference
- /DAQ/config/chain_x
- /DAQ/config/chain_x/chip_y (configuration registers)
- /analysis
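As a rough illustration of this layout, the sketch below uses h5py (an assumption; the specification only requires HDF5) to create the group hierarchy together with the root and /DAQ attributes that this specification defines. The helper name `create_run_file`, the file name, the run number, and the message are hypothetical, and the file is kept in memory so nothing is written to disk.

```python
import time

import h5py
import numpy as np


def create_run_file(f, run, message, n_chains=1, subrun=0):
    """Create the group hierarchy plus the root and /DAQ attributes."""
    f.attrs["format version"] = 1
    f.attrs["run"] = run
    f.attrs["subrun"] = subrun
    # UNIX timestamp stored explicitly as an 8-byte unsigned integer
    f.attrs["creation timestamp"] = np.uint64(int(time.time()))
    f.attrs["message"] = message

    daq = f.create_group("DAQ")
    daq.attrs["larpix-control version"] = "x.y.z"  # placeholder version string
    daq.create_group("packets")
    for chain in range(n_chains):
        # Intermediate groups (log, config) are created automatically
        daq.create_group(f"log/chain_{chain}")
        daq.create_group(f"config/chain_{chain}")
    f.create_group("analysis")


# driver="core" with backing_store=False keeps the example file in memory
with h5py.File("example_run.h5", "w", driver="core", backing_store=False) as f:
    create_run_file(f, run=1, message="Illustrative file layout", n_chains=2)
    daq_children = sorted(f["DAQ"].keys())
    log_children = sorted(f["DAQ/log"].keys())
    format_version = int(f.attrs["format version"])
    timestamp_dtype = f.attrs["creation timestamp"].dtype
```

Note that /DAQ/packets/chain_x and /DAQ/config/chain_x/chip_y are datasets rather than groups, so this sketch does not create them.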
/ (root group)
Attributes:
"format version": 1
"run": <run number>
"subrun": <subrun number, default = 0>
"creation timestamp": 15.......000 [the UNIX timestamp as an 8-byte unsigned integer]
"message": "A message describing the data in here"
/DAQ
Attributes:
"larpix-control version": "x.y.z"
/DAQ/packets/chain_x (packet data)
Dataset type: 8-byte unsigned integer
Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]
Contents: each entry contains the bytes from one LArPix UART packet. The ten most-significant bits are all 0 as padding. The first byte received (i.e. the one that contains the packet type bits) is the least-significant byte. As an example, the bitstream for reading chip 0, register 0 is 000000 00000000 00000000 00000000 00000000 00000000 00000011 with bit index 0 on the right. Add 00000000 00 to the left as padding to 8 bytes. The corresponding bytes are 00 00 00 00 00 00 00 03, so this packet would be stored as the 8-byte unsigned integer 3. In code this looks like
from bitstring import BitArray
packet = BitArray('0b000000' + '00000000' * 5 + '00000011')
len(packet) # 54 bits
packet = BitArray('0b0000000000') + packet # pad on the left 10 bits
final_bytes = packet.bytes # b'\x00' * 7 + b'\x03'
final_value = packet.uint # 3, the value stored in the dataset
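The same conversion can be sketched with only the standard library. This version assumes the 54 packet bits arrive as 7 bytes in the order received, with the two unused high bits of the last byte set to 0; that framing is an assumption for illustration, and the function names are hypothetical.

```python
PACKET_BITS = 54  # width of one LArPix UART packet, per the text above


def packet_to_uint64(received: bytes) -> int:
    """Convert 7 bytes, in the order received, to the integer to store."""
    if len(received) != 7:
        raise ValueError("expected 7 bytes (54 packet bits plus 2 unused bits)")
    # The first byte received is the least-significant byte
    value = int.from_bytes(received, byteorder="little")
    if value >> PACKET_BITS:
        raise ValueError("bits above the 54-bit packet must be zero")
    return value


def uint64_to_padded_bytes(value: int) -> bytes:
    """Return the 8 padded bytes, most-significant byte first."""
    return value.to_bytes(8, byteorder="big")


# Read of chip 0, register 0: only the packet type bits are set
example_value = packet_to_uint64(b"\x03" + b"\x00" * 6)
example_bytes = uint64_to_padded_bytes(example_value)
```

Because the padding bits are the most-significant ones, the stored integer is the same whether the packet is treated as 54 bits or as 64 bits.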
/DAQ/log/chain_x/type
Dataset type: length-2 string
Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]
Contents: The type of log entry to store. One of:
- M: no data, just a message and timestamp
- R: reading packets from LArPix
- W: sending (writing) packets to LArPix
- C: recording the configuration of a particular chip
/DAQ/log/chain_x/timestamp
Dataset type: 8-byte unsigned integer
Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]
Contents: The timestamp (in UNIX time) for the log message. By convention, the timestamps for R and W messages are the time the transmission begins.
/DAQ/log/chain_x/message
Dataset type: length-280 string
Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]
Contents: The log message.
/DAQ/log/chain_x/reference
Dataset type: region reference to /DAQ/packets/chain_x or /DAQ/config/chain_x/chip_y
Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]
Contents: A reference to the data associated with the log message. For R and W log entries, this would be the packets that were read or written, respectively. For M log entries, there is no standard meaning. Specify your particular usage in the "M reference" attribute. For C log entries, this would be the configuration register values.
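To make the relationship between the four parallel log datasets and the packet dataset concrete, here is a minimal sketch that records one R log entry whose reference points at the packets it describes. It assumes h5py (version 2.10 or later for `h5py.regionref_dtype`); the `append` helper, the file name, the packet values, and the message text are all hypothetical.

```python
import time

import h5py
import numpy as np


def append(dataset, value):
    """Grow a 1-D extendable dataset by one entry and store value in it."""
    n = dataset.shape[0]
    dataset.resize((n + 1,))
    dataset[n] = value


# In-memory file so the sketch leaves nothing on disk
with h5py.File("example_log.h5", "w", driver="core", backing_store=False) as f:
    packets = f.create_dataset("/DAQ/packets/chain_0", shape=(0,),
                               maxshape=(None,), dtype="u8")
    log = f.create_group("/DAQ/log/chain_0")
    log_type = log.create_dataset("type", shape=(0,), maxshape=(None,),
                                  dtype="S2")
    log_time = log.create_dataset("timestamp", shape=(0,), maxshape=(None,),
                                  dtype="u8")
    log_message = log.create_dataset("message", shape=(0,), maxshape=(None,),
                                     dtype="S280")
    log_reference = log.create_dataset("reference", shape=(0,),
                                       maxshape=(None,),
                                       dtype=h5py.regionref_dtype)

    # Store two made-up packets as if they had just been read
    transmission_start = int(time.time())
    start = packets.shape[0]
    packets.resize((start + 2,))
    packets[start:start + 2] = np.array([3, 7], dtype="u8")

    # One log entry means one new element in each of the four datasets
    append(log_type, b"R")
    append(log_time, transmission_start)
    append(log_message, b"read 2 packets")
    append(log_reference, packets.regionref[start:start + 2])

    # Follow the stored region reference back to the packets it covers
    entry_type = log_type[0]
    referenced_packets = packets[log_reference[0]].tolist()
    n_entries = log_type.shape[0]
```

Keeping the four datasets the same length means that index i in each of them describes the same log entry.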
Attributes:
"M reference": "How to interpret the reference stored for an "M" log message"
/DAQ/config/chain_x/chip_y (configuration registers)
Dataset type: 1-byte unsigned integer
Shape: (x, 63) [2-dimensional, extendable length, maxshape=(None, 63)]
Contents: Each "row" entry contains the contents of the particular chip's configuration registers. So dataset[5, 10] contains the contents of register 10 in the configuration stored at index 5. Storage of incomplete configurations (i.e. not all of the registers) is discouraged.
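The row-per-configuration layout can be sketched as follows, again assuming h5py and NumPy. The chain and chip indices, the file name, and the register value are made up for illustration.

```python
import h5py
import numpy as np

N_REGISTERS = 63  # one byte per configuration register, per this specification

# In-memory file so the sketch leaves nothing on disk
with h5py.File("example_config.h5", "w", driver="core",
               backing_store=False) as f:
    config = f.create_dataset("/DAQ/config/chain_0/chip_0",
                              shape=(0, N_REGISTERS),
                              maxshape=(None, N_REGISTERS), dtype="u1")

    # A complete (all 63 registers) configuration with one made-up value
    registers = np.zeros(N_REGISTERS, dtype="u1")
    registers[10] = 16

    # Append the configuration as a new row
    row = config.shape[0]
    config.resize(row + 1, axis=0)
    config[row, :] = registers

    stored_value = int(config[0, 10])  # register 10 of the first configuration
    config_shape = config.shape
```

Only the first dimension is extendable here, which is why a LArPix upgrade with more registers would require a change to the specification.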
Attributes:
"larpix version": <the LArPix version number of this chip>