Data format v1 - larpix/larpix-control GitHub Wiki

Preamble

We've gone back and forth on ideas for packet storage. Here's my proposed specification based on conversations with Dan. It uses HDF5, which is a somewhat-restrictive choice, but it's more flexible and portable than ROOT (and less crash-y), yet easier to work with than a custom binary format. It also has room for metadata so files can be self-descriptive.

Comments

  • not sure if chain (aka daisy chain) is the right way to organize this data
  • not sure if log data should be in separate datasets or a single dataset of compound type
  • not sure if C-type log messages belong in the log dataset or in a separate dataset in the config group
  • not sure if the 63-byte configuration should be part of the specification (as opposed to a larger shape that wouldn't need to be modified if LArPix is upgraded with more configuration registers)
  • should there be an association from the packet dataset to the log? Right now there is no way to figure out which log message(s) go with a particular packet in /DAQ/packets/chain_x
  • should there be a "contact name" and "contact info" field in the root group and/or the DAQ group?

File layout

One file per run. What constitutes a "run" is not part of the specification.

In HDF5, the basic units are the "group" (aka directory) and "dataset" (aka data table/array). Both groups and datasets can also have attributes (aka dict, mapping strings to strings, ints, or other simple types).

Each file has a /DAQ group and an /analysis group. The contents of /DAQ are laid out in this specification. The contents of /analysis are analysis-specific.

Summary of layout

- / (root group)
  - /DAQ
    - /DAQ/packets
      - /DAQ/packets/chain_x (packet data)
    - /DAQ/log
      - /DAQ/log/chain_x (communications log data)
        - /DAQ/log/chain_x/type
        - /DAQ/log/chain_x/timestamp
        - /DAQ/log/chain_x/message
        - /DAQ/log/chain_x/reference
    - /DAQ/config
      - /DAQ/config/chain_x
        - /DAQ/config/chain_x/chip_y (configuration registers)
  - /analysis
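
As a rough illustration of the layout, here is a minimal sketch that builds the empty group skeleton with h5py. The helper name `create_layout`, the chain names `chain_0`, `chain_1`, ... and the in-memory file (used only so the example leaves nothing on disk) are assumptions for this example, not part of the specification.

```python
import h5py

def create_layout(h5file, n_chains):
    """Create the empty group skeleton for n_chains chains (hypothetical helper)."""
    daq = h5file.require_group("DAQ")
    h5file.require_group("analysis")
    # /DAQ/packets holds one dataset per chain, so only the group is made here.
    daq.require_group("packets")
    for i in range(n_chains):
        chain = "chain_%d" % i  # assumed naming scheme for chain_x
        daq.require_group("log/" + chain)
        daq.require_group("config/" + chain)
    return daq

# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("layout_example.h5", "w", driver="core", backing_store=False) as f:
    create_layout(f, n_chains=2)
    group_names = []
    f.visit(group_names.append)  # collect every group path in the file
```

The datasets themselves (packets, log entries, configurations) would be created inside these groups as data arrives.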

Dataset and group specifications

/ (root group)

Attributes:

"format version": 1
"run": <run number>
"subrun": <subrun number, default = 0>
"creation timestamp": 15.......000 [the UNIX timestamp as an 8-byte unsigned integer]
"message": "A message describing the data in here"

/DAQ

Attributes:

"larpix-control version": "x.y.z"

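A minimal sketch of writing the root and /DAQ attributes with h5py. The helper `write_header` and its default values are hypothetical, and the timestamp is written in seconds, which is an assumption since the unit is not fixed above.

```python
import time

import h5py
import numpy as np

def write_header(h5file, run, subrun=0, message="", control_version="x.y.z"):
    """Set the root-group and /DAQ attributes (hypothetical helper)."""
    h5file.attrs["format version"] = 1
    h5file.attrs["run"] = run
    h5file.attrs["subrun"] = subrun
    # Assumption: UNIX time in seconds, stored as an 8-byte unsigned integer.
    h5file.attrs["creation timestamp"] = np.uint64(int(time.time()))
    h5file.attrs["message"] = message
    daq = h5file.require_group("DAQ")
    daq.attrs["larpix-control version"] = control_version

# In-memory file so the example writes nothing to disk.
with h5py.File("header_example.h5", "w", driver="core", backing_store=False) as f:
    write_header(f, run=12, message="Example run")
    header = dict(f.attrs)
    daq_attrs = dict(f["DAQ"].attrs)
```
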
/DAQ/packets/chain_x (packet data)

Dataset type: 8-byte unsigned integer

Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]

Contents: Each entry contains the bytes from one LArPix UART packet. The ten most-significant bits are all 0 as padding. The first byte received (i.e. the one that contains the packet type bits) is the least-significant byte. For example, the bitstream for reading chip 0, register 0 is 000000 00000000 00000000 00000000 00000000 00000000 00000011, with bit index 0 on the right. Adding 00000000 00 to the left pads it to 8 bytes. The corresponding bytes are 00 00 00 00 00 00 00 03, so this packet is stored as the 8-byte unsigned integer 3. In code this looks like:

from bitstring import BitArray
packet = BitArray('0b000000' + '00000000' * 5 + '00000011')
len(packet)  # 54 bits
packet = BitArray('0b0000000000') + packet  # pad on the left with 10 zero bits
final_bytes = packet.bytes  # b'\x00' * 7 + b'\x03'
final_value = packet.uint  # 3
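
The same conversion can be sketched with only the standard library. This assumes the 54-bit packet arrives as 7 raw bytes in order of arrival, so that the first byte received is the least-significant one; the function names are hypothetical.

```python
PACKET_BITS = 54   # length of one UART packet, as in the example above
PACKET_BYTES = 7   # assumption: 54 bits arrive as 7 bytes, top 2 bits unused

def packet_to_uint(raw_bytes):
    """Convert the bytes of one packet, in order of arrival, to the stored integer."""
    if len(raw_bytes) != PACKET_BYTES:
        raise ValueError("expected %d bytes, got %d" % (PACKET_BYTES, len(raw_bytes)))
    # First byte received is the least-significant byte, hence little-endian.
    value = int.from_bytes(raw_bytes, byteorder="little")
    if value >> PACKET_BITS:
        raise ValueError("bits above bit 53 must be 0")
    return value

def uint_to_packet(value):
    """Inverse of packet_to_uint: recover the bytes in order of arrival."""
    if not 0 <= value < (1 << PACKET_BITS):
        raise ValueError("value does not fit in %d bits" % PACKET_BITS)
    return value.to_bytes(PACKET_BYTES, byteorder="little")

# The "read chip 0, register 0" packet from the example above.
read_chip0_reg0 = packet_to_uint(b"\x03" + b"\x00" * 6)
```

Because the value always fits in 54 bits, it can be written directly into a dataset of 8-byte unsigned integers with the ten padding bits left at 0.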

/DAQ/log/chain_x/type

Dataset type: length-2 string

Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]

Contents: The type of log entry to store. One of:

  • M: no data, just a message and timestamp
  • R: reading packets from LArPix
  • W: sending (writing) packets to LArPix
  • C: recording the configuration of a particular chip

/DAQ/log/chain_x/timestamp

Dataset type: 8-byte unsigned integer

Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]

Contents: The timestamp (in UNIX time) for the log message. By convention, the timestamps for R and W messages are the time the transmission begins.

/DAQ/log/chain_x/message

Dataset type: length-280 string

Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]

Contents: The log message.
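
Since type, timestamp and message are separate datasets, a log entry is the set of values that share one index. A minimal sketch of appending an entry with h5py follows; the helpers `create_log` and `append_log`, the ASCII encoding and the timestamps in seconds are assumptions, and the reference dataset is left out to keep the example short.

```python
import time

import h5py
import numpy as np

def create_log(h5file, chain):
    """Create empty, extendable log datasets for one chain (hypothetical helper)."""
    log = h5file.require_group("DAQ/log/" + chain)
    log.create_dataset("type", shape=(0,), maxshape=(None,), dtype="S2")
    log.create_dataset("timestamp", shape=(0,), maxshape=(None,), dtype=np.uint64)
    log.create_dataset("message", shape=(0,), maxshape=(None,), dtype="S280")
    return log

def append_log(log, entry_type, message, timestamp=None):
    """Append one entry to all three datasets and return its index."""
    if entry_type not in ("M", "R", "W", "C"):
        raise ValueError("unknown log entry type: %r" % entry_type)
    encoded = message.encode("ascii")  # assumption: ASCII-only messages
    if len(encoded) > 280:
        raise ValueError("message longer than 280 bytes")
    if timestamp is None:
        timestamp = int(time.time())  # assumption: UNIX time in seconds
    index = log["type"].shape[0]
    values = {"type": entry_type.encode("ascii"),
              "timestamp": timestamp,
              "message": encoded}
    for name, value in values.items():
        dset = log[name]
        dset.resize((index + 1,))  # grow by one entry, then fill it
        dset[index] = value
    return index

with h5py.File("log_example.h5", "w", driver="core", backing_store=False) as f:
    log = create_log(f, "chain_0")
    append_log(log, "M", "Starting run", timestamp=1500000000)
    last = append_log(log, "W", "Sent configuration to chip 5", timestamp=1500000001)
    stored_types = [t.decode("ascii") for t in log["type"][:]]
    stored_times = log["timestamp"][:].tolist()
    stored_message = log["message"][last].decode("ascii")
```

Keeping the three datasets the same length is the writer's responsibility in this design, which is why the sketch grows them together in one function.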

/DAQ/log/chain_x/reference

Dataset type: region reference to /DAQ/packets/chain_x or /DAQ/config/chain_x/chip_y

Shape: (x,) [1-dimensional, extendable length, maxshape=(None,)]

Contents: A reference to the data associated with the log message. For R and W log entries, this would be the packets that were read or written, respectively. For M log entries, there is no standard meaning. Specify your particular usage in the "M reference" attribute. For C log entries, this would be the configuration register values.

Attributes:

"M reference": "How to interpret the reference stored for an 'M' log message"

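To make the region reference concrete, here is a minimal h5py sketch that appends three packets and stores a reference to them, as an R log entry would. The packet values are placeholders, the in-memory file is only for the example, and `h5py.regionref_dtype` assumes a reasonably recent h5py (2.10 or later).

```python
import h5py
import numpy as np

with h5py.File("reference_example.h5", "w", driver="core", backing_store=False) as f:
    packets = f.create_dataset("DAQ/packets/chain_0", shape=(0,),
                               maxshape=(None,), dtype=np.uint64)
    refs = f.create_dataset("DAQ/log/chain_0/reference", shape=(0,),
                            maxshape=(None,), dtype=h5py.regionref_dtype)

    # Append three packets read in one transmission (placeholder values).
    new_packets = np.array([3, 7, 11], dtype=np.uint64)
    start = packets.shape[0]
    stop = start + len(new_packets)
    packets.resize((stop,))
    packets[start:stop] = new_packets

    # Append a region reference that points at exactly those packets.
    index = refs.shape[0]
    refs.resize((index + 1,))
    refs[index] = packets.regionref[start:stop]

    # Follow the reference back: first to the dataset, then to the selection.
    ref = refs[index]
    target_name = f[ref].name
    referenced = f[ref][ref].tolist()
```

For C entries the same approach should work with a selection of rows from the chip_y dataset, for example `config.regionref[i:i + 1, :]`.
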
/DAQ/config/chain_x/chip_y

Dataset type: 1-byte unsigned integer

Shape: (x, 63) [2-dimensional, extendable length, maxshape=(None, 63)]

Contents: Each "row" entry contains the contents of the particular chip's configuration registers. So dataset[5, 10] contains the contents of register 10 in the configuration stored at index 5. Storage of incomplete configurations (i.e. not all of the registers) is discouraged.
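
A minimal sketch of appending complete configurations to this dataset with h5py. The helper `append_config`, the register values and the version number 1 are placeholders chosen for the example.

```python
import h5py
import numpy as np

N_REGISTERS = 63  # configuration bytes per chip in this version of the format

def append_config(h5file, chain, chip, registers, larpix_version):
    """Append one full configuration and return its row index (hypothetical helper)."""
    registers = np.asarray(registers, dtype=np.uint8)
    if registers.shape != (N_REGISTERS,):
        raise ValueError("expected %d register values" % N_REGISTERS)
    path = "DAQ/config/%s/%s" % (chain, chip)
    if path in h5file:
        dset = h5file[path]
    else:
        dset = h5file.create_dataset(path, shape=(0, N_REGISTERS),
                                     maxshape=(None, N_REGISTERS), dtype=np.uint8)
        dset.attrs["larpix version"] = larpix_version
    index = dset.shape[0]
    dset.resize(index + 1, axis=0)  # add one row for the new configuration
    dset[index, :] = registers
    return index

with h5py.File("config_example.h5", "w", driver="core", backing_store=False) as f:
    first = [0] * N_REGISTERS
    second = list(range(N_REGISTERS))  # placeholder values, not a real configuration
    append_config(f, "chain_0", "chip_5", first, larpix_version=1)
    row = append_config(f, "chain_0", "chip_5", second, larpix_version=1)
    config = f["DAQ/config/chain_0/chip_5"]
    n_rows = config.shape[0]
    register_10 = int(config[row, 10])  # register 10 of the configuration at index row
```

Rejecting anything other than 63 values is one way to follow the recommendation against storing incomplete configurations.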

Attributes:

"larpix version": <the LArPix version number of this chip>