Deaggregation - ScreamingUdder/wiki GitHub Wiki
We want to deaggregate data into separate topics at ISIS to match ESS where these data will come from different producers rather than a single instrument server. This will allow greater sharing of implementation between ISIS and ESS, for example in the Mantid live listener, and make experience gained in developing and operating the ISIS system more relevant to the ESS.
Proposed plan for changes:
- Put "slow" sample environment data into a separate, single-partition topic. "Slow" here means time-value pair logs, not EPICS waveforms and AreaDetector. At ESS, and eventually at ISIS, these messages will come from the EPICS to Kafka forwarder. Thus the schema used by the EPICS forwarder must also be used here.
- Put pulse information into a separate topic as this will somehow come from the detector group at ESS, possibly via EPICS. At the moment this is just a pulse time and proton charge. Pulse time (
frame_time
) should also still be included in event messages, it should be available through the global time system somehow and will also act as the pulse id to match events to a pulse info message and the proton charge. - Mantid stores time of flights as
float
(check this) so for now use float on the wire and get Morten to castint
tofloat
. Not sure about this but it is easy to change later. - Continue to use "frame" terminology in schema unless there is resistance from ESS folk.
- Change default value of the
end_of_frame
field in event messages totrue
so that it can be omitted by producers which do not support splitting into multiple messages per frame. This makes the implementation easier for the event formation units for now. - The optional timestamp field in the Kafka message header should be used instead of putting the timestamp in the flatbuffer payload. This will allow use of the new Kafka feature of finding offset by timestamp, permitting seeking to start-of-run or "5 mins ago" etc.
Unanswered questions:
- How do we handle detecting the end-of-run (EOR) in every topic?
- If EOR is a time known in advance then it is easy... but what if a run ends at an unexpected time, by the push of a button or some kind of safe limit hit/interlock etc?
- The system which knows when EOR was hit could publish an EOR message to every topic? If there is a short or even no pause between runs then this can result in other messages falling the wrong side of the EOR message.
- EOR could be published to one topic and then consumer continues to listen to other topics for a specified duration, then throws away any message with a timestamp after the timestamp in the EOR message. Any message which is thrown away should be reconsumed as part of the next run, unless there is a similar system for start-of-run (SOR) and it falls before the SOR. This is a slightly more complex solution and requires all messages to be timestamped but is probably the most robust solution.
Differences between ESS and ISIS requirements for schema
- ISIS has
RunStatus
- does not look like this is used by Mantid anyway, can be removed? (YAGNI) - ISIS has
frame_number
- could be useful but do not see how we can do this at ESS; EFU has no knowledge of run start + end (to reset frame number to 0). - ISIS has
period
number, no problem this can be included as default to 0 when omitted.