eDCKV file system and PACS - jacquesfauquex/DCKV GitHub Wiki

structure of a file

The file is a sequence of key-value pairs, each with the following format:

1 byte: length of the key KL. This length includes the prefix length.
KL bytes: the key
4 bytes: length of the value VL
VL bytes: the value

All the keys of an instance are in ascending order, without any bytes before, between or after them.
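
The record layout above can be sketched in Python as follows. This is only an illustration, not the project's parser: the function names are hypothetical, and the byte order of the 4-byte value length is not stated on this page, so little-endian is assumed here. The decoder stops at a zero key length, since zero is the marker used at the end of a composite study file.

```python
import struct

def encode_record(key: bytes, value: bytes) -> bytes:
    """Serialize one key-value pair: KL (1 byte), key, VL (4 bytes), value."""
    if not 1 <= len(key) <= 255:
        raise ValueError("key length must be between 1 and 255 bytes")
    if len(value) > 0xFFFFFFFF:
        raise ValueError("value length must fit in 4 bytes")
    # ASSUMPTION: little-endian value length ("<I"); the page does not specify it.
    return bytes([len(key)]) + key + struct.pack("<I", len(value)) + value

def decode_records(data: bytes):
    """Yield (key, value) pairs, checking that keys never go backwards."""
    pos = 0
    previous = None
    while pos < len(data):
        kl = data[pos]
        if kl == 0:
            break  # zero key length: end marker of a composite study file
        if pos + 1 + kl + 4 > len(data):
            raise ValueError("truncated record header")
        key = data[pos + 1:pos + 1 + kl]
        pos += 1 + kl
        (vl,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if pos + vl > len(data):
            raise ValueError("truncated value")
        value = data[pos:pos + vl]
        pos += vl
        if previous is not None and key < previous:
            raise ValueError("keys are not in ascending order")
        previous = key
        yield key, value
```

For example, `list(decode_records(encode_record(b"k", b"v")))` gives back the single pair `(b"k", b"v")`.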

At the end of a composite study file, we add a zero key length followed by the prefix of the last item. Its purpose is to allow further instances to be appended later (when their prefix is higher) without parsing the file.

concurrency

Objects of the same study may be received by concurrent processes. We avoid dealing with this problem inside the parser. Instead, we create separate spaces within the file system and a two-layer process orchestration:

base layer: parser processes that dump their results into new directories inside their respective spaces and make them visible when each task is finalized. The space paths start with /$pid/
upper layer: a single asynchronous process scans the file system:
- finds new visible files,
- checks if a corresponding study instance uid folder exists at path /studydate/studyiuid/ of its filesystem:
  - if true, merges the contents of the new dir with the existing files
  - if false, creates /studydate/studyiuid/[espinc].Blake3

If there is no key-value pair corresponding to a category, there shall be no file for it. For example, there is no /studydate/studyiuid/p.Blake3 when the study contains no private data.

lower layer: parsing, buffering and writing

For each category, the parser reserves a buffer of 0xFFFF bytes, to which it appends key-value pairs until the buffer is full or the end of the instance is reached. In the first case, the buffer is dumped to a file with a temporary name and emptied before being filled again.
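
The buffering described above could look roughly like the sketch below. The class name `CategoryBuffer`, its methods and the `dumps` counter are hypothetical, and the sketch assumes that a record larger than the remaining room is simply split across successive dumps, which this page does not specify.

```python
class CategoryBuffer:
    """Fixed-capacity buffer for one category, dumped to a temporary file when full."""

    CAPACITY = 0xFFFF  # buffer size given in the text

    def __init__(self, temp_path: str):
        self.temp_path = temp_path  # hypothetical temporary file name
        self.buf = bytearray()
        self.dumps = 0              # number of dumps written so far

    def _dump(self) -> None:
        # Append the buffer to the temporary file, then empty it.
        with open(self.temp_path, "ab") as f:
            f.write(self.buf)
        self.buf.clear()
        self.dumps += 1

    def append(self, record: bytes) -> None:
        # ASSUMPTION: a record that does not fit is split across dumps.
        offset = 0
        while offset < len(record):
            room = self.CAPACITY - len(self.buf)
            chunk = record[offset:offset + room]
            self.buf += chunk
            offset += len(chunk)
            if len(self.buf) == self.CAPACITY:
                self._dump()

    def close(self) -> None:
        # End of instance: write what is left. An empty category creates no file.
        if self.buf:
            self._dump()
```

A category that never receives a key-value pair never calls `_dump`, so no file is created for it.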

When this occurs, the file is first named /pid/sopiuid/[espinc].
When the parsing of the instance completes, and after verifying that /studydate/studyiuid/prefix.blake3hash does not exist, the file is written to /pid/studydate/.studyiuid/prefix.blake3hash. If there were intermediate dumps, the temporary file shall be renamed before the last dump is written into it.
When the writing is finished, if there is no preexisting file with the same name, the file is renamed /studydate/studyiuid/prefix.blake3hash, so that it appears in the spool folder together with the other files to be processed by the upper layer. If it is a duplicate, it can be safely deleted. If the prefix matches but the hash does not, the inconsistency needs to be dealt with.
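
The final step (publish, drop a duplicate, or flag a hash mismatch) might be sketched as below. The function `publish` and its return values are hypothetical, and the file name is assumed to have the form prefix.blake3hash with the prefix ending at the first dot. The sketch uses a hard link followed by an unlink instead of a plain rename so that an existing file is never overwritten; that is a choice made for this example, not necessarily what the project does.

```python
import os
from pathlib import Path

def publish(temp_file: Path, spool_dir: Path) -> str:
    """Move a finished file into the spool folder.

    Returns "published", "duplicate" (same name already there, temp file deleted)
    or "conflict" (same prefix with a different hash, temp file left in place).
    """
    spool_dir.mkdir(parents=True, exist_ok=True)
    target = spool_dir / temp_file.name
    prefix = temp_file.name.split(".", 1)[0]

    if not target.exists():
        # Same prefix under a different hash: an inconsistency to resolve elsewhere.
        for other in spool_dir.iterdir():
            if other.name.split(".", 1)[0] == prefix:
                return "conflict"
    try:
        # os.link raises FileExistsError instead of overwriting an existing file.
        os.link(temp_file, target)
    except FileExistsError:
        temp_file.unlink()  # identical name, hence identical hash: safe to delete
        return "duplicate"
    temp_file.unlink()
    return "published"
```

How a "conflict" should be resolved is left open here, as it is in the text above.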

shrinkage of studyiuid

To optimize file system resources, we use b64uid shrinkage of studyiuid (https://github.com/jacquesfauquex/b64uid).