Data Handling Policy - DUNE-DAQ/drunc GitHub Wiki

Note - these notes do not constitute formal documentation for how data is handled once files are written. The objective is to give a rough overview of what happens once a file is closed.

Data handling at EHN1

File location importance

Once a file is written, its save location determines whether it gets transferred to tape. The relevant dataflow servers each have RAID arrays mounted at the root of the filesystem. The servers used for these tasks are declared here.

Locations that get their contents transferred to tape for archival:

| Server Name | Directories |
| --- | --- |
| np04-srv-001 | /data0, /data1, /data2, /data3 |
| np04-srv-002 | /data0, /data1, /data2, /data3 |
| np04-srv-003 | /data0, /data1, /data2 |
| np04-srv-004 | /data0, /data1, /data2, /data3 |
| np04-srv-005 | /data0, /data1, /data2, /data3, /data4, /data5 |

Note - any files written within subdirectories of these paths are not transferred to tape. This is a common workaround for files that need to be analyzed locally.

Note - if a file is written elsewhere and then copied into one of the target paths, it will still be archived to tape.
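The two notes above boil down to a simple rule: a file is archived only if it sits directly inside one of the listed directories, not in a subdirectory of them. A minimal sketch of that rule (the `will_be_archived` helper and the table constant are illustrative, not part of any drunc tooling):

```python
from pathlib import Path

# Top-level directories whose direct contents are archived to tape,
# taken from the table above (server name -> data mounts).
ARCHIVE_DIRS = {
    "np04-srv-001": ["/data0", "/data1", "/data2", "/data3"],
    "np04-srv-002": ["/data0", "/data1", "/data2", "/data3"],
    "np04-srv-003": ["/data0", "/data1", "/data2"],
    "np04-srv-004": ["/data0", "/data1", "/data2", "/data3"],
    "np04-srv-005": ["/data0", "/data1", "/data2", "/data3", "/data4", "/data5"],
}

def will_be_archived(server: str, path: str) -> bool:
    """True only if the file sits directly in an archival directory;
    files inside subdirectories of those mounts are NOT picked up."""
    parent = str(Path(path).parent)
    return parent in ARCHIVE_DIRS.get(server, [])

# Direct child of /data0 -> archived.
print(will_be_archived("np04-srv-001", "/data0/run_raw.hdf5"))          # True
# Inside a subdirectory -> skipped (the local-analysis workaround).
print(will_be_archived("np04-srv-001", "/data0/scratch/run_raw.hdf5"))  # False
```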

How does the file end up on tape?

If a file is written to one of the directories listed in the table above, a cron job maintained by the DAQ group will generate its associated metadata as a JSON file. For the example file np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5, the metadata JSON file is np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5.json.
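As the example shows, the metadata filename is simply the raw filename with `.json` appended, so the pair can be matched by name alone. A small sketch of that convention (the `metadata_for` helper is hypothetical, shown only to make the rule explicit):

```python
# The metadata file is the raw filename with ".json" appended.
raw_name = "np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5"

def metadata_for(raw: str) -> str:
    """Return the expected metadata filename for a raw HDF5 file."""
    return raw + ".json"

print(metadata_for(raw_name))
# np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5.json
```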

A second cron job, maintained by the Offline Data Management Group, scans these directories and uploads the metadata files to MetaCat (the metadata catalogue). Once uploaded, the metadata file is renamed from np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5.json to np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5.json.copied. The file is then copied out of the np0x cluster and onto disk at one of the many global data sites, at which point the raw data file is renamed from np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5 to np02vdcoldbox_raw_run04917_0036_df-s03-d0_dw_0_20251128T134002.hdf5.copied. Finally, the data is queued for backup and archival at FNAL.

How do I later access the data?

The following links should be referred to as the primary guide on how to use MetaCat to query the global datasets and Rucio to gain access to the data and download it. The run control documentation refrains from providing further guidance here, as this is not within our control.