Instructions for setting up an fddaq-v4.1.1 software area

19-Sep-2023

Reference information:

the fddaq-v4.1.1 Far Detector software release is based on the dunedaq-v4.1.1 release of the common DAQ software packages.

More reference information:

the list of the FD software package versions that are included in the release is available here
the list of the common software package versions that are included in the release is available here
suggested Spack commands to learn about the characteristics of an existing software area are available as part of the daq-buildtools documentation here
the Tag Collector spreadsheet that was used for this release is here
the test-tracking spreadsheet that was used for the fddaq-v4.1.0 release is here

The steps for creating and using the software area:

create a new software area based on the v4.1.1 release build (the exact dbt-create command to use is given in sub-step 4 below)
1. The steps for this are based on the latest instructions for daq-buildtools
2. As always, you should verify that your computer has access to /cvmfs/dunedaq.opensciencegrid.org
3. If you are using one of the np04daq computers and need to clone packages, add the following lines to your .gitconfig file. This limits the proxy to git, so there is no need to enable a proxy globally and then remember to disable it later:
```
[http]
  proxy = http://np04-web-proxy.cern.ch:3128
  sslVerify = false
```
4. Here are the steps for creating the new software area:
```
cd <directory_above_where_you_want_the_new_software_area>
source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt fddaq-v4.1.1
dbt-create fddaq-v4.1.1 <work_dir>  # use optional "-c" argument to clone pyvenv in work area
cd <work_dir>
```
5. Please note that if you are following these instructions on a computer on which the DUNE-DAQ software has never been run, several system packages may need to be installed first. These are listed in this script. To check whether a particular package is already installed, use a command like `yum list libzstd` and check whether the package is listed under Installed Packages.
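
The checks in sub-steps 2 and 5 can be scripted before any setup is attempted. The following is a minimal sketch, not part of daq-buildtools: the helper names `have_readable_dir` and `rpm_installed` are hypothetical, and it assumes an RPM-based host for the package check (it reports "not installed" when `rpm` itself is missing).

```shell
# Minimal pre-flight sketch (helper names are hypothetical, not part of daq-buildtools).

CVMFS_AREA="/cvmfs/dunedaq.opensciencegrid.org"

# Succeeds if the given directory exists and is readable.
have_readable_dir() {
    [ -d "$1" ] && [ -r "$1" ]
}

# Succeeds if the named RPM package is installed; fails if it is missing
# or if the rpm command itself is unavailable (non-RPM systems).
rpm_installed() {
    command -v rpm >/dev/null 2>&1 && rpm -q "$1" >/dev/null 2>&1
}

if have_readable_dir "$CVMFS_AREA"; then
    echo "CVMFS area is available: $CVMFS_AREA"
else
    echo "CVMFS area is NOT available: $CVMFS_AREA"
fi

# libzstd is the example package named in the text above.
if rpm_installed libzstd; then
    echo "libzstd is installed"
else
    echo "libzstd is not installed (or rpm is unavailable on this host)"
fi
```

The script only reports what it finds; it does not install anything or modify the environment.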

add any desired repositories to the /sourcecode area. An example is provided here.

clone the repositories (the following block has some extra directory checking; it can all be copy/pasted into your shell window)

# change directory to the "sourcecode" subdir, if possible and needed
if [[ -d "sourcecode" ]]; then
    cd sourcecode
fi
# double-check that we're in the correct subdir
current_subdir=$(basename "${PWD}")
if [[ "$current_subdir" != "sourcecode" ]]; then
    echo ""
    echo "*** Current working directory is not \"sourcecode\", skipping repo clones"
else
    # finally, do the repo clone(s)
    git clone https://github.com/DUNE-DAQ/daqconf.git -b dunedaq-v4.1.1
    git clone https://github.com/DUNE-DAQ/daq-systemtest.git -b dunedaq-v4.1.1
    git clone https://github.com/DUNE-DAQ/dfmodules.git -b dunedaq-v4.1.1
    cd ..
fi

set up the work area and build the software

dbt-workarea-env
dbt-build -j 20
dbt-workarea-env

prepare a daqconf.json file, such as the one shown here. This sample includes parameter values that select the WIBEth data type. (Please note the additional comments on this sample file that are included below!)

{
  "boot": {
    "use_connectivity_service": true,
    "start_connectivity_service": true,
    "connectivity_service_host": "localhost",
    "connectivity_service_port": 15432
  }, 
  "daq_common": {
    "data_rate_slowdown_factor": 1
  },
  "detector": {
    "clock_speed_hz": 62500000
  },
  "readout": {
    "use_fake_cards": true,
    "default_data_file": "asset://?label=WIBEth&subsystem=readout"
  },
  "trigger": {
    "trigger_window_before_ticks": 1000,
    "trigger_window_after_ticks": 1000
  },
  "hsi": {
    "random_trigger_rate_hz": 1.0
  }
}

A few notes on the sample file shown above:

The "use_connectivity_service" and "start_connectivity_service" parameters aren't strictly needed, since their default value is "true". The "connectivity_service_host" and "connectivity_service_port" parameters are likewise optional. However, all of these are included so that people can use them for reference.
nanorc applies a port offset to the "connectivity_service_port", so we don't all need to use different port numbers, as long as we use different partition numbers when running nanorc (e.g. 'nanorc --partition-number 2 ...').
If you want to use an existing, externally-started Connectivity Service instance, such as the one on the np04 cluster, set "use_connectivity_service" to true and "start_connectivity_service" to false.
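
As an illustration of the last note, the sketch below prints a "boot" section that points at an externally-started Connectivity Service. The function name `print_external_connsvc_boot` is hypothetical, and the host and port passed to it here are placeholders: the real values for a site such as the np04 cluster are not given on this page and must be obtained separately.

```shell
# Print a JSON object whose "boot" section uses an existing Connectivity
# Service instance instead of starting a new one.
# $1 = host of the existing instance, $2 = its port (both site-specific).
print_external_connsvc_boot() {
    cat <<EOF
{
  "boot": {
    "use_connectivity_service": true,
    "start_connectivity_service": false,
    "connectivity_service_host": "$1",
    "connectivity_service_port": $2
  }
}
EOF
}

# Placeholder values for illustration only; replace with the real host and port.
print_external_connsvc_boot "connsvc-host.example" 15432
```

The other sections of the sample file (daq_common, detector, readout, trigger, hsi) would be carried over unchanged.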

Another option (the same configuration as above, but with the Connectivity Service disabled):

{
  "boot": {
    "use_connectivity_service": false,
    "start_connectivity_service": false
  }, 
  "daq_common": {
    "data_rate_slowdown_factor": 1
  },
  "detector": {
    "clock_speed_hz": 62500000
  },
  "readout": {
    "use_fake_cards": true,
    "default_data_file": "asset://?label=WIBEth&subsystem=readout"
  },
  "trigger": {
    "trigger_window_before_ticks": 1000,
    "trigger_window_after_ticks": 1000
  },
  "hsi": {
    "random_trigger_rate_hz": 1.0
  }
}

prepare a data-readout map file (e.g. my_dro_map.json) that lists the detector streams (real or emulated) that you want to run with, e.g.:

[
    {
        "src_id": 100,
        "geo_id": {
            "det_id": 3,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": 0
        },
        "kind": "eth",
        "parameters": {
            "protocol": "udp",
            "mode": "fix_rate",
            "rx_iface": 0,
            "rx_host": "localhost",
            "rx_mac": "00:00:00:00:00:00",
            "rx_ip": "0.0.0.0",
            "tx_host": "localhost",
            "tx_mac": "00:00:00:00:00:00",
            "tx_ip": "0.0.0.0"
        }
    },
    {
        "src_id": 101,
        "geo_id": {
            "det_id": 3,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": 1
        },
        "kind": "eth",
        "parameters": {
            "protocol": "udp",
            "mode": "fix_rate",
            "rx_iface": 0,
            "rx_host": "localhost",
            "rx_mac": "00:00:00:00:00:00",
            "rx_ip": "0.0.0.0",
            "tx_host": "localhost",
            "tx_mac": "00:00:00:00:00:00",
            "tx_ip": "0.0.0.0"
        }
    }
]
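
For more than a couple of streams, writing the map by hand becomes error-prone. The sketch below generates a map with the same fields as the sample above, varying only src_id and stream_id. The function name `make_eth_dro_map` is hypothetical, and it is an assumption that simply adding further streams to the same crate and slot yields a map that the DAQ accepts.

```shell
# Print a data-readout map with N emulated Ethernet streams.
# Field names and fixed values are copied from the sample above.
# $1 = number of streams, $2 = first src_id (default 100)
make_eth_dro_map() {
    local n_streams="$1"
    local first_src_id="${2:-100}"
    local i=0
    echo "["
    while [ "$i" -lt "$n_streams" ]; do
        cat <<EOF
    {
        "src_id": $(( first_src_id + i )),
        "geo_id": {
            "det_id": 3,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": ${i}
        },
        "kind": "eth",
        "parameters": {
            "protocol": "udp",
            "mode": "fix_rate",
            "rx_iface": 0,
            "rx_host": "localhost",
            "rx_mac": "00:00:00:00:00:00",
            "rx_ip": "0.0.0.0",
            "tx_host": "localhost",
            "tx_mac": "00:00:00:00:00:00",
            "tx_ip": "0.0.0.0"
        }
EOF
        # JSON does not allow a trailing comma after the last array element
        if [ "$i" -lt $(( n_streams - 1 )) ]; then
            echo "    },"
        else
            echo "    }"
        fi
        i=$(( i + 1 ))
    done
    echo "]"
}
```

A possible use is `make_eth_dro_map 4 > my_dro_map.json`, followed by `python3 -m json.tool my_dro_map.json` to confirm that the result is well-formed JSON.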

Generate a configuration, e.g.:

daqconf_multiru_gen -c ./daqconf.json --detector-readout-map-file ./my_dro_map.json my_test_config

run the system with nanorc, using a command of the form nanorc <config name> <partition name> boot conf start_run <run number> wait 60 stop_run scrap terminate
- e.g. nanorc my_test_config ${USER}-test boot conf start_run 111 wait 60 stop_run scrap terminate
- or, you can simply invoke nanorc my_test_config ${USER}-test by itself and input the commands individually
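
When the same one-shot sequence is run repeatedly with different run numbers or durations, it can help to build the command line from parameters. This is a small sketch with a hypothetical helper name, `nanorc_oneshot_cmd`; it only prints the command so that it can be inspected before being run.

```shell
# Print (do not execute) the one-shot nanorc command line described above.
# $1 = config name, $2 = partition name, $3 = run number,
# $4 = seconds to wait between start_run and stop_run (default 60)
nanorc_oneshot_cmd() {
    local wait_s="${4:-60}"
    echo "nanorc $1 $2 boot conf start_run $3 wait ${wait_s} stop_run scrap terminate"
}

# Same values as in the example above
nanorc_oneshot_cmd my_test_config "${USER:-user}-test" 111
```

The printed line can be pasted into the shell once the software area has been set up.
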

When you return to working with the software area after logging out, you will need to redo the following steps:
- cd <work_dir>
- source ./env.sh
- dbt-build # if needed
- dbt-workarea-env # if needed

For reference, here are daqconf.json and dro_map.json files for emulated DuneWIB electronics

Sample daqconf.json for DuneWIB

{
  "boot": {
    "use_connectivity_service": true,
    "start_connectivity_service": true,
    "connectivity_service_host": "localhost",
    "connectivity_service_port": 15432
  }, 
  "daq_common": {
    "data_rate_slowdown_factor": 10
  },
  "detector": {
    "clock_speed_hz": 62500000
  },
  "readout": {
    "use_fake_cards": true,
    "data_files": [
      {"detector_id": 3, "data_file": "asset://?label=DuneWIB&subsystem=readout"}
    ]
  },
  "trigger": {
    "trigger_window_before_ticks": 1000,
    "trigger_window_after_ticks": 1000
  },
  "hsi": {
    "random_trigger_rate_hz": 1.0
  }
}

Another option, with DuneWIB, Trigger Primitive generation enabled, and multiple Dataflow apps

{
  "boot": {
    "use_connectivity_service": true,
    "start_connectivity_service": true,
    "connectivity_service_host": "localhost",
    "connectivity_service_port": 15432
  }, 
  "dataflow": {
    "enable_tpset_writing": true,
    "apps": [
       { "app_name": "dataflow0" },
       { "app_name": "dataflow1" }
    ]
  },
  "daq_common": {
    "data_rate_slowdown_factor": 10
  },
  "detector": {
    "clock_speed_hz": 62500000
  },
  "readout": {
    "enable_tpg": true,
    "tpg_threshold": 500,
    "use_fake_cards": true,
    "data_files": [
      {"detector_id": 3, "data_file": "asset://?label=DuneWIB&subsystem=readout"}
    ]
  },
  "trigger": {
    "trigger_activity_config": {"prescale":1000},
    "trigger_window_before_ticks": 1000,
    "trigger_window_after_ticks": 1000
  },
  "hsi": {
    "random_trigger_rate_hz": 1.0
  }
}

Sample dro_map.json for DuneWIB

[
    {
        "src_id": 100,
        "geo_id": {
            "det_id": 3,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": 0
        },
        "kind": "flx",
        "parameters": {
            "protocol": "full",
            "mode": "fix_rate",
            "host": "localhost",
            "card": 0,
            "slr": 0,
            "link": 0
        }
    },
    {
        "src_id": 101,
        "geo_id": {
            "det_id": 3,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": 1
        },
        "kind": "flx",
        "parameters": {
            "protocol": "full",
            "mode": "fix_rate",
            "host": "localhost",
            "card": 0,
            "slr": 0,
            "link": 1
        }
    }
]

An example hardware map file from the Vertical Drift Coldbox can be found here.

For reference, here are daqconf.json and dro_map.json files for VD TDE (vertical drift, top detector electronics)

Sample daqconf.json for VD TDE

{
  "boot": {
    "use_connectivity_service": true,
    "start_connectivity_service": true,
    "connectivity_service_host": "localhost",
    "connectivity_service_port": 15432
  },
  "daq_common": {
    "data_rate_slowdown_factor": 1
  },
  "detector": {
    "clock_speed_hz": 62500000
  },
  "readout": {
    "use_fake_cards": true,
    "default_data_file": "asset://?checksum=759e5351436bead208cf4963932d6327"
  },
  "trigger": {
    "trigger_window_before_ticks": 1000,
    "trigger_window_after_ticks": 1000
  },
  "hsi": {
    "random_trigger_rate_hz": 1.0
  }
}

Sample dro_map.json for VD TDE

[
    {
        "src_id": 100,
        "geo_id": {
            "det_id": 11,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": 0
        },
        "kind": "eth",
        "parameters": {
            "protocol": "udp",
            "mode": "fix_rate",
            "rx_iface": 0,
            "rx_host": "localhost",
            "rx_mac": "00:00:00:00:00:00",
            "rx_ip": "0.0.0.0",
            "tx_host": "localhost",
            "tx_mac": "00:00:00:00:00:00",
            "tx_ip": "0.0.0.0"
        }
    },
    {
        "src_id": 101,
        "geo_id": {
            "det_id": 11,
            "crate_id": 1,
            "slot_id": 1,
            "stream_id": 1
        },
        "kind": "eth",
        "parameters": {
            "protocol": "udp",
            "mode": "fix_rate",
            "rx_iface": 0,
            "rx_host": "localhost",
            "rx_mac": "00:00:00:00:00:00",
            "rx_ip": "0.0.0.0",
            "tx_host": "localhost",
            "tx_mac": "00:00:00:00:00:00",
            "tx_ip": "0.0.0.0"
        }
    }
]

Notes about the use of localhost in `daqconf.json` and `dro_map.json` files

Starting with dunedaq-v4.0.0, when we specify a hostname of "localhost" in a daqconf.json or dro_map.json file, that hostname is resolved at configuration time, using the name of the host on which the configuration is generated. This is handled by the code in the daqconf package, and it is done to prevent problems in situations in which some of the hosts are fully specified and some are simply listed as localhost. Such a mixed system can be problematic since the meaning of "localhost" will be different depending on when, and on which host, it is resolved. To prevent such problems, localhost is now fully resolved at configuration time.

This has ramifications that should be noted, however. Previously, when localhost-only system configurations were run with nanorc, the DAQ processes would be started on the host on which nanorc was run. With the new functionality, the DAQ processes that have a hostname of "localhost" will always be run on the computer on which the configuration was generated, independent of where nanorc is run.
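
The behavior described above can be pictured with a conceptual sketch. This is not the daqconf implementation (that code is in the daqconf package); the function name is hypothetical, and whether the real code records a short or a fully-qualified host name is not stated here, so `uname -n` is used purely as a stand-in.

```shell
# Conceptual sketch only: replace "localhost" with the name of the host
# on which this function runs, i.e. the host generating the configuration.
resolve_host_at_config_time() {
    if [ "$1" = "localhost" ]; then
        uname -n
    else
        echo "$1"
    fi
}

resolve_host_at_config_time localhost       # prints the name of the current host
resolve_host_at_config_time np04-srv-001    # hypothetical host name, printed unchanged
```

Because the substitution happens when the configuration is generated, the generated files carry a concrete host name, and running nanorc from a different machine does not change where the processes start.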

Instructions for using the `HDF5LIBS_TestDumpRecord` utility

This utility can be used to print out information from the HDF5 raw data files. To invoke it, use:

HDF5LIBS_TestDumpRecord <filename>

Getting an overview of the HDF5 file structure

h5dump-shared -H <filename>

Dumping the binary content of a specific block from an HDF5 file

This is another use of the h5dump-shared utility. This case uses the following command-line arguments:

the HDF5 path of the block we want to dump (-d <HDF5_path>)
binary output in little-endian byte order (-b LE, written as -bLE in the example below)
the output binary file name (-o <output_file>)
the HDF5 file to be dumped

An example is:

h5dump-shared -d /TriggerRecord00001.0000/RawData/Detector_Readout_0x00000000_WIB -bLE -o dataset1.bin swtest_run002252_0000_dataflow0_datawriter_0_20221102T192809.hdf5

Once you have the binary file, you can examine it with tools like Linux od (octal dump), for example

od -x dataset1.bin
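
One thing to keep in mind when reading the output: `od -x` groups the data into 16-bit words in the host's byte order, so on a little-endian machine adjacent bytes appear swapped relative to their order in the file. The sketch below uses a small throwaway file rather than real DAQ data, and the helper name `hex_bytes` is hypothetical; it prints the bytes one at a time so that they appear in file order.

```shell
# Print the bytes of a file as one hex string, in file order.
hex_bytes() {
    od -An -v -t x1 "$1" | tr -d ' \n'
    echo
}

# Throwaway 4-byte file containing the bytes 01 02 03 04
demo_file="$(mktemp)"
printf '\001\002\003\004' > "$demo_file"

hex_bytes "$demo_file"     # prints 01020304
od -An -x "$demo_file"     # 16-bit words; byte order within each word follows the host

rm -f "$demo_file"
```

For the real file, `od -A d -t x1 dataset1.bin` gives the same byte-by-byte view with decimal offsets in the first column.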

Sample integration tests

There are a few integration tests available in the integtest directory of the dfmodules package. To run them, we suggest adding the dfmodules package to your software area, rebuilding the area, changing to the sourcecode/dfmodules/integtest directory, and reading the README file there for its suggestions. (Those suggestions are along the lines of running a test with a command like pytest -s minimal_system_quick_test.py --nanorc-option partition-number <your_fav_num_1-9>.)

Monitoring the system

When running with nanorc, metrics reports appear in the info_*.json files that are produced (e.g. info_dataflow_<portno>.json). We can collate these, grouped by metric name, using python -m opmonlib.info_file_collator info_*.json (default output file is opmon_collated.json).

It is also possible to monitor the system using a graphical interface.

Notes on nanorc port offsets, including automatically-started ConnectivityService instances

From Pierre on 05-Apr-2023:

for nano04rc: port_offset = 0 + partition_number * 500 https://github.com/DUNE-DAQ/nanorc/blob/develop/src/nanorc/__main_np04__.py#LL77C6-L77C45
for nanorc: port_offset = 0 + partition_number * 500 https://github.com/DUNE-DAQ/nanorc/blob/develop/src/nanorc/cli.py#L108
for nanotimingrc: port_offset = 300 + partition_number * 500 https://github.com/DUNE-DAQ/nanorc/blob/develop/src/nanorc/__main_timing__.py#L69
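
The three formulas above can be evaluated directly in the shell. In this sketch the function names are hypothetical, and `effective_port` assumes that the offset is simply added to the port given in the configuration (for example the "connectivity_service_port" value of 15432 used in the samples on this page).

```shell
# Port offset for a given tool and partition number, per the formulas above.
# $1 = tool (nanorc | nano04rc | nanotimingrc), $2 = partition number
nanorc_port_offset() {
    local base
    case "$1" in
        nanorc|nano04rc) base=0 ;;
        nanotimingrc)    base=300 ;;
        *) echo "unknown tool: $1" >&2; return 1 ;;
    esac
    echo $(( base + $2 * 500 ))
}

# Assumption: effective port = configured port + offset.
# $1 = configured port, $2 = tool, $3 = partition number
effective_port() {
    echo $(( $1 + $(nanorc_port_offset "$2" "$3") ))
}

nanorc_port_offset nanorc 2          # prints 1000
nanorc_port_offset nanotimingrc 2    # prints 1300
effective_port 15432 nanorc 2        # prints 16432
```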

Steps to enable and view TRACE debug messages

Here are suggested steps for enabling and viewing debug messages in the TRACE memory buffer:

set up your software area, if needed (e.g. cd <work_dir>; source ./env.sh; dbt-workarea-env, as in the steps listed earlier for returning to a software area)
export TRACE_FILE=$DBT_AREA_ROOT/log/${USER}_dunedaq.trace
- this tells TRACE which file on disk to use for its memory buffer, and in this way, enables TRACE in your shell environment and in subsequent runs of the system with nanorc.
run the application using the nanorc commands described above
- this populates the list of already-enabled TRACE levels so that you can view them in the next step
run tlvls
- this command outputs a list of all the TRACE names that are currently known, and which levels are enabled for each name
- TRACE names allow us to group related messages, and these names typically correspond to the name of the C++ source file
- the bitmasks that are relevant for the TRACE memory buffer are the ones in the "maskM" column
enable levels with tonM -n <TRACE NAME> <level>
- for example, tonM -n DataWriter DEBUG+5 (where "5" is the level that you see in the TLOG_DEBUG statement in the C++ code)
re-run tlvls to confirm that the expected level is now set
re-run the application
view the TRACE messages using tshow | tdelta -ct 1 | more
- note that the messages are displayed in reverse time order

A couple of additional notes:

For debug statements in our code that look like TLOG_DEBUG(5) << "test, test";, we would enable the output of those messages using a shell command like tonM -n <TRACE_NAME> DEBUG+5. A couple of notes on this...
- when we look at the output of the bitmasks with the tlvls command, bit #5 is going to be offset by the number of bits that TRACE and ERS reserve for ERROR, WARNING, INFO, etc. messages. At the moment, the offset appears to be 8, so the setting of bit "DEBUG+5" corresponds to setting bit #13.
- when we view the messages with tshow, one of the columns in its output shows the level associated with the message (the column heading is abbreviated as "lvl"). Debug messages are prefaced with the letter "D", and they include the number that was specified in the C++ code. So, for our example of level 5, we would see "D05" in the tshow output for the "test, test" messages.
There are many other TRACE 'commands' that allow you to enable and disable messages. For example,
- tonMg <level> enables the specified level for all TRACE names (the "g" means global in this context)
- toffM -n <TRACE NAME> <level> disables the specified level for the specified TRACE name
- toffMg <level> disables the specified level for all TRACE names
- tlvlM -n <TRACE name> <mask> explicitly sets (and un-sets) the levels specified in the bitmask
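
The relationship between a DEBUG+N level and the bit shown by tlvls can be worked out with shell arithmetic. This sketch takes the offset of 8 from the note above, which itself says the value "appears to be 8", so treat it as an assumption that may differ between TRACE/ERS versions. The function names are hypothetical.

```shell
# Offset quoted in the note above; an assumption, not a guaranteed constant.
TRACE_DEBUG_OFFSET=8

# Bit number that corresponds to DEBUG+N.
trace_debug_bit() {
    echo $(( TRACE_DEBUG_OFFSET + $1 ))
}

# Bitmask with only that bit set, in hexadecimal.
trace_debug_mask() {
    printf '0x%x\n' $(( 1 << (TRACE_DEBUG_OFFSET + $1) ))
}

trace_debug_bit 5     # prints 13
trace_debug_mask 5    # prints 0x2000
```

Under that assumption, after running tonM -n <TRACE NAME> DEBUG+5 the "maskM" value for that name in the tlvls output should have the 0x2000 bit set.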
