Release Lifecycle Notes
One of the key features of any DAQ system is stability. To ensure that
artdaq performs as stably as possible, an exhaustive battery of tests
should be performed on each release.
The artdaq team may release testing or integration releases from time to
time, clearly labeled as such. Users accept that using a release that is
not marked “production” on the Release Notes page may result in data
loss, lower stability, or other undesired effects.
artdaq releases are tested in multiple stages: Unit testing, Integration testing, and Acceptance testing.
Unit tests are performed to validate the basic low-level functionality of artdaq classes. They are generally run before a release is ever tagged, and re-run every time the release is built.
TODO: Calculate code coverage of unit tests and improve/add tests as necessary
A. Make sure that artdaq builds without errors or warnings
B. Make sure that all artdaq packages pass their built-in test suites
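The exact build environment is experiment-specific, but a local pass over these two checks might look roughly like the following. This is a minimal sketch only, assuming an existing MRB development area containing the artdaq packages; the products path is a placeholder, not the official CI setup.

```bash
# Hedged sketch: build artdaq and run the built-in test suites in an MRB
# development area. The products path is a placeholder.
source /path/to/products/setup       # placeholder for the ups products area
setup mrb
source localProducts_*/setup         # created when the area was set up with "mrb newDev"
mrbsetenv

mrb build 2>&1 | tee build.log       # A: the build should finish with no errors or warnings
mrb test  2>&1 | tee test.log        # B: runs each package's built-in (ctest-based) test suite
```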
Integration tests are performed to validate DAQ functionality at a basic level; however, the system is run under “ideal” conditions, and stress tests are not performed at this stage.
A release that does not pass or has not passed these tests may be
labeled as “testing”.
A. Check that quick-mrb-start.sh functions properly when run without parameters
B. Perform transfer_driver tests (see the transfer_driver test notes below):
- Large fragments (100 MB) x 10,000: record the rate for Shmem, TCP, and MPI transfers
- Small fragments (5 KB) x 1,000,000: record the rate for Shmem, TCP, and MPI transfers (originally 1K fragments)
C. Perform artdaqDriver tests (see the timing sketch after this list):
- test1: 10,000 1 MB events, record time
- test2: 1,000,000 1 KB events, record time
- test3: 10,000 1 MB events without disk-writing, record time
- test4: 10,000 1 MB events with Binary disk-writing to /dev/null, record time (new for v2_03_00, run for v2_02_01)
D. Run quick-mrb-start.sh --run-demo (see the verification sketch after this list)
- Make sure the demo runs as expected
- Make sure that the output data file is created
  a. Run rawEventDump.fcl over the data file
  b. Run toyDump.fcl over the data file
- Store the data file in Redmine as a version reference
E. Run the DAQInterface example configurations
- Make sure each example runs as expected
- Make sure the output data file is created
- Run verification FCL jobs on data file
F. Test the version reference data files from Redmine with the current release; note if a version incompatibility exists
G. Test the previous version of artdaq with the current reference data files; note if the data files are not backwards-compatible (see the compatibility sketch after this list)
- Run quick-mrb-start.sh --tag [previous version tag] in a new directory
- See the compatibility test notes
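For the artdaqDriver timing tests in item C, the runs can be wrapped with GNU time to record wall-clock and memory figures. This is a hedged sketch only: the FHiCL file names are placeholders, the event count, fragment size, and output mode for each test are assumed to be set inside the corresponding configuration file, and artdaqDriver is assumed to take its configuration via -c.

```bash
# Hedged sketch of the artdaqDriver timing tests (item C). The *.fcl names
# are placeholders; each config is assumed to define the event count,
# fragment size, and output (none, ROOT, or Binary to /dev/null).
for cfg in test1_10k_1MB.fcl test2_1M_1kB.fcl test3_no_output.fcl test4_binary_devnull.fcl; do
  echo "=== ${cfg} ==="
  /usr/bin/time -v artdaqDriver -c "${cfg}" 2>&1 | tee "${cfg%.fcl}.log"
done
```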
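For the verification jobs in items D and E, the dump configurations ship with artdaq-demo and can be run directly with art. The data file name below is a placeholder for whatever file the demo actually produced.

```bash
# Hedged sketch: verify the demo's output file (items D.a and D.b).
# "daq_output.root" is a placeholder for the file the demo wrote.
art -c rawEventDump.fcl -s daq_output.root   # dump the raw events/fragments
art -c toyDump.fcl      -s daq_output.root   # dump the Toy fragment payloads
```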
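For the backwards-compatibility checks in items F and G, one approach is to install the previous release in a separate directory with the --tag option mentioned above, then read the current reference file with the older release. This is a sketch under assumptions: the tag, directory, and file names are placeholders, and the script is assumed to have been downloaded into the new directory.

```bash
# Hedged sketch of the backwards-compatibility check (items F and G).
# Tag, directory, and file names are placeholders.
mkdir artdaq-demo_previous && cd artdaq-demo_previous
./quick-mrb-start.sh --tag vX_YY_ZZ    # install the previous release in a new directory

# After setting up the older release's environment, try to read the
# current reference data file stored in Redmine:
art -c rawEventDump.fcl -s current_reference_file.root
# A clean job indicates the file is backwards-compatible; any errors should
# be noted on the Release Notes page.
```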
Acceptance tests are performed to verify the performance of the integrated artdaq release in conditions as similar as possible to those of actual experiments using artdaq.
Various stresses will be placed on the system to ensure that it
continues to perform well when subject to CPU, disk, network, and memory
constraints.
The request, routing, and filtering systems should all be thoroughly tested as well.
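The exact stress conditions are experiment-specific, but standard Linux tools can approximate the CPU, memory, disk, and network constraints described above while a DAQ run is in progress. The commands below are illustrative only and are not part of any official artdaq test procedure; the values and the interface name are placeholders.

```bash
# Hedged sketch: impose background load on a node hosting artdaq processes.
# Values are placeholders; tune them to the node under test.
stress-ng --cpu 8 --timeout 3600s &                   # CPU pressure
stress-ng --vm 2 --vm-bytes 4G --timeout 3600s &      # memory pressure
stress-ng --hdd 1 --hdd-bytes 10G --timeout 3600s &   # disk I/O pressure

# Add artificial latency on the DAQ network interface (requires root);
# "eth0" is a placeholder for the actual interface name.
sudo tc qdisc add dev eth0 root netem delay 5ms
# ... run the DAQ under load, then remove the latency:
sudo tc qdisc del dev eth0 root
```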
A release that does not pass or has not passed these tests may be
labeled as “integration”.
A. CPU-bound performance tests (currently using the protodune_mock_system_with_delay configuration on ironwork with 5 BRs and 5 EBs)
- Perform single-run tests with a long duration; ensure that the system remains stable for at least 1 hour (see the monitoring sketch after this list)
- Perform multi-run tests with a short duration; ensure that the system remains stable through at least 120 runs (the current configuration is 3 runs per system instance, with DAQInterface remaining running throughout)
B. Large system tests (currently using protodune_mock_system_with_delay on mu2edaq cluster)
C. Large protoDUNE-like system (120 BRs, 16 EBs, across all available mu2edaq nodes)
D. TODO: Add more tests
E. Deployment tests (all available experiments)
- Install the release in a testing area on the experiment's computing, and run the experiment's DAQ through the new release
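During the long-duration and multi-run tests, memory growth is a common failure mode, so it is worth sampling the CPU and resident memory of the artdaq processes over the run. This is a generic monitoring sketch using standard Linux tools rather than an official artdaq utility; the process-name patterns are assumptions and may differ between releases and experiments.

```bash
# Hedged sketch: sample CPU and resident memory of artdaq processes once a
# minute during a long stability run. Process names are assumptions.
while true; do
  date +%FT%T
  ps -C boardreader,eventbuilder,datalogger,dispatcher \
     -o pid,comm,%cpu,rss --sort=-rss
  sleep 60
done | tee stability_run_monitor.log
```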
Unlike the previous stages of testing, whether a release passes or fails the acceptance tests is to some degree a judgment call that must be made by the group before giving the release the “production” label.
Any issues identified during Acceptance testing that do not result in a
release failing should be documented in Redmine and ideally resolved by
the next release.