Sync meeting on EESSI test suite (2023 11 08) - EESSI/meetings GitHub Wiki
EESSI test suite sync meetings
Planning
- every 2 weeks on Thursday at 14:00 CE(S)T
- next meetings:
- Wed 22 Nov'23 10:00 CET: OK for all
- Wed 6 Dec'23 15:30 CET: OK for all
- Wed 20 Dec'23 14:00 CET: OK for those who want to be there
- Wed 3 Jan'24 15:30 CET: unclear for some, to be confirmed
- Thu 18 Jan'24 14:00 CET
Meeting (2023-11-08)
- OSU test (PR #54)
- Sam reviewed it, Satish still needs to take review comments into account
- Biggest blocker is still the memory issue. We agreed last time to go for the option of using job options as described here. We will ask for --mem, i.e. total memory, because that is supported on any job scheduler
- Updated CI driving scripts (PR #93)
- Todo: Caspar updates REFRAME_VERSION in all the ci-config.sh files to 4.3.3
- CPU autodetect is failing because "pip install reframe-hpc==4.3.3" fails (ReFrame issue #3023)
- will be fixed in upcoming ReFrame release (it's currently on the 4.5 milestone)
- add scales 1_cpn_2_nodes and 1_cpn_4_nodes (PR #94)
- someone should test this and make sure it works => Satish
- job script that is generated by ReFrame can be checked via dry run
- how can we collect/provide/dynamically determine performance reference numbers so tests can also be used for performance regression?
- step-by-step
- come up with a structure for storing/retrieving reference performance numbers (+ upper/lower bound thresholds) for a particular system
- incl. relevant metadata of the system (CPU, storage, network, ...)
- just use ReFrame perf logging for this, configured to store the perf log data like we want it to
- provide an automated way to harvest initial reference perf numbers from recent runs of test suite
- Create a function that produces a perf_ref + upper + lower over all entries for this unique combination of test hash, system, and $EESSI_TESTSUITE_PERF_DATA_LABEL (based on some statistics: average, standard deviation, etc.)
- check if this could become a feature in ReFrame itself
- nice to have: automatically collect initial perf refs if none are available
- based on similarity of current system with systems for which data is available
- that's likely quite difficult to do...
export EESSI_TESTSUITE_PERF_DATA_LABEL='eessi-2023.06-nov2023'
eessi/testsuite/tests/apps/tensorflow
eessi/testsuite/perf_data/apps/tensorflow/
    README.txt
    hashes.txt => mapping of hashes to test parameters
    hortense/
        rome/
            eessi-2023.06-nov2023.csv
                test_hash,perf_var,perf_value,perf_lower_thresh,perf_upper_thresh
                /deadb33f,ns_day,100,95,105
        milan/
            eessi-2023.06-nov2023.csv
    vega/
    hydra/
    snellius/
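The function discussed above, which derives a perf_ref plus lower and upper thresholds from earlier runs, could look roughly like the sketch below. It is illustrative only, not an agreed design. The record layout, the function names harvest_references and to_csv, and the choice of "mean plus or minus n standard deviations" are all assumptions. The threshold columns are spelled perf_lower_thresh and perf_upper_thresh here.

```python
import csv
import io
import statistics
from collections import defaultdict


def harvest_references(records, n_sd=2.0):
    """Compute (perf_ref, lower, upper) per unique combination.

    `records` is an iterable of hypothetical tuples:
    (test_hash, system, label, perf_var, perf_value),
    where `label` plays the role of $EESSI_TESTSUITE_PERF_DATA_LABEL.
    The thresholds are mean +/- n_sd sample standard deviations (an assumption).
    """
    groups = defaultdict(list)
    for test_hash, system, label, perf_var, value in records:
        groups[(test_hash, system, label, perf_var)].append(float(value))

    refs = {}
    for key, values in groups.items():
        ref = statistics.mean(values)
        # A single data point gives no spread, so fall back to zero width.
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        refs[key] = (ref, ref - n_sd * sd, ref + n_sd * sd)
    return refs


def to_csv(refs, system, label):
    """Render the references for one system and label as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["test_hash", "perf_var", "perf_value",
                     "perf_lower_thresh", "perf_upper_thresh"])
    for (test_hash, sysname, lbl, perf_var), (ref, lower, upper) in sorted(refs.items()):
        if sysname == system and lbl == label:
            writer.writerow([test_hash, perf_var,
                             f"{ref:.2f}", f"{lower:.2f}", f"{upper:.2f}"])
    return out.getvalue()


if __name__ == "__main__":
    label = "eessi-2023.06-nov2023"
    # Made-up measurements, purely for illustration.
    runs = [("/deadb33f", "hortense/rome", label, "ns_day", v) for v in (95, 100, 105)]
    print(to_csv(harvest_references(runs, n_sd=1.0), "hortense/rome", label))
```

Whether the thresholds should come from standard deviations, percentiles, or a fixed percentage is left open; that could be part of the discussion with the ReFrame developers.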
- Should we try and set up a meeting with the ReFrame developers on this perf data logging/harvesting idea?
- Kenneth can contact Vasileios on this via ReFrame Slack
Previous meetings