2023.12.18 - ovis-hpc/ovis-wiki GitHub Wiki

Version 4.4.1 Release

  • We decided to release OVIS-4.4.1 with a known race condition. The code has been running stably on production machines for at least 6 months. V4.4.1 contains bug fixes and improvements to the stats commands. It has no core functionality changes from the previous version, v4.3.1.
  • The known bug is a race condition that occurred in a test at a scale of 140,000 sets aggregated to one LDMSD. In that test, there were three aggregation levels, and the connections between the sampler daemons and the 1st-level aggregators were disconnected and reconnected repeatedly. The race condition did not surface when there was only one aggregation level.

Version 4.5.1 Testing Plan

  • We plan to address the race condition mentioned above in V4.5.1.
  • V4.5.1 contains significant new capabilities; hence, the version number moves from V4.4.1 to V4.5.1.

A path forward for LDMSD's samplers

  • We will brainstorm a path forward for LDMSD's samplers in a future Best Practices discussion.
  • Chris raised the question of how LDMSD's samplers should handle multiple devices or jobs. The options are:
    • Creating a single metric set that contains arrays and/or lists of records.
    • Creating a separate metric set for each device or job.
  • Jim mentioned that we are adding support for configuring a plugin with multiple configurations. This could be another option for LDMSD's samplers.
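
To make the first two options concrete, here is a minimal Python sketch of the two data layouts. It does not use the LDMS API: `MetricSet`, `single_set_layout`, `per_device_layout`, the set names, and the GPU readings are all hypothetical and exist only to illustrate the shape of the data.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSet:
    """Hypothetical stand-in for a metric set: a name plus its metrics."""
    name: str
    metrics: dict = field(default_factory=dict)


# Made-up per-device readings, for illustration only.
readings = {
    "gpu0": {"temp_c": 61, "util_pct": 87},
    "gpu1": {"temp_c": 58, "util_pct": 12},
}


def single_set_layout(host, readings):
    """Option 1: one set whose single metric is a list of per-device records."""
    records = [{"device": dev, **vals} for dev, vals in sorted(readings.items())]
    return MetricSet(name=f"{host}/gpu", metrics={"devices": records})


def per_device_layout(host, readings):
    """Option 2: one separate set per device."""
    return [
        MetricSet(name=f"{host}/{dev}", metrics=dict(vals))
        for dev, vals in sorted(readings.items())
    ]


one = single_set_layout("node1", readings)
many = per_device_layout("node1", readings)

print(one.name, "holds", len(one.metrics["devices"]), "records")
print(len(many), "sets:", [s.name for s in many])
```

In this sketch, option 1 keeps the number of sets fixed as devices are added, but consumers must unpack the records. Option 2 keeps each set flat, but the set count grows with the device count.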