Meeting 2019 12 19 - openpmix/openpmix GitHub Wiki

  • v3.1.5 - Revisit in Jan
  • Cross-version 'make check' is fixed on all branches
  • dstore memory leak fix
    • PR https://github.com/openpmix/openpmix/pull/1573
      • dstore's "session" is the group of namespaces that belong to the same user
      • This matters when multiple programs/jobs are running under the same server
      • Even with this patch, memory will likely still leak in the original scenario
        • Job 1 running, Job 2 started, Job 1 finished - the data for both Job 1 and Job 2 is preserved
        • Because both job namespaces are under the same dstore "session"
        • Because the jobs never connected, Job 1's data should be removed once the server has called deregister_nspace.
      • Alternative scenario:
        • 2 jobs that connect to each other, then disconnect
        • When one job finishes, it should clean up its namespace
        • This is what the original 'in_use' flag was trying to protect (before the jobs are disconnected)
    • It's the responsibility of the RM to track 'connectedness' of the namespaces
      • At disconnect we need to verify that the client unmaps its references to the peer's namespace segments.
      • If the server calls deregister_nspace while the processes are connected, should it return an error? I'd think so.
    • If jobs are connected then there is:
      • Shared fate(?), and notification
      • Connected access to namespace data (even after remote namespace exits?)
    • If you want to access information about a disconnected namespace
      • If the data exists then you can return it
      • If the namespace has been deleted then return "not found"
      • Example: Debugger
        • Can it ask about a job that is running? yes
        • Can it ask about a job that has terminated? No, because the storage is removed.
    • Dave is working on testing/verifying this fix
    • Associated Issue: https://github.com/openpmix/openpmix/issues/1574
      • Longer term support for the different levels of information
      • dstore, currently, has a more specific definition of a "session" and needs to be broadened to the current PMIx notion of a "session" (and other levels of information)
  • PRRTE CI almost ready (Josh)
    • 'virtual scale cluster' model - default 5-10 nodes but can scale up much larger
    • CI infrastructure is in place, Josh is working on some basic unit tests at the moment.
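
The namespace lifecycle from the dstore discussion above can be sketched as a small bookkeeping table. This is only an illustration of the intended semantics, not dstore or PMIx code: every name here (`ns_table`, `ns_register`, `ns_connect`, `ns_disconnect`, `ns_deregister`, `ns_lookup`, and the `NS_*` status codes) is hypothetical. It assumes the answer to the open question in the notes is "yes", so deregistering a namespace that is still connected returns an error, and that a lookup on a removed namespace reports "not found".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NSPACES 8
#define NS_NAME_LEN 32

/* Hypothetical status codes for this sketch; these are not PMIx constants. */
enum ns_status { NS_OK = 0, NS_ERR_NOT_FOUND = -1, NS_ERR_IN_USE = -2, NS_ERR_NO_ROOM = -3 };

/* One registered namespace and the peers it is currently connected to. */
struct ns_entry {
    bool registered;
    char name[NS_NAME_LEN];
    bool connected[MAX_NSPACES]; /* connected[i] is true while linked to slot i */
};

/* The resource manager's view of all namespaces under one server.
 * Must be zero-initialized before use, e.g. `struct ns_table t = {0};`. */
struct ns_table {
    struct ns_entry slots[MAX_NSPACES];
};

/* Return the slot index of a registered namespace, or NS_ERR_NOT_FOUND. */
static int ns_find(const struct ns_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (int i = 0; i < MAX_NSPACES; i++) {
        if (t->slots[i].registered && strcmp(t->slots[i].name, name) == 0) {
            return i;
        }
    }
    return NS_ERR_NOT_FOUND;
}

/* Register a namespace in the first free slot. */
int ns_register(struct ns_table *t, const char *name)
{
    if (strlen(name) >= NS_NAME_LEN) return NS_ERR_NO_ROOM;
    if (ns_find(t, name) >= 0) return NS_OK; /* already registered */
    for (int i = 0; i < MAX_NSPACES; i++) {
        if (!t->slots[i].registered) {
            memset(&t->slots[i], 0, sizeof(t->slots[i]));
            t->slots[i].registered = true;
            strcpy(t->slots[i].name, name); /* length checked above */
            return NS_OK;
        }
    }
    return NS_ERR_NO_ROOM;
}

/* Record or clear a symmetric connection between two namespaces. */
static int ns_set_link(struct ns_table *t, const char *a, const char *b, bool linked)
{
    int ia = ns_find(t, a);
    int ib = ns_find(t, b);
    if (ia < 0 || ib < 0) return NS_ERR_NOT_FOUND;
    t->slots[ia].connected[ib] = linked;
    t->slots[ib].connected[ia] = linked;
    return NS_OK;
}

int ns_connect(struct ns_table *t, const char *a, const char *b)
{
    return ns_set_link(t, a, b, true);
}

int ns_disconnect(struct ns_table *t, const char *a, const char *b)
{
    return ns_set_link(t, a, b, false);
}

/* Remove a namespace's data; refuse while any peer is still connected. */
int ns_deregister(struct ns_table *t, const char *name)
{
    int idx = ns_find(t, name);
    if (idx < 0) return NS_ERR_NOT_FOUND;
    for (int i = 0; i < MAX_NSPACES; i++) {
        if (t->slots[idx].connected[i]) return NS_ERR_IN_USE;
    }
    memset(&t->slots[idx], 0, sizeof(t->slots[idx]));
    return NS_OK;
}

/* A query (e.g. from a debugger) succeeds only while the data still exists. */
int ns_lookup(const struct ns_table *t, const char *name)
{
    return ns_find(t, name) >= 0 ? NS_OK : NS_ERR_NOT_FOUND;
}
```

In this model, two jobs that never connect are independent: deregistering Job 1 removes its entry while Job 2 stays available. For two connected jobs, `ns_deregister` fails with `NS_ERR_IN_USE` until `ns_disconnect` has been called, which is roughly the situation the original 'in_use' flag was guarding.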