Meeting 2019 12 19 - openpmix/openpmix GitHub Wiki
v3.1.5 - Revisit in Jan
Cross-version 'make check' is fixed on all branches
dstore memory leak fix
PR https://github.com/openpmix/openpmix/pull/1573
dstore's "session" is the group of namespaces that belong to the same user
e.g., multiple programs/jobs running under the same server
With this patch it will likely still leak memory in the original scenario
Job 1 running, Job 2 started, Job 1 finished: data for both Job 1 and Job 2 is preserved
Because both job namespaces are under the same dstore "session"
Because the jobs never connected, Job 1's data should be removed when the server calls deregister_nspace.
Alternative scenario:
2 jobs that connect to each other, then disconnect
When one job finishes, it should clean up its namespace
This was what the original 'in_use' flag was trying to protect (before they are disconnected)
It's the responsibility of the RM to track 'connectedness' of the namespaces
At disconnect we need to verify that the client unmaps its references to the peer's namespace segments.
If the server calls deregister_nspace while the processes are connected, should it return an error? I'd think so.
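The two scenarios above can be illustrated with a small toy model. This is only a sketch of the behavior discussed, not dstore code: the `SessionStore` class, its fields, and the `RuntimeError` on a connected namespace are hypothetical, and the store tracks connectedness itself purely as a stand-in for the RM. Whether deregistering a connected namespace should fail was left as an open question in the meeting; the sketch assumes it does.

```python
class SessionStore:
    """Toy model of one dstore "session": the namespaces owned by one user.

    Hypothetical illustration only; names and behavior are assumptions.
    """

    def __init__(self):
        self.data = {}   # nspace -> {key: value}
        self.peers = {}  # nspace -> set of currently connected nspaces

    def register_nspace(self, nspace, info):
        self.data[nspace] = dict(info)
        self.peers[nspace] = set()

    def connect(self, a, b):
        # Connectedness is tracked here as a stand-in for the RM.
        self.peers[a].add(b)
        self.peers[b].add(a)

    def disconnect(self, a, b):
        # After this point neither side should hold references to the other.
        self.peers[a].discard(b)
        self.peers[b].discard(a)

    def deregister_nspace(self, nspace):
        # Assumed behavior: refuse to drop data that a connected peer may use.
        if self.peers[nspace]:
            raise RuntimeError(f"{nspace} is still connected to a peer")
        # Only this namespace is removed; the rest of the session is untouched.
        del self.data[nspace]
        del self.peers[nspace]


# Original scenario: two unconnected jobs in the same session.
store = SessionStore()
store.register_nspace("job1", {"size": 2})
store.register_nspace("job2", {"size": 4})
store.deregister_nspace("job1")  # Job 1 finished; Job 2 data is kept
```

In this model, removing a namespace is decided per namespace rather than per session, so Job 1's data goes away even though Job 2 still lives in the same session.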
If jobs are connected then there is:
Shared fate(?), and notification
Connected access to namespace data (even after remote namespace exits?)
If you want to access information about a disconnected namespace:
If the data exists then you can return it
If the namespace has been deleted then return "not found"
Example: Debugger
Can it ask about a job that is running? yes
Can it ask about a job that has terminated? No, because the storage is removed.
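The lookup rule for a disconnected namespace can be sketched as follows. This is a minimal illustration, assuming the store is a plain dictionary of namespaces; the `lookup` function and the `NOT_FOUND` sentinel are hypothetical names, not part of any PMIx interface.

```python
NOT_FOUND = "not found"  # hypothetical sentinel for a missing namespace or key


def lookup(store, nspace, key):
    """Return the value if the namespace data still exists, else NOT_FOUND."""
    ns_data = store.get(nspace)
    if ns_data is None:
        # The namespace was deleted, e.g. the job terminated.
        return NOT_FOUND
    return ns_data.get(key, NOT_FOUND)


# Debugger example: job2 is running, job1 has terminated and was removed.
store = {"job2": {"size": 4}}
running = lookup(store, "job2", "size")     # data exists, so it is returned
terminated = lookup(store, "job1", "size")  # storage is gone: NOT_FOUND
```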
Dave is working on testing/verifying this fix
Associated Issue: https://github.com/openpmix/openpmix/issues/1574
Longer term support for the different levels of information
dstore currently has a narrower definition of a "session"; it needs to be broadened to match the current PMIx notion of a "session" (and the other levels of information)
PRRTE CI almost ready (Josh)
'virtual scale cluster' model - default 5-10 nodes but can scale up much larger
CI infrastructure is in place; Josh is working on some basic unit tests at the moment.