Jun232016 - openpmix/openpmix GitHub Wiki

PMIx Event Notification Meeting

Date

June 23, 2016

Attendees:

Ralph, Aurelien, Austen, Annu, Keita, Josh, AjayAkum

Minutes:

Ralph is working on the MCA plugin system. Work is also under way on cross-version compatibility issues: the convenience library will be able to "speak" older versions of the interface and convert structures, etc. to its internal (current) version.
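As a rough illustration of the conversion idea only, the sketch below upgrades a structure received in an older layout to the current internal one. The names `ProcV1`, `ProcV2`, `to_current`, and the `flags` field are hypothetical and are not taken from the PMIx code base; the real library is written in C and its structures differ.

```python
from dataclasses import dataclass

CURRENT_VERSION = 2  # assumed version number, for illustration only


@dataclass
class ProcV1:
    """Hypothetical older layout of a process identifier."""
    nspace: str
    rank: int


@dataclass
class ProcV2:
    """Hypothetical current layout: same fields plus one added later."""
    nspace: str
    rank: int
    flags: int = 0


def to_current(version, msg):
    """Convert a structure from a peer's interface version to the current one."""
    if version == CURRENT_VERSION:
        return msg
    if version == 1:
        # Fields that did not exist in the older layout get a default value.
        return ProcV2(nspace=msg.nspace, rank=msg.rank, flags=0)
    raise ValueError(f"unsupported interface version: {version}")


if __name__ == "__main__":
    print(to_current(1, ProcV1("job-a", 3)))
```

Keeping every conversion in one place like this means the rest of the library only ever handles the current layout.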

No new testing of the event notification work. IBM will try to test in July and actively develop with it later in the year. Aurelien will start working with it in the next week or so. Ralph has used notification for debugger disconnect, and that is working.

IBM will post an RFC for tool support soon (hopefully this week).

Pluggable data store (for publish/lookup) coming soon.

Discussion of data recovery using a fault-tolerant publish/lookup store.

Discussion of the problem of returning network topology information. With unidirectional links and the complexities of some networks (dragonfly, etc.), it is hard to imagine how to present this information without a table, but the amount of data could be a problem on large clusters. Need to spend some time considering (1) what information apps may want and (2) how to present it in a scalable way. A regex-based representation may be possible, but the worst case for a regex can still be pretty bad.
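To make the scaling concern concrete, here is a minimal sketch that uses simple range compression of integer node ids as a stand-in for a regex-style encoding. The function name `compress_ranges` and the 1024-node sizes are assumptions chosen for illustration, not anything proposed at the meeting.

```python
def compress_ranges(ids):
    """Collapse integer node ids into a comma-separated list of 'a-b' ranges."""
    runs = []
    start = prev = None
    for i in sorted(set(ids)):
        if start is None:
            start = prev = i          # open the first run
        elif i == prev + 1:
            prev = i                  # extend the current run
        else:
            runs.append((start, prev))
            start = prev = i          # gap found: close the run, open a new one
    if start is not None:
        runs.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


if __name__ == "__main__":
    # Regular case: a contiguous block of peers collapses to a single token.
    dense = compress_ranges(range(0, 1024))
    # Irregular case: every other node is a peer, so nothing collapses.
    sparse = compress_ranges(range(0, 1024, 2))
    print(dense)
    print(len(sparse.split(",")), "tokens")
```

When connectivity is regular the encoding stays tiny, but an irregular pattern needs roughly one token per node, so the compressed form can approach the size of the table it was meant to replace.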

Discussion of SCONE. There is a need for a revoke operation. The plan was to provide it via a reliable broadcast rather than a specific "revoke" operation. Need to determine which broadcasts (hcoll, for example) are reliable.

Some links to the communication needs for FT:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjSjMyS977NAhWJQiYKHRZBB6EQFggkMAE&url=http%3A%2F%2Fwww.netlib.org%2Futk%2Fpeople%2FJackDongarra%2FPAPERS%2FEuroMPI_2015_submission_23.pdf&usg=AFQjCNGa9SOphvxcedYA4wHA1TT478HPiQ&sig2=9MaHyLbiPmRnMmms_jlKkQ

http://icl.cs.utk.edu/news_pub/submissions/sc15.pdf
