WeeklyTelcon_20160531 - ICLDisco/ompi GitHub Wiki
Open MPI Weekly Telcon
Dialup Info: (Do not post to public mailing list or public wiki)
Attendees
Jeff Squyres
Arm Patinyasakdikul
Edgar Gabriel
Howard Pritchard
Nathan Hjelm
Ralph Castain
Sylvain Jeaugey
Todd Kordenbrock
Agenda
Review 1.10
Review 2.0.x
Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
PR 1174: needs a minor tweak, then we'll put it in
PR 1199: request revamp
Nathan found 2 new issues
One thread adds a callback while another thread is calling that callback (Nathan is working on this right now -- PR probably in the next few hours)
PR 1729: minor thread leak in persistent communications / callbacks can be lost (very old bug -- dates back to 2005!).
Needs George's PR.
One more PR coming from Nathan with an XRC fix
Nathan has a 1-line uGNI fix that he'd like to get in -- will send it to Howard
NVIDIA CUDA build failed in MTT: the fix was just merged
Fallout from request overhaul
How's it look on master?
Looking good; other than the missing CM, we're mostly turning up other, genuine threading bugs
all PMLs should be good now
George would like to fix a few error paths (maybe v2.0.1)
Is there a consolidated PR for v2.x?
Reminder from last week
23 open pull requests on master, some dating back to last October. Not today (we want George's multithreading work in first), but we should either merge them or close them.
MTT Dev status:
Logistics
MPI Forum next week
Will be there: Jeff, Howard, Nathan, Sylvain
Status Updates:
Mellanox: not here
Sandia:
Tracking down a bug in the rendezvous protocol. The fix will roll to v2.x, but may have to wait for v2.0.1.
Intel: Added stuff to mpirun:
--timeout (reminder, in case you didn't know it existed): exits with ETIMEDOUT if the timeout expires (110 on Linux, 60 on OS X)
--report-state-on-timeout
--get-stack-traces
Added ability to launch N daemons on a node via the MCA param ras_base_multiplier (only works with rsh). NOT for MPI performance testing! Only for ORTE scale testing.
Working on PMIx event notification stuff. ULFM comes in after that.
Going to use OPAL MCA stuff for Warewulf rewrite.
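For the mpirun --timeout behavior noted in Intel's update, a test harness can tell a timeout apart from an ordinary failure by comparing the exit status against the platform's ETIMEDOUT. Below is a minimal Python sketch, assuming mpirun exits with exactly that value when the timeout expires (as stated above); the helper names classify_exit and run_and_classify are hypothetical and not part of Open MPI or MTT.

```python
import errno
import subprocess


def classify_exit(returncode: int) -> str:
    """Map a process exit status to a coarse outcome label.

    Assumption: mpirun exits with the platform's ETIMEDOUT value
    (110 on Linux, 60 on OS X) when --timeout expires. Any other
    non-zero status, including negative values for signal deaths,
    is reported as a generic failure.
    """
    if returncode == 0:
        return "ok"
    if returncode == errno.ETIMEDOUT:
        return "timeout"
    return "failed"


def run_and_classify(cmd: list) -> str:
    """Run cmd (a full argument list) and classify its exit status.

    Hypothetical helper: cmd would typically be an mpirun command line
    that includes --timeout, e.g. the one in the usage note below.
    """
    result = subprocess.run(cmd)
    return classify_exit(result.returncode)
```

For example, run_and_classify(["mpirun", "--timeout", "60", "--report-state-on-timeout", "-np", "4", "./a.out"]), where ./a.out stands in for any MPI program. Comparing against errno.ETIMEDOUT instead of a hard-coded 110 keeps the check portable between Linux and OS X.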
Status Update Rotation
Mellanox, Sandia, Intel
LANL, Houston, IBM
Cisco, ORNL, UTK, NVIDIA