WeeklyTelcon_20160705 - ICLDisco/ompi GitHub Wiki
Open MPI Weekly Telcon
- Dialup Info: (Do not post to public mailing list or public wiki)
Attendees
- Geoff Paulsen
- Jeff Squyres
- Howard Pritchard
- Josh Hursey
- Arm Patinyasakdikul
- Joshua Ladd
- Nathan Hjelm
- Nysal
- Ralph
- Ryan Grant
- Sylvain Jeaugey
- Todd Kordenbrock
Agenda
Review 1.10
Review 2.0.x
- Has improved. 233 failures, currently on Cisco.
- Cisco - Many Cisco failures are local cluster issues. Art is working on cleaning them up.
- Jeff submitted a patch to MTT so that thread hangs are reported as hangs.
- NVIDIA failures are all PMIx failures.
- Gilles found a race condition in PMIx 2.0.
- v2.x failures in Comm_spawn_loop.
- Overall, not too bad.
MTT Dev status:
New Items:
- Face-to-face meeting coming up.
- Need to discuss ways to take payments.
- Website transitions
- Website itself
- Nightly tarballs
- Archives of mailing list posts.
- We also have mbox archives of all the lists. But once we move things, where do new posts get archived?
- Travis was hung over the weekend. Not sure why.
- IBM Jenkins was off over the weekend; it should be fixed now.
Status Updates:
- Cisco
- Have Arm; got usNIC BTL thread-multiple support into master.
- Lots of minor bug fixing and 2.x items.
- Been more focused on libfabric work.
- NVIDIA
- Watching MTT
- When a node has 2 CPUs and an IB card, we might want to use the IB card for transfers between GPUs.
Status Update Rotation
- Mellanox, Sandia, Intel
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA