Last week discussed OMPIO + Lustre being slow on the 2.0.0 (and master) branches. Discussed making ROMIO the default for OMPI on Lustre (only).
Last week discussed that Group Comms weren't working for Comms of powers of 2. Nathan found a massive memory issue.
Pull Requests - Several that Jeff, Ralph, or Howard need to review.
PR 896 - not going to help us avoid the Lustre issue. Reduce priority of Lustre below ROMIO.
Edgar tested on Cray.
PRs 894, 890, 900, 901 - Jeff and Howard are good with these. Jeff will merge them in.
Travis is now being run on 2.0 branch.
Issue 1299 - hang - want to get the fix into 2.0.0. Nathan, can you look at it?
Issue 1301 - check max CQ size before creating CQ. Joshua Ladd will assign it to someone.
Should start marking these as 2.0.0 blockers.
Issue 1252 - Performance - Nathan is going to write a decay function for progression. He will create a Pull Request and Geoff Paulsen will test it. Last big one, and kind of important.
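The notes do not say how the decay function for Issue 1252 would work. One possible reading is that a progress callback which keeps finding no work gets polled less and less often, and returns to full polling as soon as it sees activity. A minimal sketch of that idea follows; the class name, the doubling rule, and the `max_interval` cap are all assumptions made for illustration, not Open MPI code or Nathan's design.

```python
class DecayingPoller:
    """Hypothetical wrapper: polls a callback less often while it stays idle."""

    def __init__(self, callback, max_interval=64):
        self.callback = callback          # returns the number of events it handled
        self.max_interval = max_interval  # assumed cap on the polling interval
        self.interval = 1                 # start by polling on every call
        self.countdown = 1                # calls remaining until the next real poll

    def progress(self):
        """Call once per pass of the progress loop; returns events handled."""
        self.countdown -= 1
        if self.countdown > 0:
            return 0                      # skip the callback on this pass
        events = self.callback()
        if events > 0:
            self.interval = 1             # activity seen: poll every pass again
        else:
            # idle: double the interval (assumed decay rule), up to the cap
            self.interval = min(self.interval * 2, self.max_interval)
        self.countdown = self.interval
        return events
```

With a callback that never finds work, the real polls land on passes 1, 3, 7, 15, and so on, so an idle component costs progressively less per pass. The trade-off is added latency of up to `max_interval` passes before a newly active component is noticed.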
HWThreads - Ralph has no interest in going backwards to support physical CPUs. It's a real mess of switching on whether it's physical or virtual.
What is the desire? Recent OS and BIOS seem to get it right. AMD and Intel seem to be different, and it seems to come up. It has generated a TON of confusion among users.
Perhaps Mike has a use case that really demands it. Ralph will talk with him.
Review Master?
Edgar's PR into master (tries to work around Lustre by switching over to ROMIO).
Not sure if the issues he's seeing are on Cray or on his cluster. Could be related, but need to get the cluster running again.
Wanted to see if there were any warnings from Jenkins.
But running that portion of code on Edgar's cluster hits many issues.
With BTL flags = 305, perf got horrible (it used to get better).
Did something else change in configure? Hitting one issue after another, independent of OMPIO.
OMPIO is not finding PVFS2 correctly during configure. Jeff can use screen share with Edgar.
It takes 96 procs to hit the issues, which makes debugging more difficult.
MTT status:
Cisco - having some timeouts.
Status Updates:
LANL - Nathan - Not much, just trying to see if he can find the issue behind the progress slowdown. Continuing to iterate on RDMA stuff to look for any remaining bugs.
Howard - reviewing PR on 2.0.0. Backlog of things for Edgar.
Houston - New component he's developed over the last few weeks. Now competitive on Cray, but too late for 2.0.0, s
dynamic gen 2 - a number of new features unimplemented, but room to grow.
HLRS - no update.
IBM - Hired Joshua Hersey.
Deciding internally whether to use GitHub Enterprise or a GitLab-based approach.
Working with David Solt on first PR, getting the process set up for other developers.