UCF Workshop 2018

Dates

December 10-13

Location

Arm, Austin, TX

Agenda (draft)

  • Async progress for protocols. Yossi/Mellanox, 1hr.
    Progress various protocols (rendezvous, stream, disconnect, RMA/AMO emulation) using a progress thread
  • Thread safety, fine-grained locking. Yossi/Mellanox, 1hr.
    Discuss what is needed in UCP and UCT to support better concurrency than a big global lock
  • Support for shmem signaled put. Yossi/Mellanox, 1hr.
    How to support the new OpenSHMEM primitive, put with signal
  • Upstream (rdma-core) support status. Yossi/Mellanox, 1hr.
    Using UCX with inbox drivers and the latest rdma-core
  • XPMEM support for tag matching. Yossi/Mellanox, 1hr.
    Use 1-copy for expected eager messages via the UCT tag-offload API
  • Stream API and close protocol. Yossi/Mellanox, 1hr.
    Using the stream API as a replacement for TCP, and considerations for closing/flushing a connection
  • High availability, failover. Yossi/Mellanox, 1hr.
    How to implement fabric error recovery by using multiple devices/ports
  • UCP API v2.0. Yossi/Mellanox, 1hr.
    Things we would like to change/optimize/clean up in the next UCP API, and backward compatibility considerations
  • UCP active message API. Yossi/Mellanox, 1hr.
    Discuss active message implementation at the UCP level
  • UCT component architecture. Yossi/Mellanox, 1hr.
    Split UCT into modules and load them dynamically, so that missing dependencies disable only the relevant transports
  • Multi-binary support for various uarch, Pasha/ARM, 1hr
  • Internal memcpy, DPDK style? Pasha/ARM, 0.5hr
  • MPICH + UCX - State of the union, Ken/ANL, 1hr
  • OpenSHMEM context to UCX worker mapping, Manju/Mellanox
  • UCX+GPU: AMD and NVIDIA, Khaled/AMD, Brad/AMD, Akshay/NVIDIA
  • Collectives, Khaled/AMD 1hr
  • UCX specification update, man pages, Brad/AMD, 1hr
  • OSSS SHMEM with UCX update, Tony/SBU, 1hr
  • UCT API freeze, Nathan (?)/LANL, 1hr
  • Regression and testing for multiple uarch (x86/Power/ARM) and interconnects (RoCE, iWARP, TCP, etc.), Howard/LANL, ?/Mellanox, 1-2hr
  • UCX Datatypes for GPU devices, Akshay/NVIDIA, 1hr
  • Open MPI integration with UCX - State of the union, Mellanox, 1hr
  • UCX + Python bindings, Akshay/NVIDIA, 1hr

Attendees

  • Pavel S
  • Yossi I
  • Megan G
  • Tony
  • Gilad
  • Steve