UCF Hackathon 2019 - openucx/ucx GitHub Wiki

Disclosure

You are registering for an open public standards setting discussion and development meeting of UCF. The discussions that take place during this meeting are intended to be open to the general public and all work product derived therefrom shall be made widely and freely available to the public. All information including exchange of technical information shall take place during open sessions of this meeting and UCF will not sponsor or support any private working group, standards setting or development sessions that may take place during this meeting. Your participation in any non-public interactions or settings during this meeting is outside the scope of UCF's intended open-public meeting format.

Dates

December 9-12.

  • 9 (Only afternoon session)
  • 10-11 (Full days)
  • 12 (Only morning session)

Registration Form

https://forms.gle/mFTkYgDRtp2hPejy8

Location

ARM, Inc
5707 Southwest Pkwy #100, Austin, TX 78735

Parking

Arm parking is next to the office building. You can use any unlabeled parking spot.

Registration

Arm Austin Lobby. We have two buildings in this area. Lobby/registration will be in the building with the Arm logo. The actual meeting will be in the second building, first floor (Training room C). Feel free to call/email Pavel Shamis.

Step by Step Instructions

Request for Slides

Please upload your slides (pdf or pptx) here

Agenda

| Date | Time | Topic | Speaker/Moderator |
|-------|-------------|-------|-------------------|
| 12/09 | 11:30-12:00 | Registration | Pasha/Megan |
| | 12:00-13:00 | Lunch | Training C |
| | 13:00-13:50 | InfiniBand and RDMA recent advances | Gilad Shainer |
| | 14:00-15:50 | Proposal for collective operations API and implementation overview | Manju/AlexM |
| | 16:00-17:00 | Hardware specific (u-arch) codes in UCX | AlexM |
| 12/10 | 08:45-09:00 | Registration. Join over Web Connect or Skype | Pasha/Megan |
| | 09:00-09:50 | Breakfast: LANL future directions | Steve Poole |
| | 10:00-11:00 | GPU Affinity | Akshay/Yossi |
| | 11:00-12:00 | GPU Affinity | Akshay/Yossi |
| | 12:00-13:00 | Lunch | Training C |
| | 13:00-14:00 | Pipelined transfers within the node, am_zcopy for cuda-ipc | Devendar |
| | 14:00-14:50 | RDMA-Core State of the Union | Jason |
| | 15:00-15:30 | UCX containers | Mikhail |
| | 15:30-16:00 | Documentation | Brent/Pasha |
| | 16:00-16:30 | Release Procedure | Yossi/Pasha |
| | 16:30-17:00 | CI at Azure | Jason/Yossi |
| 12/11 | 08:45-09:00 | Registration. Join over Web Connect or Skype | Pasha/Megan |
| | 09:00-09:50 | Breakfast: Charm++ Overview | Nitin |
| | 10:00-10:30 | Charm++ - present work and results | Mikhail |
| | 10:40-12:00 | UCP protocols architecture CI at Azure | Yossi |
| | 11:00-12:00 | Lunch | Training C |
| | 12:00-13:00 | Multi-rail for TCP sockets Continue GPU protocols discussion | Akshay/Devendar/Yossi |
| | 13:00-14:00 | UCP request API | Mikhail |
| | 14:00-14:50 | UCP active message API | Sameh |
| | 15:00-15:50 | Support for different memory types | Yossi |
| | 16:00-17:00 | RapidsAI/Dask/Joins | Nikolay |
| 12/12 | 08:45-09:00 | Registration. Join over Web Connect or Skype | Pasha/Megan |
| | 09:00-09:50 | Breakfast: SparkUCX and Java API | Yossi |
| | 10:00-11:00 | SmartNIC | Pasha |
| | 11:00-12:00 | Open discussion | Pasha |
| | 12:00-13:00 | Lunch and adjourn | Training C |

Action items based on meeting notes

  • EP address bloat: almost 1k today. How can we compress this? Message pack? Maybe we should switch to on-demand address exchange.
  • Add CLA bot
  • Add requirement for git signatures
  • Jason suggests distributing RPM and Debian packages
  • Azure CI FAQ for UCX
  • Charm++ Spack installation has to be updated
  • Charm++ API requests, active messages, priorities
  • Follow up with Nathan and Steve on UCT API stabilization / Nathan
  • Follow up with board on intermediate progress with UCG proposal / Pasha / Steve
  • UCP protocols v2 proposal
  • CUDA-IPC cache free
  • Profile CUDA-IPC transport
  • No nv_peer_mem = crash (probably because IB is detecting the buffer as host memory)
  • #4568
  • #4567
  • #4566
  • #4565
  • #4564

Original agenda laundry list

| Topic | Speaker/Moderator | Time (hours) |
|-------|-------------------|--------------|
| Proposal for collective operations API and implementation overview | AlexM | 2 |
| Hi1620 - hw-specific code in UCX | AlexM | 1 |
| GPU - HCA affinity | Akshay, Yossi | 2 |
| Pipelined transfers within the node, am_zcopy for cuda-ipc | Devendar | 1 |
| Multi-rail for TCP sockets | Akshay | 1 |
| UCX in containers | Mikhail | 0.5 |
| CI and AZP | Yossi, Jason | 0.5 |
| UCP request API | Mikhail | 1 |
| UCP active message API | Sameh | 1 |
| Charm++ - present work and results | Mikhail | 0.5 |
| Support for different memory types | Yossi | 1 |
| RapidsAI/Dask/Joins | Nikolay | 1 |
| RDMA Core | Jason | 1 |
| SmartNIC | Pasha | 1 |
| SparkUCX and Java API | Yossi | 1 |
| Release procedure | Pasha, Yossi | 0.5 |
| Documentation for users | Pasha | 0.5 |
| UCP protocols architecture | Yossi | 1.5 |
| Total | | 17.5-18 |