Collective Operations (UCG) - openucx/ucx GitHub Wiki

Description

UCG (G for Groups) is an experimental new layer in UCX that provides collective operations support for MPI. UCG is implemented on top of the existing layers and uses all three of them: UCP, UCT and UCS. More information on its design can be found in the presentation from UCX's 2019 annual meeting.

Usage

UCG was introduced into UCX as a git submodule under src/ucg, while the source code itself resides in a separate repository. This means that when you build UCX, it pulls a specific version from that repository, corresponding to your UCX version.

Building with UCG

In order to build UCX with UCG, you need to pass flags at two stages during the build:

  1. autogen.sh --with-ucg
  2. configure --enable-ucg <other-arguments>

Once both stages succeed, run make as usual: it builds and installs UCG along with the other UCX components.
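
The build flow above can be sketched as a small shell script. This is only an illustration under stated assumptions: the two UCG flags come from the steps above, while the checkout path, the install prefix, the DRY_RUN switch and the helper function names are hypothetical and not part of UCX. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the UCG-enabled build flow described above.
# Hypothetical names (not from the wiki): UCX_SRC, PREFIX, DRY_RUN, run,
# build_ucx_with_ucg. Only --with-ucg and --enable-ucg come from the text.
set -eu

UCX_SRC="${UCX_SRC:-./ucx}"            # assumed location of the UCX checkout
PREFIX="${PREFIX:-/tmp/ucx-install}"   # assumed install prefix
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands only, 0 = run them

run() {
    # Echo the command, then execute it unless this is a dry run.
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

build_ucx_with_ucg() {
    run cd "$UCX_SRC"
    run ./autogen.sh --with-ucg                       # stage 1: autogen flag
    run ./configure --enable-ucg --prefix="$PREFIX"   # stage 2: configure flag
    run make
    run make install
}

build_ucx_with_ucg
```

Setting DRY_RUN=0 makes the script execute the commands instead of printing them. Given the build-time issue listed under Known Issues, expect the make step to take a long time with some GCC versions.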

Testing UCG

UCG exports an API similar to that of MPI collective operations (e.g. MPI's Allreduce). To let MPI programs call this API, there is a dedicated new UCG component for Open MPI, but it is not upstream yet (temporary location). I test it regularly with OSU, but it is still early days with respect to MPI applications.

Known Issues

  1. Building UCG takes forever (hangs on builtin_data.c?): this is a known, unfortunate consequence of the way that code was written. With some GCC versions, this one file takes more than an hour to compile. I tried splitting that code, but it hurt performance, so there is no good solution yet.