Rendezvous Protocol - openucx/ucx GitHub Wiki

The Rendezvous protocol is currently supported when both the sender and the receiver use either a contiguous datatype or a generic datatype. The protocol is used for messages whose size is at or above a threshold, which the user can set with the UCX_RNDV_THRESH environment parameter. If the receive side can perform the Rendezvous with a get_zcopy operation, it does so; whether this is possible depends on the transport and the datatype in use. Otherwise, for example when a generic datatype is used, UCX performs the Rendezvous protocol with Active Messages.
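The selection described above can be sketched roughly as follows. This is a minimal illustration, not UCX code: the type and function names (`proto_t`, `datatype_t`, `select_proto`) are hypothetical, and the rule that get_zcopy is possible exactly when the transport supports it and the datatype is contiguous is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; these are not UCX identifiers. */
typedef enum {
    PROTO_EAGER,          /* below the threshold: no rendezvous */
    PROTO_RNDV_GET_ZCOPY, /* receiver fetches the data with get_zcopy */
    PROTO_RNDV_AM         /* rendezvous carried over Active Messages */
} proto_t;

typedef enum { DT_CONTIG, DT_GENERIC } datatype_t;

/* Pick a protocol for one message.
 * Assumption: get_zcopy is usable only when the transport offers it
 * and the datatype is contiguous. */
static proto_t select_proto(size_t msg_size, size_t rndv_thresh,
                            datatype_t dt, bool tl_has_get_zcopy)
{
    if (msg_size < rndv_thresh) {
        return PROTO_EAGER;
    }
    if (tl_has_get_zcopy && (dt == DT_CONTIG)) {
        return PROTO_RNDV_GET_ZCOPY;
    }
    return PROTO_RNDV_AM;
}
```

For instance, `select_proto(65536, 8192, DT_GENERIC, true)` falls through to the Active Message variant because the datatype is generic.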

If UCX_RNDV_THRESH is set to 'auto', UCX calculates the threshold (i.e. the message size from which the rendezvous protocol is used) on its own. It does so by finding the message size at which the AM/RMA rndv latency is worse than the eager_zcopy latency by a small percentage that is set by the user. Even though the latency is slightly worse, memory usage should be lower with Rendezvous, since the unexpected queue is not filled with incoming messages that have not been matched yet.

  • The latency function for eager_zcopy is:
[ reg_cost.overhead + size * md_attr->reg_cost.growth + max(size/bw, size/bcopy_bw) + overhead ]
  • The latency function for RMA (get_zcopy) Rendezvous is:
[ reg_cost.overhead + size * md_attr->reg_cost.growth + latency + overhead +
  reg_cost.overhead + size * md_attr->reg_cost.growth + overhead + latency +
  size/bw + latency + overhead + latency ]

  • The latency function for Active Message (AM) Rendezvous is:

[ latency + overhead + reg_cost.overhead +
 size * md_attr->reg_cost.growth + overhead + latency +
 max(size/bw , size/bcopy_bw) + latency + overhead + latency ]
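The three latency functions above can be written out as plain C to make the terms easier to compare. This is a sketch: the struct `cost_t`, its field names, and the function names are hypothetical stand-ins for the iface and md attributes, and like terms (latency, overhead, registration cost) are grouped rather than listed in protocol order.

```c
/* Hypothetical container for the cost terms used in the formulas. */
typedef struct {
    double latency;      /* iface latency */
    double overhead;     /* iface overhead */
    double bw;           /* iface bandwidth */
    double bcopy_bw;     /* buffered-copy bandwidth */
    double reg_overhead; /* reg_cost.overhead */
    double reg_growth;   /* reg_cost.growth (per byte) */
} cost_t;

static double max_d(double a, double b)
{
    return (a > b) ? a : b;
}

/* eager_zcopy: one registration, one overhead, bounded by the slower copy path */
static double eager_zcopy_latency(const cost_t *c, double size)
{
    return c->reg_overhead + size * c->reg_growth +
           max_d(size / c->bw, size / c->bcopy_bw) + c->overhead;
}

/* RMA (get_zcopy) rendezvous: registration on both sides,
 * 4 latencies, 3 overheads, transfer at the iface bandwidth */
static double rndv_rma_latency(const cost_t *c, double size)
{
    return 2.0 * (c->reg_overhead + size * c->reg_growth) +
           4.0 * c->latency + 3.0 * c->overhead + size / c->bw;
}

/* AM rendezvous: one registration, 4 latencies, 3 overheads,
 * bounded by the slower copy path */
static double rndv_am_latency(const cost_t *c, double size)
{
    return c->reg_overhead + size * c->reg_growth +
           4.0 * c->latency + 3.0 * c->overhead +
           max_d(size / c->bw, size / c->bcopy_bw);
}
```

All three functions are linear in `size`, which is why the threshold can be obtained in closed form by isolating `size`.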

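Equating the eager_zcopy latency with the rendezvous latency scaled by diff_percent, and isolating the 'size' parameter, yields the rndv_thresh. The parameters differ per protocol: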

  • For AM Rendezvous:
    bcopy_bw = context->config.ext.bcopy_bw,
    recv_reg_cost = 0,
    diff_percent = (1 - context->config.ext.rndv_perf_diff / 100.0)

  • For RMA Rendezvous:
    bcopy_bw = inf,
    recv_reg_cost = 1,
    diff_percent = 1

  • Both cases then use the same expressions:

    numerator = diff_percent * ((4 * ucp_tl_iface_latency(context, iface_attr)) +
                (3 * iface_attr->overhead) +
                (md_attr->reg_cost.overhead * (1 + recv_reg_cost))) -
                md_attr->reg_cost.overhead - iface_attr->overhead;

    denumerator = md_attr->reg_cost.growth +
                  ucs_max((1.0 / iface_attr->bandwidth), (1.0 / context->config.ext.bcopy_bw)) -
                  (diff_percent * (ucs_max((1.0 / iface_attr->bandwidth), (1.0 / bcopy_bw)) +
                  md_attr->reg_cost.growth * (1 + recv_reg_cost)));

rndv_thresh = numerator / denumerator;
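As a self-contained sketch of the calculation, the expressions can be wrapped in one function that switches between the AM and RMA parameter sets. The struct `rndv_params_t` and the function `rndv_thresh_calc` are hypothetical names; the fields mirror the attributes referenced above, and the numbers in the usage note are chosen only to make the arithmetic easy to follow, not to resemble real hardware.

```c
/* Hypothetical inputs mirroring the attributes used in the formula. */
typedef struct {
    double latency;        /* ucp_tl_iface_latency(context, iface_attr) */
    double overhead;       /* iface_attr->overhead */
    double bw;             /* iface_attr->bandwidth */
    double cfg_bcopy_bw;   /* context->config.ext.bcopy_bw */
    double reg_overhead;   /* md_attr->reg_cost.overhead */
    double reg_growth;     /* md_attr->reg_cost.growth */
    double rndv_perf_diff; /* context->config.ext.rndv_perf_diff, in percent */
} rndv_params_t;

static double max_d(double a, double b)
{
    return (a > b) ? a : b;
}

/* Solve diff_percent * rndv_latency(size) == eager_zcopy_latency(size)
 * for size. The caller is expected to check that the result is meaningful. */
static double rndv_thresh_calc(const rndv_params_t *p, int is_rma)
{
    /* bcopy_bw = inf for RMA, so its reciprocal is 0 */
    double inv_bcopy_bw  = is_rma ? 0.0 : 1.0 / p->cfg_bcopy_bw;
    double recv_reg_cost = is_rma ? 1.0 : 0.0;
    double diff_percent  = is_rma ? 1.0 : 1.0 - p->rndv_perf_diff / 100.0;

    double numerator = diff_percent * (4.0 * p->latency +
                                       3.0 * p->overhead +
                                       p->reg_overhead * (1.0 + recv_reg_cost)) -
                       p->reg_overhead - p->overhead;

    double denominator = p->reg_growth +
                         max_d(1.0 / p->bw, 1.0 / p->cfg_bcopy_bw) -
                         diff_percent * (max_d(1.0 / p->bw, inv_bcopy_bw) +
                                         p->reg_growth * (1.0 + recv_reg_cost));

    return numerator / denominator;
}
```

With latency 1, overhead 0.5, bandwidth 4, bcopy bandwidth 2, reg_cost.overhead 0.25, reg_cost.growth 0.125 and rndv_perf_diff 50, the AM case gives 2.125 / 0.3125 = 6.8 and the RMA case gives 5.25 / 0.125 = 42. At size 42 both the eager_zcopy and the RMA rendezvous latency functions evaluate to 27, which confirms that the threshold is the crossing point of the two curves.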

If the 'eager_zcopy' and 'rendezvous' latency curves do not intersect, the threshold is set to the value of the UCX_RNDV_THRESH_FALLBACK environment parameter.
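One way to express the fallback is shown below. This is a sketch under an assumption: the curves are treated as not meeting whenever the numerator or the denominator is not positive, which may differ from the exact test UCX applies. The function name `rndv_thresh_or_fallback` is hypothetical.

```c
#include <stddef.h>

/* Return the computed threshold when the two latency lines cross at a
 * positive message size; otherwise return the fallback value.
 * Assumption: a non-positive numerator or denominator means "no crossing". */
static size_t rndv_thresh_or_fallback(double numerator, double denominator,
                                      size_t fallback)
{
    if ((numerator > 0.0) && (denominator > 0.0)) {
        return (size_t)(numerator / denominator);
    }
    return fallback;
}
```

Here `fallback` stands for the value taken from UCX_RNDV_THRESH_FALLBACK.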