Connection establishment - openucx/ucx GitHub Wiki

UCT

UCT supports connecting either to a remote interface or to a remote endpoint (p2p mode). To connect to an interface, create the endpoint with uct_ep_create_connected(); it can be used immediately. To connect to a remote endpoint, create an endpoint with uct_ep_create() and then connect it with uct_ep_connect(). Such an endpoint can be used only after the remote side has connected its own endpoint as well.
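
The difference between the two modes can be illustrated with a small model. This is only a sketch: the model_* names and the struct below are hypothetical and are not part of the UCT API; they only mirror the readiness rules described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two UCT connection modes; not the UCT API. */
typedef enum { MODE_TO_IFACE, MODE_TO_EP } conn_mode_t;

typedef struct {
    conn_mode_t mode;
    bool        local_connected;  /* this side has connected its endpoint */
    bool        remote_connected; /* the peer has connected its endpoint to us */
} model_ep_t;

/* Stands in for uct_ep_create_connected(): connected as soon as it returns. */
model_ep_t model_create_connected(void)
{
    model_ep_t ep = { MODE_TO_IFACE, true, false };
    return ep;
}

/* Stands in for uct_ep_create(): the endpoint exists but is not connected. */
model_ep_t model_create(void)
{
    model_ep_t ep = { MODE_TO_EP, false, false };
    return ep;
}

/* Stands in for uct_ep_connect() on the local side. */
void model_connect(model_ep_t *ep)
{
    ep->local_connected = true;
}

/* Called once we learn that the peer has connected its endpoint to ours. */
void model_peer_connected(model_ep_t *ep)
{
    ep->remote_connected = true;
}

/* An interface-connected endpoint is usable at once; a p2p endpoint needs
 * both sides to be connected. */
bool model_ep_usable(const model_ep_t *ep)
{
    if (ep->mode == MODE_TO_IFACE) {
        return ep->local_connected;
    }
    return ep->local_connected && ep->remote_connected;
}
```

In this model, an endpoint from model_create_connected() is usable right away, while one from model_create() becomes usable only after both model_connect() and model_peer_connected() have been called.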

UCP goals

UCP, on the other hand, exposes only one-sided semantics: ucp_ep_create() creates an endpoint that can be used immediately to communicate with a remote worker. UCP therefore has to implement a connection establishment protocol, called wireup, which bridges this gap and supports the following scenarios:

  1. Establish a connection over a p2p transport.
  2. Create an endpoint for sending replies back, for one-sided transports.
  3. Use multiple transports for different types of operations.

UCP design

Every UCP endpoint contains several UCT endpoints, possibly using different transports, for the following operations:

  • Active messages
  • Remote memory access
  • Atomic operations

Some of these may point to the same UCT endpoint, or be NULL. While connection establishment is in progress, an endpoint may be a stub endpoint: a dummy endpoint that only puts operations on the pending queue and uses an auxiliary transport to send the wireup messages (which make up the connection establishment protocol). Once the "real" endpoint inside the stub endpoint is connected, the stub is replaced by the real endpoint and the pending operations are replayed on it.
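
The queue-and-replay behavior of a stub endpoint can be sketched as follows. All types and functions here (real_ep_t, stub_ep_t, stub_send, and so on) are hypothetical and do not correspond to UCP internals. As a simplification, the stub keeps a "connected" flag and forwards to the real endpoint instead of being replaced by it, and operations are plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPS 8

/* Hypothetical "real" endpoint: records the operations sent on it, in order. */
typedef struct {
    int    sent[MAX_OPS];
    size_t num_sent;
} real_ep_t;

/* Hypothetical stub: queues operations until the real endpoint is connected. */
typedef struct {
    real_ep_t *real;             /* endpoint being wired up */
    int        connected;        /* nonzero once wireup has completed */
    int        pending[MAX_OPS]; /* operations queued while not connected */
    size_t     num_pending;
} stub_ep_t;

/* Send on the real endpoint; returns -1 if the record buffer is full. */
int real_send(real_ep_t *ep, int op)
{
    if (ep->num_sent == MAX_OPS) {
        return -1;
    }
    ep->sent[ep->num_sent++] = op;
    return 0;
}

/* While wireup is in progress, the operation is queued rather than sent. */
int stub_send(stub_ep_t *stub, int op)
{
    if (stub->connected) {
        return real_send(stub->real, op);
    }
    if (stub->num_pending == MAX_OPS) {
        return -1;
    }
    stub->pending[stub->num_pending++] = op;
    return 0;
}

/* Wireup has completed: replay the pending operations in their original order. */
void stub_connected(stub_ep_t *stub)
{
    size_t i;

    stub->connected = 1;
    for (i = 0; i < stub->num_pending; ++i) {
        real_send(stub->real, stub->pending[i]);
    }
    stub->num_pending = 0;
}
```

Operations issued before stub_connected() reach the real endpoint in the order they were submitted; operations issued afterwards go to it directly.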

UCP wireup protocol

Create endpoint - ucp_ep_create

  1. Select transport for every operation required by UCP configuration ("features")
  2. Create connected endpoint for transports which can connect directly to remote interface
  3. Create stub endpoint for the p2p transports
  4. If there is at least one p2p transport, send wireup REQUEST with all local addresses
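
The steps above can be sketched as a single pass over the operation types. This is a simplified model, not the actual ucp_ep_create() code: the op_select_t and lane_state_t types and the model_ep_create() function are hypothetical, and transport selection is assumed to have already produced, for each operation, whether it is needed and whether the chosen transport is p2p.

```c
#include <assert.h>
#include <stdbool.h>

/* Operation types that a UCP endpoint needs UCT endpoints for. */
enum { OP_AM, OP_RMA, OP_AMO, OP_LAST };

/* Hypothetical per-operation endpoint state after creation. */
typedef enum { LANE_NONE, LANE_CONNECTED, LANE_STUB } lane_state_t;

/* Hypothetical result of transport selection for one operation. */
typedef struct {
    bool needed; /* operation is required by the enabled features */
    bool is_p2p; /* selected transport must connect endpoint-to-endpoint */
} op_select_t;

/* Fills lanes[] and returns true if a wireup REQUEST has to be sent. */
bool model_ep_create(const op_select_t sel[OP_LAST], lane_state_t lanes[OP_LAST])
{
    bool send_request = false;
    int  op;

    for (op = 0; op < OP_LAST; ++op) {
        if (!sel[op].needed) {
            lanes[op] = LANE_NONE;      /* nothing to create */
        } else if (sel[op].is_p2p) {
            lanes[op]    = LANE_STUB;   /* queue operations until wireup completes */
            send_request = true;        /* at least one p2p transport */
        } else {
            lanes[op] = LANE_CONNECTED; /* connected directly to the remote interface */
        }
    }
    return send_request;
}
```

When every selected transport can connect directly to the remote interface, model_ep_create() returns false and no wireup REQUEST is needed.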

Protocol which requires remote side to send a reply

  1. Send our worker uuid in the protocol header (e.g., the rendezvous request header).
  2. If we have not tried to connect to the remote side yet, also send a wireup REQUEST.
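
A minimal sketch of these two steps, assuming a per-endpoint flag that records whether wireup has already been attempted. The header layout, the flag, and the function name are all hypothetical and are not taken from the UCP sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical protocol header carrying the sender's worker uuid. */
typedef struct {
    uint64_t sender_uuid;
} model_proto_hdr_t;

/* Hypothetical per-endpoint state: has a connection attempt been made yet? */
typedef struct {
    bool wireup_started;
} model_reply_ep_t;

/* Fills the header with our worker uuid and returns true if a wireup REQUEST
 * must accompany the message. The attempt is recorded, so later calls on the
 * same endpoint return false. */
bool model_prepare_send(uint64_t worker_uuid, model_reply_ep_t *ep,
                        model_proto_hdr_t *hdr)
{
    bool need_request = !ep->wireup_started;

    hdr->sender_uuid   = worker_uuid;
    ep->wireup_started = true;
    return need_request;
}
```

The uuid in the header lets the receiver identify which worker to reply to, while the flag keeps the REQUEST from being sent more than once per endpoint in this model.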

Wireup REQUEST received

  1. If the endpoint does not exist, create it (in case the remote side wants us to create an endpoint for replies).
  2. If the endpoint exists but transports were not selected (in case it's a stub), select transports and create local UCT endpoints.
  3. In case there are local stub endpoints:
       • If the transport connects to an interface, switch the stub to the real transport.
       • If the transport is p2p, start auxiliary wireup on the stub endpoint using the received address.
       • If a duplicate was selected, destroy the stub.
  4. Connect local transports to the remote addresses.
  5. Send a wireup REPLY with the local addresses.

Wireup REPLY received

  1. Connect local endpoints to remote addresses
  2. Send wireup ACK to let remote side know it can start communications on p2p transports.
  3. If both sides are connected now, switch stub endpoints to the "real" transport.

Wireup ACK received

  1. If both sides are connected now, switch stub endpoints to the "real" transport.
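
The REQUEST / REPLY / ACK exchange for a single p2p transport can be sketched as a small state machine. The types and the model_handle() function are hypothetical. The sketch also assumes that receiving a REPLY implies the remote side has already connected its endpoint (it connects before sending the REPLY), and that receiving an ACK implies the same for the other direction.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MSG_NONE, MSG_REQUEST, MSG_REPLY, MSG_ACK } wireup_msg_t;

/* Hypothetical state of one side of a p2p connection. */
typedef struct {
    bool local_connected;  /* we connected our endpoint to the remote address */
    bool remote_connected; /* we know the peer connected its endpoint to ours */
    bool is_stub;          /* still queuing operations on the stub */
} p2p_side_t;

/* Switch from the stub to the "real" transport once both sides are connected. */
void model_maybe_switch(p2p_side_t *side)
{
    if (side->local_connected && side->remote_connected) {
        side->is_stub = false;
    }
}

/* Handles one incoming wireup message and returns the message to send back. */
wireup_msg_t model_handle(p2p_side_t *side, wireup_msg_t msg)
{
    switch (msg) {
    case MSG_REQUEST:
        side->local_connected = true;  /* connect to the received addresses */
        return MSG_REPLY;
    case MSG_REPLY:
        side->local_connected  = true; /* connect to the received addresses */
        side->remote_connected = true; /* assumption: peer connected before replying */
        model_maybe_switch(side);
        return MSG_ACK;
    case MSG_ACK:
        side->remote_connected = true; /* peer has finished connecting */
        model_maybe_switch(side);
        return MSG_NONE;
    default:
        return MSG_NONE;
    }
}
```

In this model the initiator leaves stub mode when it handles the REPLY, and the responder leaves stub mode only when the ACK arrives.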