RMA WG 01 25 2021

Attendees: Manju, Dave, Khaled, Wasi, Nick, Naveen, Min

Agenda

Go over the remaining active topics; discuss and update the plan

Notes

Signal
- Naveen: Atomic implementation concerns: giving signal the same atomicity as the former can degrade the performance of put_with_signal, because they would have to be transferred via the same EP. The user (or library) would then have to explicitly call fence before the signal to ensure ordering with previous RMA/AMO operations.
- Manju: We need to define the semantics of signal: (1) Meaning of fence? (2) Difference between signal and AMO? (3) How does it help GPU communication? (4) We now have three atomic modes: AMO with the SHMEM_TEAM_WORLD domain, AMO with the SHMEM_TEAM_SHARED domain, and put_with_signal. Will signal be the fourth?
- Khaled: A conditional check on the GPU will cause suboptimal performance if signal is replaced with a zero-byte put_with_signal.
- Nick: Consider a release memory ordering model for signal? E.g., if the user application already guarantees ordering between the signal and the previous RMA/AMO operations, the library does not have to call fence for ordering.
- Next step: follow up on the discussion in ticket #382
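
The ordering concern above can be illustrated with a toy model. This is only a sketch under stated assumptions, not OpenSHMEM code: `ToyTarget`, `ToyNetwork`, the endpoint names `"data"` and `"atomic"`, and the three scenario names are all hypothetical. It assumes each EP delivers in FIFO order, that nothing orders one EP against another, and it models fence simply as draining everything posted so far.

```python
from collections import deque


class ToyTarget:
    """Hypothetical target PE: one data buffer plus one signal word."""

    def __init__(self):
        self.data = None
        self.signal = 0
        self.seen_by_waiter = None  # data observed at the moment the signal lands

    def apply(self, op):
        kind, value = op
        if kind == "put":
            self.data = value
        elif kind == "signal":
            self.signal = value
            # A waiter polling the signal would read the data buffer right now.
            self.seen_by_waiter = self.data


class ToyNetwork:
    """Assumption: each EP is FIFO; there is no ordering across EPs."""

    def __init__(self, target):
        self.target = target
        self.eps = {"data": deque(), "atomic": deque()}

    def post(self, ep, op):
        self.eps[ep].append(op)

    def fence(self):
        # Modeled (as an assumption) as draining every EP before continuing.
        for queue in self.eps.values():
            while queue:
                self.target.apply(queue.popleft())

    def deliver(self, ep_order):
        # Drain the EPs in the given, possibly adversarial, order.
        for ep in ep_order:
            queue = self.eps[ep]
            while queue:
                self.target.apply(queue.popleft())


def run(scenario):
    """Return the data a waiter sees when the signal arrives."""
    target = ToyTarget()
    net = ToyNetwork(target)
    if scenario == "fused_same_ep":
        # Payload and signal share one FIFO EP.
        net.post("data", ("put", "payload"))
        net.post("data", ("signal", 1))
    elif scenario == "split_no_fence":
        # Signal travels on a different EP, with no fence in between.
        net.post("data", ("put", "payload"))
        net.post("atomic", ("signal", 1))
    elif scenario == "split_with_fence":
        # Signal travels on a different EP, after an explicit fence.
        net.post("data", ("put", "payload"))
        net.fence()
        net.post("atomic", ("signal", 1))
    else:
        raise ValueError(scenario)
    # Worst case for ordering: the atomic EP is drained first.
    net.deliver(["atomic", "data"])
    return target.seen_by_waiter
```

In this model only `split_no_fence` lets the waiter observe the signal before the payload; the other two scenarios pay for ordering either by sharing an EP or by the extra fence.
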

GPU
- Khaled: Discussed with Jim. Two aspects to consider:
  1. Which functions are needed on the GPU to match the existing API:
    - CPU prepared: stream triggered, kernel triggered
    - GPU prepared: kernel initiated
  2. Multilevel memory model on GPU
- Manju: Event triggered == {stream|kernel} triggered
  - Trying to decouple invocation and execution semantics
  - May be useful semantics for upcoming (and current) SmartNICs
- Min: Event triggered might be similar to the OpenMP task model, allowing the user to define dependencies (e.g., ordering of streams).
- Khaled: How do we define the new semantics?
  - If we reuse existing APIs (e.g., adding an event parameter to put), it will be hard for users to follow, because the same set of APIs would carry two semantics.
  - If we do not reuse them, there will be too many extensions.
- Min: Define two distinct models, one for regular and the other for event-triggered?
  - Khaled: What if I want to use the event and regular models at the same time (e.g., both GPU and CPU communicate)?
  - Wasi: We can treat the CPU as a device too
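
The two API shapes discussed above can be sketched with a toy model of deferred operations. Everything here is hypothetical and for illustration only: `ToyEvent` stands in for a stream or kernel trigger, a Python dict stands in for remote memory, and `put` and `put_on_event` are not OpenSHMEM routines. Option A reuses one entry point with an optional event parameter; option B adds a distinct entry point.

```python
class ToyEvent:
    """Hypothetical trigger, standing in for a stream or kernel event."""

    def __init__(self):
        self.fired = False
        self._pending = []

    def defer(self, fn):
        # Invocation: record the operation; run it now only if already fired.
        if self.fired:
            fn()
        else:
            self._pending.append(fn)

    def fire(self):
        # Execution: run every operation that was waiting on this event.
        self.fired = True
        pending, self._pending = self._pending, []
        for fn in pending:
            fn()


def put(dest, key, value, event=None):
    """Option A: one entry point; passing `event` switches the semantics."""
    if event is None:
        dest[key] = value  # regular model: executes at invocation
    else:
        event.defer(lambda: dest.__setitem__(key, value))  # deferred


def put_on_event(dest, key, value, event):
    """Option B: a distinct entry point for the event-triggered model."""
    event.defer(lambda: dest.__setitem__(key, value))


# Both models used at the same time on the same (toy) destination.
remote = {}
done = ToyEvent()
put(remote, "a", 1)              # regular put
put(remote, "b", 2, event=done)  # option A, deferred
put_on_event(remote, "c", 3, done)  # option B, deferred
before = dict(remote)  # snapshot taken before the event fires
done.fire()
after = dict(remote)   # snapshot taken after the event fires
```

With option A the call sites for the regular and deferred puts look nearly identical, which is the readability concern raised above; option B keeps them visually distinct at the cost of one extra routine per operation.
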

Plan for next meeting

Go over the remaining active topics and discuss any specific item