vysmaw usage - mpokorny/vysmaw GitHub Wiki
VLA context
The "wcbe" application is a distributed application running on the VLA correlator back-end (CBE) cluster. It receives and processes the various WIDAR correlator data products according to the requirements of each WIDAR configuration, and writes visibility data to BDF files. The mapping of WIDAR data products to wcbe processes may change with every WIDAR reconfiguration, depending on the number of active sub-arrays, the CBE nodes that are in active use, and an opaque WIDAR product-to-CBE mapping algorithm implemented by wcbe. In the vys system, each wcbe process broadcasts signal messages carrying spectral metadata and data location information to clients as it processes the data. This allows clients to receive data from any CBE process without prior knowledge of the mapping of WIDAR products to CBE processes. The signal messages provide sufficient metadata to identify not only the baseline, spectral window and Stokes product of the visibility spectrum, but also the "location" of the spectrum in the CBE. Clients may then retrieve only those spectra that they require.
OpenFabrics - OFED
The current vysmaw implementation is based on OpenFabrics Enterprise Distribution (OFED)/OpenFabrics Software (OFS), which provides access to the RDMA (Remote Direct Memory Access) and kernel bypass send/receive features of the InfiniBand fabric available to the CBE cluster. Both the signal broadcast and the spectrum retrieval functions described in the preceding section are implemented using features of OFS. These OFS features allow efficient transfer of data over the fabric directly to the client library and application, but at the cost of requiring the application to participate actively in managing the resources used by the client library. This design nevertheless does not create any direct dependencies between vysmaw application processes, or between vys producers and consumers, allowing for a high level of isolation between all processes in the vys system. In other words, failure of a client process to manage resources efficiently can only affect the data received by that process. Any inter-process dependencies that exist will only be at the level of system resource limitations; for example, two vysmaw application processes running on a single node must share the available memory on that node (while the address spaces of the client processes remain distinct).
Memory registration
The most significant of the resources allocated by the vysmaw library for every client is so-called "registered memory." Registered memory is used by OFED/OFS routines to allow communication over OpenFabrics networks to bypass the operating system kernel, which is a key feature of OFED/OFS performance. All OFS send/receive and RDMA read/write operations require access to registered memory. Registered memory takes the form of physical memory locked in the virtual address space of the kernel. For highest performance, the vysmaw library does not copy data out of registered memory buffers prior to providing the client access to such buffers. The accounting of registered memory usage by a vysmaw application is handled by the library itself, although this requires the participation of the application code to notify the library when the application has finished accessing the contents of a buffer in registered memory.
Signal broadcast
The signal messages are received by the vysmaw library in a registered memory block. Access to the spectrum metadata of these messages is provided to the application through the callback function predicate arguments. Although the application is not required to explicitly release a reference to every buffer used for a signal message, it can nevertheless starve the library of the buffers available to receive signal messages. There is buffering between the signal receive loop in vysmaw and the call to the application callback function, but an inefficient callback may still result in the receive loop running out of buffers into which to store the received signal messages. Buffering between the receive loop and the callback loop minimizes latency in the network communication loop, and allows messages to be received at peak rates exceeding the peak callback loop bandwidth, although not for indefinite periods.
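The effect of a slow callback on a bounded buffer pool can be illustrated with a small simulation. This is only a sketch of the general behavior described above, not vysmaw code: the function name, the tick-based timing, and all parameters are hypothetical.

```python
from collections import deque


def simulate(pool_size, arrivals_per_tick, drained_per_tick, ticks):
    """Count signal messages dropped because no receive buffer was free.

    Hypothetical model: 'pool_size' buffers sit between a receive loop and
    a callback loop; time advances in discrete ticks.
    """
    in_flight = deque()  # buffers holding messages that await the callback
    dropped = 0
    for tick in range(ticks):
        # Receive loop: every arriving message needs one free buffer.
        for _ in range(arrivals_per_tick):
            if len(in_flight) < pool_size:
                in_flight.append(tick)
            else:
                dropped += 1
        # Callback loop: a slow callback frees fewer buffers per tick.
        for _ in range(min(drained_per_tick, len(in_flight))):
            in_flight.popleft()
    return dropped
```

With a pool of 8 buffers, 4 arrivals per tick and only 2 buffers drained per tick, a burst of 3 ticks is absorbed without loss, but messages start to be dropped once the same imbalance continues beyond that.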
Application callback
Note that the application callback function signature is designed for some amount of batch processing in every function call. This design not only allows the library to invoke the callback less frequently than would otherwise be necessary, but is also aligned with the aggregation of metadata in the messages from the wcbe processes.
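A batch-style predicate might look like the following sketch. The record type, its field names, and the function signature are assumptions made for illustration; they are not the vysmaw callback signature. The point is only that one call decides on many spectra, and that the per-spectrum work is kept cheap.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class SpectrumInfo:
    # Hypothetical metadata record; field names are illustrative only.
    baseline: Tuple[int, int]
    spectral_window: int
    stokes: str
    timestamp: int


def select_spectra(infos: Sequence[SpectrumInfo],
                   wanted_windows: FrozenSet[int]) -> List[bool]:
    """Return one pass/fail flag per metadata record in the batch.

    The test is a single set lookup per record, so the time spent in the
    predicate stays small even for large batches.
    """
    return [info.spectral_window in wanted_windows for info in infos]
```

Usage: `select_spectra(batch, frozenset({0, 3}))` returns a list of booleans in the same order as `batch`.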
OFED multicast
The broadcast of signal messages in vys is currently implemented using multicast over InfiniBand. Although this implementation may change if its performance proves to be inadequate, client applications should be unaffected by any such change.
Spectral data
All spectra whose metadata satisfy the application callback predicate are retrieved by the vysmaw library via OFS RDMA into a registered memory block. These buffers are provided directly to the application in the messages retrieved from the queue. To minimize the risk of starving the vysmaw library of buffers for receipt of spectral data, applications should release the buffers received on the queue as soon as possible. This generally means that the application code should read the data from the buffer at most one time. A buffer is released by calling the "unref" function for the message (which must be done for every message, regardless of its type, but for reasons other than to avoid depletion of the available registered memory). Note that a Message in the Cython layer will automatically release its reference to the underlying C-level message when its Python reference count goes to zero. It is good practice, however, to call the "unref" function explicitly in application code to ensure that the buffer is returned to the registered memory pool as soon as possible.
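The read-once-then-release pattern can be sketched as follows. The `BufferPool` and `Message` classes here are hypothetical stand-ins written for this example; they only borrow the names "Message" and "unref" from the text and are not the vysmaw Cython API.

```python
class BufferPool:
    """Fixed number of buffers, standing in for a registered memory pool."""

    def __init__(self, count):
        self.free = count

    def acquire(self, payload):
        """Return a Message holding 'payload', or None if the pool is empty."""
        if self.free == 0:
            return None
        self.free -= 1
        return Message(self, payload)


class Message:
    """Hypothetical message that holds one pool buffer until released."""

    def __init__(self, pool, payload):
        self._pool = pool
        self._payload = payload
        self._released = False

    @property
    def payload(self):
        if self._released:
            raise ValueError("buffer already released")
        return self._payload

    def unref(self):
        # Idempotent: releasing twice must not return the buffer twice.
        if not self._released:
            self._released = True
            self._payload = None
            self._pool.free += 1

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.unref()
        return False


def consume(message):
    """Copy the data out exactly once, then release the buffer at once."""
    try:
        return bytes(message.payload)
    finally:
        message.unref()
```

Releasing in a `finally` clause (or through a `with` block) returns the buffer to the pool even if processing raises, rather than waiting for the Python object to be garbage collected.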
Spectral data availability
The spectral data in the CBE are maintained at the location indicated in the metadata broadcasts for a limited time. On the CBE side, the spectral data available to vys are also required to be in registered memory, which is used for spectral data buffers on a rotating basis. Every spectrum for which metadata are broadcast will therefore be available for reading by vysmaw processes only until the CBE process hosting its buffer reuses that buffer for other data. The protocol implemented by the vys system includes a data validation step to ensure that the data received by a vysmaw client are those matching the metadata used to identify the spectrum. The length of time for which a spectrum is available depends on the WIDAR dump rate for the product and on the amount of memory allocated by the CBE processes to contain spectral data buffers. Currently, a minimum value for the time that any spectral data buffer will be valid is undetermined, although, given the current CBE implementation, which does not yet implement a vys visibility stream, about two seconds is a reasonable estimate.
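The dependence of buffer lifetime on dump rate and allocated memory can be made concrete with a rough estimate. The model below is an assumption for illustration only: it supposes that a CBE process cycles through equally sized buffers in strict rotation, and the function name and all numbers are hypothetical rather than actual CBE values.

```python
def buffer_lifetime_s(pool_bytes, spectrum_bytes, spectra_per_second):
    """Estimate how long a spectrum stays readable before its buffer is reused.

    Assumes strict rotation through equally sized buffers: a buffer is
    overwritten after every other buffer in the pool has been filled once.
    """
    n_buffers = pool_bytes // spectrum_bytes
    return n_buffers / spectra_per_second


# Illustrative numbers only: a 1 GiB pool of 8 KiB spectra filled at
# 32768 spectra per second gives 131072 buffers and a 4 second lifetime.
lifetime = buffer_lifetime_s(2**30, 8 * 1024, 32768)
```

Under this model, doubling the dump rate halves the lifetime, and doubling the allocated memory doubles it.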
The time between a CBE process sending a metadata broadcast for a given spectral data product and a vysmaw library thread reading that spectral data is the critical quantity in determining whether the spectral data are valid when they are received at the application process. As that time difference increases, the likelihood that the spectral data will be valid when they arrive in the application process memory decreases. The only influence an application can have on this latency period is through the time spent in the callback function predicate. Other sources of spectral data retrieval latency are network latency, latency introduced by the vysmaw library implementation, and a potential backlog in calls to the predicate or in processing RDMA read requests. To be clear, once a spectral data product has been read by the vysmaw library, it exists in the process's memory, and cannot be overwritten until the client releases the buffer reference. When spectral data validation fails, the application is notified of the failure, and is not given any of the associated, invalid data.
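The interaction between buffer reuse and validation can be sketched with a toy ring of producer buffers. The source does not describe how vys performs validation, so the stamp comparison used here is an assumption, and `ProducerRing` and `fetch` are hypothetical names.

```python
class ProducerRing:
    """Toy producer that reuses a fixed number of slots in rotation."""

    def __init__(self, n_slots):
        self._slots = [None] * n_slots
        self._next = 0

    def publish(self, stamp, data):
        """Store data in the next slot; return the 'metadata' (slot, stamp)."""
        slot = self._next % len(self._slots)
        self._slots[slot] = (stamp, data)
        self._next += 1
        return slot, stamp

    def read(self, slot):
        return self._slots[slot]


def fetch(ring, slot, stamp):
    """Read a slot and validate it against the broadcast metadata.

    Returns a private copy of the data, or None if the slot was reused
    before the read (assumed validation rule: the stamps must match).
    """
    found_stamp, data = ring.read(slot)
    if found_stamp != stamp:
        return None  # validation failed; no invalid data is handed out
    return bytes(data)  # the copy is unaffected by later slot reuse
```

A client that fetches promptly gets the data; one that waits until the producer has wrapped around the ring gets only the failure indication, while data already copied remain intact.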