Coarray Object Interface - omni-compiler/omni-compiler GitHub Wiki

Contents

Object Interface Specifications

  1. Introduction
  2. Initialization of Static Coarrays
  3. Allocation/deallocation of Allocatable Coarrays
  4. PUT and GET Communications
  5. Synchronizations
  6. Inquire Functions
  7. Internal Functions

1. Introduction

1.1 Terms

registration

Recording the address and size of a data object with the communication library. Registration is collective and may involve pinning down memory, acquiring a global address, sharing information among all images, etc. After registration, the data can be accessed from the other images.

Chunk

Memory area that contains one or more coarray data objects. To keep 8-byte (or another) memory alignment, the Chunk may contain extra data at the tail of each data object. A Chunk is shared from the Memory Pool or is allocated and registered by the libraries, depending on the memory manager.
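The tail padding mentioned above can be illustrated with a small sketch. The names sketch_padded_size and SKETCH_ALIGN are hypothetical and do not come from the runtime; the 8-byte boundary is only the default named in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alignment boundary; the text says 8 bytes "or another". */
#define SKETCH_ALIGN ((size_t)8)

/* Space one data object occupies in a Chunk, including tail padding:
 * nbytes rounded up to the next multiple of SKETCH_ALIGN. */
static size_t sketch_padded_size(size_t nbytes)
{
    /* Guard against wrap-around in the rounding arithmetic. */
    assert(nbytes <= (size_t)-1 - (SKETCH_ALIGN - 1));
    return (nbytes + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN;
}
```

Under this assumption a 13-byte object would occupy 16 bytes of its Chunk, while an 8-byte object needs no padding.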

Static Chunk

The Chunk that contains small static coarrays and is shared from the Memory Pool.

Memory Pool

The memory area that is allocated and registered before the program execution. It contains the communication buffer localBuf, a small system area, and an optional Static Chunk.

1.2 Memory Allocation Methods

The RA and CA methods cannot be implemented over GASNet because GASNet does not support multiple registration or de-registration.

The RS method has not been implemented over FJ-RDMA because it showed no advantage over the RA method on FJ-RDMA.

The Runtime Sharing (RS) method

  • Memory Pool is allocated and registered with the size specified by the user,
  • Static coarrays are shared from the Memory Pool, and
  • Allocatable coarrays are pulled from and pushed to the Memory Pool.

[Figure: RS method]

The Runtime Allocation (RA) method

  • Memory Pool is allocated and registered with a size that includes the total size of the static coarrays, which is calculated before the program execution,
  • Static coarrays are shared from the Memory Pool, and
  • Allocatable coarrays are allocated at runtime and registered with the communication library.

[Figure: RA method]

The Compiler Allocation (CA) method

  • Memory Pool is allocated and registered not including the sizes of coarrays,
  • Static coarrays are declared as static variables of the base language and registered with the communication library, and
  • Allocatable coarrays are allocated at runtime and registered with the communication library.

[Figure: CA method]


2. Initialization of Static Coarrays

Static coarrays are allocated and registered before the program execution.

[RS] [RA]

  1. XMPCO_count_size for all static coarrays (2.1)
  2. XMPCO_malloc_pool (2.2)
  3. XMPCO_malloc_staticCoarray for all static coarrays (2.3)

[CA]

  1. XMPCO_malloc_pool (2.2)
  2. XMPCO_regmem_staticCoarray for all static coarrays (2.4)

For each instance of subroutine/function, a ResourceSet structure is dynamically created (2.5) and deleted (2.6) at the entry and exit point, respectively, if any allocatable coarray variables appear in it.

2.1 XMPCO_count_size [RS][RA]

void XMPCO_count_size(int count, size_t element);

  • If the size does not exceed POOL_THRESHOLD, the size count * element is added to the size of the Static Chunk. The initial size of the Static Chunk is zero.

RESTRICTION

  • All images must call the same instances with the same values of arguments.

NOTE

  • This function should be called for every static coarray in the program.
  • The Static Chunk does not contain huge-size static coarrays that are to be allocated separately from the Static Chunk.
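The counting rule can be modeled in a few lines. This is only a sketch: sketch_count_size, sketch_static_chunk_size, and the threshold value of 1024 bytes are placeholders for illustration, and any alignment padding the runtime may add is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder value for this sketch only; not the library's POOL_THRESHOLD. */
#define SKETCH_POOL_THRESHOLD ((size_t)1024)

/* Running size of the Static Chunk; starts at zero as described above. */
static size_t sketch_static_chunk_size = 0;

/* Model of the counting rule: small coarrays grow the Static Chunk,
 * larger ones are left out because they are allocated separately. */
static void sketch_count_size(int count, size_t element)
{
    assert(count >= 0);
    size_t nbytes = (size_t)count * element;
    if (nbytes <= SKETCH_POOL_THRESHOLD)
        sketch_static_chunk_size += nbytes;
}
```

Calling the model once per static coarray leaves the accumulated size in sketch_static_chunk_size, which is the quantity a later pool allocation would need.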

2.2 XMPCO_malloc_pool

void XMPCO_malloc_pool(void);

  • Get the descriptor of the Memory Pool from the lower-level runtime library.
    • [RS] The memory pool is already allocated and registered by the low-level library.
    • [RA] [CA] The memory pool is allocated and registered via the low-level library.
  • The Memory Pool initially contains the following data:
    • localBuf -- the static local buffer for one-sided communication. The size is defined in xmpco_params.h.
    • small work area for the system. The size is less than 100 bytes.
    • [RS][RA] Static Chunk -- a Chunk shared by the static coarrays. The size is determined by iterative calls of XMPCO_count_size.

RESTRICTION

  • This function should be called once and before the program execution.
  • [RS] The size of the Memory Pool cannot exceed the size of the heap area.

NOTE

  • [RS] The size of the heap area can be set with environment variable XMP_ONESIDED_HEAP_SIZE. For example:

% XMP_ONESIDED_HEAP_SIZE=70M

2.3 XMPCO_malloc_staticCoarray [RS] [RA]

CoarrayInfo_t *XMPCO_malloc_staticCoarray(char **addr, int count, size_t element, int namelen, char *name);

  • If the size count * element is not larger than POOL_THRESHOLD, then:
    • Share memory of that size from the Static Chunk.
  • If the size count * element is larger than POOL_THRESHOLD, then:
    • [RS] Share memory of that size from the Memory Pool, outside of the Static Chunk.
    • [RA] Allocate memory of that size by using the libraries and register the address and the size.
  • Set *addr to the local address of the coarray.
  • Return the descriptor of the coarray, that is, a pointer to the CoarrayInfo corresponding to the coarray.
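The two size cases can be sketched as a simple bump allocator over a fixed buffer. All names and sizes here (sketch_malloc_static, sketch_chunk, SKETCH_THRESHOLD, SKETCH_CHUNK_BYTES) are hypothetical, the large case follows the [RA] path only, and registration and descriptor handling are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_THRESHOLD   ((size_t)1024)  /* placeholder, not POOL_THRESHOLD */
#define SKETCH_CHUNK_BYTES ((size_t)4096)  /* placeholder Static Chunk size */

static char   sketch_chunk[SKETCH_CHUNK_BYTES];
static size_t sketch_chunk_used = 0;

/* Sets *addr to the storage of a static coarray; returns 0 on success,
 * -1 if memory runs out. Registration and descriptors are omitted. */
static int sketch_malloc_static(char **addr, int count, size_t element)
{
    assert(addr != NULL && count >= 0);
    size_t nbytes = (size_t)count * element;

    if (nbytes <= SKETCH_THRESHOLD) {
        /* Small coarray: take the next free bytes of the Static Chunk. */
        if (nbytes > SKETCH_CHUNK_BYTES - sketch_chunk_used)
            return -1;
        *addr = sketch_chunk + sketch_chunk_used;
        sketch_chunk_used += nbytes;
        return 0;
    }
    /* Large coarray, [RA]-like path: a separate allocation. */
    *addr = malloc(nbytes);
    return *addr != NULL ? 0 : -1;
}
```

In this model two small coarrays end up adjacent inside the shared buffer, while a large one lives in its own allocation and leaves the buffer untouched.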

2.4 XMPCO_regmem_staticCoarray [CA]

CoarrayInfo_t *XMPCO_regmem_staticCoarray(void *var, int count, size_t element, int namelen, char *name);

  • Register the address var and the size count * element of the coarray.
  • Return the descriptor of the coarray, that is the pointer to the CoarrayInfo corresponding to the coarray.

2.5 XMPCO_prolog

void XMPCO_prolog(ResourceSet_t **rsetp, int namelen, char *name);

  • Create a ResourceSet structure corresponding to the procedure (subroutine or function) with the name specified with namelen and name.
  • Set *rsetp to the pointer to the ResourceSet.

NOTE

  • This function should be called at the entry of every procedure that includes any allocatable coarray variables.
  • All allocatable coarray variables belonging to the procedure should be linked to the ResourceSet.

2.6 XMPCO_epilog

void XMPCO_epilog(ResourceSet_t **rsetp);

  • Delete the ResourceSet structure *rsetp.
  • Set *rsetp to NULL.

NOTE

  • This function should be called at the exit of every procedure that includes any allocatable coarray variables.
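The pairing of the two calls can be sketched with a stand-in structure. SketchResourceSet, sketch_prolog, and sketch_epilog are hypothetical; the real ResourceSet_t layout is not described here, and the sketch keeps only a copy of the procedure name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for ResourceSet_t; the real structure is not shown in this text. */
typedef struct {
    char *name;  /* NUL-terminated copy of the procedure name */
} SketchResourceSet;

/* Entry of the procedure: create the set and store its pointer in *rsetp. */
static void sketch_prolog(SketchResourceSet **rsetp, int namelen, const char *name)
{
    assert(rsetp != NULL && namelen >= 0);
    SketchResourceSet *rset = malloc(sizeof *rset);
    if (rset != NULL) {
        rset->name = malloc((size_t)namelen + 1);
        if (rset->name == NULL) {
            free(rset);
            rset = NULL;
        } else {
            memcpy(rset->name, name, (size_t)namelen);
            rset->name[namelen] = '\0';
        }
    }
    *rsetp = rset;
}

/* Exit of the procedure: delete the set and reset *rsetp to NULL. */
static void sketch_epilog(SketchResourceSet **rsetp)
{
    assert(rsetp != NULL);
    if (*rsetp != NULL) {
        free((*rsetp)->name);
        free(*rsetp);
    }
    *rsetp = NULL;
}
```

A translated procedure would call the prolog first, pass the resulting pointer to its allocation calls, and call the epilog on every exit path.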

3. Allocation/deallocation of Allocatable Coarrays

3.1 XMPCO_malloc_coarray [RS][RA]

extern CoarrayInfo_t *XMPCO_malloc_coarray(char **addr, int count, size_t element, ResourceSet_t *rset);

  • [RS] Share memory of size count * element from the Memory Pool. The memory manager determines the address at the top of the vacant area.
  • [RA] Allocate and register memory of size count * element by using the standard C library and the communication library.
  • Create a new CoarrayInfo and link it to the ResourceSet rset corresponding to the program context (e.g., a function).
  • Set *addr to the local address of the coarray.
  • Return the descriptor of the coarray, that is, the pointer to the created CoarrayInfo.

3.2 XMPCO_free_coarray [RS][RA]

extern void XMPCO_free_coarray(CoarrayInfo_t *cinfo);

  • Unlink the coarray of cinfo from the parent Chunk.
  • If the parent Chunk no longer has any coarrays, invoke the garbage collector.
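The unlink step can be sketched with a per-Chunk counter, reading the condition as "the Chunk has no coarrays left". The types SketchChunk and SketchCoarray and the gc_requests counter are hypothetical stand-ins; the garbage collector itself is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int ncoarrays;    /* coarrays currently linked to this Chunk */
    int gc_requests;  /* how often the collector stand-in was invoked */
} SketchChunk;

typedef struct {
    SketchChunk *parent;  /* Chunk that holds this coarray's data */
} SketchCoarray;

/* Unlink the coarray from its parent Chunk and request collection
 * once the Chunk holds no coarrays any more. */
static void sketch_free_coarray(SketchCoarray *cinfo)
{
    assert(cinfo != NULL && cinfo->parent != NULL);
    SketchChunk *chunk = cinfo->parent;
    cinfo->parent = NULL;
    chunk->ncoarrays--;
    if (chunk->ncoarrays == 0)
        chunk->gc_requests++;  /* stand-in for invoking the garbage collector */
}
```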

TBA: about the garbage collector

3.3 XMPCO_regmem_coarray [CA]

extern CoarrayInfo_t *XMPCO_regmem_coarray(void *var, int count, size_t element, ResourceSet_t *rset);

  • Register the address var and the size count * element of the coarray.
  • Create a new CoarrayInfo and link it to the ResourceSet rset corresponding to the program context (e.g., a function).
  • Return the descriptor of the coarray, that is, the pointer to the created CoarrayInfo.

NOTE

  • This function should be called immediately after the allocation of the variable var with size count * element.

3.4 XMPCO_deregmem_coarray [CA]

extern void XMPCO_deregmem_coarray(CoarrayInfo_t *cinfo);

  • Unlink the coarray of cinfo from the parent Chunk.
  • If the parent Chunk no longer has any coarrays, de-register the chunk of memory from the communication library.

NOTE

  • This function should be called immediately before freeing the variable var.

4. PUT and GET Communications

4.1 XMPCO_PUT_scalarStmt

void XMPCO_PUT_scalarStmt(CoarrayInfo_t *descPtr, char *baseAddr, int element,
                          int coindex, char *rhs, BOOL synchronous);

  • Select the scheme, depending on the arguments and the environment variables.
  • If the DMA scheme is selected, invoke PUT operation from rhs to the remote.
  • Else if the buffering scheme is selected, copy the data from rhs to localBuf and then invoke PUT operation from localBuf to the remote. If the length of data element is longer than the size of localBuf, the copy and the PUT invocation are repeated.
  • For the PUT operation, lower-level library functions _XMP_coarray_contiguous_put() and _XMP_atomic_define_1() are used if synchronous is false and true, respectively.

NOTE

  • Argument synchronous should be set to true in order to implement coarray intrinsic atomic_define.
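The repeated copy-and-PUT of the buffering scheme can be sketched as a loop over pieces no larger than the buffer. Everything here is a model: sketch_localBuf and its 16-byte size are placeholders, and sketch_remote_put is a local memcpy standing in for the lower-level PUT, so no communication takes place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_LOCALBUF_SIZE ((size_t)16)  /* placeholder buffer size */

static char sketch_localBuf[SKETCH_LOCALBUF_SIZE];
static int  sketch_put_calls = 0;

/* Stand-in for the lower-level contiguous PUT; here a plain local copy. */
static void sketch_remote_put(char *remote, const char *src, size_t nbytes)
{
    memcpy(remote, src, nbytes);
    sketch_put_calls++;
}

/* Buffering scheme: stage rhs through the buffer piece by piece. */
static void sketch_put_buffered(char *remote, const char *rhs, size_t element)
{
    assert(remote != NULL && rhs != NULL);
    size_t done = 0;
    while (done < element) {
        size_t n = element - done;
        if (n > SKETCH_LOCALBUF_SIZE)
            n = SKETCH_LOCALBUF_SIZE;
        memcpy(sketch_localBuf, rhs + done, n);               /* copy to buffer */
        sketch_remote_put(remote + done, sketch_localBuf, n); /* PUT from buffer */
        done += n;
    }
}
```

With these placeholder sizes a 40-byte object is sent in three pieces of 16, 16, and 8 bytes.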

4.2 XMPCO_PUT_arrayStmt

void XMPCO_PUT_arrayStmt(CoarrayInfo_t *descPtr, char *baseAddr, int element,
                         int coindex, char *rhs, int rank,
                         int skip[], int skip_rhs[], int count[],
                         BOOL synchronous);

  • Select the scheme, depending on the arguments and the environment variables.
  • If the DMA scheme is selected, invoke PUT operation from rhs to the remote, for each commonly contiguous range of both data.
  • Else if the buffering scheme is selected, copy the data from rhs to localBuf, and then invoke PUT operation from localBuf to the remote for each contiguous range of the remote data. If the contiguous length of the remote data is longer than the size of localBuf, the copy and the PUT operation are repeated.
  • For the PUT operation, lower-level library function _XMP_coarray_contiguous_put() is used if synchronous is false.

NOTE

  • Argument synchronous is not set to true in the current implementation.

4.3 XMPCO_PUT_spread

void XMPCO_PUT_spread(CoarrayInfo_t *descPtr, char *baseAddr, int element,
                      int coindex, char *rhs, int rank,
                      int skip[], int count[], BOOL synchronous);

  • Spread the data rhs into localBuf and then invoke PUT operation from localBuf to the remote, for each contiguous range of the remote data. If the contiguous length of the remote data is longer than the size of localBuf, the spreading and the PUT operation are repeated.
  • For the PUT operation, lower-level library function _XMP_coarray_contiguous_put() is used if synchronous is false.

NOTE

  • Argument synchronous is not set to true in the current implementation.
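The spreading step can be sketched as replicating one scalar into a buffer. The helper sketch_spread and its parameters are hypothetical; it fills at most as many copies as fit, so a caller would repeat it together with the PUT when the remote range is longer than the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replicate the element-byte scalar at rhs into buf, at most count times
 * and never beyond bufsize bytes. Returns the number of copies written. */
static size_t sketch_spread(char *buf, size_t bufsize,
                            const char *rhs, size_t element, size_t count)
{
    assert(buf != NULL && rhs != NULL && element > 0);
    size_t fit = bufsize / element;          /* copies the buffer can hold */
    size_t n = count < fit ? count : fit;
    for (size_t i = 0; i < n; i++)
        memcpy(buf + i * element, rhs, element);
    return n;
}
```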

4.4 XMPCO_GET_scalarExpr

void XMPCO_GET_scalarExpr(CoarrayInfo_t *descPtr, char *baseAddr,
                          int element, int coindex, char *result);

  • Select the scheme, depending on the arguments and the environment variables.
  • If the DMA scheme is selected, invoke GET operation from the remote to result.
  • Else if the buffering scheme is selected, invoke GET operation from the remote to localBuf and then copy from localBuf to result. If the length of data element is longer than the size of localBuf, the GET invocation and the copy are repeated.
  • For the GET operation, lower-level library function _XMP_coarray_contiguous_get() is used.

NOTE

  • Argument result should be an element-byte contiguous data object and is expected to be allocated in the wrapper layer of the runtime.

4.5 XMPCO_GET_arrayExpr

void XMPCO_GET_arrayExpr(CoarrayInfo_t *descPtr, char *baseAddr,
                         int element, int coindex, char *result,
                         int rank, int skip[], int count[]);

  • Select the scheme, depending on the arguments and the environment variables.
  • If the DMA scheme is selected, invoke GET operation from the remote to result, for each contiguous range of the remote data.
  • Else if the buffering scheme is selected, invoke GET operation from the remote to localBuf for each contiguous range of the remote data, and then copy from localBuf to result. If the contiguous length of the remote data is longer than the size of localBuf, the GET operation and the copy are repeated.
  • For the GET operation, lower-level library function _XMP_coarray_contiguous_get() is used.

NOTE

  • Argument result should be a contiguous data object with the same size as the remote data object and is expected to be allocated in the wrapper layer of the runtime.
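How a strided section ends up in a contiguous result can be sketched as a recursive gather. This rests on an assumption the text does not state: skip[i] is taken to be the byte distance between neighbouring elements along axis i, and count[i] the number of elements along that axis, with axis 0 varying fastest. The function sketch_gather is hypothetical and copies local memory in place of a remote GET.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack a rank-dimensional strided section starting at src into the
 * contiguous area at result. ASSUMPTION: skip[i] is a byte stride and
 * count[i] an element count for axis i. Returns the end of the packed data. */
static char *sketch_gather(char *result, const char *src, size_t element,
                           int rank, const int skip[], const int count[])
{
    assert(rank >= 0);
    if (rank == 0) {
        memcpy(result, src, element);  /* stand-in for one contiguous GET */
        return result + element;
    }
    for (int i = 0; i < count[rank - 1]; i++)
        result = sketch_gather(result,
                               src + (size_t)i * (size_t)skip[rank - 1],
                               element, rank - 1, skip, count);
    return result;
}
```

A real implementation would merge neighbouring elements into longer contiguous ranges before communicating; the sketch moves one element at a time to keep the index arithmetic visible.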

4.6 XMPCO_GET_arrayStmt

void XMPCO_GET_arrayStmt(CoarrayInfo_t *descPtr, char *baseAddr,
                         int element, int coindex, char *localAddr,
                         int rank, int skip[], int skip_local[], int count[]);

  • Select the scheme, depending on the arguments and the environment variables.
  • If the DMA scheme is selected, invoke GET operation from the remote to localAddr, for each commonly contiguous range of both data.
  • Else if the buffering scheme is selected, invoke GET operation from the remote to localBuf for each contiguous range of the remote data, and then copy from localBuf to localAddr. If the contiguous length of the remote data is longer than the size of localBuf, the GET operation and the copy are repeated.
  • For the GET operation, lower-level library function _XMP_coarray_contiguous_get() is used.

NOTE

  • This function is used for the code optimization.

5. Synchronizations (under construction)

void XMPCO_sync_all(void);

void XMPCO_sync_all_auto(void);

void XMPCO_sync_all_withComm(MPI_Comm comm);


6. Inquire Functions

CoarrayInfo_t *XMPCO_find_descptr(char *addr, int namelen, char *name);

  • Search the list of descriptors for the coarray that includes the local address addr.
  • Return the descriptor descPtr that corresponds to the coarray.
  • If the descriptor is not found, NULL is returned, which means that the coarray is not currently allocated.

NOTE

  • This function is used to find the descriptor of a dummy-argument coarray.
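The lookup can be sketched as a linear search over address ranges. SketchInfo and sketch_find are hypothetical; the real descriptor list and any use of the name arguments are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in descriptor: one coarray's local address range. */
typedef struct SketchInfo {
    char              *base;    /* local start address of the coarray */
    size_t             nbytes;  /* size of the coarray in bytes */
    struct SketchInfo *next;    /* next descriptor in the list */
} SketchInfo;

/* Return the descriptor whose range contains addr, or NULL if none does. */
static SketchInfo *sketch_find(SketchInfo *head, const char *addr)
{
    assert(addr != NULL);
    for (SketchInfo *p = head; p != NULL; p = p->next) {
        /* Unsigned difference: addresses below base wrap to a huge value. */
        uintptr_t offset = (uintptr_t)addr - (uintptr_t)p->base;
        if (offset < p->nbytes)
            return p;
    }
    return NULL;
}
```

Because a dummy argument may refer to the middle of a coarray, the check is a range test, not an equality test on the start address.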

7. Internal Functions

7.1 Set Functions to CoarrayInfo

void _XMPCO_set_corank(CoarrayInfo_t *cp, int corank);

  • Set the corank of the coarray in CoarrayInfo cp.

[F] void _XMPCO_set_codim_withBounds(CoarrayInfo_t *cp, int dim, int lb, int ub);

  • Set the lower bound lb and the upper bound ub of the dim-th codimension for the coarray variable corresponding to cp.
  • The extent size is evaluated as ub - lb + 1 for each codimension.
  • Codimensions are numbered from 0, i.e., 0 <= dim < corank.
  • The upper bound ub and the resulting extent are ignored for the last codimension (dim = corank - 1).

RESTRICTION

  • This function must be called after calling _XMPCO_set_corank for the same CoarrayInfo cp.

[C] void _XMPCO_set_codim_withSize(CoarrayInfo_t *cp, int dim, int lb, int size);

  • Set the lower bound lb and the extent size of the dim-th codimension for the coarray variable corresponding to cp.
  • The upper bound ub is evaluated as lb + size - 1 for each codimension.
  • Codimensions are numbered from 0, i.e., 0 <= dim < corank.
  • The extent size and the resulting upper bound are ignored for the last codimension (dim = corank - 1).

RESTRICTION

  • This function must be called after calling _XMPCO_set_corank for the same CoarrayInfo cp.
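The two setters describe the same codimension in different terms, which a short sketch makes explicit. SketchCodim and the two helpers are hypothetical and only apply the formulas ub - lb + 1 and lb + size - 1 given above.

```c
#include <assert.h>

/* Stand-in for the per-codimension data kept in a descriptor. */
typedef struct {
    int lb;    /* lower bound */
    int ub;    /* upper bound */
    int size;  /* extent */
} SketchCodim;

/* Bounds form: the extent is derived as ub - lb + 1. */
static SketchCodim sketch_codim_with_bounds(int lb, int ub)
{
    assert(ub >= lb - 1);  /* allow a zero extent, reject negative ones */
    SketchCodim c = { lb, ub, ub - lb + 1 };
    return c;
}

/* Size form: the upper bound is derived as lb + size - 1. */
static SketchCodim sketch_codim_with_size(int lb, int size)
{
    assert(size >= 0);
    SketchCodim c = { lb, lb + size - 1, size };
    return c;
}
```

For example, a codimension with bounds 0 to 3 and one with lower bound 0 and size 4 yield the same three values in this model.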

void _XMPCO_set_varname(CoarrayInfo_t *cp, int namelen, char *name);

  • Set the name of the coarray in CoarrayInfo cp. The first namelen characters of the string starting at name are copied as the name of the coarray. E.g., if namelen is 3 and name is "abcdef...", then the copied name is "abc".

NOTE

  • Since the name of coarray is used only for diagnostic message output and debugging purpose in the runtime, the call of this function is optional.

CoarrayInfo_t* _XMPCO_set_nodes(CoarrayInfo_t *cinfo, _XMP_nodes_t *nodes);

  • Set the lower-level runtime descriptor nodes in CoarrayInfo cinfo. If cinfo is NULL, a new CoarrayInfo is created first.
  • Return the input cinfo or the created CoarrayInfo.