Coarray Object Interface - omni-compiler/omni-compiler GitHub Wiki
Contents
Object Interface Specifications
- Introduction
- Initialization of Static Coarrays
- Allocation/deallocation of Allocatable Coarrays
- PUT and GET Communications
- Synchronizations
- Inquire Functions
- Internal functions
1. Introduction
1.1 Terms
registration
Recording the address and size of a data object with the communication library. Registration is collective and may involve pinning down memory, acquiring a global address, sharing information among all images, etc. After registration, the data can be accessed from the other images.
Chunk
A memory area that contains one or more coarray data objects. To keep 8-byte (or other) memory alignment, a Chunk may contain padding at the tail of each data object. Depending on the memory manager, a Chunk is either shared from (carved out of) the Memory Pool or allocated and registered by the libraries.
Static Chunk
The Chunk that contains small static coarrays and is shared from the Memory Pool.
Memory Pool
The memory area that is allocated and registered before program execution begins.
It contains the communication buffer localBuf, a small system area, and an optional Static Chunk.
1.2 Memory Allocation Methods
The RA and CA methods cannot be implemented over GASNet because GASNet does not support multiple registration or de-registration.
The RS method has not been implemented over FJ-RDMA because it was found to offer no advantage over the RA method on FJ-RDMA.
The Runtime Sharing (RS) method
- The Memory Pool is allocated and registered with the size specified by the user,
- Static coarrays are shared from the Memory Pool, and
- Allocatable coarrays are pulled from and pushed to the Memory Pool.

The Runtime Allocation (RA) method
- The Memory Pool is allocated and registered with a size that includes the total size of the static coarrays, which is calculated before program execution,
- Static coarrays are shared from the Memory Pool, and
- Allocatable coarrays are allocated at runtime and registered with the communication library.

The Compiler Allocation (CA) method
- The Memory Pool is allocated and registered without including the sizes of the coarrays,
- Static coarrays are declared as static variables of the base language and registered with the communication library, and
- Allocatable coarrays are allocated at runtime and registered with the communication library.

2. Initialization of Static Coarrays
Static coarrays are allocated and registered before program execution begins.
[RS] [RA]
- XMPCO_count_size for all static coarrays (2.1)
- XMPCO_malloc_pool (2.2)
- XMPCO_malloc_staticCoarray for all static coarrays (2.3)
[CA]
- XMPCO_malloc_pool (2.2)
- XMPCO_regmem_staticCoarray for all static coarrays (2.4)
For each instance of a subroutine or function in which any allocatable coarray variables appear, a ResourceSet structure is dynamically created (2.5) at its entry point and deleted (2.6) at its exit point.
2.1 XMPCO_count_size [RS][RA]
void XMPCO_count_size(int count, size_t element);
- If the size count * element does not exceed POOL_THRESHOLD, it is added to the size of the Static Chunk. The initial size of the Static Chunk is zero.
RESTRICTION
- All images must make the same calls with the same argument values.
NOTE
- This function should be called for every static coarray in the program.
- The Static Chunk does not contain huge static coarrays (larger than POOL_THRESHOLD); they are allocated separately from the Static Chunk.
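The size accounting described above can be sketched as follows. This is a minimal illustration, not the runtime's code: sketch_count_size and static_chunk_size are hypothetical names, the POOL_THRESHOLD value is made up, and rounding each object up to 8 bytes is an assumption based on the alignment padding mentioned in the definition of a Chunk.

```c
#include <stddef.h>

/* Hypothetical values: the real POOL_THRESHOLD is defined by the runtime,
   and 8-byte padding is assumed from the Chunk alignment described in 1.1. */
#define POOL_THRESHOLD ((size_t)40000)
#define CHUNK_ALIGN    ((size_t)8)

/* Size of the Static Chunk; its initial size is zero. */
static size_t static_chunk_size = 0;

/* Round n up to a multiple of align (the padding at the tail of an object). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Sketch of the accounting done by XMPCO_count_size: only coarrays whose
   size does not exceed POOL_THRESHOLD are added to the Static Chunk. */
void sketch_count_size(int count, size_t element)
{
    size_t size = (size_t)count * element;
    if (size <= POOL_THRESHOLD)
        static_chunk_size += round_up(size, CHUNK_ALIGN);
    /* larger ("huge") static coarrays are allocated separately (see 2.3) */
}
```

Because every image must make the same calls with the same arguments, every image ends up with the same Static Chunk size, which is what allows the chunk to be registered collectively.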
2.2 XMPCO_malloc_pool
void xmp_XMPCO_malloc_pool(void);
- Get the descriptor of the Memory Pool from the lower-level runtime library.
- [RS] The memory pool is already allocated and registered by the low-level library.
- [RA] [CA] The memory pool is allocated and registered via the low-level library.
- The Memory Pool initially contains the following data:
  - localBuf: the static local buffer for one-sided communication. Its size is defined in xmpco_params.h.
  - A small work area for the system. Its size is less than 100 bytes.
  - [RS][RA] Static Chunk: a Chunk shared by the static coarrays. Its size is determined by the repeated calls of XMPCO_count_size.
RESTRICTION
- This function should be called exactly once, before program execution begins.
- [RS] The size of the Memory Pool cannot exceed the size of the heap area.
NOTE
- [RS] The size of the heap area can be set with the environment variable XMP_ONESIDED_HEAP_SIZE. For example:
% XMP_ONESIDED_HEAP_SIZE=70M
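The initial contents of the Memory Pool can be pictured as a simple offset layout. The sketch below is only an illustration under stated assumptions: the order of the areas is assumed, LOCALBUF_SIZE and SYSTEM_AREA_SIZE are hypothetical values (the real localBuf size comes from xmpco_params.h), and sketch_pool_layout and sketch_pool_fits are hypothetical helpers.

```c
#include <stddef.h>

/* Hypothetical sizes: the real localBuf size is defined in xmpco_params.h,
   and the system work area is only documented as "less than 100 bytes". */
#define LOCALBUF_SIZE    ((size_t)400000)
#define SYSTEM_AREA_SIZE ((size_t)96)

/* Byte offsets of the areas inside the Memory Pool (the order is assumed). */
typedef struct {
    size_t localBuf_offset;
    size_t system_offset;
    size_t staticChunk_offset;
    size_t total_size;
} PoolLayout;

/* Lay out localBuf, the system area, and a Static Chunk of the size
   accumulated by the XMPCO_count_size calls ([RS][RA]). */
PoolLayout sketch_pool_layout(size_t static_chunk_size)
{
    PoolLayout p;
    p.localBuf_offset    = 0;
    p.system_offset      = p.localBuf_offset + LOCALBUF_SIZE;
    p.staticChunk_offset = p.system_offset + SYSTEM_AREA_SIZE;
    p.total_size         = p.staticChunk_offset + static_chunk_size;
    return p;
}

/* [RS] restriction: the areas must fit into the heap area whose size is
   given by XMP_ONESIDED_HEAP_SIZE. */
int sketch_pool_fits(const PoolLayout *p, size_t heap_size)
{
    return p->total_size <= heap_size;
}
```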
2.3 XMPCO_malloc_staticCoarray [RS] [RA]
CoarrayInfo_t *XMPCO_malloc_staticCoarray(char **addr, int count, size_t element, int namelen, char *name);
- If the size count * element is not larger than POOL_THRESHOLD:
  - Share the memory of that size from the Static Chunk.
- If the size count * element is larger than POOL_THRESHOLD:
  - [RS] Share the memory of that size from the Memory Pool outside of the Static Chunk.
  - [RA] Allocate the memory of that size by using the libraries and register its address and size.
- Set *addr to the local address of the coarray.
- Return the descriptor of the coarray, that is, a pointer to the CoarrayInfo corresponding to the coarray.
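The address selection in XMPCO_malloc_staticCoarray can be sketched as a front-to-back (bump) allocator over the Static Chunk and the rest of the Memory Pool. This is a hypothetical model: Area, area_take, and sketch_static_addr are illustrative names, the threshold and 8-byte padding are assumptions, and the [RA] branch uses plain malloc while omitting the registration step.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_THRESHOLD ((size_t)40000)   /* hypothetical value */
#define CHUNK_ALIGN    ((size_t)8)       /* assumed alignment padding */

/* A simplified area of registered memory handed out front to back. */
typedef struct {
    char  *base;   /* start of the area */
    size_t size;   /* capacity in bytes */
    size_t used;   /* bytes already shared out */
} Area;

/* Share `size` bytes (padded to CHUNK_ALIGN) from `a`; NULL if it is full. */
static char *area_take(Area *a, size_t size)
{
    size_t padded = (size + CHUNK_ALIGN - 1) / CHUNK_ALIGN * CHUNK_ALIGN;
    if (padded > a->size - a->used)
        return NULL;
    char *p = a->base + a->used;
    a->used += padded;
    return p;
}

/* Sketch of the address selection in XMPCO_malloc_staticCoarray.
   use_rs != 0 selects [RS], otherwise [RA]. */
char *sketch_static_addr(Area *static_chunk, Area *pool_rest, int use_rs,
                         int count, size_t element)
{
    size_t size = (size_t)count * element;
    if (size <= POOL_THRESHOLD)
        return area_take(static_chunk, size);  /* small: from the Static Chunk */
    if (use_rs)
        return area_take(pool_rest, size);     /* [RS]: Memory Pool outside it */
    /* [RA]: separate allocation; registration with the communication
       library would follow here and is omitted in this sketch. */
    return malloc(size);
}
```

If XMPCO_count_size was called for exactly the same coarrays beforehand, the Static Chunk is sized so that the small requests always fit.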
2.4 XMPCO_regmem_staticCoarray [CA]
CoarrayInfo_t *XMPCO_regmem_staticCoarray(void *var, int count, size_t element, int namelen, char *name);
- Register the address var and the size count * element of the coarray.
- Return the descriptor of the coarray, that is, a pointer to the CoarrayInfo corresponding to the coarray.
2.5 XMPCO_prolog
void XMPCO_prolog(ResourceSet_t **rsetp, int namelen, char *name);
- Create a ResourceSet structure for the procedure (subroutine or function) whose name is specified by namelen and name.
- Set *rsetp to the pointer to the ResourceSet.
NOTE
- This function should be called at the entry of each procedure that contains any allocatable coarray variables.
- All allocatable coarray variables belonging to the procedure should be linked to the ResourceSet.
2.6 XMPCO_epilog
void XMPCO_epilog(ResourceSet_t **rsetp);
- Delete the ResourceSet structure *rsetp.
- Set *rsetp to NULL.
NOTE
- This function should be called at the exit of each procedure that contains any allocatable coarray variables.
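The lifetime of a ResourceSet can be sketched with a linked list of descriptors. The structures below are hypothetical, simplified stand-ins for ResourceSet_t and CoarrayInfo_t, and the helper names are illustrative. Releasing the still-linked descriptors inside the epilog is an assumption of this sketch; the specification above only states that the ResourceSet is deleted and *rsetp is set to NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for CoarrayInfo_t and ResourceSet_t. */
typedef struct CoarrayInfo {
    char  *addr;
    size_t size;
    struct CoarrayInfo *next;   /* link inside the owning ResourceSet */
} CoarrayInfo;

typedef struct {
    char         name[64];      /* procedure name, for diagnostics */
    CoarrayInfo *head;          /* allocatable coarrays of the procedure */
} ResourceSet;

/* Sketch of XMPCO_prolog: create a ResourceSet at procedure entry. */
void sketch_prolog(ResourceSet **rsetp, int namelen, const char *name)
{
    ResourceSet *rs = calloc(1, sizeof *rs);
    *rsetp = rs;
    if (rs == NULL)
        return;
    size_t n = (size_t)namelen;
    if (n > sizeof rs->name - 1)
        n = sizeof rs->name - 1;
    memcpy(rs->name, name, n);  /* calloc already provides the terminator */
}

/* Link a newly created CoarrayInfo to the ResourceSet (as in 3.1 and 3.3). */
void sketch_link(ResourceSet *rs, CoarrayInfo *ci)
{
    ci->next = rs->head;
    rs->head = ci;
}

/* Sketch of XMPCO_epilog: free the linked descriptors (assumption), delete
   the ResourceSet, and set *rsetp to NULL. Coarray memory is not freed here. */
void sketch_epilog(ResourceSet **rsetp)
{
    ResourceSet *rs = *rsetp;
    if (rs == NULL)
        return;
    CoarrayInfo *ci = rs->head;
    while (ci != NULL) {
        CoarrayInfo *next = ci->next;
        free(ci);
        ci = next;
    }
    free(rs);
    *rsetp = NULL;
}
```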
3. Allocation/deallocation of Allocatable Coarrays
3.1 XMPCO_malloc_coarray [RS][RA]
extern CoarrayInfo_t *XMPCO_malloc_coarray(char **addr, int count, size_t element, ResourceSet_t *rset);
- [RS] Share memory of size count * element from the Memory Pool. The memory manager places it at the top of the vacant area.
- [RA] Allocate and register memory of size count * element using the standard C library and the communication library.
- Create a new CoarrayInfo and link it to the ResourceSet rset corresponding to the program context (e.g., a function).
- Set *addr to the local address of the coarray.
- Return the descriptor of the coarray, that is, a pointer to the created CoarrayInfo.
3.2 XMPCO_free_coarray [RS][RA]
extern void XMPCO_free_coarray(CoarrayInfo_t *cinfo);
- Unlink the coarray of cinfo from its parent Chunk.
- If the parent Chunk no longer contains any coarrays, invoke the garbage collector.
TBA: about the garbage collector
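The unlink-then-collect rule of XMPCO_free_coarray can be sketched with a per-Chunk counter of live coarrays. Everything here is hypothetical: the structures are reduced to the bookkeeping needed, and because the garbage collector is still TBA, sketch_gc only marks the Chunk as reclaimable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified Chunk: only the bookkeeping needed here. */
typedef struct {
    int  ncoarrays;       /* coarrays still living in this Chunk */
    bool reclaimable;     /* set when the garbage collector may take it back */
} Chunk;

typedef struct {
    Chunk *parent;        /* Chunk that holds this coarray */
} CoarrayInfo;

/* Placeholder for the garbage collector, whose behavior is still TBA. */
static void sketch_gc(Chunk *c)
{
    c->reclaimable = true;
}

/* Allocation side: a new coarray is linked to its parent Chunk. */
void sketch_attach(CoarrayInfo *ci, Chunk *c)
{
    ci->parent = c;
    c->ncoarrays++;
}

/* Sketch of XMPCO_free_coarray: unlink from the parent Chunk and invoke the
   garbage collector once the Chunk no longer contains any coarrays. */
void sketch_free_coarray(CoarrayInfo *ci)
{
    Chunk *c = ci->parent;
    ci->parent = NULL;
    if (--c->ncoarrays == 0)
        sketch_gc(c);
}
```

The same counting pattern fits XMPCO_deregmem_coarray [CA], with de-registration from the communication library taking the place of the garbage collector call.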
3.3 XMPCO_regmem_coarray [CA]
extern CoarrayInfo_t *XMPCO_regmem_coarray(void *var, int count, size_t element, ResourceSet_t *rset);
- Register the address var and the size count * element of the coarray.
- Create a new CoarrayInfo and link it to the ResourceSet rset corresponding to the program context (e.g., a function).
- Return the descriptor of the coarray, that is, a pointer to the created CoarrayInfo.
NOTE
- This function should be called immediately after allocating the variable var with size count * element.
3.4 XMPCO_deregmem_coarray [CA]
extern void XMPCO_deregmem_coarray(CoarrayInfo_t *cinfo);
- Unlink the coarray of cinfo from its parent Chunk.
- If the parent Chunk no longer contains any coarrays, de-register the Chunk from the communication library.
NOTE
- This function should be called immediately before freeing the variable var.
4. PUT and GET Communications
4.1 XMPCO_PUT_scalarStmt
void XMPCO_PUT_scalarStmt(CoarrayInfo_t *descPtr, char *baseAddr, int element,
int coindex, char *rhs, BOOL synchronous);
- Select the scheme, depending on the arguments and the environment variables.
- If the DMA scheme is selected, invoke a PUT operation from rhs to the remote image.
- Otherwise, if the buffering scheme is selected, copy the data from rhs to localBuf and then invoke a PUT operation from localBuf to the remote image. If the data length element is larger than the size of localBuf, the copy and the PUT operation are repeated.
- For the PUT operation, the lower-level library functions _XMP_coarray_contiguous_put() and _XMP_atomic_define_1() are used when synchronous is false and true, respectively.
NOTE
- The argument synchronous should be set to true in order to implement the coarray intrinsic atomic_define.
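The buffering scheme can be sketched as a loop that moves the data through localBuf piece by piece. This is a local simulation under stated assumptions: the 16-byte LOCALBUF_SIZE is deliberately tiny so the repetition is visible, the "remote" side is an ordinary array, and mock_contiguous_put is a hypothetical stand-in for _XMP_coarray_contiguous_put(), not its real signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical small localBuf; the real size is defined in xmpco_params.h. */
#define LOCALBUF_SIZE ((size_t)16)
static char localBuf[LOCALBUF_SIZE];

/* Stand-in for the lower-level contiguous PUT: the "remote" memory is just
   a local array here, so the transfer is a memcpy. */
static void mock_contiguous_put(char *remote, const char *src, size_t n)
{
    memcpy(remote, src, n);
}

/* Sketch of the buffering scheme: copy rhs to localBuf one piece at a time
   and PUT each piece, repeating while data remains. Returns the PUT count. */
int sketch_put_buffered(char *remote, const char *rhs, size_t element)
{
    int nputs = 0;
    for (size_t done = 0; done < element; ) {
        size_t n = element - done;
        if (n > LOCALBUF_SIZE)
            n = LOCALBUF_SIZE;
        memcpy(localBuf, rhs + done, n);                  /* rhs -> localBuf */
        mock_contiguous_put(remote + done, localBuf, n);  /* localBuf -> remote */
        done += n;
        nputs++;
    }
    return nputs;
}
```

With a 40-byte element and a 16-byte buffer, the loop issues three PUTs (16 + 16 + 8 bytes). The DMA scheme skips the intermediate copy and transfers directly from rhs.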
4.2 XMPCO_PUT_arrayStmt
void XMPCO_PUT_arrayStmt(CoarrayInfo_t *descPtr, char *baseAddr, int element,
int coindex, char *rhs, int rank,
int skip[], int skip_rhs[], int count[],
BOOL synchronous);
- Select the scheme, depending on the arguments and the environment variables.
- If the DMA scheme is selected, invoke a PUT operation from rhs to the remote image for each range that is contiguous in both data objects.
- Otherwise, if the buffering scheme is selected, copy the data from rhs to localBuf and then invoke a PUT operation from localBuf to the remote image for each contiguous range of the remote data. If a contiguous range of the remote data is larger than localBuf, the copy and the PUT operation are repeated.
- For the PUT operation, the lower-level library function _XMP_coarray_contiguous_put() is used when synchronous is false.
NOTE
- The argument synchronous is not set to true in the current implementation.
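A "commonly contiguous range of both data" can be computed by growing a block from the innermost dimension while both sides keep their strides equal to the current block size. The conventions here are assumptions, not confirmed by this page: skip[] and skip_other[] are taken as byte strides per dimension, count[] as element counts, and dimension 0 as the fastest varying. sketch_common_contiguous is a hypothetical helper.

```c
#include <stddef.h>

/* Length in bytes of the leading block that is contiguous in both data
   objects. Assumed conventions: skip[i] and skip_other[i] are byte strides
   of dimension i, count[i] is the number of elements, and dimension 0
   varies fastest. */
size_t sketch_common_contiguous(int rank, size_t element,
                                const int skip[], const int skip_other[],
                                const int count[])
{
    size_t block = element;
    for (int i = 0; i < rank; i++) {
        if ((size_t)skip[i] != block || (size_t)skip_other[i] != block)
            break;                   /* dimension i breaks contiguity */
        block *= (size_t)count[i];   /* dimension i extends the block */
    }
    return block;
}
```

The total data size divided by this block length gives the number of PUT operations in the DMA scheme; the same computation applies to the GET direction in 4.6.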
4.3 XMPCO_PUT_spread
void XMPCO_PUT_spread(CoarrayInfo_t *descPtr, char *baseAddr, int element,
int coindex, char *rhs, int rank,
int skip[], int count[], BOOL synchronous);
- Spread copies of the data rhs into localBuf and then invoke a PUT operation from localBuf to the remote image for each contiguous range of the remote data. If a contiguous range of the remote data is larger than localBuf, the spreading and the PUT operation are repeated.
- For the PUT operation, the lower-level library function _XMP_coarray_contiguous_put() is used when synchronous is false.
NOTE
- The argument synchronous is not set to true in the current implementation.
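The spread-and-PUT loop for one contiguous remote range can be sketched as follows. It assumes that rhs is a single element-byte value replicated across the range. The names sketch_spread and sketch_put_spread are hypothetical, and memcpy stands in for the contiguous PUT because the "remote" memory is local in this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Replicate the element-byte scalar into buf (capacity cap bytes), at most
   n times. Returns the number of copies written. */
static size_t sketch_spread(char *buf, size_t cap, const char *scalar,
                            size_t element, size_t n)
{
    size_t fit = cap / element;
    if (fit > n)
        fit = n;
    for (size_t k = 0; k < fit; k++)
        memcpy(buf + k * element, scalar, element);
    return fit;
}

/* Sketch: write n copies of the scalar into one contiguous remote range,
   repeating spread + PUT through localBuf. memcpy stands in for the
   contiguous PUT. Returns the number of PUTs, or -1 if localBuf is too small. */
int sketch_put_spread(char *remote, const char *scalar, size_t element,
                      size_t n, char *localBuf, size_t cap)
{
    if (element == 0 || cap < element)
        return -1;
    int nputs = 0;
    for (size_t done = 0; done < n; nputs++) {
        size_t k = sketch_spread(localBuf, cap, scalar, element, n - done);
        memcpy(remote + done * element, localBuf, k * element);  /* PUT */
        done += k;
    }
    return nputs;
}
```

Spreading into localBuf once per piece lets a single contiguous PUT write many copies, instead of issuing one PUT per element.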
4.4 XMPCO_GET_scalarExpr
void XMPCO_GET_scalarExpr(CoarrayInfo_t *descPtr, char *baseAddr,
int element, int coindex, char *result);
- Select the scheme, depending on the arguments and the environment variables.
- If the DMA scheme is selected, invoke a GET operation from the remote image to result.
- Otherwise, if the buffering scheme is selected, invoke a GET operation from the remote image to localBuf and then copy from localBuf to result. If the data length element is larger than the size of localBuf, the GET operation and the copy are repeated.
- For the GET operation, the lower-level library function _XMP_coarray_contiguous_get() is used.
NOTE
- The argument result should be an element-byte contiguous data object and is expected to be allocated in the wrapper layer of the runtime.
4.5 XMPCO_GET_arrayExpr
void XMPCO_GET_arrayExpr(CoarrayInfo_t *descPtr, char *baseAddr,
int element, int coindex, char *result,
int rank, int skip[], int count[]);
- Select the scheme, depending on the arguments and the environment variables.
- If the DMA scheme is selected, invoke a GET operation from the remote image to result for each contiguous range of the remote data.
- Otherwise, if the buffering scheme is selected, invoke a GET operation from the remote image to localBuf for each contiguous range of the remote data, and then copy from localBuf to result. If a contiguous range of the remote data is larger than localBuf, the GET operation and the copy are repeated.
- For the GET operation, the lower-level library function _XMP_coarray_contiguous_get() is used.
NOTE
- The argument result should be a contiguous data object of the same size as the remote data object and is expected to be allocated in the wrapper layer of the runtime.
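Following the DMA scheme above, a strided remote section can be gathered into the contiguous result one contiguous range at a time, using an odometer over the non-contiguous dimensions. This sketch rests on assumptions: skip[i] is taken as a positive byte stride, count[i] as a positive element count, dimension 0 as the fastest varying, and the "remote" coarray is a local pointer with memcpy standing in for _XMP_coarray_contiguous_get(). sketch_get_array and MAX_RANK are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RANK 15   /* assumed upper bound on rank for this sketch */

/* Sketch: gather a strided section of the remote coarray into the contiguous
   buffer `result`, one contiguous range of the remote data at a time.
   Assumptions: skip[i] is the positive byte stride of dimension i, count[i]
   is the positive number of elements, dimension 0 varies fastest. memcpy
   stands in for the contiguous GET. Returns the number of GETs. */
size_t sketch_get_array(const char *remote, size_t element, int rank,
                        const int skip[], const int count[], char *result)
{
    /* Leading dimensions contiguous on the remote side form one range. */
    size_t block = element;
    int d = 0;
    while (d < rank && (size_t)skip[d] == block) {
        block *= (size_t)count[d];
        d++;
    }

    int idx[MAX_RANK] = {0};   /* odometer over dimensions d .. rank-1 */
    size_t nget = 0;
    for (;;) {
        size_t off = 0;
        for (int i = d; i < rank; i++)
            off += (size_t)idx[i] * (size_t)skip[i];
        memcpy(result + nget * block, remote + off, block);  /* one GET */
        nget++;

        int i = d;             /* advance the odometer */
        while (i < rank && ++idx[i] == count[i]) {
            idx[i] = 0;
            i++;
        }
        if (i == rank)
            break;             /* every range has been fetched */
    }
    return nget;
}
```

For a 4 x 3 section of a 6 x 3 integer array, the first dimension forms a 4-element contiguous range, so three GETs fill the 12-element result.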
4.6 XMPCO_GET_arrayStmt
void XMPCO_GET_arrayStmt(CoarrayInfo_t *descPtr, char *baseAddr,
int element, int coindex, char *localAddr,
int rank, int skip[], int skip_local[], int count[]);
- Select the scheme, depending on the arguments and the environment variables.
- If the DMA scheme is selected, invoke a GET operation from the remote image to localAddr for each range that is contiguous in both data objects.
- Otherwise, if the buffering scheme is selected, invoke a GET operation from the remote image to localBuf for each contiguous range of the remote data, and then copy from localBuf to localAddr. If a contiguous range of the remote data is larger than localBuf, the GET operation and the copy are repeated.
- For the GET operation, the lower-level library function _XMP_coarray_contiguous_get() is used.
NOTE
- This function is used for code optimization.
5. Synchronizations (under construction)
void XMPCO_sync_all(void);
void XMPCO_sync_all_auto(void);
void XMPCO_sync_all_withComm(MPI_Comm comm);
6. Inquire Functions
CoarrayInfo_t *XMPCO_find_descptr(char *addr, int namelen, char *name);
- Search the list of descriptors for the coarray whose local memory includes the address addr.
- Return the descriptor descPtr corresponding to that coarray.
- If no descriptor is found, descPtr is NULL, which means that the coarray is not currently allocated.
NOTE
- This subroutine/function is used to find the descriptor of the dummy coarray.
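The search in XMPCO_find_descptr can be sketched as a linear scan for the descriptor whose local range [addr, addr + size) includes the given address. Desc and sketch_find_descptr are hypothetical simplifications of the runtime's descriptor list. Addresses are compared as integers because relational comparison of pointers into different objects is undefined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor list entry. */
typedef struct Desc {
    char        *addr;   /* local address of the coarray */
    size_t       size;   /* size in bytes */
    struct Desc *next;
} Desc;

/* Sketch of XMPCO_find_descptr: return the descriptor whose local range
   [addr, addr + size) includes `addr`, or NULL if there is none. */
Desc *sketch_find_descptr(Desc *list, const char *addr)
{
    uintptr_t a = (uintptr_t)addr;
    for (Desc *d = list; d != NULL; d = d->next) {
        uintptr_t lo = (uintptr_t)d->addr;
        if (a >= lo && a - lo < d->size)
            return d;
    }
    return NULL;
}
```

A range check rather than an exact-address match lets the lookup succeed for any address that falls inside a registered coarray, not only its first byte.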
7. Internal Functions
7.1 Set Functions to CoarrayInfo
void _XMPCO_set_corank(CoarrayInfo_t *cp, int corank);
- Set the corank of the coarray in CoarrayInfo cp to corank.
[F] void _XMPCO_set_codim_withBounds(CoarrayInfo_t *cp, int dim, int lb, int ub);
- Set the lower bound lb and the upper bound ub of the dim-th codimension for the coarray variable corresponding to cp.
- The extent size is evaluated as ub - lb + 1 for each codimension.
- Codimensions are numbered from 0, i.e., 0 <= dim < corank.
- The values of ub and size are ignored for the last codimension (dim = corank - 1).
RESTRICTION
- This function must be called after calling _XMPCO_set_corank for the same CoarrayInfo cp.
[C] void _XMPCO_set_codim_withSize(CoarrayInfo_t *cp, int dim, int lb, int size);
- Set the lower bound lb and the extent size of the dim-th codimension for the coarray variable corresponding to cp.
- The upper bound ub is evaluated as lb + size - 1 for each codimension.
- Codimensions are numbered from 0, i.e., 0 <= dim < corank.
- The values of ub and size are ignored for the last codimension (dim = corank - 1).
RESTRICTION
- This function must be called after calling _XMPCO_set_corank for the same CoarrayInfo cp.
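The two codimension setters differ only in which quantity is given and which is derived. The sketch below models them with a hypothetical Codims structure and helper names, and treats [F] and [C] as presumably denoting the Fortran and C front ends. It also shows why the corank must be set first: without it, the setter cannot recognize the last codimension, whose ub and size are ignored.

```c
#define MAX_CORANK 15   /* assumed upper bound for this sketch */

/* Hypothetical, simplified codimension part of a CoarrayInfo. */
typedef struct {
    int corank;
    int lb[MAX_CORANK], ub[MAX_CORANK], size[MAX_CORANK];
} Codims;

/* [F]-style setter: bounds are given, the extent is derived. */
void sketch_codim_withBounds(Codims *cp, int dim, int lb, int ub)
{
    cp->lb[dim] = lb;
    if (dim < cp->corank - 1) {   /* ub and size ignored for the last one */
        cp->ub[dim] = ub;
        cp->size[dim] = ub - lb + 1;
    }
}

/* [C]-style setter: the extent is given, the upper bound is derived. */
void sketch_codim_withSize(Codims *cp, int dim, int lb, int size)
{
    cp->lb[dim] = lb;
    if (dim < cp->corank - 1) {
        cp->size[dim] = size;
        cp->ub[dim] = lb + size - 1;
    }
}
```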
void _XMPCO_set_varname(CoarrayInfo_t *cp, int namelen, char *name);
- Set the name of the coarray in CoarrayInfo cp. The character string of length namelen starting at name is copied as the name of the coarray. E.g., if namelen is 3 and name is "abcdef...", then the string "abc" is copied as the name.
NOTE
- Since the name of a coarray is used only for diagnostic messages and debugging in the runtime, calling this function is optional.
CoarrayInfo_t* _XMPCO_set_nodes(CoarrayInfo_t *cinfo, _XMP_nodes_t *nodes);
- Set the lower-level runtime descriptor nodes to CoarrayInfo cinfo. If cinfo is NULL, a new CoarrayInfo is created first.
- Return the input cinfo or the created CoarrayInfo.