Multi Dimensional Dynamically Allocated Arrays - mrnorman/YAKL GitHub Wiki
// All YAKL Array classes have the following template parameters
// Default memory space depends on whether YAKL_ARCH is defined or not
// Default style is always C-style
template <class T, int rank, int myMem=memDefault, int myStyle=stylec>
// C-style Arrays
Array();
Array(const char *label, int d0 [, int d1,...]); // Owned Array object (reference counted)
Array(const char *label, T *data, int d0 [, int d1,...]); // Non-owned Array object (not ref counted)
// Fortran-style Arrays
Array(const char *label, int d0 [, int d1,...]); // Owned Array object (reference counted)
Array(const char *label, std::vector d0 [, std::vector d1,...]); // Owned Array object (reference counted)
Array(const char *label, T *data, std::vector d0 [, std::vector d1,...]); // Owned Array object (reference counted)
// Fortran-style Arrays with non-1 lower bounds typically use initializer lists
// real(8) :: arr(-1:nx+2)
Array<double,1,memDevice,styleFortran>("arr",{-1,nx+2});YAKL's multi-dimensional Array class provides a convenient way to manage multi-dimensional data (up to 8 dimensions). memDevice Array objects support any C++ type that is std::is_arithmetic<T>. memHost Array objects support any type, since a constructor can be called on the host. Array objects come in C-style or Fortran-style. Array objects can be "owned" (meaning allocation and free is managed by the Array object, and references are counted); or they can be "non-owned", meaning no allocation, free, or reference counting is handled by the Array object. Reference counting means keeping track of how many Array objects are aliasing the same data pointer, to avoid deallocating the data while it's still being used.
Array objects are indexed via the C++ operator() overload:
Array<float,3,memHost,styleC> a("a",4,3,2);
for (int k=0; k < 4; k++) {
for (int j=0; j < 3; j++) {
for (int i=0; i < 2; i++) {
a(k,j,i) = -999.;
}
}
}C-style Array objects have lower bounds of zero and row-major index ordering (meaning the right-most index varies the fastest).
Fortran-style Array objects have lower bounds that default to 1 but can change and column-major index ordering (meaning the left-most index varies the fastest).
All YAKL Array objects have data that is contiguous in memory with no striding.
YAKL Array objects use what is called "shallow copy" by default. What this means is that the copy and move constructors / assignment operators only copy the metadata of the Array and the pointer address, not the data itself. To copy the data itself is called a "deep copy". Therefore, you can performantly pass Array objects by value because only a small amount of data is actually copied.
Array<float,2,memDevice,styleC> a, b, c, d;
// This allocates an owned Array object "a"
a = Array<float,2,memDevice,styleC>("a",100);
yakl::memset(a,0.f);
// This causes b to share the data pointer of A, and it's *very cheap* to perform
b = a; // This is cheap
// The reference counter shared by a and b is now "2"
// Also note that b's label is "a" because a's metadata was copied over
// Because b shared a's data pointer, the following changes a's data as well
memset(b,1.f);
std::cout << a; // this will print all "1"s
// This creates a non-owned Array object that uses the data pointer from a and b
c = Array<float,2,memDevice,styleC>("c",b.data(),100); // This is cheap
// Note that c is *not* reference counted, and no metadata is copied into c.
// Therefore, if you said c is 101 elements, you'll get a memory error
// Also, if you deallocate a's and b's shared data pointer while still using c,
// you'll get a memory error.
// non-owned constructors should be used with significant caution.
// Frankly, they should only be used when converting another language's / framework's
// contiguous data into a YAKL Array object (e.g., from Fortran or Kokkos).
memset(c,2.f);
std::cout << a; // This will be all 2's
std::cout << b; // This will be all 2's
memset(b,3.f);
std::cout << c; // This will be all 3's
// d is its own separate Array object of size 100
// It does not share the data pointer of a, b, and c
d = Array<float,2,memDevice,styleC>("d",100);
// Copy the data from the pointer shared by a, b, and c to d's pointer
c.deep_copy_to(d); // This is expensive
// d has a separate data pointer from a, b, and c.
// Therefore, changes to a, b, and c do not affect d and vice versa
memset(d,4.f);
std::cout << d; // This will be all 4's
std::cout << a; // This will be all 3's
// This creates e as an Array<float,1,memDevice,styleC> object of size 100,
// and it deep copies d's data to e's data pointer
auto e = d.createDeviceCopy(); // This is expensive
memset(e,5.f);
std::cout << e; // This will be all 5's
std::cout << d; // This will be all 4's
std::cout << b; // This will be all 3'sFortran users, please note this: In Fortran, a = b implies a coy of data from b to a. In portable C++ (YAKL), however, a = b only copies the metadata (and the data pointer, but not the data itself).
If you created your Array object inside a function, including main(), you do not need to explicitly deallocate. Just like in Fortran, the moment the Array object falls out of scope, it is automatically deallocated. This is by far the preferred method.
However, there are mainly two cases when explicit deallocation is desirable: (1) the Array object has a static lifetime (e.g., it lives in global or namespace scope); and (2) the Array object is large, and you want to deallocate it before allocating something else to avoid excessive memory usage.
There are three ways to deallocate an Array object:
- Explicitly call the
arr.deallocate()class method - Replace the
Arrayobject with anotherArrayobject (forcing the previous object to fall out of scope). Replacing with an emptyArrayobject is also valid. - Let it fall out of scope naturally
void foo() {
Array<real,1,memDevice,styleC> arr ("arr", 1024*1024*32);
Array<real,1,memDevice,styleC> arr2("arr2",1024*1024*32);
Array<real,1,memDevice,styleC> arr3("arr3",1024*1024*32);
Array<real,1,memDevice,styleC> arr4("arr4",1024*1024*32);
arr.deallocate(); // arr is deallocated
arr2 = Array<real,1,memDevice,styleC>(); // arr2 is deallocated
arr3 = arr4 // arr3 is deallocated and now uses arr4's data pointer and metadata
}
// After this function completes, arr4 is deallocated by falling out of scopeYou can reshape an Array object by either calling the reshape() member function or the collapse() member function.
Array<real,3,memDevice,styleC> a("a",10,9,8);
// Create an Array with two dimensions shaped as (10,72)
auto b = a.reshape<2>({10,72}); // This is cheap
// b now shares a's data pointer in a shallow copy, but its dimensions are different
// You'll have a runtime error if the product of dimensions doesn't match
// create an Array with one dimension, collapsing all dimensions into one
auto c = a.collapse(); // This is cheap
// c now shares the data pointer with a and b, but its dimension is 720The deep_copy_to(), createHostCopy(), and createDeviceCopy() routines manage data between host and device memory spaces.
-
a.deep_copy_to(b);will copy data fromatob, regardless of which memory spacesaandblive. -
auto b = a.createHostCopy();will allocate a newmemHostArrayobject,b, with the metadata shallow copied fromaand the data deep copied fromaregardless of wherealives. -
auto b = a.createDeviceCopy();will allocate a newmemDeviceArrayobject,b, with the metadata shallow copied fromaand the data deep copied fromaregardless of wherealives.
-
YAKL_INLINE T *data(): Get the raw data pointer used by thisArrayobject -
YAKL_INLINE int get_rank(): Get the number of dimensions -
YAKL_INLINE index_t totElems(): Get the total number of array elements.index_tis anunsigned intby default. -
YAKL_INLINE span_is_contiguous(): Always true -
YAKL_INLINE initialized(): Determine if thisArrayobject has been allocated -
const char *label(): Get the label for theArrayobject. Returns empty string ifYAKL_DEBUGis not defined as a macro -
YAKL_INLILNE SArray<index_t,1,rank> get_dimensions(): Get the dimension sizes of thisArrayobject in the form of a staticArrayobject. -
YAKL_INLILNE SArray<index_t,1,rank> get_lbounds(): Get the lower bounds of thisArrayobject in the form of a staticArrayobject. For C-style arrays, this is always zeroes -
YAKL_INLILNE SArray<index_t,1,rank> get_ubounds(): Get the dimension sizes of thisArrayobject in the form of a staticArrayobject. For C-style arrays, this is always the size of the dimension minus 1.
#include <iostream>
Array<float,1,memDevice,styleFortran> arr("arr",10);
memset(arr,10.f);
// This works for host or device arrays. Device arrays are copied to the host before printing
std::cout << arr;YAKL supports only basic slicing at present, meaning that you must slice out entire dimensions rather than partial dimensions. Also, the slices must be contiguous. The syntax is as follows:
using yakl::COLON;
Array<float,3,memDevice,styleFortran> a("a",nx,ny,nz);
// b = a(:,:,k) -- "Grab a horizontal slice at this vertical level"
auto b = a.slice<2>(COLON,COLON,k);
// The number inside the <> is the number of dimensions in the resulting slice
// Also note that the COLON's must be at the left since it's column-major ordering
Array<float,3,memDevice,styleC> c("c",nz,ny,nx);
auto d = c.slice<2>(k,COLON,COLON);
// C-style is row-major ordering, so COLON's must be at the rightImportant: slices are not reference counted or owned. The result of a slice<N>() call is an Array object that uses the data pointer from the Array object it is sliced from. Therefore, if the Array object you slice from is deallocated while you're still using the slice, you'll have a memory error.
If you want to write to the array slice passed to a function, you must save it as a temporary variable first and pass the temporary variable: E.g., auto tmp = arr.slice<2>({COLON,COLON,ind1,ind2}); myfunc(tmp);. This is because C++ treats objects returned by value as const. If you're only reading from the array slice, you can pass it directly inline because it obeys const
To begin, you never allocate an Array with a const underlying data type. The reason is that if the underlying data type is const, then you can't write to it after creating the Array object and allocating the const data pointer. Instead, you create an Array object with non-const type, write to it, and then convert it to a const-type array that shares the same data pointer and reference counter pointer (incrementing the reference counter).
Confused? See the example below:
Array<real,1,memDevice,styleC> arrTemp("arr",10);
parallel_for( 10 , YAKL_LAMBDA (int i) { arrTemp(i) = i; })
// Create a const-type array using the initialized non-const-type array
Array<real const,1,memDevice,styleC> arr(arrTemp); // East const for the win!Now, when you use arr, any attempts to write to it will be met with a nice comprehensible error during compile time, and its data is protected.
Note that you can create a non-const array inside a function, then convert it to a const array object, and pass the resulting const array object back by value. This is because the const array object shared the reference pointer with the non-const one you created, and therefore, the data pointer lives out of the function scope.
The most common situation by far in which memory leaks associated with Array objects will happen is when you declare Array objects with a static lifetime (e.g., you place them in global or namespace scope, or you apply the static keyword to a class data member), meaning they never fall out of scope until the program exits. It is easy to allocate something and forget to deallocate it explicitly.
If you pass -DYAKL_DEBUG to the YAKL compiler flags, array index checking is turned on. It will even detect if you create an Array object on the host and access its data on the device and vice versa (and thrown an error if the YAKL hardware backend has disparate memory spaces between host and device). With -DYAKL_DEBUG, YAKL will also check to see if you try to access data in an Array object that hasn't been allocated yet.
Reference counting, allocation, and deallocation are all inside std::mutex critical regions, and therefore, Array objects can be safely used inside CPU threaded regions that use pthreads, std::thread, or OpenMP CPU threads.