Can I use runtime-sized stack arrays inside kernels? - mrnorman/YAKL GitHub Wiki

In general, unfortunately, you cannot use runtime-sized (variable-length) arrays on the stack inside kernels. This may work when `parallel_for` runs on the host, because host compilers such as GCC and Clang accept variable-length arrays as a language extension, even though they are not part of standard C++. However, most device backends (certainly CUDA and HIP) must know the size of each thread's stack frame at compile time. Therefore, avoid sizing stack arrays (including `SArray` and `FSArray` objects) with a variable whose value is only known at runtime. Use an integer literal, a preprocessor macro, or a `constexpr` variable to set the size of any array placed on the stack.
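A common workaround, shown here as a minimal plain C++ sketch rather than YAKL code, is to dimension the stack array with a compile-time upper bound and use only the first `n` entries at runtime. The names `max_levels` and `column_sum` are hypothetical; in a real YAKL code the body would sit inside a `parallel_for` lambda, and the same `constexpr` bound could be used to dimension an `SArray` or `FSArray`.

```cpp
#include <cassert>

// Hypothetical compile-time upper bound. Because it is constexpr, the compiler
// knows its value when laying out the stack frame, so it is a valid array size
// in device code as well as host code.
constexpr int max_levels = 64;

// Hypothetical per-thread "kernel body": doubles each of the first n values of
// a column into a stack buffer, then sums the buffer.
double column_sum(const double* column, int n) {
  // double tmp[n];  // Variable-length array: not standard C++, and rejected
  //                 // in CUDA/HIP device code because n is only known at runtime.
  double tmp[max_levels];   // Size fixed at compile time.
  assert(n <= max_levels);  // The runtime size must fit in the static buffer.
  double sum = 0.0;
  for (int k = 0; k < n; ++k) {
    tmp[k] = 2.0 * column[k];  // Some per-level intermediate result.
    sum += tmp[k];
  }
  return sum;
}
```

The cost of this approach is that every thread reserves `max_levels` entries of stack space regardless of `n`, so the bound should be chosen as tightly as the problem allows.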