Composable Kernel Overview - ROCm/composable_kernel GitHub Wiki
Composable Kernel Features
Hardware-Agnostic “Tensor Coordinate Transformation” Primitives
- Enable composition of complex operators from basic ones, without the overhead of tensor re-formatting:
  - GEMM --> Implicit GEMM, hybrid direct/implicit GEMM
  - Reduction --> Pooling, Batch-norm, etc.
  - Data-Transfer --> Im2Col, Depth2Space, etc.
- The address-calculation logic for each coordinate transformation is generated and optimized automatically, without developer intervention.
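To make the GEMM --> Implicit GEMM idea concrete, here is a minimal host-side sketch in plain C++. It does not use Composable Kernel's API: `ConvShape`, `im2col_offset`, `conv_implicit_gemm`, and `conv_direct` are hypothetical names, and the sketch assumes a single image in CHW layout, KCYX weights, stride 1, and no padding. The point is that the Im2Col matrix is never materialized; a coordinate transformation maps each GEMM index `(k, n)` straight to an offset in the original input tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical problem description (illustrative only, not CK's types).
struct ConvShape {
    int C, H, W;  // input channels, height, width
    int K, Y, X;  // output channels, filter height, filter width
    int Ho() const { return H - Y + 1; }  // assumes stride 1, no padding
    int Wo() const { return W - X + 1; }
};

// Coordinate transformation: index (k, n) of the virtual Im2Col matrix
// -> linear offset into the untouched CHW input tensor.
inline std::size_t im2col_offset(const ConvShape& s, int k, int n) {
    const int c  = k / (s.Y * s.X);   // split k into (c, y, x)
    const int y  = (k / s.X) % s.Y;
    const int x  = k % s.X;
    const int ho = n / s.Wo();        // split n into (ho, wo)
    const int wo = n % s.Wo();
    return (static_cast<std::size_t>(c) * s.H + (ho + y)) * s.W + (wo + x);
}

// Implicit GEMM: out[M x N] = wei[M x Kg] * im2col(in)[Kg x N],
// with M = K, N = Ho * Wo, Kg = C * Y * X.
std::vector<float> conv_implicit_gemm(const ConvShape& s,
                                      const std::vector<float>& in,
                                      const std::vector<float>& wei) {
    const int M = s.K, N = s.Ho() * s.Wo(), Kg = s.C * s.Y * s.X;
    assert(in.size() == static_cast<std::size_t>(s.C) * s.H * s.W);
    assert(wei.size() == static_cast<std::size_t>(M) * Kg);
    std::vector<float> out(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < Kg; ++k) {
                acc += wei[m * Kg + k] * in[im2col_offset(s, k, n)];
            }
            out[m * N + n] = acc;
        }
    }
    return out;
}

// Direct convolution, used only as a reference to check the mapping.
std::vector<float> conv_direct(const ConvShape& s,
                               const std::vector<float>& in,
                               const std::vector<float>& wei) {
    std::vector<float> out(static_cast<std::size_t>(s.K) * s.Ho() * s.Wo(), 0.0f);
    for (int kf = 0; kf < s.K; ++kf)
        for (int ho = 0; ho < s.Ho(); ++ho)
            for (int wo = 0; wo < s.Wo(); ++wo) {
                float acc = 0.0f;
                for (int c = 0; c < s.C; ++c)
                    for (int y = 0; y < s.Y; ++y)
                        for (int x = 0; x < s.X; ++x)
                            acc += wei[((kf * s.C + c) * s.Y + y) * s.X + x] *
                                   in[(c * s.H + ho + y) * s.W + wo + x];
                out[(kf * s.Ho() + ho) * s.Wo() + wo] = acc;
            }
    return out;
}
```

Calling `conv_implicit_gemm` and `conv_direct` on the same inputs should give the same output tensor. In this sketch the index arithmetic in `im2col_offset` is written by hand; the feature described above is that the library generates and optimizes that arithmetic.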
Reusable Tensor Operators for AMD GPUs
- Grid-, block-, wave-, and thread-level tensor operators, implemented as C++ templated device functions/classes:
  - GEMM-like operators
  - Reduction-like operators
  - Data-transfer-like operators
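The layering of operators by level can be sketched as nested C++ templates. This is a host-side illustration under stated assumptions, not CK's device code: `GemmProblem`, `ThreadwiseGemm`, `BlockwiseGemm`, and `GridwiseGemm` are hypothetical names, serial loops stand in for parallel threads and workgroups, the wave level is omitted for brevity, and tile sizes are assumed to divide the problem size evenly.

```cpp
#include <cassert>
#include <vector>

// Row-major GEMM problem: C (M x N) = A (M x K) * B (K x N).
// All names in this sketch are illustrative; none are CK's actual API.
struct GemmProblem {
    int M, N, K;
    const float* A;
    const float* B;
    float* C;
};

// Thread level: one "thread" computes an MPerThread x NPerThread tile of C
// whose top-left corner is (m0, n0).
template <int MPerThread, int NPerThread>
struct ThreadwiseGemm {
    static constexpr int kM = MPerThread;
    static constexpr int kN = NPerThread;

    static void Run(const GemmProblem& p, int m0, int n0) {
        for (int m = m0; m < m0 + kM; ++m) {
            for (int n = n0; n < n0 + kN; ++n) {
                float acc = 0.0f;
                for (int k = 0; k < p.K; ++k) {
                    acc += p.A[m * p.K + k] * p.B[k * p.N + n];
                }
                p.C[m * p.N + n] = acc;
            }
        }
    }
};

// Block level: one "workgroup" covers an MPerBlock x NPerBlock tile by
// delegating each sub-tile to the thread-level operator.
template <int MPerBlock, int NPerBlock, typename ThreadOp>
struct BlockwiseGemm {
    static_assert(MPerBlock % ThreadOp::kM == 0 && NPerBlock % ThreadOp::kN == 0,
                  "block tile must be a multiple of the thread tile");
    static constexpr int kM = MPerBlock;
    static constexpr int kN = NPerBlock;

    static void Run(const GemmProblem& p, int m0, int n0) {
        for (int m = 0; m < kM; m += ThreadOp::kM) {
            for (int n = 0; n < kN; n += ThreadOp::kN) {
                ThreadOp::Run(p, m0 + m, n0 + n);
            }
        }
    }
};

// Grid level: covers the whole output by delegating to the block-level operator.
template <typename BlockOp>
struct GridwiseGemm {
    static void Run(const GemmProblem& p) {
        // Assumption of this sketch: no partial tiles at the edges.
        assert(p.M % BlockOp::kM == 0 && p.N % BlockOp::kN == 0);
        for (int m = 0; m < p.M; m += BlockOp::kM) {
            for (int n = 0; n < p.N; n += BlockOp::kN) {
                BlockOp::Run(p, m, n);
            }
        }
    }
};
```

An 8 x 8 output could, for example, be computed with `GridwiseGemm<BlockwiseGemm<4, 4, ThreadwiseGemm<2, 2>>>::Run(p)`. Because tile sizes are template parameters, each level can be reused with different tilings without touching the levels below it.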
Prebuilt and Customized Operator Fusion
- Prebuilt fused operators (work in progress) include:
  - GEMM/Conv + pointwise op
  - Conv + Pooling
  - GEMM/Conv + reduction-like operator
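A "GEMM + pointwise op" fusion can be sketched by passing the pointwise operation as a template parameter and applying it to each accumulator before the result is stored, so no intermediate tensor is written and re-read. This is a host-side illustration only: `gemm_fused`, `Identity`, and `AddBiasRelu` are hypothetical names chosen for this example, and the `op(acc, n)` calling convention is an assumption, not CK's interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Pointwise op that leaves the GEMM result unchanged (plain GEMM).
struct Identity {
    float operator()(float acc, int /*n*/) const { return acc; }
};

// Pointwise op: add a per-column bias, then apply ReLU.
struct AddBiasRelu {
    const float* bias;  // one value per output column
    float operator()(float acc, int n) const {
        return std::max(acc + bias[n], 0.0f);
    }
};

// Row-major GEMM, C (M x N) = op(A (M x K) * B (K x N)).
// The pointwise op runs on the accumulator, before the single store to C.
template <typename ElementwiseOp>
std::vector<float> gemm_fused(int M, int N, int K,
                              const std::vector<float>& A,
                              const std::vector<float>& B,
                              ElementwiseOp op) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    std::vector<float> C(static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            C[m * N + n] = op(acc, n);  // fused epilogue
        }
    }
    return C;
}
```

Calling `gemm_fused(M, N, K, A, B, Identity{})` gives a plain GEMM, while `gemm_fused(M, N, K, A, B, AddBiasRelu{bias.data()})` fuses the bias add and ReLU into the same pass. A customized fusion in this style amounts to supplying a different functor.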
Unified Implementation of Tensor Operators
![image](https://user-images.githubusercontent.com/22615726/138780532-1874a192-d5c8-43f8-a628-3eb0fe40ceec.png)
![image](https://user-images.githubusercontent.com/22615726/138781046-b0de70cd-cd77-486a-b99d-a43d7c453f3e.png)