Composable Kernel Overview - ROCm/composable_kernel GitHub Wiki
Composable Kernel Features
Hardware Agnostic “Tensor Coordinate Transformation” Primitives
- Enable composition of complex operators from basic ones, without the overhead of tensor re-formatting
  - GEMM --> Implicit GEMM, Hybrid direct/implicit GEMM
  - Reduction --> Pooling, Batch-norm, etc.
  - Data-Transfer --> Im2Col, Depth2Space, etc.
- Automatically generate optimized address-calculation logic for coordinate transformations, without developer intervention
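To make the "GEMM --> Implicit GEMM" idea concrete, here is a minimal CPU-side sketch in plain C++. It is not the Composable Kernel API: `ConvShape`, `im2col_offset`, and `conv_implicit_gemm` are hypothetical names, and the sketch assumes a single image in channel-major layout, stride 1, and no padding. The point is that a convolution can be run as a GEMM whose right-hand matrix is never materialized; a coordinate transformation maps each GEMM index to an offset in the original input tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not the Composable Kernel API.
struct ConvShape {
    int C, H, W;  // input channels, height, width (one image, channel-major)
    int K, Y, X;  // output channels, filter height, filter width
    int Ho() const { return H - Y + 1; }  // assumes stride 1, no padding
    int Wo() const { return W - X + 1; }
};

// Coordinate transformation: maps a coordinate (gk, gn) of the virtual
// im2col matrix to an offset in the input tensor.
//   gk decomposes into (c, y, x); gn decomposes into (ho, wo).
inline std::size_t im2col_offset(const ConvShape& s, int gk, int gn) {
    const int c  = gk / (s.Y * s.X);
    const int y  = (gk / s.X) % s.Y;
    const int x  = gk % s.X;
    const int ho = gn / s.Wo();
    const int wo = gn % s.Wo();
    return (static_cast<std::size_t>(c) * s.H + (ho + y)) * s.W + (wo + x);
}

// Implicit GEMM: out[M x N] = wei[M x Kg] * B[Kg x N], where B is the
// im2col matrix. B is never stored; its elements are read from `in`
// through im2col_offset.
std::vector<float> conv_implicit_gemm(const ConvShape& s,
                                      const std::vector<float>& in,
                                      const std::vector<float>& wei) {
    const int M  = s.K;
    const int N  = s.Ho() * s.Wo();
    const int Kg = s.C * s.Y * s.X;
    assert(in.size() == static_cast<std::size_t>(s.C) * s.H * s.W);
    assert(wei.size() == static_cast<std::size_t>(M) * Kg);

    std::vector<float> out(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < Kg; ++k) {
                acc += wei[static_cast<std::size_t>(m) * Kg + k] *
                       in[im2col_offset(s, k, n)];
            }
            out[static_cast<std::size_t>(m) * N + n] = acc;
        }
    }
    return out;
}
```

In this sketch the index arithmetic is written by hand. The feature described above is that the library derives such address calculations from a declarative description of the transformation, so the developer does not write them.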
Reusable Tensor Operators for AMD GPUs
- Grid-, block-, wave-, and thread-level tensor operators implemented as C++ templated device functions/classes
  - GEMM-like operators
  - Reduction-like operators
  - Data-transfer-like operators
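The layering of operators across levels can be sketched with ordinary C++ templates. The example below is an illustration under stated assumptions, not library code: `ThreadGemm` and `BlockGemm` are hypothetical names, the code runs on the host, and the loop over tiles stands in for what would be a distribution of tiles across GPU threads. It shows a block-level GEMM built by reusing a thread-level GEMM, with tile sizes fixed at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not the Composable Kernel API.

// Thread-level operator: C[MT x NT] += A[MT x KT] * B[KT x NT] on one small
// tile. lda/ldb/ldc are the row strides of the enclosing row-major matrices.
template <int MT, int NT, int KT>
struct ThreadGemm {
    static void run(const float* a, int lda,
                    const float* b, int ldb,
                    float* c, int ldc) {
        for (int m = 0; m < MT; ++m)
            for (int n = 0; n < NT; ++n)
                for (int k = 0; k < KT; ++k)
                    c[m * ldc + n] += a[m * lda + k] * b[k * ldb + n];
    }
};

// Block-level operator composed from the thread-level one. It splits an
// MB x NB output into MT x NT tiles. On a GPU each tile would belong to
// one thread; this host-side sketch simply loops over the tiles.
template <int MB, int NB, int KB, int MT, int NT>
struct BlockGemm {
    static_assert(MB % MT == 0 && NB % NT == 0,
                  "block tile must be divisible by thread tile");

    static void run(const std::vector<float>& a,   // MB x KB, row-major
                    const std::vector<float>& b,   // KB x NB, row-major
                    std::vector<float>& c) {       // MB x NB, row-major
        assert(a.size() == static_cast<std::size_t>(MB) * KB);
        assert(b.size() == static_cast<std::size_t>(KB) * NB);
        assert(c.size() == static_cast<std::size_t>(MB) * NB);
        for (int tm = 0; tm < MB; tm += MT)
            for (int tn = 0; tn < NB; tn += NT)
                ThreadGemm<MT, NT, KB>::run(a.data() + tm * KB, KB,
                                            b.data() + tn, NB,
                                            c.data() + tm * NB + tn, NB);
    }
};
```

A call such as `BlockGemm<4, 4, 2, 2, 2>::run(a, b, c)` accumulates a 4x4 product into `c` using four 2x2 thread tiles. Because the sizes are template parameters, the compiler can fully unroll the inner loops.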
Prebuilt and Customized Operator Fusion
- Prebuilt fused operators (work in progress) include:
  - GEMM/Conv + pointwise op
  - Conv + Pooling
  - GEMM/Conv + reduction-like operator
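As a rough picture of what "GEMM + pointwise op" fusion means, the following host-side C++ sketch passes the pointwise operation as a template parameter and applies it to each accumulator before the result is written out. All names here (`Identity`, `Relu`, `gemm_fused`) are defined locally for illustration and are assumptions, not the library's interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, locally defined pointwise functors; not library types.
struct Identity {
    float operator()(float x) const { return x; }
};
struct Relu {
    float operator()(float x) const { return std::max(x, 0.0f); }
};

// GEMM with a fused pointwise epilogue: C = op(A * B).
// A is M x K, B is K x N, both row-major. The pointwise op is applied to
// the accumulator before the store, so the unfused result A * B is never
// written to memory and read back.
template <typename PointwiseOp>
std::vector<float> gemm_fused(int M, int N, int K,
                              const std::vector<float>& a,
                              const std::vector<float>& b,
                              PointwiseOp op) {
    assert(a.size() == static_cast<std::size_t>(M) * K);
    assert(b.size() == static_cast<std::size_t>(K) * N);

    std::vector<float> c(static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += a[static_cast<std::size_t>(m) * K + k] *
                       b[static_cast<std::size_t>(k) * N + n];
            c[static_cast<std::size_t>(m) * N + n] = op(acc);  // fused epilogue
        }
    }
    return c;
}
```

Swapping `Relu{}` for another functor changes the fused operation without touching the GEMM loop, which is the sense in which a customized fusion can reuse the same operator.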
Unified Implementation of Tensor Operators

