Home - ROCm/Tensile GitHub Wiki

[!WARNING] This wiki is obsolete. For the latest documentation, go to rocm.docs.amd.com/projects/Tensile

Tensile is a tool for creating a benchmark-driven backend library for GEMMs, GEMM-like problems (such as batched GEMM), N-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on AMD GPUs.

Overview of creating a custom TensileLib backend library for your application:

  1. Install the mandatory dependencies (PyYAML, CMake, OpenMP, MessagePack, and others), then git clone the Tensile repository and cd into Tensile.
  2. Create a benchmark config.yaml file in ./Tensile/Configs/
  3. Run the benchmark. When it finishes, Tensile writes the directories listed below: 1 and 2 contain the benchmarking output, while 3 and 4 contain the summarized results from the viewpoint of a library such as rocBLAS.

0_Build: contains a client executable, so you can launch runs from a library viewpoint.

1_BenchmarkProblems: contains all the problem descriptions and executables generated during benchmarking. You can re-launch the script (run.sh) here to reproduce results.

2_BenchmarkData: contains the raw performance results for all kernels, in CSV and YAML formats.

3_LibraryLogic: contains the YAML files that describe the optimal kernel configurations. rocBLAS usually takes its YAML files from this folder.

4_LibraryClient: contains the code objects, kernels, and library code. This is the output of running TensileCreateLibrary with the 3_LibraryLogic directory as input.
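As a quick sanity check after a benchmark run, the directory layout described above can be verified with a small script. The following is a minimal sketch, not part of Tensile: the helper names `check_output_dirs` and `missing_dirs` are hypothetical, and it assumes all five directories sit directly under the output path you passed to the benchmark.

```python
from pathlib import Path

# Directory names are taken from the list above; the short descriptions
# are paraphrased. This mapping is illustrative, not a Tensile API.
EXPECTED_DIRS = {
    "0_Build": "client executable",
    "1_BenchmarkProblems": "problem descriptions, executables, run.sh",
    "2_BenchmarkData": "raw per-kernel results (CSV and YAML)",
    "3_LibraryLogic": "optimal kernel configuration YAML files",
    "4_LibraryClient": "code objects, kernels, library code",
}


def check_output_dirs(output_root):
    """Return {directory name: True/False} for each expected directory.

    Assumes the directories are direct children of output_root.
    """
    root = Path(output_root)
    return {name: (root / name).is_dir() for name in EXPECTED_DIRS}


def missing_dirs(output_root):
    """Return the expected directory names that are absent, in list order."""
    status = check_output_dirs(output_root)
    return [name for name, present in status.items() if not present]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_dirs.py <benchmark output path>
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, present in check_output_dirs(target).items():
        marker = "ok     " if present else "MISSING"
        print(f"{marker} {name}: {EXPECTED_DIRS[name]}")
```

If 3_LibraryLogic or 4_LibraryClient is reported missing, the benchmark probably stopped before the library stages completed, so it is worth checking the run log before pointing a library such as rocBLAS at the output.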

  4. Add the Tensile library to your application's CMake target. The Tensile library is written, compiled, and linked to your application when the application is compiled.