hipBLASLt Build Guide on Windows - likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU GitHub Wiki

This guide walks you through building hipBLASLt for ROCm on Windows. If you already have the libraries, you can skip this section!

1. Clone the repository and check out the known-good commit:

```
git clone --branch windows_gfx1201 https://github.com/ROCm/hipBLASLt.git
cd hipBLASLt
git checkout d9083e69fdaa01326b7b901511ee6770971f921a
```

Note: these build steps are based on this commit; later updates may break them, so adjust as necessary.

2. Install the dependencies:

```
python rdeps.py
```

3. Build:

```
python rmake.py -a "gfx1100" --msgpack
```

Once the build finishes, locate your compiled files:

- hipblaslt.dll: located in C:\ROCM\hipBLASLt\build\release\staging\ (or a similar path based on your build location).
- Tensile data files: found within C:\ROCM\hipBLASLt\build\release\Tensile\library\ (adjust the path if needed).
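Once you have the two artifacts, you typically copy them next to whatever loads hipblaslt.dll. A minimal sketch, assuming a layout where the DLL and a `hipblaslt\library` data folder sit together in a HIP `bin` directory (this destination layout is an assumption; adjust it to wherever your application expects the files):

```python
import os
import shutil

def install_hipblaslt(build_root: str, hip_bin: str) -> None:
    """Copy hipblaslt.dll and the Tensile data files out of the build tree.

    build_root: your hipBLASLt build tree, e.g. C:\\ROCM\\hipBLASLt\\build\\release
    hip_bin:    destination directory (assumed layout, e.g. %HIP_PATH%\\bin)
    """
    # The DLL lands in build\release\staging\ per the paths above.
    dll = os.path.join(build_root, "staging", "hipblaslt.dll")
    shutil.copy2(dll, hip_bin)
    # Tensile data files live under build\release\Tensile\library\.
    src_lib = os.path.join(build_root, "Tensile", "library")
    dst_lib = os.path.join(hip_bin, "hipblaslt", "library")
    shutil.copytree(src_lib, dst_lib, dirs_exist_ok=True)
```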

1. One bug needs fixing: edit hipBLASLt\library\src\amd_detail\rocblaslt\src\kernels\compile_code_object.sh, line 40, changing `$clang_path` to `"$clang_path"` (quoting the variable so paths containing spaces survive shell word-splitting).
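If you prefer to apply the quoting fix from a script, here is a hedged sketch; it simply wraps bare `$clang_path` references in double quotes, leaving already-quoted occurrences alone (the exact line contents may differ in your checkout):

```python
import re

def quote_clang_path(text: str) -> str:
    """Wrap bare $clang_path references in double quotes so paths
    containing spaces are not split by the shell."""
    # Negative lookbehind/lookahead: skip occurrences already in quotes.
    return re.sub(r'(?<!")\$clang_path(?!")', '"$clang_path"', text)

def patch_file(path: str) -> None:
    """Apply the fix in place to compile_code_object.sh."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(quote_clang_path(text))
```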

2. What if your arch is not in the supported lists? For example, gfx1103:

Simply search for gfx1100 across the entire hipBLASLt folder, including the tensilelite directory, and add gfx1103 entries with the same structure.

The main files to change are listed below:

- hipBLASLt\library\src\amd_detail\rocblaslt\src\tensile_host.cpp
- CMakeLists.txt
- hipBLASLt\tensilelite\Tensile\Common.py
- tensilelite\Tensile\Ops\AMaxGenerator.py
- hipBLASLt\tensilelite\Tensile\Ops\LayerNormGenerator.py
- hipBLASLt\tensilelite\Tensile\Source\CMakeLists.txt
- hipBLASLt\tensilelite\Tensile\Source\memory_gfx.h
- hipBLASLt\tensilelite\Tensile\Source\lib\include\Tensile\AMDGPU.hpp
- hipBLASLt\tensilelite\Tensile\Source\lib\include\Tensile\PlaceholderLibrary.hpp
- hipBLASLt\tensilelite\Tensile\Source\lib\include\Tensile\Serialization\Predicates.hpp
- hipBLASLt\tensilelite\Tensile\Source\lib\source\ocl\OclUtils.cpp
- hipBLASLt\tensilelite\Tensile\Utilities\tensile_generator\tensile_config_generator.py
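The search step above can be sketched as a small script that lists every file under the checkout mentioning gfx1100, so you know where a gfx1103 entry is needed:

```python
import os

def files_mentioning(root: str, needle: str = "gfx1100"):
    """Walk the source tree and return every file containing the needle string."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    if needle in f.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file; skip it
    return sorted(hits)

# Usage:
#   for p in files_mentioning(r"C:\ROCM\hipBLASLt"):
#       print(p)
```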

3. Get the logic files in hipBLASLt\library\src\amd_detail\rocblaslt\src\Tensile\Logic\asm_full\ for your target arch.

Here's a step-by-step guide:

Choose your architecture:

Select an existing architecture folder within hipBLASLt\library\src\amd_detail\rocblaslt\src\Tensile\Logic\asm_full\ (e.g., navi31). This will serve as a template for your new architecture. Create a new folder with the name of your target architecture (e.g., navi3x).

Copy files:

Copy all the files from your chosen template folder into your new architecture folder.

Modify files:

Open the copied files in a code editor (such as VS Code or Visual Studio). Search for instances of navi31 and replace them with navi3x. Update any gfx1100 references to gfx110x (or your target GPU's identifier). Find lines containing ISA: [11, 0, 0] and replace them with ISA: [11, 0, x], adjusting the ISA code to match your GPU. Finally, rename all files within the new folder to reflect your architecture name (e.g., change navi31 to navi3x); a file renaming tool such as File Rename, a free application from the Microsoft Store, can help with this task.

After that, add navi3x to hipBLASLt\tensilelite\Tensile\Common.py for your target arch gfx110x, and also add (11, 0, x) to globalParameters["SupportedISA"].
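The copy/rename/replace steps above can be sketched in one script. The gfx1103 / [11, 0, 3] values below assume gfx1103 is the target (gfx1103 corresponds to ISA 11.0.3); substitute your own arch name and ISA triple:

```python
import os

# Example replacements for a gfx1103 target; change these for your GPU.
REPLACEMENTS = {
    "navi31": "navi3x",
    "gfx1100": "gfx1103",
    "ISA: [11, 0, 0]": "ISA: [11, 0, 3]",
}

def clone_logic_folder(template_dir: str, new_dir: str) -> None:
    """Copy a Tensile logic folder, renaming files and rewriting the
    arch identifiers in their contents according to REPLACEMENTS."""
    os.makedirs(new_dir, exist_ok=True)
    for name in os.listdir(template_dir):
        src = os.path.join(template_dir, name)
        if not os.path.isfile(src):
            continue  # only plain files are expected in a logic folder
        new_name = name
        for old, new in REPLACEMENTS.items():
            new_name = new_name.replace(old, new)
        with open(src, "r", encoding="utf-8", errors="ignore") as f:
            text = f.read()
        for old, new in REPLACEMENTS.items():
            text = text.replace(old, new)
        with open(os.path.join(new_dir, new_name), "w", encoding="utf-8") as f:
            f.write(text)
```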

For tuning, you may need to build with clients on Linux:

```
./install.sh -idc -a gfx1100
```

while on Windows (building the benchmark clients failed on the test machine on the current branch):

```
python rmake.py -a "gfx1100" --no-msgpack -c
```

For arches not yet officially supported, e.g. gfx1030 and the rest of the gfx103x series, you may refer to these self-tuning guides to produce the logic files used in hipBLASLt\library\src\amd_detail\rocblaslt\src\Tensile\Logic\asm_full\:

https://github.com/ROCm/hipBLASLt/blob/develop/docs/how-to-use-hipblaslt-tuning-utility.rst

https://github.com/ROCm/hipBLASLt/tree/develop/tensilelite/Tensile/Utilities/tensile_generator

or, for a more detailed example targeting MI300X:

https://github.com/ROCm/ROCm/blob/develop/docs/how-to/tuning-guides/mi300x/workload.rst#rocm-library-tuning

However, there are not enough templates for further tuning without deeper knowledge of GEMM; you may refer to the template YAML files in the tests.

For example, tensilelite/Tensile/Tests/common/gemm/fp8fp16mix_hhs.yaml enables tuning mixed fp8 and fp16.

or get some help from an AI assistant (ChatGPT, DeepSeek, ...).