hipBLASLt Build Guide on Windows - likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU GitHub Wiki
This guide walks you through building HIPBLASLT for ROCm on Windows. If you already have the libraries, you can skip this section!
1,
git clone --branch windows_gfx1201 https://github.com/ROCm/hipBLASLt.git
cd hipBLASLt
git checkout d9083e69fdaa01326b7b901511ee6770971f921a
Note: these build steps are based on the commit above; later updates to the branch may break them, so adjust as necessary.
2, run
python rdeps.py
3, run
python rmake.py -a "gfx1100" --msgpack
Once the build finishes, locate your compiled files:
hipblaslt.dll: located in C:\ROCM\hipBLASLt\build\release\staging\ (or a similar path based on your build location).
Tensile data files: found within C:\ROCM\hipBLASLt\build\release\Tensile\library\ (adjust the path if needed).
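A short script can check whether both artifacts landed where expected. This is a sketch: `find_artifacts` is a hypothetical helper, and the layout it assumes is the default rmake.py output described above.

```python
from pathlib import Path

def find_artifacts(build_root):
    """Return the expected hipBLASLt build outputs under build_root.

    Assumes the default rmake.py layout (staging\ for the DLL,
    Tensile\library\ for the data files); adjust if your build differs.
    """
    build_root = Path(build_root)
    return {
        "dll": build_root / "staging" / "hipblaslt.dll",
        "tensile_library": build_root / "Tensile" / "library",
    }

if __name__ == "__main__":
    # Point this at your own build tree, e.g. C:\ROCM\hipBLASLt\build\release
    for name, path in find_artifacts(r"C:\ROCM\hipBLASLt\build\release").items():
        print(name, "exists:", path.exists())
```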
1, One bug needs fixing: edit hipBLASLt\library\src\amd_detail\rocblaslt\src\kernels\compile_code_object.sh at line 40, changing $clang_path
to "$clang_path"
(quoting the variable so a compiler path containing spaces is not split into separate words).
2, What if your arch is not in the supported list, for example gfx1103?
Simply search for gfx1100
across the entire hipBLASLt folder, including the tensilelite
directory, and add gfx1103
entries with the same structure.
The main files to change are listed below:
hipBLASLt\library\src\amd_detail\rocblaslt\src\tensile_host.cpp
cmakelists.txt
hipBLASLt\tensilelite\Tensile\Common.py
tensilelite\Tensile\Ops\AMaxGenerator.py
hipBLASLt\tensilelite\Tensile\Ops\LayerNormGenerator.py
hipBLASLt\tensilelite\Tensile\Source\CMakeLists.txt
hipBLASLt\tensilelite\Tensile\Source\memory_gfx.h
hipBLASLt\tensilelite\Tensile\Source\lib\include\Tensile\AMDGPU.hpp
hipBLASLt\tensilelite\Tensile\Source\lib\include\Tensile\PlaceholderLibrary.hpp
hipBLASLt\tensilelite\Tensile\Source\lib\include\Tensile\Serialization\Predicates.hpp
hipBLASLt\tensilelite\Tensile\Source\lib\source\ocl\OclUtils.cpp
hipBLASLt\tensilelite\Tensile\Utilities\tensile_generator\tensile_config_generator.py
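To make sure no file is missed, you can scan the tree for the template arch string before editing. A sketch, run from the directory containing the checkout; `files_mentioning` and its defaults are illustrative, not part of the repo:

```python
from pathlib import Path

def files_mentioning(root, needle="gfx1100",
                     exts=(".py", ".cpp", ".hpp", ".h", ".txt")):
    """Yield files under root whose text contains needle.

    needle and exts are illustrative defaults; widen them as needed
    (e.g. add ".sh", ".yaml") to cover everything in the list above.
    """
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in exts:
            try:
                if needle in path.read_text(errors="ignore"):
                    yield path
            except OSError:
                continue

if __name__ == "__main__":
    for f in files_mentioning("hipBLASLt"):
        print(f)
```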
3, Get the logic files in hipBLASLt\library\src\amd_detail\rocblaslt\src\Tensile\Logic\asm_full\
for your target arch.
Here's a step-by-step guide:
Choose Your Architecture:
Select an existing architecture folder within hipBLASLt\library\src\amd_detail\rocblaslt\src\Tensile\Logic\asm_full\
(e.g., navi31). This will serve as a template for your new architecture.
Create a new folder with the name of your target architecture (e.g., navi3x).
Copy Files:
Copy all the files from your chosen template folder into your new architecture folder.
Modify Files:
Open the copied files in a code editor (like VS Code or Visual Studio).
Search for instances of navi31 and replace them with navi3x.
Update any gfx1100 references to gfx110x (or your target GPU's identifier).
Find lines containing ISA: [11, 0, 0] and replace them with ISA: [11, 0, x]. (Remember to adjust the ISA code according to your GPU.)
"Rename all files within the new folder to reflect your architecture name (e.g., change 'navi31' to 'navi3x'). You can use a file renaming tool like 'File Rename APP', a free application available in the Windows Store, for this task."
After that, add navi3x
to hipBLASLt\tensilelite\Tensile\Common.py
along with your target arch gfx110x,
and add (11, 0, x) to globalParameters["SupportedISA"].
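As a sketch of the shape of that Common.py edit (the exact variable names and existing entries depend on the commit you checked out; the values below use this guide's gfx1103 example, not a drop-in patch):

```python
# Illustrative fragment only: mirrors the kind of entries to add in
# hipBLASLt\tensilelite\Tensile\Common.py for a new target.

# globalParameters["SupportedISA"] holds ISA tuples; append your target's:
supported_isa = [(11, 0, 0), (11, 0, 1), (11, 0, 2)]  # existing entries (assumed)
supported_isa.append((11, 0, 3))                      # new: gfx1103

# Architecture-name mapping (assumed shape) linking gfx id to codename:
arch_map = {"gfx1100": "navi31"}
arch_map["gfx1103"] = "navi3x"                        # new template folder name
```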
For tuning, you may need to build with clients on Linux:
./install.sh -idc -a gfx1100
while on Windows (building the benchmark clients failed on the test machine with the current branch):
python rmake.py -a "gfx1100" --no-msgpack -c
For some arches that are not officially supported yet, e.g. the gfx1030 ... gfx103x series,
you may refer to these self-tuning guides to generate the logic files used in hipBLASLt\library\src\amd_detail\rocblaslt\src\Tensile\Logic\asm_full\:
https://github.com/ROCm/hipBLASLt/blob/develop/docs/how-to-use-hipblaslt-tuning-utility.rst
https://github.com/ROCm/hipBLASLt/tree/develop/tensilelite/Tensile/Utilities/tensile_generator
or the more detailed MI300X example.
However, there are not enough templates for further tuning without deeper knowledge of GEMM. You may refer to template.yaml
in the tests.
For example: tensilelite/Tensile/Tests/common/gemm/fp8fp16mix_hhs.yaml
enables tuning mixed fp8 and fp16.
You can also get some help from an AI assistant such as ChatGPT or DeepSeek.