CLRadeonExtender Assembler AMD Catalyst handling
The AMD Catalyst driver provides its own OpenCL implementation, which can generate its own binaries of OpenCL programs. The CLRX assembler supports both the OpenCL 1.2 and OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 1.2 binary format.
Binary format
The AMD OpenCL binaries contain constant global data, device and compilation
information, and embedded kernel binaries. Kernel binaries are inside the .text
section.
Program code is separate for each kernel; no machine code is shared between kernels.
Each kernel binary has a metadata string, ATI CAL notes and program code.
The metadata string describes the kernel arguments, the settings of the
input/output buffers, constant buffers, read-only and write-only images, and local data.
ATI CAL notes are special small data fragments that describe features of the kernel.
The most important ATI CAL note is PROGINFO, which holds important data for runtime execution,
such as register usage, UAV usage and floating point setup.
The .data section inside a kernel is a usable section and holds only zeroes.
Layout of the source code
The CLRX assembler allows one of two ways to configure the kernel setup:
a human-readable form (.config) and a form for quick recompilation (ATI CAL notes and the metadata string).
Register usage setup
The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs. This setup can be overridden with the pseudo-ops '.sgprsnum' and '.vgprsnum'.
Scalar register allocation
To the used scalar registers, the assembler adds 2 additional registers for handling VCC.
The .sgprsnum pseudo-op sets the number of all SGPRs except VCC. Since CLRX policy 0.2 (200),
.sgprsnum sets the number of all SGPRs including VCC.
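As a minimal sketch of overriding the automatic register counts inside a kernel configuration (the kernel name and the register counts are arbitrary illustrations, not values from a real kernel):

```
.kernel example
.config
.dims x
.sgprsnum 16
.vgprsnum 8
```

Under a CLRX policy older than 0.2, the value 16 here would not include the two VCC registers; from policy 0.2 on, it would.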
List of the specific pseudo-operations
.arg
Syntax for scalar: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, unused]
Syntax for structure: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, STRUCTSIZE[, unused]]
Syntax for image: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, [ACCESS] [, RESID[, unused]]]
Syntax for counter32: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, RESID[, unused]]
Syntax for global pointer: .arg ARGNAME[, "ARGTYPENAME"],
ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, RESID[, unused]]]]
Syntax for local pointer: .arg ARGNAME[, "ARGTYPENAME"],
ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]
Syntax for constant pointer: .arg ARGNAME[, "ARGTYPENAME"],
ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, [CONSTSIZE] [, RESID[, unused]]]]]
Adds a kernel argument definition. Must be inside kernel configuration. The first argument is the argument name from the OpenCL kernel definition. The next, optional argument is the argument type name from the OpenCL kernel definition. The next argument is the argument type:
- char, uchar, short, ushort, int, uint, ulong, long, float, double - simple scalar types
- charX, ucharX, shortX, ushortX, intX, uintX, ulongX, longX, floatX, doubleX - vector types (X indicates number of elements: 2, 3, 4, 8 or 16)
- counter32 - 32-bit counter type
- structure - structure
- image, image1d, image1d_array, image1d_buffer, image2d, image2d_array, image3d - image types
- sampler - sampler
- type* - pointer to data
The rest of the arguments depend on the type of the kernel argument. STRUCTSIZE determines the size of the
structure. ACCESS for an image can be one of: read_only (or rdonly), write_only (or wronly).
PTRSPACE determines the space the pointer points to.
It can be one of: local, constant or global.
ACCESS for pointers can be: const, restrict and volatile.
CONSTSIZE determines the maximum size in bytes of the constant buffer.
RESID determines the resource id:
- for global or constant pointers it is the UAVID; the range is 8-1023.
- for constant pointers (driver older than 1348.X), the range is 1-159.
- for read-only images, the range is 0-127.
- for write-only images or counters, the range is 0-7.
The last argument, unused, indicates that the argument will not be used by the kernel.
Sample usage:
.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16 *,global
.arg v42,ulong16 *,global, restrict
.arg v57,structure*,82,global
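The sample above does not exercise the image-access, counter, sampler or unused forms. A sketch of those, written only from the syntax lines given earlier (the argument names and resource ids are hypothetical, with ids chosen inside the documented ranges):

```
.arg img_in,image2d,read_only,3
.arg img_out,image2d,write_only,1
.arg cnt,counter32,2
.arg smp,sampler
.arg pad,float,unused
```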
.boolconsts
This pseudo-operation must be inside kernel. Open ATI_BOOL32CONSTS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.calnote
Syntax: .calnote CALNOTEID
This pseudo-operation must be inside kernel. Open ATI CAL note.
.cbid
Syntax: .cbid VALUE
If this pseudo-operation is inside ATI_CONSTANT_BUFFERS CAL note, then it adds an entry into that CAL note. If this pseudo-operation is in kernel configuration, then it sets the constant buffer id.
.cbmask
Syntax: .cbmask INDEX, SIZE
This pseudo-operation must be in ATI_CONSTANT_BUFFERS CAL note. Add entry into ATI_CONSTANT_BUFFERS CAL note.
.compile_options
Syntax: .compile_options "STRING"
Set compile options for this binary.
.condout
Syntax: .condout [VALUE]
Syntax: .condout VALUE
If this pseudo-operation is inside kernel, then it opens ATI_CONDOUT CAL note. Each further occurrence in the same kernel adds a new CAL note. The optional argument adds a 4-byte value to the content of this CAL note. If this pseudo-operation is in kernel configuration, then it sets the CONDOUT value.
.config
Open kernel configuration. Must be inside kernel. Kernel configuration cannot be defined if any CAL note, metadata or header was already defined. The following pseudo-ops can be inside kernel config:
- .arg
- .cbid
- .condout
- .cws
- .dims
- .earlyexit
- .hwlocal
- .hwregion
- .ieeemode
- .localsize
- .pgmrsrc2
- .printfid
- .privateid
- .sampler
- .scratchbuffer
- .sgprsnum
- .tgsize
- .uavid
- .uavprivate
- .useconstdata
- .useprintf
- .userdata
- .vgprsnum
.constantbuffers
This pseudo-operation must be inside kernel. Open ATI_CONSTANT_BUFFERS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.cws, .reqd_work_group_size
Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]
This pseudo-operation must be inside kernel configuration. Set reqd_work_group_size hint for this kernel. In versions earlier than 0.1.7 this pseudo-op was broken: it set zeroes instead of ones in the last two components. We recommend filling in all components.
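Following the recommendation to fill in all components, a minimal sketch (the kernel name and the sizes are illustrative only):

```
.kernel example
.config
.dims x
.cws 64, 1, 1
```

Giving all three values explicitly avoids depending on how a particular assembler version fills in the omitted components.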
.dims
Syntax: .dims DIMENSIONS
Syntax: .dims GID_DIMS, LID_DIMS
This pseudo-operation must be inside kernel configuration. Define what dimensions (from list: x, y, z) will be used to determine space of the kernel execution. In second syntax form, the dimensions are given for group_id (GID_DIMS) and for local_id (LID_DIMS) separately.
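The two syntax forms can be sketched as follows. These are alternatives, so a real kernel configuration would contain only one of them, and the dimension choices are illustrative:

```
/* first form: x and y are used for both group_id and local_id */
.dims xy

/* second form: group_id uses x, y and z; local_id uses only x and y */
.dims xyz, xy
```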
.driver_info
Syntax: .driver_info "INFO"
Set driver info for this binary.
.driver_version
Syntax: .driver_version VERSION
Set driver version for this binary. Version in form: MajorVersion*100+MinorVersion. This pseudo-op replaces driver info.
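As a worked example of the encoding, take the driver version 1702.3 that appears in the driver info string of the sample code, and assume 1702 is the major and 3 the minor version. That gives 1702*100 + 3 = 170203:

```
.driver_version 170203
```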
.earlyexit
Syntax: .earlyexit [VALUE]
Syntax: .earlyexit VALUE
If this pseudo-operation is inside kernel, then it opens ATI_EARLY_EXIT CAL note. Each further occurrence in the same kernel adds a new CAL note. The optional argument adds a 4-byte value to the content of this CAL note. If this pseudo-operation is in kernel configuration, then it sets the EARLY_EXIT value.
.entry
Syntax: .entry UAVID, F1, F2, TYPE
Syntax: .entry VALUE1, VALUE2
This pseudo-operation must be in ATI_UAV or ATI_PROGINFO CAL note. Add entry into CAL note. For ATI_UAV, pseudo-operation accepts 4 32-bit values. For ATI_PROGINFO, accepts 2 32-bit values.
.exceptions
Syntax: .exceptions EXCPMASK
This pseudo-operation must be inside kernel configuration. Set exception mask in PGMRSRC2 register value. Value should be 7-bit.
.floatconsts
This pseudo-operation must be inside kernel. Open ATI_FLOAT32CONSTS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.floatmode
Syntax: .floatmode VALUE
This pseudo-operation must be inside kernel configuration. Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Value must be a byte value. Default value is 0xc0.
.get_driver_version
Syntax: .get_driver_version SYMBOL
Store current driver version to SYMBOL. Version in form version*100 + revision.
.globalbuffers
This pseudo-operation must be inside kernel. Open ATI_GLOBAL_BUFFERS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.globaldata
Go to constant global data section.
.header
Go to main header of the binary.
.hwlocal, .localsize
Syntax: .hwlocal SIZE
Syntax: .localsize SIZE
This pseudo-operation must be inside kernel configuration. Set HWLOCAL value, the initial local data size.
.hwregion
Syntax: .hwregion VALUE
This pseudo-operation must be inside kernel configuration. Set HWREGION value.
.ieeemode
Syntax: .ieeemode
This pseudo-op must be inside kernel configuration. Set ieee-mode.
.inputs
This pseudo-operation must be inside kernel. Open ATI_INPUTS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.inputsamplers
This pseudo-operation must be inside kernel. Open ATI_INPUT_SAMPLERS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.intconsts
This pseudo-operation must be inside kernel. Open ATI_INT32CONSTS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.metadata
This pseudo-operation must be inside kernel. Go to metadata content.
.outputs
This pseudo-operation must be inside kernel. Open ATI_OUTPUTS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.persistentbuffers
This pseudo-operation must be inside kernel. Open ATI_PERSISTENT_BUFFERS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.pgmrsrc2
Syntax: .pgmrsrc2 VALUE
This pseudo-operation must be inside kernel configuration. Set PGMRSRC2 value. If dimensions are set, then the bits that control dimension setup will be ignored. The SCRATCH_EN bit will be ignored.
.printfid
Syntax: .printfid RESID
This pseudo-operation must be inside kernel configuration. Set printfid.
.privateid
Syntax: .privateid RESID
This pseudo-operation must be inside kernel configuration. Set privateid.
.proginfo
This pseudo-operation must be inside kernel. Open ATI_PROGINFO CAL note. Each further occurrence in the same kernel adds a new CAL note.
.sampler
Syntax: .sampler INPUT, SAMPLER
Syntax: .sampler RESID,....
If this pseudo-operation is in ATI_SAMPLER CAL note, then it adds sampler entry. If this pseudo-operation is in kernel configuration, then it adds samplers with specified resource ids.
.scratchbuffer
Syntax: .scratchbuffer SIZE
This pseudo-operation must be inside kernel configuration. Set scratchbuffer size.
.scratchbuffers
This pseudo-operation must be inside kernel. Open ATI_SCRATCH_BUFFERS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.segment
Syntax: .segment OFFSET, SIZE
This pseudo-operation must be in ATI_BOOL32CONSTS, ATI_INT32CONSTS or ATI_FLOAT32CONSTS CAL note. Add entry into CAL note.
.sgprsnum
Syntax: .sgprsnum REGNUM
This pseudo-op must be inside kernel configuration. Set number of scalar registers which can be used during kernel execution. It counts SGPR registers excluding VCC, FLAT_SCRATCH and XNACK_MASK.
.subconstantbuffers
This pseudo-operation must be inside kernel. Open ATI_SUB_CONSTANT_BUFFERS CAL note. Each further occurrence in the same kernel adds a new CAL note.
.tgsize
This pseudo-op must be inside kernel configuration. Enable usage of the TG_SIZE_EN.
.uav
This pseudo-operation must be inside kernel. Open ATI_UAV CAL note. Each further occurrence in the same kernel adds a new CAL note.
.uavid
Syntax: .uavid UAVID
This pseudo-op must be inside kernel configuration. Set UAVId value.
.uavmailboxsize
Syntax: .uavmailboxsize [VALUE]
This pseudo-operation must be inside kernel. Open ATI_UAV_MAILBOX_SIZE CAL note. Each further occurrence in the same kernel adds a new CAL note. If the first argument is given, then a 32-bit value will be added to the content.
.uavopmask
Syntax: .uavopmask [VALUE]
This pseudo-operation must be inside kernel. Open ATI_UAV_OP_MASK CAL note. Each further occurrence in the same kernel adds a new CAL note. If the first argument is given, then a 32-bit value will be added to the content.
.uavprivate
Syntax: .uavprivate VALUE
This pseudo-op must be inside kernel configuration. Set uav private value.
.useconstdata
Enable use of the const data.
.useprintf
Enable use of the printf mechanism.
.userdata
Syntax: .userdata DATACLASS, APISLOT, REGSTART, REGSIZE
This pseudo-op must be inside kernel configuration. Add USERDATA entry. First argument is data class. It can be one of the following:
- IMM_RESOURCE
- IMM_SAMPLER
- IMM_CONST_BUFFER
- IMM_VERTEX_BUFFER
- IMM_UAV
- IMM_ALU_FLOAT_CONST
- IMM_ALU_BOOL32_CONST
- IMM_GDS_COUNTER_RANGE
- IMM_GDS_MEMORY_RANGE
- IMM_GWS_BASE
- IMM_WORK_ITEM_RANGE
- IMM_WORK_GROUP_RANGE
- IMM_DISPATCH_ID
- IMM_SCRATCH_BUFFER
- IMM_HEAP_BUFFER
- IMM_KERNEL_ARG
- SUB_PTR_FETCH_SHADER
- PTR_RESOURCE_TABLE
- PTR_INTERNAL_RESOURCE_TABLE
- PTR_SAMPLER_TABLE
- PTR_CONST_BUFFER_TABLE
- PTR_VERTEX_BUFFER_TABLE
- PTR_SO_BUFFER_TABLE
- PTR_UAV_TABLE
- PTR_INTERNAL_GLOBAL_TABLE
- PTR_EXTENDED_USER_DATA
- PTR_INDIRECT_RESOURCE
- PTR_INDIRECT_INTERNAL_RESOURCE
- PTR_INDIRECT_UAV
- IMM_CONTEXT_BASE
- IMM_LDS_ESGS_SIZE
- IMM_GLOBAL_OFFSET
- IMM_GENERIC_USER_DATA
The second argument is apiSlot. The third argument determines the first scalar register that will hold userdata. The fourth argument determines how many scalar registers are needed to hold userdata.
.vgprsnum
Syntax: .vgprsnum REGNUM
This pseudo-op must be inside kernel configuration. Set number of vector registers which can be used during kernel execution.
Sample code
This is a sample of the kernel setup:
/* Disassembling 'DCT_15_5.1' */
.amd
.gpu Pitcairn
.32bit
.compile_options ""
.driver_info "@(#) OpenCL 1.2 AMD-APP (1702.3). Driver version: 1702.3 (VM)"
.kernel DCT
.header
.fill 16, 1, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
.fill 8, 1, 0x00
.metadata
.ascii ";ARGSTART:__OpenCL_DCT_kernel\n"
.ascii ";version:3:1:111\n"
.ascii ";device:pitcairn\n"
.ascii ";uniqueid:1024\n"
.ascii ";memory:uavprivate:0\n"
.ascii ";memory:hwlocal:0\n"
.ascii ";memory:hwregion:0\n"
.ascii ";pointer:output:float:1:1:0:uav:12:4:RW:0:0\n"
.ascii ";pointer:input:float:1:1:16:uav:13:4:RO:0:0\n"
.ascii ";pointer:dct8x8:float:1:1:32:uav:14:4:RO:0:0\n"
.ascii ";pointer:inter:float:1:1:48:hl:1:4:RW:0:0\n"
.ascii ";value:width:u32:1:1:64\n"
.ascii ";value:blockWidth:u32:1:1:80\n"
.ascii ";value:inverse:u32:1:1:96\n"
.ascii ";function:1:1030\n"
.ascii ";uavid:11\n"
.ascii ";printfid:9\n"
.ascii ";cbid:10\n"
.ascii ";privateid:8\n"
.ascii ";reflection:0:float*\n"
.ascii ";reflection:1:float*\n"
.ascii ";reflection:2:float*\n"
.ascii ";reflection:3:float*\n"
.ascii ";reflection:4:uint\n"
.ascii ";reflection:5:uint\n"
.ascii ";reflection:6:uint\n"
.ascii ";ARGEND:__OpenCL_DCT_kernel\n"
.data
.fill 4736, 1, 0x00
.inputs
.outputs
.uav
.entry 12, 4, 0, 5
.entry 13, 4, 0, 5
.entry 14, 4, 0, 5
.entry 11, 4, 0, 5
.condout 0
.floatconsts
.intconsts
.boolconsts
.earlyexit 0
.globalbuffers
.constantbuffers
.cbmask 0, 32764
.cbmask 1, 0
.inputsamplers
.scratchbuffers
.int 0x00000000
.persistentbuffers
.proginfo
.entry 0x80001000, 0x00000003
.entry 0x80001001, 0x00000017
.entry 0x80001002, 0x00000000
.entry 0x80001003, 0x00000002
.entry 0x80001004, 0x00000002
.entry 0x80001005, 0x00000002
.entry 0x80001006, 0x00000000
.entry 0x80001007, 0x00000004
.entry 0x80001008, 0x00000004
.entry 0x80001009, 0x00000002
.entry 0x8000100a, 0x00000001
.entry 0x8000100b, 0x00000008
.entry 0x8000100c, 0x00000004
.entry 0x80001041, 0x0000000b
.entry 0x80001042, 0x00000018
.entry 0x80001863, 0x00000066
.entry 0x80001864, 0x00000100
.entry 0x80001043, 0x000000c0
.entry 0x80001044, 0x00000000
.entry 0x80001045, 0x00000000
.entry 0x00002e13, 0x00400998
.entry 0x8000001c, 0x00000100
.entry 0x8000001d, 0x00000000
.entry 0x8000001e, 0x00000000
.entry 0x80001841, 0x00000000
.entry 0x8000001f, 0x00007000
.entry 0x80001843, 0x00007000
.entry 0x80001844, 0x00000000
.entry 0x80001845, 0x00000000
.entry 0x80001846, 0x00000000
.entry 0x80001847, 0x00000000
.entry 0x80001848, 0x00000000
.entry 0x80001849, 0x00000000
.entry 0x8000184a, 0x00000000
.entry 0x8000184b, 0x00000000
.entry 0x8000184c, 0x00000000
.entry 0x8000184d, 0x00000000
.entry 0x8000184e, 0x00000000
.entry 0x8000184f, 0x00000000
.entry 0x80001850, 0x00000000
.entry 0x80001851, 0x00000000
.entry 0x80001852, 0x00000000
.entry 0x80001853, 0x00000000
.entry 0x80001854, 0x00000000
.entry 0x80001855, 0x00000000
.entry 0x80001856, 0x00000000
.entry 0x80001857, 0x00000000
.entry 0x80001858, 0x00000000
.entry 0x80001859, 0x00000000
.entry 0x8000185a, 0x00000000
.entry 0x8000185b, 0x00000000
.entry 0x8000185c, 0x00000000
.entry 0x8000185d, 0x00000000
.entry 0x8000185e, 0x00000000
.entry 0x8000185f, 0x00000000
.entry 0x80001860, 0x00000000
.entry 0x80001861, 0x00000000
.entry 0x80001862, 0x00000000
.entry 0x8000000a, 0x00000001
.entry 0x80000078, 0x00000040
.entry 0x80000081, 0x00008000
.entry 0x80000082, 0x00008000
.subconstantbuffers
.uavmailboxsize 0
.uavopmask
.byte 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.fill 120, 1, 0x00
.text
/*befc03ff 00008000*/ s_mov_b32 m0, 0x8000
...
/*bf810000 */ s_endpgm
The same kernel set up with kernel configuration:
.amd
.gpu Pitcairn
.32bit
.kernel DCT
.config
.dims xy
.arg output,float*,global
.arg input,float*,global,const
.arg dct8x8,float*,global,const
.arg inter,float*,local
.arg width,uint
.arg blockWidth,uint
.arg inverse,uint
.userdata PTR_UAV_TABLE,0,2,2
.userdata IMM_CONST_BUFFER,0,4,4
.userdata IMM_CONST_BUFFER,1,8,4
.text
/*befc03ff 00008000*/ s_mov_b32 m0, 0x8000
...
/*bf810000 */ s_endpgm