GcnInstrsSmrd - CLRX/CLRX-mirror GitHub Wiki
GCN ISA SMRD instructions
The basic encoding of the SMRD instructions need 4 bytes (dword). List of fields:
Bits | Name | Description |
---|---|---|
0-7 | OFFSET | Unsigned 8-bit dword offset or SGPR number that holds byte offset |
8 | IMM | IMM indicator |
9-14 | SBASE | Number of aligned SGPR pair. |
15-21 | SDST | Scalar destination operand |
22-26 | OPCODE | Operation code |
27-31 | ENCODING | Encoding type. Must be 0b11000 |
Value of the IMM determines meaning of the OFFSET field:
- IMM=1 - OFFSET holds a dword offset to SBASE.
- IMM=0 - OFFSET holds number of SGPR that holds byte offset to SBASE.
For GCN1.1 when IMM=0 and OFFSET field has value 0xff then next dword is contant literal.
For S_LOAD_DWORD* instructions, 2 SBASE SGPRs holds a base 64-bit address. For S_BUFFER_LOAD_DWORD* instructions, 4 SBASE SGPRs holds a buffer descriptor. In this case, SBASE must be a multipla of 2.
The SMRD instructions can return the result data out of the order. Any SMRD operation
(including S_MEMTIME) increments LGKM_CNT counter. The best way to wait for results
is S_WAITCNT LGKMCNT(0)
.
- LGKM_CNT incremented by one for every fetch of single Dword
- LGKM_CNT incremented by two for every fetch of two or more Dwords
NOTE: Between setting third dword from buffer resource and S_BUFFER_* instruction is required least one instruction (vector or scalar) due to delay.
List of the instructions by opcode:
Opcode | Mnemonic (GCN1.0) | Mnemonic (GCN1.1) |
---|---|---|
0 (0x0) | S_LOAD_DWORD | S_LOAD_DWORD |
1 (0x1) | S_LOAD_DWORDX2 | S_LOAD_DWORDX2 |
2 (0x2) | S_LOAD_DWORDX4 | S_LOAD_DWORDX4 |
3 (0x3) | S_LOAD_DWORDX8 | S_LOAD_DWORDX8 |
4 (0x4) | S_LOAD_DWORDX16 | S_LOAD_DWORDX16 |
8 (0x8) | S_BUFFER_LOAD_DWORD | S_BUFFER_LOAD_DWORD |
9 (0x9) | S_BUFFER_LOAD_DWORDX2 | S_BUFFER_LOAD_DWORDX2 |
10 (0xa) | S_BUFFER_LOAD_DWORDX4 | S_BUFFER_LOAD_DWORDX4 |
11 (0xb) | S_BUFFER_LOAD_DWORDX8 | S_BUFFER_LOAD_DWORDX8 |
12 (0xc) | S_BUFFER_LOAD_DWORDX16 | S_BUFFER_LOAD_DWORDX16 |
29 (0x1d) | -- | S_DCACHE_INV_VOL |
30 (0x1e) | S_MEMTIME | S_MEMTIME |
31 (0x1f) | S_DCACHE_INV | S_DCACHE_INV |
Instruction set
Alphabetically sorted instruction list:
S_BUFFER_LOAD_DWORD
Opcode: 8 (0x8)
Syntax: S_BUFFER_LOAD_DWORD SDST, SBASE(4), OFFSET
Description: Load single dword from read-only memory through constant cache (kcache).
SBASE is buffer descriptor.
Operation:
SDST = *(UINT32*)(SMRD + (OFFSET & ~3))
S_BUFFER_LOAD_DWORDX16
Opcode: 12 (0xc)
Syntax: S_BUFFER_LOAD_DWORDX16 SDST(16), SBASE(4), OFFSET
Description: Load 16 dwords from read-only memory through constant cache (kcache).
SBASE is buffer descriptor.
Operation:
for (BYTE i = 0; i < 16; i++)
SDST[i] = *(UINT32*)(SMRD + i*4 + (OFFSET & ~3))
S_BUFFER_LOAD_DWORDX2
Opcode: 9 (0x9)
Syntax: S_BUFFER_LOAD_DWORDX2 SDST(2), SBASE(4), OFFSET
Description: Load two dwords from read-only memory through constant cache (kcache).
SBASE is buffer descriptor.
Operation:
SDST = *(UINT64*)(SMRD + (OFFSET & ~3))
S_BUFFER_LOAD_DWORDX4
Opcode: 10 (0xa)
Syntax: S_BUFFER_LOAD_DWORDX4 SDST(4), SBASE(4), OFFSET
Description: Load four dwords from read-only memory through constant cache (kcache).
SBASE is buffer descriptor.
Operation:
for (BYTE i = 0; i < 4; i++)
SDST[i] = *(UINT32*)(SMRD + i*4 + (OFFSET & ~3))
S_BUFFER_LOAD_DWORDX8
Opcode: 11 (0xb)
Syntax: S_BUFFER_LOAD_DWORDX8 SDST(8), SBASE(4), OFFSET
Description: Load eight dwords from read-only memory through constant cache (kcache).
SBASE is buffer descriptor.
Operation:
for (BYTE i = 0; i < 8; i++)
SDST[i] = *(UINT32*)(SMRD + i*4 + (OFFSET & ~3))
S_DCACHE_INV
Opcode: 31 (0x1f)
Syntax: S_DCACHE_INV
Description: Invalidate entire L1 K cache.
S_DCACHE_INV_VOL
Opcode: 29 (0x1d) for GCN 1.1
Syntax: S_DCACHE_INV_VOL
Description: Invalidate all volatile lines in L1 K cache.
S_LOAD_DWORD
Opcode: 0 (0x0)
Syntax: S_LOAD_DWORD SDST, SBASE(2), OFFSET
Description: Load single dword from read-only memory through constant cache (kcache).
Operation:
SDST = *(UINT32*)(SMRD + (OFFSET & ~3))
S_LOAD_DWORDX16
Opcode: 4 (0x4)
Syntax: S_LOAD_DWORDX16 SDST(16), SBASE(2), OFFSET
Description: Load 16 dwords from read-only memory through constant cache (kcache).
Operation:
for (BYTE i = 0; i < 16; i++)
SDST[i] = *(UINT32*)(SMRD + i*4 + (OFFSET & ~3))
S_LOAD_DWORDX2
Opcode: 1 (0x1)
Syntax: S_LOAD_DWORDX2 SDST(2), SBASE(2), OFFSET
Description: Load two dwords from read-only memory through constant cache (kcache).
SDST = *(UINT64*)(SMRD + (OFFSET & ~3))
S_LOAD_DWORDX4
Opcode: 2 (0x2)
Syntax: S_LOAD_DWORDX4 SDST(4), SBASE(2), OFFSET
Description: Load four dwords from read-only memory through constant cache (kcache).
Operation:
for (BYTE i = 0; i < 4; i++)
SDST[i] = *(UINT32*)(SMRD + i*4 + (OFFSET & ~3))
S_LOAD_DWORDX8
Opcode: 3 (0x3)
Syntax: S_LOAD_DWORDX8 SDST(8), SBASE(2), OFFSET
Description: Load eight dwords from read-only memory through constant cache (kcache).
Operation:
for (BYTE i = 0; i < 8; i++)
SDST[i] = *(UINT32*)(SMRD + i*4 + (OFFSET & ~3))
S_MEMTIME
Opcode: 30 (0x1e)
Syntax: S_MEMTIME SDST(2)
Description: Store value of 64-bit clock counter to SDST. Before reading result, S_WAITCNT
LGKMCNT(0) is required.
Operation:
SDST = CLOCKCNT