OpcodeMiscellaneous - hyqneuron/asfermi GitHub Wiki
Note: round brackets, instead of square brackets, are used to represent an optional component. This is to avoid confusion with memory operands which are surrounded by square brackets.
The immea and composite operands may also be mentioned in the sections below. Please refer to the linked pages for more information on what they mean.
Miscellaneous Instructions
S2R
Store special register to general-purpose register.
Instruction usage:
S2R reg0, SRName;
SRName can be one of the names listed in the table below, or it can take the form SRxxx, where xxx is a non-negative integer less than 256.
Template opcode:
0010 000000 1110 000000 000000 00000000 000000000000000000000000 110100
reg0 SRName
SRName value | SRName | SRName value | SRName |
---|---|---|---|
0 | SR_LaneId | 38 | SR_CTAid_Y |
2 | SR_VirtCfg | 39 | SR_CTAid_Z |
3 | SR_VirtId | 40 | SR_NTid |
4 | SR_PM0 | 41 | SR_NTid_X |
5 | SR_PM1 | 42 | SR_NTid_Y |
6 | SR_PM2 | 43 | SR_NTid_Z |
7 | SR_PM3 | 44 | SR_GridParam |
8 | SR_PM4 | 45 | SR_NCTAid_X |
9 | SR_PM5 | 46 | SR_NCTAid_Y |
10 | SR_PM6 | 47 | SR_NCTAid_Z |
11 | SR_PM7 | 48 | SR_SWinLo |
16 | SR_PRIM_TYPE | 49 | SR_SWINSZ |
17 | SR_INVOCATION_ID | 50 | SR_SMemSz |
18 | SR_Y_DIRECTION | 51 | SR_SMemBanks |
24 | SR_MACHINE_ID_0 | 52 | SR_LWinLo |
25 | SR_MACHINE_ID_1 | 53 | SR_LWINSZ |
26 | SR_MACHINE_ID_2 | 54 | SR_LMemLoSz |
27 | SR_MACHINE_ID_3 | 55 | SR_LMemHiOff |
28 | SR_AFFINITY | 56 | SR_EqMask |
32 | SR_Tid | 57 | SR_LtMask |
33 | SR_Tid_X | 58 | SR_LeMask |
34 | SR_Tid_Y | 59 | SR_GtMask |
35 | SR_Tid_Z | 60 | SR_GeMask |
36 | SR_CTAParam | 80 | SR_ClockLo |
37 | SR_CTAid_X | 81 | SR_ClockHi |
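As a rough illustration of how the template and the table combine, the Python sketch below builds an S2R opcode. It assumes that the template lists bit 0 first, that reg0 occupies the 6-bit group right after the `1110` predicate group (bits 14 to 19), and that SRName occupies the 8-bit group (bits 26 to 33). The helper names `template_to_int`, `sr_value` and `encode_s2r` are hypothetical and this is not asfermi's own code. Only a few rows of the table are included.

```python
# Template copied from the S2R section; assumed to list bit 0 first.
S2R_TEMPLATE = "0010 000000 1110 000000 000000 00000000 000000000000000000000000 110100"

# A few rows of the SRName table above; the full table has more entries.
SR_NAMES = {"SR_LaneId": 0, "SR_Tid_X": 33, "SR_CTAid_X": 37, "SR_ClockLo": 80}


def template_to_int(template: str) -> int:
    """Convert a bit string whose first character is bit 0 into an integer."""
    bits = template.replace(" ", "")
    if len(bits) != 64:
        raise ValueError("expected a 64-bit template")
    return sum(1 << i for i, bit in enumerate(bits) if bit == "1")


def sr_value(name: str) -> int:
    """Resolve a table name, or the generic SRxxx form, to its number."""
    if name in SR_NAMES:
        return SR_NAMES[name]
    digits = name[2:]
    if name.startswith("SR") and digits.isdecimal() and int(digits) < 256:
        return int(digits)
    raise ValueError(f"unknown special register: {name}")


def encode_s2r(reg0: int, srname: str) -> int:
    """Assumed layout: reg0 at bit 14 (6 bits), SRName at bit 26 (8 bits)."""
    if not 0 <= reg0 < 64:
        raise ValueError("reg0 must fit in 6 bits")
    return template_to_int(S2R_TEMPLATE) | (reg0 << 14) | (sr_value(srname) << 26)


print(hex(encode_s2r(0, "SR_Tid_X")))
```

Under these assumptions, `S2R R0, SR_Tid_X;` and `S2R R0, SR33;` produce the same word, 0x2c00000084001c04.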
Related dimensions:
c [0x0] [0x8] : %ntid.x
c [0x0] [0xc] : %ntid.y
c [0x0] [0x10]: %ntid.z
c [0x0] [0x14]: %nctaid.x
c [0x0] [0x18]: %nctaid.y
c [0x0] [0x1c]: %nctaid.z
BFE VirtId, 0x914 : %smid
BFE VirtCfg, 0x914: %nsmid
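The last two lines obtain %smid and %nsmid with a bit field extract. The sketch below models that step in Python. It assumes that the BFE control word holds the start bit in its low byte and the field length in the next byte, so that 0x914 selects 9 bits starting at bit 20. That layout is an assumption rather than something stated here, `bfe` is a hypothetical helper, and the register value is made up.

```python
def bfe(value: int, control: int) -> int:
    """Unsigned bit field extract; control is assumed to be (length << 8) | start."""
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    return (value >> start) & ((1 << length) - 1)


# Made-up SR_VirtId contents: field value 5 plus unrelated low bits.
virt_id = (5 << 20) | 0x3
smid = bfe(virt_id, 0x914)
print(smid)  # prints 5
```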
LEPC
Instruction usage:
LEPC reg0;
Template opcode:
0010 000000 1110 000000 000000 00000000000000000000000000000000 100010
reg0
CCTL
Instruction usage:
CCTL(.E)(.Op1).Op2 reg0, [reg1+0xabcd];
0xabcd should be a multiple of 4.
Template opcode:
1010 000000 1110 000000 000000 00 000000000000000000000000000000 0 11001
mod reg0 reg1 mod2 0xabcd mod3
mod 2:4 value | .Op2 |
---|---|
0 | QRY1 |
1 | PF1 |
2 | PF1_5 |
3 | PR2 |
4 | WB |
5 | IV |
6 | IVALL |
7 | RS |
mod2 value | .Op1 |
---|---|
0 | default |
1 | .U |
2 | .C |
3 | .I |
mod3 | meaning |
---|---|
0 | default |
1 | .E |
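To make the modifier syntax `CCTL(.E)(.Op1).Op2` concrete, here is a small Python sketch that splits a mnemonic into the field values given in the three tables above. The function `parse_cctl` and the dictionary keys are hypothetical names chosen for illustration. The sketch only looks up table values and does not place them into the opcode.

```python
# Values copied from the tables above.
OP2 = {"QRY1": 0, "PF1": 1, "PF1_5": 2, "PR2": 3, "WB": 4, "IV": 5, "IVALL": 6, "RS": 7}
OP1 = {"U": 1, "C": 2, "I": 3}


def parse_cctl(mnemonic: str) -> dict:
    """Split e.g. 'CCTL.E.U.IV' into the field values from the tables."""
    name, *mods = mnemonic.split(".")
    if name != "CCTL" or not mods:
        raise ValueError("expected CCTL(.E)(.Op1).Op2")
    fields = {"mod3": 0, "mod2": 0}
    if mods[0] == "E":          # optional .E sets mod3
        fields["mod3"] = 1
        mods = mods[1:]
    if mods and mods[0] in OP1:  # optional .Op1 sets mod2
        fields["mod2"] = OP1[mods[0]]
        mods = mods[1:]
    if len(mods) != 1 or mods[0] not in OP2:  # .Op2 is mandatory and comes last
        raise ValueError("missing or unknown .Op2")
    fields["op2"] = OP2[mods[0]]
    return fields


print(parse_cctl("CCTL.E.U.IV"))
```

Here `CCTL.E.U.IV` yields mod3 = 1, mod2 = 1 and an Op2 value of 5, while a bare `CCTL.WB` leaves mod2 and mod3 at their defaults.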
CCTLL
Instruction usage:
CCTLL.Op2 reg0, [reg1 + 0xabcd];
0xabcd can be at most 24 bits long and should be a multiple of 4. asfermi accepts numbers that are not multiples of 4 and writes them into the opcode unchanged, but cuobjdump ignores the lowest 2 bits, and the hardware's behaviour in that case has not been confirmed.
Template opcode:
1010 000000 1110 000000 000000 000000000000000000000000 00000000 001011
mod reg0 reg1 0xabcd
mod 2:4 value | .Op2 |
---|---|
0 | QRY1 |
1 | PF1 |
2 | PF1_5 |
3 | PR2 |
4 | WB |
5 | IV |
6 | IVALL |
7 | RS |
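The constraints on the CCTLL offset can be expressed as a small check. The Python sketch below is one possible reading, assuming the offset is treated as an unsigned value: exceeding 24 bits is reported as an error, while a value that is not a multiple of 4 only produces a warning, since such values are still written into the opcode. `check_cctll_offset` is a hypothetical helper, not part of asfermi.

```python
def check_cctll_offset(offset: int) -> list:
    """Return diagnostics for a CCTLL offset; an empty list means no issues."""
    problems = []
    if not 0 <= offset < (1 << 24):
        problems.append("error: offset does not fit in 24 bits")
    if offset % 4:
        # cuobjdump ignores the lowest 2 bits, so its listing may differ.
        problems.append("warning: offset is not a multiple of 4")
    return problems


print(check_cctll_offset(0x10))  # prints []
print(check_cctll_offset(0x11))
```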
PSETP
Instruction usage:
PSETP.(Mainop)(.Logicop) p0, p1, (!)p2, (!)p3, ((!)p4);
Template opcode:
0010 000000 1110 000 000 1110 00 0000 00 00000000000000000 0000 00000 110000
p1 p0 p4 p3 mod2 p2 mod3
mod2 0:1 | .Mainop |
---|---|
00 | .AND |
10 | .OR |
01 | .XOR |
11 | invalid |
mod3 3:4 | .Logicop |
---|---|
00 | .AND |
10 | .OR |
01 | .XOR |
11 | invalid |
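The bit patterns in the two tables above are easy to misread. If they follow the same lowest-bit-first order that the template opcodes appear to use (an assumption, not something stated on this page), then `10` stands for the numeric value 1 and `01` for 2. The Python sketch below shows that conversion; `lsb_first_to_int` is a hypothetical helper.

```python
def lsb_first_to_int(bits: str) -> int:
    """Read a bit string whose leftmost character is the lowest bit."""
    return sum(1 << i for i, bit in enumerate(bits) if bit == "1")


# Patterns copied from the .Mainop and .Logicop tables.
LOGIC_OPS = {"00": ".AND", "10": ".OR", "01": ".XOR"}

for pattern, name in LOGIC_OPS.items():
    print(name, lsb_first_to_int(pattern))
```

Under that reading, .AND, .OR and .XOR correspond to the field values 0, 1 and 2 respectively.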