OpcodeMiscellaneous - hyqneuron/asfermi GitHub Wiki

Note: round brackets, instead of square brackets, are used to represent an optional component. This is to avoid confusion with memory operands which are surrounded by square brackets.

Also, immea and composite operand may be mentioned in the parts below. Please visit these two links for more information regarding their meaning.

Miscellaneous Instructions

S2R

Store special register to general-purpose register.

Instruction usage:

S2R reg0, SRName;

SRName can be one of the names listed below, or SRxxx, where xxx is a non-negative integer less than 256.

Template opcode:

0010 000000 1110 000000 000000 00000000 000000000000000000000000 110100
                   reg0          SRName
Value  SRName             Value  SRName
0      SR_LaneId          38     SR_CTAid_Y
2      SR_VirtCfg         39     SR_CTAid_Z
3      SR_VirtId          40     SR_NTid
4      SR_PM0             41     SR_NTid_X
5      SR_PM1             42     SR_NTid_Y
6      SR_PM2             43     SR_NTid_Z
7      SR_PM3             44     SR_GridParam
8      SR_PM4             45     SR_NCTAid_X
9      SR_PM5             46     SR_NCTAid_Y
10     SR_PM6             47     SR_NCTAid_Z
11     SR_PM7             48     SR_SWinLo
16     SR_PRIM_TYPE       49     SR_SWINSZ
17     SR_INVOCATION_ID   50     SR_SMemSz
18     SR_Y_DIRECTION     51     SR_SMemBanks
24     SR_MACHINE_ID_0    52     SR_LWinLo
25     SR_MACHINE_ID_1    53     SR_LWINSZ
26     SR_MACHINE_ID_2    54     SR_LMemLoSz
27     SR_MACHINE_ID_3    55     SR_LMemHiOff
28     SR_AFFINITY        56     SR_EqMask
32     SR_Tid             57     SR_LtMask
33     SR_Tid_X           58     SR_LeMask
34     SR_Tid_Y           59     SR_GtMask
35     SR_Tid_Z           60     SR_GeMask
36     SR_CTAParam        80     SR_ClockLo
37     SR_CTAid_X         81     SR_ClockHi
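
As an illustration of how the template and the table fit together, here is a minimal Python sketch that assembles an S2R opcode. It rests on assumptions that this page does not state explicitly: the template is read least-significant bit first, reg0 occupies bits 14 to 19 and SRName bits 26 to 33 (positions inferred from how the labels line up under the template). The helper names `template_to_int`, `sr_value` and `encode_s2r` are hypothetical and are not part of asfermi.

```python
# Sketch only: field positions and bit order are assumptions (see text above).
S2R_TEMPLATE = ("0010 000000 1110 000000 000000 00000000 "
                + "0" * 24 + " 110100")

# A few names copied from the table; extend as needed.
SR_NAMES = {"SR_LaneId": 0, "SR_Tid_X": 33, "SR_Tid_Y": 34, "SR_Tid_Z": 35,
            "SR_CTAid_X": 37, "SR_ClockLo": 80, "SR_ClockHi": 81}


def template_to_int(template):
    """Read a 64-character bit template whose first character is bit 0."""
    bits = template.replace(" ", "")
    assert len(bits) == 64, "template must describe exactly 64 bits"
    return sum(1 << i for i, ch in enumerate(bits) if ch == "1")


def sr_value(name):
    """Resolve a table name or the generic SRxxx form (xxx < 256)."""
    if name in SR_NAMES:
        return SR_NAMES[name]
    if name.startswith("SR") and name[2:].isdigit() and int(name[2:]) < 256:
        return int(name[2:])
    raise ValueError("unknown special register: " + name)


def encode_s2r(reg0, sr_name):
    """Place reg0 at bit 14 and the SRName value at bit 26 (assumed layout)."""
    if not 0 <= reg0 < 64:
        raise ValueError("reg0 must fit in 6 bits")
    return template_to_int(S2R_TEMPLATE) | (reg0 << 14) | (sr_value(sr_name) << 26)
```

Under these assumptions, `hex(encode_s2r(0, "SR_Tid_X"))` gives `0x2c00000084001c04`, and `encode_s2r(0, "SR33")` gives the same value because 33 is the table entry for SR_Tid_X.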

Related dimensions:

c [0x0] [0x8]: %ntid.x
c [0x0] [0xc]: %ntid.y
c [0x0] [0x10]: %ntid.z
c [0x0] [0x14]: %nctaid.x
c [0x0] [0x18]: %nctaid.y
c [0x0] [0x1c]: %nctaid.z
BFE VirtId, 0x914: %smid
BFE VirtCfg, 0x914: %nsmid
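
The two BFE lines can be read as a bit-field extract applied to the value of SR_VirtId or SR_VirtCfg. The sketch below assumes that the BFE operand packs the start bit in its low byte and the field length in the next byte, so that 0x914 means "9 bits starting at bit 20". That packing is an assumption and is not stated on this page; `bfe_u32` is a hypothetical helper name.

```python
def bfe_u32(value, packed):
    """Unsigned bit-field extract with an assumed (length << 8 | start) operand."""
    start = packed & 0xFF            # low byte: first bit of the field
    length = (packed >> 8) & 0xFF    # next byte: number of bits to keep
    value &= 0xFFFFFFFF              # treat the source as a 32-bit register
    return (value >> start) & ((1 << length) - 1)


def smid_from_virtid(virt_id):
    """%smid as suggested by 'BFE VirtId, 0x914' (assumed semantics)."""
    return bfe_u32(virt_id, 0x914)
```

With this reading, a VirtId value of `5 << 20` yields an %smid of 5, regardless of what the lower 20 bits contain.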

LEPC

Instruction usage:

LEPC reg0;

Template opcode:

0010 000000 1110 000000 000000 00000000000000000000000000000000 100010
                   reg0

CCTL

Instruction usage:

CCTL(.E)(.Op1).Op2 reg0, [reg1+0xabcd];

0xabcd should be a multiple of 4.

Template opcode:

1010 000000 1110 000000 000000   00 000000000000000000000000000000    0 11001
        mod        reg0   reg1 mod2                         0xabcd mod3
mod 2:4 value .Op2
0 QRY1
1 PF1
2 PF1_5
3 PR2
4 WB
5 IV
6 IVALL
7 RS
mod2 value .Op1
0 default
1 .U
2 .C
3 .I
mod3 meaning
0 default
1 .E
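
To show how the three tables map onto the modifier string in the usage line, here is a small Python sketch that splits a CCTL mnemonic into the values for mod 2:4, mod2 and mod3. It only uses the names and values listed above; the function name `parse_cctl` and the shape of its result are hypothetical, and it says nothing about where those values land in the opcode.

```python
# Values copied from the tables above.
OP2 = {"QRY1": 0, "PF1": 1, "PF1_5": 2, "PR2": 3,
       "WB": 4, "IV": 5, "IVALL": 6, "RS": 7}
OP1 = {"U": 1, "C": 2, "I": 3}   # no .Op1 modifier means 0 (default)


def parse_cctl(mnemonic):
    """Split CCTL(.E)(.Op1).Op2 into its three modifier values."""
    parts = mnemonic.split(".")
    if parts[0] != "CCTL":
        raise ValueError("not a CCTL mnemonic: " + mnemonic)
    mods = parts[1:]
    e = 0
    op1 = 0
    if mods and mods[0] == "E":       # optional .E sets mod3
        e = 1
        mods = mods[1:]
    if mods and mods[0] in OP1:       # optional .Op1 sets mod2
        op1 = OP1[mods[0]]
        mods = mods[1:]
    if len(mods) != 1 or mods[0] not in OP2:
        raise ValueError("missing or unknown .Op2 in: " + mnemonic)
    return {"mod_2_4": OP2[mods[0]], "mod2": op1, "mod3": e}
```

For example, `parse_cctl("CCTL.E.U.IV")` returns `{"mod_2_4": 5, "mod2": 1, "mod3": 1}`, while `parse_cctl("CCTL.WB")` leaves mod2 and mod3 at their defaults of 0.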

CCTLL

Instruction usage:

CCTLL.Op1 reg0, [reg1 + 0xabcd];

0xabcd can be at most 24 bits long and should be a multiple of 4. asfermi accepts values that are not multiples of 4 and writes them into the opcode unchanged, but cuobjdump ignores the lowest 2 bits, and the hardware's behaviour is not confirmed.
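
The offset rules above can be checked before assembling. The following Python sketch is an assumption-laden illustration, not asfermi's actual validation: it rejects offsets wider than 24 bits, treats negative offsets as unsupported (this page does not say how they are handled), and reports the offset with its lowest 2 bits cleared, which is how cuobjdump is described to read it. `check_cctll_offset` is a hypothetical name.

```python
def check_cctll_offset(offset):
    """Return (offset as cuobjdump would show it, whether it was 4-aligned)."""
    if not 0 <= offset < (1 << 24):
        raise ValueError("offset must be a non-negative value of at most 24 bits")
    shown = offset & ~0x3             # cuobjdump ignores the lowest 2 bits
    return shown, shown == offset
```

For example, `check_cctll_offset(0x13)` returns `(0x10, False)`, flagging an offset whose hardware behaviour is unconfirmed, while `check_cctll_offset(0x10)` returns `(0x10, True)`.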

Template opcode:

1010 000000 1110 000000 000000 000000000000000000000000 00000000 001011
        mod        reg0   reg1                   0xabcd 
mod 2:4 value .Op1
0 QRY1
1 PF1
2 PF1_5
3 PR2
4 WB
5 IV
6 IVALL
7 RS

PSETP

Instruction usage:

PSETP.(Mainop)(.Logicop) p0, p1, (!)p2, (!)p3, ((!)p4);

Template opcode:

0010 000000 1110 000 000 1110 00 0000   00 00000000000000000 0000 00000 110000
                  p1  p0   p4      p3 mod2                     p2  mod3
mod2 0:1 .Mainop
00 .AND
10 .OR
01 .XOR
11 invalid
mod3 3:4 .Logicop
00 .AND
10 .OR
01 .XOR
11 invalid
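
The bit strings in the two tables above are easy to misread. Assuming they follow the same least-significant-bit-first order as the template opcodes (an assumption; this page does not say so explicitly), 10 stands for the value 1 and 01 for the value 2. The Python sketch below makes that reading concrete; `field_value` and `psetp_modifiers` are hypothetical helper names.

```python
# Bit strings exactly as listed in the mod2 and mod3 tables.
LOGIC_BITS = {".AND": "00", ".OR": "10", ".XOR": "01"}


def field_value(bits):
    """Interpret a bit string whose first character is the lowest bit."""
    return sum(1 << i for i, ch in enumerate(bits) if ch == "1")


def psetp_modifiers(mainop=".AND", logicop=".AND"):
    """Numeric values for mod2 bits 0:1 and mod3 bits 3:4.

    Defaulting an omitted modifier to .AND is an assumption based on
    00 being the all-zero pattern.
    """
    for op in (mainop, logicop):
        if op not in LOGIC_BITS:
            raise ValueError("invalid modifier: " + op)
    return {"mod2_0_1": field_value(LOGIC_BITS[mainop]),
            "mod3_3_4": field_value(LOGIC_BITS[logicop])}
```

Under this reading, `psetp_modifiers(".OR", ".XOR")` returns `{"mod2_0_1": 1, "mod3_3_4": 2}`.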