OpcodeDataMovement - hyqneuron/asfermi GitHub Wiki
Note: round brackets, instead of square brackets, are used to represent an optional component. This is to avoid confusion with memory operands which are surrounded by square brackets.
Also, immea and composite operand may be mentioned in the parts below. Please visit these two links for more information regarding their meaning.
Data Movement Instructions
MOV
Instruction usage:
MOV(.S) reg0, composite operand;
If a hex number is used as the second operand, it cannot be longer than 20 bits.
Template opcode:
0010 011110 1110 000000 000000 0000000000000000000000 0000000000 010100
mod reg0 reg1 immea
mod 5 | meaning |
---|---|
0 | default |
1 | .S |
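The 20-bit limit on a hex second operand can be checked before assembling. The following is a minimal sketch: the helper name `fits_mov_hex` is hypothetical and not part of asfermi, and it assumes the operand is written as a non-negative hex literal.

```python
MOV_HEX_MAX_BITS = 20  # limit stated above for a hex second operand of MOV


def fits_mov_hex(literal: str) -> bool:
    """Return True if a hex literal such as '0xabcd' needs at most 20 bits.

    Hypothetical helper for illustration; asfermi's own check may differ.
    """
    value = int(literal, 16)  # base 16 accepts an optional '0x' prefix
    return value >= 0 and value.bit_length() <= MOV_HEX_MAX_BITS


print(fits_mov_hex("0xFFFFF"))   # all 20 bits set
print(fits_mov_hex("0x100000"))  # needs 21 bits
```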
MOV32I
Instruction usage:
MOV32I reg0, 0xabcd/1234;
Template opcode:
0100 011110 1110 000000 000000 00000000000000000000000000000000 011000
reg0 0xabcd
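The templates on this page are written as space-separated groups of bits. As a sanity check when transcribing one, the group widths can be listed and summed; Fermi instruction words are 64 bits wide, so the total should come to 64. This is only a sketch, and `template_widths` is a hypothetical helper, not an asfermi function.

```python
def template_widths(template: str) -> list[int]:
    """Return the width in bits of each space-separated group of a template."""
    groups = template.split()
    for group in groups:
        if set(group) - {"0", "1"}:
            raise ValueError(f"not a bit string: {group!r}")
    return [len(group) for group in groups]


# MOV32I template as transcribed on this page.
MOV32I = "0100 011110 1110 000000 000000 00000000000000000000000000000000 011000"
widths = template_widths(MOV32I)
print(widths, sum(widths))  # per-group widths and their total
```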
LD
Instruction usage:
LD(.E)(.cop)(.type) reg0, [reg1(+0xabcd)];
Template opcode:
1010 000100 1110 000000 000000 00000000000000000000000000000000 0 00001
mod reg0 reg1 0xabcd m
mod 0:1 | .cop |
---|---|
00 | default (.CA) |
10 | .CG |
01 | .CS |
11 | .CV |
mod 2:4 | .type |
---|---|
000 | .U8 |
100 | .S8 |
010 | .U16 |
110 | .S16 |
001 | default (.U32) |
101 | .64 |
011 | .128 |
m 0 | meaning |
---|---|
0 | default |
1 | .E |
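The three LD tables above amount to a lookup from modifier name to bit pattern. The sketch below records them as Python dictionaries and resolves a mnemonic to its patterns. The patterns are kept as strings exactly as the tables print them, so no bit ordering is assumed. The function name `ld_modifier_bits` is hypothetical, and accepting modifiers in any order, or an explicit `.CA` or `.U32`, is an assumption of this sketch rather than documented asfermi behaviour.

```python
# Bit patterns copied verbatim from the LD tables above (as strings, in the
# order the tables print them; no bit ordering is assumed here).
LD_COP = {"CA": "00", "CG": "10", "CS": "01", "CV": "11"}
LD_TYPE = {
    "U8": "000", "S8": "100", "U16": "010", "S16": "110",
    "U32": "001", "64": "101", "128": "011",
}


def ld_modifier_bits(mnemonic: str) -> dict[str, str]:
    """Map e.g. 'LD.E.CG.U8' to the table patterns for .cop, .type and m."""
    name, *mods = mnemonic.upper().split(".")
    if name != "LD":
        raise ValueError(f"not an LD instruction: {mnemonic!r}")
    # Defaults from the tables: .CA, .U32, and no .E.
    bits = {"cop": LD_COP["CA"], "type": LD_TYPE["U32"], "m": "0"}
    for mod in mods:
        if mod == "E":
            bits["m"] = "1"
        elif mod in LD_COP:
            bits["cop"] = LD_COP[mod]
        elif mod in LD_TYPE:
            bits["type"] = LD_TYPE[mod]
        else:
            raise ValueError(f"unknown LD modifier: .{mod}")
    return bits


print(ld_modifier_bits("LD"))
print(ld_modifier_bits("LD.E.CG.U8"))
```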
LDU
Load uniform. Same as ldu in PTX.
Instruction usage:
LDU(.E)(.type) reg0, [reg1(+0xabcd)];
Template opcode:
1010 000100 1110 000000 000000 00000000000000000000000000000000 0 10001
mod reg0 reg1 0xabcd m
.type is the same as in LD
m 0 | meaning |
---|---|
0 | default |
1 | .E |
LDL
Load from local memory. Same as ld.local in PTX.
Instruction usage:
LDL(.cop)(.type) reg0, [reg1(+0xabcd)];
Template opcode:
1010 000100 1110 000000 000000 000000000000000000000000 00000000 000011
mod reg0 reg1 0xabcd
.type is exactly the same as in LD. .cop is also the same, except that .CS is replaced with .LU.
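As a small illustration of that substitution, the LDL cache-operation table can be derived from the LD one. This sketch assumes that .LU takes over the bit pattern listed for .CS in the LD table, which the note implies but does not state outright; the dictionary names are hypothetical.

```python
# .cop patterns copied from the LD table on this page.
LD_COP = {"CA": "00", "CG": "10", "CS": "01", "CV": "11"}

# LDL: same patterns, with .CS renamed to .LU (assumed to keep the same bits).
LDL_COP = {("LU" if name == "CS" else name): bits for name, bits in LD_COP.items()}

print(LDL_COP)
```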
LDS
Load from shared memory. Same as ld.shared in PTX.
Instruction usage:
LDS(.type) reg0, [reg1(+0xabcd)];
Template opcode:
1010 000100 1110 000000 000000 000000000000000000000000 00000010 000011
mod reg0 reg1 0xabcd
.type is the same as in LD.
Note that 0xabcd is a 24-bit signed integer. By rights, its magnitude should not exceed 0xFFFF.
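The offset constraint can be sketched as a small check that also produces the 24-bit field value. Two's complement is an assumption here, since the note only says the field is signed, and `encode_lds_offset` is a hypothetical name, not an asfermi function.

```python
LDS_OFFSET_BITS = 24        # field width stated in the note
LDS_MAX_MAGNITUDE = 0xFFFF  # bound on the magnitude stated in the note


def encode_lds_offset(offset: int) -> int:
    """Return the offset as a 24-bit field value.

    Two's complement is an assumption; the page only says 'signed'.
    """
    if abs(offset) > LDS_MAX_MAGNITUDE:
        raise ValueError(f"offset magnitude exceeds 0xFFFF: {offset:#x}")
    return offset & ((1 << LDS_OFFSET_BITS) - 1)


print(hex(encode_lds_offset(0x10)))
print(hex(encode_lds_offset(-4)))
```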
LDC
Load from constant memory. Same as ld.const in PTX.
Instruction usage:
LDC(.type) reg0, c[0xa][0xbcde];
Template opcode:
0110 000100 1110 000000 000000 0000000000000000 00000 00000000000 101000
mod reg0 reg1 0xbcde 0xa
.type is the same as in LD.
While it appears that 0xa could have a maximum of 0x1f, for now asfermi does not allow any value beyond 0xf. 0xbcde must be less than 0x10000.
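These two limits translate directly into a range check on the `c[0xa][0xbcde]` operand. The sketch below uses the hypothetical names `check_ldc_operand`, `bank` (for 0xa) and `offset` (for 0xbcde), and additionally assumes both values are non-negative.

```python
def check_ldc_operand(bank: int, offset: int) -> None:
    """Raise ValueError if c[bank][offset] violates the limits stated above."""
    if not 0 <= bank <= 0xF:
        raise ValueError(f"bank {bank:#x} is beyond 0xf")
    if not 0 <= offset < 0x10000:
        raise ValueError(f"offset {offset:#x} is not less than 0x10000")


check_ldc_operand(0x2, 0x20)        # accepted
try:
    check_ldc_operand(0x10, 0x20)   # rejected: bank beyond 0xf
except ValueError as err:
    print(err)
```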
ST
Instruction usage:
ST(.E)(.cop)(.type) [reg1(+0xabcd)], reg0;
Template opcode:
1010 000100 1110 000000 000000 00000000000000000000000000000000 0 01001
mod reg0 reg1 0xabcd m
For various values of mod 2:4 (.type) and their meaning please refer to the LD instruction above.
mod 0:1 | .cop |
---|---|
00 | default (.WB) |
10 | .CG |
01 | .CS |
11 | .WT |
m 0 | meaning |
---|---|
0 | default |
1 | .E |
STL
Store to local memory. Same as st.local in PTX.
Instruction usage:
STL(.cop)(.type) [reg1(+0xabcd)], reg0;
Template opcode:
1010 000100 1110 000000 000000 000000000000000000000000 00000000 010011
mod reg0 reg1 0xabcd
The meaning of mod (.cop and .type) is exactly the same as in ST.
STS
Store to shared memory. Same as st.shared in PTX.
Instruction usage:
STS(.type) [reg1(+0xabcd)], reg0;
Template opcode:
1010 000100 1110 000000 000000 000000000000000000000000 00000010 010011
mod reg0 reg1 0xabcd
.type is the same as in LD.
Note that 0xabcd is a 24-bit signed integer. By rights, its magnitude should not exceed 0xFFFF.
LDLK
Instruction usage:
LDLK(.type) p, reg0, [reg1 + 0xabcd];
Template opcode:
1010 0001 00 1110 000000 000000 00000000000000000000000000000000 0 00101
mod p_0 reg0 reg1 0xabcd p_1
.type is the same as in LD.
p_0 and p_1 combined form p.
Note: LDLK ought to operate on global memory, which requires the extended addressing mode in a 64-bit environment. So far, however, LDLK does not seem to support the .E modifier, so it may not work in 64-bit environments. (TBC)
LDSLK
Instruction usage:
LDSLK(.type) p, reg0, [reg1 + 0xabcd];
Template opcode:
1010 000100 1110 000000 000000 000000000000000000000000 000 00000 100011
mod reg0 reg1 0xabcd p
.type is the same as in LD.
STUL
Instruction usage:
STUL(.type) [reg1+0xabcd], reg0;
Template opcode:
1010 000100 1110 000000 000000 00000000000000000000000000000000 010111
mod reg0 reg1 0xabcd
.type is the same as in LD.
Note: STUL ought to operate on global memory, which requires the extended addressing mode in a 64-bit environment. So far, however, STUL does not seem to support the .E modifier, so it may not work in 64-bit environments. (TBC)
STSUL
Instruction usage:
STSUL(.type) [reg1+0xabcd], reg0;
Template opcode:
1010 000100 1110 000000 000000 000000000000000000000000 00000000 110011
mod reg0 reg1 0xabcd
.type is the same as in LD.