GcnInstrsVop3p - CLRX/CLRX-mirror GitHub Wiki

GCN ISA VOP3P instructions (GCN 1.4)

The VOP3P encoding is derived from the VOP3 encoding. Each instruction occupies two dwords and provides many modifiers that control its behaviour. This encoding was designed for packed half-precision floating-point and packed 16-bit integer arithmetic instructions.

List of fields for VOP3P encoding:

Bits	Name	Description
0-7	VDST	Vector destination operand
8-10	NEG_HI	Negation modifier for high part of source operands
11-13	OP_SEL	Operand selection for lower part
14,59,60	OP_SEL_HI	Operand selection for high part
15	CLAMP	CLAMP modifier
16-22	OPCODE	Operation code
23-31	ENCODING	Encoding type. Must be 0b110100111
32-40	SRC0	First (scalar or vector) source operand
41-49	SRC1	Second (scalar or vector) source operand
50-58	SRC2	Third (scalar or vector) source operand
61-63	NEG	Negation modifier for lower part of source operands
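As an illustration of the field layout, the following C sketch unpacks a VOP3P instruction held as one 64-bit value (first dword in bits 0-31, second dword in bits 32-63). The struct and function names (Vop3pFields, decode_vop3p, is_vop3p) are hypothetical and not part of CLRX. Note how OP_SEL_HI has to be reassembled from bit 14 and bits 59-60.

```c
#include <stdint.h>

/* Hypothetical container for the VOP3P fields listed in the table. */
typedef struct {
    unsigned vdst;      /* bits 0-7 */
    unsigned neg_hi;    /* bits 8-10 */
    unsigned op_sel;    /* bits 11-13 */
    unsigned op_sel_hi; /* bit 14 (SRC0), bits 59-60 (SRC1, SRC2) */
    unsigned clamp;     /* bit 15 */
    unsigned opcode;    /* bits 16-22 */
    unsigned encoding;  /* bits 23-31, must be 0b110100111 (0x1a7) */
    unsigned src0;      /* bits 32-40 */
    unsigned src1;      /* bits 41-49 */
    unsigned src2;      /* bits 50-58 */
    unsigned neg;       /* bits 61-63 */
} Vop3pFields;

/* Extract 'width' bits starting at bit 'lo'. */
static unsigned bits(uint64_t word, unsigned lo, unsigned width)
{
    return (unsigned)((word >> lo) & ((1ULL << width) - 1));
}

Vop3pFields decode_vop3p(uint64_t word)
{
    Vop3pFields f;
    f.vdst = bits(word, 0, 8);
    f.neg_hi = bits(word, 8, 3);
    f.op_sel = bits(word, 11, 3);
    /* OP_SEL_HI bit 0 (SRC0) is bit 14; bits 1-2 (SRC1, SRC2) are bits 59-60 */
    f.op_sel_hi = bits(word, 14, 1) | (bits(word, 59, 2) << 1);
    f.clamp = bits(word, 15, 1);
    f.opcode = bits(word, 16, 7);
    f.encoding = bits(word, 23, 9);
    f.src0 = bits(word, 32, 9);
    f.src1 = bits(word, 41, 9);
    f.src2 = bits(word, 50, 9);
    f.neg = bits(word, 61, 3);
    return f;
}

/* Nonzero when the ENCODING field marks a VOP3P instruction. */
int is_vop3p(uint64_t word)
{
    return bits(word, 23, 9) == 0x1a7;
}
```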

Typical syntax: INSTRUCTION VDST, SRC0, SRC1, SRC2 [MODIFIERS]

Modifiers:

CLAMP - clamps the destination floating-point value to the range 0.0-1.0; for integer instructions, saturates the result instead
-SRC - negates the floating-point value of the source operand. Applied after the ABS modifier.
OP_SEL:VALUE|[B0,...] - selects the operand half used for the lower part (0 - lower 16 bits, 1 - higher 16 bits)
OP_SEL_HI:VALUE|[B0,...] - selects the operand half used for the higher part (0 - lower 16 bits, 1 - higher 16 bits)
NEG - negates the floating-point value in the lower part.
NEG_HI - negates the floating-point value in the higher part.

The operand half-selection modifiers (OP_SEL and OP_SEL_HI) take a value with one bit per source operand, so the number of bits depends on the number of operands. A zero bit selects the lower 16 bits of the dword; a one bit selects the higher 16 bits. Example: op_sel:[0,1,1] selects the higher 16 bits of the second and third source operands. Bits of the OP_SEL and OP_SEL_HI fields:

OP_SEL Bit	OP_SEL_HI Bit	Operand	Description
11	14	SRC0	Choose part of SRC0 (first source operand)
12	59	SRC1	Choose part of SRC1 (second source operand)
13	60	SRC2	Choose part of SRC2 (third source operand)
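The per-operand selection can be sketched in C, assuming that the lower result half is computed from the source half chosen by the OP_SEL bit and the higher result half from the half chosen by the OP_SEL_HI bit. The function names are hypothetical. With B0 as the SRC0 bit, op_sel:[0,1,1] corresponds to the field value 0b110 (6). The per-instruction pseudocode below appears to show the default selection (OP_SEL=0, OP_SEL_HI with all bits set), where each result half reads the same half of its sources.

```c
#include <stdint.h>

/* sel = 0 picks bits 0-15 of the dword, sel = 1 picks bits 16-31. */
static uint16_t pick_half(uint32_t src, unsigned sel)
{
    return sel ? (uint16_t)(src >> 16) : (uint16_t)(src & 0xffffu);
}

/* Inputs seen by the lower and higher result halves for source 'index'
 * (0 = SRC0, 1 = SRC1, 2 = SRC2). */
void select_operand(uint32_t src, unsigned op_sel, unsigned op_sel_hi,
                    unsigned index, uint16_t *lo, uint16_t *hi)
{
    *lo = pick_half(src, (op_sel >> index) & 1u);    /* OP_SEL bit */
    *hi = pick_half(src, (op_sel_hi >> index) & 1u); /* OP_SEL_HI bit */
}
```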

Limitations for operands:

only one SGPR can be read by an instruction; multiple occurrences of the same SGPR are allowed
only one literal constant can be used, and only when no SGPR or M0 is used in the source operands
only SRC0 can hold LDS_DIRECT

List of the instructions by opcode:

Opcode	Mnemonic
0 (0x0)	V_PK_MAD_I16
1 (0x1)	V_PK_MUL_LO_U16
2 (0x2)	V_PK_ADD_I16
3 (0x3)	V_PK_SUB_I16
4 (0x4)	V_PK_LSHLREV_B16
5 (0x5)	V_PK_LSHRREV_B16
6 (0x6)	V_PK_ASHRREV_I16
7 (0x7)	V_PK_MAX_I16
8 (0x8)	V_PK_MIN_I16
9 (0x9)	V_PK_MAD_U16
10 (0xa)	V_PK_ADD_U16
11 (0xb)	V_PK_SUB_U16
12 (0xc)	V_PK_MAX_U16
13 (0xd)	V_PK_MIN_U16
14 (0xe)	V_PK_FMA_F16
15 (0xf)	V_PK_ADD_F16
16 (0x10)	V_PK_MUL_F16
17 (0x11)	V_PK_MIN_F16
18 (0x12)	V_PK_MAX_F16
32 (0x20)	V_MAD_MIX_F32
33 (0x21)	V_MAD_MIXLO_F16
34 (0x22)	V_MAD_MIXHI_F16

Instruction set

Alphabetically sorted instruction list:

V_MAD_MIX_F32

Opcode: 32 (0x20)
Syntax: V_MAD_MIX_F32 VDST, SRC0, SRC1, SRC2
Description: Multiply the FP value from SRC0 by the FP value from SRC1, add the FP value from SRC2, and store the single-precision result in VDST. OP_SEL and OP_SEL_HI control the type and location of each source:

OP_SEL	OP_SEL_HI	Meaning
0	0	FP32
1	0	FP32
0	1	FP16 in lower part
1	1	FP16 in higher part

For this instruction, NEG_HI acts as an absolute-value modifier instead of a negation modifier.

FLOAT getSource(UINT32 S, BYTE OP_SEL, BYTE OP_SEL_HI, BYTE SRCINDEX)
{
    BYTE mask = 1<<SRCINDEX
    if ((OP_SEL_HI&mask) == 0)
        return ASFLOAT(S)
    if ((OP_SEL&mask) == 0)
        return (FLOAT)ASHALF(S&0xffff)
    else
        return (FLOAT)ASHALF(S>>16)
}
FLOAT SS0 = getSource(SRC0, OP_SEL, OP_SEL_HI, 0)
FLOAT SS1 = getSource(SRC1, OP_SEL, OP_SEL_HI, 1)
FLOAT SS2 = getSource(SRC2, OP_SEL, OP_SEL_HI, 2)
FLOAT S0 = NEG_HI&1 ? ABS(SS0) : SS0
FLOAT S1 = NEG_HI&2 ? ABS(SS1) : SS1
FLOAT S2 = NEG_HI&4 ? ABS(SS2) : SS2
VDST = S0 * S1 + S2
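A C model of the source handling above may help; it is a sketch, not a reference implementation. It follows the type table (OP_SEL_HI bit clear: FP32; set: FP16 from the half chosen by the OP_SEL bit) and treats NEG_HI bits as ABS. NEG, CLAMP and the exact rounding of the hardware multiply-add are not modeled, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as an IEEE 754 single-precision value. */
static float as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Widen IEEE 754 binary16 bits to float; every half value is exact in float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t mant = h & 0x3ffu;
    if (exp == 0) {
        /* zero or subnormal: value is mant * 2^-24 */
        float f = (float)mant * (1.0f / 16777216.0f);
        return sign ? -f : f;
    }
    if (exp == 31)
        return as_float(sign | 0x7f800000u | (mant << 13)); /* infinity or NaN */
    return as_float(sign | ((exp + 112) << 23) | (mant << 13)); /* rebias 15 -> 127 */
}

/* ABS modifier: clear the sign bit. */
static float abs_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return as_float(bits & 0x7fffffffu);
}

/* OP_SEL_HI bit clear -> FP32; set -> FP16 from the half chosen by OP_SEL. */
static float mix_source(uint32_t src, unsigned op_sel, unsigned op_sel_hi, unsigned index)
{
    unsigned mask = 1u << index;
    if ((op_sel_hi & mask) == 0)
        return as_float(src);
    if ((op_sel & mask) == 0)
        return half_to_float((uint16_t)(src & 0xffffu));
    return half_to_float((uint16_t)(src >> 16));
}

/* Result bits of V_MAD_MIX_F32; a plain C multiply-add is used. */
uint32_t mad_mix_f32(const uint32_t src[3], unsigned op_sel, unsigned op_sel_hi,
                     unsigned neg_hi)
{
    float s[3];
    for (unsigned i = 0; i < 3; i++) {
        s[i] = mix_source(src[i], op_sel, op_sel_hi, i);
        if (neg_hi & (1u << i))
            s[i] = abs_float(s[i]);
    }
    float r = s[0] * s[1] + s[2];
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    return bits;
}
```

For example, SRC0 = FP32 2.0, SRC1 = FP16 3.0 in the higher half and SRC2 = FP16 0.5 in the lower half need OP_SEL = 0b010 and OP_SEL_HI = 0b110, giving 6.5.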

V_MAD_MIXLO_F16

Opcode: 33 (0x21)
Syntax: V_MAD_MIXLO_F16 VDST, SRC0, SRC1, SRC2
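Description: Multiply the FP value from SRC0 by the FP value from SRC1, add the FP value from SRC2, convert the result to half precision, and store it in the lower 16 bits of VDST; the higher 16 bits of VDST are preserved. OP_SEL and OP_SEL_HI control the type and location of each source: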

OP_SEL	OP_SEL_HI	Meaning
0	0	FP32
1	0	FP32
0	1	FP16 in lower part
1	1	FP16 in higher part

For this instruction, NEG_HI acts as an absolute-value modifier instead of a negation modifier.

FLOAT getSource(UINT32 S, BYTE OP_SEL, BYTE OP_SEL_HI, BYTE SRCINDEX)
{
    BYTE mask = 1<<SRCINDEX
    if ((OP_SEL_HI&mask) == 0)
        return ASFLOAT(S)
    if ((OP_SEL&mask) == 0)
        return (FLOAT)ASHALF(S&0xffff)
    else
        return (FLOAT)ASHALF(S>>16)
}
FLOAT SS0 = getSource(SRC0, OP_SEL, OP_SEL_HI, 0)
FLOAT SS1 = getSource(SRC1, OP_SEL, OP_SEL_HI, 1)
FLOAT SS2 = getSource(SRC2, OP_SEL, OP_SEL_HI, 2)
FLOAT S0 = NEG_HI&1 ? ABS(SS0) : SS0
FLOAT S1 = NEG_HI&2 ? ABS(SS1) : SS1
FLOAT S2 = NEG_HI&4 ? ABS(SS2) : SS2
VDST = (ASUINT32((HALF)(S0 * S1 + S2))&0xffff) | (VDST&0xffff0000)

V_MAD_MIXHI_F16

Opcode: 34 (0x22)
Syntax: V_MAD_MIXHI_F16 VDST, SRC0, SRC1, SRC2
Description: Multiply the FP value from SRC0 by the FP value from SRC1, add the FP value from SRC2, convert the result to half precision, and store it in the higher 16 bits of VDST; the lower 16 bits of VDST are preserved. OP_SEL and OP_SEL_HI control the type and location of each source:

OP_SEL	OP_SEL_HI	Meaning
0	0	FP32
1	0	FP32
0	1	FP16 in lower part
1	1	FP16 in higher part

For this instruction, NEG_HI acts as an absolute-value modifier instead of a negation modifier.

FLOAT getSource(UINT32 S, BYTE OP_SEL, BYTE OP_SEL_HI, BYTE SRCINDEX)
{
    BYTE mask = 1<<SRCINDEX
    if ((OP_SEL_HI&mask) == 0)
        return ASFLOAT(S)
    if ((OP_SEL&mask) == 0)
        return (FLOAT)ASHALF(S&0xffff)
    else
        return (FLOAT)ASHALF(S>>16)
}
FLOAT SS0 = getSource(SRC0, OP_SEL, OP_SEL_HI, 0)
FLOAT SS1 = getSource(SRC1, OP_SEL, OP_SEL_HI, 1)
FLOAT SS2 = getSource(SRC2, OP_SEL, OP_SEL_HI, 2)
FLOAT S0 = NEG_HI&1 ? ABS(SS0) : SS0
FLOAT S1 = NEG_HI&2 ? ABS(SS1) : SS1
FLOAT S2 = NEG_HI&4 ? ABS(SS2) : SS2
VDST = (ASUINT32((HALF)(S0 * S1 + S2))<<16) | (VDST&0xffff)
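V_MAD_MIXLO_F16 and V_MAD_MIXHI_F16 each write one half of VDST and keep the other. The C sketch below shows only that merge step; it takes FP16 result bits that are assumed to be already converted, because the float-to-half conversion is not modeled here. The function names are hypothetical.

```c
#include <stdint.h>

/* V_MAD_MIXLO_F16 store: result to bits 0-15, bits 16-31 of VDST kept. */
uint32_t merge_mixlo(uint32_t vdst, uint16_t half_result)
{
    return (vdst & 0xffff0000u) | half_result;
}

/* V_MAD_MIXHI_F16 store: result to bits 16-31, bits 0-15 of VDST kept. */
uint32_t merge_mixhi(uint32_t vdst, uint16_t half_result)
{
    return ((uint32_t)half_result << 16) | (vdst & 0xffffu);
}
```

Issuing V_MAD_MIXLO_F16 and then V_MAD_MIXHI_F16 with the same VDST is one way to pack two FP16 results into one register, which corresponds to merge_mixhi(merge_mixlo(v, lo), hi) in this model.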

V_PK_ADD_F16

Opcode: 15 (0xf)
Syntax: V_PK_ADD_F16 VDST, SRC0, SRC1
Description: Add the two 16-bit FP values from SRC0 to the corresponding 16-bit FP values from SRC1, and store the results in VDST.
Operation:

HALF S0_0 = ASHALF(SRC0&0xffff), S0_1 = ASHALF(SRC0>>16)
HALF S1_0 = ASHALF(SRC1&0xffff), S1_1 = ASHALF(SRC1>>16)
HALF temp0 = S0_0 + S1_0
HALF temp1 = S0_1 + S1_1
VDST = ASINT16(temp0) | (ASINT16(temp1)<<16)

V_PK_ADD_I16

Opcode: 2 (0x2)
Syntax: V_PK_ADD_I16 VDST, SRC0, SRC1
Description: Add the two 16-bit signed integers from SRC0 to the corresponding 16-bit signed integers from SRC1, and store the results in VDST. If the CLAMP modifier is supplied, the results are saturated to 16-bit signed values.
Operation:

INT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
INT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
INT32 temp0 = S0_0 + S1_0
INT32 temp1 = S0_1 + S1_1
if (CLAMP)
{
    temp0 = (UINT16)MAX(MIN(temp0, 32767), -32768)
    temp1 = (UINT16)MAX(MIN(temp1, 32767), -32768)
}
VDST = (temp0&0xffff) | (temp1<<16)
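A C sketch of V_PK_ADD_I16 with the default lane mapping (OP_SEL/OP_SEL_HI are not modeled; function names are hypothetical). Each lane is sign-extended, summed exactly in 32 bits, then either saturated (CLAMP) or wrapped to 16 bits.

```c
#include <stdint.h>

/* Lane 0 = bits 0-15, lane 1 = bits 16-31, returned sign-extended. */
static int32_t lane_i16(uint32_t src, int lane)
{
    int32_t v = (int32_t)((src >> (16 * lane)) & 0xffffu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Saturate to the signed 16-bit range [-32768, 32767]. */
static int32_t sat_i16(int32_t v)
{
    return v > 32767 ? 32767 : (v < -32768 ? -32768 : v);
}

uint32_t pk_add_i16(uint32_t src0, uint32_t src1, int clamp)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t sum = lane_i16(src0, lane) + lane_i16(src1, lane); /* exact */
        if (clamp)
            sum = sat_i16(sum);
        result |= ((uint32_t)sum & 0xffffu) << (16 * lane); /* wraps if not clamped */
    }
    return result;
}
```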

V_PK_ADD_U16

Opcode: 10 (0xa)
Syntax: V_PK_ADD_U16 VDST, SRC0, SRC1
Description: Add the two 16-bit unsigned integers from SRC0 to the corresponding 16-bit unsigned integers from SRC1, and store the results in VDST. If the CLAMP modifier is supplied, the results are saturated to 16-bit unsigned values.
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT32 temp0 = S0_0 + S1_0
UINT32 temp1 = S0_1 + S1_1
if (CLAMP)
{
    temp0 = MIN(temp0, 65535)
    temp1 = MIN(temp1, 65535)
}
VDST = (temp0&0xffff) | (temp1<<16)
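The unsigned variant can be sketched the same way (hypothetical function name, default lane mapping). A lane sum is at most 0x1fffe, so a 32-bit intermediate shows the carry that CLAMP turns into 65535.

```c
#include <stdint.h>

uint32_t pk_add_u16(uint32_t src0, uint32_t src1, int clamp)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        uint32_t a = (src0 >> (16 * lane)) & 0xffffu;
        uint32_t b = (src1 >> (16 * lane)) & 0xffffu;
        uint32_t sum = a + b;                  /* at most 0x1fffe */
        if (clamp && sum > 0xffffu)
            sum = 0xffffu;                     /* saturate to 65535 */
        result |= (sum & 0xffffu) << (16 * lane);
    }
    return result;
}
```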

V_PK_FMA_F16

Opcode: 14 (0xe)
Syntax: V_PK_FMA_F16 VDST, SRC0, SRC1, SRC2
Description: Perform two fused multiply-add operations (SRC0 * SRC1 + SRC2) on the corresponding 16-bit FP values from SRC0, SRC1 and SRC2, and store the results in VDST.
Operation:

HALF S0_0 = ASHALF(SRC0&0xffff), S0_1 = ASHALF(SRC0>>16)
HALF S1_0 = ASHALF(SRC1&0xffff), S1_1 = ASHALF(SRC1>>16)
HALF S2_0 = ASHALF(SRC2&0xffff), S2_1 = ASHALF(SRC2>>16)
HALF temp0 = FMA(S0_0, S1_0, S2_0)
HALF temp1 = FMA(S0_1, S1_1, S2_1)
VDST = ASINT16(temp0) | (ASINT16(temp1)<<16)

V_PK_ASHRREV_I16

Opcode: 6 (0x6)
Syntax: V_PK_ASHRREV_I16 VDST, SRC0, SRC1
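Description: Arithmetically shift right the two 16-bit signed values from SRC1 by the numbers of bits given in the corresponding halves of SRC0 (only the lowest 4 bits of each count are used), and store the results in VDST.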
Operation:

INT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
INT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = S1_0 >> (S0_0&15)
UINT16 temp1 = S1_1 >> (S0_1&15)
VDST = temp0 | (temp1<<16)
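A C sketch of V_PK_ASHRREV_I16 (hypothetical function names). Note the reversed operand order: shift counts come from SRC0, values from SRC1. Because right-shifting a negative int is implementation-defined in C, the sketch builds the arithmetic shift from shifts of non-negative values.

```c
#include <stdint.h>

/* Arithmetic right shift of a 16-bit lane value by 0-15 bits. */
static uint32_t ashr16(uint32_t value, unsigned shift)
{
    int32_t v = (int32_t)(value & 0xffffu);
    if (v & 0x8000)
        v -= 0x10000;                                   /* sign-extend */
    int32_t r = v < 0 ? ~(~v >> shift) : v >> shift;    /* ~v is non-negative */
    return (uint32_t)r & 0xffffu;
}

uint32_t pk_ashrrev_i16(uint32_t src0, uint32_t src1)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        unsigned shift = (src0 >> (16 * lane)) & 15u;   /* low 4 bits of the count */
        result |= ashr16(src1 >> (16 * lane), shift) << (16 * lane);
    }
    return result;
}
```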

V_PK_LSHLREV_B16

Opcode: 4 (0x4)
Syntax: V_PK_LSHLREV_B16 VDST, SRC0, SRC1
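Description: Shift left the two 16-bit values from SRC1 by the numbers of bits given in the corresponding halves of SRC0 (only the lowest 4 bits of each count are used), and store the results in VDST.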
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = S1_0 << (S0_0&15)
UINT16 temp1 = S1_1 << (S0_1&15)
VDST = temp0 | (temp1<<16)

V_PK_LSHRREV_B16

Opcode: 5 (0x5)
Syntax: V_PK_LSHRREV_B16 VDST, SRC0, SRC1
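Description: Logically shift right the two 16-bit values from SRC1 by the numbers of bits given in the corresponding halves of SRC0 (only the lowest 4 bits of each count are used), and store the results in VDST.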
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = S1_0 >> (S0_0&15)
UINT16 temp1 = S1_1 >> (S0_1&15)
VDST = temp0 | (temp1<<16)
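Both logical shifts, V_PK_LSHLREV_B16 and V_PK_LSHRREV_B16, can be sketched with one helper (function names are hypothetical). Bits shifted out of a 16-bit lane are discarded rather than carried into the other lane, and a count of 16 behaves like 0 because only 4 bits are used.

```c
#include <stdint.h>

static uint32_t pk_shift_b16(uint32_t src0, uint32_t src1, int left)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        unsigned shift = (src0 >> (16 * lane)) & 15u;   /* low 4 bits of the count */
        uint32_t value = (src1 >> (16 * lane)) & 0xffffu;
        uint32_t r = left ? (value << shift) : (value >> shift);
        result |= (r & 0xffffu) << (16 * lane);          /* drop bits leaving the lane */
    }
    return result;
}

uint32_t pk_lshlrev_b16(uint32_t src0, uint32_t src1) { return pk_shift_b16(src0, src1, 1); }
uint32_t pk_lshrrev_b16(uint32_t src0, uint32_t src1) { return pk_shift_b16(src0, src1, 0); }
```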

V_PK_MAD_I16

Opcode: 0 (0x0)
Syntax: V_PK_MAD_I16 VDST, SRC0, SRC1, SRC2
Description: Multiply the two 16-bit signed integers from SRC0 by the corresponding 16-bit signed integers from SRC1, add the corresponding 16-bit signed integers from SRC2, and store the results in VDST. If the CLAMP modifier is supplied, the results are saturated to 16-bit signed values.
Operation:

INT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
INT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
INT16 S2_0 = SRC2&0xffff, S2_1 = SRC2>>16
INT32 temp0 = S0_0 * S1_0 + S2_0
INT32 temp1 = S0_1 * S1_1 + S2_1
if (CLAMP)
{
    temp0 = (UINT16)MAX(MIN(temp0, 32767), -32768)
    temp1 = (UINT16)MAX(MIN(temp1, 32767), -32768)
}
VDST = (temp0&0xffff) | (temp1<<16)
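A C sketch of V_PK_MAD_I16 with the default lane mapping (hypothetical function names). The 32-bit intermediate cannot overflow, since |a*b| is at most 2^30, so saturation or wrapping is applied only at the end.

```c
#include <stdint.h>

/* Lane 0 = bits 0-15, lane 1 = bits 16-31, returned sign-extended. */
static int32_t lane_i16(uint32_t src, int lane)
{
    int32_t v = (int32_t)((src >> (16 * lane)) & 0xffffu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

static int32_t sat_i16(int32_t v)
{
    return v > 32767 ? 32767 : (v < -32768 ? -32768 : v);
}

uint32_t pk_mad_i16(uint32_t src0, uint32_t src1, uint32_t src2, int clamp)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t r = lane_i16(src0, lane) * lane_i16(src1, lane) + lane_i16(src2, lane);
        if (clamp)
            r = sat_i16(r);
        result |= ((uint32_t)r & 0xffffu) << (16 * lane);
    }
    return result;
}
```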

V_PK_MAD_U16

Opcode: 9 (0x9)
Syntax: V_PK_MAD_U16 VDST, SRC0, SRC1, SRC2
Description: Multiply the two 16-bit unsigned integers from SRC0 by the corresponding 16-bit unsigned integers from SRC1, add the corresponding 16-bit unsigned integers from SRC2, and store the results in VDST. If the CLAMP modifier is supplied, the results are saturated to 16-bit unsigned values.
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 S2_0 = SRC2&0xffff, S2_1 = SRC2>>16
UINT32 temp0 = S0_0 * S1_0 + S2_0
UINT32 temp1 = S0_1 * S1_1 + S2_1
if (CLAMP)
{
    temp0 = MIN(temp0, 65535)
    temp1 = MIN(temp1, 65535)
}
VDST = (temp0&0xffff) | (temp1<<16)
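A C sketch of V_PK_MAD_U16 (hypothetical function name, default lane mapping). Saturation is applied to the full 32-bit intermediate before it is truncated to 16 bits; truncating first would make the clamp a no-op. The largest intermediate, 0xffff * 0xffff + 0xffff = 0xffff0000, still fits in 32 bits.

```c
#include <stdint.h>

uint32_t pk_mad_u16(uint32_t src0, uint32_t src1, uint32_t src2, int clamp)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        uint32_t a = (src0 >> (16 * lane)) & 0xffffu;
        uint32_t b = (src1 >> (16 * lane)) & 0xffffu;
        uint32_t c = (src2 >> (16 * lane)) & 0xffffu;
        uint32_t r = a * b + c;               /* full intermediate, no overflow */
        if (clamp && r > 0xffffu)
            r = 0xffffu;                      /* saturate before truncation */
        result |= (r & 0xffffu) << (16 * lane);
    }
    return result;
}
```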

V_PK_MAX_F16

Opcode: 18 (0x12)
Syntax: V_PK_MAX_F16 VDST, SRC0, SRC1
Description: Choose the greater of each pair of corresponding 16-bit FP values from SRC0 and SRC1, and store the results in VDST.
Operation:

HALF S0_0 = ASHALF(SRC0&0xffff), S0_1 = ASHALF(SRC0>>16)
HALF S1_0 = ASHALF(SRC1&0xffff), S1_1 = ASHALF(SRC1>>16)
HALF temp0 = MAX(S0_0, S1_0)
HALF temp1 = MAX(S0_1, S1_1)
VDST = ASINT16(temp0) | (ASINT16(temp1)<<16)

V_PK_MAX_I16

Opcode: 7 (0x7)
Syntax: V_PK_MAX_I16 VDST, SRC0, SRC1
Description: Choose the greater of each pair of corresponding 16-bit signed integers from SRC0 and SRC1, and store the results in VDST.
Operation:

INT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
INT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = MAX(S0_0, S1_0)
UINT16 temp1 = MAX(S0_1, S1_1)
VDST = temp0 | (temp1<<16)
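The only difference between V_PK_MAX_I16 and V_PK_MAX_U16 is how each lane is compared. In this signed sketch (hypothetical function names), 0xffff is -1 and 0x8000 is the smallest value.

```c
#include <stdint.h>

/* Lane 0 = bits 0-15, lane 1 = bits 16-31, returned sign-extended. */
static int32_t lane_i16(uint32_t src, int lane)
{
    int32_t v = (int32_t)((src >> (16 * lane)) & 0xffffu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

uint32_t pk_max_i16(uint32_t src0, uint32_t src1)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t a = lane_i16(src0, lane), b = lane_i16(src1, lane);
        int32_t m = a > b ? a : b;                       /* signed comparison */
        result |= ((uint32_t)m & 0xffffu) << (16 * lane);
    }
    return result;
}
```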

V_PK_MAX_U16

Opcode: 12 (0xc)
Syntax: V_PK_MAX_U16 VDST, SRC0, SRC1
Description: Choose the greater of each pair of corresponding 16-bit unsigned integers from SRC0 and SRC1, and store the results in VDST.
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = MAX(S0_0, S1_0)
UINT16 temp1 = MAX(S0_1, S1_1)
VDST = temp0 | (temp1<<16)

V_PK_MIN_F16

Opcode: 17 (0x11)
Syntax: V_PK_MIN_F16 VDST, SRC0, SRC1
Description: Choose the smaller of each pair of corresponding 16-bit FP values from SRC0 and SRC1, and store the results in VDST.
Operation:

HALF S0_0 = ASHALF(SRC0&0xffff), S0_1 = ASHALF(SRC0>>16)
HALF S1_0 = ASHALF(SRC1&0xffff), S1_1 = ASHALF(SRC1>>16)
HALF temp0 = MIN(S0_0, S1_0)
HALF temp1 = MIN(S0_1, S1_1)
VDST = ASINT16(temp0) | (ASINT16(temp1)<<16)

V_PK_MIN_I16

Opcode: 8 (0x8)
Syntax: V_PK_MIN_I16 VDST, SRC0, SRC1
Description: Choose the smaller of each pair of corresponding 16-bit signed integers from SRC0 and SRC1, and store the results in VDST.
Operation:

INT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
INT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = MIN(S0_0, S1_0)
UINT16 temp1 = MIN(S0_1, S1_1)
VDST = temp0 | (temp1<<16)

V_PK_MIN_U16

Opcode: 13 (0xd)
Syntax: V_PK_MIN_U16 VDST, SRC0, SRC1
Description: Choose the smaller of each pair of corresponding 16-bit unsigned integers from SRC0 and SRC1, and store the results in VDST.
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = MIN(S0_0, S1_0)
UINT16 temp1 = MIN(S0_1, S1_1)
VDST = temp0 | (temp1<<16)
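For the unsigned min, each lane is compared as an unsigned value, so 0x8000 is larger than 0x7fff and 0xffff is 65535. A C sketch (hypothetical function name):

```c
#include <stdint.h>

uint32_t pk_min_u16(uint32_t src0, uint32_t src1)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        uint32_t a = (src0 >> (16 * lane)) & 0xffffu;
        uint32_t b = (src1 >> (16 * lane)) & 0xffffu;
        result |= (a < b ? a : b) << (16 * lane);   /* unsigned comparison */
    }
    return result;
}
```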

V_PK_MUL_F16

Opcode: 16 (0x10)
Syntax: V_PK_MUL_F16 VDST, SRC0, SRC1
Description: Multiply the two 16-bit FP values from SRC0 by the corresponding 16-bit FP values from SRC1, and store the results in VDST.
Operation:

HALF S0_0 = ASHALF(SRC0&0xffff), S0_1 = ASHALF(SRC0>>16)
HALF S1_0 = ASHALF(SRC1&0xffff), S1_1 = ASHALF(SRC1>>16)
HALF temp0 = S0_0 * S1_0
HALF temp1 = S0_1 * S1_1
VDST = ASINT16(temp0) | (ASINT16(temp1)<<16)

V_PK_MUL_LO_U16

Opcode: 1 (0x1)
Syntax: V_PK_MUL_LO_U16 VDST, SRC0, SRC1
Description: Multiply the two 16-bit unsigned integers from SRC0 by the corresponding 16-bit unsigned integers from SRC1, and store the lower 16 bits of each product in VDST.
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
UINT16 temp0 = S0_0 * S1_0
UINT16 temp1 = S0_1 * S1_1
VDST = temp0 | (temp1<<16)
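A C sketch of V_PK_MUL_LO_U16 (hypothetical function name): the full 32-bit product is formed per lane and only its lower 16 bits are kept.

```c
#include <stdint.h>

uint32_t pk_mul_lo_u16(uint32_t src0, uint32_t src1)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        uint32_t a = (src0 >> (16 * lane)) & 0xffffu;
        uint32_t b = (src1 >> (16 * lane)) & 0xffffu;
        uint32_t product = a * b;                      /* at most 0xfffe0001 */
        result |= (product & 0xffffu) << (16 * lane);  /* keep the low 16 bits */
    }
    return result;
}
```

Since the low 16 bits of a product do not depend on whether the inputs are read as signed or unsigned, this result also matches 16-bit signed multiplication modulo 2^16 (for example 0xffff * 0xffff gives 0x0001, like -1 * -1).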

V_PK_SUB_I16

Opcode: 3 (0x3)
Syntax: V_PK_SUB_I16 VDST, SRC0, SRC1
Description: Subtract the two 16-bit signed integers in SRC1 from the corresponding 16-bit signed integers in SRC0, and store the results in VDST. If the CLAMP modifier is supplied, the results are saturated to 16-bit signed values.
Operation:

INT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
INT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
INT32 temp0 = S0_0 - S1_0
INT32 temp1 = S0_1 - S1_1
if (CLAMP)
{
    temp0 = (UINT16)MAX(MIN(temp0, 32767), -32768)
    temp1 = (UINT16)MAX(MIN(temp1, 32767), -32768)
}
VDST = (temp0&0xffff) | (temp1<<16)
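A C sketch of V_PK_SUB_I16 with the default lane mapping (hypothetical function names), computing SRC0 - SRC1 per lane. Without CLAMP, -32768 - 1 wraps to 0x7fff; with CLAMP it stays at 0x8000.

```c
#include <stdint.h>

/* Lane 0 = bits 0-15, lane 1 = bits 16-31, returned sign-extended. */
static int32_t lane_i16(uint32_t src, int lane)
{
    int32_t v = (int32_t)((src >> (16 * lane)) & 0xffffu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

static int32_t sat_i16(int32_t v)
{
    return v > 32767 ? 32767 : (v < -32768 ? -32768 : v);
}

uint32_t pk_sub_i16(uint32_t src0, uint32_t src1, int clamp)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t diff = lane_i16(src0, lane) - lane_i16(src1, lane); /* exact */
        if (clamp)
            diff = sat_i16(diff);
        result |= ((uint32_t)diff & 0xffffu) << (16 * lane);
    }
    return result;
}
```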

V_PK_SUB_U16

Opcode: 11 (0xb)
Syntax: V_PK_SUB_U16 VDST, SRC0, SRC1
Description: Subtract the two 16-bit unsigned integers in SRC1 from the corresponding 16-bit unsigned integers in SRC0, and store the results in VDST. If the CLAMP modifier is supplied, the results are saturated to 16-bit unsigned values (negative results become 0).
Operation:

UINT16 S0_0 = SRC0&0xffff, S0_1 = SRC0>>16
UINT16 S1_0 = SRC1&0xffff, S1_1 = SRC1>>16
INT32 temp0 = S0_0 - S1_0
INT32 temp1 = S0_1 - S1_1
if (CLAMP)
{
    temp0 = MAX(temp0, 0)
    temp1 = MAX(temp1, 0)
}
VDST = (temp0&0xffff) | (temp1<<16)
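A C sketch of V_PK_SUB_U16 (hypothetical function name). The difference is computed as a signed 32-bit value so that an underflow is visible: CLAMP turns it into 0, otherwise it wraps modulo 2^16.

```c
#include <stdint.h>

uint32_t pk_sub_u16(uint32_t src0, uint32_t src1, int clamp)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t a = (int32_t)((src0 >> (16 * lane)) & 0xffffu);
        int32_t b = (int32_t)((src1 >> (16 * lane)) & 0xffffu);
        int32_t diff = a - b;                 /* negative on underflow */
        if (clamp && diff < 0)
            diff = 0;                         /* saturate to 0 */
        result |= ((uint32_t)diff & 0xffffu) << (16 * lane);
    }
    return result;
}
```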