Design of Vector Arrays in ASR - lcompilers/lpython GitHub Wiki

Current Approach

To represent SIMD arrays, we currently have Array_t with its physical type as SIMD. Now in the ASR Passes or Backends, we might need to do some special checking/handling for SIMD arrays. For example, if we have to skip the SIMD array processing in some pass (for example array_op pass), we do the following at (several) places wherever it is needed:

if (ASR::is_a<ASR::Array_t>(*ASRUtils::expr_type(x.m_target)) &&
    ASR::down_cast<ASR::Array_t>(ASRUtils::expr_type(x.m_target))->m_physical_type
        == ASR::array_physical_typeType::SIMDArray) {
    return;
}

A drawback in this approach is that we do not have a representation for binary operations on arrays at the ASR Level. Consider the following example where A, B and C are SIMD arrays of type reals:

    C = A + B

The ASR that gets generated for the above line is

(=
    (Var 2 c)
    (RealBinOp
        (Var 2 a)
        Add
        (Var 2 b)
        (Array
            (Real 4)
            [((IntegerConstant 1 (Integer 4))
            (IntegerConstant 8 (Integer 4)))]
            SIMDArray
        )
        ()
    )
    ()
)

The concern in the above generated ASR is that it has a RealBinOp node with two variables of type vector arrays. That is, the node RealBinOp does not have scalar real numbers as its children, but has VectorArray's of real numbers.

New Design

Similar to how we supported UnsignedIntegers at the ASR Level, we have the possibility of supporting Vector arrays by adding a new ttype_t, that is VectorArray_t. This new type will support only a single dimension that is in the order of power of 2, unlike the regular arrays that can be multi-dimensional. Along with the type VectorArray_t, we can add nodes for operations on this type like visit_VectorAdd(), visit_VectorSub() and other operations. These can also be collectively grouped under visit_VectorBinOp().

For cast operations between Arrays and VectorArrays we can add a cast type like ArrayToVectorArray or VectorArrayToArray in the ASR Cast_t node.

SIMD in WASM, LLVM and C

WASM

In wasm, in addition to the basic number types, there is a single 128 bit wide vector type representing different types of packed integer or floating-point data. It is represented as v128. The packed data can be interpreted as signed or unsigned integers, single or double precision floating-point values, or a single 128 bit type. The interpretation is determined by individual operations.

SIMD-Operations in Wasm have the following schema:

{interpretation}.{operation}

Where {interpretation} is how the v128 vector is interpreted by the {operation}. The interpretation, that is the prefix describes the shape of the operand, written 𝑡x𝑁 , and consisting of a packed numeric type 𝑡 and the number of lanes 𝑁 of that type. Instructions prefixed with v128 do not involve a specific interpretation, and treat the v128 as an i128 value or a vector of 128 individual bits.

v128.const i128
v128.vvunop
v128.vvbinop
v128.vvternop
v128.vvtestop
i8x16.shuffle laneidx 16
i8x16.swizzle
shape .splat
i8x16.extract_lane_sx laneidx | i16x8.extract_lane_sx laneidx
i32x4.extract_lane laneidx | i64x2.extract_lane laneidx
fshape .extract_lane laneidx
shape .replace_lane laneidx
i8x16.virelop | i16x8.virelop | i32x4.virelop
i64x2.eq | i64x2.ne | i64x2.lt_s | i64x2.gt_s | i64x2.le_s | i64x2.ge_s
fshape .vfrelop
ishape.viunop | i8x16.popcnt
i16x8.q15mulr_sat_s
i32x4.dot_i16x8_s
fshape .vfunop
ishape .vitestop
ishape .bitmask
i8x16.narrow_i16x8_sx | i16x8.narrow_i32x4_sx
i16x8.extend_half _i8x16_sx | i32x4.extend_half _i16x8_sx
i64x2.extend_half _i32x4_sx
ishape .vishiftop
ishape .vibinop
i8x16.viminmaxop | i16x8.viminmaxop | i32x4.viminmaxop
i8x16.visatbinop | i16x8.visatbinop
i16x8.mul | i32x4.mul | i64x2.mul
i8x16.avgr_u | i16x8.avgr_u
i16x8.extmul_half _i8x16_sx | i32x4.extmul_half _i16x8_sx | i64x2.extmul_half _i32x4_sx i16x8.extadd_pairwise_i8x16_sx | i32x4.extadd_pairwise_i16x8_sx
fshape .vfbinop
i32x4.trunc_sat_f32x4_sx | i32x4.trunc_sat_f64x2_sx _zero
f32x4.convert_i32x4_sx | f32x4.demote_f64x2_zero
f64x2.convert_low_i32x4_sx | f64x2.promote_low_f32x4
vvunop ::= not
vvbinop ::= and|andnot|or|xor
vvternop ::= bitselect
vvtestop ::= any_true
vitestop ::= all_true
virelop ::= eq|ne|lt_sx |gt_sx |le_sx |ge_sx
vfrelop ::= eq|ne|lt|gt|le|ge
viunop ::= abs | neg
vibinop ::= add | sub
viminmaxop ::= min_sx | max_sx
visatbinop ::= add_sat_sx | sub_sat_sx
vishiftop ::= shl | shr_sx
vfunop ::= abs|neg|sqrt|ceil|floor|trunc|nearest
vfbinop ::= add|sub|mul|div|min|max|pmin|pmax

The above vector instructions can be grouped into several subcategories:

  • Constants: return a static constant.
  • Unary Operations: consume one v128 operand and produce one v128 result.
  • Binary Operations: consume two v128 operands and produce one v128 result.
  • Ternary Operations: consume three v128 operands and produce one v128 result.
  • Tests: consume one v128 operand and produce a Boolean integer result.
  • Shifts: consume a v128 operand and a i32 operand, producing one v128 result.
  • Splats: consume a value of numeric type and produce a v128 result of a specified shape.
  • Extract lanes: consume a v128 operand and return the numeric value in a given lane.
  • Replace lanes: consume a v128 operand and a numeric value for a given lane, and produce a v128 result.

LLVM

LLVM supports a vector type that is a simple derived type that represents a vector of elements. Vector types are used when multiple primitive data are operated in parallel using a single instruction (SIMD). A vector type requires a size (number of elements), an underlying primitive data type, and a scalable property to represent vectors where the exact hardware vector length is unknown at compile time.

< <# elements> x <elementtype> >          ; Fixed-length vector
< vscale x <# elements> x <elementtype> > ; Scalable vector
Syntax Description
<4 x i32> Vector of 4 32-bit integer values.
<8 x float> Vector of 8 32-bit floating-point values.
<2 x i64> Vector of 2 64-bit integer values.
<4 x i64*> Vector of 4 pointers to 64-bit integer values.
<vscale x 4 x i32> Vector with a multiple of 4 32-bit integer values.

LLVM supports several instructions to represent vector operations in a target-independent manner. These instructions cover the element-access and vector-specific operations needed to process vectors effectively.

  • extractelement Instruction: It extracts a single scalar element from a vector at a specified index.
<result> = extractelement <n x <ty>> <val>, <ty2> <idx>  ; yields <ty>
<result> = extractelement <vscale x n x <ty>> <val>, <ty2> <idx> ; yields <ty>
  • insertelement Instruction: It inserts a scalar element into a vector at a specified index.
<result> = insertelement <n x <ty>> <val>, <ty> <elt>, <ty2> <idx>    ; yields <n x <ty>>
<result> = insertelement <vscale x n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <vscale x n x <ty>>
  • shufflevector Instruction: It constructs a permutation of elements from two input vectors, returning a vector with the same element type as the input and length that is the same as the shuffle mask.
<result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask>    ; yields <m x <ty>>
<result> = shufflevector <vscale x n x <ty>> <v1>, <vscale x n x <ty>> v2, <vscale x m x i32> <mask>  ; yields <vscale x m x <ty>>

C

Documentation

⚠️ **GitHub.com Fallback** ⚠️