VASM Unofficial Documentation - PPC64/hhvm GitHub Wiki
VASM stands for Virtual Assembler. It is a set of “machine-agnostic” instructions.
While a lot of assemblers might be concerned about moving values into appropriate registers and issuing type/size-specific cpu instructions, this form stops slightly short and focuses on the job at hand with arguments and addresses being relegated to “virtual” registers and instructions. For example, Vasm code to test the result value of a previous expression, and jump on a non-zero value might look like the following:
v << cmp{ inst->src(0), 0 };
v << jcc{ CC_E, {inst->next()} };
We don’t need to care, at this point, what register inst->src(0) refers to (e.g. rax, ebx ...), only that it is the same register which came from some prior instruction’s ->dest() (destination). Similarly, we’re not concerned with how equality is checked. On x86 this might be done by testing the Zero flag, on others it may be by checking the negation of a “Difference” flag. Lastly, we don’t need to know where the next instruction’s entry point is. It could be one byte ahead (meaning no jump need be performed at all), 64 bytes on (calling for a short jump), or at the other end of our memory model entirely (necessitating a long jump). The next and final step of compilation will take care of all of these details for us.
The instructions follow a suffix convention for X86:
- b 8-bit
- w 16-bit
- l 32-bit
- q 64-bit
- i immediate
- m Vptr
- p RIPRelativeRef
- s Smashable
However, ARM implements some specific VASM opcodes that do not seem to follow this convention. I strongly recommend following this convention if we need to implement vasm-specific opcodes. VASM-specific opcodes can be used internally. The VASM->HHIR relationship must be architecture independent. Some specific vasm opcodes can be implemented to replace inefficient computations.
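As a rough illustration of the suffix convention, the sketch below decodes the letters that follow a base mnemonic. The helper names `suffixMeaning` and `decodeSuffix` are made up for this page and are not part of HHVM; the caller has to supply the base name because letters such as m or s also appear inside base names like cmp or test.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Maps one suffix letter to its meaning, following the list above.
// Hypothetical helper, not an HHVM function.
inline std::string suffixMeaning(char c) {
  switch (c) {
    case 'b': return "8-bit";
    case 'w': return "16-bit";
    case 'l': return "32-bit";
    case 'q': return "64-bit";
    case 'i': return "immediate";
    case 'm': return "Vptr";
    case 'p': return "RIPRelativeRef";
    case 's': return "Smashable";
    default:  return "unknown";
  }
}

// Decodes the suffix letters that follow a known base mnemonic.
// Returns an empty list when `opcode` does not start with `base`.
inline std::vector<std::string> decodeSuffix(const std::string& base,
                                             const std::string& opcode) {
  std::vector<std::string> out;
  if (opcode.compare(0, base.size(), base) != 0) return out;
  for (std::size_t i = base.size(); i < opcode.size(); ++i) {
    out.push_back(suffixMeaning(opcode[i]));
  }
  return out;
}
```

For example, decoding cmpqim against the base cmp yields 64-bit, immediate, Vptr, which matches its signature (Immed s0; Vptr s1) further down this page.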
Vreg: Virtual Register (8, 16, 32, 64 bits)
Vreg64: Virtual Register (64)
Vtuple: Virtual Tuple Data Structure (a pair of Vreg)
Immed: Immediate value (sign-extended)
Immed64: Immediate value (64 bit extended)
VregSF: Flag register used by x64
Vptr: A pointer (memory reference) based on a Vreg. The overloaded operators [] and *() allow a register to be accessed like a pointer.
RIPRelativeRef: A reference relative to the RIP register, used by x64
Vlabel: Data structure used to wrap a block number.
Vpoint: A handle to record or retrieve a code address.
Fixup:
The Fixup map allows us to reconstruct the state of the VM registers (fp, sp, and pc) from an up-stack invocation record. Each range of bytes in the translation cache is associated with a "distance" in both stack cells and opcode bytes from the beginning of the function. These are known at translation time.
The way this works is by chasing the native rbp chain to find a rbp that we know is a VM frame (i.e. is actually a full ActRec). Once we find that, regsFromActRec is called, which looks to see if the return ip for the frame before the VM frame has an entry in the fixup map (i.e. if it points into the translation cache)---if so, it finds the fixup information in one of two ways:
- Fixup: the normal case. The Fixup record just stores an offset relative to the ActRec* for vmsp, and an offset from the start of the func for pc. In the case of resumable frames the sp offset is relative to Stack::resumableStackBase.
- IndirectFixup: this is used for some shared stubs in the TC. In this case, some JIT'd code associated with the ActRec* we found made a call to a shared stub, and then that stub called C++. The IndirectFixup record stores an offset to the saved frame pointer two levels deeper in C++, that says where the return IP for the call to the shared stub can be found. I.e., we're trying to chase back two return ips into the TC.
Note that this means IndirectFixups will not work for C++ code paths that need to do a fixup without making at least one other C++ call, but for the current use case this is fine.
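To make the normal Fixup case a bit more concrete, here is a minimal sketch of rebuilding the VM registers from the two stored offsets. Everything in it is an assumption made for illustration: the field names, the 16-byte cell size, and the idea that vmsp lies below the ActRec are not taken from the HHVM sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified record; the real Fixup layout may differ.
struct Fixup {
  int32_t pcOffset;  // bytecode offset from the start of the function
  int32_t spOffset;  // distance in stack cells between the ActRec and vmsp
};

struct VMRegs {
  uint64_t pc;
  uint64_t sp;
  uint64_t fp;
};

// Assumed size of one stack cell in bytes (illustrative value).
constexpr uint64_t kCellSize = 16;

// Rebuilds pc/sp/fp for a frame. Assumes the stack grows down, so vmsp is
// spOffset cells below the ActRec address.
inline VMRegs regsFromFixup(uint64_t actRec, uint64_t funcBase, Fixup f) {
  assert(f.pcOffset >= 0 && f.spOffset >= 0);
  VMRegs r;
  r.fp = actRec;
  r.pc = funcBase + static_cast<uint64_t>(f.pcOffset);
  r.sp = actRec - static_cast<uint64_t>(f.spOffset) * kCellSize;
  return r;
}
```

The point is only that both offsets are known at translation time, so no extra bookkeeping is needed at run time beyond finding the right ActRec.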
callfaststub { TCA target; Fixup fix; RegSet args; };
Call a "fast" stub, which is a stub that preserves more registers than a normal call. It may still call C++ functions on a slow path (which is why there's a Fixup operand) but it will save any required registers before doing so.
copy { Vreg s, d; };
copy2 { Vreg64 s0, s1, d0, d1; };
copyargs { Vtuple s, d; };
Copies the content of s (source) to d (dest). All copies happen in parallel, meaning operand order doesn't matter when a PhysReg appears as both a src and dst.
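The parallel semantics can be sketched as follows. This is a toy model with a register file stored in a vector, not how HHVM lowers copyargs: the idea is simply that every source is read before any destination is written, so a swap works without the caller providing a temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy register file: index = register number, value = register contents.
using RegFile = std::vector<long>;

// Performs regs[dsts[i]] = regs[srcs[i]] for all i "in parallel":
// all sources are read into a temporary before any destination is written.
inline void parallelCopy(RegFile& regs,
                         const std::vector<std::size_t>& srcs,
                         const std::vector<std::size_t>& dsts) {
  assert(srcs.size() == dsts.size());
  std::vector<long> tmp;
  tmp.reserve(srcs.size());
  for (std::size_t s : srcs) tmp.push_back(regs[s]);
  for (std::size_t i = 0; i < dsts.size(); ++i) regs[dsts[i]] = tmp[i];
}
```

With sources {0, 1} and destinations {1, 0} the two registers are exchanged, whereas two sequential copies would leave both holding the same value.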
debugtrap {};
Causes any attached debugger to trap. Process may abort if no debugger is attached.
fallthru {};
No-op, used for marking the end of a block that is intentionally going to fall-through. Only for use with Vauto.
ldimmb { Immed s; Vreg d; };
ldimml { Immed s; Vreg d; };
ldimmq { Immed64 s; Vreg d; };
ldimmqs { Immed64 s; Vreg d; }; // smashable version of ldimmq
Loads an immediate value without mutating status flags.
load { Vptr s; Vreg d; };
Loads the value pointed to by the Vptr s into register d.
mccall { CodeAddress target; RegSet args; };
mcprep { Vreg64 d; };
nothrow {};
phidef { Vtuple defs; };
phijmp { Vlabel target; Vtuple uses; };
phijcc { ConditionCode cc; VregSF sf; Vlabel targets[2]; Vtuple uses; };
store { Vreg s; Vptr d; };
syncpoint { Fixup fix; };
unwind { Vlabel targets[2]; };
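Appears to describe the exception-unwinding edges that follow a call (a normal successor and a catch block in targets[2]) rather than a loop-unrolling optimization (???)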
vcall { CppCall call; VcallArgsId args; Vtuple d; Fixup fixup; DestType destType; bool nothrow; };
Function call, without or with exception edges, respectively. Contains information about a C++ helper call needed for lowering to different target architectures.
vinvoke { CppCall call; VcallArgsId args; Vtuple d; Vlabel targets[2]; Fixup fixup; DestType destType; bool smashable; };
vcallstub { TCA target; RegSet args; Vtuple extraArgs; Vlabel targets[2]; };
Non-smashable PHP function call with exception edges and additional integer arguments. INCOMPLETE
landingpad { bool fromPHPCall; };
countbytecode { Vreg base; VregSF sf; };
defvmsp { Vreg d; };
Used when reentering translated code after an ABI boundary, such as the beginning of a tracelet or right after a bindcall.
syncvmsp { Vreg s; };
Used right before leaving translated code for an ABI boundary, such as bindjmp or fallbackcc.
srem { Vreg s0, s1, d; };
sar { Vreg s0, s1, d; VregSF sf; };
shl { Vreg s0, s1, d; VregSF sf; };
absdbl { Vreg s, d; };
vretm { Vptr retAddr; Vptr prevFp; Vreg d; RegSet args; };
vret { Vreg retAddr; RegSet args; };
vretm: Pushes retAddr, loads prevFp into d, and executes a ret instruction.
vret: Pushes retAddr and executes a ret instruction.
leavetc { RegSet args; };
Execute a ret instruction directly, returning to enterTCHelper.
These vasm opcodes are architecture independent. Some instructions, such as leap, are architecture dependent because they use RIP-relative addressing, which only exists on x64 machines.
The instructions below use AT&T syntax. AT&T syntax is the standard on Unix-like systems: the source comes before the destination, and a suffix indicates the size of the operand. The operand order for binary operations is defined below:
op s0 s1 d => d = s1 op s0
op imm s1 d => d = s1 op imm
cmp s0 s1 => s1 cmp s0
The fields of a vasm instruction can be: Immed (immediate), Vptr (memory reference), and Vreg (virtual register). Instructions with the i suffix are synthetic and are translated as:
mov s1, d
op s0, d
Where op is the operation.
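Because the operand order is easy to get backwards, here is a small sketch of the rules above using subtraction and comparison. The function names mirror the vasm opcodes but are plain C++ stand-ins written for this page, not HHVM code.

```cpp
#include <cstdint>

// op s0 s1 d  =>  d = s1 op s0
inline int64_t subq(int64_t s0, int64_t s1) {
  return s1 - s0;
}

// The immediate form written as the two-step sequence "mov s1, d; op s0, d".
inline int64_t subqi(int64_t imm, int64_t s1) {
  int64_t d = s1;  // mov s1, d
  d -= imm;        // sub imm, d
  return d;
}

// cmp s0 s1  =>  s1 cmp s0, so "less" means s1 < s0.
inline bool cmpLess(int64_t s0, int64_t s1) {
  return s1 < s0;
}
```

So subq with s0 = 3 and s1 = 10 gives 7, not -7.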
Compares s1 to s0 with result in sf (Flag register)
cmpb { Vreg8 s0; Vreg8 s1; VregSF sf; };
cmpbi { Immed s0; Vreg8 s1; VregSF sf; };
cmpbim { Immed s0; Vptr s1; VregSF sf; };
cmpl { Vreg32 s0; Vreg32 s1; VregSF sf; };
cmpli { Immed s0; Vreg32 s1; VregSF sf; };
cmplim { Immed s0; Vptr s1; VregSF sf; };
cmplm { Vreg32 s0; Vptr s1; VregSF sf; };
cmpq { Vreg64 s0; Vreg64 s1; VregSF sf; };
cmpqi { Immed s0; Vreg64 s1; VregSF sf; };
cmpqim { Immed s0; Vptr s1; VregSF sf; };
cmpqims { Immed s0; Vptr s1; VregSF sf; };
cmpqm { Vreg64 s0; Vptr s1; VregSF sf; };
cmpsd { ComparisonPred pred; VregDbl s0, s1, d; };
cqo {};
Computes the bit-wise logical AND of first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result. The result is then discarded.
testb { Vreg8 s0, s1; VregSF sf; };
testbi { Immed s0; Vreg8 s1; VregSF sf; };
testbim { Immed s0; Vptr s1; VregSF sf; };
testwim { Immed s0; Vptr s1; VregSF sf; };
testl { Vreg32 s0, s1; VregSF sf; };
testli { Immed s0; Vreg32 s1; VregSF sf; };
testlim { Immed s0; Vptr s1; VregSF sf; };
testq { Vreg64 s0, s1; VregSF sf; };
testqm { Vreg64 s0; Vptr s1; VregSF sf; };
testqim { Immed s0; Vptr s1; VregSF sf; };
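A minimal software model of the 64-bit test may help here. It is only a sketch of the flag rules described above (ZF, SF, PF); the struct and function names are invented for this page.

```cpp
#include <cstdint>

// Only the three flags mentioned in the description are modelled.
struct TestFlags {
  bool zf;  // result is zero
  bool sf;  // most significant bit of the result
  bool pf;  // low byte of the result has an even number of 1 bits
};

// Emulates testq: AND the operands, derive the flags, discard the result.
inline TestFlags testq(uint64_t s0, uint64_t s1) {
  const uint64_t r = s0 & s1;
  TestFlags f;
  f.zf = (r == 0);
  f.sf = (r >> 63) != 0;
  unsigned low = static_cast<unsigned>(r & 0xff);
  unsigned ones = 0;
  for (; low != 0; low >>= 1) ones += low & 1u;
  f.pf = (ones % 2 == 0);
  return f;
}
```

A common use is testing a register against itself: ZF is then set exactly when the register is zero.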
xorb { Vreg8 s0, s1, d; VregSF sf; };
xorbi { Immed s0; Vreg8 s1, d; VregSF sf; };
xorl { Vreg32 s0, s1, d; VregSF sf; };
xorq { Vreg64 s0, s1, d; VregSF sf; };
xorqi { Immed s0; Vreg64 s1, d; VregSF sf; };
andb { Vreg8 s0, s1, d; VregSF sf; };
andbi { Immed s0; Vreg8 s1, d; VregSF sf; };
andbim { Immed s; Vptr m; VregSF sf; };
andl { Vreg32 s0, s1, d; VregSF sf; };
andli { Immed s0; Vreg32 s1, d; VregSF sf; };
andq { Vreg64 s0, s1, d; VregSF sf; };
andqi { Immed s0; Vreg64 s1, d; VregSF sf; };
not { Vreg64 s, d; };
notb { Vreg8 s, d; };
orwim { Immed s0; Vptr m; VregSF sf; };
orq { Vreg64 s0, s1, d; VregSF sf; };
orqi { Immed s0; Vreg64 s1, d; VregSF sf; };
orqim { Immed s0; Vptr m; VregSF sf; };
neg { Vreg64 s, d; VregSF sf; };
shifts are att-style: s1<<{s0|ecx} => d,sf
psllq { Immed s0; VregDbl s1, d; };
psrlq { Immed s0; VregDbl s1, d; };
sarq { Vreg64 s, d; VregSF sf; }; // uses rcx
sarqi { Immed s0; Vreg64 s1, d; VregSF sf; };
shlli { Immed s0; Vreg32 s1, d; VregSF sf; };
shlq { Vreg64 s, d; VregSF sf; }; // uses rcx
shlqi { Immed s0; Vreg64 s1, d; VregSF sf; };
shrli { Immed s0; Vreg32 s1, d; VregSF sf; };
shrqi { Immed s0; Vreg64 s1, d; VregSF sf; };
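The difference between the arithmetic (sar) and logical (shr) right shifts is sketched below, with the shift count in s0 and the value in s1 as in the att-style note above. These are plain C++ stand-ins, not HHVM code; the arithmetic shift is written out by hand so it does not rely on how a compiler shifts negative signed values, and the final cast back to a signed type assumes the usual two's-complement behaviour.

```cpp
#include <cassert>
#include <cstdint>

// shrqi: logical right shift, zero-fills from the left.
inline uint64_t shrqi(unsigned n, uint64_t s1) {
  assert(n < 64);
  return s1 >> n;
}

// sarqi: arithmetic right shift, refills the vacated high bits with
// copies of the sign bit.
inline int64_t sarqi(unsigned n, int64_t s1) {
  assert(n < 64);
  uint64_t shifted = static_cast<uint64_t>(s1) >> n;
  if (s1 < 0 && n > 0) shifted |= ~uint64_t{0} << (64 - n);
  return static_cast<int64_t>(shifted);
}

// shlqi: left shift, zero-fills from the right.
inline uint64_t shlqi(unsigned n, uint64_t s1) {
  assert(n < 64);
  return s1 << n;
}
```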
Add computes the sum of two signed values (s0 and s1) and puts the result in d. VregSF is the status flags register.
addli { Immed s0; Vreg32 s1, d; VregSF sf; };
addlm { Vreg32 s0; Vptr m; VregSF sf; };
addq { Vreg64 s0, s1, d; VregSF sf; };
addqi { Immed s0; Vreg64 s1, d; VregSF sf; };
addqim { Immed s0; Vptr m; VregSF sf; };
Note: The instruction addqim may use a segment register:
if (i.m.seg == Vptr::FS) a->fs();
a->addq(i.s0, i.m.mr());
Here, the fs() function provides the prefix for the FS segment register.
subbi { Immed s0; Vreg8 s1, d; VregSF sf; };
subl { Vreg32 s0, s1, d; VregSF sf; };
subli { Immed s0; Vreg32 s1, d; VregSF sf; };
subq { Vreg64 s0, s1, d; VregSF sf; };
subqi { Immed s0; Vreg64 s1, d; VregSF sf; };
subsd { VregDbl s0, s1, d; };
Subtracts 1 from the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. The instruction’s 64-bit mode default operation size is 32 bits.
decl { Vreg32 s, d; VregSF sf; };
declm { Vptr m; VregSF sf; }; //Synthetic for memory reference
decq { Vreg64 s, d; VregSF sf; };
decqm { Vptr m; VregSF sf; }; //Synthetic for memory reference
idiv { Vreg64 s; VregSF sf; };
Note: Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude
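C++ integer division (since C++11) follows the same truncation rule, so the note can be checked with a tiny sketch. The `idivq` helper below is a stand-in for this page; it takes the dividend explicitly, whereas the x86 instruction reads it implicitly from RDX:RAX.

```cpp
#include <cassert>
#include <cstdint>

struct DivResult {
  int64_t quot;
  int64_t rem;
};

// Quotient truncated towards zero; the remainder takes the sign of the
// dividend and is smaller than the divisor in magnitude.
// Dividing by zero or INT64_MIN by -1 is not handled here.
inline DivResult idivq(int64_t dividend, int64_t divisor) {
  assert(divisor != 0);
  return DivResult{dividend / divisor, dividend % divisor};
}
```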
imul { Vreg64 s0, s1, d; VregSF sf; };
Adds 1 to the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. The instruction’s 64-bit mode default operation size is 32 bits.
incl { Vreg32 s, d; VregSF sf; };
inclm { Vptr m; VregSF sf; }; //Synthetic for memory reference
incq { Vreg64 s, d; VregSF sf; };
incqm { Vptr m; VregSF sf; }; //Synthetic for memory reference
incqmlock { Vptr m; VregSF sf; };
Uses LOCK prefix to allow the instruction to be executed atomically.
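In portable C++ the closest analogue is an atomic read-modify-write. The sketch below uses std::atomic as a stand-in; on x86-64, compilers typically implement such an increment with a LOCK-prefixed instruction, but that is a property of the compiler and target, not something this page verifies. The name `lockedInc` is made up.

```cpp
#include <atomic>
#include <cstdint>

// Atomically increments the 64-bit value at m and returns the new value.
inline uint64_t lockedInc(std::atomic<uint64_t>& m) {
  return m.fetch_add(1) + 1;
}
```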
incwm { Vptr m; VregSF sf; };
Increments a word in memory. This operation is promoted to 64 bits.
Computes the effective address of the second operand (the source operand) and stores it in the first operand (destination operand). The source operand is a memory address (offset part) specified with one of the processors addressing modes; the destination operand is a general-purpose register. The address-size and operand-size attributes affect the action performed by this instruction, as shown in the following table. The operand-size attribute of the instruction is determined by the chosen register; the address-size attribute is determined by the attribute of the code segment.
lea { Vptr s; Vreg64 d; };
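| Operand size | Address size | Prefix | Destination |
|---|---|---|---|
| 16 | 32 | 67H | Lower 16 bits (66H prefix) |
| 16 | 64 | Default | Lower 16 bits (66H prefix) |
| 32 | 32 | 67H | Lower 32 bits |
| 32 | 64 | Default | Lower 32 bits |
| 64 | 32 | 67H | 64 bits (zero-extended) |
| 64 | 64 | Default | 64 bits |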
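What lea computes can be sketched as plain address arithmetic. The `MemRef` struct below is a simplified, hypothetical stand-in for a memory reference of the form base + index * scale + displacement; it is not the real Vptr definition, and no memory is ever read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified memory reference.
struct MemRef {
  uint64_t base;
  uint64_t index;
  uint8_t scale;  // 1, 2, 4 or 8
  int32_t disp;   // signed displacement
};

// Computes the effective address only; nothing is dereferenced.
inline uint64_t lea(const MemRef& m) {
  assert(m.scale == 1 || m.scale == 2 || m.scale == 4 || m.scale == 8);
  const uint64_t disp = static_cast<uint64_t>(static_cast<int64_t>(m.disp));
  return m.base + m.index * m.scale + disp;
}
```

Because no memory access happens, lea is also commonly used as a cheap way to do small multiply-and-add computations.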
leap { RIPRelativeRef s; Vreg64 d; };
Same as lea, but uses a RIP-relative reference.
Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination register can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.
movb { Vreg8 s, d; };
movl { Vreg32 s, d; };
// Move zero-extended s to d.
movzbl { Vreg8 s; Vreg32 d; };
movzbq { Vreg8 s; Vreg64 d; };
// Move truncated s to d.
movtqb { Vreg64 s; Vreg8 d; };
movtql { Vreg64 s; Vreg32 d; };
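Zero-extension and truncation map directly onto C++ conversions between unsigned integer types, as in this short sketch. The function names mirror the opcodes but are stand-ins written for this page.

```cpp
#include <cstdint>

// Zero-extending moves: the upper bits of the destination become 0.
inline uint32_t movzbl(uint8_t s) { return static_cast<uint32_t>(s); }
inline uint64_t movzbq(uint8_t s) { return static_cast<uint64_t>(s); }

// Truncating moves: only the low bits of the source are kept.
inline uint8_t movtqb(uint64_t s) { return static_cast<uint8_t>(s); }
inline uint32_t movtql(uint64_t s) { return static_cast<uint32_t>(s); }
```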
Synthetics:
loadups { Vptr s; Vreg128 d; };
loadtqb { Vptr s; Vreg8 d; };
loadl { Vptr s; Vreg32 d; };
loadqp { RIPRelativeRef s; Vreg64 d; };
loadsd { Vptr s; VregDbl d; };
loadzbl { Vptr s; Vreg32 d; };
loadzbq { Vptr s; Vreg64 d; };
loadzlq { Vptr s; Vreg64 d; };
storeb { Vreg8 s; Vptr m; };
storebi { Immed s; Vptr m; };
storeups { Vreg128 s; Vptr m; };
storel { Vreg32 s; Vptr m; };
storeli { Immed s; Vptr m; };
storeqi { Immed s; Vptr m; };
storesd { VregDbl s; Vptr m; }; // Floating-point
storew { Vreg16 s; Vptr m; };
storewi { Immed s; Vptr m; };
cloadq { ConditionCode cc; VregSF sf; Vreg64 f; Vptr t; Vreg64 d; };
Implements the equivalent of:
t1 = load t
d = condition ? t1 : f
Note that t is unconditionally dereferenced.
This is translated roughly as:
loadq t, d
cmov f, d
If d is the base register of the Vptr t and f is not the destination register, we can't move f over d or we'll clobber the Vptr we need to load from. Since cload does the load unconditionally anyway, we can just load and cmov. If d is not used by the Vptr, we can just movq f, d and cmov.
cmovq { ConditionCode cc; VregSF sf; Vreg64 f, t, d; };
Implements d = condition ? t : f. Synthetic instruction.
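The semantics of cmovq and cloadq can be written down in a few lines of C++. This is a sketch of the behaviour described above, with the condition passed as a plain bool instead of a ConditionCode and a flags register; it is not the HHVM lowering.

```cpp
#include <cassert>
#include <cstdint>

// cmovq: d = condition ? t : f
inline int64_t cmovq(bool cond, int64_t f, int64_t t) {
  return cond ? t : f;
}

// cloadq: the load from t happens regardless of the condition,
// so t must always point to readable memory.
inline int64_t cloadq(bool cond, int64_t f, const int64_t* t) {
  assert(t != nullptr);
  const int64_t t1 = *t;  // unconditional dereference
  return cond ? t1 : f;
}
```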
Transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location. This instruction can be used to execute four different types of jumps:
• Near jump—A jump to an instruction within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment jump.
• Short jump—A near jump where the jump range is limited to –128 to +127 from the current EIP value.
• Far jump—A jump to an instruction located in a different segment than the current code segment but at the same privilege level, sometimes referred to as an intersegment jump.
• Task switch—A jump to an instruction located in a different task.
Near and Short Jumps. When executing a near jump, the processor jumps to the address (within the current code segment) that is specified with the target operand. The target operand specifies either an absolute offset (that is an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register). A near jump to a relative offset of 8-bits (rel8) is referred to as a short jump. The CS register is not changed on near and short jumps. An absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16 or r/m32). The operand-size attribute determines the size of the target operand (16 or 32 bits). Absolute offsets are loaded directly into the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits. A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed 8-, 16-, or 32-bit immediate value. This value is added to the value in the EIP register. (Here, the EIP register contains the address of the instruction following the JMP instruction). When using relative offsets, the opcode (for short vs. near jumps) and the operand-size attribute (for near relative jumps) determines the size of the target operand (8, 16, or 32 bits).
Far Jump In 64-Bit Mode — The instruction’s operation size is fixed at 64 bits. If a selector points to a gate, then RIP equals the 64-bit displacement taken from gate; else RIP equals the zero-extended offset from the far pointer referenced in the instruction.
jcc { ConditionCode cc; VregSF sf; Vlabel targets[2]; };
Note: The Jcc instruction does not support far jumps (jumps to other code segments). When the target for the conditional jump is in a different segment, use the opposite condition from the condition being tested for the Jcc instruction, and then access the target with an unconditional far jump (JMP instruction) to the other segment. For example, the following conditional far jump is illegal:
JZ FARLABEL;
To accomplish this far jump, use the following two instructions:
JNZ BEYOND;
JMP FARLABEL;
BEYOND:
Synthetics:
jcci { ConditionCode cc; VregSF sf; Vlabel target; TCA taken; };
jmp { Vlabel target; };
Synthetics:
jmpr { Vreg64 target; RegSet args; }; //Synthetic jmp
jmpm { Vptr target; RegSet args; }; //Synthetic jmp
jmpi { TCA target; RegSet args; }; //Synthetic jmp
Increments (pop) or decrements (push) the stack pointer (VmSp).
pop { Vreg64 d; };
push { Vreg64 s; };
Loads the value from the top of the stack to the location specified with the destination operand (or explicit opcode) and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register. Address and operand sizes are determined and used as follows:
• Address size. The D flag in the current code-segment descriptor determines the default address size; it may be overridden by an instruction prefix (67H). The address size is used only when writing to a destination operand in memory.
• Operand size. The D flag in the current code-segment descriptor determines the default operand size; it may be overridden by instruction prefixes (66H). The operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is incremented/decremented (2, 4 or 8).
• Stack-address size. Outside of 64-bit mode, the B flag in the current stack-segment descriptor determines the size of the stack pointer (16 or 32 bits); in 64-bit mode, the size of the stack pointer is always 64 bits. The stack-address size determines the width of the stack pointer when reading from the stack in memory and when incrementing the stack pointer. (As stated above, the amount by which the stack pointer is incremented/decremented is determined by the operand size.)
Synthetics:
popm { Vptr d; };
Saves procedure linking information on the stack and branches to the called procedure specified using the target operand. The target operand specifies the address of the first instruction in the called procedure. The operand can be an immediate value, a general-purpose register, or a memory location.
call { CodeAddress target; RegSet args; };
callm { Vptr target; RegSet args; }; //synthetic for call instruction
callr { Vreg64 target; RegSet args; }; //synthetic for call instruction
ARM only implements the callr vasm opcode. callr is translated to a blr instruction (branch and link register).
Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction. The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate. The RET instruction can be used to execute three different types of returns:
- Near return: A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.
- Far return: A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.
- Inter-privilege-level far return: A far return to a different privilege level than that of the currently executing program or procedure.
When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged. When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.
ret { RegSet args; };
ud2 {};
Convert with Truncation Scalar Double-Precision FP Value to Signed Integer (quad)
cvttsd2siq { VregDbl s; Vreg64 d; };
Convert Dword Integer to Scalar Double-Precision FP Value
cvtsi2sd { Vreg64 s; VregDbl d; };
Synthetic instruction using memory reference:
cvtsi2sdm { Vptr s; VregDbl d; };
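Both conversions have direct C++ counterparts, sketched below. The function names mirror the opcodes but are stand-ins; note that converting a NaN or an out-of-range double to an integer is undefined in C++, so this sketch only covers values that fit in 64 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Double to signed 64-bit integer, truncating towards zero.
// Only valid for finite values that fit in int64_t.
inline int64_t cvttsd2siq(double s) {
  assert(std::isfinite(s));
  return static_cast<int64_t>(s);
}

// Signed 64-bit integer to double.
inline double cvtsi2sd(int64_t s) {
  return static_cast<double>(s);
}
```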
nop {};
Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register.
setcc { ConditionCode cc; VregSF sf; Vreg8 d; };
addsd { VregDbl s0, s1, d; };
The instruction addsd adds the low double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the double-precision floating-point result in the destination operand. The source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. The high quadword [64..127] of the destination operand remains unchanged.
unpcklpd { VregDbl s0, s1; Vreg128 d; };
Performs an interleaved unpack of the low double-precision floating-point values from the source operand (second operand) and the destination operand (first operand). The source operand can be an XMM register or a 128-bit memory location; the destination operand is an XMM register.
Divides the low double-precision floating-point value in the first source operand by the low double-precision floating-point value in the second source operand, and stores the double-precision floating-point result in the destination operand. The second source operand can be an XMM register or a 64-bit memory location. The first source and destination operands are XMM registers. The high quadword of the destination operand is copied from the high quadword of the first source operand.
divsd { VregDbl s0, s1, d; };
mulsd { VregDbl s0, s1, d; };
Round the DP FP value in the lower qword of the source operand (second operand) using the rounding mode specified in the immediate operand (third operand) and place the result in the destination operand (first operand). The rounding process rounds a double-precision floating-point input to an integer value and returns the integer result as a double precision floating-point value in the lowest position. The upper double precision floating-point value in the destination is retained.
roundsd { RoundDirection dir; VregDbl s, d; };
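The key point, that the rounded integer value comes back as a double, can be sketched with the standard math library. The enum below is a local stand-in declared for this example and its enumerator names are assumptions; `std::nearbyint` follows the current floating-point rounding mode, which is round-to-nearest-even by default.

```cpp
#include <cmath>

// Local stand-in for the rounding direction; names are assumptions.
enum class RoundDirection { nearest, floor, ceil, truncate };

// Rounds s to an integral value and returns it as a double.
inline double roundsd(RoundDirection dir, double s) {
  switch (dir) {
    case RoundDirection::nearest:  return std::nearbyint(s);
    case RoundDirection::floor:    return std::floor(s);
    case RoundDirection::ceil:     return std::ceil(s);
    case RoundDirection::truncate: return std::trunc(s);
  }
  return s;  // not reached for valid enum values
}
```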
sqrtsd { VregDbl s, d; };
Performs an unordered compare of the double-precision floating-point values in the low quadwords of source operand 1 (first operand) and source operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0. Source operand 1 is an XMM register; source operand 2 can be an XMM register or a 64 bit memory location.
ucomisd { VregDbl s0, s1; VregSF sf; };
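The flag outcomes of an unordered compare can be modelled as below. The parameters follow the description above (source operand 1 compared against source operand 2), not the vasm s0/s1 field order, and the struct and function are stand-ins written for this page. OF, SF and AF are always cleared and are left out.

```cpp
#include <cmath>

struct UcomiFlags {
  bool zf;
  bool pf;
  bool cf;
};

// Compares op1 against op2 and returns the resulting ZF/PF/CF values.
inline UcomiFlags ucomisd(double op1, double op2) {
  if (std::isnan(op1) || std::isnan(op2)) return {true, true, true};  // unordered
  if (op1 > op2) return {false, false, false};                        // greater than
  if (op1 < op2) return {false, false, true};                         // less than
  return {true, false, false};                                        // equal
}
```

PF is what distinguishes the unordered case from a plain equal result, which is why code that cares about NaN checks the parity flag after this instruction.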
hcsync { Fixup fix; Vpoint call; };
hcnocatch { Vpoint call; };
hcunwind { Vpoint call; Vlabel targets[2]; };
brk { uint16_t code; };
hostcall { RegSet args; uint8_t argc; Vpoint syncpoint; };
This instruction is translated as a hlt instruction. The HLT instruction causes the processor to enter Debug state if Halting debug-mode is enabled.
cbcc { vixl::Condition cc; Vreg64 s; Vlabel targets[2]; };
Can be translated into cbz or cbnz: Compare and Branch on Zero, or Compare and Branch on Non-Zero.
tbcc { vixl::Condition cc; unsigned bit; Vreg64 s; Vlabel targets[2]; };
Can be translated into tbz or tbnz.
Test bit and branch if zero (or non-zero) to a label at a PC-relative offset,
without affecting the condition flags, and with a hint that this is not a
subroutine call or return.
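The branch conditions of the four ARM forms are simple enough to state as predicates. The helpers below are illustrative stand-ins (not vixl or HHVM functions) that return whether the branch would be taken; none of them touches the condition flags, which is the main attraction of these instructions.

```cpp
#include <cassert>
#include <cstdint>

// cbz / cbnz: branch if the whole register is zero / non-zero.
inline bool cbzTaken(uint64_t s) { return s == 0; }
inline bool cbnzTaken(uint64_t s) { return s != 0; }

// tbz / tbnz: branch if the selected bit is zero / non-zero.
inline bool tbzTaken(uint64_t s, unsigned bit) {
  assert(bit < 64);
  return ((s >> bit) & 1u) == 0;
}
inline bool tbnzTaken(uint64_t s, unsigned bit) {
  assert(bit < 64);
  return ((s >> bit) & 1u) != 0;
}
```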
This instruction is used by the alias LSL (register).
lslv { Vreg64 sl, sr, d; };
This instruction is used by the alias ASR (register).
asrv { Vreg64 sl, sr, d; };
Why not reuse imul?
mul { Vreg64 s0, s1, d; };