VASM Instruction Set Summary

1. Introduction

VASM stands for Virtual Assembler. It is a set of “agnostic machine code” instructions, summarized in this document.

While a lot of assemblers might be concerned about moving values into appropriate registers and issuing type/size-specific cpu instructions, this form stops slightly short and focuses on the job at hand with arguments and addresses being relegated to “virtual” registers and instructions. For example, Vasm code to test the result value of a previous expression, and jump on a non-zero value might look like the following:

v << cmp{ inst->src(0), 0 };

v << jcc{ CC_E, {inst->next()} };

We don’t need to care, at this point, what register inst->src(0) refers to (e.g. rax, ebx ...), only that it is the same register which came from some prior instruction’s ->dest() (destination). Similarly, we’re not concerned with how equality is checked. On x86 this might be done by testing the Zero flag, on others it may be by checking the negation of a “Difference” flag. Lastly, we don’t need to know where the next instruction’s entry point is at. It could be one byte ahead (meaning no jump need be performed at all), 64 bytes on (calling for a short jump), or at the other end of our memory model entirely (necessitating a long jump). The next, and final step of compilation will take care of all of these details for us.

Conventions

The instructions follow a suffix convention for X86:

b 8-bit
w 16-bit
l 32-bit
q 64-bit
i immediate
m Vptr
p RIPRelativeRef
s Smashable

However, the ARM backend implements some specific VASM opcodes that do not seem to follow this convention. We strongly recommend following this convention when new VASM-specific opcodes need to be implemented. VASM-specific opcodes can be used internally. The VASM->HHIR relationship must be architecture independent. Some specific VASM opcodes can be implemented to replace inefficient computations.
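As a reading aid, the suffix table above can be written down as a small lookup. This is a minimal sketch: describeSuffix is a hypothetical helper invented for illustration, not something that exists in HHVM.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of HHVM): maps one suffix letter from the
// convention table to its meaning. Any other character yields "unknown".
inline std::string describeSuffix(char c) {
  switch (c) {
    case 'b': return "8-bit";
    case 'w': return "16-bit";
    case 'l': return "32-bit";
    case 'q': return "64-bit";
    case 'i': return "immediate";
    case 'm': return "Vptr";
    case 'p': return "RIPRelativeRef";
    case 's': return "Smashable";
    default:  return "unknown";
  }
}
```

Read this way, the trailing letters of cmpqim decode as q (64-bit), i (immediate) and m (Vptr): a 64-bit compare of an immediate against a memory operand.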

Instruction Fields

Vreg: Virtual Register (8, 16, 32, 64 bits)

Vreg64: Virtual Register (64)

Vtuple: Virtual Tuple Data Structure (a pair of Vreg)

Immed: Immediate value (sign-extended)

Immed64: Immediate value (64-bit)

VregSF: Status flags register used by x64

Vptr: A memory reference based on a Vreg. It lets a register be used like a pointer through the overloaded operators [] and *.

RIPRelativeRef: A RIP-relative memory reference, used by x64

Vlabel: Data structure used to wrap a block number.

Vpoint: A handle used to record or retrieve a code address.

Fixup:

The Fixup map allows us to reconstruct the state of the VM registers (fp, sp, and pc) from an up-stack invocation record. Each range of bytes in the translation cache is associated with a "distance" in both stack cells and opcode bytes from the beginning of the function. These are known at translation time.

The way this works is by chasing the native rbp chain to find a rbp that we know is a VM frame (i.e. is actually a full ActRec). Once we find that, regsFromActRec is called, which looks to see if the return ip for the frame before the VM frame has an entry in the fixup map (i.e. if it points into the translation cache)---if so, it finds the fixup information in one of two ways:

Fixup: the normal case.

The Fixup record just stores an offset relative to the ActRec* for vmsp, and an offset from the start of the func for pc. In the case of resumable frames the sp offset is relative to Stack::resumableStackBase.

IndirectFixup: this is used for some shared stubs in the TC.

In this case, some JIT'd code associated with the ActRec* we found made a call to a shared stub, and then that stub called C++. The IndirectFixup record stores an offset to the saved frame pointer two levels deeper in C++, that says where the return IP for the call to the shared stub can be found. I.e., we're trying to chase back two return ips into the TC.

Note that this means IndirectFixups will not work for C++ code paths that need to do a fixup without making at least one other C++ call, but for the current use case this is fine.
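The lookup described above (return IP in, offsets out) can be pictured with a small map. This is only a sketch under stated assumptions: FixupSketch and FixupMapSketch are hypothetical names, the field layout is simplified, and the IndirectFixup case is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for a Fixup record: an offset from the
// start of the function for pc, and an offset relative to the ActRec for sp.
struct FixupSketch {
  int32_t pcOffset;
  int32_t spOffset;
};

// Hypothetical map keyed by a return IP inside the translation cache.
class FixupMapSketch {
 public:
  void record(uintptr_t returnIp, FixupSketch fixup) { m_[returnIp] = fixup; }

  // Returns the fixup recorded for returnIp, or nullptr when the address has
  // no entry (for example, when it does not point into the translation cache).
  const FixupSketch* find(uintptr_t returnIp) const {
    auto it = m_.find(returnIp);
    return it == m_.end() ? nullptr : &it->second;
  }

 private:
  std::map<uintptr_t, FixupSketch> m_;
};
```

A caller walking the rbp chain would pass each candidate return IP to find() and stop at the first one that has an entry.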

2. VASM Intrinsic Instructions

Call Fast Stub

callfaststub { TCA target; Fixup fix; RegSet args; };

Call a "fast" stub, which is a stub that preserves more registers than a normal call. It may still call C++ functions on a slow path (which is why there's a Fixup operand) but it will save any required registers before doing so.

Copy

copy { Vreg s, d; };

copy2 { Vreg64 s0, s1, d0, d1; };

copyargs { Vtuple s, d; };

Copies the content of s (source) to d (dest). All copies happen in parallel, meaning operand order doesn't matter when a PhysReg appears as both a src and dst.
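The parallel-copy rule matters most when a register is both a source and a destination. The sketch below models copy2 on a toy register file; RegFile and copy2Sketch are illustrative names, not HHVM code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative register file; indices stand in for registers.
using RegFile = std::array<uint64_t, 4>;

// Sketch of copy2 semantics: both sources are read before either destination
// is written, so the two copies behave as if they happened simultaneously.
inline void copy2Sketch(RegFile& r, int s0, int s1, int d0, int d1) {
  const uint64_t v0 = r[s0];
  const uint64_t v1 = r[s1];
  r[d0] = v0;
  r[d1] = v1;
}
```

With s0 = d1 and s1 = d0 this swaps the two registers, which two sequential copies could not do without a temporary.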

Debug Trap

debugtrap {};

Causes any attached debugger to trap. Process may abort if no debugger is attached.

Fallthrough

fallthru {};

No-op, used for marking the end of a block that is intentionally going to fall-through. Only for use with Vauto.

Load Intrinsics Instructions

ldimmb { Immed s; Vreg d; };

ldimml { Immed s; Vreg d; };

ldimmq { Immed64 s; Vreg d; };

ldimmqs { Immed64 s; Vreg d; }; // smashable version of ldimmq

Loads an immediate value without mutating status flags.

load { Vptr s; Vreg d; };

Loads the value pointed to by the Vptr s into register d.

MC Generator Call

mccall { CodeAddress target; RegSet args; };

mcprep { Vreg64 d; };

nothrow {};

phidef { Vtuple defs; };

phijmp { Vlabel target; Vtuple uses; };

phijcc { ConditionCode cc; VregSF sf; Vlabel targets[2]; Vtuple uses; };

store { Vreg s; Vptr d; };

syncpoint { Fixup fix; };

Unwind

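unwind { Vlabel targets[2]; };

Marks the exception edges of the preceding call (this is not a loop-unrolling optimization): targets[0] is the normal successor and targets[1] is the block taken when unwinding.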

Virtual Call

vcall { CppCall call; VcallArgsId args; Vtuple d; Fixup fixup; DestType destType; bool nothrow; };

Function call without exception edges; vinvoke is the counterpart with exception edges. Each contains information about a C++ helper call needed for lowering to different target architectures.

Virtual Invoke

vinvoke { CppCall call; VcallArgsId args; Vtuple d; Vlabel targets[2]; Fixup fixup; DestType destType; bool smashable; };

Virtual Call Stub

vcallstub { TCA target; RegSet args; Vtuple extraArgs; Vlabel targets[2]; };

Non-smashable PHP function call with exception edges and additional integer arguments. INCOMPLETE

landingpad { bool fromPHPCall; };

countbytecode { Vreg base; VregSF sf; };

defvmsp { Vreg d; };

Copy rVmSp into d. Used when reentering translated code after an ABI boundary, such as the beginning of a tracelet or right after a bindcall.

Sync Virtual Stack

syncvmsp { Vreg s; };

Copy s into rVmSp. Used right before leaving translated code for an ABI boundary, such as bindjmp or fallbackcc.

srem { Vreg s0, s1, d; };

sar { Vreg s0, s1, d; VregSF sf; };

shl { Vreg s0, s1, d; VregSF sf; };

absdbl { Vreg s, d; };

Virtual Ret Instruction

vretm { Vptr retAddr; Vptr prevFp; Vreg d; RegSet args; };

vret { Vreg retAddr; RegSet args; };

vretm: Pushes retAddr, loads prevFp into d, and executes a ret instruction.

vret: Pushes retAddr and executes a ret instruction.

Leave TC

leavetc { RegSet args; };

Execute a ret instruction directly, returning to enterTCHelper.

3. X64 Vasm instructions

These vasm opcodes are meant to be architecture independent. Some instructions, such as leap, are architecture dependent because they use RIP-relative addressing, which only exists on x64 machines.

The instructions below use AT&T syntax. AT&T syntax is the standard on Unix-like systems: the source comes before the destination, and a suffix indicates the size of the operand. Operand order for binary operations is defined below:

op s0 s1 d => d = s1 op s0

op imm s1 d => d = s1 op imm

cmp s0 s1 => s1 cmp s0
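The operand order is easiest to see with a non-commutative operation. The following sketch models the rules above for subtraction and comparison on plain integers; subqSketch and cmpqSketch are illustrative names and ignore the status flags.

```cpp
#include <cassert>
#include <cstdint>

// op s0 s1 d  =>  d = s1 op s0, shown here for subtraction.
inline int64_t subqSketch(int64_t s0, int64_t s1) {
  return s1 - s0;
}

// cmp s0 s1  =>  s1 cmp s0. Returns -1, 0 or 1 when s1 is less than,
// equal to, or greater than s0.
inline int cmpqSketch(int64_t s0, int64_t s1) {
  return (s1 > s0) - (s1 < s0);
}
```

So subtracting with s0 = 3 and s1 = 10 yields 7, not -7.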

A vasm instruction field can hold an Immed (immediate), a Vptr (memory reference), or a Vreg (virtual register). Instructions with the i suffix are synthetic and are translated as:

mov s1, d

op s0, d

Where op is the operation.

3.1 Logical Instructions

Compare Instructions

Compares s1 to s0 with result in sf (Flag register)

cmpb { Vreg8 s0; Vreg8 s1; VregSF sf; };

cmpbi { Immed s0; Vreg8 s1; VregSF sf; };

cmpbim { Immed s0; Vptr s1; VregSF sf; };

cmpl { Vreg32 s0; Vreg32 s1; VregSF sf; };

cmpli { Immed s0; Vreg32 s1; VregSF sf; };

cmplim { Immed s0; Vptr s1; VregSF sf; };

cmplm { Vreg32 s0; Vptr s1; VregSF sf; };

cmpq { Vreg64 s0; Vreg64 s1; VregSF sf; };

cmpqi { Immed s0; Vreg64 s1; VregSF sf; };

cmpqim { Immed s0; Vptr s1; VregSF sf; };

cmpqims { Immed s0; Vptr s1; VregSF sf; };

cmpqm { Vreg64 s0; Vptr s1; VregSF sf; };

cmpsd { ComparisonPred pred; VregDbl s0, s1, d; };

cqo {};

Logical Compare

Computes the bit-wise logical AND of first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result. The result is then discarded.

testb { Vreg8 s0, s1; VregSF sf; };

testbi { Immed s0; Vreg8 s1; VregSF sf; };

testbim { Immed s0; Vptr s1; VregSF sf; };

testwim { Immed s0; Vptr s1; VregSF sf; };

testl { Vreg32 s0, s1; VregSF sf; };

testli { Immed s0; Vreg32 s1; VregSF sf; };

testlim { Immed s0; Vptr s1; VregSF sf; };

testq { Vreg64 s0, s1; VregSF sf; };

testqm { Vreg64 s0; Vptr s1; VregSF sf; };

testqim { Immed s0; Vptr s1; VregSF sf; };
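The flag behaviour of the test family can be sketched for the 8-bit case as follows. TestFlags and testbSketch are hypothetical names; only the three flags named above are modeled, and the AND result itself is discarded as in the real instruction.

```cpp
#include <cassert>
#include <cstdint>

// The three status flags the description mentions.
struct TestFlags {
  bool zf;  // result is zero
  bool sf;  // most significant bit of the result is set
  bool pf;  // low byte of the result has an even number of 1 bits
};

// Sketch of testb: AND the operands, derive the flags, drop the result.
inline TestFlags testbSketch(uint8_t s0, uint8_t s1) {
  const uint8_t r = static_cast<uint8_t>(s0 & s1);
  int ones = 0;
  for (int bit = 0; bit < 8; ++bit) ones += (r >> bit) & 1;
  TestFlags f;
  f.zf = (r == 0);
  f.sf = (r & 0x80) != 0;
  f.pf = (ones % 2 == 0);
  return f;
}
```

A common use is testing a register against itself: the zero flag is then set exactly when the register holds 0.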

XOR Instructions

xorb { Vreg8 s0, s1, d; VregSF sf; };

xorbi { Immed s0; Vreg8 s1, d; VregSF sf; };

xorl { Vreg32 s0, s1, d; VregSF sf; };

xorq { Vreg64 s0, s1, d; VregSF sf; };

xorqi { Immed s0; Vreg64 s1, d; VregSF sf; };

And Instructions

andb { Vreg8 s0, s1, d; VregSF sf; };

andbi { Immed s0; Vreg8 s1, d; VregSF sf; };

andbim { Immed s; Vptr m; VregSF sf; };

andl { Vreg32 s0, s1, d; VregSF sf; };

andli { Immed s0; Vreg32 s1, d; VregSF sf; };

andq { Vreg64 s0, s1, d; VregSF sf; };

andqi { Immed s0; Vreg64 s1, d; VregSF sf; };

Not Instructions

not { Vreg64 s, d; };

notb { Vreg8 s, d; };

Or Instructions

orwim { Immed s0; Vptr m; VregSF sf; };

orq { Vreg64 s0, s1, d; VregSF sf; };

orqi { Immed s0; Vreg64 s1, d; VregSF sf; };

orqim { Immed s0; Vptr m; VregSF sf; };

Negate Instruction

neg { Vreg64 s, d; VregSF sf; };

Shift Instructions

shifts are att-style: s1<<{s0|ecx} => d,sf

psllq { Immed s0; VregDbl s1, d; };

psrlq { Immed s0; VregDbl s1, d; };

sarq { Vreg64 s, d; VregSF sf; }; // uses rcx

sarqi { Immed s0; Vreg64 s1, d; VregSF sf; };

shlli { Immed s0; Vreg32 s1, d; VregSF sf; };

shlq { Vreg64 s, d; VregSF sf; }; // uses rcx

shlqi { Immed s0; Vreg64 s1, d; VregSF sf; };

shrli { Immed s0; Vreg32 s1, d; VregSF sf; };

shrqi { Immed s0; Vreg64 s1, d; VregSF sf; };

3.2 Arithmetic Instructions

Add instructions

Add computes the sum of two values (s0 and s1) and puts the result in d. The sf field receives the status flags.

addli { Immed s0; Vreg32 s1, d; VregSF sf; };

addlm { Vreg32 s0; Vptr m; VregSF sf; };

addq { Vreg64 s0, s1, d; VregSF sf; };

addqi { Immed s0; Vreg64 s1, d; VregSF sf; };

addqim { Immed s0; Vptr m; VregSF sf; };

Note: The instruction addqim may use a segment register. When the Vptr refers to the FS segment, the prefix is emitted before the add:

if (i.m.seg == Vptr::FS) a->fs();

a->addq(i.s0, i.m.mr());

Here the fs() function returns the prefix for the segment register.

Subtraction

subbi { Immed s0; Vreg8 s1, d; VregSF sf; };

subl { Vreg32 s0, s1, d; VregSF sf; };

subli { Immed s0; Vreg32 s1, d; VregSF sf; };

subq { Vreg64 s0, s1, d; VregSF sf; };

subqi { Immed s0; Vreg64 s1, d; VregSF sf; };

subsd { VregDbl s0, s1, d; };

Decrement instructions

Subtracts 1 from the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. The instruction’s 64-bit mode default operation size is 32 bits.

decl { Vreg32 s, d; VregSF sf; };

declm { Vptr m; VregSF sf; }; //Synthetic for memory reference

decq { Vreg64 s, d; VregSF sf; };

decqm { Vptr m; VregSF sf; }; //Synthetic for memory reference

Signed Divide

idiv { Vreg64 s; VregSF sf; };

Note: Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude
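C++ integer division follows the same truncate-toward-zero rule, so the note can be illustrated directly. This sketch is a simplification: the hardware instruction divides the 128-bit value in RDX:RAX, while idivSketch (a hypothetical name) takes a 64-bit dividend and assumes a non-zero divisor and no overflow.

```cpp
#include <cassert>
#include <cstdint>

struct DivResult {
  int64_t quot;
  int64_t rem;
};

// Simplified model of signed division: the quotient is truncated toward 0
// and the remainder takes the sign of the dividend.
// Precondition: divisor != 0 and the quotient is representable.
inline DivResult idivSketch(int64_t dividend, int64_t divisor) {
  DivResult r;
  r.quot = dividend / divisor;
  r.rem = dividend % divisor;
  return r;
}
```

Dividing -7 by 2 gives a quotient of -3 and a remainder of -1, rather than -4 and 1 as floor division would.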

Signed Multiplication

imul { Vreg64 s0, s1, d; VregSF sf; };

Increment Instructions

Adds 1 to the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. The instruction’s 64-bit mode default operation size is 32 bits.

incl { Vreg32 s, d; VregSF sf; };

inclm { Vptr m; VregSF sf; }; //Synthetic for memory reference

incq { Vreg64 s, d; VregSF sf; };

incqm { Vptr m; VregSF sf; }; //Synthetic for memory reference

incqmlock { Vptr m; VregSF sf; };

Uses LOCK prefix to allow the instruction to be executed atomically.

incwm { Vptr m; VregSF sf; };

Increments a word in memory. This operation is promoted to 64 bits.

3.4 Load/Store Instructions

Load Effective Address

Computes the effective address of the second operand (the source operand) and stores it in the first operand (destination operand). The source operand is a memory address (offset part) specified with one of the processors addressing modes; the destination operand is a general-purpose register. The address-size and operand-size attributes affect the action performed by this instruction, as shown in the following table. The operand-size attribute of the instruction is determined by the chosen register; the address-size attribute is determined by the attribute of the code segment.

lea { Vptr s; Vreg64 d; };

OPERAND SIZE   ADDRESS SIZE   PREFIX    DEST.
16             32             67H       LOWER 16-bit (66H prefix)
16             64             DEFAULT   LOWER 16-bit (66H prefix)
32             32             67H       LOWER 32-bit
32             64             DEFAULT   LOWER 32-bit
64             32             67H       64-bit (zero-extended)
64             64             DEFAULT   64-bit

leap { RIPRelativeRef s; Vreg64 d; };

Same as lea, but uses RIP-relative addressing.

Move Instructions

Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination register can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.

movb { Vreg8 s, d; };

movl { Vreg32 s, d; };

// Move zero-extended s to d.

movzbl { Vreg8 s; Vreg32 d; };

movzbq { Vreg8 s; Vreg64 d; };

// Move truncated s to d.

movtqb { Vreg64 s; Vreg8 d; };

movtql { Vreg64 s; Vreg32 d; };
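Zero extension and truncation correspond to ordinary unsigned integer conversions. The sketch below mirrors movzbl, movzbq, movtqb and movtql on plain values; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Zero extension: the upper bits of the wider destination become 0.
inline uint32_t movzblSketch(uint8_t s) { return static_cast<uint32_t>(s); }
inline uint64_t movzbqSketch(uint8_t s) { return static_cast<uint64_t>(s); }

// Truncation: only the low bits of the source survive.
inline uint8_t movtqbSketch(uint64_t s) { return static_cast<uint8_t>(s); }
inline uint32_t movtqlSketch(uint64_t s) { return static_cast<uint32_t>(s); }
```

For example, truncating 0x1234 to a byte keeps 0x34, and zero-extending 0xFF to 32 bits gives 0x000000FF.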

Synthetics:

loadups { Vptr s; Vreg128 d; };

loadtqb { Vptr s; Vreg8 d; };

loadl { Vptr s; Vreg32 d; };

loadqp { RIPRelativeRef s; Vreg64 d; };

loadsd { Vptr s; VregDbl d; };

loadzbl { Vptr s; Vreg32 d; };

loadzbq { Vptr s; Vreg64 d; };

loadzlq { Vptr s; Vreg64 d; };

storeb { Vreg8 s; Vptr m; };

storebi { Immed s; Vptr m; };

storeups { Vreg128 s; Vptr m; };

storel { Vreg32 s; Vptr m; };

storeli { Immed s; Vptr m; };

storeqi { Immed s; Vptr m; };

storesd { VregDbl s; Vptr m; }; // Floating-point

storew { Vreg16 s; Vptr m; };

storewi { Immed s; Vptr m; };

Conditional Load

cloadq { ConditionCode cc; VregSF sf; Vreg64 f; Vptr t; Vreg64 d; };

Implements the equivalent of:

t1 = load t

d = condition ? t1 : f

Note that t is unconditionally dereferenced.

This is translated roughly as:

loadq t, d

cmov f, d

If d is the base register of the Vptr t and f is not the destination register, we can't move f over d, or we would clobber the Vptr we need to load from. Since cload does the load unconditionally anyway, we can just load and cmov. If d is different from the base register of the Vptr, we can just movq f, d and then cmov.
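The value semantics of cloadq can be sketched on plain C++ values. cloadqSketch is a hypothetical name, and the sketch says nothing about register allocation; it only shows that the load always happens and the condition merely selects the result.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of cloadq: t is dereferenced unconditionally, then the condition
// picks between the loaded value and the fallback f.
// Precondition: t must point to readable memory even when condition is false.
inline uint64_t cloadqSketch(bool condition, uint64_t f, const uint64_t* t) {
  const uint64_t t1 = *t;  // unconditional load
  return condition ? t1 : f;
}
```

Because of the unconditional dereference, cloadq cannot be used to guard a load from a possibly invalid address.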

Move conditionally

cmovq { ConditionCode cc; VregSF sf; Vreg64 f, t, d; };

Implements d = condition ? t : f. Synthetic instruction.

3.5 Branch Instructions

Transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location. This instruction can be used to execute four different types of jumps:

• Near jump—A jump to an instruction within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment jump.

• Short jump—A near jump where the jump range is limited to –128 to +127 from the current EIP value.

• Far jump—A jump to an instruction located in a different segment than the current code segment but at the same privilege level, sometimes referred to as an intersegment jump.

• Task switch—A jump to an instruction located in a different task.

Near and Short Jumps. When executing a near jump, the processor jumps to the address (within the current code segment) that is specified with the target operand. The target operand specifies either an absolute offset (that is an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register). A near jump to a relative offset of 8-bits (rel8) is referred to as a short jump. The CS register is not changed on near and short jumps. An absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16 or r/m32). The operand-size attribute determines the size of the target operand (16 or 32 bits). Absolute offsets are loaded directly into the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits. A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed 8-, 16-, or 32-bit immediate value. This value is added to the value in the EIP register. (Here, the EIP register contains the address of the instruction following the JMP instruction). When using relative offsets, the opcode (for short vs. near jumps) and the operand-size attribute (for near relative jumps) determines the size of the target operand (8, 16, or 32 bits).

Far Jump In 64-Bit Mode — The instruction’s operation size is fixed at 64 bits. If a selector points to a gate, then RIP equals the 64-bit displacement taken from gate; else RIP equals the zero-extended offset from the far pointer referenced in the instruction.

Jump if Condition Is Met

jcc { ConditionCode cc; VregSF sf; Vlabel targets[2]; };

Note: The Jcc instruction does not support far jumps (jumps to other code segments). When the target for the conditional jump is in a different segment, use the opposite condition from the condition being tested for the Jcc instruction, and then access the target with an unconditional far jump (JMP instruction) to the other segment. For example, the following conditional far jump is illegal:

JZ FARLABEL;

To accomplish this far jump, use the following two instructions:

JNZ BEYOND;

JMP FARLABEL;

BEYOND:

Synthetics:

jcci { ConditionCode cc; VregSF sf; Vlabel target; TCA taken; };

Jump

jmp { Vlabel target; };

Synthetics:

jmpr { Vreg64 target; RegSet args; }; //Synthetic jmp

jmpm { Vptr target; RegSet args; }; //Synthetic jmp

jmpi { TCA target; RegSet args; }; //Synthetic jmp

3.6 Stack Instructions

Push/Pop instructions

Increment (pop) or decrement (push) the stack pointer (VmSp).

pop { Vreg64 d; };

push { Vreg64 s; };

Loads the value from the top of the stack to the location specified with the destination operand (or explicit opcode) and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register. Address and operand sizes are determined and used as follows:

• Address size. The D flag in the current code-segment descriptor determines the default address size; it may be overridden by an instruction prefix (67H). The address size is used only when writing to a destination operand in memory.

• Operand size. The D flag in the current code-segment descriptor determines the default operand size; it may be overridden by instruction prefixes (66H). The operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is incremented/decremented (2, 4 or 8).

• Stack-address size. Outside of 64-bit mode, the B flag in the current stack-segment descriptor determines the size of the stack pointer (16 or 32 bits); in 64-bit mode, the size of the stack pointer is always 64 bits. The stack-address size determines the width of the stack pointer when reading from the stack in memory and when incrementing the stack pointer. (As stated above, the amount by which the stack pointer is incremented/decremented is determined by the operand size.)

Synthetics:

popm { Vptr d; };

3.7 Function Calling Instructions

Call instructions

Saves procedure linking information on the stack and branches to the called procedure specified using the target operand. The target operand specifies the address of the first instruction in the called procedure. The operand can be an immediate value, a general-purpose register, or a memory location.

call { CodeAddress target; RegSet args; };

callm { Vptr target; RegSet args; }; //synthetic for call instruction

callr { Vreg64 target; RegSet args; }; //synthetic for call instruction

ARM only implements the callr vasm instruction. callr is translated to a blr instruction (branch and link register).

Return from Procedure

Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction. The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate. The RET instruction can be used to execute three different types of returns:

• Near return: A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.

• Far return: A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.

• Inter-privilege-level far return: A far return to a different privilege level than that of the currently executing program or procedure.

When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged. When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.

ret { RegSet args; };

Other Instructions

Undefined Instruction

ud2 {};

Instructions for 64 bit conversion.

Convert with Truncation Scalar Double-Precision FP Value to Signed Integer (quad)

cvttsd2siq { VregDbl s; Vreg64 d; };

Convert Dword Integer to Scalar Double-Precision FP Value

cvtsi2sd { Vreg64 s; VregDbl d; };

Synthetic instruction using memory reference:

cvtsi2sdm { Vptr s; VregDbl d; };

No-Operation Instruction

nop {};

Set Byte on Condition

Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register.

setcc { ConditionCode cc; VregSF sf; Vreg8 d; };

3.8 Floating-Point Instructions

Add Scalar Double

addsd { VregDbl s0, s1, d; };

The instruction addsd adds the low double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the double-precision floating-point result in the destination operand. The source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. The high quadword [64..127] of the destination operand remains unchanged.

Unpack and Interleave Low Packed Double-Precision Floating-Point Values

unpcklpd { VregDbl s0, s1; Vreg128 d; };

Performs an interleaved unpack of the low double-precision floating-point values from the source operand (second operand) and the destination operand (first operand). The source operand can be an XMM register or a 128-bit memory location; the destination operand is an XMM register.

Divide Scalar Double-Precision Floating-Point Values

Divides the low double-precision floating-point value in the first source operand by the low double-precision floating-point value in the second source operand, and stores the double-precision floating-point result in the destination operand. The second source operand can be an XMM register or a 64-bit memory location. The first source and destination operands are XMM registers. The high quadword of the destination operand is copied from the high quadword of the first source operand.

divsd { VregDbl s0, s1, d; };

Multiply Scalar Double-Precision Floating-Point Values

mulsd { VregDbl s0, s1, d; };

Round Scalar Double Precision Floating-Point Values

Round the DP FP value in the lower qword of the source operand (second operand) using the rounding mode specified in the immediate operand (third operand) and place the result in the destination operand (first operand). The rounding process rounds a double-precision floating-point input to an integer value and returns the integer result as a double precision floating-point value in the lowest position. The upper double precision floating-point value in the destination is retained.

roundsd { RoundDirection dir; VregDbl s, d; };

Compute Square Root of Scalar Double-Precision Floating-Point Value

sqrtsd { VregDbl s, d; };

Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS

Performs an unordered compare of the double-precision floating-point values in the low quadwords of source operand 1 (first operand) and source operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0. Source operand 1 is an XMM register; source operand 2 can be an XMM register or a 64 bit memory location.

ucomisd { VregDbl s0, s1; VregSF sf; };
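The flag outcomes of the unordered compare can be tabulated in code. In this sketch, a stands for source operand 1 and b for source operand 2 of the description above; FpFlags and ucomisdSketch are hypothetical names, and the OF, SF and AF flags (always cleared) are left out.

```cpp
#include <cassert>
#include <cmath>

struct FpFlags {
  bool zf;
  bool pf;
  bool cf;
};

// Sketch of the ZF/PF/CF result of an unordered scalar double compare.
inline FpFlags ucomisdSketch(double a, double b) {
  if (std::isnan(a) || std::isnan(b)) return {true, true, true};  // unordered
  if (a > b) return {false, false, false};                        // greater than
  if (a < b) return {false, false, true};                         // less than
  return {true, false, false};                                    // equal
}
```

Note that the unordered case also sets ZF, so a plain "equal" check on the flags is true for NaN operands unless PF is inspected as well.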

4. ARM specific vasm instructions.

4.1 Intrinsics

hcsync { Fixup fix; Vpoint call; };

hcnocatch { Vpoint call; };

hcunwind { Vpoint call; Vlabel targets[2]; };

4.2 Instructions

ARM Debug Breakpoint

brk { uint16_t code; };

ARM HostCall

hostcall { RegSet args; uint8_t argc; Vpoint syncpoint; };

This instruction is translated as a hlt instruction. The HLT instruction causes the processor to enter Debug state if Halting debug-mode is enabled.

ARM Compare and Branch

cbcc { vixl::Condition cc; Vreg64 s; Vlabel targets[2]; };

Can be translated into cbz (Compare and Branch on Zero) or cbnz (Compare and Branch on Non-Zero).

ARM Test and Branch

tbcc { vixl::Condition cc; unsigned bit; Vreg64 s; Vlabel targets[2]; };

Can be translated into tbz or tbnz: test a bit and branch if it is zero (or non-zero) to a label at a PC-relative offset, without affecting the condition flags, and with a hint that this is not a subroutine call or return.
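The branch conditions behind cbz/cbnz and tbz/tbnz can be written as simple predicates on a 64-bit value. This is a sketch of the decision only; the function names are illustrative and nothing here models label resolution or encoding.

```cpp
#include <cassert>
#include <cstdint>

// cbz branches when the register is zero; cbnz when it is non-zero.
inline bool cbzTaken(uint64_t s) { return s == 0; }
inline bool cbnzTaken(uint64_t s) { return s != 0; }

// tbz branches when the selected bit is 0; tbnz when it is 1.
// Precondition: bit < 64.
inline bool tbzTaken(uint64_t s, unsigned bit) { return ((s >> bit) & 1u) == 0; }
inline bool tbnzTaken(uint64_t s, unsigned bit) { return ((s >> bit) & 1u) != 0; }
```

Neither family reads or writes the condition flags, which is why cbcc and tbcc carry no VregSF field.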

ARM Logical Shift Left Variable

This instruction is used by the alias LSL (register).

lslv { Vreg64 sl, sr, d; };

ARM Arithmetic Shift Right Variable

This instruction is used by the alias ASR (register).

asrv { Vreg64 sl, sr, d; };

ARM Multiply instruction

Why not reuse imul?

mul { Vreg64 s0, s1, d; };