The LSN Virtual Machine - Paramecium13/LSN GitHub Wiki
Note: Both this article and what it describes are works in progress and subject to frequent changes and corrections
This article is the specification for the LSN Virtual Machine and describes the expected behavior. Though it occasionally mentions details of the .NET implementation, it is intended to be used as a reference for creating LSN Virtual Machines in other languages.
The LSN Virtual Machine is a stack machine; its instructions operate on data stored in an Evaluation Stack.
Memory Model
The data values that LSN works with are stored in 16-byte (assuming an underlying 64-bit architecture) values called LSN Values, which consist of an 8-byte double precision floating point number, which can also be used to represent a 32-bit integer by simple casting, and an 8-byte reference, which can refer to any kind of LSN object. This strange and likely inefficient structure came into use with the old object based interpretation in which all implementations of the Evaluate(...) method needed to have the same return type. The 8-byte floating point value was added to avoid having to unbox/box int and double values at the start/end of each Evaluate(...) method. This structure is being retained in the virtual machine to ease the transition away from the previous mode of interpretation. Future versions of the VM will not use the LSN Value structure.
Registers
Yes, the LSN VM is a stack machine, but it still has a few special-purpose registers. Their contents are not directly accessed by instructions and are stored not as LSN Values, but as data types of the host language/runtime (e.g. integers and references).
List of Registers
- Next Instruction: The index of the next instruction to execute
- Target: The index of the target instruction. For more information, see the descriptions of the Jump to Target and Set Target instructions in the Instructions section of this article.
- Current Procedure: Information about the current procedure, namely a reference to the file containing it, its offset within the file's instruction segment, and the size of its stack frame.
Evaluation Stack
The virtual machine stores intermediate results of expression evaluation, non-virtual procedure arguments, and procedure return values on the evaluation stack. Instructions pop their input value(s) from the stack and push their result onto it. For example, a binary arithmetic instruction, such as add or multiply, will pop its two inputs from the stack, perform its operation, and push the result onto the stack.
LSN Byte Code contains instructions for manipulating values on the stack. For details, see the relevant portion of the Instructions section.
Call-Return Stack
This is analogous to the call-return stack in languages such as C/C++, C#, and Java. Despite its name, its contents will actually reside in the heap memory of the host application. The call-return stack stores local values of procedures and frequently their arguments as well. When a procedure is called, space for that procedures local values is allocated on the call-return stack. When it returns, that space is deallocated. To facilitate garbage collection done by the host language, deallocated space must be explicitly set to none, removing any object references that might prevent an unused object from being collected.
The Heap
LSN does not really have a strong concept of the heap. ... Garbage collection is provided by the language/runtime...
LSN does however provide instructions to deallocate structs and records, possibly by returning their underlying storage to a cache. These instructions will be generated at the place in the procedure where the record/struct is no longer used by anything. For nested structs, the compiler will generate instructions for freeing the contained structs first. Hopefully, future versions of the virtual machine will store structs on the call-return stack.
Instructions
LSN instructions take up 32 bits. The lower 16 bits are the operation code and the upper 16 bits are the data. The data portion is used differently by different instructions, some do not even use it at all. It can be interpreted as a signed 16-bit integer or an index (unsigned 16-bit integer). Unless otherwise stated, all instructions pop all the values they use from the stack.
Stack Manipulation
- Duplicate (DUP): Duplicate the value on top of the stack.
- Swap (SWAP): Swap the top two values on the stack.
- Swap and Duplicate (SWADUP): Place a copy of the value second from the top of the stack on top of the stack.
- Pop (POP): Pop the value on top of the stack and forget about it.
Arithmetic and Logic
- Add, Subtract, Multiply (ADD, SUB, MUL): Performs the operation on the top two values on the stack and pushes the result onto the stack. These operations do not distinguish between 32-bit integers and 64-bit floats, because a 32-bit integer value is stored as a 64-bit float in the LSN Value structure. All these operations are calculated on 64-bit floats.
- Integer Division and Modulus (DIV_I32, REM_I32): Performs integer division or modulus on the top two values on the stack, with the top value being on the left and the bottom one on the right and pushes the result onto the stack.
- Exponentiation (POW): Calculates the value on top of the stack raised to the power of the value below it on the stack and pushes the result onto the stack.
- Make a Range (MAKERANGE): Creates a range structure from the top two values on the stack. See the relevant documentation for details.
Comparisons
- Equals and Not Equals for integers, 64-bit floats, strings, and objects ([EQ|NEQ]_[I32|F64|STR|OBJ]): Floating point comparison uses an epsilon. String comparison is done by value. Object comparison is done by reference. To compare structs or records by value (as is the default in LSN), the compiler must generate the instructions to do so. The boolean result is pushed onto the stack.
- Non null, optionally with No Pop (NONNULL, NUNNULL_NOPOP): Pops the value on top of the stack and checks if it is null, pushing false if it is null and true if it is not null. The No Pop version does not pop the value it is checking and is used in the if let language structure.
- Less Than, Less Than or Equal, Greater Than, Greater Than or Equal (LT, LTE, GT, GTE): Performs the given comparison on the top two values on the stack, with the top value on the left side as usual, and pushes the result.
Flow Control
- Unconditional Jump (JUMP): Jump to the instruction at the index stored in the data portion of the instruction. The index is relative to the start of the current procedure.
- Conditional Jump, True or False (JUMP_TRUE, JUMP_FALSE): Jump to the instruction at the index stored in the data portion of this instruction if the value on top of the stack is true/false.
- Conditional Jump No Pop, True or False (JUMP_TRUE_NOPOP, JUMP_FALSE_NOPOP): Same as the above instruction but without popping the value on top of the stack. It is used for short circuiting boolean logic.
- Switch (SWITCH): Unused. I have yet to determine how to implement the switch structure...
- Set Target (SETTARGET): Set the target register to the value stored in this instruction's data.
- Jump to Target (JUMPTOTARGET): Jump to the instruction located at the target register.
Calling
Constants
Variables, Fields, and Elements
Atomic Field Operations
Perform arithmetic operations modifying the value of a field in one instruction (e.g.self.myNum += x;
). These will be useful for multiprocessing if, as I am planning, the LSN VM is only run on one thread and it switches between 'processes'. Perhaps I should also add test and set and compare exchange.