VM vs Interpreter - ThornLang/JavaThorn GitHub Wiki

VM vs Interpreter: Understanding ThornLang's Dual Execution Model

ThornLang provides two distinct execution models: a tree-walking interpreter and a register-based bytecode VM. This guide explains the differences, trade-offs, and when to use each.

Overview
Architecture Comparison
Performance Characteristics
Feature Parity
When to Use Each Mode
Technical Details
Debugging Differences
Future Roadmap
Decision Matrix
Examples
Summary

Overview

Tree-Walking Interpreter (Default)

java com.thorn.Thorn script.thorn

The interpreter directly traverses and evaluates the Abstract Syntax Tree (AST) without any intermediate compilation step.

Key Points:

Direct AST evaluation
No compilation overhead
Simpler implementation
Clearer error messages (stack traces carry source file and line)
Easier to debug

Bytecode VM (--vm flag)

java com.thorn.Thorn script.thorn --vm

The VM first compiles the AST into bytecode instructions, then executes these instructions on a register-based virtual machine.

Key Points:

Compilation to bytecode
Register-based architecture
Optimized instruction dispatch
Faster execution, especially in loops and recursion (roughly 2x to 4x in the benchmarks below)
Higher memory usage

Architecture Comparison

Tree-Walking Interpreter Flow

Source Code → Scanner → Parser → AST → Interpreter → Result
                                        ↑
                                   Direct evaluation

Bytecode VM Flow

Source Code → Scanner → Parser → AST → Compiler → Bytecode → VM → Result
                                                               ↑
                                                     Instruction execution

Code Example: How Each Executes

// Simple example
x = 10;
y = 20;
result = x + y;
print(result);

Interpreter Execution:

Evaluates x = 10 by visiting assignment node
Evaluates y = 20 by visiting assignment node
Evaluates result = x + y by visiting the assignment node, which first visits the binary expression node for x + y
Evaluates print(result) by visiting call expression node
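The node-by-node evaluation above can be sketched in Java. This is a minimal illustration, not JavaThorn's actual interpreter: the class names (`Expr`, `Literal`, `Variable`, `Add`, `Assign`, `TreeWalkDemo`) are hypothetical, values are restricted to doubles, and each node evaluates itself instead of going through a separate visitor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-AST; these names are illustrative, not JavaThorn's classes.
interface Expr {
    double eval(Map<String, Double> globals);
}

final class Literal implements Expr {
    final double value;
    Literal(double value) { this.value = value; }
    public double eval(Map<String, Double> globals) { return value; }
}

final class Variable implements Expr {
    final String name;
    Variable(String name) { this.name = name; }
    public double eval(Map<String, Double> globals) {
        Double v = globals.get(name);
        if (v == null) throw new IllegalStateException("Undefined variable '" + name + "'");
        return v;
    }
}

final class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    // A binary node recursively evaluates both children, then combines them.
    public double eval(Map<String, Double> globals) {
        return left.eval(globals) + right.eval(globals);
    }
}

final class Assign implements Expr {
    final String name;
    final Expr value;
    Assign(String name, Expr value) { this.name = name; this.value = value; }
    // An assignment node evaluates its right-hand side, then stores the result.
    public double eval(Map<String, Double> globals) {
        double v = value.eval(globals);
        globals.put(name, v);
        return v;
    }
}

final class TreeWalkDemo {
    // Evaluates x = 10; y = 20; result = x + y; and returns the value of result.
    static double run() {
        Map<String, Double> globals = new HashMap<>();
        Expr[] program = {
            new Assign("x", new Literal(10)),
            new Assign("y", new Literal(20)),
            new Assign("result", new Add(new Variable("x"), new Variable("y")))
        };
        for (Expr statement : program) statement.eval(globals);
        return globals.get("result");
    }

    public static void main(String[] args) {
        System.out.println(run()); // stands in for print(result)
    }
}
```

No instructions are generated at any point: the tree itself is the program, which is why the interpreter has no compilation step.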

VM Execution:

Compiles to bytecode:

LOAD_CONST    R0, 10      ; Load 10 into register 0
STORE_GLOBAL  "x", R0     ; Store R0 to global x
LOAD_CONST    R0, 20      ; Load 20 into register 0
STORE_GLOBAL  "y", R0     ; Store R0 to global y
LOAD_GLOBAL   R0, "x"     ; Load x into R0
LOAD_GLOBAL   R1, "y"     ; Load y into R1
ADD           R2, R0, R1  ; Add R0 and R1, store in R2
STORE_GLOBAL  "result", R2
LOAD_GLOBAL   R0, "result"
PRINT         R0

Executes bytecode instructions sequentially
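To make the sequential execution concrete, here is a small Java sketch of a register machine that runs the ten instructions listed above. It is an assumption-laden illustration: the `MiniRegisterVm` class, the `Instr` layout, and the helper methods are hypothetical, values are doubles only, and `PRINT` collects output into a list so the result can be inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MiniRegisterVm {
    enum Op { LOAD_CONST, STORE_GLOBAL, LOAD_GLOBAL, ADD, PRINT }

    // One instruction: opcode, up to three register operands, plus an optional
    // constant or global name. This layout is assumed for readability only.
    static final class Instr {
        final Op op; final int a, b, c; final double constant; final String name;
        Instr(Op op, int a, int b, int c, double constant, String name) {
            this.op = op; this.a = a; this.b = b; this.c = c;
            this.constant = constant; this.name = name;
        }
    }

    static Instr loadConst(int r, double k)   { return new Instr(Op.LOAD_CONST, r, 0, 0, k, null); }
    static Instr storeGlobal(String n, int r) { return new Instr(Op.STORE_GLOBAL, r, 0, 0, 0, n); }
    static Instr loadGlobal(int r, String n)  { return new Instr(Op.LOAD_GLOBAL, r, 0, 0, 0, n); }
    static Instr add(int dst, int l, int r)   { return new Instr(Op.ADD, dst, l, r, 0, null); }
    static Instr print(int r)                 { return new Instr(Op.PRINT, r, 0, 0, 0, null); }

    // Executes the program one instruction at a time; returns what PRINT emitted.
    static List<Double> run(List<Instr> program) {
        double[] registers = new double[8];
        Map<String, Double> globals = new HashMap<>();
        List<Double> output = new ArrayList<>();
        for (int pc = 0; pc < program.size(); pc++) {
            Instr in = program.get(pc);
            switch (in.op) {
                case LOAD_CONST:   registers[in.a] = in.constant; break;
                case STORE_GLOBAL: globals.put(in.name, registers[in.a]); break;
                case LOAD_GLOBAL:  registers[in.a] = globals.get(in.name); break;
                case ADD:          registers[in.a] = registers[in.b] + registers[in.c]; break;
                case PRINT:        output.add(registers[in.a]); break;
            }
        }
        return output;
    }

    // The ten instructions from the listing above, in the same order.
    static List<Instr> example() {
        return Arrays.asList(
            loadConst(0, 10), storeGlobal("x", 0),
            loadConst(0, 20), storeGlobal("y", 0),
            loadGlobal(0, "x"), loadGlobal(1, "y"),
            add(2, 0, 1), storeGlobal("result", 2),
            loadGlobal(0, "result"), print(0));
    }

    public static void main(String[] args) {
        System.out.println(run(example()));
    }
}
```

Note that register R0 is reused four times; registers are scratch space, while named values live in the globals table.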

Performance Characteristics

Startup Time

Mode	Startup Time	Best For
Interpreter	~50ms	Short scripts, REPL
VM	~150ms	Long-running programs

The VM starts more slowly because it must compile the AST to bytecode before it can execute anything.

Execution Speed

// Benchmark: Fibonacci
$ fib(n) {
    if (n <= 1) return n;
    return fib(n-1) + fib(n-2);
}

Operation	Interpreter	VM	VM Advantage
fib(30)	125ms	42ms	3.0x faster
fib(35)	3200ms	1100ms	2.9x faster
Loop 1M iterations	890ms	210ms	4.2x faster
Array operations	340ms	180ms	1.9x faster
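The "VM Advantage" column is the interpreter time divided by the VM time, rounded to one decimal place. A short Java check of that arithmetic, using only the timings from the table (the `SpeedupTable` class and `speedup` method are names made up for this sketch):

```java
import java.util.Locale;

final class SpeedupTable {
    // Speedup = interpreter time / VM time, formatted with one decimal place.
    static String speedup(double interpreterMs, double vmMs) {
        return String.format(Locale.ROOT, "%.1fx", interpreterMs / vmMs);
    }

    public static void main(String[] args) {
        System.out.println("fib(30): " + speedup(125, 42));
        System.out.println("fib(35): " + speedup(3200, 1100));
        System.out.println("Loop 1M iterations: " + speedup(890, 210));
        System.out.println("Array operations: " + speedup(340, 180));
    }
}
```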

Memory Usage

Memory comparison for a typical program with 1,000 functions and 10,000 variables:

Metric	Interpreter	VM
Base memory	~10MB	~15MB
Per function	~2KB	~5KB
Per variable	~100B	~100B
Peak usage	Lower	Higher
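As a rough worked example, the per-item figures in the table can be combined for the program with 1,000 functions and 10,000 variables. The sketch below assumes memory grows linearly (base + functions x per-function + variables x per-variable) and uses decimal units (1 KB = 1,000 bytes); both are simplifying assumptions, and the `MemoryEstimate` class is hypothetical. Treat the result as an order-of-magnitude estimate, not a measurement.

```java
final class MemoryEstimate {
    // Linear model: base + functions * perFunction + variables * perVariable.
    static long estimateBytes(long baseBytes, long perFunctionBytes, long perVariableBytes,
                              int functions, int variables) {
        return baseBytes + functions * perFunctionBytes + variables * perVariableBytes;
    }

    // Figures taken from the table: ~10MB base, ~2KB per function, ~100B per variable.
    static long interpreterBytes(int functions, int variables) {
        return estimateBytes(10_000_000L, 2_000L, 100L, functions, variables);
    }

    // Figures taken from the table: ~15MB base, ~5KB per function, ~100B per variable.
    static long vmBytes(int functions, int variables) {
        return estimateBytes(15_000_000L, 5_000L, 100L, functions, variables);
    }

    public static void main(String[] args) {
        System.out.println("Interpreter: " + interpreterBytes(1000, 10000) / 1_000_000 + " MB");
        System.out.println("VM: " + vmBytes(1000, 10000) / 1_000_000 + " MB");
    }
}
```

Under these assumptions the interpreter comes to about 13 MB and the VM to about 21 MB, with the gap driven by the base footprint and the per-function cost.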

Feature Parity

Both execution modes support all ThornLang features:

Feature	Interpreter	VM
Variables & Types	✓	✓
Functions	✓	✓
Classes	✓	✓
Arrays	✓	✓
Pattern Matching	✓	✓
Modules	✓	✓
Error Handling	✓	✓
Built-in Functions	✓	✓

Implementation Differences

Both modes produce the same results, but they implement each feature differently. Take a function call:

// Function calls
$ add(a, b) => a + b;
result = add(5, 3);

Interpreter:

Creates new environment for function
Binds parameters to arguments
Evaluates function body AST
Returns to caller environment

VM:

Pushes new call frame
Copies arguments to registers
Jumps to function bytecode
Returns via RETURN instruction
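The two call sequences above can be contrasted in a short Java sketch for `add(5, 3)`. Everything here is illustrative: `Environment`, `CallFrame`, and both `...CallAdd` methods are hypothetical names, and the function body is hard-coded instead of being evaluated from an AST or decoded from bytecode.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

final class CallMechanisms {
    // Interpreter style: each call gets a fresh environment chained to its parent.
    static final class Environment {
        private final Map<String, Object> values = new HashMap<>();
        private final Environment enclosing;
        Environment(Environment enclosing) { this.enclosing = enclosing; }
        void define(String name, Object value) { values.put(name, value); }
        Object get(String name) {
            if (values.containsKey(name)) return values.get(name);
            if (enclosing != null) return enclosing.get(name);
            throw new IllegalStateException("Undefined variable '" + name + "'");
        }
    }

    static double interpreterCallAdd(Environment caller, double a, double b) {
        Environment local = new Environment(caller); // create new environment
        local.define("a", a);                        // bind parameters to arguments
        local.define("b", b);
        // "Evaluate the body": a + b, with each name resolved by map lookup.
        return (Double) local.get("a") + (Double) local.get("b");
        // Returning drops `local`; the caller's environment is current again.
    }

    // VM style: each call gets a frame holding registers and a return address.
    static final class CallFrame {
        final double[] registers = new double[4];
        final int returnAddress;
        CallFrame(int returnAddress) { this.returnAddress = returnAddress; }
    }

    static double vmCallAdd(ArrayDeque<CallFrame> stack, double a, double b) {
        CallFrame frame = new CallFrame(0);
        stack.push(frame);                 // push new call frame
        frame.registers[0] = a;            // copy arguments to registers
        frame.registers[1] = b;
        frame.registers[2] = frame.registers[0] + frame.registers[1]; // body: ADD R2, R0, R1
        double result = frame.registers[2];
        stack.pop();                       // RETURN: discard the frame
        return result;
    }

    public static void main(String[] args) {
        System.out.println(interpreterCallAdd(new Environment(null), 5, 3));
        System.out.println(vmCallAdd(new ArrayDeque<>(), 5, 3));
    }
}
```

The interpreter path resolves names through hash-map lookups on every access, while the VM path indexes directly into an array; that difference is one plausible source of the VM's advantage on call-heavy code.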

When to Use Each Mode

Use the Interpreter (Default) When:

Developing and Testing

# Quick script testing
java com.thorn.Thorn test.thorn

Running Short Scripts

// Simple automation script
files = listFiles("*.txt");
for (file in files) {
    process(file);
}

Using the REPL

# Interactive development
java com.thorn.Thorn
thorn> x = 42
thorn> print(x * 2)

Needing Clearer Error Messages

// Interpreter provides clearer stack traces
$ buggyFunction() {
    return undefinedVar;  // Clear error location
}

Use the VM (--vm) When:

Running Compute-Heavy Code

// Mathematical computations
$ mandelbrot(width, height, maxIter) {
    // Complex calculations benefit from VM
}

Processing Large Data Sets

// Data processing with many iterations
data = loadLargeDataset();
results = [];  // initialize the output array before pushing to it
for (record in data) {
    transformed = transform(record);
    results.push(transformed);
}

Long-Running Services

// Server or daemon processes
while (true) {
    request = waitForRequest();
    response = processRequest(request);
    sendResponse(response);
}

Recursive Algorithms

// Deep recursion benefits from the VM's call frame optimization
$ quickSort(arr, low, high) {
    if (low < high) {
        pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

Technical Details

Interpreter Internals

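// Simplified interpreter visit pattern
Object visitBinaryExpr(Expr.Binary expr) {
    // Evaluate both operands recursively before applying the operator
    Object left = evaluate(expr.left);
    Object right = evaluate(expr.right);

    switch (expr.operator.type) {
        case PLUS:
            return (Double)left + (Double)right;
        // ... other operators
        default:
            // Every path must return or throw, or Java reports a missing return
            throw new IllegalStateException("Unsupported operator: " + expr.operator.type);
    }
}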

Characteristics:

Recursive visitor pattern
Direct Java method calls
Dynamic dispatch overhead
Simple to understand and modify

VM Internals

// VM instruction dispatch loop
while (!halted) {
    int instruction = getCurrentInstruction();
    OpCode opcode = getOpcode(instruction);

    switch (opcode) {
        case ADD_FAST:
            // A, B, C: register indexes taken from the instruction's operands
            registers[A] = registers[B] + registers[C];
            break;
        // ... other instructions
    }
    pc++;  // advance to the next instruction
}

Characteristics:

Flat instruction dispatch
Register-based operations
Minimized function call overhead
Harder to debug
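The dispatch loop above leaves `getOpcode(instruction)` and the operands `A`, `B`, `C` undefined. One common way to fill them in is to pack everything into a single 32-bit integer. The sketch below assumes an 8-bit opcode followed by three 8-bit register indexes; that layout, the `PackedDispatch` class, and the `HALT` opcode are assumptions made for illustration and may not match JavaThorn's real encoding.

```java
final class PackedDispatch {
    static final int HALT = 0, ADD_FAST = 1;

    // Assumed layout: opcode in bits 24-31, then A, B, C as 8-bit register indexes.
    static int encode(int opcode, int a, int b, int c) {
        return (opcode << 24) | (a << 16) | (b << 8) | c;
    }

    static int opcode(int instruction) { return instruction >>> 24; }
    static int a(int instruction)      { return (instruction >>> 16) & 0xFF; }
    static int b(int instruction)      { return (instruction >>> 8) & 0xFF; }
    static int c(int instruction)      { return instruction & 0xFF; }

    // Flat dispatch loop: fetch, decode, execute, repeat until HALT.
    static void run(int[] code, double[] registers) {
        int pc = 0;
        boolean halted = false;
        while (!halted) {
            int instruction = code[pc++];
            switch (opcode(instruction)) {
                case ADD_FAST:
                    registers[a(instruction)] =
                        registers[b(instruction)] + registers[c(instruction)];
                    break;
                case HALT:
                    halted = true;
                    break;
                default:
                    throw new IllegalStateException("Unknown opcode " + opcode(instruction));
            }
        }
    }

    public static void main(String[] args) {
        double[] registers = {0, 2, 3};
        int[] code = { encode(ADD_FAST, 0, 1, 2), encode(HALT, 0, 0, 0) };
        run(code, registers);
        System.out.println(registers[0]);
    }
}
```

Decoding costs a few shifts and masks per instruction, and the whole loop runs inside one Java method, which is what "flat instruction dispatch" refers to in contrast to the interpreter's recursive method calls.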

Bytecode Instruction Set

The VM uses ~50 instructions including:

Category	Instructions
Load/Store	LOAD_CONST, LOAD_LOCAL, STORE_LOCAL, LOAD_GLOBAL, STORE_GLOBAL
Arithmetic	ADD, SUB, MUL, DIV, MOD, POW, NEG
Fast Arithmetic	ADD_FAST, SUB_FAST, MUL_FAST, DIV_FAST
Comparison	EQ, NE, LT, LE, GT, GE
Control Flow	JUMP, JUMP_IF_FALSE, JUMP_IF_TRUE, CALL, RETURN
Arrays	GET_INDEX, SET_INDEX, ARRAY_LENGTH, ARRAY_PUSH
Objects	GET_PROPERTY, SET_PROPERTY, NEW_OBJECT

Register Allocation

The VM uses a simple register allocation strategy:

// Source code
x = a + b * c;

// Bytecode (simplified)
MUL     R0, b, c    ; R0 = b * c
ADD     R1, a, R0   ; R1 = a + R0
STORE   x, R1       ; x = R1
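One way to arrive at the listing above is to hand out the next free register for every intermediate result while walking the expression tree. The Java sketch below does exactly that and emits the same three instructions. It is a guess at what "simple" means here, not the actual JavaThorn compiler; `SimpleRegisterAllocator`, `Var`, and `BinOp` are hypothetical names, and variables are used directly as operands, as in the simplified listing.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleRegisterAllocator {
    interface Node {}

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class BinOp implements Node {
        final String opcode; final Node left, right;
        BinOp(String opcode, Node left, Node right) {
            this.opcode = opcode; this.left = left; this.right = right;
        }
    }

    private final List<String> code = new ArrayList<>();
    private int nextRegister = 0;

    // Emits code for the node and returns the operand that holds its value.
    private String compile(Node node) {
        if (node instanceof Var) return ((Var) node).name;
        BinOp op = (BinOp) node;
        String left = compile(op.left);    // operands first (post-order walk)
        String right = compile(op.right);
        String dst = "R" + nextRegister++; // next free register for the result
        code.add(op.opcode + " " + dst + ", " + left + ", " + right);
        return dst;
    }

    static List<String> compileAssignment(String target, Node value) {
        SimpleRegisterAllocator compiler = new SimpleRegisterAllocator();
        String source = compiler.compile(value);
        compiler.code.add("STORE " + target + ", " + source);
        return compiler.code;
    }

    // x = a + b * c, with the multiplication nested under the addition.
    static List<String> example() {
        Node expr = new BinOp("ADD", new Var("a"),
                new BinOp("MUL", new Var("b"), new Var("c")));
        return compileAssignment("x", expr);
    }

    public static void main(String[] args) {
        for (String line : example()) System.out.println(line);
    }
}
```

This strategy never reuses a register, so long expressions consume many of them; smarter reuse is the kind of thing a better allocator would address.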

Debugging Differences

Error Messages

Interpreter Error:

Runtime error: Undefined variable 'foo'
  at myFunction (script.thorn:10)
  at processData (script.thorn:25)
  at main (script.thorn:40)

VM Error:

Runtime error: Undefined variable 'foo'
  at bytecode offset 0x1A5
  in function myFunction
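The VM trace reports a bytecode offset because the compiled code no longer carries source positions. A compiler can close that gap by recording, for each source line, the offset where its bytecode starts. The Java sketch below shows the lookup side of such a table; it is a hypothetical design (`BytecodeSourceMap`, `mark`, and `lineFor` are invented names, and the offsets in `main` are made up), not an existing JavaThorn feature.

```java
import java.util.Map;
import java.util.TreeMap;

final class BytecodeSourceMap {
    // Key: first bytecode offset emitted for a source line. Value: that line number.
    private final TreeMap<Integer, Integer> lineByOffset = new TreeMap<>();

    // Called by the compiler when it starts emitting code for a new source line.
    void mark(int startOffset, int sourceLine) {
        lineByOffset.put(startOffset, sourceLine);
    }

    // Finds the closest marked offset at or before the given one; -1 if none.
    int lineFor(int offset) {
        Map.Entry<Integer, Integer> entry = lineByOffset.floorEntry(offset);
        return entry == null ? -1 : entry.getValue();
    }

    public static void main(String[] args) {
        BytecodeSourceMap map = new BytecodeSourceMap();
        map.mark(0x190, 9);   // illustrative offsets only
        map.mark(0x1A0, 10);
        map.mark(0x1B0, 11);
        System.out.println("script.thorn:" + map.lineFor(0x1A5));
    }
}
```

With a table like this, the runtime could translate offset 0x1A5 back into a file and line before printing the trace.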

Debugging Tools

Tool	Interpreter	VM
--ast flag	Shows AST structure	Shows AST structure
Stack traces	Full source location	Bytecode offsets
Step debugging	Possible (future)	More complex
Performance profiling	Basic timing	Instruction counts

Debug Mode Example

# View AST (works for both modes)
java com.thorn.Thorn script.thorn --ast

# Future: VM bytecode dump
# java com.thorn.Thorn script.thorn --vm --dump-bytecode

Future Roadmap

Planned Interpreter Improvements

AST Optimization Pass
- Constant folding
- Dead code elimination
- Common subexpression elimination
Cached Property Access
- Property lookup tables
- Inline caching
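To illustrate the constant-folding item in the list above: an AST optimization pass can replace any addition whose operands are both literals with a single literal before evaluation starts. The Java sketch below handles addition only and uses hypothetical node classes (`Num`, `Name`, `Add`); it shows the general technique, not a planned JavaThorn design.

```java
final class ConstantFolding {
    interface Node {}

    static final class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }
    }

    static final class Name implements Node {
        final String name;
        Name(String name) { this.name = name; }
    }

    static final class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Folds bottom-up: children are simplified first, so nested constant
    // subexpressions collapse before their parent is examined.
    static Node fold(Node node) {
        if (!(node instanceof Add)) return node; // literals and names are left as-is
        Add add = (Add) node;
        Node left = fold(add.left);
        Node right = fold(add.right);
        if (left instanceof Num && right instanceof Num) {
            return new Num(((Num) left).value + ((Num) right).value);
        }
        return new Add(left, right); // not fully constant: keep the addition
    }

    public static void main(String[] args) {
        // x + (2 + 3) becomes x + 5
        Node folded = fold(new Add(new Name("x"), new Add(new Num(2), new Num(3))));
        Add top = (Add) folded;
        System.out.println(((Name) top.left).name + " + " + ((Num) top.right).value);
    }
}
```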

Planned VM Improvements

Advanced Optimizations
- Better register allocation
- Instruction combining
- Loop optimizations
JIT Compilation
- Hot path detection
- Native code generation
- Adaptive optimization
Debugging Support
- Source maps for bytecode
- Bytecode disassembler
- Step-through debugging

Unified Improvements

Profiling Tools

# Future profiling support
java com.thorn.Thorn script.thorn --profile

Optimization Levels

# Future optimization flags
java com.thorn.Thorn script.thorn --vm -O2

Decision Matrix

Use this matrix to decide which mode to use:

Criteria	Importance	Interpreter	VM
Script runs < 1 second	High	✓	
Script runs > 10 seconds	High		✓
Heavy computation	High		✓
Many function calls	Medium		✓
String manipulation	Low	✓	✓
Development/debugging	High	✓	
Production deployment	Medium		✓
REPL usage	High	✓	

Examples

Example 1: Script Automation (Use Interpreter)

// file_renamer.thorn - Better with interpreter
import { fs } from "system";

files = fs.listDir(".");
for (file in files) {
    if (file.endsWith(".tmp")) {
        newName = file.replace(".tmp", ".bak");
        fs.rename(file, newName);
        print("Renamed: " + file + " -> " + newName);
    }
}

Example 2: Data Processing (Use VM)

// data_analyzer.thorn - Better with VM
$ analyze(dataset) {
    results = {};
    
    for (record in dataset) {
        category = record["category"];
        value = record["value"];
        
        if (results[category] == null) {
            results[category] = {
                "sum": 0,
                "count": 0,
                "min": value,
                "max": value
            };
        }
        
        stats = results[category];
        stats["sum"] += value;
        stats["count"] += 1;
        stats["min"] = min(stats["min"], value);
        stats["max"] = max(stats["max"], value);
    }
    
    return results;
}

// Process large dataset
data = loadCSV("large_dataset.csv");  // 1M+ records
results = analyze(data);

Summary

Interpreter: Best for development, scripting, and I/O-bound tasks
VM: Best for computation, long-running programs, and production
Both: Support all ThornLang features equally
Choose based on: Runtime duration, computation intensity, and use case

The dual execution model gives ThornLang flexibility to excel in both scripting and application development scenarios.