Embedding Languages Comparison - TeodorVecerdi/TTTEmbeddingLanguages GitHub Wiki

Embedding Languages - Comparison


Brief Description

Embedding programming languages is a process in which a host programming language (commonly a compiled language) runs and embeds another programming language (commonly an interpreted language), usually with the purpose of abstracting application-specific code away from the rest of the codebase. The embedded language does not necessarily have to be interpreted, as seen in the Unity3D game engine, which embeds C#, a compiled language, for its game scripting.

The most commonly used programming language for embedding into other programming languages or codebases is Lua, because it is self-contained and small in size, with a simple, easy-to-understand syntax.

Embedded languages usually differ in terms of performance and memory usage when interoperating with the host language.
This report covers the main differences between Lua and Python when embedded in C# by comparing their memory usage and performance (time) when reading/writing data to and from the embedded language, and function calls.

The prototype uses the NLua library to embed Lua and the Python.NET library to embed Python. The code used for gathering the measurements can be found here, and the raw data, as well as other measurements and graphs not presented in the following report, can be found here.

Evaluation Proposal: Embed Lua and Python in C# (.NET 5) and compare their memory usage, performance, versatility and difficulty of embedding.

Relevance: Embedding languages is widely used in Game Engines (Unity scripting languages and you; GDScript basics; UnrealScript) to simplify development and specifically scripting, but also in other software such as Blender (Blender Manual) and some games that allow for easier modding support (Rimworld Modding) or other forms of content creation (ComputerCraft Mod). The process of embedding a language is also widely used outside the game industry, through Domain-specific languages. (Mernik, M., Heering, J., & Sloane, A. M. (2005))


Comparison

Writing and reading an Integer

The first test I did was measuring the performance and memory usage of simply writing (and reading) an integer to the same variable in each language, to be able to get a baseline of how the two languages compare.
The code for this test looks like this:

for(int i = 0; i < iterations; i++) {
	// state.Set is just a placeholder for the languages' specific method calls for setting a variable
	state.Set("a", i); // and for reading: a = state.Get("a");
}
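
The same measurement pattern, sketched in plain Python with illustrative names (a plain dictionary stands in for the embedded language's state; the real benchmark uses an NLua or Python.NET state object):

```python
import time

state = {}  # stand-in for the embedded language's state

def measure_writes(iterations):
    """Time `iterations` writes and return the average microseconds per write."""
    start = time.perf_counter()
    for i in range(iterations):
        state["a"] = i  # stand-in for state.Set("a", i)
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6  # microseconds per write

avg_us = measure_writes(100_000)
```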

In terms of performance, the results are similar and there is no clear overall winner: each language is faster at one of the two operations. When writing an integer, Python is about 60% faster, coming in at 0.13 microseconds per write in contrast to Lua's 0.21 microseconds per write. When reading an integer, Lua is the winner, being around 50% faster than Python with 0.19 microseconds per read vs. Python's 0.28 microseconds per read.

Integer Write/Read Performance
The chart shows the average time it takes Lua and Python to write and read an integer

When it comes to memory usage, Python uses less memory for both reading and writing. For every integer write, Python allocates 48.1 bytes compared to Lua's 176 bytes, roughly 3.7 times less. Reading an integer shows a similar story: Python uses 72 bytes vs. Lua's 176 bytes for every read, about 2.4 times less.

Integer Write/Read Memory usage
The chart shows the average memory used by Lua and Python to write and read an integer

Writing more complex types

The next test I did was to evaluate and compare the performance and memory usage when writing increasingly complex types, starting from an integer, to a Vector3, a Transform, and finally a List of 4 megabytes worth of Vector3's. This test shows how the two languages scale when writing different types.
The protocol for this test is slightly different from the previous one. In each iteration, I write to a different variable (a{i}, where i is the current iteration), with the keys/strings for each variable pre-allocated. This is why there are slight differences in results compared to the first test; writing to the same variable, as in the first test, led to similar results with slightly less memory usage and slightly faster execution.
The chart shows significant differences when Lua writes to a single variable versus a new variable every iteration, while Python shows little difference either way. Beyond that, the chart shows that Python is overall faster than Lua, but also an 'anomaly' when Lua attempts to write a List of almost 100,000 Vector3's: Lua seems to do a lot of work to get the List into its state, while Python shows the same performance as for any other data type.
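
A minimal sketch of that protocol in Python, with the key strings pre-allocated outside the measured loop (names are illustrative, and a dictionary again stands in for the embedded state):

```python
iterations = 1_000
# Pre-allocate the variable names "a0", "a1", ... before the timed loop,
# so that string formatting does not pollute the measurement itself.
keys = [f"a{i}" for i in range(iterations)]

state = {}  # stand-in for the embedded language's state
for i in range(iterations):
    state[keys[i]] = i  # each iteration writes to a *different* variable
```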

Performance of writing complex types
The chart shows the average time it takes for Lua and Python to write an int, Vector3, Transform and 4 MB worth of Vector3's

The memory usage results, however, show Lua winning two of the tests, with integer-write usage similar to the first test. Writing a Vector3 and a Transform uses less memory in Lua, while writing the List of Vector3's shows a difference similar to the performance results above.
From my experiments, Python.NET, the library I used to embed Python, treats every object as a reference to the C# object instead of copying its value into the Python runtime. This explains the small memory footprint of writing a 4 MB list. In another experiment, wrapping the List in a class brought Lua's memory usage and performance to numbers more comparable to Python's.

Memory Usage of writing complex types
The chart shows the average memory usage for Lua and Python to write an int, Vector3, Transform and 4 MB worth of Vector3's

Function calls

The last test I did was measuring the performance and memory usage of function calls in Lua and Python. This is relevant because, with most embedded languages, the biggest performance hit comes from function calls.
For this test, I create a Lua/Python state, define a function, and then measure the performance and memory usage of 1 million calls to that function. Since I was only interested in the function call itself, I corrected the memory usage for any variables allocated during the test, so that the results strictly represent the function call overhead. In my case, I had to correct for 24 bytes allocated on each function call (two integers, a long, and a DateTime, which is also stored as a long).
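
The correction itself is simple arithmetic; a sketch with illustrative numbers (the helper name is mine):

```python
CALLS = 1_000_000
# Per-call allocations made by the test harness itself, not by the call:
# two ints (2 * 4 bytes), a long (8 bytes), and a DateTime (8 bytes).
HARNESS_BYTES_PER_CALL = 2 * 4 + 8 + 8  # = 24 bytes

def corrected_memory(measured_bytes, calls=CALLS):
    """Subtract the harness's own allocations from the measured total."""
    return measured_bytes - calls * HARNESS_BYTES_PER_CALL
```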

The functions I used for the measurements were the equivalent of a 2-integer addition function in each language:
Lua:

function add(a, b)
	return a + b
end

Python:

def add(a, b):
	return a + b

The charts show Python being almost 400% slower than Lua over 1 million function calls: Python takes more than one second, while Lua takes only 0.21 seconds. For memory usage, Python comes out ahead by only 25 MB, or 13% less.

1 Million function calls - performance and memory usage
The charts show the average time taken and memory used by Lua and Python for 1 million function calls

Difficulty of embedding and versatility

Both programming languages are pretty straightforward to embed. After referencing the required libraries, getting a Hello, World! up and running takes just a few lines of code. One big difference between the two is that Lua is self-contained and requires no prior setup, while Python needs a reference to the Python DLL on the user's machine.

In the case of Lua, it only takes two lines of code, including the required initialization:

using var lua = new NLua.Lua();
lua.DoString("print('Hello, World!')");

For Python, it's four lines of code to get Hello, World! on the screen:

Runtime.Runtime.PythonDLL = "python39.dll";
using var state = Py.GIL();
using var scope = Py.CreateScope();
scope.Exec("print('Hello, World!')");

When it comes to versatility and interoperability with C#/.NET, both libraries allow you to do similar things: you can read, parse, and execute code from files directly, import assemblies into Lua/Python, reference C# types, and reference Lua/Python types from C#. During my testing, I haven't found anything that one library can do that the other cannot. One subjective difference I found between the two languages themselves in terms of versatility is that Python allows for easy inheritance and OOP and, since Python 3.5, type hinting (though hints are not enforced at runtime).

Time and Space complexity

Python.NET and NLua

Looking at the source code of both libraries and how they implement the Get/Set methods for interacting with the embedded interpreter, it seems both set and get objects from the state in constant time O(1), with constant space complexity.
There are a few exceptions, though:

1. Python - Python.NET converts each object to/from a PyObject (by wrapping it in a pointer). This conversion is usually done in O(1), with the exception of lists and other IEnumerables, where it happens in linear time O(N), where N is the number of items in the list: Python.NET creates a PyList and, for each object in the original list, converts it to a PyObject and appends it to the PyList. Otherwise, it gets a handle to the original object and wraps it in a PyObject. Similarly, when processing lists, Python.NET does so with linear space complexity.

2. Lua - NLua also has an exception, where getting and setting a value is done in O(N) instead of constant time, where N is the 'depth' of the object you're trying to access. To get or set objects using NLua, you typically do something like lua["path.to.your.object"] = value. NLua then splits the key into an array of path segments ["path", "to", "your", "object"] and accesses each object in the path until it reaches the last one, where it sets or returns the value. When it comes to space complexity, only getting an item requires linear space, calling GetObject on every string in the path; setting an object, besides manipulating the stack for every string in the path, calls SetObject only once, at the end of the path.
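
Both exceptions can be sketched in a few lines of Python (nested dictionaries stand in for the interpreter state; the helper names are mine, not the libraries'):

```python
def convert_list(items, convert=lambda x: x):
    # Python.NET-style list marshalling: every element is converted and
    # appended one by one, so both time and space are linear, O(N).
    return [convert(item) for item in items]

def get_by_path(state, key):
    # NLua-style access: "path.to.your.object" is split into segments and
    # each level is looked up in turn, so the cost is O(depth of the key).
    obj = state
    for part in key.split("."):
        obj = obj[part]
    return obj

def set_by_path(state, key, value):
    *parts, last = key.split(".")
    obj = state
    for part in parts:
        obj = obj[part]
    obj[last] = value  # a single set at the very end of the path

state = {"path": {"to": {"your": {}}}}
set_by_path(state, "path.to.your.object", 42)
```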

My measurement setup

All of my measurements have a time complexity of O(S * A), where S is the number of samples I measure and A is the number of actions each sample performed. For example, measuring the average performance of writing one million integers over 256 samples has a time complexity of O(256 * 1 million). Processing the results was done either in constant time O(1) for memory usage measurements, where I only took one sample, or in O(S) for performance measurements, where after collecting all the samples I reduced them to a minimum, maximum, and average.
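
That post-processing step can be sketched as a single O(S) pass over the collected samples (the sample values below are illustrative, not measured data):

```python
def process_samples(samples):
    # One linear pass over the S collected samples, reducing them to the
    # minimum, maximum, and average reported in the charts.
    return min(samples), max(samples), sum(samples) / len(samples)

low, high, avg = process_samples([0.19, 0.21, 0.20])
```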

In terms of space complexity, measurements are done with a space complexity of O(S) where S is the number of samples measured.
