Memory allocations in Lua - FAForever/fa GitHub Wiki
Everything written in this article completely depends on your Lua environment. This article is specifically written for the Lua environment of Supreme Commander: Forged Alliance. We'll assume that you have basic knowledge of how Lua works internally. If you feel you lack this knowledge then I can highly recommend to scan the paper 'The Implementation of Lua 5.0' and to carefully read the fourth chapter about tables.
Prerequisites
Before you continue it is important to understand that you should only try and optimize your code if it (at a minimum) meets the following conditions:
- Functional: you can't optimize code that either doesn't exist, or code that is incomplete. Any attempt at optimizing before the code is function is premature.
- Be able to test changes: you'll introduce significant changes to your code in order to optimize it. This will introduce bugs. It is therefore vital to be able to continuously test your changes as you progress.
- Be able to profile: no matter how good you (think you) are, if you can't measure it then you're blind. And if you're blind then you may be spending a lot of time on something that is irrelevant.
And on top of that, have a rough understanding of what it is that you're working with. That is required to make decisions on what to tackle next and whether it makes sense to continue. That is what this article will help you with.
Tables
A table contains a hash part and an array part. The hash part is used when the key can not be used for the array part, usually that is because the key is not a number. The array part is used when you use indices that already exist (1) or are successive (2), otherwise they are still interpreted as a hash (3). Note that table.insert
guarantees that (2) occurs, and therefore always adds the element to the array part.
local data = { 1, 2, 3, 4 }
data[2] = 99 -- (1) uses array part
data[5] = 5 -- (2) uses array part
data[99] = 2 -- (3) uses hash part
Elements stored in an array cost less memory to store and are easier to access.
When either part is too small it resizes. The behavior is different for both parts. Using debug.allocatedsize
we can inspect the behavior:
LOG(debug.allocatedsize({ })) -- 40 bytes
LOG(debug.allocatedsize({ a = 1 })) -- 80 bytes
LOG(debug.allocatedsize({ a = 1, b = 2 })) -- 120 bytes
LOG(debug.allocatedsize({ a = 1, b = 2, c = 3 })) -- 120 bytes
LOG(debug.allocatedsize({ a = 1, b = 2, c = 3, d = 4 })) -- 200 bytes
LOG(debug.allocatedsize({ a = 1, b = 2, c = 3, d = 4, e = 5 })) -- 200 bytes
LOG(debug.allocatedsize({ a = 1, b = 2, c = 3, d = 4, e = 5, f = 6 })) -- 200 bytes
LOG(debug.allocatedsize({ a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7 })) -- 200 bytes
LOG(debug.allocatedsize({ a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8 })) -- 360 bytes
LOG(debug.allocatedsize({ 1 })) -- 48 bytes
LOG(debug.allocatedsize({ 1, 2 })) -- 56 bytes
LOG(debug.allocatedsize({ 1, 2, 3 })) -- 64 bytes
LOG(debug.allocatedsize({ 1, 2, 3, 4 })) -- 72 bytes
LOG(debug.allocatedsize({ 1, 2, 3, 4, 5 })) -- 80 bytes
There is a lot of information to decipher:
- An empty table occupies 40 bytes
- A hash table pre-allocates
Navigational mesh
For this article we'll use the navigational mesh as an example. It matches the prerequisites:
- The navigational mesh is fully functional
- The navigational mesh can be tested: the setup is done in such a way that changes are applied immediately and the only thing you need to do is press the generate button in the debug UI to re-generate it using the latest code changes. Various data is gathered and shown in the UI that we can use to confirm we did not change the behavior
- The navigational mesh can be profiled: we can measure the time it takes to generate the navigational mesh and using ToBytes we can get a good approximation of the number of bytes that are allocated by the data structure. Note that the former is not deterministic due to the CPU scheduler, but the latter is
We'll specifically follow the steps of pull request #4536. The navigational mesh is a graph. The graph consists of leaves and nodes. Each leaf is constructed like any other leaf. Each node is constructed like any other node. To encompass the entire map the navigational mesh requires (tens of) thousands of nodes and leaves. With that in mind, any change we make is amplified by (tens of) thousands of times.