Performance and hash operations - FAForever/fa GitHub Wiki
Tables are extensively discussed in the main article on performance surrounding Lua tables. In this article we'll zoom into the second technique that is discussed in the main article: reduction of hash operations. We will do so by carefully dissecting a small example: we will not only look at the Lua code, but also look at the byte code generated from the Lua code. The former is what we write when we develop, the latter is what the game executes.
The example we will use is the logic surrounding the moving arms of the UEF Naval factory as it is building a unit. You can find all the changes in #4704.
Before you continue we highly recommend you to read chapter 7 of The implementation of Lua 5.0. Chapter 7 is about the virtual machine. In particular figure 5 on page 13 is a useful reference.
Before we apply any performance improvements we should always check at least the following three conditions:
- (1) The code needs to be functional
- (2) The (functionality of the) code needs to be testable
- (3) The (performance criteria of the) code needs to be measurable
In this toy example we meet all three conditions. The build animation is functional and has been used for years now. We can test the code by placing down a new factory and make it build / cancel building a unit. And last we can measure the size of the bytecode and how expensive the instructions are.
We'll start with measuring the existing byte code, and by doing so we create a base line.
We can measure the bytecode using the debug.listcode(fn)
function, where the first and only parameter must be a Lua function. We did this in a8a754d, where we immediately print the bytecode to the console after the class is created. The file will automatically reload when you have the disk watch enabled. You can learn about that in the development environment setup guide.
We'll shortly review the output of a single line of debug.listcode(fn)
:
"( 24) 0 - GETUPVAL 1 0 0",
The first number (24) is the line number of the file. The second number (0) is the instruction number. Then we have the instruction to process (GETUPVAL
in this case). And last we have the parameters of the instruction to process, you can read more about those in the paper. For the purpose of our task ahead they are interesting, but not relevant.
The complete output of the function StartArmsMoving
is shown below. For clarity we've put the Lua code and the generated bytecode next to each other. As an exercise to the user: try to map the bytecode back to the Lua code, in particular: what symbol represents a table operation?
Lua code | Byte code |
StartArmsMoving = function(self)
TSeaFactoryUnit.StartArmsMoving(self)
if not self.ArmSlider then
self.ArmSlider = CreateSlider(self, 'Right_Arm')
self.Trash:Add(self.ArmSlider)
end
end, |
{ table: 201E2708
"( 16) 0 - GETUPVAL 1 0 0",
"( 16) 1 - GETTABLE 1 1 250",
"( 16) 2 - MOVE 2 0 0",
"( 16) 3 - CALL 1 2 1",
"( 17) 4 - GETTABLE 1 0 251",
"( 17) 5 - TEST 1 1 1",
"( 17) 6 - JMP 0 9",
"( 18) 7 - GETGLOBAL 1 2",
"( 18) 8 - MOVE 2 0 0",
"( 18) 9 - LOADK 3 3",
"( 18) 10 - CALL 1 3 2",
"( 18) 11 - SETTABLE 0 251 1",
"( 19) 12 - GETTABLE 1 0 254",
"( 19) 13 - SELF 1 1 255",
"( 19) 14 - GETTABLE 3 0 251",
"( 19) 15 - CALL 1 3 1",
"( 21) 16 - RETURN 0 1 0",
maxstack=4,
numparams=1
} |
We're particularly interested in the GETTABLE
, GETGLOBAL
and SELF
instructions. They indicate a hashing operation. The GETTABLE
instruction originates from the use of .
or [...]
. The SELF
instruction originates from the use of :
. And last, the GETGLOBAL
instruction originates from using a value that is in the global scope. The global scope is a table again and therefore it involves a hashing operation.
We'll focus this article on MovingArmsThread
that represents the loop of the animation. As an exercise to the user: try to map the bytecode back to the Lua code. It really helps you understand what we're working with.
Lua code | Byte code |
MovingArmsThread = function(self)
TSeaFactoryUnit.MovingArmsThread(self)
while true do
if not self.ArmSlider then return end
self.ArmSlider:SetGoal(0, 0, 40)
self.ArmSlider:SetSpeed(40)
WaitFor(self.ArmSlider)
self.ArmSlider:SetGoal(0, 0, 0)
WaitFor(self.ArmSlider)
end
end, |
{ table: 20595AA0
"( 24) 0 - GETUPVAL 1 0 0",
"( 24) 1 - GETTABLE 1 1 250",
"( 24) 2 - MOVE 2 0 0",
"( 24) 3 - CALL 1 2 1",
"( 25) 4 - JMP 0 26",
"( 26) 5 - GETTABLE 1 0 251",
"( 26) 6 - TEST 1 1 1",
"( 26) 7 - JMP 0 1",
"( 26) 8 - RETURN 0 1 0",
"( 27) 9 - GETTABLE 1 0 251",
"( 27) 10 - SELF 1 1 252",
"( 27) 11 - LOADK 3 3",
"( 27) 12 - LOADK 4 3",
"( 27) 13 - LOADK 5 4",
"( 27) 14 - CALL 1 5 1",
"( 28) 15 - GETTABLE 1 0 251",
"( 28) 16 - SELF 1 1 255",
"( 28) 17 - LOADK 3 4",
"( 28) 18 - CALL 1 3 1",
"( 29) 19 - GETGLOBAL 1 6",
"( 29) 20 - GETTABLE 2 0 251",
"( 29) 21 - CALL 1 2 1",
"( 30) 22 - GETTABLE 1 0 251",
"( 30) 23 - SELF 1 1 252",
"( 30) 24 - LOADK 3 3",
"( 30) 25 - LOADK 4 3",
"( 30) 26 - LOADK 5 3",
"( 30) 27 - CALL 1 5 1",
"( 31) 28 - GETGLOBAL 1 6",
"( 31) 29 - GETTABLE 2 0 251",
"( 31) 30 - CALL 1 2 1",
"( 25) 31 - JMP 0 -27",
"( 33) 32 - RETURN 0 1 0",
maxstack=6,
numparams=1
} |
There are two additional pieces of information that debug.listcode(fn)
returns: the stack size that the function requires to run and the number of parameters. The latter is relatively trivial, but the former is interesting. It describes the amount of space on the stack that the interpreter needs to preserve in order to execute this function. As we make changes we'll see that we make a tradeoff: we'll increase the stacksize as we try to remove GETTABLE
, GETGLOBAL
and SELF
instructions.
All of the changes of this section can be found in b92805e
The first observation we can make is that we're retrieving the field ArmSlider
over and over again. It generates a GETTABLE
instruction each time. We can prevent this by storing the result of the GETTABLE
operation in a local value before we enter the loop and then use that in the loop.
Lua code | Byte code |
MovingArmsThread = function(self)
TSeaFactoryUnit.MovingArmsThread(self)
local armSlider = self.ArmSlider
while true do
if not armSlider then
return
end
armSlider:SetGoal(0, 0, 40)
armSlider:SetSpeed(40)
WaitFor(armSlider)
armSlider:SetGoal(0, 0, 0)
WaitFor(armSlider)
end
end, |
{ table: 205023C0
"( 27) 0 - GETUPVAL 1 0 0", -- TSeaFactoryUnit.MovingArmsThread(self)
"( 27) 1 - GETTABLE 1 1 250", -- TSeaFactoryUnit.MovingArmsThread(self)
"( 27) 2 - MOVE 2 0 0", -- TSeaFactoryUnit.MovingArmsThread(self)
"( 27) 3 - CALL 1 2 1", -- TSeaFactoryUnit.MovingArmsThread(self)
"( 28) 4 - GETTABLE 1 0 251", -- local armSlider = self.ArmSlider
"( 29) 5 - JMP 0 22", -- while true do
"( 30) 6 - TEST 1 1 1", -- if not armSlider then
"( 30) 7 - JMP 0 1", -- if not armSlider then
"( 31) 8 - RETURN 0 1 0", -- return
-- (end)
"( 34) 9 - SELF 2 1 252", -- armSlider:SetGoal(0, 0, 40)
"( 34) 10 - LOADK 4 3", -- ^
"( 34) 11 - LOADK 5 3", -- ^
"( 34) 12 - LOADK 6 4", -- ^
"( 34) 13 - CALL 2 5 1", -- armSlider:SetGoal(0, 0, 40)
"( 35) 14 - SELF 2 1 255", -- armSlider:SetSpeed(40)
"( 35) 15 - LOADK 4 4", -- ^
"( 35) 16 - CALL 2 3 1", -- armSlider:SetSpeed(40)
"( 36) 17 - GETGLOBAL 2 6", -- WaitFor(armSlider)
"( 36) 18 - MOVE 3 1 0", -- WaitFor(armSlider)
"( 36) 19 - CALL 2 2 1", -- WaitFor(armSlider)
"( 37) 20 - SELF 2 1 252", -- armSlider:SetGoal(0, 0, 0)
"( 37) 21 - LOADK 4 3", -- ^
"( 37) 22 - LOADK 5 3", -- ^
"( 37) 23 - LOADK 6 3", -- ^
"( 37) 24 - CALL 2 5 1", -- armSlider:SetGoal(0, 0, 0)
"( 38) 25 - GETGLOBAL 2 6", -- WaitFor(armSlider)
"( 38) 26 - MOVE 3 1 0", -- WaitFor(armSlider)
"( 38) 27 - CALL 2 2 1", -- WaitFor(armSlider)
"( 29) 28 - JMP 0 -23", -- (this is where the loop starts again: going back 23 instructions, to instruction 6)
"( 40) 29 - RETURN 0 1 0", -- (end)
maxstack=7,
numparams=1
} |
For clarity we've added in the translation from bytecode back to Lua code. In order to understand this article it is vital that you understand how the mapping takes place.
A few observations after our changes:
- We increased the stack size from 6 to 7
- We reduced the number of instructions reduced from 32 to 29
- We reduced the number of
GETTABLE
operations from 7 to 2 - There are no
GETTABLE
operations in the loop
The conclusion is that we improved the situation: we've reduced the number of random memory accesses at the cost of just one additional value on the stack. As an exercise to the user: try to map the bytecode back to the Lua code after we make the same changes to StartArmsMoving
.
Lua code | Byte code |
StartArmsMoving = function(self)
TSeaFactoryUnit.StartArmsMoving(self)
local armSlider = self.ArmSlider
if not armSlider then
armSlider = CreateSlider(self, 'Right_Arm')
self.ArmSlider = armSlider
self.Trash:Add(armSlider)
end
end, |
{ table: 202A3D20
"( 17) 0 - GETUPVAL 1 0 0",
"( 17) 1 - GETTABLE 1 1 250",
"( 17) 2 - MOVE 2 0 0",
"( 17) 3 - CALL 1 2 1",
"( 18) 4 - GETTABLE 1 0 251",
"( 19) 5 - TEST 1 1 1",
"( 19) 6 - JMP 0 10",
"( 20) 7 - GETGLOBAL 2 2",
"( 20) 8 - MOVE 3 0 0",
"( 20) 9 - LOADK 4 3",
"( 20) 10 - CALL 2 3 2",
"( 20) 11 - MOVE 1 2 0",
"( 21) 12 - SETTABLE 0 251 1",
"( 22) 13 - GETTABLE 2 0 254",
"( 22) 14 - SELF 2 2 255",
"( 22) 15 - MOVE 4 1 0",
"( 22) 16 - CALL 2 3 1",
"( 24) 17 - RETURN 0 1 0",
maxstack=5,
numparams=1
} |
All of the changes of this section can be found in b92805e