Performance and hash operations - FAForever/fa GitHub Wiki

Tables are extensively discussed in the main article on performance surrounding Lua tables. In this article we'll zoom into the second technique that is discussed in the main article: reduction of hash operations. We will do so by carefully dissecting a small example: we will not only look at the Lua code, but also look at the byte code generated from the Lua code. The former is what we write when we develop, the latter is what the game executes.

The example we will use is the logic surrounding the moving arms of the UEF Naval factory as it is building a unit. You can find all the changes in #4704.

Before you continue we highly recommend you to read chapter 7 of The implementation of Lua 5.0. Chapter 7 is about the virtual machine. In particular figure 5 on page 13 is a useful reference.

Background

Virtual machine

Feeding the CPU

Performance and hash operations

Before we apply any performance improvements we should always check at least the following three conditions:

  • (1) The code needs to be functional
  • (2) The (functionality of the) code needs to be testable
  • (3) The (performance criteria of the) code needs to be measurable

In this toy example we meet all three conditions. The build animation is functional and has been used for years now. We can test the code by placing down a new factory and make it build / cancel building a unit. And last we can measure the size of the bytecode and how expensive the instructions are.

We'll start with measuring the existing byte code, and by doing so we create a base line.

Baseline

We can measure the bytecode using the debug.listcode(fn) function, where the first and only parameter must be a Lua function. We did this in a8a754d, where we immediately print the bytecode to the console after the class is created. The file will automatically reload when you have the disk watch enabled. You can learn about that in the development environment setup guide.

We'll shortly review the output of a single line of debug.listcode(fn):

  • "( 24) 0 - GETUPVAL 1 0 0",

The first number (24) is the line number of the file. The second number (0) is the instruction number. Then we have the instruction to process (GETUPVAL in this case). And last we have the parameters of the instruction to process, you can read more about those in the paper. For the purpose of our task ahead they are interesting, but not relevant.

The complete output of the function StartArmsMoving is shown below. For clarity we've put the Lua code and the generated bytecode next to each other. As an exercise to the user: try to map the bytecode back to the Lua code, in particular: what symbol represents a table operation?

Lua code Byte code
StartArmsMoving = function(self)
    TSeaFactoryUnit.StartArmsMoving(self)
    if not self.ArmSlider then
        self.ArmSlider = CreateSlider(self, 'Right_Arm')
        self.Trash:Add(self.ArmSlider)
    end
end,
{ table: 201E2708 
  "(  16)    0 - GETUPVAL       1    0    0",
  "(  16)    1 - GETTABLE       1    1  250",
  "(  16)    2 - MOVE           2    0    0",
  "(  16)    3 - CALL           1    2    1",
  "(  17)    4 - GETTABLE       1    0  251",
  "(  17)    5 - TEST           1    1    1",
  "(  17)    6 - JMP            0    9",
  "(  18)    7 - GETGLOBAL      1    2",
  "(  18)    8 - MOVE           2    0    0",
  "(  18)    9 - LOADK          3    3",
  "(  18)   10 - CALL           1    3    2",
  "(  18)   11 - SETTABLE       0  251    1",
  "(  19)   12 - GETTABLE       1    0  254",
  "(  19)   13 - SELF           1    1  255",
  "(  19)   14 - GETTABLE       3    0  251",
  "(  19)   15 - CALL           1    3    1",
  "(  21)   16 - RETURN         0    1    0",
  maxstack=4,
  numparams=1
}

We're particularly interested in the GETTABLE, GETGLOBAL and SELF instructions. They indicate a hashing operation. The GETTABLE instruction originates from the use of . or [...]. The SELF instruction originates from the use of :. And last, the GETGLOBAL instruction originates from using a value that is in the global scope. The global scope is a table again and therefore it involves a hashing operation.

We'll focus this article on MovingArmsThread that represents the loop of the animation. As an exercise to the user: try to map the bytecode back to the Lua code. It really helps you understand what we're working with.

Lua code Byte code
MovingArmsThread = function(self)
    TSeaFactoryUnit.MovingArmsThread(self)
    while true do
        if not self.ArmSlider then return end
        self.ArmSlider:SetGoal(0, 0, 40)
        self.ArmSlider:SetSpeed(40)
        WaitFor(self.ArmSlider)
        self.ArmSlider:SetGoal(0, 0, 0)
        WaitFor(self.ArmSlider)
    end
end,
{ table: 20595AA0 
  "(  24)    0 - GETUPVAL       1    0    0",
  "(  24)    1 - GETTABLE       1    1  250",
  "(  24)    2 - MOVE           2    0    0",
  "(  24)    3 - CALL           1    2    1",
  "(  25)    4 - JMP            0   26",
  "(  26)    5 - GETTABLE       1    0  251",
  "(  26)    6 - TEST           1    1    1",
  "(  26)    7 - JMP            0    1",
  "(  26)    8 - RETURN         0    1    0",
  "(  27)    9 - GETTABLE       1    0  251",
  "(  27)   10 - SELF           1    1  252",
  "(  27)   11 - LOADK          3    3",
  "(  27)   12 - LOADK          4    3",
  "(  27)   13 - LOADK          5    4",
  "(  27)   14 - CALL           1    5    1",
  "(  28)   15 - GETTABLE       1    0  251",
  "(  28)   16 - SELF           1    1  255",
  "(  28)   17 - LOADK          3    4",
  "(  28)   18 - CALL           1    3    1",
  "(  29)   19 - GETGLOBAL      1    6",
  "(  29)   20 - GETTABLE       2    0  251",
  "(  29)   21 - CALL           1    2    1",
  "(  30)   22 - GETTABLE       1    0  251",
  "(  30)   23 - SELF           1    1  252",
  "(  30)   24 - LOADK          3    3",
  "(  30)   25 - LOADK          4    3",
  "(  30)   26 - LOADK          5    3",
  "(  30)   27 - CALL           1    5    1",
  "(  31)   28 - GETGLOBAL      1    6",
  "(  31)   29 - GETTABLE       2    0  251",
  "(  31)   30 - CALL           1    2    1",
  "(  25)   31 - JMP            0  -27",
  "(  33)   32 - RETURN         0    1    0",
  maxstack=6,
  numparams=1
}

There are two additional pieces of information that debug.listcode(fn) returns: the stack size that the function requires to run and the number of parameters. The latter is relatively trivial, but the former is interesting. It describes the amount of space on the stack that the interpreter needs to preserve in order to execute this function. As we make changes we'll see that we make a tradeoff: we'll increase the stacksize as we try to remove GETTABLE, GETGLOBAL and SELF instructions.

Factoring out GETTABLE instructions

All of the changes of this section can be found in b92805e

The first observation we can make is that we're retrieving the field ArmSlider over and over again. It generates a GETTABLE instruction each time. We can prevent this by storing the result of the GETTABLE operation in a local value before we enter the loop and then use that in the loop.

Lua code Byte code
MovingArmsThread = function(self)
    TSeaFactoryUnit.MovingArmsThread(self)
    local armSlider = self.ArmSlider
    while true do
        if not armSlider then 
            return 
        end
        armSlider:SetGoal(0, 0, 40)
        armSlider:SetSpeed(40)
        WaitFor(armSlider)
        armSlider:SetGoal(0, 0, 0)
        WaitFor(armSlider)
    end
end,
{ table: 205023C0 
  "(  27)    0 - GETUPVAL       1    0    0", --    TSeaFactoryUnit.MovingArmsThread(self)
  "(  27)    1 - GETTABLE       1    1  250", --    TSeaFactoryUnit.MovingArmsThread(self)
  "(  27)    2 - MOVE           2    0    0", --    TSeaFactoryUnit.MovingArmsThread(self)
  "(  27)    3 - CALL           1    2    1", --    TSeaFactoryUnit.MovingArmsThread(self)
  "(  28)    4 - GETTABLE       1    0  251", --    local armSlider = self.ArmSlider
  "(  29)    5 - JMP            0   22",      --    while true do
  "(  30)    6 - TEST           1    1    1", --        if not armSlider then
  "(  30)    7 - JMP            0    1",      --        if not armSlider then
  "(  31)    8 - RETURN         0    1    0", --            return
                                              --        (end)
  "(  34)    9 - SELF           2    1  252", --        armSlider:SetGoal(0, 0, 40)
  "(  34)   10 - LOADK          4    3",      --                          ^
  "(  34)   11 - LOADK          5    3",      --                             ^
  "(  34)   12 - LOADK          6    4",      --                                ^
  "(  34)   13 - CALL           2    5    1", --        armSlider:SetGoal(0, 0, 40)
  "(  35)   14 - SELF           2    1  255", --        armSlider:SetSpeed(40)
  "(  35)   15 - LOADK          4    4",      --                           ^
  "(  35)   16 - CALL           2    3    1", --        armSlider:SetSpeed(40)
  "(  36)   17 - GETGLOBAL      2    6",      --        WaitFor(armSlider)
  "(  36)   18 - MOVE           3    1    0", --        WaitFor(armSlider)
  "(  36)   19 - CALL           2    2    1", --        WaitFor(armSlider)
  "(  37)   20 - SELF           2    1  252", --        armSlider:SetGoal(0, 0, 0)
  "(  37)   21 - LOADK          4    3",      --                          ^
  "(  37)   22 - LOADK          5    3",      --                             ^  
  "(  37)   23 - LOADK          6    3",      --                                ^  
  "(  37)   24 - CALL           2    5    1", --        armSlider:SetGoal(0, 0, 0)
  "(  38)   25 - GETGLOBAL      2    6",      --        WaitFor(armSlider)
  "(  38)   26 - MOVE           3    1    0", --        WaitFor(armSlider)
  "(  38)   27 - CALL           2    2    1", --        WaitFor(armSlider)
  "(  29)   28 - JMP            0  -23",      --    (this is where the loop starts again: going back 23 instructions, to instruction 6)
  "(  40)   29 - RETURN         0    1    0", --    (end)
  maxstack=7,
  numparams=1
}

For clarity we've added in the translation from bytecode back to Lua code. In order to understand this article it is vital that you understand how the mapping takes place.

A few observations after our changes:

  • We increased the stack size from 6 to 7
  • We reduced the number of instructions reduced from 32 to 29
  • We reduced the number of GETTABLE operations from 7 to 2
  • There are no GETTABLE operations in the loop

The conclusion is that we improved the situation: we've reduced the number of random memory accesses at the cost of just one additional value on the stack. As an exercise to the user: try to map the bytecode back to the Lua code after we make the same changes to StartArmsMoving.

Lua code Byte code
StartArmsMoving = function(self)
    TSeaFactoryUnit.StartArmsMoving(self)
    local armSlider = self.ArmSlider
    if not armSlider then
        armSlider = CreateSlider(self, 'Right_Arm')
        self.ArmSlider = armSlider
        self.Trash:Add(armSlider)
    end
end,
{ table: 202A3D20 
  "(  17)    0 - GETUPVAL       1    0    0",
  "(  17)    1 - GETTABLE       1    1  250",
  "(  17)    2 - MOVE           2    0    0",
  "(  17)    3 - CALL           1    2    1",
  "(  18)    4 - GETTABLE       1    0  251",
  "(  19)    5 - TEST           1    1    1",
  "(  19)    6 - JMP            0   10",
  "(  20)    7 - GETGLOBAL      2    2",
  "(  20)    8 - MOVE           3    0    0",
  "(  20)    9 - LOADK          4    3",
  "(  20)   10 - CALL           2    3    2",
  "(  20)   11 - MOVE           1    2    0",
  "(  21)   12 - SETTABLE       0  251    1",
  "(  22)   13 - GETTABLE       2    0  254",
  "(  22)   14 - SELF           2    2  255",
  "(  22)   15 - MOVE           4    1    0",
  "(  22)   16 - CALL           2    3    1",
  "(  24)   17 - RETURN         0    1    0",
  maxstack=5,
  numparams=1
}

Factoring out GETGLOBAL instructions

All of the changes of this section can be found in b92805e

Factoring out SELF instructions

⚠️ **GitHub.com Fallback** ⚠️