IR Design ‐ Instructions - CompilerProgramming/ez-lang GitHub Wiki

This is part of the series that discusses what you must think about when deciding how to represent IRs.

Linear IR

The discussion here is primarily for linear Intermediate Representations. By linear we mean that the instructions are organized into sequences that are logically executed one after the other. This is in contrast to graph IRs where instructions are organized as a graph and the order of execution is not sufficiently defined by the graph.

Organize Instructions Into Basic Blocks

For many algorithms we need to operate on the control flow graph. It is important to organize the instructions into Basic Blocks, where each BB encapsulates instructions that are executed sequentially. The final instruction in the BB must a jump to some other BB.

public class BasicBlock {
    List<Instruction> instructions;
}

Control Flow Graph

The Basic Blocks should contain links to predecessor and successor BBs - this forms the Control Flow graph. The links are driven by the jump instructions that terminate each BB.

public class BasicBlock {
    List<Instruction> instructions;
    List<BasicBlock> successors;
    List<BasicBlock> predecessors;
}

Instruction Operands

Instructions deal with values, some of those may be Virtual Registers, some Constants, and even Basic Blocks, for example in jump Instructions. We therefore create a type called Operand that represents values handled by an Instruction. Operands can have different subtypes in an OO design (or an Operand can be a union or sumtype in non-OO languages) - where each subtype is for a specific type of value.

class Operand {}
class RegisterOperand extends Operand {}
class ConstantOperand extends Operand {}

Instructions Define and Use Registers

From the optimization point of view, it is important to model the Instruction in such a way that we can easily handle following requirements:

  • Does the instruction define a Register - i.e. assigns a value to it?
  • Does it use Registers defined by other instructions
  • Replace a register that is defined
  • Replace a specific use of a Register - or replace all uses - a Register may be replaced by another Register or by something else such as a Constant.

Whether an Instruction defines a Register and/or uses Values including Registers depends on the Instruction type. So each Instruction type sets its own requirements, however, all Instructions satisfy the interface requirements above, so that the optimizer can treat each instruction uniformly.

class Instruction {
    // Does this instruction define a value?
    boolean definesVar();
    // Get the defined value
    Register def();
    // Get register uses
    List<Register> uses();
    // Replace register that is defined
    void replaceDef(Register newReg);
    // Replace uses
    void replaceUses(Register[] newUses);
    // Replace specific use
    boolean replaceUse(Register oldUse, Register newUse);
    // Replace specific register with constant
    void replaceWithConstant(Register register, ConstantOperand constantOperand);
}

Phis Are Special

Phi instructions also define a Register and use one or more Registers, but Phis do not have the same semantics as other instructions in many situations, such as when computing Liveness. So a consideration is how to model Phi instructions, whether they obey the general interface or have their own specialized interface. In EeZee language we give Phi instructions their own interface, which is similar to the general instruction but ensures that Phis cannot be erroneously manipulated by the optimizer. This has a trade off because there are scenarios where Phis do not require the special treatment.

class Phi extends Instruction {
        // def = value
        Register value();
        void replaceValue(Register newReg);

        // input = use
        int numInputs();
        Operand input(int i);
        Register inputAsRegister(int i);
        boolean isRegisterInput(int i);
        Register[] inputRegisters();
        void replaceInput(int i, Register newReg);
        void removeInput(int i);
}

Instructions Can Be Replaced

The optimizer must be able to:

  • Identify an instruction
  • Replace it with a new Instruction
  • Or delete it

Thus every Instruction must have an identity. It is also helpful for an Instruction to know which BB it lives in.

Define Data Structures

The compiler must be able to operate and manipulate BBs, Instructions, Operands and Registers - so it is important to model these as data structures, and not encode them into a serialized form until the end. Some languages such as Lua directly encode instructions - this is only suitable for single pass compilers that do not do optimizations.

Def Use Chains

Def use chains link Instructions to Registers - in SSA form, each Register can have a set of Instructions that use it. In non-SSA form, a Register may have multiple definitions and each definition its own set of Uses. Think how this will be handled.

Walking Basic Blocks

Basic Blocks may need to be traversed in specific order such as Reverse Post Order (RPO). To support this auxiliary information has to be recorded for each Basic Block.

Dominator Trees and Dominance Frontiers

The optimizer will most likely need to compute and maintain Dominator Trees and Dominance Frontiers. The algorithms that compute these and the data structures for maintaining this information will typically be attached to Basic Blocks.

It is useful for Basic Blocks to be assigned a unique ID from a densely packed sequence of integers, this makes for a good way to generate names for the BBs, but also can be used to key into auxiliary data structures.

public class BasicBlock {
    /**
     * Unique BB ID
     */
    int bid;

    /**
     * The preorder traversal number.
     */
    int pre;
    /**
     * The depth of the BB in the dominator tree
     */
    int domDepth;
    /**
     * Reverse post order traversal number.
     */
    int rpo;
    /**
     * Immediate dominator is the closest strict dominator.
     */
    BasicBlock idom;
    /**
     * Nodes for whom this node is the immediate dominator,
     * thus the dominator tree.
     */
    List<BasicBlock> dominatedChildren;
    /**
     * Dominance frontier
     */
    Set<BasicBlock> dominationFrontier;
}
⚠️ **GitHub.com Fallback** ⚠️