UserGuide:IR - nimakarimipour/WALAWiki GitHub Wiki


title: UserGuide:IR permalink: /UserGuide:IR/

wala.core technical overview

Overview

The WALA IR (Intermediate Representation) is the central data structure that represents the instructions of a particular method. The IR represents a method's instructions in a language close to JVM bytecode, but in an SSA-based register transfer language which eliminates the stack abstraction, relying instead on a set of symbolic registers. The IR organizes instructions in a control-flow graph of basic blocks, as typical in compiler textbooks.

The IR is immutable; it does not support program transformation, and WALA does not provide support for code generation from an IR. The philosophy is that the IR could be used in a pass that generates analysis information, to be used in support of some other tool, such as an IDE or compiler. (It is possible to do program transformation at the bytecode level in WALA using the shrike package, but not directly from the IR discussed here). Typically, an analysis will set up various structures and maps from IR constructs to relevant analysis information, such as abstractions and dataflow facts.

IR Basics

The WALA IR class provides an SSA based intermediate representation of a particular method. The IR consists of a control-flow graph of basic blocks, along with a set of instructions. Try the PDFWalaIR example program to see what an IR looks like.

The current IR implementation is somewhat convoluted, due to historical back-breaking for space efficiency. (it may change in the future). Rather than deal with the underlying implementation, you should try to access the IR via its public interfaces. For example, the following should be most useful:

If you care about the order in which you iterate over the instructions, you should loop over the basic blocks, and then iterate over the instructions in each basic block.

Usually, you will build a callgraph, and then acquire the IR for a particular node via CGNode.getIR(). Note that WALA might build different IRs for a single IMethod, depending on the context represented by a CGNode. For example, each clone node represents the clone IMethod in a context which defines the receiver type. The IR for each clone node is specialized with respect to this context.

Control Flow Graph

Exceptional edges are added fairly naively, using only declared types to determine exceptional flow, and assuming every Potentially Excepting Instruction (PEI) might throw exceptions. There is no smart optimization to remove infeasible exceptional control flow.

Occasionally you may wish to tweak your view of the CFG. For example, perhaps a client wants a view of the CFG that ignores exceptional edges. For these purposes, check out the class PrunedCFG.

To do some basic navigation of a CFG, check out the Util class in com.ibm.wala.cfg. This class gives utilities to find, for example, the taken and not-taken successors of conditional branches.

Value Numbering

The variables in an IR each have a unique id, called the value number. So you can think of the variables in an IR having the names v1, v2, v3, .... .

By convention, the value numbers for a method start at 1. So for a non-static method, v1 represents the this parameter. The parameters to the method are next (v2, v3, etc). For a static method, v1 represents the first parameter.

Most SSAInstructions define at most one variable, and use some number of variables. Each instruction carries the numbers of the variables it defs and uses, accessible through [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/SSAInstruction.html#getDef() getDef()] and getUses(). Actually, SSAInvokeInstructions may additionally def a second variable, representing the exceptional return value.

Conversely, each variable (value number) will have exactly one def and some number of uses. Use the DefUse class to find the instructions that def and use a particular value number. If DefUse returns null as the def for a value number, that value number represents either a parameter or a constant. See below for information regarding constants.

During SSA translation, stack locations and locals are both translated into symbolic virtual registers. For a given SSA value number and instruction index, you can use [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/IR.html#getLocalNames(int,%20int) IR.getLocalNames()] to find out the names of the corresponding locals if available. You can look at [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/SSABuilder.htmlSSABuilder.SSA2LocalMap] to see how the IR keeps track of this information internally.

Type inference

You can discover types for value numbers in an IR using the TypeInference class. E.g.:

    IR ir = ...;
    boolean doPrimitives = ...; // infer types for primitive vars?
    TypeInference ti = TypeInference.make(ir, doPrimitives);
    TypeAbstraction type = ti.getType(vn);

Constant Values

Primitive and String constants are represented in the IR as follows. Each constant has a corresponding variable, but there is no explicit statement in the IR assigning the constant to the variable. To discover constants, use the IR's SymbolTable, obtained through calling [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/IR.html#getSymbolTable() IR.getSymbolTable()]. SymbolTable.isConstant() will tell you if a particular variable represents a constant (pass in the value number of the variable as the parameter), and SymbolTable.getConstantValue() will give you the represented value.

From IR to bytecode?

WALA is not a compiler; it does not have a code generating back end and the IR is not designed for transformations. We do not know of any implementation that generates bytecode directly from the WALA SSA IR.

A plausible option is to do the analysis on the IR, map the analysis results back to the Shrike representation, and do bytecode transformations in Shrike.