UserGuide:IR - nimakarimipour/WALAWiki GitHub Wiki
title: UserGuide:IR permalink: /UserGuide:IR/
Overview
The WALA IR (Intermediate Representation) is the central data structure that represents the instructions of a particular method. The IR represents a method's instructions in a language close to JVM bytecode, but in an SSA-based register transfer language which eliminates the stack abstraction, relying instead on a set of symbolic registers. The IR organizes instructions in a control-flow graph of basic blocks, as typical in compiler textbooks.
The IR is immutable; it does not support program transformation, and WALA does not provide support for code generation from an IR. The philosophy is that the IR could be used in a pass that generates analysis information, to be used in support of some other tool, such as an IDE or compiler. (It is possible to do program transformation at the bytecode level in WALA using the shrike package, but not directly from the IR discussed here). Typically, an analysis will set up various structures and maps from IR constructs to relevant analysis information, such as abstractions and dataflow facts.
IR Basics
The WALA
IR
class provides an SSA based
intermediate representation of a particular method. The IR consists of a
control-flow
graph
of basic
blocks,
along with a set of
instructions.
Try the
PDFWalaIR
example program to see what an IR looks like.
The current IR implementation is somewhat convoluted, due to historical back-breaking for space efficiency. (it may change in the future). Rather than deal with the underlying implementation, you should try to access the IR via its public interfaces. For example, the following should be most useful:
- [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/IR.html#iterateAllInstructions()
IR.iterateAllInstructions()]
: return all the instructions, in an undefined order. Useful for flow-insensitive analysis. - [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/IR.html#getControlFlowGraph()
IR.getControlFlowGraph()
]: return the control-flow graph, a graph of IBasicBlock. - [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/util/graph/NodeManager.html#iterateNodes()
IR.getControlFlowGraph().iterateNodes()
]: return an iteration of the nodes (basic blocks) in a control-flow graph. - [http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/cfg/IBasicBlock.html#iterateAllInstructions()
IBasicBlock.iterateAllInstructions()
]: iterate all the instructions in a particular basic block.
If you care about the order in which you iterate over the instructions, you should loop over the basic blocks, and then iterate over the instructions in each basic block.
Usually, you will build a callgraph, and then acquire the IR for a
particular node via CGNode.getIR(). Note that WALA might build different
IRs for a single IMethod, depending on the context represented by a
CGNode. For example, each clone
node represents the clone
IMethod in
a context which defines the receiver type. The IR for each clone
node
is specialized with respect to this context.
Control Flow Graph
Exceptional edges are added fairly naively, using only declared types to determine exceptional flow, and assuming every Potentially Excepting Instruction (PEI) might throw exceptions. There is no smart optimization to remove infeasible exceptional control flow.
Occasionally you may wish to tweak your view of the CFG. For example,
perhaps a client wants a view of the CFG that ignores exceptional edges.
For these purposes, check out the class PrunedCFG
.
To do some basic navigation of a CFG, check out the Util
class in
com.ibm.wala.cfg
. This class gives utilities to find, for example, the
taken and not-taken successors of conditional branches.
Value Numbering
The variables in an IR each have a unique id, called the value number. So you can think of the variables in an IR having the names v1, v2, v3, .... .
By convention, the value numbers for a method start at 1. So for a
non-static method, v1 represents the this
parameter. The parameters to
the method are next (v2, v3, etc). For a static method, v1 represents
the first parameter.
Most
SSAInstructions
define at most one variable, and use some number of variables. Each
instruction carries the numbers of the variables it defs and uses,
accessible through
[http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/SSAInstruction.html#getDef()
getDef()
] and
getUses()
.
Actually,
SSAInvokeInstructions
may additionally def a second variable, representing the exceptional
return value.
Conversely, each variable (value number) will have exactly one def and
some number of uses. Use the
DefUse
class to find the instructions that def and use a particular value
number. If DefUse
returns null
as the def for a value number, that
value number represents either a parameter or a constant. See below for
information regarding constants.
During SSA translation, stack locations and locals are both translated
into symbolic virtual registers. For a given SSA value number and
instruction index, you can use
[http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/IR.html#getLocalNames(int,%20int)
IR.getLocalNames()
] to find out the names of the corresponding locals
if available. You can look at
[http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/SSABuilder.htmlSSABuilder.SSA2LocalMap
]
to see how the IR keeps track of this information internally.
Type inference
You can discover types for value numbers in an IR using the
TypeInference
class. E.g.:
IR ir = ...;
boolean doPrimitives = ...; // infer types for primitive vars?
TypeInference ti = TypeInference.make(ir, doPrimitives);
TypeAbstraction type = ti.getType(vn);
Constant Values
Primitive and String constants are represented in the IR as follows.
Each constant has a corresponding variable, but there is no explicit
statement in the IR assigning the constant to the variable. To discover
constants, use the IR's
SymbolTable
,
obtained through calling
[http://wala.sourceforge.net/javadocs/trunk/com/ibm/wala/ssa/IR.html#getSymbolTable()
IR.getSymbolTable()
].
SymbolTable.isConstant()
will tell you if a particular variable represents a constant (pass in
the value number of the variable as the parameter), and
SymbolTable.getConstantValue()
will give you the represented value.
From IR to bytecode?
WALA is not a compiler; it does not have a code generating back end and the IR is not designed for transformations. We do not know of any implementation that generates bytecode directly from the WALA SSA IR.
A plausible option is to do the analysis on the IR, map the analysis results back to the Shrike representation, and do bytecode transformations in Shrike.