Optimizer - nponeccop/HNC GitHub Wiki

The optimizer consists of:

  • Optimizer Facade
  • Transformation Lifter
  • Inbound Edges Counter
  • Inliner

The Facade

The facade exports a single function, optimize :: Definition -> Definition, which transforms untyped terms.

Note that the optimizer doesn't support full terms yet - only terms consisting of a single top-level definition.

The Lifter

The lifter (see HN.Optimizer.WithGraph) lifts graph transformations to term transformations. It transforms a term to a graph, applies the passed transformation and transforms the result back to term form.

So the major components of the Lifter are Graph Compiler (see HN.Optimizer.GraphCompiler) and Graph Decompiler (see HN.Optimizer.GraphDecompiler).

Graph compilation maps the hierarchical identifier space to the flat space of graph node labels, and graph decompilation performs the reverse mapping. However, the compiler only stores a simple bidirectional flat map (see HN.Optimizer.LabelFor); the scopes are reconstructed by the decompiler using HOOPL's built-in dominator analysis. As a consequence, the decompiler automatically performs a let-floating transformation: all lets are moved as deeply as possible. Also, as the decompiler (and HOOPL in general) only visits nodes reachable from the entry nodes, we get some dead code elimination for free.
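The bidirectional flat map can be sketched as follows. This is a hypothetical standalone model of the idea behind HN.Optimizer.LabelFor, not its actual code: the names Path, LabelMap, labelFor and pathFor are invented for illustration.

```haskell
import qualified Data.Map as M

-- Hierarchical names vs. flat graph labels (hypothetical toy model).
type Path  = [String]  -- e.g. ["main", "helper", "x"]
type Label = Int

data LabelMap = LabelMap (M.Map Path Label) (M.Map Label Path)
  deriving Show

emptyLabelMap :: LabelMap
emptyLabelMap = LabelMap M.empty M.empty

-- Compilation direction: intern a hierarchical name as a flat label.
labelFor :: Path -> LabelMap -> (Label, LabelMap)
labelFor p lm@(LabelMap fwd bwd) = case M.lookup p fwd of
  Just l  -> (l, lm)
  Nothing -> let l = M.size fwd
             in (l, LabelMap (M.insert p l fwd) (M.insert l p bwd))

-- Decompilation direction: recover the hierarchical name from the label.
pathFor :: Label -> LabelMap -> Maybe Path
pathFor l (LabelMap _ bwd) = M.lookup l bwd

main :: IO ()
main = do
  let (l1, m1) = labelFor ["main", "x"] emptyLabelMap
      (l2, m2) = labelFor ["main", "f", "x"] m1
  print (l1, l2, pathFor l2 m2)
```

Note that the flat map alone loses the scope structure; as described above, the decompiler recovers it from dominator analysis rather than from this map.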

As our IR is not a sequence of simple assembly-level instructions, it is not clear how to transform our terms into graphs.

The Hoopl paper suggests: "In a higher-level intermediate form, a node might represent a simple statement".

As we don't support statement sequences yet, our HOOPL blocks are degenerate: they consist of a single statement. Logically, that is a single closed/closed HOOPL node. However, HOOPL doesn't seem to support such degenerate blocks: it might be possible to construct one by hand using low-level operations, but no high-level helper functions exist for the C C case.

So our blocks consist of 2 nodes (see HN.Optimizer.Node): a dummy Entry :: Label -> Node C O and a real Exit :: DefinitionNode -> Node O C.

data DefinitionNode
	= LetNode [Label] (Expression Label)
	| ArgNode
	| LibNode

As you can see, there are three kinds of nodes and, by extension, of blocks.

LetNode represents the "in" expressions of definitions. An expression can refer to formal arguments, to other definitions (either local or outer) and to library definitions written in C++.

As described above, scope structure is made flat, so it doesn't matter whether a definition is local or outer: references to both of them are represented by Label, and bodies by LetNode.

The [Label] part represents the formal arguments of a definition, so all its labels must refer to ArgNode blocks (blocks whose real node contains an ArgNode). Note that we cannot infer this list from the Expression member: the expression can use arguments in any order, and some arguments may even be unused.

We cannot trivially strip unused formal arguments as we strip unused definitions: we must also remove corresponding actual arguments from all call sites, and we must check that the function is never passed as an argument to another function. That requires writing a separate optimization pass.
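Putting the pieces above together, the node and block structure can be sketched as a self-contained model. The shape phantoms O/C, Label and the simplified Expression stand in for their HOOPL/HNC counterparts and are defined locally for illustration only.

```haskell
{-# LANGUAGE GADTs #-}

data O  -- "open" block-shape phantom, as in HOOPL
data C  -- "closed" block-shape phantom
type Label = Int

-- Simplified stand-in for the real Expression type.
data Expression a = Atom a | Call a [a] deriving (Eq, Show)

data DefinitionNode
  = LetNode [Label] (Expression Label)  -- formal arguments and body
  | ArgNode                             -- a formal argument
  | LibNode                             -- a C++ library definition
  deriving (Eq, Show)

-- Every block is degenerate: one dummy Entry followed by one real Exit.
data Node e x where
  Entry :: Label -> Node C O
  Exit  :: DefinitionNode -> Node O C

describe :: Node e x -> String
describe (Entry l) = "entry " ++ show l
describe (Exit d)  = "exit " ++ show d

main :: IO ()
main = do
  putStrLn (describe (Entry 0))
  putStrLn (describe (Exit (LetNode [1, 2] (Call 3 [2, 1]))))
</imports>
```

The GADT shape indices make it a type error to build a block with anything other than exactly one entry and one exit node, which matches the degenerate-block restriction described above.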

The Counter

To inline a definition, we first count references to the definition. As we chose the references to be edges in the HOOPL control graph, we must find out how many edges enter each block.

This information can be obtained polymorphically for any HOOPL graph, as it's purely graph-theoretic and doesn't depend on the internal structure of nodes. So the corresponding HOOPL pass (see HN.Optimizer.Inbound.passF) can be made polymorphic in the node type, similarly to Compiler.Hoopl.Passes.Dominator.domPass:

domPass :: (NonLocal n, Monad m) => FwdPass m n Doms

This pass could then be included in HOOPL's collection of useful universal passes (see Compiler.Hoopl.Passes.*).

The pass doesn't perform any rewrites.
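The graph-theoretic core of the counting can be sketched on a toy successor-list representation. The real pass runs on HOOPL graphs, but the computation is the same and independent of the node type; inboundCounts and the Map-of-successors representation below are illustrative assumptions.

```haskell
import qualified Data.Map as M

type Label = Int

-- Count, for each node, how many edges enter it.
inboundCounts :: M.Map Label [Label] -> M.Map Label Int
inboundCounts g = M.unionWith (+) zeros counts
  where
    zeros  = M.map (const 0) g  -- every node starts with zero inbound edges
    counts = M.fromListWith (+) [(s, 1) | ss <- M.elems g, s <- ss]

main :: IO ()
main = print (inboundCounts (M.fromList [(1, [2, 3]), (2, [3]), (3, [])]))
```

A definition whose block has exactly one inbound edge has exactly one call site, which is the precondition the Inliner checks for.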

The Inliner

The Inliner uses the facts obtained from the Counter to inline definitions by performing rewrites. It also updates the facts as rewrites are performed. It is an open question whether a single deep rewrite by the Inliner is enough, or whether we should run the Counter and Inliner passes until a fixed point is reached.

The problem is that counting is a forward analysis, but inlining is a backward rewriting pass.

The HOOPL paper states:

"In general, dataflow analysis establishes properties of paths:

  • An assertion about all paths to a program point is established by a forward analysis.
  • An assertion about all paths from a program point is established by a backward analysis."

Another view is that:

  • During a forward pass, information propagates from the entry point(s) down the control graph
  • During a backward pass, information propagates up the control graph to the entry point(s)

On the two sides of an edge we have a definition and one of its call sites. So:

  • Forward passes collect information at call sites, propagate it to definitions and aggregate it there (using the lattice join)
  • Backward passes collect information at definitions, propagate it to call sites and aggregate it there

The Counter pass collects call counts at call sites and aggregates the partial counts from each call site to get a total count, so it's a Forward pass.

The Inliner pass collects definition bodies at definitions and propagates them to call sites, so it's a Backward pass. The aggregation step is trivial, as all bodies are the same.

In fact, two pieces of data must be passed to call sites: the definition bodies and a flag indicating whether they should be inlined.

I use a trick to combine them:

type ListFact = WithTopAndBot DefinitionNode

listLattice = addPoints' "ListFact" undefined

These lines create a lattice containing:

  • all values of DefinitionNode type
  • special Top value as the unique top element
  • special Bottom value as the unique bottom element

WithTopAndBot defines the type of lattice values, and addPoints' implements the join operation for the lattice.

Here are some laws from lattice theory:

  • join x y = join y x
  • join x Bot = x
  • join x Top = Top

So most joins can be implemented automatically by addPoints'. We must only provide an implementation of join when both x and y are "middle" elements, that is, when both are DefinitionNode values. The middle elements are called PElem in HOOPL parlance.

In our case such a join can never happen, so we pass undefined to addPoints'.
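The lattice that addPoints' builds can be modeled standalone. The constructors below mirror HOOPL's WithTopAndBot; the join function is a local sketch of what addPoints' derives from the laws above, with error standing in for the undefined middle-middle case.

```haskell
-- Local model of HOOPL's WithTopAndBot lattice.
data WithTopAndBot a = Bot | PElem a | Top deriving (Eq, Show)

-- Join following the lattice laws quoted above; only the PElem/PElem
-- case would need user code, and in the Inliner it can never happen.
join :: WithTopAndBot a -> WithTopAndBot a -> WithTopAndBot a
join Bot y = y
join x Bot = x
join Top _ = Top
join _ Top = Top
join (PElem _) (PElem _) = error "middle join never happens in the Inliner"

main :: IO ()
main = do
  print (join Bot (PElem "definitionBody"))
  print (join Top (PElem "definitionBody"))
```

The two calls in main are exactly the two aggregation cases discussed below: joining with Bot keeps the definition body, while joining with Top discards it.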

As an initial fact base we take facts collected by Counter and modify them:

initialFactBase = mapMap modify counterResult where
	modify 1 = Bot
	modify _ = Top

So, if a definition identified by a label can be inlined, we have Bot in the initial fact base, and if a definition cannot be inlined, we have Top. Our initial fact base therefore contains only Top or Bot for each definition.
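A runnable sketch of this step, using Data.Map in place of HOOPL's fact-base map (mapMap becomes M.map); the WithTopAndBot model and the name initialFactBase are local assumptions:

```haskell
import qualified Data.Map as M

data WithTopAndBot a = Bot | PElem a | Top deriving (Eq, Show)
type Label = Int

-- Turn the Counter's per-definition call counts into initial facts:
-- exactly one call site means "may inline" (Bot),
-- zero or many call sites mean "leave alone" (Top).
initialFactBase :: M.Map Label Int -> M.Map Label (WithTopAndBot body)
initialFactBase = M.map modify
  where
    modify 1 = Bot
    modify _ = Top

main :: IO ()
main = print (initialFactBase (M.fromList [(1, 1), (2, 3), (3, 0)])
                :: M.Map Label (WithTopAndBot ()))
```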

Then our transfer function unconditionally returns definition body, that is, a middle lattice element.

This fact gets aggregated with the default fact immediately (even before it reaches the call site), so:

  • if a definition can be inlined, we have join Bot definitionBody = definitionBody
  • if a definition cannot be inlined, we have join Top definitionBody = Top

So at this step Bot is eliminated: only definitionBody or Top can be propagated to call sites and joined there. A fascinating fact is that at the call sites only one kind of join can happen - join Top Top, implemented by addPoints'. definitionBody is never joined with anything, because the Counter found that its definition has only one inbound edge.

So after the aggregation step, the following arrives at call sites:

  • Top if a definition cannot be inlined
  • definitionBody if a definition can be inlined

So the rewrite function has all information to perform a rewrite: both the condition and the definition body to inline.
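The decision the rewrite function makes can be sketched in one place. The function rewriteAtCallSite is hypothetical; it only illustrates how the arriving fact encodes both the condition and the body:

```haskell
-- Local model of HOOPL's WithTopAndBot lattice.
data WithTopAndBot a = Bot | PElem a | Top deriving (Eq, Show)

-- Hypothetical decision at a call site: the fact arriving there is
-- either Top (don't inline) or the definition body (inline it).
rewriteAtCallSite :: WithTopAndBot body -> Maybe body
rewriteAtCallSite (PElem body) = Just body  -- inline the body
rewriteAtCallSite _            = Nothing    -- leave the call as-is

main :: IO ()
main = do
  print (rewriteAtCallSite (PElem "definitionBody"))
  print (rewriteAtCallSite (Top :: WithTopAndBot String))
```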