Class loading in Soot

These are some notes on how class loading and Jimplification work in Soot 2.5.*

Loading Classes

Here are some notes from Ondrej Lhotak on how this works...

The design of coffi requires that in order to bring some class C up to level N, you must first bring all classes that C references up to level N-1. In order to know which classes C references, you need to bring C up to level N-1.

(In the above, the word "references" is overloaded: the definition of what a class "references" changes depending on which level you are trying to resolve it to.)

The system of worklists is designed to implement these rather tricky constraints.

References can be recursive: C can reference C', and C' can also reference C. The private bringToX methods are called from coffi when resolving C. If they tried to resolve C' directly, the process of resolving C' would in turn try to resolve C, and you would have infinite recursion. The worklists avoid this by deferring C' instead of resolving it on the spot.
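The worklist idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Soot's actual resolver: class names are plain strings, the reference graph is a single fixed map (in Soot, what counts as a reference depends on the target level), and the class WorklistResolverSketch and its methods are hypothetical names. The level names mirror the ones used in these notes. Referenced classes are queued at level N-1 rather than resolved recursively, so the cycle between A and B terminates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Soot code. All names here are hypothetical.
class WorklistResolverSketch {
    static final int DANGLING = 0, HIERARCHY = 1, SIGNATURES = 2, BODIES = 3;

    private final Map<String, List<String>> refs;                     // assumed reference graph
    private final Map<String, Integer> level = new HashMap<>();       // current level per class
    private final List<Deque<String>> worklists = new ArrayList<>();  // one worklist per level

    WorklistResolverSketch(Map<String, List<String>> refs) {
        this.refs = refs;
        for (int i = DANGLING; i <= BODIES; i++) {
            worklists.add(new ArrayDeque<>());
        }
    }

    int levelOf(String cls) {
        return level.getOrDefault(cls, DANGLING);
    }

    // Request a level for one class, then drain the worklists from the
    // highest level down. Work at level i only enqueues at level i-1,
    // so a single descending pass is enough in this toy model.
    void resolve(String cls, int desired) {
        enqueue(cls, desired);
        for (int i = BODIES; i >= HIERARCHY; i--) {
            Deque<String> worklist = worklists.get(i);
            while (!worklist.isEmpty()) {
                bringTo(worklist.removeFirst(), i);
            }
        }
    }

    private void enqueue(String cls, int desired) {
        if (desired > DANGLING && levelOf(cls) < desired) {
            worklists.get(desired).addLast(cls);
        }
    }

    private void bringTo(String cls, int n) {
        if (n <= DANGLING || levelOf(cls) >= n) {
            return;
        }
        bringTo(cls, n - 1); // same class, lower level: depth is bounded by the number of levels
        level.put(cls, n);
        for (String ref : refs.getOrDefault(cls, Collections.emptyList())) {
            enqueue(ref, n - 1); // deferred to a worklist, never resolved recursively
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B"));      // A references B
        refs.put("B", Arrays.asList("A", "C")); // B references A (a cycle) and C
        WorklistResolverSketch r = new WorklistResolverSketch(refs);
        r.resolve("A", BODIES);
        // prints "3 2 1": A at BODIES, B at SIGNATURES, C at HIERARCHY
        System.out.println(r.levelOf("A") + " " + r.levelOf("B") + " " + r.levelOf("C"));
    }
}
```

Note that in this sketch a class is marked as being at level N before the classes it references have reached N-1; the constraint only holds again once resolve has drained all worklists.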

The original Raja/Clark design of coffi and the SootResolver required that you resolve all known classes to SIGNATURES first. It was then safe to resolve any class of interest to BODIES. As class libraries grew larger, resolving everything to SIGNATURES got very slow. (In addition, phantom classes were needed very frequently.) Running Soot on a single class, even in non-whole-program mode, could easily take 30 seconds or more. This was deemed unacceptable for the abc project. The options were to rewrite coffi from the ground up (and possibly even change the Jimple IR), or to build this complicated resolver to work around the limitations of coffi's original design. We settled on the second option.

The -full-resolver option brings back the old resolve-everything algorithm, if you're feeling nostalgic...

Loading Bodies

Method bodies are loaded at the following locations:

  • If -w is enabled: in soot.jimple.toolkits.callgraph.OnFlyCallGraphBuilder.processNewMethod, for every method that is deemed reachable in the call graph, starting from the given entry points.
  • In any case: in soot.PackManager.retrieveAllBodies, for all application classes. If -w is given with a single argument class, this will only load the static initializer of that class.
  • If pre-jimplify is on for Spark: in soot.jimple.spark.builder.ContextInsensitiveBuilder.preJimplify. When this option is set to true, Spark converts all available methods to Jimple before starting the points-to analysis. This allows the Jimplification time to be separated from the points-to time. However, it increases the total time and memory requirement, because all methods are Jimplified, rather than only those deemed reachable by the points-to analysis.
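The cost difference between loading bodies on demand and loading them all up front can be shown with a small model. This is only a sketch under assumptions: methods are plain strings, the call edges are given as a map (in Soot they are only discovered once a body has been loaded), and BodyLoadingSketch, loadReachable and loadAll are hypothetical names rather than Soot APIs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: not Soot code. All names here are hypothetical.
class BodyLoadingSketch {

    // On-demand style: "load" a body only when the method is found to be
    // reachable from an entry point, then follow its outgoing calls.
    static Set<String> loadReachable(Map<String, List<String>> calls,
                                     Collection<String> entryPoints) {
        Set<String> loaded = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(entryPoints);
        while (!worklist.isEmpty()) {
            String method = worklist.removeFirst();
            if (!loaded.add(method)) {
                continue; // body already loaded
            }
            for (String callee : calls.getOrDefault(method, Collections.emptyList())) {
                worklist.addLast(callee);
            }
        }
        return loaded;
    }

    // Up-front style: "load" every known method, reachable or not.
    static Set<String> loadAll(Map<String, List<String>> calls) {
        return new LinkedHashSet<>(calls.keySet());
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("main", Arrays.asList("a"));
        calls.put("a", Arrays.asList("b"));
        calls.put("b", Collections.emptyList());
        calls.put("unused", Arrays.asList("a")); // never reached from main
        System.out.println(loadReachable(calls, Arrays.asList("main"))); // [main, a, b]
        System.out.println(loadAll(calls));                              // [main, a, b, unused]
    }
}
```

In this model the up-front variant does strictly more work whenever some method is unreachable, which is the trade-off the pre-jimplify bullet describes.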

Some notes by Eric on memory consumption

Whenever a class is loaded into Soot, even if only to the HIERARCHY or SIGNATURES level, coffi creates a memory-expensive CoffiClass object that represents the constant pool etc. in parsed format. References to this CoffiClass are retained in CoffiMethodSource objects, so that if one later wants to retrieve the Jimple for that method, all this parsed data is still present. We should see whether this mechanism can be changed so that hierarchy and signature information is created much more cheaply, and so that the parsed data can be re-created on the fly if bodies really need to be loaded later on.
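One possible reading of this idea, as a toy sketch: the method source keeps only the names it needs and re-parses the class when a body is actually requested, instead of holding on to the parsed class. ParsedClass and LazyMethodSource are hypothetical stand-ins and not coffi or Soot classes; the parse counter exists only to make the behaviour observable.

```java
// Toy model only: not Soot or coffi code. All names here are hypothetical.
class LazyMethodSourceSketch {
    static int parses = 0; // counts how often the "expensive" parse runs

    // Stand-in for an expensive parsed class file (constant pool etc.).
    static final class ParsedClass {
        final String name;

        ParsedClass(String name) {
            this.name = name;
            parses++;
        }

        String bodyOf(String method) {
            return "body of " + name + "." + method;
        }
    }

    // Keeps only two strings; the parsed class is re-created on demand
    // and becomes garbage again once the body has been produced.
    static final class LazyMethodSource {
        private final String className;
        private final String methodName;

        LazyMethodSource(String className, String methodName) {
            this.className = className;
            this.methodName = methodName;
        }

        String getBody() {
            ParsedClass parsed = new ParsedClass(className);
            return parsed.bodyOf(methodName);
        }
    }

    public static void main(String[] args) {
        LazyMethodSource source = new LazyMethodSource("Foo", "bar");
        System.out.println(parses);           // 0: nothing parsed yet
        System.out.println(source.getBody()); // body of Foo.bar
        System.out.println(parses);           // 1: parsed only when the body was requested
    }
}
```

The trade-off in this sketch is that each body request pays for a fresh parse, so it only helps if bodies are requested for a small fraction of the loaded classes.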