Using Geometric Encoding based Context Sensitive Points to Analysis (geomPTA) - VictoryWangCN/soot GitHub Wiki

Contributed by Xiao Xiao ([email protected])

Introduction

geomPTA is a context sensitive points-to analysis based on SPARK. It uses the call graph generated by SPARK to build the full context sensitivity model for subsequent analysis. The cycles on call graph are handled by a special form of 1CFA to gain better precision. This is necessary because Java programs intend to have large cycles. The full algorithm details can be found in our technical report, where the evaluation results are only for reference because we are continuing to improve geomPTA.

Running geomPTA

Specifying the options "cg.spark enabled:true" and "cg.spark geom-pta:true" can start geomPTA. An note is that the option "cg.spark simplify-offline" should be false. A useful option is "cg.spark geom-runs", which is 1 by default. This option controls how many times the geomPTA will be iterated, where more iterations mean better precision. Usually, setting "cg.spark geom-runs:2" can obtain high precision and that does not cost 2X running time.

By default, geomPTA does not compute the context sensitive points-to information for all variables that are seen by SPARK. First, geomPTA performs a local equivalence merging to merge the pointers that are local to the same function and have same points-to information. This process is similar to the OVS algorithm. Second, geomPTA scans the whole program and artificially marks the functions in the packages with the names starting by "java.", "sun.", and etc. (For full list, please see function isJavaLibraryClass in class SootClass) as library functions. For rest of the functions, they are called application functions, although not all are really from user's code. We call the pointers in library functions that could impact the points-to information of pointers in application code supporting pointers and the pointers in application code core pointers. In order to soundly refine the call graph, the base pointers at virtual callsites are also marked as core pointers.

When geomPTA finished, only the core pointers gain the context sensitive points-to information. For non-core pointers, only the context insensitive points-to information computed by SPARK is available. If you want to change this default behavior to compute points-to information for all pointers, please set the option "cg.spark geom-app-only:false".

geomPTA also updates the call graph computed by SPARK, which can be obtained by Scene.v().getCallGraph(). After geomPTA, the virtual calls (including the interface calls) will be refined with up-to-date points-to information. The spurious call edges will be removed and the unreachable methods will also be cleaned. Generally speaking, the refined call graph will be much more precise than that generated by SPARK.

Note: If you iterate the call graph edges obtained by Scene.v().getCallGraph().listener(), the call edges removed in the updated call graph may also be present in the iterator. This is because all the edges ever added to the call graph are tracked.

Querying

Querying with SPARK interface

SPARK provides a set of overloading functions of reachingObject for querying points-to information of a given local variable, global variable, and an object field. The querying result is saved in PointsToSet, an interface that defines basic function for accessing the points-to result. However, it doesn't permit programmers to iterate the objects contained in the set. To do this, you can cast a PointsToSet object returned by reachingObject to the type PointsToSetInternal, which has a forall utility function that accepts a user defined visitor. Sample code for querying points-to information for pointer l is as follows:

PointsToAnalysis pta = Scene.v().getPointsToAnalysis();
GeomPointsTo geomPTA = (GeomPointsTo)pta;
PointsToSetInternal pts = (PointsToSetInternal)geomPTA.reachingObjects(l)

pts.forall( new P2SetVisitor() {

    public void visit(Node n) {

        // Do what you like with n, which is in the type of AllocNode

    }

});

If only SPARK is enabled, the cast GeomPointsTo geomPTA = (GeomPointsTo)pta should be changed to PAG sparkPTA = (PAG)pta.

Querying with geomPTA interface

In addition to the SPARK querying interface, geomPTA also provides its own interface soot.jimple.spark.geom.geomPA.GeomQueries for querying context sensitive points-to information in more sophisticated usage scenarios. In the following, we assume:

  1. The pointer is l;
  2. The enclosing function of l is func(l).

k-CFA query

The most common way for specifying context for l is using the last K call edges to func(l). We provide two functions for this kind of query:

public boolean contextsByCallChain(Edge[] callEdgeChain, Local l, PtSensVisitor visitor)

public boolean contextByCallChain(Edge[] callEdgeChain, Local l, SparkField field, PtSensVisitor visitor)

The two functions are similar except that the first function queries variable l and the second function queries the expression l.field. The call edges given in callEdgeChain are in the order that callEdgeChain[0] is the farthest call edge in the chain and callEdgeChain[k-1] is direct call edge of func(l). The last parameter visitor is a container that stores the querying result. Usually, you can use following code to create a container:

PtSensVisitor visitor = new Obj_full_extractor();

Query with sketched contexts

For this query, we specify any edge e and use all the paths from main to func(l) that pass e as the contexts for l. The functions related to this query are:

public boolean contexsByAnyCallEdge( Edge sootEdge, Local l, PtSensVisitor visitor )

public boolean contextsByAnyCallEdge(Edge sootEdge, Local l, SparkField field, PtSensVisitor visitor)

This query is special to the full context model, because we do not know how far is from the given edge e to func(l). Therefore, it is hard to answer it with a kCFA analysis as implemented in Paddle. One application of this query is proving that, can a given library function call could call back to application code? To answer the query, we first collect all candidate call back callsites reachable from the given entry library call. Then, for each candidate call back site p.foo(...), we check if the virtual call p.foo can resolve to an application function, considering the points-to information of p under the contexts that must include the given entry callsite. Sample code for this application can be found here.

Is alias query

The most common query is deciding if pointer l1 is an alias of pointer l2 under any contexts. We provide three functions for this query:

public boolean isAliasCI(Local l1, Local l2)

public boolean isAlias(IVarAbstraction pn1, IVarAbstraction pn2)

public boolean isAlias(Local l1, Local l2)

Of course, we can also test if two pointers can point to same object under specified contexts. For this purpose, we should borrow the querying interfaces given above and use the hasNonEmptyIntersection function provided by the class PtSensVisitor. Sample code is:

PtSensVisitor pts1 = new Obj_full_extractor();

PtSensVisitor pts2 = new Obj_full_extractor();

if (contexsByAnyCallEdge( Edge e1, Local l1, PtSensVisitor pts1 ))

    if (contexsByAnyCallEdge( Edge e2, Local l2, PtSensVisitor pts2 ))

         boolean ans = pts1.hasNonEmptyIntersection(pts2)

In the code, ans == true means the two pointers form an alias.

Conclusion and remark

geomPTA is still under development and please always use the develop version of soot to gain newest features. Currently, geomPTA can easily analyze subjects with JDK 1.7 and be aware of Android multi-threading. For any bugs or ideas, please contact me.