Adding attributes to class files (Advanced) - VictoryWangCN/soot GitHub Wiki

Soot can annotate class files: for instance, it can add information about which array bounds checks and null pointer checks are redundant. We anticipate that users of Soot may wish to add new attributes to class files. This tutorial uses the array bounds check attribute to illustrate the internal structure of Soot annotation and describes how to add new attributes via Soot. Before reading this tutorial, readers should be familiar with the basic Soot classes, like SootClass, SootField, SootMethod, Body, and Unit. Have a look into the tutorial Fundamental Soot Objects if you do not yet know about these classes.

This tutorial dives deep into the details of annotations. If you want to get a brief overview have a look at this tutorial.

Structure of annotation

The general description of our annotation framework can be found in our CASCON paper, A Framework for Optimizing Java Using Attributes. This tutorial explains practical issues related to implementing new attributes.

We first introduce the classes used for annotation. These classes reside in the soot.tagkit package.

Host

Any class which implements the Host interface promises that it can hold tags.

public interface Host
{
    /** Get a list of tags associated with the current object. */
    public List getTags();

    /** Returns a tag with the given name. */
    public Tag getTag(String aName);

    /** Adds a tag. */
    public void addTag(Tag t);

    /** Removes a tag with the given name. */
    public void removeTag(String name);

    /** Returns true if this host has a tag with the given name. */
    public boolean hasTag(String aName);
}

AbstractHost

Soot provides a default implementation of the Host interface in the form of AbstractHost. Unless you have a pressing desire to provide the functionality yourself, any classes which you would like to implement Host should subclass AbstractHost. In Soot, the classes SootClass, SootField, SootMethod, Body, and Unit inherit from AbstractHost. Instances of these classes know how to carry tags, in the form of the Tag interface.

Tag

In Soot, we represent any annotation by an object whose type implements the Tag interface. This interface defines one methods: getName() returns the unique name of the tag (note that tag names must not conflict with each other).

public interface Tag
{
    /** Returns the tag's name. */
    public String getName();
}

Attribute

The Attribute interface extends Tag; it promises that the associated tag has attribute-like data which can be read and written as an array of bytes.

public interface Attribute extends Tag
{
    /** Returns the tag's raw data. */
    public byte[] getValue() 
        throws AttributeValueException;

    /** Sets the value of the attribute from a byte[]. */
    public void setValue(byte[] v);
}

A Tag which is not an Attribute could be used to store arbitrary Soot information about a Host. An Attribute is something that would go in a class file.

TagAggregator

The array-bounds check analysis annotates individual instructions as it discovers whether or not their bounds checks are required. More generally, analyses will attach attributes directly to the Units in question. However, the Java class file structure does not make any provisions for directly attaching attributes to bytecodes, and attaching attributes directly to bytecodes would in any case be inefficient. Hence, we designed our attributes so that they would attach to a method in a tabular format: only one actual attribute is required per method body and tag type; this meta-attribute contains information about a number of different instructions.

An implementation of TagAggregator promises that it can combine all tags of some type into one big aggregated attribute, which can be attached to a method's code attribute. One implementation of a TagAggregator is soot.jimple.toolkits.annotation.tags.ArrayNullTagAggregator.

One of the things that an aggregator must do is decide which bytecode instructions a tag will be associated with, since each Jimple statement may turn into several bytecode instructions. Two subclasses of TagAggregator, ImportantTagAggregator and FirstTagAggregator, attach the tags to important instructions containing field or array references or method invocations, and to the first bytecode instruction for the Jimple statement, respectively.

Base64

This utility tool class allows the encoding of raw bytes to base64-encoded characters and the decoding of base64 characters back to raw bytes.

JasminAttribute

Attributes are generated by analysis phases in the form of strings containing labels in the unit body and their values; for instance, we might have the attribute "%label2%Aw==%label3%Ag==%label4%Ag==" associated with a method body. In order to include this attribute in a class file, exact PC values are needed for the labels.

The JasminAttribute class provides a decode method which takes a string of (label, value) pairs and a map from labels to PCs and emits raw data, ready for inclusion in a class file. This method is called by Jasmin after the PC values are known. Any attribute which uses (label, value) pairs can subclass JasminAttribute to get output to class files for free; other attributes hoping to be output to class files must subclass JasminAttribute and override the decode method.

The abstract getJasminValue() method must return a string that can be included when outputting a .jasmin file. This string later gets decoded by decode().

public abstract class JasminAttribute implements Attribute
{
    public static byte[] decode(String attr, Hashtable labelToPc);
    abstract public String getJasminValue(Map instToLabel);
}

CodeAttribute

This class provides an implementation of the abstract getJasminValue() method of JasminAttribute. The getJasminValue() method must return a string reflecting the contents of its CodeAttribute. It may use the provided instToLabel map to convert Units into labels used in the returned String.

public class CodeAttribute extends JasminAttribute
{
    public String getJasminValue(Map instToLabel);
}

This type of attribute is clearly intended to be used for attributes associated with code.

GenericAttribute

Java describes how three other types of attributes can be created: attributes may be associated with methods, fields and classes as well as code. Soot supports these attributes via the GenericAttribute class. Any such attribute can be created with an attribute name and a byte array value; it can then be attached to SootClass, SootField, or SootMethod.

public class GenericAttribute implements Attribute
{
    public GenericAttribute(String name, byte[] value);
    public String getName();
    public byte[] getValue();
}

The above classes provide APIs useful for adding new attributes. Soot attributes are represented as Tags, and are attached to Hosts. An exception is CodeAttribute. Because the tags for CodeAttribute are attached to Units, a TagAggregator is used to combine them. TagAggregators are instances of BodyTransformers, and are generally included in the tag pack.

Adding method attributes in Soot

Adding a code attribute is non-trivial, as it requires that an aggregator be provided. We first give a trivial example of adding a method attribute via GenericAttribute. The code can be found in ashes.examples.addattributes. It can also be downloaded here.

We proceed by adding a new phase to the jtp Pack, called annotexample.

public class AnnExample
{
    public static void main(String[] args)
    {
        /* adds the transformer. */
        PackManager.v().getPack("jtp").add(new
                        Transform("jtp.annotexample",
                        AnnExampleWrapper.v()));

        /* invokes Soot */
        soot.Main.main(args);
    }
}

The AnnExampleWrapper is a subclass of BodyTransformer, which implements the internalTransform method. It simply adds a string "Hello world!" as an attribute to every method. The attribute has the name "Example".

public class AnnExampleWrapper extends BodyTransformer
{
    private static AnnExampleWrapper instance =
        new AnnExampleWrapper();

    private AnnExampleWrapper() {};

    public static AnnExampleWrapper v()
    {
        return instance;
    }

    public void internalTransform(Body body, String phaseName, Map options)
    {
        SootMethod method = body.getMethod();
        String attr = new String("Hello world!");
        
        Tag example = new GenericAttribute("Example", attr.getBytes());
        method.addTag(example);
    }
}

We recompile foo and annotate it with new attributes. java AnnExample foo The annotated class file has an "Example" attribute for each method. The string "Hello world!" is in binary form.

public class foo extends java.lang.Object
filename                foo
compiled from           foo.jasmin
compiler version        45.3
access flags            33
constant pool           14 entries
ACC_SUPER flag          true

Attribute(s):
        SourceFile(foo.jasmin)

2 methods:
        public void <init>() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
        void footest() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>

public void <init>() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
Code(max_stack = 1, max_locals = 1, code_length = 5)
0:    aload_0
1:    invokespecial        java.lang.Object.<init> ()V (2)
4:    return

void footest() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
Code(max_stack = 3, max_locals = 1, code_length = 7)
0:    iconst_2
1:    newarray                <int>
3:    iconst_0
4:    iconst_1
5:    iastore
6:    return

The Array Bounds Check Annotation Example

In this section, we will use the array bounds check attribute to illustrate the process of creating a new code attribute.

The classes in this example are located in the soot.jimple.toolkits.annotation.arraycheck and .nullcheck packages.

Clearly we must be able to represent whether or not an array reference is safe. To do this, we first created the ArrayCheckTag class implementing (a subclass of) Tag. It is not an Attribute because the information is not in a form suitable for adding to a class file and setting the information directly is meaningless. ArrayCheckTag has a constructor with boolean parameters representing upper and lower array bounds checks. If a parameter is true, the respective bound check is needed. The getValue() method converts the boolean values to a byte value where the lowest two bits represent the bounds checks.

/** 
 * This tag represents the two bounds checks of an array reference.
 * The value <code>true</code> indicates that a check is needed.
 */
public ArrayCheckTag(boolean lower, boolean upper)
{
    lowerCheck = lower;
    upperCheck = upper;
}

/** Returns the value of this tag as a one-byte array for inclusion in
 * the class file. */
public byte[] getValue()
{
    byte[] value = new byte[1];

    value[0] = 0;
    
    if (lowerCheck)
        value[0] |= 0x01;
    
    if (upperCheck)
        value[0] |= 0x02;

    return value;
}

We designed an algorithm to analyze array bounds checks. The final phase of this algorithm attaches the analysis results to the various units as tags. This is accomplished with the following code:

Tag checkTag = new ArrayCheckTag(lowercheck, uppercheck);
stmt.addTag(checkTag);

As previously explained, code tags are attached to Units, but Units themselves do not have attributes. Thus, an aggregator is needed to group the attributes. Now, a null pointer check elimination algorithm has already executed, attaching NullCheckTags to Units. An ArrayNullTagAggregator will collect the NullCheckTags and ArrayCheckTags, combining these two tags into a single ArrayNullCheckTag per method body.

public class ArrayNullCheckTag implements OneByteCodeTag
{
    private final static String NAME = "ArrayNullCheckTag";
    public String getName();
    public byte[] getValue();
    public byte accumulate(byte other);
}

The ArrayNullTagAggregator implements the TagAggregator interface. It is called while Baf is generating its backend code. The wantTag() method returns true for tags that are to be considered by this aggregator. The considerTag method accumulates one (unit, tag) pair, typically encountered during Baf's traversal of the units.

public class ArrayNullTagAggregator extends TagAggregator
{    
    public ArrayNullTagAggregator( Singletons.Global g ) {}
    public static ArrayNullTagAggregator v() { return G.v().ArrayNullTagAggregator(); }

    public boolean wantTag( Tag t ) {
      return (t instanceof OneByteCodeTag);
    }
    public void considerTag(Tag t, Unit u)
    {
        Inst i = (Inst) u;
        if(!( i.containsInvokeExpr() || i.containsFieldRef() || i.containsArrayRef())) return;

        OneByteCodeTag obct = (OneByteCodeTag) t;

        if( units.size() == 0 || units.getLast() != u ) {
            units.add( u );
            tags.add( new ArrayNullCheckTag() );
        }
        ArrayNullCheckTag anct = (ArrayNullCheckTag) tags.getLast();
        anct.accumulate(obct.getValue()[0]);
    }
    
    public String aggregatedName()
    {
        return "ArrayNullCheckAttribute"; 
    }
}

We examine the annotation process on a simple example, foo.class.

public class foo
{    
    void footest()
    {        
        int[] c = new int[2];
        c[0] = 1;
    }
}

After compilation with javac, we can use the JavaClass tool to inspect the contents of the class file.

public class foo extends java.lang.Object
filename                foo
compiled from           foo.java
compiler version        45.3
access flags            33
constant pool           14 entries
ACC_SUPER flag          true

Attribute(s):
        SourceFile(foo.java)    

2 methods:
        public void <init>()
        void footest()

public void <init>()
Code(max_stack = 1, max_locals = 1, code_length = 5)
0:    aload_0
1:    invokespecial        java.lang.Object.<init> ()V (3)
4:    return

Attribute(s) = LineNumber(0, 1)

void footest()
Code(max_stack = 3, max_locals = 2, code_length = 9)
0:    iconst_2
1:    newarray                <int>
3:    astore_1
4:    aload_1
5:    iconst_0
6:    iconst_1
7:    iastore
8:    return

Attribute(s) = LineNumber(0, 5), LineNumber(4, 7), LineNumber(8, 3)

Great. Next, we annotate the class by executing

java soot.Main -A both foo

and inspect the annotated class file.

public class foo extends java.lang.Object
filename                foo
compiled from           foo.jasmin
compiler version        45.3
access flags            33
constant pool           14 entries
ACC_SUPER flag          true

Attribute(s):
        SourceFile(foo.jasmin)

2 methods:
        public void <init>()
        void footest()

public void <init>()
Code(max_stack = 1, max_locals = 1, code_length = 5)
0:    aload_0
1:    invokespecial        java.lang.Object.<init> ()V (2)
4:    return

Attribute(s) = (Unknown attribute ArrayNullCheckAttribute: 00 01 00)

void footest()
Code(max_stack = 3, max_locals = 1, code_length = 7)
0:    iconst_2
1:    newarray                <int>
3:    iconst_0
4:    iconst_1
5:    iastore
6:    return

Attribute(s) = (Unknown attribute ArrayNullCheckAttribute: 00 05 00)

We can see that an ArrayNullCheckAttribute has been added to the class file, and we can read the attribute data in hexadecimal.

Known shortcomings

Soot cannot currently preserve existing attributes in a class file when transforming and annotating it. In the foo example, any debug information from javac would be lost after annotation.

⚠️ **GitHub.com Fallback** ⚠️