Adding attributes to class files (Basic)

This tutorial will show you how to add various kinds of class file attributes to a program, created from scratch, with Soot. It extends the program introduced in Creating a class file from scratch. Another, slightly more detailed, illustration of this process is available in the Soot tutorial Adding attributes to class files.

Class file Attributes

Attributes are name-value pairs that can be associated with various class file structures. The Java VM spec (chapter 4.7) defines four different kinds of attribute. (Or, to be more precise, attributes can be attached to four different class file structures.) The four are: class, field, method, and code.

All attributes have the following structure:

attribute_info {
    u2 attribute_name_index;
    u4 attribute_length;
    u1 info[attribute_length];
}
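
To make the layout concrete, here is a minimal sketch that serializes one attribute_info structure using only the Java standard library. It does not use Soot; the class name AttributeInfoSketch, its helper methods, and the constant pool index 7 are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class AttributeInfoSketch {

    // Serialize one attribute_info structure in class file (big-endian) byte order.
    public static byte[] encode(int nameIndex, byte[] info) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeShort(nameIndex);  // u2 attribute_name_index
            dos.writeInt(info.length);  // u4 attribute_length
            dos.write(info);            // u1 info[attribute_length]
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Render bytes as space-separated two-digit hex values.
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 7 is a hypothetical constant pool index, chosen only for this example.
        byte[] bytes = encode(7, "foo".getBytes(StandardCharsets.UTF_8));
        System.out.println(toHex(bytes)); // prints: 00 07 00 00 00 03 66 6f 6f
    }
}
```

The first two bytes are the name index, the next four are the payload length, and the remaining bytes are the payload itself.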

We will extend the program from the Creating a class file from scratch tutorial to add attributes of each kind to the class file it produces.

Soot Hosts and Tags

We can attach arbitrary metadata to a variety of Soot data structures. Some such metadata may be converted into class file attributes when a class file is produced, and some may be for internal Soot use only. Two interfaces, in the soot.tagkit package, define this metadata facility: Host and Tag. A Tag is a named piece of metadata and a Host is an object that may have any number of uniquely named Tags attached.

If we wish to define a Tag that will be converted to a class file attribute, we use the Attribute subinterface and attach it to a Soot structure that corresponds to one of the four class file structures to which attributes can be attached.

Adding Class and Method Attributes

In this section, we will extend our program to add class and method attributes to the resultant HelloWorld class. To add class and method attributes, we will use the GenericAttribute class, which is a default implementation of Attribute suitable for simple class, method and field attributes.

The following code snippets show how to add class and method attributes. The class attribute will have the value foo and the method attribute will have no data. The generated class file will now have these attributes.

First, we create our class attribute and add it to the SootClass. In the resultant class file, the attribute_info structure's attribute_name_index field will point to the constant pool entry holding the string "ca.mcgill.sable.MyClassAttr", and the info field will contain the bytes of the string foo, as returned by getBytes().

    // create and add the class attribute, with data "foo"
    GenericAttribute classAttr = new GenericAttribute(
        "ca.mcgill.sable.MyClassAttr",
        "foo".getBytes());
    sClass.addTag(classAttr);

We do basically the same thing for a method attribute:

    // create and add the method attribute, with no data
    GenericAttribute mAttr = new GenericAttribute(
        "ca.mcgill.sable.MyMethodAttr",
        "".getBytes());
    method.addTag(mAttr);
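
The payloads passed to the two attributes can be inspected without Soot. The sketch below is only an illustration (the class PayloadBytesSketch and its methods are hypothetical). It passes an explicit charset to getBytes, because the no-argument form uses the default charset, which on older JDKs depends on the platform.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PayloadBytesSketch {

    // Payload comparable to the class attribute above: the bytes of "foo".
    public static byte[] classPayload() {
        return "foo".getBytes(StandardCharsets.UTF_8);
    }

    // Payload comparable to the method attribute above: no data at all.
    public static byte[] methodPayload() {
        return new byte[0];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(classPayload())); // prints: [102, 111, 111]
        System.out.println(methodPayload().length);          // prints: 0
    }
}
```

An attribute with an empty payload is still written with its name index and a length of zero; only the info bytes are absent.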

If the HelloWorld class being produced by this program had any fields, we could similarly add an attribute to a SootField.

Adding Code Attributes

According to the Java VM Spec, every method structure in a class file will have exactly one Code_attribute, unless the method is native or abstract, in which case it will have zero. The Code_attribute, which is where the bytecode instructions of a method are stored, may have an arbitrary number of attributes attached. When we refer to code attributes in this tutorial, it is these optional attributes attached to the class file Code_attribute structure to which we refer. They have the same structure as class, method, and field attributes.

In general, we use code attributes to annotate bytecode instructions in the body of a method, since it is not possible to annotate instructions directly. For example, the LineNumberTable attribute annotates bytecode instructions with the source file line numbers of their source Java statements. It is usual that the value of a code attribute be the encoding of a table associating bytecode offsets to values.
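
As an illustration of such a table, the sketch below encodes a payload in the shape of a LineNumberTable: a u2 entry count followed by (u2 start_pc, u2 line_number) pairs. It uses only the Java standard library; the class name LineNumberPayloadSketch and the sample offsets and line numbers are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LineNumberPayloadSketch {

    // Encode a u2 entry count followed by (u2 start_pc, u2 line_number) pairs.
    public static byte[] encode(int[][] entries) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeShort(entries.length);
            for (int[] entry : entries) {
                dos.writeShort(entry[0]); // bytecode offset
                dos.writeShort(entry[1]); // source line number
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical table: offset 0 maps to line 3, offset 8 maps to line 4.
        int[][] entries = { {0, 3}, {8, 4} };
        byte[] payload = encode(entries);
        System.out.println(payload.length); // prints: 10
    }
}
```

The payload is two bytes for the count plus four bytes per entry, which is why two entries produce ten bytes.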

The Soot structure that corresponds to the class file's Code_attribute structure is the Body class. Thus in order to add a code attribute to the resultant class file, we must add an Attribute, in Soot, to the method's Body. However, for several reasons it would be very inconvenient to implement instruction tagging in Soot by directly manipulating an attribute attached to a method's Body. Instead, Soot allows us to tag the instructions themselves, and provides facilities for automatically converting these instruction tags into a single code attribute when the class file is created.

Tagging `Unit`s in Soot

Soot Bodys can be in one of several different intermediate representations. In our program, the Body we are creating is populated with Jimple statements. Statements in each intermediate representation subclass Unit, which implements Host, and so to each we may add a Tag. Since tags attached to such statements don't correspond directly to an attribute in the produced class file, but rather are converted later by Soot, we add Tags not Attributes.

Suppose we wish to attach a tag named "ca.mcgill.sable.MyTag" to individual statements, and that the tag will carry an integer value. First we define a new class representing the tag:

private class MyTag implements Tag {

    int value;

    public MyTag(int value) {
        this.value = value;
    }

    public String getName() {
        return "ca.mcgill.sable.MyTag";
    }

    // output the value as a 4-byte array
    public byte[] getValue() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(4);
        DataOutputStream dos = new DataOutputStream(baos);
        try {
            dos.writeInt(value);
            dos.flush();
        } catch(IOException e) {
            System.err.println(e);
            throw new RuntimeException(e);
        }
        return baos.toByteArray();
    }
}
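
The encoding used by getValue can be checked in isolation. The following sketch, which does not depend on Soot, writes an int with DataOutputStream and reads it back with DataInputStream; the class IntTagValueSketch and its methods are hypothetical names used only here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class IntTagValueSketch {

    // Same encoding as getValue above: a 4-byte, big-endian integer.
    public static byte[] encode(int value) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(4);
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Inverse operation, as a tool reading the attribute might perform it.
    public static int decode(byte[] bytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return dis.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = encode(258);
        System.out.println(Arrays.toString(encoded)); // prints: [0, 0, 1, 2]
        System.out.println(decode(encoded));          // prints: 258
    }
}
```

Because writeInt emits the most significant byte first, the value 258 (0x00000102) appears as the bytes 0, 0, 1, 2.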

Next, we add it to several Jimple statements in our program:

        // add "l0 = @parameter0"
        tmpUnit = Jimple.v().newIdentityStmt(arg,
            Jimple.v().newParameterRef(
                ArrayType.v(RefType.v("java.lang.String"), 1), 0));
        tmpUnit.addTag(new MyTag(1));
        units.add(tmpUnit);

        // insert "tmpRef.println("Hello world!")"
        {
            SootMethod toCall = Scene.v().getMethod(
                "<java.io.PrintStream: void println(java.lang.String)>");
            tmpUnit = Jimple.v().newInvokeStmt(Jimple.v().newVirtualInvokeExpr(
                tmpRef, toCall.makeRef(), StringConstant.v("Hello world!")));
            tmpUnit.addTag(new MyTag(2));
            units.add(tmpUnit);
        }

We now have Tags attached to some of the statements in the method Body. We will now look at how to convert these into a code attribute that can be written out with the class file.

Tag Aggregators

In order to convert the Tags on statements into a code attribute that can be written out in the class file, we must define a TagAggregator. A TagAggregator is a Soot BodyTransformer that accepts a Body with tagged instructions, and produces a Body with an equivalent code attribute. We could use the GenericAttribute class to represent the attribute structure written in the class file; in this section, we will be using Soot's CodeAttribute class, which is a default implementation of a bytecode offset to value table. (To try to create an Attribute by hand with the same data would not be easy: dealing with the Body in an intermediate representation as we are, we don't yet know what the resultant bytecode offsets will be. CodeAttribute takes care of this for us.)

A TagAggregator works by constructing a list of Units and a list of Tags. The list of Units denotes eventual bytecode offsets in the offset-value table, and the list of Tags the corresponding values. These lists must be the same length. We need to define three methods. The first, wantTag, is a predicate on tags that selects only the tags we are interested in; in our case, those that are instances of the MyTag class. The second, considerTag, populates the two lists. The third, aggregatedName, returns the name of the resultant attribute.

class MyTagAggregator extends TagAggregator {

    public String aggregatedName() {
        return "ca.mcgill.sable.MyTag";
    }

    public boolean wantTag(Tag t) {
        return (t instanceof MyTag);
    }

    public void considerTag(Tag t, Unit u) {
        units.add(u);
        tags.add(t);
    }
}
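
The two-list bookkeeping can be pictured with a small stand-alone model. This is not Soot's implementation: Stmt, IntTag, and AggregatorModelSketch are hypothetical stand-ins, and the aggregate loop is only an assumption about how statements and their tags might be visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AggregatorModelSketch {

    // Hypothetical stand-in for a statement that can carry tags.
    public static final class Stmt {
        public final String text;
        public final List<Object> tags = new ArrayList<>();
        public Stmt(String text) { this.text = text; }
    }

    // Hypothetical tag type that this aggregator is interested in.
    public static final class IntTag {
        public final int value;
        public IntTag(int value) { this.value = value; }
    }

    // Parallel lists: units.get(i) is the statement that carried tags.get(i).
    public final List<Stmt> units = new ArrayList<>();
    public final List<IntTag> tags = new ArrayList<>();

    public boolean wantTag(Object t) {
        return t instanceof IntTag;
    }

    public void considerTag(Object t, Stmt u) {
        units.add(u);
        tags.add((IntTag) t);
    }

    // Visit every tag on every statement, keeping only the wanted ones.
    public void aggregate(List<Stmt> body) {
        for (Stmt s : body) {
            for (Object t : s.tags) {
                if (wantTag(t)) {
                    considerTag(t, s);
                }
            }
        }
    }

    public static void main(String[] args) {
        Stmt a = new Stmt("l0 = @parameter0");
        a.tags.add(new IntTag(1));
        Stmt b = new Stmt("return");
        b.tags.add("an unrelated tag");
        Stmt c = new Stmt("println");
        c.tags.add(new IntTag(2));

        AggregatorModelSketch agg = new AggregatorModelSketch();
        agg.aggregate(Arrays.asList(a, b, c));
        System.out.println(agg.units.size() + " " + agg.tags.size()); // prints: 2 2
    }
}
```

Adding to both lists in the same call is what keeps them the same length, so each collected tag stays paired with the statement it came from.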

TagAggregator is a subclass of BodyTransformer, and to use it we pass a method body to its transform method. However, the TagAggregator transform method expects a Baf Body, and the Body we are working with is currently a Jimple Body. We must, therefore, first convert the Body to Baf. Each tag on each Jimple statement will be propagated to the corresponding Baf instructions.

    MyTagAggregator mta = new MyTagAggregator();
    // convert the body to Baf
    method.setActiveBody(
        Baf.v().newBody((JimpleBody) method.getActiveBody()));
    // aggregate the tags and produce a CodeAttribute
    mta.transform(method.getActiveBody());

Since our method now has a Baf Body, we must modify the code used to write the class out to a file:

    // write the class to a file
    String fileName = SourceLocator.v()
        .getFileNameFor(sClass, Options.output_format_class);
    OutputStream streamOut = new JasminOutputStream(
        new FileOutputStream(fileName));
    PrintWriter writerOut = new PrintWriter(
        new OutputStreamWriter(streamOut));
    AbstractJasminClass jasminClass = new soot.baf.JasminClass(sClass);
    jasminClass.print(writerOut);
    writerOut.flush();
    streamOut.close();

Finally, we need to register MyTagAggregator as a new transform. If you are tagging the results of another Soot transform, this registration is required for those results to be propagated to the class file. We add MyTagAggregator as a transform to the tag pack:

    PackManager.v().getPack("tag").add(new Transform("tag.mta", 
        new MyTagAggregator()));

The resultant class file will now contain a code attribute that encodes a table mapping bytecode offsets to values.

The complete example can be found in this file.