Adding attributes to class files (Advanced) - VictoryWangCN/soot GitHub Wiki
Soot can annotate class files: for instance, it can add information about which array bounds checks and null pointer checks are redundant. We anticipate that users of Soot may wish to add new attributes to class files. This tutorial uses the array bounds check attribute to illustrate the internal structure of Soot annotation and describes how to add new attributes via Soot. Before reading this tutorial, readers should be familiar with the basic Soot classes, like SootClass
, SootField
, SootMethod
, Body
, and Unit
. Have a look into the tutorial Fundamental Soot Objects if you do not yet know about these classes.
This tutorial dives deep into the details of annotations. If you want to get a brief overview have a look at this tutorial.
The general description of our annotation framework can be found in our CASCON paper, A Framework for Optimizing Java Using Attributes. This tutorial explains practical issues related to implementing new attributes.
We first introduce the classes used for annotation. These classes reside in the soot.tagkit package.
Host
Any class which implements the Host interface promises that it can hold tags.
public interface Host
{
/** Get a list of tags associated with the current object. */
public List getTags();
/** Returns a tag with the given name. */
public Tag getTag(String aName);
/** Adds a tag. */
public void addTag(Tag t);
/** Removes a tag with the given name. */
public void removeTag(String name);
/** Returns true if this host has a tag with the given name. */
public boolean hasTag(String aName);
}
AbstractHost
Soot provides a default implementation of the Host
interface in the form of AbstractHost
. Unless you have a pressing desire to provide the functionality yourself, any classes which you would like to implement Host
should subclass AbstractHost
. In Soot, the classes SootClass
, SootField
, SootMethod
, Body
, and Unit
inherit from AbstractHost
. Instances of these classes know how to carry tags, in the form of the Tag
interface.
Tag
In Soot, we represent any annotation by an object whose type implements the Tag interface. This interface defines one methods: getName()
returns the unique name of the tag (note that tag names must not conflict with each other).
public interface Tag
{
/** Returns the tag's name. */
public String getName();
}
Attribute
The Attribute
interface extends Tag
; it promises that the associated tag has attribute-like data which can be read and written as an array of bytes.
public interface Attribute extends Tag
{
/** Returns the tag's raw data. */
public byte[] getValue()
throws AttributeValueException;
/** Sets the value of the attribute from a byte[]. */
public void setValue(byte[] v);
}
A Tag
which is not an Attribute could be used to store arbitrary Soot information about a Host
. An Attribute
is something that would go in a class file.
TagAggregator
The array-bounds check analysis annotates individual instructions as it discovers whether or not their bounds checks are required. More generally, analyses will attach attributes directly to the Units in question. However, the Java class file structure does not make any provisions for directly attaching attributes to bytecodes, and attaching attributes directly to bytecodes would in any case be inefficient. Hence, we designed our attributes so that they would attach to a method in a tabular format: only one actual attribute is required per method body and tag type; this meta-attribute contains information about a number of different instructions.
An implementation of TagAggregator
promises that it can combine all tags of some type into one big aggregated attribute, which can be attached to a method's code attribute. One implementation of a TagAggregator
is soot.jimple.toolkits.annotation.tags.ArrayNullTagAggregator
.
One of the things that an aggregator must do is decide which bytecode instructions a tag will be associated with, since each Jimple statement may turn into several bytecode instructions. Two subclasses of TagAggregator
, ImportantTagAggregator
and FirstTagAggregator
, attach the tags to important instructions containing field or array references or method invocations, and to the first bytecode instruction for the Jimple statement, respectively.
Base64
This utility tool class allows the encoding of raw bytes to base64-encoded characters and the decoding of base64 characters back to raw bytes.
JasminAttribute
Attributes are generated by analysis phases in the form of strings containing labels in the unit body and their values; for instance, we might have the attribute "%label2%Aw==%label3%Ag==%label4%Ag==" associated with a method body. In order to include this attribute in a class file, exact PC values are needed for the labels.
The JasminAttribute
class provides a decode
method which takes a string of (label, value) pairs and a map from labels to PCs and emits raw data, ready for inclusion in a class file. This method is called by Jasmin after the PC values are known. Any attribute which uses (label, value) pairs can subclass JasminAttribute
to get output to class files for free; other attributes hoping to be output to class files must subclass JasminAttribute
and override the decode method.
The abstract getJasminValue()
method must return a string that can be included when outputting a .jasmin
file. This string later gets decoded by decode()
.
public abstract class JasminAttribute implements Attribute
{
public static byte[] decode(String attr, Hashtable labelToPc);
abstract public String getJasminValue(Map instToLabel);
}
CodeAttribute
This class provides an implementation of the abstract getJasminValue()
method of JasminAttribute
. The getJasminValue()
method must return a string reflecting the contents of its CodeAttribute
. It may use the provided instToLabel
map to convert Units into labels used in the returned String.
public class CodeAttribute extends JasminAttribute
{
public String getJasminValue(Map instToLabel);
}
This type of attribute is clearly intended to be used for attributes associated with code.
GenericAttribute
Java describes how three other types of attributes can be created: attributes may be associated with methods, fields and classes as well as code. Soot supports these attributes via the GenericAttribute
class. Any such attribute can be created with an attribute name and a byte array value; it can then be attached to SootClass
, SootField
, or SootMethod
.
public class GenericAttribute implements Attribute
{
public GenericAttribute(String name, byte[] value);
public String getName();
public byte[] getValue();
}
The above classes provide APIs useful for adding new attributes. Soot attributes are represented as Tag
s, and are attached to Host
s. An exception is CodeAttribute
. Because the tags for CodeAttribute
are attached to Unit
s, a TagAggregator
is used to combine them. TagAggregator
s are instances of BodyTransformer
s, and are generally included in the tag
pack.
Adding a code attribute is non-trivial, as it requires that an aggregator be provided. We first give a trivial example of adding a method attribute via GenericAttribute
. The code can be found in ashes.examples.addattributes. It can also be downloaded here.
We proceed by adding a new phase to the jtp Pack, called annotexample.
public class AnnExample
{
public static void main(String[] args)
{
/* adds the transformer. */
PackManager.v().getPack("jtp").add(new
Transform("jtp.annotexample",
AnnExampleWrapper.v()));
/* invokes Soot */
soot.Main.main(args);
}
}
The AnnExampleWrapper
is a subclass of BodyTransformer
, which implements the internalTransform
method. It simply adds a string "Hello world!" as an attribute to every method. The attribute has the name "Example".
public class AnnExampleWrapper extends BodyTransformer
{
private static AnnExampleWrapper instance =
new AnnExampleWrapper();
private AnnExampleWrapper() {};
public static AnnExampleWrapper v()
{
return instance;
}
public void internalTransform(Body body, String phaseName, Map options)
{
SootMethod method = body.getMethod();
String attr = new String("Hello world!");
Tag example = new GenericAttribute("Example", attr.getBytes());
method.addTag(example);
}
}
We recompile foo and annotate it with new attributes.
java AnnExample foo
The annotated class file has an "Example" attribute for each method. The string "Hello world!" is in binary form.
public class foo extends java.lang.Object
filename foo
compiled from foo.jasmin
compiler version 45.3
access flags 33
constant pool 14 entries
ACC_SUPER flag true
Attribute(s):
SourceFile(foo.jasmin)
2 methods:
public void <init>() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
void footest() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
public void <init>() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
Code(max_stack = 1, max_locals = 1, code_length = 5)
0: aload_0
1: invokespecial java.lang.Object.<init> ()V (2)
4: return
void footest() <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
Code(max_stack = 3, max_locals = 1, code_length = 7)
0: iconst_2
1: newarray <int>
3: iconst_0
4: iconst_1
5: iastore
6: return
In this section, we will use the array bounds check attribute to illustrate the process of creating a new code attribute.
The classes in this example are located in the soot.jimple.toolkits.annotation.arraycheck
and .nullcheck
packages.
Clearly we must be able to represent whether or not an array reference is safe. To do this, we first created the ArrayCheckTag
class implementing (a subclass of) Tag
. It is not an Attribute
because the information is not in a form suitable for adding to a class file and setting the information directly is meaningless. ArrayCheckTag
has a constructor with boolean parameters representing upper and lower array bounds checks. If a parameter is true, the respective bound check is needed. The getValue()
method converts the boolean values to a byte value where the lowest two bits represent the bounds checks.
/**
* This tag represents the two bounds checks of an array reference.
* The value <code>true</code> indicates that a check is needed.
*/
public ArrayCheckTag(boolean lower, boolean upper)
{
lowerCheck = lower;
upperCheck = upper;
}
/** Returns the value of this tag as a one-byte array for inclusion in
* the class file. */
public byte[] getValue()
{
byte[] value = new byte[1];
value[0] = 0;
if (lowerCheck)
value[0] |= 0x01;
if (upperCheck)
value[0] |= 0x02;
return value;
}
We designed an algorithm to analyze array bounds checks. The final phase of this algorithm attaches the analysis results to the various units as tags. This is accomplished with the following code:
Tag checkTag = new ArrayCheckTag(lowercheck, uppercheck);
stmt.addTag(checkTag);
As previously explained, code tags are attached to Unit
s, but Unit
s themselves do not have attributes. Thus, an aggregator is needed to group the attributes. Now, a null pointer check elimination algorithm has already executed, attaching NullCheckTag
s to Unit
s. An ArrayNullTagAggregator
will collect the NullCheckTag
s and ArrayCheckTag
s, combining these two tags into a single ArrayNullCheckTag
per method body.
public class ArrayNullCheckTag implements OneByteCodeTag
{
private final static String NAME = "ArrayNullCheckTag";
public String getName();
public byte[] getValue();
public byte accumulate(byte other);
}
The ArrayNullTagAggregator
implements the TagAggregator
interface. It is called while Baf is generating its backend code. The wantTag()
method returns true
for tags that are to be considered by this aggregator. The considerTag
method accumulates one (unit, tag) pair, typically encountered during Baf's traversal of the units.
public class ArrayNullTagAggregator extends TagAggregator
{
public ArrayNullTagAggregator( Singletons.Global g ) {}
public static ArrayNullTagAggregator v() { return G.v().ArrayNullTagAggregator(); }
public boolean wantTag( Tag t ) {
return (t instanceof OneByteCodeTag);
}
public void considerTag(Tag t, Unit u)
{
Inst i = (Inst) u;
if(!( i.containsInvokeExpr() || i.containsFieldRef() || i.containsArrayRef())) return;
OneByteCodeTag obct = (OneByteCodeTag) t;
if( units.size() == 0 || units.getLast() != u ) {
units.add( u );
tags.add( new ArrayNullCheckTag() );
}
ArrayNullCheckTag anct = (ArrayNullCheckTag) tags.getLast();
anct.accumulate(obct.getValue()[0]);
}
public String aggregatedName()
{
return "ArrayNullCheckAttribute";
}
}
We examine the annotation process on a simple example, foo.class
.
public class foo
{
void footest()
{
int[] c = new int[2];
c[0] = 1;
}
}
After compilation with javac
, we can use the JavaClass tool to inspect the contents of the class file.
public class foo extends java.lang.Object
filename foo
compiled from foo.java
compiler version 45.3
access flags 33
constant pool 14 entries
ACC_SUPER flag true
Attribute(s):
SourceFile(foo.java)
2 methods:
public void <init>()
void footest()
public void <init>()
Code(max_stack = 1, max_locals = 1, code_length = 5)
0: aload_0
1: invokespecial java.lang.Object.<init> ()V (3)
4: return
Attribute(s) = LineNumber(0, 1)
void footest()
Code(max_stack = 3, max_locals = 2, code_length = 9)
0: iconst_2
1: newarray <int>
3: astore_1
4: aload_1
5: iconst_0
6: iconst_1
7: iastore
8: return
Attribute(s) = LineNumber(0, 5), LineNumber(4, 7), LineNumber(8, 3)
Great. Next, we annotate the class by executing
java soot.Main -A both foo
and inspect the annotated class file.
public class foo extends java.lang.Object
filename foo
compiled from foo.jasmin
compiler version 45.3
access flags 33
constant pool 14 entries
ACC_SUPER flag true
Attribute(s):
SourceFile(foo.jasmin)
2 methods:
public void <init>()
void footest()
public void <init>()
Code(max_stack = 1, max_locals = 1, code_length = 5)
0: aload_0
1: invokespecial java.lang.Object.<init> ()V (2)
4: return
Attribute(s) = (Unknown attribute ArrayNullCheckAttribute: 00 01 00)
void footest()
Code(max_stack = 3, max_locals = 1, code_length = 7)
0: iconst_2
1: newarray <int>
3: iconst_0
4: iconst_1
5: iastore
6: return
Attribute(s) = (Unknown attribute ArrayNullCheckAttribute: 00 05 00)
We can see that an ArrayNullCheckAttribute
has been added to the class file, and we can read the attribute data in hexadecimal.
Soot cannot currently preserve existing attributes in a class file when transforming and annotating it. In the foo
example, any debug information from javac
would be lost after annotation.