Simpl Specifications - ecologylab/simpl GitHub Wiki

Introduction

S.IM.PL stands for Support for Information Mapping across Programming Languages. It is a system that lets developers take persistent data in one language, serialize it into a string, and then deserialize that string into data in another language. This greatly increases the usability of code across multiple systems. For example, a database of C++ objects might be difficult to use from Java, but with S.IM.PL we can serialize the objects into formatted strings and deserialize them in Java. While it may be possible to reconstruct each object manually, the S.IM.PL approach is, well, simpler.

S.IM.PL is important because it facilitates communication across systems. It breaks down barriers between programming languages that could otherwise greatly hinder project development. Moreover, it is consistent: an object whose class is defined in two languages with the same name, fields, etc. should serialize to the same string from either language, and that string should deserialize into a copy of the original object.

Source and Target Languages

Some languages make implementing S.IM.PL much easier than others; these are categorized as "source" languages. S.IM.PL serialization requires source languages to support reflection and annotations. Java and C# fall within this category. Languages without such support are considered "target" languages; these include Objective-C, Python, JavaScript, and C++.

Currently, S.IM.PL fully supports Java and C#, and contains limited or unreleased support for Python and Objective-C.

DBAL overview

We use DBAL (Data Binding Annotation Language) to mark the classes, fields, etc. that S.IM.PL will recognize and incorporate into its type system. These are the classes and fields that you want propagated through serialization and data structure definitions. Any type that you intend to serialize or deserialize must be annotated accordingly. DBAL annotations also specify how a class or field is de/serialized, and S.IM.PL supports fine-grained control of the de/serialization process through them. The annotations are made available at runtime through reflection.

DBAL annotations should be added to your source code at the beginning of each field declaration that you want S.IM.PL to use. The example below shows a very basic composite object containing two scalar fields (scalarA and scalarB), which must be annotated with "@simpl_scalar" before S.IM.PL will recognize them and serialize/deserialize them with the parent object.

    //Example: basicComposite.java
    package ecologylab.fundamental;

    import ecologylab.serialization.annotations.simpl_scalar;

    public class basicComposite {

        @simpl_scalar
        public Integer scalarA;

        @simpl_scalar
        public int scalarB;

        public basicComposite(){}

        public basicComposite(int i)
        {
            this.scalarA = i;
            this.scalarB = i+1;
        }
    }
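To illustrate how annotations become visible at runtime through reflection, here is a minimal, self-contained sketch. It does not use the S.IM.PL library: `SketchScalar`, `AnnotatedPair`, and `AnnotationScanSketch` are hypothetical names invented for this illustration, and the real S.IM.PL discovery logic may work differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for @simpl_scalar (an assumption, not the library's annotation).
// RUNTIME retention is what keeps the annotation visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchScalar {}

class AnnotatedPair {
    @SketchScalar public Integer scalarA = 1;
    @SketchScalar public int scalarB = 2;
    public String ignored = "not annotated";
}

class AnnotationScanSketch {
    // Returns the names of all fields carrying @SketchScalar, sorted for a stable order.
    static List<String> annotatedFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(SketchScalar.class)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // The unannotated field "ignored" is skipped.
        System.out.println(annotatedFieldNames(AnnotatedPair.class)); // prints [scalarA, scalarB]
    }
}
```

A field without the annotation is simply never seen by the scan, which mirrors why unannotated fields are left out of de/serialization.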

General Tags

These tags are required for a field to be recognized as de/serializable. They should be placed at the beginning of the field's declaration. This allows S.IM.PL to detect which fields are important and treat them appropriately.

Annotation Quick Reference Table

Tag Name	Purpose	Parameters
@simpl_scalar	Declares that a scalar-typed field should be de/serialized.	None
@simpl_composite	Declares a field with a user-defined class type.	None
@simpl_collection( ["TagName"])	Declares a monomorphic collection such as a List or ArrayList	Takes "TagName" as a parameter, which is a user-given string used as an identifier for the collection
@simpl_map( ["TagName"])	Declares a map structure for de/serialization	Takes "TagName" as a parameter, which is a user-given string used as an identifier for the map
@simpl_inherit	This tag dictates that a class should use the annotations associated with the parent class. Adding this implies that the parent class contains fields required for serialization.	None

Formatting Tags

These tags allow fine-grained control over the serialization process by the user. Adding these tags to the beginning of a declaration allows control over a specific behavior during the de/serialization process.

Annotation Quick Reference Table

Tag Name	Purpose	Parameters
@simpl_wrap	Makes sure the field is serialized as a child of a parent object. In collections and composites, it determines whether the outer composite or the containing collection is serialized.	None
@simpl_nowrap	The opposite of @simpl_wrap: this tag makes sure the field is not serialized as the child of a parent object.	None
@simpl_scope	Allows dynamic specification of a polymorphic set of classes through a type system scope.	Requires, as a parameter, the unique identifier of a registered type system scope data structure.
@simpl_classes	Statically specifies a set of classes to consider for an Object's Type at runtime.	Takes the set of candidate classes as a parameter.
@simpl_tag	Allows the user to explicitly assign a tag name for a field or class.	Takes the tag name as a string parameter.
@simpl_format( ["Format RegEx"])	Used in conjunction with @simpl_scalar. Enforces a format for the scalar with the regular expression parameter.	Takes the regular expression as a string parameter.
@simpl_other_tags( [Tag Names])	Unlike @simpl_tag, which always serializes to the given tag, @simpl_other_tags is used only for deserialization; the tags in the parameter are treated as aliases.	Takes either a single tag name (like "regex_split") or a series of tag names (like "{"field_name", "described_class_name"}"). All of these tag names are treated as aliases. If more than one tag is given, surround the set with braces and comma-separate the tags.
@simpl_Hints( Hint.[Attribute])	This annotation is used for fine-grained control over the serialized representation.	Takes a "Hint." attribute that accepts the following values: Hint.XML_ATTRIBUTE, Hint.XML_LEAF, Hint.XML_LEAFCDATA, Hint.XML_TEXT, Hint.XML_TEXT_CDATA, or Hint.UNDEFINED.
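The relationship between a primary tag and its deserialization-only aliases can be sketched as a lookup table. This is an illustration under assumptions, not the library's implementation: `SketchTag` and `SketchOtherTags` are hypothetical stand-ins for @simpl_tag and @simpl_other_tags, and `Article` and `TagAliasSketch` are invented for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for @simpl_tag and @simpl_other_tags (assumptions).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchTag { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchOtherTags { String[] value(); }

class Article {
    @SketchTag("title")
    @SketchOtherTags({"headline", "name"})
    public String title;
}

class TagAliasSketch {
    // Tag written during serialization: the explicit tag if present, else the field name.
    static String serializedTag(Field field) {
        SketchTag tag = field.getAnnotation(SketchTag.class);
        return tag != null ? tag.value() : field.getName();
    }

    // Lookup table for deserialization: the primary tag and every alias map to the field.
    static Map<String, Field> tagTable(Class<?> type) {
        Map<String, Field> byTag = new HashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) continue; // skip compiler- or tool-generated fields
            byTag.put(serializedTag(field), field);
            SketchOtherTags aliases = field.getAnnotation(SketchOtherTags.class);
            if (aliases != null) {
                for (String alias : aliases.value()) byTag.put(alias, field);
            }
        }
        return byTag;
    }

    // Assigns a value to whichever field answers to the given tag.
    static void assign(Object target, String tag, String value) {
        Field field = tagTable(target.getClass()).get(tag);
        if (field == null) throw new IllegalArgumentException("Unknown tag: " + tag);
        try {
            field.set(target, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Article article = new Article();
        assign(article, "headline", "Hello"); // the alias resolves to the title field
        System.out.println(article.title);    // prints Hello
    }
}
```

In this sketch only `serializedTag` is consulted when writing, so aliases never appear in output; they only widen what is accepted when reading.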

Miscellaneous Tags

These are tags that do not fit into the above categories, but are still important for control over the de/serialization process.

Annotation Quick Reference Table

Tag Name	Purpose	Parameters
@simpl_composite_as_scalar	Denotes a scalar field as a scalar value for a composite element. This annotation can also be used for other formats that do not support the representation of composite elements.	None
@simpl_descriptor_classes( {[ClassDescriptor].class, [FieldDescriptor].class})	Sets the class descriptor and field descriptor for the annotated class.	Takes two parameters, both class literals: typically "MetadataClassDescriptor.class" or "MetaMetadataClassDescriptor.class" for the class descriptor, and "MetadataFieldDescriptor.class" or "MetaMetadataFieldDescriptor.class" for the field descriptor.
@simpl_filter( [regex = "Expression"])	Uses a regular expression to filter out the data when translating from serialized representation.	Takes, as a parameter, a valid regular expression as a string.
@simpl_inherit_parent_tag	A child class with this annotation is assigned the tag of the parent class.	None
@simpl_map_key_field( [String name])	Paired with @simpl_map, allows the user to name the map's key.	Takes a string, as a parameter, of the intended name of the key.
@simpl_use_equals_equals	A class annotated with this can use "==" to test equivalence of objects.	None

Supported serialization formats

Currently we fully support JSON and XML as serialization formats, and we have limited support for BibTeX. If S.IM.PL support is extended to other programming languages, we strongly encourage supporting at least JSON and XML, as that allows interaction with current implementations of S.IM.PL.

Supported Types

There are three main categories of types that we support: Scalars, Composites, and Collections.

Scalars

When we say "scalars", we refer to basic and frequently used types in programming. In languages such as Java, this includes primitive data types, as well as references to them, such as the Integer class. Also in this category are a number of types that developers may find convenient, such as Color and ParsedURL.

We split scalars into three basic types: Core, Extended, and Custom scalars. The Core scalars are those that must, at minimum, be included in any implementation of S.IM.PL. Without them, S.IM.PL will be missing, well, its "core" functionality.

Core types are defined here.

The most useful types we support as scalars are:

Primitives
boolean, Boolean;
int, Integer; long, Long; short, Short; byte, Byte;
float, Float; double, Double;
Enumerated;

More fundamental scalars:

String, StringBuilder; char, Char;
UUID
Date
Color
URL, ParsedURL;
File
Array Data as Scalar
Image
Binary Data

Composites

"Composite" refers to any abstract data type, consisting of multiple fields, including scalars and even other composites. In object-oriented programming languages, such as C++, Java, and C#, "composites" include programmer-made classes and fields.

Recursion and Self-Reference

It is important to recognize that composite objects frequently contain other composites, scalars, and collections. This means that when a composite object is serialized, it must first serialize all of the contained types (and thus, must first serialize all of their contained types, and so on). All contained types will be recursively serialized starting with the innermost object. The example above showed a composite containing two scalars, but S.IM.PL also supports self-reference loops among series of composites. The following example shows two composite objects that mutually reference each other.

    package legacy.tests.graph;

    import legacy.tests.TestCase;
    import legacy.tests.TestingUtils;
    import ecologylab.serialization.SIMPLTranslationException;
    import ecologylab.serialization.SimplTypesScope;
    import ecologylab.serialization.annotations.simpl_composite;
    import ecologylab.serialization.annotations.simpl_inherit;
    import ecologylab.serialization.annotations.simpl_scalar;
    import ecologylab.serialization.formatenums.Format;
    
    @simpl_inherit
    public class ClassA implements TestCase
    {
    	@simpl_scalar
    	private int	x;
    
    	@simpl_composite
    	private ClassB	classB;
    
    	public ClassA() {}
    
    	public ClassA(int ix, ClassB iclassB)
    	{
	    	setX(ix);
	    	setClassB(iclassB);
    	}
        //Getters and setters omitted for brevity
    }
    
    @simpl_inherit
    public class ClassB implements TestCase
    {
    	@simpl_scalar
    	private int y;
    	
    	@simpl_composite
    	private ClassA classA;
    	
    	public ClassB() {}
    	
    	public ClassB(int iy, ClassA iclassA)
       	{
    		setY(iy);
    		setClassA(iclassA);
    	}
        //Getters and setters omitted for brevity
    }

    //Implementation of runTest() in ClassA, shown separately here for readability.
    @Override
    public void runTest() throws SIMPLTranslationException
    {
        //Graph serialization must be enabled before serializing objects that reference each other.
        SimplTypesScope.enableGraphSerialization();

        ClassA testA = new ClassA(1, null);
        ClassB testB = new ClassB(4, testA);

        //Close the loop: testA -> testB -> testA.
        testA.setClassB(testB);

        SimplTypesScope tScope = SimplTypesScope.get("classATScope", ClassA.class, ClassB.class);

        //Exercise graph serialization/deserialization in both supported formats.
        TestingUtils.test(testA, tScope, Format.XML);
        TestingUtils.test(testA, tScope, Format.JSON);

        SimplTypesScope.disableGraphSerialization();
    }

In this example, if the ClassA instance is serialized first, its scalar 'x' is serialized first, followed by classB. The object classB contains the scalar 'y' and an instance of ClassA, a composite type. S.IM.PL recognizes that this is the same instance of ClassA that it was originally asked to serialize, so it writes a reference to the original object instead, which makes it possible to serialize recursively referenced objects.
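The idea of replacing a repeated instance with a reference can be sketched with an identity-based table of already-written objects. This is a simplified illustration, not S.IM.PL's graph serialization: `NodeA`, `NodeB`, and `GraphSketch` are hypothetical, and the "id" and "ref" keys are invented for the example rather than taken from the library's output format.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-ins for two classes that reference each other.
class NodeA {
    int x;
    NodeB b;
    NodeA(int x, NodeB b) { this.x = x; this.b = b; }
}

class NodeB {
    int y;
    NodeA a;
    NodeB(int y, NodeA a) { this.y = y; this.a = a; }
}

class GraphSketch {
    // Objects already written, keyed by identity (==) rather than equals().
    private final Map<Object, Integer> ids = new IdentityHashMap<>();

    // Returns the id of an already-written object, or null after registering a new one.
    private Integer seenBefore(Object o) {
        Integer id = ids.get(o);
        if (id == null) ids.put(o, ids.size() + 1);
        return id;
    }

    String write(NodeA a) {
        if (a == null) return "null";
        Integer ref = seenBefore(a);
        if (ref != null) return "{\"ref\":" + ref + "}"; // break the cycle with a reference
        return "{\"id\":" + ids.get(a) + ",\"x\":" + a.x + ",\"class_b\":" + write(a.b) + "}";
    }

    String write(NodeB b) {
        if (b == null) return "null";
        Integer ref = seenBefore(b);
        if (ref != null) return "{\"ref\":" + ref + "}";
        return "{\"id\":" + ids.get(b) + ",\"y\":" + b.y + ",\"class_a\":" + write(b.a) + "}";
    }

    public static void main(String[] args) {
        NodeA testA = new NodeA(1, null);
        NodeB testB = new NodeB(4, testA);
        testA.b = testB; // testA -> testB -> testA
        // prints {"id":1,"x":1,"class_b":{"id":2,"y":4,"class_a":{"ref":1}}}
        System.out.println(new GraphSketch().write(testA));
    }
}
```

Without the identity table, the two `write` methods would call each other until the stack overflowed; the table is what makes the recursion terminate.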

Polymorphism

One S.IM.PL feature deserves special mention with respect to Collections and Composites: polymorphism. For Collections, this means that a single collection can contain objects of many different classes. For example, you could have a "Person" List containing "Plumber", "Milkman", and "Student" objects, as long as they are subclasses of the "Person" class. Alternatively, a class could have a "Person" field, and you would want to consider subclasses of "Person" when de/serializing it. Polymorphic composites and collections are treated slightly differently in S.IM.PL than their monomorphic counterparts, because we must consider the scope of classes that may appear. We must be sure to pair polymorphic collections and composites with the @simpl_classes or @simpl_scope annotations.
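As a rough sketch of why a scope of candidate classes matters for a polymorphic collection, the example below tags each element by its runtime class and rejects classes that were never registered. It is an illustration only: `Person`, `Plumber`, `Student`, `Milkman`, `PolymorphicSketch`, and the `SCOPE` map are hypothetical and stand in for what @simpl_classes or @simpl_scope would declare. No XML escaping is attempted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

abstract class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class Plumber extends Person { Plumber(String name) { super(name); } }
class Student extends Person { Student(String name) { super(name); } }
class Milkman extends Person { Milkman(String name) { super(name); } } // deliberately not in scope

class PolymorphicSketch {
    // Hypothetical scope: element tag -> concrete class allowed in the collection.
    static final Map<String, Class<? extends Person>> SCOPE = new LinkedHashMap<>();
    static {
        SCOPE.put("plumber", Plumber.class);
        SCOPE.put("student", Student.class);
    }

    // Finds the tag registered for the element's runtime class.
    static String tagFor(Person person) {
        for (Map.Entry<String, Class<? extends Person>> entry : SCOPE.entrySet()) {
            if (entry.getValue() == person.getClass()) return entry.getKey();
        }
        throw new IllegalArgumentException("Class not in scope: " + person.getClass().getName());
    }

    // Writes each element under its own class tag instead of a single shared tag.
    static String toXml(List<Person> people) {
        StringBuilder sb = new StringBuilder("<people>");
        for (Person person : people) {
            sb.append('<').append(tagFor(person))
              .append(" name=\"").append(person.name).append("\"/>");
        }
        return sb.append("</people>").toString();
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.<Person>asList(new Plumber("Ann"), new Student("Bo"));
        // prints <people><plumber name="Ann"/><student name="Bo"/></people>
        System.out.println(toXml(people));
    }
}
```

Because each element carries its own tag, a reader with the same scope can pick the right subclass for every entry; an element whose class is outside the scope cannot be handled at all.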

For more examples, and further information regarding polymorphism in a particular language, see the tutorial section.

Collections

"Collections" refers to any set of Scalar or Composite objects that is not a part of a user-defined class or type. Collections are typically types that are commonly supported by programming languages. This category also include the "Scope" type, which is important for S.IM.PL itself. Simpl supports monomorphic and polymorphic collections, as well as recursive and self-referencing collections. For more information on recursive and self-referencing collections and composites, or polymorphic collections and composites, see the Recursion and Self-Reference and [Polymorphism][polymorphism] subsections under Composite Types, as both apply here as well.

Some types of collections supported are below:

ArrayList (default Collection)
List
HashMap (default Map)
HashMapArrayList
Scope

The Serialization Process

Format Composites, Collections, and Scalar Types

Before an object can be serialized, it needs a few features beyond those of a normal object. As stated above, before a composite or collection is serialized into a string, all of the related class fields or collection items must be annotated appropriately using DBAL. If the object to be serialized is an instance of a class, make certain that the class has a public, parameterless constructor.

Parsing Example

A good example of a before/after serialization is detailed below:

    package simplTestCasesDeSerializationTest;
    //Included files omitted for brevity

    public class SimplCompositeDeSerializationTest {

        @Test
        public void compositeDeSerializationTest() throws SIMPLTranslationException
        {
            Point p = new Point(1, -1);
            Circle c = new Circle(3, p);
            SimplTypesScope s = SimplTypesScope.get("circlescope", Point.class, Circle.class);
            SimplTypesScope.enableGraphSerialization();

            //Serialize the circle to JSON and check the exact output string.
            DualBufferOutputStream outputStream = new DualBufferOutputStream();
            SimplTypesScope.serialize(c, outputStream, Format.JSON);

            String result = outputStream.toString();
            assertEquals("{\"circle\":{\"radius\":\"3\",\"center\":{\"x\":\"1\",\"y\":\"-1\"}}}", result);

            //Deserialize the same bytes and compare the result with the original object.
            InputStream inputStream = new ByteArrayInputStream(outputStream.toByte());

            Object jsonObject = s.deserialize(inputStream, (DeserializationHookStrategy) null, Format.JSON, null);
            assertTrue(jsonObject instanceof Circle);
            Circle jsonCircle = (Circle) jsonObject;
            assertEquals(jsonCircle.getRadius(), c.getRadius());
            assertEquals(jsonCircle.getCenter().getX(), c.getCenter().getX());
            assertEquals(jsonCircle.getCenter().getY(), c.getCenter().getY());
        }
    }

In this example we create objects 'p' and 'c' of types "Point" and "Circle" respectively. The Point constructor takes integer 'x' and 'y' coordinates as parameters, and the Circle constructor takes an integer 'radius' and a point 'center'. We add the related classes to a SimplTypesScope (important for deserialization), which we name "circlescope", and then set up outputStream for serialization. We call the serialization function with the circle object, the output stream, and Format.JSON as parameters, meaning that we want the circle object serialized and the resulting string written to the outputStream object in JSON format.

In this unit test, we show that nested composites serialize correctly. The output string (without escape sequences) is:

"{"circle":{"radius":"3","center":{"x":"1","y":"-1"}}}"

That is, this is a "circle" object containing a field "radius" with value "3" and a field "center", which in turn contains two fields "x" and "y" with their respective values. Similarly, the string in XML format would be:

"<circle><radius>3</radius><center x="1" y="-1"/></circle>"

Also shown is deserialization of the object using the SimplTypesScope, and assertion statements verifying the correctness of the output object.
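To make the nesting concrete, the sketch below produces the same JSON shape by recursing into contained composites. It is a simplified stand-in, not the S.IM.PL serializer: the `Composite` interface, `SketchPoint`, `SketchCircle`, and `NestedJsonSketch` are hypothetical, and each class lists its own fields explicitly instead of being discovered through annotations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: a composite exposes its tag and its fields in a fixed order.
interface Composite {
    String tag();
    Map<String, Object> fields();
}

class SketchPoint implements Composite {
    final int x, y;
    SketchPoint(int x, int y) { this.x = x; this.y = y; }
    public String tag() { return "point"; }
    public Map<String, Object> fields() {
        Map<String, Object> m = new LinkedHashMap<>(); // keeps insertion order
        m.put("x", x);
        m.put("y", y);
        return m;
    }
}

class SketchCircle implements Composite {
    final int radius;
    final SketchPoint center;
    SketchCircle(int radius, SketchPoint center) { this.radius = radius; this.center = center; }
    public String tag() { return "circle"; }
    public Map<String, Object> fields() {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("radius", radius);
        m.put("center", center);
        return m;
    }
}

class NestedJsonSketch {
    // Writes the fields of one composite, recursing into nested composites.
    static String body(Composite composite) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : composite.fields().entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(entry.getKey()).append("\":");
            Object value = entry.getValue();
            if (value instanceof Composite) {
                sb.append(body((Composite) value));      // nested composite
            } else {
                sb.append('"').append(value).append('"'); // scalar, written as a string
            }
        }
        return sb.append('}').toString();
    }

    // Wraps the root object in its class tag.
    static String toJson(Composite root) {
        return "{\"" + root.tag() + "\":" + body(root) + "}";
    }

    public static void main(String[] args) {
        // prints {"circle":{"radius":"3","center":{"x":"1","y":"-1"}}}
        System.out.println(toJson(new SketchCircle(3, new SketchPoint(1, -1))));
    }
}
```

Note that the nested point appears under the field name "center" rather than under its own class tag, matching the shape of the output string discussed above.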

The Deserialization Process

Establishing SimplTypesScope

In order for S.IM.PL to properly deserialize a string, it has to first know what to look for. We don't need to be concerned about scalar types, since S.IM.PL is already aware of them, and similarly, the established collection types are fine. However, every composite that you might be expecting to receive in the process of deserialization must be added to the "scope of considered classes", which we refer to as the SimplTypesScope.

A SimplTypesScope can be thought of as a set of DBAL-annotated classes through which S.IM.PL searches for a match to the object being deserialized. When S.IM.PL deserializes an object, it looks through the related SimplTypesScope for a class with the same name and fields and, if it does not find one, it throws an error. For example, if we wanted to deserialize a string representing a "Car" object, but our SimplTypesScope did not contain the "Car" class, S.IM.PL would not know how to deserialize the object into the target language.
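The lookup described above can be sketched as a map from tag names to classes. This is a minimal illustration under assumptions, not the SimplTypesScope implementation: `TypeScopeSketch`, `Truck`, and `Bicycle` are hypothetical, and deriving the tag from the lowercased simple class name is a simplification chosen for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical classes registered in the scope; both have parameterless constructors.
class Truck {}
class Bicycle {}

class TypeScopeSketch {
    private final Map<String, Class<?>> classesByTag = new HashMap<>();

    // Registers each class under a tag derived from its simple name (an assumption).
    TypeScopeSketch(Class<?>... classes) {
        for (Class<?> type : classes) {
            classesByTag.put(type.getSimpleName().toLowerCase(Locale.ROOT), type);
        }
    }

    // Creates an empty instance for the tag, or fails if the scope has no matching class.
    Object instantiate(String tag) {
        Class<?> type = classesByTag.get(tag);
        if (type == null) {
            throw new IllegalArgumentException("No class registered for tag: " + tag);
        }
        try {
            // Relies on the parameterless constructor.
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        TypeScopeSketch scope = new TypeScopeSketch(Truck.class, Bicycle.class);
        System.out.println(scope.instantiate("truck").getClass().getSimpleName()); // prints Truck
        try {
            scope.instantiate("car"); // "car" was never registered
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints No class registered for tag: car
        }
    }
}
```

The sketch also shows one reason a parameterless constructor is useful: the deserializer has to create an empty instance before it can fill in any fields.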

Deserialization Hook Strategy

Deserialization Hook Strategies are optional additions to the deserialization process that execute code before, during, or after deserialization (with the PreHook, InHook, and PostHook processes, respectively). For example, after deserializing each object in a series, you might want to add that object to a database. The PostHook option allows the user to add this functionality.
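The database scenario can be sketched with a small callback interface. This is an illustration only: `HookSketch` and `HookedDeserializerSketch` are hypothetical and do not reproduce the signatures of S.IM.PL's DeserializationHookStrategy, the "database" is just a list, and only the pre and post hooks are modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical hook interface; both callbacks default to doing nothing.
interface HookSketch<T> {
    default void preHook(String rawInput) {}
    default void postHook(T object) {}
}

class HookedDeserializerSketch {
    // Parses each input, calling the hooks immediately before and after each object is built.
    static <T> List<T> deserializeAll(List<String> inputs,
                                      Function<String, T> parser,
                                      HookSketch<T> hooks) {
        List<T> results = new ArrayList<>();
        for (String input : inputs) {
            hooks.preHook(input);
            T object = parser.apply(input);
            hooks.postHook(object);
            results.add(object);
        }
        return results;
    }

    public static void main(String[] args) {
        final List<Integer> database = new ArrayList<>(); // stand-in for a real database
        HookSketch<Integer> storeInDatabase = new HookSketch<Integer>() {
            @Override
            public void postHook(Integer object) {
                database.add(object); // runs after each object is deserialized
            }
        };
        deserializeAll(Arrays.asList("1", "2", "3"), s -> Integer.valueOf(s), storeInDatabase);
        System.out.println(database); // prints [1, 2, 3]
    }
}
```

Keeping the storage step in a hook means the parsing code stays unaware of the database, which is the kind of separation the hook strategy is meant to provide.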