Lecture04 - nus-cs2030/2324-s2 GitHub Wiki

Java Generics

This lecture introduces students to how generic classes and methods are defined in order to allow for code reuse. We take inspiration from Java's ArrayList and employ the delegation pattern to build our immutable list ImList. In order to support substitutability in generics, we look at the concept of data flow, and how we can support appropriate substitutions by using bounded wildcards.

Java's Mutable ArrayList

Let us begin our discussion with Java's ArrayList, a data structure that we have already started using. We have been declaring instances of ArrayList to store different types of elements, e.g. ArrayList<Shape>, ArrayList<String>, etc. Specifically, ArrayList is a box type that contains another type (we sometimes refer to this as a container type). How does Java handle different "inner" types?

If you refer to the Java API, you will notice that the ArrayList class is declared as

public class ArrayList<E> ... {
    ...
}

E is the generic type of the ArrayList generic class. Whenever we declare say, ArrayList<String>, the generic type E is type parameterized to the String type.

Declaring ArrayList this way makes it a polymorphic data structure. This is in contrast to writing individual monomorphic classes such as ArrayListShape and ArrayListString, which have almost the same functionality apart from the type of the element they store. We also call this parametric polymorphism since ArrayList handles list operations that involve different types based on the actual type parameterization.

In the following, we show how an ArrayList<String> can be created as an empty list, as well as from a given list of String elements.

jshell> List<String> strings = new ArrayList<String>()
strings ==> []

jshell> strings.add("one")
$.. ==> true

jshell> strings.add("two")
$.. ==> true

jshell> strings // ArrayList is a mutable data structure
strings ==> [one, two]

jshell> new ArrayList<String>(strings) // create another list with same elements as strings
$.. ==> [one, two]

Note that the add method in the generic class has the following specification:

for all types bound to E, ArrayList<E> add(E) => boolean

add is a dynamic (or dynamically dispatched) method: Java decides at runtime which method implementation to call based on the type of the object (or the type of the this parameter). In our example, if E is bound to String, then the method to call is based on the type of the this parameter (ArrayList<String>), with the method taking in a String and returning a boolean.

A formal specification of a dynamic method is given by

dynamic method :: this_type(input_types) => result_type

Defining ImList via Delegation

Let us now try to understand the implementation of ImList, which has been provided to you since the first lecture. Here is a monomorphic (non-generic or non-polymorphic) ImListString that only stores String elements.

class ImListString {
    private final List<String> elems;

    ImListString() {
        this.elems = new ArrayList<String>();
    }

    ImListString(List<String> list) {
        this.elems = new ArrayList<String>(list); // copy from the parameter, not the uninitialized field
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}

Clearly, writing another monomorphic ImListInteger class would entail duplicating the entire ImListString class with only the String type changed to Integer. We cannot declare elems as List<int> using the primitive type int, as only object (or class) types can be used for type parameterization. As such, rather than int, we use the wrapper type Integer, which is an object type that wraps an integer value. The following illustrates how we could have wrapped an integer value 1 within an Integer object, and used intValue() to extract the raw primitive value from the object.

jshell> Integer i = Integer.valueOf(1) // the Integer(int) constructor is deprecated since Java 9
i ==> 1

jshell> int j = i.intValue()
j ==> 1

Having to explicitly wrap primitive values in their wrapper types, and call a method to extract the value, is extremely cumbersome. Hence, Java provides auto-boxing and auto-unboxing to simplify the process.

jshell> Integer i = 1 // auto-box and assign to i
i ==> 1

jshell> int j = i // auto-unbox and assign to j
j ==> 1
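
As a minimal sketch of why this matters for generics, the following stores int values in a List<Integer> and reads them back as int. The names BoxingDemo and sum are illustrative only and are not part of the lecture code.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    // Sums the elements; each Integer is auto-unboxed to int in the loop.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) { // auto-unbox Integer -> int
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(1); // auto-box int -> Integer
        values.add(2);
        System.out.println(sum(values)); // prints 3
    }
}
```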

Now we are ready to modify our ImListString to a generic class. We also include the add method that takes in an element of type E, and a get method that returns a value of type E.

class ImList<E> { // declare generic type E
    private final List<E> elems;

    ImList() {
        this.elems = new ArrayList<E>();
    }

    ImList(List<E> list) {
        this.elems = new ArrayList<E>(list); // copy from the parameter, not the uninitialized field
    }

    ImList<E> add(E elem) {
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.add(elem);
        return newList;
    }

    E get(int index) {
        return this.elems.get(index);
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}

To create different ImList structures for different element types, we just need to type parameterize accordingly.

jshell> ImList<String> strings = new ImList<String>()
strings ==> []

jshell> ImList<Integer> ints = new ImList<Integer>(List.of(1, 2, 3))
ints ==> [1, 2, 3]

It is worth noting that much of the implementation of ImList is delegated to an internal ArrayList that is encapsulated from the client. The only additional responsibility of ImList is to make sure that every modification to the list is applied to a new ImList, which is then returned.
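
The effect of this delegation can be sketched as follows. The class repeats a trimmed ImList (with the constructor copying from its list parameter) so that the example runs on its own; ImListDemo is a hypothetical driver class added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ImList<E> {
    private final List<E> elems;

    ImList() {
        this.elems = new ArrayList<E>();
    }

    ImList(List<E> list) {
        this.elems = new ArrayList<E>(list); // defensive copy of the given list
    }

    ImList<E> add(E elem) {
        ImList<E> newList = new ImList<E>(this.elems); // copy first
        newList.elems.add(elem);                       // then modify only the copy
        return newList;
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}

class ImListDemo {
    public static void main(String[] args) {
        ImList<String> empty = new ImList<String>();
        ImList<String> one = empty.add("one");
        System.out.println(empty); // prints []
        System.out.println(one);   // prints [one]
    }
}
```

Because add works on a copy, the original list is unaffected by the call.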

Generic Method

In order to invoke the add method of generic class ImList<E>, the generic type E must be type parameterized during the creation of say, ImList<Integer>.

jshell> ImList<Integer> ints = new ImList<Integer>()
ints ==> []

jshell> ints.add(1)
$.. ==> [1]

In contrast, a generic method can be invoked without the need to first create an object. This implies that type parameterization cannot happen during object creation. So when does type parameterization take place? Consider the following of method that is declared statically, i.e. not invoked via any object.

jshell> <T> String of(T t) {
...> return t.toString();
...> }
| created method of(T)

jshell> of(1)
$.. ==> "1"

jshell> of(1.0)
$.. ==> "1.0"

Here, the of method is a static (or statically dispatched) method which is resolved during compile time depending on whether of(1) or of(1.0) is called. Also notice the need to declare the type parameter 'T' when defining the of method header.

In general, a static method is resolved during compile time based on its input argument types and return type. Note the absence of the type of the this parameter as compared to a dynamic method.

static method :: (input_types) => result_type

Static methods are useful as class factory methods. These methods allow the client to create objects without invoking the constructor via the new keyword.

class ImList<E> {
    private final ArrayList<E> elems;

    private ImList() { // constructor is private
        this.elems = new ArrayList<E>();
    }

    private ImList(List<E> list) { // constructor is private
        this.elems = new ArrayList<E>(list);
    }

    static <E> ImList<E> of() {
        return new ImList<E>();
    }
    
    static <E> ImList<E> of(List<E> list) {
        return new ImList<E>(list);
    }
    ...
}

The two static generic methods of(..) have the following specification:

  • for all types bound to E, of() => ImList<E>
  • for all types bound to E, of(List<E>) => ImList<E>

As the constructors are now declared private, the only way for clients to create the list is via the factory of methods.

jshell> List<Integer> list = List.of(1, 2, 3)
list ==> [1, 2, 3]

jshell> ImList.of(list) // ImList<Integer> type-inferred implicitly since list is List<Integer>
$.. ==> [1, 2, 3]

jshell> ImList.<Integer>of() // ImList<Integer> with explicit type-witness to Integer
$.. ==> []

Notice that the creation of the empty list ImList.of() is ambiguous as we need to specify what type of ImList we desire. Hence we employ type witnessing, which is the type parameterization in between the dot . and the method name of.
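
As a hedged sketch, the three ways in which E can be determined for the factory methods are shown side by side below. The ImList here is a trimmed copy so that the example is self-contained, and WitnessDemo is a hypothetical driver class. Besides an explicit type witness, the compiler can also infer E from the declared type of the variable being assigned to.

```java
import java.util.ArrayList;
import java.util.List;

class ImList<E> {
    private final List<E> elems;

    private ImList(List<E> list) {
        this.elems = new ArrayList<E>(list);
    }

    static <E> ImList<E> of() {
        return new ImList<E>(new ArrayList<E>());
    }

    static <E> ImList<E> of(List<E> list) {
        return new ImList<E>(list);
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}

class WitnessDemo {
    public static void main(String[] args) {
        ImList<Integer> a = ImList.<Integer>of();        // explicit type witness
        ImList<Integer> b = ImList.of();                 // E inferred from the declared type of b
        ImList<Integer> c = ImList.of(List.of(1, 2, 3)); // E inferred from the argument
        System.out.println(a + " " + b + " " + c);       // prints [] [] [1, 2, 3]
    }
}
```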

Generics and Substitutability

By now, you should be familiar with substitutability between parent/child classes, as well as between interfaces and their implementation classes. Substitutability or more formally Liskov's Substitution Principle (LSP) is a principle in OOP that states that objects of a superclass should be replaceable (or substitutable) with objects of its subclasses without affecting the correctness of the program.

Consider the following familiar example

Circle circle = new Circle(1.0);
Shape shape = circle;
double area = shape.getArea();

The code above is valid because Circle is-a Shape. This is-a relationship can also be expressed as Circle <: Shape (read as Circle is a Shape or Circle is substitutable for Shape). Moreover, recall that substitutability also holds during parameter passing, which is an assignment across methods.

void foo(Shape shape) {
    double area = shape.getArea();
}

foo(circle);

It is also timely to look at assignment or parameter passing as data flow. An assignment shape = circle represents data flow from the RHS of the assignment (in this case circle) to the LHS variable (in this case shape). To support this assignment Circle <: Shape must hold. Likewise in parameter passing, calling foo(circle) also represents data flow from the argument (in this case circle) to the method parameter (in this case shape).
Indeed, Circle <: Shape supports parameter passing while maintaining LSP.

Now in the context of generics, we have already seen that ArrayList<Shape> <: List<Shape> holds (since ArrayList <: List).

List<Shape> shapes = new ArrayList<Shape>(..)

But what about the following assignment? Is it valid?

ArrayList<Shape> shapes = new ArrayList<Circle>(..)

Java does not allow the above assignment since generics are invariant. However, this does not mean that substitutability is not applicable to generics. The validity of an assignment based on LSP depends on what is done after the assignment, in particular the direction of the data flow.

Upper bounded Wildcards and Outward Data Flow

Let's assume that shapes is declared as some kind of ImList and assigned to a non-empty list.

ImList<..> shapes = new ImList<Shape>(...)
Shape shape = shapes.get(0);

The statement Shape shape = shapes.get(0) is clearly valid if shapes was assigned to ImList<Shape>. This assignment should also be valid if shapes was assigned to ImList<Circle>.

ImList<..> shapes = new ImList<Circle>(...)
Shape shape = shapes.get(0);

Based on LSP, substituting ImList<Shape> with ImList<Circle> does not affect the correctness of the program since in the latter, we are getting a circle (which is a shape) from the list and assigning it to the shape variable.

Now pay particular attention to the statement

Shape shape = shapes.get(0);

The flow of data is from the shapes list to a shape variable. It does not matter if shapes is an ImList<Shape> or an ImList of one of its sub-classes (e.g. ImList<Circle>) as long as data flows out of shapes. Here, we also say that shapes is a supplier of data.

In contrast, the following should not be allowed.

ImList<..> shapes = new ImList<Object>(..)
Shape shape = shapes.get(0);

There is no guarantee that shapes contains only Shape objects, so the assignment to the shape variable cannot be allowed.

To support substitutions that allow the flow of data out of shapes, we declare shapes to be an ImList of Shape or any of its sub-classes.

ImList<? extends Shape> shapes = ...
Shape shape = shapes.get(0);

In other words, the following substitutions hold:

  • ImList<Shape> <: ImList<? extends Shape>
  • ImList<Circle> <: ImList<? extends Shape>.

We sometimes abbreviate ? extends Shape using the informal notation +Shape. For example, in ImList<+Shape>, the + denotes Shape data flowing out from the ImList.

To summarize, when the variable shapes is a supplier of data (i.e. data flowing outwards), shapes can be assigned an ImList of Shape or of any of its sub-classes.
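
A minimal sketch of a supplier parameter is given below. It uses java.util.List rather than ImList to stay short, and the Shape and Circle definitions as well as the names UpperBoundDemo and totalArea are assumptions made for illustration.

```java
import java.util.List;

interface Shape {
    double getArea();
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double getArea() {
        return Math.PI * this.radius * this.radius;
    }
}

class UpperBoundDemo {
    // shapes is a supplier: elements only flow out, each one as a Shape.
    static double totalArea(List<? extends Shape> shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.getArea();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Circle> circles = List.of(new Circle(1.0), new Circle(2.0));
        List<Shape> shapes = List.of(new Circle(1.0));
        System.out.println(totalArea(circles)); // accepted: List<Circle> <: List<? extends Shape>
        System.out.println(totalArea(shapes));  // accepted: List<Shape> <: List<? extends Shape>
        // List<Shape> bad = circles;           // rejected by the compiler: generics are invariant
    }
}
```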

Lower bounded Wildcards and Inward Data Flow

While we considered data flowing out in the previous section, we now consider the scenario of data flowing in by using an example from Comparator. Note that we do not illustrate with ImList, as data flowing into an ImList would involve mutating the list.

Suppose we have the following AreaComp implementation of Comparator<Shape> that compares two Shape objects.

class AreaComp implements Comparator<Shape> {
    public int compare(Shape s1, Shape s2) {
        return Double.compare(s1.getArea(), s2.getArea()); // getArea() returns double, compare must return int
    }
}

Now declare shapeComp as some kind of Comparator that will be used to sort a list of Shape objects.

Comparator<..> shapeComp = new AreaComp();
new ImList<Shape>(..).sort(shapeComp);

Clearly, the above assignment is valid since we can sort a list of shapes based on the area of the shape. Consider another Comparator<Object> that compares the length of the string representations of the objects.

class StringLenComp implements Comparator<Object> {
    public int compare(Object o1, Object o2) {
        return o1.toString().length() - o2.toString().length();
    }
}

Comparator<..> shapeComp = new StringLenComp();
new ImList<Shape>(..).sort(shapeComp);

Sorting is also valid since we can sort shapes based on the lengths of their String representations.

Now do you see that data has to flow into shapeComp in order for the compare method to return the appropriate integer value? Here we say that shapeComp is a consumer of data.

In contrast, if we have a Comparator<Circle> that compares say, the radii of circles, then using it to sort ImList<Shape> is invalid since there is no guarantee that only circles are contained in the list.

To support substitutions that allow the flow of data into shapeComp, we declare shapeComp to be a Comparator of Shape or any of its super-classes.

Comparator<? super Shape> shapeComp = ...;
new ImList<Shape>(..).sort(shapeComp);

In other words, the following substitutions hold:

  • Comparator<Shape> <: Comparator<? super Shape>
  • Comparator<Object> <: Comparator<? super Shape>.

We sometimes abbreviate ? super Shape using the informal notation -Shape. As an example, in Comparator<-Shape>, the - denotes Shape data flowing into the Comparator.
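
The two comparators can be tried out with the sketch below. It sorts a copy of a java.util.List instead of an ImList to keep the example short; the Square class, the sorted helper and the class name LowerBoundDemo are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

interface Shape {
    double getArea();
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double getArea() {
        return this.side * this.side;
    }

    @Override
    public String toString() {
        return "Square " + this.side;
    }
}

class AreaComp implements Comparator<Shape> {
    @Override
    public int compare(Shape s1, Shape s2) {
        return Double.compare(s1.getArea(), s2.getArea());
    }
}

class StringLenComp implements Comparator<Object> {
    @Override
    public int compare(Object o1, Object o2) {
        return o1.toString().length() - o2.toString().length();
    }
}

class LowerBoundDemo {
    // cmp is a consumer: Shape objects flow into its compare method.
    static List<Shape> sorted(List<Shape> shapes, Comparator<? super Shape> cmp) {
        List<Shape> copy = new ArrayList<Shape>(shapes);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(new Square(3.0), new Square(1.25));
        System.out.println(sorted(shapes, new AreaComp()));      // prints [Square 1.25, Square 3.0]
        System.out.println(sorted(shapes, new StringLenComp())); // prints [Square 3.0, Square 1.25]
    }
}
```

A Comparator<Square> would be rejected by sorted, since the list may contain shapes that are not squares.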

Considering Data Flow in ImList

We are now ready to include the addAll and sort methods in ImList by focusing on the flow of data in these two methods.

class ImList<E> {
    private final List<E> elems;
    ...

    ImList<E> addAll(ImList<..> list) { 
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.addAll(list.elems); // add the elements held inside the other ImList
        return newList;
    }

    ImList<E> sort(Comparator<..> cmp) {
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.sort(cmp);
        return newList;
    }        
    ...
}

The addAll method in generic class ImList requires that the list method parameter be a supplier of data, i.e. data flows out of list. This suggests that the addAll method should be declared as

ImList<E> addAll(ImList<? extends E> list) {
    ...
}

The above makes the following a valid statement:

new ImList<Shape>().addAll(new ImList<Circle>().add(new Circle(1.0)))

The astute reader should also appreciate that the of method (and the constructor in generic class ImList<E>) that takes List<E> as argument should also allow a list of any subclass of E to be passed in. Hence the parameter type should more generally be declared as List<? extends E> list.

In contrast, the sort method in generic class ImList<E> requires that the cmp method parameter be a consumer of data, i.e. data flowing into cmp. This suggests that the sort method should be declared as

ImList<E> sort(Comparator<? super E> cmp) {
    ...
}

The above makes the following a valid statement:

new ImList<Shape>(..).sort(new StringLenComp())

There is a useful acronym PECS (stands for Producer Extends, Consumer Super) that one can generally rely on when determining whether the parameter of a method should be wildcard bounded with ? extends (or ? super) depending on whether it is a producer (or consumer) of data.
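
Putting the two declarations together, a self-contained sketch of ImList with addAll and sort might look as follows. It uses Integer and Number in place of Circle and Shape to avoid extra class definitions; the constructor taking List<? extends E>, the PecsDemo driver and the byText comparator are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ImList<E> {
    private final List<E> elems;

    ImList() {
        this.elems = new ArrayList<E>();
    }

    ImList(List<? extends E> list) { // list is a producer of E
        this.elems = new ArrayList<E>(list);
    }

    ImList<E> add(E elem) {
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.add(elem);
        return newList;
    }

    ImList<E> addAll(ImList<? extends E> list) { // producer: extends
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.addAll(list.elems);
        return newList;
    }

    ImList<E> sort(Comparator<? super E> cmp) { // consumer: super
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.sort(cmp);
        return newList;
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}

class PecsDemo {
    public static void main(String[] args) {
        ImList<Integer> ints = new ImList<Integer>().add(3).add(1);
        ImList<Number> nums = new ImList<Number>().add(2.5).addAll(ints);
        // A Comparator<Object> can consume Number elements.
        Comparator<Object> byText = (o1, o2) -> o1.toString().compareTo(o2.toString());
        System.out.println(nums);              // prints [2.5, 3, 1]
        System.out.println(nums.sort(byText)); // prints [1, 2.5, 3]
    }
}
```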

Sub-Typing in Generics

Now let's formally define sub-typing for generics. In the following, we shall assume that S <: T.

Invariant sub-typing

Java generics uses invariant sub-typing.

S<E1> <: T<E2> only if E1=E2, or more formally

E1 = E2; S <: T
---------------
S<E1> <: T<E2>

where E1 = E2 is the invariant sub-typing, S <: T is the class sub-typing.

Using ArrayList as an example, it is evident that one can read an element from the list using get, and insert an element into the list using add. To facilitate both add and get operations, ArrayList should support both inward and outward data flow, and hence E1 = E2.

Likewise, since ArrayList <: List, ArrayList<E1> <: List<E2> only if E1 = E2.

Covariant sub-typing

Assume that the following foo method is defined with E1 <: E2.

... foo(T<? extends E2> t) {
    // E2 data flowing out of t
}

then the following statements are valid

  • T<E1> t = ...; foo(t)
  • T<E2> t = ...; foo(t)
  • S<E1> s = ...; foo(s)
  • S<E2> s = ...; foo(s)

More formally,

E1 <: E2; S <: T
---------------
S<E1> <: T<+E2>

where E1 <: E2 is the covariant sub-typing, S <: T is class sub-typing.

If a method takes as parameter ArrayList<? extends E2> list (or List<? extends E2> list) and list is a supplier of data in the method, then we can call the method by passing in an object of type ArrayList<E1>.

Contravariant sub-typing

Assume that the following bar method is defined with E2 <: E1

... bar(T<? super E2> t) {
    // E2 data flowing into t
}

then the following statements are valid

  • T<E2> t = ...; bar(t)
  • T<E1> t = ...; bar(t)
  • S<E2> s = ...; bar(s)
  • S<E1> s = ...; bar(s)

More formally,

E2 <: E1; S <: T
---------------
S<E1> <: T<-E2>

where E2 <: E1 is the contravariant sub-typing, S <: T is class sub-typing.

If a method takes as parameter ArrayList<? super E2> list (or List<? super E2> list) and list is a consumer of data in the method, then we can call the method by passing in an object of type ArrayList<E1>.

More Examples

Upper and lower bounded wildcards can be used for different parameters of a method. Here is an example of a generic method that adds elements from a source list to a destination list.

static <T> void copy(ArrayList<? extends T> src, ArrayList<? super T> dst) {
    for (T t : src) {
        dst.add(t);
    }
}

Notice that we are reading from src (a supplier of data) and writing into dst (a consumer of data).
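
A short usage sketch of copy follows. The wrapper class CopyDemo and the variable names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class CopyDemo {
    static <T> void copy(ArrayList<? extends T> src, ArrayList<? super T> dst) {
        for (T t : src) {
            dst.add(t);
        }
    }

    public static void main(String[] args) {
        ArrayList<Integer> ints = new ArrayList<Integer>(List.of(1, 2, 3));
        ArrayList<Number> nums = new ArrayList<Number>();
        ArrayList<Object> objs = new ArrayList<Object>();
        copy(ints, nums);                   // T is inferred so that Integer <: T <: Number
        CopyDemo.<Integer>copy(ints, objs); // T given explicitly as Integer
        System.out.println(nums); // prints [1, 2, 3]
        System.out.println(objs); // prints [1, 2, 3]
        // copy(nums, ints);      // rejected: no T satisfies Number <: T <: Integer
    }
}
```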

Printer

Suppose we have a Printer object with a print method that is used for the purpose of printing. We can write a method that takes in a Printer.

<T> ...(Printer<? super T> printer) {
    T t = ...
    printer.print(t);
}

As an example, suppose T is bound to Integer. In order to print an Integer, it can either be printed as an Integer, or printed as an Object. Hence, Printer<Object> <: Printer<-Integer>.
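
Since the lecture does not define Printer, the sketch below assumes a hypothetical single-method interface Printer<T>; the printTwice method and the class PrinterDemo are likewise made up for illustration.

```java
// Hypothetical interface: the lecture only says that Printer has a print method.
interface Printer<T> {
    void print(T t);
}

class PrinterDemo {
    // printer is a consumer: a T value flows into it.
    static <T> void printTwice(T value, Printer<? super T> printer) {
        printer.print(value);
        printer.print(value);
    }

    public static void main(String[] args) {
        Printer<Object> objectPrinter = obj -> System.out.println("object: " + obj);
        Printer<Integer> intPrinter = i -> System.out.println("int: " + i);
        printTwice(41, intPrinter);    // Printer<Integer> <: Printer<? super Integer>
        printTwice(41, objectPrinter); // Printer<Object>  <: Printer<? super Integer>
    }
}
```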

Function

In due time, you will also see the use of a Function object being passed to a method where a value of type T is passed as input to the function via a method called apply, and a value of type R is returned as output.

<T,R> ...(Function<? super T, ? extends R> fn) { // declare two type parameters T and R
    T t = ...
    R r = fn.apply(t); // input of type T flows into fn, and output of type R flows out of fn
    ...
}

Suppose T is bound to Integer, and R is bound to Shape. We can pass a function Function<Object, Circle> and it will still work since Function<Object, Circle> <: Function<-Integer,+Shape>.
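
This substitution can be sketched with java.util.function.Function as below. The Shape and Circle definitions, the applyTo method and the toCircle function are assumptions made for illustration.

```java
import java.util.function.Function;

interface Shape {
    double getArea();
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double getArea() {
        return Math.PI * this.radius * this.radius;
    }
}

class FunctionDemo {
    // fn consumes a T (input flows in) and supplies an R (output flows out).
    static <T, R> R applyTo(T input, Function<? super T, ? extends R> fn) {
        return fn.apply(input);
    }

    public static void main(String[] args) {
        // Takes any Object and produces a Circle whose radius is the length of its text.
        Function<Object, Circle> toCircle = obj -> new Circle(obj.toString().length());
        // T is bound to Integer and R is bound to Shape via the type witness.
        Shape shape = FunctionDemo.<Integer, Shape>applyTo(123, toCircle);
        System.out.println(shape.getArea()); // area of a circle of radius 3
    }
}
```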

Method overriding

Method overriding is an interesting example where covariant sub-typing is apparent. Suppose we have the following two classes S and T.

class T {
    A method() {
        ...
    }
}

class S extends T {
    @Override
    B method() {
        ...
    }
}

You have seen that method in class S overrides method in class T as long as B <: A. Indeed, the return type in Java overriding methods employs covariant sub-typing. As an example, given the foo method below:

void foo(T t) {
    A a = t.method();
}

When we call foo(new S()), the program still works.

B <: A; S <: T
------------------------------------
type(S.method()) <: type(T.method())

where type(S.method()) is the return type of method() in class S
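
A concrete instance of this rule is sketched below, with Shape and Circle playing the roles of A and B, and the hypothetical classes ShapeMaker and CircleMaker playing the roles of T and S.

```java
class Shape {
    @Override
    public String toString() {
        return "Shape";
    }
}

class Circle extends Shape {
    @Override
    public String toString() {
        return "Circle";
    }
}

class ShapeMaker {
    Shape make() {
        return new Shape();
    }
}

class CircleMaker extends ShapeMaker {
    @Override
    Circle make() { // return type narrowed from Shape to Circle
        return new Circle();
    }
}

class OverrideDemo {
    static String describe(ShapeMaker maker) {
        Shape shape = maker.make(); // data flows out of make(), so Circle <: Shape suffices
        return shape.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new ShapeMaker()));  // prints Shape
        System.out.println(describe(new CircleMaker())); // prints Circle
        Circle circle = new CircleMaker().make();        // no cast needed
        System.out.println(circle);                      // prints Circle
    }
}
```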

Q: Do you expect contravariance on parameters of overriding methods?

class T {
    void method(B b) {
        ...
    }
}

class S extends T {
    @Override
    void method(A a) {
        ...
    }
}

Given the following foo method with B <: A,

void foo(T t) {
    t.method(new B());
}

can we call foo(new S()) with an overriding method that takes in parameter of type A?
