Lecture04 - nus-cs2030/2324-s2 GitHub Wiki
This lecture introduces students to how generic classes and methods are defined in order to allow for code reuse.
We take inspiration from Java's ArrayList and employ the delegation pattern to build our immutable list ImList.
In order to support substitutability in generics, we look at the concept
of data flow, and how we can support appropriate substitutions by using
bounded wildcards.
Let us begin our discussion with Java's ArrayList, a data structure that we have already started using.
We have been declaring instances of ArrayList to store different types of elements, e.g. ArrayList<Shape>, ArrayList<String>, etc.
Specifically, ArrayList is a box type that contains another type (we sometimes refer to this as a container type).
How does Java handle different "inner" types?
If you refer to the Java API, you will notice that the ArrayList class is declared as
public class ArrayList<E> ... {
...
}
E is the generic type of the ArrayList generic class.
Whenever we declare, say, ArrayList<String>, the generic type E is type parameterized to the String type.
Declaring ArrayList this way makes it a polymorphic data structure.
This is in contrast to writing individual monomorphic classes such as ArrayListShape and ArrayListString, which have almost the same functionality apart from the type of the element they store.
We also call this parametric polymorphism, since ArrayList handles list operations that involve different types based on the actual type parameterization.
In the following, we show how an ArrayList<String> can be created as an empty list, as well as from a given list of String elements.
jshell> List<String> strings = new ArrayList<String>()
strings ==> []
jshell> strings.add("one")
$.. ==> true
jshell> strings.add("two")
$.. ==> true
jshell> strings // ArrayList is a mutable data structure
strings ==> [one, two]
jshell> new ArrayList<String>(strings) // create another list with same elements as strings
$.. ==> [one, two]
Note that the add method in the generic class has the following specification:
for all types bound to E, ArrayList<E> add(E) => boolean
add is a dynamic (or dynamically dispatched) method: Java decides at runtime which method implementation to call based on the type of the object (or the type of the this parameter).
In our example, if E is bound to String, then the method to call is based on the type of the this parameter (ArrayList<String>), with the method taking in a String and returning a boolean.
A formal specification of a dynamic method is given by
dynamic method :: this_type(input_types) => result_type
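To make dynamic dispatch concrete, here is a minimal sketch. The class DispatchDemo and the method addAndDescribe are hypothetical names introduced only for illustration. The parameter is declared as List<String>, yet the add implementation that runs depends on the class of the object passed in.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class DispatchDemo {
    // The compile-time type of list is List<String>; which add(..)
    // implementation runs is decided at runtime from the object's class.
    static String addAndDescribe(List<String> list) {
        boolean changed = list.add("one");
        return list.getClass().getSimpleName() + " " + list + " " + changed;
    }

    public static void main(String[] args) {
        System.out.println(addAndDescribe(new ArrayList<String>()));  // prints ArrayList [one] true
        System.out.println(addAndDescribe(new LinkedList<String>())); // prints LinkedList [one] true
    }
}
```

In both calls the specification is the same (a String goes in, a boolean comes out); only the type of the this parameter differs.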
Let us now try to understand the implementation of ImList, which has been provided to you since the first lecture.
Here is a monomorphic (non-generic or non-polymorphic) ImListString that only stores String elements.
import java.util.ArrayList;
import java.util.List;

class ImListString {
    private final List<String> elems;

    ImListString() {
        this.elems = new ArrayList<String>();
    }

    ImListString(List<String> list) {
        this.elems = new ArrayList<String>(list); // copy the given list, not the unset field
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}
Clearly, writing another monomorphic ImListInteger class would entail duplicating the entire ImListString class with only the String type changed to Integer.
We cannot declare elems as List<int> using the primitive type int, as only object (or class) types can be used for type parameterization.
As such, rather than int, we use the wrapper type Integer, which is an object type that wraps an integer value.
The following illustrates how we could have wrapped an integer value 1 within an Integer object, and used intValue() to extract the raw primitive value from the object.
jshell> Integer i = Integer.valueOf(1) // new Integer(1) is deprecated since Java 9
i ==> 1
jshell> int j = i.intValue()
j ==> 1
Having to explicitly wrap primitive values in their wrapper types, and to call a method to extract each value, is extremely cumbersome. Hence, Java provides auto-boxing and auto-unboxing to simplify the process.
jshell> Integer i = 1 // auto-box and assign to i
i ==> 1
jshell> int j = i // auto-unbox and assign to j
j ==> 1
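Auto-boxing matters most when primitives meet generic containers. The sketch below (BoxingDemo and sum are hypothetical names) stores int values in a List<Integer> and reads them back as int without any explicit wrapping or intValue() calls.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    // Each Integer element is auto-unboxed to int by the loop variable.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<Integer>();
        ints.add(1); // the int 1 is auto-boxed to an Integer
        ints.add(2);
        System.out.println(sum(ints)); // prints 3
    }
}
```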
Now we are ready to turn our ImListString into a generic class.
We also include the add method that takes in an element of type E, and a get method that returns a value of type E.
class ImList<E> { // declare generic type E
    private final List<E> elems;

    ImList() {
        this.elems = new ArrayList<E>();
    }

    ImList(List<E> list) {
        this.elems = new ArrayList<E>(list); // copy the given list
    }

    ImList<E> add(E elem) {
        ImList<E> newList = new ImList<E>(this.elems); // work on a copy
        newList.elems.add(elem);
        return newList;
    }

    E get(int index) {
        return this.elems.get(index);
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}
To create different ImList structures for different element types, we just need to type parameterize accordingly.
jshell> ImList<String> strings = new ImList<String>()
strings ==> []
jshell> ImList<Integer> ints = new ImList<Integer>(List.of(1, 2, 3))
ints ==> [1, 2, 3]
It is worth noting that much of the implementation of ImList is delegated to an internal ArrayList that is encapsulated from the client. The only additional responsibility of ImList is to make sure that every modification is applied to a new ImList, which is then returned, leaving the original list unchanged.
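The effect of delegation plus copying can be checked with a cut-down sketch. MiniImList is a hypothetical stand-in for ImList that keeps only what is needed to observe the behaviour: add delegates to ArrayList.add, but only on a fresh copy.

```java
import java.util.ArrayList;
import java.util.List;

// MiniImList is a hypothetical, cut-down stand-in for ImList.
class MiniImList<E> {
    private final List<E> elems;

    MiniImList() {
        this.elems = new ArrayList<E>();
    }

    private MiniImList(List<E> list) {
        this.elems = new ArrayList<E>(list); // defensive copy
    }

    // Delegates to ArrayList.add, applied to a copy rather than to this list.
    MiniImList<E> add(E elem) {
        MiniImList<E> newList = new MiniImList<E>(this.elems);
        newList.elems.add(elem);
        return newList;
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}

class ImmutabilityDemo {
    public static void main(String[] args) {
        MiniImList<Integer> empty = new MiniImList<Integer>();
        MiniImList<Integer> one = empty.add(1);
        System.out.println(empty); // prints []
        System.out.println(one);   // prints [1]
    }
}
```

The client never sees the internal ArrayList, so it has no way to mutate a list it already holds.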
In order to invoke the add method of generic class ImList<E>, the generic type E must be type parameterized during the creation of, say, ImList<Integer>.
jshell> ImList<Integer> ints = new ImList<Integer>()
ints ==> []
jshell> ints.add(1)
$.. ==> [1]
In contrast, a generic method can be invoked without the need to first create an object. This implies that type parameterization cannot happen during object creation.
So when does type parameterization take place?
Consider the following of method that is declared statically, i.e. not invoked via any object.
jshell> <T> String of(T t) {
...> return t.toString();
...> }
| created method of(T)
jshell> of(1)
$.. ==> "1"
jshell> of(1.0)
$.. ==> "1.0"
Here, the of method is a static (or statically dispatched) method which is resolved during compile time depending on whether of(1) or of(1.0) is called.
Also notice the need to declare the type parameter T when defining the of method header.
In general, a static method is resolved during compile time based on its input argument types and return type. Note the absence of the type of the this parameter as compared to a dynamic method.
static method :: (input_types) => result_type
Static methods are useful as class factory methods.
These methods allow the client to create objects without invoking the constructor via the new keyword.
class ImList<E> {
private final ArrayList<E> elems;
private ImList() { // constructor is private
this.elems = new ArrayList<E>();
}
private ImList(List<E> list) { // constructor is private
this.elems = new ArrayList<E>(list);
}
static <E> ImList<E> of() {
return new ImList<E>();
}
static <E> ImList<E> of(List<E> list) {
return new ImList<E>(list);
}
...
}
The two static generic methods of(..) have the following specification:
- for all types bound to E, of() => ImList<E>
- for all types bound to E, of(List<E>) => ImList<E>
As the constructors are now declared private, the only way for clients to create the list is via the factory of methods.
jshell> List<Integer> list = List.of(1, 2, 3)
list ==> [1, 2, 3]
jshell> ImList.of(list) // ImList<Integer> type-inferred implicitly since list is List<Integer>
$.. ==> [1, 2, 3]
jshell> ImList.<Integer>of() // ImList<Integer> with explicit type-witness to Integer
$.. ==> []
Notice that the creation of the empty list via ImList.of() is ambiguous on its own, as we need to specify what type of ImList we desire. Hence we employ type witnessing, which is the type parameterization placed between the dot . and the method name of.
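The same two styles can be tried with a factory method from the Java API. The sketch below uses Collections.emptyList(), a static generic method with no arguments, so nothing in the call itself fixes the type parameter; WitnessDemo and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

class WitnessDemo {
    // Explicit type witness: the type parameter of emptyList() is fixed to String.
    static List<String> witnessed() {
        return Collections.<String>emptyList();
    }

    // No witness: the compiler infers Integer from the declared return type.
    static List<Integer> inferred() {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(witnessed()); // prints []
        System.out.println(inferred());  // prints []
    }
}
```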
By now, you should be familiar with substitutability between parent/child classes, as well as between interfaces and their implementation classes. Substitutability or more formally Liskov's Substitution Principle (LSP) is a principle in OOP that states that objects of a superclass should be replaceable (or substitutable) with objects of its subclasses without affecting the correctness of the program.
Consider the following familiar example:
Circle circle = new Circle(1.0);
Shape shape = circle;
double area = shape.getArea();
The code above is valid because Circle is-a Shape.
This is-a relationship can also be expressed as Circle <: Shape (read as Circle is a Shape, or Circle is substitutable for Shape).
Moreover, recall that substitutability also holds during parameter passing, which is an assignment across methods.
void foo(Shape shape) {
double area = shape.getArea();
}
foo(circle);
It is also timely to look at assignment or parameter passing as data flow.
An assignment shape = circle represents data flow from the RHS of the assignment (in this case circle) to the LHS variable (in this case shape). To support this assignment, Circle <: Shape must hold.
Likewise in parameter passing, calling foo(circle) also represents data flow from the argument (in this case circle) to the method parameter (in this case shape).
Indeed, Circle <: Shape supports parameter passing while maintaining LSP.
Now in the context of generics, we have already seen that ArrayList<Shape> <: List<Shape> holds (since ArrayList <: List).
List<Shape> shapes = new ArrayList<Shape>(..)
But what about the following assignment? Is it valid?
ArrayList<Shape> shapes = new ArrayList<Circle>(..)
Java does not allow the above assignment since generics are invariant. However, this does not mean that substitutability is not applicable to generics. The validity of an assignment based on LSP depends on what is done after the assignment, in particular the direction of the data flow.
Let's assume that shapes is declared as some kind of ImList and assigned to a non-empty list.
ImList<..> shapes = new ImList<Shape>(...)
Shape shape = shapes.get(0);
The statement Shape shape = shapes.get(0) is clearly valid if shapes was assigned an ImList<Shape>. This assignment should also be valid if shapes was assigned an ImList<Circle>.
ImList<..> shapes = new ImList<Circle>(...)
Shape shape = shapes.get(0);
Based on LSP, substituting ImList<Shape> with ImList<Circle> does not affect the correctness of the program since in the latter, we are getting a circle (which is a shape) from the list and assigning it to the shape variable.
Now pay particular attention to the statement
Shape shape = shapes.get(0);
The flow of data is from the shapes list to a shape variable.
It does not matter if shapes is an ImList<Shape> or a list of one of its sub-classes (e.g. ImList<Circle>), as long as data flows out of shapes.
Here, we also say that shapes is a supplier of data.
In contrast, the following should not be allowed.
ImList<..> shapes = new ImList<Object>(..)
Shape shape = shapes.get(0);
There is no guarantee that shapes will contain only Shape objects, so the assignment to the shape variable could fail.
To support substitutions that allow the flow of data out of shapes, we declare shapes to be an ImList of Shape or any of its sub-classes.
ImList<? extends Shape> shapes = ...
Shape shape = shapes.get(0);
In other words, the following substitutions hold:
- ImList<Shape> <: ImList<? extends Shape>
- ImList<Circle> <: ImList<? extends Shape>
We sometimes abbreviate ? extends Shape using the informal notation +Shape.
For example, in ImList<+Shape>, the + denotes Shape data flowing out from the ImList.
To summarize, when the variable shapes is a supplier of data (i.e. data flowing outwards), shapes can be assigned an ImList of Shape or of any of its sub-classes.
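The supplier rule can be tried out with a small self-contained sketch. It uses java.util.List in place of ImList so that it runs on its own, and it assumes a minimal Shape interface with Circle and Square implementations; these definitions, along with SupplierDemo and totalArea, are assumptions made for illustration rather than the classes used in the lecture.

```java
import java.util.List;

// Minimal stand-ins; the actual Shape hierarchy in the lecture may differ.
interface Shape {
    double getArea();
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double getArea() {
        return Math.PI * this.radius * this.radius;
    }
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double getArea() {
        return this.side * this.side;
    }
}

class SupplierDemo {
    // shapes is only read from, so it is a supplier of Shape data.
    static double totalArea(List<? extends Shape> shapes) {
        double total = 0.0;
        for (Shape shape : shapes) { // data flows out of shapes
            total += shape.getArea();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Square> squares = List.of(new Square(1.0), new Square(2.0));
        List<Shape> mixed = List.of(new Square(3.0), new Circle(1.0));
        System.out.println(totalArea(squares)); // a List<Square> is accepted
        System.out.println(totalArea(mixed));   // a List<Shape> is accepted
    }
}
```

Had totalArea been declared with List<Shape>, the call with squares would be rejected at compile time.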
While we considered data flowing out in the previous section, we now consider the scenario of data flowing in by using an example from Comparator.
Note that we do not illustrate with ImList, as data flowing into an ImList would involve mutating the list.
Suppose we have the following AreaComp implementation of Comparator<Shape> that compares two Shape objects.
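class AreaComp implements Comparator<Shape> {
    @Override
    public int compare(Shape s1, Shape s2) {
        // getArea() returns a double, so compare the areas rather than subtract them
        return Double.compare(s1.getArea(), s2.getArea());
    }
}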
Now declare shapeComp as some kind of Comparator that will be used to sort a list of Shape objects.
Comparator<..> shapeComp = new AreaComp();
new ImList<Shape>(..).sort(shapeComp);
Clearly, the above assignment is valid since we can sort a list of shapes based
on the area of the shape.
Consider another Comparator<Object> that compares the length of the string representations of the objects.
class StringLenComp implements Comparator<Object> {
    @Override
    public int compare(Object o1, Object o2) {
        return o1.toString().length() - o2.toString().length();
    }
}
Comparator<..> shapeComp = new StringLenComp();
new ImList<Shape>(..).sort(shapeComp);
Sorting is also valid since we can sort shapes based on the lengths of their String representations.
Now do you see that data has to flow into shapeComp in order for the compare method to return the appropriate integer value?
Here we say that shapeComp is a consumer of data.
In contrast, if we have a Comparator<Circle> that compares, say, the radii of circles, then using it to sort ImList<Shape> is invalid since there is no guarantee that only circles are contained in the list.
To support substitutions that allow the flow of data into shapeComp, we declare shapeComp to be a Comparator of Shape or any of its super-classes.
Comparator<? super Shape> shapeComp = ...;
new ImList<Shape>(..).sort(shapeComp);
In other words, the following substitutions hold:
- Comparator<Shape> <: Comparator<? super Shape>
- Comparator<Object> <: Comparator<? super Shape>
We sometimes abbreviate ? super Shape using the informal notation -Shape.
As an example, in Comparator<-Shape>, the - denotes Shape data flowing into the Comparator.
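The consumer rule can also be checked in isolation. To keep the sketch self-contained, it sorts String values instead of Shape objects and works on a java.util.List; ConsumerDemo and sorted are hypothetical names, while StringLenComp mirrors the comparator described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StringLenComp implements Comparator<Object> {
    @Override
    public int compare(Object o1, Object o2) {
        return o1.toString().length() - o2.toString().length();
    }
}

class ConsumerDemo {
    // cmp only receives String values, so it is a consumer of String data.
    static List<String> sorted(List<String> words, Comparator<? super String> cmp) {
        List<String> copy = new ArrayList<String>(words); // leave the input untouched
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("ccc", "a", "bb");
        // A Comparator<Object> is accepted where Comparator<? super String> is expected.
        System.out.println(sorted(words, new StringLenComp())); // prints [a, bb, ccc]
    }
}
```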
We are now ready to include the addAll and sort methods in ImList by focusing on the flow of data in these two methods.
class ImList<E> {
private final List<E> elems;
...
ImList<E> addAll(ImList<..> list) {
ImList<E> newList = new ImList<E>(this.elems);
newList.elems.addAll(list.elems); // copy the elements out of the other ImList
return newList;
}
ImList<E> sort(Comparator<..> cmp) {
ImList<E> newList = new ImList<E>(this.elems);
newList.elems.sort(cmp);
return newList;
}
...
}
The addAll method in generic class ImList<E> requires that the list method parameter be a supplier of data, i.e. data flows out of list. This suggests that the addAll method should be declared as
ImList<E> addAll(ImList<? extends E> list) {
...
}
The above makes the following a valid statement:
new ImList<Shape>().addAll(new ImList<Circle>().add(new Circle(1.0)))
The astute reader should also appreciate that the of method (and the constructor in generic class ImList<E>) that takes a List<E> as argument should likewise allow a list of any subclass of E to be passed in. Hence the parameter type should more generally be declared as List<? extends E> list.
In contrast, the sort method in generic class ImList<E> requires that the cmp method parameter be a consumer of data, i.e. data flows into cmp. This suggests that the sort method should be declared as
ImList<E> sort(Comparator<? super E> cmp) {
...
}
The above makes the following a valid statement:
new ImList<Shape>(..).sort(new StringLenComp())
There is a useful acronym, PECS (Producer Extends, Consumer Super), that one can generally rely on when determining whether the parameter of a method should be wildcard bounded with ? extends or ? super, depending on whether it is a producer or a consumer of data.
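Putting the two bounded wildcards together, the following is one possible self-contained sketch of ImList with addAll and sort. It is an illustration under assumptions, not the official course implementation: the constructors are left non-private for brevity, and PecsDemo is a hypothetical driver class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ImList<E> {
    private final List<E> elems;

    ImList() {
        this.elems = new ArrayList<E>();
    }

    // list is a producer of E values, hence ? extends E.
    ImList(List<? extends E> list) {
        this.elems = new ArrayList<E>(list);
    }

    ImList<E> add(E elem) {
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.add(elem);
        return newList;
    }

    // list is a producer: elements flow out of it, hence ? extends E.
    ImList<E> addAll(ImList<? extends E> list) {
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.addAll(list.elems);
        return newList;
    }

    // cmp is a consumer: elements flow into it, hence ? super E.
    ImList<E> sort(Comparator<? super E> cmp) {
        ImList<E> newList = new ImList<E>(this.elems);
        newList.elems.sort(cmp);
        return newList;
    }

    @Override
    public String toString() {
        return this.elems.toString();
    }
}

class PecsDemo {
    public static void main(String[] args) {
        ImList<Integer> ints = new ImList<Integer>(List.of(3, 1, 2));
        ImList<Number> numbers = new ImList<Number>().add(2.5).addAll(ints);
        System.out.println(numbers); // prints [2.5, 3, 1, 2]

        Comparator<Object> byText = (a, b) -> a.toString().compareTo(b.toString());
        System.out.println(ints.sort(byText)); // prints [1, 2, 3]
        System.out.println(ints);              // prints [3, 1, 2]
    }
}
```

Here an ImList<Integer> is supplied where ImList<? extends Number> is expected, and a Comparator<Object> is supplied where Comparator<? super Integer> is expected.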
Now let's formally define sub-typing for generics.
In the following, we shall assume that S <: T.
Java generics use invariant sub-typing.
S<E1> <: T<E2> only if E1 = E2, or more formally
E1 = E2; S <: T
---------------
S<E1> <: T<E2>
where E1 = E2 is the invariant sub-typing, and S <: T is the class sub-typing.
Using ArrayList as an example, it is evident that one can read an element from the list using get, and insert an element into the list using add.
To facilitate both add and get operations, ArrayList should support both inward and outward data flow, and hence E1 = E2.
Likewise, since ArrayList <: List, ArrayList<E1> <: List<E2> only if E1 = E2.
Assume that the following foo method is defined with E1 <: E2.
... foo(T<? extends E2> t) {
    // E2 data flowing out of t
}
Then the following statements are valid:
T<E1> t = ...; foo(t)
T<E2> t = ...; foo(t)
S<E1> s = ...; foo(s)
S<E2> s = ...; foo(s)
More formally,
E1 <: E2; S <: T
---------------
S<E1> <: T<+E2>
where E1 <: E2 is the covariant sub-typing, and S <: T is class sub-typing.
If a method takes as parameter ArrayList<? extends E2> list (or List<? extends E2> list) and list is a supplier of data in the method, then we can call the method by passing in an object of type ArrayList<E1>.
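As a concrete instance of the covariant rule, take S = ArrayList, T = List, E1 = Integer and E2 = Number. The sketch below (CovariantDemo and sum are hypothetical names) shows that both class sub-typing and covariant sub-typing are exercised in a single call.

```java
import java.util.ArrayList;
import java.util.List;

class CovariantDemo {
    // Number data flows out of list, so a List of any Number sub-type is accepted.
    static double sum(List<? extends Number> list) {
        double total = 0.0;
        for (Number n : list) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        ArrayList<Integer> ints = new ArrayList<Integer>(List.of(1, 2, 3));
        List<Number> numbers = List.of(1.5, 2);
        System.out.println(sum(ints));    // ArrayList<Integer> <: List<? extends Number>; prints 6.0
        System.out.println(sum(numbers)); // List<Number> <: List<? extends Number>; prints 3.5
    }
}
```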
Assume that the following bar method is defined with E2 <: E1.
... bar(T<? super E2> t) {
    // E2 data flowing into t
}
Then the following statements are valid:
T<E2> t = ...; bar(t)
T<E1> t = ...; bar(t)
S<E2> s = ...; bar(s)
S<E1> s = ...; bar(s)
More formally,
E2 <: E1; S <: T
---------------
S<E1> <: T<-E2>
where E2 <: E1 is the contravariant sub-typing, and S <: T is class sub-typing.
If a method takes as parameter ArrayList<? super E2> list (or List<? super E2> list) and list is a consumer of data in the method, then we can call the method by passing in an object of type ArrayList<E1>.
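For the contravariant rule, take S = ArrayList, T = List, E2 = Integer and E1 = Object. In the sketch below (ContravariantDemo and fill are hypothetical names), Integer values flow into the list, so a list of any super-type of Integer can receive them.

```java
import java.util.ArrayList;
import java.util.List;

class ContravariantDemo {
    // Integer data flows into list, so a List of any Integer super-type is accepted.
    static void fill(List<? super Integer> list, int n) {
        for (int i = 1; i <= n; i++) {
            list.add(i);
        }
    }

    public static void main(String[] args) {
        ArrayList<Object> objects = new ArrayList<Object>();
        objects.add("zero");
        fill(objects, 2); // ArrayList<Object> <: List<? super Integer>
        System.out.println(objects); // prints [zero, 1, 2]
    }
}
```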
Upper and lower bounded wildcards can be used for different parameters of a method. Here is an example of a generic method that adds elements from a source list to a destination list.
static <T> void copy(ArrayList<? extends T> src, ArrayList<? super T> dst) {
for (T t : src) {
dst.add(t);
}
}
Notice that we are reading from src (a supplier of data) and writing into dst (a consumer of data).
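A short usage sketch of copy follows, wrapped in a hypothetical CopyDemo class so that it runs on its own. The source holds Integer values while the destination is declared to hold Number values; T can be inferred as Integer, which satisfies both bounds.

```java
import java.util.ArrayList;
import java.util.List;

class CopyDemo {
    static <T> void copy(ArrayList<? extends T> src, ArrayList<? super T> dst) {
        for (T t : src) { // read from the supplier
            dst.add(t);   // write into the consumer
        }
    }

    public static void main(String[] args) {
        ArrayList<Integer> src = new ArrayList<Integer>(List.of(1, 2, 3));
        ArrayList<Number> dst = new ArrayList<Number>();
        dst.add(0.5);
        copy(src, dst);
        System.out.println(dst); // prints [0.5, 1, 2, 3]
    }
}
```

Swapping the two arguments would not compile, since an ArrayList<Number> cannot supply values for an ArrayList<Integer>.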
Suppose we have a Printer object with a print method that is used for the purpose of printing. We can write a method that takes in a Printer.
<T> ...(Printer<? super T> printer) {
T t = ...
printer.print(t);
}
As an example, suppose T is bound to Integer.
In order to print an Integer, it can either be printed as an Integer, or printed as an Object.
Hence, Printer<Object> <: Printer<-Integer>.
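Since the definition of Printer is not given here, the sketch below assumes a hypothetical single-method interface Printer<T> with void print(T t). ObjectPrinter, PrinterDemo and printAll are likewise invented for illustration; ObjectPrinter records what it prints in a StringBuilder so that the result can be inspected.

```java
// Printer is a hypothetical interface; its actual definition is an assumption.
interface Printer<T> {
    void print(T t);
}

class ObjectPrinter implements Printer<Object> {
    final StringBuilder out = new StringBuilder(); // captured output, for inspection

    @Override
    public void print(Object o) {
        this.out.append("<").append(o).append(">");
    }
}

class PrinterDemo {
    // Integer data flows into printer, so printer is a consumer.
    static void printAll(Printer<? super Integer> printer, int n) {
        for (int i = 1; i <= n; i++) {
            printer.print(i);
        }
    }

    public static void main(String[] args) {
        ObjectPrinter printer = new ObjectPrinter();
        printAll(printer, 3); // Printer<Object> <: Printer<? super Integer>
        System.out.println(printer.out); // prints <1><2><3>
    }
}
```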
In due time, you will also see the use of a Function object being passed to a method where a value of type T is passed as input to the function via a method called apply, and a value of type R is returned as output.
<T,R> ...(Function<? super T, ? extends R> fn) { // declare two type parameters T and R
T t = ...
R r = fn.apply(t); // input of type T flows into fn, and output of type R flows out of fn
...
}
Suppose T is bound to Integer, and R is bound to Shape. We can pass a function Function<Object, Circle> and it will still work since Function<Object, Circle> <: Function<-Integer,+Shape>.
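The same idea can be run with java.util.function.Function from the Java API. To stay self-contained, the sketch binds T to Integer and R to Number instead of Shape; FunctionDemo and applyTo are hypothetical names.

```java
import java.util.function.Function;

class FunctionDemo {
    // A value of type T flows into fn, and a value of type R flows out of it.
    static <T, R> R applyTo(T t, Function<? super T, ? extends R> fn) {
        return fn.apply(t);
    }

    public static void main(String[] args) {
        Function<Object, Integer> textLength = o -> o.toString().length();
        // T is Integer and R is Number, yet a Function<Object, Integer> is accepted.
        Number n = FunctionDemo.<Integer, Number>applyTo(12345, textLength);
        System.out.println(n); // prints 5
    }
}
```

The type witness on applyTo is written out only to make the bindings of T and R explicit.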
Method overriding is an interesting example where covariant sub-typing is apparent.
Suppose we have the following two classes S and T.
class T {
A method() {
...
}
}
class S extends T {
@Override
B method() {
...
}
}
You have seen that method in class S overrides method in class T as long as B <: A. Indeed, the return type of an overriding method in Java employs covariant sub-typing. As an example, given the foo method below:
void foo(T t) {
A a = t.method();
}
When we call foo(new S()), the program still works.
B <: A; S <: T
------------------------------------
type(S.method()) <: type(T.method())
where type(S.method()) is the return type of method() in class S.
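A runnable version of this example is sketched below. The bodies of A, B, T and S are filled in with assumed placeholder behaviour (a name method that reports the class), and foo is placed in a hypothetical OverrideDemo class and made to return a String so that the effect can be observed.

```java
class A {
    String name() {
        return "A";
    }
}

class B extends A {
    @Override
    String name() {
        return "B";
    }
}

class T {
    A method() {
        return new A();
    }
}

class S extends T {
    @Override
    B method() { // return type B <: A is allowed when overriding
        return new B();
    }
}

class OverrideDemo {
    static String foo(T t) {
        A a = t.method(); // valid whether t is a T or an S
        return a.name();
    }

    public static void main(String[] args) {
        System.out.println(foo(new T())); // prints A
        System.out.println(foo(new S())); // prints B
    }
}
```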
Q: Do you expect contravariance on parameters of overriding methods?
class T {
    void method(B b) { ... }
}
class S extends T {
    @Override
    void method(A a) { ... }
}
Given the following foo method with B <: A,
void foo(T t) {
    t.method(new B());
}
can we call foo(new S()) with an overriding method that takes in a parameter of type A?