Guava - illyfrancis/scribble GitHub Wiki

Old presentation

Part 1

Immutable collections

use immutable collections instead of unmodifiable ones
- ImmutableList, Set, SortedSet, Map, (SortedMap - one day; ImmutableSortedMap has since been added)
- immutable vs unmodifiable
  - immutability guarantee, easier to use, faster, less memory
comparing Collections.unmodifiableXXX with immutable (8:06)

Examples

constant set using JDK (11:55)
a little better: Collections.unmodifiableSet(new LinkedHashSet<Integer>(Arrays.asList(...)));
- ImmutableSet<Integer> LUCKY_NUMBERS = ImmutableSet.of(4, 8, 15...);
  - of() <- java.util.EnumSet pattern (** check it out**)
  - constant map - ImmutableMap<String, Integer> ENG_TO_INT = ImmutableMap.<String, Integer>builder().put("four", 4)...build() (15:16)
  - Defensive copies
    - ImmutableSet.copyOf(numbers) (17:10)
  - factories
    - ImmutableSet.of() (like Collections.emptySet())
    - ImmutableSet.of(a) (like Collections.singleton(a))
    - ImmutableSet.of(a, b, c)
    - ImmutableSet.copyOf(someIterator);
    - ImmutableSet.copyOf(someIterable);
  - static final ImmutableMap<Integer, String> MAP = ImmutableMap.of(1, "one", 2, "two");
  - there is also a builder: ImmutableSet.<T>builder().add(...).addAll(...).build()
- Don't take nulls!!! (a null element or key throws NullPointerException)
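A minimal sketch of the factory and builder styles above, assuming a recent Guava release on the classpath. The constant names and values are made up for illustration; the sketch also checks the two behaviours from the list: nulls are rejected and mutators throw.

```java
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;

class ImmutableConstants {
    // Constant set from a static factory; iteration follows the order given to of()
    static final ImmutableSet<Integer> SMALL_PRIMES = ImmutableSet.of(2, 3, 5, 7);

    // Constant map from the builder; entries keep the order they were put in
    static final ImmutableMap<String, Integer> ENG_TO_INT =
        ImmutableMap.<String, Integer>builder()
            .put("one", 1)
            .put("two", 2)
            .put("three", 3)
            .build();

    // True if the factory rejected a null element with NullPointerException
    static boolean rejectsNull() {
        try {
            ImmutableList.of("a", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // True if a mutator throws UnsupportedOperationException
    static boolean rejectsMutation() {
        try {
            SMALL_PRIMES.add(11);
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(SMALL_PRIMES);  // [2, 3, 5, 7]
        System.out.println(ENG_TO_INT);    // {one=1, two=2, three=3}
    }
}
```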

Multisets

Type of collection

Collection behaviour

Can it have duplicates?
Is ordering significant? (for equals())
Iteration order
- insertion-ordered? (Linked list) comparator-ordered? (Tree) user-ordered? (Array)
- something else well-defined?
- or it just doesn't matter?

In general, the first two determine the interface type, and the third tends to influence your choice of implementation.

List vs. Set

Set: unordered equality, no dups
List: ordered equality, can have dups

Diagram (31:50)

                 Ordered?
             Y             N
Dups? +--------------+----------+
  Y   |     List     | Multiset |
      +--------------+----------+
  N   | (UniqueList) |    Set   |
      +--------------+----------+

Multiset: unordered equality, can have dups

== Bag
use cases:
- hand of cards, compare same hands
- are these Lists equal, ignoring order?
- histograms, what distinct tags am I using on my blog, and how many times do I use each one?

Example code, before (36:07)

Map<String, Integer> tags
  = new HashMap<String, Integer>();
for (BlogPost post : getAllBlogPosts()) {
  for (String tag: post.getTags()) {
    int value = tags.containsKey(tag) ? tags.get(tag) : 0;
    tags.put(tag, value + 1);
  }
}

distinct tags: tags.keySet()
count for "java" tag: tags.containsKey("java") ? tags.get("java") : 0;
total count: // oh crap... (no direct way; you have to sum all the values yourself)
Java tutorial shows proper way of doing this!!! (have a look)

Example code, after (38:27)

Multiset<String> tags = HashMultiset.create();
for (BlogPost post : getAllBlogPosts()) {
  tags.addAll(post.getTags());
}

distinct tags: tags.elementSet();
count for "java" tag: tags.count("java")
total count: tags.size()

Example, after after (40:10)

need to remove/decrement? (Multiset supports it)
concurrency? (instead of locking the entire map, use ConcurrentMultiset; ConcurrentHashMultiset in current Guava)
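A runnable version of the "after" code, assuming Guava on the classpath. BlogPost is not defined in the notes, so this sketch represents each post only by its list of tags (a simplifying assumption). It also shows the decrement case mentioned above, using Multiset.remove(element, occurrences).

```java
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import java.util.Arrays;
import java.util.List;

class TagCounts {
    // Counts tags across posts; each post is represented here simply by its list of tags
    static Multiset<String> countTags(List<List<String>> tagsPerPost) {
        Multiset<String> tags = HashMultiset.create();
        for (List<String> postTags : tagsPerPost) {
            tags.addAll(postTags);  // each occurrence increments that tag's count
        }
        return tags;
    }

    // Made-up sample data: two posts sharing the "java" tag
    static Multiset<String> sample() {
        return countTags(Arrays.asList(
            Arrays.asList("java", "guava"),
            Arrays.asList("java", "collections")));
    }

    // Removes one occurrence of tag (a no-op if absent) and returns the remaining count
    static int decrement(Multiset<String> tags, String tag) {
        tags.remove(tag, 1);
        return tags.count(tag);
    }

    public static void main(String[] args) {
        Multiset<String> tags = sample();
        System.out.println(tags.elementSet().size()); // 3 distinct tags
        System.out.println(tags.count("java"));       // 2
        System.out.println(tags.size());              // 4 tags in total
    }
}
```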

Part 2 (http://www.youtube.com/watch?v=9ni_KEkHfto)

Multiset API (00:41)

Everything from collection plus
count, add, remove, setCount, etc

Multiset implementations

ImmutableMultiset
HashMultiset
LinkedHashMultiset
TreeMultiset
EnumMultiset
ConcurrentMultiset (ConcurrentHashMultiset in current Guava)
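A small sketch of the extra Multiset API, using the "hand of cards" use case from Part 1 and assuming Guava on the classpath. TreeMultiset is picked here only because it iterates in sorted order; the card names are made up.

```java
import com.google.common.collect.Multiset;
import com.google.common.collect.TreeMultiset;

class HandOfCards {
    // A hand as a multiset: order is irrelevant for equals(), duplicates are not
    static Multiset<String> hand(String... cards) {
        Multiset<String> hand = TreeMultiset.create();  // iterates in natural (sorted) order
        for (String card : cards) {
            hand.add(card);
        }
        return hand;
    }

    static Multiset<String> apiDemo() {
        Multiset<String> hand = TreeMultiset.create();
        hand.add("queen", 2);      // add two occurrences at once
        hand.add("ace");
        hand.setCount("king", 3);  // set a count directly
        hand.remove("queen");      // remove a single occurrence
        return hand;
    }

    public static void main(String[] args) {
        System.out.println(hand("7", "ace", "7").equals(hand("ace", "7", "7")));  // true
        System.out.println(hand("7", "ace").equals(hand("7", "7", "ace")));       // false
    }
}
```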

Multimaps

Before

Map<Salesperson, List<Sale>> map = new HashMap<Salesperson, List<Sale>>();

public void makeSale(Salesperson salesPerson, Sale sale) {
  List<Sale> sales = map.get(salesPerson);
  if (sales == null) {
    sales = new ArrayList<Sale>();
    map.put(salesPerson, sales);
  }
  sales.add(sale);
}

with multimaps

Multimap<Salesperson, Sale> multimap = ArrayListMultimap.create();

public void makeSale(Salesperson salesPerson, Sale sale) {
  multimap.put(salesPerson, sale);
}

collection of key-value pairs (entries), like a Map except that keys don't have to be unique: {a=1, a=2, b=3}
or view it as a Map with asMap(): {a=[1, 2], b=[3]}
get() returns a view typed to the multimap subtype: a List for ListMultimap, a Set for SetMultimap (13:44)
re-look at 14:20, good example: biggest sale
view collections - Multimap has six: get(), keys(), keySet(), values(), entries(), asMap()

Multimap vs Map

Most Map methods are identical on Multimap
- size(), isEmpty()
- containsKey(), containsValue()
- put(), putAll()
- clear()
- values()
The others have analogues
- get() returns Collection instead of V
- remove(K) becomes remove(K, V) and removeAll(K)
- keySet() gains an analogue keys(), which returns a Multiset with one occurrence per entry (keySet() itself still exists)
- entrySet() becomes entries()
And Multimap has a few new things
- containsEntry(), replaceValues()

BiMap

aka, unique-values map, guarantees its values are unique as well as its keys
has inverse() view
- bimap.inverse().inverse() == bimap
stop creating two separate forward and backward Maps!
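A minimal BiMap sketch, assuming Guava on the classpath and using made-up user IDs. It shows the inverse() view, that put() refuses a value already bound to another key, and forcePut() as the explicit way to override that.

```java
import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;

class UserIds {
    static BiMap<String, Integer> sample() {
        BiMap<String, Integer> ids = HashBiMap.create();
        ids.put("alice", 1);
        ids.put("bob", 2);
        return ids;
    }

    // put() throws IllegalArgumentException if the value is already bound to another key
    static boolean rejectsDuplicateValue() {
        try {
            sample().put("carol", 1);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    // forcePut() removes the existing entry holding that value ("alice" -> 1 here)
    static BiMap<String, Integer> forced() {
        BiMap<String, Integer> ids = sample();
        ids.forcePut("carol", 1);
        return ids;
    }

    static boolean inverseOfInverseIsSameInstance() {
        BiMap<String, Integer> ids = sample();
        return ids.inverse().inverse() == ids;
    }

    public static void main(String[] args) {
        System.out.println(sample().inverse().get(2));  // bob
    }
}
```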

ReferenceMap (21:39) (later superseded by MapMaker and then CacheBuilder)

when dealing with weak or soft references
a generalization of java.util.WeakHashMap
Nine possible combos:
- strong, weak, or soft keys
- strong, weak or soft values
fully concurrent
- implements ConcurrentMap
- cleanup done on GC Thread
and more...
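used a lot for caching
- a strongly referenced object is never garbage collected while that reference exists; a weakly referenced one can be collected as soon as no strong references remain, and a softly referenced one is typically cleared only under memory pressure (24:00)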
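ReferenceMap no longer exists in Guava; CacheBuilder still offers weakKeys(), weakValues() and softValues(). A minimal sketch assuming a recent Guava release. Garbage collection is nondeterministic, so it only checks lookups while the key is still strongly reachable, plus one easy-to-miss detail: weakKeys() makes the cache compare keys with == instead of equals().

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

class WeakKeyCache {
    // Keys held weakly, values held softly (one of the old ReferenceMap combinations)
    static Cache<Object, String> newCache() {
        return CacheBuilder.newBuilder().weakKeys().softValues().build();
    }

    // While the key is strongly reachable (a local variable here), its entry can't be collected
    static boolean findsEntryBySameKey() {
        Cache<Object, String> cache = newCache();
        Object key = new Object();
        cache.put(key, "value");
        return "value".equals(cache.getIfPresent(key));
    }

    // weakKeys() compares keys with ==, so an equal but distinct String is a miss
    static boolean missesEqualButDistinctKey() {
        Cache<Object, String> cache = newCache();
        String key = new String("k");
        cache.put(key, "value");
        boolean miss = cache.getIfPresent(new String("k")) == null;
        return miss && cache.getIfPresent(key) != null;
    }
}
```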

Ordering class

Comparator is easy to implement but a pain to use
Ordering is Comparator++ (or RichComparator)

Ordering<String> caseless = Ordering.from(String.CASE_INSENSITIVE_ORDER);

Also there are methods like min(Iterable), max(Iterable), isOrdered(Iterable), sortedCopy(Iterable), reverse()...

Static factory methods

Rather than typing

Multimap<String, Class<? extends Handler>> handlers =
  new ArrayListMultimap<String, Class<? extends Handler>>();

do this

Multimap<String, Class<? extends Handler>> handlers =
  ArrayListMultimap.create();

also provided for JDK collections, via Lists, Sets, Maps (e.g. Lists.newArrayList(), Maps.newHashMap())
with overloads that accept Iterables to copy elements from
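A short sketch of those JDK-collection factories, assuming Guava on the classpath. Since Java 7 the diamond operator (new HashMap<>()) covers the plain-constructor case, so the copying and varargs overloads are where these helpers still save some typing.

```java
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import com.google.common.collect.Sets;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FactoryMethods {
    // Type arguments are inferred from the return type; no need to repeat them
    static Map<String, List<Integer>> newIndex() {
        return Maps.newHashMap();
    }

    // Copies the elements of any Iterable into a new, mutable ArrayList
    static List<String> mutableCopy(Iterable<String> source) {
        return Lists.newArrayList(source);
    }

    // Varargs overload: builds a mutable HashSet from the given elements
    static Set<String> distinct(String... items) {
        return Sets.newHashSet(items);
    }

    public static void main(String[] args) {
        System.out.println(distinct("a", "b", "a").size());  // 2
    }
}
```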

Working with Iterator and Iterables

Collection is a good abstraction when all your data is in memory
Sometimes you want to process large amounts of data in a single pass
Implementing Collection is possible but cumbersome, and won't behave nicely
Iterator and Iterable are often all you need
Google Collections methods accept Iterator and Iterable whenever practical

Iterators and Iterables classes

These classes have parallel APIs, one for Iterator and the other for Iterable

Iterable transform(Iterable, Function)
Iterable filter(Iterable, Predicate)
T find(Iterable<T>, Predicate)
Iterable concat(Iterable<Iterable>)
Iterable cycle(Iterable)
T getOnlyElement(Iterable<T>)
Iterable<T> reverse(List<T>)
...

These methods are LAZY!
- backing iterators aren't accessed until needed
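A sketch that makes the laziness visible, assuming Guava on the classpath: a counting Function shows that Iterables.transform() applies nothing until elements are read, and then only as many times as needed. The counter is purely illustrative.

```java
import com.google.common.base.Function;
import com.google.common.collect.Iterables;
import java.util.Arrays;
import java.util.List;

class LazyIterables {
    // Counts how many times the transform function has actually run
    static int applied;

    static final Function<String, Integer> LENGTH = new Function<String, Integer>() {
        @Override
        public Integer apply(String s) {
            applied++;
            return s.length();
        }
    };

    static Iterable<Integer> lengths(List<String> words) {
        return Iterables.transform(words, LENGTH);  // builds a view; nothing computed yet
    }

    static int appliedAfterCreatingView() {
        applied = 0;
        lengths(Arrays.asList("a", "bb", "ccc"));
        return applied;
    }

    static int appliedAfterReadingFirst() {
        applied = 0;
        Iterables.getFirst(lengths(Arrays.asList("a", "bb", "ccc")), null);
        return applied;  // only the first element was transformed
    }

    static int concatenatedSize() {
        Iterable<String> all = Iterables.concat(Arrays.asList("a", "b"), Arrays.asList("c"));
        return Iterables.size(all);
    }

    public static void main(String[] args) {
        System.out.println(appliedAfterCreatingView());  // 0
        System.out.println(appliedAfterReadingFirst());  // 1
    }
}
```

Because the view re-applies the function on every pass, copying it once (for example with ImmutableList.copyOf) is worth considering when it will be iterated repeatedly.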

Presso by an aussie guy (not that good)

No more null

use Optional

interface Person {
  Optional<Color> getFavoriteColor();
}

Color colorToUse = person.getFavoriteColor().or(Color.BLUE);

From Devoxx

com.google.common.base

Preconditions

Used to validate assumptions at the start of methods or constructors (and fail-fast)

public Car(Engine engine) {
  this.engine = checkNotNull(engine); // NPE
}
public void drive(double speed) {
  checkArgument(speed > 0.0, "speed (%s) must be positive", speed); // IAE
  checkState(engine.isRunning(), "engine must be running"); // ISE
  ...
}
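A compilable version of the Car example, assuming Guava on the classpath. Engine is a hypothetical minimal class added only so the sketch stands alone; failureOf() reports which exception each precondition produces.

```java
import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;
import static com.google.common.base.Preconditions.checkState;

class Car {
    // Hypothetical minimal Engine so the example compiles on its own
    static class Engine {
        private final boolean running;

        Engine(boolean running) {
            this.running = running;
        }

        boolean isRunning() {
            return running;
        }
    }

    private final Engine engine;
    private double speed;

    Car(Engine engine) {
        this.engine = checkNotNull(engine, "engine");  // NullPointerException if null
    }

    void drive(double speed) {
        checkArgument(speed > 0.0, "speed (%s) must be positive", speed);  // IllegalArgumentException
        checkState(engine.isRunning(), "engine must be running");            // IllegalStateException
        this.speed = speed;
    }

    // Returns the simple name of the exception thrown while building and driving, or "none"
    static String failureOf(Engine engine, double speed) {
        try {
            new Car(engine).drive(speed);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureOf(new Engine(false), 50.0));  // IllegalStateException
    }
}
```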

Objects.toStringHelper()

for implementing Object.toString() more cleanly (moved to MoreObjects.toStringHelper() in Guava 18)

return Objects.toStringHelper(this)
  .add("name", name)
  .add("id", userId)
  .add("pet", petName)  // petName is @Nullable!
  .omitNullValues()
  .toString();

// "Person{name=Kurt Kluever, id=42}"

or without .omitNullValues()

// "Person{name=Kurt Kluever, id=42, pet=null}"
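A self-contained version of the example, assuming Guava 18 or later, where the helper lives in MoreObjects. The Person fields are inferred from the snippet and the pet name "Rex" is made up.

```java
import com.google.common.base.MoreObjects;

class Person {
    private final String name;
    private final int userId;
    private final String petName;  // may be null

    Person(String name, int userId, String petName) {
        this.name = name;
        this.userId = userId;
        this.petName = petName;
    }

    @Override
    public String toString() {
        return MoreObjects.toStringHelper(this)  // prefix is getClass().getSimpleName()
            .add("name", name)
            .add("id", userId)
            .add("pet", petName)
            .omitNullValues()  // may be called before or after the add() calls
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(new Person("Kurt Kluever", 42, null));   // Person{name=Kurt Kluever, id=42}
        System.out.println(new Person("Kurt Kluever", 42, "Rex"));  // Person{name=Kurt Kluever, id=42, pet=Rex}
    }
}
```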

Stopwatch

Prefer Stopwatch over System.nanoTime()
- (and definitely over currentTimeMillis())
- exposes relative timings, not absolute time
- alternate time sources can be substituted using Ticker (read() returns nanoseconds)
- toString() gives human readable format

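import com.google.common.base.Stopwatch;
import java.util.concurrent.TimeUnit;

// current API; older releases used new Stopwatch().start(), elapsedMillis() and elapsedTime(unit)
Stopwatch stopwatch = Stopwatch.createStarted();
doSomeOtherOperation();
long millis = stopwatch.elapsed(TimeUnit.MILLISECONDS);
long nanos = stopwatch.elapsed(TimeUnit.NANOSECONDS);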
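A sketch of the "substitute the time source with a Ticker" point, assuming a recent Guava release. ManualTicker is a hypothetical test clock written here (Guava's testlib has a similar FakeTicker); time only advances when the test says so, which makes timing code deterministic to test.

```java
import com.google.common.base.Stopwatch;
import com.google.common.base.Ticker;
import java.util.concurrent.TimeUnit;

class StopwatchWithTicker {
    // Hypothetical manual clock: read() only changes when advance() is called
    static final class ManualTicker extends Ticker {
        private long nanos;

        void advance(long amount, TimeUnit unit) {
            nanos += unit.toNanos(amount);
        }

        @Override
        public long read() {
            return nanos;
        }
    }

    static long elapsedMillisAfter(long millis) {
        ManualTicker ticker = new ManualTicker();
        Stopwatch stopwatch = Stopwatch.createStarted(ticker);
        ticker.advance(millis, TimeUnit.MILLISECONDS);
        return stopwatch.elapsed(TimeUnit.MILLISECONDS);
    }

    // A stopped stopwatch ignores further ticks
    static long elapsedSecondsAfterStop() {
        ManualTicker ticker = new ManualTicker();
        Stopwatch stopwatch = Stopwatch.createStarted(ticker);
        ticker.advance(2, TimeUnit.SECONDS);
        stopwatch.stop();
        ticker.advance(5, TimeUnit.SECONDS);
        return stopwatch.elapsed(TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(elapsedMillisAfter(1500));  // 1500
    }
}
```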

String splitting

Splitter.on(',')
  .trimResults()
  .omitEmptyStrings()
  .split(" foo, ,bar, quux, ");

=> ["foo", "bar", "quux"]

key methods: split(), .trimResults(), .omitEmptyStrings()

private static final Splitter SPLITTER = 
  Splitter.on(',').trimResults();

SPLITTER.split("Kurt, Kevin, Chris");
// yields: ["Kurt", "Kevin", "Chris"]
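A runnable version of the splitter examples, assuming Guava 15 or later (for splitToList). For contrast it includes the JDK's String.split(), which silently drops trailing empty strings, one reason Splitter makes empty-string handling explicit.

```java
import com.google.common.base.Splitter;
import java.util.List;

class SplitterDemo {
    // Splitter instances are immutable and thread-safe, so a shared constant is fine
    private static final Splitter COMMA = Splitter.on(',').trimResults().omitEmptyStrings();

    // splitToList() returns a List; split() returns a lazily evaluated Iterable
    static List<String> split(String input) {
        return COMMA.splitToList(input);
    }

    // Without trimming or omitting, every empty piece is kept, including the trailing one
    static int rawPieces(String input) {
        return Splitter.on(',').splitToList(input).size();
    }

    // For contrast: String.split() drops trailing empty strings
    static int jdkPieces(String input) {
        return input.split(",").length;
    }

    public static void main(String[] args) {
        System.out.println(split(" foo, ,bar, quux, "));  // [foo, bar, quux]
        System.out.println(rawPieces(",a,,b,"));          // 5
        System.out.println(jdkPieces(",a,,b,"));          // 4
    }
}
```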

String Joining

Joiner concats strings using a delimiter
- throws an NPE on null objects, unless configured with
  - .skipNulls()
  - .useForNull(String)

private static final Joiner JOINER = 
  Joiner.on(", ").skipNulls();

JOINER.join("Kurt", "Kevin", null, "Chris");
// yields: "Kurt, Kevin, Chris"
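A runnable sketch of the three null-handling modes, assuming Guava on the classpath: skipNulls(), useForNull() (with "-" as an arbitrary placeholder) and the default, which throws.

```java
import com.google.common.base.Joiner;

class JoinerDemo {
    private static final Joiner SKIP_NULLS = Joiner.on(", ").skipNulls();
    private static final Joiner NULL_AS_DASH = Joiner.on(", ").useForNull("-");

    static String skipping(Object... parts) {
        return SKIP_NULLS.join(parts);
    }

    static String dashForNull(Object... parts) {
        return NULL_AS_DASH.join(parts);
    }

    // A plain Joiner throws NullPointerException when it reaches a null
    static boolean plainJoinerRejectsNull() {
        try {
            Joiner.on(", ").join("Kurt", null, "Chris");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(skipping("Kurt", "Kevin", null, "Chris"));    // Kurt, Kevin, Chris
        System.out.println(dashForNull("Kurt", "Kevin", null, "Chris")); // Kurt, Kevin, -, Chris
    }
}
```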

CharMatcher

What's a matching character?
- WHITESPACE, ASCII, ANY (many pre-defined sets; newer releases expose these as methods such as whitespace(), ascii(), any())
- .is('x'), .isNot('_'), .anyOf("aeiou"), .inRange('a', 'z')
- or subclass CharMatcher and implement matches(char)
What to do with those matching characters?
- matchesAllOf, matchesAnyOf, matchesNoneOf
- indexIn, lastIndexIn, countIn
- removeFrom, retainFrom
- trimFrom, trimLeadingFrom, trimTrailingFrom
- collapseFrom, trimAndCollapseFrom, replaceFrom
Example (scrub a user ID)
- CharMatcher.DIGIT.or(CharMatcher.is('-')).retainFrom(userInput);
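A runnable sketch of the user-ID scrubber and two other operations from the list, assuming Guava 19 or later (for the whitespace() method). Newer releases deprecate the upper-case constants such as DIGIT, so this version swaps in inRange('0', '9'), which also limits matches to ASCII digits.

```java
import com.google.common.base.CharMatcher;

class CharMatcherDemo {
    // ASCII digits or '-'; inRange('0', '9') stands in for the notes' CharMatcher.DIGIT
    private static final CharMatcher USER_ID_CHARS =
        CharMatcher.inRange('0', '9').or(CharMatcher.is('-'));

    static String scrubUserId(String userInput) {
        return USER_ID_CHARS.retainFrom(userInput);  // keeps only matching chars
    }

    static String normalizeSpaces(String text) {
        // trims both ends, then replaces each inner run of whitespace with a single space
        return CharMatcher.whitespace().trimAndCollapseFrom(text, ' ');
    }

    static int vowelCount(String text) {
        return CharMatcher.anyOf("aeiou").countIn(text);
    }

    public static void main(String[] args) {
        System.out.println(scrubUserId(" 12a-34 b"));               // 12-34
        System.out.println(normalizeSpaces("  hello \t  world "));  // hello world
        System.out.println(vowelCount("guava"));                    // 3
    }
}
```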

`Optional<T>`

immutable wrapper that is either:
- present - contains non-null reference
- absent - contains nothing
- it never contains 'null'
possible uses:
- return type (vs null)
  - a T that must be present
  - a T that might be absent
- distinguish between
  - unknown (not present in a map)
  - known to have no value (present in the map with value Optional.absent())
- wrap nullable references for storage in a collection that does not support null
creating an Optional<T>
- Optional.of(notNull);
- Optional.absent();
- Optional.fromNullable(maybeNull);
Unwrapping an Optional<T>
- mediaType.charset().get();
- mediaType.charset().or(Charsets.UTF_8);
- mediaType.charset().or(costlySupplier);
- mediaType.charset().orNull();
Other useful methods
- mediaType.charset().asSet(); // 0 or 1
- mediaType.charset().transform(stringFunc);
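A sketch of the "unknown vs known to have no value" distinction and the unwrapping methods, assuming a recent Guava release on Java 8 (the method reference relies on that). The favourite-colour data is hypothetical. Java 8 also added the similar java.util.Optional.

```java
import com.google.common.base.Optional;
import java.util.HashMap;
import java.util.Map;

class FavoriteColors {
    // Hypothetical data: a user missing from the map is "unknown",
    // a user mapped to absent() is "known to have no favorite"
    private static final Map<String, Optional<String>> FAVORITES = new HashMap<>();
    static {
        FAVORITES.put("alice", Optional.of("green"));
        FAVORITES.put("bob", Optional.<String>absent());
    }

    static String describe(String user) {
        Optional<String> favorite = FAVORITES.get(user);
        if (favorite == null) {
            return "unknown";
        }
        return favorite.or("none");  // the value if present, otherwise the default
    }

    // fromNullable() wraps a possibly-null reference; transform() only runs when present
    static int lengthOrZero(String maybeNull) {
        return Optional.fromNullable(maybeNull).transform(String::length).or(0);
    }

    public static void main(String[] args) {
        System.out.println(describe("alice"));  // green
        System.out.println(describe("bob"));    // none
        System.out.println(describe("carol"));  // unknown
    }
}
```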

Functional Programming

Function<F, T>
- one-way transformation of F into T
- T apply(F input)
- most common use: transforming collections (view)
Predicate<F>
- determines true or false for a given F
- boolean apply(F input)
- most common use: filtering collections (view)

com.google.common.collect

FP example

Predicate<Client> activeClients = new Predicate<Client>() {
  public boolean apply(Client client) {
    return client.activeInLastMonth();
  }
};

Returns an immutable list of the first 10 active clients in the database, as strings (their names, assuming Client.toString() returns the name).

FluentIterable.from(database.getClientList())
  .filter(activeClients)  // Predicate
  .transform(Functions.toStringFunction())  // Function
  .limit(10)
  .toList();  // named toImmutableList() in early FluentIterable releases

FluentIterable API

Chaining (returns FluentIterable)
- skip
- limit
- cycle
- filter, transform
Querying (returns boolean)
- allMatch, anyMatch
- contains, isEmpty
Converting
- toList, toSet, toSortedSet (originally toImmutable{List, Set, SortedSet}), all returning immutable collections
- toArray
Extracting
- first, last, firstMatch (returns Optional)
- get (returns E)
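A runnable FluentIterable chain in the spirit of the client example, assuming a recent Guava release on Java 8 (lambdas target Guava's Predicate and Function). Plain strings stand in for the hypothetical Client objects.

```java
import com.google.common.base.CharMatcher;
import com.google.common.base.Optional;
import com.google.common.collect.FluentIterable;
import com.google.common.collect.ImmutableList;
import java.util.List;

class FluentDemo {
    // Lengths of the first two all-caps words, as an ImmutableList
    static ImmutableList<Integer> firstTwoAllCapsLengths(List<String> words) {
        return FluentIterable.from(words)
            .filter(w -> CharMatcher.javaUpperCase().matchesAllOf(w))  // Predicate
            .transform(String::length)                                  // Function
            .limit(2)
            .toList();
    }

    // Querying: returns a boolean instead of a new FluentIterable
    static boolean anyEmpty(List<String> words) {
        return FluentIterable.from(words).anyMatch(String::isEmpty);
    }

    // Extracting: firstMatch() returns Guava's Optional rather than null
    static Optional<String> firstLong(List<String> words) {
        return FluentIterable.from(words).firstMatch(w -> w.length() > 4);
    }
}
```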

FP

functional style

Function<String, Integer> lengthFunction = 
  new Function<String, Integer>() {
    public Integer apply(String string) {
      return string.length();
    }
  };

Predicate<String> allCaps = new Predicate<String>() {
  public boolean apply(String string) {
    return CharMatcher.JAVA_UPPER_CASE
      .matchesAllOf(string);
  }
};

Multiset<Integer> lengths = HashMultiset.create(
  Iterables.transform(
    Iterables.filter(strings, allCaps),
    lengthFunction));

// ugly!!

without fp

Multiset<Integer> lengths = HashMultiset.create();
for (String string: strings) {
  if (CharMatcher.JAVA_UPPER_CASE.matchesAllOf(string)) {
    lengths.add(string.length());
  }
}

`Multiset<E>`

== a bag
add multiple instances of a given element
counts how many occurrences exist
similar to a Map<E, Integer>, but:
- only positive counts
- size() returns total # of items, not # keys
- count() for non-existent key is 0
- iterator() visits every occurrence, so an element with count 3 is returned 3 times
  - elementSet().iterator() unique elements
similar to AtomicLongMap<E> which is like a Map<E, AtomicLong>

`Multimap<K, V>`

Like a Map but may have duplicate keys
The values related to a single key can be viewed as a collection (set or list)
similar to a Map<K, Collection<V>> but
- get() never returns null (returns an empty collection)
- containsKey() is true only if 1 or more values exist
- entries() returns all entries for all keys
- size() returns total number of entries, not keys
- asMap() to view it as a Map<K, Collection<V>>
typically you want the variable type to be ListMultimap or SetMultimap (not plain Multimap), so that get() returns a List or a Set with the matching semantics

`BiMap<K1, K2>`

bi-directional map
both keys and values are unique
can view the inverse map with inverse()
use instead of maintaining two separate maps
- Map<K1, K2>
- Map<K2, K1>

`Table<R, C, V>`

A "two-tier" map, or a map with two keys (called the "row key" and "column key")

can be sparse or dense
- HashBasedTable: uses hash maps (sparse)
- TreeBasedTable: uses tree maps (sparse)
- ArrayTable: uses V[][] (dense)
many views on the underlying data are possible
- row or column map (of maps)
- row or column key set
- set of all cells (as Table.Cell<R, C, V> entries)
use instead of Map<R, Map<C, V>>
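A minimal Table sketch, assuming Guava on the classpath. The row keys, column keys and values are made-up placeholders; it shows lookup by both keys, the row and column map views, and the cell set.

```java
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;

class DistanceTable {
    // Made-up data: row key = from, column key = to, value = distance
    static Table<String, String, Integer> sample() {
        Table<String, String, Integer> distances = HashBasedTable.create();
        distances.put("A", "B", 10);
        distances.put("A", "C", 25);
        distances.put("B", "C", 15);
        return distances;
    }

    public static void main(String[] args) {
        Table<String, String, Integer> distances = sample();
        System.out.println(distances.get("A", "C"));  // 25
        System.out.println(distances.row("A"));       // column key -> value for row "A"
        System.out.println(distances.column("C"));    // row key -> value for column "C"
        for (Table.Cell<String, String, Integer> cell : distances.cellSet()) {
            System.out.println(cell.getRowKey() + " -> " + cell.getColumnKey() + " = " + cell.getValue());
        }
    }
}
```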

Immutable Collections

offered for all collection types
inherently thread-safe
reduced memory footprint
similar to Collections.unmodifiableXXX but
- performs a copy (not a view / wrapper)
- more efficient compared to unmodifiable collections
- type conveys immutability
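A small sketch of the "copy, not a view" difference, assuming Guava on the classpath: after the source list grows, the unmodifiable wrapper reflects the change while the immutable copy does not.

```java
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyVersusView {
    // Returns {unmodifiable view size, immutable copy size} after the source list grows
    static int[] sizesAfterSourceChanges() {
        List<String> source = new ArrayList<>();
        source.add("a");
        List<String> view = Collections.unmodifiableList(source);   // wrapper around source
        ImmutableList<String> copy = ImmutableList.copyOf(source);  // independent snapshot
        source.add("b");
        return new int[] {view.size(), copy.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterSourceChanges();
        System.out.println(sizes[0] + " vs " + sizes[1]);  // 2 vs 1
    }
}
```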

Comparators

Ugly...

Comparator<String> byReverseOffsetThenName = 
  new Comparator<String>() {
    public int compare(String tzId1, String tzId2) {
      int offset1 = getOffsetForTzId(tzId1);
      int offset2 = getOffsetForTzId(tzId2);
      int result = offset2 - offset1;  // careful! subtraction can overflow for large-magnitude ints
      return (result == 0)
        ? tzId1.compareTo(tzId2)
        : result;
    }
  };

ComparisonChain example

One way to rewrite this:

Comparator<String> byReverseOffsetThenName =
  new Comparator<String>() {
    public int compare(String tzId1, String tzId2) {
      return ComparisonChain.start()
        .compare(getOffset(tzId2), getOffset(tzId1))
        .compare(tzId1, tzId2)
        .result();
    }
  };

Short-circuits, never allocates, is fast. Also has

compare(T, T, Comparator<T>)
compareFalseFirst
compareTrueFirst
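A runnable version of the ComparisonChain rewrite, assuming Guava on the classpath. The notes never define getOffset(), so this sketch assumes it means a zone's raw UTC offset and implements it with java.util.TimeZone.getRawOffset() (daylight saving ignored). Since the chain compares the two ints rather than subtracting them, the overflow concern of the hand-written version does not arise.

```java
import com.google.common.collect.ComparisonChain;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TimeZone;

class TzOrdering {
    // Assumed meaning of getOffset(): the zone's raw UTC offset in milliseconds
    static int getOffset(String tzId) {
        return TimeZone.getTimeZone(tzId).getRawOffset();
    }

    static final Comparator<String> BY_REVERSE_OFFSET_THEN_NAME = new Comparator<String>() {
        @Override
        public int compare(String tzId1, String tzId2) {
            return ComparisonChain.start()
                .compare(getOffset(tzId2), getOffset(tzId1))  // larger offset first
                .compare(tzId1, tzId2)                        // ties broken by zone ID
                .result();
        }
    };

    static List<String> sorted(String... tzIds) {
        List<String> ids = new ArrayList<>(Arrays.asList(tzIds));
        ids.sort(BY_REVERSE_OFFSET_THEN_NAME);
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(sorted("UTC", "America/New_York", "Asia/Tokyo", "Asia/Seoul"));
    }
}
```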

Ordering example

Comparator<String> byReverseOffsetThenName =
  Ordering.natural()
    .reverse()
    .onResultOf(tzToOffset())
    .compound(Ordering.natural());

private Function<String, Integer> tzToOffset() {
  return new Function<String, Integer>() {
    public Integer apply(String tzId) {
      return getOffset(tzId);
    }
  };
}

Ordering

Step 1

Implements Comparator and adds delicious goodies! (Could have been called FluentComparator like FluentIterable)

Common ways to get an Ordering to start with:

Ordering.natural()
new Ordering() { ... }
Ordering.from(existingComparator);
Ordering.explicit("alpha", "beta", "gamma");

Step 2

Then you can use the chaining methods to get an altered version of that Ordering

reverse()
compound(Comparator)
onResultOf(Function)
nullsFirst()
nullsLast()
lexicographical()
- yields an Ordering<Iterable>

Now you've got your Comparator but also Ordering has some handy operations

immutableSortedCopy(Iterable)
isOrdered(Iterable)
isStrictlyOrdered(Iterable)
min(Iterable)
max(Iterable)
leastOf(Iterable, int)
greatestOf(Iterable, int)

Some are even optimized for the specific kind of comparator you have.
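A sketch covering both steps, assuming Guava on the classpath: start from natural() or explicit(), chain reverse(), onResultOf() and compound(), then use the handy operations. The word lists and values are made up for illustration.

```java
import com.google.common.base.Function;
import com.google.common.collect.Ordering;
import java.util.List;

class OrderingDemo {
    private static final Function<String, Integer> LENGTH = new Function<String, Integer>() {
        @Override
        public Integer apply(String s) {
            return s.length();
        }
    };

    // Longest first; equal lengths fall back to alphabetical order
    static final Ordering<String> BY_LENGTH_DESC_THEN_ALPHA =
        Ordering.<Integer>natural().reverse().onResultOf(LENGTH).compound(Ordering.<String>natural());

    // Order given by an explicit list; comparing any other value throws
    static final Ordering<String> GREEK = Ordering.explicit("alpha", "beta", "gamma");

    static List<Integer> topTwo(List<Integer> values) {
        return Ordering.<Integer>natural().greatestOf(values, 2);  // greatest first
    }

    static boolean isSorted(List<Integer> values) {
        return Ordering.<Integer>natural().isOrdered(values);  // equal neighbours allowed
    }
}
```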

which is better?

Ordering or ComparisonChain? -> it depends

com.google.common.hash

Why a new hashing API?

Think of Object.hashCode() as "good enough for in-memory hash maps" but:

Strictly limited to 32 bits
Worse, composed hash codes are "collared" down to 32 bits during the computation
No separation between "which data to hash" and "which algorithm to hash it with"
Many implementations have very poor bit dispersion

These make it not very useful for a multitude of hashing applications: a document "fingerprint", cryptographic hashing, cuckoo hashing, Bloom filters...

JDK solution

To address this, the JDK introduced two separate APIs:

java.security.MessageDigest
java.util.zip.Checksum

Each named after a specific use case for hashing.

Worse than the split, neither is remotely easy to use when you're not hashing raw byte arrays.

Guava Hashing example

HashCode hash =
  Hashing.murmur3_128().newHasher()
    .putInt(person.getAge())
    .putLong(person.getId())
    .putString(person.getFirstName(), StandardCharsets.UTF_8)
    .putString(person.getLastName(), StandardCharsets.UTF_8)
    .putBytes(person.getSomeBytes())
    .putObject(person.getPet(), petFunnel)
    .hash();

HashCode has asLong(), asBytes(), toString()...
or put it into a Set, return it from an API etc
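A self-contained variant of the hashing example, assuming a recent Guava release. Pet and its funnel are hypothetical stand-ins for person.getPet() and petFunnel, and the sample field values are made up. Switching from murmur3_128() to sha256() only changes the HashFunction argument.

```java
import com.google.common.hash.Funnel;
import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import com.google.common.hash.PrimitiveSink;
import java.nio.charset.StandardCharsets;

class PersonHashing {
    // Hypothetical value type standing in for the notes' pet object
    static final class Pet {
        final String name;
        final int age;

        Pet(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // A Funnel tells a Hasher how to decompose an object into primitive values
    static final Funnel<Pet> PET_FUNNEL = new Funnel<Pet>() {
        @Override
        public void funnel(Pet pet, PrimitiveSink into) {
            into.putString(pet.name, StandardCharsets.UTF_8).putInt(pet.age);
        }
    };

    static HashCode hash(HashFunction function, int age, long id, String first, String last, Pet pet) {
        return function.newHasher()
            .putInt(age)
            .putLong(id)
            .putString(first, StandardCharsets.UTF_8)
            .putString(last, StandardCharsets.UTF_8)
            .putObject(pet, PET_FUNNEL)
            .hash();
    }

    // Made-up sample person hashed with the given algorithm
    static HashCode sample(HashFunction function) {
        return hash(function, 42, 1001L, "Ada", "Lovelace", new Pet("Rex", 3));
    }

    public static void main(String[] args) {
        System.out.println(sample(Hashing.murmur3_128()));  // 32 hex digits
        System.out.println(sample(Hashing.sha256()));       // 64 hex digits
    }
}
```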

Hashing overview

The com.google.common.hash API offers:

unified user-friendly API for all hash functions
seedable 32- and 128-bit implementations of murmur3 (murmur3_32(seed), murmur3_128(seed))
md5(), sha1(), sha256(), sha512() adapters
- change only one line of code to switch between these and murmur etc
goodFastHash(int minimumBits) for when you don't care what algorithm you use
general utilities for HashCode instances, like combineOrdered/combineUnordered

BloomFilter

A probabilistic set

public boolean mightContain(T);
- true == "probably there"
- false == "definitely not there"

why? consider a spell checker:

if syzygy gets red-underlined, that's annoying
but if yarrzcw doesn't get red-underlined ... oh well!
and the memory savings can be very large

Primary use: short-circuiting an expensive boolean query
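A sketch of the spell-checker idea, assuming Guava on the classpath. The three-word dictionary, the expected-insertions figure and the 1% false-positive target are illustrative assumptions. Only "definitely not there" answers trigger an underline, so a false positive merely lets a misspelling slip through.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class SpellCheckFilter {
    // Hypothetical tiny dictionary; a real spell checker would load a full word list
    static final List<String> DICTIONARY = Arrays.asList("syzygy", "guava", "collection");

    static BloomFilter<CharSequence> build() {
        BloomFilter<CharSequence> filter = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8),
            1000,   // expected number of insertions
            0.01);  // desired false-positive probability
        for (String word : DICTIONARY) {
            filter.put(word);
        }
        return filter;
    }

    // Underline only when the filter says "definitely not there"
    static boolean shouldUnderline(BloomFilter<CharSequence> filter, String word) {
        return !filter.mightContain(word);
    }

    public static void main(String[] args) {
        BloomFilter<CharSequence> filter = build();
        System.out.println(shouldUnderline(filter, "syzygy"));   // false: no false negatives
        System.out.println(shouldUnderline(filter, "yarrzcw"));  // almost certainly true
    }
}
```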

com.google.common.cache

[to read] - read a first pass, lots of good stuff