Functional Programming with UncleJim - GlenKPeterson/Paguro GitHub Wiki
Note: UncleJim is now called Paguro.
What do you think about when you write code? Are you thinking, "First I'm going to make this whiz, then make it bang!" Or are you thinking, "Well, to make a whiz, I first need a Whiz class, which then needs a private constructor, a public static factory method, and then some fields to hold the whiz state, followed by some accessor methods... Wait, why was I defining Whiz?"
Almost every project I work on involves some degree of experimentation or prototyping. Both the data structures and the functions change during this process. Often I show some prototypes to the business and together we change our goals for the project. Rigid data definitions are not helpful at this point.
Many people conflate objects with mutability. That happens sometimes, but I like to avoid it whenever practical.
I love Types because they
- are compiler-verified documentation of function parameter types
- give good names to data structures
- make refactoring practical with large code bases (and easier with small ones)
- check parameter and return types more easily/effectively than writing tests.
Dynamic languages let you jump right to functions. Some are better at letting you name data structures than others. Typically in Java, you have to make your classes before writing any functions. But what if you could define your functions first, make your project whiz and bang, then easily add compiler-verified descriptive names for your data types later?
Lessons learned from Clojure (vs. Java and Scala):
- Immutability prevents as many errors as type safety does. It can also simplify your code, making it more readable.
- Writing functions before formalizing (defining/naming) data structures is a huge win.
- Functional Programming focuses you on choosing the right collections and avoids the distractions of looping.
- Pure functions (referentially transparent / lacking side-effects) are much easier to test (and reason about).
This deserves its own section: Looping vs. Functional Transformations
Instead of thinking, “Oh, I need to process that stuff, I'll add an inner loop!” I want to be thinking “What data structure would have the best performance characteristics for that transformation?” I can only think about a limited number of things at once. Each unnecessary concern takes precious thought power away from a more important one. Choosing data structures for their performance characteristics is my #2 priority, right below developing a correct solution.
Paguro enables an easier, safer style of Java programming by leveraging the immutability of Clojure inside the type-safe world of Java. It takes advantages of Java's type inferencing by avoiding void return types, arrays, primitives, and checked exceptions in lambdas. It can decrease the amount of code you need to write by a factor of at 2x-3x while focusing you on using the right collections for the fastest possible code.
Paguro is a Java 8 API, not a separate language like Clojure or Scala. The jar file is only about 200K and it compiles and runs under profile compact1. It is available under a combined Apache 2.0 and Eclipse license due to the persistent collections being derivative works of the Clojure source code which is licensed under Eclipse.
By statically importing Paguro's StaticImports.*, you gain access to a number of simple methods for defining immutable data. For instance, vec()
creates a vector (which extends List).
tup()
creates a type-safe heterogeneous map. OK, that's a mouthful. You can use tup()
instead of defining classes. It actually returns a Tuple class of the appropriate number of generic arguments (currently up to 12). The Java compiler (and most IDEs) can infer the types of the arguments to tup()
. Extending Paguro's tuples is a safe and easy way to make immutable classes of your own.
Less common collection builders are set()
, map()
, sortedSet()
, sortedMap()
, etc. They pretty much do what you'd expect them to.
You're probably thinking, "Why should I care when Java has built in Collections.unmodifiableList(Arrays.asList())
and Collections.singletonList()?
" Hey, if you want to keyboard all of that in, then read it all back later, I'm not going to argue with you. But consider:
-
Paguro uses immutable collections, so that when you have a vector of 1000 integers and you call
myVec.append(1001)
, it reuses most if the internal structure of the collection, only ever copying a maximum of 32 items to the new collection. Adding an item to an unmodifiable list requires copying the entire list. -
Paguro also has unmodifiable collection wrappers, but unlike
Collections.unmodifiable___()
they deprecate the mutator methods so your IDE and compiler will warn you if you call them by accident. Jim's Immutable interfaces extend the Unmodifiable ones so they share this trait. -
Why would you ever want to think about whether you're using
Collections.singletonList()
orArrays.asList()
? Why remember to pass that toCollections.unmodifiableList()
? Just callvec()
instead.
Transformations are like Java 8 Streams, but instead of 43 different functional interfaces, Jim has only 3 generic ones. Function0
takes zero arguments, Function1
takes one, and Function2
takes 2. Easy, right? Instead of IntConsumer, use the type system to declare Function1<Integer,?>
.
Those 43 java.util.function
interfaces have 11 different names for apply()
. Paguro's Functions have just 2: applyEx()
throws an exception, while apply()
wraps that exception. You still use the same lambda syntax with Paguro that you would with Java 8 Streams, but without worrying about checked exceptions.
Clojure is perhaps one of the easiest languages to learn functional programming in, but the sheer number of higher order functions is overwhelming. I kept asking people, “Which are the critical ones, and which are just convenience wrappers?” For this reason, Paguro has a limited selection of higher-order functions on its Transformable interface which work on all collections and xform()'s. They have been chosen as the most fundamental/useful, yet brief and easy to understand subset of all the possible functional transformations:
// Return only the first numItems.
Transformable<T> take(long numItems);
// Return items from the beginning until the given predicate returns false.
Transformable<T> takeWhile(Function1<? super T,Boolean> predicate);
// Ignore the first numItems and return only those that come after.
Transformable<T> drop(long numItems);
// Add items to the end of this Transformable
Transformable<T> concat(Iterable<? extends T> list);
// Add items to the beginning of this Transformable
Transformable<T> precat(Iterable<? extends T> list);
// Return only the items for which the given predicate returns true.
Transformable<T> filter(Function1<? super T,Boolean> predicate);
// Transform each item into exactly one new item using the given function.
<U> Transformable<U> map(Function1<? super T,? extends U> func);
// Transform each item into zero or more new items using the given function.
<U> Transformable<U> flatMap(Function1<? super T,Iterable<U>> f);
// Eagerly apply the function to each item, accumulating the result in u.
<U> U foldLeft(U u, Function2<U,? super T,U> fun);
// Same as foldLeft, but lets you break on an output condition
<U> U foldLeft(U u, Function2<U,? super T,U> fun,
Function1<? super U,Boolean> terminateWhen);
These transformations (except for foldLeft) are lazy, they don't produce intermediate collections, and they can be immutably chained together. Better yet, declaring an empty xform()
on a collection is only 2% slower than a native forEach loop (measured with 30 million Integers). But that 2% allows me to focus on picking collections that have the performance characteristics most appropriate for the data transformation I am performing. Using the right collection typically results in a performance gain from O(n) to O(log n) or similar. This isn't about writing slower code. It's about writing faster, safer code on the first pass. See the Transformable usage example.
Tip: If you are passed an existing mutable collection, it's most efficient to wrap it in xform()
(from StaticImports). This treats the collection as a data source for a functional transformation.
The official 1.0 was released in March 2016, but it was used in production at work for several years before that. Release 2.0 in September 2016 added Serialization and deprecated or removed some outdated methods. Codecov.io reports test coverage.
Feedback and contributions are appreciated!