Arguments in favor of the technique - SoftDevGang/RefactorLegacyCodeThroughPureFunctions GitHub Wiki

The Nominal Format for Software Design

This is an argument by analogy with the way data processing systems work (e.g., Microsoft BizTalk). Such systems accept multiple data formats as inputs and produce multiple data formats as outputs. They process the data by first transforming the input into a nominal data format (a specific XML format for Microsoft BizTalk), and then transforming the nominal data format into the output format. This approach is useful because it decouples input formats from output formats: each format needs only one translation to or from the nominal format, rather than one translation per input/output pair.
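As a rough illustration of the analogy (this is not BizTalk's API), the sketch below assumes a hypothetical nominal format, a plain list of dictionaries with string values, and hypothetical reader and writer functions for CSV and JSON. All names are made up for the example.

```python
import csv
import io
import json

# Hypothetical nominal format: a list of dicts with string keys and string values.

def read_csv(text):
    """Input format -> nominal format."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def read_json(text):
    """Input format -> nominal format (values are normalized to strings)."""
    return [{k: str(v) for k, v in record.items()} for record in json.loads(text)]

def write_json(records):
    """Nominal format -> output format."""
    return json.dumps(records, sort_keys=True)

def write_csv(records):
    """Nominal format -> output format (columns sorted by name)."""
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=sorted(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

READERS = {"csv": read_csv, "json": read_json}
WRITERS = {"csv": write_csv, "json": write_json}

def convert(text, source, target):
    """Any reader can be paired with any writer through the nominal format."""
    return WRITERS[target](READERS[source](text))
```

Adding a new format in this sketch means writing one reader and one writer; none of the existing converters need to change.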

Similarly, we can consider a way of refactoring legacy code that uses the same steps: refactor to a nominal format, and then refactor from the nominal format to the desired design. This leads to the question: what would be a nominal format for software design?

The claim made by this technique is that the nominal format for software design is pure functions + I/O functions. In other words, the claim is that:

Any software program can be written as a combination of pure functions and I/O functions

Note (Alex): I am convinced that a formal proof of this claim can be made, but for now I'm working based on empirical evidence.
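To make the claim concrete, here is a minimal sketch of a tiny word-counting program split into pure functions and I/O functions. The program and all function names are hypothetical, chosen only to illustrate the shape; this is one example, not a proof.

```python
import sys

def count_words(text):
    """Pure: the result depends only on the argument."""
    return len(text.split())

def format_report(name, count):
    """Pure: builds the output string without printing it."""
    return f"{name}: {count} words"

def read_text(stream):
    """I/O: reads from the outside world."""
    return stream.read()

def write_line(stream, line):
    """I/O: writes to the outside world."""
    stream.write(line + "\n")

def main(name, in_stream=sys.stdin, out_stream=sys.stdout):
    """I/O functions at the edges, pure functions in the middle."""
    text = read_text(in_stream)
    write_line(out_stream, format_report(name, count_words(text)))
```

Only `read_text`, `write_line`, and `main` touch the outside world; everything else can be called with plain values.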

The claim above makes no value judgment regarding the design. In other words, the nominal format for design is not deemed to be better or worse than other code structures (e.g., object-oriented or functional). This design format serves simply as an intermediate step in a larger refactoring.

However, the nominal format for software design is useful because it has a few advantages:

  • Pure functions have no hidden dependencies: everything they depend on is made explicit as input parameters or lambda bindings.
  • Pure functions do not mutate state, making them simpler to reason about than functions with side effects.
  • Pure functions are free of context, allowing them to move anywhere. The nominal design format therefore offers the maximum number of options for reorganization into a final design.
  • Pure functions are very easy to test because they are fundamentally a programmatic representation of a big table that maps inputs to outputs.

These advantages are similar to those of the nominal data formats used in data transformation: the nominal format is free of context, allowing multiple options for translation to other formats.
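The first and last advantages in the list can be sketched in a few lines. The example below is an assumption made for illustration: a hypothetical `price_with_tax` function that works on integer cents, with its dependency on the tax rate passed in explicitly and its tests written as a table.

```python
from functools import partial

def price_with_tax(tax_percent, net_cents):
    """Pure: the tax rate is an explicit parameter, not a global or a field."""
    return net_cents + net_cents * tax_percent // 100

# A "lambda binding": fix one dependency and get a new pure function.
price_with_vat = partial(price_with_tax, 20)

# Because the function is a mapping from inputs to outputs,
# its tests can be written as a table.
CASES = [
    # (tax_percent, net_cents, expected_cents)
    (0, 1000, 1000),
    (20, 1000, 1200),
    (20, 0, 0),
]

def failing_cases(cases):
    """Return the rows where the function disagrees with the table."""
    return [c for c in cases if price_with_tax(c[0], c[1]) != c[2]]
```

Since `price_with_tax` refers to nothing outside its parameters, it could be moved into any module or class without changes.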

Using tools to "massage" mutation and dependencies

This argument is meant to show the effectiveness of the method compared to other methods.

Experience with refactoring legacy code shows that a lot of time is spent on restructuring dependencies and on taking care of mutations. The technique takes advantage of a code construct, pure functions, that avoids mutation and makes dependencies explicit. Tools like the compiler, the automated refactorings implemented in IDEs, and code checkers allow safe extraction of pure functions from arbitrarily complex code structures.
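A minimal before-and-after sketch of such an extraction follows. The legacy function, the global, and all names are hypothetical, and the extraction is done by hand here; in a statically typed language, the compiler and an IDE's automated "extract function" refactoring would typically do much of the checking.

```python
# Before: hidden dependency on a global and in-place mutation of the argument.
DISCOUNT_PERCENT = 10

def apply_discount_legacy(order):
    order["total"] = order["total"] - order["total"] * DISCOUNT_PERCENT // 100
    print("Discounted total:", order["total"])

# After: the calculation is extracted into a pure function. The former global
# becomes a parameter, and a new dict is returned instead of mutating the input.
def discounted(order, discount_percent):
    total = order["total"] - order["total"] * discount_percent // 100
    return {**order, "total": total}

def describe(order):
    """Pure: builds the message; printing is left to the caller."""
    return f"Discounted total: {order['total']}"

def apply_discount(order, discount_percent=DISCOUNT_PERCENT, write=print):
    """Thin I/O shell around the pure functions."""
    new_order = discounted(order, discount_percent)
    write(describe(new_order))
    return new_order
```

After the extraction, the mutation and the dependency on `DISCOUNT_PERCENT` are both visible in the signatures, and `discounted` and `describe` can be moved or tested without any setup.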