Home - SoftDevGang/RefactorLegacyCodeThroughPureFunctions GitHub Wiki

Welcome to the exploration of Refactoring Legacy Code through Pure Functions!

Background

Legacy code is everywhere. When dealing with legacy code, we have the following options:

  • Stop modifying the existing code and add new features in new code
  • Keep modifying the existing code, while taking extra precautions or praying that it works
  • Learn the techniques from "Working Effectively with Legacy Code" by Michael Feathers and apply them

Each of these options trades risk against time spent refactoring. We can spend no time refactoring and accept high risk, or we can spend a lot of time refactoring and reduce the risk.

Unfortunately, the option that offers the biggest reduction in risk (M. Feathers' work) is also the one that takes the longest to master and to apply. Few projects therefore choose to apply it.

Alex Bolboaca was unhappy with this state of things and started looking for techniques that can be taught and applied faster. After working with many legacy codebases while deepening his knowledge of functional programming, he realized that there is a way to refactor legacy code that takes advantage of the simplest type of function: pure functions.

Inspiration

No idea in software engineering is born in a void. This idea is inspired by prior work and experiences:

  • The book "Working Effectively with Legacy Code" by Michael Feathers
  • The legacy coderetreat session on pure functions that Alex first experimented with at an event facilitated by Erik Talboom in Belgium
  • Data transformation models, including the Microsoft BizTalk architectural model
  • Applying immutability in Groovy on Grails on the Eventrix project
  • Writing the book on functional programming in C++, which allowed a deeper understanding of functional programming in practice, even in a complex language
  • The experience with C++ free functions, constness enforcement, and strong immutability, applied in various C++ projects over a 20-year time span

The Method

The fundamental method is the following:

  • Find the smallest scope you need to change
  • Refactor that scope to pure functions + I/O functions using specific pattern-based mechanical refactoring steps
  • Write data-driven or property-based tests on the pure functions
  • Refactor the pure functions towards an end design
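
The steps above can be sketched on a small example. The code below is only an illustration under stated assumptions, not the authors' reference implementation: the invoice scenario and every name in it (`print_invoice_total_legacy`, `read_lines`, `compute_total`, `format_total`, `write_line`) are hypothetical. It shows a legacy-style function that mixes input, computation, and output, the same scope split into pure functions plus I/O functions, and a data-driven test on the pure part.

```python
# Hypothetical legacy-style code: input, computation, and output are tangled.
def print_invoice_total_legacy(path):
    with open(path) as f:
        total = 0.0
        for line in f:
            _name, price, qty = line.strip().split(",")
            total += float(price) * int(qty)
    print(f"Total: {total:.2f}")


# After the refactoring: one input function, two pure functions,
# one output function. All names are illustrative assumptions.
def read_lines(path):
    """Input function: reads from the file system."""
    with open(path) as f:
        return f.read().splitlines()


def compute_total(lines):
    """Pure function: same lines in, same total out."""
    total = 0.0
    for line in lines:
        _name, price, qty = line.split(",")
        total += float(price) * int(qty)
    return total


def format_total(total):
    """Pure function: builds the text that will be written."""
    return f"Total: {total:.2f}"


def write_line(text):
    """Output function: writes to the console and returns nothing."""
    print(text)


def print_invoice_total(path):
    """The original scope, now a thin composition of the parts."""
    write_line(format_total(compute_total(read_lines(path))))


# Data-driven test on the pure functions: a table of inputs and
# expected outputs, with no file system or console involved.
CASES = [
    ([], "Total: 0.00"),
    (["pen,1.50,2"], "Total: 3.00"),
    (["pen,1.50,2", "book,10.00,1"], "Total: 13.00"),
]
for lines, expected in CASES:
    assert format_total(compute_total(lines)) == expected
```

Once such tests are in place, `compute_total` and `format_total` can be reshaped towards whatever end design makes sense, with the tests as a safety net.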

Domain of Applicability

This method is meant to be applied only to code that you are afraid to change, because it either doesn't have tests or it doesn't have tests you can trust. It is meant to be applied once per piece of code, assuming that after you apply it you will have tests you can trust and therefore you can use normal refactoring practices.

It is also meant to be applied incrementally and fractally. Indeed, one of the strengths of this method is that you can apply it on a block of code, on a method, or on a class. Applying it naively to huge code blocks (e.g. classes of 5000 lines) would only lead to confusion. Instead, if you are dealing with a huge code block, identify a small block of code to start from, extract the pure functions only for that block of code, write the tests for the new pure functions, and move the pure functions to the place where they make sense in your design.

You should see the pure functions as a tool for precision surgery. They are not meant to impose a final design; instead, they are used for a limited time with a very specific goal: surfacing and managing dependencies. Once they accomplish their goal, and their code is covered by tests, they stop imposing limitations upon your refactoring.

Clarify pure functions

In this context, the definition of pure function is the following:

A pure function is a function that returns the same output value whenever it receives the same input values, and that doesn't change anything in its context.

This implies that every pure function returns a value and receives zero or more arguments.

Unlike the strict immutability of functional programming, we also accept as pure those functions that modify their own local variables, since this makes no difference for our goals.

Clarify I/O functions

A function that produces output is very similar to a pure function, except that it writes to the outside world: file system, network, web service, operating system, database, HTTP response etc. These functions usually return nothing.

A function that takes input is very similar to a pure function, except that it reads from the outside world: file system, console, web service, network, operating system, database, HTTP request etc. These functions may take 0 parameters.

During refactoring, we try to make these I/O functions as context-free as possible, so that we can move them to other places in the code.
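
As a rough sketch of what context-free I/O functions might look like (the names `read_text`, `write_text`, `shout`, and `main` are assumptions made for this example), the input and output functions below receive the stream they work on as an argument, so they depend on nothing else around them and can be moved freely.

```python
import io
import sys


def read_text(stream):
    """Input function: reads from the outside world and returns a value."""
    return stream.read()


def write_text(stream, text):
    """Output function: writes to the outside world and returns nothing."""
    stream.write(text)


def shout(text):
    """Pure function sitting between the input and the output."""
    return text.upper()


def main(in_stream=sys.stdin, out_stream=sys.stdout):
    """Thin composition: input, then pure logic, then output."""
    write_text(out_stream, shout(read_text(in_stream)))


# Usage with in-memory streams standing in for the console.
fake_output = io.StringIO()
main(io.StringIO("hello"), fake_output)
assert fake_output.getvalue() == "HELLO"
```

Passing the stream explicitly is one design choice among several; an input function with 0 parameters that reads directly from the console fits the definition above equally well.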

Estimated advantages of the method

The biggest challenge when writing tests or refactoring legacy code is dealing with dependencies. In fact, most techniques from M. Feathers deal with breaking dependencies by introducing seams.

Pure functions are a natural way of removing dependencies with the support of the compiler and IDEs. By applying a few specific steps, the pure function becomes free of context, because it receives all its dependencies as arguments. This allows us to fairly quickly write tests that cover the specific behaviour of the pure function.
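
To illustrate what receiving all dependencies as arguments can look like, here is a small sketch. The `Subscription` class, its `GRACE_DAYS` constant, and the `is_active_on` function are hypothetical names invented for this example. The legacy method depends silently on an instance field, a class constant, and the system clock; the extracted pure function takes all three as parameters.

```python
from datetime import date


class Subscription:
    GRACE_DAYS = 3

    def __init__(self, expires_on):
        self.expires_on = expires_on

    def is_active_legacy(self):
        # Hidden dependencies: an instance field, a class constant,
        # and the system clock. Hard to test on a fixed date.
        return (date.today() - self.expires_on).days <= self.GRACE_DAYS

    def is_active(self):
        # Same behaviour, delegating to the context-free pure function.
        return is_active_on(self.expires_on, date.today(), self.GRACE_DAYS)


def is_active_on(expires_on, today, grace_days):
    """Pure function: every dependency arrives as an argument."""
    return (today - expires_on).days <= grace_days


# Tests need no clock and no Subscription object.
assert is_active_on(date(2024, 1, 10), date(2024, 1, 9), 3)
assert is_active_on(date(2024, 1, 10), date(2024, 1, 13), 3)
assert not is_active_on(date(2024, 1, 10), date(2024, 1, 14), 3)
```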

Compared to M. Feathers' method, we expect that a limited number of refactoring techniques need to be taught in order to refactor code to pure functions. Moreover the techniques are based on code patterns; for example a state change can be replaced with a pure function that receives the initial value and returns the new value. We expect this factor to contribute to the ease of learning and application to production systems.
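
The state-change pattern mentioned above can be sketched as follows. The `Account` class and the `balance_after_fee` function are made-up names used only for illustration: the in-place update of a field is replaced by a pure function that receives the initial value and returns the new value, while the assignment stays behind in the method.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def apply_fee_legacy(self, fee_rate):
        # State change and calculation mixed in one statement.
        self.balance = self.balance - self.balance * fee_rate

    def apply_fee(self, fee_rate):
        # The state change remains here; the calculation moved out.
        self.balance = balance_after_fee(self.balance, fee_rate)


def balance_after_fee(balance, fee_rate):
    """Pure function: initial value in, new value out."""
    return balance - balance * fee_rate


# The calculation can now be tested without creating an Account.
assert balance_after_fee(200, 0.25) == 150.0

account = Account(200)
account.apply_fee(0.25)
assert account.balance == 150.0
```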

We also expect to develop techniques that are specific to programming languages and perhaps IDEs. While the skeleton techniques remain the same, the specific refactoring steps may be different in C++ compared to JavaScript.

Disadvantages of the method

When applying this method, you will end up with a design that does not follow the "tell, don't ask" principle, because the code now passes through pure functions, which implement the "ask" style of design. Additional refactoring may therefore be needed to satisfy this principle. Thank you, Johan Martinsson, for pointing this out.

It is uncertain at this moment whether the first step can be safely performed. More validation of the method is necessary.

Open Questions

The following questions are still open and need to be addressed before making claims about the method's usefulness:

  • Is the technique applicable? What is its domain of applicability?
  • Is the technique teachable?
  • Is the technique more effective than the alternatives? Under which circumstances?

Derived questions are:

  • Is there a list of techniques that covers all possible code patterns so that step 1 can be safely achieved?
  • If not all code patterns are covered by the list of techniques, what are the limits? What are the code patterns that cannot be refactored?
  • Can we teach a random sample of programmers the list of techniques in such a way that they can apply it quickly and effectively?