Clean Code - uyuni-project/uyuni GitHub Wiki

Clean Code

Programs must be written for people to read, and only incidentally for machines to execute. --Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs (1984)

Writing computer programs that are easy to understand, change, expand, and replace is hard. This topic is as old as programming, yet no silver-bullet has been found. Many helpful resources try to advice us on how to design our programs to keep them maintainable. This page serves as a starting point.

Naming

Symbols (variables, functions, methods, classes, modules, etc.) should be clear, descriptive and consistent. These attributes remove cognitive load on the programmer and makes it easier to understand different parts of the program.

SOLID Principles for Object-oriented Design

A collection of principles that helps with structuring object-oriented programs.

Single Responsibility principle

A module should only be responsible to one, and only one, actor

An actor can be a person or a group of persons with the same role. A different way to word this principle is "group things that change for the same reason separately from things that change for other reasons".

Open Closed principle

Software entities should be open for extension, but closed for modification

These two attributes sound like they are conflicting. The solution is to use interfaces that are not changed, so that e.g. class A can depend on interface I instead of class B. When class B is changed/replaced, class A does not need to be changed.

Liskov Substitution principle

Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.

This principle states that sub-classes must not break the API of the parent class.

Interface Segregation principle

Clients should not be forced to depend upon interfaces that they do not use.

This principle states that interfaces should be small, and all implementations fulfill the complete interface. If an interface gets too big, it should be split.

Dependency Inversion principle

A. High-level modules should not import anything from low-level modules. Both should depend on abstractions (e.g., interfaces). B. Abstractions should not depend on details. Details (concrete implementations) should depend on abstractions.

High-level modules directly interacting with low-level modules makes it hard to change the low-level modules. On top of that, it also makes testing the high-level module hard. When both instead depend on an interface, the low-level module can be changed without the high-level module noticing (Open Closed principle). Testing is easier since the high-level module can use a test-dummy in place of the real low-level module.

Incremental Development

Uyuni has currently roughly one million lines of source code between the Java and Python code base in the main repository. It would be herculean task to improve the readability and maintainability of everything at once. Not only would that alone take months, if not years, it might not last either. Programs are modified all the time, the only way to stay on top of the changes is by adapting an incremental-improvements mindset. This mindset is best summarized by the boy scout rule:

Always leave the code you're editing a little better than you found it.

Style Guides and Formatting

Every developer has their own preferences. Working together, there is a simple but important rule:

Consistent style is more important than personal preferences.

Newer programming languages, such as Go, even come with their own formatter to avoid formatting discussions. In Uyuni we use different programming languages and each of them has their own idiosyncrasies. Therefore, code style and tooling is different for each language.

Chunking & Aliasing

One technique to reduce cognitive load on programmers is to reduce the number of items one needs to hold in their head at once. Humans only have a "L1 cache" size of 7 ± 2.

Chunking is the act of splitting parts of your programs in chunks (e.g. refactor into functions). Each chunk can be small enough to only require 7 ± 2 items, making it easier to keep everything in the programmer's head. The second part is aliasing, or giving chunks names. Once understood, aliases don't take up any space in the "human L1 cache".

Raymond Hettinger did a great job explaining and showing this technique at PyBay 2019 (The Mental Game of Python).

Guard clauses

Conditional complexity - usually introduced in the form of if and switch statements - is a factor that is constantly working against simplicity by making a function increasingly harder to read and taxing on a programmer's "L1 cache" as it grows.

It is not uncommon to see functions following this workflow:

  1. do some checks to ensure everything is valid for the function to run
  2. main function code
  3. else statements dealing with the error cases

Although this does not imply by itself that the code is badly written or hard to maintain, it can increase the likeliness of incurring in problems such as:

  • Excessive indentation: excessive nesting of control structures makes code reading difficult. Most coding style guides recommend limiting indentation to 2 - 4 levels of nesting.
  • Relationship between if-else: when there is a large number of separate code fragments between if-else, which are conceptually related to each other, it is necessary to perform the code reading by jumping between the different parts.
  • Increased mental effort: as consequence of the different jumps in the source code, the generation of new code requires an extra effort.

Take this Java function as an example:

public void Subscribe(User user, Subscription subscription, Term term) {
    if (user != null) {
        if (subscription != null) {
            if (term != null) {
                if (term != Term.Annually) {
                    // subscribe annually
                } else if (term == Term.Monthly) {
                    // subscribe monthly
                } else {
                    // subscribe to trial
                }
            } else {
                throw new ArgumentNullException(nameof(term));
            }
        } else {
            throw new ArgumentNullException(nameof(subscription));
        }
    } else {
        throw new ArgumentNullException(nameof(user));
    }
}

The 'fail fast' approach states that:

The sooner the user, or another system, is notified that there is a problem with processing inputs to a system, the better.

A guard clause is a check that immediately exits the function, either with a return statement or an exception. and is used to perform validation of inputs while also minimizing complexity in the function.

Inverting the workflow to use guard clauses at the start of the function tends to make the code shorter, simpler and less deeply indented.

public void Subscribe(User user, Subscription subscription, Term term) {
    if (user == null)
        throw new ArgumentNullException(nameof(user));
    if (subscription == null)
        throw new ArgumentNullException(nameof(subscription));
    if (term == null)
        throw new ArgumentNullException(nameof(term));
  
    if (term == Term.Annually) {
        // subscribe annually
    } else if (term == Term.Monthly) {
        // subscribe monthly
    } else {
       // subscribe to trial
    }
}

Further Reading

  1. Programming as Theory Building - Peter Naur PDF, markdown
  2. Don't write clean code, write CRISP code - John Arundel blog post