Clean Code - egnomerator/misc GitHub Wiki

Clean Code by Robert C. Martin

Most impactful concepts for me

  • Refactoring as a continual and incremental process
  • The importance of names
    • clear, concise, meaningful, consistent, don't "encode" (e.g. Hungarian), long isn't necessarily bad
  • Design classes with high cohesion and DI
  • Functions as one level of abstraction and the Stepdown Rule
    • e.g.

      To do X, first we do A, then we B, then we C

      • To do A, we ...
      • To do B, we ...
      • To do C, we ...

      To do Y, first we ...

  • Keep it short
    • minimize length of functions, of classes, and of parameter lists
  • Clean boundaries such as 3rd party interfaces and unknown interfaces
    • some applicable design patterns to help accomplish below concepts: Adapter, Facade, Bridge
    • 3rd party: don't let the 3rd party API control you, integrate it with an interface customized to your app
    • unknown: create the interface you wish you had, mock it in tests, then integrate real interface (e.g. use adaptor)
  • Do not BDUF (big design up front)
    • simple design up front, delay decisions as long as possible--you'll theoretically be more informed later
  • the complimentary nature of objects and data structures (<-- copy/paste-search to read the notes)

What is clean code?

Robert C. Martin's Collection of thoughts from:

  • Bjarne Stroustrup: inventor of C++; author of The C++ Programming Language
  • Grady Booch: author of Object Oriented Analysis and Design with Applications
  • "Big" Dave Thomas: founder of OTI, godfather of the Eclipse strategy
  • Michael Feathers: author of Working Effectively with Legacy Code
  • Ron Jeffries: author of Extreme Programming Installed and Extreme Programming Adventures in C#
  • Ward Cunningham: inventor of Wiki; inventor of Fit; coinventor of eXtreme Programming; Motive force behind Design Patterns; Smalltalk and OO thought leader; the godfather of all those who care about code
  • Uncle Bob

clean code is:

  • Does one thing well
  • logic straight forward--bugs hard to hide
  • Error handling is complete
  • performance close to optimal--doesn't tempt adding messy unprincipled optimizations
  • reads like well-written prose
  • full of crisp abstractions and straightforward lines of control
  • can be read and advanced by a developer other than its original author
  • has unit and acceptance tests
  • provides one way rather than many ways to do one thing
  • minimal dependencies which are explicitly defined
  • provides a clear and minimal API
  • should be literate
  • looks like it was written by someone who cares
  • runs all the tests
  • contains no duplication
  • expresses all the design ideas that are in the system
  • minimizes the number of entities such as classes, methods, functions, and the like
  • meaningful names
  • do one thing
  • early use of simple abstractions to
    • enable future change
    • maintain focus on the purpose (guide away from extraneous implementations)
  • expressiveness
  • each routine you read turns out to be pretty much what you expected
  • makes the language look like it was made for the problem
  • easy to read

ALL BELOW ARE THE UN-EDITED COMPLETE NOTES


Forward:

God is in the details

- Ludwig mies van der Rohe

TPM (Total Productive Maintenance)

  • a 1951 Japanese quality approach
  • 5S principles
    1. Seiri
      • think "sort" in English
      • Knowing where things are
      • Suitable naming
    2. Seiton
      • think "systematize" in English
      • A place for everything, and everything in its place

    3. Seiso
      • think "shine" in English
      • get rid of extraneous comments, commented out code, etc.
    4. Seiketsu
      • standardization
      • group must agree on coding style and set of practices
    5. Shutsuke
      • self-discipline
      • having the discipline to follow the practices and frequently reflect on one's work and be willing to change

Continually maintain and refactor code. Ship code that is more readable and more maintainable from the start.

architect Christopher Alexander views every act of design itself as a small, local act of repair.

French poet Paul Valery: a poem is never done and bears continual rework, and to stop working on it is abandonment.

The design lives in the code; design is a process, not a static endpoint

  • The code is the design

  • Simple code

Introduction

The comic illustration about WTFs/minute as the only valid measurement of code quality

Craftsmanship

There are two parts to learning craftsmanship: knowledge and work. You must gain the knowledge of principles, patterns, practices, and heuristics that a craftsman knows, and you must also grind that knowledge into your fingers, eyes, and gut by working hard and practicing.

(Page 27).

Ch. 1 Clean Code

Some have expressed that one day there will be no need for programmers

  • this is nonsense, because there will always be a need to translate requirements into machine language
  • humans can't even master this translation of vague requirements
  • their will always be a need for programmers to perform this translation--and it takes an expert to do this effectively; a machine cannot do this, a human must be the translator

Bad code can bring a company down

  • a company had a killer app that became popular quick
  • eventually the company died bc users stopped using it due to excessive bugs, crashes, etc.
  • an employee confirmed that this was due to creating a messy, un-maintainable code base while rushing to production

Wading through bad code--being impeded, slowed down by bad code

We commonly allow bad code saying that we'll clean it up later

  • but LeBlanc's law says

later means never

Programmer's responsibility to defend the quality of the code against the boss's schedule-based demands

The only way to go fast is to keep the code as clean as possible.

What is clean code?

Collected thoughts from:

  • Bjarne Stroustrup: inventor of C++; author of The C++ Programming Language
  • Grady Booch: author of Object Oriented Analysis and Design with Applications
  • "Big" Dave Thomas: founder of OTI, godfather of the Eclipse strategy
  • Michael Feathers: author of Working Effectively with Legacy Code
  • Ron Jeffries: author of Extreme Programming Installed and Extreme Programming Adventures in C#
  • Ward Cunningham: inventor of Wiki; inventor of Fit; coinventor of eXtreme Programming; Motive force behind Design Patterns; Smalltalk and OO thought leader; the godfather of all those who care about code
  • Uncle Bob

clean code is:

  • Does one thing well
  • logic straight forward--bugs hard to hide
  • Error handling is complete
  • performance close to optimal--doesn't tempt adding messy unprincipled optimizations
  • reads like well-written prose
  • full of crisp abstractions and straightforward lines of control
  • can be read and advanced by a developer other than its original author
  • has unit and acceptance tests
  • provides one way rather than many ways to do one thing
  • minimal dependencies which are explicitly defined
  • provides a clear and minimal API
  • should be literate
  • looks like it was written by someone who cares
  • runs all the tests
  • contains no duplication
  • expresses all the design ideas that are in the system
  • minimizes the number of entities such as classes, methods, functions, and the like
  • meaningful names
  • do one thing
  • early use of simple abstractions to
    • enable future change
    • maintain focus on the purpose (guide away from extraneous implementations)
  • expressiveness
  • each routine you read turns out to be pretty much what you expected
  • makes the language look like it was made for the problem
  • easy to read

The boyscout rule:

Leave the campground cleaner than you found it.

  • always refactor

Ch. 2 Meaningful Names

Use Intention-Revealing Names

  • example, instead of a variable d with a comment, use a meaningful variable name like daysSinceCreation

Avoid disinformation

  • don't use names that already have entrenched meanings which are different from the intended meaning
  • don't use cryptic names that look like something else
    • e.g. l (lower case "L" looks like an Arabic numeral "1")

Make meaningful distinctions

  • don't make a minor name change to satisfy a compiler requirement
    • e.g. 2 different things in the same scope that you think warrant the same name; you just misspell one of them
      • later, a misspelling correction could cause a compiler error
      • instead, these 2 different things should have 2 different names
  • don't use number series naming!!!
    • e.g. a1, a2, a3, ...
  • don't use "noise words"
    • e.g. a Product class that has ProcuctInfo and ProcuctData properties
      • "Info" and "Data" are meaningless noise words

Use pronounceable words

compare

class DtaRcrd102
{
    private Date genymdhms;
    private Date modymdhms;
    private final String pszqint = "102";
}

to

class Customer
{
    private Date generationTimeStamp;
    private Date modificationTimeStamp;
    private final String recordId = "102";
}

Use searchable names

  • single-letter names can ONLY be used as local variables inside short methods
  • The length of a name should correspond to the size of its scope

Avoid encodings

  • don't encode type or scope into names
  • Hungarian Notation was common due to old language requirements and limitations
    • this is no longer necessary and should be avoided
  • don't use member prefixes (e.g. m_description "m_" is unnecessary clutter and a marker of old code)

Interfaces and Implementations

  • scenario: building an abstract factory for the creation of shapes
    • the factory will be an interface and will be implemented by a concrete class
  • it's common to see IShapeFactory
  • Uncle bob prefers encoding implementation rather than enconding abstraction
    • encoding the interface is a distraction at best and TMI at worst
    • prefers the user not know they are working with an interface
    • just want them to know it's a shape factory
    • if I must encode either the interface or the impementation, I choose the implementation.

      • prefer ShapeFactoryImp or even CShapeFactory over IShapeFactory
  • my thoughts here:
    • isn't there an overarching principle to abstract away and/or hide implementation details?
    • and if so, isn't IShapeFactory more aligned with that principle than ShapeFactoryImp or CShapeFactory??

Avoid mental mapping

  • common to see variable names that are neither from the problem domain nor the solution domain
    • e.g. single letter variable name that we must mentally map to the concept
  • "smart" vs professional programmers
    • to professionals, clarity is king

Class names

  • a class name should not be a verb
  • a class name should be a noun or noun phrase
    • e.g. Customer, WikiPage, AddressParser

Method names

  • method names should be verb or verb phrases
    • e.g. postPayment, deletePage, save
  • Accessors, mutators, and predicates
    • should be named for their value and prefixed with "get", "set", and "is", according to the javabean standard
string name = employee.getName();
customer.setName("mike");
if(paycheck.isPosted()) // ...

Overloaded constructors

  • use static factory methods with names that describe the arguments
Complex fulcrumPoint = Complex.FromRealNumber(23.0);

// is generally better than

Complex fulcrumPoint = new Complex(23.0);
  • consider enforcing their use by making the corresponding constructors private

Don't be cute

  • say what you mean. mean what you say
  • don't use "HolyHandGrenade" instead of "deleteItems" for entertainment value

Pick one word per concept

  • be consistent, choose the word for the concept and stick with it
    • "fetch", "retrieve", "get" as equivalent methods for different classes is confusing
  • "controller", "manager", "driver" in same code base is confusing

Don't pun

  • example "add" method
    • using "add" for a method that adds 2 items
    • and then using "add" when adding an item to a list
      • maybe instead use "append" or "insert"

Use solution domain names

  • programmers know technical terms and programming terms like pattern names and algorithm names
  • e.g. AccountVisitor (visitor pattern)

Use problem domain names

  • when solution domain names aren't sufficient, use domain names
  • can at least consult a domain expert
  • the code that has more to do with the problem domain concepts should have names drawn from the problem domain

Add meaningful context

  • names which are meaningful on their own are uncommon
  • commonly it's necessary to add context
  • use well-named namespaces, classes, and functions to provide context
  • in rare cases, prefixes might be necessary

Don't add gratuitous context

  • example: "Gas Stataion Deluxe" application
    • you have a "GSDAccountAddress"
    • later you need another mailing address, do you reuse that specific class?
    • the "GSD" is pointless, and worse, makes it less re-usable
    • another downside in this scenario, is that searching for GSD in the app pulls up more results than you want due to overusing "GSD"
  • shorter names are preferred over longer as long as they are still clear
    • "Address" is a good class name and can have instances named "AccountAddress" and "CustomerAddress"
    • if you need to distinguish between MAC address, port address, and web address, you might consider "PostalAddress", "MAC", and "URI"

Ch. 3 Functions

Small!

  • 1st rule of functions is that they should be small
  • 2nd rule of functions is that they should be smaller than that
  • should hardly ever be even 20 lines long
  • shoot for less than a handful of lines
  • and short lines--not taking up the width of the screen

Blocks and indenting

  • if and loop blocks should be one line long--a function call
  • that function should have a nicely descriptive name--documentation value
  • no nesting (e.g. ifs inside ifs)
  • all this keeps functions short and readable

Do one thing

  • functions should do ONE THING
  • how do we know if a function is doing one thing??
    • we ensure that the body of the function remain at ONE LEVEL OF ABSTRACTION lower
    • the function's main concept (hopefully communicated by its name) is broken into steps
    • those steps should be describable in a "TO" paragraph
    • here is an example function followed by it's "TO" paragraph
// this example function from the book is a refactored version of the original
// - two versions of refactoring later actually

public static string renderPageWithSetupsAndTeardowns(PageData pageData, boolean isSuite)
throws Exception
{
    if (isTestPage(pageData)) includeSetupAndTeardownPages(pageData, isSuite);
    return pageData.getHtml();
}

// TO paragraph:
// TO RenderPageWithSetupsAndTeardowns, we check to see whether the page is a test
// page and if so, we include the setups and teardowns. In either case we render
// the page in HTML.

note: one way to know a function is doing just one thing is if you have a hard time extracting another function out of it who's name isn't merely a restatement of its implementation.

Sections within functions

  • a function that can be divided into sections (e.g. declarations, initializations, etc.) is a sign that it's doing more than one thing
  • functions that do one thing cannot be reasonably divided into sections

One level of abstraction per function

  • to make sure that a function is doing just one thing, make sure that the statements within it are all at the same level of abstraction

The Stepdown Rule: Reading code from top to bottom

  • the code should read like a top-down narrative
    • each function should be followed by those at the next level of abstraction so that we can read the program descending one level of abstraction at a time as we read down the list of functions
    • i.e. we want to be able to read the program as though it were a set of TO paragraphs, each of which is describing the current level of abstraction and referencing subsequent TO paragraphs at the next level down
    • e.g.
      • To print all prime factors under user input n we request user input for n, then we get all prime factors under n, and then we print these prime factors
        • to request user input for n ...
          • to ...
        • to get all prime factors under n ...
          • to ...
        • to print these prime factors ...
          • to ...

Switch statements

  • these are hard to keep small
  • they also are difficult to make do one thing, given that by their nature they do N things
  • we can't always avoid switch statements but we can bury them at a low-level and never repeat them
  • general rule for switch statements
    • they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can't see them.

Use descriptive names

  • don't be afraid to use a long name
    • a long descriptive name is better than a long descriptive comment
  • don't be afraid to spend time choosing a name, even try several different names and read the code with each
  • choosing a descriptive name can help clarify the design
    • it's not uncommon that hunting for a good name results in a favorable restructuring of code

Function arguments

  • the ideal number of arguments for a function is zero (niladic)
  • next is one (monadic), followed closely by 2 (dyadic)
  • 3 (triadic) should be avoided where possible
  • more than 3 (polyadic) requires very special justification--and then shouldn't be used anyway

why is it so important to minimize # of arguments?

  • arguments make it more difficult to understand the function
  • increase testing effort the more the arguments, the argument combinations to test for

Common monadic forms

  • common
    • asking a question about an argument: bool fileExists("MyFile")
    • operating on an argument: InputStream fileOpen("MyFile")
  • less common but still useful
    • event: void passwordAttemptFailedNTimes(int attempts)
      • system interpret is to interpret this function as an event and use the argument to alter state
      • use this form with care, make it clear that this is an event
  • avoid other forms
    • void includeSetupPageInto(StringBuffer pageText)
      • using an output arg instead of return value for transformation is confusing
      • if a function is going to transform its input arg, the transformation should be the return value

Flag arguments

passing a boolean into a function is a truly terrible practice

Dyadic functions

  • Point p = new Point(0,0) is obviously a valid dyadic function
    • a Cartesian point naturally takes 2 arguments
    • in fact the 2 arguments are ordered components of a single value
  • assertEquals(expected, actual)
    • this is obviously fine, but even this has a problem
    • what order to pass in these arguments? you just have to learn it
  • just be aware that dyadic functions come at a cost
  • and be aware of the mechanisms that might be available to convert them to monads

Triads

  • always try to avoid if possible

Argument objects

  • when a function looks like it should take more than 2-3 args, these args could likely be wrapped in a class
Circle makeCircle(double x, double y, double radius);
Circle makeCircle(Point center, double radius);
  • is reducing the # of args by creating objects out of them cheating? NO!
  • when groups of variables are passed together (like the above Point for the circle) they're likely part of a concept that deserves a name of its own

Argument lists

  • some functions take a variable # of arguments
String.Format("%s worked %.2f hours.", name, hours);
  • if the variable arguments are all treated identically as in the above string format, then they are equivalent to a single argument
  • thus String.Format() is dyadic
// another way to think of this function
String.Format(String format, Object... args);
  • all the same rules apply to that dyad
  • same is true for monads, dyads, and triads that take variable arguments
void monad(Integer... args);
void dyad(String name, Integer... args);
void triad(String name, int count, Integer... args);

Verbs and keywords

  • monads should be verb/noun pairs
    • e.g. write(name) or perhaps writeField(name) is even better due to clarifying what name is
  • encoding argument names into function names can be helpful
    • e.g.
      • assertEquals(expected, actual) ... what order to we put our arguments?
      • assertExpectedEqualsActual(expected, actual) ... this fixes the question of ordering

Have no side effects

  • side effects are unexpected changes to function arguments, instance properties, scope variables, globals, etc.
  • these can be very damaging, because the function caller has no idea of or even reason to expect these effects
  • example
    • an example was given of a function called checkPassword which
      • not only checked the password, but also
      • initialized Session if password was correct
    • the side affect is initializing Session--the function name did not tell the caller about Session
    • fix by changing name to checkPasswordAndInitializeSession (... lol, but no, bc SRP)

Output arguments

  • in general, avoid output arguments, if a function must change the state of something, have it change the state of its owning object

Command query separation

  • functions should either do something or answer something, but not both

  • negative example: public boolean set(String attribute, String value);
    • leads to odd code like this: if(set("username", "uncle bob")){...}

Prefer exceptions to returning error codes

  • returning error codes from command functions is a subtle violation of command query separation
  • read this section on page 46 (77 of PDF)
    • the whole section is short and fits on this one page
    • the section is a clear explanation with a wonderful code illustration/clarification
  • interesting note:
    • James Coplien seems to have the complete opposite opinion
      • source: https://henrikwarne.com/2014/09/04/a-response-to-why-most-unit-testing-is-waste/#comment-5502
      • on exceptions
        • "In general, I hold exceptions to be a bad idea"
        • "Exceptions are just hyper galactic GOTOs"
        • "If you’re throwing an exception across more than two activation records, you probably can’t reason about the code"
        • the paragraph he wrote that those quotes are from also mentions C++ though
          • so idk if his opinion written there is limited to exceptions in C++, but i doubt it??
      • on return codes
        • "with good programming discipline they [exceptions] can be replaced by return codes ... and handled locally"

Extract try/catch blocks

  • if you have multiple function calls in a try block, refactor/extract them into a new function, and call just that single function in the try block instead
  • easier to read and modify

Error handling is one thing

  • functions should do just one thing
  • error handling is one thing
  • if a function has a try block in it:
    • the keyword try should be the first word in the function body, and
    • nothing after the try/catch block should exist in that function body

Error codes are dependency magnets - BAD!!!

  • dependency magnets are classes or enums which many other classes must reference
  • example: an error code enum ... add a new error code, all modules that reference this must re-compile
  • better: exceptions - a new exception inherits from the base Exception class and won't have that issue

Don't repeat yourself

  • don't repeat yourself

How to write good functions?

  • start ugly, refactor, refactor, refactor
  • TDD: don't just start ugly and refactor
  • so start ugly but confident due to thorough unit test coverage
  • then in the safety of continually passing unit tests, refactor, refactor, refactor

Ch. 4 Comments

Comments are at best a necessary evil.

Examples of worth while comments

  • authorship and copyright at top of file
  • others were mentioned but didn't really seem worth noting, bc they were still noted as debatably worth using

Examples of bad comments

  • a bunch of these examples, but not worth taking note of
  • he especially liked showing examples similar to this:
// this is where we get the name
public string getName() {}

BOTTOM LINE of this chapter: don't use comments almost ever

Ch. 5 Formatting

a source file's vertical formatting

  • try to keep it short
    • example solutions like fitness and junit keep most files under 100 lines
  • vertical openness between concepts - effective use of blank lines to separate concepts
  • vertical density - grouping lines of a concept together
  • vertical distance
    • closely related concepts should be vertically closer to each other

declare variables as close to their usage as possible

  • local function variables should be declared at the top of the function
  • instance variables should be declared at the top of the class
    • this should not increase the vertical distance of these variables, because in a well-designed class, they are used by many, if not all, of the methods of the class

dependent functions

  • if one function calls another, they should be vertically close
  • caller above callee

conceptual affinity

  • code that is closely related to other code should have little vertical distance
  • caller/callee is just one example
  • another example would be a group of functions that perform variations of a basic task

vertical ordering

  • high-level functions first
  • low-level details after

horizontal formatting

  • according to the distribution of his example solutions
    • 20-60 chars represents 40%
  • "Hollerith" limit of 80 is "a bit arbitrary"
  • 100-120 should be the max
  • uncle bob sets his limit at 120

horizontal openness and density

  • white space can help to associate or disassociate concepts
  • e.g. myFunction(parm1, param2)
    • no space between function name and parenthesis
    • space between params bc they are separate things
  • e.g. return (-b + Math.sqrt(determinant)) / (2*a) and return b*b - 4*a*c
    • no space in the higher precedence multiplied factors
    • space between the lower precedence added terms

horizontal alignment

var numbVar                 = 1;
var numbVarLonger           = 1;
var numbVarRidiculouslyLong = 1;
  • e.g. above shows aligning variable assignments
  • don't do this
  • if you have a long list of declarations, you may need to refactor something(s) out

indentation

  • very important
  • this helps show the hierarchy of scopes (namespace, class, method, block, etc.)
  • don't break these indentations for short statements that normally are expressed in blocks
// prefer this
public class CommentWidget extends TextWidget
{
    public static final String REGEXP = "example regex pattern";

    public CommentWidget (ParentWidget parent, String text)
    {
        super(parent, text);
    }

    public String render() throws Exception
    {
        return "";
    }
}

// over this
public class CommentWidget extends TextWidget
{
    public static final String REGEXP = "example regex pattern";

    public CommentWidget (ParentWidget parent, String text){super(parent, text);}

    public String render() throws Exception{return "";}
}

dummy scopes

  • sometimes a for and while loops have dummy scope bodies
  • try to avoid these or at least make them very obvious
// prefer this
while (dis.read(buf, 0, readBufferSize) != -1)
    ;

// over this
while (dis.read(buf, 0, readBufferSize) != -1);

team rules

  • decide on formatting rules for the team, and each team member needs to follow these rules
  • the entire code base must be consistent

Uncle Bob's formatting rules

  • Listing 5-6, pages 91, 92 (pp. 122, 123 of the PDF)

Ch. 6 Objects and Data Structures

  • We keep our variables private, because we don't want anyone else depending on them.
  • We want the freedom to change their type or implementation.
  • Why do so many programmers automatically add getters and setters to their objects, exposing their private variables as if they were public?

Data abstraction

  • it's better to not just add auto-getter/setters
    • this does not hide implementation
  • use abstraction to hide data implementation while still allowing users to manipulate

example:

// this reveals implementation
public interface Vehicle
{
    double getFuelTankCapacityInGallons();
    double getGallonsOfGasoline();
}

// this hides implementation
public interface Vehicle
{
    double getPercentFuelRemaining();
}

Data/Object anti-symmetry

Procedural Shapes

public class Square
{
    public Point topLeft;
    public double side;
}

public class Rectangle
{
    public Point topLeft;
    public double height;
    public double width;
}

public class Circle
{
    public Point center;
    public double radius;
}

public class Geometry
{
    public final double PI = 3.14;

    public double area(Object shape) throws NoSuchShapeException
    {
        if (shape instanceof Square)
        {
            Square s = (Square)shape;
            return s.side * s.side;
        }
        else if (shape instanceof Rectangle)
        {
            Rectangle r = (Rectangle)shape;
            return r.height * r.width;
        }
        else if (shape instanceof Circle)
        {
            Circle c = (Circle)shape;
            return PI * c.radius * c.radius;
        }
        throw new NoSuchShapeException();
    }
}

Polymorphic Shapes

public class Square implements Shape
{
    private Point topLeft;
    private double side;

    public double area()
    {
        return side*side;
    }
}

public class Rectangle implements Shape
{
    private Point topLeft;
    private double height;
    private double width;

    public double area()
    {
        return height * width;
    }
}

public class Circle implements Shape
{
    private Point center;
    private double radius;
    public final double PI = 3.14;

    public double area()
    {
        return PI * radius * radius;
    }
}

objects vs data structures

  • objects hide their data behind abstractions and expose functions that operate on that data
    • see polymorphic shapes code above
    • the objects have hidden data, and public functions operating on that data
  • data structures expose their data and have no meaningful functions
    • see procedural shapes code above
    • the objects are really just data structures - public data, no functions

the complimentary nature of objects and data structures

  • Procedural code makes it easy to add new functions without changing the existing data structures
    • see procedural shapes code above
    • could easily add a new perimeter function to the Geometry class, and all "shape" data structures would be unaffected
  • OO code makes it easy to add new classes without changing existing functions
    • see polymorphic shapes code above
    • can easily add a new shape class, and the existing shape classes' area functions would be unaffected

the compliment is also true:

  • Procedural code makes it hard to add new data structures, because all the functions must change
    • adding a new "shape" data structure would require updating the Geometry.area function (and a perimeter function, and so on)
  • OO code makes it hard to add new functions because all the classes must change
    • if we wanted to add a perimeter function to shapes, we'd have to add the new function to the super class and implement that in all inheriting classes

So, the things that are hard for OO are easy for procedures, and the things that are hard for procedures are easy for OO!

In any complex system there are going to be times when we want to add new data types rather than new functions. For these cases objects and OO are most appropriate. On the other hand, there will also be times when we'll want to add new functions as opposed to data types. In that case procedural code and data structures will be more appropriate.

Mature programmers know that the idea that everything is an object is a myth. Sometimes you really do want simple data structures with procedures operating on them.

Law of Demeter (LOD)

  • a module should not know about "the innards" of the objects it manipulates
  • an object should not expose its internal structure through accessors, because to do so is to expose, rather than to hide, its internal structure
  • the Law states that a method f of class C should only call the methods of these:
    • C
    • an object created by f
    • an object passed as an argument to f
    • an object held in an instance variable of C
  • the method f should not invoke methods on objects that are returned by any of the allowed functions
  • example possible violation:
final String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();
  • "train wreck" ... code like the above is often called a train wreck bc it looks like a bunch of coupled train cars
Options opts = ctxt.getOptions();
File scratchDir = opts.getScratchDir();
final String outputDir = scratchDir.getAbsolutePath();
  • the above snippets (train wreck style or the second) are in a function that has a lot of knowledge
    • knowledge of options, of a scratch directory, and of an absolute path
  • whether the code violates the LOD depends on whether ctxt, Options, and ScratchDir are objects or data structures
    • if they are objects, then exposing their "innards" is a violation
    • if they are data structures, then LOD doesn't apply
      • bc data structures naturally expose data and have not functionality
  • the use of accessor functions confuses the issue
    • LOD would not be a question if the code had been written like this:
final String outputDir = ctxt.options.scratchDir.absolutePath;
  • this issue would be a lot less confusing if data structures simply had public variables and no functions, and objects had private variables and public functions
  • but there are frameworks and standards (e.g. "beans") that demand that even simple data structures have accessors and mutators

Hybrids

  • avoid these
    • they are indicative of a muddled design whos authors are unsure or (worse) ignorant of whether they need protection from functions or types
  • what are they
    • part object, part data structure
    • have functions that do significant things
    • have public variables or accessors and mutators that basically make the private variables public
  • these hybrids are the worst of both worlds
    • they make it hard to add new functions AND hard to add new data structures

Hiding structure

  • make sure that higher level classes aren't aware of lower level functionality
  • make sure that classes aren't breaking the LOD - make sure they're not aware of internals of classes that they shouldn't know the internals of

DTOs

  • a quintessential data structure, a class with public variables and no functions
  • very useful, for I/O communication (e.g. database, sockets, etc.)
  • often the first in a series of translation stages converting raw data in a database into objects in the application code
  • the "bean"
    • somewhat more common (at least in Java i assume--my thought)
    • have private variables manipulated by getters/setters
    • the "quasi-encapsulation of beans seems to make some OO purists feel better but usually provides no other benefit

  • active record
    • a specific form of DTO
    • in addition to the public (or publicly accessible) variables, they have navigational methods
      • e.g. save() and find()
    • typically direct translations from database tables or other data sources
    • developers often treat these as objects by putting business rules in them - don't
    • instead, use an object that hides it's internals, and have these internals be instances of these active records (this is an example approach, there are other ways)

conclusion

  • objects expose behavior and hide data
    • makes it easy to add new kinds of objects without changing existing behaviors
    • makes it hard to add new behaviors to existing objects
  • data structures expose data and have no significant behavior
    • makes it easy to add new behaviors to existing data structures
      • (my thought: idk if "better" wording here might be:)
        • ("makes it easy to add new behaviors that operate on existing data structures" or)
        • ("makes it easy to add new functions that operate on existing data structures")
    • makes it hard to add new data structures to existing functions

Ch. 7 Error Handling

  • many code bases are dominated by error handling
  • error handling is important, but if it obscures logic, it's wrong

Use exceptions rather than return codes

  • return codes are used in languages that don't have exceptions
  • these languages are still around but if you have exceptions, you should use them
  • problems with working with flags or return codes
    • clutter the caller
    • require that the caller check for errors immediately after the call
  • exceptions allow cleaner code with logic unobscured by error handling

without exceptions

public class DeviceController
{
    ...
    public void sendShutDown()
    {
        DeviceHandle handle = getHandle(DEV1);
        // Check the state of the device
        if (handle != DeviceHandle.INVALID)
        {
            // Save the device status to the record field
            retrieveDeviceRecord(handle);
            // if not suspended, shut down
            if (record.getStatus() != DEVICE_SUSPENDED)
            {
                pauseDevice(handle);
                clearDeviceWorkQueue(handle);
                closeDevice(handle);
            }
            else
            {
                logger.log("Device suspended. Unable to shut down");
            }
        }
        else
        {
            logger.log("Invalid handle for: " + DEV1.toString());
        }
    }
    ...
}

with exceptions

public class DeviceController
{
    ...
    public void sendShutDown()
    {
        try
        {
            tryToShutDown();
        }
        catch (DeviceShutDownError e)
        {
            logger.log(e);
        }
    }

    private void tryToShutDown() throws DeviceShutDownError
    {
        DeviceHandle handle = getHandle(DEV1);
        DeviceRecord record = retrieveDeviceRecord(handle);

        pauseDevice(handle);
        clearDeviceWorkQueue(handle);
        closeDevice(handle);
    }

    private DeviceHandle getHandle(DeviceID id)
    {
        ...
        throw new DeviceShutDownError("Invalid handle for: " + id.toString());
        ...
    }
    ...
}
  • the code with exceptions untangled the 2 concerns - 1. algorithm for device shutdown 2. error handling

Write your try-catch-finally statement first

t-c-f => try-catch-finally statement

  • one of the most interesting things about exceptions is that they define a scope within your program
  • when you execute code in the try block of a t-c-f you are stating that execution can abort at any point and then resume in the catch
  • a t-c-f is, in a way, like a transaction - the catch block must leave the program in a consistent state regardless of what happens in the try
  • good practice to start with a t-c-f whenever writing code that could throw exceptions
  • this helps define what users of the code should expect if something goes wrong

Use unchecked exceptions

  • java has "checked exceptions" where a method's signature includes a list of exception types it can pass to its caller
  • examples of languages that have no such thing: C#, C++, Python, Ruby
  • robust systems can be created without checked exceptions - so does there benefit justify their price?
    • the price: violating the OCP
      • if you throw a checked exception who's catch is 3 methods above, the signatures of the 2 methods under the catching method must declare that exception type
      • this means changes to low level methods' signatures for checked methods will force signature changes to higher level methods
    • worth the price? basically no (but it was mentioned maybe for a critical library)

Provide context with exceptions

  • create informative error messages and pass them along with your exceptions
  • mention the operation that failed and the type of failure

Define exception classes in terms of a caller's needs

  • many ways to classify errors
    • by source - what component did they come from
    • by type - device failures, network failures, programming errors, etc.
  • when defining exception classes in an app, most important concern should be how they are caught

poor exception classification

ACMEPort port = new ACMEPort(12);

try
{
    port.open();
}
catch (DeviceResponseException e)
{
    reportPortError(e);
    logger.log("Device response exception", e);
}
catch (ATM1212UnlockedException e)
{
    reportPortError(e);
    logger.log("Unlock exception", e);
}
catch (GMXError e)
{
    reportPortError(e);
    logger.log("Device response exception", e);
}
finally
{
    ...
}

refactored

LocalPort port = new LocalPort(12);

try
{
    port.open();
}
catch (PortDeviceFailure e)
{
    reportError(e);
    logger.log(e.getMessage(), e);
}
finally
{
    ...
}

public class LocalPort
{
    private ACMEPort innerPort;

    public LocalPort(int portNumber)
    {
        innerPort = new ACMEPort(portNumber);
    }

    public void open()
    {
        try
        {
            innerPort.open();
        }
        catch (DeviceResponseException e)
        {
            throw new PortDeviceFailure(e);
        }
        catch (ATM1212UnlockedException e)
        {
            throw new PortDeviceFailure(e);
        }
        catch (GMXError e)
        {
            throw new PortDeviceFailure(e);
        }
    }
}
  • why is the refactored version better
  • simplifying the code by wrapping the API we're calling and making sure it returns a common exception
    • since we are doing basically the same thing for each exception type

benefits of wrapping APIs

  • wrapping 3rd-party APIs is considered best practice, here's why
  • wrapping it minimizes your dependency on it
    • you can choose to use a different library with minimal penalty
  • makes it easier to mock
  • not tied to the API vendors design choices
    • your wrapper is the API you want, and if the vendor changes something, you have minimal changes to make

define the normal flow

  • following the above discussed exception handling practices pushes error detection to the edges of the program resulting in less cluttered code
    • wrap APIs to throw your own exceptions
    • handler above your code to handle aborted computations
  • this is good but sometimes you may not want to abort a computation

awkward code

try
{
    MealExpenses expenses = expenseReportDAO.getMeals(employee.getID());
    m_total += expenses.getTotal();
}
catch (MealExpensesNotFound e)
{
    m_total += getMealPerDiem();
}
  • in this business, if meals are expensed, they're part of the total, otherwise, a per diem is expensed for the employee
  • the exception handling is cluttering the logic
  • would rather the code simply look like this:
MealExpenses expenses = expenseReportDAO.getMeals(employee.getID());
m_total += expenses.getTotal();
  • we can make it that simple
  • change the ExpenseReportDAO to always return a MealExpenses object
  • if there are no meal expenses, it returns a MealExpenses object that returns the per diem as its total
public class PerDiemMealExpenses implements MealExpenses
{
    public it getTotal()
    {
        // return the default per diem
    }
}
  • this is the "Special Case Pattern" (book references Fowler)
    • create a class or configure an object to handle a special case
    • doing so relieves client code of the burden of dealing with exceptional behavior

Null

  • this is an error handling chapter, so we will discuss things we do that invite errors

Don't return null

  • it's very common to see lines of code all over a program checking for null
  • this is bad!
  • returning null is creating problems for yourself - for the caller of what returns null
  • if you are tempted to return null from a method, consider throwing an exception or returning a special case object instead

    • similarly with 3rd party APIs, wrap an API method that returns null in your own method that either returns a special case object or throws
  • an example: a method that returns a list that can return null
    • instead of checking for null returned from this method, make the method return an empty list instead of null
    • now you don't have to check for null before iterating the list, just iterate it

Don't pass null

  • avoid passing null into methods whenever possible - this is even worse than returning null
    • working with an API method that expects you to pass null is an example where you can't avoid this
  • in most programming languages, there isn't a good way to handle null passed in to a method
    • you can check for null and have an exception handler for InvalidArgumentException, but what should the handler do?
  • just avoid passing null into methods

Ch. 8 Boundaries

  • we seldom control all the software in our system
    • we may buy 3rd-party packages, use open source, depend on components from other teams, etc.
  • we have to integrate the foreign code with our own
  • we have to keep the boundaries of our software clean

Using 3rd-Party code

  • there's a natural tension between an interface provider and the user of that interface
  • interface providers strive for broad applicability, users want the interface to focus on their needs
  • this tension can lead to problems at system boundaries
  • example: java.util.Map
    • what if we want to insure that no one deletes items in a map we want to pass around?
      • there's a clear function in the Map's interface
    • what if we want to limit the types of objects put in a map?
      • Map is generic
// say we have an app that needs a Map of Sensors
Map sensors = new HashMap();

// now some other part of the code needs to access sensor
// - we notice that this tends to be all over the place in the application
// - the client is carrying the responsibility of casting
Sensor s = (Sensor)sensors.get(sensorId);

// let's improve the readability of the code
Map<Sensor> sensors = new HashMap<Sensor>();
// now the client no longer needs to handle casting in all those places in the code
Sensor s = sensors.get(sensorId);

// Map<Sensor> is still providing more capability than we need or want
// And if the Map interface ever changes, we may have many places in the code to change
// ...
// This class could be an good approach
// this can evolve with very little impact on the rest of the app
// the use of generics is no longer a big issue, bc casting is handled in the class
// this interface is tailored to the needs of the app
// the code is easier to read, and harder to misuse
// this class can enforce design and business rules
public class Sensors
{
    private Map sensors = new HashMap();

    public Sensors getById(String id)
    {
        return (Sensor)sensors.get(id);
    }

    // snip
}

// use
Sensors sensors = new Sensors();
Sensor s = sensors.getById(sensorId);
  • this is not a suggestion to always encapsulate the use of Map in this form
  • this is advice that you NOT pass Maps (or any other interface at the boundary) around your system
  • if you use a boundary interface like Map, keep it inside the class, or close family of classes, where it's used
  • avoid returning Map from, or accepting Map as an argument to, public APIs

Exploring and learning boundaries

  • learning and integrating a 3rd-party API can take a lot of time and effort
    • common to spend long debugging sessions trying to figure out how the 3rd-party code works and determining whether an exception is being thrown by a bug in your code or by code that is simply using the API wrong
  • using "learning tests" can help
  • create tests that focus on what you want out of the API
  • these are essentially controlled experiments that check your understanding of the API
  • learning tests not only help us learn the API, they verify that the package works as we expect
    • if an update comes later, our learning tests show whether the package still works as we expect
  • these learning tests can serve as "boundary tests"
    • it's always good to support a clean boundary with a set of outbound tests that exercise the interface in a way that production code does
  • boundary tests helps us migrate to newer versions rather than feeling like we have to stay with an old version longer than we maybe should

Using code that does not yet exist

  • there's another kind of boundary, one that separates the known from the unknown
  • example:
    • sometimes you have to write code for an application that relies on 3rd-party code for which you don't know the interface yet
    • how do you start writing you code in this scenario??
    • start working far away from the boundary
    • you'll find yourself getting close to that boundary and even bumping up against it
    • as you do this, you'll learn more about what you WANT the interface to do for you
    • define the interface you WISH you had
    • you can mock the API in tests
    • once the real interface is ready
      • create an Adapter to bridge the gap between your application interface and the other interface
        • this Adapter encapsulates the interaction with the API and provides a single place to change when the API changes
      • you can create boundary tests once you have the API

principles

  • design to accommodate change without huge investments and rework
  • avoid letting to much of your code know about external APIs
  • have clear separation and tests that define expectations

Ch. 9 Unit Tests

  • 3 laws of TDD given
  • writing tests this way results in a bulk of test code that can present a daunting management problem

Keeping tests clean

  • dirty test code is as bad as--or worse than--no test code
  • test code has to change as production code changes, and the dirtier the code the greater the maintenance burden
  • with dirty test code, tests will become viewed as an ever-increasing liability
  • test code is just as important as production code

Tests enable the -ilities

  • unit tests are what keep your code flexible, maintainable, and reusable
    • if you have unit tests, you do not fear making changes to the code!
    • unit tests allow you to make changes with confidence
      • confidence that you have not broken anything
    • this means you can make refactor and improve the architecture and design without fear
  • tests enable all the -ilities, bc tests enable change

Clean tests

  • what makes a clean test? readability
    • you want to say a lot with as few expressions as possible

Domain-specific testing language

  • design a testing API that helps the tests be simple and readable
  • this testing API is not designed up-front, but evolves over time as a result of continually refactoring the tests

Dual standard

  • this dual standard is related to things like efficiency in memory and CPU
  • unit tests don't have to be as efficient as prod code
  • (but tests MUST still be clean and expressive)

One assert per test

  • this helps readability and is a good guideline
  • but doesn't necessarily always have to be followed
  • definitely minimize the number of asserts

Single concept per test

  • test a single concept in each test function
  • this is probably a better rule than the one-assert-per-test rule

F.I.R.S.T.

  • Fast - tests should run fast
  • Independent - tests should not depend on each other
  • Repeatable - tests should be repeatable in any environment
    • you should be able to run the tests in the production environment, in the QA environment, and on your laptop while riding home on the train without a network

  • Self-Validating - tests should have a boolean output (pass or fail)
    • you shouldn't have read a log file or compare output files
  • Timely - tests should be written in a timely fashion
    • tests should be written ahead of the production code that makes them pass

Ch. 10 Classes

class organization

  • standard Java convention says ...
  • a class should begin with a list of variables
    • public static constants, then
    • private static variables, then
    • private instance variables
    • (there is seldom a good reason to have a public variable)
  • public functions should follow the variables
  • put the private functions called by a public function immediately after than public function

Encapsulation (emphasis in this quite is mine)

We like to keep our variables and utility functions private, but we're not fanatic about it. Sometimes we need to make a variable or utility function protected so that it can be accessed by a test. For us, tests rule. If a test in the same package needs to call a function or access a variable, we'll make it protected or package scope. However, we'll first look for a way to maintain privacy. Loosening encapsulation is always the last resort.

p. 136 (p. 167 of PDF)

  • my thoughts from reading this - and from other things he has said
    • uncle bob apparently sometimes puts unit tests in the same package as the production code they're testing??
      • protected, in Java, means accessible in the containing type and derived types, and in the containing package
      • packages, in Java, are basically Namespaces
      • i assume that this choice is not common?
        • the last sentence in that paragraph emphasizes that "loosening of encapsulation is always a last resort"
        • so that may mean that even if they have a test in the same package, it likely won't affect the SUT's encapsulation
        • it also could maybe indicate that it's not very common for him to put unit tests in the same package? (but this is a guess/assumption)
    • on our team, we never put tests in the same project as our production code
      • we have separate designated test projects where we put all tests

Classes should be small!

  • how small?
  • measure in # of responsibilities - 1
    • with functions, we measured in # of lines
  • a class name should describe its responsibility
    • it should be concise and unambiguous

Single Responsibility Principle (SRP)

  • a class or module should have one, and only one, reason to change
  • many developers fear that a large number of small, single-purpose classes makes it more difficult to understand the bigger picture
    • concerned about having to navigate from class to class to figure out how a larger piece of work gets accomplished
  • just as much to learn about the app regardless of organization
    • analogy of a toolbox:
      • you want everything thrown in a couple drawers?
      • or everything organized into smaller, labeled drawers?

Cohesion

  • classes should have a small # of instance variables
  • each method of a class should manipulate one or more of these instance variables
  • in general, the more instance variables a method manipulates, the more cohesive that method is to its class
  • a class in which each variable is used by each method is maximally cohesive
  • in general it is neither advisable nor possible to create such maximally cohesive classes
  • but we do want high cohesion
  • having high cohesion means that the methods and variables of the class make logical sense to be together
  • the strategy of keeping functions small and parameter lists short can sometimes lead to a proliferation of instance variables that are used by a subset of methods
    • when this happens, it normally means you can split it into smaller classes

Maintaining cohesion results in many small classes

  • example given of an original code version and a refactored version
  • original version was one long method
  • new version was many small classes and functions
    • more readable, understandable, maintainable

Organizing for change

  • any time that you have to modify a class, you are introducing a risk that the system no longer works as expected
    • open/closed principle was alluded to here
  • what to do when we spot a class that seems to violate SRP
    • we should primarily consider system change over "fixing" a violation of SRP
    • if we have no current reason to change a class that violates SRP and no need for change in the forseeable future, then we leave the class alone
      • introducing the change at all is still introducing risk
    • if we do have a reason to change a class that violates SRP, it's time to strongly consider refactoring
      • now introducing the change is happening regardless, so we want to change it the right way

Isolating from change

  • client classes of concrete objects depend on those concrete objects' concrete details
  • these client classes are at risk when those concrete details change
  • minimize the impact of these changes with interfaces and abstract classes
    • DIP (dependency inversion principle) says that our classes should depend on abstractions, not concrete details

Ch. 11 Systems

Separate constructing a system from using it

Software systems should separate the startup process, when the application objects are constructed and the dependencies are "wired" together, from the runtime logic that takes over after startup.

  • The startup process is a concern that the app must address
    • the separation of concerns is important in software design
  • an example of NOT separating the startup concern:
public Service getService()
{
    if(service = null)
        service = new MyServiceImpl(...); // Good enough default for most cases?
    return service;
}
  • that's the lazy initialization/evaluation idiom
    • merits:
      • you only incur construction overhead if you use the object
      • this can result in faster startup times
      • you ensure null isn't returned
    • cons:
      • hard-coded dependency on MyServiceImpl and everything its constructor requires
      • you can't even compile without resolving these dependencies--even if you never use it
      • testing can be a problem especially if MyServiceImpl is a heavy object
      • construction logic is mixed in with normal runtime processing
        • breaking SRP
        • requires testing for more execution paths
  • one occurrence of lazy initialization isn't a serious problem
  • but too many occurrences of this can hurt modularity and cause duplication
  • we should have a global, consistent strategy for resolving major dependencies

Separation of Main

One way to separate construction from use is simply to move all aspects of construction to main, or modules called by main, and to design the rest of the system assuming that all objects have been constructed and wired up appropriately.

  • the flow of control is easy to follow
    • all objects the system needs are constructed in main
    • all dependency arrows go the same way crossing the barrier between main and the app
    • they all point away from main
  • the app doesn't know about main, it just expects that everything it needs has been built

Factories

  • sometimes the app needs to control when objects are created
  • Abstract Factory pattern gives the app control of when but hides the details of how

Dependency Injection

  • DI is a mechanism for separating construction from use
  • DI is the application of IoC to dependency management
  • IoC moves secondary responsibilities from an object to other objects dedicated to those purposes
  • an object should not instantiate its dependencies
    • this should be handled by another "authoritative" mechanism - inverting control
    • setup is a global concern, so the authoritative mechanism should be "main" routine or a special-purpose container
  • example of "partial" DI:
MyService myService = (MyService)(jndiContext.lookup("NameOfMyService"));
  • this is "partial" DI, bc:
    • the invoking object does not control what type of object is returned
    • but, the invoking object actively resolves the dependency
  • True DI goes a step further
    • the class takes no direct steps to resolve the dependency, it's passive
    • it provides a setter method or constructor argument that are used to inject the D
  • example of True DI
public MyInvokingObject(MyService myService){...}

Scaling Up

  • it is a myth that we can get systems "right the first time"
  • scale at the code level
    • implement what you need today, then refactor and expand tomorrow
  • scale at system level
    • thinking in comparison to the approach at the code level
      • doesn't system architecture require preplanning?
      • certainly system architecture can't grow incrementally from simple to complex?
    • the ephemeral nature of software systems allows incremental growth

      Software systems are unique compared to physical systems. Their architectures can grow incrementally, if we maintain the proper separation of concerns.

Cross-Cutting Concerns

  • example, persistence concerns
  • even though we want to achieve modularity - separation of concerns, implementations of persistency strategies must be spread across many many objects
  • AOP (Aspect-Oriented Programming)
    • an approach intended to restore modularity for cross-cutting concerns
    • modular constructs called "aspects" specify which points in the system should have their behavior modified in some consistent way to support a particular concern
    • this specification is done using a succinct declarative or programmatic mechanism
  • in AOP, with persistence, you would declare which objects and attributes (or patterns thereof) should be persisted and then delegate the persistence tasks to your persistence framework
  • AOP Frameworks
    • Spring AOP and JBoss AOP are examples of Pure Java AOP frameworks
    • these tools automate the implementation of complicated proxy boilerplate (see Java Proxies section on p. 161, PDF p. 192)
    • this allows the programmer to focus on their domain and implement business logic in POJOs
    • the frameworks are incorporated via declarative configuration files or APIs
    • this enables defining more consistent system-wide behavior
    • the frameworks create nested decorator objects around the POJO

BDUF (big design up front) - bad

  • this is not only unnecessary, it's potentially harmful due to inhibiting adaptation to change
    • psychologically difficult to discard work that resulted from a lot of previous effort
    • architectural choices influence subsequent thinking about the design
  • building architects have to BDUF bc radical architectural changes aren't economically feasible after significant construction
  • it can be economically feasible for software architects to make radical changes after significant development if their application's structure has effectively separated concerns

Simple Design up front - good

  • a "naively simple" but nicely decoupled architecture allows for quicker delivery and future scalability

Optimize decision making

  • delay decisions as long as possible to ensure making the most informed choices

Use standards wisely - when they add demonstrable value

  • there are many mature standards and standards-based tools available, not all situations call for them though
  • i thought of WCF in this brief section

Ch. 12 Emergence - getting clean via emergent design

A design is "simple" if it: (Kent Beck's 4 rules of simple design - in order of importance)

  1. Runs all the tests
  2. Contains no duplication
  3. Expresses the intent of the programmer
  4. Minimizes the number of classes and methods

And, couched in TDD

  1. Runs all the tests

    • this enables refactoring safely
  2. Refactoring: Contains no duplication

    • not only exact duplicate lines, but also this type of example:
    int size() {}
    boolean isEmpty() {}
    
    // can eliminate duplication by something like this:
    //  - calling size rather than repeating what size does and checking the result
    boolean isEmpty()
    {
        return 0 == size();
    }
  3. Refactoring: Expresses the intent of the programmer

  4. Refactoring: Minimizes the number of classes and methods

Ch. 13 Concurrency

Objects are abstractions of processes. Threads are abstractions of schedule.

James O. Coplien

Concurrency is hard.

Concurrency is a decoupling strategy

  • it helps us decouple what gets done from when it gets done
  • a single-threaded app has highly coupled what and when
    • can determine app's state by just looking the stack backtrace
    • can set breakpoints and know the state by the breakpoint that's hit
  • decoupling what from when can dramatically improve both throughput and structures of an app
    • structurally, this kind of app can look like a bunch of tiny computers collaborating rather than one main loop
    • this offers powerful ways to separate concerns
    • this can make the system easier to understand

example scenario where concurrency could improve a system

  • info aggregator that retrieves data from websites and reports a daily summary
    • a lot of network I/O waiting - the websites list grows, combing all websites can take too long
  • a system handles one user at-a-time and requires 1 second per user
    • fine for low number of users, but what about if users have to get in line of 150 users
  • a system that interprets large data sets but can only give complete solution after processing all sets
    • perhaps each set should be processed by a different machine for parallel processing

Myths and misconceptions

  • concurrency always improves performance
    • it can sometimes improve performance, but only when there is a lot of wait time that can be shared between multiple threads or multiple processors - neither situation is trivial
  • design does not change when designing concurrent programs
    • the design of a concurrent algorithm can be remarkably different from a single-threaded
    • the decoupling of what from when usually has a huge effect on the structure of the system
  • understanding concurrency issues is not important when working with a container such as a Web or EJB container
    • you should know about concurrent update and deadlock (described later in book)

Some more balanced statements regarding concurrency

  • concurrency incurs some overhead, both in performance as well as writing additional code
  • correct concurrency is complex, even for simple problems
  • concurrency bugs aren't usually repeatable, so they are often ignored as one-offs instead of the true defects they are
  • concurrency often requires a fundamental change in design strategy

Challenges

consider class X

public class X
{
    private int lastIdUsed;

    public int getNextId()
    {
        return ++lastIdUsed;
    }
}
  • say we
    • create an instance of X,
    • then set the lastIdUsed field to 42,
    • and then share the instance between 2 threads
  • now suppose both threads call getNextId() - there are 3 possible outcomes
    1. thread 1 gets value 43; thread 2 gets value 44; lastIdUsed is 44
    2. thread 1 gets value 44; thread 2 gets value 43; lastIdUsed is 44
    3. thread 1 gets value 43; thread 2 gets value 43; lastIdUsed is 43
  • to understand how that 3rd result is possible, you need to understand the JIT compiler and Java memory model
    • in short, there are 1,000's of possible paths for the threads to take within that getNextId method - most of the paths produce the right result
    • the problem is that some of those paths produce the wrong result

Concurrency defence principles

Principles and techniques for defending your systems from problems of concurrent code

SRP

  • Concurrency is complex enough to be a reason to change in its own right, and therefore should be separated from the rest of the code.
  • unfortunately it is common for concurrency implementation details to be embedded directly into other production code
  • consider these things
    • concurrency-related code has its own life cycle of development, change and tuning
    • concurrency-related code has its own challenges, which are different from and often more difficult than nonconcurrency-related code
    • the number of ways in which miswritten concurrency-based code can fail makes it challenging enough without the added burden of surrounding application code
  • recommendation: keep your concurrency-related code separate from other code

Corollary: limit the scope of data

  • 2 threads modifying a shared object can interfere with each other
  • one solution is to use the synchronized keyword to protect a critical section in the code that uses the shared object
  • these critical sections must be minimized:
    • you have to remember to protect each of these sections
    • you want to avoid duplicating effort to protect multiple sections
    • more of these sections makes it more difficult to determine sources of failures
  • recommendation: take data encapsulation to heart; severely limit the access of any data that may be shared

Corollary: use copies of data

  • try to avoid sharing data all together
  • sometimes it's possible to copy objects and treat them as read-only
  • sometimes it's possible to copy objects, collect results from multiple threads, and then merge the results in a single thread
  • the extra object creation is a possible concern
    • experiment to determine if this is an issue
    • more than likely, the savings in avoiding the intrinsic lock will make up for additional creation and GC overhead

Corollary: threads should be as independent as possible

  • consider writing your threaded code such that each thread is isolated, sharing no data with other threads
    • each thread processes one client request with all of its required data coming from an unshared source and stored as local variables
  • it's common to eventually run into shared resources (e.g. db connections) despite this effort
  • recommendation: attempt to partition data into independent subsets that can be operated on by independent threads, possibly in different processors

Know your library

  • mention of java-specific libraries that are designed for use in concurrency

Thread-safe collections

  • recommendation: review the classes available to you. in the case of java, become familiar with (list of specific java packages)

Know your execution models

key definitions

  • bound resources: resources of a fixed size or number used in a concurrent environment (e.g. db connectiosn, fixed-size r/w buffers)
  • mutual exclusion: only one thread can access shared data or a shared resource at a time
  • starvation: one thread or a group of threads is prohibited from proceeding for an excessively long time or forever (e.g. always letting fast-running threads first could mean long-running threads never run if there's no end to fast-running threads)
  • deadlock: two or more threads waiting for each other to finish--each thread has a resource that the other requires and neither can finish until it gets the other resource
  • livelock: threads in lockstep, each trying to do work but finding another "in the way." due to resonance, threads continue trying to make progress but are unable to for an excessively long time or forever

Producer-consumer

  • what it is
    • producer thread(s) create work and place on queue (bound resource) and signal that the queue is no longer empty
    • consumer thread(s) get work from queue and complete it and signal that the queue is no longer full
    • both potentially wait to be notified that they can continue
  • personal note: this sounds like what the filesystemwatcher library does
    • the one that i found online and modified

Readers-writers

  • this is about finding a balance between the needs of writer's vs those of readers
    • writers tend to block many readers, you could require the writer to wait until there are no readers
      • this emphasizes throughput at the expense of potentially starving writers
    • prioritizing writers too much can hurt throughput by blocking readers for a long period of time
  • the challenge is finding the right balance and avoiding concurrent update issues

Dining philosophers

  • a metaphor for threads competing for resources
  • the philosophers--sitting at a circular table with single-forks between them--think util hungry and require a fork-per-hand to eat
  • this can result in deadlock, livelock, throughput, and efficiency degradation
  • recommendation: learn the basic algorithms and understand their solutions

Beware dependencies between synchronized methods

  • recommendation: avoid using more than one method on a shared object
  • sometimes you just need to do this, 3 ways to handle correctly:
    1. client-based locking
      • client locks server before calling 1st method and releases after last method returns
    2. server-based locking
      • server has method (which client calls) that locks server, calls all methods, then unlocks
    3. adapted server
      • intermediary performs locking; this is server-based locking where original server can't be changed

Keep synchronized sections small

  • the synchronized keyword creates a lock to ensure a section of code only ever has 1 thread executing
  • locks create delays, add overhead, can result in contention and degraded performance
  • so use judiciously, and keep the sections small
  • recommendation: keep your synchronized sections as small as possible

Writing correct shut-down code is hard

  • writing graceful shutdown code is hard
  • common issues involve deadlock, with threads waiting for a signal to continue that never comes
  • e.g. parent thread spawns child threads, then waits for all to finish before shutting down
    • what if a child thread is deadlocked? the system will never shut down
  • e.g. similar situation but system has been instructed to shutdown
    • parent tells children to abandon work and finish--what if there was a pair of children working as producer/condsumer, the producer shuts down, and the consumer gets stuck waiting forever for a signal it expected from the producer
      • this would prevent the parent thread from finishing and the system would not shut down
  • recommendation: think about shut-down early and get it working early. it's going to take longer than you expect. review existing algorithms, bc this is probably harder than you think

Testing threaded code

  • recommendations:
    • Treat spurious failures as candidate threading issues
      • do not ignore system failures as one-offs
    • Get your nonthreaded code working first
      • don't chase nonthreading and threading bugs simultaneously
    • Make your threaded code pluggable
      • make it runnable with one, several, and varied # of threads during execution
      • make it runnable with test doubles and test it with test doubles that are fast/slow/varied
      • test to run a number of iterations
    • Make your threaded code tuneable
      • early on, find ways to test under many different configurations, and ensure the # of threads can be easily tuned
      • consider allowing that # to change while the system is running
      • consider self-tuning based on throughput and system utilization
    • Run with more threads than processors
      • running with more threads than processors forces task swapping
      • more task swapping will cause defective code to the surface (like a chaos monkey scenario)
    • Run on different platforms
      • run your code on all target platforms early and often
    • Instrument your code to try and force failures
      • Hand-coded
        • inserting statements like Thread.yield(), wait(), etc.
        • this approach isn't great
          • odds still aren't with you that you choose the right statement or location to insert it
          • might accidentally leave this code in prod
      • Automated
        • use jiggling strategies to ferret out errors
        • this is a unit testing-based approach
        • the simplistic suggestion to create the below example class with static method used to instrument the code with "jiggle points"
        • inserting ThreadJigglePoint.jiggle(); in a number of places
        • then write one implementation that simply does nothing for prod
        • then write another implementation that randomly chooses between different thread jiggling statements (e.g. Thread.yield(), wait(), etc.) or nothing
        • then have testing automatically run this a 1,000 times
        • then you either find an error or at least demonstrate due diligence
public class ThreadJigglePoint
{
    public static void jiggle(){}
}

Ch. 14 Successive Refinement

Incrementalism

  • big changes bad
    • it's tempting to make massive changes to a program structure to "improve" it
    • it's very hard to get a program working the same way it worked before the "improvement"
  • incremental changes good
    • use TDD to help
    • make the small changes, run the tests to ensure everything still works
    • always keep the system running!
  • refactoring is like solving a Rubik's cube

Ch. 15 JUnit Internals

Refactoring can take a long time, will be iterative, can involve reversals of previous changes before being done.

Ch. 16 Refactoring SerialDate

SerialDate is a class within a Java package.

Unit Tests

  • existing test suite covered about 50%
  • Before beginning to refactor this class, he created a suite of tests to provide full coverage of the class's functionality
    • These tests did not all pass from the start--they defined the behavior he thought the class should have
  • New test suite started at 92% coverage (with some tests commented out to start)
  • run tests after every change to keep it working as you refactor

Ch. 17 Smells and Heuristics

A helpful summary chapter!

This was more or less a summarizing of previous points in the book under a helpful set of categories.

Appendix A: Concurrency II

Concurrency is really hard.

Appendix B: org.jfree.date.SerialDate

This is just reference code for an earlier chapter in the book.

Appendix C: Cross References of Heuristics

Reference for earlier chapter in the book.

⚠️ **GitHub.com Fallback** ⚠️