Clean Code - egnomerator/misc GitHub Wiki
- Refactoring as a continual and incremental process
- The importance of names
- clear, concise, meaningful, consistent, don't "encode" (e.g. Hungarian), long isn't necessarily bad
- Design classes with high cohesion and DI
- Functions as one level of abstraction and the Stepdown Rule
- e.g.
To do X, first we do A, then we B, then we C
- To do A, we ...
- To do B, we ...
- To do C, we ...
To do Y, first we ...
- e.g.
- Keep it short
- minimize length of functions, of classes, and of parameter lists
- Clean boundaries such as 3rd party interfaces and unknown interfaces
- some applicable design patterns to help accomplish below concepts: Adapter, Facade, Bridge
- 3rd party: don't let the 3rd party API control you, integrate it with an interface customized to your app
- unknown: create the interface you wish you had, mock it in tests, then integrate real interface (e.g. use adaptor)
- Do not BDUF (big design up front)
- simple design up front, delay decisions as long as possible--you'll theoretically be more informed later
- the complimentary nature of objects and data structures (<-- copy/paste-search to read the notes)
Robert C. Martin's Collection of thoughts from:
- Bjarne Stroustrup: inventor of C++; author of The C++ Programming Language
- Grady Booch: author of Object Oriented Analysis and Design with Applications
- "Big" Dave Thomas: founder of OTI, godfather of the Eclipse strategy
- Michael Feathers: author of Working Effectively with Legacy Code
- Ron Jeffries: author of Extreme Programming Installed and Extreme Programming Adventures in C#
- Ward Cunningham: inventor of Wiki; inventor of Fit; coinventor of eXtreme Programming; Motive force behind Design Patterns; Smalltalk and OO thought leader; the godfather of all those who care about code
- Uncle Bob
clean code is:
- Does one thing well
- logic straight forward--bugs hard to hide
- Error handling is complete
- performance close to optimal--doesn't tempt adding messy unprincipled optimizations
- reads like well-written prose
- full of crisp abstractions and straightforward lines of control
- can be read and advanced by a developer other than its original author
- has unit and acceptance tests
- provides one way rather than many ways to do one thing
- minimal dependencies which are explicitly defined
- provides a clear and minimal API
- should be literate
- looks like it was written by someone who cares
- runs all the tests
- contains no duplication
- expresses all the design ideas that are in the system
- minimizes the number of entities such as classes, methods, functions, and the like
- meaningful names
- do one thing
- early use of simple abstractions to
- enable future change
- maintain focus on the purpose (guide away from extraneous implementations)
- expressiveness
- each routine you read turns out to be pretty much what you expected
- makes the language look like it was made for the problem
- easy to read
ALL BELOW ARE THE UN-EDITED COMPLETE NOTES
God is in the details
- Ludwig mies van der Rohe
TPM (Total Productive Maintenance)
- a 1951 Japanese quality approach
- 5S principles
- Seiri
- think "sort" in English
- Knowing where things are
- Suitable naming
- Seiton
- think "systematize" in English
-
A place for everything, and everything in its place
- Seiso
- think "shine" in English
- get rid of extraneous comments, commented out code, etc.
- Seiketsu
- standardization
- group must agree on coding style and set of practices
- Shutsuke
- self-discipline
- having the discipline to follow the practices and frequently reflect on one's work and be willing to change
- Seiri
Continually maintain and refactor code. Ship code that is more readable and more maintainable from the start.
architect Christopher Alexander views every act of design itself as a small, local act of repair.
French poet Paul Valery: a poem is never done and bears continual rework, and to stop working on it is abandonment.
The design lives in the code; design is a process, not a static endpoint
-
The code is the design
-
Simple code
The comic illustration about WTFs/minute as the only valid measurement of code quality
There are two parts to learning craftsmanship: knowledge and work. You must gain the knowledge of principles, patterns, practices, and heuristics that a craftsman knows, and you must also grind that knowledge into your fingers, eyes, and gut by working hard and practicing.
(Page 27).
Some have expressed that one day there will be no need for programmers
- this is nonsense, because there will always be a need to translate requirements into machine language
- humans can't even master this translation of vague requirements
- their will always be a need for programmers to perform this translation--and it takes an expert to do this effectively; a machine cannot do this, a human must be the translator
Bad code can bring a company down
- a company had a killer app that became popular quick
- eventually the company died bc users stopped using it due to excessive bugs, crashes, etc.
- an employee confirmed that this was due to creating a messy, un-maintainable code base while rushing to production
Wading through bad code--being impeded, slowed down by bad code
We commonly allow bad code saying that we'll clean it up later
- but LeBlanc's law says
later means never
Programmer's responsibility to defend the quality of the code against the boss's schedule-based demands
The only way to go fast is to keep the code as clean as possible.
Collected thoughts from:
- Bjarne Stroustrup: inventor of C++; author of The C++ Programming Language
- Grady Booch: author of Object Oriented Analysis and Design with Applications
- "Big" Dave Thomas: founder of OTI, godfather of the Eclipse strategy
- Michael Feathers: author of Working Effectively with Legacy Code
- Ron Jeffries: author of Extreme Programming Installed and Extreme Programming Adventures in C#
- Ward Cunningham: inventor of Wiki; inventor of Fit; coinventor of eXtreme Programming; Motive force behind Design Patterns; Smalltalk and OO thought leader; the godfather of all those who care about code
- Uncle Bob
clean code is:
- Does one thing well
- logic straight forward--bugs hard to hide
- Error handling is complete
- performance close to optimal--doesn't tempt adding messy unprincipled optimizations
- reads like well-written prose
- full of crisp abstractions and straightforward lines of control
- can be read and advanced by a developer other than its original author
- has unit and acceptance tests
- provides one way rather than many ways to do one thing
- minimal dependencies which are explicitly defined
- provides a clear and minimal API
- should be literate
- looks like it was written by someone who cares
- runs all the tests
- contains no duplication
- expresses all the design ideas that are in the system
- minimizes the number of entities such as classes, methods, functions, and the like
- meaningful names
- do one thing
- early use of simple abstractions to
- enable future change
- maintain focus on the purpose (guide away from extraneous implementations)
- expressiveness
- each routine you read turns out to be pretty much what you expected
- makes the language look like it was made for the problem
- easy to read
The boyscout rule:
Leave the campground cleaner than you found it.
- always refactor
Use Intention-Revealing Names
- example, instead of a variable
d
with a comment, use a meaningful variable name likedaysSinceCreation
Avoid disinformation
- don't use names that already have entrenched meanings which are different from the intended meaning
- don't use cryptic names that look like something else
- e.g.
l
(lower case "L" looks like an Arabic numeral "1")
- e.g.
Make meaningful distinctions
- don't make a minor name change to satisfy a compiler requirement
- e.g. 2 different things in the same scope that you think warrant the same name; you just misspell one of them
- later, a misspelling correction could cause a compiler error
- instead, these 2 different things should have 2 different names
- e.g. 2 different things in the same scope that you think warrant the same name; you just misspell one of them
- don't use number series naming!!!
- e.g.
a1
,a2
,a3
, ...
- e.g.
- don't use "noise words"
- e.g. a
Product
class that hasProcuctInfo
andProcuctData
properties- "Info" and "Data" are meaningless noise words
- e.g. a
Use pronounceable words
compare
class DtaRcrd102
{
private Date genymdhms;
private Date modymdhms;
private final String pszqint = "102";
}
to
class Customer
{
private Date generationTimeStamp;
private Date modificationTimeStamp;
private final String recordId = "102";
}
Use searchable names
- single-letter names can ONLY be used as local variables inside short methods
-
The length of a name should correspond to the size of its scope
Avoid encodings
- don't encode type or scope into names
- Hungarian Notation was common due to old language requirements and limitations
- this is no longer necessary and should be avoided
- don't use member prefixes (e.g.
m_description
"m_" is unnecessary clutter and a marker of old code)
Interfaces and Implementations
- scenario: building an abstract factory for the creation of shapes
- the factory will be an interface and will be implemented by a concrete class
- it's common to see
IShapeFactory
- Uncle bob prefers encoding implementation rather than enconding abstraction
- encoding the interface is a distraction at best and TMI at worst
- prefers the user not know they are working with an interface
- just want them to know it's a shape factory
-
if I must encode either the interface or the impementation, I choose the implementation.
- prefer
ShapeFactoryImp
or evenCShapeFactory
overIShapeFactory
- prefer
- my thoughts here:
- isn't there an overarching principle to abstract away and/or hide implementation details?
- and if so, isn't
IShapeFactory
more aligned with that principle thanShapeFactoryImp
orCShapeFactory
??
Avoid mental mapping
- common to see variable names that are neither from the problem domain nor the solution domain
- e.g. single letter variable name that we must mentally map to the concept
- "smart" vs professional programmers
- to professionals, clarity is king
Class names
- a class name should not be a verb
- a class name should be a noun or noun phrase
- e.g. Customer, WikiPage, AddressParser
Method names
- method names should be verb or verb phrases
- e.g. postPayment, deletePage, save
- Accessors, mutators, and predicates
- should be named for their value and prefixed with "get", "set", and "is", according to the javabean standard
string name = employee.getName();
customer.setName("mike");
if(paycheck.isPosted()) // ...
Overloaded constructors
- use static factory methods with names that describe the arguments
Complex fulcrumPoint = Complex.FromRealNumber(23.0);
// is generally better than
Complex fulcrumPoint = new Complex(23.0);
- consider enforcing their use by making the corresponding constructors private
Don't be cute
- say what you mean. mean what you say
- don't use "HolyHandGrenade" instead of "deleteItems" for entertainment value
Pick one word per concept
- be consistent, choose the word for the concept and stick with it
- "fetch", "retrieve", "get" as equivalent methods for different classes is confusing
- "controller", "manager", "driver" in same code base is confusing
Don't pun
- example "add" method
- using "add" for a method that adds 2 items
- and then using "add" when adding an item to a list
- maybe instead use "append" or "insert"
Use solution domain names
- programmers know technical terms and programming terms like pattern names and algorithm names
- e.g. AccountVisitor (visitor pattern)
Use problem domain names
- when solution domain names aren't sufficient, use domain names
- can at least consult a domain expert
- the code that has more to do with the problem domain concepts should have names drawn from the problem domain
Add meaningful context
- names which are meaningful on their own are uncommon
- commonly it's necessary to add context
- use well-named namespaces, classes, and functions to provide context
- in rare cases, prefixes might be necessary
Don't add gratuitous context
- example: "Gas Stataion Deluxe" application
- you have a "GSDAccountAddress"
- later you need another mailing address, do you reuse that specific class?
- the "GSD" is pointless, and worse, makes it less re-usable
- another downside in this scenario, is that searching for GSD in the app pulls up more results than you want due to overusing "GSD"
- shorter names are preferred over longer as long as they are still clear
- "Address" is a good class name and can have instances named "AccountAddress" and "CustomerAddress"
- if you need to distinguish between MAC address, port address, and web address, you might consider "PostalAddress", "MAC", and "URI"
- 1st rule of functions is that they should be small
- 2nd rule of functions is that they should be smaller than that
- should hardly ever be even 20 lines long
- shoot for less than a handful of lines
- and short lines--not taking up the width of the screen
Blocks and indenting
- if and loop blocks should be one line long--a function call
- that function should have a nicely descriptive name--documentation value
- no nesting (e.g.
if
s insideif
s) - all this keeps functions short and readable
Do one thing
- functions should do ONE THING
-
how do we know if a function is doing one thing??
- we ensure that the body of the function remain at ONE LEVEL OF ABSTRACTION lower
- the function's main concept (hopefully communicated by its name) is broken into steps
- those steps should be describable in a "TO" paragraph
- here is an example function followed by it's "TO" paragraph
// this example function from the book is a refactored version of the original
// - two versions of refactoring later actually
public static string renderPageWithSetupsAndTeardowns(PageData pageData, boolean isSuite)
throws Exception
{
if (isTestPage(pageData)) includeSetupAndTeardownPages(pageData, isSuite);
return pageData.getHtml();
}
// TO paragraph:
// TO RenderPageWithSetupsAndTeardowns, we check to see whether the page is a test
// page and if so, we include the setups and teardowns. In either case we render
// the page in HTML.
note: one way to know a function is doing just one thing is if you have a hard time extracting another function out of it who's name isn't merely a restatement of its implementation.
Sections within functions
- a function that can be divided into sections (e.g. declarations, initializations, etc.) is a sign that it's doing more than one thing
- functions that do one thing cannot be reasonably divided into sections
- to make sure that a function is doing just one thing, make sure that the statements within it are all at the same level of abstraction
The Stepdown Rule: Reading code from top to bottom
- the code should read like a top-down narrative
- each function should be followed by those at the next level of abstraction so that we can read the program descending one level of abstraction at a time as we read down the list of functions
- i.e. we want to be able to read the program as though it were a set of TO paragraphs, each of which is describing the current level of abstraction and referencing subsequent TO paragraphs at the next level down
- e.g.
- To print all prime factors under user input n we request user input for n, then we get all prime factors under n, and then we print these prime factors
- to request user input for n ...
- to ...
- to get all prime factors under n ...
- to ...
- to print these prime factors ...
- to ...
- to request user input for n ...
- To print all prime factors under user input n we request user input for n, then we get all prime factors under n, and then we print these prime factors
Switch statements
- these are hard to keep small
- they also are difficult to make do one thing, given that by their nature they do N things
- we can't always avoid switch statements but we can bury them at a low-level and never repeat them
-
general rule for switch statements
-
they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can't see them.
-
Use descriptive names
- don't be afraid to use a long name
- a long descriptive name is better than a long descriptive comment
- don't be afraid to spend time choosing a name, even try several different names and read the code with each
- choosing a descriptive name can help clarify the design
- it's not uncommon that hunting for a good name results in a favorable restructuring of code
- the ideal number of arguments for a function is zero (niladic)
- next is one (monadic), followed closely by 2 (dyadic)
- 3 (triadic) should be avoided where possible
- more than 3 (polyadic) requires very special justification--and then shouldn't be used anyway
why is it so important to minimize # of arguments?
- arguments make it more difficult to understand the function
- increase testing effort the more the arguments, the argument combinations to test for
Common monadic forms
- common
- asking a question about an argument:
bool fileExists("MyFile")
- operating on an argument:
InputStream fileOpen("MyFile")
- asking a question about an argument:
- less common but still useful
- event:
void passwordAttemptFailedNTimes(int attempts)
- system interpret is to interpret this function as an event and use the argument to alter state
- use this form with care, make it clear that this is an event
- event:
- avoid other forms
-
void includeSetupPageInto(StringBuffer pageText)
- using an output arg instead of return value for transformation is confusing
- if a function is going to transform its input arg, the transformation should be the return value
-
Flag arguments
passing a boolean into a function is a truly terrible practice
Dyadic functions
-
Point p = new Point(0,0)
is obviously a valid dyadic function- a Cartesian point naturally takes 2 arguments
- in fact the 2 arguments are ordered components of a single value
-
assertEquals(expected, actual)
- this is obviously fine, but even this has a problem
- what order to pass in these arguments? you just have to learn it
- just be aware that dyadic functions come at a cost
- and be aware of the mechanisms that might be available to convert them to monads
Triads
- always try to avoid if possible
Argument objects
- when a function looks like it should take more than 2-3 args, these args could likely be wrapped in a class
Circle makeCircle(double x, double y, double radius);
Circle makeCircle(Point center, double radius);
- is reducing the # of args by creating objects out of them cheating? NO!
- when groups of variables are passed together (like the above Point for the circle) they're likely part of a concept that deserves a name of its own
Argument lists
- some functions take a variable # of arguments
String.Format("%s worked %.2f hours.", name, hours);
- if the variable arguments are all treated identically as in the above string format, then they are equivalent to a single argument
- thus String.Format() is dyadic
// another way to think of this function
String.Format(String format, Object... args);
- all the same rules apply to that dyad
- same is true for monads, dyads, and triads that take variable arguments
void monad(Integer... args);
void dyad(String name, Integer... args);
void triad(String name, int count, Integer... args);
Verbs and keywords
- monads should be verb/noun pairs
- e.g.
write(name)
or perhapswriteField(name)
is even better due to clarifying what name is
- e.g.
- encoding argument names into function names can be helpful
- e.g.
-
assertEquals(expected, actual)
... what order to we put our arguments? -
assertExpectedEqualsActual(expected, actual)
... this fixes the question of ordering
-
- e.g.
- side effects are unexpected changes to function arguments, instance properties, scope variables, globals, etc.
- these can be very damaging, because the function caller has no idea of or even reason to expect these effects
- example
- an example was given of a function called
checkPassword
which- not only checked the password, but also
- initialized Session if password was correct
- the side affect is initializing Session--the function name did not tell the caller about Session
- fix by changing name to
checkPasswordAndInitializeSession
(... lol, but no, bc SRP)
- an example was given of a function called
Output arguments
- in general, avoid output arguments, if a function must change the state of something, have it change the state of its owning object
Command query separation
-
functions should either do something or answer something, but not both
- negative example:
public boolean set(String attribute, String value);
- leads to odd code like this:
if(set("username", "uncle bob")){...}
- leads to odd code like this:
Prefer exceptions to returning error codes
- returning error codes from command functions is a subtle violation of command query separation
- read this section on page 46 (77 of PDF)
- the whole section is short and fits on this one page
- the section is a clear explanation with a wonderful code illustration/clarification
- interesting note:
- James Coplien seems to have the complete opposite opinion
- source: https://henrikwarne.com/2014/09/04/a-response-to-why-most-unit-testing-is-waste/#comment-5502
- on exceptions
- "In general, I hold exceptions to be a bad idea"
- "Exceptions are just hyper galactic GOTOs"
- "If you’re throwing an exception across more than two activation records, you probably can’t reason about the code"
- the paragraph he wrote that those quotes are from also mentions C++ though
- so idk if his opinion written there is limited to exceptions in C++, but i doubt it??
- on return codes
- "with good programming discipline they [exceptions] can be replaced by return codes ... and handled locally"
- James Coplien seems to have the complete opposite opinion
Extract try/catch blocks
- if you have multiple function calls in a try block, refactor/extract them into a new function, and call just that single function in the try block instead
- easier to read and modify
Error handling is one thing
- functions should do just one thing
- error handling is one thing
- if a function has a try block in it:
- the keyword
try
should be the first word in the function body, and - nothing after the try/catch block should exist in that function body
- the keyword
Error codes are dependency magnets - BAD!!!
- dependency magnets are classes or enums which many other classes must reference
- example: an error code enum ... add a new error code, all modules that reference this must re-compile
- better: exceptions - a new exception inherits from the base Exception class and won't have that issue
Don't repeat yourself
- don't repeat yourself
How to write good functions?
- start ugly, refactor, refactor, refactor
- TDD: don't just start ugly and refactor
- so start ugly but confident due to thorough unit test coverage
- then in the safety of continually passing unit tests, refactor, refactor, refactor
Comments are at best a necessary evil.
Examples of worth while comments
- authorship and copyright at top of file
- others were mentioned but didn't really seem worth noting, bc they were still noted as debatably worth using
Examples of bad comments
- a bunch of these examples, but not worth taking note of
- he especially liked showing examples similar to this:
// this is where we get the name
public string getName() {}
BOTTOM LINE of this chapter: don't use comments almost ever
a source file's vertical formatting
- try to keep it short
- example solutions like fitness and junit keep most files under 100 lines
- vertical openness between concepts - effective use of blank lines to separate concepts
- vertical density - grouping lines of a concept together
- vertical distance
- closely related concepts should be vertically closer to each other
declare variables as close to their usage as possible
- local function variables should be declared at the top of the function
- instance variables should be declared at the top of the class
-
this should not increase the vertical distance of these variables, because in a well-designed class, they are used by many, if not all, of the methods of the class
-
dependent functions
- if one function calls another, they should be vertically close
- caller above callee
conceptual affinity
- code that is closely related to other code should have little vertical distance
- caller/callee is just one example
- another example would be a group of functions that perform variations of a basic task
vertical ordering
- high-level functions first
- low-level details after
horizontal formatting
- according to the distribution of his example solutions
- 20-60 chars represents 40%
- "Hollerith" limit of 80 is "a bit arbitrary"
- 100-120 should be the max
- uncle bob sets his limit at 120
horizontal openness and density
- white space can help to associate or disassociate concepts
- e.g.
myFunction(parm1, param2)
- no space between function name and parenthesis
- space between params bc they are separate things
- e.g.
return (-b + Math.sqrt(determinant)) / (2*a)
andreturn b*b - 4*a*c
- no space in the higher precedence multiplied factors
- space between the lower precedence added terms
horizontal alignment
var numbVar = 1;
var numbVarLonger = 1;
var numbVarRidiculouslyLong = 1;
- e.g. above shows aligning variable assignments
- don't do this
- if you have a long list of declarations, you may need to refactor something(s) out
indentation
- very important
- this helps show the hierarchy of scopes (namespace, class, method, block, etc.)
- don't break these indentations for short statements that normally are expressed in blocks
// prefer this
public class CommentWidget extends TextWidget
{
public static final String REGEXP = "example regex pattern";
public CommentWidget (ParentWidget parent, String text)
{
super(parent, text);
}
public String render() throws Exception
{
return "";
}
}
// over this
public class CommentWidget extends TextWidget
{
public static final String REGEXP = "example regex pattern";
public CommentWidget (ParentWidget parent, String text){super(parent, text);}
public String render() throws Exception{return "";}
}
dummy scopes
- sometimes a for and while loops have dummy scope bodies
- try to avoid these or at least make them very obvious
// prefer this
while (dis.read(buf, 0, readBufferSize) != -1)
;
// over this
while (dis.read(buf, 0, readBufferSize) != -1);
team rules
- decide on formatting rules for the team, and each team member needs to follow these rules
- the entire code base must be consistent
Uncle Bob's formatting rules
- Listing 5-6, pages 91, 92 (pp. 122, 123 of the PDF)
- We keep our variables private, because we don't want anyone else depending on them.
- We want the freedom to change their type or implementation.
- Why do so many programmers automatically add getters and setters to their objects, exposing their private variables as if they were public?
Data abstraction
- it's better to not just add auto-getter/setters
- this does not hide implementation
- use abstraction to hide data implementation while still allowing users to manipulate
example:
// this reveals implementation
public interface Vehicle
{
double getFuelTankCapacityInGallons();
double getGallonsOfGasoline();
}
// this hides implementation
public interface Vehicle
{
double getPercentFuelRemaining();
}
Procedural Shapes
public class Square
{
public Point topLeft;
public double side;
}
public class Rectangle
{
public Point topLeft;
public double height;
public double width;
}
public class Circle
{
public Point center;
public double radius;
}
public class Geometry
{
public final double PI = 3.14;
public double area(Object shape) throws NoSuchShapeException
{
if (shape instanceof Square)
{
Square s = (Square)shape;
return s.side * s.side;
}
else if (shape instanceof Rectangle)
{
Rectangle r = (Rectangle)shape;
return r.height * r.width;
}
else if (shape instanceof Circle)
{
Circle c = (Circle)shape;
return PI * c.radius * c.radius;
}
throw new NoSuchShapeException();
}
}
Polymorphic Shapes
public class Square implements Shape
{
private Point topLeft;
private double side;
public double area()
{
return side*side;
}
}
public class Rectangle implements Shape
{
private Point topLeft;
private double height;
private double width;
public double area()
{
return height * width;
}
}
public class Circle implements Shape
{
private Point center;
private double radius;
public final double PI = 3.14;
public double area()
{
return PI * radius * radius;
}
}
objects vs data structures
- objects hide their data behind abstractions and expose functions that operate on that data
- see polymorphic shapes code above
- the objects have hidden data, and public functions operating on that data
- data structures expose their data and have no meaningful functions
- see procedural shapes code above
- the objects are really just data structures - public data, no functions
the complimentary nature of objects and data structures
- Procedural code makes it easy to add new functions without changing the existing data structures
- see procedural shapes code above
- could easily add a new
perimeter
function to theGeometry
class, and all "shape" data structures would be unaffected
- OO code makes it easy to add new classes without changing existing functions
- see polymorphic shapes code above
- can easily add a new shape class, and the existing shape classes'
area
functions would be unaffected
the compliment is also true:
- Procedural code makes it hard to add new data structures, because all the functions must change
- adding a new "shape" data structure would require updating the
Geometry.area
function (and aperimeter
function, and so on)
- adding a new "shape" data structure would require updating the
- OO code makes it hard to add new functions because all the classes must change
- if we wanted to add a
perimeter
function to shapes, we'd have to add the new function to the super class and implement that in all inheriting classes
- if we wanted to add a
So, the things that are hard for OO are easy for procedures, and the things that are hard for procedures are easy for OO!
In any complex system there are going to be times when we want to add new data types rather than new functions. For these cases objects and OO are most appropriate. On the other hand, there will also be times when we'll want to add new functions as opposed to data types. In that case procedural code and data structures will be more appropriate.
Mature programmers know that the idea that everything is an object is a myth. Sometimes you really do want simple data structures with procedures operating on them.
Law of Demeter (LOD)
- a module should not know about "the innards" of the objects it manipulates
- an object should not expose its internal structure through accessors, because to do so is to expose, rather than to hide, its internal structure
- the Law states that a method
f
of classC
should only call the methods of these:C
- an object created by
f
- an object passed as an argument to
f
- an object held in an instance variable of
C
- the method
f
should not invoke methods on objects that are returned by any of the allowed functions - example possible violation:
final String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();
- "train wreck" ... code like the above is often called a train wreck bc it looks like a bunch of coupled train cars
Options opts = ctxt.getOptions();
File scratchDir = opts.getScratchDir();
final String outputDir = scratchDir.getAbsolutePath();
- the above snippets (train wreck style or the second) are in a function that has a lot of knowledge
- knowledge of options, of a scratch directory, and of an absolute path
- whether the code violates the LOD depends on whether
ctxt
,Options
, andScratchDir
are objects or data structures- if they are objects, then exposing their "innards" is a violation
- if they are data structures, then LOD doesn't apply
- bc data structures naturally expose data and have not functionality
- the use of accessor functions confuses the issue
- LOD would not be a question if the code had been written like this:
final String outputDir = ctxt.options.scratchDir.absolutePath;
- this issue would be a lot less confusing if data structures simply had public variables and no functions, and objects had private variables and public functions
- but there are frameworks and standards (e.g. "beans") that demand that even simple data structures have accessors and mutators
Hybrids
- avoid these
- they are indicative of a muddled design whos authors are unsure or (worse) ignorant of whether they need protection from functions or types
- what are they
- part object, part data structure
- have functions that do significant things
- have public variables or accessors and mutators that basically make the private variables public
- these hybrids are the worst of both worlds
- they make it hard to add new functions AND hard to add new data structures
Hiding structure
- make sure that higher level classes aren't aware of lower level functionality
- make sure that classes aren't breaking the LOD - make sure they're not aware of internals of classes that they shouldn't know the internals of
DTOs
- a quintessential data structure, a class with public variables and no functions
- very useful, for I/O communication (e.g. database, sockets, etc.)
- often the first in a series of translation stages converting raw data in a database into objects in the application code
- the "bean"
- somewhat more common (at least in Java i assume--my thought)
- have private variables manipulated by getters/setters
-
the "quasi-encapsulation of beans seems to make some OO purists feel better but usually provides no other benefit
- active record
- a specific form of DTO
- in addition to the public (or publicly accessible) variables, they have navigational methods
- e.g.
save()
andfind()
- e.g.
- typically direct translations from database tables or other data sources
- developers often treat these as objects by putting business rules in them - don't
- instead, use an object that hides it's internals, and have these internals be instances of these active records (this is an example approach, there are other ways)
conclusion
- objects expose behavior and hide data
- makes it easy to add new kinds of objects without changing existing behaviors
- makes it hard to add new behaviors to existing objects
- data structures expose data and have no significant behavior
- makes it easy to add new behaviors to existing data structures
- (my thought: idk if "better" wording here might be:)
- ("makes it easy to add new behaviors that operate on existing data structures" or)
- ("makes it easy to add new functions that operate on existing data structures")
- (my thought: idk if "better" wording here might be:)
- makes it hard to add new data structures to existing functions
- makes it easy to add new behaviors to existing data structures
- many code bases are dominated by error handling
- error handling is important, but if it obscures logic, it's wrong
Use exceptions rather than return codes
- return codes are used in languages that don't have exceptions
- these languages are still around but if you have exceptions, you should use them
- problems with working with flags or return codes
- clutter the caller
- require that the caller check for errors immediately after the call
- exceptions allow cleaner code with logic unobscured by error handling
without exceptions
public class DeviceController
{
...
public void sendShutDown()
{
DeviceHandle handle = getHandle(DEV1);
// Check the state of the device
if (handle != DeviceHandle.INVALID)
{
// Save the device status to the record field
retrieveDeviceRecord(handle);
// if not suspended, shut down
if (record.getStatus() != DEVICE_SUSPENDED)
{
pauseDevice(handle);
clearDeviceWorkQueue(handle);
closeDevice(handle);
}
else
{
logger.log("Device suspended. Unable to shut down");
}
}
else
{
logger.log("Invalid handle for: " + DEV1.toString());
}
}
...
}
with exceptions
public class DeviceController
{
...
public void sendShutDown()
{
try
{
tryToShutDown();
}
catch (DeviceShutDownError e)
{
logger.log(e);
}
}
private void tryToShutDown() throws DeviceShutDownError
{
DeviceHandle handle = getHandle(DEV1);
DeviceRecord record = retrieveDeviceRecord(handle);
pauseDevice(handle);
clearDeviceWorkQueue(handle);
closeDevice(handle);
}
private DeviceHandle getHandle(DeviceID id)
{
...
throw new DeviceShutDownError("Invalid handle for: " + id.toString());
...
}
...
}
- the code with exceptions untangled the 2 concerns - 1. algorithm for device shutdown 2. error handling
Write your try-catch-finally
statement first
t-c-f => try-catch-finally statement
- one of the most interesting things about exceptions is that they define a scope within your program
- when you execute code in the try block of a t-c-f you are stating that execution can abort at any point and then resume in the catch
- a t-c-f is, in a way, like a transaction - the catch block must leave the program in a consistent state regardless of what happens in the try
- good practice to start with a t-c-f whenever writing code that could throw exceptions
- this helps define what users of the code should expect if something goes wrong
Use unchecked exceptions
- java has "checked exceptions" where a method's signature includes a list of exception types it can pass to its caller
- examples of languages that have no such thing: C#, C++, Python, Ruby
- robust systems can be created without checked exceptions - so does there benefit justify their price?
- the price: violating the OCP
- if you throw a checked exception who's catch is 3 methods above, the signatures of the 2 methods under the catching method must declare that exception type
- this means changes to low level methods' signatures for checked methods will force signature changes to higher level methods
- worth the price? basically no (but it was mentioned maybe for a critical library)
- the price: violating the OCP
Provide context with exceptions
- create informative error messages and pass them along with your exceptions
- mention the operation that failed and the type of failure
Define exception classes in terms of a caller's needs
- many ways to classify errors
- by source - what component did they come from
- by type - device failures, network failures, programming errors, etc.
- when defining exception classes in an app, most important concern should be how they are caught
poor exception classification
ACMEPort port = new ACMEPort(12);
try
{
port.open();
}
catch (DeviceResponseException e)
{
reportPortError(e);
logger.log("Device response exception", e);
}
catch (ATM1212UnlockedException e)
{
reportPortError(e);
logger.log("Unlock exception", e);
}
catch (GMXError e)
{
reportPortError(e);
logger.log("Device response exception", e);
}
finally
{
...
}
refactored
LocalPort port = new LocalPort(12);
try
{
port.open();
}
catch (PortDeviceFailure e)
{
reportError(e);
logger.log(e.getMessage(), e);
}
finally
{
...
}
public class LocalPort
{
private ACMEPort innerPort;
public LocalPort(int portNumber)
{
innerPort = new ACMEPort(portNumber);
}
public void open()
{
try
{
innerPort.open();
}
catch (DeviceResponseException e)
{
throw new PortDeviceFailure(e);
}
catch (ATM1212UnlockedException e)
{
throw new PortDeviceFailure(e);
}
catch (GMXError e)
{
throw new PortDeviceFailure(e);
}
}
}
- why is the refactored version better
- simplifying the code by wrapping the API we're calling and making sure it returns a common exception
- since we are doing basically the same thing for each exception type
benefits of wrapping APIs
- wrapping 3rd-party APIs is considered best practice, here's why
- wrapping it minimizes your dependency on it
- you can choose to use a different library with minimal penalty
- makes it easier to mock
- not tied to the API vendors design choices
- your wrapper is the API you want, and if the vendor changes something, you have minimal changes to make
define the normal flow
- following the above discussed exception handling practices pushes error detection to the edges of the program resulting in less cluttered code
- wrap APIs to throw your own exceptions
- handler above your code to handle aborted computations
- this is good but sometimes you may not want to abort a computation
awkward code
try
{
MealExpenses expenses = expenseReportDAO.getMeals(employee.getID());
m_total += expenses.getTotal();
}
catch (MealExpensesNotFound e)
{
m_total += getMealPerDiem();
}
- in this business, if meals are expensed, they're part of the total, otherwise, a per diem is expensed for the employee
- the exception handling is cluttering the logic
- would rather the code simply look like this:
MealExpenses expenses = expenseReportDAO.getMeals(employee.getID());
m_total += expenses.getTotal();
- we can make it that simple
- change the
ExpenseReportDAO
to always return aMealExpenses
object - if there are no meal expenses, it returns a
MealExpenses
object that returns the per diem as its total
public class PerDiemMealExpenses implements MealExpenses
{
public it getTotal()
{
// return the default per diem
}
}
- this is the "Special Case Pattern" (book references Fowler)
- create a class or configure an object to handle a special case
- doing so relieves client code of the burden of dealing with exceptional behavior
- this is an error handling chapter, so we will discuss things we do that invite errors
Don't return null
- it's very common to see lines of code all over a program checking for null
- this is bad!
- returning null is creating problems for yourself - for the caller of what returns null
-
if you are tempted to return null from a method, consider throwing an exception or returning a special case object instead
- similarly with 3rd party APIs, wrap an API method that returns null in your own method that either returns a special case object or throws
- an example: a method that returns a list that can return null
- instead of checking for null returned from this method, make the method return an empty list instead of null
- now you don't have to check for null before iterating the list, just iterate it
Don't pass null
- avoid passing null into methods whenever possible - this is even worse than returning null
- working with an API method that expects you to pass null is an example where you can't avoid this
- in most programming languages, there isn't a good way to handle null passed in to a method
- you can check for null and have an exception handler for
InvalidArgumentException
, but what should the handler do?
- you can check for null and have an exception handler for
- just avoid passing null into methods
- we seldom control all the software in our system
- we may buy 3rd-party packages, use open source, depend on components from other teams, etc.
- we have to integrate the foreign code with our own
- we have to keep the boundaries of our software clean
Using 3rd-Party code
- there's a natural tension between an interface provider and the user of that interface
- interface providers strive for broad applicability, users want the interface to focus on their needs
- this tension can lead to problems at system boundaries
- example:
java.util.Map
- what if we want to insure that no one deletes items in a map we want to pass around?
- there's a
clear
function in theMap
's interface
- there's a
- what if we want to limit the types of objects put in a map?
-
Map
is generic
-
- what if we want to insure that no one deletes items in a map we want to pass around?
// say we have an app that needs a Map of Sensors
Map sensors = new HashMap();
// now some other part of the code needs to access sensor
// - we notice that this tends to be all over the place in the application
// - the client is carrying the responsibility of casting
Sensor s = (Sensor)sensors.get(sensorId);
// let's improve the readability of the code
Map<Sensor> sensors = new HashMap<Sensor>();
// now the client no longer needs to handle casting in all those places in the code
Sensor s = sensors.get(sensorId);
// Map<Sensor> is still providing more capability than we need or want
// And if the Map interface ever changes, we may have many places in the code to change
// ...
// This class could be an good approach
// this can evolve with very little impact on the rest of the app
// the use of generics is no longer a big issue, bc casting is handled in the class
// this interface is tailored to the needs of the app
// the code is easier to read, and harder to misuse
// this class can enforce design and business rules
public class Sensors
{
private Map sensors = new HashMap();
public Sensors getById(String id)
{
return (Sensor)sensors.get(id);
}
// snip
}
// use
Sensors sensors = new Sensors();
Sensor s = sensors.getById(sensorId);
- this is not a suggestion to always encapsulate the use of Map in this form
- this is advice that you NOT pass Maps (or any other interface at the boundary) around your system
- if you use a boundary interface like Map, keep it inside the class, or close family of classes, where it's used
- avoid returning Map from, or accepting Map as an argument to, public APIs
Exploring and learning boundaries
- learning and integrating a 3rd-party API can take a lot of time and effort
- common to spend long debugging sessions trying to figure out how the 3rd-party code works and determining whether an exception is being thrown by a bug in your code or by code that is simply using the API wrong
- using "learning tests" can help
- create tests that focus on what you want out of the API
- these are essentially controlled experiments that check your understanding of the API
- learning tests not only help us learn the API, they verify that the package works as we expect
- if an update comes later, our learning tests show whether the package still works as we expect
- these learning tests can serve as "boundary tests"
- it's always good to support a clean boundary with a set of outbound tests that exercise the interface in a way that production code does
- boundary tests helps us migrate to newer versions rather than feeling like we have to stay with an old version longer than we maybe should
Using code that does not yet exist
- there's another kind of boundary, one that separates the known from the unknown
- example:
- sometimes you have to write code for an application that relies on 3rd-party code for which you don't know the interface yet
- how do you start writing you code in this scenario??
- start working far away from the boundary
- you'll find yourself getting close to that boundary and even bumping up against it
- as you do this, you'll learn more about what you WANT the interface to do for you
- define the interface you WISH you had
- you can mock the API in tests
- once the real interface is ready
- create an Adapter to bridge the gap between your application interface and the other interface
- this Adapter encapsulates the interaction with the API and provides a single place to change when the API changes
- you can create boundary tests once you have the API
- create an Adapter to bridge the gap between your application interface and the other interface
principles
- design to accommodate change without huge investments and rework
- avoid letting to much of your code know about external APIs
- have clear separation and tests that define expectations
- 3 laws of TDD given
- writing tests this way results in a bulk of test code that can present a daunting management problem
Keeping tests clean
- dirty test code is as bad as--or worse than--no test code
- test code has to change as production code changes, and the dirtier the code the greater the maintenance burden
- with dirty test code, tests will become viewed as an ever-increasing liability
- test code is just as important as production code
Tests enable the -ilities
-
unit tests are what keep your code flexible, maintainable, and reusable
- if you have unit tests, you do not fear making changes to the code!
- unit tests allow you to make changes with confidence
- confidence that you have not broken anything
- this means you can make refactor and improve the architecture and design without fear
- tests enable all the -ilities, bc tests enable change
Clean tests
- what makes a clean test? readability
- you want to say a lot with as few expressions as possible
Domain-specific testing language
- design a testing API that helps the tests be simple and readable
- this testing API is not designed up-front, but evolves over time as a result of continually refactoring the tests
Dual standard
- this dual standard is related to things like efficiency in memory and CPU
- unit tests don't have to be as efficient as prod code
- (but tests MUST still be clean and expressive)
One assert per test
- this helps readability and is a good guideline
- but doesn't necessarily always have to be followed
- definitely minimize the number of asserts
Single concept per test
- test a single concept in each test function
- this is probably a better rule than the one-assert-per-test rule
F.I.R.S.T.
- Fast - tests should run fast
- Independent - tests should not depend on each other
- Repeatable - tests should be repeatable in any environment
-
you should be able to run the tests in the production environment, in the QA environment, and on your laptop while riding home on the train without a network
-
- Self-Validating - tests should have a boolean output (pass or fail)
- you shouldn't have read a log file or compare output files
- Timely - tests should be written in a timely fashion
- tests should be written ahead of the production code that makes them pass
class organization
- standard Java convention says ...
- a class should begin with a list of variables
- public static constants, then
- private static variables, then
- private instance variables
- (there is seldom a good reason to have a public variable)
- public functions should follow the variables
- put the private functions called by a public function immediately after than public function
Encapsulation (emphasis in this quite is mine)
We like to keep our variables and utility functions private, but we're not fanatic about it. Sometimes we need to make a variable or utility function protected so that it can be accessed by a test. For us, tests rule. If a test in the same package needs to call a function or access a variable, we'll make it protected or package scope. However, we'll first look for a way to maintain privacy. Loosening encapsulation is always the last resort.
p. 136 (p. 167 of PDF)
- my thoughts from reading this - and from other things he has said
-
uncle bob apparently sometimes puts unit tests in the same package as the production code they're testing??
-
protected
, in Java, means accessible in the containing type and derived types, and in the containing package -
packages
, in Java, are basically Namespaces- source: Oracle here - https://docs.oracle.com/javase/tutorial/java/concepts/package.html
- source: Jon Skeet here - https://stackoverflow.com/a/25160624/9533368
- i assume that this choice is not common?
- the last sentence in that paragraph emphasizes that "loosening of encapsulation is always a last resort"
- so that may mean that even if they have a test in the same package, it likely won't affect the SUT's encapsulation
- it also could maybe indicate that it's not very common for him to put unit tests in the same package? (but this is a guess/assumption)
-
- on our team, we never put tests in the same project as our production code
- we have separate designated test projects where we put all tests
-
uncle bob apparently sometimes puts unit tests in the same package as the production code they're testing??
Classes should be small!
- how small?
-
measure in # of responsibilities - 1
- with functions, we measured in # of lines
- a class name should describe its responsibility
- it should be concise and unambiguous
Single Responsibility Principle (SRP)
- a class or module should have one, and only one, reason to change
- many developers fear that a large number of small, single-purpose classes makes it more difficult to understand the bigger picture
- concerned about having to navigate from class to class to figure out how a larger piece of work gets accomplished
- just as much to learn about the app regardless of organization
- analogy of a toolbox:
- you want everything thrown in a couple drawers?
- or everything organized into smaller, labeled drawers?
- analogy of a toolbox:
Cohesion
- classes should have a small # of instance variables
- each method of a class should manipulate one or more of these instance variables
- in general, the more instance variables a method manipulates, the more cohesive that method is to its class
- a class in which each variable is used by each method is maximally cohesive
- in general it is neither advisable nor possible to create such maximally cohesive classes
- but we do want high cohesion
- having high cohesion means that the methods and variables of the class make logical sense to be together
- the strategy of keeping functions small and parameter lists short can sometimes lead to a proliferation of instance variables that are used by a subset of methods
- when this happens, it normally means you can split it into smaller classes
Maintaining cohesion results in many small classes
- example given of an original code version and a refactored version
- original version was one long method
- new version was many small classes and functions
- more readable, understandable, maintainable
Organizing for change
- any time that you have to modify a class, you are introducing a risk that the system no longer works as expected
- open/closed principle was alluded to here
- what to do when we spot a class that seems to violate SRP
- we should primarily consider system change over "fixing" a violation of SRP
- if we have no current reason to change a class that violates SRP and no need for change in the forseeable future, then we leave the class alone
- introducing the change at all is still introducing risk
- if we do have a reason to change a class that violates SRP, it's time to strongly consider refactoring
- now introducing the change is happening regardless, so we want to change it the right way
Isolating from change
- client classes of concrete objects depend on those concrete objects' concrete details
- these client classes are at risk when those concrete details change
- minimize the impact of these changes with interfaces and abstract classes
- DIP (dependency inversion principle) says that our classes should depend on abstractions, not concrete details
Separate constructing a system from using it
Software systems should separate the startup process, when the application objects are constructed and the dependencies are "wired" together, from the runtime logic that takes over after startup.
- The startup process is a concern that the app must address
- the separation of concerns is important in software design
- an example of NOT separating the startup concern:
public Service getService()
{
if(service = null)
service = new MyServiceImpl(...); // Good enough default for most cases?
return service;
}
- that's the lazy initialization/evaluation idiom
- merits:
- you only incur construction overhead if you use the object
- this can result in faster startup times
- you ensure null isn't returned
- cons:
- hard-coded dependency on
MyServiceImpl
and everything its constructor requires - you can't even compile without resolving these dependencies--even if you never use it
- testing can be a problem especially if
MyServiceImpl
is a heavy object - construction logic is mixed in with normal runtime processing
- breaking SRP
- requires testing for more execution paths
- hard-coded dependency on
- merits:
- one occurrence of lazy initialization isn't a serious problem
- but too many occurrences of this can hurt modularity and cause duplication
- we should have a global, consistent strategy for resolving major dependencies
Separation of Main
One way to separate construction from use is simply to move all aspects of construction to
main
, or modules called bymain
, and to design the rest of the system assuming that all objects have been constructed and wired up appropriately.
- the flow of control is easy to follow
- all objects the system needs are constructed in main
- all dependency arrows go the same way crossing the barrier between main and the app
- they all point away from main
- the app doesn't know about main, it just expects that everything it needs has been built
Factories
- sometimes the app needs to control when objects are created
- Abstract Factory pattern gives the app control of when but hides the details of how
Dependency Injection
- DI is a mechanism for separating construction from use
- DI is the application of IoC to dependency management
- IoC moves secondary responsibilities from an object to other objects dedicated to those purposes
- an object should not instantiate its dependencies
- this should be handled by another "authoritative" mechanism - inverting control
- setup is a global concern, so the authoritative mechanism should be "main" routine or a special-purpose container
- example of "partial" DI:
MyService myService = (MyService)(jndiContext.lookup("NameOfMyService"));
- this is "partial" DI, bc:
- the invoking object does not control what type of object is returned
- but, the invoking object actively resolves the dependency
- True DI goes a step further
- the class takes no direct steps to resolve the dependency, it's passive
- it provides a setter method or constructor argument that are used to inject the D
- example of True DI
public MyInvokingObject(MyService myService){...}
Scaling Up
- it is a myth that we can get systems "right the first time"
- scale at the code level
- implement what you need today, then refactor and expand tomorrow
- scale at system level
- thinking in comparison to the approach at the code level
- doesn't system architecture require preplanning?
- certainly system architecture can't grow incrementally from simple to complex?
- the ephemeral nature of software systems allows incremental growth
Software systems are unique compared to physical systems. Their architectures can grow incrementally, if we maintain the proper separation of concerns.
- thinking in comparison to the approach at the code level
Cross-Cutting Concerns
- example, persistence concerns
- even though we want to achieve modularity - separation of concerns, implementations of persistency strategies must be spread across many many objects
- AOP (Aspect-Oriented Programming)
- an approach intended to restore modularity for cross-cutting concerns
- modular constructs called "aspects" specify which points in the system should have their behavior modified in some consistent way to support a particular concern
- this specification is done using a succinct declarative or programmatic mechanism
- in AOP, with persistence, you would declare which objects and attributes (or patterns thereof) should be persisted and then delegate the persistence tasks to your persistence framework
- AOP Frameworks
- Spring AOP and JBoss AOP are examples of Pure Java AOP frameworks
- these tools automate the implementation of complicated proxy boilerplate (see Java Proxies section on p. 161, PDF p. 192)
- this allows the programmer to focus on their domain and implement business logic in POJOs
- the frameworks are incorporated via declarative configuration files or APIs
- this enables defining more consistent system-wide behavior
- the frameworks create nested
decorator
objects around the POJO
BDUF (big design up front) - bad
- this is not only unnecessary, it's potentially harmful due to inhibiting adaptation to change
- psychologically difficult to discard work that resulted from a lot of previous effort
- architectural choices influence subsequent thinking about the design
- building architects have to BDUF bc radical architectural changes aren't economically feasible after significant construction
- it can be economically feasible for software architects to make radical changes after significant development if their application's structure has effectively separated concerns
Simple Design up front - good
- a "naively simple" but nicely decoupled architecture allows for quicker delivery and future scalability
Optimize decision making
- delay decisions as long as possible to ensure making the most informed choices
Use standards wisely - when they add demonstrable value
- there are many mature standards and standards-based tools available, not all situations call for them though
- i thought of WCF in this brief section
A design is "simple" if it: (Kent Beck's 4 rules of simple design - in order of importance)
- Runs all the tests
- Contains no duplication
- Expresses the intent of the programmer
- Minimizes the number of classes and methods
And, couched in TDD
-
Runs all the tests
- this enables refactoring safely
-
Refactoring: Contains no duplication
- not only exact duplicate lines, but also this type of example:
int size() {} boolean isEmpty() {} // can eliminate duplication by something like this: // - calling size rather than repeating what size does and checking the result boolean isEmpty() { return 0 == size(); }
-
Refactoring: Expresses the intent of the programmer
-
Refactoring: Minimizes the number of classes and methods
Objects are abstractions of processes. Threads are abstractions of schedule.
James O. Coplien
Concurrency is hard.
Concurrency is a decoupling strategy
- it helps us decouple what gets done from when it gets done
- a single-threaded app has highly coupled what and when
- can determine app's state by just looking the stack backtrace
- can set breakpoints and know the state by the breakpoint that's hit
- decoupling what from when can dramatically improve both throughput and structures of an app
- structurally, this kind of app can look like a bunch of tiny computers collaborating rather than one main loop
- this offers powerful ways to separate concerns
- this can make the system easier to understand
example scenario where concurrency could improve a system
- info aggregator that retrieves data from websites and reports a daily summary
- a lot of network I/O waiting - the websites list grows, combing all websites can take too long
- a system handles one user at-a-time and requires 1 second per user
- fine for low number of users, but what about if users have to get in line of 150 users
- a system that interprets large data sets but can only give complete solution after processing all sets
- perhaps each set should be processed by a different machine for parallel processing
Myths and misconceptions
- concurrency always improves performance
- it can sometimes improve performance, but only when there is a lot of wait time that can be shared between multiple threads or multiple processors - neither situation is trivial
- design does not change when designing concurrent programs
- the design of a concurrent algorithm can be remarkably different from a single-threaded
- the decoupling of what from when usually has a huge effect on the structure of the system
- understanding concurrency issues is not important when working with a container such as a Web or EJB container
- you should know about concurrent update and deadlock (described later in book)
Some more balanced statements regarding concurrency
- concurrency incurs some overhead, both in performance as well as writing additional code
- correct concurrency is complex, even for simple problems
- concurrency bugs aren't usually repeatable, so they are often ignored as one-offs instead of the true defects they are
- concurrency often requires a fundamental change in design strategy
consider class X
public class X
{
private int lastIdUsed;
public int getNextId()
{
return ++lastIdUsed;
}
}
- say we
- create an instance of
X
, - then set the
lastIdUsed
field to 42, - and then share the instance between 2 threads
- create an instance of
- now suppose both threads call
getNextId()
- there are 3 possible outcomes-
thread 1
gets value43
;thread 2
gets value44
;lastIdUsed
is44
-
thread 1
gets value44
;thread 2
gets value43
;lastIdUsed
is44
-
thread 1
gets value43
;thread 2
gets value43
;lastIdUsed
is43
-
- to understand how that 3rd result is possible, you need to understand the JIT compiler and Java memory model
- in short, there are 1,000's of possible paths for the threads to take within that
getNextId
method - most of the paths produce the right result - the problem is that some of those paths produce the wrong result
- in short, there are 1,000's of possible paths for the threads to take within that
Principles and techniques for defending your systems from problems of concurrent code
SRP
- Concurrency is complex enough to be a reason to change in its own right, and therefore should be separated from the rest of the code.
- unfortunately it is common for concurrency implementation details to be embedded directly into other production code
- consider these things
- concurrency-related code has its own life cycle of development, change and tuning
- concurrency-related code has its own challenges, which are different from and often more difficult than nonconcurrency-related code
- the number of ways in which miswritten concurrency-based code can fail makes it challenging enough without the added burden of surrounding application code
- recommendation: keep your concurrency-related code separate from other code
Corollary: limit the scope of data
- 2 threads modifying a shared object can interfere with each other
- one solution is to use the
synchronized
keyword to protect a critical section in the code that uses the shared object - these critical sections must be minimized:
- you have to remember to protect each of these sections
- you want to avoid duplicating effort to protect multiple sections
- more of these sections makes it more difficult to determine sources of failures
- recommendation: take data encapsulation to heart; severely limit the access of any data that may be shared
Corollary: use copies of data
- try to avoid sharing data all together
- sometimes it's possible to copy objects and treat them as read-only
- sometimes it's possible to copy objects, collect results from multiple threads, and then merge the results in a single thread
- the extra object creation is a possible concern
- experiment to determine if this is an issue
- more than likely, the savings in avoiding the intrinsic lock will make up for additional creation and GC overhead
Corollary: threads should be as independent as possible
- consider writing your threaded code such that each thread is isolated, sharing no data with other threads
- each thread processes one client request with all of its required data coming from an unshared source and stored as local variables
- it's common to eventually run into shared resources (e.g. db connections) despite this effort
- recommendation: attempt to partition data into independent subsets that can be operated on by independent threads, possibly in different processors
Know your library
- mention of java-specific libraries that are designed for use in concurrency
Thread-safe collections
- recommendation: review the classes available to you. in the case of java, become familiar with (list of specific java packages)
Know your execution models
key definitions
- bound resources: resources of a fixed size or number used in a concurrent environment (e.g. db connectiosn, fixed-size r/w buffers)
- mutual exclusion: only one thread can access shared data or a shared resource at a time
- starvation: one thread or a group of threads is prohibited from proceeding for an excessively long time or forever (e.g. always letting fast-running threads first could mean long-running threads never run if there's no end to fast-running threads)
- deadlock: two or more threads waiting for each other to finish--each thread has a resource that the other requires and neither can finish until it gets the other resource
- livelock: threads in lockstep, each trying to do work but finding another "in the way." due to resonance, threads continue trying to make progress but are unable to for an excessively long time or forever
Producer-consumer
- what it is
- producer thread(s) create work and place on queue (bound resource) and signal that the queue is no longer empty
- consumer thread(s) get work from queue and complete it and signal that the queue is no longer full
- both potentially wait to be notified that they can continue
- personal note: this sounds like what the filesystemwatcher library does
- the one that i found online and modified
Readers-writers
- this is about finding a balance between the needs of writer's vs those of readers
- writers tend to block many readers, you could require the writer to wait until there are no readers
- this emphasizes throughput at the expense of potentially starving writers
- prioritizing writers too much can hurt throughput by blocking readers for a long period of time
- writers tend to block many readers, you could require the writer to wait until there are no readers
- the challenge is finding the right balance and avoiding concurrent update issues
Dining philosophers
- a metaphor for threads competing for resources
- the philosophers--sitting at a circular table with single-forks between them--think util hungry and require a fork-per-hand to eat
- this can result in deadlock, livelock, throughput, and efficiency degradation
- recommendation: learn the basic algorithms and understand their solutions
Beware dependencies between synchronized methods
- recommendation: avoid using more than one method on a shared object
- sometimes you just need to do this, 3 ways to handle correctly:
- client-based locking
- client locks server before calling 1st method and releases after last method returns
- server-based locking
- server has method (which client calls) that locks server, calls all methods, then unlocks
- adapted server
- intermediary performs locking; this is server-based locking where original server can't be changed
- client-based locking
Keep synchronized sections small
- the synchronized keyword creates a lock to ensure a section of code only ever has 1 thread executing
- locks create delays, add overhead, can result in contention and degraded performance
- so use judiciously, and keep the sections small
- recommendation: keep your synchronized sections as small as possible
Writing correct shut-down code is hard
- writing graceful shutdown code is hard
- common issues involve deadlock, with threads waiting for a signal to continue that never comes
- e.g. parent thread spawns child threads, then waits for all to finish before shutting down
- what if a child thread is deadlocked? the system will never shut down
- e.g. similar situation but system has been instructed to shutdown
- parent tells children to abandon work and finish--what if there was a pair of children working as producer/condsumer, the producer shuts down, and the consumer gets stuck waiting forever for a signal it expected from the producer
- this would prevent the parent thread from finishing and the system would not shut down
- parent tells children to abandon work and finish--what if there was a pair of children working as producer/condsumer, the producer shuts down, and the consumer gets stuck waiting forever for a signal it expected from the producer
- recommendation: think about shut-down early and get it working early. it's going to take longer than you expect. review existing algorithms, bc this is probably harder than you think
Testing threaded code
-
recommendations:
- Treat spurious failures as candidate threading issues
- do not ignore system failures as one-offs
- Get your nonthreaded code working first
- don't chase nonthreading and threading bugs simultaneously
- Make your threaded code pluggable
- make it runnable with one, several, and varied # of threads during execution
- make it runnable with test doubles and test it with test doubles that are fast/slow/varied
- test to run a number of iterations
- Make your threaded code tuneable
- early on, find ways to test under many different configurations, and ensure the # of threads can be easily tuned
- consider allowing that # to change while the system is running
- consider self-tuning based on throughput and system utilization
- Run with more threads than processors
- running with more threads than processors forces task swapping
- more task swapping will cause defective code to the surface (like a chaos monkey scenario)
- Run on different platforms
- run your code on all target platforms early and often
- Instrument your code to try and force failures
- Hand-coded
- inserting statements like
Thread.yield()
,wait()
, etc. - this approach isn't great
- odds still aren't with you that you choose the right statement or location to insert it
- might accidentally leave this code in prod
- inserting statements like
- Automated
- use jiggling strategies to ferret out errors
- this is a unit testing-based approach
- the simplistic suggestion to create the below example class with static method used to instrument the code with "jiggle points"
- inserting
ThreadJigglePoint.jiggle();
in a number of places - then write one implementation that simply does nothing for prod
- then write another implementation that randomly chooses between different thread jiggling statements (e.g.
Thread.yield()
,wait()
, etc.) or nothing - then have testing automatically run this a 1,000 times
- then you either find an error or at least demonstrate due diligence
- Hand-coded
- Treat spurious failures as candidate threading issues
public class ThreadJigglePoint
{
public static void jiggle(){}
}
Incrementalism
- big changes bad
- it's tempting to make massive changes to a program structure to "improve" it
- it's very hard to get a program working the same way it worked before the "improvement"
- incremental changes good
- use TDD to help
- make the small changes, run the tests to ensure everything still works
- always keep the system running!
- refactoring is like solving a Rubik's cube
Refactoring can take a long time, will be iterative, can involve reversals of previous changes before being done.
SerialDate
is a class within a Java package.
Unit Tests
- existing test suite covered about 50%
- Before beginning to refactor this class, he created a suite of tests to provide full coverage of the class's functionality
- These tests did not all pass from the start--they defined the behavior he thought the class should have
- New test suite started at 92% coverage (with some tests commented out to start)
- run tests after every change to keep it working as you refactor
A helpful summary chapter!
This was more or less a summarizing of previous points in the book under a helpful set of categories.
Concurrency is really hard.
This is just reference code for an earlier chapter in the book.
Reference for earlier chapter in the book.