Code conventions and style - leondutoit/data-centric-programming GitHub Wiki

When I first started working in an engineering company - as a data scientist doing mostly infrastructure work for the first year - I knew very little about software engineering practices. I was able to code up my data analyses in executables, and understood many of the core concepts of general purpose programming, but I was surprised to encounter such strict coding style rules and conventions. I had just never given it much thought.

Anyway, it turns out that Python has an official style guide called the PEP 8 Style Guide for Python Code - which is more a strong recommendation than anything else. Read it and be wiser.
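
To give a flavour of what PEP 8 asks for, here is a minimal sketch of its main naming and layout conventions. The names (`MAX_RETRIES`, `DataLoader`, `describe`) are made up for illustration and do not come from any real library; the guide itself covers far more than this.

```python
"""A few PEP 8 conventions side by side (all names are hypothetical)."""

# Constants: UPPER_CASE_WITH_UNDERSCORES
MAX_RETRIES = 3


# Classes: CapWords
class DataLoader:
    """Hypothetical class used only to illustrate naming."""

    def __init__(self, file_path):
        # Variables and attributes: lowercase_with_underscores
        self.file_path = file_path

    # Functions and methods: lowercase_with_underscores
    def describe(self):
        # Indent with 4 spaces and keep lines to 79 characters or fewer
        return f"loader for {self.file_path}"
```

Calling `DataLoader("data.csv").describe()` returns the string `"loader for data.csv"`.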

While there is no "official" equivalent in the R world, a common reference point is the Google R Style Guide. Hadley Wickham also discusses some helpful style conventions in his Advanced R book.

Style guides and coding conventions are meant to be helpful, not weird and arbitrary. It turns out that code has many more readers than authors most of the time. It is also very uncommon for code to have just a single author. And even when a specific piece of code is written by one person, most professionals have it reviewed by others before merging it into the existing code base. This can only be truly appreciated once you start to read others' code on a daily basis.

All of this implies that the author of the code must strive for legibility and consistency. And this is where a common standard for code style is incredibly helpful. Wherever you work and with whomever you collaborate on code, my advice would be to agree on a style guide and to stick with it :)

Naming things

Things in programs have names. Modules, classes, functions, variables, tests, parameters - all of them have names. Those names should help the reader and the writer of the code to make sense of what is going on. To give a concrete example, consider these two code snippets, both of which do exactly the same thing:

With bad names:


def a(i):
    x = sum(i) / len(i)
    return x

def morea(*args):
    return list(map(a, args))

With good names:


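def list_average(input_list):
    # Arithmetic mean of a non-empty list of numbers
    ave = sum(input_list) / len(input_list)
    return ave

def multiple_list_averages(*args):
    # Each positional argument is one list of numbers
    input_lists = args
    averages = list(map(list_average, input_lists))
    return averages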

In the second case the function names, variables and parameters carry intentional descriptive content. This makes for much easier reading and understanding. In general, one should strive to give useful names to everything - your readers (including your future self) will be thankful for it.
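
Good names pay off at the call site too. The sketch below is one possible way to write and use the well-named functions in Python 3; the sample data (`exam_scores`, `quiz_scores`) is invented purely for illustration, and the list comprehension is just my preference over `map` here.

```python
def list_average(input_list):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(input_list) / len(input_list)


def multiple_list_averages(*input_lists):
    """Return one average per list passed in, in the same order."""
    return [list_average(input_list) for input_list in input_lists]


# Hypothetical sample data
exam_scores = [70, 80, 90]
quiz_scores = [1, 2, 3, 4]

averages = multiple_list_averages(exam_scores, quiz_scores)
print(averages)  # [80.0, 2.5]
```

A reader who has never seen the function definitions can still guess what the last two lines do, which is the whole point.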