review thoughts parsl - benclifford/text GitHub Wiki

These are some rambling thoughts and personal opinions that have come up while doing code reviews in Parsl. (they are not the official policy of the parsl project)

Code structure

When writing an if statement, always use an else unless there's a good reason that you can justify - if you believe it's an impossible situation that will never be reached, make an else block that raises parsl.errors.InternalConsistencyError with text describing why you claim this situation will never be reached. (this is similar in concept to putting in assert statements for "impossible" situations)

Don't use argument defaults in a function definition when a parameter is always specified at the functions call sites: defaults should usually be for user facing code where there is a clear meaning to a default value. Providing defaults that aren't ever used are noise and a potential source of bugs (when a default is accidentally used...)

With functions with several arguments use named keyword arguments, rather than positional arguments, and use * to enforce their use as keywords. (This combined with not using defaults, to catch missing kwargs - see unused Logan thread code)

Alongside using kwargs: don't put in defaults that are then overridden by other pieces of parsl always (so that the defaults are never used). - this can lead to missing parameters not being discovered automatically because they default - cite a PR bugfix as an example. and can lead to misunderstanding which default value is used by readers of the code.

when you're naming a parameter that switches a feature on and off, avoid negatives: if the previous behaviour was on, make the default value True, rather than calling the option "inhibit_previous_behaviour" with a default of False.

Exceptions

Don't catch exceptions... I know you really want to, but you're probably not in a place to do something with the exceptions. Things you often can do:

  • log and reraise.
  • Pass them on through a different form (eg as a Future set_exception. Htex permanent failure. Networked message delivery).

Don't have a mindset that you should be hiding your errors because your code is not very good and will inevitably crash - you do a disservice to the users later on who wonder why mysterious silent failures happen.

... but liberally raise exceptions: for the same reason that you probably aren't in a good place to catch exceptions, you're probably not in a good place to deal with other errors and you should usually prefer to raise an exception rather than trying to be clever in handling that error.

Beware exceptions falling off the top of threads and tasks - this is a place where exceptions don't "naturally" propagate and you should put some thought into what happens: how can you catch at the very outer layer of your thread and pass the exception on in different form? This especially causes problems in Parsl for hanging bugs that are summarised as "loop that must run forever stops running forever but nobody cares"

Type annotations

Use type annotations.

If a function is user-facing, use typeguard to check the parameters when the user makes a call.

If a call is internal, use mypy to check the call statically (eg at PR merge time).

If you're finding it hard to write a type signature, that might be a sign there's something wrong with your logic - but gradual typing in Python can be hard, so don't over-exert yourself. In that case, it is fine to use # type: ignore, but you should describe in a comment right there why an annotation is not possible.

Constant numbers

Constants that aren't 0 or 1 might be suspicious - why did you choose a 30 second timeout or a 10ms poll period vs other values? Perhaps make them into a named constant value defined somewhere, and put some discussion in the documentation: why did you choose that value? what happens if the value is much bigger? or much smaller? [this is coming from somewhere else that I've read about this - should be a reference to link to]

Abbreviations

Don't use abbreviations except in cases where those abbreviations are already extremely well defined. Just because you think everyone knows an abbreviation you just made up doesn't make it true. Especially when everyone isn't deep in the codebase.

Testing

Tests are not your enemy - don't get into the mindset that you are fighting them.

When a test fails, ask yourself "when/how would a user experience this problem?" - imagine the tests as a user trying to use your code.

Include a test to demonstrate that your feature works! (otherwise how will people reading your code know?)

static type checking with mypy is a form of testing, not an enemy of testing.

consider both unit tests which test the behaviour of small pieces of your code, and broader integration tests which check a whole Parsl run using your feature.

Don't assume someone else has already written the tests that will check you didn't break something - if the test suite passes, it's only the tests that have been written that pass!

Making PRs

Who is the audience? Some people looking at your PRs will be:

  • other developers tracking ongoing parsl development
  • users upgrading parsl and wanting to see what changed (git log is the changelog in parsl)
  • developers weeks or years in the future who are running git bisect to understand why your PR broke something

Titles for PRs - imperative. There should be a link for this: edit PR template.

Stay on topic: don't make changes that are irrelevant to the subject line of your PR. Stay focused. This is really helpful for future developers who are trying to understand the history of the code. Maybe these other changes are sensible changes - in which case they will stand as their own PR. PRs are cheap, and its easier to review two orthogonal PRs than one that merged them. (Both now and when working with history later)

Avoid "here's everything in my hacked up dev copy of parsl" PRs - pick a topic and stay on topic. Use git branches to separate work on different topics. don't be scared of making a new branch for a small change. (and don't be scared of making a small PR for a small thing: a one byte typo fix is perfectly legitimate as a PR)

Don't do "quick tidy ups" inside a PR with a different topic: make a tidy up PR with just that tidyup.

Write tests. If someone points out a problem in your PR, that should be a suggestion to test the problem. Consider the types of test: if you're fixing a bug that you think might happen again if someone else perturbs the code, write a regression test (link to eg monitoring colour tables test). If you are making claims about how your code behaves in comments perhaps you can make those claims in a test too - then on every commit, those claims can be checked.

Commit messages - here are some general guides that aren't exactly how things have happened in parsl. Have a read of recent commit messages using git log to get a feel for the general parsl style.

Your PR will be squash -merged. Hopefully whoever merges it will remember to take your PR message as the commit message. Because it is squash merged, you can merge master, make minor fixes, rebase or not rebase (assuming you are working on the PR branch alone). But also because every PR is squash merged, if you want different things to appear in git history as different commits, you need to make them as different PRs

Leave the campground cleaner than you found it - but don't make tidyups that are off topic - see above about separate PRs.

Text in the PR should describe why/how things changed. Text in the code base (including documentation) should describe how things are (ignoring history, mostly)

Version compatibility

Parsl doesn't have a strict cross-version compatibility rule, but you should try to avoid breaking cross version compatibility.

If you break cross version compatibility, you should be aware that you are doing so, comment in your PR how you are breaking it, and give reasons why.

For understanding what it means to break compatibility, consider this situation:

a user has a working parsl setup which they are regularly using, using the current master branch. They replace their Parsl install with your PR and change nothing else. What will break? What else do they need to change?

  • workflow definitions?
  • configuration parameters?
  • Will the monitoring database schema now be out of date?
  • Perhaps they have written plugins using some of the parsl plugin APIs - have those APIs changed (syntactically or semantically?)

Human text

Human text: anywhere but comments! For example if you are writing text for a reader (the only reason to write text) might that reader get better access to the text if it is in:

  • A docstring
  • A log message
  • An exception message
  • An argument, method, class name
  • A type annotation
  • An assert statement - but asserts should only be checking for programmer error (that programmers have incorrectly reasoned about the code) not for user error (for example, checking if a user passed in an invalid argument). C.f. 4xx you did something wrong Vs 5xx I did something wrong in HTTP

Claims about behaviour can be tested in the test suite - more verified (both when your PR is merged, and onwards into the future) than raw human-readable comment text.

Logging

What should be at which level?

Error - usually (but not always) would expect an exception to be raised here right after the error log.

don't hide your errors. for example, don't have a mindset: my code isn't very good and likely to crash, so i should be careful that no one sees the inevitable errors.

at INFO level: state-change on a "business entity" - for example, tasks, blocks, batch jobs

at DEBUG level: tracing of non-state-change activity or extra info that is usually regarded as noise but might be helpful. for example, progressions through an algorithm (for example, rich logging about behaviour of saling strategy code). At DEBUG level, avoid formatting strings using f-strings or format: usually DEBUG messages will be discarded so this can be expensive string rendering unnecessarily. instead use logger's formatting such as logger.debug("value is %s", value) to defer formatting.

Don't log periodic polling unless it is especially hang prone.

don't put ! at the end of messages. express intensity of surprise using log levels.

Consider the future

Responsibility

Be prepared to take responsibility for any breakage your PR introduces, even though you didn't foresee that breakage when the PR was merged (credit to squeak dev guide and perhaps link?).

Support and maintenance

Consider who will maintain this feature in one year's time - generally there are no magic parsl angels to take over maintenance of your contribution. Are you ready to support people when they start using your contribution in a week or a year?

If your instant reaction is: I hope no one uses it so I don't have to do that, then maybe you should be publishing your code in your own repo, rather than pushing it into mainline parsl

Support in that context means: helping users use your contribution, making follow-up PRs to fix the problems that are inevitably discovered

Plugins

If you're implementing several modes and can imagine potential for other modes that you aren't implementing now, consider a plugin interface rather than an enumerated list of modes.

Code formatting

don't reformat code to your own personal preferred style

related, don't configure your editor to your own personal style and then push reformats as part of other changes

Big formatting changes: if you want to introduce big formatting changes across the codebase, introduce an appropriate check into the CI workflow (as happens with flake8) to preserve/enforce that formatting rule as the codebase continues to change.

don't commit cross codebase reformats as an "easy first contribution" - it might have been easy for you to press the reformat button. it's not super easy to review. it probably makes a merge conflict that needs manual fixup for every open PR and every devlepment branch

Moving code

When moving code around for tidyup, don't make bugfixes "on the way": if you're trying to debug why this broke things, it's easier to be able to tell if it was something with the move or something with the alleged "fix"