Coding guidelines - ProofDrivenQuerying/pdq GitHub Wiki

This page gathers coding guidelines we should agree on when PDQ for the project. During code reviews, the reviewer should ensure these guidelines are respected.

Code-level documentation

Every package, class, method and attributes, must have up-to-date and concise documentation.

Javadoc blocks come directly before the items being documented, and may feature '@'-prefixed annotations, used for prettier HTML formatting.

/**
 * Description of SomeInterface
 * @author Author Name
 */
public interface SomeInterface {
}

For packages, this requires an additional specific file called package-info.java, stored in the package it self and containing the package name, preceded with a typical javadoc block.

/**
 * Here comes the description of the package
 *
 * @author Author Name
 */
package uk.ac.ox.cs.pdq.planner.parallel;

Javadoc

Javadoc is first meant to publish documentation is HTML. It is good if readable from the source (that always helps), but when writing javadoc, one has to primarily think of how it will fit into the overall document.

There are simple rules to follow:

Do not use special character, prefer HTML-friendly stuff instead (in my last commit, I had to replace some illegal character, that prevented the whole documentation to compile). Javadoc also comes with a few key words to obtain special formatting, links, etc.
Each package, class, method, variable should have a one line description (which will be automatically used as a short description in the upper-pages), and only if necessary paragraphs with more details below. The doc in staging used to generally follow that rule, but review has broken most of it. The main reason seems to be poor formatting due to copy/pasting in the Javadoc block. The way Javadoc is currently written in most of review is not even picked up by the compiler.
Do not copy/paste description blocks between package, classes, method, etc. This makes documentation harder to read and maintain. In case where you'd rather reuse documentation for something from the super class, you can use {@inheritDoc}, which will automatically insert the appropriate description.
Do not list the package/classes structure under a package in the text itself. This is automatically done, and would create redundancy.

Note that the Javadoc compiler conveniently warns you when some return, param or throw attribute is not set correctly.

About details in the documentation, we have to find trade-off between what goes in the source, and what goes in the wiki. It is not always easy to draw the line, but as a rule of thumb, doc that is not about high-level description, specific algorithm, the contracts of an item (inputs, output, pre-/post-conditions, invariants, exceptions), or examples of how to use an item, should be written elsewhere, for instance the wiki.

Code formatting

To ease code comparison, it is good to all use the same code formatting conventions. What is used currently is the "Java Conventions", which comes as a built-in style in Eclipse. If it is not set so, you can do it from the Preferences menu: Java>Code Style>Formatter

To format the code according to the style automatically (e.g. before a commit), right-click on the project, then Source > Format.

Obsolete and unused code

Code that is not used should be kept in the repository only if:

its purpose is clear, its fit in the architecture, its documentation and tests are being maintained,

Other code should be deleted. It can be restored via version control.

This also applies to tests.

Preconditions vs. assertions

There are currently two concurrent ways consistency is check in the code.

Java's Assertions

Assertions are Java's own mechanism for checking consistency. They take the following form: assert boolean-condition; assert boolean-condition : "Optional error message";

At runtime, when the condition is not met, an AssertionError is thrown.

The use of assertions has both pros and cons:

Pros: they can be turned-off at compilation time to improve performance
Cons: they throw Errors rather than Exceptions, which cannot be caught without compromising the consistency of the JVM.

Guava's Precondition

These are static methods to allow l e to check certain properties are any given time, and throw a RuntimeException if the conditions are not met. For instance:

Preconditions.checkNotNull(bla): throws a NullPointerException if bla is null
Preconditions.checkArgument(arg >= 0): throws an IllegalArgumenException is arg is negative.
Preconditions.checkState(arg >= 0): throws an IllegalStateException is arg is negative.

Unlike assertions, preconditions cannot be "turned off". As it states in the documentation:

The goal of this class is to improve readability of code, but in some circumstances this may come at a significant performance cost. Remember that parameter values for message construction must all be computed eagerly, and autoboxing and varargs array creation may happen as well, even when the precondition check then succeeds (as it should almost always do in production). In some circumstances these wasted CPU cycles and allocations can add up to a real problem. Performance-sensitive precondition checks can always be converted to the customary form:

if (value < 0.0) { throw new IllegalArgumentException("negative value: " + value); }}

Choosing the right preconditions

There are cases where one wants to ensure an argument is not null. For this, there are two options:

Preconditions.checkNotNull(arg)
Preconditions.checkArgument(arg != null). This is not a huge thing, but it can create problems in unit tests. The second option is arguably more informative (when this is thrown, we known right away, i.e. from the StackTrace, there is something wrong with the way the method is called). The suggestion is to reserve the firs option to check for null pointer inside the body of a method, for instance in a loop.

Bottom line: we should use assertions preferentially in CPU/IO intensive code to factor them out more easily in experiments (e.g. in the chase, explorers, runtime iterators), and reserve the guava-style preconditions for non-intensive, interactive, and user-facing code.

Message logging

There are two types of logging in PDQ. On the one hand, there is logging that it is used for coding, maintenance and debugging purpose and which is based on the Log4j library. On the other hand, there is performance logging specifically for collecting experiments results. This paragraph only addresses the former.

Committed code should only ever use System.out for functional reasons (not debugging). Likewise, the use of log messages should reflect the purpose of messages (e.g. log.info() in the code, which should really be log.debug()). The main reason for this is that log4j always stores fine-grained information of what gets logged, where and when. This can be tuned in each log4j2.xml file within each project's main/src/resources/. Any message logged directly through a System.out.println by-passes that and can also interfere with the performance logging mechanism.

Log messages have a CPU/IO cost even when not actually printed anywhere, so they should be avoided when not absolutely necessary, especially on parts of the code that is CPU-intensive (e.g. planner). Here is a set of simple rules:

log.fatal: use for message printing right before something known to cause a crash
log.error: use for problems (e.g. Exceptions and Errors) that threaten the stability or consistency of the application (e.g. trustworthiness of experiments results)
log.warning: use for problems (e.g. Exception) that can be worked around at runtime, but should be brought to the user's attention
log.info: use for messages that are always useful, e.g. events marking the beginning and the end of a important tasks.
log.debug: use for messages that carrying more details than log.info, that is useful for debugging but typically to detailed for common use.
log.trace: use for very fine-grained printing, e.g. what people would usually use System.out.println for. These should basically not be committed.

In the log4j2.xml the following logger level types are used

level="ALL" = Logs all the below mentioned types in a log file.
level="FATAL" = Logs all where log.fatal is used into a log file.
level="ERROR" = Logs all where log.error is used into a log file.
level="WARNING" = Logs all where log.warning is used into a log file.
level="INFO" = Logs all where log.info is used into a log file.
level="DEBUG" = Logs all where log.debug is used into a log file.