Skip to content

Pull request review guide

Andrea Aime edited this page Oct 23, 2016 · 14 revisions

This guide tries to provide some guidance about pull requests code reviews, trying to help both newbie and experienced reviewers to provide a proper pull request review.

First off, before reviewing, read and consider what we ask of contributors in the official contribution guide, as anything in there could be linked to as a minimum requirement for a pull request (GitHub also links at the document every time a new pull request is made... which is too late, but better than never).

When reviewing be clear between review directives (iso speak MUST,FIX,SHALL), and suggestions or ideas for improvements (iso speak SHOULD, MAY, IDEA).

Fast formal checks

Warning: contentious content follows.

Pull request reviewing is hard business, as a reviewer you're often challenged between not having enough time for a thorough review, and the desire to give at least some feedback making sure the contributor does not feel abandoned.

If you are in a pinch there is a set of review steps that you can perform in a short time:

  • Contribution agreement. Check if the contribution requires a contribution agreement, and if it matches the requisites, verify that a contribution agreement has been provided to OSGeo, if not, demand for one.
  • Presence of tests. Given a large code base, that pretty much nobody really can hope to master fully, the large number of external contributors, as well as focused changes to the code base, and the fast evolution of the code base, tests are really the only line of defense against accidental breakage of the contributed functionality. That is why we always demand to have at least one test, it's not a "punishment", but a basic guarantee that the fix or new functionality being contributed will still be there, and working, a bare few months. So always check there is at least one test, one that would break in a evident way if there ever was a regression in the contributed code.
  • Presence of documentation. For functional, user or protocol facing changes, check that some basic documentation has been contributed. Almost no one will know the functionality is there if we don't have documentation on how it works. As a plus, documentation will help you understand quicker what is getting contributed, both from a high level view, and as a guide though the changes being made.
  • Presence of a proposal, if required. For any large change our process demands a formal proposal to be made and discussed in the community before a pull request gets made. If the pull request is made anyways, point them to the process and warn that the community might eventually request changes in the approach
  • Copyright headers are present. Every source file needs a copyright header to be present. For new files demand the contributor to add the header, for files that are being modified and lack one, gently ask if the contributor could add one.
  • No commented out code sections. The version control is able to tell differences between existing and past versions of a certain file, thus, the commit should not contain commented out code sections.
  • Javadocs and comments. Public classes and methods should have a minimum of javadoc (a simple sentence explaning what the method or class does will be sufficient), difficult parts of the code should have some comments explaining what the code does (how it does it, is evident by the code). Comments should be professional (no personal opinions or jokes), current to the code (no need to explain what was there before, unless there is a high risk of it coming back), with no reference to the comment author (in case we need to know that information, a git blame will do)
  • Code formatting. The project has code formatting guidelines embodied in a Eclipse code formatting template, for every other IDEs it's basically the official Java coding conventions (spaces instead of tabs) with "long" lines.
  • Reformats and other changes obfuscating the actual patch. We recommend contributors to limit changes to a minimum and avoid reformatting the code, even if the existing code is not following the existing coding conventions. Reformats can be put in separate commits if necessary.

All of the above checks are quick to perform, and do not require significant knowledge about the code being modified.

Providing just them on one side can be seen as a follow up, better than no attention at all, on the other side might come out as petty and discourage the contributor. If the above are the only ones performed due to lack of time, it's probably a good idea to state in the review that some formal checks have been made, but that a more substantive review will follow.

Substantive checks

The following checks are the real "meat" of a review, are performed to make sure that GeoServer does not regress in behavior, or that does not grow some change that we won't be able to maintain, or that will impede proper evolution in the future.

  • Backwards compatibility. The change being proposed should not hamper backwards compatibility, every change must be performed ensuring that users can keep on upgrading the GeoServer behind their services and application without functional regressions.
  • Standards conformance. The change being proposed must not be at odds with GeoServer OGC standard conformance, in particular, after the change the CITE tests must keep on passing properly, and there should be no visible violation of known standards. This is not to say that we cannot occasionally break compliance, but that doing so must be controlled by a configuration flag that the admin chooses to enable, and that by default, the flag is turned off. Extensions to the standard are also welcomed, but are better discussed on the list and setup in such a way that they look like a natural extension.
  • Performance. The change should not introduce evident performance regressions. This is not to say that every pull request must be load tested, but some attention should be paid during the review to changes that might be damaging in those respects, looking for CPU hungry code or heavy memory allocation
  • Leaks. A java server side application like GeoServer should be able to run for months uninterrupted, thus particular attention should be paid to resource control, in particular resources that ought to to closed (files, connections, pools in general), significant allocation of classes with finalizers.
  • Thread safety. GeoServer is, like all Java server side application, serving one request per thread. In this respect thread safety is of paramount importance. Be on the lookout for lazy initialization, stateful classes shared among threads, thread locals that fail to be properly cleaned at the end of the request, and static fields and data structures in general.
  • Good usage and fit with the existing code and architecture. The code is easier to understand and maintain when it follows common pattern across the code base, and when there is little or no code duplication. Check the pull request for conformance with the existing code, and proper usage of existing facilities.
  • Use of the File System. Contributors should read and write to resources using the ResourceStore API whenever applicable and only convert Resources to Files when absolutely necessary (for example, for a third party library).
  • Proper module usage. There is often a strong temptation to put new functionality in core as opposed to a new community module. If this is the case, verify the functionality is indeed core worthy, that is, relevant for many users, properly documented, has core developers interested in maintaining it long term, and heavily tested.
  • IP checks. When there is evidence that some of the code is coming from a different code base, check the contributor actually has the rights to donate it to GeoServer, and that the original licence is compatible (or that the author owns the code, and can thus relicense it under the GPL terms).
  • Current Java version and library usage. Check the new code uses the current version of Java (e.g. foreach, try with resources, generics, lambdas), and current library facilities (JUnit, Spring) instead of using outdated structures, rolling its own replacements or adding new dependencies. Attention should be paid to patterns that while elegant, might incur in significant overhead in performance sensitive areas of the code (e.g., arrays vs collection, inheritance and overridden methods, and other forms of abstraction above the "bare metal").
  • Malicious code. While unlikely, a pull request might contain malicious code to create, by design or accident, openings in the security of GeoServer that an external attacker might use. Attention should be paid to input checks, XML expansion attacks, reflection though serialization (which can be used to generate a remote execution attack).

This kind of review often requires some experience with the code being modified, and will require more time and focus than the quick formal checks described in the previous section.

A note for community modules

Community modules are a playground where little or no control is exerted, in order to ease up the development of new functionality and allow the contributors of it to come to terms with the relatively "high bar" of GeoServer code gently.

Thus, most of the checks above should be toned down from demands to suggestions. However, copyright assignment and IP checks, when required, still have to be performed so that we don't get tainted code among the sources.

Clone this wiki locally