Visual test system

Visual test system overview

When working on the internal components of a complicated system like ggplot2, it's easy to inadvertently introduce bugs. Unit tests (with the testthat package) can help detect new bugs and prevent regressions, but in many cases it's practically impossible to write a unit test for a bug that involves graphical output.

I've written a package for visual testing designed for the needs of ggplot2, but it can be used for other graphical packages in R. Here's a rough sketch of how it works:

  • You write test scripts that will generate images. These are part of your package. For example, ggplot2 has tests in ggplot2/visual_test/.
  • vtest runs the test scripts, and saves the images to a test store, which is in a subdirectory, e.g. ggplot2/visual_test/vtest/.

After test results have been added to the test store, they can be used as a reference to compare against in the future. In a typical use scenario, you make some changes to the code, and then want to check whether the changes altered the appearance of any visual tests. The goal of this visual test system is to make it easy to check for changes and display them visually.

Visual test functions

See the Visual test system instructions page for information on how to use vtest with ggplot2.

  • vtest(): Runs a set of test scripts, similar to test() in devtools. The format of test scripts is described below. The test scripts generate PDF images, which should then be committed to the git repository to serve as a reference for future tests.
  • save_vtest(): This "prints" a ggplot object to a PDF file, and records information about the test in a small database.
  • vtest_webpage(): Generates web pages that display the results of visual tests. Because these pages only show the current output of the test scripts, they are probably not as useful as the comparison web pages generated by vdiff_webpage().
  • recent_vtest(): Reports recent commits and their associated resultsets, if they have been stored in the database.
  • vdiffstat(): Reports what changed from a given commit (the default is HEAD, i.e. the most recent commit) to another. It can also compare a commit to the (uncommitted) working tree.
  • vdiff_webpage(): Generates web pages that display the changes between two commits. It can also compare a commit to the (uncommitted) working tree.
  • check_vtest_db(): Checks the integrity of the results database.
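
For example, a typical session might look something like the sketch below. The function names are the ones listed above, but the arguments and defaults shown are only illustrative -- see the instructions page for the real usage.

library(vtest)

vtest("ggplot2")     # Run all the test scripts and record a result set
vtest_webpage()      # Browse the current results as web pages

# ... change some ggplot2 code, reinstall, and run vtest("ggplot2") again ...

vdiffstat()          # Report which tests changed (starting from HEAD by default)
vdiff_webpage()      # Build side-by-side comparison web pages of the changes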

Test scripts

The test scripts are easy to write. Here's a simple one that generates two test images. It should be saved in visual_test/aspect-ratio.r. Note that the file name matches the name given to vcontext().

vcontext("aspect-ratio")

dat <- data.frame(x=1:8, y=1:8, f=gl(2,4), expand.grid(f1=1:2, f2=1:2, rep=1:2))
p <- ggplot(dat, aes(x, y)) + geom_point()

# If you use save_vtest, it will use ggsave() to save the previous ggplot object
p + opts(aspect.ratio=3)
save_vtest("height is 3 times width")

# If you use save_vtest2, you can pass it an expression, which will be stored in the database.
# You can use base or grid graphics in the expression.
save_vtest2("height is 3 times width, 2 wrap facets", {
  p + facet_wrap(~f) + opts(aspect.ratio=3)
})

end_vcontext()

The test context is started with vcontext(). Then you generate your ggplot objects, and save them using save_vtest(). The save_vtest() function requires you to supply a short description of the test. This description is run through a hash function, and the resulting hash is used as the filename -- the hash is where the random-looking letters and numbers come from. Each description must be unique; otherwise the filenames will collide, and some of the test image files will be overwritten.

Finally, you close the test script with end_vcontext().
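
If you're curious where those letters and numbers come from, something like the following gives the flavor of it, using digest()'s default MD5 algorithm (the exact call inside save_vtest() may differ):

library(digest)

# Hash the test description; the resulting 32-character hex string is used
# as the image filename.
desc_hash <- digest("height is 3 times width")
desc_hash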


Internals

Requirements

The vtest system has certain expectations about the code base that it's used on. The code:

  • Must be an R package. (In the future, it may be possible to use vtest on code that isn't a package)
  • Must be stored in a git repository. (In the future, other version control systems like svn may be supported)

Database format

There is a small database that stores information about the test results. It consists of two tables: the resultsets table and the commit table. They are stored in visual_test/vtest/resultsets.csv and visual_test/vtest/commits.csv. There is an additional table, last_resulttest, that stores just the results from the last-run test.

resultsets table

When vtest() runs a set of tests, it generates a data frame of test results, which I'll call a result set. (The result set is saved in the last_resulttest table.) The result set is MD5-hashed using digest(), resulting in the resultset_hash. This hash is used to efficiently store and identify the result set.
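
A minimal sketch of the idea; the toy data frame below stands in for the real result set, and the exact call inside vtest() may differ:

library(digest)

# Toy stand-in for the data frame of results that vtest() builds.
resultset <- data.frame(context = "aspect-ratio",
                        desc    = "height is 3 times width",
                        type    = "pdf",
                        hash    = "02d76d47ed0fa35cc327d1e29ef1db7a",
                        stringsAsFactors = FALSE)

# digest() uses MD5 by default; hashing the whole data frame gives a single
# identifier for this particular set of results.
resultset_hash <- digest(resultset)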

The resultsets table consists of the following columns:

  • resultset_hash: MD5 hash of the result set
  • context: The user-specified context, such as "aspect-ratio" or "dotplot"
  • desc: Description of the individual test
  • type: File type (only "pdf" supported right now)
  • width: Width (inches)
  • height: Height (inches)
  • dpi: Pixels per inch, NA for PDF. (It really should be ppi, but I use dpi to be consistent with other R functions)
  • err: Error state from running the test: "ok", "warn", or "error"
  • hash: MD5 hash of the output file
  • order: The position of the test, within the context (1, 2, 3, ...)

The image files are identified by a MD5 hash. This makes it easy to detect changes: if the hash stays the same from one test run to another, then this means the file stayed the same; if the hash changes, this means that the file changed. The files are also named to match the MD5 hash of their contents. This makes them easy to find and access for making comparisons, and also helps to avoid storing duplicates of the same file.
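
A sketch of how a file can be named after its own contents -- the helper and paths below are hypothetical, but digest(file = TRUE) and file.rename() are the standard tools for this:

library(digest)

# Hash the raw bytes of a saved test image and rename the file to that hash.
store_by_hash <- function(path, image_dir = "visual_test/vtest/images") {
  h <- digest(path, file = TRUE)              # MD5 of the file contents
  file.rename(path, file.path(image_dir, h))  # filename == content hash
  h
}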

At the time of this writing, there are 87 visual tests. This means that each new result set adds 87 rows to the resultsets table. This could lead to a very large resultsets table over time, but fortunately each distinct result set is stored only once, and the commit table (described next) lets many commits share a single stored result set.

commit table

The vast majority of commits to the ggplot2 repository should result in the same result set as the previous commit, and therefore resultset_hash should stay the same for most commits. The commit table makes use of this fact, and stores the following:

  • commit: The git commit of the ggplot2 project
  • resultset_hash: Same as above

With the tables structured this way, most commits will result in one more row added to the commit table.
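
A hypothetical excerpt of commits.csv shows the pattern -- consecutive commits map to the same resultset_hash until a commit actually changes the output (all values below are placeholders):

commit                  resultset_hash
<sha1 of commit 1>      <hash X>
<sha1 of commit 2>      <hash X>    (no visual changes)
<sha1 of commit 3>      <hash Y>    (output changed here)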

Database integrity

There will be functions to test and repair database integrity: checking that hashes match, that there are no commits with missing resultsets, that each file hash corresponds to an actual file (and vice versa), and so on.
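
As an outline of the checks described above (not the actual implementation of check_vtest_db()), the verification could look roughly like this:

library(digest)

resultsets <- read.csv("visual_test/vtest/resultsets.csv", stringsAsFactors = FALSE)
commits    <- read.csv("visual_test/vtest/commits.csv", stringsAsFactors = FALSE)
image_dir  <- "visual_test/vtest/images"

# Every commit should point at a result set that exists in the resultsets table
stopifnot(all(commits$resultset_hash %in% resultsets$resultset_hash))

# Every image hash recorded in the resultsets table should exist as a file...
stopifnot(all(resultsets$hash %in% list.files(image_dir)))

# ...and every stored file's contents should match its own filename
files <- list.files(image_dir, full.names = TRUE)
stopifnot(all(basename(files) == vapply(files, digest, character(1), file = TRUE)))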

Files

This is how the files are stored. The test store is the vtest/ subdirectory of visual_test/, alongside the test scripts:

--ggplot2/
  `-visual_test/
    |-aspect-ratio.r  # Test scripts
    |-dotplot.r 
    |
    `-vtest/
      |-resultsets.csv    # Table of resultsets
      |-commits.csv       # Table of commits and resultset hashes
      |-images/
      | |-02d76d47ed0fa35cc327d1e29ef1db7a       # Test result images
      | |-78bf1c3aaadcab140ceee08a72858cf6
      | `-f660282385f5b6d390d26903f39651dc
      |
      |-lasttest/
      | |-resultset.csv                          # Table of last-run test
      | |-02d76d47ed0fa35cc327d1e29ef1db7a       # Test result images from last test
      | `-78bf1c3aaadcab140ceee08a72858cf6
      |
      |-pngcache/
      | |-02d76d47ed0fa35cc327d1e29ef1db7a.png   # Cached png-converted images
      | `-78bf1c3aaadcab140ceee08a72858cf6.png
      |
      |-html/
      | |-index.html                             # Web page of test results
      | |-02d76d47ed0fa35cc327d1e29ef1db7a.png   # Test result images converted to png
      | `-78bf1c3aaadcab140ceee08a72858cf6.png
      `-diff/
        |-index.html                             # Web page comparing test results
        |-02d76d47ed0fa35cc327d1e29ef1db7a.png   # Test result images converted to png
        |-78bf1c3aaadcab140ceee08a72858cf6.png
        `-78bf1c3aaadcab140ceee08a72858cf6-f660282385f5b6d390d26903f39651dc.png # Diff image

The images are stored in the images/ directory. Each image is simply renamed to match the MD5 hash of its contents. This makes it very easy to avoid storing duplicates of the same image.

These files/dirs are meant to be permanent stores of data:

  • resultsets.csv
  • commits.csv
  • images/

These dirs are ephemeral, and can easily be re-generated:

  • lasttest/
  • pngcache/
  • html/
  • diff/

Store results in a git repository

The vtest directory can be stored in its own git repository. If you do this, then you can commit results and push them to a server for easy distribution (this is how vtest is used with ggplot2).

Retrospective tests

The vtest system presently runs tests on ggplot2 by reading the tests from ggplot2/visual_test. Since the visual test scripts are part of the repository, it's not possible to run them on old commits -- checking out an old commit would cause the test scripts to disappear.

I think it would be simple to read the scripts from a different directory -- one outside of ggplot2 -- which would make it possible to run retrospective tests. For example, you might come up with a test today that you'd like to run against an old version of ggplot2. But to do this, you'd probably want to use a new database -- the current format doesn't deal well with multiple sets of tests for a given commit.


Why PDF files?

I initially tried to use PNG files, but encountered problems with consistency. I found that there are many ways of outputting PNG files from R: using the x11 device, the quartz device (on Mac only), with CairoPNG() from the Cairo library, and with Cairo_png() from the cairoDevice library. The results differ between methods, and they're also inconsistent across platforms, especially with the fonts. There's a test script here if you'd like to try it out.

I thought at first that PDF files would be inefficient for storage, but it turns out that they have many advantages:

  • The PDF output is consistent across platforms (except for a small header that needs to be altered; see below.)
  • Uncompressed PDFs are plain text documents, which means that they can be diffed, and changes can be stored very efficiently in a git repository.
  • The uncompressed PDFs generated by R and ggplot2 share a lot of content between different files. There is probably a lot of standard stuff in every PDF generated by R.
    • To test this, I created a new, empty repository, then added a lot of PDFs generated by this test system, totaling about 3.4 MB. After running git gc --aggressive to shrink the git repository as much as possible, the size of the .git/ directory was 165 KB. Clearly, git can store these files very efficiently.

More testing is probably needed to make sure the PDFs are consistent across platforms, especially on Windows. I also don't know whether R's PDF device has changed over time; if it has, that's something to keep in mind.

The PDF header

The PDFs generated by R have a header containing the creation and modification time. This is a problem because it results in a slightly different file each time you generate an image, even if you use the exact same code each time. I dealt with this by rewriting the headers to set the times to zero.
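
For an uncompressed PDF written by R's pdf() device, the rewriting can be as simple as this hypothetical helper (a sketch of the approach, not necessarily the code vtest uses):

# Blank out the CreationDate and ModDate entries so that re-running identical
# plotting code yields a byte-identical PDF.
zero_pdf_dates <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- sub("/CreationDate \\([^)]*\\)", "/CreationDate (D:00000000000000)", lines)
  lines <- sub("/ModDate \\([^)]*\\)", "/ModDate (D:00000000000000)", lines)
  writeLines(lines, path)
}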

PDF to PNG conversion

To view the files, the test system can generate web pages that display the images. Most web browsers won't display PDFs as regular images on a web page (Safari is the notable exception), so in creating the web pages, the images are converted to PNGs by default. (If you use Safari, you may choose not to convert the files, which saves time.)

It supports two conversion methods: convert from ImageMagick, and gs from Ghostscript (the default). Both of these are external to R.

To convert PDF to PNG, ImageMagick's convert actually just calls Ghostscript. I had serious problems with color consistency across platforms using convert. On the other hand, gs gave consistent colors across platforms, and was also slightly (~10%) faster.
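
A Ghostscript call of roughly this form does the PDF-to-PNG conversion; the exact options vtest passes may differ:

# Convert input.pdf to a 72-ppi PNG using Ghostscript's png16m device.
system2("gs", c("-dBATCH", "-dNOPAUSE", "-dQUIET",
                "-sDEVICE=png16m", "-r72",
                "-sOutputFile=output.png", "input.pdf"))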