Revdep checks - Rdatatable/data.table GitHub Wiki

CRAN requires revdep (reverse dependency) checks to ensure that a new version of data.table does not break other CRAN packages that depend on it.

Related team

https://github.com/orgs/Rdatatable/teams/revdep-managers

Run on your local machine

To run revdep checks on your local machine, you can use the code in https://github.com/Rdatatable/data.table/blob/master/.dev/revdep.R. Note that a full run may take a long time (10-20 days) if it is not parallelized.

Interpret results computed on NAU Monsoon

Toby Dylan Hocking (@tdhock) maintains a revdep check system that publishes its results on web pages linked from this directory: https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze/. The system runs each of the 1400+ revdep checks in parallel on the NAU Monsoon compute cluster, so all results are available in less than 12 hours. Every day at 00:01 MST (1 minute past midnight, Mountain Standard Time), a check is started with the current R-release, R-devel, data.table master, and data.table CRAN release. The code is in this git repository: https://github.com/tdhock/data.table-revdeps. As of 28 Nov 2022, the checks cover all dependency types ("Depends", "Imports", "LinkingTo", "Suggests", "Enhances"). The top of a typical result web page is shown below; it shows which versions of R and data.table were used for the checks.

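(Screenshot: top of a typical result web page, listing the R and data.table versions used for the checks.)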
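For readers outside Arizona, the daily schedule can be converted to UTC. The sketch below is a minimal illustration that assumes the start time is always 00:01 MST (a fixed UTC-7 offset) and treats the "less than 12 hours" figure as an upper bound on completion; the helper name check_window is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: the daily start is 00:01 MST, i.e. a fixed UTC-7 offset
# (Flagstaff, Arizona, where NAU is located, does not observe daylight saving time).
MST = timezone(timedelta(hours=-7), "MST")

# Upper bound on run time quoted on this page ("less than 12 hours").
MAX_RUNTIME = timedelta(hours=12)


def check_window(day: date) -> tuple[datetime, datetime]:
    """Hypothetical helper: (start, latest expected finish) of one daily run, in UTC."""
    start = datetime(day.year, day.month, day.day, 0, 1, tzinfo=MST)
    finish = start + MAX_RUNTIME
    return start.astimezone(timezone.utc), finish.astimezone(timezone.utc)


start, finish = check_window(date(2022, 11, 28))
print(start.isoformat())   # 2022-11-28T07:01:00+00:00
print(finish.isoformat())  # 2022-11-28T19:01:00+00:00
```

So results for a given day should be complete by roughly 19:01 UTC, assuming the cluster run stays within the quoted bound.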

For each version of R, each revdep is checked with data.table master and release.
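The check matrix described above can be enumerated as follows. This is a minimal sketch with hypothetical configuration labels and package names, not the actual code in tdhock/data.table-revdeps; it shows that each revdep gets four checks, so 1400 revdeps mean at least 5600 checks per daily run.

```python
from itertools import product

# Hypothetical labels for the configurations described above.
R_VERSIONS = ("R-release", "R-devel")
DATA_TABLE_VERSIONS = ("master", "release")


def check_jobs(revdeps):
    """One (revdep, R version, data.table version) tuple per check to run."""
    return list(product(revdeps, R_VERSIONS, DATA_TABLE_VERSIONS))


jobs = check_jobs(["pkgA", "pkgB"])  # hypothetical package names
print(len(jobs))  # 8 = 2 revdeps x 2 R versions x 2 data.table versions
print(jobs[0])    # ('pkgA', 'R-release', 'master')
```

Because these checks are independent of each other, they can be spread across cluster nodes, which is how the Monsoon system finishes in less than 12 hours.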

Significant differences table

If any differences are found in the check results, they appear as rows in the "significant differences" table, as in the example below:

(Screenshots: top of the significant differences table, and an example of a new significant difference.)

The significant differences table is sorted by its first column, the first bad commit that git bisect identified as causing the problem. This makes it easy to see whether several revdeps may have similar issues resulting from the same data.table commit/PR.

The links in each row are:

  • first.bad.commit: the commit on GitHub; useful for determining the commit/PR where the problem started.
  • Package: the log file from running the revdep checks on Monsoon; search this log for the new bad check to see additional details.
  • CRAN: the current CRAN check results for the package using the data.table release on a Linux machine, for comparison (ideally these match the release column, which was computed on Monsoon).
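Sorting by first.bad.commit places revdeps broken by the same commit next to each other. The sketch below does the equivalent grouping explicitly; the row fields, package names, and commit hashes are hypothetical placeholders, not the actual columns of the result pages.

```python
from collections import defaultdict

# Hypothetical rows of the significant differences table; field names are illustrative.
rows = [
    {"first_bad_commit": "abc1234", "package": "pkgB", "r_version": "R-devel"},
    {"first_bad_commit": "0f9e8d7", "package": "pkgA", "r_version": "R-release"},
    {"first_bad_commit": "abc1234", "package": "pkgC", "r_version": "R-release"},
]


def group_by_commit(rows):
    """Map each first bad commit to the sorted list of revdeps it affects."""
    groups = defaultdict(set)
    for row in rows:
        groups[row["first_bad_commit"]].add(row["package"])
    # Iterating in commit order mirrors the table, where rows sharing a commit are adjacent.
    return {commit: sorted(pkgs) for commit, pkgs in sorted(groups.items())}


print(group_by_commit(rows))
# {'0f9e8d7': ['pkgA'], 'abc1234': ['pkgB', 'pkgC']}
```

A commit that maps to several packages is a good candidate for a single grouped issue, in line with how revdep issues are grouped by commit/PR in the steps below.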

Significant differences fixed since last check

The example below shows how the page looks when a significant difference found in the previous check has disappeared in the current check:

(Screenshot: a significant difference newly fixed since the previous check.)
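Conceptually, newly fixed differences come from comparing the sets of significant differences in two consecutive checks; the same comparison in the other direction gives differences that are new since the previous check. A minimal sketch, assuming (hypothetically) that each difference is identified by a (package, R version) pair:

```python
# Hypothetical representation: each significant difference is keyed by a
# (package, R version) pair; the real result pages may identify them differently.
def newly_fixed(previous, current):
    """Differences reported by the previous check but gone from the current one."""
    return sorted(set(previous) - set(current))


def newly_found(previous, current):
    """Differences reported by the current check but absent from the previous one."""
    return sorted(set(current) - set(previous))


yesterday = {("pkgA", "R-devel"), ("pkgB", "R-release")}
today = {("pkgB", "R-release"), ("pkgC", "R-devel")}
print(newly_fixed(yesterday, today))  # [('pkgA', 'R-devel')]
print(newly_found(yesterday, today))  # [('pkgC', 'R-devel')]
```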

Steps to report a new revdep check problem

  • First, make sure that the issue/difference is real by checking whether (1) it was found in other recent checks (for example, the previous day's), (2) it occurs in both R-devel and R-release, (3) the result for data.table release equals the result from CRAN, (4) git bisect found a non-trivial commit (the result is trivial when commit/parent are the same as git bisect new/old, as in exDE above), and (5) the issue occurs in master, not only in release (see https://github.com/Rdatatable/data.table/issues/5733 for an example of an issue that only happened with data.table release and R-devel, after a fix had been made in master).
  • Then search the data.table issue tracker, https://github.com/Rdatatable/data.table/issues, for the package name and for the commit/PR where the problem started (we group revdep issues by the commit/PR that caused them), to make sure there is no existing issue. If an issue already exists, add a new comment to it; otherwise, create a new issue.
  • In the issue or comment, describe at least (1) the problem, briefly, (2) how to reproduce it, and (3) a link to the commit/PR where git bisect says the problem started (the first.bad.commit column).
  • Optionally, add (4) @mentions of the people who authored the commit/PR where the problem started, and (5) a minimal reproducible example (MRE). An MRE is sometimes hard to create, but when you can make one, it is likely to be useful as a test case for data.table.
  • Example with minimal info and a mention: https://github.com/Rdatatable/data.table/issues/5544
  • Example with more info/analysis and a minimal reproducible example: https://github.com/Rdatatable/data.table/issues/5536
  • If the issue should be fixed in the package that depends on data.table, look on CRAN for how to contact its maintainer (GitHub, email, etc.) and politely ask them for a fix using this revdep issue template.
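The five checks in the first step above can be written down as a small triage helper. This is a minimal sketch, not part of the revdep system: the Finding fields and the doubts function are hypothetical, and "trivial" is read as commit equal to the bisect new endpoint and parent equal to the bisect old endpoint. It lists which checks cast doubt rather than returning a hard yes/no, because a real issue can still fail one of them (issue 5733, for example, only occurred with R-devel).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    """Hypothetical summary of one significant difference (field names are illustrative)."""
    seen_in_recent_checks: bool  # (1) also found in e.g. the previous day's check
    in_r_devel: bool             # (2) occurs with R-devel ...
    in_r_release: bool           #     ... and with R-release
    release_result: str          # (3) result computed on Monsoon with data.table release
    cran_result: str             #     CRAN's current result for the same package
    commit: str                  # (4) first.bad.commit reported by git bisect
    parent: str                  #     parent of that commit
    bisect_new: str              #     commit git bisect started from as "new"
    bisect_old: str              #     commit git bisect started from as "old"
    in_master: bool              # (5) reproduces with data.table master


def is_trivial_bisect(f: Finding) -> bool:
    # Trivial: commit/parent are just the new/old endpoints bisect started with.
    return f.commit == f.bisect_new and f.parent == f.bisect_old


def doubts(f: Finding) -> list[str]:
    """Names of the checks that cast doubt on the finding (empty list = looks real)."""
    checks = {
        "not seen in other recent checks": f.seen_in_recent_checks,
        "not in both R-devel and R-release": f.in_r_devel and f.in_r_release,
        "release result differs from CRAN": f.release_result == f.cran_result,
        "trivial git bisect result": not is_trivial_bisect(f),
        "not reproduced with data.table master": f.in_master,
    }
    return [name for name, ok in checks.items() if not ok]


finding = Finding(
    seen_in_recent_checks=True, in_r_devel=True, in_r_release=True,
    release_result="OK", cran_result="OK",
    commit="abc1234", parent="def5678",
    bisect_new="abc1234", bisect_old="0f9e8d7",
    in_master=True,
)
print(doubts(finding))                             # []
print(doubts(replace(finding, parent="0f9e8d7")))  # ['trivial git bisect result']
```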

When is a breaking change warranted?

Here are some historical examples of breaking changes that have been allowed:

A common pattern in the examples above is that we create PRs for the affected revdeps and give their maintainers plenty of time to merge/fix before we submit the new data.table to CRAN.

When is a breaking change not warranted?