Understanding Jenkins - dmwm/WMCore GitHub Wiki

Introduction

The homepage for our Jenkins tests is https://cmssdt.cern.ch/dmwm-jenkins/, which is behind the CERN SSO service.

The test that ultimately decides what Jenkins thinks about your pull request is DMWM-WMCore-PR-test, which compares all the results from your feature branch against the baseline results collected twice a day through DMWM-WMAgentPy3-TestAll (for Python3). Each of these tests also triggers other Jenkins jobs to perform more specific checks.

How to interpret Jenkins's comments on your pull request

Jenkins posts a summary of how our tests and checks went in the GitHub conversation of your pull request, along with a link to a detailed report. The summary and the report are broken down into three sections:

Unit tests

Jenkins runs every unit test in the code base (currently about 1500) and compares the results with the last run of the baseline unit tests against the master branch (which should happen twice per day), reporting any differences it finds. Your PR fails if it adds a failing unit test or causes an existing unit test that previously succeeded to fail. Unit tests that are known to be unstable (listed in test/etc/UnstableTests.txt) are reported on, but won't cause your pull request to fail.
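The comparison rule described above can be sketched in Python as follows. This is a minimal illustration, not the code Jenkins actually runs: it assumes the baseline and PR results are already available as hypothetical dictionaries mapping a test name to a pass/fail flag, and that the unstable tests have already been read from test/etc/UnstableTests.txt into a set (the file's format is not assumed here).

```python
from typing import Dict, List, Set, Tuple


def compare_with_baseline(
    baseline: Dict[str, bool],
    pr_results: Dict[str, bool],
    unstable: Set[str],
) -> Tuple[bool, List[str], List[str]]:
    """Return (pr_fails, blocking_failures, unstable_failures).

    Both result dicts map a test name to True (passed) or False (failed).
    """
    blocking: List[str] = []
    unstable_failures: List[str] = []
    for test, passed in sorted(pr_results.items()):
        if passed:
            continue
        # A test missing from the baseline is new: a failing new test counts too.
        passed_in_baseline = baseline.get(test, True)
        if not passed_in_baseline:
            continue  # already failing on master, so not a regression
        if test in unstable:
            unstable_failures.append(test)  # reported, but never blocking
        else:
            blocking.append(test)
    return bool(blocking), blocking, unstable_failures


# Hypothetical data: "b" already failed on master, "d" is known to be unstable.
baseline = {"a": True, "b": False, "c": True, "d": True}
pr = {"a": False, "b": False, "c": True, "d": False, "e": False}
print(compare_with_baseline(baseline, pr, {"d"}))  # (True, ['a', 'e'], ['d'])
```

Here "a" regressed and "e" is a new failing test, so both block the PR, while the unstable "d" is only reported.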

Starting in WMCore tag 1.4.9.pre5, Python3 checks were integrated into our CI Jenkins setup, so the same set of checks and comparisons is performed against the Python3 unit tests. Starting in 2022 (WMCore tag 2.0.0), WMCore no longer supports Python2 and all the Python2 CI/CD code is being removed; at a later stage, the future compatibility library will be removed as well.

If you do have a failure and want to see the unit test output in Jenkins, please follow these steps:

In your GitHub pull request page, there is a row with a status for each check executed by Jenkins (e.g. Py3 Pylint, Py3 Unit tests, default, etc.).
Click the "Details" link on the right side of the default check row. This link takes you to the dmwm-jenkins page, which gives you a summary of your PR tests.
On the Jenkins page, click the "Test Result" link, which shows all the failed and successful unit tests.
Finally, click on the unit test you want to look at; this brings up the details of its execution: status, traceback, etc.
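The "Test Result" page is also available as JSON through the standard Jenkins remote access API, by appending /testReport/api/json to a build URL. The sketch below assumes the JUnit plugin's usual report layout (suites containing cases with className, name and status fields); the helper names are hypothetical, and fetching the JSON from dmwm-jenkins would additionally require getting through CERN SSO, which is not covered here.

```python
from typing import Any, Dict, List

# JUnit plugin statuses for cases that did not pass in this build.
FAILED_STATUSES = {"FAILED", "REGRESSION"}


def report_json_url(build_url: str) -> str:
    """JSON view of a build's "Test Result" page."""
    return build_url.rstrip("/") + "/testReport/api/json"


def failed_cases(report: Dict[str, Any]) -> List[str]:
    """Return "className.name" for each failed case in a testReport document."""
    failures = []
    for suite in report.get("suites", []):
        for case in suite.get("cases", []):
            if case.get("status") in FAILED_STATUSES:
                failures.append(f"{case['className']}.{case['name']}")
    return failures


# Trimmed, hypothetical example of the JSON structure.
report = {"suites": [{"cases": [
    {"className": "WMCore_t.Foo_t.FooTest", "name": "testBar", "status": "REGRESSION"},
    {"className": "WMCore_t.Foo_t.FooTest", "name": "testBaz", "status": "PASSED"},
]}]}
print(failed_cases(report))  # ['WMCore_t.Foo_t.FooTest.testBar']
```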

False positives: because the snapshot of the master branch is only taken twice a day, changes recently merged into master occasionally show up in your PR report as well. Such failures may not be caused by your feature branch at all, but by recent changes to master.

Pylint and pycodestyle code quality checks

The Jenkins job DMWM-WMCore-PR-pylintpy3 runs pylint and pycodestyle checks on the changed files, both on the branch you are pulling towards (usually master) and on your proposed branch. It fails your PR under some circumstances, usually when there are Errors or Warnings (reported in bold). Please check the full report and aim for zero pylint messages, if possible, even in parts of the code that you did not change. Anything that makes the check fail must be fixed, unless that is not possible.
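Pylint message IDs start with a letter giving their category (C, R, W, E, F or I), which makes a rough version of this check easy to sketch. The rule below (fail if the proposed branch has any error, fatal or warning messages, and report per-category differences against the target branch) is an assumption based on the description above, not the exact logic of DMWM-WMCore-PR-pylintpy3.

```python
from collections import Counter
from typing import Dict, Iterable

# Pylint message IDs start with their category letter:
# C=convention, R=refactor, W=warning, E=error, F=fatal, I=informational.
BLOCKING_CATEGORIES = {"E", "F", "W"}  # assumption: what makes the check fail


def categories(msg_ids: Iterable[str]) -> Counter:
    """Count messages per category, e.g. "W0612" -> "W"."""
    return Counter(msg_id[0] for msg_id in msg_ids)


def pylint_verdict(target_ids: Iterable[str], proposed_ids: Iterable[str]) -> Dict:
    target, proposed = categories(target_ids), categories(proposed_ids)
    return {
        "fail": any(proposed[c] > 0 for c in BLOCKING_CATEGORIES),
        # Positive values: more messages than on the target branch.
        "delta": {c: proposed[c] - target[c] for c in sorted(set(target) | set(proposed))},
    }


print(pylint_verdict(["C0301", "W0612"], ["C0301", "C0103"]))
# {'fail': False, 'delta': {'C': 1, 'W': -1}}
```

In this example the PR removes a warning but adds a convention message, so it passes while still showing one extra C message to clean up.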

How to debug Jenkins when it doesn't produce results

In principle, none of these jobs is supposed to fail. In practice, it doesn't always work out that way:

Any of them can fail because they can't install the WMAgent RPMs from cmsweb
Any of them can fail because they can't contact GitHub
Unit test slices can fail because they get stuck or take too long, in which case Jenkins kills them
PR-27 (now decommissioned) can fail if the proposed code introduces a Python syntax error. That never happens because everyone runs their code, right?

Whenever one of these sub-jobs fails, the job that started it also fails. In that case, DMWM-WMCore-PR-test fails and is restarted (up to four times). It is restarted only when there is a problem with the infrastructure, not when it determines that your code was bad.
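The restart policy can be sketched as a small loop. This sketch assumes that "up to four times" means four restarts after the initial run, and uses hypothetical result labels (SUCCESS, CODE_FAILURE, INFRA_FAILURE) to stand in for however the real job classifies a failure.

```python
from typing import Callable, Tuple

MAX_RESTARTS = 4  # "restarted (up to four times)"


def run_with_restarts(
    run_once: Callable[[], str], max_restarts: int = MAX_RESTARTS
) -> Tuple[str, int]:
    """Run a job until it stops failing for infrastructure reasons.

    run_once() returns one of the hypothetical labels "SUCCESS",
    "CODE_FAILURE" or "INFRA_FAILURE". Returns (final_result, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        result = run_once()
        # Bad code is final; only infrastructure problems earn a restart.
        if result != "INFRA_FAILURE" or attempts > max_restarts:
            return result, attempts


outcomes = iter(["INFRA_FAILURE", "INFRA_FAILURE", "SUCCESS"])
print(run_with_restarts(lambda: next(outcomes)))  # ('SUCCESS', 3)
```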

Now let's go back to the top-level page (https://cmssdt.cern.ch/jenkins/view/DMWM/). You may see DMWM-WMCore-PR-test in red with stormy skies (previous builds failed). The sub-jobs mentioned above should normally show nothing but green and sunny skies.

Click on DMWM-WMCore-PR-test to see the history of tests on our pull requests. On the left you will see a list of pull requests and Jenkins build numbers. A little green arrow in that column means that Jenkins had to restart the build automatically for one of the reasons above. Otherwise, the build was started by a new commit or by someone commenting "test this please" on the pull request.

Now let's say you want to see why the unit tests were unable to complete:

Click on the build # next to the red dot
Click on "Console Output" (these two steps can be combined using the dropdown menu)
Scroll down to the bottom and you will see something like:

Waiting for the completion of DMWM-WMCorePy3-PR-unittests
DMWM-WMCorePy3-PR-unittests #642 completed. Result was FAILURE

This tells you the build number of the sub-build (642). Click on that link, not on the generic one for DMWM-WMCorePy3-PR-unittests
Scroll to the bottom to see the status of each slice. Yellow is OK (the slice completed, but some tests failed); red is not. Find the red slice and click on it
Click on "Console Output" (again, these two steps can be combined using the dropdown menu)
Usually at the end you will see the problem. It should be one of the causes listed above.
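If you have to do this often, the sub-build lines can be extracted from a console log with a regular expression. The pattern below is based only on the example lines shown above, and the helper name is hypothetical.

```python
import re
from typing import List, Tuple

# Based on lines such as:
# DMWM-WMCorePy3-PR-unittests #642 completed. Result was FAILURE
SUB_BUILD_RE = re.compile(
    r"(?P<job>\S+) #(?P<build>\d+) completed\. Result was (?P<result>[A-Z_]+)"
)


def sub_builds(console_log: str) -> List[Tuple[str, int, str]]:
    """Return (job, build number, result) for each finished sub-build."""
    found = []
    for line in console_log.splitlines():
        match = SUB_BUILD_RE.search(line)
        if match:
            found.append((match["job"], int(match["build"]), match["result"]))
    return found


log = """Waiting for the completion of DMWM-WMCorePy3-PR-unittests
DMWM-WMCorePy3-PR-unittests #642 completed. Result was FAILURE"""
failed = [b for b in sub_builds(log) if b[2] != "SUCCESS"]
print(failed)  # [('DMWM-WMCorePy3-PR-unittests', 642, 'FAILURE')]
```

The build number in each tuple is the one to follow, rather than the generic job link.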

The steps to diagnose pylint or "27" (the Python 2.7 future checker) failures are easier, since these jobs don't have slices. In the "27" case, you might find something like this at the end of the log:

RefactoringTool: There was 1 error:
RefactoringTool: Can't parse src/python/CRABInterface/HTCondorDataWorkflow.py: ParseError: bad input: type=8, value=u')', context=('', (643, 107))

This means that line 643 of the proposed version of HTCondorDataWorkflow.py had a syntax error.
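The file and position can be pulled out of such a message mechanically. In the context tuple of this lib2to3 parse error, the pair (643, 107) is the line and column at which parsing failed. The regular expression below is a sketch based only on the message format shown above; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Based on the futurize/lib2to3 message shown above.
PARSE_ERROR_RE = re.compile(
    r"Can't parse (?P<path>\S+): ParseError: .*"
    r"context=\(.*\((?P<line>\d+), (?P<col>\d+)\)\)"
)


def locate_parse_error(message: str) -> Optional[Tuple[str, int, int]]:
    """Return (path, line, column) from a RefactoringTool parse error, or None."""
    match = PARSE_ERROR_RE.search(message)
    if match is None:
        return None
    return match["path"], int(match["line"]), int(match["col"])


msg = ("RefactoringTool: Can't parse src/python/CRABInterface/HTCondorDataWorkflow.py: "
       "ParseError: bad input: type=8, value=u')', context=('', (643, 107))")
print(locate_parse_error(msg))
# ('src/python/CRABInterface/HTCondorDataWorkflow.py', 643, 107)
```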

Tasks executed by the CI Jenkins setup

This is a short summary and breakdown of every job/project executed in the DMWM Jenkins setup. Our current WMCore CI Jenkins setup is shown in this diagram: WMCore_Jenkins_Diagram

In short:

DMWM-WMCore-PR-test: the entry point for the pull request checks, in charge of running the full suite of checks for a PR.
DMWM-WMCore-TagBaseline: runs twice a day, creating a new baseline tag and running all Python3 unit tests (spawning dozens of unit test slices).

HEAD/master tasks

DMWM-WMCore-TagBaseline This job simply creates and pushes a tag to the repository (and deletes old tags), so that we know what the baseline is (the reference each PR's results are compared against). This is done inside a docker container. Once this job completes successfully, it also triggers DMWM-WMAgent-TestAll and DMWM-WMAgentPy3-TestAll.
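The tag cleanup step amounts to a retention policy. The sketch below is hypothetical: the "baseline-" prefix, the number of tags kept and the (name, creation time) input are illustrative assumptions, since the actual tag naming scheme used by DMWM-WMCore-TagBaseline is not described here.

```python
from datetime import datetime
from typing import List, Tuple


def tags_to_delete(
    tags: List[Tuple[str, datetime]], keep: int, prefix: str = "baseline-"
) -> List[str]:
    """Return the baseline tags to delete, keeping the `keep` most recent.

    The prefix and the (name, creation time) input are hypothetical.
    """
    baseline = [tag for tag in tags if tag[0].startswith(prefix)]
    baseline.sort(key=lambda tag: tag[1], reverse=True)  # newest first
    return [name for name, _ in baseline[keep:]]


tags = [
    ("baseline-a", datetime(2024, 1, 1, 6)),
    ("baseline-b", datetime(2024, 1, 1, 18)),
    ("1.4.9", datetime(2020, 1, 1)),  # not a baseline tag, so never selected
    ("baseline-c", datetime(2024, 1, 2, 6)),
]
print(tags_to_delete(tags, keep=2))  # ['baseline-a']
```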

DMWM-WMAgentPy3-TestAll Exactly the same as DMWM-WMAgent-TestAll, but running on a Python3 stack. It triggers the Jenkins job DMWM-WMCorePy3-UnitTests.

DMWM-WMCorePy3-UnitTests Exactly the same as DMWM-WMCore-UnitTests, but running on a Python3 stack.

DMWM-WMCore-PR-pylintpy3 Exactly the same as DMWM-WMCore-pylint, but running on a Python3 stack.

DMWM-WMCore-PR-pylint3k This is a standalone job that runs Python2 pylint checks on the master branch, checking for Python3 compatibility, using a docker container and running this script: https://github.com/dmwm/Docker/blob/master/wmcore_base/ContainerScripts/pylint3kTest.sh. This is the actual command:

pylint --py3k -d W1618,W1619 --evaluation='10.0 - ((float(5 * error + warning) / statement) * 10)'  --rcfile standards/.pylintrc
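The --evaluation expression is plain Python arithmetic over pylint's message counters, so it can be reproduced directly. Each error weighs five times as much as a warning, and convention and refactor messages do not affect the score at all. For example, 1 error and 5 warnings in 100 statements give 10.0 - ((5 + 5) / 100) * 10 = 9.0. The function name below is just for illustration.

```python
def pylint_score(error: int, warning: int, statement: int) -> float:
    """Same arithmetic as the --evaluation expression used by the job."""
    return 10.0 - ((float(5 * error + warning) / statement) * 10)


print(pylint_score(error=1, warning=5, statement=100))  # 9.0
print(pylint_score(error=0, warning=0, statement=250))  # 10.0
```

Nothing in the expression clamps the result, so a file with many errors relative to its size can score below zero.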

DMWM-WMCore-TestOracle Exactly the same as DMWM-WMAgentPy3-TestAll, except that the tests run as a single slice on one worker node, identified by the "oracle-env" label. These tests use Oracle as the database backend. Since the tests must share a single instance of a remote database, only one slice is used, so that the tests do not contend for database access.
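The difference between the regular runs and DMWM-WMCore-TestOracle can be illustrated with a slicing function. WMCore's actual slicing scheme is not described here; the round-robin split below is a hypothetical stand-in, shown only to make the point that a single slice keeps every test in one place.

```python
from typing import List


def make_slices(tests: List[str], n_slices: int) -> List[List[str]]:
    """Distribute tests round-robin over n_slices (hypothetical scheme)."""
    if n_slices < 1:
        raise ValueError("need at least one slice")
    slices: List[List[str]] = [[] for _ in range(n_slices)]
    for index, name in enumerate(sorted(tests)):
        slices[index % n_slices].append(name)
    return slices


tests = ["t3", "t1", "t2", "t4"]
print(make_slices(tests, 2))  # [['t1', 't3'], ['t2', 't4']]: parallel slices
print(make_slices(tests, 1))  # [['t1', 't2', 't3', 't4']]: Oracle-style run
```

With n_slices=1, every test ends up in the same slice, so, assuming a slice runs its tests one after another, no two tests hit the shared Oracle instance at the same time.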

Pull request triggered checks

DMWM-WMCore-PR-test This is the main job responsible for testing pull requests. It is triggered by a new pull request, a new commit, or a "test this please" comment. It sets the PR status to pending, using a docker container that runs this script: https://github.com/dmwm/Docker/blob/master/jenkins_python/scripts/PullRequestTestBegin.py

Then it triggers many downstream Jenkins jobs, such as DMWM-WMCorePy3-PR-unittests, DMWM-WMCore-PR-pylint3k and DMWM-WMCore-PR-pylintpy3.

Once all of those have completed, it parses their output artifacts and creates the final HTML report. It also updates the final outcome of each of those checks in the PR. All of this happens inside a docker container running this script: https://github.com/dmwm/Docker/blob/master/jenkins_python/scripts/PullRequestReport.py
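Setting the PR status to pending at the start and updating each check's outcome at the end both map onto GitHub's "create a commit status" REST endpoint (POST /repos/{owner}/{repo}/statuses/{sha}). The sketch below only builds the request with the standard library; it is an assumption about how such an update could be made, not a copy of PullRequestTestBegin.py or PullRequestReport.py, and the sha, token, context and description values are placeholders.

```python
import json
import urllib.request

# The four states accepted by GitHub's commit status API.
VALID_STATES = {"error", "failure", "pending", "success"}


def commit_status_request(
    owner: str, repo: str, sha: str, token: str,
    state: str, context: str, description: str, target_url: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST /repos/{owner}/{repo}/statuses/{sha} call."""
    if state not in VALID_STATES:
        raise ValueError(f"invalid state: {state}")
    payload = {
        "state": state,
        "context": context,          # one row per context on the PR page
        "description": description,
        "target_url": target_url,    # e.g. the Jenkins build or report page
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: a real call needs the PR head commit and a valid token.
req = commit_status_request("dmwm", "WMCore", "abc1234", "TOKEN",
                            "pending", "default", "Jenkins tests started",
                            "https://cmssdt.cern.ch/dmwm-jenkins/")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen() would send it. GitHub accepts only the four states listed in VALID_STATES, and each distinct context string appears as a separate row on the pull request page.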

DMWM-WMCorePy3-PR-unittests Exactly the same as DMWM-PR-unittests, but running on a Python3 stack. Output artifacts are suffixed with py3. The script executed to run the unit tests is: https://github.com/cms-sw/cms-bot/blob/master/DMWM/test-wmcorepy3.sh

DMWM-WMCore-PR-27 (DECOMMISSIONED) Checks pull requests for inadvertent reintroduction of pre-Python 2.7 idioms, using a docker container that runs this script: https://github.com/dmwm/Docker/blob/master/wmcore_base/ContainerScripts/pyfutureTest.sh

The script basically runs futurize on every Python file changed in the PR, e.g.:

  futurize -1 $name >> test.patch
  futurize -f execfile -f filter -f raw_input $name >> test.patch || true
  futurize -f idioms $name  >> idioms.patch || true

In addition, it checks whether new Python files contain from __future__ import division, using this script: https://github.com/dmwm/Docker/blob/master/wmcore_base/ContainerScripts/AnalyzePyFuture.py
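A check for this __future__ import can be written with the standard ast module. This is a sketch of the idea, not the contents of AnalyzePyFuture.py, and the function name is hypothetical. It only inspects top-level statements, which is sufficient because Python requires __future__ imports to appear at the top of a module (only the docstring, comments, blank lines and other future statements may precede them).

```python
import ast


def has_future_division(source: str) -> bool:
    """True if the module has a 'from __future__ import division' statement."""
    tree = ast.parse(source)
    # __future__ imports must come first, so top-level statements suffice.
    for node in tree.body:
        if isinstance(node, ast.ImportFrom) and node.module == "__future__":
            if any(alias.name == "division" for alias in node.names):
                return True
    return False


print(has_future_division("from __future__ import print_function, division\n"))  # True
print(has_future_division('"""Docstring."""\nimport os\n'))  # False
```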

DMWM-WMCore-PR-pylintpy3 Exactly the same as DMWM-WMCore-PR-pylint, but running on a Python3 stack. Output artifacts are suffixed with py3.

DMWM-WMCore-PR-pylint3k This job runs Python2 pylint in Python3 compatibility mode (i.e. with the --py3k option) on all the Python files changed in the feature branch. It then compares the pylint reports of master and the feature branch. This runs inside a docker container executing this script: https://github.com/dmwm/Docker/blob/master/wmcore_base/ContainerScripts/pylint3kTest.sh

Effectively, a command like the following is executed:

  pylint --py3k -d W1618,W1619 --evaluation='10.0 - ((float(5 * error + warning) / statement) * 10)'  --rcfile standards/.pylintrc  --msg-template='{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}'  $name  > pylint.out || true
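Because of the --msg-template option, every message in pylint.out has a fixed shape, so the master and feature-branch reports can be parsed and compared. The sketch below assumes paths contain no colons; the comparison by symbol count (ignoring line numbers, which shift between branches) is an illustrative choice, not necessarily what pylint3kTest.sh does, and all names are hypothetical.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# One line per message, following
# --msg-template='{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}'
MESSAGE_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+): "
    r"\[(?P<msg_id>[A-Z]\d+)\((?P<symbol>[^)]+)\), (?P<obj>[^\]]*)\] (?P<msg>.*)$"
)


class Message(NamedTuple):
    path: str
    line: int
    msg_id: str
    symbol: str
    obj: str
    msg: str


def parse_pylint_out(text: str) -> List[Message]:
    """Parse pylint.out, skipping lines such as '************* Module ...'."""
    messages = []
    for raw in text.splitlines():
        match = MESSAGE_RE.match(raw)
        if match:
            messages.append(Message(match["path"], int(match["line"]), match["msg_id"],
                                    match["symbol"], match["obj"], match["msg"]))
    return messages


def new_symbols(master: List[Message], feature: List[Message]) -> Counter:
    """Message symbols that occur more often in the feature branch than in master."""
    return Counter(m.symbol for m in feature) - Counter(m.symbol for m in master)


sample = ("************* Module WMCore.Foo\n"
          "src/python/WMCore/Foo.py:12: [W0612(unused-variable), Foo.bar] Unused variable 'x'\n")
print(parse_pylint_out(sample)[0].symbol)  # unused-variable
```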