
Automated regression tests against real-world projects (GSoC 2018)

Binguo Bao

GSoC 2018 Final Work Product


Timeline

Community Bonding Period

  • Learning Ruby and RubyGems
  • Working with the mentor on refining the project plan, e.g. the preliminary design of the tool and the structure of the baseline
  • Investigating the feasibility of various options, e.g. generating an xref file or simply referring to wherever the standard project is hosted

First coding period (May 15, 2018 - June 15, 2018)

  • End-of-period goal: Refactoring the script into a gem + getting the original HTML diff report
  • Deliverables: A flexible and easily extensible code organization structure + the original HTML diff report
  • Details
    • Week1: Further design of the tool's structure + Determining the list of standard projects with specific branches/tags + Refactoring the script into the gem structure
      Status: Determining the list of standard projects was deferred until the diff report could be generated.
    • Week2: Adding test cases for every module + Executing system commands more securely (right now I'm just using backquotes in Ruby to execute commands)
      Status: Finished
    • Week3: Adding the ability to upload and download the baseline + Using Bundler for dependency management + Investigating Ruby unit testing and mocking frameworks + Checking Ruby code style with RuboCop
      Status: Uploading and downloading the baseline was delayed to the fifth week of the first coding period.
    • Week4: Finishing the mode option of the tool + the test cases for the mode option
      Status: Finished all modes except 'online' mode, which involves downloading the baseline. Downloading the baseline is to be completed in the second coding period.
    • Week5: As a buffer in case the previous plan cannot be completed on time.
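
The Week2 goal of executing system commands more securely can be illustrated with Ruby's standard `open3` library. This is only a minimal sketch, not the gem's actual code: the helper name `run_command` is hypothetical. Passing the program and each argument as separate strings avoids handing a command line to the shell, which is the main risk with backquotes when arguments such as branch names come from outside.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper (an assumption, not pmdtester's API).
# With several separate arguments, Open3 starts the program directly,
# so shell metacharacters inside an argument are not interpreted.
def run_command(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  raise "#{argv.first} failed (#{status.exitstatus}): #{stderr}" unless status.success?

  stdout
end

# The last argument contains '; echo injected', which backquotes would pass
# to the shell as a second command. Here it reaches the child process verbatim.
puts run_command(RbConfig.ruby, '-e', 'print ARGV.first', 'master; echo injected')
```

Unlike backquotes, this form also exposes the exit status and stderr, so a failed command can stop the run with a useful message.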

Second coding period (June 16, 2018 - July 13, 2018)

  • End-of-period goal: Polishing the diff report, create initial baseline, online mode, release pmdtester. Optional: generating configuration dynamically
  • Deliverables: The polished diff report, working online mode, a documented way to update the baseline, pmdtester gem. Optional: The module to generate configuration dynamically
  • Details
    • Week1:
      • Adding more details to the diff report e.g. the time spent executing PMD on each project, timestamp and information about base and patch
      • Specify baseline format (zip archive, included files) and how to create the initial baseline. Baselines probably will exist here: https://sourceforge.net/projects/pmd/files/pmd-regression-tester/
      • Online mode (download the baseline and create diff report)
    • Week2:
      • Continue online mode implementation
      • Verify that the links to the sources of the projects work (make sure there is a test case)
    • Week3 & Week4:
      • Parsing the changes in PMD: e.g. which rules are affected
      • Generating configuration dynamically: a rule set with only the changed rules instead of all-java.xml
      • Integrating with the baseline: do we need special baselines or can we reuse existing baselines?
      • Some complex parsing may be skipped if time is short.

Third coding period (July 14, 2018 - August 14, 2018)

  • End-of-period goal: Deploy the report to the public and run the Travis job successfully
  • Deliverables: The Dangerfile + the complete Travis job + documentation page
  • Details
    • Week1: Releasing pmdtester + Looking into creating baseline scripts (PR for pmd/pmd) + Learning Danger
    • Week2: Integrating the tool into Travis CI + Uploading the diff report to chunk.io + Running Danger for pull requests
    • Week3: Creating PRs that deliberately break something in PMD, and that we won't merge, to test pmdtester + Fixing issues and bugs
    • Week4: Writing documentation of the regression tester tool in https://pmd.github.io/pmd/ under "Developer Documentation"

The project

Open Issues

Issues for this project and beyond...

  • We currently execute PMD only on the source code of the standard projects, but specifying the classpath for the libraries used by that source code is also crucial for type resolution. See #48
  • PRs that add new rules need a special regression test later, since new rules introduce new configurations that the old version of PMD cannot support.
  • PmdTester currently runs single-threaded; it could be improved by using multiple threads to generate diff reports. See #46
  • Where to host the diff report for each PR? (It should be free for open source projects)
  • Support other languages. Current focus is Java.
  • Support comparing the two stack traces of a PMDException and highlighting the differences between them, so that it's easier to see what the diff is. Partially addressed with #39. See #49

Proposed approach

The project structure

pmd-regression-tester  
├── bin  
│   └── pmd-regression-tester  <- command-line interface  
├── config  
│   ├── all-java.xml  
│   ├── SOME_PMD_CONFIG.xml  
│   └── projectsList.txt  <- contains list of standard projects  
├── lib  
│   ├── builders  
│   │   ├── pmd_config_builder.rb  <- Builds dynamic PMD configs based on the change in patch  
│   │   ├── pmd_report_builder.rb  <- Builds xml PMD reports on standard projects in the list  
│   │   ├── diff_builder.rb  <- Compares base and patch PMD xml reports and generates intermediate diff results  
│   │   ├── html_report_builder.rb  <- Builds diff html reports based on intermediate results  
│   │   ├── link_builder.rb  <- Builds the link that points to the line of the source file that causes the violation  
│   │   └── summary_report_builder.rb  <- Aggregates diff reports for all projects  
│   ├── parsers  
│   │   ├── options.rb  <- Parses CLI options  
│   │   └── projects_parser.rb  <- Parses projectsList.txt  
│   ├── mode  
│   │   ├── online.rb <- The mode downloads PMD reports of master/base branch rather than generating it locally  
│   │   ├── local.rb  <- The mode generates both base and patch reports locally  
│   │   └── single.rb  <- The mode just generates patch reports  
│   ├── project.rb  <- Contains information about the standard project  
│   ├── runner.rb  <- Knits all libraries  
│   └── violation.rb  <- Contains information about the pmd violation  
├── resource  
│   └── css  
│       ├── maven-base.css  
│       └── maven-theme.css  
├── test  
├── LICENSE  
└── README.md  
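
To make the role of diff_builder.rb more concrete, here is a minimal sketch of how base and patch violations could be compared. It is an illustration under assumptions, not the gem's implementation: the `Violation` fields and the method name `diff_violations` are hypothetical, and the file, line, and rule values in the usage are made up.

```ruby
# Hypothetical value object; the real violation.rb may hold different fields.
Violation = Struct.new(:file, :line, :rule, :message)

# Returns the violations that disappeared with the patch and those it introduced.
# Struct instances compare by value, so Array#- can be used for the set difference.
def diff_violations(base, patch)
  { removed: base - patch, added: patch - base }
end

# Illustrative data only.
base  = [Violation.new('Foo.java', 10, 'UnusedPrivateField', 'Avoid unused private fields'),
         Violation.new('Bar.java', 42, 'GodClass', 'Possible God Class')]
patch = [Violation.new('Foo.java', 10, 'UnusedPrivateField', 'Avoid unused private fields'),
         Violation.new('Baz.java', 7, 'GodClass', 'Possible God Class')]

diff = diff_violations(base, patch)
puts "removed: #{diff[:removed].map(&:file)}, added: #{diff[:added].map(&:file)}"
```

A violation that merely moves to another line shows up here as one removed and one added entry; whether that is desirable is a design choice the sketch leaves open.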

The format of baseline

branch_name  
├── branch_info.json  
├── config.xml  
├── STANDARD_PROJECT_NAME_1  
│   ├── report_info.json  
│   └── pmd_report.xml    
├── ......................  
│   ├── report_info.json  
│   └── pmd_report.xml    
└── STANDARD_PROJECT_NAME_n  
    ├── report_info.json  
    └── pmd_report.xml  
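
The layout above can be checked mechanically before a baseline is zipped or after it is downloaded. The following is a minimal sketch, assuming every project folder holds `report_info.json` and `pmd_report.xml`; the method name `missing_baseline_files` and the project name `checkstyle` are hypothetical and only serve the illustration.

```ruby
require 'tmpdir'
require 'fileutils'

REQUIRED_BRANCH_FILES  = %w[branch_info.json config.xml].freeze
REQUIRED_PROJECT_FILES = %w[report_info.json pmd_report.xml].freeze

# Hypothetical check: lists the files a baseline directory is missing,
# as paths relative to the branch directory.
def missing_baseline_files(branch_dir)
  missing = REQUIRED_BRANCH_FILES.reject { |name| File.file?(File.join(branch_dir, name)) }
  projects = Dir.children(branch_dir).sort.select { |entry| File.directory?(File.join(branch_dir, entry)) }
  projects.each do |project|
    REQUIRED_PROJECT_FILES.each do |name|
      missing << File.join(project, name) unless File.file?(File.join(branch_dir, project, name))
    end
  end
  missing
end

# Build an incomplete baseline in a temporary folder and report what is missing.
Dir.mktmpdir do |root|
  branch = File.join(root, 'master')
  FileUtils.mkdir_p(File.join(branch, 'checkstyle'))
  FileUtils.touch(File.join(branch, 'branch_info.json'))
  FileUtils.touch(File.join(branch, 'checkstyle', 'pmd_report.xml'))
  p missing_baseline_files(branch)
end
```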

Example

Install

git clone https://github.com/pmd/pmd
git clone https://github.com/pmd/pmd-regression-tester.git
cd pmd-regression-tester
bundle install

Run local mode

# requires pmd-regression-tester to be installed first (see Install)
bundle exec bin/pmdtester -r ../pmd -b master -p pmd_releases/6.4.0 -c config/all-java.xml -l config/project-list.xml

Run online mode

# requires pmd-regression-tester to be installed first (see Install)
bundle exec bin/pmdtester -r ../pmd -b master -p pmd_releases/6.4.0 -m online

Generate baseline

# requires pmd-regression-tester to be installed first (see Install)
bundle exec bin/pmdtester -r ../pmd -p master -c config/all-java.xml -l config/project-list.xml -m single -f
cd target/reports
zip -q -r master-baseline.zip master/

Output

The tool creates the following folders:

target  
├── repositories         <- the analyzed projects are cloned here  
│   ├── PROJECT_NAME_1  
│   ├── ......  
│   └── PROJECT_NAME_n  
└── reports  
    ├── BASE_BRANCH_NAME      <- the base baseline is placed here  
    ├── PATCH_BRANCH_NAME      <- the patch baseline is placed here  
    └── diff  
        ├── index.xml <- the summary report of diff reports  
        ├── base_config.xml  <- the resources of the summary report  
        ├── patch_config.xml  <- the resources of the summary report  
        ├── css  <- css resources are placed here  
        ├── PROJECT_NAME_1  
        │   └── index.xml   <- the diff report of PROJECT_1  
        ├── .......  
        └── PROJECT_NAME_n  
            └── index.xml   <- the diff report of PROJECT_N  

This produces this report: https://chunk.io/adangel/2e9e4bc8f6b840459e3dfc1770d68ed6/index.html

Planning for dynamic configuration

  • Basic requirements
    Determine whether the changes between the two branches of PMD affect the Java rules
    • If there is no effect, e.g. the changes only touch docs or modify rules for other languages, PmdTester exits with code 0, which means no Java rules have been changed.
    • If some Java rules have been changed, go on to generate the configuration dynamically.
  • Dynamic configuration
    • Figure out which ruleset (e.g. the design ruleset) has been changed.
      • Java-based rules: check directory pmd/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/rule/RULESET_NAME
      • XPath-based rules: check ruleset file pmd/pmd-java/src/main/resources/category/java/RULESET_NAME.xml
      • If there are other directories (e.g. /pmd/core) related to the Java rules, and the contents of files in those directories have been changed, then we assume that all Java rules have been changed.
    • Generate the dynamic configuration from e.g. a ruleset that contains all rulesets (similar to all-java.xml): we select only the changed rulesets by name
    • Create a subset of the baseline, selecting only the violations reported by the changed rulesets
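
The plan above could be sketched roughly as follows. This is a hedged illustration, not the final design: the method names `affected_rulesets` and `dynamic_ruleset_xml` are hypothetical, the changed paths are assumed to be relative to the root of the pmd repository (as `git diff --name-only` prints them), and the default `shared_prefixes` value is an assumption standing in for the "other directories related to the Java rules".

```ruby
# Patterns derived from the two locations named in the plan,
# written relative to the repository root (an assumption).
JAVA_RULE_DIR = %r{\Apmd-java/src/main/java/net/sourceforge/pmd/lang/java/rule/([^/]+)/}
XPATH_RULESET = %r{\Apmd-java/src/main/resources/category/java/([^/]+)\.xml\z}

# Returns :all when shared code changed, otherwise the sorted names of the
# changed rulesets. An empty array means no Java rules were affected.
def affected_rulesets(changed_paths, shared_prefixes: ['pmd-core/'])
  shared = changed_paths.any? { |path| shared_prefixes.any? { |prefix| path.start_with?(prefix) } }
  return :all if shared

  changed_paths.map { |path| path[JAVA_RULE_DIR, 1] || path[XPATH_RULESET, 1] }.compact.uniq.sort
end

# Builds a ruleset file that references only the selected Java categories.
def dynamic_ruleset_xml(names)
  lines = ['<?xml version="1.0"?>',
           '<ruleset name="Dynamic ruleset" xmlns="http://pmd.sourceforge.net/ruleset/2.0.0">',
           '  <description>Only the rulesets changed by the patch</description>']
  lines += names.map { |name| %(  <rule ref="category/java/#{name}.xml"/>) }
  lines << '</ruleset>'
  lines.join("\n")
end

# Illustrative paths only.
changed = ['pmd-java/src/main/java/net/sourceforge/pmd/lang/java/rule/design/GodClassRule.java',
           'pmd-java/src/main/resources/category/java/bestpractices.xml',
           'docs/index.md']
rulesets = affected_rulesets(changed)
puts rulesets == :all ? 'use all-java.xml' : dynamic_ruleset_xml(rulesets)
```

With an empty result the tool can exit with code 0 as described in the basic requirements, and with `:all` it can fall back to the full all-java.xml configuration.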