[GSoC 2018] Automated regression tests against real world projects - pmd/pmd GitHub Wiki
Automated regression tests against real-world projects (GSoC 2018)
Timeline
Community Bonding Period
- Learning Ruby and RubyGems
- Working with the mentor on refining the project plan, e.g. the preliminary design of the tool and the structure of the baseline
- Investigating the feasibility of various options, e.g. generating an xref file or simply referring to wherever the standard project is hosted
First coding period (May 15, 2018 - June 15, 2018)
- End-of-period goal: Refactoring the script into a gem + reproducing the original HTML diff report
- Deliverables: A flexible and easily extensible code organization + the original HTML diff report
- Details
- Week1: Further designing the structure of the tool + Determining the list of standard projects with specific branches/tags + Refactoring the script into the gem structure
Status: Determining the list of standard projects was deferred until the diff report can be generated.
- Week2: Adding test cases for every module + Executing system commands more securely (currently I'm just using Ruby backquotes to execute the commands)
Status: Finished
- Week3: Adding the ability to upload and download the baseline + Using Bundler for dependency management + Investigating Ruby unit testing and mocking frameworks + Checking the Ruby code style with RuboCop
Status: Uploading and downloading the baseline is postponed to the fifth week of the first coding period.
- Week4: Finishing the mode option of the tool + the test cases for the mode option
Status: Finished all modes except 'online' mode, which involves downloading the baseline. Downloading the baseline should be completed in the second coding period.
- Week5: As a buffer in case the previous plan cannot be completed on time.
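Week2 plans to replace backquotes with a safer way of running system commands. A minimal sketch, assuming Ruby's standard `open3` library and a hypothetical helper named `run_command`: the program and each argument are passed as separate strings so no shell interprets them, and the exit status is checked explicitly. Note that `Open3.capture3` still goes through a shell when it receives a single command string containing shell metacharacters, so the multi-argument form is what matters here.

```ruby
require 'open3'

# Hypothetical helper: runs a command without a shell. Because the program and
# its arguments are separate strings, characters such as ';' or '&&' reach the
# program verbatim instead of being interpreted by a shell.
def run_command(*args)
  stdout, stderr, status = Open3.capture3(*args)
  # Fail loudly on a non-zero exit code instead of silently continuing
  unless status.success?
    raise "Command failed (#{status.exitstatus}): #{args.join(' ')}\n#{stderr}"
  end

  stdout
end

# Illustrative call (path and tag taken from the examples on this page):
# run_command('git', '-C', '../pmd', 'checkout', 'pmd_releases/6.4.0')
```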
Second coding period (June 16, 2018 - July 13, 2018)
- End-of-period goal: Polishing the diff report, creating the initial baseline, implementing online mode, releasing pmdtester. Optional: generating the configuration dynamically
- Deliverables: The polished diff report, a working online mode, a documented way to update the baseline, the pmdtester gem. Optional: the module that generates the configuration dynamically
- Details
- Week1:
- Adding more details to the diff report e.g. the time spent executing PMD on each project, timestamp and information about base and patch
- Specify the baseline format (zip archive, included files) and how to create the initial baseline. Baselines will probably be hosted here: https://sourceforge.net/projects/pmd/files/pmd-regression-tester/
- Online mode (download the baseline and create diff report)
- Week2:
- Continue online mode implementation
- Verify that the links to the sources of the projects work (make sure there is a test case)
- Week3 & Week4:
- Parsing the changes in PMD, e.g. which rules are affected
- Generating the configuration dynamically: a ruleset with only the changed rules instead of all-java.xml
- Integrating with the baseline: do we need special baselines or can we reuse the existing ones?
- Some complex parsing may be skipped if time is short.
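Week2 asks for a test case that checks the links to the project sources. The sketch below shows one way such a link could be built so that it is easy to test. The method name `build_source_link`, the `webview_url` parameter (an assumed base URL of the project's sources at the analyzed branch/tag) and the `#L<line>` anchor, which follows GitHub's line-link convention, are all assumptions rather than the tool's actual API.

```ruby
require 'pathname'

# Hypothetical link builder: converts the path of a violating file inside the
# local clone (target/repositories/PROJECT_NAME) into a link to the same line
# in a web view of the project's sources.
def build_source_link(webview_url, local_repo_dir, file_path, line)
  relative = Pathname.new(file_path).relative_path_from(Pathname.new(local_repo_dir)).to_s
  # Reject files that do not live inside the project's clone
  if relative == '..' || relative.start_with?('../')
    raise ArgumentError, "#{file_path} is outside #{local_repo_dir}"
  end

  "#{webview_url.chomp('/')}/#{relative}#L#{line}"
end
```

For example, with `webview_url` set to `https://github.com/example/project/blob/v1.0` and the clone at `/work/target/repositories/project`, a violation in `/work/target/repositories/project/src/Foo.java` on line 42 links to `https://github.com/example/project/blob/v1.0/src/Foo.java#L42`.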
Third coding period (July 14, 2018 - August 14, 2018)
- End-of-period goal: Publish the report and run the Travis CI job successfully
- Deliverables: The Dangerfile + the complete Travis CI job + documentation page
- Details
- Week1: Releasing pmdtester + looking into scripts for creating the baseline (PR for pmd/pmd) + learning Danger
- Week2: Integrating the tool into Travis CI + uploading the diff report to chunk.io + running Danger for pull requests
- Week3: Explicitly creating PRs, not meant to be merged, that deliberately break something in PMD to test pmdtester + fixing issues and bugs
- Week4: Writing documentation of the regression tester tool in https://pmd.github.io/pmd/ under "Developer Documentation"
The project
Open Issues
Issues for this project and beyond...
- Currently we run PMD only on the source code of the standard projects, but specifying the classpath of the libraries used by that code is also crucial for type resolution. See #48
- PRs that add new rules need a special regression test later, since new rules introduce configurations that older PMD versions cannot support.
- PmdTester currently supports only single-threaded execution; it could be improved by generating the diff reports with multiple threads. See #46
- Where to host the diff report for each PR? (It should be free for open source projects)
- http://chunk.io/
- http://curldu.mp/
- https://surge.sh/
- a dedicated github repo
- https://bintray.com/
- GCS (there is an "Always Free Usage" option...)
- Support other languages. Current focus is Java.
- Support comparing the two stack traces of a PMDException and highlighting the differences between them, which makes it easier to see what the diff is. Partially addressed with #39. See #49
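For issue #46, one possible approach is a small pool of worker threads that take projects from a shared queue. This is a sketch under the assumption that each unit of work mostly waits on an external PMD process, so Ruby threads can overlap those waits despite the global VM lock. `run_in_parallel` is a hypothetical name, not part of PmdTester.

```ruby
# Hypothetical helper (not part of PmdTester): calls the block for every
# project using at most `workers` threads and returns { project => result }.
def run_in_parallel(projects, workers: 4, &block)
  queue = Queue.new
  projects.each { |project| queue << project }
  queue.close # once drained, pop returns nil and the workers stop

  results = {}
  lock = Mutex.new
  threads = Array.new([workers, projects.size].min) do
    Thread.new do
      while (project = queue.pop)
        result = block.call(project) # e.g. run PMD on this project
        lock.synchronize { results[project] = result }
      end
    end
  end
  threads.each(&:join) # join also re-raises an exception raised in a worker
  results
end
```

The mutex guards the shared result hash; the number of threads is capped by the number of projects so no idle threads are started.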
Proposed approach
The project structure
pmd-regression-tester
├── bin
│ └── pmd-regression-tester <- command-line interface
├── config
│ ├── all-java.xml
│ ├── SOME_PMD_CONFIG.xml
│ ├── projectsList.txt <- contains list of standard projects
├── lib
│ ├── builders
│ │ ├── pmd_config_builder.rb <- Builds dynamic PMD configs based on the changes in the patch
│ │ ├── pmd_report_builder.rb <- Builds PMD XML reports for the standard projects in the list
│ │ ├── diff_builder.rb <- Compares the base and patch PMD XML reports and generates intermediate diff results
│ │ ├── html_report_builder.rb <- Builds the HTML diff reports from the intermediate results
│ │ ├── link_builder.rb <- Builds links pointing to the line of the source file that causes a violation
│ │ └── summary_report_builder.rb <- Aggregates the diff reports of all projects
│ ├── parsers
│ │ ├── options.rb <- Parses CLI options
│ │ └── projects_parser.rb <- Parses projectsList.txt
│ ├── mode
│ │ ├── online.rb <- Downloads the PMD reports of the master/base branch instead of generating them locally
│ │ ├── local.rb <- Generates both base and patch reports locally
│ │ └── single.rb <- Generates only the patch reports
│ ├── project.rb <- Contains information about a standard project
│ ├── runner.rb <- Ties all the libraries together
│ └── violation.rb <- Contains information about a PMD violation
├── resource
│ └── css
│ ├── maven-base.css
│ └── maven-theme.css
├── test
├── LICENSE
└── README.md
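At its core, `diff_builder.rb` compares two collections of violations. A minimal sketch, assuming each violation has already been parsed from the XML report into a hypothetical `Violation` struct (file, line, rule, message): because base and patch are run against the same branch/tag of each standard project, a violation can be matched exactly, and plain set difference yields the new and removed violations.

```ruby
require 'set'

# Hypothetical model of one PMD violation; the fields loosely follow the
# attributes of a <violation> element in PMD's XML report.
Violation = Struct.new(:file, :line, :rule, :message)

# Violations only in the patch report are "new", those only in the base
# report are "removed". Structs compare by value, so Set difference works.
def diff_violations(base, patch)
  base_set = base.to_set
  patch_set = patch.to_set
  {
    new: (patch_set - base_set).to_a,
    removed: (base_set - patch_set).to_a
  }
end
```

One trade-off of exact matching: two identical violations on the same line collapse into one set entry, which is acceptable for a diff summary but would need a multiset if counts matter.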
The format of baseline
branch_name
├── branch_info.json
├── config.xml
├── STANDARD_PROJECT_NAME_1
│ ├── report_info.json
│ └── pmd_report.xml
├── ......................
│ ├── report_info.json
│ └── pmd_report.xml
└── STANDARD_PROJECT_NAME_n
├── report_info.json
└── pmd_report.xml
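This page does not define the contents of `report_info.json`. Assuming it holds the details planned for the diff report (the time spent executing PMD on a project and a timestamp), writing and reading it could look like the sketch below; the method names and the keys `execution_time` and `timestamp` are hypothetical.

```ruby
require 'fileutils'
require 'json'
require 'time'

# Hypothetical sketch: times the PMD run performed by the block and stores
# the result as PROJECT_DIR/report_info.json (key names are assumptions).
def write_report_info(project_dir)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield # e.g. run PMD here and write pmd_report.xml into project_dir
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  info = {
    'execution_time' => elapsed.round(3), # seconds
    'timestamp' => Time.now.utc.iso8601   # when the run finished
  }
  FileUtils.mkdir_p(project_dir)
  File.write(File.join(project_dir, 'report_info.json'), JSON.pretty_generate(info))
  info
end

# Reads the file back, e.g. when rendering the diff report
def read_report_info(project_dir)
  JSON.parse(File.read(File.join(project_dir, 'report_info.json')))
end
```

The monotonic clock is used for the duration because, unlike `Time.now`, it is not affected by system clock adjustments during a long PMD run.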
Example
Install
git clone https://github.com/pmd/pmd
git clone https://github.com/pmd/pmd-regression-tester.git
cd pmd-regression-tester
bundle install
Run local mode
# after installing pmd-regression-tester (see Install)
bundle exec bin/pmdtester -r ../pmd -b master -p pmd_releases/6.4.0 -c config/all-java.xml -l config/project-list.xml
Run online mode
# after installing pmd-regression-tester (see Install)
bundle exec bin/pmdtester -r ../pmd -b master -p pmd_releases/6.4.0 -m online
Generate baseline
# after installing pmd-regression-tester (see Install)
bundle exec bin/pmdtester -r ../pmd -p master -c config/all-java.xml -l config/project-list.xml -m single -f
cd target/reports
zip -q -r master-baseline.zip master/
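The flags used in these examples could be handled in `options.rb` with Ruby's standard `OptionParser`. This is a sketch only: the long option names, the default `local` mode and the required-option rules are assumptions, and `-f` is left out because this page does not say what it does (so this sketch would reject the baseline command above).

```ruby
require 'optparse'

MODES = %w[local online single].freeze

# Hypothetical sketch of options.rb; long names and defaults are assumptions.
def parse_options(argv)
  options = { mode: 'local' }
  OptionParser.new do |opts|
    opts.banner = 'Usage: pmdtester [options]'
    opts.on('-r', '--local-git-repo PATH', 'Path to the local PMD clone') { |v| options[:repo] = v }
    opts.on('-b', '--base-branch NAME', 'Base branch or tag') { |v| options[:base] = v }
    opts.on('-p', '--patch-branch NAME', 'Patch branch or tag') { |v| options[:patch] = v }
    opts.on('-c', '--config FILE', 'PMD ruleset configuration') { |v| options[:config] = v }
    opts.on('-l', '--list-of-project FILE', 'List of standard projects') { |v| options[:projects] = v }
    # Passing MODES makes OptionParser reject unknown modes by itself
    opts.on('-m', '--mode MODE', MODES, "One of: #{MODES.join(', ')}") { |v| options[:mode] = v }
  end.parse(argv)

  # local and online modes compare two branches; single mode needs only the patch
  required = options[:mode] == 'single' ? %i[repo patch] : %i[repo base patch]
  missing = required.reject { |key| options.key?(key) }
  raise OptionParser::MissingArgument, missing.join(', ') unless missing.empty?

  options
end
```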
Output
The tool creates the following folders:
target
├── repositories <- the analyzed projects are cloned here
│ ├── PROJECT_NAME_1
│ ├── ......
│ └── PROJECT_NAME_n
└── reports
├── BASE_BRANCH_NAME <- the base baseline is placed here
├── PATCH_BRANCH_NAME <- the patch baseline is placed here
└── diff
├── index.html <- the summary report of all diff reports
├── base_config.xml <- resources of the summary report
├── patch_config.xml <- resources of the summary report
├── css <- CSS resources are placed here
├── PROJECT_NAME_1
│ └── index.html <- the diff report of PROJECT_NAME_1
├── .......
└── PROJECT_NAME_n
└── index.html <- the diff report of PROJECT_NAME_n
This produces a report like this one: https://chunk.io/adangel/2e9e4bc8f6b840459e3dfc1770d68ed6/index.html
Planning for dynamic configuration
- Basic requirements
Determine whether the changes between the two PMD branches affect the Java rules.
- If not (e.g. the changes only touch docs or rules for other languages), PmdTester exits with code 0, which means that no Java rules have changed.
- If some Java rules have changed, generate the configuration dynamically.
- Dynamic configuration
- Figure out which rulesets (e.g. the design ruleset) have changed.
- Java-based rules: check directory
pmd/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/rule/RULESET_NAME
- XPath-based rules: check ruleset file
pmd/pmd-java/src/main/resources/category/java/RULESET_NAME.xml
- If files in other directories related to the Java rules (e.g. /pmd/core) have changed, we assume that all Java rules have changed.
- Generate the dynamic configuration from a ruleset that references all rulesets (similar to all-java.xml), selecting only the changed rulesets by name
- Create a subset of the baseline that contains only the violations reported by the changed rulesets
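The planned change detection and ruleset generation could be sketched as follows. Assumptions: the input is the list of changed paths relative to the PMD repository root (for example from `git diff --name-only BASE PATCH`), any other change under `pmd-core/src/main/` or `pmd-java/src/main/` is treated as affecting all Java rules, and the generated file references PMD 6 Java categories such as `category/java/design.xml`. All method and constant names are hypothetical.

```ruby
require 'set'

JAVA_RULE_DIR = %r{\Apmd-java/src/main/java/net/sourceforge/pmd/lang/java/rule/([^/]+)/}
JAVA_CATEGORY_FILE = %r{\Apmd-java/src/main/resources/category/java/([^/]+)\.xml\z}
JAVA_CATEGORIES = %w[bestpractices codestyle design documentation errorprone
                     multithreading performance security].freeze
# Assumption: other changes here may affect every Java rule
SHARED_PREFIXES = %w[pmd-core/src/main/ pmd-java/src/main/].freeze

# Returns :all, or the set of changed ruleset names (empty: no Java rule affected)
def changed_rulesets(paths)
  rulesets = Set.new
  paths.each do |path|
    match = JAVA_RULE_DIR.match(path) || JAVA_CATEGORY_FILE.match(path)
    if match && JAVA_CATEGORIES.include?(match[1])
      rulesets << match[1]
    elsif match || SHARED_PREFIXES.any? { |prefix| path.start_with?(prefix) }
      return :all # unknown rule package or shared code
    end
    # anything else (docs, other languages, tests) is ignored
  end
  rulesets
end

# Builds a ruleset that references only the changed categories
def dynamic_ruleset_xml(rulesets)
  refs = rulesets.sort.map { |name| %(  <rule ref="category/java/#{name}.xml"/>) }
  <<~XML
    <?xml version="1.0"?>
    <ruleset name="Dynamic PmdTester Ruleset"
        xmlns="http://pmd.sourceforge.net/ruleset/2.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 https://pmd.sourceforge.io/ruleset_2_0_0.xsd">
      <description>Only the rulesets changed by the patch</description>
    #{refs.join("\n")}
    </ruleset>
  XML
end
```

A caller could exit with code 0 when `changed_rulesets` returns an empty set, fall back to all-java.xml on `:all`, and otherwise write the generated XML to a file and pass it with `-c`.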