Roadmap and future directions - pmd/pmd GitHub Wiki

:warning: This document is very outdated and it needs to be reviewed - [doc] Update roadmap document #4324


Comments/Ideas taken from #3898 on 2023-01-12:

These are enhancements that we could perform later than PMD 7.0.0 and are not necessary for the release. They might be "nice to haves" should we have time to look into them (for some of them @oowekyala has PoCs in his branches).

  • Support groovy rules
  • Virtual file system (VFS) to back TextFile, would be useful in particular for CPD and renderers
    • We need to get rid of Path: e.g. Paths.get(textFile.getDisplayName()) will be wrong in many cases, and many renderers do that
    • I added 2 methods to TextFile to represent different concerns (getDisplayName for rendering, getPathId for identity and caching)
    • These are still interpreted as Paths in some places which is inappropriate. They are strings so expose no "containment" information, which is required by some renderers.
    • A VFS with some abstract "Paths" could fix this situation, and also enable #3131 easily
  • Caching improvements (temp dir <-> zip file rotation for caching, give renderers a sandbox to render pieces of the report concurrently)
  • AST-based copy paste detection (I implemented that but expect it to work better when we're further ahead with the CPD refactoring)
  • SSA (static single assignment) representation for Java DFA
  • Concurrent renderers ("rendering server" thread)
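
To illustrate the Path problem from the VFS item above, here is a minimal sketch. `SketchTextFile` is a hypothetical stand-in, not PMD's real TextFile API; only the method names getDisplayName and getPathId come from the notes above. The `zip!/entry` identifier format is an assumption made for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-in for a text file; NOT PMD's real TextFile interface. */
interface SketchTextFile {
    /** Human-readable name meant for rendering only. */
    String getDisplayName();

    /** Stable identifier meant for identity and caching. */
    String getPathId();
}

class DisplayNameDemo {

    /** A file that lives inside an archive, so it has no plain filesystem path. */
    static SketchTextFile zipEntry() {
        return new SketchTextFile() {
            @Override
            public String getDisplayName() {
                return "Foo.java (inside sources.zip)";
            }

            @Override
            public String getPathId() {
                // Assumed identifier format, for illustration only.
                return "sources.zip!/src/main/java/Foo.java";
            }
        };
    }

    /** What a renderer does when it treats the display name as a path. */
    static boolean displayNameResolvesToFile(SketchTextFile file) {
        Path guessed = Paths.get(file.getDisplayName());
        return Files.isRegularFile(guessed);
    }

    public static void main(String[] args) {
        SketchTextFile file = zipEntry();
        System.out.println("display name: " + file.getDisplayName());
        System.out.println("path id:      " + file.getPathId());
        System.out.println("resolves to a file? " + displayNameResolvesToFile(file));
    }
}
```

Both values are plain strings, so neither tells a renderer which archive or directory contains the file. That missing "containment" information is what an abstract VFS path could provide.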

This is taken from future.md, enriched with the mailing list discussions and moved to the wiki here, so that it hopefully becomes a living document...


Want to know what's coming? Or, better, want to contribute? This page lists our plans for the future of PMD, when we have any. It also gives you hints about parts of the code we would like to clean up, which you may want to tackle as a contribution to the project!

Of course, an easy way to contribute is to check out the GitHub bug tracker or the SourceForge bug tracker and see if you can fix some issues. Some could be quite easy; we simply do not have the time to look at them all!

Finally, if you want to contribute, register on the pmd-devel mailing list and come discuss with us! See also CONTRIBUTING.md.

Roadmap

This roadmap contains the bigger ideas that PMD's developers have in mind right now. Some topics are already in the works, some can be picked up. See also Project Ideas [Inception] and Project Ideas [Mature].

The topics listed here will eventually be moved to the Project Ideas pages.

Please note that, of course, there is no warranty about when those 'features' will be finished, if they ever are.

Mature descriptions available:

Inception phase:

Archive / Taken:

Single Jar release

First, would it be possible to release a single jar file? The recent change to distribute PMD in 30+ separate jar files is a pain for me to repackage and distribute with the plugin. I've written an ant file that unjars them, adjusts the CPD services files, and rejars them all into a single jar file. At the moment, I'm not including the Scala files: without them, the combined jar file is about 5.5 MB, the Scala files would add about another 20 MB, and it's not a really popular language for jEdit users.

That would definitely be possible. This would be an "uber"-jar and could be generated by the pmd-dist module along with the current pmd-bin-*.zip file. The question, however, is this: which language modules should be included? If we include all languages, then it will become huge, as you pointed out. We could create several flavors: pmd-all, pmd-all-without-scala, pmd-"most common", ...

Another possible packaging could be those languages that have both PMD and CPD support, that is, leave out those that only have CPD support.

Rule Ideas

  • Do we have a rule to style check for multiple declarations and chained assignments? (e.g. int a, b; int a = b = x;)

  • java-design: PositionalIteratorRule. This rule seems to be not fully implemented yet. It would detect multiple calls to Iterator.next() within a loop. See PositionalIteratorRule.java and testdata.

  • java-design: TooManyHttpFilter. Ensure that not more than x Servlet filters are in the code. Only testdata is available.

  • java-basic: AvoidUsingHardCodedURL. Similar to AvoidUsingHardCodedIP.

  • testng rules: #1447 Provide TestNG rules similar to the existing JUnit rules

  • I'm thinking about VariableNamingConventions rule, which could be improved by using regexes instead of the current prefix/suffix (and underdocumented) properties. We could maybe even split the rule into one rule for fields and one for local variables/parameters

  • In general, the Code Style rules could use a brushing up. There are some rules that could be merged (e.g. If and IfElse braces), and there is redundancy, e.g. MIsLeadingVariableName (which has 2 typos btw), which is a special case of VariableNamingConventions. SuspiciousConstantFieldName too, even though it seems backwards.

  • ClassNamingConventions is not configurable, it could use regexes for different class types too.

  • Android: ProtectLogD and ProtectLogV. Similar to GuardLogStatement, but for Android

    // good
    public class MyActivity extends Activity {
        protected void foo() {
            if (Config.LOGD) {
                Log.d("Tag", "msg 1");
            }
            bar();
            if (Config.LOGD) Log.d("Tag", "msg 2");
            bar();
            if (Config.LOGD && OTHER_DEBUG_FLAG) {
                Log.d("Tag", "msg 3");
            }
        }
        protected void foo2() {
            if (Config.LOGV) {
                Log.v("Tag", "msg 1");
            }
            bar();
            if (Config.LOGV) Log.v("Tag", "msg 2");
            bar();
            if (Config.LOGV && OTHER_DEBUG_FLAG) {
                Log.v("Tag", "msg");
            }
        }
    }
    
    // not good
    public class MyActivity extends Activity {
        protected void foo() {
            Log.d("Tag", "msg");
            bar();
        }
        protected void foo2() {
            Log.v("Tag", "msg");
            bar();
        }
    }
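
As an illustration of the PositionalIteratorRule idea listed above (multiple calls to Iterator.next() within a loop), here is a minimal sketch of the kind of code such a rule might flag. It is not taken from PositionalIteratorRule.java or its testdata; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PositionalIteratorDemo {

    /** Suspicious: two next() calls per hasNext() check. */
    static List<String> keepNonEmptySuspicious(List<String> input) {
        List<String> kept = new ArrayList<>();
        Iterator<String> it = input.iterator();
        while (it.hasNext()) {
            if (!it.next().isEmpty()) { // first call consumes the element being tested
                kept.add(it.next());    // second call consumes the FOLLOWING element
            }
        }
        return kept;
    }

    /** Intended behavior: call next() once per iteration and reuse the value. */
    static List<String> keepNonEmptyFixed(List<String> input) {
        List<String> kept = new ArrayList<>();
        Iterator<String> it = input.iterator();
        while (it.hasNext()) {
            String element = it.next();
            if (!element.isEmpty()) {
                kept.add(element);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d");
        System.out.println(keepNonEmptySuspicious(input)); // prints [b, d]
        System.out.println(keepNonEmptyFixed(input));      // prints [a, b, c, d]
    }
}
```

With an odd number of non-empty elements, the suspicious variant would additionally throw a NoSuchElementException on the last iteration.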
    

Further thoughts

These are food for thought, perhaps future items. If you think you'd like to work on one of these, check with pmd-devel to see what the current thoughts are on the topic.

  • CPD needs work on its use of Language. It is currently hardcoded to only handle Java 1.4. Integrate CPD needs into core PMD where appropriate. Otherwise, drive CPD behavior based off of core PMD, instead of duplicating some logic.

  • Need a more flexible and powerful scheme for classifying files to various Languages. At a minimum, we should have the ability to specify which file extensions you want to be used for a language (e.g. not everyone uses .jsp for JSP files, some use .jspx, .xhtml, etc.). Also, consider hooks into the LanguageVersionDiscoverer process for classifying a File/String to a LanguageVersion of a specific Language. One could imagine using a 'magic' system like Unix uses to tell different versions of files apart based on actual content.

  • Should we change the Node interface to something like 'Node<T extends Node<T>>', and then declare the language specific node interfaces to be something like 'JavaNode extends Node<JavaNode>'? This could allow anything on the Node interface to return the language specific node type instead of the generic node. For example, ASTStatement.jjtGetParent() would return a JavaNode instead of a Node. This is a rather huge change, as the Node interface is one of the pervasive things in the PMD code base. Is the extra work of using the Node interface properly with generics worth avoiding some occasional casts?

  • Should multiple Languages be able to claim a single source file? Imagine an XML-format JSP file, for which you've defined a ruleset which uses JSP and XML rules. Stating that certain XML rules can also map to the JSP language extensions could be useful. This means the source file to LanguageVersion mapping is not 1-1 but 1-many, and we'd need to deal with this accordingly.

  • Additional changes to Rule organization within RuleSets as discussed on this forum thread: Separate Rule and RuleRefs, e.g. categories only contain rules, all other rulesets only contain rule refs. But what about custom rules?

  • Figure out a way to allow Rules to deal with parentheses and blocks, which introduce certain repetitive (and generally ignorable for most Rules) structures into the AST. Some rules are making a special effort (e.g. ConfusingTernaryRule) to detect these AST patterns. Perhaps a "normalized" AST structure can be created which will make the AST appear consistent regardless of how many parens are present, or how many blocks have been created (e.g. default block inserted, duplicates collapsed). This should be configurable on a per-Rule basis, similar to TypeResolution and SymbolTable.
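
The self-typed Node idea from the list above can be sketched as follows. This is a simplified, hypothetical shape and not PMD's actual Node interface; `SimpleJavaNode`, `addChild` and `getImage` are made up for the example, and only the two generic signatures and the jjtGetParent name come from the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a self-typed node interface: T is the language-specific node type. */
interface Node<T extends Node<T>> {
    T jjtGetParent();

    List<T> getChildren();
}

/** Language-specific interface: tree navigation now yields JavaNode directly. */
interface JavaNode extends Node<JavaNode> {
    String getImage();
}

/** Hypothetical minimal implementation, for illustration only. */
class SimpleJavaNode implements JavaNode {
    private final String image;
    private final List<JavaNode> children = new ArrayList<>();
    private JavaNode parent;

    SimpleJavaNode(String image) {
        this.image = image;
    }

    SimpleJavaNode addChild(SimpleJavaNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    @Override
    public JavaNode jjtGetParent() {
        return parent;
    }

    @Override
    public List<JavaNode> getChildren() {
        return children;
    }

    @Override
    public String getImage() {
        return image;
    }
}

class SelfTypedNodeDemo {
    public static void main(String[] args) {
        SimpleJavaNode block = new SimpleJavaNode("Block");
        SimpleJavaNode statement = new SimpleJavaNode("Statement");
        block.addChild(statement);

        // No cast needed: the parent is statically typed as JavaNode.
        JavaNode parent = statement.jjtGetParent();
        System.out.println(parent.getImage()); // prints Block
    }
}
```

The cost hinted at in the question is visible even here: every declaration that mentions Node must now carry a type argument, which is what makes the change pervasive.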

Performance

It would be great to have some performance tests that would automatically signal if PMD runs slower.

Performance is a feature. Fast & accurate reports mean high value for low cost. There is great unlocked potential to improve PMD’s performance. In particular, at least for Java, there are 2 great areas of improvement:

  1. Parser: The lookahead on several rules of the grammar ends up scanning too far ahead before committing to a parse (I found and fixed a couple of these already). Unfortunately, finding these and improving the grammar is hard. It requires enabling DEBUG_LOOKAHEAD for JavaCC and reading the crappy output it prints to decipher what’s going on.

  2. Type Resolution: Even though 5.5.2 is significantly faster, it’s still the slowest step when analyzing large projects and I believe there is still room for improvement. This one has so far proved to be easier, just requiring a profiler and some patience.

As for performance testing, I love the idea. Doing it, however, is non-trivial since accurate measurements require:

  • to account for the JVM warmup when measuring both baseline and current versions
  • to count on reliable hardware where CPU / disk availability are not an issue
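
The warmup requirement above can be sketched with a naive timing helper. This is only an illustration under stated assumptions: `WarmupTimer`, its method names and the tolerance parameter are hypothetical and not part of PMD. For real measurements, a dedicated harness such as JMH is generally a better fit than hand-rolled timing.

```java
class WarmupTimer {

    /**
     * Runs the task warmupRuns times without timing it (to let the JIT
     * compile the hot paths), then returns the average wall-clock time
     * in nanoseconds over measuredRuns timed runs.
     */
    static long averageNanos(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long total = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return total / measuredRuns;
    }

    /** Flags a regression only if current exceeds baseline by more than the tolerance. */
    static boolean isRegression(long baselineNanos, long currentNanos, double tolerance) {
        return currentNanos > baselineNanos * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would run the baseline and the
        // current PMD versions on the same sources and the same hardware.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        long baseline = averageNanos(workload, 20, 50);
        long current = averageNanos(workload, 20, 50);
        System.out.println("regression? " + isRegression(baseline, current, 0.5));
    }
}
```

The tolerance is what absorbs noise from shared hardware. Set too low, it produces exactly the kind of false positives described for Travis below.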

I’ve done it for a Gradle plugin I wrote for static code analysis (https://github.com/monits/static-code-analysis-plugin) and suffered a lot with Travis:

  1. I had to make the Travis.org config much more complex to be able to run all tests with all individual executions under 50 minutes
  2. I have lots of false positives, where rerunning an individual build once or twice ends up passing.
  3. It takes so much time to run on Travis that it’s usually impractical to wait for it to finish on PRs.

It would be great if we were using a real CI such as a hosted Jenkins, where we could run the performance tests on a nightly build with known and dedicated hardware, but that’s wishful thinking…

Further modularization

Each language currently lives in its own Maven module. However, all the modules are released at the same time, although only a few have changes. The idea is to have a separate repository per language. There will be some challenges for releasing (e.g. which versions are compatible and work together). It would foster a more plugin-like structure for the languages. Credit for this idea goes to Joseph Allen.

Right now it’s pretty common to manually override the PMD version used by Maven / Gradle to be able to use the latest one immediately; we even do it ourselves (https://github.com/pmd/pmd/blob/master/pom.xml#L465-L475). Having mismatched versions would not only require a compatibility table, but would also make it harder to know if we are using the latest version of each module.

Even if we moved pmd-core from a top-level dependency to a transitive one (meaning, people would just say “I want pmd-swift 5.6.0”, not all the pieces), it would still cause confusion when using different pmd “flavors” throughout the same project. Think for instance of having pmd-java 5.8.3, pmd-jsp 5.6.0 and pmd-javascript 5.6.2 for a single Java webapp project.

Assuming that pmd-core is super-stable, you would be able to take pmd-core and then the latest versions of the language modules you need, and they would work flawlessly together. PMD could release a bundle 3 or 4 times a year: the binary distribution, which contains all modules. But throughout the year, we could release the single language modules as needed...

Complete XML based language modules

The idea is that PMD would understand some kind of XML (or some other format) that describes all aspects of a language module: the grammar, the AST nodes, the available rules. PMD would execute/interpret this module (or plugin). With this in place, it might be easier to add/modify a language. Credit for this idea goes to Joseph Allen.

I guess PMD would either generate code out of this "language description in XML format" at runtime/compile time or interpret this "language description" at runtime.

UI Tools

PMD currently contains a couple of UI tools (GUI for CPD, Designer, and some more). It would be nice to do a complete rewrite to provide "official" GUIs as frontends.

See also https://github.com/NutterzUK/pmd and Discussion and Mailing List

We could maybe also reenable the webstart link ... from time to time, there are users complaining that the link doesn't work on the homepage...

As for the UI tools, I do care about them. It is quite simple to integrate the rule designer as is into jEdit. From my point of view, it's okay to separate them as you need to, but back to my first comment, it would be nice to put everything into a single jar file.

jEdit currently requires Java 7. However, many of our users are running Java 8, and I suspect it is only a matter of time, and not much time, before the minimum for jEdit will be Java 8. jEdit doesn't use any JavaFX, but it's been included in the Java runtime since Java 7, so I don't think that would be a problem.

Eclipse Plugin

The question is: Keep it alive or just drop it? There is an alternative already: https://acanda.github.io/eclipse-pmd/

Question: Does this alternative plugin support multiple languages, that is, languages other than Java?

More Languages

We currently support many languages, but most of them can be used only for CPD. It would be great to have full PMD support for the existing language modules, too.