PMD 7.0.0 API - pmd/pmd GitHub Wiki

⚠️ This page is outdated. Please refer to PMD 7 Tracking Issue #3898 for the current status.


Main Issue: [core] Define clearly private and public API #995

TOC

Status

  • unclear

What's left to do

  • unclear
  • First step: clarify general principles (not null, logging, exceptions, assertions, internal vs. impl, ...)
  • Second step: break this up into smaller pieces, that means, define separate API parts. Define package names, modules.

Topics

General principles

  • Reduce published API as much as possible. The smaller the API, the more can be changed.
  • Expose abstractions as interfaces, but don't publish the implementation. That allows us to change the implementations as needed. As long as we keep the interfaces.
  • PMD core should depend on a set of interfaces that language implementations can implement however they want, provided they respect the contract of the interface.
  • The published API should not depend on another library (like Jaxen, Antlr), otherwise we won't be able to change/update that library. Because updating the library possibly changes the API then.
    • The exception to this rule is the @Nullable annotation.

Default is NonNull unless Nullable

Package Names

  • Base package is always net.sourceforge.pmd.
  • Each API part must be in a separate sub package (e.g. net.sourceforge.pmd.lang or net.sourceforge.pmd.core.lang?).
  • The .internal packages are used to hide the implementations, that are not part of the public API. This is internal and can change any time.
  • The annotation @InternalApi will be used for APIs that have to live in a public API package. This is still internal and can change any time.
  • There are also .impl packages. These contains base classes of the API (like abstract classes) and default implementations that help to implement our API. The classes in the .impl package are also published API.
    • Example: AbstractNode
  • Package names for modules: each module (e.g. pmd-core, pmd-java, pmd-xml) should use a different base package name, so that from the package name of a class one can conclude in which module the source can be found. This also helps when javadoc is generated, as a package should only be in one module.
    • Example: pmd-core -> net.sourceforge.pmd.core for "very general core APIs", but e.g. net.sourceforge.pmd.lang net.sourceforge.pmd.core.lang for language api?
    • Example: pmd-xml -> net.sourceforge.pmd.xml or net.sourceforge.pmd.lang.xml?? ...
    • General rule: net.sourceforg.pmd.<module-name without pmd- prefix>.sub-packages

Compatibility

  • Semantic Versioning:
    • major: incompatible changes require a major version change
    • minor: compatible changes, improvements are minor version change
    • patch: bugfixes only are patch version change
  • Level of support for language modules: Maintained (= supported), Incubating (= experimental), Deprecated (= will be removed, if no one steps up).
    • Removed language modules can be collected in a pmd-contrib repository, but that will remain probably unmaintained
    • When would we remove a language version? According to semver, only with the next major release.
  • Lifecycle of API:
    • Maintained (= supported) - will be available throughout the major version
    • Incubating (= @Experimental). It can change and might also be removed from one minor version to the next. Once the annotation is removed, it is promoted to being maintained.
    • Deprecated - will be removed with the next major version.
  • Beginning with PMD 7, we'll use a tool to verify compatibility from one (minor) release to the next.

Error handling / exception handling

  • Make use of org.apache.commons.lang3.exception.ContextedRuntimeException
  • Exception type hierarchy?

Glossary

TODO. Here we should describe the concepts (like what is a File, a DataSource, what is the thing, that PMD outputs, probably Violations, ...)

CLI API

Purpose:

Goal:

  • I'd like to see more GNU-style command line options, e.g. --long-options an -l (short ones).
  • Add --version option to display the version and capabilities (supported languages). See also discussion on #1913 [core] "-help" CLI option ends with status code != 0.
  • Same possibilities as with other ways to call PMD (via ant, maven, programmatically, ...). E.g. --failOnViolation needs a corresponding option to set which rule priority should fail the build (like maven's "failurePriority"). Note, this has not to be confused with -minimumpriority... (motivation for this from stackoverflow)
  • Take guidance from JEP 293: Guidelines for JDK Command-Line Tool Options.

Users:

Programmatic API to call PMD and CPD

Purpose: (Tight) Integration of PMD into other tools

Goal:

Users: CLI, maven, eclipse-plugin, ant task (gradle)

Core Concepts: DataSource, Configuration, ProgressMonitor, ...

RuleSet API

Purpose: loading, schema, ...

Goals:

  • better schema validation, e.g. clearly know which attributes are mandatory/optional. Currently 2 use cases are mixed: rule definition, rule reference.
  • Also xpath rules could use some syntactic sugar (currently they are just normal rule definition).
  • Less noisy warning/ error handling (eg hide multiple warnings behind a "your ruleset has warnings"), e.g. avoid issues like #1794 [core] Ruleset Compatibility fails with excluding rules
  • See also sf issue 1537 [core] Implement strict ruleset parsing
  • Maybe allow for multiple instances of the same rule in a ruleset. USe case could be:
    • Rule A with property X=Y and priority 1 (e.g. EmptyCatchBlock with allowCommentedBlocks=true)
    • Rule A with property X=Z and priority 3 (e.g. EmptyCatchBlock with allowCommentedBlocks=false)
    • In that case, both rules should be executed, but the reported violations have a different priority. This allows for failing the build for "serious" problems and have a report with all problems at the same time. One issue with that is of course, that the two rules overlap, so that for catch blocks with comments, two violations will be reported. We might consolidate that and just retain the violation with the highest priority.
    • See comment #2006#issuecomment-547212238.
  • Better output of what's going on: e.g. active rules, ignored rules due to minimum priority, ...
  • Reading and writing rulesets ("RuleSetWriter")
  • Make a nice Java API to get the category rulesets for a language / list built-in rules
  • getCategories(): could be a enum, we have a fixed set of categories.
  • Maybe?? support different versions of the schema - would help in transition from PMD 6 to 7.
  • sourceforge: #1466 [core] Support sharing properties across multiple rules

Documentation:

Rule API

Purpose: adding custom rule, define how rules are executed (multi file analysis, multi threading concerns, etc.)

Documentation:

Reporting API

Language Implementing API

TODO: This needs to split up into smaller chunks "Language Implementing API" is too big.

  • AST API
    • Users: rule developers
    • Note: the changes here affect all languages...
    • AST, Node, NodeStream, DataMap, @Text (textual representation of a node)
      • What methods do the super classes (AbstractNode, ...) provide? Maybe we don't need a abstract node at all, see below.
      • TODO: Describe DataMap purpose and use cases
        • Consider adding something like T getData(DataKey<T> key) to the node interface, where DataKey just ensures the visible API is strongly typed.
        • In fact, the designer already has something like that: DataHolder. Nodes should most probably contain a DataHolder of the sort.
        • That's a replacement for the current getUserData.
        • This makes the AST "extensible" externally by our internal subsystems, by rules, and by external tools (eg designer)
    • Guidelines for AST nodes:
      • should hide what's not relevant to someone examining the AST.
      • constructors and setters for the AST nodes should be package private. These are called only by the parser.
      • concrete node classes should be final
      • base classes should be hidden (and not public): no obscure public base classes, with which a node can have 10 possible types.
      • Nodes should have as shallow a class hierarchy as possible. Class hierarchies like those of AbstractJavaAccessTypeNode or MethodLikeNode and co are inflexible and too specific. LambdaExpression has a method isStrictFp because of that. I don't mind about interface hierarchies with default methods, but nodes shouldn't.
      • deprecate Node.getImage() - it is too generic. In each node, it has a different meaning (e.g. variable name, method name, ...).
        • Note: Designer Bindings -> getMainAttribute
        • also deprecate Node.hasImageEqualTo(...)
        • PLSQL has e.g. getCanonicalImage etc.
      • More useful AST navigation: e.g. a if statement has a condition, so there should be a getCondition() which returns the correct child.
        • should be high-level. n.getLastChild() is more intuitive than n.jjtGetChild(n.jjtGetNumChildren() - 1), klass.getSimpleName() is better than klass.getImage().
        • no implemementation specific methods like jjt*....
      • When to implement Iterator?
      • Nodes should expose the accessors that are relevant to them and only those. Methods like ASTThrowStatement#getFirstClassOrInterfaceTypeImage are way too specific, and are perhaps used by one rule in the codebase.
      • See also @NoAttribute --> XPath
      • Javadoc: BNF snippets should describe all the possible states a node can have when it ends up in a rule. That's to avoid the rule developer finding out about an edge case in the wild. That's also for us to consider all edge cases when making the API, without having to spend our time in the grammar file. bnf snippets for abstract types enumerate the possible subtypes.
        • The designer uses javadoc from AST nodes. So all AST nodes should provide a small BNF snippet of the grammar marked with <pre class="grammar">...</pre>. It might also provide docs for XPath attributes of the current AST node.
      • should be well documented. You shouldn't have to look at the implementation to find out what a method does.
      • The jjtree generated comment ("// generate by jjtree - do not change...") must be removed.
      • Naming: Similar nodes should have a common suffix, like "Expr", "Type", "Name", ...
      • Override getChild() to return the specific child type, if it can be only one type. No need for casting then.
      • AbstractJavaNode overrides getBeginLine()/getBeginColumn(), taking the position from the first token ... does this make sense for other languages too? Should we have something like getPosition() on the nodes? (to not have 4 methods to read the position... beginLine, beginColumn, endLine, endColumn). The position could also have convenience methods like "length", "offset".
        • AbstractNode is becoming completely irrelevant, each language implementation is only using a different subset of the fields.
        • Many methods can be provided by a default implementation in the Node interface
        • The "id" of a node is not needed (was only used to get the XPath node name from jjtree)
        • text positions (beginLine, beginColumn, etc.) might not be needed, if we can always delegate to a token
      • Debugging support: toString() of nodes...
    • --> These guidelines should end up in a custom PMD rule to enforce this.
    • Designer Bindings / Tree Dumper
    • Autofix/Quickfix -> Document API
    • AST Manipulation methods
      • Needed for languages, that??? maybe tree builder
      • AbstractNode::addChild, insertChild, setChildren
    • Concepts: TextAvailableNode, RootNode, ... GenericNode?, ... TextCoordinates/Position/...,
  • Uniform handling of comments and whitespaces
    • comments should be nodes?
      • From #2447: In pmd-java, Comment now extends AbstractJjtreeNode, it's technically a separate tree: the parent of a comment cannot be a JavaNode, the child of a JavaNode cannot be a Comment. Thankfully this is now enforced at compile time. This needs some more thought, I don't know if this should stay or go, or what relation this would have to a "javadoc language module".
    • should we keep whitespace as special nodes as well?
    • Maybe Parser Options? -> parser options will be removed (#2675, #2602)
  • Clarify root node: Consider introducing a real "RootNode" kind of node type. In Java we have now (in PMD 7 java-grammar) both a ASTCompilationUnit and a getRootNode() method. Apex has multiple root nodes (see #1937 [apex] Apex should only have a single RootNode. In PLSQL, I've implemented a way to get hold of the complete source code of the analyzed file. In that case, this root node might be a file node "FileNode" (however, the analyzed source code does not need to originate from files only, it could also be downloaded from the web or from a database).
    • Each language should have only one node type, that serves a a root node.
    • It might be a "virtual"/"artificial" node like "CompilationUnit". Maybe the name should be something like "CodeUnit", "AnalyzedCodeUnit". It is the "thing" on which PMD operates, the "DataSource".
    • Need a core concept for the thing, that PMD operates on - ASTRoot?
    • The Parser returns always a RootNode rather than a Node: net.sourceforge.pmd.lang.Parser::parse
      • Done with #2447
  • AST Visitors - generic visitors
  • Parser + Parser Options
  • Metrics
  • Language Module
    • Make it easy to implement a new language
    • Remove duplication!
  • Support for both CPD and PMD
  • Symbol Table
  • ❌ Dataflow (Deprecated on PMD 6, Remove with PMD 7)
    • This seems to be java specific for now
    • There might be some usages in plsql... (but there are no rules, that use it)
    • I'd rather make this one a java specific API and only later a core api
    • That means deprecating the API on PMD6
    • Note: The only rule for Java DataflowAnomalyAnalysis has been deprecated #2647 and removed in PMD 7
    • [core] Deprecate DFA #2681
    • [core] Remove DFA #2696
  • Mixing Languages (jsp + html + javascript, apex + visualforce, java + javadoc, ...)
  • Logging of processing errors (cross-cutting!)
    • Semantic Error vs. Parsing Error
  • Grammar
    • javacc
    • antlr
    • decorater / treebuilder
  • Custom Violation Suppressors (see #2055) --> Rule API
  • Designer Bindings (see #2118) --> Debugging Support
  • "Document Manipulation API" - net.sourceforge.pmd.document. API added when working on quick fixes/autofixes.
  • internal implementation of source processing / source processing stages
  • Language instances should only be retrieved queried from LanguageRegistry. LanguageRegistry could be simplified, eg it's very error-prone to have to rely on strings, and finding the right type of "name" for a language. (findLanguageByTerseName, findLanguage, ...)
    • I'd say we should publish an enum of standard language ids, and have a method LanguageRegistry#findByKey. E.g.
// if java is not on the classpath, returns null.
Language java = LanguageRegistry.findByKey(StandardLanguages.JAVA);
  • Instances of interfaces like LanguageVersionHandler, Parser and stuff should only be retrieved from the corresponding Language instance. Eg.
Parser javaParser = LanguageRegistry.findByKey(StandardLanguages.JAVA).getParser();
// But NOT
JavaParser j = new JavaParser();
  • That decouples client code from the implementation classes, which is the whole point of publishing interfaces. The implementation classes can be made internal and changed at our discretion.
  • LanguageVersion should be turned an interface. The current ordering behaviour is broken (refs #1603) and language implementations should be able to implement it at their own discretion. Language versions could for example be implemented in some enums to get a consistent ordering.
  • The problem of relying on strings is also true for language versions. We could eg publish enums in language modules that represent standard language versions. E.g. in the Java module:
// usage if you have a compile dependency on pmd-java
// clients of just pmd-core should still use the Language instance directly
LanguageVersion java12 = JavaVersions.JAVA_12.get();
// published Api of pmd-java
// the class or enum implementing LanguageVersion is internal
enum JavaVersion {
   // ...
   ;
   private String myName; // found out by constructor

   public LanguageVersion get() {
      LanguageRegistry.findByKey(StandardLanguages.JAVA).getVersion(myName);
   }
}

Language registry

XPath

  • XML schema of AST

  • @NoAttribute extension

  • Logging of processing errors (cross-cutting!)

  • Ideally, hide all Saxon-specific stuff from our published API, publish a set of implementation-independent interfaces instead.

  • Turn XPathRule into an interface, make each LanguageVersionHandler return an XPathRuleBuilder so that the actual implementation can be language-specific

    • Made possible by the Schema change for XPath rule defs
  • Issues

Debugging Support

Miscellaneous

Stuff the doesn't fit into the categories above.


API General Design Goals

Description of the new APIs

TODO

AST API

Description of the new NodeStream API

Package: net.sourceforge.pmd.lang.ast

Implementation: #1622 [core] NodeStream API

In java rule implementations, you often need to navigate the AST to find the interesting nodes. In PMD 6, this was often done by calling jjtGetChild(int) or jjtGetParent(int) and then checking the node type with instanceof. There are also helper methods available, like getFirstChildOfType(Class) or findDescendantsOfType(Class). These methods might return null and you need to check this for every level.

Therefore, the NodeStream API provides easy to use methods that follow the Java Stream API (java.util.stream).

Node and Node Stream

plantuml source

interface Node {
    NodeStream<Node> asStream()
    NodeStream<Node> children()
    NodeStream<Node> descendants()
    NodeStream<Node> descendantsOrSelf()
    NodeStream<Node> ancestors()
    NodeStream<Node> ancestorsOrSelf()

    NodeStream<R> children(Class<R> rClass)
    NodeStream<R> descendants(Class<R> rClass)
    NodeStream<R> ancestors(Class<R> rClass)
}

interface NodeStream {
    Stream<@NonNull T> toStream()
    NodeStream<R> flatMap(Function<? super T, ? extends @Nullable NodeStream<? extends R>> mapper)
    NodeStream<R> map(Function<? super T, ? extends @Nullable R> mapper)
    NodeStream<T> filter(Predicate<? super @NonNull T> predicate)
    NodeStream<T> peek(Consumer<? super @NonNull T> action)
    NodeStream<R> forkJoin(Function<..> first, Function<..> second, Function<..> ... rest)
    NodeStream<T> append(NodeStream<? extends T> right)
    NodeStream<T> prepend(NodeStream<? extends T> right)
    NodeStream<T> cached()
    NodeStream<T> take(int maxSize)
    NodeStream<T> drop(int n)
    NodeStream<T> dropLast(int n)
    NodeStream<T> takeWhile(Predicate<? super T> predicate)

    NodeStream<Node> ancestors()
    NodeStream<Node> ancestorsOrSelf()
    NodeStream<Node> parents()
    NodeStream<Node> children()
    NodeStream<Node> descendants()
    NodeStream<Node> descendantsOrSelf()
    NodeStream<Node> followingSiblings()
    NodeStream<Node> precedingSiblings()
    NodeStream<R> children(Class<R> rClass)
    NodeStream<R> descendants(Class<R> rClass)
    NodeStream<R> ancestors(Class<R> rClass)
    NodeStream<T> filterNot(Predicate<? super @NonNull T> predicate)
    NodeStream<T> filterMatching(Function<? super @NonNull T, ? extends @Nullable U> extractor, U comparand)
    NodeStream<R> filterIs(Class<R> rClass)
    NodeStream<T> filterNotMatching(Function<? super @NonNull T, ? extends @Nullable U> extractor, U comparand)

    void forEach(Consumer<? super @NonNull T> action)
    R reduce(R identity, BiFunction<? super R, ? super T, ? extends R> accumulate)
    int sumBy(ToIntFunction<? super T> toInt)
    int count()
    boolean nonEmpty()
    boolean isEmpty()
    boolean any(Predicate<? super T> predicate)
    boolean none(Predicate<? super T> predicate)
    boolean all(Predicate<? super T> predicate)
    T get(int n)
    T first()
    Optional<T> firstOpt()
    T first(Predicate<? super T> predicate)
    R first(Class<R> rClass)
    T last()
    R last(Class<R> rClass)
    R collect(Collector<? super T, A, R> collector)
    List<T> toList()
    List<R> toList(Function<? super T, ? extends R> mapper)

    NodeStream<T> of(@Nullable T node)
    NodeStream<T> fromIterable(Iterable<@Nullable T> iterable)
    NodeStream<T> ofOptional(Optional<T> optNode)
    NodeStream<T> of(T... nodes)
    NodeStream<T> union(NodeStream<? extends T>... streams)
    NodeStream<T> union(Iterable<? extends NodeStream<? extends T>> streams)
    NodeStream<T> empty()
}

Many complex predicates about nodes can be expressed by testing the emptiness of a node stream. E.g. the following tests if the node is a variable declarator id initialized to the value 0:

Example:

     NodeStream.of(someNode)                           // the stream here is empty if the node is null
               .filterIs(ASTVariableDeclaratorId.class)// the stream here is empty if the node was not a variable declarator id
               .followingSiblings()                    // the stream here contains only the siblings, not the original node
               .filterIs(ASTVariableInitializer.class)
               .children(ASTExpression.class)
               .children(ASTPrimaryExpression.class)
               .children(ASTPrimaryPrefix.class)
               .children(ASTLiteral.class)
               .filterMatching(Node::getImage, "0")
               .filterNot(ASTLiteral::isStringLiteral)
               .nonEmpty(); // If the stream is non empty here, then all the pipeline matched

Description of Node API

Package: net.sourceforge.pmd.lang.ast

Implementation: see #2172 [core] Deprecate jjtree methods from the Node interface

The naming of the methods jjt* was bad, since these methods indicate, they are from javacc, while we as PMD own these methods and javacc is a implementation detail.

The purpose of the Node API is

  • Tree traversal
  • XPath support
  • Location metadata

Tree traversal

Navigate the AST. This is an alternative way to NodeStream API for convenience.

Node getParent()
Node getChild(int index)
int getNumChildren()
int getIndexInParent()
Iterable<? extends Node> children()

Node getNthParent(int n)
T getFirstParentOfType(Class<T> parentType)
List<T> getParentsOfType(Class<T> parentType)
T getFirstParentOfAnyType(Class<? extends T>... parentType)

T getFirstChildOfType(Class<T> childType)
List<T> findChildrenOfType(Class<T> childType)
T getFirstDescendantOfType(Class<T> descendantType)
List<T> findDescendantsOfType(Class<T> targetType)
List<T> findDescendantsOfType(Class<T> targetType, boolean crossFindBoundaries)
boolean hasDescendantOfType(Class<T> type)

XPath support

Describes nodes in a form understandable by XPath expressions.

String getXPathNodeName()
Iterator<Attribute> getXPathAttributesIterator()

Location metadata

Provides access to the location of the node in the source code.

int getBeginLine()
int getBeginColumn()
int getEndLine()
int getEndColumn()

TODO: This should be changed to provide higher level access to location metadata, e.g. Location getLocation() with location having access to line-based-location and offset-based-location.

See also "Document API" (net.sourceforge.pmd.document) and RegionByLine/RegionByOffset.

Data Map

TODO...

AST Visitors

See [swift] Refactor antlr implementation #2463 and [core] Add generic visitor interface in pmd-core #2589.

  • net.sourceforge.pmd.lang.ast.AstVisitor and net.sourceforge.pmd.lang.ast.AstVisitorBase Parser-generator-agnostic visitors to be used by both JavaCC and Antlr.
  • The main Node interface net.sourceforge.pmd.lang.ast.Node has the appropriate acceptVisitor method.

Rule and RuleChain API

  • Rule::apply is the only entry point for AST processing for rules, regardless of their type/impl (XPath, Java, rulechain or not)
  • How to use rulechain for your own rule?
    • The default behavior for Rule::buildTargetSelector() is RuleTargetSelector.forRootOnly(), which means, that no rule chain is used, AST traversal starts at the root.
    • In order to use rulechain, rule impls need to provide RuleTargetSelector via Rule::buildTargetSelector()
    • PMD will use Rule::getTargetSelector()...

RuleSet API

Package: net.sourceforge.pmd.core

Purpose: Loading and parsing ruleset files, creating rulesets from rules.

Replaces: RuleSetFactory

Important classes: RuleSetLoader, RuleSetLoadException

Symbol (Table) API

net.sourceforge.pmd.lang.java.symbols

TODO...

(Rule) Properties API

TODO...

Metrics API

Report API

Package: net.sourceforge.pmd

Implementation: partly: [core] Deprecate report methods #2680

Report classes

plantuml source

class Report {
    void addConfigError(ConfigurationError e)
    void addError(ProcessingError e)
    void addRuleViolation(RuleViolation rv)
    
    List<ConfigurationError> getConfigurationErrors()
    List<ProcessingError> getProcessingErrors()
    List<RuleViolation> getViolations()
    List<SuppressedViolation> getSuppressViolations()
    
    void merge(Report other)
    
    void addListener(ThreadSafeReportListener l)
    void addListeners(List<ThreadSafeReportListener> ll)
    List<ThreadSafeReportListener> getListeners()
}

class ConfigurationError {
    String issue()
    Rule rule()
}

class ProcessingError {
    String getDetail()
    Throwable getError()
    String getFile()
    String getMsg()
}

class SuppressedViolation {
    RuleViolation getRuleViolation()
    String getUserMessage()
    ViolationSuppressor getSuppressor()
}

interface ViolationSuppressor {
    String getId()
    SuppressedViolation suppressOrNull(RuleViolation rv, @NonNull Node node)
}

Dataflow API

  • ⚠ This has been removed with PMD 7, so there is no language agnostic dataflow support anymore for now.

Properties for Renderer/Formatter API

TODO...

Language implementations

  • PMD publishes an API to implement new languages

  • For a package p for a feature (eg rule violations):

    • published API are in p
      • This should be enough to understand what the feature does while browsing javadocs
      • eg the interface RuleViolation, ViolationSuppressor
    • implementation classes that we publish and support for language implementations to use are in p.impl
      • these are classes that are irrelevant to understanding the API, but ease implementation/ share behaviour
      • eg the DefaultRuleViolationFactory, the RuleViolationFactory interface (if we need it anyway)
    • internal API, that p.impl is built upon, or that we don't want to publish yet are in p.internal
  • PMD should offer minimal extension points

    • Prefer composition to inheritance (ie prefer aggregating many small, orthogonal services instead of extending a big base class)
    • For example, providing custom ViolationSuppressors and ViolationDecorators is better than extending RuleViolationFactory and RuleViolation
  • The only service that is truly required for a PMD language implementation is a Parser

    • The Parser interface should be simplified to just RootNode parse(DataSource source, LanguageVersion version);
      • Many Parsers don't expose a tokenizer, and that's only relevant for CPD, so should be another extension point
  • No ParserOptions anymore, instead LanguageProperties (#2518).

    • Language Properties can be set via CLI (#2947)
    • Language Properties can be set via Environment Variables. These override the previous ones (#3085)
  • A RootNode has a AstInfo object, which provides access to

    • the current parser task, and through that access to the file that is being parsed/has been parsed.
    • the ast itself
    • suppressions
    • see #3085

CPD API

Programmatic API to call PMD and CPD and File Collector API

Purpose: API that is used internally by the CLI part of PMD. It selects the files to be analyzed and provides a way to start the analysis using the new PmdAnalysis entry point.

Changed by: [core] Add file collector and new programmatic API for PMD #3785 and [x] [core] Improve PmdAnalysis #3822

Replaces: It replaces the old main entry point net.sourceforg.pmd.PMD.processFiles()

Example usage:

PMDConfiguration config = new PMDConfiguration();
config.setDefaultLanguageVersion(LanguageRegistry.findLanguageByTerseName("java").getVersion("11"));
config.setInputPaths("src/main/java");
config.prependClasspath("target/classes");
config.setMinimumPriority(RulePriority.HIGH);
config.addRuleSet("rulesets/java/quickstart.xml");
config.setReportFormat("xml");
config.setReportFile("target/pmd-report.xml");

try (PmdAnalysis pmd = PmdAnalysis.create(config)) {
    // optional: add more rulesets
    pmd.addRuleSet(pmd.newRuleSetLoader().loadFromResource("custom-ruleset.xml"));
    // optional: add more files
    pmd.files().addFile(FileSystems.getDefault().getPath("src", "main", "more-java", "ExtraSource.java"));
    // optional: add more renderers
    pmd.addRenderer(renderer);

    // or just call PMD
    pmd.performAnalysis();
}

File Collector API:

  • FileCollector PmdAnalysis.files()
  • net.sourceforge.pmd.lang.document.FileCollector
    • close()
    • getLog()
    • collection
      • addFile(Path file)
      • addFile(Path file, Language language)
      • addFile(TextFile textFile)
      • addSourceFile(String sourceContents, String pathId)
      • addDirectory(Path dir)
      • addFileOrDirectory(Path file)
      • addZipFile(Path zipFile)
    • configuration
      • setCharset(Charset charset)
      • relativizeWith(String prefix)
    • filtering
      • exclude(FileCollector excludeCollector)
      • filterLanguages(Set<Language> languages)

Spitballing on the programmatic usage of PMD

Note: This is probably outdated!

  • General observations

    • RuleContext is an internal detail, one should not have to provide it themselves
    • RuleContext is stateful but doesn't need to be (language version + file name are properties of the nodes)
    • SourceCodeProcessor is a mess (+ we have PMDTask, AbstractPMDProcessor and its implementations), it's really difficult to know how to launch PMD
    • Most of the times a user doesn't need a Report, the renderer could just append a violation to the output stream
  • PMD at its core boils down to a function (files, rules) -> violations

  • Of course as input it also takes more configuration, eg number of threads wanted, classloader, language versions to use etc

  • On a higher level we can also ignore the "return type" of the function type above, and instead use listeners to handle violations as they come. We already have this concept: ThreadSafeReportListener

Consider enriching this interface and making it more central to the API, eg:

// this listens to an analysis
// example implementations: 
//  * appending to a Report
//  * forwarding to a Renderer
//  * teeing to several listeners
//  * synchronized wrapper (makes a listener thread safe)
interface PmdAnalysisListener {
    void handleViolation(RuleViolation)
    void handleSuppressedViolation(SuppressedViolation)
    void handleError(ProcessingError)
}

// now rulecontext doesn't store language version or file name
// it's just the API to create violations and forward to the listener
class RuleContext {
   // this listener creates the result of the analysis 
   PmdAnalysisListener listener;
   

   void addViolation(node, message, messageArgs) {
       var v = createViolation(node, message, messageArgs);
       var suppressed = node.getLanguage().getViolationSuppressor().suppressOrNull(v);
       if (suppressed != null) 
        listener.handleSuppressedViolation(suppressed)
       else
        listener.handleViolation(v);
   }

   // + overloads for convenience
}

A rule implementation could look like:

class SomeRule {
    void visit(Node n, RuleContext ctx) { // (we don't have to keep the parameters as Object, would be safer if it was RuleContext).
        if (n is violation)
            ctx.addViolation(n, ...)
    }
}

Calling PMD could look like:

var report = new Report()
var renderer = new HtmlRenderer()
var files = PmdFiles.collectFromDirs(dirnames)
var ruleset = RuleSetFactory.parse(url)

var listener = PmdAnalysisListener.tee(report.listener(), renderer.listener())

var extraConfig = ExtraConfig.DEFAULT.withThreads(4)
                                     .withClasspath(someClassPath)
                                     .useLanguageVersion("java", "8")
                                     .addExtension("xml", ".fxml")
                                     // logging options, etc

PMD.call(files, ruleset, listener, extraConfig)

// when this returns the result is both the output of the renderer and the stuff in the Report
void callPmd(files, ruleset, listener, extraConfig) {
   var rc =  new RuleContext(PmdAnalysisListener.threadSafe(listener))
   var buckets = partitionFiles(files, extraConfig.numThreads) 
   var tasks = buckets.map { bucket -> new PMDTask(bucket, rc, ruleset.deepCopy()) }
   startAll(tasks)
   join(tasks)
}

Parser API

  • ParsingOptions is useless

    • Used atm for suppressMarker
    • in Javascript for Rhino Version -> convert to LanguageVersion
    • in XML for XML parser options -> figure out common denominator
  • Deprecate Rule::getParserOptions

  • Parser::parse could use much more data

    • Suppress marker
    • LanguageVersion
    • TextDocument(filename + Reader + line mapper, etc)
    • Semantic logger
      • semanticError(node, “....”)/warning/etc
      • benchmark(“...”, () -> { /*action*/ }) -> this is public api for language impls
    • -> wrap all this into a ParsingTask or something
  • Return a RootNode which knows about its TextDocument, LanguageVersion, etc

  • Replace processing stages

Rulechain revamp

class TreeIdx {
   RootNode root;
   Map<String, List<Node>> byName;
   Map<Class, List<Node>> byClass;
}


class RuleNodeSelector {
 abstract List<Node> selectInTree(TreeIdx);

 class StringSelector extends ... {
      List<Node> selectInTree(TreeIdx) {
         idx.byName().get(“PrimaryExpr”);
     }
 }
 class RootSelector extends ... { }
}

// Replaces Rule::getRulechainVisits
Rule::getSelector(): RuleNodeSelector
// Replaces Rule::apply
Rule::apply(Node, RuleContext) { }


// Reimplement Ruleset::apply
RuleSet::apply(RootNode n, RuleContext ctx) {
   TreeIdx idx = indexTree(n);
   this.foreach { rule ->
      List<Node> todo = rule.getSelector().selectInTree(idx)
      todo.foreach { node -> 
        try {
          rule.appy(node, ctx);   
        } catch (_) {
            ctx.handleError(_)
        }
      }
   }
}  

Analysis Listeners

See [core] Analysis listeners #3692.

Abstractions

  • GlobalAnalysisListener - main listener for the entry point PmdAnalysis. This listener is queried for FileAnalysisListeners during the analysis.
  • FileAnalysisListener - used during the analysis of a single file

Use cases

  • Violations and errors are reported through these analysis listeners.
  • No more using Report to transport the found violations by default. It's still possible, though, to get a report at the end with all violations.
  • Potentially interesting for IDE integrations.

Relationship to Renderers

Renderers receive their violations through these analysis listeners.

Relationship to RuleContext

RuleContext is not global to the analysis run anymore, but it only provides an API for rules to report violations.

RuleContext API

Purpose: API for rules to notify about found violations. It filters suppressed violations.

Changed by: [core] Analysis listeners #3692

RuleContext {
    void addViolation(Node location); // with default rule message
    void addViolation(Node location, Object... formatArgs); // with default rule message + message args
    void addViolationWithMessage(Node location, String message);
    void addViolationWithMessage(Node location, String message, Object... formatArgs);
    void addViolationWithPosition(Node location, int beginLine, int endLine, String message, Object... formatArgs);
    void addViolationNoSuppress(RuleViolation rv); // ???
}

Text Document API

Package: net.sourceforge.pmd.lang.document

Purpose: Unifying CPD/PMD handling, support modifying files (for autofix)

Changed by:

Description:

  • Chars: a string slice type. Supports efficient string-like operations without copying the char array of the file.

    • It is a CharSequence
    • constructors
      • wrap(CharSequence chars)
    • methods
      • writeFully(@NonNull Writer writer)
      • write(@NonNull Writer writer, int start, int count)
      • getChars(int srcBegin, char @NonNull [] cbuf, int dstBegin, int count)
      • appendChars(StringBuilder sb, int off, int len)
      • appendChars(StringBuilder sb)
      • ByteBuffer getBytes(Charset charset)
      • int indexOf(String searched, int fromIndex)
      • int indexOf(int ch, int fromIndex)
      • int lastIndexOf(int ch, int fromIndex)
      • boolean startsWith(String prefix, int fromIndex)
      • boolean startsWith(String prefix)
      • boolean startsWith(char prefix, int fromIndex)
      • boolean endsWith(String suffix)
      • Chars trimStart()
      • Chars trimEnd()
      • Chars trim()
      • Chars trimBlankLines()
      • Chars removeSuffix(String charSeq)
      • boolean contentEquals(CharSequence cs, boolean ignoreCase)
      • boolean contentEquals(CharSequence cs)
      • Chars subSequence(int start, int end)
      • Chars subSequence(int start)
      • Chars slice(TextRegion region)
      • Chars slice(int off, int len)
      • String substring(int start, int end)
      • Iterable<Chars> lines()
      • Stream<Chars> lineStream()
      • Reader newReader()
  • TextFile

    • TextFileContent
      • backing content of a TextFile
      • Uses Chars
      • Line terminators are normalized to \n
    • Reading the file computes a checksum which is used by incremental analysis later
    • TextFile replaces DataSource everywhere in the internals of PMD
    • implementations: StringTextFile, NioTextFile, ReaderTextFile
    • factory methods:
      • static TextFile forPath(Path path, Charset charset, LanguageVersion languageVersion)
      • static TextFileBuilder builderForPath(Path path, Charset charset, LanguageVersion languageVersion)
      • static TextFile forCharSeq(CharSequence charseq, String pathId, LanguageVersion languageVersion)
      • static TextFileBuilder builderForCharSeq(CharSequence charseq, String pathId, LanguageVersion languageVersion)
      • static TextFile forReader(Reader reader, String pathId, LanguageVersion languageVersion)
      • static TextFileBuilder builderForReader(Reader reader, String pathId, LanguageVersion languageVersion)
    • methods:
      • LanguageVersion getLanguageVersion()
      • String getDisplayName()
      • TextFileContent readContents()
      • boolean isReadOnly()
      • void writeContents(TextFileContent content)
    • TextFileBuilder
      • methods:
        • TextFileBuilder withDisplayName(@Nullable String displayName)
        • TextFile build()
  • TextDocument, built on top of TextFile

    • interface. implementations: RootTextDocument
    • Replaces CPD's SourceCode
    • Provides a view of a TextFile, potentially only selecting part of the whole file.
      • Addressing by offset/length (TextRange) in the view is relative to the view
      • Addressing by line/column (toLocation()) is absolute, relative to the underlying file.
    • Since it is only a view of a TextFile, it can provide (temporary) modifications
    • Tree:
      • at the root a RootTextDocument with the original TextFile. Coordinate system based on line+column, absolute in the original TextFile
      • one or more views, which potentially access only part of the whole original TextFile.
    • Backed by a TextFile
    • factories:
      • static TextDocument create(TextFile textFile)
      • static TextDocument readOnlyString(final CharSequence source, LanguageVersion lv)
      • static TextDocument readOnlyString(@NonNull CharSequence source, @NonNull String filename, @NonNull LanguageVersion lv)
    • methods:
      • FileLocation toLocation(TextRegion region)
      • LanguageVersion getLanguageVersion()
      • String getPathId()
      • String getDisplayName()
      • Chars getText()
      • Chars sliceText(TextRegion region)
      • long getCheckSum()
      • Reader newReader()
      • int getLength()
      • TextRegion getEntireRegion()
      • TextPos2d lineColumnAtOffset(int offset, boolean inclusive)
  • Position / locations in files

    • TextPos2d - wraps a line and column tuple. line+column
    • TextRange2d - wraps a tuple of TextPos2 for start and end positions. Also line+column based.
    • FileLocation
      • This represents the line/column coordinates of a place in a TextDocument. Used for reporting.
      • wraps: filename+textrange2d
      • factory via: TextDocument.toLocation(TextRegion)
        • static FileLocation range(String fileName, TextRange2d range2d)
      • TextPos2d getStartPos()
      • TextPos2d getEndPos()
      • TextRange2d toRange2d()
    • TextRegion
      • This represents a contiguous range of text in a TextDocument with offset/length, which is useful for tokens and for TextAvailableNodes in general.
      • In contrast to "*2d" - this uses offset+length
        • constructors:
          • fromOffsetLength(int startOffset, int length)
          • fromBothOffsets(int startOffset, int endOffset)
          • caretAt(int startOffset)
        • methods:
          • int getStartOffset()
          • int getEndOffset()
          • int getLength()
          • boolean isEmpty()
          • boolean contains(int offset)
          • boolean contains(TextRegion other)
          • boolean overlaps(TextRegion other)
          • TextRegion growLeft(int delta)
          • TextRegion growRight(int delta)
        • static methods:
          • @Nullable TextRegion intersect(TextRegion r1, TextRegion r2)
          • TextRegion union(TextRegion r1, TextRegion r2)
  • Reportable

    • package: net.sourceforge.pmd.reporting
    • This is an interface for anything that can provide a FileLocation (Nodes and Tokens currently)
    • Implemented by Nodes and Tokens - on these RuleViolations can be reported
    • methods:
      • FileLocation getReportLocation()

Text Documents

plantuml source

@startuml
interface TextDocument {
  LanguageVersion getLanguageVersion()
  String getPathId()
  String getDisplayName()
  Chars getText()
  Chars sliceText(TextRegion region)
  long getCheckSum()
  Reader newReader()
  FileLocation toLocation(TextRegion region)
}
~class RootTextDocument implements TextDocument {
  TextFile backend
  TextFileContent content
  LanguageVersion langVersion
  String fileName
  String pathId
}

interface TextFile {
  {abstract} LanguageVersion getLanguageVersion()
  {abstract} String getDisplayName()
  {abstract} TextFileContent readContents()
  {abstract} boolean isReadOnly()
  {abstract} void writeContents(TextFileContent content)
}
~class NioTextFile implements TextFile {
  Path path
  Charset charset
  LanguageVersion languageVersion
  String displayName
  boolean readOnly
}
~class StringTextFile implements TextFile {
  TextFileContent content
  String name
  LanguageVersion languageVersion
}
~class ReaderTextFile implements TextFile {
  String name
  LanguageVersion languageVersion
  Reader reader
}
abstract class TextFileBuilder {
  TextFileBuilder withDisplayName(String displayName)
  TextFile build()
}

class TextFileContent {
  Chars cdata
  String lineTerminator
  long checkSum
  SourceCodePositioner positioner
}

RootTextDocument .. TextFileContent
StringTextFile .. TextFileContent
TextFile .. TextFileContent
TextFile .. TextFileBuilder
RootTextDocument .. TextFile

@enduml

Open questions

Deprecated APIs

pmd-apex

For internalization

  • net.sourceforge.pmd.lang.apex.ast

pmd-core

For removal

  • RulesetWriter: do we really need that? It will be a pain to update during the schema update process
    • Yes, remove
  • The “dump façade”. I don’t really see much use to it tbh, and if we want to inspect nodes there’s the designer
    • Yes, remove.
    • ✅ Done with #1801
    • ✅ Marked as deprecated with #1806
  • Package net.sourceforge.pmd.util.filter:
    • Based around an equivalent to java.util.function.Predicate, can be replaced by that one
      • Make it private, maybe remove in 7.0.0, if we don’t need it anymore
  • PMD.getUriDataSources:
    • public static List getURIDataSources(String uriString) throws PMDException {
    • Only a helper method, could be private
      • private
  • PMDException
    • Its defining feature is the “severity” of the error, but this is not used once in our codebase.
    • It’s used very inconsistently within PMD
      • Yes, remove it
  • Language interface:
    • getRuleChainVisitorClass() -> there should be a method ‘RulechainVisitor getRuleChainVisitor()’ instead to avoid instantiating it via reflection
      • Yes
  • LanguageRegistry:
    • getInstance() -> I think at this point it’s quite clear that it’s a utility class since all methods are static, and getInstance() should be removed.
      • Yes
    • commaSeparatedTerseNamesForLanguageVersion, commaSeparatedTerseNamesForLanguage -> these are not generic enough to be on there, they are also used at most once or twice across PMD, could be inlined at call sites
      • Yes, they are at least private API.
  • findWithRuleSupport -> CPD-only languages should not provide net.sourceforge.pmd.lang.Language as a service, they have zero use and the modules already provide a net.sourceforge.pmd.cpd.Language
    • See eg the C++ language module, it provides implementations for CppHandler, CppParser, etc which only throw UnsupportedOperationExceptions
    • The LanguageRegistry should only provide a collection of fully supported PMD languages, but these dummy implementations clutter it.
    • Yes, deprecate for removal
  • net.sourceforge.pmd.util.log.PmdLogFormatter (unused)
    • Yes, deprecate for removal
  • #1550 Deprecate StatisticalRule
    All of net.sourceforge.pmd.stat, net.sourceforge.pmd.lang.rule.stat, and their language specific base classes. StatisticalRule is superfluous, and existing rules may be rewritten more clearly without it.
    • Remove
  • net.sourceforge.pmd.rule.ImportWrapper
    • Should be moved to pmd-java, it’s language specific
    • Yes :)
  • Rule interface:
    • getRuleClass/setRuleClass -> This is just getClass().getName(), and shouldn’t ever be set
      • Yes, remove
    • getParserOptions -> supporting rules with different parser options has never been implemented and would require several parsing phases if options conflict between rules -plus it’s probably not useful. I think it’s safe to say this is never going to see light
      • used in XmlXPathRule
      • but no implementation that would use XmlParserOptions…
      • Deprecate it for removal
    • addRuleChainVisit(String) -> unsafe since we don’t use a Class, and the implementation is wonky. Better keep to the version taking a Class parameter
      • Yes

For internalization

  • All net.sourceforge.pmd.util
  • net.sourceforge.pmd.ant.(SourceLanguage | RuleSetWrapper)
  • net.sourceforge.pmd.ant.Formatter
  • net.sourceforge.pmd.lang:
    • LanguageFilenameFilter
    • XPathHandler -> exposes implementation details, and not useful to an end-user
    • BaseLanguageModule
    • AbstractLanguageVersionHandler
    • AbstractParser
    • DataFlowHandler
    • Basically we can’t expect many other custom language implementations out there, so I’d rather most of the implementation details to be better in an internal package
  • net.sourceforge.pmd.rule:
    • AbstractRuleChainVisitor
    • AbstractRuleViolationFactory
    • Do we really need to publish the interface RuleChainVisitor? It seems to me that that is very implementation-specific. Ideally the Rulechain would happen transparently to users.
  • All of net.sourceforge.pmd.processor, plus SourceCodeProcessor
    • We should propose a simpler API to process some source and hide those as implementation details. Usage of MultiThreadProcessor vs MonoThreadProcessor could be hidden behind a method which specifies the number of threads, PmdThreadFactory could be hidden inside MultiThreadProcessor, etc
  • Rule interface:
    • setLanguage, setMinimumLanguageVersion, setMaximumLanguageVersion -> No rule supports a change to its language. net.sourceforge.pmd.lang.rule.ImmutableLanguage is in that sense redundant, and is implemented by every abstract rule class so could be removed
  • net.sourceforge.pmd.rules.(RuleFactory | RuleBuilder)
    • The current classes are there to help RulesetFactory. I see a use case for having a rule builder as public API (eg to build UIs), but this one takes only string arguments and is overall made to agree with the parsing done by RuleFactory, so is probably not made to be published
  • net.sourceforge.pmd.lang.ast
    • AbstractTokenManager
  • All the implementation of the XPath support
  • All of the implementations of language-specific analysis passes like symboltable, typeresolution, etc. Only user-facing things should be published API, like eg interfaces, but not abstract classes, or visitors
    • In particular, all of net.sourceforge.pmd.lang.dfa (pmd-core) should be deprecated. No user rule should depend on that framework.

pmd-java

For removal

  • In AbstractJavaRule:
    • getDeclaringType, isSuppressed, isQualifiedName, importsPackage: not used or barely used and don’t apply to all rules
    • visitAll should be internalized
  • Base rule classes (#971 Deprecate overly specific base rule classes)
    • AbstractCommentRule
      • assignCommentsToDeclarations could be made into an optional AstProcessingStage activated by comment rules instead of manually assigning comments on visit(ASTCompilationUnit)
      • The rest of the functionality can be moved to the AST, like for CommentUtilAbstractInefficientZeroCheck -> way too specific, and a better AST structure for expressions will remove the need to do something like that
    • AbstractJavaMetricsRule -> not specific to metrics, plus I think these added “visit” methods that were not on JavaParserVisitor could be moved to JavaParserVisitorAdapter directly
    • AbstractOptimizationRule -> not useful at all, the method “assigned” could be moved on VariableDeclaratorId as “isEffectivelyFinal” or something.
    • AbstractPoorMethodCall -> not generic enough, extended by only one rule
    • AbstractStatisticalJavaRule -> since we remove StatisticalRule
    • AbstractSunSecureRule
      • isField and isLocalVariable can be superseded by improvements to the symbol table
      • The rest can be moved to the rule that use it, but has no use being shared
  • net.sourceforge.pmd.lang.java.(AbstractJavaHandler | AbstractJavaParser)
    • These only have one extending class and it’s realistically not useful to abstract them

For internalization

  • net.sourceforge.pmd.lang.java.ast

pmd-javascript

For internalization

  • net.sourceforge.pmd.lang.ecmascript.ast

pmd-jsp

For internalization

  • net.sourceforge.pmd.lang.jsp.ast

pmd-plsql

For internalization

  • net.sourceforge.pmd.lang.plsql.ast

pmd-visualforce

For internalization

  • net.sourceforge.pmd.lang.vf.ast

pmd-vm

For internalization

  • net.sourceforge.pmd.lang.vm.ast

pmd-xml

For internalization

  • net.sourceforge.pmd.lang.xml.ast
⚠️ **GitHub.com Fallback** ⚠️