PMD 7.0.0 API - pmd/pmd GitHub Wiki
Main Issue: [core] Define clearly private and public API #995
- Status
- What's left to do
- Topics
- API General Design Goals
-
Description of the new APIs
- AST API
- Rule and RuleChain API
- RuleSet API
- Symbol (Table) API
- (Rule) Properties API
- Metrics API
- Report API
- Dataflow API
- Properties for Renderer/Formatter API
- Language implementations
- CPD API
- Programmatic API to call PMD and CPD and File Collector API
- Analysis Listeners
- RuleContext API
- Text Document API
- Open questions
- Deprecated APIs
- unclear
- unclear
- First step: clarify general principles (not null, logging, exceptions, assertions, internal vs. impl, ...)
- Second step: break this up into smaller pieces, that means, define separate API parts. Define package names, modules.
- Reduce published API as much as possible. The smaller the API, the more can be changed.
- Expose abstractions as interfaces, but don't publish the implementation. That allows us to change the implementations as needed. As long as we keep the interfaces.
- PMD core should depend on a set of interfaces that language implementations can implement however they want, provided they respect the contract of the interface.
- The published API should not depend on another library (like Jaxen, Antlr), otherwise
we won't be able to change/update that library. Because updating the library possibly
changes the API then.
- The exception to this rule is the
@Nullable
annotation.
- The exception to this rule is the
- API is considered NonNull unless otherwise noted.
- We use The Checker Framework with the
@Nullable
annotation to mark parameters or return types nullable.- [core] Add checker-qual dependency #1827
- Optional might be used, but it has e.g. the disadvantage, that it can't be overridden.
- See Optional vs. Nullable for details.
- Base package is always
net.sourceforge.pmd
. - Each API part must be in a separate sub package (e.g.
net.sourceforge.pmd.lang
ornet.sourceforge.pmd.core.lang
?). - The
.internal
packages are used to hide the implementations, that are not part of the public API. This is internal and can change any time. - The annotation
@InternalApi
will be used for APIs that have to live in a public API package. This is still internal and can change any time. - There are also
.impl
packages. These contains base classes of the API (like abstract classes) and default implementations that help to implement our API. The classes in the.impl
package are also published API.- Example: AbstractNode
- Package names for modules: each module (e.g. pmd-core, pmd-java, pmd-xml) should use
a different base package name, so that from the package name of a class one can conclude in which
module the source can be found. This also helps when javadoc is generated, as a package
should only be in one module.
- Example: pmd-core -> net.sourceforge.pmd.core for "very general core APIs", but e.g.
net.sourceforge.pmd.langnet.sourceforge.pmd.core.lang for language api? - Example: pmd-xml -> net.sourceforge.pmd.xml
or net.sourceforge.pmd.lang.xml??... - General rule:
net.sourceforg.pmd.<module-name without pmd- prefix>.sub-packages
- Example: pmd-core -> net.sourceforge.pmd.core for "very general core APIs", but e.g.
- Semantic Versioning:
- major: incompatible changes require a major version change
- minor: compatible changes, improvements are minor version change
- patch: bugfixes only are patch version change
- Level of support for language modules: Maintained (= supported), Incubating (= experimental),
Deprecated (= will be removed, if no one steps up).
- Removed language modules can be collected in a pmd-contrib repository, but that will remain probably unmaintained
- When would we remove a language version? According to semver, only with the next major release.
- Lifecycle of API:
- Maintained (= supported) - will be available throughout the major version
- Incubating (=
@Experimental
). It can change and might also be removed from one minor version to the next. Once the annotation is removed, it is promoted to being maintained. - Deprecated - will be removed with the next major version.
- Beginning with PMD 7, we'll use a tool to verify compatibility from one (minor) release to the next.
- Make use of
org.apache.commons.lang3.exception.ContextedRuntimeException
- Exception type hierarchy?
TODO. Here we should describe the concepts (like what is a File, a DataSource, what is the thing, that PMD outputs, probably Violations, ...)
Purpose:
Goal:
- I'd like to see more GNU-style command line options, e.g.
--long-options
an-l
(short ones). - Add
--version
option to display the version and capabilities (supported languages). See also discussion on #1913 [core] "-help" CLI option ends with status code != 0. - Same possibilities as with other ways to call PMD (via ant, maven, programmatically, ...).
E.g.
--failOnViolation
needs a corresponding option to set which rule priority should fail the build (like maven's "failurePriority"). Note, this has not to be confused with-minimumpriority
... (motivation for this from stackoverflow) - Take guidance from JEP 293: Guidelines for JDK Command-Line Tool Options.
Users:
- Issues / PRs
- [core] Error recovery mode #2885
- [core] Add error recovery mode #2901
- [core] Consolidate PMD CLI into a single command #2234
- [core] Allow CLI to take globs as parameters #1445
- [core] Allow to set additional java runtime options on CLI #898
- [core] CPD: supporting regular expression for "--exclude" option #1763
- [core] Allow to specify command line options via properties file #2087
- [doc] Document all system properties / environment variables, that are understood by PMD.
E.g. net.sourceforge.pmd.cli.noExit, net.sourceforge.pmd.cli.status, pmd.color.
Partly, this forms the interface for calling PMD programmatically.
Document
pmd.error_recovery
(see [core] Error recovery mode #2885). Note: Some are deprecated, see [core] Deprecate system properties of PMDCommandLineInterface #3299 and [core] Deprecate system properties of CPDCommandLineInterface #3835 - How to add multiple path in PMD #2571
- [core] Use PicoCli and unify PMD usage under a single main #4059
Purpose: (Tight) Integration of PMD into other tools
Goal:
Users: CLI, maven, eclipse-plugin, ant task (gradle)
Core Concepts: DataSource, Configuration, ProgressMonitor, ...
-
Loading files -> file extensions, data sources
- File Collector API
-
Processing API
- progress report
-
Multifiles Analysis
-
Error handling: We should avoid flooding users with deprecation warnings. (cross-cutting!)
-
Issues / PRs
- [core] Replace Ruleset file filters with predicates #2198
- [core] New RuleSet API and deprecations for PMD's entry point APIs #2635
- [core] Add file collector and new programmatic API for PMD #3785
- [core] Improve PmdAnalysis #3822
- [core] Fixes 1536: language file filter not applied when a command line spec contains … #120
- [core] Refrain from using File / filename internally in favor of DataSource #2004
- [xml] Please externalize the mapping of file extensions to languages as preference file #461
- [pom] Maven POM Rules don't check files named "pom.xml" #1540
- [java] Classloader not being closed after PMD run #2163
Purpose: loading, schema, ...
Goals:
- better schema validation, e.g. clearly know which attributes are mandatory/optional. Currently 2 use cases are mixed: rule definition, rule reference.
- Also xpath rules could use some syntactic sugar (currently they are just normal rule definition).
- Less noisy warning/ error handling (eg hide multiple warnings behind a "your ruleset has warnings"), e.g. avoid issues like #1794 [core] Ruleset Compatibility fails with excluding rules
- See also sf issue 1537 [core] Implement strict ruleset parsing
- Maybe allow for multiple instances of the same rule in a ruleset. USe case could be:
- Rule A with property X=Y and priority 1 (e.g. EmptyCatchBlock with allowCommentedBlocks=true)
- Rule A with property X=Z and priority 3 (e.g. EmptyCatchBlock with allowCommentedBlocks=false)
- In that case, both rules should be executed, but the reported violations have a different priority. This allows for failing the build for "serious" problems and have a report with all problems at the same time. One issue with that is of course, that the two rules overlap, so that for catch blocks with comments, two violations will be reported. We might consolidate that and just retain the violation with the highest priority.
- See comment #2006#issuecomment-547212238.
- Better output of what's going on: e.g. active rules, ignored rules due to minimum priority, ...
- See comment at #2006#issuecomment-540242439
- Currently this is logged only in debug
- Reading and writing rulesets ("RuleSetWriter")
- Make a nice Java API to get the category rulesets for a language / list built-in rules
-
getCategories()
: could be a enum, we have a fixed set of categories. - Maybe?? support different versions of the schema - would help in transition from PMD 6 to 7.
- sourceforge: #1466 [core] Support sharing properties across multiple rules
Documentation:
-
Tooling:
- Have a tool, that analysis your ruleset and tells you: Which rules are available but not used by your ruleset? Which rules are used by your ruleset, but a deprecated? That would be a interesting info when upgrading from one PMD version to the next.
- Maybe part of a ruleset editor?
- See comment at #2059#issuecomment-547274683
-
Issues
- [core] New RuleSet API and deprecations for PMD's entry point APIs #2635
- [core] Use ruleset loader in pmd 7 #2969
- [core] RulesetFactoryCompatibility stores the whole ruleset file in memory as a string #1451
- [core] Deprecate - hyphen notation for ruleset references #2352
- [java] Add references to CWE issues on applicable rules #1765 - because that affects rule set API
- [core] questions for getting all ruleset #1335
Purpose: adding custom rule, define how rules are executed (multi file analysis, multi threading concerns, etc.)
Documentation:
-
- allow to use properties in rule messages, e.g.
$0 exceeds the configured minimumLength of $minimumLength
...
- allow to use properties in rule messages, e.g.
-
Reporting Violations (rule context?)
-
i18n for (rule) violation messages
-
Violation Suppressing --> Language Implementing API
- Support for language agnostic suppression mechanism (like
@SuppressWarning('abc')
). This applies at least for Java and Apex.- See #1927 [java] Refactor annotation suppression (
AnnotationSuppressionUtil
) - Could use rule tags/category information?
- See also #1900 [java] Full @SuppressWarnings support
- See also [core] Violation suppressors #2055
- See #1927 [java] Refactor annotation suppression (
- Support for language agnostic suppression mechanism (like
-
Logging of processing errors (cross-cutting!)
-
Rule Execution Lifecycle (start, end, ...): Missing lifecycle methods in
Rule
:- Rule::beforeAllFiles
- Rule::afterAllFiles
- Rule::beforeFile (currently start())
- Rule::afterFile (currently end())
- TODO figure out parameters for these methods
-
Define state across rules / threads / files -> multi-file-analysis
- Previously theoretically possible on PMD 6 with RuleContext::setAttribute
- Could be part of a solution for multi file analysis?
-
Rule Development Lifecycle (incubating, stable, deprecated) -> https://pmd.github.io/latest/pmd_devdocs_rule_deprecation_policy.html
- we don't have incubating... maybe that is what in the old days "controversial" was for?
-
Rule Chain -> performance
-
Rule Implementation Types
- Parsed
- Java
- XPath --> XPath
- scripting language (e.g. groovy, future)
- Based on tokens?
- Files based (no parsing, plain text - e.g. regex --> document API??)
- Rule != visitor, separating rules and visitors
- Parsed
-
Autofixes support --> Language Implementing API
-
Issues
- [core] Violation suppressors #2055
- [core] Remove statistical rule (final step) #2005
- [core] Update remaining properties to use PropertyFactory #2450
- [core] Simplify the rulechain #2490
- [core] Allow abstract node types to be valid rulechain visits #1785
- [core] Deprecate RuleContext attributes #2676 (note: done on PMD 6)
- [core] Remove deprecated things about rules #2742
- [core] Discussion & implementation plan for property framework changes #1432
- [core] Apply the new PropertyDescriptor type where applicable #1027
- [core] Report unused violation suppression comments/annotations #2074 - because we need to support it in the APIs
- [core] Update remaining properties to use PropertyFactory #2450
- [core] Allow numeric properties to be within an unbounded range #1204
-
Renderers
-
Logging of processing errors (cross-cutting!)
-
Properties (configurability of renderers)
-
Proper handling of charset/encoding
-
How to report violations?
-
How to report metrics?
-
Encoding for all renderers. Currently only XMLRenderer supports this... See also [core] PMD/CPD produces invalid XML (insufficient escaping/wrong encoding). Experimental API
Renderer::setReportFile
. -
Report API - see below "description of new APIs"
-
Issues
- [core] Simplify Report API #1962 (note: done on PMD 6 already)
- [core] Deprecate report methods #2680 (note: done on PMD 6 already)
- [core] Deprecate ruleviolation comparator #2741
- [core] Render config errors #194
- [core] PMD must display number of rules contravened or errors found #1995
TODO: This needs to split up into smaller chunks "Language Implementing API" is too big.
- AST API
- Users: rule developers
- Note: the changes here affect all languages...
- AST, Node, NodeStream, DataMap, @Text (textual representation of a node)
- What methods do the super classes (
AbstractNode
, ...) provide? Maybe we don't need a abstract node at all, see below. - TODO: Describe DataMap purpose and use cases
- Consider adding something like
T getData(DataKey<T> key)
to the node interface, whereDataKey
just ensures the visible API is strongly typed. - In fact, the designer already has something like that: DataHolder. Nodes should most probably contain a DataHolder of the sort.
- That's a replacement for the current
getUserData
. - This makes the AST "extensible" externally by our internal subsystems, by rules, and by external tools (eg designer)
- Consider adding something like
- What methods do the super classes (
- Guidelines for AST nodes:
- should hide what's not relevant to someone examining the AST.
- constructors and setters for the AST nodes should be package private. These are called only by the parser.
- concrete node classes should be final
- base classes should be hidden (and not public): no obscure public base classes, with which a node can have 10 possible types.
- Nodes should have as shallow a class hierarchy as possible. Class hierarchies like those
of
AbstractJavaAccessTypeNode
orMethodLikeNode
and co are inflexible and too specific.LambdaExpression
has a methodisStrictFp
because of that. I don't mind about interface hierarchies with default methods, but nodes shouldn't. - deprecate
Node.getImage()
- it is too generic. In each node, it has a different meaning (e.g. variable name, method name, ...).- Note: Designer Bindings -> getMainAttribute
- also deprecate
Node.hasImageEqualTo(...)
- PLSQL has e.g. getCanonicalImage etc.
- More useful AST navigation: e.g. a if statement has a condition, so there should be a
getCondition()
which returns the correct child.- should be high-level.
n.getLastChild()
is more intuitive thann.jjtGetChild(n.jjtGetNumChildren() - 1)
,klass.getSimpleName()
is better thanklass.getImage()
. - no implemementation specific methods like
jjt*...
.
- should be high-level.
- When to implement Iterator?
- Nodes should expose the accessors that are relevant to them and only those. Methods
like
ASTThrowStatement#getFirstClassOrInterfaceTypeImage
are way too specific, and are perhaps used by one rule in the codebase. - See also
@NoAttribute
--> XPath - Javadoc: BNF snippets should describe all the possible states a node can have when it ends up
in a rule. That's to avoid the rule developer finding out about an edge case in the wild.
That's also for us to consider all edge cases when making the API, without having to spend
our time in the grammar file. bnf snippets for abstract types enumerate the possible subtypes.
- The designer uses javadoc from AST nodes. So all AST nodes should provide a small BNF
snippet of the grammar marked with
<pre class="grammar">...</pre>
. It might also provide docs for XPath attributes of the current AST node.
- The designer uses javadoc from AST nodes. So all AST nodes should provide a small BNF
snippet of the grammar marked with
- should be well documented. You shouldn't have to look at the implementation to find out what a method does.
- The jjtree generated comment ("// generate by jjtree - do not change...") must be removed.
- Naming: Similar nodes should have a common suffix, like "Expr", "Type", "Name", ...
- Override
getChild()
to return the specific child type, if it can be only one type. No need for casting then. -
AbstractJavaNode
overridesgetBeginLine()
/getBeginColumn()
, taking the position from the first token ... does this make sense for other languages too? Should we have something likegetPosition()
on the nodes? (to not have 4 methods to read the position... beginLine, beginColumn, endLine, endColumn). The position could also have convenience methods like "length", "offset".- AbstractNode is becoming completely irrelevant, each language implementation is only using a different subset of the fields.
- Many methods can be provided by a default implementation in the Node interface
- The "id" of a node is not needed (was only used to get the XPath node name from jjtree)
- text positions (beginLine, beginColumn, etc.) might not be needed, if we can always delegate to a token
- Debugging support: toString() of nodes...
- --> These guidelines should end up in a custom PMD rule to enforce this.
- Designer Bindings / Tree Dumper
- Autofix/Quickfix -> Document API
- AST Manipulation methods
- Needed for languages, that??? maybe tree builder
- AbstractNode::addChild, insertChild, setChildren
- Concepts: TextAvailableNode, RootNode, ... GenericNode?, ... TextCoordinates/Position/...,
- Uniform handling of comments and whitespaces
- comments should be nodes?
- From #2447: In pmd-java, Comment now extends AbstractJjtreeNode, it's technically a separate tree: the parent of a comment cannot be a JavaNode, the child of a JavaNode cannot be a Comment. Thankfully this is now enforced at compile time. This needs some more thought, I don't know if this should stay or go, or what relation this would have to a "javadoc language module".
- should we keep whitespace as special nodes as well?
- Maybe Parser Options? -> parser options will be removed (#2675, #2602)
- comments should be nodes?
- Clarify root node: Consider introducing a real "RootNode" kind of node type. In Java we have now
(in PMD 7 java-grammar) both
a
ASTCompilationUnit
and agetRootNode()
method. Apex has multiple root nodes (see #1937 [apex] Apex should only have a single RootNode. In PLSQL, I've implemented a way to get hold of the complete source code of the analyzed file. In that case, this root node might be a file node "FileNode" (however, the analyzed source code does not need to originate from files only, it could also be downloaded from the web or from a database).- Each language should have only one node type, that serves a a root node.
- It might be a "virtual"/"artificial" node like "CompilationUnit". Maybe the name should be something like "CodeUnit", "AnalyzedCodeUnit". It is the "thing" on which PMD operates, the "DataSource".
- Need a core concept for the thing, that PMD operates on - ASTRoot?
- The Parser returns always a RootNode rather than a Node:
net.sourceforge.pmd.lang.Parser::parse
- Done with #2447
- AST Visitors - generic visitors
- Details [core] Add generic visitor interface in pmd-core #2589 and [core] Make visitors generic #880
AstVisitor<P, R>
- In nodes:
<P, R> R acceptVisitor(AstVisitor,P)
- The node method
acceptVisitor
in the specific node classes should have visibility protected
- Parser + Parser Options
- parser options will be removed (#2675, #2602)
- Possible alternative: [core] Language properties
- Metrics
- Would be useful to report metrics somehow. Use case: https://github.com/jenkinsci/metrics-aggregation-plugin
- Language Module
- Make it easy to implement a new language
- Remove duplication!
- Support for both CPD and PMD
- Symbol Table
- there are some helper classes/API in the core package
net.sourceforge.pmd.lang.symboltable
. - [java] Deprecate old symbol table, add replacement for TypeHelper #2721
- there are some helper classes/API in the core package
- ❌ Dataflow (Deprecated on PMD 6, Remove with PMD 7)
- This seems to be java specific for now
- There might be some usages in plsql... (but there are no rules, that use it)
- I'd rather make this one a java specific API and only later a core api
- That means deprecating the API on PMD6
- Note: The only rule for Java DataflowAnomalyAnalysis has been deprecated #2647 and removed in PMD 7
- [core] Deprecate DFA #2681
- [core] Remove DFA #2696
- Mixing Languages (jsp + html + javascript, apex + visualforce, java + javadoc, ...)
- Logging of processing errors (cross-cutting!)
- Semantic Error vs. Parsing Error
- Grammar
- javacc
- antlr
- decorater / treebuilder
- Custom Violation Suppressors (see #2055) --> Rule API
- Designer Bindings (see #2118) --> Debugging Support
- "Document Manipulation API" -
net.sourceforge.pmd.document
. API added when working on quick fixes/autofixes.-
getText()
- Bridge between AST and the underlying text - see [java] Make nodes have access to their text #2166- Unicode escapes are not handled gracefully. This needs to be fixed - discovered in #2166
- Text regions
-
- internal implementation of source processing / source processing stages
- AST traversals, processing stages
- Analysis Context? Might be helpful with #1371 [RFC] [POC] - report missing classes
- Language instances should only be retrieved queried from
LanguageRegistry
. LanguageRegistry could be simplified, eg it's very error-prone to have to rely on strings, and finding the right type of "name" for a language. (findLanguageByTerseName, findLanguage, ...)- I'd say we should publish an enum of standard language ids, and have a method
LanguageRegistry#findByKey
. E.g.
- I'd say we should publish an enum of standard language ids, and have a method
// if java is not on the classpath, returns null.
Language java = LanguageRegistry.findByKey(StandardLanguages.JAVA);
- Instances of interfaces like LanguageVersionHandler, Parser and stuff should only be retrieved from the corresponding Language instance. Eg.
Parser javaParser = LanguageRegistry.findByKey(StandardLanguages.JAVA).getParser();
// But NOT
JavaParser j = new JavaParser();
- That decouples client code from the implementation classes, which is the whole point of publishing interfaces. The implementation classes can be made internal and changed at our discretion.
- LanguageVersion should be turned an interface. The current ordering behaviour is broken (refs #1603) and language implementations should be able to implement it at their own discretion. Language versions could for example be implemented in some enums to get a consistent ordering.
- The problem of relying on strings is also true for language versions. We could eg publish enums in language modules that represent standard language versions. E.g. in the Java module:
// usage if you have a compile dependency on pmd-java
// clients of just pmd-core should still use the Language instance directly
LanguageVersion java12 = JavaVersions.JAVA_12.get();
// published Api of pmd-java
// the class or enum implementing LanguageVersion is internal
enum JavaVersion {
// ...
;
private String myName; // found out by constructor
public LanguageVersion get() {
LanguageRegistry.findByKey(StandardLanguages.JAVA).getVersion(myName);
}
}
-
Language Properties (see [core] Language properties #2518): Use cases are:
- auxclasspath (only used by java, right now a "global" property)
- suppress marker (currently a global property)
- filename extensions / filename pattern
- tab size (Configurable tab size #292)
-
Issues
- [core] Add wildcards to node stream API #2405
- [core] Generify GenericToken, cleanup Token managers & CPD #2371
- [core] Simplify metrics framework (for PMD 7) #2292
- [core] Cleanup language version handlers #2248
- [core] Merge JavaCC build scripts #2211
- [core] Things missing in the node stream API #2220
- [core] NodeStream API #1622
- [core] Abstract away optional AST traversals (first step) #1426
- [core] Wire processing stages into SourceCodeProcessor #1796
- [core] Antlr visitor rules #1774
- [xml] Unimplement org.w3c.dom.Node from the XmlNodeWrapper #1800
- [java] Make nodes have access to their text #2166
- [core] Deprecate jjtree methods from the Node interface #2172
- [java] Use default method on generated visitor interfaces #1446
- [core] Cleanup Node and AbstractNode #2447
- [core] Revamp the language version alias APIs #2486
- [apex] Apex should only have a single RootNode #1937
- [core] Add generic visitor interface in pmd-core #2589
- [core] Bug with node stream drop #2664
- [core] Deprecate parser options #2675
- [java] Rename visitor methods to reduce overloading #2706
- [javascript] Generic visitor for JS #2707
- [apex] Make apex visitor generic #2661
- [core, ...] Finish generic visitors #2746
- [core] Make visitors generic #880
- [javascript] Remove JS parser options #2837
- [xml] Remove xml dom rule and parser options #2971
- [core] Remove ParserOptions #2602
- [core] Improve Parser interface #3085
- [RFC] [POC] - report missing classes #1371
- [core] Language version comparison #1603
- [core]
isFindBoundary
should not be an attribute #2218 - [core] Support configurable tabsize for AST node position in reports #292
- [core] RFC: Analyzing embedded snippets from other languages #237
- [core] DFA hits a hard limit of 6 iterations without results and logs #873
- [cpd][core] Add displaying total number of tokens per file #1569
- [core] Conflicting contracts of getChild between PMD & Antlr #2235
- [core] Autofixes feature development #693
- [core] Merging Javacc build scripts #2239
- [java] Generate visitor delegation logic #1786
- [core] Language properties
- Issues / PRs
-
XML schema of AST
-
@NoAttribute extension
-
Logging of processing errors (cross-cutting!)
-
Ideally, hide all Saxon-specific stuff from our published API, publish a set of implementation-independent interfaces instead.
-
Turn XPathRule into an interface, make each LanguageVersionHandler return an XPathRuleBuilder so that the actual implementation can be language-specific
- Made possible by the Schema change for XPath rule defs
-
Issues
-
Designer Bindings --> Language Implementing API
-
Tree Dumper
-
toString of Nodes...
-
Test framework (pmd-test, pmd-lang-test)
- Kotlin DSL for node/AST tests
- Rule tests
- Tree dumper based AST tests
- rule test schema
-
Benchmark
- The old code has been reworked with PMD 6.4.0 ([core] Rework benchmarking code #1075)
- Has been removed with [core] Remove deprecated things #2672
- Still need docs on how to use it for rule benchmarking of custom rules.
-
Exceptions and error handling
- PMDException yes/no
- Checked or Unchecked
- ContextedRuntimeException -> unchecked...
- PmdXPathException -> [java] New Java XPath functions #2917
-
Issues
Stuff the doesn't fit into the categories above.
- Source Code Minimizer (-> https://github.com/pmd/pmd-scm). Different use case.
- [all] Breaking API changes for 7.0.0 #881
TODO
Package: net.sourceforge.pmd.lang.ast
Implementation: #1622 [core] NodeStream API
In java rule implementations, you often need to navigate the AST to find the interesting nodes. In PMD 6, this
was often done by calling jjtGetChild(int)
or jjtGetParent(int)
and then checking the node type
with instanceof
. There are also helper methods available, like getFirstChildOfType(Class)
or
findDescendantsOfType(Class)
. These methods might return null
and you need to check this for every
level.
Therefore, the NodeStream API provides easy to use methods that follow the Java Stream API (java.util.stream
).
plantuml source
interface Node {
NodeStream<Node> asStream()
NodeStream<Node> children()
NodeStream<Node> descendants()
NodeStream<Node> descendantsOrSelf()
NodeStream<Node> ancestors()
NodeStream<Node> ancestorsOrSelf()
NodeStream<R> children(Class<R> rClass)
NodeStream<R> descendants(Class<R> rClass)
NodeStream<R> ancestors(Class<R> rClass)
}
interface NodeStream {
Stream<@NonNull T> toStream()
NodeStream<R> flatMap(Function<? super T, ? extends @Nullable NodeStream<? extends R>> mapper)
NodeStream<R> map(Function<? super T, ? extends @Nullable R> mapper)
NodeStream<T> filter(Predicate<? super @NonNull T> predicate)
NodeStream<T> peek(Consumer<? super @NonNull T> action)
NodeStream<R> forkJoin(Function<..> first, Function<..> second, Function<..> ... rest)
NodeStream<T> append(NodeStream<? extends T> right)
NodeStream<T> prepend(NodeStream<? extends T> right)
NodeStream<T> cached()
NodeStream<T> take(int maxSize)
NodeStream<T> drop(int n)
NodeStream<T> dropLast(int n)
NodeStream<T> takeWhile(Predicate<? super T> predicate)
NodeStream<Node> ancestors()
NodeStream<Node> ancestorsOrSelf()
NodeStream<Node> parents()
NodeStream<Node> children()
NodeStream<Node> descendants()
NodeStream<Node> descendantsOrSelf()
NodeStream<Node> followingSiblings()
NodeStream<Node> precedingSiblings()
NodeStream<R> children(Class<R> rClass)
NodeStream<R> descendants(Class<R> rClass)
NodeStream<R> ancestors(Class<R> rClass)
NodeStream<T> filterNot(Predicate<? super @NonNull T> predicate)
NodeStream<T> filterMatching(Function<? super @NonNull T, ? extends @Nullable U> extractor, U comparand)
NodeStream<R> filterIs(Class<R> rClass)
NodeStream<T> filterNotMatching(Function<? super @NonNull T, ? extends @Nullable U> extractor, U comparand)
void forEach(Consumer<? super @NonNull T> action)
R reduce(R identity, BiFunction<? super R, ? super T, ? extends R> accumulate)
int sumBy(ToIntFunction<? super T> toInt)
int count()
boolean nonEmpty()
boolean isEmpty()
boolean any(Predicate<? super T> predicate)
boolean none(Predicate<? super T> predicate)
boolean all(Predicate<? super T> predicate)
T get(int n)
T first()
Optional<T> firstOpt()
T first(Predicate<? super T> predicate)
R first(Class<R> rClass)
T last()
R last(Class<R> rClass)
R collect(Collector<? super T, A, R> collector)
List<T> toList()
List<R> toList(Function<? super T, ? extends R> mapper)
NodeStream<T> of(@Nullable T node)
NodeStream<T> fromIterable(Iterable<@Nullable T> iterable)
NodeStream<T> ofOptional(Optional<T> optNode)
NodeStream<T> of(T... nodes)
NodeStream<T> union(NodeStream<? extends T>... streams)
NodeStream<T> union(Iterable<? extends NodeStream<? extends T>> streams)
NodeStream<T> empty()
}
Many complex predicates about nodes can be expressed by testing the emptiness of a node stream.
E.g. the following tests if the node is a variable declarator id initialized to the value 0
:
Example:
NodeStream.of(someNode) // the stream here is empty if the node is null
.filterIs(ASTVariableDeclaratorId.class)// the stream here is empty if the node was not a variable declarator id
.followingSiblings() // the stream here contains only the siblings, not the original node
.filterIs(ASTVariableInitializer.class)
.children(ASTExpression.class)
.children(ASTPrimaryExpression.class)
.children(ASTPrimaryPrefix.class)
.children(ASTLiteral.class)
.filterMatching(Node::getImage, "0")
.filterNot(ASTLiteral::isStringLiteral)
.nonEmpty(); // If the stream is non empty here, then all the pipeline matched
Package: net.sourceforge.pmd.lang.ast
Implementation: see #2172 [core] Deprecate jjtree methods from the Node interface
The naming of the methods jjt*
was bad, since these methods indicate, they are from javacc, while we as PMD own
these methods and javacc is a implementation detail.
The purpose of the Node API is
- Tree traversal
- XPath support
- Location metadata
Navigate the AST. This is an alternative way to NodeStream API for convenience.
Node getParent()
Node getChild(int index)
int getNumChildren()
int getIndexInParent()
Iterable<? extends Node> children()
Node getNthParent(int n)
T getFirstParentOfType(Class<T> parentType)
List<T> getParentsOfType(Class<T> parentType)
T getFirstParentOfAnyType(Class<? extends T>... parentType)
T getFirstChildOfType(Class<T> childType)
List<T> findChildrenOfType(Class<T> childType)
T getFirstDescendantOfType(Class<T> descendantType)
List<T> findDescendantsOfType(Class<T> targetType)
List<T> findDescendantsOfType(Class<T> targetType, boolean crossFindBoundaries)
boolean hasDescendantOfType(Class<T> type)
Describes nodes in a form understandable by XPath expressions.
String getXPathNodeName()
Iterator<Attribute> getXPathAttributesIterator()
Provides access to the location of the node in the source code.
int getBeginLine()
int getBeginColumn()
int getEndLine()
int getEndColumn()
TODO: This should be changed to provide higher level access to location metadata, e.g. Location getLocation()
with location having access to line-based-location and offset-based-location.
See also "Document API" (net.sourceforge.pmd.document
) and RegionByLine/RegionByOffset.
TODO...
See [swift] Refactor antlr implementation #2463 and [core] Add generic visitor interface in pmd-core #2589.
-
net.sourceforge.pmd.lang.ast.AstVisitor
andnet.sourceforge.pmd.lang.ast.AstVisitorBase
Parser-generator-agnostic visitors to be used by both JavaCC and Antlr. - The main Node interface
net.sourceforge.pmd.lang.ast.Node
has the appropriateacceptVisitor
method.
-
Rule::apply
is the only entry point for AST processing for rules, regardless of their type/impl (XPath, Java, rulechain or not) - How to use rulechain for your own rule?
- The default behavior for
Rule::buildTargetSelector()
isRuleTargetSelector.forRootOnly()
, which means, that no rule chain is used, AST traversal starts at the root. - In order to use rulechain, rule impls need to provide
RuleTargetSelector
viaRule::buildTargetSelector()
- PMD will use
Rule::getTargetSelector()
...
- The default behavior for
Package: net.sourceforge.pmd.core
Purpose: Loading and parsing ruleset files, creating rulesets from rules.
Replaces: RuleSetFactory
Important classes: RuleSetLoader
, RuleSetLoadException
net.sourceforge.pmd.lang.java.symbols
TODO...
TODO...
- Caching is now on the nodes, this is implemented on master. Subsequent API removal has been done on 7.0.x.
- It's possible that the framework will be further changed, for example to stop limiting each language to 2 node types for metrics, which is an artificial limitation. See https://github.com/pmd/pmd/pull/2292#discussion_r396693472
Package: net.sourceforge.pmd
Implementation: partly: [core] Deprecate report methods #2680
plantuml source
class Report {
void addConfigError(ConfigurationError e)
void addError(ProcessingError e)
void addRuleViolation(RuleViolation rv)
List<ConfigurationError> getConfigurationErrors()
List<ProcessingError> getProcessingErrors()
List<RuleViolation> getViolations()
List<SuppressedViolation> getSuppressViolations()
void merge(Report other)
void addListener(ThreadSafeReportListener l)
void addListeners(List<ThreadSafeReportListener> ll)
List<ThreadSafeReportListener> getListeners()
}
class ConfigurationError {
String issue()
Rule rule()
}
class ProcessingError {
String getDetail()
Throwable getError()
String getFile()
String getMsg()
}
class SuppressedViolation {
RuleViolation getRuleViolation()
String getUserMessage()
ViolationSuppressor getSuppressor()
}
interface ViolationSuppressor {
String getId()
SuppressedViolation suppressOrNull(RuleViolation rv, @NonNull Node node)
}
- ⚠ This has been removed with PMD 7, so there is no language agnostic dataflow support anymore for now.
TODO...
-
PMD publishes an API to implement new languages
-
For a package
p
for a feature (eg rule violations):- published API are in
p
- This should be enough to understand what the feature does while browsing javadocs
- eg the interface RuleViolation, ViolationSuppressor
- implementation classes that we publish and support for language implementations to use are in
p.impl
- these are classes that are irrelevant to understanding the API, but ease implementation/ share behaviour
- eg the DefaultRuleViolationFactory, the RuleViolationFactory interface (if we need it anyway)
- internal API, that
p.impl
is built upon, or that we don't want to publish yet are inp.internal
- published API are in
-
PMD should offer minimal extension points
- Prefer composition to inheritance (ie prefer aggregating many small, orthogonal services instead of extending a big base class)
- For example, providing custom ViolationSuppressors and ViolationDecorators is better than extending
RuleViolationFactory
andRuleViolation
-
The only service that is truly required for a PMD language implementation is a Parser
- The Parser interface should be simplified to just
RootNode parse(DataSource source, LanguageVersion version);
- Many Parsers don't expose a tokenizer, and that's only relevant for CPD, so should be another extension point
- The Parser interface should be simplified to just
-
No ParserOptions anymore, instead LanguageProperties (#2518).
-
A
RootNode
has aAstInfo
object, which provides access to- the current parser task, and through that access to the file that is being parsed/has been parsed.
- the ast itself
- suppressions
- see #3085
- Tokens, TokenEntry, ...
- ⚠: End Columns of Tokens are exclusive on PMD 7, but inclusive on PMD 6.x. See https://github.com/pmd/pmd/commit/5b7ed588ea15961e5909f7c24fca1c4cbb914a52
- This change might be worth documenting on [doc] Documentations improvements tracker #1139
Purpose: API that is used internally by the CLI part of PMD. It selects the files to be analyzed and provides
a way to start the analysis using the new PmdAnalysis
entry point.
Changed by: [core] Add file collector and new programmatic API for PMD #3785 and [x] [core] Improve PmdAnalysis #3822
Replaces: It replaces the old main entry point net.sourceforg.pmd.PMD.processFiles()
Example usage:
PMDConfiguration config = new PMDConfiguration();
config.setDefaultLanguageVersion(LanguageRegistry.findLanguageByTerseName("java").getVersion("11"));
config.setInputPaths("src/main/java");
config.prependClasspath("target/classes");
config.setMinimumPriority(RulePriority.HIGH);
config.addRuleSet("rulesets/java/quickstart.xml");
config.setReportFormat("xml");
config.setReportFile("target/pmd-report.xml");
try (PmdAnalysis pmd = PmdAnalysis.create(config)) {
// optional: add more rulesets
pmd.addRuleSet(pmd.newRuleSetLoader().loadFromResource("custom-ruleset.xml"));
// optional: add more files
pmd.files().addFile(FileSystems.getDefault().getPath("src", "main", "more-java", "ExtraSource.java"));
// optional: add more renderers
pmd.addRenderer(renderer);
// or just call PMD
pmd.performAnalysis();
}
File Collector API:
FileCollector PmdAnalysis.files()
-
net.sourceforge.pmd.lang.document.FileCollector
close()
getLog()
- collection
addFile(Path file)
addFile(Path file, Language language)
addFile(TextFile textFile)
addSourceFile(String sourceContents, String pathId)
addDirectory(Path dir)
addFileOrDirectory(Path file)
addZipFile(Path zipFile)
- configuration
setCharset(Charset charset)
relativizeWith(String prefix)
- filtering
exclude(FileCollector excludeCollector)
filterLanguages(Set<Language> languages)
Note: This is probably outdated!
-
General observations
- RuleContext is an internal detail, one should not have to provide it themselves
- RuleContext is stateful but doesn't need to be (language version + file name are properties of the nodes)
- SourceCodeProcessor is a mess (+ we have PMDTask, AbstractPMDProcessor and its implementations), it's really difficult to know how to launch PMD
- Most of the times a user doesn't need a Report, the renderer could just append a violation to the output stream
-
PMD at its core boils down to a function
(files, rules) -> violations
-
Of course as input it also takes more configuration, eg number of threads wanted, classloader, language versions to use etc
-
On a higher level we can also ignore the "return type" of the function type above, and instead use listeners to handle violations as they come. We already have this concept: ThreadSafeReportListener
Consider enriching this interface and making it more central to the API, eg:
// this listens to an analysis
// example implementations:
// * appending to a Report
// * forwarding to a Renderer
// * teeing to several listeners
// * synchronized wrapper (makes a listener thread safe)
interface PmdAnalysisListener {
void handleViolation(RuleViolation)
void handleSuppressedViolation(SuppressedViolation)
void handleError(ProcessingError)
}
// now rulecontext doesn't store language version or file name
// it's just the API to create violations and forward to the listener
class RuleContext {
// this listener creates the result of the analysis
PmdAnalysisListener listener;
void addViolation(node, message, messageArgs) {
var v = createViolation(node, message, messageArgs);
var suppressed = node.getLanguage().getViolationSuppressor().suppressOrNull(v);
if (suppressed != null)
listener.handleSuppressedViolation(suppressed)
else
listener.handleViolation(v);
}
// + overloads for convenience
}
A rule implementation could look like:
class SomeRule {
void visit(Node n, RuleContext ctx) { // (we don't have to keep the parameters as Object, would be safer if it was RuleContext).
if (n is violation)
ctx.addViolation(n, ...)
}
}
Calling PMD could look like:
var report = new Report()
var renderer = new HtmlRenderer()
var files = PmdFiles.collectFromDirs(dirnames)
var ruleset = RuleSetFactory.parse(url)
var listener = PmdAnalysisListener.tee(report.listener(), renderer.listener())
var extraConfig = ExtraConfig.DEFAULT.withThreads(4)
.withClasspath(someClassPath)
.useLanguageVersion("java", "8")
.addExtension("xml", ".fxml")
// logging options, etc
PMD.call(files, ruleset, listener, extraConfig)
// when this returns the result is both the output of the renderer and the stuff in the Report
void callPmd(files, ruleset, listener, extraConfig) {
var rc = new RuleContext(PmdAnalysisListener.threadSafe(listener))
var buckets = partitionFiles(files, extraConfig.numThreads)
var tasks = buckets.map { bucket -> new PMDTask(bucket, rc, ruleset.deepCopy()) }
startAll(tasks)
join(tasks)
}
-
ParsingOptions is useless
- Used atm for suppressMarker
- in Javascript for Rhino Version -> convert to LanguageVersion
- in XML for XML parser options -> figure out common denominator
-
Deprecate
Rule::getParserOptions
-
Parser::parse
could use much more data- Suppress marker
- LanguageVersion
- TextDocument(filename + Reader + line mapper, etc)
- Semantic logger
-
semanticError(node, “....”)
/warning/etc -
benchmark(“...”, () -> { /*action*/ })
-> this is public api for language impls
-
- -> wrap all this into a ParsingTask or something
-
Return a RootNode which knows about its TextDocument, LanguageVersion, etc
-
Replace processing stages
class TreeIdx {
RootNode root;
Map<String, List<Node>> byName;
Map<Class, List<Node>> byClass;
}
class RuleNodeSelector {
abstract List<Node> selectInTree(TreeIdx);
class StringSelector extends ... {
List<Node> selectInTree(TreeIdx) {
idx.byName().get(“PrimaryExpr”);
}
}
class RootSelector extends ... { }
}
// Replaces Rule::getRulechainVisits
Rule::getSelector(): RuleNodeSelector
// Replaces Rule::apply
Rule::apply(Node, RuleContext) { }
// Reimplement Ruleset::apply
RuleSet::apply(RootNode n, RuleContext ctx) {
TreeIdx idx = indexTree(n);
this.foreach { rule ->
List<Node> todo = rule.getSelector().selectInTree(idx)
todo.foreach { node ->
try {
rule.appy(node, ctx);
} catch (_) {
ctx.handleError(_)
}
}
}
}
See [core] Analysis listeners #3692.
Abstractions
- GlobalAnalysisListener - main listener for the entry point PmdAnalysis. This listener is queried for FileAnalysisListeners during the analysis.
- FileAnalysisListener - used during the analysis of a single file
Use cases
- Violations and errors are reported through these analysis listeners.
- No more using
Report
to transport the found violations by default. It's still possible, though, to get a report at the end with all violations. - Potentially interesting for IDE integrations.
Relationship to Renderers
Renderers receive their violations through these analysis listeners.
Relationship to RuleContext
RuleContext is not global to the analysis run anymore, but it only provides an API for rules to report violations.
Purpose: API for rules to notify about found violations. It filters suppressed violations.
Changed by: [core] Analysis listeners #3692
RuleContext {
void addViolation(Node location); // with default rule message
void addViolation(Node location, Object... formatArgs); // with default rule message + message args
void addViolationWithMessage(Node location, String message);
void addViolationWithMessage(Node location, String message, Object... formatArgs);
void addViolationWithPosition(Node location, int beginLine, int endLine, String message, Object... formatArgs);
void addViolationNoSuppress(RuleViolation rv); // ???
}
Package: net.sourceforge.pmd.lang.document
Purpose: Unifying CPD/PMD handling, support modifying files (for autofix)
Changed by:
Description:
-
Chars: a string slice type. Supports efficient string-like operations without copying the char array of the file.
- It is a
CharSequence
- constructors
wrap(CharSequence chars)
- methods
writeFully(@NonNull Writer writer)
write(@NonNull Writer writer, int start, int count)
getChars(int srcBegin, char @NonNull [] cbuf, int dstBegin, int count)
appendChars(StringBuilder sb, int off, int len)
appendChars(StringBuilder sb)
ByteBuffer getBytes(Charset charset)
int indexOf(String searched, int fromIndex)
int indexOf(int ch, int fromIndex)
int lastIndexOf(int ch, int fromIndex)
boolean startsWith(String prefix, int fromIndex)
boolean startsWith(String prefix)
boolean startsWith(char prefix, int fromIndex)
boolean endsWith(String suffix)
Chars trimStart()
Chars trimEnd()
Chars trim()
Chars trimBlankLines()
Chars removeSuffix(String charSeq)
boolean contentEquals(CharSequence cs, boolean ignoreCase)
boolean contentEquals(CharSequence cs)
Chars subSequence(int start, int end)
Chars subSequence(int start)
Chars slice(TextRegion region)
Chars slice(int off, int len)
String substring(int start, int end)
Iterable<Chars> lines()
Stream<Chars> lineStream()
Reader newReader()
- It is a
-
TextFile
- TextFileContent
- backing content of a TextFile
- Uses Chars
- Line terminators are normalized to
\n
- Reading the file computes a checksum which is used by incremental analysis later
- TextFile replaces DataSource everywhere in the internals of PMD
- implementations: StringTextFile, NioTextFile, ReaderTextFile
- factory methods:
static TextFile forPath(Path path, Charset charset, LanguageVersion languageVersion)
static TextFileBuilder builderForPath(Path path, Charset charset, LanguageVersion languageVersion)
static TextFile forCharSeq(CharSequence charseq, String pathId, LanguageVersion languageVersion)
static TextFileBuilder builderForCharSeq(CharSequence charseq, String pathId, LanguageVersion languageVersion)
static TextFile forReader(Reader reader, String pathId, LanguageVersion languageVersion)
static TextFileBuilder builderForReader(Reader reader, String pathId, LanguageVersion languageVersion)
- methods:
LanguageVersion getLanguageVersion()
String getDisplayName()
TextFileContent readContents()
boolean isReadOnly()
void writeContents(TextFileContent content)
- TextFileBuilder
- methods:
TextFileBuilder withDisplayName(@Nullable String displayName)
TextFile build()
- methods:
- TextFileContent
-
TextDocument, built on top of TextFile
- interface. implementations: RootTextDocument
- Replaces CPD's SourceCode
- Provides a view of a TextFile, potentially only selecting part of the whole file.
- Addressing by offset/length (TextRange) in the view is relative to the view
- Addressing by line/column (toLocation()) is absolute, relative to the underlying file.
- Since it is only a view of a TextFile, it can provide (temporary) modifications
- Tree:
- at the root a RootTextDocument with the original TextFile. Coordinate system based on line+column, absolute in the original TextFile
- one or more views, which potentially access only part of the whole original TextFile.
- Backed by a TextFile
- factories:
static TextDocument create(TextFile textFile)
static TextDocument readOnlyString(final CharSequence source, LanguageVersion lv)
static TextDocument readOnlyString(@NonNull CharSequence source, @NonNull String filename, @NonNull LanguageVersion lv)
- methods:
FileLocation toLocation(TextRegion region)
LanguageVersion getLanguageVersion()
String getPathId()
String getDisplayName()
Chars getText()
Chars sliceText(TextRegion region)
long getCheckSum()
Reader newReader()
int getLength()
TextRegion getEntireRegion()
TextPos2d lineColumnAtOffset(int offset, boolean inclusive)
-
Position / locations in files
- TextPos2d - wraps a line and column tuple. line+column
- TextRange2d - wraps a tuple of TextPos2 for start and end positions. Also line+column based.
- FileLocation
- This represents the line/column coordinates of a place in a TextDocument. Used for reporting.
- wraps: filename+textrange2d
- factory via:
TextDocument.toLocation(TextRegion)
static FileLocation range(String fileName, TextRange2d range2d)
TextPos2d getStartPos()
TextPos2d getEndPos()
TextRange2d toRange2d()
- TextRegion
- This represents a contiguous range of text in a TextDocument with offset/length, which is useful for tokens and for TextAvailableNodes in general.
- In contrast to "*2d" - this uses offset+length
- constructors:
fromOffsetLength(int startOffset, int length)
fromBothOffsets(int startOffset, int endOffset)
caretAt(int startOffset)
- methods:
int getStartOffset()
int getEndOffset()
int getLength()
boolean isEmpty()
boolean contains(int offset)
boolean contains(TextRegion other)
boolean overlaps(TextRegion other)
TextRegion growLeft(int delta)
TextRegion growRight(int delta)
- static methods:
@Nullable TextRegion intersect(TextRegion r1, TextRegion r2)
TextRegion union(TextRegion r1, TextRegion r2)
- constructors:
-
Reportable
- package:
net.sourceforge.pmd.reporting
- This is an interface for anything that can provide a FileLocation (Nodes and Tokens currently)
- Implemented by Nodes and Tokens - on these RuleViolations can be reported
- methods:
FileLocation getReportLocation()
- package:
plantuml source
@startuml
interface TextDocument {
LanguageVersion getLanguageVersion()
String getPathId()
String getDisplayName()
Chars getText()
Chars sliceText(TextRegion region)
long getCheckSum()
Reader newReader()
FileLocation toLocation(TextRegion region)
}
~class RootTextDocument implements TextDocument {
TextFile backend
TextFileContent content
LanguageVersion langVersion
String fileName
String pathId
}
interface TextFile {
{abstract} LanguageVersion getLanguageVersion()
{abstract} String getDisplayName()
{abstract} TextFileContent readContents()
{abstract} boolean isReadOnly()
{abstract} void writeContents(TextFileContent content)
}
~class NioTextFile implements TextFile {
Path path
Charset charset
LanguageVersion languageVersion
String displayName
boolean readOnly
}
~class StringTextFile implements TextFile {
TextFileContent content
String name
LanguageVersion languageVersion
}
~class ReaderTextFile implements TextFile {
String name
LanguageVersion languageVersion
Reader reader
}
abstract class TextFileBuilder {
TextFileBuilder withDisplayName(String displayName)
TextFile build()
}
class TextFileContent {
Chars cdata
String lineTerminator
long checkSum
SourceCodePositioner positioner
}
RootTextDocument .. TextFileContent
StringTextFile .. TextFileContent
TextFile .. TextFileContent
TextFile .. TextFileBuilder
RootTextDocument .. TextFile
@enduml
- net.sourceforge.pmd.lang.apex.ast
- RulesetWriter: do we really need that? It will be a pain to update during the schema update process
- Yes, remove
- The “dump façade”. I don’t really see much use to it tbh, and if we want to inspect nodes there’s the designer
- Package net.sourceforge.pmd.util.filter:
- Based around an equivalent to java.util.function.Predicate, can be replaced by that one
- Make it private, maybe remove in 7.0.0, if we don’t need it anymore
- Based around an equivalent to java.util.function.Predicate, can be replaced by that one
- PMD.getUriDataSources:
- public static List getURIDataSources(String uriString) throws PMDException {
- Only a helper method, could be private
- private
- PMDException
- Its defining feature is the “severity” of the error, but this is not used once in our codebase.
- It’s used very inconsistently within PMD
- Yes, remove it
- Language interface:
- getRuleChainVisitorClass() -> there should be a method ‘RulechainVisitor getRuleChainVisitor()’ instead to avoid instantiating it via reflection
- Yes
- getRuleChainVisitorClass() -> there should be a method ‘RulechainVisitor getRuleChainVisitor()’ instead to avoid instantiating it via reflection
- LanguageRegistry:
- getInstance() -> I think at this point it’s quite clear that it’s a utility class since all methods are static, and getInstance() should be removed.
- Yes
- commaSeparatedTerseNamesForLanguageVersion, commaSeparatedTerseNamesForLanguage -> these are not generic enough to be on there, they are also used at most once or twice across PMD, could be inlined at call sites
- Yes, they are at least private API.
- getInstance() -> I think at this point it’s quite clear that it’s a utility class since all methods are static, and getInstance() should be removed.
- findWithRuleSupport -> CPD-only languages should not provide net.sourceforge.pmd.lang.Language as a service, they have zero use and the modules already provide a net.sourceforge.pmd.cpd.Language
- See eg the C++ language module, it provides implementations for CppHandler, CppParser, etc which only throw UnsupportedOperationExceptions
- The LanguageRegistry should only provide a collection of fully supported PMD languages, but these dummy implementations clutter it.
- Yes, deprecate for removal
- net.sourceforge.pmd.util.log.PmdLogFormatter (unused)
- Yes, deprecate for removal
-
#1550 Deprecate StatisticalRule
All of net.sourceforge.pmd.stat, net.sourceforge.pmd.lang.rule.stat, and their language specific base classes. StatisticalRule is superfluous, and existing rules may be rewritten more clearly without it.- Remove
- net.sourceforge.pmd.rule.ImportWrapper
- Should be moved to pmd-java, it’s language specific
- Yes :)
- Rule interface:
- getRuleClass/setRuleClass -> This is just getClass().getName(), and shouldn’t ever be set
- Yes, remove
- getParserOptions -> supporting rules with different parser options has never been implemented and would require several parsing phases if options conflict between rules -plus it’s probably not useful. I think it’s safe to say this is never going to see light
- used in XmlXPathRule
- but no implementation that would use XmlParserOptions…
- Deprecate it for removal
- addRuleChainVisit(String) -> unsafe since we don’t use a Class, and the implementation is wonky. Better keep to the version taking a Class parameter
- Yes
- getRuleClass/setRuleClass -> This is just getClass().getName(), and shouldn’t ever be set
- All net.sourceforge.pmd.util
- net.sourceforge.pmd.ant.(SourceLanguage | RuleSetWrapper)
- net.sourceforge.pmd.ant.Formatter
- net.sourceforge.pmd.lang:
- LanguageFilenameFilter
- XPathHandler -> exposes implementation details, and not useful to an end-user
- BaseLanguageModule
- AbstractLanguageVersionHandler
- AbstractParser
- DataFlowHandler
- Basically we can’t expect many other custom language implementations out there, so I’d rather most of the implementation details to be better in an internal package
- net.sourceforge.pmd.rule:
- AbstractRuleChainVisitor
- AbstractRuleViolationFactory
- Do we really need to publish the interface RuleChainVisitor? It seems to me that that is very implementation-specific. Ideally the Rulechain would happen transparently to users.
- All of net.sourceforge.pmd.processor, plus SourceCodeProcessor
- We should propose a simpler API to process some source and hide those as implementation details. Usage of MultiThreadProcessor vs MonoThreadProcessor could be hidden behind a method which specifies the number of threads, PmdThreadFactory could be hidden inside MultiThreadProcessor, etc
- Rule interface:
- setLanguage, setMinimumLanguageVersion, setMaximumLanguageVersion -> No rule supports a change to its language. net.sourceforge.pmd.lang.rule.ImmutableLanguage is in that sense redundant, and is implemented by every abstract rule class so could be removed
- net.sourceforge.pmd.rules.(RuleFactory | RuleBuilder)
- The current classes are there to help RulesetFactory. I see a use case for having a rule builder as public API (eg to build UIs), but this one takes only string arguments and is overall made to agree with the parsing done by RuleFactory, so is probably not made to be published
- net.sourceforge.pmd.lang.ast
- AbstractTokenManager
- All the implementation of the XPath support
- All of the implementations of language-specific analysis passes like symboltable, typeresolution, etc. Only user-facing things should be published API, like eg interfaces, but not abstract classes, or visitors
- In particular, all of net.sourceforge.pmd.lang.dfa (pmd-core) should be deprecated. No user rule should depend on that framework.
- In AbstractJavaRule:
- getDeclaringType, isSuppressed, isQualifiedName, importsPackage: not used or barely used and don’t apply to all rules
- visitAll should be internalized
- Base rule classes (#971 Deprecate overly specific base rule classes)
- AbstractCommentRule
- assignCommentsToDeclarations could be made into an optional AstProcessingStage activated by comment rules instead of manually assigning comments on visit(ASTCompilationUnit)
- The rest of the functionality can be moved to the AST, like for CommentUtilAbstractInefficientZeroCheck -> way too specific, and a better AST structure for expressions will remove the need to do something like that
- AbstractJavaMetricsRule -> not specific to metrics, plus I think these added “visit” methods that were not on JavaParserVisitor could be moved to JavaParserVisitorAdapter directly
- AbstractOptimizationRule -> not useful at all, the method “assigned” could be moved on VariableDeclaratorId as “isEffectivelyFinal” or something.
- AbstractPoorMethodCall -> not generic enough, extended by only one rule
- AbstractStatisticalJavaRule -> since we remove StatisticalRule
- AbstractSunSecureRule
- isField and isLocalVariable can be superseded by improvements to the symbol table
- The rest can be moved to the rule that use it, but has no use being shared
- AbstractCommentRule
- net.sourceforge.pmd.lang.java.(AbstractJavaHandler | AbstractJavaParser)
- These only have one extending class and it’s realistically not useful to abstract them
- net.sourceforge.pmd.lang.java.ast
- All abstract classes for nodes
- All node constructors
- All setters that can be made package-private
- see #1846 Deprecate AST internal API
- net.sourceforge.pmd.lang.ecmascript.ast
- net.sourceforge.pmd.lang.jsp.ast
- net.sourceforge.pmd.lang.plsql.ast
- net.sourceforge.pmd.lang.vf.ast
- net.sourceforge.pmd.lang.vm.ast
- net.sourceforge.pmd.lang.xml.ast