⚠️ This page is outdated. Please refer to PMD 7 Tracking Issue #3898 for the current status.

Main Issue: ??

Status

TODO move what's done out to make this document clearer

AST

Grammar is mostly done, these are most of these PRs in the past year: https://github.com/pmd/pmd/pulls?page=2&q=label%3Ain%3Aast+is%3Aclosed+is%3Apr

This wiki page has some more details further down

Some changes were ported back to be developed on pmd 7.0.x. This includes

#2166 (TextAvailableNode)
#2055 (ViolationSuppressor)
Some changes to core utilities like CollectionUtil, NodeStream

Current Status: [java] Java-grammar progress document #2701

Symbol table

Background info

The point of the java grammar rewrite is to have a more abstract syntax tree, with 1. more regularity, and 2. more specificity in the type of nodes it exposes. In particular, one end goal is to remove ambiguous names from the tree, by using semantic information.

To that end the symbol table code has been rewritten to be more general and more useful. Where the old symbol table was mostly an index of AST nodes, and type resolution complemented it with information extracted from Class instances, the new symbol table bridges these two representation with the concept of symbols.

Symbols are an abstraction over a java class. Currently, two implementations are provided:

one backed by AST nodes
one backed by Class instances

Timeline until now: https://github.com/pmd/pmd/pulls?q=is%3Apr+label%3Ain%3Asymbol-table+updated%3A%3E%3D2019-10-24+

Summary of what is already merged:

symbols implementations for Class and AST nodes
there is also a "placeholder" implementation for unresolved symbols in case the classpath is incomplete.
symbol tables are implemented
disambiguation is implemented

Type resolution

[java] New type resolution #2689

Background info

TODO

Downsides of current typeres

unmaintainable code
no separation of concerns
using Class doesn't preserve generics, needs a complete classpath
performance
- too many classloader calls
- type inference engine is exponential in number of variables
functionality
- inference of lambdas/method refs
- inference using context information (java 8+)
- too much information that could be used by rules is just thrown away

What's left to do

Symbol table

inherited members are not considered for disambiguation (easy to do but needs a lot of tests)
error reporting of disambiguation is ok, but the logger implementation is a placeholder that needs to be moved to pmd-core (needs changes to the Parser API)
symbols have no access to type information
using Class is not a great solution, we could use ASM to read class files instead, so we can avoid loading the classes. We can also get more info this way, eg constant initializers, and we avoid reading and decoding the body of methods for nothing.
[java] Rewrite symbol table #2601
[java] Name disambiguation in annotations #2665

Type resolution

provide a set of classes and interfaces to represent types
- class and interface types, and type variables, will be backed by symbols
- primitive types are their own thing, don't need user-supplied symbols
- intersection, wildcard, array types just combine other types
refactor the API of TypeNode and subclasses (not a big change)
provide an alternative to ClassTypeResolver
implement a new type inference engine
lastly, remove the old implementation
[java] Deprecate old symbol table, add replacement for TypeHelper #2721
There is a new TypeHelper replacement net.sourceforge.pmd.lang.java.types.TypeTestUtil that has been backported to pmd6 and will be part of the API for pmd7.

First details

Grammar changes: see Java changes
Symbol table
- #1566 Symbol reference framework
Data flow
Type resolution: report missing classes / incomplete auxclasspath #1371
Drop parsing support for Java <= 1.5?

Should we stop supporting java versions <= 1.5 ? The fact that “assert” and “enum” are not tokens in our grammar makes many parts of it wonky. Public support for 1.5 ended in 2009, paying customer support in 2015.

->Yes, let's remove support for it in PMD 7. If someone needs Java 1.5, there is still PMD 6.x available.

enumLookahead() seems like a bit of overkill, can it use Modifiers somehow?
Rule updates (deprecations, etc..)
- UseObjectForClearerApi: Deprecate in favor of ExcessiveParameterList, which is even configurable. See also discussion at #1752.
- AvoidFinalLocalVariable: Deprecate because it does not make much sense. See discussion at #1482 and #811.
- SimplifyStartsWith: This might not be a performance benefit anymore. The suggested code is not only to so readable but also prone to IndexOutOfBounds - see #1868.
- #2022 [java,apex] Remove deprecated and unused rules
- #1609 [java] Drop AtLeastOneConstructor - maybe. Need to understand the motivation behind the rule. Seems to be just style.
- AvoidLiteralsInIfCondition: Change default value of property "ignoreExpressions" - see #2150
- Unused rules (rules that have not been referenced by any ruleset) have been removed: #2022. The list of rules is maintained here: https://github.com/pmd/pmd/wiki/Project-Ideas-%5BMature%5D#new-rules

Java AST Changes

This is the TODO/ Done tracker page, see Java_clean_changes for detailed, presentable changes, see next major page for what's destined to the website

https://github.com/pmd/pmd/labels/in%3Aast

Implementation plan

Goals:

keep the 7.0.x branch buildable
- Branch for the grammar changes: java-grammar, diff
identify bottlenecks and remove them to allow as much work to be done in parallel

Dependency graph and tasks (edit it on http://asciiflow.com/):

                                       +--------+
                                       | Parser |
                                       +--------+
                                                                                                +------------------+
                                                                                                | New symbol table |
                            Expression                                                          +------------------+
                          & type grammar
                              #1759                                                       Symbol interfaces
                                +                   Temp                                        #1566
                                |                processing                                       +
                                |                  stages                                         |
                                |                    +                                            |
                                |                    |                                 +-----+----------+------+
+---------------------------+   +------------------+ |                                 |     |    |     |      |
|Misc AST changes           |   |                  | |                                 v     v    v     v      v
|                           |   |                  v v                                 +-----+----+-----+------+
|                       +------>+                   Temp                                 Symbol implementations
|                           |   |                 disambig +---------+                            +
|                           |   |                   pass             |                            |
|   Annotation changes? +------>+                    +               |                            v
|                           |   |                    |               |                          Local     +----------+
|                       +------>+                    v               |                      symbol tables            |
|                           |   |                  Typeres           |                            +                  v
|                           |   |                  update            |                            |               Reflective
|                           |   |                    +               |                            |              symbol tables
+---------------------------+   |                    |               |                            |                  +
                                |                    |               |                            v                  |
                                |                    |               +------------------> Real disambig pass         |
                                v                    |                                            +                  |
                      Stable AST structure           |                                            |                  |
                                +                    |                                            |                  |
                                |                    |                                            |                  |
                                |                    |                                            +------>+<---------+
                                +-------->+<---------+                                                    |
                                          |                                                               v
                                          |                                                           More typeres
                                          |                                                           improvements
                                          v
                         +-----+----+------------+---+------+
                         |     |    |     |      |   |      |
                         v     v    v     v      v   v      v
                         +-----+----+-----+------+---+------+
                                 Rule updates

Plan:

Merge all grammar changes into a new branch
- Changes are detailed below
Wire-in the processing stages stuff
- With a temporary hackish implementation, that can be replaced transparently later on with the real thing (which involves changing Ruleset and more stuff)
Create the AST disambiguation pass using the current symbol table
- For now the symbol table is not very broken, only usage search is dead but scopes and declarations still work properly
- This would be a temporary approach with the goal to take easy to grab stuff to remove some easy ambiguous names. Implementation of the new symbol table can proceed in parallel, and when it's done we can switch the implementation of the processing stage transparently.
- This would be easy to implement using the current implementation.
Then we can merge the new symbol table and ast changes independently until we have a stable AST structure
Once we have that, and an OK typeres, we can begin updating rules

Guidelines

Open guidelines

Identifying abstractions

The JLS grammar uses productions to denote two separate things
- To describe the abstract relations between some constructs, eg Type ::= ReferenceType | PrimitiveType -> "A type is either a reference or primitive type"
- To describe the concrete syntax of code constructs, eg PrimitiveType ::= int | float | ... "A primitive type is written int or float or ..."
While the second type of production mostly should correspond to a node in an AST, the first type would be better rendered by subtyping, which are a more natural way to abstract over stuff in Java.
- class ASTPrimitiveType implements ASTType
- ASTLocalVariableDeclaration could have a method ASTType getTypeNode() which returns a subtype of ASTType
Our Java grammar is based on some old JLS, which was directly translated to JavaCC without paying attention to the above distinction. Consequently we have nodes like ASTExpression or ASTType everywhere in ASTs, which are just clutter, and no abstraction whatsoever in the AST.
Identifying those nodes and turning them into interfaces implemented by the concrete syntax nodes makes trees smaller, easier to navigate, and provides more solid abstractions to work with

Work in Progress 🔥

Tasks

Note: As of early June 2019, this describes the work merged into the java-grammar branch and not yet on pmd/7.0.x

Download and run this runnable jar to check out the current version

Expression-related

Summary: Made left-recursive, trimmed down unnecessary layers, use AmbiguousName
#1759 [java] New expression and type grammar
#1855 [java] Make new java expression grammar stricter
#1857 [java] Make ClassLiteral not an ASTLiteral
#1872 [java] Remove ParenthesizedExpr
Fix text bounds of parenthesized expressions. For now, a parenthesized expression's text span is only the span of the innermost expr, but the first and last token should be the parentheses -> fixed via cf43a6f18e3
#1981 [java] Simplify array allocation expressions
[java] Implement constant values #2777
- Make constant expressions have access to their value
- In particular, implement the string formatting routines relevant to text blocks
Remove MethodLikeNode. LambdaExpression should extend the same base class as other expressions
- LambdaExpression currently implements AccessNode because of that, which is absurd
- This was initially done for the metrics framework. We can compute metrics on the block of the lambda though, and stop using qualified names (implementation dependent for lambdas). Using a data store system to cache metrics on the node will make that painless.
Improve SwitchExpression structure Refactor switch grammar #2204

* TODO Do we group the statements of a SwitchNormalBlock ? Do we unify their interface ? SwitchStatement implements Iterator<SwitchLabel> which seems shitty tbh. * Something like "SwitchLabeledStatementGroup"?
Merge different increment/decrement expressions #1890
Close #497 RFC: new layer of abstraction atop PrimaryExpressions. Solution is implemented.
Operator nodes:
- Implement #1661 About operator nodes
- The methods getOperator() should not be deprecated. We'll change the return type from String to BinaryOp, which will be a breaking change on PMD 6, but is acceptable. There are anyway more changes to do when switching to PMD 7. This means, the new methods getOp() will be removed again.
- Problems with String concat and + operators -> try left recursiveness (refs #1979, #2039)
About left-recursiveness

Sensible solution would be to make AdditiveExpression be parsed completely left-recursively, but then, it looks weird that all other binary expressions are not left-recursive… The problem also arises for operators &, |, ^, which may be boolean non-shortcut expressions or numeric bitwise expressions

Arguments for left recursiveness:
- Preserve all intermediary type resolution results, because of numeric promotion -> the whole point of #497
- AST is JLS-like, easier to document (we don’t have to explain “sometimes it’s left-recursive and sometimes not”)
- Easier mapping of type resolution algorithms from the JLS
- Stronger contract for binary expressions: all can present the same interface:
```
getLhs() : ASTExpression
getRhs() : ASTExpression
getOperator() : BinaryOp
```
- Classpath independent
  - Users should configure their tool correctly anyway
  - But eg the structure should be not too different between a no-classpath designer and a real run with classpath
Arguments against:
- Not super intuitive
- Breaks away from previous AST structure, needs to be relearned
- Needs more tooling for some use cases: Eg how to catch a string concat statement like a = … + a + …; (taken from AvoidConcatInLoop #1932)
  - In Java we can imagine the following method for ASTAdditiveExpression which yields all operands that are added with the same operator: operandStream(): NodeStream<ASTExpression>
  - Eg for <a + b + c * d>, operandStream returns Stream.of(<a>, <b>, <c*d>). So you can do eg:
```
  additiveExpr.operandStream()
                    .filter(ASTVariableReference.class)
                    .any(ref -> ref.getName().equals(“a”))
```
  - However in XPath it’s less trivial.. We’d have to provide access to the node stream with a function:
    - Old way, with flat additive expr:
```
AdditiveExpression[@Operator=”+”]/VariableReference[@Image = “a”]
```
  New way:
```
AdditiveExpression[@Operator=”+”]
  [pmd-java:operandsOf(.)/self::VariableReference[@Image = “a”]]
```

Expression grammar

Make ASTVariableInitializer, ASTExpression, ASTPrimaryExpression, ASTLiteral interfaces
ExplicitConstructorInvocation uses ArgumentList
Introduce new node types:
- ASTNumericLiteral, ASTStringLiteral, ASTCharLiteral
  - those divide the work of ASTLiteral, they implement it along with preexisting ASTBooleanLiteral, ASTNullLiteral
- ASTFieldAccess
  - Only when it has a LHS, e.g. "a.b"
  - Only pushed when we're certain this cannot be a FQCN or type member
- ASTVariableReference
  - Unqualified reference to a variable
- ASTAmbiguousName
  - For those names that are syntactically ambiguous
  - Most of them aren't semantically ambiguous, even with no auxclasspath, meaning they can be rewritten to a more meaningful node (a VariableReference or FieldAccess). The attribute AmbiguousName/@SemanticCheck makes a (very simple) disambiguation using the current symbol table and other easy to grab stuff:
    - If there's a variable in scope with the name, then it's classified as EXPR
    - Otherwise, if there's a static import with the given name, then it's classified as EXPR
    - Otherwise, if there's an import with the given name, it's classified as TYPE
    - Otherwise, it's classified as AMBIGUOUS
    - Many scenarios where there's no ambiguity without auxclasspath are not covered
    - With an auxclasspath and a proper symbol table, we could reclassify them all
    This is a very simple prototype, any real implementation would use the more precise JLS rules and some more heuristics. I found that nevertheless, around 90% of syntactically ambiguous names are semantically unambiguous using those simple rules (that was observed on several large codebases). This is without any auxclasspath support.
- ASTMethodCall
  - Doesn't use ASTArguments anymore, ASTArgumentsList is enough
- ASTArrayAccess
- ASTClassLiteral
- ASTAssignmentExpression
- ASTArrayAllocation, ASTConstructorCall
  - those replace ASTAllocationExpression
- ASTThisExpression, ASTSuperExpression
  - those enclose their qualifier if any
Remove unnecessary node types:
- ASTPrimarySuffix and ASTPrimaryPrefix
  - they're productions, not nodes
- ASTUnaryExpressionNotPlusMinus
  - Merged into ASTUnaryExpression
- ASTAssignmentOperator
  - made obsolete by ASTAssignmentExpression
- ASTArguments
  - To remove it, ASTExplicitConstructorInvocation must be changed too
- ASTVariableInitializer
  - Is superseded by ASTExpression, through making ASTArrayInitializer implement ASTExpression. This is only logical, they're expressions according to JLS. See deprecation note.
- ASTRSIGNEDSHIFT, ASTRUNSIGNEDSHIFT
- ASTAllocationExpression
  - The concrete node is replaced by ASTConstructorCall and ASTArrayAllocation, though we could make it an interface instead of removing it

Type-related

Make ClassOrInterfaceType left-recursive (in #1759 [java] New expression and type grammar)
Type grammar changes

Changelog: * ClassOrInterfaceTypes are now left-recursive. They use AmbiguousName too, because some segments need to be disambiguated between package or type name * TypeArgument, WildcardBound are removed, replaced with WildcardType and IntersectionType * Add ArrayType
```
* Fixes
    * [#910 AST inconsistency between primitive and reference type arrays](https://github.com/pmd/pmd/issues/910)
    * [#1150 ClassOrInterfaceType AST improvements](https://github.com/pmd/pmd/issues/1150)
```
Close #1150 ClassOrInterfaceType AST improvements.
Use ArrayTypeDim in the other contexts it can appear in
- #2203 Leftovers about Dimensionable & type annotations
* After a VariableDeclaratorId * Should probably be nested inside the VariableDeclaratorId, since everywhere a VariableDeclaratorId can occur, braces can occur (field decl, param, etc) * After the FormalParameters of a method declaration * In array types (done)
Improve ASTType::getTypeImage or remove it

Annotation-related

Make annotation an interface (in #1759)
- Turn Annotation, MemberValue into interfaces
- Remove Name nodes, useless in all cases
- Remove MemberValuePairs. MemberValuePair may only occur in NormalAnnotation so that node added no information.
#1875 [java] Move annotations inside the node they apply to
About annotation placement...
- Moving the annotations inside the node they apply to is easy implementation-wise, and would be cool abstraction-wise, e.g.
  - Annotations on EnumConstants are sprayed out in EnumBody at the moment
  - The annotations of a type may be in several places. E.g. in @A int[], the annot applies to int, not the array type, yet it's a sibling of the array type.
  - If you view the AST as a DOM, like we do for XPath, then annots would probably be elements nested inside the element they apply to, instead of siblings randomly placed outside of the node.
- Use TypeAnnotationList or AnnotationList, where it is correct.
  
  Basically AnnotationList should be used on all "declaration contexts" listed here:
  
  https://docs.oracle.com/javase/specs/jls/se9/html/jls-9.html#jls-9.6.4.1
  
  And TypeAnnotationList in the "type contexts" listed here:
  
  https://docs.oracle.com/javase/specs/jls/se9/html/jls-4.html#jls-4.11
  
  There is syntactic ambiguity about it, which is clarified in
  
  https://docs.oracle.com/javase/specs/jls/se9/html/jls-9.html#jls-9.7.4
  
  The BNF excerpts in the JLS don't mention explicitly whether an annotation is a type annotation or not, so we have to cross-check with those reference sections to find out which one is appropriate.
  
  (Not all occurrences of ( Annotation() )* have been replaced yet in our grammar.)
Verify that #1367 Parsing error on annotated subclass is fixed and documented (next major development, template in Java_clean_changes)
- Tested in FullTypeAnnotations.java
Verify that #997 Java8 parsing corner case with annotated array types is fixed and documented (next major development, template in Java_clean_changes)
- Depends on "use ArrayTypeDims on VariableDeclaratorId, MethodDeclaration"
Remove wrapper nodes that only serve to group the annotations with their subject
- Fixed by #2300
- ClassOrInterfaceBodyDeclaration, BlockStatement, TypeDeclaration, AnnotationTypeBodyDeclaration
  - They're noise, circumvented awkwardly in every rule to reach the enclosed node.
  - Those were meant to be interfaces anyway, which are kind of emulated by eg ASTAnyTypeDeclaration and ASTAnyTypeBodyDeclaration. We should probably remove those and introduce interfaces like ASTDeclaration and ASTTypeDeclaration
- ASTClassOrInterfaceBody, ASTEnumBody, ASTAnnotationTypeBody
  - Provided we flatten the nodes like described in the above bullet point, then it would make sense to delete the AST*Body nodes completely too. This would make the declarations of a type declaration direct children of the node, which would basically let us eg get the methods of a type declaration with node.children(ASTMethodDeclaration.class) (using the node stream API). That means that "find boundaries" can be removed from the Java AST, and makes everything more efficient. The only practical way to fetch methods currently is to call findDescendantsOfType, which traverses all the tree, and that's why find boundaries were necessary (because otherwise we go too deep). Instead, if we just remove this level of nesting, we can avoid traversing all the tree all the time, and remove the "find boundaries", which were a workaround to begin with.
  - This also removes three different node types which don't provide any information anyway, since their type is entirely determined by the type of their parent node.
Probably: unify all Annotation nodes (merge NormalAnnotation, MarkerAnnotation and SingleMemberAnnotation). Maybe not necessary, if we can select by ASTAnnotation (that's the interface)? 👍
- IMO it's better to unify them. That's because @A and @A() are semantically equivalent, yet are different nodes. The parsing is also not very natural and uses a lot of lookaheads, just to respect a syntax-only distinction.
- Fixed by #2282

Annotated vargargs. See comment on #1875

#2203 Leftovers about Dimensionable & type annotations

On the varargs ellipsis ...: void varargs(String @A ... last);
- In that case, the ... is treated just like array dimensions and annotations apply exactly as if the param was declared as String @A [] last
- The sensible way to parse that would probably be to use ArrayTypeDims in some way... I'm not sure yet.
- Eg maybe we should parse String... just as if it was String[], so use an ArrayType.
  - This causes problems because ArrayType is not fully recursive, so that the component type of an ArrayType is not always represented by a node.
    - Eg in String[], the component type is represented by the ClassOrInterfaceType
    - But in String[][], there is no node representing the component type String[] , since [][] are flat.
    - So maybe we should represent array types like the following:

Edit: this is impossible, array types are not left-recursive in that way. Extra dimensions nest more closely. Eg int @A[] f @B[];, the type of f is int @B[] @A[], not int @A[] @B[], see https://docs.oracle.com/javase/specs/jls/se11/html/jls-10.html#jls-10.2

Code	Old AST	Current 7.0 AST	Proposed representation
String[][]	+Type +ReferenceType[@ArrayDepth=2] +ClassOrInterfaceType "String"	+ArrayType +ClassOrInterfaceType "String" +ArrayTypeDims +ArrayTypeDim +ArrayTypeDim	+ArrayType +ArrayType +ClassOrInterfaceType "String"
String @A [] @B []	N/A (Parse error)	+ArrayType +ClassOrInterfaceType +ArrayTypeDims +ArrayTypeDim +Annotation "A" +ArrayTypeDim +Annotation "B"	+ArrayType +ArrayType +ClassOrInterfaceType +Annotation "A" +Annotation "B"
With varargs: String @A [] @B ...	N/A (Parse error)	N/A (Parse error)	+ArrayType[@Varargs = true()] +ArrayType +ClassOrInterfaceType +Annotation "A" +Annotation "B"

Code

Old AST

Current 7.0 AST

Proposed representation

String[][]

+Type
  +ReferenceType[@ArrayDepth=2]
  +ClassOrInterfaceType "String"

+ArrayType
    +ClassOrInterfaceType "String"
    +ArrayTypeDims
      +ArrayTypeDim
      +ArrayTypeDim

+ArrayType
    +ArrayType
      +ClassOrInterfaceType "String"

String @A [] @B []

N/A (Parse error)

+ArrayType
    +ClassOrInterfaceType
    +ArrayTypeDims
      +ArrayTypeDim
          +Annotation "A"
      +ArrayTypeDim
          +Annotation "B"

+ArrayType
    +ArrayType
      +ClassOrInterfaceType
      +Annotation "A"
    +Annotation "B"

With varargs:

String @A [] @B ...

N/A (Parse error)

+ArrayType[@Varargs = true()]
    +ArrayType
      +ClassOrInterfaceType
      +Annotation "A"
    +Annotation "B"

Note that this representation is both more economic in number of nodes (especially in the most common case where there are no annotations on the array dimensions), and also preserves every intermediary array type, which is nice.

[java] Allow @SuppressWarnings with constants instead of literals #520

TypeNode related

Make those nodes unimplement TypeNode:
- CompilationUnit
- ImportDeclaration (refs #1888)
- FieldDeclaration (refs #910)
Fix #910 AST inconsistency between primitive and reference type arrays
Fix #905 [java] Add new node for anonymous class declaration
- This is maybe already done? AnonymousClassDeclaration
[java] Add void type node to replace ResultType #2715

API-related

Remove ASTExtendsList::extendsMoreThanOne -> too specific
Remove ASTThrowStatement::getFirstClassOrInterfaceTypeImage -> too specific
Verify AST nodes comply to our "standard"

* Javadoc with grammar * constructor package private * setters package private
#1856 [java] Hide internal AST api

* API cleanup * `InternalApiBridge` for setting data into nodes at various stages. This is a internal API and not public API.

Documentation

Example usage of AssignmentOp (AssignmentExpressions), with the removal of ASTAssignmentOperator
Annotation() is now #void
Parenthesized Expressions: provide example to illustrate how the AST is improved now using attributes @Parenthesized and @ParenthesisDepth - see #1872 [java] Remove ParenthesizedExpr
ClassOrInterfaceType has now a TypeAnnotationList as child (as part of ClassTypeSegment) - see #1855 [java] Make new java expression grammar stricter
Document new ASTLiteral / ASTClassLiteral nodes (merely just a example, before - after)
High-level document about what processing stages Java is using to process java sources
- parsing
- rewrite ast / symbol table / disambiguation
- type resolution
Document example of how method calls are parsed now. How the AST looks like. E.g. new Foo().bar.foo(1) is now parsed as MethodCall[foo]/FieldAccess[bar]/ConstructorCall[Foo].
- document left recursiveness when we're done discussing the semantic rewrite phase
- write a summary of changes in the javadoc of the package, will be more accessible.

In Javadoc: Add additional example with expressions: 1 + 2 / 3 is parsed as 1 + (2 / 3) resulting in the following AST:

    AdditiveExpression[@Operator="+"]
    + NumericLiteral["1"]
    + MultiplicativeExpression[@Operator="/"]
      + NumericLiteral["2"]
      + NumericLiteral["3"]

Deprecations due on PMD 6

ASTCastExpression (hasIntersectionType()) - no direct replacement
ASTClassOrInterfaceType (isArray() and getArrayDepth())
ASTEnumConstant (getQualifiedName())
ASTMarkerAnnotation (getAnnotationName())
ASTNormalAnnotation (getAnnotationName())
ASTSingleMemberAnnotation (getAnnotationName())
ASTImportDeclaration (getPackage() - 295759456dacef)

Other APIs that cannot be marked `@Deprecated` on 6.0.x

Nodes/classes that are deprecated on PMD 7, but not on PMD 6

List of classes:
- ASTAllocationExpression
- ASTArguments
- ASTArrayDimsAndInits
- ASTAssignmentOperator
- ASTBlockStatement
- ASTInstanceOfExpression
- ASTMemberSelector
- ASTMethodDeclarator
- ASTNameList
- ASTPrimaryPrefix
- ASTPrimarySuffix
- ASTResources
- ASTStatementExpression
- ASTTypeArgument
- ASTTypeBound
- ASTUnaryExpressionNotPlusMinus
- ASTVariableInitializer
- ASTWildcardBounds
- ASTAnnotationMethodDeclaration
- JavaParserVisitorReducedAdapter
- ASTAnyTypeBodyDeclaration
List of classes, that are already removed on the 7.0 branch:
- ASTMarkerAnnotation
- ASTMemberValuePairs
- ASTNormalAnnotation
- ASTPreIncrementExpression
- ASTPreDecrementExpression
- ASTPostfixExpression
- ASTSingleMemberAnnotation
- ASTAnnotationTypeMemberDeclaration
- ASTTypeDeclaration
List of renamed AST classes:
- ASTResourceSpecification -> ASTResourceList
- ASTCatchStatement -> ASTCatchClause
- ASTFinallyStatement -> ASTFinallyClause
Document in Javadoc and next major development, that these classes will go away? We don't deprecate them, because we have no replacement for the users in PMD 6 (it would only be noise).

Nodes that are moved from class to interface

List of nodes
- ASTAnnotation
- ASTExpression
- ASTLiteral
- ASTMemberValue
- ASTPrimaryExpression
- ASTReferenceType
- ASTType
- ASTStatement
Document in Javadoc and next major development, that these nodes will go away? We don't deprecate them, because we have no replacement for the users in PMD 6 (it would only be noise).

Other Java-specific tasks

General improvements needed from core

#1796 [core] Wire processing stages into SourceCodeProcessor
#1787 Match abstract types in XPath queries (e.g. //*[nodeIsOfType(Expression)])
Externalise suppression, eg with a Suppressor interface, to isolate the AST from rules
- That can use rule category and tags, eg use tag information to know, which rules to suppress with "unused", etc.
- Java impl. is now AnnotationSuppressionUtil, see #1927 [java] Refactor annotation suppression
- #2055 [core] Violation Suppressors
Split AbstractNode to not force the root class to have beginLine, endLine, etc fields
- Part of #2447
`AbstractJavaNode`: beginLine, beginColumn, endLine, endColumn - setting from firstToken/lastToken? Or in AbstractNode? Since we store the first/last token, we could return the position from the tokens instead of maintaining fields beginColumn... * This is true, but for ASTs that wrap some other representation, maybe there's no token (eg Apex, JS?). Most probably, we should split AbstractNode into an AbstractJjtreeNode, that uses the tokens, and maybe AbstractWrapperNode, for the other stuff. Default methods on the Node interface would help.

Open questions

Naming of AST Nodes: Should all expression nodes end with the suffix Expr? For example: ArrayCreationExpr instead of just ArrayAllocation. Other possible suffixes e.g. Type, Name, Literal, Statement, Declaration.
We sometimes have *List nodes, but sometimes they are #void. E.g. LambdaParameterList is not void, but AnnotationList is...
- This doesn't strike me as a problem, could you elaborate?
Should we remove checks for old java version (like check for bad diamond usage...)? In what measure? When do we do this? I think, it should be also done on the java-grammar branch?
Comments
- Should they be part of AST?
  - Probably not in the main tree, ie not matchable via name test in XPath (we can use the comment() kind test).
- Currently we have JavaNode.comment() - so every node has this method.
- However, the AST assignment is done in AbstractCommentRule... The parser only provides a list of comments on the compilation unit (see ASTCompilationUnit.getComments()).
  - FIXME: AbstractCommentRule: assignCommentsToDeclaration -> FIXME, make that a processing stage
  - I think the rules Spoon uses are as close to "canonical" as can be.
Avoid reliance on the image of nodes. It's not semantic, some getter with a specific name should be used instead. I'd be for deprecating get/setImage from 7.0.0 on, not necessarily removing it.
- Proposed guideline: @NoAttribute the image attribute on new node types
- Also in TypeHelper: "FIXME checking against the image is for the most part meaningless."
AST simpler -> towards Universal AST?
ClassOrInterfaceBodyDeclaration has a monstrous lookahead to check for enums
Remove ASM dependency? Do we still need it for type resolution?
- I think it's too early to decide. Type resolution most probably won't need it (it only does for imports..), but there are some other probably useful things we can only find-out through bytecode analysis, because there's no reflection API for it. Examples:
- Finding out whether a public static final field in a library is a compile-time constant, and hence whether eg 1 * Math.PI in a user class is also constant.
- Finding with 100% certainty whether a method is overridden, and to which method it delegates. Currently MissingOverride uses a heuristic to do that, which fails in some cases where generics is involved, or with covariant return types.
  - This also helps the symbol table and type resolution, which need to take covariant return types into account, etc.
Modifiers: Use different kinds - ClassModifiers, MethodModifiers, InterfaceMemberModifiers, ... to avoid problems like in #1849 Preserve local class modifiers.
- See comment about that, maybe it's useful to be lenient
Split ForStatement into ForStatement and ForeachStatement

Dropped changes

SwitchExpressions and lambda workaround (inSwitchLabel flag). Try to get rid of it.
- The flag cannot be removed, because a constant expr (in a switch label) can be a ? b : c, and c could be a lambda, but in a switch label, this is ambigous with the switch rules.
- TODO explain
Remove ASTResultType
- If a method declaration has no Type node, then it's void. This removes yet another node from the grammar. Also, void is not a type, but rather denotes the absence of a type, and it's not useful to populate the type with void.class... Otherwise, I'd propose to have a VoidType node, but no intermediary ResultType node.
- Why it's problematic:
  - In fact we probably need to keep that, to have a node whose type is the result type of the method declaration, taking into account additionnal brackets after the formal parameters. This is a corner case, yet we should probably handle it gracefully. Alternatively, maybe make MethodDeclaration a TypeNode ?
Deprecate getOperator methods. We'll simply change the return type from String to BinaryOp. This breaks Java compat, but ensures a smooth transition for XPath users.

Rules for Java

Codestyle

UnnecessaryCast: currently only cares about a few very specific cases about collections, could be generalized to any cast, including
- casts that provoke boxing/unboxing of the operand, where autoboxing would apply
- casts that widen a primitive type (double) i
- but not: casts that disambiguate between several overloads (super.visit((JavaNode) node, data);)
- difficulty: very easy, except the last item
- should react to @SuppressWarnings("cast")
UseDiamondOperator: could be generalised to UnnecessaryTypeArguments, that eg also reports
- List<T> a = Collections.<T>emptyList();
- List<T> a = bool ? Collections.<T>emptyList() : otherList; Note that this type argument is necessary in java 7 but not in java 8. This means, it could be a useful rule for people migrating, including us.
- List<T> a; a = new ArrayList<T>() (not in the same statement)
- Arrays.<Integer>asList(1, 2) (even without context)
- etc
- difficulty: hard, as it needs to play with type inference
- TODO: maybe this is done by [java] Update rule UseDiamondOperator #3253
ExtendsObject: rule could also check for
- wildcard bounds: ? extends Object
- type parameter bounds <T extends Object>
- difficulty: trivial
UselessParentheses: rule should be aware of
- parentheses that differentiate addition from string concatenation: 1 + (2 + "") == "12", 1 + 2 + "" == "3"
- parentheses that clarify precedence (optionally) (#1918)
- I already refactored the rule, which anyway needed a complete rewrite because of the changes to ASTInfixExpression
- It would be nice to rename it to "UnnecessaryParentheses", for consistency with all the other "unnecessary" rules
- Also fix FP #1673
- difficulty: done
UnnecessaryFullyQualifiedName: rule should be scoped down or scoped up, and probably renamed. For now it detects fully qualified names (java.io.IOException), and also qualifiers for static methods & fields (Collections.emptyList()). It misses cases of either, as it only considers imports, not inherited members.
- With the new symbol table it's really easy to write a general rule for the first case, that in principle handles all cases: there's ASTClassOrInterfaceType::isFqcn and you can basically write node.getSymbolTable().types().resolve(node.getSimpleName()) == node.getSymbol() to check if the qualification can be removed without changing the meaning of the type.
- For the case of method qualifiers, in the general case, the qualifier could be differentiating two methods with the same signature that are both in scope, and we might have to perform overload resolution to check whether removing the qualifier is really a noop, semantically. This logic is anyway entirely separate from checking for qualified names for types.
- My point about scope is:
  - either the rule should only care about fully qualified names, and not method qualifiers (and a separate rule should be split off)
  - or the rule should be renamed to UnnecessaryQualifier/UnnecessaryQualification, and handle both cases, plus possibly, the following things which are also qualifiers:
    - this.foo/this.foo(): could be optionally flagged
    - Outer.this.foo(): this is UnnecessaryQualifiedThis, but that rule only handles a single case and not eg a qualification that doesn't actually lift an ambiguity. It would be merged into the new more general rule
- difficulty: the part about types is easy, the part about methods is less easy in the general case (but the simple cases are common enough for a first version)
UnnecessaryAnnotationValueElement: could be generalized to UseAnnotationShorthand, because there are other shorthands for annotations:
- @A(): empty member list can be removed
- @A(value=..): already caught
- @A(a={""}) : arrays with single elements can be flattened -> @A(a="")
- @A({"a"}) -> @A("a")
- difficulty: easy
DefaultAnnotationValue: reports setting attributes of an annotation to their default value
- Difficulty: easy, with proper symbol support (we need to expose the default value on the method symbol)

Best practices

UnusedAssignment has a couple of known limitations that can be fixed thanks to precise overload resolution:
- thrown exceptions in try blocks
- explicit this constructor calls
- difficulty: medium
MissingOverride:
- There are a few known corner cases, because it uses bridge methods. These are documented in the class. With #1673 we can fix them.
- The rule will have to be rewritten to not use Class, instead using symbols/types. I believe most of the tooling is already in #1673, as it also has to handle overriding for overload resolution
- difficulty: medium

Error-prone

MissingBreakInSwitch: could probably be improved to detect whether a case is fallthrough on any code path. Currently if you add a break anywhere, the case may still fall through.
- The flow analysis done by UnusedAssignment already detects abrupt completion, and can detect whether a switch case is fallthrough. It's just a matter of exposing that information in the right place, and moving the analysis visitor out from UnusedAssignment to be shared by some other rules
- would be nice to ignore //fallthrough comments, also fix #1899
- difficulty: medium

Performance

StringToString: could be generalized to UnnecessaryStringConversion, and UselessStringValueOf could be merged into it. There are many places where an explicit toString is not needed:
- in a string concatenation: "foo: " + foo.toString()
- as the message of an assert statement assert true : foo.toString()
- as the argument of a StringBuilder.append call sb.append(foo.toString())
- as the argument of a PrintStream.println/print call System.out.println(foo.toString())
- the rule could also consider String.valueOf like toString in those contexts
- difficulty: not hard, i guess

This should be removed from the performance category. "".toString() will just return this, as per the trivial implementation in class String. Any JIT would optimize this away by inlining the instant it would make a performance impact, and if it isn't inlined, then it doesn't make a performance impact. This is mostly for code readability I think.

UnnecessaryWrapperObjectCreation: could be generalized to UnnecessaryBoxing, and handle both boxing and unboxing. Currently the rule only cares about primitive wrappers that are created and unboxed explicitly in the same expression, like Integer.valueOf(0).intValue(), which is absurd and no one does. The rule could care about
- Explicit boxing in contexts where the primitive would be autoboxed: Object i = Integer.valueOf(2)
- Explicit unboxing in a context where it would be auto-unboxed: int i = integer.intValue()
- Explicit unboxing in contexts where the value is immediately reboxed: Object i = integer.intValue() (would rebox the int)
- Explicit boxing in contexts where the value is immediately unboxed: int i = Integer.valueOf(0) (would unbox the integer)
- Boxing of an already boxed value: Integer.valueOf(integer) (unboxes integer, then boxes it)
- Uses of Integer.valueOf(someString) where an int is expected: Integer.parseInt is more appropriate.
- This applies to all primitives
- Casts that box/unbox their operand could either fall in this rule or UnnecessaryCast
- Should react to @SuppressWarnings("boxing")
- difficulty: easy with #2689

Tbh UnnecessaryBoxing should be part of codestyle.xml, like UnnecessaryCast, and not performance. Apart from Integer.valueOf, the JIT probably optimizes these out and I'd be amazed if this is still relevant for performance outside of a hot loop. This rule is more about code style, like UnnecessaryCast.

Possible new rules

Other quirks about current rules

DefaultPackage and CommentDefaultAccessModifier overlap: choose one
Could BooleanGetMethodName be merged into LinguisticNaming?
SignatureDeclareThrowsException looks useless. Sometimes it isn't up to you.
design/UseObjectForClearerAPI <-> codestyle/ExcessiveParameterList
- the former is just a sub case of the latter, only considers strings
AvoidInstantiatingObjectsInLoops
- super old case, only relevant to hot spots, irrelevant if object does not escape

Change default properties of some rules, e.g. CloseResource
- CloseResource: closeNotInFinally=true (see also #3586)
StringBuilder append performance: Suggest String.format instead of StringBuilder, see https://www.javaspecialists.eu/archive/Issue294-String.format-3x-faster-in-Java-17.html

errorprone/AvoidDuplicateLiterals: Maybe change default value of property skipAnnotations to true ? (see https://github.com/pmd/pmd/pull/3663#issuecomment-1003739127)

PMD 7.0.0 Java - pmd/pmd GitHub Wiki

Status

AST

Symbol table

Type resolution

What's left to do

Symbol table

Type resolution

First details

Java AST Changes

Implementation plan

Guidelines

Identifying abstractions

Work in Progress 🔥

Tasks

Expression-related

Arguments for left recursiveness:

Arguments against:

Expression grammar

Type-related

Type grammar changes

Annotation-related

TypeNode related

API-related

Documentation

Deprecations due on PMD 6

Nodes/classes that are deprecated on PMD 7, but not on PMD 6

Nodes that are moved from class to interface

Other Java-specific tasks

General improvements needed from core

Open questions

Dropped changes

Rules for Java

Codestyle

Best practices

Error-prone

Performance

Possible new rules

Other quirks about current rules

⚠️ GitHub.com Fallback ⚠️

PMD 7.0.0 Java - pmd/pmd GitHub Wiki

Status

AST

Symbol table

Type resolution

What's left to do

Symbol table

Type resolution

First details

Java AST Changes

Implementation plan

Guidelines

Identifying abstractions

Work in Progress 🔥

Tasks

Expression-related

Arguments for left recursiveness:

Arguments against:

Expression grammar

Type-related

Type grammar changes

Annotation-related

TypeNode related

API-related

Documentation

Deprecations due on PMD 6

Nodes/classes that are deprecated on PMD 7, but not on PMD 6

Nodes that are moved from class to interface

Other Java-specific tasks

General improvements needed from core

Open questions

Dropped changes

Rules for Java

Codestyle

Best practices

Error-prone

Performance

Possible new rules

Other quirks about current rules

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️