PMD 7.0.0 Java - pmd/pmd Wiki

Main Issue: ??

Status

TODO move what's done out to make this document clearer

AST

Grammar is mostly done, these are most of these PRs in the past year: https://github.com/pmd/pmd/pulls?page=2&q=label%3Ain%3Aast+is%3Aclosed+is%3Apr

This wiki page has some more details further down

Some changes were ported back to be developed on pmd 7.0.x. This includes

  • #2166 (TextAvailableNode)
  • #2055 (ViolationSuppressor)
  • Some changes to core utilities like CollectionUtil, NodeStream

Current Status: [java] Java-grammar progress document #2701

Symbol table

Background info

The point of the java grammar rewrite is to have a more abstract syntax tree, with 1. more regularity, and 2. more specificity in the type of nodes it exposes. In particular, one end goal is to remove ambiguous names from the tree, by using semantic information.

To that end the symbol table code has been rewritten to be more general and more useful. Where the old symbol table was mostly an index of AST nodes, and type resolution complemented it with information extracted from Class instances, the new symbol table bridges these two representation with the concept of symbols.

Symbols are an abstraction over a java class. Currently, two implementations are provided:

  • one backed by AST nodes
  • one backed by Class instances

Timeline until now: https://github.com/pmd/pmd/pulls?q=is%3Apr+label%3Ain%3Asymbol-table+updated%3A%3E%3D2019-10-24+

Summary of what is already merged:

  • symbols implementations for Class and AST nodes
  • there is also a "placeholder" implementation for unresolved symbols in case the classpath is incomplete.
  • symbol tables are implemented
  • disambiguation is implemented

Type resolution

[java] New type resolution #2689

Background info

TODO

Downsides of current typeres

  • unmaintainable code
  • no separation of concerns
  • using Class doesn't preserve generics, needs a complete classpath
  • performance
    • too many classloader calls
    • type inference engine is exponential in number of variables
  • functionality
    • inference of lambdas/method refs
    • inference using context information (java 8+)
    • too much information that could be used by rules is just thrown away

What's left to do

Symbol table

  • inherited members are not considered for disambiguation (easy to do but needs a lot of tests)
  • error reporting of disambiguation is ok, but the logger implementation is a placeholder that needs to be moved to pmd-core (needs changes to the Parser API)
  • symbols have no access to type information
  • using Class is not a great solution, we could use ASM to read class files instead, so we can avoid loading the classes. We can also get more info this way, eg constant initializers, and we avoid reading and decoding the body of methods for nothing.
  • [java] Rewrite symbol table #2601
  • [java] Name disambiguation in annotations #2665

Type resolution

  • provide a set of classes and interfaces to represent types
    • class and interface types, and type variables, will be backed by symbols
    • primitive types are their own thing, don't need user-supplied symbols
    • intersection, wildcard, array types just combine other types
  • refactor the API of TypeNode and subclasses (not a big change)
  • provide an alternative to ClassTypeResolver
  • implement a new type inference engine
  • lastly, remove the old implementation
  • [java] Deprecate old symbol table, add replacement for TypeHelper #2721
  • There is a new TypeHelper replacement net.sourceforge.pmd.lang.java.types.TypeTestUtil that has been backported to pmd6 and will be part of the API for pmd7.

First details

Java AST Changes

This is the TODO/ Done tracker page, see Java_clean_changes for detailed, presentable changes, see next major page for what's destined to the website

Implementation plan

Goals:

  • keep the 7.0.x branch buildable
  • identify bottlenecks and remove them to allow as much work to be done in parallel
Dependency graph and tasks (edit it on http://asciiflow.com/):
                                       +--------+
                                       | Parser |
                                       +--------+
                                                                                                +------------------+
                                                                                                | New symbol table |
                            Expression                                                          +------------------+
                          & type grammar
                              #1759                                                       Symbol interfaces
                                +                   Temp                                        #1566
                                |                processing                                       +
                                |                  stages                                         |
                                |                    +                                            |
                                |                    |                                 +-----+----------+------+
+---------------------------+   +------------------+ |                                 |     |    |     |      |
|Misc AST changes           |   |                  | |                                 v     v    v     v      v
|                           |   |                  v v                                 +-----+----+-----+------+
|                       +------>+                   Temp                                 Symbol implementations
|                           |   |                 disambig +---------+                            +
|                           |   |                   pass             |                            |
|   Annotation changes? +------>+                    +               |                            v
|                           |   |                    |               |                          Local     +----------+
|                       +------>+                    v               |                      symbol tables            |
|                           |   |                  Typeres           |                            +                  v
|                           |   |                  update            |                            |               Reflective
|                           |   |                    +               |                            |              symbol tables
+---------------------------+   |                    |               |                            |                  +
                                |                    |               |                            v                  |
                                |                    |               +------------------> Real disambig pass         |
                                v                    |                                            +                  |
                      Stable AST structure           |                                            |                  |
                                +                    |                                            |                  |
                                |                    |                                            |                  |
                                |                    |                                            +------>++                                 Rule updates

Plan:

  • Merge all grammar changes into a new branch

    • Changes are detailed below
  • Wire-in the processing stages stuff

    • With a temporary hackish implementation, that can be replaced transparently later on with the real thing (which involves changing Ruleset and more stuff)
  • Create the AST disambiguation pass using the current symbol table

    • For now the symbol table is not very broken, only usage search is dead but scopes and declarations still work properly
    • This would be a temporary approach with the goal to take easy to grab stuff to remove some easy ambiguous names. Implementation of the new symbol table can proceed in parallel, and when it's done we can switch the implementation of the processing stage transparently.
    • This would be easy to implement using the current implementation.
  • Then we can merge the new symbol table and ast changes independently until we have a stable AST structure

  • Once we have that, and an OK typeres, we can begin updating rules

Guidelines

Open guidelines

Identifying abstractions

  • The JLS grammar uses productions to denote two separate things
    • To describe the abstract relations between some constructs, eg Type ::= ReferenceType | PrimitiveType -> "A type is either a reference or primitive type"
    • To describe the concrete syntax of code constructs, eg PrimitiveType ::= int | float | ... "A primitive type is written int or float or ..."
  • While the second type of production mostly should correspond to a node in an AST, the first type would be better rendered by subtyping, which are a more natural way to abstract over stuff in Java.
    • class ASTPrimitiveType implements ASTType
    • ASTLocalVariableDeclaration could have a method ASTType getTypeNode() which returns a subtype of ASTType
  • Our Java grammar is based on some old JLS, which was directly translated to JavaCC without paying attention to the above distinction. Consequently we have nodes like ASTExpression or ASTType everywhere in ASTs, which are just clutter, and no abstraction whatsoever in the AST.
  • Identifying those nodes and turning them into interfaces implemented by the concrete syntax nodes makes trees smaller, easier to navigate, and provides more solid abstractions to work with

Work in Progress :fire:

Tasks

Note: As of early June 2019, this describes the work merged into the java-grammar branch and not yet on pmd/7.0.x

Download and run this runnable jar to check out the current version

Expression-related

  • Summary: Made left-recursive, trimmed down unnecessary layers, use AmbiguousName

  • #1759 [java] New expression and type grammar

  • #1855 [java] Make new java expression grammar stricter

  • #1857 [java] Make ClassLiteral not an ASTLiteral

  • #1872 [java] Remove ParenthesizedExpr

  • Fix text bounds of parenthesized expressions. For now, a parenthesized expression's text span is only the span of the innermost expr, but the first and last token should be the parentheses -> fixed via cf43a6f18e3

  • #1981 [java] Simplify array allocation expressions

  • [java] Implement constant values #2777

    • Make constant expressions have access to their value
    • In particular, implement the string formatting routines relevant to text blocks
  • Remove MethodLikeNode. LambdaExpression should extend the same base class as other expressions

    • LambdaExpression currently implements AccessNode because of that, which is absurd
    • This was initially done for the metrics framework. We can compute metrics on the block of the lambda though, and stop using qualified names (implementation dependent for lambdas). Using a data store system to cache metrics on the node will make that painless.
  • Improve SwitchExpression structure Refactor switch grammar #2204

    * TODO Do we group the statements of a SwitchNormalBlock ? Do we unify their interface ? SwitchStatement implements Iterator<SwitchLabel> which seems shitty tbh. * Something like "SwitchLabeledStatementGroup"?
  • Merge different increment/decrement expressions #1890

  • Close #497 RFC: new layer of abstraction atop PrimaryExpressions. Solution is implemented.

  • Operator nodes:

    • Implement #1661 About operator nodes
    • The methods getOperator() should not be deprecated. We'll change the return type from String to BinaryOp, which will be a breaking change on PMD 6, but is acceptable. There are anyway more changes to do when switching to PMD 7. This means, the new methods getOp() will be removed again.
    • Problems with String concat and + operators -> try left recursiveness (refs #1979, #2039)
    About left-recursiveness

    Sensible solution would be to make AdditiveExpression be parsed completely left-recursively, but then, it looks weird that all other binary expressions are not left-recursive… The problem also arises for operators &, |, ^, which may be boolean non-shortcut expressions or numeric bitwise expressions

    Arguments for left recursiveness:

    • Preserve all intermediary type resolution results, because of numeric promotion -> the whole point of #497
    • AST is JLS-like, easier to document (we don’t have to explain “sometimes it’s left-recursive and sometimes not”)
    • Easier mapping of type resolution algorithms from the JLS
    • Stronger contract for binary expressions: all can present the same interface:
    getLhs() : ASTExpression
    getRhs() : ASTExpression
    getOperator() : BinaryOp
    
    • Classpath independent
      • Users should configure their tool correctly anyway
      • But eg the structure should be not too different between a no-classpath designer and a real run with classpath

    Arguments against:

    • Not super intuitive

    • Breaks away from previous AST structure, needs to be relearned

    • Needs more tooling for some use cases: Eg how to catch a string concat statement like a = … + a + …; (taken from AvoidConcatInLoop #1932)

      • In Java we can imagine the following method for ASTAdditiveExpression which yields all operands that are added with the same operator: operandStream(): NodeStream<ASTExpression>
      • Eg for <a + b + c * d>, operandStream returns Stream.of(<a>, <b>, <c*d>). So you can do eg:
        additiveExpr.operandStream()
                          .filter(ASTVariableReference.class)
                          .any(ref -> ref.getName().equals(“a”))
      • However in XPath it’s less trivial.. We’d have to provide access to the node stream with a function:
        • Old way, with flat additive expr:
      AdditiveExpression[@Operator=+”]/VariableReference[@Image = “a”]

      New way:

      AdditiveExpression[@Operator=+”]
        [pmd-java:operandsOf(.)/self::VariableReference[@Image = “a”]]

Expression grammar

  • Make ASTVariableInitializer, ASTExpression, ASTPrimaryExpression, ASTLiteral interfaces

  • ExplicitConstructorInvocation uses ArgumentList

  • Introduce new node types:

    • ASTNumericLiteral, ASTStringLiteral, ASTCharLiteral

      • those divide the work of ASTLiteral, they implement it along with preexisting ASTBooleanLiteral, ASTNullLiteral
    • ASTFieldAccess

      • Only when it has a LHS, e.g. "a.b"
      • Only pushed when we're certain this cannot be a FQCN or type member
    • ASTVariableReference

      • Unqualified reference to a variable
    • ASTAmbiguousName

      • For those names that are syntactically ambiguous

      • Most of them aren't semantically ambiguous, even with no auxclasspath, meaning they can be rewritten to a more meaningful node (a VariableReference or FieldAccess). The attribute AmbiguousName/@SemanticCheck makes a (very simple) disambiguation using the current symbol table and other easy to grab stuff:

        • If there's a variable in scope with the name, then it's classified as EXPR
        • Otherwise, if there's a static import with the given name, then it's classified as EXPR
        • Otherwise, if there's an import with the given name, it's classified as TYPE
        • Otherwise, it's classified as AMBIGUOUS
        • Many scenarios where there's no ambiguity without auxclasspath are not covered
        • With an auxclasspath and a proper symbol table, we could reclassify them all

        This is a very simple prototype, any real implementation would use the more precise JLS rules and some more heuristics. I found that nevertheless, around 90% of syntactically ambiguous names are semantically unambiguous using those simple rules (that was observed on several large codebases). This is without any auxclasspath support.

    • ASTMethodCall

      • Doesn't use ASTArguments anymore, ASTArgumentsList is enough
    • ASTArrayAccess

    • ASTClassLiteral

    • ASTAssignmentExpression

    • ASTArrayAllocation, ASTConstructorCall

      • those replace ASTAllocationExpression
    • ASTThisExpression, ASTSuperExpression

      • those enclose their qualifier if any
  • Remove unnecessary node types:

    • ASTPrimarySuffix and ASTPrimaryPrefix
      • they're productions, not nodes
    • ASTUnaryExpressionNotPlusMinus
      • Merged into ASTUnaryExpression
    • ASTAssignmentOperator
      • made obsolete by ASTAssignmentExpression
    • ASTArguments
      • To remove it, ASTExplicitConstructorInvocation must be changed too
    • ASTVariableInitializer
      • Is superseded by ASTExpression, through making ASTArrayInitializer implement ASTExpression. This is only logical, they're expressions according to JLS. See deprecation note.
    • ASTRSIGNEDSHIFT, ASTRUNSIGNEDSHIFT
    • ASTAllocationExpression
      • The concrete node is replaced by ASTConstructorCall and ASTArrayAllocation, though we could make it an interface instead of removing it

Type-related

  • Make ClassOrInterfaceType left-recursive (in #1759 [java] New expression and type grammar)

    Type grammar changes

    Changelog: * ClassOrInterfaceTypes are now left-recursive. They use AmbiguousName too, because some segments need to be disambiguated between package or type name * TypeArgument, WildcardBound are removed, replaced with WildcardType and IntersectionType * Add ArrayType

    * Fixes
        * [#910 AST inconsistency between primitive and reference type arrays](https://github.com/pmd/pmd/issues/910)
        * [#1150 ClassOrInterfaceType AST improvements](https://github.com/pmd/pmd/issues/1150)
    
  • Close #1150 ClassOrInterfaceType AST improvements.

  • Use ArrayTypeDim in the other contexts it can appear in

    * After a VariableDeclaratorId * Should probably be nested inside the VariableDeclaratorId, since everywhere a VariableDeclaratorId can occur, braces can occur (field decl, param, etc) * After the FormalParameters of a method declaration * In array types (done)
  • Improve ASTType::getTypeImage or remove it

Annotation-related

  • Make annotation an interface (in #1759)

    • Turn Annotation, MemberValue into interfaces
    • Remove Name nodes, useless in all cases
    • Remove MemberValuePairs. MemberValuePair may only occur in NormalAnnotation so that node added no information.
  • #1875 [java] Move annotations inside the node they apply to

    About annotation placement...
    • Moving the annotations inside the node they apply to is easy implementation-wise, and would be cool abstraction-wise, e.g.
      • Annotations on EnumConstants are sprayed out in EnumBody at the moment
      • The annotations of a type may be in several places. E.g. in @A int[], the annot applies to int, not the array type, yet it's a sibling of the array type.
      • If you view the AST as a DOM, like we do for XPath, then annots would probably be elements nested inside the element they apply to, instead of siblings randomly placed outside of the node.
    • Use TypeAnnotationList or AnnotationList, where it is correct.

      Basically AnnotationList should be used on all "declaration contexts" listed here:

      https://docs.oracle.com/javase/specs/jls/se9/html/jls-9.html#jls-9.6.4.1

      And TypeAnnotationList in the "type contexts" listed here:

      https://docs.oracle.com/javase/specs/jls/se9/html/jls-4.html#jls-4.11

      There is syntactic ambiguity about it, which is clarified in

      https://docs.oracle.com/javase/specs/jls/se9/html/jls-9.html#jls-9.7.4

      The BNF excerpts in the JLS don't mention explicitly whether an annotation is a type annotation or not, so we have to cross-check with those reference sections to find out which one is appropriate.

      (Not all occurrences of ( Annotation() )* have been replaced yet in our grammar.)

  • Verify that #1367 Parsing error on annotated subclass is fixed and documented (next major development, template in Java_clean_changes)

  • Verify that #997 Java8 parsing corner case with annotated array types is fixed and documented (next major development, template in Java_clean_changes)

    • Depends on "use ArrayTypeDims on VariableDeclaratorId, MethodDeclaration"
  • Remove wrapper nodes that only serve to group the annotations with their subject

    • ClassOrInterfaceBodyDeclaration, BlockStatement, TypeDeclaration, AnnotationTypeBodyDeclaration
      • They're noise, circumvented awkwardly in every rule to reach the enclosed node.
      • Those were meant to be interfaces anyway, which are kind of emulated by eg ASTAnyTypeDeclaration and ASTAnyTypeBodyDeclaration. We should probably remove those and introduce interfaces like ASTDeclaration and ASTTypeDeclaration
    • ASTClassOrInterfaceBody, ASTEnumBody, ASTAnnotationTypeBody
      • Provided we flatten the nodes like described in the above bullet point, then it would make sense to delete the AST*Body nodes completely too. This would make the declarations of a type declaration direct children of the node, which would basically let us eg get the methods of a type declaration with node.children(ASTMethodDeclaration.class) (using the node stream API). That means that "find boundaries" can be removed from the Java AST, and makes everything more efficient. The only practical way to fetch methods currently is to call findDescendantsOfType, which traverses all the tree, and that's why find boundaries were necessary (because otherwise we go too deep). Instead, if we just remove this level of nesting, we can avoid traversing all the tree all the time, and remove the "find boundaries", which were a workaround to begin with.
      • This also removes three different node types which don't provide any information anyway, since their type is entirely determined by the type of their parent node.
  • Probably: unify all Annotation nodes (merge NormalAnnotation, MarkerAnnotation and SingleMemberAnnotation). Maybe not necessary, if we can select by ASTAnnotation (that's the interface)? :+1:

    • IMO it's better to unify them. That's because @A and @A() are semantically equivalent, yet are different nodes. The parsing is also not very natural and uses a lot of lookaheads, just to respect a syntax-only distinction.
    • Fixed by #2282
  • Annotated vargargs. See comment on #1875

    • On the varargs ellipsis ...: void varargs(String @A ... last);
      • In that case, the ... is treated just like array dimensions and annotations apply exactly as if the param was declared as String @A [] last
      • The sensible way to parse that would probably be to use ArrayTypeDims in some way... I'm not sure yet.
      • Eg maybe we should parse String... just as if it was String[], so use an ArrayType.
        • This causes problems because ArrayType is not fully recursive, so that the component type of an ArrayType is not always represented by a node.
          • Eg in String[], the component type is represented by the ClassOrInterfaceType
          • But in String[][], there is no node representing the component type String[] , since [][] are flat.
          • So maybe we should represent array types like the following:

    Edit: this is impossible, array types are not left-recursive in that way. Extra dimensions nest more closely. Eg int @A[] f @B[];, the type of f is int @B[] @A[], not int @A[] @B[], see https://docs.oracle.com/javase/specs/jls/se11/html/jls-10.html#jls-10.2

    Code Old AST Current 7.0 AST Proposed representation
    String[][]
    +Type
      +ReferenceType[@ArrayDepth=2]
      +ClassOrInterfaceType "String"
    +ArrayType
        +ClassOrInterfaceType "String"
        +ArrayTypeDims
          +ArrayTypeDim
          +ArrayTypeDim
    +ArrayType
        +ArrayType
          +ClassOrInterfaceType "String"
    String @A [] @B []

    N/A (Parse error)

    +ArrayType
        +ClassOrInterfaceType
        +ArrayTypeDims
          +ArrayTypeDim
              +Annotation "A"
          +ArrayTypeDim
              +Annotation "B"
    +ArrayType
        +ArrayType
          +ClassOrInterfaceType
          +Annotation "A"
        +Annotation "B"
    

    With varargs:

    String @A [] @B ...

    N/A (Parse error)

    N/A (Parse error)

    +ArrayType[@Varargs = true()]
        +ArrayType
          +ClassOrInterfaceType
          +Annotation "A"
        +Annotation "B"
    

    Note that this representation is both more economic in number of nodes (especially in the most common case where there are no annotations on the array dimensions), and also preserves every intermediary array type, which is nice.

  • [java] Allow @SuppressWarnings with constants instead of literals #520

TypeNode related

API-related

  • Remove ASTExtendsList::extendsMoreThanOne -> too specific

  • Remove ASTThrowStatement::getFirstClassOrInterfaceTypeImage -> too specific

  • Verify AST nodes comply to our "standard"

    * Javadoc with grammar * constructor package private * setters package private
  • #1856 [java] Hide internal AST api

    * API cleanup * `InternalApiBridge` for setting data into nodes at various stages. This is a internal API and not public API.

Documentation

  • Example usage of AssignmentOp (AssignmentExpressions), with the removal of ASTAssignmentOperator

  • Annotation() is now #void

  • Parenthesized Expressions: provide example to illustrate how the AST is improved now using attributes @Parenthesized and @ParenthesisDepth - see #1872 [java] Remove ParenthesizedExpr

  • ClassOrInterfaceType has now a TypeAnnotationList as child (as part of ClassTypeSegment) - see #1855 [java] Make new java expression grammar stricter

  • Document new ASTLiteral / ASTClassLiteral nodes (merely just a example, before - after)

  • High-level document about what processing stages Java is using to process java sources

    • parsing
    • rewrite ast / symbol table / disambiguation
    • type resolution
  • Document example of how method calls are parsed now. How the AST looks like. E.g. new Foo().bar.foo(1) is now parsed as MethodCall[foo]/FieldAccess[bar]/ConstructorCall[Foo].

    • document left recursiveness when we're done discussing the semantic rewrite phase
    • write a summary of changes in the javadoc of the package, will be more accessible.
  • In Javadoc: Add additional example with expressions: 1 + 2 / 3 is parsed as 1 + (2 / 3) resulting in the following AST:

        AdditiveExpression[@Operator="+"]
        + NumericLiteral["1"]
        + MultiplicativeExpression[@Operator="/"]
          + NumericLiteral["2"]
          + NumericLiteral["3"]

Deprecations due on PMD 6

  • ASTCastExpression (hasIntersectionType()) - no direct replacement
  • ASTClassOrInterfaceType (isArray() and getArrayDepth())
  • ASTEnumConstant (getQualifiedName())
  • ASTMarkerAnnotation (getAnnotationName())
  • ASTNormalAnnotation (getAnnotationName())
  • ASTSingleMemberAnnotation (getAnnotationName())
  • ASTImportDeclaration (getPackage() - 295759456dacef)
Other APIs that cannot be marked `@Deprecated` on 6.0.x

Nodes/classes that are deprecated on PMD 7, but not on PMD 6

  • List of classes:

    • ASTAllocationExpression
    • ASTArguments
    • ASTArrayDimsAndInits
    • ASTAssignmentOperator
    • ASTBlockStatement
    • ASTInstanceOfExpression
    • ASTMemberSelector
    • ASTMethodDeclarator
    • ASTNameList
    • ASTPrimaryPrefix
    • ASTPrimarySuffix
    • ASTResources
    • ASTStatementExpression
    • ASTTypeArgument
    • ASTTypeBound
    • ASTUnaryExpressionNotPlusMinus
    • ASTVariableInitializer
    • ASTWildcardBounds
    • ASTAnnotationMethodDeclaration
    • JavaParserVisitorReducedAdapter
    • ASTAnyTypeBodyDeclaration
  • List of classes, that are already removed on the 7.0 branch:

    • ASTMarkerAnnotation
    • ASTMemberValuePairs
    • ASTNormalAnnotation
    • ASTPreIncrementExpression
    • ASTPreDecrementExpression
    • ASTPostfixExpression
    • ASTSingleMemberAnnotation
    • ASTAnnotationTypeMemberDeclaration
    • ASTTypeDeclaration
  • List of renamed AST classes:

    • ASTResourceSpecification -> ASTResourceList
    • ASTCatchStatement -> ASTCatchClause
    • ASTFinallyStatement -> ASTFinallyClause
  • Document in Javadoc and next major development, that these classes will go away? We don't deprecate them, because we have no replacement for the users in PMD 6 (it would only be noise).

Nodes that are moved from class to interface

  • List of nodes

    • ASTAnnotation
    • ASTExpression
    • ASTLiteral
    • ASTMemberValue
    • ASTPrimaryExpression
    • ASTReferenceType
    • ASTType
    • ASTStatement
  • Document in Javadoc and next major development, that these nodes will go away? We don't deprecate them, because we have no replacement for the users in PMD 6 (it would only be noise).

Other Java-specific tasks

  • Turn Statement into interface, remove BlockStatement

    Why it makes sense...

    We only need a Statement abstraction. BlockStatement is used to enforce, that no variable or local class declaration is found alone as the child of e.g. an unbraced if, else, for, etc. So, I think this is a parser-only distinction that's not that useful for analysis later on (in the same vein, see eg the deprecation comment of VariableInitializer)

  • Rename CatchStatement and FinallyStatement to CatchClause and FinallyClause -> they're not statements, they're components of TryStatement

  • Rename StatementExpression to ExpressionStatement -> it's a statement, not an expression

    It's actually a bit subtler than that. StatementExpression in the JLS is a kind of expression. That's why the Statement production has an extra ; at the end: expression statements are defined as StatementExpression ";". Same thing for StatementExpressionList in ForInit: StatementExpression ("," StatementExpression)*

    However, the distinction between Expression and StatementExpression is confined to the syntax level: it's just there to prevent writing eg 1+1; as a statement. So, ASTStatementExpression usages in the API can be replaced with just ASTExpression, while the grammar file should keep using a #void StatementExpression production to prevent writing incorrect expression statements.

    Finally, ASTStatement would be a useful abstraction to group all statements under a common interface (eg, allowing making ASTBlock an Iterable<ASTStatement>). However in this case, we need an intermediary node to represent an "expression statement", ie a new ExpressionStatement node (which conceptually is different from StatementExpression). Making some ASTExpression nodes implement directly ASTStatement is in my experience writing other parsers not a good idea, and we should keep the ASTStatement hierarchy strictly separate from the ASTExpression hierarchy.

  • Extract ForeachStatement from ForStatement

  • Improve try-with-resources grammar (refs #1897)

    PR summary:

    • ASTResources is made void. There were two layers of nodes that represented the "resource list" abstraction, and that wasn't necessary
    • Rename ResourceSpecification to ResourceList (this is more consistent with some other *List nodes we have: ExtendsList, ArgumentList, ImplementsList, soon-to-be ThrowsList)
    • Add a LocalVariableDeclaration node inside Resource node, when they're not a concise resource
      • This is because Resource in that case has an identical API/ represents the same concept
      • This simplifies type resolution, since ClassTypeResolver can work on the VariableDeclarator's initializer just like in other LocalVariableDeclaration
    • Remove the workaround in OccurenceFinder. This worked around the fact that concise resources were just a Name, however, it's now an expression, so is handled by the rest of the class
    • Resource doesn't extend FormalParameter anymore, refs #998 point 3
    • Add an attribute to ResourceList to expose whether there was a trailing semicolon or not -> we can make a rule later to check that
  • Create a ThrowsList node instead of that horrible NameList for methods & constructors. Children should be ClassOrInterfaceType, not Name. (refs #2034)

    Why it makes sense...
    • This uniformizes the representation of types everywhere. It is also more in line with the actual JLS grammar, since an exception type in a throws clause may be generic or annotated (JLS)
    • Allows for annotations, see comment on #1875
  • Remove the Name node in imports and package declaration, (refs #1888)

    Why it makes sense...

    Name is a TypeNode, but it's equivalent to AmbiguousName in that it describes nothing about what it represents. The name in an import may represent a method name, a type name, a field name... It's too ambiguous to treat in the parser and could just be the image of the import, or package, or module.

  • Sorting out receiver parameters: Deal with ExplicitReceiverParameters explicitly in the grammar (refs #1980)

    https://docs.oracle.com/javase/specs/jls/se12/html/jls-8.html#jls-8.4 - in our grammar, we have packed it into VariableDeclaratorId, so it is allowed in more places (also it slows down VariableDeclaratorId).

    See also #1017 [java] Better handle explicit receiver parameters.

  • Align method and constructor declaration grammar (refs #2034)

  • Remove ASTMethodDeclarator (refs #2034)

    The info should be added directly to ASTMethodDeclaration. This is because, the node isn't even shared with AnnotationMethodDeclaration, so it only introduces inconsistencies.
  • EnumConstant changes

    * Add a VariableDeclaratorId inside, to make it consistent with eg FieldDeclaration * Use the new node AnonymousClassDeclaration
  • Add openjdk12 sources for use with pmd-regression-tester

  • Stop ClassScope building nodes itself to represent implicit declarations of enums

    • Technically the problems it causes (internal API leak) are fixed by #2447, can be removed later
    • The newer symbol table already updated to create symbols for implicit enum & record members
    Symbols should be used instead
  • Fix #998 [java] AST inconsistencies around FormalParameter

    • Separate FormalParameter from catch clause formal parameters (see JLS §14.20) and other improvements around FormalParameter
  • Fix #1128 [java] Improve ASTLocalVariableDeclaration

    • Improvements to LocalVariableDeclaration and FieldDeclaration
  • Fix #1307 [java] AccessNode API changes

General improvements needed from core

  • #1796 [core] Wire processing stages into SourceCodeProcessor

  • #1787 Match abstract types in XPath queries (e.g. //*[nodeIsOfType(Expression)])

  • Externalise suppression, eg with a Suppressor interface, to isolate the AST from rules

  • Split AbstractNode to not force the root class to have beginLine, endLine, etc fields

    `AbstractJavaNode`: beginLine, beginColumn, endLine, endColumn - setting from firstToken/lastToken? Or in AbstractNode? Since we store the first/last token, we could return the position from the tokens instead of maintaining fields beginColumn... * This is true, but for ASTs that wrap some other representation, maybe there's no token (eg Apex, JS?). Most probably, we should split AbstractNode into an AbstractJjtreeNode, that uses the tokens, and maybe AbstractWrapperNode, for the other stuff. Default methods on the Node interface would help.

Open questions

  • Naming of AST Nodes: Should all expression nodes end with the suffix Expr? For example: ArrayCreationExpr instead of just ArrayAllocation. Other possible suffixes e.g. Type, Name, Literal, Statement, Declaration.

  • We sometimes have *List nodes, but sometimes they are #void. E.g. LambdaParameterList is not void, but AnnotationList is...

    • This doesn't strike me as a problem, could you elaborate?
  • Should we remove checks for old java version (like check for bad diamond usage...)? In what measure? When do we do this? I think, it should be also done on the java-grammar branch?

  • Comments

    • Should they be part of AST?
      • Probably not in the main tree, ie not matchable via name test in XPath (we can use the comment() kind test).
    • Currently we have JavaNode.comment() - so every node has this method.
    • However, the AST assignment is done in AbstractCommentRule... The parser only provides a list of comments on the compilation unit (see ASTCompilationUnit.getComments()).
      • FIXME: AbstractCommentRule: assignCommentsToDeclaration -> FIXME, make that a processing stage
      • I think the rules Spoon uses are as close to "canonical" as can be.
  • Avoid reliance on the image of nodes. It's not semantic, some getter with a specific name should be used instead. I'd be for deprecating get/setImage from 7.0.0 on, not necessarily removing it.

    • Proposed guideline: @NoAttribute the image attribute on new node types
    • Also in TypeHelper: "FIXME checking against the image is for the most part meaningless."
  • AST simpler -> towards Universal AST?

  • ClassOrInterfaceBodyDeclaration has a monstrous lookahead to check for enums

  • Remove ASM dependency? Do we still need it for type resolution?

    • I think it's too early to decide. Type resolution most probably won't need it (it only does for imports..), but there are some other probably useful things we can only find-out through bytecode analysis, because there's no reflection API for it. Examples:
    • Finding out whether a public static final field in a library is a compile-time constant, and hence whether eg 1 * Math.PI in a user class is also constant.
    • Finding with 100% certainty whether a method is overridden, and to which method it delegates. Currently MissingOverride uses a heuristic to do that, which fails in some cases where generics is involved, or with covariant return types.
      • This also helps the symbol table and type resolution, which need to take covariant return types into account, etc.
  • Modifiers: Use different kinds - ClassModifiers, MethodModifiers, InterfaceMemberModifiers, ... to avoid problems like in #1849 Preserve local class modifiers.

    • See comment about that, maybe it's useful to be lenient
  • Split ForStatement into ForStatement and ForeachStatement

Dropped changes

  • SwitchExpressions and lambda workaround (inSwitchLabel flag). Try to get rid of it.

    • The flag cannot be removed, because a constant expr (in a switch label) can be a ? b : c, and c could be a lambda, but in a switch label, this is ambigous with the switch rules.
    • TODO explain
  • Remove ASTResultType

    • If a method declaration has no Type node, then it's void. This removes yet another node from the grammar. Also, void is not a type, but rather denotes the absence of a type, and it's not useful to populate the type with void.class... Otherwise, I'd propose to have a VoidType node, but no intermediary ResultType node.
    • Why it's problematic:
      • In fact we probably need to keep that, to have a node whose type is the result type of the method declaration, taking into account additionnal brackets after the formal parameters. This is a corner case, yet we should probably handle it gracefully. Alternatively, maybe make MethodDeclaration a TypeNode ?
  • Deprecate getOperator methods. We'll simply change the return type from String to BinaryOp. This breaks Java compat, but ensures a smooth transition for XPath users.

Rules for Java

Codestyle

  • UnnecessaryCast: currently only cares about a few very specific cases about collections, could be generalized to any cast, including

    • casts that provoke boxing/unboxing of the operand, where autoboxing would apply
    • casts that widen a primitive type (double) i
    • but not: casts that disambiguate between several overloads (super.visit((JavaNode) node, data);)
    • difficulty: very easy, except the last item
    • should react to @SuppressWarnings("cast")
  • UseDiamondOperator: could be generalised to UnnecessaryTypeArguments, that eg also reports

    • List<T> a = Collections.<T>emptyList();
    • List<T> a = bool ? Collections.<T>emptyList() : otherList; Note that this type argument is necessary in java 7 but not in java 8. This means, it could be a useful rule for people migrating, including us.
    • List<T> a; a = new ArrayList<T>() (not in the same statement)
    • Arrays.<Integer>asList(1, 2) (even without context)
    • etc
    • difficulty: hard, as it needs to play with type inference
    • TODO: maybe this is done by [java] Update rule UseDiamondOperator #3253
  • ExtendsObject: rule could also check for

    • wildcard bounds: ? extends Object
    • type parameter bounds <T extends Object>
    • difficulty: trivial
  • UselessParentheses: rule should be aware of

    • parentheses that differentiate addition from string concatenation: 1 + (2 + "") == "12", 1 + 2 + "" == "3"
    • parentheses that clarify precedence (optionally) (#1918)
    • I already refactored the rule, which anyway needed a complete rewrite because of the changes to ASTInfixExpression
    • It would be nice to rename it to "UnnecessaryParentheses", for consistency with all the other "unnecessary" rules
    • Also fix FP #1673
    • difficulty: done
  • UnnecessaryFullyQualifiedName: rule should be scoped down or scoped up, and probably renamed. For now it detects fully qualified names (java.io.IOException), and also qualifiers for static methods & fields (Collections.emptyList()). It misses cases of either, as it only considers imports, not inherited members.

    • With the new symbol table it's really easy to write a general rule for the first case, that in principle handles all cases: there's ASTClassOrInterfaceType::isFqcn and you can basically write node.getSymbolTable().types().resolve(node.getSimpleName()) == node.getSymbol() to check if the qualification can be removed without changing the meaning of the type.
    • For the case of method qualifiers, in the general case, the qualifier could be differentiating two methods with the same signature that are both in scope, and we might have to perform overload resolution to check whether removing the qualifier is really a noop, semantically. This logic is anyway entirely separate from checking for qualified names for types.
    • My point about scope is:
      • either the rule should only care about fully qualified names, and not method qualifiers (and a separate rule should be split off)
      • or the rule should be renamed to UnnecessaryQualifier/UnnecessaryQualification, and handle both cases, plus possibly, the following things which are also qualifiers:
        • this.foo/this.foo(): could be optionally flagged
        • Outer.this.foo(): this is UnnecessaryQualifiedThis, but that rule only handles a single case and not eg a qualification that doesn't actually lift an ambiguity. It would be merged into the new more general rule
    • difficulty: the part about types is easy, the part about methods is less easy in the general case (but the simple cases are common enough for a first version)
  • UnnecessaryAnnotationValueElement: could be generalized to UseAnnotationShorthand, because there are other shorthands for annotations:

    • @A(): empty member list can be removed
    • @A(value=..): already caught
    • @A(a={""}) : arrays with single elements can be flattened -> @A(a="")
    • @A({"a"}) -> @A("a")
    • difficulty: easy
  • DefaultAnnotationValue: reports setting attributes of an annotation to their default value

    • Difficulty: easy, with proper symbol support (we need to expose the default value on the method symbol)

Best practices

  • UnusedAssignment has a couple of known limitations that can be fixed thanks to precise overload resolution:

    • thrown exceptions in try blocks
    • explicit this constructor calls
    • difficulty: medium
  • MissingOverride:

    • There are a few known corner cases, because it uses bridge methods. These are documented in the class. With #1673 we can fix them.
    • The rule will have to be rewritten to not use Class, instead using symbols/types. I believe most of the tooling is already in #1673, as it also has to handle overriding for overload resolution
    • difficulty: medium

Error-prone

  • MissingBreakInSwitch: could probably be improved to detect whether a case is fallthrough on any code path. Currently if you add a break anywhere, the case may still fall through.
    • The flow analysis done by UnusedAssignment already detects abrupt completion, and can detect whether a switch case is fallthrough. It's just a matter of exposing that information in the right place, and moving the analysis visitor out from UnusedAssignment to be shared by some other rules
    • would be nice to ignore //fallthrough comments, also fix #1899
    • difficulty: medium

Performance

  • StringToString: could be generalized to UnnecessaryStringConversion, and UselessStringValueOf could be merged into it. There are many places where an explicit toString is not needed:
    • in a string concatenation: "foo: " + foo.toString()
    • as the message of an assert statement assert true : foo.toString()
    • as the argument of a StringBuilder.append call sb.append(foo.toString())
    • as the argument of a PrintStream.println/print call System.out.println(foo.toString())
    • the rule could also consider String.valueOf like toString in those contexts
    • difficulty: not hard, i guess

This should be removed from the performance category. "".toString() will just return this, as per the trivial implementation in class String. Any JIT would optimize this away by inlining the instant it would make a performance impact, and if it isn't inlined, then it doesn't make a performance impact. This is mostly for code readability I think.

  • UnnecessaryWrapperObjectCreation: could be generalized to UnnecessaryBoxing, and handle both boxing and unboxing. Currently the rule only cares about primitive wrappers that are created and unboxed explicitly in the same expression, like Integer.valueOf(0).intValue(), which is absurd and no one does. The rule could care about
    • Explicit boxing in contexts where the primitive would be autoboxed: Object i = Integer.valueOf(2)
    • Explicit unboxing in a context where it would be auto-unboxed: int i = integer.intValue()
    • Explicit unboxing in contexts where the value is immediately reboxed: Object i = integer.intValue() (would rebox the int)
    • Explicit boxing in contexts where the value is immediately unboxed: int i = Integer.valueOf(0) (would unbox the integer)
    • Boxing of an already boxed value: Integer.valueOf(integer) (unboxes integer, then boxes it)
    • Uses of Integer.valueOf(someString) where an int is expected: Integer.parseInt is more appropriate.
    • This applies to all primitives
    • Casts that box/unbox their operand could either fall in this rule or UnnecessaryCast
    • Should react to @SuppressWarnings("boxing")
    • difficulty: easy with #2689

Tbh UnnecessaryBoxing should be part of codestyle.xml, like UnnecessaryCast, and not performance. Apart from Integer.valueOf, the JIT probably optimizes these out and I'd be amazed if this is still relevant for performance outside of a hot loop. This rule is more about code style, like UnnecessaryCast.

Possible new rules

  • UseOfRawTypes: basically the compiler warning.

    • Should support @SuppressWarnings("rawtypes")
    • Category: codestyle? bestpractices?
    • Difficulty: trivial (if (node.getTypeMirror().isRaw()) addViolation(..))
  • UncheckedWarning: basically the compiler warning.

    • Should support @SuppressWarnings("unchecked")
    • Category: codestyle? bestpractices?
    • Difficulty: easy to write, hard to test. There's already an API in #1673 to check whether unchecked conversion occurs. But that is under-tested right now.
  • UnusedSuppressWarnings: flag @SuppressWarnings annotations that don't suppress anything

    • This is enabled by the improvements to MissingBreakInSwitch/FallthroughSwitchCase, and the previous two rules: we have to detect whether there is a warning, to check whether the annotation is really unused
    • This has another interesting application, to detect PMD suppressions that don't suppress anything. This appears hard to do without special-casing it, but looks like a useful feature. Maybe not part of this rule, if it is a special check.
    • Difficulty: ?
  • StaticMemberAccessQualifier:

    • flag static members accessed through an instance (yes, new Double(0).MAX_VALUE is legal, even ((Double) null).MAX_VALUE, which doesn't NPE).
    • flag static member accessed through a subtype. Eg given interface Itf { static int I = 0; } class Sub extends Itf { }, you can say Sub.I, but it would be better to write Itf.I.
    • Should support @SuppressWarnings("static-access")
    • Category: codestyle
    • Difficulty: easy
  • UnusedTypeParameter + UnusedLabel

    • For type params + labeled statements
    • Should support @SuppressWarnings("unused")
    • Category: codestyle
    • Difficulty: easy
  • UnusedThrowsDeclaration: flags exceptions declared in method signature that aren't thrown in the body

    • So we need overload resolution of all methods in the block
    • We also need to make sure the method cannot be overridden
  • DuplicateThrowsDeclaration: flags throws clauses with duplicate exceptions/ exceptions that are subtypes of one another

    • eg throws Exception, Exception
    • eg throws IOException, Exception
  • CharLiteralUsedAsInt: flags char literals which are implicitly widened to an int, which may not be expected.

    • Eg one day I wanted to create a new StringBuilder("{") and though I could replace the single-char string literal by a char literal, like you can do with append. Except new StringBuilder('{') actually widens the char to an int and selects the constructor you use when you want to set the initial capacity.
    • another example is just string concat 1 + ':' + foo.toString() adds 1 and : together.
    • Category: errorprone.xml
    • Difficulty: easy, a priori. Like for UnnecessaryBoxing, #1673 introduces some utilities to get the context type of an expression, which lets us determine which conversions take place.
  • LambdaCanBeMethodReference: flags lambdas like (a1, .., an) -> m(a1, .., an), which can be replaced with eg this::m

    • Category: codestyle?
    • difficulty: easy
  • OptionalApiUsage: #1034, #2719

    • Best to have a single rule as 10 new rules IMO
    • Category: bestpractices
    • difficulty: ok, I guess, but there's a lot of cases
  • StreamExpressionCanBeSimplified: lots of cases, the intellij inspection does the following:

    • Best to have a single rule as 100 new rules IMO
    • Category: bestpractices
    • difficulty: ok, I guess, but there's a lot of cases
    Examples
    • collection.stream().forEach() → collection.forEach()
    • collection.stream().collect(toList/toSet/toCollection()) → new CollectionType<>(collection)
    • collection.stream().toArray() → collection.toArray()
    • Arrays.asList().stream() → Arrays.stream() or Stream.of()
    • IntStream.range(0, array.length).mapToObj(idx -> array[idx]) → Arrays.stream(array)
    • IntStream.range(0, list.size()).mapToObj(idx -> list.get(idx)) → list.stream()
    • Collections.singleton().stream() → Stream.of()
    • Collections.emptyList().stream() → Stream.empty()
    • stream.filter().findFirst().isPresent() → stream.anyMatch()
    • stream.collect(counting()) → stream.count()
    • stream.collect(maxBy()) → stream.max()
    • stream.collect(mapping()) → stream.map().collect()
    • stream.collect(reducing()) → stream.reduce()
    • stream.collect(summingInt()) → stream.mapToInt().sum()
    • stream.mapToObj(x -> x) → stream.boxed()
    • stream.map(x -> {...; return x;}) → stream.peek(x -> ...)
    • !stream.anyMatch() → stream.noneMatch()
    • !stream.anyMatch(x -> !(...)) → stream.allMatch()
    • stream.map().anyMatch(Boolean::booleanValue) -> stream.anyMatch()
    • IntStream.range(expr1, expr2).mapToObj(x -> array[x]) -> Arrays.stream(array, expr1, expr2)
    • Collection.nCopies(count, ...) -> Stream.generate().limit(count)
    • stream.sorted(comparator).findFirst() -> Stream.min(comparator)
  • InnerClassCanBeStatic: local or nested class is non-static, but uses no members of the enclosing class

    • Category: bestpractices?
    • difficulty: ?
  • ArrayDeclarationStyle: avoid declaring eg

    • int i[]
    • for (int i[] : ..);
    • never seen, but exists: int foo()[]
  • UseInstanceofToCompareClasses: this is in our dogfood ruleset, could be a standard rule.

    • also consider #1755, consider other cases where it's ok to use getClass (equals method?)
  • PatternCanBeConstant: reports usages of Pattern.compile with constant arguments, which could be put into a static final constant

Other quirks about current rules

  • DefaultPackage and CommentDefaultAccessModifier overlap: choose one
  • Could BooleanGetMethodName be merged into LinguisticNaming?
  • SignatureDeclareThrowsException looks useless. Sometimes it isn't up to you.
  • design/UseObjectForClearerAPI <-> codestyle/ExcessiveParameterList
    • the former is just a sub case of the latter, only considers strings
  • AvoidInstantiatingObjectsInLoops
    • super old case, only relevant to hot spots, irrelevant if object does not escape
⚠️ **GitHub.com Fallback** ⚠️