[GSoC 2017] Complete type resolution for Java - pmd/pmd GitHub Wiki

Complete type resolution for java (GSoC 2017)

Proposal by [Bendegúz Nagy for implementing complete type resolution for Java, based on this project idea.

GSoC 2017 Final Work Product

Issues

  • Incomplete type resolution: Type resolution is implemented by an AST visitor named ClassTypeResolver. The visitor traverses the whole AST and assigns each node its type. Currently, what's missing is handling ASTPrimaryExpression, ASTPrimaryPrefix and ASTPrimarySuffix correctly.

  • Inefficiency: When a rule requests type resolution, the whole AST tree is parsed even though rules mostly care for only a small portion of the tree.

Goals

  • Complete the implementation of the type resolver so that each AST node which has a type, can know its type.
  • Produce at least one rule demonstrating that complete type resolution is supported
  • Analyse the performance of on-demand type resolution, improve upon complete AST tree processing if required
  • Expose all the feature of type resolution to XPath-based rules
  • Merge back the rules requiring type resolution into the standard rule set

Impact

  • Providing complete type resolution makes way for creating much more complex and useful rules which would not be achievable without it.

Proposed mentors

Proposed approach

Code examples can be found here at this repository.

Note: PMD only has to deal with semantically correct code, thus in many cases, shortcuts can be taken with this assumption because the task at hand doesn't require to identify uncompilable ambiguity. For example, the search can end after finding the first simple name matching a field.

Shorthands for AST nodes: PrimaryExpression == Primary, PrimaryPrefix == Prefix, PrimarySuffix == Suffix

Most expressions fall into being PrimaryExpressions with PrimaryPrefix and one or more PrimarySuffix child nodes. These include literals, ClassLiterals, this keyword, TypeName.this keyword, expressions in parentheses, class instance creation, field access, array access, method invocation and method references.
Primary expressions

Literals, ClassLiterals and expressions in parentheses are already handled by the current implementation.

Field access

Field access is relatively simple compared to method invocations for example. First, we have to identify that we are dealing with field access, "field = 10" would produce AST nodes Primary[Prefix["field"]]. After excluding keywords like "this" and "super", using reflection on the class should do the trick. Other cases like (...).field are yet again easily distinguishable from other PrimaryExpressions. After that, resolution can take two paths depending on if we are dealing with a simple name or not. With simple names, the first local variable, parameter or visible field has to be searched for. Not simple name cases consist of Primary.field, super.field, TypeName.super.field, TypeName.field.

  • In case of Primary.field: the type of Primary has to be searched for 'field'.
  • In case of super.field: the innermost (think Nested classes) where this expression occurs has to be searched for 'field'.
  • In case of TypeName.super.field: the first lexically enclosing type declaration has to be searched for 'field'.
  • In case of TypeName.field, let TypeName be T: if T is a class or interface then find a static variable. If T is an expression then determin it's declared or infered type and search that.

Care: fields may shadow type declarations. -> with qualified names (Q.field) Q has to be determined to be an expression or a type first.
Care: I've no idea what capture conversion is.
Note: this.field, super.field produce PrimaryExpressions where "this" or "super" is in the Prefix.
Note: exact AST nodes produced are very similar to method invocations, except that there is no argument and type argument list.

Fields | Simple names | Qualified names | Primary.field | super.field | Capture conversion | Code examples

Method invocation

  • MethodName example foo(): Primary[Prefix[foo], Suffix[Arguments(0)]]
  • MethodName example foo(10): Primary[Prefix[foo], Suffix[Arguments(1)[Expression...]]]
  • TypeName example Wat.foo(): Primary[Prefix[Name[wat.foo]], Sufix[Arg...]]
  • ExpressionName example this.foo(10) (or super): Primary[Prefix[this], Suffix[foo], Suffix[Arg...]]
  • Primary example (wat).foo(10): Primary[Prefix[Expr...], Suffix[foo], Suffix[Arg...]]
  • Type arg example (wat).foo(10): Primary[Prefix[Expr...], Suffix[MemberSelector[TypeArg...]], Suffix[Arg...]]
  • TypeName.super example Wat.super.foo(10): Primary[Prefix[Name[Wat]], Suffix[super], Suffix[foo], Suffix[Arg..]]

From the above short list we can conclude the following:

  • Arguments always go into the last and separate PrimarySuffix
  • The part until 'this' or 'super' goes into the Prefix[Name[..]] node, the this and super into a Suffix node, for example TypeName.super.foo.wat() would be Prefix[Name[TypeName]], Suffix[super], Suffix[foo], Suffix[wat].
  • The above point also illustrates that the parts after this or super go into separate Suffix nodes, if there is no 'this' or 'super', then the whole chain goes into one Prefix[Name[...]] node: wat.bar.foo() produces Prefix[wat.bar.foo].
  • Type arguments if present, group with the method name into a single Suffix[MemberSelector] node, for example: wat.bar.foo() produces Prefix[Name[wat.bar]], Suffix[MemberSelector(foo)[TypeArgs...]]
  • Primary expressions can only be present at the beginning and have the same effect as 'this' and 'super', meaning that the names following it go into separate Suffix nodes and do not group under a Prefix[Name] node.

Note: there can only be one this or super in an expression with the above forms.
Note: explicit type arguments are ignored if the method involved is not generic.

Determining the type of a method invocation requires finding the invoked method and determining it's return type. The search begins be identifying the type to be searched T, finding all the applicable methods of T with regards to argument compatibility and accessibility, considering matching with the following phases (in order): strict invocation, loose invocation, variable arity invocation. If on of these produce more than one applicable method, then it the most specific one will have to be determined. Finally the type of the method invocation has to be determined taking into account generics and explicit or implicit type arguments. Note: type arguments present difficulty, as when they are present, they introduce capture conversion and they require that their upper bounds be determined which is another set of rules.

Method invocation | Finding the right Type | After finding the right Type | Potentially applicable methods | 1. Strict invocation | 2. Loose invocation | 3. Variable arity invocation | Most specific method | Method invocation type | Code examples

This expression

The this expression can play a role in field access, method access and reference to the current object. In each case, it's type has to be determined as follows.

  • Appearing as a primary: in a constructor, instance or default method, instance initializer. In each case it's type is the enclosing type declaration, (this will have interface type in default methods). 'this' will produce the following AST node when it stands alone Primary[Prefix[this]].

Note: 'this' in lambdas have the type of 'this' in the enclosing context.
Note: this can also stand to denote a receiver type, it is of no interest to us in that position.
Note: this can also be an explicit constructor call, it is of no interest to us in that position.

this expression | Code examples

Qualified this expression

With the help of a qualified 'this' we can refer to any lexically enclosing type declaration (non-static nested classes). The produced AST node of TypeName.this would be Primary[Prefix[TypeName], Suffix[this]].

Care: Qualified this can refer to the current class as well.

Qualified this | Code examples

MethodReference expressions

For code examples and produced AST nodes can be found in the 'Code examples' link below.
Method references produce AST nodes by the following rules:

  • 'super' goes into a Suffix
  • If the expression starts with a TypeName, ClassType, ReferenceType, or ExpressionName then the part before 'super' or '::' goes into Prefix[Name[...]]
  • the part after '::' goes into a Suffix[MethodReference['methodName']]
  • if type arguments exist, then they go into a TypeArguments child node under MethodReference
  • the form 'ArrayType :: new' produces a Primary[Prefix[ RefType['array'], MethodReference[new] ]]

Identifying that we are dealing with a method reference is almost like detecting that we are dealing with a method invocation, except there is not Suffix[ArgumentList[...]] at the end.
Method references are always poly types, determining their type requires examining the surrounding assignment context, this and inferring the right generic type arguments should be sufficient to type them. Their type is always some functional interface.

Method references | Declaration of a method refernce | Type of a method reference | Code examples

ArrayAccess expressions

For code examples and produced AST nodes can be found in the 'Code examples' link below.
Array access can be identified by looking at the PrimaryExpression's last child nodes, they will be of type Suffix['[']. The number of these nodes determines the depth of the access. Determining the expressions type boils down to examining the type of the first part (no Suffix['['] nodes), let it be type T[], this just then has to be combined with the depth of the access.

ArrayAccess expression | Code examples

ArrayCreation expressions

For code examples and produced AST nodes can be found in the 'Code examples' link below.
Array creation expressions produce a Primary[Prefix[AllocationExpression[Node containing type, ArrayDimsAndInts[..]]]], thus identifying them is trivial. Returning their type, as stated in a comment in the current implementation would require instantiating an new array with the right dimensions and asking it for it's type.

Array creation Code examples

Class instance creation expression

Class instance creation can be identified by the last child node of the PrimaryExpression having AllocationExpression as a child node. Identifying the type being instantiated (e.g. the type of the expression) can take the following paths:

  • new Type() -> Primary[Prefix[AllocationExpression[...]]]: the allocation expression node contains the name of the class
  • Primary.new Type() -> Primary[Prefix[Primary[...], Suffix[AllocationExpression[...]]]: the primary's type has to be determined and then combined with the type in the allocation expression to produce the nested class's type
  • ExpressionName.new Type() -> // Primary[Prefix[Name[c]], Suffix[AllocationExpression[...]]]: same as above, except the ExpressionName's type has to be determined.

Class inst. creation expr. | Code examples

General approach

The general approach to solving this project successfully would be to complete the partially complete research (see above) on produced AST nodes, different cases and exact formal rules on the semantics of Java. Since we are dealing with a precisely defined formal language, each piece of code should have an associated JLS part associated with it, so that we can reason about it that it does in fact implement Java's type resolution rules. With this approach, missing rules should be easily spotted when examining the JLS and the type resolution code side-by-side.
Common subtasks: capture conversion, searching a type for fields and methods, search for type declarations outwards, determine the upper bound of generics, filter annotations.

⚠️ **GitHub.com Fallback** ⚠️