Schema version 7.0.0 - pmd/pmd GitHub Wiki

⚠️ This page is outdated. Please refer to PMD 7 Tracking Issue #3898 for the current status.

TOC:

Rule element
Properties
- Property values
- Property definitions
Complete Example
Open Points

Prototype

Branch

Schema
- only implemented rule definitions IIRC, and not rule references
- many design decisions deviate from this document, eg, the use of attributes vs nested elements. This is of course up for discussion. The point is, we can really start fresh because the XSLT can transform everything automatically anyway
- split into several files for clarity, and eg to be able to reuse the property schema from the test XML schema. Maybe, this will cause problems, I know my IDE does not like it.
XSLT
- Converts the old ruleset format to the newer one easily
- This class implements a small CLI to use it (give a file name, get the transform output on stdout)
Example output (autogenerated by the XSLT from codestyle.xml, which doesn't do formatting, so whitespace is ugly)

This is updated very infrequently as implementing these changes is not a priority right now (Nov 2020).

Rule element

Current schema excerpt:

 <xs:complexType name="rule">
    <xs:sequence>
      <xs:element name="description" type="xs:string" minOccurs="0" maxOccurs="1" />
      <xs:element name="priority" type="xs:int" default="5" minOccurs="0" maxOccurs="1"/>
      <xs:element name="properties" type="properties" minOccurs="0" maxOccurs="1" />
      <xs:element name="exclude" type="exclude" minOccurs="0" maxOccurs="unbounded" />
      <xs:element name="example" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="language" type="xs:string" use="optional" />
    <xs:attribute name="minimumLanguageVersion" type="xs:string" use="optional" />
    <xs:attribute name="maximumLanguageVersion" type="xs:string" use="optional" />
    <xs:attribute name="name" type="xs:ID" use="optional" />
    <xs:attribute name="since" type="xs:string" use="optional" />
    <xs:attribute name="ref" type="xs:string" use="optional" />
    <xs:attribute name="message" type="xs:string" use="optional" />
    <xs:attribute name="externalInfoUrl" type="xs:string" use="optional" />
    <xs:attribute name="class" type="xs:NMTOKEN" use="optional" />
    <xs:attribute name="dfa" type="xs:boolean" use="optional" />  <!-- rule uses dataflow analysis -->
    <xs:attribute name="typeResolution" type="xs:boolean" default="false" use="optional" />
    <xs:attribute name="deprecated" type="xs:boolean" default="false" use="optional" />
  </xs:complexType>

There are 17 attributes and elements here; they're all optional

I can find 4 different use cases which require different sets of attributes:

Key:

❌ Prohibited
🌕 Required
🌀 Optional
Question mark: I don't really know how it should behave. For most of those in the 4th column I'm leaning towards prohibited, because it overrides on all the rules some things that should normally be thought on a per-rule basis. Do you have an idea?

[adangel]: I think, we should prohibit these attributes, until we have a compelling use case (adding stuff later is always easier). I agree, these define things, that apply on a per-rule basis.

	Java rule def	XPath rule def	Single rule reference	Whole ruleset reference
priority	🌀	🌀	🌀	?
minimumLanguageVersion	🌀	🌀	❌	?
maximumLanguageVersion	🌀	🌀	❌	?
since	🌀	🌀	🌀	?
externalInfoUrl	🌀	🌀	🌀	??
deprecated	🌀	🌀	🌀	🌀
language	? (1)	🌕	❌	❌
name	🌕	🌕	🌀	❌
ref	❌	❌	🌕	🌕
message	🌀	🌕	🌀	❌
class	🌕	❌ (2)	❌	❌
dfa (3)	🌀	🌀	❌	❌
typeResolution (3)	🌀	🌀	❌	❌
description	🌕	🌕	🌀	❌
exclude	❌	❌	❌	🌀
example	🌀	🌀	🌀	❌
properties	Only overrides	Definitions+overrides (4)	Only overrides	❌

(1) language is currently optional for Java rules, since some superclasses like AbstractJavaRule declare the language in java. I think it may be good to choose one strategy, but not have both at the same time, that is, either require the attribute or prohibit it.

[adangel]: I agree, we should choose one way: always specify the language in xml. Will also help later on for ruleset tooling.

(2) If XPath rules don't declare their class explicitly, we can e.g. choose an implementation by language

(3) I think dfa and typeresolution attributes should be dropped from the schema. Ideally we'd use annotations in java rules and detect dependencies automatically for XPath. We could put that on the plan.

(4) XPath rule defs may assign a value to a property available on all rules, e.g. codeclimate multipliers for Apex, or violationSupress___

I think each of these use cases could use a separate element.

[adangel]: Yes, separate elements makes it clearer. See also down below, open points: We might want to separate rule definitions more strictly from rule references as to not mix these use cases in a single XML file.

[jsotuyod]: Beware on hard-splitting rule definition from ruleset configuration. I'm ok in general, but XPath rules are a huge deal here. I believe if anyone is using custom rules, it's either as a plugin (in which case a dedicated rule definition XML seems right), or it's using XPath for something very specific to their project (I personally do that a lot, and we even do so in PMD build tools). Forcing users to create 2 files, once defining the XPath rule and the other one including it in the ruleset is boilerplate and hinders usability.

Cosmetic, free changes

~~priority is turned into an attribute for consistency~~
example elements are wrapped into an examples element for consistency with properties.
I think it would be better style to use attributes very sparingly and prefer nested elements. Eg:
- name is kept an attribute for rules & properties, so that they can be validated to be unique
- language uses an id attribute.
  - The language element may contain minimumVersion and maximumVersion nested elements
- Using a nested element for property descriptions encourages writing more than a one-liner

  <xpath-rule name="ControlStatementBraces">
        <language id="java" />
        <message>This statement should have braces</message>
        <since>6.2.0</since>
        <priority>3</priority>
        <description>
            Enforce a policy for braces on control statements. It is recommended to use braces on 'if ... else'
            statements and loop statements, even if they are optional. This usually makes the code clearer, and
            helps prepare the future when you need to add another statement. That said, this rule lets you control
            which statements are required to have braces via properties.
        </description>

        <xpath>
          <expr>
            <![CDATA[
               (: ... :)
            ]]>
          </expr>
        </xpath>
        <property-defs>
            <property-def name="checkSingleIfStmt">
                <type>Boolean</type>
                <description>Require that 'if' statements with a single branch use braces</description>
                <default>
                    <value>true</value>
                </default>
            </property-def>

            <property-def name="checkWhileStmt">
                <type>Boolean</type>
                <description>Require that 'if' statements with a single branch use braces</description>
                <default>
                    <value>true</value>
                </default>
            </property-def>
        </property-defs>

        <examples>
            <example>
                <![CDATA[
while (true)    // not recommended
  x++;

while (true) {  // preferred approach
  x++;
}
                ]]>
            </example>
        </examples>
  </xpath-rule>

Deprecated element

The deprecated attribute lacks some flexibility. Our warnings just mention that the rule will be removed without specifying a replacement rule. I think we should turn it into an element to allow mentioning some additional metadata, e.g.

<deprecated since="6.6.0">
  <replacement>
    <rule-reference ref="category/java/design.xml/CyclomaticComplexity"/>
  </replacement>
</deprecated>

The replacement element is optional, but in the most common case there is a replacement for a rule.
The inside of the element is a full-on rule reference node. When the deprecated rule is referenced, it's not instantiated, but instead the replacement rule-reference is instantiated.
- This allows e.g. mapping the configuration of the deprecated rule to the properties of the replacement rule. E.g. for StdCyclomaticComplexity, the replacement rule is CyclomaticComplexity with the cyclo option ignoreBooleanPaths. The replacement rule could thus be (modulo changes to the property syntax, described below)

    <rule-reference ref="category/java/design.xml/CyclomaticComplexity">
      <properties>
        <property name="cycloOptions">
          <value>ignoreBooleanPaths</value>
        </property>
      </properties>
    </rule-reference>

That scheme would allow us

To remove the implementations of the deprecated rules and use the newer rules instead everywhere, transparently to the user. We can remove the old rule class from the codebase.
To actually know of a whole rule-reference tag that is functionally equivalent to the deprecated rule, which means we can suggest the user to just copy-paste the replacement XML inside his ruleset when migrating.

[adangel]: I like the idea of the deprecated element. I assume, the "execution" of this information would be part of PMD core.

Avoiding duplicate replacement rules

When eg n rules are deprecated because they're all merged into a newer rule that handles all their use cases, what do we do? We could include the n replacement rules and override their names with those of the deprecated rule.
- E.g. if the user has the following in their ruleset:

<rule-reference ref=".../ForLoopsMustUseBraces " />
<rule-reference ref=".../IfElseStmtsMustUseBraces" />
<rule-reference ref=".../IfStmtsMustUseBrace" />
<rule-reference ref=".../WhileLoopsMustUseBraces" />

The correct replacement would be just

<rule-reference ref=".../ControlStatementBraces" />

and not

<rule-reference name="ForLoopsMustUseBraces" ref=".../ControlStatementBraces " />
<rule-reference name="IfElseStmtsMustUseBraces" ref=".../ControlStatementBraces" />
<rule-reference name="IfStmtsMustUseBrace" ref=".../ControlStatementBraces" />
<rule-reference name="WhileLoopsMustUseBraces" ref=".../ControlStatementBraces" />

On the other hand, if the ruleset includes the following (taken from the blank properties example):

<rule-reference ref=".../LocalInterfaceSessionNamingConvention"/>
<rule-reference ref=".../MDBAndSessionBeanNamingConvention" />

Then the correct replacement would keep each rule distinct because they are presumed to do different things:

<!-- Notice the name overriding to keep them distinct-->
<rule-reference name="LocalInterfaceSessionNamingConvention" ref=".../ClassHierarchyNamingConvention">
  <properties>some properties A</properties>
</rule-reference>
<rule-reference name="MDBAndSessionBeanNamingConvention" ref=".../ClassHierarchyNamingConvention" >
<properties>some properties B</properties>
</rule-reference>

For that reason, when mapping deprecated rules to their replacement:

If several rules are mapped to the exact same rule, taking property values into account, then we suggest to replace them all with a single rule
If several rules are mapped to the same rule, but properties are different, then we suggest to replace each with the replacement rule (and the name overriding is not optional)

A more sophisticated property mapping

(low priority)

Similarly, if the reportLevel for the deprecated rule was set, it can be mapped to the newer properties of the replacement rule (classReportLevel, methodReportLevel), e.g.

    <rule-reference ref="category/java/design.xml/CyclomaticComplexity">
      <properties>
        <property name="cycloOptions">...</property>
        <property name="classReportLevel">
          <expression>$reportLevel * 8</expression>
        </property>
        <property name="methodReportLevel">
          <expression>$reportLevel</expression>
        </property>
      </properties>
    </rule-reference>

Note that these expressions are exactly what's used in the newer class to map the older property, see:

https://github.com/pmd/pmd/blob/0023e5b0d33054c18b84765d1cb96ad56dc17e19/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/rule/design/CyclomaticComplexityRule.java#L86-L99

Implementation-wise, mapping properties with an arbitrary expression is possible without too much hassle. XPath allows to compute arbitrary expressions, not just node sequences, so 1+2 is a valid XPath expression(*), and so is $variable * 2 + 4. The replacement of a deprecated rule by its replacement rule would thus consist of the following steps:

For each expression element:
- Make Saxon run their content as an XPath expression, with the properties of the deprecated rule in the DynamicContext (so $reportLevel would refer to the property of the deprecated rule that was set by the user)
- Convert the result back to an Element (eg. a seq or a value, as defined by our property value schema)
- Replace the expression element with the result Element in the children of the property tag
Hand the whole replacement/rule-reference element to the normal rule reference parser.

(*): SaxonXPathRuleQuery currently fails with a ClassCastException when the XPath evaluates to e.g. a number, because we assume that only nodes may be returned, instead of handling the error

Documentation tags

I already proposed this in 6.0.0, and I think it's worth it. In that old proposal I described what I then called

"primary categories" -> our current rule categories
"secondary categories" -> this is what I reintroduce now under the name "documentation tag", to avoid confusion.

We already have categories (eight per language). Some rules are however related, without being placed inside the same category. "Documentation tags" is a way to tag related rules, in order to discover them easily, even if they're split across categories. Even within one category, tags can be useful to group related rules.

E.g. naming related rules are all in the codestyle category. They're obviously related. To make this clear, we'd add them all a tag "naming". You could then search for "naming" on the website, or click on the "naming" tag to display all rules related to "naming". We could have other tags like "exception handling", "control flow", etc.

Implementation wise:

Add an interface DocumentationTag, with the simple methods String getDisplayName(), String getDescription() and String getId() for now. Two tags are considered equal iff their id is equal. The other methods are used to render documentation
Add a Set<DocumentationTag> getDocumentationTags() on the Rule interface. Tags don't participate in rule equality.
Add an enum StandardDocumentationTags, enumerating tags that we use for built-in rules
- Categories each have a corresponding StandardDocumentationTags, populated implicitly on our built-in rules
Add an XML element tags on rule definitions and references, allowing to specify tags as a comma separated list of ids.
- Tags can be overridden by rule references
- E.g. <tags ids="naming" />
When parsing a rule node, we try to match the ids with our standard tags's ids. If they're non-standard, then an instance of UserDocumentationTag is created instead.
Add a test that asserts that built-in rules only mention standard tags

Use cases:

Documentation
Discovery and grouping of rules on the website (the tags feature native to our Jekyll theme can be useful)
Ruleset import, filtering by tag (user tags may be used as well, but only standard tags are documented on our website):

<!-- Only those rules with the "naming" tag are imported. -->
<ruleset-import ref="...">
  <tag-filters>
    <tags ids="naming" /> <!-- The same element that we use to define tags on rules -->
  </tag-filters>
</ruleset-import>

Programmatically: RuleSet filtered = myRuleset.selectRules(StandardDocumentationTags.NAMING)

It's that simple.

Bonus: for basically free we can have a very complete intersection/union system to filter tags:

  <!-- Selects rules that have both errorprone AND controlflow tags -->
  <tag-filters>
    <tags ids="controlflow,errorprone" />
  </tag-filters>

  <!-- Selects rules that have (both errorprone AND controlflow tags) OR have the naming tag-->
  <tag-filters>
    <tags ids="controlflow,errorprone" />
    <tags ids="naming" />
  </tag-filters>

[adangel]: Documentation tags sound useful, but I wouldn't add too much functionality around them. E.g. I don't see, that we should implement a filter mechanism directly in PMD core. I would see this as part of the ruleset editor - where you can browse the rules e.g. grouped by tags. I'd start using the documentation tags purely as "documentation".

Regarding the XML form: I'd suggest to use <tags> with a sub element <tag> per tag, otherwise, we need to split the string "ids".

I don't think, we need the features around DocumentationTag and StandardDocumentationTags programmatically. We do need however to come up with a initial list of tags (as we need to tag our built-in rules).

I would not allow to override the tag. The tags should be part of the rule definition and stay that way. We might be a bit more lax on changing the tags (as opposed to changing the primary category) of a rule. I don't see a use case for overriding the tags (unless we implement the filtering - but then: why not using a custom ruleset instead of relying on filtering?).

[oowekyala]: I think you're right about not implementing tag filtering. If we are lax with assigning doc tags, we could break compatibility with existing ruleset filters, so that would be another thing we'd have to pay extreme care to between minor versions. Plus, if rules may have several tags, a rule may have to be excluded several times. On the other hand, bulk-adding built-in rules is not so useful if the finest granularity we can offer is "category", which are bound to grow larger and larger... The ruleset editor definitely would be a great way to bypass this lack of flexibility. What you say about tag overriding definitely makes sense too.

[jsotuyod]: I strongly agree with Adangel here. I really like this for documentation purposes / searchability, but the other features really look like overkill / will hardly be used. Our docs point towards single-rule addition, and looking at issues here / questions at StackOverflow people don't really seem to be reading our docs on writing rulesets. I don't think they would even know this feature is available. Using it on the rule editor / IDE plugins seems more reasonable.

XPath rules

XPath rules could use some syntactic sugar / tighter validation.

The xpath property is always required, it could become an element
The class is always XPathRule (or ApexXPathRule), we could make that implicit and forbid the attribute. I think we should let each language plug in its implementation transparently for the user.
On XPath rule declarations properties cannot be overriden, except for violationSuppressXPath and -Regex. XPath declarations are also the only place where it makes sense to declare properties in the XML. In all other places (references, java rules), the properties can never be used. This should be enforced by the schema.
If we update to Saxon HE, the available XPath versions will be 2.0 and 3.1. I think it may be ok to only support a single version to reduce the pain. Wdyt? and do we include a version attribute for the XPath?

    <xpath-rule name="Name"
                language="apex"
                message="string">
        <description>
            A description
        </description>

        <xpath><![CDATA[

            //ClassOrInterfaceDeclaration
              [@Abstract='true' and @Interface='false']

        ]]>xpath>

        <properties>
            <!-- Property definitions -->
        </properties>

        <examples>
            <example>string</example>
        </examples>
    </xpath-rule>

Properties

Current schema excerpt:

 <xs:complexType name="property">
    <xs:sequence>
      <xs:element name="value" type="xs:string" minOccurs="0" maxOccurs="1" />
    </xs:sequence>
    <xs:attribute name="name" type="xs:NMTOKEN" use="required" />
    <xs:attribute name="value" type="xs:string" use="optional" />
    <xs:attribute name="description" type="xs:string" use="optional" />
    <xs:attribute name="type" type="xs:string" use="optional" />
    <xs:attribute name="delimiter" type="xs:string" use="optional" />
    <xs:attribute name="min" type="xs:string" use="optional" />
    <xs:attribute name="max" type="xs:string" use="optional" />
  </xs:complexType>

Property values

For now, there are two different syntaxes to give a value to a property:

The "value" attribute
The "value" child element

But there is no dedicated syntax for multi-valued properties, instead, we use a string delimiter, which makes parsing hard to predict (eg when the delimiter has to be part of one value, or when representing the empty sequence), and some multi-value property types impossible to represent (eg a multi-value RegexProperty).

I think we should support lists as first-class citizens, and drop all delimiter logic from the PropertyDescriptors.

e.g.

<!-- A string property. -->
<property name="foo" >
  <value>hello world</value>
</property>

<!-- A string multi property. -->
<property name="foo" >
  <seq>
    <item>hello world</item>
    <item> world</item>
    <item>hello</item>
  </seq>
</property>

Whatever the actual syntax, having the first child of the property tag represents the whole value is important to keep the parser simple and extensible.

We probably should deprecate the value-attribute and preferring the nested <value> element. It's not possible in XSD to restrict to either an attribute or a child element. But maybe, it's just too convenient to use the attribute.
We can have a schema validation for the contents of the "seq" and "value" element.
Different names (here "seq" and "value") could address different value parsers, so we could add some new syntaxes without breaking the old ones. E.g. we could have a syntax for maps:

  <map>
    <key>hello world</key>
    <key2> world</key2>
  </map>

or for an arbitrary bean:

  <bean>
    <foo>
      <field1>hello world</field1>
    </foo>
  </bean>

We could even represent the null value with <null/>, since it doesn't have a consistent representation in our current framework. I think we're better off explicitly banning the null value from the framework though. IIRC the logic to find the value of a property uses a map and would fail on the null value anyway, though this is not documented.

Property definitions

Property definitions use the same element as property overriding for now. This should be avoided, since

The two use cases don't use the same set of attribute/child tags
The two use cases don't occur in the same contexts

I actually see three use cases:

	Property overriding	Normal property def	Blank property def
name	🌕	🌕	🌕
value (or default)	🌕	🌕	❌
type	❌	🌕	🌕
description	❌	🌕	🌕

The third one is not implemented in PMD for now. A blank property has no default value and requires to be overridden by the user, else a config error is reported. This is described here

The three would their own element, e.g. property, property-def, and blank-property-def.
As noted previously, property definitions may only occur in the properties section of an XPath rule.
The delimiter attribute is not needed anymore since we have dedicated syntax for lists
The min and max attributes are removed with no immediate replacement. We could in the future use a more extensible syntax once the validator framework is firmly in place.
Since the amount of things needed to build a property is fixed (a value, description, name and type), we can remove the overengineered PropertyDescriptorField solution and simplify greatly the properties framework.
~~Should we add a uiOrder attribute? I don't see a compelling reason why not~~ No

Examples:

 <property-def name="strings" type="List[String]" description="strings">
     <default>
         <!-- The empty sequence -->
         <seq/>
     </default>
 </property-def>

Note that I add a default intermediary element, so that if we add some other configurable things (e.g. validators) to property defs, we may add it as a child.

<!-- "default" is forbidden for this one -->
 <blank-property-def name="s" type="Integer" description="a description" />

<!-- Classic property overriding -->
<property name="foo" >
  <value>hello world</value>
</property>

[adangel]: Didn't we agree already to get rid of the uiOrder attribute? (Maybe this proposal wasn't updated afterwards...). [oowekyala] (it hasn't been updated)

So, the only difference between normal and blank property definitions is, whether we provide a default value or not. Do we need a separate element for this use case? Wouldn't it be ok, if there is no default element (or an empty default element)? A concrete example, that we have already for a blank property is LoosePackageCouling, were we use a empty default value. I think, it would work, if we assume, that all properties need to have a value when it comes to execution (if a property does not need to have a value, then the property is probably not used at all...). So the property value could either be a default one or be overridden. The the property definition could either provide a default value or none - then the value is missing.

[oowekyala]: I think having a dedicated xml element for blank property defs makes the intent clearer. All of what you say about execution makes sense though. Maybe the simplest way to implement blank properties is to make PropertyDescriptor#defaultValue return an Optional<T>. That would also de facto forbid null default values.

[jsotuyod]: I agree with adangel here, I'm unsure we need a dedicated element rather than make the default optional.

[oowekyala]: I think you're right guys.

Complete Example

A minimal, but complete example.

Our category rulesets They contain only rule definitions.

<?xml version="1.0"?>

<ruleset name="Best Practices"
    xmlns="http://pmd.sourceforge.net/ruleset/3.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/3.0.0 https://pmd.github.io/ruleset_3_0_0.xsd">

    <description>
Rules which enforce generally accepted best practices.
    </description>

    <xpath-rule name="AbstractClassWithoutAbstractMethod"
                since="3.0"
                language="java"
                message="This abstract class does not have any abstract methods"
                priority="3">
        <description>
The abstract class does not contain any abstract methods. An abstract class suggests
an incomplete implementation, which is to be completed by subclasses implementing the
abstract methods. If the class is intended to be used as a base class only (not to be instantiated
directly) a protected constructor can be provided prevent direct instantiation.
        </description>
        <xpath>
<![CDATA[
//ClassOrInterfaceDeclaration
 [@Abstract='true'
  and count( .//MethodDeclaration[@Abstract='true'] )=0 ]
  [count(ImplementsList)=0]
  [count(.//ExtendsList)=0]
]]>
        </xpath>
        <properties>
            <!-- Property definitions - in this example, there are no additional properties for this rule -->
        </properties>
        <examples>
            <example>
<![CDATA[
public abstract class Foo {
  void int method1() { ... }
  void int method2() { ... }
  // consider using abstract methods or removing
  // the abstract modifier and adding protected constructors
}
]]>
            </example>
        </examples>
        <tags>
            <tag>surprising</tag>
        </tags>
    </xpath-rule>

    <java-rule name="AccessorClassGeneration"
               since="1.04"
               language="java"
               maximumLanguageVersion="10"
               message="Avoid instantiation through private constructors from outside of the constructor's class."
               class="net.sourceforge.pmd.lang.java.rule.bestpractices.AccessorClassGenerationRule"
               priority="3">
        <description>
Instantiation by way of private constructors from outside of the constructor's class often causes the
generation of an accessor. A factory method, or non-privatization of the constructor can eliminate this
situation. The generated class file is actually an interface.  It gives the accessing class the ability
to invoke a new hidden package scope constructor that takes the interface as a supplementary parameter.
This turns a private constructor effectively into one with package scope, and is challenging to discern.
        </description>
        <examples>
            <example>
<![CDATA[
public class Outer {
 void method(){
  Inner ic = new Inner();//Causes generation of accessor class
 }
 public class Inner {
  private Inner(){}
 }
}
]]>
            </example>
        </examples>
        <tags>
            <tag>synthetic</tag>
        </tags>
    </java-rule>
</ruleset>

[jsotuyod]: I'd not use java-rule. The usage of Java is circumstantial. Technically, any JVM language will do, people (even ourselves) may write them in Scala, Kotlin, Groovy… maybe just use rule-def and xpath-rule-def?

A sample ruleset Like quickstart

<?xml version="1.0"?>
<ruleset name="quickstart"
         xmlns="http://pmd.sourceforge.net/ruleset/3.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/3.0.0 https://pmd.github.io/ruleset_3_0_0.xsd">
    <description>Quickstart configuration of PMD. Includes the rules that are most likely to apply everywhere.</description>

    <rule-reference ref="category/java/bestpractices.xml/AbstractClassWithoutAbstractMethod" />
    <rule-reference ref="category/java/bestpractices.xml/AccessorClassGeneration" />
</ruleset>

TODO Missing (complete) examples: XPath rule with properties, Java rule with properties, deprecations, rule-reference (aka rule usage) with property value overrides, whole ruleset reference

Open Points

Namespace(s)?
- [oowekyala] Do you mean sourceforge or github? I don't really know. I think though, it would be nice to version the schema according to the PMD version when the schema update was released. E.g. that new schema would be XXX-7.0.0.xsd
- [jsotuyod] This may be confusing for users. Is the 7.0.0 schema compatible with 7.1.0? what about 8.0.0? We should very rarely do changes in our Schema.
- [oowekyala] Well IMO it's more confusing that the version of the schema has no relation to the version of PMD that supports it. Is the 3.0.0 schema compatible with PMD 7.0.0, what about with 3.1.0? We commit to backwards compatibility so it's natural that a 7.0.0 ruleset is a valid 7.1.0 ruleset. But you need at least PMD 7.1.0 to run a 7.1.0 ruleset, which isn't explicit at all with the separate versioning scheme. This is actually compounded by the fact the schema doesn't change often, we're currently at 3.0.0 even though PMD 3.0.0 was released 14 years ago... why would somebody assume that the schema version 4.0.0 is not runnable by eg PMD 6.0?
externalInfoUrl: do we need it? for our built-in rules, it's always the same / can be generated. Current usage: When generating a report, this can be used to provide a link to the rule description of a found violation. So this is a information required during reporting.
- [oowekyala] I think we don't need it. At least for built-in categories, it's painful to get right. I don't know if users really use it for their custom rules. If we remove it though, we should take care of adding a link only if it's a built-in rule when reporting.
- [jsotuyod] I agree the property is useless, and it's not even required by anything but our doc generation. We DO need to have the report include the link for built-in rules nontheless.
root element: should we have a <category> root element, which only allows xpath-rule+ and java-rule+ tags? And a "normal" <ruleset>, which allows only rule-reference+ tags?
- [oowekyala] I don't really think that would be useful outside of our codebase. Maybe just a verification test is enough.
- [jsotuyod] I already wrote it above, but I would never restrict rulesets from defining xpath rules. It's the easiest / most common way to implement project-specific rules, and forcing users to write 2 XMLs to use them is just boilerplate / hindering usability.
Property definitions: It might be easier for tools (such as ruleset editors), if all information is available in the rule definition. For java rules, currently one must instantiate the rule to get the available properties... For consistency, we should maybe require to define the properties always in XML and if used in java, verify that they indeed have been defined...
- [jsotuyod] I fear that statically checking if a Java rule has the properties it looks for defined in the XML and they have the proper types may be hard (plus, no type-safety from Java, we would have to repeat the type when accessing it). We will be facing silly runtime errors this way. I'd rather forbid property declarations for non-xpath rules in the XML for what's worth. Both XPath and Java rules allow to get defined properties in runtime by loading the rule, so for tooling, this approach would also provide consistency and reliability (not to mention inherited properties such as a suppression properties).
Do we currently validate the schema before parsing a ruleset? I think we probably should... it would simplify the parsing if we don't have to worry about wrong structure.
- [jsotuyod] We currently don't, but I don't see it adding complexity to our current parsing... we are simply failing with a runtime error. I don't see much value for us to do so at runtime, although there is clear value for developers when writing their own rulesets.