As a follow-up to my earlier tutorial, Writing pretty code with PMD, I am going to discuss writing custom PMD rules using XPath. The first thing to note is that we use a declarative programming paradigm to define our rules: we define, or ‘declare’, what constitutes a PMD violation, but not what should be done about it. PMD takes care of that part.
Abstract Syntax Tree
PMD applies its rules to an Abstract Syntax Tree (AST) representation of your Java source code. Essentially, the AST is a representation of a Java class that uses generic descriptors to refer to each element in the source code. In code you may have a class called MyClass, but in the AST it’s just a “ClassOrInterfaceDeclaration”. Each element which is contained within another is its descendant in the tree. For example, a field is defined inside a class declaration, so it is a descendant of the class declaration. Similarly, all items defined directly within the same element are siblings in the tree. Any information that can be used to differentiate an instance of the generic type (such as the data type of the field) is stored as a property of the element. If the description above sounds reminiscent of XML, it’s because an AST is structured much like an XML document. As a basic example, see the following code and its AST representation.
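A rough sketch of such a pair is shown here. The class is a made-up example, and the outline in the comment is my approximation of the node names the Rule Designer shows for PMD's Java grammar, simplified and not copied from the tool's output, so check your own PMD version for the exact tree.

```java
// A made-up class used only for illustration.
class MyClass {
    private int x;
    private float y;
}

/*
 * Approximate AST outline for the class above (assumed and simplified):
 *
 * CompilationUnit
 *   TypeDeclaration
 *     ClassOrInterfaceDeclaration            Image = "MyClass"
 *       ClassOrInterfaceBody
 *         ClassOrInterfaceBodyDeclaration
 *           FieldDeclaration
 *             Type
 *               PrimitiveType                Image = "int"
 *             VariableDeclarator
 *               VariableDeclaratorId         Image = "x"
 *         ClassOrInterfaceBodyDeclaration
 *           FieldDeclaration
 *             Type
 *               PrimitiveType                Image = "float"
 *             VariableDeclarator
 *               VariableDeclaratorId         Image = "y"
 */
```

Note how the two fields sit side by side as siblings under the class body, and how the names (MyClass, x, y) only appear as properties of otherwise generic nodes.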
PMD Rule Designer
Before we start defining our rules we need a sandbox in which to quickly test our queries. Luckily for us, PMD comes with just such a tool: the PMD Rule Designer. I have repackaged the tool into a single executable jar file which you can download here: PMDRuleDesigner.jar. To run the tool, just type java -jar PMDDesigner.jar into the *nix terminal or Windows command line. Write your code in the top left section and hit the “Go” button. The bottom left section will list the AST representation of your code. The two sections on the right deal with XPath, which we will be discussing next.
XPath Syntax
"/" Has child
The forward slash defines a child query. For example, /TypeDeclaration says that we want to match a child of the root node which is of type “TypeDeclaration”. We can also use multiple slashes in a query to follow a chain of child relationships. For example, the query below will match both field declarations in the code above.
Enter the query specified above into the top right box of the PMD Rule Designer and hit “Go”. In the box in the bottom right you should see the line numbers (line 4 and line 5) of the two field declarations.
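If you want to experiment with child paths outside the Rule Designer, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath against a hand-written XML stand-in for the AST. The XML, the class name ChildPathSketch and its count helper are my assumptions for illustration; PMD does not emit this XML. Because the stand-in's document element is CompilationUnit, the path starts there rather than at /TypeDeclaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildPathSketch {
    // Hand-written XML stand-in for the AST of a class with two fields.
    // This is an assumption for illustration; PMD does not emit this XML.
    static final String AST_XML =
          "<CompilationUnit><TypeDeclaration><ClassOrInterfaceDeclaration>"
        + "<ClassOrInterfaceBody>"
        + "<ClassOrInterfaceBodyDeclaration><FieldDeclaration/></ClassOrInterfaceBodyDeclaration>"
        + "<ClassOrInterfaceBodyDeclaration><FieldDeclaration/></ClassOrInterfaceBodyDeclaration>"
        + "</ClassOrInterfaceBody>"
        + "</ClassOrInterfaceDeclaration></TypeDeclaration></CompilationUnit>";

    // Counts the nodes matched by an XPath query against the stand-in document.
    static int count(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(AST_XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        // Every step names a direct child of the previous step.
        System.out.println(count("/CompilationUnit/TypeDeclaration/ClassOrInterfaceDeclaration"
                + "/ClassOrInterfaceBody/ClassOrInterfaceBodyDeclaration/FieldDeclaration")); // 2
        // FieldDeclaration is not a direct child of the root element, so nothing matches.
        System.out.println(count("/CompilationUnit/FieldDeclaration")); // 0
    }
}
```

The second query shows why child paths get long: skipping even one level means the path no longer matches.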
"//" Has Descendant
You don’t always want to define the full path from the root to the element you are searching for, so instead of the has-child relationship you can use the double forward slash, or has-descendant relationship. This will search for any descendant of the current node. Therefore the query above to search for field declarations could also be written as //FieldDeclaration.
The has-descendant relationship is not only useful for shortening queries; it is also useful for finding specific cases regardless of where they exist in the source tree. For example, take the code below: there is no single path built from child steps alone that will match both variable declarations (int x and float y), whereas the //PrimitiveType query matches both.
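As a sketch of the difference, the snippet below evaluates both kinds of query with the JDK's javax.xml.xpath over a hand-written XML stand-in, where int x is a field and float y is a local variable inside a method. The XML shape, the class name DescendantSketch and the images helper are assumptions of mine for illustration, not PMD output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DescendantSketch {
    // Assumed stand-in: a field "int x" and a method with a local variable "float y".
    static final String AST_XML =
          "<CompilationUnit><TypeDeclaration><ClassOrInterfaceDeclaration><ClassOrInterfaceBody>"
        + "<ClassOrInterfaceBodyDeclaration><FieldDeclaration>"
        + "<Type><PrimitiveType Image='int'/></Type>"
        + "</FieldDeclaration></ClassOrInterfaceBodyDeclaration>"
        + "<ClassOrInterfaceBodyDeclaration><MethodDeclaration><Block><BlockStatement>"
        + "<LocalVariableDeclaration><Type><PrimitiveType Image='float'/></Type>"
        + "</LocalVariableDeclaration>"
        + "</BlockStatement></Block></MethodDeclaration></ClassOrInterfaceBodyDeclaration>"
        + "</ClassOrInterfaceBody></ClassOrInterfaceDeclaration></TypeDeclaration></CompilationUnit>";

    // Returns the Image attribute of every element matched by the query.
    static List<String> images(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(AST_XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(((Element) hits.item(i)).getAttribute("Image"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        // The descendant query finds the type wherever it sits in the tree.
        System.out.println(images("//PrimitiveType")); // [int, float]
        // A child path under FieldDeclaration only reaches the field's type.
        System.out.println(images("//FieldDeclaration/Type/PrimitiveType")); // [int]
    }
}
```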
"@" Has Property
The @ sign is used to denote “has property”, and we can use properties to get information about specific instances of elements. For example, the Image property stores the name of the element (I have no idea why it’s stored in the Image property). So the query //VariableDeclaratorId/@Image will return two values, x and y.
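Here is a small sketch of the property query using the JDK's javax.xml.xpath, where XML attributes play the role of AST properties. The trimmed-down XML, the class name PropertySketch and the values helper are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PropertySketch {
    // Assumed, heavily trimmed stand-in for two variable declarations named x and y.
    static final String AST_XML =
          "<CompilationUnit>"
        + "<FieldDeclaration><VariableDeclarator>"
        + "<VariableDeclaratorId Image='x'/></VariableDeclarator></FieldDeclaration>"
        + "<FieldDeclaration><VariableDeclarator>"
        + "<VariableDeclaratorId Image='y'/></VariableDeclarator></FieldDeclaration>"
        + "</CompilationUnit>";

    // Returns the value of every attribute node matched by the query.
    static List<String> values(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(AST_XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        // The final @Image step selects the property itself, not the element.
        System.out.println(values("//VariableDeclaratorId/@Image")); // [x, y]
    }
}
```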
"*" Wild Card
Wildcards are useful for matching many similar elements, for example if we wanted to find all the field and method declarations in a class. For fields we would use a query such as //ClassOrInterfaceBodyDeclaration/FieldDeclaration, whereas for a method declaration it would be something like //ClassOrInterfaceBodyDeclaration/MethodDeclaration. Using a wildcard we can define the query as //ClassOrInterfaceBodyDeclaration/*
We can also use wildcards to list all properties of an element by combining @ and *, e.g. //ClassOrInterfaceBodyDeclaration/FieldDeclaration/@*
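The sketch below tries both wildcard forms with the JDK's javax.xml.xpath. The XML is a hand-written stand-in, and the Private, Static and Public attributes are a hypothetical subset of properties chosen for the example; the class name WildcardSketch and the names helper are likewise my own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WildcardSketch {
    // Assumed stand-in: a class body holding one field and one method.
    static final String AST_XML =
          "<ClassOrInterfaceBody>"
        + "<ClassOrInterfaceBodyDeclaration>"
        + "<FieldDeclaration Private='true' Static='false'/>"
        + "</ClassOrInterfaceBodyDeclaration>"
        + "<ClassOrInterfaceBodyDeclaration>"
        + "<MethodDeclaration Public='true'/>"
        + "</ClassOrInterfaceBodyDeclaration>"
        + "</ClassOrInterfaceBody>";

    // Returns the sorted names of the element or attribute nodes matched by the query.
    static List<String> names(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(AST_XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getNodeName());
            }
            Collections.sort(result); // attribute order is not guaranteed, so sort for stable output
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        // "*" matches any child element, whatever its type.
        System.out.println(names("//ClassOrInterfaceBodyDeclaration/*"));
        // [FieldDeclaration, MethodDeclaration]

        // "@*" matches every property of the selected element.
        System.out.println(names("//ClassOrInterfaceBodyDeclaration/FieldDeclaration/@*"));
        // [Private, Static]
    }
}
```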
"[]" Predicates
Predicates are conditions used to filter the set of nodes matched by a portion of the XPath string. For example, if in the code below we wish to single out the function(s) which return an int, we would use the XPath //MethodDeclaration/ResultType[contains(Type/*/@Image, "int")]. The part before the brackets defines the tree (or sub-tree) to which the predicate applies: in the example we only apply the condition to ResultType nodes which are children of a MethodDeclaration node. In the predicate we specify that the node must have a child of type ‘Type’, which in turn has a child with an Image property containing the value ‘int’. Note how we use the wildcard to avoid having to specify whether the function returns a PrimitiveType or a ReferenceType.
By changing the query slightly we can even find the name of the function(s) with the int return type. Note we are now conditionally selecting a MethodDeclaration node which matches the given criterion. Once we have the node we look for its MethodDeclarator child and retrieve the method name.
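Here is a sketch of both predicate queries, again using the JDK's javax.xml.xpath on a hand-written stand-in. The two methods (getCount and getRatio), the class name PredicateSketch and the select helper are made up for illustration, and the second query is my guess at one way to write the "name of the function" variant rather than the only possible form.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateSketch {
    // Assumed stand-in for two methods: "int getCount()" and "float getRatio()".
    static final String AST_XML =
          "<ClassOrInterfaceBody>"
        + "<MethodDeclaration>"
        + "<ResultType><Type><PrimitiveType Image='int'/></Type></ResultType>"
        + "<MethodDeclarator Image='getCount'/>"
        + "</MethodDeclaration>"
        + "<MethodDeclaration>"
        + "<ResultType><Type><PrimitiveType Image='float'/></Type></ResultType>"
        + "<MethodDeclarator Image='getRatio'/>"
        + "</MethodDeclaration>"
        + "</ClassOrInterfaceBody>";

    // Evaluates an XPath query against the stand-in and returns the matched nodes.
    static NodeList select(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(AST_XML)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        // Filter ResultType nodes: only the one whose type image contains "int" survives.
        NodeList resultTypes =
                select("//MethodDeclaration/ResultType[contains(Type/*/@Image, 'int')]");
        System.out.println(resultTypes.getLength()); // 1

        // Move the predicate onto MethodDeclaration, then step down to the method name.
        NodeList methodNames = select(
                "//MethodDeclaration[ResultType[contains(Type/*/@Image, 'int')]]"
                + "/MethodDeclarator/@Image");
        System.out.println(methodNames.item(0).getNodeValue()); // getCount
    }
}
```

One thing to keep in mind is that contains is a substring match, so a type whose name merely includes the letters "int" would also pass the filter.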
We use the contains function for our query, but there are many other functions you can use in predicates. An exhaustive list can be found here.
My first rule
OK, enough beating around the bush, let’s create our first PMD rule. This is one of my pet peeves: I hate unused or overly generic imports such as import java.net.*; that all newbies to Java seem to use. So how can we stop them? First of all, fire up the PMD Rule Designer and write some imports to see what they look like. I like to write one example of a violation and one example of proper usage so that I can see the differences. If you look at the properties of the two statements, you will see that for the wildcard import the PackageName property matches the ImportedName property. With the query /ImportDeclaration[@PackageName=@ImportedName] we can select all the imports which use the wildcard and not the specific imports. Try this query in the Rule Designer to verify that it only matches the wildcard import.
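The comparison of the two properties can be sketched with the JDK's javax.xml.xpath as follows. The XML is a hand-written stand-in for one wildcard import (java.net.*) and one specific import (java.util.List), with property values as I would expect them to appear; the class name ImportRuleSketch and the select helper are assumptions. Because the stand-in's document element is CompilationUnit, the query is prefixed with /CompilationUnit here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ImportRuleSketch {
    // Assumed stand-in for "import java.net.*;" and "import java.util.List;".
    static final String AST_XML =
          "<CompilationUnit>"
        + "<ImportDeclaration PackageName='java.net' ImportedName='java.net'/>"
        + "<ImportDeclaration PackageName='java.util' ImportedName='java.util.List'/>"
        + "</CompilationUnit>";

    // Evaluates an XPath query against the stand-in and returns the matched nodes.
    static NodeList select(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(AST_XML)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        // Only the import whose package name equals its imported name is a wildcard import.
        NodeList wildcards = select(
                "/CompilationUnit/ImportDeclaration[@PackageName=@ImportedName]/@ImportedName");
        System.out.println(wildcards.getLength());             // 1
        System.out.println(wildcards.item(0).getNodeValue());  // java.net
    }
}
```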
OK, now that we have our XPath query, let’s create a ruleset file and add the custom rule definition. We give the rule a name, an admonishing message to be displayed to violators, and a longer description. We can also set how severe we want the rule violation to be considered. As you can see, I really hate wildcard imports, because I set it to the highest level. The all-important XPath query goes in the value element of the property named xpath.
You can append this rule to the PMD ruleset from my earlier tutorial, Writing pretty code with PMD, which can be downloaded here. When you compile the project you will get the following output:
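A rule definition along these lines might look like the sketch below. The rule name, message and description text are placeholders I made up, and the class attribute is an assumption that depends on your PMD version: PMD 4.x uses net.sourceforge.pmd.rules.XPathRule, while PMD 5 and later moved it to net.sourceforge.pmd.lang.rule.XPathRule.

```xml
<!-- Name, message and description are placeholders; adjust the class attribute to your PMD version. -->
<rule name="DontUseWildcardImports"
      message="Do not use wildcard imports; import the specific classes you need."
      class="net.sourceforge.pmd.rules.XPathRule">
  <description>
    Wildcard imports such as import java.net.*; hide which classes are actually used.
  </description>
  <!-- 1 is the highest (most severe) priority. -->
  <priority>1</priority>
  <properties>
    <property name="xpath">
      <value>
        <![CDATA[
/ImportDeclaration[@PackageName=@ImportedName]
        ]]>
      </value>
    </property>
  </properties>
</rule>
```

Wrapping the query in a CDATA section avoids having to escape characters such as quotes or angle brackets if the query grows more complex.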
And if you open said file you will see the following information, telling you that the problem is on line 8 of the GuiceCreator.java file and that we should not use wild card imports.
Conclusion
Despite the length of this article, note that we were able to create a small XML snippet which will catch a coding malpractice throughout all our code. Granted, this was a simple case, but using XPath we can create more complex and intricate rules. Furthermore, if we spend the effort writing the rule(s) once, we can ensure code quality without the overhead of code reviews, as PMD can be integrated with our build.
Source Code
Here is the complete project with updated ruleset file and violation. Note that all code and other source provided here are licensed under the BSD License.
External Links
- http://pmd.sourceforge.net/
- http://www.w3schools.com/xpath/
- http://jenkins-ci.org/
- http://www.sonarsource.org/
- http://pmd.sourceforge.net/rules/index.htm
- http://www.eclipse.org/articles/article.php?file=Article-JavaCodeManipulation_AST/index.html
- http://www.w3schools.com/xpath/xpath_functions.asp#string
- Photo courtesy of Dan Iggers