An Introduction to XPath with examples in Java

XPath is a query language that defines syntax for selecting nodes from an XML document. With XPath you can traverse elements and attributes of an XML document. XPath was defined by the World Wide Web Consortium (W3C). This post gives an Introduction to XPath with examples in Java.

What is XPath?

XPath provides path expressions to select a node or a list of nodes in an XML document. XPath also provides a library of functions for working with strings, numbers, booleans and nodes. XPath 2.0 and later add functions for date and time comparison, QName manipulation and sequence manipulation. Together these let you perform various operations on the XML document.

XPath views an XML document as a tree of nodes. This tree is very similar to a Document Object Model (DOM) tree. The XPath specification defines seven kinds of nodes:

  • Root
  • Element
  • Text
  • Attribute
  • Comment
  • Processing instruction
  • Namespace

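The root element of the XML data is modeled by an element node. The XPath root node is not that element: it sits above it and contains the document’s root element as well as other information relating to the document, such as comments and processing instructions that appear outside the root element. In an XPath expression, the root node is selected by a single slash (/).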
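The difference between the root node and the root element is easy to miss, so here is a minimal sketch of it using the JDK's javax.xml.xpath API. The class name RootNodeSketch, the helper selectNode and the one-employee document are my own illustrative choices, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootNodeSketch {
    // Hypothetical one-employee document, used only for illustration.
    static final String XML = "<employees><employee id=\"1\"/></employees>";

    // Evaluates the expression against XML and returns the first matching node.
    static Node selectNode(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        Node root = selectNode("/");          // the XPath root node
        Node rootElement = selectNode("/*");  // the document's root element

        // With the DOM object model, the root node is the Document itself.
        System.out.println("Root is a Document: "
                + (root.getNodeType() == Node.DOCUMENT_NODE));
        System.out.println("Root element name: " + rootElement.getNodeName());
    }
}
```

The root node maps to the DOM Document, while /* selects the employees element that sits one level below it.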

XPath is a major element of the XSLT standard. XSLT stylesheets use XPath expressions to match and select nodes, so you need a working knowledge of XPath to write them.

XPath Expression syntax

XPath uses a path expression (like /company/employee) to select a node or a list of nodes in an XML document. The most useful path expressions are listed below.

Expression | Description
nodename | Selects all child elements of the current node named "nodename"
/ | Selects from the root node
// | Selects nodes in the document from the current node that match the selection, no matter where they are
. | Selects the current node
.. | Selects the parent of the current node
@ | Selects attributes

In the table below I have listed some path expressions and the result of the expressions.

Path Expression | Result
employee | Selects all child elements of the current node named "employee"
/employees | Selects the root element employees
employees/employee | Selects all employee elements that are children of the employees element
//employee | Selects all employee elements, no matter where they are in the document
employees//employee | Selects all employee elements that are descendants of the employees element, no matter where they are under it
//@id | Selects all attributes named "id"

Note: A path that starts with a slash (/) is always an absolute path, evaluated from the root of the document.
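To see how the child step (/) and the descendant step (//) differ, here is a small sketch that counts the nodes each expression selects. The document is made up for illustration: the team element, the second employee and the names PathExpressionSketch and countMatches are my assumptions. The expressions are evaluated with the document itself as the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PathExpressionSketch {
    // Illustrative document: one employee directly under <employees>,
    // one nested inside a hypothetical <team> element.
    static final String XML =
            "<employees>"
            + "<employee id=\"1\"><firstname>Arun</firstname></employee>"
            + "<team><employee id=\"2\"><firstname>Meera</firstname></employee></team>"
            + "</employees>";

    // Returns how many nodes the expression selects in XML.
    static int countMatches(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/employees",           // the root element
            "employees/employee",   // direct children only
            "//employee",           // anywhere in the document
            "employees//employee",  // any depth below employees
            "//@id"                 // every id attribute
        };
        for (String expression : expressions) {
            System.out.println(expression + " selects "
                    + countMatches(expression) + " node(s)");
        }
    }
}
```

In this document employees/employee finds only the employee that is a direct child, while //employee and employees//employee also find the one nested inside team.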

Using Predicates

Predicates are used to find a specific node or a node that contains a specific value. Predicates are defined using square brackets […].

In the table below I have listed some path expressions with predicates and the result of the expressions.

Path Expression | Result
/employees/employee[1] | Selects the first employee element that is a child of the employees element
/employees/employee[last()] | Selects the last employee element that is a child of the employees element
/employees/employee[last()-1] | Selects the second-to-last employee element that is a child of the employees element
/employees/employee[position()<3] | Selects the first two employee elements that are children of the employees element
employees/employee[@id>'3'] | Selects all employee elements (children of the employees element) whose id attribute is greater than 3
employees/employee[age>35] | Selects all employee elements (children of the employees element) that have an age element with a value greater than 35

Refer to the XML content below for a quick understanding.

<?xml version="1.0" encoding="utf-8"?>
<employees>
  <employee id="1">
    <age>36</age>
    <firstname>Arun</firstname>
    <lastname>Kumar</lastname>
    <role>Developer</role>
  </employee>
</employees>
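The predicates above are easier to follow with more than one employee, so this sketch extends the sample with two employees that I made up (Beena and Chitra, with invented ages). PredicateSketch and firstNames are illustrative names as well. Each expression is evaluated and the first names of the matching employees are collected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PredicateSketch {
    // Employee 1 comes from the sample above; employees 2 and 3 are hypothetical.
    static final String XML =
            "<employees>"
            + "<employee id=\"1\"><age>36</age><firstname>Arun</firstname></employee>"
            + "<employee id=\"2\"><age>29</age><firstname>Beena</firstname></employee>"
            + "<employee id=\"3\"><age>41</age><firstname>Chitra</firstname></employee>"
            + "</employees>";

    // Returns the first names of the employee elements selected by the expression.
    static List<String> firstNames(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList employees = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                Element employee = (Element) employees.item(i);
                names.add(employee.getElementsByTagName("firstname")
                        .item(0).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/employees/employee[1]",
            "/employees/employee[last()]",
            "/employees/employee[last()-1]",
            "/employees/employee[position()<3]",
            "/employees/employee[@id>1]",
            "/employees/employee[age>35]"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + firstNames(expression));
        }
    }
}
```

Note that positions in XPath start at 1, not 0, so employee[1] is the first employee.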

Selecting Unknown Nodes

XPath wildcards can be used to select unknown XML elements.

Wildcard | Description
* | Matches any element node
@* | Matches any attribute node
node() | Matches any node of any kind

In the table below I have listed some path expressions and the result of the expressions.

Path Expression | Result
/employees/* | Selects all the child element nodes of the employees element
//* | Selects all elements in the document
//employee[@*] | Selects all employee elements which have at least one attribute of any kind
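Here is a short sketch of the wildcards in action against the single-employee sample from earlier, written without whitespace between elements so that no whitespace-only text nodes get in the way. WildcardSketch and nodeNames are names I picked for illustration. The helper returns the DOM node name of every selected node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WildcardSketch {
    // The sample employee document, written compactly on purpose.
    static final String XML =
            "<employees><employee id=\"1\">"
            + "<age>36</age><firstname>Arun</firstname>"
            + "<lastname>Kumar</lastname><role>Developer</role>"
            + "</employee></employees>";

    // Returns the DOM node names of all nodes selected by the expression.
    static List<String> nodeNames(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/employees/*",                  // child elements of employees
            "/employees/employee/*",         // child elements of employee
            "//*",                           // every element in the document
            "//employee[@*]",                // employees with any attribute
            "//employee/@*",                 // the attributes themselves
            "/employees/employee/age/node()" // any node kind, here a text node
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + nodeNames(expression));
        }
    }
}
```

The last expression shows that node() also matches text nodes, which DOM reports under the name #text.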

Using XPath Functions

As mentioned earlier, XPath provides a set of library functions. Some of the most useful ones are listed below.

  • last() – Returns the index of the last item of the current node set, which is also the size of the set.
  • position() – Returns the index of the current item in the current node set. Indexes start at 1.
  • count(node-set) – Returns the number of items in the argument node set. Example – count(/journal/article)
  • contains(string, string) – Returns true if the first argument contains the second argument string. The comparison is case sensitive. Example – /book/author[contains(name, 'kumar')]
  • lower-case(string?) – Returns the string argument or context node with all characters converted to lower case. This is an XPath 2.0 function and is not available in XPath 1.0 processors. Example – lower-case('Foo')='foo'
  • upper-case(string?) – Returns the string argument or context node with all characters converted to upper case. This is an XPath 2.0 function and is not available in XPath 1.0 processors. Example – upper-case('Foo')='FOO'
  • sum(node-set) – Converts the string value of each node in the argument node set to a number and returns the sum. Example – sum(/journal/article/author/age)
  • min(node-set) – Returns the minimum value that results from converting the string-values of each node in the argument node-set to a number. The minimum is determined with the < operator. If the parameter is an empty node-set, or if any of the nodes evaluate to NaN, the return value is NaN. This function is not part of the XPath 1.0 core library; processors that offer it in this form do so as an extension. Example – min(/journal/article/author/age)
  • max(node-set) – Returns the maximum value that results from converting the string-values of each node in the argument node-set to a number. The maximum is determined with the > operator. If the parameter is an empty node-set, or if any of the nodes evaluate to NaN, the return value is NaN. Like min, it is not part of the XPath 1.0 core library. Example – max(/journal/article/author/age)
  • text() – Selects all of the text nodes that are children of the context node. Strictly speaking it is a node test rather than a function, and it yields a node set.
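A few of these functions can be tried directly from Java. The sketch below sticks to XPath 1.0 functions, because the XPath implementation that ships with the JDK supports XPath 1.0 and therefore has no lower-case(). For a case-insensitive match it uses translate() instead, which is a common workaround that covers ASCII letters only. FunctionSketch, its helper methods and the second employee (Beena Nair) are my own illustrative additions.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FunctionSketch {
    // Employee 1 comes from the sample above; employee 2 is hypothetical.
    static final String XML =
            "<employees>"
            + "<employee id=\"1\"><age>36</age><firstname>Arun</firstname>"
            + "<lastname>Kumar</lastname></employee>"
            + "<employee id=\"2\"><age>29</age><firstname>Beena</firstname>"
            + "<lastname>Nair</lastname></employee>"
            + "</employees>";

    // XPath 1.0 stand-in for lower-case(): maps A-Z to a-z with translate().
    static final String CASE_INSENSITIVE_KUMAR =
            "/employees/employee[contains(translate(lastname, "
            + "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), "
            + "'kumar')]/firstname";

    private static Object evaluate(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    // Evaluates the expression and converts the result to a number.
    static double number(String expression) {
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    // Evaluates the expression and converts the result to a string.
    static String string(String expression) {
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(number("count(/employees/employee)"));
        System.out.println(number("sum(/employees/employee/age)"));
        // contains() is case sensitive: 'Kumar' matches, 'kumar' does not.
        System.out.println(string(
                "/employees/employee[contains(lastname, 'Kumar')]/firstname"));
        System.out.println(string(
                "/employees/employee[contains(lastname, 'kumar')]/firstname"));
        System.out.println(string(CASE_INSENSITIVE_KUMAR));
        System.out.println(string("/employees/employee[last()]/firstname/text()"));
    }
}
```

When an expression selects nothing, the STRING return type gives back an empty string rather than null, which is what the lower-case 'kumar' lookup returns here.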

XPath in Java

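The javax.xml.xpath package provides XPath support in Java. XPath objects are created through factory methods: XPathFactory.newInstance() returns a factory, and its newXPath() method returns an XPath object that can evaluate an expression directly or compile it into a reusable XPathExpression.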

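XPath return types are defined in the XPathConstants class. The supported return types are

  1. XPathConstants.STRING
  2. XPathConstants.NUMBER
  3. XPathConstants.BOOLEAN
  4. XPathConstants.NODE
  5. XPathConstants.NODESET

XPathConstants also defines DOM_OBJECT_MODEL. It is not a return type: it is the URI that identifies the DOM object model when you request an XPathFactory.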
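Each return type maps to a Java type: STRING to String, NUMBER to Double, BOOLEAN to Boolean, NODE to org.w3c.dom.Node and NODESET to org.w3c.dom.NodeList. The sketch below evaluates one expression per return type against the sample employee document. It is not the original listing from this post; ReturnTypeSketch and its evaluate helper are names I chose for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReturnTypeSketch {
    // The sample employee document from earlier in the post.
    static final String XML =
            "<employees><employee id=\"1\">"
            + "<age>36</age><firstname>Arun</firstname>"
            + "<lastname>Kumar</lastname><role>Developer</role>"
            + "</employee></employees>";

    // Compiles the expression and evaluates it with the requested return type.
    static Object evaluate(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression compiled = xpath.compile(expression);
            return compiled.evaluate(doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String firstName = (String) evaluate(
                "/employees/employee/firstname", XPathConstants.STRING);
        Double age = (Double) evaluate(
                "/employees/employee/age", XPathConstants.NUMBER);
        Boolean olderThan35 = (Boolean) evaluate(
                "/employees/employee/age > 35", XPathConstants.BOOLEAN);
        Element employee = (Element) evaluate(
                "/employees/employee", XPathConstants.NODE);
        NodeList fields = (NodeList) evaluate(
                "/employees/employee/*", XPathConstants.NODESET);

        System.out.println("First name: " + firstName);
        System.out.println("Age: " + age);
        System.out.println("Older than 35: " + olderThan35);
        System.out.println("Employee id: " + employee.getAttribute("id"));
        System.out.println("Number of child elements: " + fields.getLength());
    }
}
```

If you evaluate the same expression many times, it is worth keeping the compiled XPathExpression around instead of compiling it on every call as this helper does.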

Parse XML document using XPath parser in Java

We will parse the below XML file.

XPathParserExample.java

Below is the output of running the above program.

I hope you liked this introduction to XPath with examples in Java. If you have any comments, post them in the comments section.

