Programming + Errors: May 2012

Thursday, May 31, 2012

Java : xpath and NamespaceContext

http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html

Prerequisites and the example
In this article, I assume that you are familiar with the technical details described in "Evaluating XPaths from the Java™ platform" by Brett McLaughlin. If you are not sure how to run Java programs using XPath, please refer to Brett's article (see Resources for a link to the article). The same is true for the API needed to load an XML file and to evaluate an XPath expression.
You will use the following XML file for all examples:

Listing 1. Example XML

<?xml version="1.0" encoding="UTF-8"?>

<books:booklist
  xmlns:books="http://univNaSpResolver/booklist"
  xmlns="http://univNaSpResolver/book"
  xmlns:fiction="http://univNaSpResolver/fictionbook">
  <science:book xmlns:science="http://univNaSpResolver/sciencebook">
    <title>Learning XPath</title>
    <author>Michael Schmidt</author>
  </science:book>
  <fiction:book>
    <title>Faust I</title>
    <author>Johann Wolfgang von Goethe</author>
  </fiction:book>
  <fiction:book>
    <title>Faust II</title>
    <author>Johann Wolfgang von Goethe</author>
  </fiction:book>
</books:booklist>

This XML example has three namespaces declared in the root element and one declared on an element deeper in the structure. You will see what differences result from this setup.

Frequently used acronyms

API: application programming interface
DOM: Document Object Model
URI: Uniform Resource Identifier
XHTML: Extensible Hypertext Markup Language
XML: Extensible Markup Language
XSD: XML Schema Definition
XSLT: Extensible Stylesheet Language Transformations

The second interesting thing about this XML example is that the element booklist has three children, all named book. But the first child uses the prefix science, while the following children use the prefix fiction, and the two prefixes are bound to different namespace URIs. To XPath, this means that these elements are completely different. You will see the consequences in the examples below.
A word about the sample source code: the code is not optimized for maintenance, but for readability. This means that it has some redundancies. The output is produced in the simplest way through System.out.println(). All lines of code concerning output are abbreviated with '...' in the article. Also, I do not cover the helper methods in this article, but they are included in the download file (see Download).
Theoretical background
What is the sense of namespaces and why care about them? A namespace is a part of the identifier for an element or attribute. You can have elements or attributes with the same local name, but different namespaces. They are completely different. See the example above (science:book and fiction:book). You need namespaces to resolve naming conflicts if you combine XML files from different sources. Take, for example, an XSLT file. It consists of elements of the XSLT namespace, elements from your own namespace, and (often) elements of the XHTML namespace. Using namespaces you can avoid ambiguities concerning elements with the same local name.
The namespace is defined by the URI (in this example, http://univNaSpResolver/booklist). To avoid the use of this long string, you define a prefix that is associated with this URI (in the example, books). Please remember that the prefix is like a variable: its name does not matter. If two prefixes reference the same URI, the namespace of the prefixed elements will be the same (see Example 1 in Listing 5 for an example of this).
An XPath expression uses prefixes (for example, books:booklist/science:book), and you have to provide the URI associated with each prefix. This is where the NamespaceContext comes in. It does exactly that.
This article explains the different ways to provide the mapping between the prefix and the URI.
In the XML file, the mapping is provided by the xmlns attributes like: xmlns:books="http://univNaSpResolver/booklist" or xmlns="http://univNaSpResolver/book" (the default namespace).
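The idea that a prefix is only a local alias for a URI can be tried out in isolation. The following is a minimal sketch, not part of the article's download: the class name PrefixAliasDemo, the helper methods, and the trimmed-down inline document are assumptions made for illustration. It resolves prefixes through a plain map and counts the nodes an expression selects.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrefixAliasDemo {

    // Trimmed-down stand-in for the sample document; the prefixes b and s are arbitrary.
    static final String XML =
            "<b:booklist xmlns:b='http://univNaSpResolver/booklist'>"
            + "<s:book xmlns:s='http://univNaSpResolver/sciencebook'/>"
            + "</b:booklist>";

    /** Builds a prefix-to-URI map from alternating prefix and URI arguments. */
    static Map<String, String> bindings(String... prefixThenUri) {
        Map<String, String> map = new HashMap<String, String>();
        for (int i = 0; i + 1 < prefixThenUri.length; i += 2) {
            map.put(prefixThenUri[i], prefixThenUri[i + 1]);
        }
        return map;
    }

    /** Counts the nodes the expression selects when prefixes are resolved through the map. */
    static int count(String expression, final Map<String, String> prefixToUri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    if (prefix == null) {
                        throw new IllegalArgumentException("prefix must not be null");
                    }
                    String uri = prefixToUri.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                // The reverse lookups are not needed for this demonstration.
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String list = "http://univNaSpResolver/booklist";
        String science = "http://univNaSpResolver/sciencebook";
        // Prefixes differ from the document, URIs are the same: the element is found.
        System.out.println(count("x:booklist/technical:book",
                bindings("x", list, "technical", science)));
        // Prefixes match the document, but s points at another URI: nothing is found.
        System.out.println(count("b:booklist/s:book",
                bindings("b", list, "s", "http://univNaSpResolver/fictionbook")));
    }
}
```

The first call selects one node although neither prefix appears in the document, and the second selects none although both do. Only the URIs decide.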
The necessity of providing namespace resolution
If you have XML that uses namespaces, an XPath expression will fail if you don't provide a NamespaceContext. Example 0 in Listing 2 shows that case. The XPath object is constructed and evaluated on the loaded XML document. First, try to write the expression without any namespace prefixes (result1). In the second part, write the expression with namespace prefixes (result2).

Listing 2. Example 0 without namespace resolution

    private static void example0(Document example)
            throws XPathExpressionException, TransformerException {
        sysout("\n*** Zero example - no namespaces provided ***");

        XPath xPath = XPathFactory.newInstance().newXPath();

...
        NodeList result1 = (NodeList) xPath.evaluate("booklist/book", example,
                XPathConstants.NODESET);
...
        NodeList result2 = (NodeList) xPath.evaluate(
                "books:booklist/science:book", example, XPathConstants.NODESET);
...
    }

This results in the following output.

Listing 3. Output from example 0

*** Zero example - no namespaces provided ***
First try asking without namespace prefix:
--> booklist/book
Result is of length 0
Then try asking with namespace prefix:
--> books:booklist/science:book
Result is of length 0
The expression does not work in both cases.

In both cases, the XPath evaluation does not return any node and there is no exception. XPath cannot find a node because the mapping of the prefixes to the URIs is missing.
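To see why the unprefixed expression comes back empty, it can help to look at the names the way the DOM stores them: as a pair of namespace URI and local name. The sketch below is an illustration only; the class name ExpandedNameDemo and the shortened inline document are assumptions, not code from the download file. It lists the expanded names of the root's children and then runs an unprefixed query.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExpandedNameDemo {

    // Shortened stand-in for the sample document.
    static final String XML =
            "<books:booklist xmlns:books='http://univNaSpResolver/booklist'"
            + " xmlns:fiction='http://univNaSpResolver/fictionbook'>"
            + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'/>"
            + "<fiction:book/>"
            + "</books:booklist>";

    static Document parse() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns each child element of the root written as {namespaceURI}localName. */
    static List<String> childNames() {
        List<String> names = new ArrayList<String>();
        NodeList children = parse().getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add("{" + child.getNamespaceURI() + "}" + child.getLocalName());
            }
        }
        return names;
    }

    /** Counts the matches of an unprefixed name test; no NamespaceContext is set. */
    static int countUnprefixed() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//book", parse(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames());
        System.out.println(countUnprefixed());
    }
}
```

Both children share the local name book but carry different namespace URIs, and an unprefixed name test in XPath 1.0 only matches elements that are in no namespace at all, so the query selects nothing.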
Hardcoded namespace resolution
It is possible to supply the namespaces as hardcoded values, which might look like the class in Listing 4:

Listing 4. Hardcoded namespace resolution

public class HardcodedNamespaceResolver implements NamespaceContext {

    /**
     * This method returns the uri for all prefixes needed. Wherever possible
     * it uses XMLConstants.
     * 
     * @param prefix
     * @return uri
     */
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("No prefix provided!");
        } else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return "http://univNaSpResolver/book";
        } else if (prefix.equals("books")) {
            return "http://univNaSpResolver/booklist";
        } else if (prefix.equals("fiction")) {
            return "http://univNaSpResolver/fictionbook";
        } else if (prefix.equals("technical")) {
            return "http://univNaSpResolver/sciencebook";
        } else {
            return XMLConstants.NULL_NS_URI;
        }
    }

    public String getPrefix(String namespaceURI) {
        // Not needed in this context.
        return null;
    }

    public Iterator getPrefixes(String namespaceURI) {
        // Not needed in this context.
        return null;
    }

}

Please notice that the namespace http://univNaSpResolver/sciencebook is bound to the prefix technical (not science as before). You will see the consequences in the example below (Listing 6). In Listing 5, the code using this resolver uses the new prefix.

Listing 5. Example 1 with hardcoded namespace resolution

    private static void example1(Document example)
            throws XPathExpressionException, TransformerException {
        sysout("\n*** First example - namespacelookup hardcoded ***");

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new HardcodedNamespaceResolver());

...
        NodeList result1 = (NodeList) xPath.evaluate(
                "books:booklist/technical:book", example,
                XPathConstants.NODESET);
...
        NodeList result2 = (NodeList) xPath.evaluate(
                "books:booklist/fiction:book", example, XPathConstants.NODESET);
...
        String result = xPath.evaluate("books:booklist/technical:book/:author",
                example);
...
    }

This is the output from this example.

Listing 6. Output from example 1

*** First example - namespacelookup hardcoded ***
Using any namespaces results in a NodeList:
--> books:booklist/technical:book
Number of Nodes: 1
<?xml version="1.0" encoding="UTF-8"?>
  <science:book xmlns:science="http://univNaSpResolver/sciencebook">
    <title xmlns="http://univNaSpResolver/book">Learning XPath</title>
    <author xmlns="http://univNaSpResolver/book">Michael Schmidt</author>
  </science:book>
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust I</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust II</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
The default namespace works also:
--> books:booklist/technical:book/:author
Michael Schmidt

As you see, XPath finds the nodes now. The advantage is that you can rename the prefixes as you wish, which is what I did with the prefix science. The XML file contains the prefix science, while XPath uses another prefix, technical. Because the URIs are the same, the nodes are found by XPath. The disadvantage is that you have to maintain the namespaces in more places: the XML, perhaps the XSD, the XPath expression, and the namespace context.
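If the chain of if statements grows, the same hardcoded approach can be backed by a map, which also makes the two reverse lookups easy to implement. This is a sketch under assumptions: MapNamespaceContext and its bind method are hypothetical names that do not appear in the article's download, and the handling of the reserved xml and xmlns prefixes is reduced to the forward lookup.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class MapNamespaceContext implements NamespaceContext {

    // Insertion order is kept so that getPrefix returns the first prefix bound to a URI.
    private final Map<String, String> prefixToUri = new LinkedHashMap<String, String>();

    /** Registers a binding and returns this object so that calls can be chained. */
    public MapNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
            return XMLConstants.XML_NS_URI;
        }
        if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) {
            return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        }
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                result.add(entry.getKey());
            }
        }
        return result.iterator();
    }

    public static void main(String[] args) {
        MapNamespaceContext context = new MapNamespaceContext()
                .bind("books", "http://univNaSpResolver/booklist")
                .bind("technical", "http://univNaSpResolver/sciencebook");
        System.out.println(context.getNamespaceURI("technical"));
        System.out.println(context.getPrefix("http://univNaSpResolver/booklist"));
    }
}
```

An instance built this way can be passed to xPath.setNamespaceContext() in place of HardcodedNamespaceResolver. The maintenance burden described above stays the same; only the bookkeeping moves from code into data.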
Read the namespaces from the document
The namespaces and their prefixes are documented in the XML file, so you can use them from there. The simplest way to do this is to delegate the lookup to the document.

Listing 7. Namespace resolution directly from the document

public class UniversalNamespaceResolver implements NamespaceContext {
    // the delegate
    private Document sourceDocument;

    /**
     * This constructor stores the source document to search the namespaces in
     * it.
     * 
     * @param document
     *            source document
     */
    public UniversalNamespaceResolver(Document document) {
        sourceDocument = document;
    }

    /**
     * The lookup for the namespace uris is delegated to the stored document.
     * 
     * @param prefix
     *            to search for
     * @return uri
     */
    public String getNamespaceURI(String prefix) {
        if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return sourceDocument.lookupNamespaceURI(null);
        } else {
            return sourceDocument.lookupNamespaceURI(prefix);
        }
    }

    /**
     * This method is not needed in this context, but can be implemented in a
     * similar way.
     */
    public String getPrefix(String namespaceURI) {
        return sourceDocument.lookupPrefix(namespaceURI);
    }

    public Iterator getPrefixes(String namespaceURI) {
        // not implemented yet
        return null;
    }

}

Remember these things:

If the document is changed before XPath is used, this change will still be reflected in the lookup of the namespace, because the delegation is done when needed using the current version of the document.
The lookup for namespaces or prefixes is done in the ancestors of the used node, in our case the node sourceDocument. This means, with the code provided, you only get the namespaces declared on the root node. The namespace science in our example is not found.
The lookup is called when XPath evaluates, so it consumes some extra time.

This is the example code:

Listing 8. Example 2 with namespace resolution directly from the document

    private static void example2(Document example)
            throws XPathExpressionException, TransformerException {
        sysout("\n*** Second example - namespacelookup delegated to document ***");

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new UniversalNamespaceResolver(example));

        try {
...
            NodeList result1 = (NodeList) xPath.evaluate(
                    "books:booklist/science:book", example,
                    XPathConstants.NODESET);
...
        } catch (XPathExpressionException e) {
...
        }
...
        NodeList result2 = (NodeList) xPath.evaluate(
                "books:booklist/fiction:book", example, XPathConstants.NODESET);
...
        String result = xPath.evaluate(
                "books:booklist/fiction:book[1]/:author", example);
...
    }

The output from the example is:

Listing 9. Output from example 2

*** Second example - namespacelookup delegated to document ***
Try to use the science prefix: no result
--> books:booklist/science:book
The resolver only knows namespaces of the first level!
To be precise: Only namespaces above the node, passed in the constructor.
The fiction namespace is such a namespace:
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust I</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust II</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
The default namespace works also:
--> books:booklist/fiction:book[1]/:author
Johann Wolfgang von Goethe

As you see in the output, the namespace declared on the book element with the prefix science is not resolved. The evaluate method throws an XPathExpressionException. To get around this problem, you might extract the node science:book from the document and use this node as the delegate. But this means extra parsing of the document and is not elegant.
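The workaround of delegating to a deeper node can be sketched as follows. The class NodeDelegateDemo, its helper methods, and the shortened inline document are assumptions made for this illustration. The only DOM calls involved are Node.lookupNamespaceURI and Node.lookupPrefix, which search the given node and then its ancestors.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeDelegateDemo {

    // Shortened stand-in for the sample document, written without whitespace between elements.
    static final String XML =
            "<books:booklist xmlns:books='http://univNaSpResolver/booklist'"
            + " xmlns='http://univNaSpResolver/book'"
            + " xmlns:fiction='http://univNaSpResolver/fictionbook'>"
            + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'>"
            + "<title>Learning XPath</title></science:book>"
            + "<fiction:book><title>Faust I</title></fiction:book>"
            + "</books:booklist>";

    static Document parse() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** The first child of the root is science:book because the inline XML has no whitespace. */
    static Node scienceBook(Document doc) {
        return doc.getDocumentElement().getFirstChild();
    }

    /** Resolves prefixes by asking the given node, which in turn consults its ancestors. */
    static NamespaceContext delegateTo(final Node node) {
        return new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if (prefix == null) {
                    throw new IllegalArgumentException("prefix must not be null");
                }
                String uri = node.lookupNamespaceURI(prefix.isEmpty() ? null : prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return node.lookupPrefix(uri);
            }
            public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        };
    }

    /** Evaluates the science query with the science:book element as the delegate. */
    static int countScienceBooks() {
        try {
            Document doc = parse();
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(delegateTo(scienceBook(doc)));
            NodeList nodes = (NodeList) xpath.evaluate(
                    "books:booklist/science:book", doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countScienceBooks());
    }
}
```

Seen from the science:book element, the prefixes books and fiction are still visible because they are declared on an ancestor, while science is declared on the element itself. The drawback remains that you have to locate a suitable node before you can use XPath.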
Read the namespaces from the document and cache them
This next version of the NamespaceContext is better. It reads the namespaces only one time in advance in the constructor. Every call for a namespace is answered from a cache. As a consequence, a change in the document does not matter since the list of namespaces is cached at Java object creation time.

Listing 10. Caching the namespace resolution from the document

public class UniversalNamespaceCache implements NamespaceContext {
    private static final String DEFAULT_NS = "DEFAULT";
    private Map<String, String> prefix2Uri = new HashMap<String, String>();
    private Map<String, String> uri2Prefix = new HashMap<String, String>();

    /**
     * This constructor parses the document and stores all namespaces it can
     * find. If toplevelOnly is true, only namespaces in the root are used.
     * 
     * @param document
     *            source document
     * @param toplevelOnly
     *            restriction of the search to enhance performance
     */
    public UniversalNamespaceCache(Document document, boolean toplevelOnly) {
        examineNode(document.getFirstChild(), toplevelOnly);
        System.out.println("The list of the cached namespaces:");
        for (String key : prefix2Uri.keySet()) {
            System.out
                    .println("prefix " + key + ": uri " + prefix2Uri.get(key));
        }
    }

    /**
     * A single node is read, the namespace attributes are extracted and stored.
     * 
     * @param node
     *            to examine
     * @param attributesOnly
     *            if true no recursion happens
     */
    private void examineNode(Node node, boolean attributesOnly) {
        NamedNodeMap attributes = node.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            storeAttribute((Attr) attribute);
        }

        if (!attributesOnly) {
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE)
                    examineNode(child, false);
            }
        }
    }

    /**
     * This method looks at an attribute and stores it, if it is a namespace
     * attribute.
     * 
     * @param attribute
     *            to examine
     */
    private void storeAttribute(Attr attribute) {
        // examine the attributes in namespace xmlns
        if (attribute.getNamespaceURI() != null
                && attribute.getNamespaceURI().equals(
                        XMLConstants.XMLNS_ATTRIBUTE_NS_URI)) {
            // Default namespace xmlns="uri goes here"
            if (attribute.getNodeName().equals(XMLConstants.XMLNS_ATTRIBUTE)) {
                putInCache(DEFAULT_NS, attribute.getNodeValue());
            } else {
                // The defined prefixes are stored here
                putInCache(attribute.getLocalName(), attribute.getNodeValue());
            }
        }

    }

    private void putInCache(String prefix, String uri) {
        prefix2Uri.put(prefix, uri);
        uri2Prefix.put(uri, prefix);
    }

    /**
     * This method is called by XPath. It returns the default namespace, if the
     * prefix is null or "".
     * 
     * @param prefix
     *            to search for
     * @return uri
     */
    public String getNamespaceURI(String prefix) {
        if (prefix == null || prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return prefix2Uri.get(DEFAULT_NS);
        } else {
            return prefix2Uri.get(prefix);
        }
    }

    /**
     * This method is not needed in this context, but can be implemented in a
     * similar way.
     */
    public String getPrefix(String namespaceURI) {
        return uri2Prefix.get(namespaceURI);
    }

    public Iterator getPrefixes(String namespaceURI) {
        // Not implemented
        return null;
    }

}

Please note that there is debug output in the code. The attributes of each node are examined and stored. The children are not examined, because the boolean toplevelOnly in the constructor is set to true. If the boolean is set to false, the examination of the children will start after the attributes are stored. One thing to consider about the code: In DOM, the Document node represents the document as a whole, so to reach the root element booklist and read its namespace declarations, you have to descend to the children exactly once.
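One caveat about stepping down with getFirstChild(): the first child of the Document node is the root element only when nothing else precedes it. The sketch below uses a hypothetical class RootElementDemo and a made-up inline document with a leading comment to show the difference from getDocumentElement().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootElementDemo {

    // Made-up document: a comment precedes the root element.
    static final String XML =
            "<!-- exported catalog -->"
            + "<books:booklist xmlns:books='http://univNaSpResolver/booklist'/>";

    static Document parse() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();

        // The first child is the comment, and a comment has no attribute map at all.
        Node first = doc.getFirstChild();
        System.out.println(first.getNodeType() == Node.COMMENT_NODE);
        System.out.println(first.getAttributes() == null);

        // getDocumentElement() skips comments and processing instructions.
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName() + " has "
                + root.getAttributes().getLength() + " attribute(s)");
    }
}
```

For the sample file in this article both calls return the booklist element. With input you do not control, passing document.getDocumentElement() to examineNode would avoid a NullPointerException in the attribute loop.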
In this case, using NamespaceContext is rather simple:

Listing 11. Example 3 with cached namespace resolution (toplevel only)

    private static void example3(Document example)
            throws XPathExpressionException, TransformerException {
        sysout("\n*** Third example - namespaces of toplevel node cached ***");

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new UniversalNamespaceCache(example, true));

        try {
...
            NodeList result1 = (NodeList) xPath.evaluate(
                    "books:booklist/science:book", example,
                    XPathConstants.NODESET);
...
        } catch (XPathExpressionException e) {
...
        }
...
        NodeList result2 = (NodeList) xPath.evaluate(
                "books:booklist/fiction:book", example, XPathConstants.NODESET);
...
        String result = xPath.evaluate(
                "books:booklist/fiction:book[1]/:author", example);
...
    }

This results in the following output:

Listing 12. Output from Example 3

*** Third example - namespaces of toplevel node cached ***
The list of the cached namespaces:
prefix DEFAULT: uri http://univNaSpResolver/book
prefix fiction: uri http://univNaSpResolver/fictionbook
prefix books: uri http://univNaSpResolver/booklist
Try to use the science prefix:
--> books:booklist/science:book
The cache only knows namespaces of the first level!
The fiction namespace is such a namespace:
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust I</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust II</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
The default namespace works also:
--> books:booklist/fiction:book[1]/:author
Johann Wolfgang von Goethe

This code only finds the namespaces of the root element. To be precise: the namespaces of the node passed into the method examineNode by the constructor. This speeds up the constructor because it does not have to iterate through the whole document. However, as you can see from the output, the science prefix cannot be resolved. The XPath expression leads to an exception (XPathExpressionException).
Read the namespaces from the document and all its elements and cache them
This version reads all namespace declarations from the XML file. Now, even the XPath on the prefix science works. One situation makes this version complicated: If a prefix is overloaded (bound to different URIs in nested elements), the last declaration found wins. In the real world, this typically is not a problem.
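The last-one-wins behavior is easy to reproduce with a reduced version of the scan. The class PrefixOverloadDemo and the two-element document below are assumptions made for illustration; the traversal follows the same idea as examineNode and storeAttribute but keeps only the prefix-to-URI map.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrefixOverloadDemo {

    /** Parses the XML and collects every namespace declaration in document order. */
    static Map<String, String> collect(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Map<String, String> prefixToUri = new LinkedHashMap<String, String>();
            walk(doc.getDocumentElement(), prefixToUri);
            return prefixToUri;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stores the xmlns attributes of this element, then descends into child elements. */
    private static void walk(Element element, Map<String, String> prefixToUri) {
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attribute.getNamespaceURI())) {
                // xmlns="..." declares the default namespace, stored under the empty prefix.
                String prefix = XMLConstants.XMLNS_ATTRIBUTE.equals(attribute.getNodeName())
                        ? XMLConstants.DEFAULT_NS_PREFIX
                        : attribute.getLocalName();
                prefixToUri.put(prefix, attribute.getNodeValue()); // a later put overwrites
            }
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) children.item(i), prefixToUri);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<p:a xmlns:p='urn:outer'><p:b xmlns:p='urn:inner'/></p:a>";
        System.out.println(collect(xml));
    }
}
```

After the scan the prefix p maps to urn:inner only, so an XPath step p:a would no longer match the outer element. If your documents redeclare prefixes like this, a hardcoded context with distinct prefixes of your own is the safer choice.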
Using NamespaceContext in this example is the same as in the previous example. The boolean toplevelOnly in the constructor has to be set to false.

Listing 13. Example 4 with cached namespace resolution (all levels)

    private static void example4(Document example)
            throws XPathExpressionException, TransformerException {
        sysout("\n*** Fourth example - namespaces all levels cached ***");

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new UniversalNamespaceCache(example, false));
...
        NodeList result1 = (NodeList) xPath.evaluate(
                "books:booklist/science:book", example, XPathConstants.NODESET);
...
        NodeList result2 = (NodeList) xPath.evaluate(
                "books:booklist/fiction:book", example, XPathConstants.NODESET);
...
        String result = xPath.evaluate(
                "books:booklist/fiction:book[1]/:author", example);
...
    }

This results in the following output:

Listing 14. Output from example 4

*** Fourth example - namespaces all levels cached ***
The list of the cached namespaces:
prefix science: uri http://univNaSpResolver/sciencebook
prefix DEFAULT: uri http://univNaSpResolver/book
prefix fiction: uri http://univNaSpResolver/fictionbook
prefix books: uri http://univNaSpResolver/booklist
Now the use of the science prefix works as well:
--> books:booklist/science:book
Number of Nodes: 1
<?xml version="1.0" encoding="UTF-8"?>
  <science:book xmlns:science="http://univNaSpResolver/sciencebook">
    <title xmlns="http://univNaSpResolver/book">Learning XPath</title>
    <author xmlns="http://univNaSpResolver/book">Michael Schmidt</author>
  </science:book>
The fiction namespace is resolved:
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust I</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
  <fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
    <title xmlns="http://univNaSpResolver/book">Faust II</title>
    <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
  </fiction:book>
The default namespace works also:
--> books:booklist/fiction:book[1]/:author
Johann Wolfgang von Goethe

Conclusion
You can choose from several ideas for the implementation of the namespace resolution that might be better than hardcoding it:

If your example is small and all namespaces are located in the top element, delegating to the document will do.
If you have bigger XML files with deep nesting and multiple XPath evaluations, it might be better to cache the list of namespaces.
But if you don't have control over the XML file, and someone can send you any prefixes they wish, it might be better to be independent of their choices. You can code your own namespace resolution as in Example 1 (HardcodedNamespaceResolver) and use your own prefixes in your XPath expressions.

In all other cases, the NamespaceContext resolved from the XML file can make your code more general and smaller.

xpath in JAVA

http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html

If you send someone out to purchase a gallon of milk, what would you rather tell that person? "Please go buy a gallon of milk." Or, "Exit the house through the front door. Turn left at the sidewalk. Walk three blocks. Turn right. Walk one half block. Turn right and enter the store. Go to aisle four. Walk five meters down the aisle. Turn left. Pick up a gallon jug of milk. Bring it to the checkout counter. Pay for it. Then retrace your steps home." That's ridiculous. Most adults are intelligent enough to procure the milk on their own with little more instruction than "Please go buy a gallon of milk."
Query languages and computer search are similar. It's easier to say, "Find a copy of Cryptonomicon" than it is to write the detailed logic for searching some database. Because search operations have very similar logic, you can invent general languages that allow you to make statements like "Find all the books by Neal Stephenson," and then write an engine that processes those queries against certain data stores.
XPath
Among the many query languages, Structured Query Language (SQL) is a language designed and optimized for querying certain kinds of relational databases. Other less familiar query languages include Object Query Language (OQL) and XQuery. However, the subject of this article is XPath, a query language designed for querying XML documents. For example, a simple XPath query that finds the titles of all the books in a document whose author is Neal Stephenson might look like this:

//book[author="Neal Stephenson"]/title

By contrast, a pure DOM search for that same information would look something like Listing 1:

Listing 1. DOM code to find all the title elements of books by Neal Stephenson

ArrayList result = new ArrayList();
NodeList books = doc.getElementsByTagName("book");
for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    NodeList authors = book.getElementsByTagName("author");
    boolean stephenson = false;
    for (int j = 0; j < authors.getLength(); j++) {
        Element author = (Element) authors.item(j);
        NodeList children = author.getChildNodes();
        StringBuffer sb = new StringBuffer();
        for (int k = 0; k < children.getLength(); k++) {
            Node child = children.item(k);
            // really should do this recursively
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        if (sb.toString().equals("Neal Stephenson")) {
            stephenson = true;
            break;
        }
    }

    if (stephenson) {
        NodeList titles = book.getElementsByTagName("title");
        for (int j = 0; j < titles.getLength(); j++) {
            result.add(titles.item(j));
        }
    }

}

Believe it or not, the DOM code in Listing 1 still isn't as generic or robust as the simple XPath expression. Which would you rather write, debug, and maintain? I think the answer is obvious.
However, expressive as it is, XPath is not the Java language -- in fact, XPath is not a complete programming language. There are many things you can't say in XPath, even queries you can't make. For example, XPath can't find all the books whose International Standard Book Number (ISBN) check digit doesn't match or all the authors for whom the external accounts database shows a royalty payment is due. Fortunately, it is possible to integrate XPath into Java programs so that you get the best of both worlds: Java for what Java is good for and XPath for what XPath is good for.
Until recently, the exact application program interface (API) by which Java programs made XPath queries varied with the XPath engine. Xalan had one API, Saxon had another, and other engines had other APIs. This meant your code tended to lock you into one product. Ideally, you'd like to be able to experiment with different engines that have different performance characteristics without undue hassle or rewriting of code.
For this reason, Java 5 introduced the javax.xml.xpath package to provide an engine and object-model independent XPath library. This package is also available in Java 1.3 and later if you install Java API for XML Processing (JAXP) 1.3 separately. Among other products, Xalan 2.7 and Saxon 8 include an implementation of this library.

A simple example
I'll begin with a demonstration of how this actually works in practice. Then I'll delve into some of the details. Suppose you want to query a list of books to find those written by Neal Stephenson. In particular, assume the list is in the form shown in Listing 2:

Listing 2. XML document containing book information

<inventory>
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
 
    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>
 
    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>

    <!-- more books... -->
 
</inventory>

Abstract factories

The XPathFactory is an abstract factory. The abstract factory design pattern enables this one API to support different object models such as DOM, JDOM, and XOM. To choose a different model, you pass a Uniform Resource Identifier (URI) identifying the object model to the XPathFactory.newInstance() method. For example, http://xom.nu/ might select XOM. However, in practice, DOM is the only object model this API supports so far.
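As a small illustration of the object-model parameter, the sketch below asks for the DOM object model explicitly instead of relying on the default. The class name ObjectModelDemo and its helper methods are assumptions; XPathFactory.DEFAULT_OBJECT_MODEL_URI, newInstance(String), and isObjectModelSupported(String) are part of javax.xml.xpath.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

public class ObjectModelDemo {

    /** Requests a factory for the W3C DOM object model by its URI. */
    static XPathFactory domFactory() {
        try {
            return XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
        } catch (XPathFactoryConfigurationException e) {
            // Thrown when no installed factory supports the requested object model.
            throw new IllegalStateException(e);
        }
    }

    /** Asks the default factory whether it supports the DOM object model. */
    static boolean supportsDom() {
        return XPathFactory.newInstance()
                .isObjectModelSupported(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
    }

    public static void main(String[] args) {
        XPath xpath = domFactory().newXPath();
        System.out.println(xpath != null);
        System.out.println(supportsDom());
    }
}
```

Passing the URI of a model for which no factory is installed leads to an XPathFactoryConfigurationException, which is why the call is wrapped here.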

The XPath query that finds all the books is simple enough: //book[author="Neal Stephenson"]. To find the titles of those books, simply add one more step so the expression becomes //book[author="Neal Stephenson"]/title. Finally, what you really want are the text node children of the title element. This requires one more step so the full expression is //book[author="Neal Stephenson"]/title/text().
Now I'll produce a simple program that executes this search from the Java language and then prints out the titles of all the books it finds. First you need to load the document into a DOM Document object. For simplicity, I'll assume the document is in the books.xml file in the current working directory. Here's a simple code fragment that parses the document and constructs the corresponding Document object:

Listing 3. Parsing a document with JAXP

DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("books.xml");

So far, this is just standard JAXP and DOM, nothing really new.
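The reason for the "never forget this!" comment becomes visible as soon as a document uses namespaces. The following sketch parses the same text twice, once with each setting. The class NamespaceAwareDemo and the one-element document with the URI urn:example:books are made up for the illustration; books.xml itself uses no namespaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceAwareDemo {

    // Hypothetical one-element document that declares a namespace.
    static final String XML = "<b:inventory xmlns:b='urn:example:books'/>";

    /** Parses the document and returns the namespace URI the DOM reports for the root. */
    static String rootNamespace(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNamespace(false)); // the parser ignored the declaration
        System.out.println(rootNamespace(true));
    }
}
```

DocumentBuilderFactory is not namespace aware by default. Without the call, the DOM carries no namespace information, and XPath name tests that rely on namespaces have nothing to match against.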
Next you create an XPathFactory:

XPathFactory factory = XPathFactory.newInstance();

You then use this factory to create an XPath object:

XPath xpath = factory.newXPath();

The XPath object compiles the XPath expression:

XPathExpression expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");

Finally, you evaluate the XPath expression to get the result. The expression is evaluated with respect to a certain context node, which in this case is the entire document. It's also necessary to specify the return type. Here I ask for a node-set back:

Object result = expr.evaluate(doc, XPathConstants.NODESET);

You can then cast the result to a DOM NodeList and iterate through that to find all the titles:

NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue()); 
}

Listing 4 puts this all together into a single program. Notice also that these methods can throw several checked exceptions that I must declare in a throws clause, though I glossed over them above:

Listing 4. A complete program to query an XML document with a fixed XPath expression

                
import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XPathExample {

  public static void main(String[] args) 
   throws ParserConfigurationException, SAXException, 
          IOException, XPathExpressionException {

    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true); // never forget this!
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse("books.xml");

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    XPathExpression expr 
     = xpath.compile("//book[author='Neal Stephenson']/title/text()");

    Object result = expr.evaluate(doc, XPathConstants.NODESET);
    NodeList nodes = (NodeList) result;
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i).getNodeValue()); 
    }

  }

}

The XPath data model
Whenever you mix two different languages such as XPath and Java, expect some noticeable seams where you've glued the two together. Not everything fits just right. XPath and the Java language do not have identical type systems. XPath 1.0 has only four basic data types:

node-set
number
boolean
string

The Java language, of course, has many more, including user-defined object types.
Most XPath expressions, especially location paths, return node-sets. However, there are other possibilities. For example, the XPath expression count(//book) returns the number of books in the document. The XPath expression count(//book[author="Neal Stephenson"]) > 10 returns a boolean: true if there are more than ten books by Neal Stephenson in the document, false if there are ten or fewer.
The evaluate() method is declared to return Object. What it actually does return depends on the result of the XPath expression, as well as the type you ask for. Generally speaking, an XPath

number maps to a java.lang.Double
string maps to a java.lang.String
boolean maps to a java.lang.Boolean
node-set maps to an org.w3c.dom.NodeList

XPath 2

So far I assumed that you're working with XPath 1.0. XPath 2 significantly expands and revises the type system. The main change needed in the Java XPath API to support XPath 2 is additional constants for returning the new XPath 2 types.

When you evaluate an XPath expression in Java, the second argument specifies the return type you want. There are five possibilities, all named constants in the javax.xml.xpath.XPathConstants class:

XPathConstants.NODESET
XPathConstants.BOOLEAN
XPathConstants.NUMBER
XPathConstants.STRING
XPathConstants.NODE

The last one, XPathConstants.NODE, doesn't actually match an XPath type. You use it when you know the XPath expression will only return a single node or you don't want more than one node. If the XPath expression does return more than one node and you've specified XPathConstants.NODE, then evaluate() returns the first node in document order. If the XPath expression selects an empty set and you've specified XPathConstants.NODE, then evaluate() returns null.
If the requested conversion can't be made, then evaluate() throws an XPathExpressionException.
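
To make the return types concrete, here is a minimal sketch that evaluates a few expressions against a small in-memory document and asks for a different XPathConstants type each time. The ReturnTypeDemo class, its eval() helper, and the two-book inventory are assumptions made for this illustration; only the javax.xml.xpath and DOM calls come from the standard API.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ReturnTypeDemo {

    // Hypothetical two-book inventory, inlined so the sketch needs no file.
    static final String XML =
        "<inventory>"
      + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
      + "<book><title>Zodiac</title><author>Neal Stephenson</author></book>"
      + "</inventory>";

    // Evaluates the expression against the inlined document and returns the
    // result converted to the requested XPathConstants type.
    static Object eval(String expression, QName returnType) {
        try {
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setNamespaceAware(true);
            Document doc = domFactory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc, returnType);
        } catch (Exception e) {
            // Checked parser and XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }
}
```

With this helper, eval("count(//book)", XPathConstants.NUMBER) yields a Double, eval("//title", XPathConstants.NODE) yields the first title element in document order, and eval("//magazine", XPathConstants.NODE) yields null because nothing matches.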

Namespace contexts
If the elements in the XML document are in a namespace, then the XPath expression for querying that document must use the same namespace. The XPath expression does not need to use the same prefixes, only the same namespace URIs. Indeed, when the XML document uses the default namespace, the XPath expression must use a prefix even though the target document does not.
However, Java programs are not XML documents, so normal namespace resolution does not apply. Instead you provide an object that maps the prefixes to the namespace URIs. This object is an instance of the javax.xml.namespace.NamespaceContext interface. For example, suppose the books document is placed in the http://www.example.com/books namespace, as in Listing 5:

Listing 5. XML document using the default namespace

                
<inventory xmlns="http://www.example.com/books">
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>

    <!-- more books... -->

</inventory>

The XPath expression that finds the titles of all of Neal Stephenson's books now becomes something like //pre:book[pre:author="Neal Stephenson"]/pre:title/text(). However, you have to map the prefix pre to the URI http://www.example.com/books. It's a little silly that the NamespaceContext interface doesn't have a default implementation in the Java software development kit (JDK) or JAXP, but it doesn't. However, it's not hard to implement yourself. Listing 6 demonstrates a simple implementation just for this one namespace. You should map the xml prefix as well.

Listing 6. A simple context for binding a single namespace plus the default

                
import java.util.Iterator;
import javax.xml.*;
import javax.xml.namespace.NamespaceContext;

public class PersonalNamespaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("pre".equals(prefix)) return "http://www.example.com/books";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

}

It's not hard to use a map to store the bindings and add setter methods that allow for a more reusable namespace context.
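
One possible shape for such a map-backed context is sketched below. The class name MapNamespaceContext and its bind() method are assumptions for this example, not part of the JDK; the three overridden methods are the ones the NamespaceContext interface declares.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

final class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> uriByPrefix = new HashMap<>();

    MapNamespaceContext() {
        // The xml prefix is always bound to the same URI.
        uriByPrefix.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
    }

    // Adds or replaces a binding; returns this so calls can be chained.
    MapNamespaceContext bind(String prefix, String uri) {
        if (prefix == null || uri == null) {
            throw new IllegalArgumentException("Null prefix or URI");
        }
        uriByPrefix.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        String uri = uriByPrefix.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String uri) {
        Iterator<String> prefixes = getPrefixes(uri);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        if (uri == null) throw new IllegalArgumentException("Null URI");
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(uri)) matches.add(entry.getKey());
        }
        return matches.iterator();
    }
}
```

A context for the books document could then be built with new MapNamespaceContext().bind("pre", "http://www.example.com/books").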
After you create a NamespaceContext object, install it on the XPath object before you compile the expression. From that point forward, you can query using those prefixes as before. For example:

Listing 7. XPath query that uses namespaces

  XPathFactory factory = XPathFactory.newInstance();
  XPath xpath = factory.newXPath();
  xpath.setNamespaceContext(new PersonalNamespaceContext());
  XPathExpression expr 
    = xpath.compile("//pre:book[pre:author='Neal Stephenson']/pre:title/text()");

  Object result = expr.evaluate(doc, XPathConstants.NODESET);
  NodeList nodes = (NodeList) result;
  for (int i = 0; i < nodes.getLength(); i++) {
      System.out.println(nodes.item(i).getNodeValue()); 
  }

Function resolvers
On occasion, it's useful to define extension functions in the Java language for use within XPath expressions. These functions perform tasks that are difficult or impossible to perform with pure XPath. However, they should be true functions, not simply arbitrary methods. That is, they should have no side-effects. (XPath functions can be evaluated in any order and any number of times.)
Extension functions accessed through the Java XPath API must implement the javax.xml.xpath.XPathFunction interface. This interface declares a single method, evaluate:

public Object evaluate(List args) throws XPathFunctionException

This method should return one of the five types that the Java language can convert to XPath:

String
Double
Boolean
NodeList
Node

For example, Listing 8 shows an extension function that verifies the checksum in an ISBN and returns a Boolean. The basic rule for this checksum is that each of the first nine digits is multiplied by its position (that is, the first digit times one, the second digit times two, and so on). These values are added, and the remainder after the division by eleven is taken. If the remainder is ten, then the last digit is X.

Listing 8. An XPath extension function for checking ISBNs

                
import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class ISBNValidator implements XPathFunction {

  // This class could easily be implemented as a Singleton.
    
  public Object evaluate(List args) throws XPathFunctionException {

    if (args.size() != 1) {
      throw new XPathFunctionException("Wrong number of arguments to valid-isbn()");
    }

    String isbn;
    Object o = args.get(0);

    // perform conversions
    if (o instanceof String) isbn = (String) args.get(0);
    else if (o instanceof Boolean) isbn = o.toString();
    else if (o instanceof Double) isbn = o.toString();
    else if (o instanceof NodeList) {
        NodeList list = (NodeList) o;
        // An empty node-set has no ISBN to check.
        if (list.getLength() == 0) return Boolean.FALSE;
        Node node = list.item(0);
        // getTextContent is available in Java 5 and DOM 3.
        // In Java 1.4 and DOM 2, you'd need to recursively 
        // accumulate the content.
        isbn = node.getTextContent();
    }
    else {
        throw new XPathFunctionException("Could not convert argument type");
    }

    char[] data = isbn.toCharArray();
    if (data.length != 10) return Boolean.FALSE;
    int checksum = 0;
    for (int i = 0; i < 9; i++) {
        checksum += (i+1) * (data[i]-'0');
    }
    int checkdigit = checksum % 11;

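    // A remainder of 10 is written as 'X'; any other remainder must match the digit itself.
    if ((checkdigit < 10 && checkdigit + '0' == data[9]) || (data[9] == 'X' && checkdigit == 10)) {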
        return Boolean.TRUE;
    }
    return Boolean.FALSE;

  }

}

The next step is to make the extension function available to the Java program. To do this, you install a javax.xml.xpath.XPathFunctionResolver in the XPath object before compiling the expression. The function resolver maps an XPath name and namespace URI for the function to the Java class that implements the function. Listing 9 is a simple function resolver that maps the extension function valid-isbn with the namespace http://www.example.com/books to the class in Listing 8. For example, the XPath expression //book[not(pre:valid-isbn(isbn))] finds all the books whose ISBN checksum doesn't match.

Listing 9. A function context that recognizes the valid-isbn extension function

                
import javax.xml.namespace.QName;
import javax.xml.xpath.*;

public class ISBNFunctionContext implements XPathFunctionResolver {

  private static final QName name 
   = new QName("http://www.example.com/books", "valid-isbn");

  public XPathFunction resolveFunction(QName name, int arity) {
      if (name.equals(ISBNFunctionContext.name) && arity == 1) {
          return new ISBNValidator();
      }
      return null;
  }

}

Because extension functions must be in namespaces, you must use a NamespaceContext when evaluating an expression containing extension functions, even if the document being queried doesn't use namespaces at all. Because XPathFunctionResolver, XPathFunction, and NamespaceContext are interfaces, you can even implement them all in the same class, if that's convenient.
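
The sketch below shows one way to wire these pieces together, with a single class playing all three roles. It is an illustration under assumptions rather than the article's own listing: the IsbnExtensionDemo class, its isValid() helper, and the two-book document (whose second ISBN has a deliberately wrong check digit) are invented for this example. The document itself uses no namespace; only the function name does.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;
import javax.xml.xpath.XPathFunctionResolver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IsbnExtensionDemo
        implements NamespaceContext, XPathFunctionResolver, XPathFunction {

    static final String NS = "http://www.example.com/books";

    // Hypothetical data: the second ISBN has a deliberately wrong check digit.
    static final String XML =
        "<inventory>"
      + "<book><title>Snow Crash</title><isbn>0553380958</isbn></book>"
      + "<book><title>Tampered</title><isbn>0553380959</isbn></book>"
      + "</inventory>";

    // NamespaceContext: binds only the prefix used by the extension function.
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        return "pre".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
    }
    public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
    public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }

    // XPathFunctionResolver: maps {NS}valid-isbn with one argument to this object.
    public XPathFunction resolveFunction(QName name, int arity) {
        return new QName(NS, "valid-isbn").equals(name) && arity == 1 ? this : null;
    }

    // XPathFunction: accepts a node-set or anything convertible to a string.
    @SuppressWarnings("rawtypes")
    public Object evaluate(List args) throws XPathFunctionException {
        if (args.size() != 1) throw new XPathFunctionException("Expected one argument");
        Object arg = args.get(0);
        String isbn;
        if (arg instanceof NodeList) {
            NodeList list = (NodeList) arg;
            isbn = list.getLength() == 0 ? "" : list.item(0).getTextContent();
        } else {
            isbn = String.valueOf(arg);
        }
        return Boolean.valueOf(isValid(isbn.trim()));
    }

    // Checksum rule described above: weighted sum of the first nine digits, mod 11.
    static boolean isValid(String isbn) {
        if (!isbn.matches("\\d{9}[\\dX]")) return false;
        int sum = 0;
        for (int i = 0; i < 9; i++) sum += (i + 1) * (isbn.charAt(i) - '0');
        int check = sum % 11;
        char last = isbn.charAt(9);
        return check == 10 ? last == 'X' : last == (char) ('0' + check);
    }

    // Returns the title of the first book whose ISBN fails the check.
    static String firstInvalidTitle() {
        try {
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setNamespaceAware(true);
            Document doc = domFactory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            IsbnExtensionDemo self = new IsbnExtensionDemo();
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(self);        // resolves the pre prefix
            xpath.setXPathFunctionResolver(self);   // resolves pre:valid-isbn
            return xpath.evaluate("//book[not(pre:valid-isbn(isbn))]/title", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both setNamespaceContext() and setXPathFunctionResolver() are called before the expression is evaluated, which matches the ordering described earlier for compiled expressions.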

In conclusion
It is far, far easier to write queries in declarative languages, like SQL and XPath, than in imperative languages, like Java and C. It is far, far easier to write complex logic in Turing complete languages, like Java and C, than in declarative languages, like SQL and XPath. Fortunately, it's possible to mix the two using APIs such as Java Database Connectivity (JDBC) and javax.xml.xpath. As more and more of the world's data moves to XML, javax.xml.xpath will become as important as java.sql already is.

Sunday, May 27, 2012

How to pass array values in query string

http://forums.asp.net/t/1370240.aspx/1?How+to+pass+array+values+in+query+string

use cache...

1.aspx
ArrayList arr = new ArrayList();
arr.Insert(0, "file1");
arr.Insert(1, "file2");
arr.Insert(2, "file3");
Cache["test"] = arr;
arr = null;
Response.Redirect("2.aspx");

2.aspx
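// Cache["test"] holds the ArrayList itself, so cast it back rather than
// adding it as a single element of a new list.
ArrayList ary = (ArrayList)Cache["test"];
Cache.Remove("test");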


another way

1.aspx
ArrayList arr = new ArrayList();
arr.Add("file1");
arr.Add("file2");
arr.Add("file3");
string arry = String.Join(",", ((string[])arr.ToArray(typeof(String))));
Response.Redirect("2.aspx?file=" + arry);

2.aspx
string[] files = Request["file"].ToString().Split(',');
ArrayList arry = new ArrayList();
foreach (string file in files)
{
  arry.Add(file);
}

Friday, May 25, 2012

Dependency injection and Unit Testing

Dependency injection is a software design pattern that allows a choice of component to be made at run-time rather than compile time. This can be used, for example, as a simple way to load plugins dynamically or to choose mock objects in test environments vs. real objects in production environments.

Definition

Dependency injection involves at least three elements:

a dependent consumer,
a declaration of a component's dependencies, defined as interface contracts,
an injector (sometimes referred to as a provider or container) that creates instances of classes that implement a given dependency interface on request.

The dependent object describes what software component it depends on to do its work. The injector decides what concrete classes satisfy the requirements of the dependent object, and provides them to the dependent.
In conventional software development the dependent object decides for itself what concrete classes it will use. In the dependency injection pattern, this decision is delegated to the "injector" which can choose to substitute different concrete class implementations of a dependency contract interface at run-time rather than at compile time.
Being able to make this decision at run-time rather than compile time is the key advantage of dependency injection. Multiple, different implementations of a single software component can be created at run-time and passed into (injected) the same test code. The test code can then test each different software component without being aware that what has been injected is implemented differently.

Motivation

The primary purpose of the dependency injection pattern is to allow selection among multiple implementations of a given dependency interface at runtime, or via configuration files, instead of at compile time. The pattern is particularly useful for providing "mock" test implementations of complex components when testing, but it is also often used for locating plugin components, or locating and initializing software services.
Unit testing of components in large software systems is difficult, because components under test often require the presence of a substantial amount of infrastructure and set up in order to operate at all. Dependency injection simplifies the process of bringing up a working instance of an isolated component for testing. Because components declare their dependencies, a test can automatically bring up only those dependent components required to perform testing.
More importantly, injectors can be configured to swap in simplified "mock" implementations of dependent components when testing -- the idea being that the component under test can be tested in isolation as long as the substituted dependent components implement the contract of the dependent interface sufficiently to perform the unit test in question.
As an example, consider an automatic stock trading program that communicates with a live online trading service and stores historical analytic data in a distributed database. To test the component which recommends trades, one would ordinarily need to have a connection to the online service, and an actual distributed database, suitably populated with test data.
Using dependency injection, the components that provide access to the online service and back-end databases could be replaced altogether with a test implementation of the dependency interface contracts that provide just enough behavior to perform tests on the component under test.

Basics

Without dependency injection, a consumer component that needs a particular service in order to accomplish a task must create an instance of a class that concretely implements the dependency interface.
When using dependency injection, a consumer component specifies the service contract by interface, and the injector component selects an implementation on behalf of the dependent component.
In its simplest implementation, code that creates a dependent object supplies dependencies to that object via constructor arguments or by setting properties on the object.
More complicated implementations, such as Spring, Google Guice, and Microsoft Managed Extensibility Framework (MEF), automate this procedure. These frameworks identify constructor arguments or properties on the objects being created as requests for dependent objects, and automatically inject constructor arguments or set properties with pre-constructed instances of dependencies as part of the process of creating the dependent object. The client makes a request to the dependency injection system for an implementation of a particular interface; the dependency injection system creates the object, automatically filling in dependencies as required.

Code illustration using Java

Using the stock trading example mentioned above, the following Java examples show how coupled (manually injected) dependencies and framework-injected dependencies are typically staged.
The following interface contracts define the behavior of components in the sample system.

public interface IOnlineBrokerageService {
    String[] getStockSymbols();
    double getAskingPrice(String stockSymbol);
    double getOfferPrice(String stockSymbol);
    void putBuyOrder(String stockSymbol, int shares, double bidPrice);
    void putSellOrder(String stockSymbol, int shares, double offerPrice);
}
 
public interface IStockAnalysisService {
    double getEstimatedValue(String stockSymbol);
}
 
public interface IAutomatedStockTrader {
    void executeTrades();
}

Highly coupled dependency

The following example shows code with no dependency injection applied:

public class VerySimpleStockTraderImpl implements IAutomatedStockTrader {
    private IStockAnalysisService analysisService = new StockAnalysisServiceImpl();
    private IOnlineBrokerageService brokerageService = new NewYorkStockExchangeBrokerageServiceImpl();
 
    public void executeTrades() {
        for (String stockSymbol : brokerageService.getStockSymbols()) {
            double askPrice = brokerageService.getAskingPrice(stockSymbol);
            double estimatedValue = analysisService.getEstimatedValue(stockSymbol);
            if (askPrice < estimatedValue) {
                brokerageService.putBuyOrder(stockSymbol, 100, askPrice);
            }
        }
    }
}
 
public class MyApplication {
    public static void main(String[] args) {
        IAutomatedStockTrader stockTrader = new VerySimpleStockTraderImpl();
        stockTrader.executeTrades();
    }
}

The VerySimpleStockTraderImpl class creates instances of IStockAnalysisService and IOnlineBrokerageService by hard-coding constructor calls to the concrete classes that implement those services.

Manually injected dependency

Refactoring the above example to use manual injection:

public class VerySimpleStockTraderImpl implements IAutomatedStockTrader {
    private IStockAnalysisService analysisService;
    private IOnlineBrokerageService brokerageService;
 
    public VerySimpleStockTraderImpl(
            IStockAnalysisService analysisService,
            IOnlineBrokerageService brokerageService) {
        this.analysisService = analysisService;
        this.brokerageService = brokerageService;
    }
    public void executeTrades() {
        …
    }
}
 
public class MyApplication {
    public static void main(String[] args) {
        IStockAnalysisService analysisService = new StockAnalysisServiceImpl();
        IOnlineBrokerageService brokerageService = new NewYorkStockExchangeBrokerageServiceImpl();
 
        IAutomatedStockTrader stockTrader = new VerySimpleStockTraderImpl(
            analysisService,
            brokerageService);
        stockTrader.executeTrades();
    }
}

In this example, MyApplication.main plays the role of dependency injector, selecting the concrete implementations of the dependencies required by VerySimpleStockTraderImpl, and supplying those dependencies via constructor injection.

Automatically injected dependency

There are several frameworks available that automate dependency management through delegation. Typically, this is done with a container using XML or metadata definitions. Refactoring the above example to use an external XML-definition framework:

    <contract id="IAutomatedStockTrader">
        <implementation>VerySimpleStockTraderImpl</implementation>
    </contract>
    <contract id="IStockAnalysisService" singleton="true">
        <implementation>StockAnalysisServiceImpl</implementation>
    </contract>
    <contract id="IOnlineBrokerageService" singleton="true">
        <implementation>NewYorkStockExchangeBrokerageServiceImpl</implementation>
    </contract>

public class VerySimpleStockTraderImpl implements IAutomatedStockTrader {
    private IStockAnalysisService analysisService;
    private IOnlineBrokerageService brokerageService;
 
    public VerySimpleStockTraderImpl(
            IStockAnalysisService analysisService,
            IOnlineBrokerageService brokerageService) {
        this.analysisService = analysisService;
        this.brokerageService = brokerageService;
    }
    public void executeTrades() {
        …
    }
}
 
public class MyApplication {
    public static void main(String[] args) {
        IAutomatedStockTrader stockTrader =
            (IAutomatedStockTrader) DependencyManager.create(IAutomatedStockTrader.class);
        stockTrader.executeTrades();
    }
}

In this case, a dependency injection service is used to retrieve an instance of a class that implements the IAutomatedStockTrader contract. From the configuration file the DependencyManager determines that it must create an instance of the VerySimpleStockTraderImpl class. By examining the constructor arguments via reflection, the DependencyManager further determines that the VerySimpleStockTraderImpl class has two dependencies; so it creates instances of the IStockAnalysisService and IOnlineBrokerageService, and supplies those dependencies as constructor arguments.
As there are many ways to implement dependency injection, only a small subset of examples is shown here. Dependencies can be registered, bound, located, externally injected, etc., by many different means. Hence, moving dependency management from one module to another can be accomplished in many ways.
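
The DependencyManager used above is never defined in this article. As a rough idea of how such a container could resolve constructor arguments via reflection, here is a minimal sketch. Everything in it is hypothetical: TinyInjector stands in for the DependencyManager, bindings are registered in code rather than read from XML, and PriceFeed, FixedPriceFeed, and Trader are toy types much smaller than the trading example. It assumes every implementation class declares exactly one constructor.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract and implementations used only by this sketch.
interface PriceFeed {
    double price(String symbol);
}

final class FixedPriceFeed implements PriceFeed {
    FixedPriceFeed() { }
    public double price(String symbol) { return 42.0; }
}

final class Trader {
    final PriceFeed feed;
    Trader(PriceFeed feed) { this.feed = feed; }
}

final class TinyInjector {

    // Maps a contract (usually an interface) to the class that implements it.
    private final Map<Class<?>, Class<?>> bindings = new HashMap<>();

    <T> TinyInjector register(Class<T> contract, Class<? extends T> implementation) {
        bindings.put(contract, implementation);
        return this;
    }

    // Creates an instance of the requested type, building its constructor
    // arguments recursively from the registered bindings.
    <T> T create(Class<T> type) {
        Class<?> impl = bindings.getOrDefault(type, type);
        if (impl.isInterface()) {
            throw new IllegalStateException("No implementation registered for " + type.getName());
        }
        try {
            // Assumption: each implementation declares exactly one constructor.
            Constructor<?> ctor = impl.getDeclaredConstructors()[0];
            ctor.setAccessible(true);
            Class<?>[] paramTypes = ctor.getParameterTypes();
            Object[] args = new Object[paramTypes.length];
            for (int i = 0; i < paramTypes.length; i++) {
                args[i] = create(paramTypes[i]); // each parameter is itself a dependency
            }
            return type.cast(ctor.newInstance(args));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not construct " + impl.getName(), e);
        }
    }
}
```

With the binding registered through new TinyInjector().register(PriceFeed.class, FixedPriceFeed.class), a call to create(Trader.class) builds a FixedPriceFeed first and passes it to the Trader constructor. Real containers add scopes, cycle detection, and configuration formats that this sketch leaves out.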

Unit testing using injected mock implementations

Testing a stock trading application against a live brokerage service might have disastrous consequences. Dependency injection can be used to substitute test implementations in order to simplify unit testing. In the example given below, the unit test registers replacement implementations of the IOnlineBrokerageService and IStockAnalysisService in order to perform tests, and validate the behavior of VerySimpleStockTraderImpl.

public class VerySimpleStockBrokerTest {
    // Simplified "mock" implementation of IOnlineBrokerageService.
    public static class MockBrokerageService implements IOnlineBrokerageService {
        public String[] getStockSymbols() { 
            return new String[] {"ACME"};
        }
        public double getAskingPrice(String stockSymbol) {
            return 100.0; // (just enough to complete the test)
        }
        public double getOfferPrice(String stockSymbol) { 
            return 100.0;
        }
        public void putBuyOrder(String stockSymbol, int shares, double bidPrice) {
             Assert.fail("Should not buy ACME stock!");
        }
        public void putSellOrder(String stockSymbol, int shares, double offerPrice) {
             // not used in this test.
             throw new UnsupportedOperationException();
        }
    }
 
    public static class MockAnalysisService implements IStockAnalysisService {
        public double getEstimatedValue(String stockSymbol) {
            if (stockSymbol.equals("ACME")) return 1.0;
            return 100.0;
        }
    }
 
    public void TestVerySimpleStockTraderImpl() {
        // Direct the DependencyManager to use test implementations.
        DependencyManager.register(
            IOnlineBrokerageService.class,
            MockBrokerageService.class);
        DependencyManager.register(
            IStockAnalysisService.class,
            MockAnalysisService.class);
 
        IAutomatedStockTrader stockTrader =
            (IAutomatedStockTrader) DependencyManager.create(IAutomatedStockTrader.class);
        stockTrader.executeTrades();
    }
}

Benefits

One benefit of using the dependency injection approach is the reduction of boilerplate code in the application objects since all work to initialize or set up dependencies is handled by a provider component.^[1]
Another benefit is that it offers configuration flexibility because alternative implementations of a given service can be used without recompiling code. This is useful in unit testing, as it is easy to inject a fake implementation of a service into the object being tested by changing the configuration file, or overriding component registrations at run-time.
Furthermore, dependency injection facilitates the writing of testable code.

Types

Martin Fowler identifies three ways in which an object can get a reference to an external module, according to the pattern used to provide the dependency:^[2]

Type 1 or interface injection, in which the exported module provides an interface that its users must implement in order to get the dependencies at runtime.
Type 2 or setter injection, in which the dependent module exposes a setter method that the framework uses to inject the dependency.
Type 3 or constructor injection, in which the dependencies are provided through the class constructor.
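
The three types can be placed side by side in a short sketch. All names here (Clock, ClockAware, and the three consumer classes) are hypothetical and chosen only to keep the example small; the point is where the dependency enters each consumer.

```java
// The dependency contract shared by all three variants.
interface Clock {
    long now();
}

// Type 1, interface injection: the module that supplies the dependency also
// exports an injection interface that consumers implement to receive it.
interface ClockAware {
    void injectClock(Clock clock);
}

final class InterfaceInjected implements ClockAware {
    private Clock clock;
    public void injectClock(Clock clock) { this.clock = clock; }
    long stamp() { return clock.now(); }
}

// Type 2, setter injection: the consumer exposes a setter for the injector to call.
final class SetterInjected {
    private Clock clock;
    void setClock(Clock clock) { this.clock = clock; }
    long stamp() { return clock.now(); }
}

// Type 3, constructor injection: the dependency is required at construction,
// so the object can never exist without it.
final class ConstructorInjected {
    private final Clock clock;
    ConstructorInjected(Clock clock) { this.clock = clock; }
    long stamp() { return clock.now(); }
}
```

Constructor injection is the variant used in the trading examples above, since VerySimpleStockTraderImpl receives both services through its constructor.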

It is possible for other frameworks to have other types of injection, beyond those presented above.

Initialization-on-demand holder idiom

http://en.wikipedia.org/wiki/Initialization_on_demand_holder_idiom

In software engineering, the initialization-on-demand holder idiom is a design pattern for a lazy-loaded singleton. The idiom can be implemented in both single-threaded (serial) and concurrent environments, but care must be taken to implement it correctly under concurrent conditions.

Example Java Implementation

This implementation is a well-performing and concurrent implementation valid in all versions of Java. The original implementation from Bill Pugh (see links below), based on the earlier work of Steve Quirk, has been modified to reduce the scope of LazyHolder.INSTANCE to private and to make the field final.

public class Something {
        private Something() {
        }
 
        private static class LazyHolder {
                private static final Something INSTANCE = new Something();
        }
 
        public static Something getInstance() {
                return LazyHolder.INSTANCE;
        }
}

How it works

The implementation relies on the well-specified initialization phase of execution within the Java Virtual Machine (JVM); see section 12.4 of Java Language Specification (JLS) for details.
When the class Something is loaded by the JVM, the class goes through initialization. Since the class does not have any static variables to initialize, the initialization completes trivially. The static class definition LazyHolder within it is not initialized until the JVM determines that LazyHolder must be executed. The static class LazyHolder is only executed when the static method getInstance is invoked on the class Something, and the first time this happens the JVM will load and initialize the LazyHolder class. The initialization of the LazyHolder class results in static variable INSTANCE being initialized by executing the (private) constructor for the outer class Something. Since the class initialization phase is guaranteed by the JLS to be serial, i.e., non-concurrent, no further synchronization is required in the static getInstance method during loading and initialization. And since the initialization phase writes the static variable INSTANCE in a serial operation, all subsequent concurrent invocations of the getInstance will return the same correctly initialized INSTANCE without incurring any additional synchronization overhead.
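
To see the laziness in action, the following sketch adds a construction counter to the idiom. The counter is an addition made only for this demonstration (so, unlike Something, this outer class does have a static variable), and the class name LazySingleton is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazySingleton {

    // Counts constructor calls; present only to make the timing observable.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    private LazySingleton() {
        CONSTRUCTED.incrementAndGet();
    }

    // Not initialized until getInstance() first touches INSTANCE.
    private static class LazyHolder {
        private static final LazySingleton INSTANCE = new LazySingleton();
    }

    static LazySingleton getInstance() {
        return LazyHolder.INSTANCE;
    }
}
```

Reading LazySingleton.CONSTRUCTED initializes the outer class but not LazyHolder, so the counter stays at zero until the first call to getInstance(); later calls return the same object without constructing another.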

When to use it

Use this pattern if the initialization of the class is expensive and it cannot be done safely at class-loading time and the initialization is highly concurrent. The crux of the pattern is the safe removal of the synchronization overhead associated with accessing a singleton instance.

When not to use it

Avoid this idiom if the construction of INSTANCE can fail. If construction of INSTANCE fails, the first invocation of Something.getInstance() results in a java.lang.ExceptionInInitializerError, and every later invocation results in a java.lang.NoClassDefFoundError because the JVM does not attempt to initialize LazyHolder again. Handling, or mishandling, of these types of construction initialization failures is a common criticism of this idiom and the singleton pattern in general.
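
A small sketch makes the failure mode visible. The Fragile class below is hypothetical: its constructor always throws, and the attempt() helper reports which error class escapes getInstance() on each call.

```java
final class Fragile {

    private Fragile() {
        // Simulates a constructor that fails, for example on a missing resource.
        throw new IllegalStateException("construction failed");
    }

    private static class LazyHolder {
        private static final Fragile INSTANCE = new Fragile();
    }

    static Fragile getInstance() {
        return LazyHolder.INSTANCE;
    }

    // Returns the simple name of the error raised by getInstance(), or "ok".
    static String attempt() {
        try {
            getInstance();
            return "ok";
        } catch (LinkageError e) {
            // Both ExceptionInInitializerError and NoClassDefFoundError are LinkageErrors.
            return e.getClass().getSimpleName();
        }
    }
}
```

In a fresh JVM the first attempt() reports ExceptionInInitializerError, wrapping the original exception; every attempt after that reports NoClassDefFoundError, which no longer carries the original cause in an obvious way. That loss of context is what makes these failures awkward to diagnose.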

Volatile variable

Generally speaking, the volatile keyword is intended to prevent the compiler from applying any optimizations on the code that assume values of variables cannot change "on their own."

The Java programming language also has the volatile keyword, but it is used for a somewhat different purpose. When applied to a field, the Java volatile guarantees that:

(In all versions of Java) There is a global ordering on the reads and writes to a volatile variable. This implies that every thread accessing a volatile field will read its current value before continuing, instead of (potentially) using a cached value. (However, there is no guarantee about the relative ordering of volatile reads and writes with regular reads and writes, meaning that it's generally not a useful threading construct.)
(In Java 5 or later) Volatile reads and writes establish a happens-before relationship, much like acquiring and releasing a mutex.^[8]

Using volatile may be faster than a lock, but it will not work in some situations. The range of situations in which volatile is effective was expanded in Java 5; in particular, double-checked locking now works correctly.
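
The happens-before guarantee can be illustrated with a simple hand-off between two threads. This is a sketch under the Java 5 and later memory model; the class and method names (VolatileHandoff, publish, awaitPayload, roundTrip) are invented for the example.

```java
final class VolatileHandoff {

    private int payload;             // plain, non-volatile field
    private volatile boolean ready;  // volatile flag that guards the payload

    void publish(int value) {
        payload = value; // ordinary write...
        ready = true;    // ...published by the volatile write that follows it
    }

    int awaitPayload() {
        while (!ready) {  // volatile read; spins until the writer has published
            Thread.yield();
        }
        return payload;   // sees the published value because of happens-before
    }

    // Starts a reader thread, publishes a value, and returns what the reader saw.
    static int roundTrip(int value) {
        VolatileHandoff handoff = new VolatileHandoff();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = handoff.awaitPayload());
        reader.start();
        handoff.publish(value);
        try {
            reader.join(); // join() makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

If ready were not volatile, the reader would have no guarantee of ever observing the flag change, and no guarantee of seeing the payload written before it.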

Double Checked Locking

http://en.wikipedia.org/wiki/Double-checked_locking

In software engineering, double-checked locking (also known as "double-checked locking optimization"^[1]) is a software design pattern used to reduce the overhead of acquiring a lock by first testing the locking criterion (the "lock hint") without actually acquiring the lock. Only if the locking criterion check indicates that locking is required does the actual locking logic proceed.
The pattern, when implemented in some language/hardware combinations, can be unsafe. At times, it can be considered an anti-pattern.^[2]
It is typically used to reduce locking overhead when implementing "lazy initialization" in a multi-threaded environment, especially as part of the Singleton pattern. Lazy initialization avoids initializing a value until the first time it is accessed.

Usage in Java

Consider, for example, this code segment in the Java programming language, as given by [3] (the source of the other Java code segments in this section as well):

// Single threaded version
class Foo {
    private Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }
 
    // other functions and members...
}

The problem is that this does not work when using multiple threads. A lock must be obtained in case two threads call getHelper() simultaneously. Otherwise, either they may both try to create the object at the same time, or one may wind up getting a reference to an incompletely initialized object.
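The first failure mode (both threads creating the object) can be forced deterministically in a sketch. Everything here is an assumption made for the demonstration: the class UnsafeLazyDemo, the construction counter, and in particular the CyclicBarrier, which is a test hook that holds each thread between the null check and the assignment until the other thread has also passed the check.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names are illustrative only.
class UnsafeLazyDemo {
    static final AtomicInteger constructed = new AtomicInteger();

    static class Helper {
        Helper() { constructed.incrementAndGet(); }  // count every construction
    }

    private Helper helper = null;
    private final CyclicBarrier bothChecked = new CyclicBarrier(2);

    // Same check-then-act logic as the single-threaded version,
    // plus a pause that widens the race window.
    Helper getHelper() {
        if (helper == null) {
            pause();                 // test hook: wait until both threads saw null
            helper = new Helper();
        }
        return helper;
    }

    private void pause() {
        try {
            bothChecked.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs two threads through getHelper() and reports how many Helpers were built.
    static int run() {
        constructed.set(0);
        UnsafeLazyDemo foo = new UnsafeLazyDemo();
        Thread a = new Thread(foo::getHelper);
        Thread b = new Thread(foo::getHelper);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return constructed.get();
    }
}
```

Because neither thread can assign helper until both have seen it as null, run() returns 2: the object is created twice. Without the barrier the same interleaving is possible but only happens occasionally.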
The straightforward fix is to synchronize the method, which acquires a lock on every call and can be expensive, as shown in the following example.

// Correct but possibly expensive multithreaded version
class Foo {
    private Helper helper = null;
    public synchronized Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }
 
    // other functions and members...
}

However, only the first call to getHelper() creates the object, and only the few threads that call it during that time need to be synchronized; every later call just returns a reference to the member variable. Since synchronizing a method can decrease performance by a factor of 100 or higher,^[3] acquiring and releasing a lock on every call appears unnecessary once the initialization has been completed. Many programmers have attempted to optimize this situation in the following manner:

Check that the variable is initialized (without obtaining the lock). If it is initialized, return it immediately.
Obtain the lock.
Double-check whether the variable has already been initialized: if another thread acquired the lock first, it may have already done the initialization. If so, return the initialized variable.
Otherwise, initialize and return the variable.

// Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
    private Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {
            synchronized(this) {
                if (helper == null) {
                    helper = new Helper();
                }
            }
        }
        return helper;
    }
 
    // other functions and members...
}

Intuitively, this algorithm seems like an efficient solution to the problem. However, this technique has many subtle problems and should usually be avoided. For example, consider the following sequence of events:

Thread A notices that the value is not initialized, so it obtains the lock and begins to initialize the value.
Due to the semantics of some programming languages, the code generated by the compiler is allowed to update the shared variable to point to a partially constructed object before A has finished performing the initialization. For example, in Java if a call to a constructor has been inlined then the shared variable may immediately be updated once the storage has been allocated but before the inlined constructor initializes the object.^[4]
Thread B notices that the shared variable has been initialized (or so it appears), and returns its value. Because thread B believes the value is already initialized, it does not acquire the lock. If B uses the object before all of the initialization done by A is visible to B, either because A has not finished initializing it or because some of the initialized values have not yet percolated to the memory B uses (cache coherence), the program will likely crash.

One of the dangers of using double-checked locking in J2SE 1.4 (and earlier versions) is that it will often appear to work: it is not easy to distinguish between a correct implementation of the technique and one that has subtle problems. Depending on the compiler, the interleaving of threads by the scheduler and the nature of other concurrent system activity, failures resulting from an incorrect implementation of double-checked locking may only occur intermittently. Reproducing the failures can be difficult.
As of J2SE 5.0, this problem has been fixed: the strengthened semantics of the volatile keyword now ensure that multiple threads handle the singleton instance correctly, provided the shared field is declared volatile. This new idiom is described in [4]:

// Works with acquire/release semantics for volatile
// Broken under Java 1.4 and earlier semantics for volatile
class Foo {
    private volatile Helper helper = null;
    public Helper getHelper() {
        Helper result = helper;
        if (result == null) {
            synchronized(this) {
                result = helper;
                if (result == null) {
                    helper = result = new Helper();
                }
            }
        }
        return result;
    }
 
    // other functions and members...
}

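Note the use of the local variable result, which seems unnecessary. Its effect is that, once helper has been initialized (the common case), the volatile field is read only once instead of twice. For some versions of the Java VM this makes the code 25% faster, and for others it does no harm.^[5]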
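A small harness can exercise the volatile idiom under contention. This is a sketch under stated assumptions: the class DclDemo, the construction counter and the start latch are hypothetical additions for the test, while getHelper() follows the listing above. A passing run does not prove correctness; it only shows the expected behavior.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names are illustrative only.
class DclDemo {
    static final AtomicInteger constructed = new AtomicInteger();

    static class Helper {
        Helper() { constructed.incrementAndGet(); }  // count every construction
    }

    private volatile Helper helper = null;

    // Double-checked locking with a volatile field (Java 5 or later).
    Helper getHelper() {
        Helper result = helper;              // single volatile read on the fast path
        if (result == null) {
            synchronized (this) {
                result = helper;             // second check, now under the lock
                if (result == null) {
                    helper = result = new Helper();
                }
            }
        }
        return result;
    }

    // Returns { number of Helpers constructed, number of distinct instances seen }.
    static int[] run(int threads) {
        constructed.set(0);
        DclDemo foo = new DclDemo();
        Set<Helper> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await();           // release all threads at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(foo.getHelper());
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { constructed.get(), seen.size() };
    }
}
```

With any number of threads, both values returned by run() should be 1: one construction, and every caller receives the same instance.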
If the helper object is static (one per class loader), an alternative is the initialization-on-demand holder idiom.^[6] See Listing 16.6 in ^[7].

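// Correct lazy initialization in Java
// (thread-safe; @ThreadSafe is a documentation annotation from the
// net.jcip.annotations library, shown here as a comment so the class
// compiles without that dependency)
class Foo {
    private static class HelperHolder {
        // Initialized once, when HelperHolder is first used
        static final Helper helper = new Helper();
    }

    public static Helper getHelper() {
        return HelperHolder.helper;
    }
}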

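This relies on the fact that a nested class is not initialized until it is first used: HelperHolder, and with it the Helper instance, is initialized only on the first call to getHelper(), and the JVM guarantees that class initialization is thread-safe.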
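The lazy behavior of the holder idiom can be observed with a short sketch. The names LazyHolderDemo, Resource and createdCount() are hypothetical and stand in for Foo and Helper; the counter exists only so the construction can be observed from outside.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names are illustrative only.
class LazyHolderDemo {
    private static final AtomicInteger created = new AtomicInteger();

    static class Resource {
        Resource() { created.incrementAndGet(); }  // count every construction
    }

    // Not initialized until get() first touches Holder.INSTANCE.
    private static class Holder {
        static final Resource INSTANCE = new Resource();
    }

    static Resource get() {
        return Holder.INSTANCE;
    }

    static int createdCount() {
        return created.get();
    }
}
```

Calling createdCount() before get() reports 0, because using LazyHolderDemo does not initialize Holder. After the first get() it reports 1, and every later get() returns the same instance.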
The semantics of final fields in Java 5 can be employed to safely publish the helper object without using volatile:^[8]

public class FinalWrapper<T> {
    public final T value;
    public FinalWrapper(T value) { 
        this.value = value; 
    }
}
 
public class Foo {
   private FinalWrapper<Helper> helperWrapper = null;
 
   public Helper getHelper() {
      FinalWrapper<Helper> wrapper = helperWrapper;
 
      if (wrapper == null) {
          synchronized(this) {
              if (helperWrapper == null) {
                  helperWrapper = new FinalWrapper<Helper>(new Helper());
              }
              wrapper = helperWrapper;
          }
      }
      return wrapper.value;
   }
}

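The local variable wrapper is required for correctness: because helperWrapper is not volatile, using it directly for both the null check and the return statement could fail, since the Java memory model allows the two reads to be reordered. Performance of this implementation is not necessarily better than the volatile implementation.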
