7. Parse an XML file
Document document = builder.parse("../inventory.xml");
document holds the entire XML file as a tree (the Document Object Model)
20/01/2009
8. Root element
the root element
Element root = document.getDocumentElement();
System.out.println(root.getTagName());
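The two steps above (parse, then ask for the root element) can be put together in a small sketch. It assumes the standard JAXP classes from the JDK and, because the contents of inventory.xml are not shown here, parses a made-up in-memory document instead of the file; the class name DomRootDemo and the sample XML are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomRootDemo {
    // Parses XML text into a DOM tree and returns the tag name of its root element.
    static String rootTagName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // An InputSource over a StringReader stands in for the file path.
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            Element root = document.getDocumentElement();
            return root.getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not the real inventory.xml.
        String xml = "<inventory><book year=\"2000\">Snow Crash</book></inventory>";
        System.out.println(rootTagName(xml));
    }
}
```

To read a real file, the same builder accepts a path or URI string, as in builder.parse("../inventory.xml").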
9. Nodes
Node
  Element (may have children)
  Text (leaf)
  Attr (leaf)
Operations on Nodes

                  Element         Text            Attr
getNodeName()     tag name        "#text"         name of attribute
getNodeValue()    null            text contents   value of attribute
getNodeType()     ELEMENT_NODE    TEXT_NODE       ATTRIBUTE_NODE
getAttributes()   NamedNodeMap    null            null
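The table can be checked against a tiny document. The sketch below is only an illustration: the class NodeInfoDemo, its helper methods, and the sample XML are assumptions, while the Node methods are the standard org.w3c.dom ones.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeInfoDemo {
    // Parses XML text into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Joins the results of getNodeName(), getNodeValue() and getNodeType().
    static String describe(Node node) {
        return node.getNodeName() + " | " + node.getNodeValue() + " | " + node.getNodeType();
    }

    public static void main(String[] args) {
        // Hypothetical one-element document with one attribute and one text child.
        Element book = parse("<book year=\"2000\">Snow Crash</book>").getDocumentElement();
        Node text = book.getFirstChild();
        Node attr = book.getAttributeNode("year");

        System.out.println(describe(book)); // book | null | 1
        System.out.println(describe(text)); // #text | Snow Crash | 3
        System.out.println(describe(attr)); // year | 2000 | 2

        // Only the Element has an attribute map; Text and Attr return null.
        System.out.println(book.getAttributes().getLength()); // 1
        System.out.println(text.getAttributes());             // null
    }
}
```

The numeric types printed are the values of Node.ELEMENT_NODE (1), Node.ATTRIBUTE_NODE (2) and Node.TEXT_NODE (3).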
10. Distinguishing Node types
switch (node.getNodeType()) {
    case Node.ELEMENT_NODE:
        Element element = (Element) node;
        ...
        break;
    case Node.TEXT_NODE:
        Text text = (Text) node;
        ...
        break;
    case Node.ATTRIBUTE_NODE:
        Attr attr = (Attr) node;
        ...
        break;
    default:
        ...
}
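One way the switch might be used is in a recursive walk over the tree. This is a sketch under assumptions: the class DomWalkDemo, the output format, and the sample XML are invented for illustration. Note that attributes are not children in the DOM, so the walk fetches them explicitly with getAttributes().

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomWalkDemo {
    // Visits a node, records a line describing it, and recurses into elements.
    static void walk(Node node, List<String> out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                Element element = (Element) node;
                out.add("element " + element.getTagName());
                // Attributes are reached through the attribute map, not getChildNodes().
                NamedNodeMap attrs = element.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    walk(attrs.item(i), out);
                }
                NodeList children = element.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    walk(children.item(i), out);
                }
                break;
            case Node.TEXT_NODE:
                Text text = (Text) node;
                out.add("text " + text.getData());
                break;
            case Node.ATTRIBUTE_NODE:
                Attr attr = (Attr) node;
                out.add("attr " + attr.getName() + "=" + attr.getValue());
                break;
            default:
                // Comments, processing instructions, etc. are ignored in this sketch.
                break;
        }
    }

    // Parses the XML text and returns one description line per visited node.
    static List<String> describe(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(document.getDocumentElement(), out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        for (String line : describe("<book year=\"2000\">Snow Crash</book>")) {
            System.out.println(line);
        }
    }
}
```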
19. Create the parser
// Create a parser factory
SAXParserFactory factory = SAXParserFactory.newInstance();
// Tell factory that the parser must understand namespaces
factory.setNamespaceAware(true);
try {
    // Make the parser
    SAXParser saxParser = factory.newSAXParser();
    XMLReader parser = saxParser.getXMLReader();
} catch (Exception e) {
    e.printStackTrace();
}
Exceptions that may be thrown: IOException, ParserConfigurationException, SAXException
20. Parse an XML file
// Create a handler
Handler handler = new Handler();
// Tell the parser to use this handler
parser.setContentHandler(handler);
// Finally, read and parse the document
parser.parse("./inventory.xml");
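Creating the parser and parsing can be combined into one runnable sketch that catches the three exception types separately instead of a blanket Exception. The class SaxParseDemo, the anonymous handler that only records element names, and the in-memory sample document are assumptions made for illustration; the contents of inventory.xml are not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxParseDemo {
    // Parses XML text with SAX and returns the element names in document order.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        // Minimal stand-in handler: records the qualified name of each start tag.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String namespaceURI, String localName,
                                     String qualifiedName, Attributes attributes) {
                names.add(qualifiedName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser saxParser = factory.newSAXParser();
            XMLReader parser = saxParser.getXMLReader();
            parser.setContentHandler(handler);
            // An InputSource over a StringReader stands in for the file path.
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser could not be configured", e);
        } catch (SAXException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        } catch (IOException e) {
            throw new IllegalStateException("Input could not be read", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        System.out.println(elementNames("<inventory><book/><book/></inventory>"));
    }
}
```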
21. SAX handlers
A SAX application can register callbacks for four handler interfaces:
interface ContentHandler
interface DTDHandler
interface EntityResolver
interface ErrorHandler
Rather than implementing all four directly, it is easier to extend an adapter class
22. Class DefaultHandler
DefaultHandler is in package org.xml.sax.helpers
DefaultHandler implements ContentHandler, DTDHandler,
EntityResolver, and ErrorHandler
DefaultHandler is an adapter class
Provides empty methods for every method declared in each of the four
interfaces
To use this class, extend it and override the methods that are
important to your application
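Because DefaultHandler covers all four interfaces, one subclass can be registered in more than one role. The sketch below overrides a single ErrorHandler method and leaves everything else at its default. The names ErrorAwareHandler and DefaultHandlerDemo are hypothetical, and registering the handler with setErrorHandler is an addition for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Overrides only fatalError; every other callback keeps DefaultHandler's behaviour.
class ErrorAwareHandler extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
        throw e; // stop parsing, as the default implementation does
    }
}

class DefaultHandlerDemo {
    // Parses XML text and returns the fatal errors reported to the handler.
    static List<String> problemsIn(String xml) {
        ErrorAwareHandler handler = new ErrorAwareHandler();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // The same object serves as content handler and error handler.
            parser.setContentHandler(handler);
            parser.setErrorHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by the handler; nothing more to do here.
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed unexpectedly", e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // Hypothetical malformed document: <b> is never closed.
        System.out.println(problemsIn("<a><b></a>"));
    }
}
```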
23. The Handler class
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class Handler extends DefaultHandler {
    // SAX calls this method when it encounters a start tag
    @Override
    public void startElement(String namespaceURI,
                             String localName,
                             String qualifiedName,
                             Attributes attributes) throws SAXException {
        System.out.println("startElement: " + qualifiedName);
    }

    // SAX calls this method to pass in character data
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        System.out.println("characters: \"" +
                new String(ch, start, length) + "\"");
    }

    // SAX calls this method when it encounters an end tag
    @Override
    public void endElement(String namespaceURI,
                           String localName,
                           String qualifiedName) throws SAXException {
        System.out.println("endElement: /" + qualifiedName);
    }
}
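A SAX parser is allowed to deliver the text of one element in several characters() calls, so a handler that needs the complete text usually buffers it. The following is a sketch of that variation, not the Handler class itself: TextCollector, SaxHandlerDemo, and the sample XML are hypothetical, and the buffer is reset at every start tag, which is a simplification that drops text preceding a child element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Records start tags, end tags, and the trimmed text found just before each end tag.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes attributes) {
        buffer.setLength(0); // simplification: discard text seen before a child element
        events.add("start " + qualifiedName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per text run, so append rather than print.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qualifiedName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            events.add("text " + text);
        }
        buffer.setLength(0);
        events.add("end " + qualifiedName);
    }
}

class SaxHandlerDemo {
    // Parses XML text and returns the events recorded by a TextCollector.
    static List<String> trace(String xml) {
        TextCollector handler = new TextCollector();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        for (String event : trace("<inventory><book>Snow Crash</book></inventory>")) {
            System.out.println(event);
        }
    }
}
```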
30. Parse an XML file
Document document = builder.build("../inventory.xml");
31. Root element
Element root = document.getRootElement();
System.out.println(root.getName());
32. Print out the document
// Print the document to standard output
XMLOutputter outputter = new XMLOutputter();
outputter.output(document, System.out);
// Or capture the output in a String
StringWriter sw = new StringWriter();
outputter.output(document, sw);
String xml = sw.toString();
Advantage 1: Output facility
33. Get children
• Get all direct children
List allChildren = element.getChildren();
• Get all direct children with a given name
List namedChildren = element.getChildren("book");
• Get the first child with a given name
Element child = element.getChild("book");
Advantage 2: supports Java Collections
34. Traversing child nodes
List children = element.getChildren();
for (int i = 0; i < children.size(); i++) {
    Element elem = (Element) children.get(i);
    // ....
}
35. Get attributes
• Get all attributes
List attrs = element.getAttributes();
for (int i = 0; i < attrs.size(); i++) {
    Attribute attr = (Attribute) attrs.get(i);
    System.out.println(attr.getName() + " = " + attr.getValue());
}
• Get an attribute with a given name
Attribute attr = element.getAttribute("year");
• Get an attribute value with a given name
String value = element.getAttributeValue("year");
36. Reading Element Content
• The text content is directly available
String content = element.getText();
• Trim leading and trailing whitespace
String content = element.getTextTrim();
37. Mixed Content
• Sometimes an element may contain comments, text content, and children
<table>
<!-- Some comment -->
Some text
<tr>Some child</tr>
</table>
String text = table.getTextTrim();
Element tr = table.getChild("tr");
38. Mixed Content
List mixedContent = table.getContent();
Iterator iter = mixedContent.iterator();
while (iter.hasNext()) {
    Object obj = iter.next();
    if (obj instanceof Comment) {
        System.out.println("Comment: " + obj);
    } else if (obj instanceof Text) {
        // JDOM 1.x returns character data as Text objects, not String
        System.out.println("Text: " + ((Text) obj).getText());
    } else if (obj instanceof Element) {
        System.out.println("Element: " + ((Element) obj).getName());
    }
}
39. References
Processing XML with Java; Elliotte Rusty Harold
http://cafeconleche.org/books/xmljava/chapters/index.html