This presentation describes modern ways of parsing XML documents in Java. It shows different approaches to the same problem and compares their capabilities, advantages, and disadvantages. It also looks at what to expect from Java 7 in the context of XML.
eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using the Java language
Wojciech Podgórski
http://podgorski.wordpress.com
April 8, 2008
Presentation outline
1 Introduction
    What is parsing
    Different ways of parsing documents
2 XML APIs in Java
    SAX
    DOM
    StAX
3 Capabilities and performance comparison
4 CASE STUDY: Parsing Really Simple Syndication (RSS) doc
5 What next? Alternatives to APIs, Java SE 7.0 features
6 Summary
7 Further reading...
Parsing definition
Parsing, more formally called syntactic analysis, is the process of
analyzing a sequence of tokens to determine grammatical structure
with respect to a given formal grammar.
Source: http://en.wikipedia.org/wiki/Parsing
We can distinguish three main models of parsing XML documents.
They differ in the mechanism used to traverse the nodes and in the
way they process XML data.
These models are:
SAX - Simple API for XML
DOM - Document Object Model
StAX - Streaming API for XML
That’s not all! There are other approaches, which won’t be
described in this presentation.
JAXB - Java Architecture for XML Binding
A technology that marshals Java objects into XML and, in
reverse, unmarshals XML elements back into Java objects. It
works on top of another parser (mostly streaming parsers).
Javolution
A library providing a real-time, StAX-like implementation that
does not force object creation and has a smaller effect on
memory footprint and garbage collection, using e.g. lookup tables
for retrieving and reusing data.
VTD-XML - Virtual Token Descriptor for XML
A collection of efficient processing technologies centered
around a non-extractive, 'document-centric' parsing
technique called VTD. It supports random access and XPath.
SAX as a processing model
SAX should first be considered as a specific processing mechanism
rather than a simple API. SAX represents an event-driven
architecture: the parser performs an operation each time
a particular event occurs.
To handle these occurrences, the user defines a number of callback
methods, which the parser calls when it encounters the corresponding
part of the document (for example, the start or end of an element).
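The callback idea described above can be sketched as follows. This is a minimal illustration, not code from the presentation: the class name SaxEventLog and the sample document are assumptions, and the parser is obtained through JAXP rather than a specific SAX implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records SAX callbacks in the order they fire.
final class SaxEventLog {

    // Assumed sample document, used only for illustration.
    static final String SAMPLE = "<food><name>Avocado Dip</name></food>";

    /** Parses the XML text and returns one log entry per callback invocation. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events(SAMPLE));
    }
}
```

The application never asks for the next element; the parser drives the process and the handler only reacts, which is the essence of the event-driven model.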
Figure: Top-down parsing in SAX API
In Java, the SAX API is a collection of classes and interfaces
that are implemented or used when building an XML parser.
The package containing this collection is:
org.xml.sax.*
Figure: org.xml.sax.* package class diagram
Basic class structure
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    // Declare document URI
    String xmlURI = "http://example.com/report.xml";

    // Create reader instance
    XMLReader reader = XMLReaderFactory.createXMLReader();

    // Set implementation class of ContentHandler
    // (MyContentHandler is the user's own ContentHandler implementation)
    reader.setContentHandler(new MyContentHandler());

    // Resolve document source
    InputSource inputSource = new InputSource(xmlURI);

    // Parse document
    reader.parse(inputSource);
Different SAX implementations
    // Xerces implementation
    XMLReader xercesReader =
        new org.apache.xerces.parsers.SAXParser();

    // JAXP implementation: the factory is instantiated first, and the
    // XMLReader is obtained from the JAXP SAXParser
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    XMLReader jaxpReader = parser.getXMLReader();

    // Piccolo implementation
    XMLReader piccoloReader = new com.bluecast.xml.Piccolo();
Other SAX features
SAX provides a number of interfaces for correct data handling. Some
of them process not only the content of the document but also its
structure.
Interfaces such as:
ErrorHandler
EntityResolver
DTDHandler
also analyze the structure of the document: they report possible errors,
resolve entity references, and receive DTD declarations (notations and
unparsed entities).
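As an illustration of one of these interfaces, the sketch below registers an ErrorHandler and reports whether a document is well-formed. The class name SaxErrorCheck, the report format, and both sample documents are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects parser problems through an ErrorHandler.
final class SaxErrorCheck {

    // Assumed sample documents: the second one has a missing end tag.
    static final String GOOD = "<a><b/></a>";
    static final String BROKEN = "<a><b></a>";

    /** Returns "ok" for well-formed input, otherwise a short problem report. */
    static String check(String xml) {
        final StringBuilder report = new StringBuilder();
        ErrorHandler errors = new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // warnings are ignored in this sketch
            }

            @Override
            public void error(SAXParseException e) {
                report.append("error at line ").append(e.getLineNumber());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                report.append("fatal error at line ").append(e.getLineNumber());
                throw e; // stop parsing: the document is not well-formed
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());
            reader.setErrorHandler(errors);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // normally already recorded by fatalError(); keep a fallback message
            if (report.length() == 0) {
                report.append("failed: ").append(e.getMessage());
            }
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected failure", e);
        }
        return report.length() == 0 ? "ok" : report.toString();
    }

    public static void main(String[] args) {
        System.out.println(check(GOOD));
        System.out.println(check(BROKEN));
    }
}
```

EntityResolver and DTDHandler are registered in the same way, through setEntityResolver and setDTDHandler on the XMLReader.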
Advanced SAX features I
The SAX API is considered a very flexible solution, mainly because it
can be configured through properties and features.
    void setProperty(String propertyID, Object value);
    void setFeature(String featureID, boolean state);
Properties and features modify the parser's behaviour while it processes
a document. For example, a feature can switch on validation of the
document against its DTD or schema, in addition to the well-formedness
check that every XML parser performs.
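A short sketch of setFeature in use: it enables the standard SAX2 namespaces feature and shows how that changes what the handler receives. The class name SaxFeatureDemo and the sample document with its urn:example:menu namespace are assumptions; the feature identifier itself is the standard SAX2 one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: configures a parser through a SAX2 feature.
final class SaxFeatureDemo {

    /** Standard SAX2 feature id that switches namespace processing on or off. */
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Assumed sample document with a made-up namespace URI.
    static final String SAMPLE = "<m:menu xmlns:m='urn:example:menu'/>";

    /** Parses with namespace processing enabled; returns "namespaceURI|localName" of the root. */
    static String rootName(String xml) {
        final String[] root = new String[1];
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = uri + "|" + localName;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        System.out.println(rootName(SAMPLE));
    }
}
```

Validation is switched on in the same way with the feature id http://xml.org/sax/features/validation, provided the underlying parser supports it.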
Advanced SAX features II
Among many other interesting SAX features, one is very important
and radically extends SAX capabilities. The XMLFilter interface allows
us to create a cascade of filters, each performing a different processing
operation on the event stream. Several operations can then run in a single
pass over the document.
Figure: Cascade processing using XMLFilter interface
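A cascade like the one in the figure could look roughly like this. The sketch builds on XMLFilterImpl, the helper class that implements XMLFilter; the two filter stages (DropFilter, UpperCaseFilter), the class name FilterCascade, and the sample document are all invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical example class: parser -> DropFilter -> UpperCaseFilter -> handler.
final class FilterCascade {

    // Assumed sample document.
    static final String SAMPLE = "<menu><food/><price/></menu>";

    /** Stage 1: swallows the start and end events of elements with the given name. */
    static final class DropFilter extends XMLFilterImpl {
        private final String dropped;

        DropFilter(XMLReader parent, String dropped) {
            super(parent);
            this.dropped = dropped;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (!dropped.equals(qName)) {
                super.startElement(uri, local, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (!dropped.equals(qName)) {
                super.endElement(uri, local, qName);
            }
        }
    }

    /** Stage 2: passes every event on, but upper-cases element names. */
    static final class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, local.toUpperCase(Locale.ROOT),
                    qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, local.toUpperCase(Locale.ROOT), qName.toUpperCase(Locale.ROOT));
        }
    }

    /** Runs the cascade and returns the element names seen by the final handler. */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader cascade = new UpperCaseFilter(new DropFilter(parser, "price"));
            cascade.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            cascade.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(SAMPLE));
    }
}
```

Because each filter is itself an XMLReader, the final handler does not need to know how many stages sit between it and the real parser.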
What SAX cannot do... I
Q: Why do we need other mechanisms, if SAX is so good?
A: SAX has some serious limitations due to its sequential data
access.
What SAX cannot do... II
SAX parses data from beginning to end and does not allow going
back. It also has some other drawbacks:
it is unable to modify the content or structure of the document
it cannot access specific or random elements
it cannot access sibling elements
it is not serializable
So it might seem that SAX is useless. THAT'S NOT TRUE! (see the
comparison section). Every issue mentioned above can be resolved by
SAX's complement...
DOM as a processing model
The Document Object Model is based on an entirely different idea.
Instead of parsing the document and reacting to specific events (though
it is able to), it builds a tree based on the document's
structure and stores it in memory as an object.
Because of this, every node in the tree is always available and can be
accessed later, as many times as needed. Moreover, the structure stored
in memory can easily be transformed in many ways.
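The difference from SAX can be seen in a few lines. The sketch below reads nodes out of document order, changes one of them in memory, and then reads it again; the class name DomRevisit and the sample menu document are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: revisits and modifies nodes of an in-memory tree.
final class DomRevisit {

    // Assumed sample document.
    static final String SAMPLE = "<menu><food>Waffles</food><food>Toast</food></menu>";

    /** Returns "last,first,changed" to show backward access and in-memory modification. */
    static String demo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList foods = doc.getElementsByTagName("food");

            // Read the last element first, then go back to the first one.
            String last = foods.item(foods.getLength() - 1).getTextContent();
            String first = foods.item(0).getTextContent();

            // Change the tree in memory and read the same node again.
            foods.item(0).setTextContent("Pancakes");
            String changed = doc.getElementsByTagName("food").item(0).getTextContent();

            return last + "," + first + "," + changed;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(SAMPLE)); // prints Toast,Waffles,Pancakes
    }
}
```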
DOM architecture I
DOM, in contrast to SAX, is a standard developed by the W3C¹. Due
to standardization, it has a strict architecture divided into levels, each
containing required and optional modules.
To claim support for a level, an application must implement all the
requirements of the claimed level and of the levels below it. There are
3 levels; the newest (DOM 3) was released in 2004 and is
the current release of the DOM specification.
Every level has its core, which is the root for the other modules
(see figure).
¹ A reference to the standard can be found on the W3C website.
Figure: Document Object Model architecture (Adapted from original W3C specification)
In Java, DOM has a different structure than SAX. Almost
every type representing the Document Object Model is an
interface that extends org.w3c.dom.Node.
Such a framework allows very simple data manipulation and
traversal between the nodes contained in the tree structure. It is essential
to understand how elements are stored in the tree (see figure).
For example, if we want to read text data from element A, we
should get its child node containing the text rather than read the
value of element A itself.
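The point about element A can be checked directly. In this sketch (class name DomTextNode and the sample document are assumptions), the element's own node value is null, while its first child is a text node that carries the data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: shows where the text of an element is stored.
final class DomTextNode {

    // Assumed sample document.
    static final String SAMPLE = "<A>some text</A>";

    /** Returns "elementValue|childIsTextNode|childValue" for the root element. */
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element a = doc.getDocumentElement();
            Node child = a.getFirstChild();
            boolean childIsText = child.getNodeType() == Node.TEXT_NODE;
            // getNodeValue() is null for element nodes; the text lives in the child.
            return a.getNodeValue() + "|" + childIsText + "|" + child.getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE)); // prints null|true|some text
    }
}
```

DOM Level 3 also offers getTextContent() on every node as a shortcut that concatenates the text of all descendants.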
Figure: org.w3c.dom.* package class diagram From [1]
Basic class structure using Java implementation
    String docURI = "http://example.org/nutrition.xml";
    // get new DocumentBuilderFactory
    DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
    // get new DocumentBuilder
    DocumentBuilder docBuilder =
        docBuilderFactory.newDocumentBuilder();
    // parse document
    Document doc = docBuilder.parse(docURI);
    // extract root element and
    // normalize whole tree (optional)
    doc.getDocumentElement().normalize();
Accessing elements
    // get "food" elements
    NodeList elements = doc.getElementsByTagName("food");
    for (int i = 0; i < elements.getLength(); i++)
    {
        // find "Avocado Dip"; getNodeName() would only return "food",
        // so the element's text content is checked instead
        String foodText = elements.item(i).getTextContent();
        if (foodText.contains("Avocado Dip"))
        {
            NodeList l = elements.item(i).getChildNodes();
            for (int j = 0; j < l.getLength(); j++)
                // print out calories
                if (l.item(j).getNodeName().equals("calories"))
                    System.out.println(l.item(j).getTextContent());
        }
    }
Modifying elements
    ...
    if (l.item(j).getNodeName().equals("calories"))
    {
        // text content is a String, so it has to be parsed, not cast
        int cal = Integer.parseInt(l.item(j).getTextContent().trim());
        // if food avocado dip has more than 300 cal.
        if (cal > 300)
        {
            Node avocadoDip = l.item(j).getParentNode();
            // replace it with low fat food; replaceChild() must be
            // called on the parent of the node being replaced
            Element newFood = doc.createElement("LowFatFood");
            avocadoDip.getParentNode().replaceChild(newFood, avocadoDip);
        }
    }
Different DOM implementations
    // Xerces DOM implementation
    DOMParser p = new org.apache.xerces.parsers.DOMParser();
    p.parse(new InputSource(xmlURI));
    org.w3c.dom.Document doc = p.getDocument();

    // JDOM DOM implementation: DOMBuilder converts an existing W3C DOM tree
    DOMBuilder builder = new org.jdom.input.DOMBuilder();
    org.jdom.Document d = builder.build(doc);
    // it's org.jdom.Document not org.w3c.dom.Document!

    // dom4j DOM implementation
    SAXReader reader = new org.dom4j.io.SAXReader();
    org.dom4j.Document document = reader.read(xmlURI);
    // it's org.dom4j.Document not org.w3c.dom.Document!
Advanced DOM features I
DOM provides many advanced functionalities through modules
specified in the standard (mainly level 3 modules). Some of them:
the MutationEvents module provides methods for listening to
changes
the LS and LS-Async modules provide methods for various kinds of
serialization
the Validation module provides methods for real-time validation
Advanced DOM features II
When using a specific API, it is important to check which modules
are implemented and in which version. To do this, we can use:
    boolean hasFeature(String feature, String version);
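A possible way to call hasFeature is through the DOMImplementation object of the parser in use. The sketch below does this for the default JAXP implementation; the class name DomFeatures and the list of modules queried in main are assumptions, and the answers depend on the DOM implementation installed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

// Hypothetical example class: asks the default DOM implementation about modules.
final class DomFeatures {

    /** Returns true if the default JAXP DOM implementation reports the module and version. */
    static boolean supports(String feature, String version) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            return impl.hasFeature(feature, version);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        String[] modules = {"Core", "XML", "LS", "MutationEvents"};
        for (String module : modules) {
            System.out.println(module + " 2.0: " + supports(module, "2.0")
                    + ", 3.0: " + supports(module, "3.0"));
        }
    }
}
```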
Streaming API for XML - different approach
The third approach to processing XML data is based on the idea of
treating the incoming information about events as a stream.
The Streaming API for XML uses a technique called pull parsing, which
provides sequential access to the document by adapting the iterator
design pattern. The association with java.util.Iterator is not
accidental, because part of the API (XMLEventReader) extends this interface.
StAX architecture
StAX in Java is divided into two (theoretically) separate APIs:
cursor API, represented by the XMLStreamReader and
XMLStreamWriter interfaces. Regarded as the fastest and most
efficient solution.
event API, represented by the XMLEventReader and
XMLEventWriter interfaces. Regarded as a simple and
flexible solution.
Both are specified in JSR 173 and contained in javax.xml.stream.*
Difference from the SAX event-driven architecture
The common view that the StAX API is similar to SAX is wrong.
The SAX architecture provides a number of interfaces to handle incoming
events: the parser pushes events to the application's callbacks. The StAX
event API provides methods for iterating through the event stream, so the
application pulls each event and handles it when it is ready.
Moreover, StAX is a symmetric read/write API, which also allows
elements to be modified and stored.
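The symmetry of reading and writing can be sketched with the cursor API: one method writes a small document with XMLStreamWriter, another pulls a value back out with XMLStreamReader. The class name StaxRoundTrip, the element names, and the sample values are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: writes XML with StAX and reads it back.
final class StaxRoundTrip {

    /** Writes a food element with name and calories children and returns it as text. */
    static String write(String name, int calories) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("food");
            w.writeStartElement("name");
            w.writeCharacters(name);
            w.writeEndElement();
            w.writeStartElement("calories");
            w.writeCharacters(Integer.toString(calories));
            w.writeEndElement();
            w.writeEndElement();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Writing failed", e);
        }
        return out.toString();
    }

    /** Pulls events until the name element is found and returns its text, or null. */
    static String readName(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(r.getLocalName())) {
                    return r.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Reading failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = write("Avocado Dip", 110);
        System.out.println(xml);
        System.out.println(readName(xml));
    }
}
```

Note that in readName the application decides when to ask for the next event, which is the pull model that distinguishes StAX from SAX.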
Basic class structure
    /* Creating readers... */

    // creating input factory ("if" is a reserved word, so the
    // factory variable is called inputFactory)
    String xmlURI = "http://example.org/nutrition.xml";
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();

    // cursor API reader; a stream is opened on the URI, because a
    // StringReader would treat the URI text itself as the document
    InputStream in = new URL(xmlURI).openStream();
    XMLStreamReader cur = inputFactory.createXMLStreamReader(in);

    // event API reader (needs its own stream)
    InputStream in2 = new URL(xmlURI).openStream();
    XMLEventReader event = inputFactory.createXMLEventReader(in2);
Identifying events I
The main issue when using StAX is how to identify the event that has
just occurred. There are many ways to do that; the simplest is to
check the integer constant associated with the event (cursor API).
The constants are declared in the XMLStreamConstants interface².
For example:
1 - START_ELEMENT
2 - END_ELEMENT
3 - PROCESSING_INSTRUCTION
And so on...
² https://java.sun.com/webservices/docs/1.5/api/javax/xml/stream/XMLStreamConstants.html
Accessing elements by iterator I (cursor API)
    int startElem = XMLStreamConstants.START_ELEMENT;
    // while there is next event
    while (cur.hasNext())
    {
        // catch event type
        int eventType = cur.next();
        System.out.println(eventType);
        // if event type is START_ELEMENT
        // print element's text content
        // (getElementText() works only for text-only elements;
        // it throws XMLStreamException if a child element is found)
        if (eventType == startElem)
            System.out.println(cur.getElementText());
    }
Identifying events II
In the event API, identifying events is a bit different. XMLEventReader
provides the methods:
    XMLEvent nextEvent();
    boolean hasNext();
So, to identify a caught event, we must analyse the XMLEvent object
returned by the first method. Once again, there are a few ways to
do that. The event type can be obtained by calling:
    int getEventType();
Or we can test whether the event is of a certain type with one of the "is" methods.
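Both identification styles can be combined in one loop. The sketch below uses the isStartElement and isCharacters tests together with a getEventType comparison; the class name EventKinds and the sample document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class: identifies events with "is" methods and getEventType().
final class EventKinds {

    // Assumed sample document.
    static final String SAMPLE = "<food><name>Avocado Dip</name></food>";

    /** Returns a readable description of element and text events, in document order. */
    static List<String> describe(String xml) {
        List<String> log = new ArrayList<>();
        try {
            XMLEventReader events = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (events.hasNext()) {
                XMLEvent e = events.nextEvent();
                if (e.isStartElement()) {
                    // "is" test followed by the matching "as" conversion
                    log.add("start:" + e.asStartElement().getName().getLocalPart());
                } else if (e.isCharacters()) {
                    log.add("text:" + e.asCharacters().getData());
                } else if (e.getEventType() == XMLStreamConstants.END_ELEMENT) {
                    // same idea, expressed through the integer event type
                    log.add("end:" + e.asEndElement().getName().getLocalPart());
                }
            }
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("Parsing failed", ex);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE));
    }
}
```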
Accessing elements by iterator II (event API)
    // while there is next event
    while (event.hasNext())
    {
        XMLEvent e = event.nextEvent();
        // identify event by checking its type
        if (e instanceof StartElement)
        {
            // cast event to specific element
            StartElement se = (StartElement) e;
            QName name = se.getName();
            // print element name
            System.out.println(name.getLocalPart());
        }
    }
Advanced iteration methods
Both StAX APIs provide more advanced iteration methods.
    // in XMLEventReader (XMLStreamReader.nextTag() returns the int event type)
    XMLEvent nextTag();
    // only in XMLEventReader
    XMLEvent peek();
    // only in XMLStreamReader
    void require(int type, String nsURI, String localN);
The first method moves the cursor forward, skipping whitespace, comments and
processing instructions, until the next start or end tag. The second lets us
inspect the next event before moving the cursor. The third checks that the
current event has the expected type, namespace and name, and throws an
XMLStreamException otherwise.
All methods are well documented and should be reviewed by the reader.
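A minimal sketch of the three methods in use, assuming a small document whose root is food with a name child. The class AdvancedIterationDemo and both method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the original presentation.
class AdvancedIterationDemo {
    /** Cursor API: nextTag() skips whitespace, require() asserts the position. */
    static String nameOfFood(String xml) throws XMLStreamException {
        XMLStreamReader cur = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        cur.nextTag(); // from the start of the document to <food>
        // a null namespace URI means "do not check the namespace"
        cur.require(XMLStreamConstants.START_ELEMENT, null, "food");
        cur.nextTag(); // skips the whitespace before <name>
        cur.require(XMLStreamConstants.START_ELEMENT, null, "name");
        String text = cur.getElementText();
        cur.close();
        return text;
    }

    /** Event API: peek() returns the next event without consuming it. */
    static boolean peekMatchesNext(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLEvent peeked = reader.peek();
        XMLEvent next = reader.nextEvent();
        reader.close();
        return peeked.getEventType() == next.getEventType();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<food>\n  <name>Tea</name>\n</food>";
        System.out.println(nameOfFood(xml));
        System.out.println(peekMatchesNext(xml));
    }
}
```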
EventFilters and StreamFilters I
The StAX API allows us to create filtered readers. There is no need to
write complex stream handlers to process specific events. All that has
to be done is to implement one (or both) of two interfaces, each
containing a single method.
Interfaces:
    EventFilter (used with XMLEventReader)
    StreamFilter (used with XMLStreamReader)
Methods:
    public boolean accept(XMLEvent event)          // EventFilter
    public boolean accept(XMLStreamReader reader)  // StreamFilter
EventFilters and StreamFilters II
Implementing a filter is simple:
    import javax.xml.stream.EventFilter;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.events.XMLEvent;

    public class CharFilter implements EventFilter
    {
        // accept only character events
        public boolean accept(XMLEvent event)
        {
            return (event.getEventType() ==
                    XMLStreamConstants.CHARACTERS);
        }
    }
The filter above lets only character events through.
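A filter takes effect once it is attached to a reader with XMLInputFactory.createFilteredReader(). The sketch below wires an equivalent character-only filter to an XMLEventReader and concatenates the text it lets through. The class FilterDemo and the method textOnly are hypothetical names used for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the original presentation.
class FilterDemo {
    /** Returns the concatenated character data of the document. */
    static String textOnly(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader base = factory.createXMLEventReader(new StringReader(xml));
        // same idea as CharFilter: accept character events only
        EventFilter charsOnly = new EventFilter() {
            public boolean accept(XMLEvent event) {
                return event.isCharacters();
            }
        };
        XMLEventReader filtered = factory.createFilteredReader(base, charsOnly);
        StringBuilder sb = new StringBuilder();
        while (filtered.hasNext()) {
            // every event seen here has passed the filter
            sb.append(filtered.nextEvent().asCharacters().getData());
        }
        filtered.close();
        return sb.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(textOnly("<a><b>x</b><c>y</c></a>"));
    }
}
```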
Writing elements I
StAX, as a symmetric API that handles both input and output, is also
able to write XML data. It provides two interfaces for that:
    XMLEventWriter (extends XMLEventConsumer)
    XMLStreamWriter
The basic difference between them is that XMLEventWriter offers less
functionality: it accepts ready-made XMLEvent objects, while
XMLStreamWriter has a dedicated write method for each kind of content.
Writing elements II
    // using XMLStreamWriter
    OutputStream console = System.out;
    XMLOutputFactory of = XMLOutputFactory.newInstance();
    XMLStreamWriter sw = of.createXMLStreamWriter(console);
    sw.writeStartDocument("1.0");
    // create a document with one meal
    sw.writeStartElement("nutrition");
    sw.writeStartElement("food");
    sw.writeStartElement("name");
    sw.writeCharacters("Chocolate ice cream");
    sw.writeEndElement(); // </name>
    sw.writeEndElement(); // </food>
    sw.writeEndElement(); // </nutrition>
    sw.writeEndDocument();
    // make sure buffered output reaches the console
    sw.flush();
Writing elements III
    // the same using XMLEventWriter
    OutputStream console = System.out;
    XMLEventFactory xef = XMLEventFactory.newInstance();
    XMLOutputFactory of = XMLOutputFactory.newInstance();
    XMLEventWriter ew = of.createXMLEventWriter(console);
    ew.add(xef.createStartDocument("UTF-8", "1.0"));
    // arguments: prefix, namespace URI, local name (empty = no namespace)
    ew.add(xef.createStartElement("", "", "nutrition"));
    ew.add(xef.createStartElement("", "", "food"));
    ew.add(xef.createStartElement("", "", "name"));
    ew.add(xef.createCharacters("Chocolate ice cream"));
    // end elements must name the element they close
    ew.add(xef.createEndElement("", "", "name"));
    ew.add(xef.createEndElement("", "", "food"));
    ew.add(xef.createEndElement("", "", "nutrition"));
    ew.add(xef.createEndDocument());
    ew.flush();
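Both writer examples print to the console. When the XML is needed as a String instead, the writer can be pointed at a java.io.StringWriter. The following sketch also shows writeAttribute() and the escaping done by writeCharacters(). The class WriterDemo, the method foodXml and the id attribute are assumptions made for illustration and do not come from the presentation.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, not part of the original presentation.
class WriterDemo {
    /** Serializes one food element (without an XML declaration) to a String. */
    static String foodXml(String id, String name) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter sw = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(buffer);
        sw.writeStartElement("food");
        // attributes must be written right after the start element
        sw.writeAttribute("id", id);
        sw.writeStartElement("name");
        // writeCharacters() escapes markup characters such as & and <
        sw.writeCharacters(name);
        sw.writeEndElement(); // </name>
        sw.writeEndElement(); // </food>
        sw.writeEndDocument();
        sw.flush();
        sw.close();
        return buffer.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(foodXml("1", "Tea & milk"));
    }
}
```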
XmlPull
XmlPull is the ancestor of StAX. Although StAX has become a popular
standard for parsing XML data, XmlPull has not been retired. Thanks to
its small footprint (the JAR file is only 9 kB), XmlPull is well suited
to devices with limited memory and is often used in developing mobile
applications.
http://www.xmlpull.org/
Comparing capabilities I
Developing an application that processes XML data always involves
choosing a parser.
Selecting the proper API is essential to the success of the project,
although the choice is not an easy one. Before making a decision, ask
yourself a few questions:
What needs to be done (using the parser)?
Is the application platform-dependent? If so, what is the platform?
Is it a distributed system?
Comparing capabilities II
Benchmarks I
Figures: From http://piccolo.sourceforge.net/bench.html
Benchmarks II
Figures: From http://piccolo.sourceforge.net/bench.html
Benchmarks III
Figures: From http://www.xml.com/lpt/a/1702
Benchmarks IV
Figure: From http://www.ximpleware.com/benchmark1.html
CASE STUDY
Parsing Really Simple Syndication documents
RSS definition
RSS is a family of Web feed formats used to publish frequently
updated content. An RSS document (which is called a "feed",
"web feed" or "channel") contains either a summary of content
from an associated web site or the full text, stored as XML. RSS
makes it possible for people to keep up with web sites in an
automated manner that can be piped into applications or filtered
displays.
Source: http://en.wikipedia.org/wiki/RSS
The initials "RSS" are used to refer to the following formats:
Really Simple Syndication (RSS 2.0)
RDF Site Summary (RSS 1.0 and RSS 0.90)
Rich Site Summary (RSS 0.91)
When creating a solution for reading or writing RSS documents, we
must remember that RSS is not a formal standard and has no
XML Schema document (or DTD) describing its structure! The only
reference can be found at:
http://www.rssboard.org/rss-specification
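Since there is no schema to validate against, a reader has to pick out the elements it needs and ignore the rest. The following sketch, which is not the jNivo plugin code, uses the StAX cursor API to collect the title of every item in an RSS 2.0 feed. The class RssTitles and the method itemTitles are hypothetical, and the sketch assumes that each item title contains plain text only.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original presentation.
class RssTitles {
    /** Returns the text of every <title> that is a direct child of an <item>. */
    static List<String> itemTitles(Reader in) throws XMLStreamException {
        XMLStreamReader cur = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> titles = new ArrayList<String>();
        // 0 = outside <item>, 1 = directly inside <item>, >1 = nested deeper
        int depthInItem = 0;
        while (cur.hasNext()) {
            int type = cur.next();
            if (type == XMLStreamConstants.START_ELEMENT) {
                String name = cur.getLocalName();
                if (depthInItem == 0) {
                    if ("item".equals(name)) {
                        depthInItem = 1;
                    }
                } else if (depthInItem == 1 && "title".equals(name)) {
                    // getElementText() also consumes the matching </title>
                    titles.add(cur.getElementText());
                } else {
                    depthInItem++;
                }
            } else if (type == XMLStreamConstants.END_ELEMENT && depthInItem > 0) {
                depthInItem--;
            }
        }
        cur.close();
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String feed = "<rss version=\"2.0\"><channel><title>Feed</title>"
                + "<item><title>First</title><link>http://example.org/1</link></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(new StringReader(feed)));
    }
}
```

The channel's own title is skipped because it is not inside an item, and unknown child elements of an item are simply passed over.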
The Code
Presenting jNivo RSS Exterior Plugin v.0.1
Every API presented so far can be seen as difficult to learn and use.
That is partly true: the XML APIs in Java have a rather difficult
syntax and hundreds of classes and interfaces that have to be
handled to process XML data.
Another issue is that there are several standards:
javax.xml.stream.* (StAX, JSR-173)
org.w3c.dom.* (DOM standard)
org.xml.sax.* (SAX standard)
JAXP
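To make the split between these packages concrete, the sketch below counts the elements of one document three times, once with each API, obtaining the SAX and DOM parsers through the JAXP factories in javax.xml.parsers. The class ThreeApis and its method names are hypothetical and serve only as an illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original presentation.
class ThreeApis {
    /** SAX: the parser pushes events into a callback handler. */
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    /** DOM: the whole document is built as a tree in memory first. */
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // "*" matches every element in the document
        return doc.getElementsByTagName("*").getLength();
    }

    /** StAX: the application pulls events from the stream. */
    static int countWithStax(String xml) throws Exception {
        XMLStreamReader cur = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (cur.hasNext()) {
            if (cur.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        cur.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c>x</c></a>";
        System.out.println(countWithSax(xml) + " " + countWithDom(xml)
                + " " + countWithStax(xml));
    }
}
```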
Mark Reinhold (3) suggested a different way of expressing XML in the
Java language (4).
Built in: java.lang.String "foo"
New type: java.lang.XML <foo> (syntax!)
New package: java.lang.xml.* (XML literals!)
(3) Chief Engineer for the Java Platform, Standard Edition, at Sun Microsystems.
(4) Java Technical Session 3441 (TS-3441)
Proposed syntax I
Figure: From [3]
Proposed syntax II
Figure: From [3]
Much more...
Obviously, the new syntax is not just syntactic sugar: it helps enforce
the proper structure of the document and prevents instructions from
being issued in the wrong order.
Mark Reinhold also proposed:
datatype coders
collections
hybrid event/tree API
accessing by XPath
And more! His blog:
http://blogs.sun.com/mr/
Three different approaches to XML parsing
SAX - keywords: event-based, callback model, fast, cannot
modify structure, interface-based API
DOM - keywords: builds a tree in memory, divided into
modules, rather slow, can generate and modify documents
StAX - keywords: pull parsing, events caught from a stream,
consistent code!, can be used on mobile devices (XmlPull)
RSS parsing? It is difficult to decide on a parsing model; the
most efficient route is an already implemented API, for example ROME
http://rome.dev.java.net
[1] Brett McLaughlin, Justin Edelson
Java & XML
O'Reilly Media, 3rd edition, 1 December 2006
[2] Cay S. Horstmann, Gary Cornell
Core Java, Volume II - Advanced Features
Prentice Hall PTR, 8th edition, 7 April 2008
[3] Mark Reinhold
Integrating XML into the Java Programming Language TS-3441
http://developers.sun.com/learning/javaoneonline/sessions/2006/TS-3441/index.htm
[4] Jurgen Salecker
Hybrid Parser Architectural Pattern
http://developerlife.com/tutorials/?p=53
Various APIs documentation
For starters, it is a good idea to search Wikipedia...
Xerces 2 Java Parser http://xerces.apache.org/xerces2-j/
JAXP reference implementation https://jaxp.dev.java.net/
XOM - XML Object Model http://www.xom.nu/
JDOM - Java Document Object Model http://www.jdom.org/
StAX - Streaming API for XML http://stax.codehaus.org/
VTD-XML - a new way of processing XML
http://vtd-xml.sourceforge.net/
And others...
Why?...
Questions?
What if?...
THANK YOU