This document provides a comparative study of three popular Java APIs for processing Resource Description Framework (RDF) data: Jena, JRDF, and Sesame. It examines how each API stores and handles RDF triples, programmer support features like documentation, usability, and integration capabilities. Performance tests show Jena and Sesame provide faster processing and iteration of triples compared to JRDF. Jena offers the most full-featured support for SPARQL queries while Sesame only supports its own query languages. Overall, the document finds Jena to be the best balance of functionality, performance, and usability out of the three APIs.
RDF Processing on the JAVA Platform – A
Comparative Study of Jena, JRDF and Sesame APIs
Adrian Iriciuc, Monica Irina Niculescu
Faculty of Computer Science, General Berthelot, 16, IAŞI 700483, ROMANIA
adrian.iriciuc@infoiasi.ro, monica.niculescu@infoiasi.ro
Abstract. This paper presents a comparative study of three of the most popular
APIs developed for the JAVA Platform that deal with RDF Processing: Jena,
JRDF and Sesame. We will take a comparative look at these APIs: the way they
store the RDF triples, the programmer support, performance, support for
SPARQL interrogations, and licensing. We will also offer concrete examples of
how to use these APIs by providing relevant source code snippets.
Key words: RDF processing, JAVA Platform, Jena, JRDF, Sesame, SPARQL,
comparative study
1 Introduction
The Semantic Web has developed considerably in the past few years and has become
one of the most actively researched topics in the web world. The concern today is
the meaning of the information and services available on the web, and one of the
most important steps in the evolution of the Semantic Web is expressing its
elements in formal specifications. For more information about the Semantic Web
please consult [1], [2] and [3].
One of the main sources of such formalization, intended to formally describe
concepts, terms and relationships, is RDF (Resource Description Framework). RDF
proposes a method to describe and model information based on triples. The
information is divided into simple sentences, each made of a subject, a predicate
and an object, which together form an RDF triple. Here is a simple example of an
RDF triple that models the sentence "Jane Austen wrote Pride And Prejudice".
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:lib="http://www.myResource.com/">
  <rdf:Description rdf:about="http://www.myResource.com/JaneAusten">
    <lib:wrote>Pride And Prejudice</lib:wrote>
  </rdf:Description>
</rdf:RDF>
For more information about RDF please consult [4, 5].
2 General Descriptions of the APIs
In this section we take a look at three of the most popular frameworks and
libraries for processing RDF data: Jena, JRDF and Sesame. All of them offer storage
of RDF triples, ways to manage these triples (create, add, remove, modify, iterate)
and methods to query the RDF data.
We can look at an RDF triple as a graph with two nodes (the subject and the
object) and an arc (the predicate). All the APIs presented here are based on this
representation. The following subsections take a quick look at each of the three APIs.
2.1 Jena
Jena is a framework for the JAVA platform that is used to build semantic web
applications. It offers support for RDF, RDFS, OWL and SPARQL. It includes an RDF
API, reading and writing RDF in RDF/XML, N3 and N-Triples, an OWL API, persistent
storage and a SPARQL query engine. The framework also contains a rule-based
inference engine.
In Jena, the graphs produced by this way of modeling information are called
models. The user creates a model, empty at first, and then creates resources
(subjects) and attaches properties (predicates) with their values (objects).
Each arc in an RDF model is called a statement. A statement is made of three
parts: the subject (the resource from which the arc leaves, here Jane Austen), the
predicate (the label of the arc, here wrote) and the object (the resource or
literal the arc points to, here Pride and Prejudice).
This framework works with the following notions: nodes (representing a resource),
Dublin Core (metadata about web resources – see [6]), literals (strings that represent the
value of a property), subject (a resource – part of the triple), object (part of the triple), predi-
cate (part of the triple), property (attribute of a resource), resource (some entity), statement
and triple.
The API offers methods for working with models, triples and queries, amongst other func-
tionalities. For detailed information about the Jena API consult [7, 8]. More about working with
triples in Jena in Section 4.
2.2 JRDF
JRDF is a JAVA library that offers base implementations of RDF concepts by
creating a set of standard APIs. It provides features such as a graph API (for graph
operations like comparisons or set-based operations), creation of graph elements
(statements, resources, nodes), triple storage, IoC support, RDF data types and
SPARQL query handling. One important feature that JRDF does not yet contain is
inferencing.
The API works with the general notion of graphs that model the information. Two
main factories facilitate working with triples: the Element Factory and the Triple
Factory. The Element Factory creates elements such as URI references, literals,
nodes and resources. The Triple Factory makes it possible to quickly execute
frequent operations like adding RDF data: the user can add data without creating
individual nodes, and can create RDF collections and containers.
For detailed information about the JRDF API consult [9, 10]. More about working with
triples in JRDF in Section 4.
2.3 Sesame
Sesame is a JAVA-based framework for storing, inferencing and querying RDF data.
For modeling the data Sesame offers two options: a repository (through the
Repository API) and a graph (through the Graph API). These APIs give the user
methods for creating, accessing and modifying (adding and deleting data in)
repositories and graphs, as well as methods for querying. Graphs can be added to a
repository or deleted from one. Statements can be added to a graph. Each statement
represents a triple consisting of a subject, a predicate and an object. For
querying RDF data, Sesame does not support SPARQL, only its own query languages:
RQL and SeRQL.
For detailed information about the Sesame API consult [11, 12, 13]. More about working
with triples in Sesame in Section 4.
3. Storing the Triples
The way triples are stored directly affects performance. At the same time, whatever
storage method is chosen, the programmer should not need any knowledge of it: a
powerful and flexible API must provide a way to work with an RDF graph model
regardless of how the triples are stored.
3.1 Jena
The idea behind the Jena architecture is to provide multiple, flexible
presentations of RDF graphs to the programmer while using a simple method for
storing triples. The interface for storing triples contains three operations: add a
statement to the database, delete a statement from the database, and find
statements.
The most common implementation uses a relational database with a single table
(called a "triple store"). Each statement is a row with a column for the subject, a
column for the predicate and a column for the object.
For more information see [24, 25].
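The three-operation storage interface described above can be sketched as follows. This is only a minimal illustration under our own assumptions: an in-memory list stands in for the single statement table, and the class names Row and SimpleTripleStore, as well as the use of null as a wildcard in patterns, are hypothetical and not part of the Jena API.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of the single statement table: subject, predicate, object. */
final class Row {
    final String subject;
    final String predicate;
    final String object;

    Row(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical in-memory stand-in for a one-table triple store. */
final class SimpleTripleStore {
    private final List<Row> rows = new ArrayList<>();

    /** Adds a statement; returns false if the same row is already stored. */
    boolean add(String s, String p, String o) {
        if (!find(s, p, o).isEmpty()) {
            return false;
        }
        rows.add(new Row(s, p, o));
        return true;
    }

    /** Deletes every row matching the pattern; null acts as a wildcard. */
    int delete(String s, String p, String o) {
        List<Row> hits = find(s, p, o);
        rows.removeAll(hits);
        return hits.size();
    }

    /** Finds all rows matching the pattern; null acts as a wildcard. */
    List<Row> find(String s, String p, String o) {
        List<Row> hits = new ArrayList<>();
        for (Row r : rows) {
            if (matches(s, r.subject) && matches(p, r.predicate) && matches(o, r.object)) {
                hits.add(r);
            }
        }
        return hits;
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

With such a store, find("urn:JaneAusten", null, null) would return every statement whose subject is Jane Austen. A relational implementation would express the same pattern as a query over the subject, predicate and object columns.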
3.2 JRDF
JRDF builds on some existing libraries for storing and manipulating the triples
[23]. Its implementation incorporates the following libraries to obtain the best
results: Jena, Aquamarine, Sesame, and Sergey Melnik's RDF API.
JRDF uses N-Triples files to store the information for all graphs, and its
architecture is based on an interface providing methods for adding and removing
graphs. Each graph adds three statements to the file (name, id, type); based on
this information, paths can be created to another file containing the triples of
the requested graph.
3.3 Sesame
For storage Sesame uses a Database Management System (DBMS). To stay independent
of any particular DBMS, all DBMS-specific code is concentrated in a single
architectural layer called the Storage And Inference Layer (SAIL). SAIL is an API
that translates client RDF methods into DBMS-specific methods.
The (simplified) architecture of Sesame is shown in the following picture, taken
from [13]:
The query module processes a query in two steps:
1. Parse the query and obtain a query model;
2. Optimize the query model and obtain an optimized model.
For more information see [13].
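The layering idea behind SAIL can be illustrated with a small sketch. The interface and class names below (StorageLayerSketch, InMemoryStorageSketch, RepositoryClientSketch) are hypothetical and chosen only for this example; they are not Sesame's actual SAIL interfaces. The sketch merely assumes that client code is written against one storage contract, which different backends can implement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical storage-layer contract; not Sesame's actual SAIL interface. */
interface StorageLayerSketch {
    void addStatement(String subject, String predicate, String object);
    List<String[]> getStatements();
}

/** One possible backend; a DBMS-backed class could implement the same contract. */
final class InMemoryStorageSketch implements StorageLayerSketch {
    private final List<String[]> statements = new ArrayList<>();

    @Override
    public void addStatement(String subject, String predicate, String object) {
        statements.add(new String[] {subject, predicate, object});
    }

    @Override
    public List<String[]> getStatements() {
        return new ArrayList<>(statements);   // defensive copy
    }
}

/** Client code talks only to the layer, never to a concrete backend. */
final class RepositoryClientSketch {
    private final StorageLayerSketch storage;

    RepositoryClientSketch(StorageLayerSketch storage) {
        this.storage = storage;
    }

    /** Stores the given triples and returns the total number of stored statements. */
    int load(String[][] triples) {
        for (String[] t : triples) {
            storage.addStatement(t[0], t[1], t[2]);
        }
        return storage.getStatements().size();
    }
}
```

Swapping InMemoryStorageSketch for a class that issues SQL statements would not require any change to RepositoryClientSketch, which is the kind of DBMS independence the layer is meant to provide.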
4. Working with Triples
In this section we take a look at the classes and methods that these APIs provide
for working with triples (add, remove, iterate). We provide relevant source
code snippets for a better understanding.
4.1 Jena
The first step in working with triples in Jena is to create a model. This is done using
the ModelFactory.
protected Model _model;
_model = ModelFactory.createDefaultModel();
To add a new triple you first have to create a Resource and then set its property.
Resource resource =
    _model.createResource("www.myOrg.org/JaneAusten");
resource.addProperty(VCARD.FN, "Jane Austen");
VCARD.FN is a constant of type Property.
To remove a property call one of the remove methods:
resource.removeAll(VCARD.FN);
To iterate through the triples of a model the user has to list the statements and iterate
through them:
public void iterate() {
    StmtIterator iter = _model.listStatements();
    while (iter.hasNext()) {
        Statement st = iter.nextStatement();
        // Process the triple/statement
    }
}
4.2 JRDF
The first step in working with triples in JRDF is to create a factory, then a graph
from that factory, and finally an element factory from the graph. This is done
using the JRDFFactory and GraphElementFactory.
protected GraphElementFactory elementFactory;
protected Graph graph;
protected JRDFFactory jrdfFactory;
jrdfFactory = SortedMemoryJRDFFactory.getFactory();
graph = jrdfFactory.getNewGraph();
elementFactory = graph.getElementFactory();
To add a new triple you first have to create the corresponding resources and then add
them:
URIReference s = elementFactory.createURIReference(
    URI.create("urn:JaneAusten"));
URIReference p = elementFactory.createURIReference(
    URI.create("urn:wrote"));
URIReference o = elementFactory.createURIReference(
    URI.create("urn:PrideAndPrejudice"));
graph.add(s, p, o);
To iterate through the triples of a graph the user has to find all the triples and iterate
through them:
public void iterate() {
    ClosableIterator<Triple> triples =
        graph.find(AnySubjectNode.ANY_SUBJECT,
            AnyPredicateNode.ANY_PREDICATE, AnyObjectNode.ANY_OBJECT);
    while (triples.hasNext()) {
        Triple t = triples.next();
        // Process the triple
    }
}
4.3 Sesame
The first step in working with triples in Sesame is to create a value factory and a
graph.
protected Graph _graph;
protected ValueFactory _factory;
_graph = new GraphImpl();
_factory = _graph.getValueFactory();
To add a new triple you first have to create the corresponding values (URIs and
Literals) and then add them:
URI s = _factory.createURI("urn:", "JaneAusten");
URI p = _factory.createURI("urn:", "wrote");
Literal o =
    _factory.createLiteral("PrideAndPrejudice");
_graph.add(s, p, o);
To iterate through the triples of a graph the user has to call the iterator() method:
public void iterate() {
Iterator iter = _graph.iterator();
while (iter.hasNext()) {
Object obj = iter.next();
//Process the triple
}
}
5. Programmer support
One of the most important aspects when considering a framework or a library is
the programmer support that it offers. In this section we analyze how well these
frameworks help users achieve their goals easily and rapidly. We look at aspects
such as documentation, integration and usability.
5.1 Documentation
All three APIs come with documentation meant to provide information about the
available classes and methods, as well as how to use them.
Jena offers tutorials about the Jena API and SPARQL, Javadocs (Jena API and
SPARQL), persistence systems documentation (SDB for RDF and TDB for OWL;
RDB documentation is also available, but this system is deprecated), Ontology API
documentation and some other information. Out of the three, it is the best documented
framework (content-wise and structure-wise). For more information consult [8].
JRDF offers a few wiki pages to help the user get started with the library,
containing a few examples about the JRDF API and SPARQL and some other
information, as well as Javadocs. Out of the three it has the least organized
documentation, with too little information and too few tutorials on how to use the
API. For more information consult [10].
Sesame offers a user manual, a user installation guide, a SeRQL manual, an RQL
tutorial, Javadocs, user and system documentation and some other information. For
more information see [12].
5.2 Integration
All three frameworks are easy to integrate with an application. We need to download
the specific libraries (.jar files) that are provided on the home web sites of each
framework and add them to our project.
5.3 Usability
In general all three APIs are fairly easy to use for basic operations, but thanks
to its simple and well-documented API, Jena is the easiest one to learn and use.
The hardest API to use is JRDF, due to its lack of consistent documentation.
6. SPARQL Interrogations
It is important for an RDF library to provide an easy way to run queries. There
may be many query languages, and every library might have its own internal way of
querying, but one must not forget the purpose of RDF: standard semantics across the
web for the description of resources. That is why SPARQL, the most widely used
query language for RDF and a W3C recommendation [19], should be fully supported.
In this section we take a look at how the three APIs deal with SPARQL
interrogations.
6.1 Jena
Jena offers full support and an easy API for SPARQL. With the RDF document held in
the _model variable, a query given as a string can be executed easily:
Query query = QueryFactory.create(queryString);
QueryExecution qe = QueryExecutionFactory.create
(query, _model);
ResultSet results = qe.execSelect();
For more information see [7].
6.2 JRDF
Using SPARQL with JRDF is also easy, though it might seem harder to understand
the logic behind the API. The document is contained in the graph variable, but you
need a connection to a SPARQL engine:
Answer answer = jrdfFactory.getNewSparqlConnection()
    .executeQuery(graph, queryString);
For more information see [10, 22].
6.3 Sesame
According to the official documentation, Sesame does not offer any support for
SPARQL or any query language other than RQL and SeRQL [20, 21].
7. Performance
In this section we analyze the three frameworks from the performance point of
view. For this we performed the following tests:
- we randomly generated 100 000 triples and added them to the graph in order to
build an RDF document for each library;
- we iterated through every triple in the graph;
- we analyzed the creation/add and iteration speed, memory allocation and
SPARQL query speed.
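The shape of such a test can be sketched as follows. This is a minimal harness under our own assumptions, not the code used for the measurements reported here: a plain list stands in for a library's graph, the class name TripleBenchmarkSketch and the "urn:" naming scheme are hypothetical, and the memory figure is only a rough estimate because it depends on when garbage collection runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical timing harness; a plain list stands in for a library graph. */
final class TripleBenchmarkSketch {

    /** Builds n random subject/predicate/object string triples. */
    static List<String[]> generate(int n, long seed) {
        Random random = new Random(seed);
        List<String[]> triples = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            triples.add(new String[] {
                "urn:s" + random.nextInt(n),
                "urn:p" + random.nextInt(100),
                "urn:o" + random.nextInt(n)
            });
        }
        return triples;
    }

    /** Visits every triple once and returns how many were seen. */
    static int iterate(List<String[]> graph) {
        int seen = 0;
        for (String[] triple : graph) {
            if (triple.length == 3) {
                seen++;
            }
        }
        return seen;
    }

    /** Rough heap usage in kilobytes; GC timing makes this approximate. */
    static long usedMemoryKb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / 1024;
    }

    public static void main(String[] args) {
        long memBefore = usedMemoryKb();
        long t0 = System.nanoTime();
        List<String[]> graph = generate(100_000, 42L);   // creation/add phase
        long t1 = System.nanoTime();
        int seen = iterate(graph);                        // iteration phase
        long t2 = System.nanoTime();
        long memAfter = usedMemoryKb();

        System.out.println("add:     " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("iterate: " + (t2 - t1) / 1_000_000 + " ms (" + seen + " triples)");
        System.out.println("memory:  " + (memAfter - memBefore) + " KB (approximate)");
    }
}
```

To compare libraries, the generate and iterate bodies would be replaced by the add and iteration calls of each API shown in Section 4. Running several warm-up rounds before timing would reduce the influence of JIT compilation on the results.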
7.1 Processing/Adding Triples Speed
After running our tests we obtained the following results:
Jena:   596 ms
JRDF:   2835 ms
Sesame: 841 ms
Jena and Sesame are close, with Jena somewhat faster; JRDF is more than three times slower than either.
7.2 Memory Consumption
After running our tests we obtained the following results:
Jena memory usage:   5812 KB
JRDF memory usage:   128732 KB
Sesame memory usage: 15748 KB
Again Jena does best, this time using almost 3 times less memory than Sesame. JRDF
uses far more: about 22 times as much as Jena and about 8 times as much as Sesame.
So JRDF is a lot slower and uses a lot more memory.
7.3 Interrogation Efficiency
After running our tests we obtained the following results:
Jena : 653 ms
JRDF : 261 ms
This time the extra memory usage pays off: JRDF uses about 22 times more memory
than Jena but answers the query 2.5 times faster. With small documents and many
queries to execute in parallel this might be a good trade-off. Sesame has no SPARQL
support, so there are no results for it.
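The ratios quoted in Sections 7.2 and 7.3 can be recomputed directly from the reported measurements. The short sketch below does only that arithmetic; the class name BenchmarkRatios is our own, and the input values are the figures listed above.

```java
import java.util.Locale;

/** Recomputes the ratios from the measurements reported in Sections 7.2 and 7.3. */
final class BenchmarkRatios {

    /** Returns how many times larger a is than b. */
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        double jenaKb = 5812, jrdfKb = 128732, sesameKb = 15748;   // Section 7.2
        double jenaQueryMs = 653, jrdfQueryMs = 261;               // Section 7.3

        System.out.printf(Locale.ROOT, "Sesame / Jena memory: %.1fx%n", ratio(sesameKb, jenaKb));
        System.out.printf(Locale.ROOT, "JRDF / Jena memory:   %.1fx%n", ratio(jrdfKb, jenaKb));
        System.out.printf(Locale.ROOT, "JRDF / Sesame memory: %.1fx%n", ratio(jrdfKb, sesameKb));
        System.out.printf(Locale.ROOT, "Jena / JRDF query:    %.1fx%n", ratio(jenaQueryMs, jrdfQueryMs));
    }
}
```

The four ratios come out at roughly 2.7, 22.1, 8.2 and 2.5: JRDF spends about 22 times the memory of Jena to answer the query about 2.5 times faster.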
7.4 Iteration speed
After running our tests we obtained the following results:
Jena : 81 ms
JRDF : 2622 ms
Sesame: 9 ms
Sesame is the fastest this time. Sesame and Jena are really fast for the 100000 triples
that they have to iterate through. With more than 2 seconds latency JRDF is much
slower.
8 Licensing
In this section we will take a look at the licensing terms that are used to distribute
these APIs.
Neither the name of the copyright holder nor the names of its contributors may be
used to endorse or promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.
9 Conclusions
This article was meant to provide a comparative look at three of the most important
RDF processing APIs for the JAVA Platform: Jena, JRDF and Sesame. We looked at the
provided APIs and studied how the triples are stored in each case and how to work
with them; we compared the programmer support the APIs offer and examined some
performance aspects (speed, memory, interrogation efficiency). Support for SPARQL
queries was another element we analyzed. Last, we presented the licensing terms
under which the APIs are distributed. We hope that readers find useful information
about these APIs and that this article provides sufficient information to help them
decide which API would be best to use.
References
[1] http://en.wikipedia.org/wiki/Semantic_Web
[2] http://www.w3schools.com/semweb/
[3] http://semanticweb.org/wiki/Main_Page
[4] http://en.wikipedia.org/wiki/Resource_Description_Framework
[5] http://www.w3.org/RDF/
[6] http://dublincore.org/
[7] http://jena.sourceforge.net/index.html
[8] http://jena.sourceforge.net/documentation.html
[9] http://jrdf.sourceforge.net/index.html
[10] http://jrdf.sourceforge.net/documentation.html
[11] http://www.openrdf.org/index.jsp
[12] http://www.openrdf.org/documentation.jsp
[13] Jeen Broekstra, Arjohn Kampman, Frank van Harmelen, “Sesame: A Generic Architecture
for Storing and Querying RDF and RDF Schema”
(http://www.openrdf.org/doc/papers/Sesame-ISWC2002.pdf)
[14] http://jena.sourceforge.net/license.html
[15] http://www.apache.org/licenses/LICENSE
[16] http://www.gnu.org/licenses/lgpl-2.1.html
[17] http://www.opensource.org/licenses/bsd-license.php
[18] http://www.openrdf.org/license.jsp
[19] http://www.w3.org/TR/rdf-sparql-query/
[20] http://www.openrdf.org/doc/sesame/users/ch06.html
[21] http://www.openrdf.org/doc/rql-tutorial.html
[22] http://code.google.com/p/jrdf/wiki/RelationalSPARQLOperations
[23] http://docs.mulgara.org/system/jrdf.html
[24] http://jena.sourceforge.net/DB/layout.html
[25] E. Bodéré P.Y. Clément, A. Genoud M. Le Trocquer, M. Moras L. Pochard, V. Ribaud Ph.
Saliou, “INTEGRATING THE JENA RDF API WITHIN SAKAI : TOWARDS A
SEMANTIC COLLABORATING LEARNING ENVIRONMENT”