OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
Short Report Bridges performance gap between Relational and RDF
Bridge Performance Gap Between Relational and RDF
Muhammad Akram Abbasi
Computer Science department, Szabist
Karachi, Pakistan
asslam_alikum2002@yahoo.com

Dr. Syed Saif-ur-Rahman
Computer Science department, Szabist
Karachi, Pakistan
saif.rahman@szabist.edu.pk
ABSTRACT
An interesting question is how to get the best and most appropriate
result when querying the web. On the web of documents, published
HREF links are not understood by search engines, even with their
advanced options; they only help to find pages, as an alternative to
browsing by navigation. The integrated, syntactic web of data works
under a closed-world assumption and holds a very large amount of
unstructured data that is linked without explicit meaning. This paper
outlines the two kinds of web of information, the syntactic and the
semantic, and explains why a semantic description is necessary and
feasible. It gives a brief introduction to RDF, reviews recent
description logic research, examines recursive (drill-up or
drill-down) queries, and elaborates the native RDF query languages
together with semantic models. It therefore covers not only the
drawbacks of the syntactic web of structured and semi-structured
data but also the important aspects of the RDF model and RDF
notation.
Keywords: RDF, Relational, Semantic Web,
Syntactic Web, Jena, Virtuoso, URI, SPARQL,
XML, un-structured, linked data.
1. INTRODUCTION
The objective of this research is to take the same data set(s),
transform them into both RDF and SQL form, and run the same queries
in SPARQL and SQL in order to compare throughput and response time
as the data size grows, and so to measure the performance gap. This
paper outlines the two kinds of web of information, the syntactic
and the semantic, explains why a semantic description is necessary
and feasible, and gives a brief introduction to what RDF is and what
it is good for. It covers the basic concepts of resources,
properties, values, statements (triples), URIs and URI references,
and the serializations of an RDF graph, and it addresses the
performance gap between relational and RDF data management. It also
describes how linked data connects resources and real-world objects,
what an ontology (a vocabulary for the semantic web) is, and how the
stacks of the semantic and the syntactic web of data map onto each
other.
Several stores can hold RDF data (Jena, Sesame, RDFDB, Redland,
Kowari, FORTH RDF Suite, YARS, Virtuoso). Here only Jena and Virtuoso
are configured and used for SPARQL queries. Among the relational
databases (SQL, Oracle, SQLite, MySQL), only MySQL is configured for
query checking.
The Methodology section takes the same data set, of up to about 100M
in size, converts it for use with SPARQL and with SQL, stores the
data, and measures throughput on both. After throughput, we measure
response time across data sizes, running the same queries on both
sides, with indexes added both to the RDF data for the SPARQL
queries (Jena, Virtuoso) and to the relational data for the SQL
queries (MySQL).
2. LITERATURE REVIEW
2.1 Semantic Web Concept
Alongside the conventional web of documents (pages), the World Wide
Web Consortium (W3C) is helping to build a technology stack to
support a web of data. Tim Berners-Lee named this vision the
Semantic Web, and the W3C describes it as a vision of linked data.
Semantics is concerned with the meanings of words. Statements are
built with syntax rules, and on the Semantic Web links express
relationships between data, things, and resources rather than
between pages. Such relationships hold between things (for example,
C is a part of B and X is a part of Z), and things have properties
such as size and weight.
2.2 RDF Concepts with distinct perceptions
RDF was originally created and standardized in 1999, on top of XML,
for encoding metadata (data about data). With the revised RDF
specification of 2004, its scope broadened considerably. The most
interesting uses of RDF today are not just encoding information but
expressing relations between things: between web resources,
real-world objects, concepts, places, and so on.
2.3 Key concepts of RDF
The key concepts used by RDF are:
Graph data model
URI-based vocabulary
Data types
Literals
XML serialization syntax
Expression of simple facts
Entailment
2.3.1 Graph Data Model
An RDF graph is a collection of triples, each consisting of a
subject, a predicate, and an object. It can be drawn as a diagram of
nodes and directed arcs in which each triple is a node-arc-node
link. The graph asserts the conjunction (logical AND) of the
statements made by all the triples it contains.
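As a minimal sketch of this model, assuming made-up example identifiers (ex:jason, ex:knows, and so on) and plain Python tuples rather than any RDF library, a graph can be held as a set of (subject, predicate, object) triples and queried by pattern:

```python
# A tiny RDF-style graph as a set of (subject, predicate, object) triples.
# All identifiers below are illustrative assumptions, not from the paper.
graph = {
    ("ex:jason", "ex:name", "Jason Muxlow"),
    ("ex:peter", "ex:name", "Peter Hernandez"),
    ("ex:jason", "ex:knows", "ex:peter"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Every triple is one assertion; the graph is the conjunction of all of them.
known_by_jason = [o for _, _, o in match(graph, s="ex:jason", p="ex:knows")]
print(known_by_jason)
```

Using a set mirrors the fact that an RDF graph has no duplicate triples and no inherent order.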
2.3.2 URI-based vocabulary
RDF uses URIs (Uniform Resource Identifiers) to identify things on
the web. RDF itself is conceptually a model of triples that can be
written in several notations; it is not a syntax. A URL (Uniform
Resource Locator) such as (http://www.dbpedia.org) is the familiar
case: every URL is a URI, but not every URI is a URL. The question is
how systems, through a web client agent, identify things by their
URIs.
2.3.3 Data types
A datatype describes how data such as floating-point numbers,
integers, and dates are represented. It consists of a lexical space,
a value space, and a lexical-to-value mapping.
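The three parts of a datatype can be illustrated with a small sketch. The table name LEXICAL_TO_VALUE and the helper to_value are hypothetical, and the "xsd:" labels are used only as dictionary keys here; Python's built-in parsers stand in for the lexical-to-value mappings:

```python
from datetime import date

# Hypothetical datatype table: each entry is a lexical-to-value mapping.
# The "xsd:" names are used here only as labels.
LEXICAL_TO_VALUE = {
    "xsd:integer": int,
    "xsd:float": float,
    "xsd:date": date.fromisoformat,
}


def to_value(lexical, datatype):
    """Map a lexical form (a string) into the datatype's value space."""
    return LEXICAL_TO_VALUE[datatype](lexical)


# Two different lexical forms can denote the same value.
same_value = to_value("042", "xsd:integer") == to_value("42", "xsd:integer")
release = to_value("2004-02-10", "xsd:date")
print(same_value, release.year)
```

The strings form the lexical space, the Python objects returned form the value space, and each function in the table is the mapping between them.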
2.3.5 Simple Facts of RDF Expression
An RDF triple expresses a relationship between two things. A blank
node may also carry an rdf:type property.
Figure 1. Facts of RDF Expression.
2.3.6 Entailment
Entailment is defined formally as follows: an expression A is said
to entail another expression B if every possible arrangement of
things in the domain that makes A true also makes B true. So if A is
presumed to be true, the truth of B can be inferred.
For example, further triples can be added to the RDF graph in
Figure 1 by inference.
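A hedged sketch of how entailment adds triples to a graph: it assumes one example inference rule, the RDFS subclass rule (if x has type C and C is a subclass of D, then x has type D), and made-up identifiers. It is not the inference procedure of Jena or Virtuoso.

```python
# Hypothetical triples; "rdf:type" and "rdfs:subClassOf" are plain labels here.
graph = {
    ("ex:jason", "rdf:type", "ex:Student"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
}


def entail_types(graph):
    """Add (x rdf:type D) whenever (x rdf:type C) and (C rdfs:subClassOf D)."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for x, p1, c in list(inferred):
            if p1 != "rdf:type":
                continue
            for c2, p2, d in list(inferred):
                if p2 == "rdfs:subClassOf" and c2 == c:
                    new = (x, "rdf:type", d)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closure = entail_types(graph)
new_triples = sorted(closure - graph)
print(new_triples)
```

The original graph plays the role of A, and each derived triple is a B whose truth follows from A.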
2.4 OWL - Your Web Thesaurus
On the semantic web, OWL is used to give a richer description of a
vocabulary. It adds vocabulary for describing classes and the
relations between them (for example, disjointness), cardinality (for
example, exactly one), equality, characteristics of properties (for
example, symmetry), enumerated classes, and richer typing of
properties.
2.5 Comparing RDF and SQL data
We first compare the structure of SQL queries with that of RDF
queries, but before that we clarify the terminology. Both languages
let the user create, combine, and consume structured data. SQL does
this by accessing tables in relational databases; RDF does it, using
SPARQL, by accessing a network of linked data, which can be merged
from disparate data sources. Unlike semantic web data, relational
data is made up of rows (tuples) composed into tables, which RDBMS
terminology calls relations. Rows conform to a set of data types and
constraints defined by the schema of the respective table, and the
subset of SQL called DDL asserts that schema. The following example
shows how this works in SQL.
2.6 Structure of SPARQL and SQL Queries.
Table 1. Structure of SPARQL and SQL Queries.

Simple SELECT with attribute list

SQL:
    SELECT u.father_name, a.city
    FROM USERS AS u, address AS a
    WHERE u.address = a.ID AND a.state = 'CHICAGO';

SPARQL:
    SELECT ?name ?city
    WHERE {
      ?who <USERS#father_name> ?name ;
           <USERS#address> ?adrr .
      ?adrr <Address#city> ?city ;
            <Address#state> "CHICAGO"
    }

LEFT OUTER JOIN

SQL:
    SELECT u.father_name, a.city
    FROM USERS AS u
    LEFT OUTER JOIN Address AS a ON (u.addr = a.ID)
    WHERE a.state = 'Chicago';

SPARQL:
    SELECT ?name ?city
    WHERE {
      ?who <Person#father_name> ?name .
      OPTIONAL {
        ?who <Person#addr> ?adr .
        ?adr <Address#city> ?city ;
             <Address#state> "Chicago"
      }
    }

Result of the SQL query:
    father_name     state      city
    Jason Muxlow    CHICAGO    USA
    Peter           Chicago    NULL

Result of the SPARQL query:
    ?father_name    ?state     ?city
    Jason Muxlow    CHICAGO    USA
    Peter           Chicago
Table 1 shows that the SQL query uses the same SELECT keyword as
SPARQL. Conceptually, SQL selects a list of attributes from tables;
the WHERE clause captures the relationship between the tables
(u.address = a.ID) and the selection criterion, here a specific
state (a.state = 'CHICAGO').
An SQL query has a single terminator at the end of the query,
whereas SPARQL terminates the individual statements inside the
pattern. SQL qualifies attributes with a table alias and a dot
(u.father_name), while SPARQL marks variables with a question mark
(?name). SQL does not enclose names in tags, whereas SPARQL writes
identifiers between angle brackets. For better or worse, SPARQL
reuses several SQL keywords (SELECT, FROM, WHERE, GROUP BY, HAVING,
UNION) as well as the aggregate function names.
2.6.1 LEFT OUTER JOIN, OPTIONAL, and NULL
SQL uses NULL to indicate that data is not applicable or not
available. An INNER JOIN returns only rows that have a match in both
tables, so rows without a match are not retrieved. A LEFT OUTER JOIN
keeps every row of the left table and fills the columns of the right
table with NULL where there is no match; it does not eliminate those
rows. SPARQL uses the keyword OPTIONAL in place of SQL's LEFT OUTER
JOIN, and where data is missing it simply leaves the variable
unbound.
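The difference between the two joins can be reproduced with a small sketch. It assumes an in-memory SQLite database as a stand-in for MySQL; the table and column names follow Table 1, and the two sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; table and column names follow
# Table 1, and the sample rows are invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (ID INTEGER PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE USERS (father_name TEXT, addr INTEGER REFERENCES Address(ID));
    INSERT INTO Address VALUES (1, 'Chicago', 'IL');
    INSERT INTO USERS VALUES ('Jason Muxlow', 1), ('Peter', NULL);
""")

inner = conn.execute("""
    SELECT u.father_name, a.city
    FROM USERS AS u INNER JOIN Address AS a ON u.addr = a.ID
    ORDER BY u.father_name
""").fetchall()

left = conn.execute("""
    SELECT u.father_name, a.city
    FROM USERS AS u LEFT OUTER JOIN Address AS a ON u.addr = a.ID
    ORDER BY u.father_name
""").fetchall()

print(inner)  # only users with a matching address
print(left)   # every user; city is None (NULL) when no address matches
conn.close()
```

The row for the user without an address disappears from the inner join but survives the left outer join with a NULL city, which is the case a SPARQL OPTIONAL pattern leaves unbound.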
2.7 SQL - SPARQL Mapping using SPASQL
SQL is a language for querying relational data. SPARQL is not
designed to query relational data but to query data according to the
graph-based RDF data model. Links are built into RDF, whereas an SQL
query must state primary and foreign key relationships explicitly;
SPARQL follows them implicitly. Both SQL and SPARQL queries can be
tested with SPASQL, a third tool for checking the structure of
queries.
Table 2. SQL, SPASQL, Status.
SQL                                       SPASQL                          Status
Fields/attributes                         RDF triple
Row/tuple                                 Node
foreign key / primary key                 data encoding detail by query
indexes                                   late-binding field name
SELECT                                    SELECT                          implemented
SELECT COUNT(*) > 0                       ASK                             not implemented
serialize RDF graph/triple patterns       CONSTRUCT                       not implemented
serialize RDF graph                       CONSTRUCT                       not implemented
tuple with attribute corresponding to p   s p o data model                implemented
WHERE                                     FILTER                          implemented
LEFT OUTER JOIN                           OPTIONAL pattern                implemented
UNION                                     UNION                           partial, see UNION Limitations
named databases and federated query       named graphs                    not implemented
return tuple identifier                   DESCRIBE

Result Modifiers
DISTINCT                                  DISTINCT                        implemented
ORDER BY                                  ORDER BY/Groups                 implemented
LIMIT                                     LIMIT                           implemented
OFFSET                                    OFFSET                          implemented

Operators
same                                      || && + - * / < <= > >=         implemented
IS NOT NULL                               BOUND                           implemented
                                          isIRI                           N/A
                                          isBlank                         N/A
                                          isLiteral                       N/A
                                          Str                             N/A
                                          lang                            N/A
                                          datatype                        not a dynamic question
                                          langMatches                     N/A
regex                                     regex                           not implemented
3. METHODOLOGY
We first took some open-source data sets in Excel and XML format and
converted the data with the BSBM data generator [20], an open-source,
Java-based tool that generates data and supports several output
formats (N-Triples: -s nt, XML: -s xml, (My-)SQL dump: -s sql). The
collected data was limited to a size of 25M, and we needed more than
that for benchmarking, so we looked for and found free open-source
data sets of size 100M [20]. We then took 10 queries from the Berlin
SPARQL Benchmark [19] for the RDF triple store. Because the same
queries also had to be checked against MySQL, we converted all 10
queries into SQL.
We configured MySQL by manually setting upload_max_filesize to 700M,
post_max_size to 800M, max_execution_time to 700 s, max_input_time
to 600, and memory_limit to 200M. We configured Jena by adding its
bin/ directory to the path environment variable of the Windows
system and ran its commands in CLI mode.
We took the RDF-formatted data sets and ran the SPARQL queries
against them in both Jena and Virtuoso. We ran the smaller data sets
(50K, 250K, 1M, 5M, 25M), but as the data size grew Jena took more
and more time, and at 100M it did not respond. We therefore ran the
100M data set on Virtuoso, which gave better results than Jena on
large data.
For each data set size, starting with 50K, we measured the average
query execution time, running the same query on the same data set in
both Jena and MySQL and recording the statistics. We repeated this
for the 250K, 1M, 5M, and 25M data sets. At 25M, the phpMyAdmin
interface to MySQL on localhost stopped responding with an
execution-time-exceeded error. We then ran the same query in the
SQLyog interface, but it took too long, did not respond, and
appeared to time out. Run directly in the MySQL console, the same
query responded well, so we re-ran all MySQL queries through the
console, where the results were better than through the interfaces.
Finally, we loaded the 100M data set into MySQL and also measured
throughput. The data was so large that the import ran for a long
time and failed with an execution-time-exceeded error, so we divided
it into sections and imported them separately. We also created
indexes on product(producer), offer(product), offer(vendor),
review(product), and review(person), and then checked the query
results.
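The indexing and timing steps can be sketched as follows. This is an illustration under stated assumptions, not the setup used in the experiments: in-memory SQLite stands in for MySQL, the two tables are a reduced version of the product/offer schema with assumed column names, the rows are synthetic, and average_query_time is a hypothetical helper.

```python
import sqlite3
import time

# In-memory SQLite stands in for MySQL; the tables are a reduced, assumed
# version of the product/offer schema named in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (nr INTEGER PRIMARY KEY, producer INTEGER);
    CREATE TABLE offer (nr INTEGER PRIMARY KEY, product INTEGER, vendor INTEGER);
    CREATE INDEX idx_product_producer ON product(producer);
    CREATE INDEX idx_offer_product ON offer(product);
    CREATE INDEX idx_offer_vendor ON offer(vendor);
""")
# Synthetic rows: 1000 products over 10 producers, 5 offers per product.
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(i, i % 10) for i in range(1000)])
conn.executemany("INSERT INTO offer VALUES (?, ?, ?)",
                 [(i, i % 1000, i % 50) for i in range(5000)])


def average_query_time(conn, sql, params=(), runs=5):
    """Run the query `runs` times and return (row count, mean seconds)."""
    total = 0.0
    rows = []
    for _ in range(runs):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        total += time.perf_counter() - start
    return len(rows), total / runs


count, seconds = average_query_time(
    conn,
    "SELECT o.nr FROM offer AS o JOIN product AS p ON o.product = p.nr "
    "WHERE p.producer = ?",
    (3,),
)
print(count, f"{seconds:.6f}s")
```

Averaging over several runs smooths out caching effects of a single execution; the absolute numbers from such a sketch are not comparable to the results reported below.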
4. Normalized/Denormalized Schema of Jena
5. MAP CONVENTIONAL XHTML WITH
RDF
We now look at how RDF data can be expressed in XHTML (Extensible
Hypertext Markup Language), using the FOAF (Friend of a Friend)
vocabulary to describe relationships in the way a human reader
understands them.
Figure 4. RDF simulated with XHTML.
The following XHTML markup tells the browser how to interpret the
data:
<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <span about="#jason" typeof="foaf:Person"
        property="foaf:name">Jason Muxlow</span>
  <span about="#peter" typeof="foaf:Person"
        property="foaf:name">Peter Hernandez</span>
  <span about="#jason" rel="foaf:knows"
        resource="#peter">Knows</span>
</body>
5.1 Map conventional Html vs. RDF
RDF gives data a meaning, whereas HTML is made up of links between
pages or documents. RDF is designed to standardize the web of data,
in which data is linked to other data. Published HTML documents are
standardized as a set of tags that describe how a document should be
displayed; unlike an RDF description of the data, the tags do not
make the data in the document understandable to a machine.
5.2 Map conventional XML vs. RDF
RDF represents data as a graph data model that makes use of URIs,
whereas XML, made for data about data, has a tree data model and
does not depend on URIs.
6. RESULTS
MySQL dump data set size
Table 3. MySQL dump data set size.
100M      25M        5M          1M         250K       50K
3.2 GB    1.06 GB    212.4 MB    41.4 MB    10.3 MB    2.0 MB

Load time
Table 4. Load time in MySQL.
100M      25M        5M          1M         250K       50K
1129      213        49          17         7          0.9

N-Triples data set size
Table 5. N-Triples data set size.
100M      25M        5M          1M         250K       50K
5.1 GB    1.2 GB     249.8 MB    49.8 MB    12.4 MB    2.4 MB

Overall query execution time
Table 6. Query Execution Time of SPARQL.
100M      25M        5M          1M         250K       50K
5.1 GB    1.2 GB     249.8 MB    49.8 MB    12.4 MB    2.4 MB
We ran the query mixes against the different stores and recorded the
overall time (in seconds) for each. The best performance for each
data set size is highlighted in bold.
Table 7. Overall results.
Data set size    MySQL             Jena              Virtuoso
50K              66.590            23.540            162.040
250K             153.550           72.968            162.807
1M               484.534           268.004           201.3100
5M               2188.176          1406.690          476.8010
25M              not applicable    7623.962          2089.122
100M             not applicable    not applicable    906.683*
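As a small worked example over the figures in Table 7, the fastest store for each data set size can be read off programmatically. The values are copied from the table; the helper name fastest and the use of None for "not applicable" are choices made for this sketch.

```python
# Overall query-mix times in seconds, copied from Table 7.
# None marks the runs reported as "not applicable"; the 100M Virtuoso
# value carries an asterisk in the table.
results = {
    "50K":  {"MySQL": 66.590,   "Jena": 23.540,   "Virtuoso": 162.040},
    "250K": {"MySQL": 153.550,  "Jena": 72.968,   "Virtuoso": 162.807},
    "1M":   {"MySQL": 484.534,  "Jena": 268.004,  "Virtuoso": 201.310},
    "5M":   {"MySQL": 2188.176, "Jena": 1406.690, "Virtuoso": 476.801},
    "25M":  {"MySQL": None,     "Jena": 7623.962, "Virtuoso": 2089.122},
    "100M": {"MySQL": None,     "Jena": None,     "Virtuoso": 906.683},
}


def fastest(row):
    """Return the store with the smallest reported time in one table row."""
    timed = {store: t for store, t in row.items() if t is not None}
    return min(timed, key=timed.get)


winners = {size: fastest(row) for size, row in results.items()}
for size, store in winners.items():
    print(f"{size:>5}: {store}")
```

On these figures Jena has the lowest time for the two smallest data sets and Virtuoso from 1M upward.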
7. DISCUSSION
In the relational database (MySQL), storing large data sets
sometimes exceeded the execution time limit or was not applicable,
so we sliced the data into small chunks and imported them to measure
throughput. When querying the large data sets with joins, MySQL
could not retrieve the data and sometimes returned an error, so the
result was not applicable,
although we indexed on primary key(s) columns but