Employing Graph Databases as a Standardization Model towards Addressing Heterogeneity
1. Employing Graph Databases as a
Standardization Model towards
Addressing Heterogeneity
Dippy Aggarwal and Karen C. Davis
University of Cincinnati
Cincinnati, Ohio
IEEE 17th International Conference on
Information Reuse and Integration
July 28-30, 2016, Pittsburgh, USA
2. Agenda
Motivation and Challenge
Our Proposed Approach
Results and Future Work
A Short Example
Architecture
Novelty
3. Integration of data from multiple sources lays the foundation for building
rich and effective analytics systems.
Schema heterogeneity has been perceived as a major
challenge to data integration and exchange for more
than two decades.
4. Proliferation in data models
Relational databases: de facto standard for decades
RDF databases: standard for linked data
NoSQL family of data models
“Map/Reduce is a great hammer but not everything is a nail” –
Benjamin Hindman (Co-Founder and Chief Architect at Mesosphere)
F. Özcan, N. Tatbul, D. J. Abadi, M. Kornacker, C. Mohan, K. Ramasamy, and J. Wiener. Are we experiencing a big data bubble? In
Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, pages 1407–1408, New York, NY,
USA, 2014. ACM.
Our vision: It would be useful to have
an approach that allows leveraging both schema-based and
schemaless data stores.
+ NoSQL
5. Our research question
Given the unique advantages possessed by
different classes of data stores, how can we bring
them together under a homogeneous
representation?
Image Credits: http://www.slideshare.net/jexp/intro-to-neo4j-presentation
7. Why graphs?
1. A simple and flexible abstraction for modeling artifacts of different kinds
Facebook Open Graph
Trends in databases
2. Attracting significant attention and interest in the past few years
8. Leveraging Neo4j for graph
implementation
Nodes and
relationships can
have properties
(key-value pairs)
Image Credits: “Exploiting RDF Open Data Using NoSQL Graph Databases” – R. Bouhali and A. Laurent
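The property graph idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming a single label per node for simplicity; the class names, the sample people, and the `KNOWS` relationship are hypothetical and are not taken from the slide's figure or from Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)  # key-value pairs


# Hypothetical example data, not taken from the slides' figures.
alice = Node(1, "Person", {"name": "Alice", "gender": "female"})
bob = Node(2, "Person", {"name": "Bob"})
knows = Relationship(alice, bob, "KNOWS", {"since": 2014})
```

Both nodes and relationships carry their own property dictionaries, which is the feature the later slides rely on when mapping relational and RDF schemas into one graph.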
9. Example of schema and data
model heterogeneity
Relational
schema excerpt
RDF excerpt
10. Addressing schema heterogeneity challenge
Relational schema excerpt
Neo4j
representation
Key-value
properties for
a node –
Jason Doe
11. Graph Representation for the
RDF Schema Excerpt
What is the additional merit that the common graph representation offers
compared to the knowledge that could have been derived from the native
model representations?
Name, homepage,
gender, birthday etc.
12. Advantage of graph model towards unification
By unifying nodes from the two schemas on common attributes such as date of birth or
SkypeId, each node can benefit by incorporating information from the
other schema.
Maps_With
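One way to picture this unification step is the sketch below. It is an assumption-laden illustration, not the authors' implementation: the function name, the dictionary layout, the node ids, and the property names `skypeId` and `dateOfBirth` are all hypothetical. Only the `Maps_With` relationship name comes from the slide.

```python
def link_on_shared_attributes(nodes_a, nodes_b, keys):
    """Return (id_a, "Maps_With", id_b, matched_keys) for nodes that agree on a key."""
    links = []
    for a in nodes_a:
        for b in nodes_b:
            # A key matches only if both nodes have it and the values are equal.
            matched = [
                k for k in keys
                if k in a["props"] and k in b["props"]
                and a["props"][k] == b["props"][k]
            ]
            if matched:
                links.append((a["id"], "Maps_With", b["id"], matched))
    return links


# Hypothetical nodes: one derived from the relational schema, one from RDF.
relational_nodes = [
    {"id": "rel:1", "props": {"name": "Jason Doe", "skypeId": "jdoe"}},
]
rdf_nodes = [
    {"id": "rdf:1", "props": {"homepage": "http://example.org/jdoe", "skypeId": "jdoe"}},
]

links = link_on_shared_attributes(relational_nodes, rdf_nodes, ["skypeId", "dateOfBirth"])
```

Once the two nodes are linked, a query that starts at the relational node can follow `Maps_With` to reach the homepage stored only on the RDF side, while each node keeps the concepts of its native model.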
13. “Exploiting RDF Open Data Using NoSQL Graph
Databases” – R. Bouhali and A. Laurent
R. Bouhali and A. Laurent. Artificial Intelligence Applications and Innovations: 11th IFIP WG 12.5 International Conference, AIAI 2015,
Bayonne, France, September 14-17, 2015, Proceedings, Exploiting RDF Open Data Using NoSQL Graph Databases, pages 177–190.
Springer International Publishing, Cham, 2015.
Data expressed in RDF
RDF mapped to a property graph
Limitations: the work focuses on converting only RDF data into a graph model, whereas we envision
an extensible approach that embraces model diversity by allowing multiple models.
Novelty of our model: it preserves the concepts of each native model.
14. Architecture of our approach
Employs our
transformation
rules.
Export user-defined
relational schemas in
CSV format
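The slides do not list the transformation rules or the CSV layout, so the following is only a sketch under stated assumptions: a hypothetical CSV with columns `table`, `column`, `data_type`, and `references`, and two assumed rules (each table becomes a node whose properties are its columns; each foreign key becomes a `REFERENCES` relationship). The sample rows are a simplified excerpt modeled on Sakila table names.

```python
import csv
import io

# Hypothetical schema export; the column layout is an assumption.
SCHEMA_CSV = """table,column,data_type,references
actor,actor_id,smallint,
actor,first_name,varchar,
film,film_id,smallint,
film_actor,actor_id,smallint,actor
film_actor,film_id,smallint,film
"""


def schema_csv_to_graph(text):
    """Turn a schema CSV into table nodes and foreign-key relationships."""
    nodes = {}          # table name -> {"label": ..., "properties": {column: type}}
    relationships = []  # (source table, "REFERENCES", target table, properties)
    for row in csv.DictReader(io.StringIO(text)):
        node = nodes.setdefault(row["table"], {"label": "Table", "properties": {}})
        node["properties"][row["column"]] = row["data_type"]
        if row["references"]:  # an empty field means the column is not a foreign key
            relationships.append(
                (row["table"], "REFERENCES", row["references"], {"column": row["column"]})
            )
    return nodes, relationships


nodes, relationships = schema_csv_to_graph(SCHEMA_CSV)
```

Keeping columns as node properties rather than as separate nodes is a design choice; it keeps the graph small but yields fewer relationships per node.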
15. Evaluation
Evaluation metrics (proposed by
Bouhali and Laurent)
Conciseness: the total number of nodes and
relationships, which gives the
graph size.
Connectivity: the number of relationships
divided by the total number of
nodes.
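The two metrics are simple enough to state as code. This is a minimal sketch following the definitions on this slide; the function names and the counts in the example are hypothetical and are not the values measured in the evaluation.

```python
def conciseness(num_nodes, num_relationships):
    """Graph size: total number of nodes and relationships."""
    return num_nodes + num_relationships


def connectivity(num_nodes, num_relationships):
    """Number of relationships divided by the number of nodes."""
    if num_nodes == 0:
        raise ValueError("connectivity is undefined for a graph with no nodes")
    return num_relationships / num_nodes


# Hypothetical counts for illustration only.
size = conciseness(num_nodes=10, num_relationships=5)    # 15
ratio = connectivity(num_nodes=10, num_relationships=5)  # 0.5
```

Adding nodes without adding relationships raises the first value and lowers the second, which is why the two metrics have to be read together.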
Sakila database in MySQL: https://dev.mysql.com/doc/sakila/en/
Bouhali and Laurent: connectivity should be at least 1.5
Our results show a value (0.32) lower than this benchmark. Why?
16. Evaluation - trade-off between
conciseness and connectivity
Modeling
attributes
as nodes
Increased
conciseness
17. Evaluation metrics - trade-off between
conciseness and connectivity
Conclusions:
• The connectivity depends on the nature of the original model.
• A higher connectivity may come at the cost of an increase in the graph size.
Strong connectivity between nodes in a graph is good for processing, but
a lower connectivity value is not necessarily
undesirable.
18. Contributions
• An idea of employing graph databases as a means of
bridging the gap between schema-based and schemaless
data stores.
• A concept-preserving yet integrated graph model that
addresses model heterogeneity and carries the
potential for handling the variety dimension of the big data
landscape.
• A proof-of-concept that illustrates the potential of
graph-based solutions towards addressing diversity in
data representations.
• A software-oriented, automated approach to transform
a relational database into a graph database.
19. The Path Forward
1. Extend our work by incorporating additional data
stores and illustrating integration.
2. Incorporate an evaluation study of the transformation
process to address the efficiency of the approach.
3. A performance study of querying an integrated graph
schema versus disconnected original native schemas is
another research direction.
4. The idea of reverse engineering the graph model to
obtain the schemas in the original models can also be
useful.
20. Selected References
• P. Atzeni, P. Cappellari, and P. A. Bernstein. Modelgen: Model
independent schema translation. In Data Engineering, 2005. ICDE
2005. Proceedings. 21st International Conference on, pages 1111–
1112. IEEE, 2005.
• R. Bouhali and A. Laurent. Artificial Intelligence Applications and
Innovations: 11th IFIP WG 12.5 International Conference, AIAI 2015,
Bayonne, France, September 14-17, 2015, Proceedings, chapter
Exploiting RDF Open Data Using NoSQL Graph Databases, pages
177–190. Springer International Publishing, Cham, 2015.
• S. Bowers and L. Delcambre. The uni-level description: A uniform
framework for representing information in multiple data models. In
Conceptual Modeling-ER 2003, pages 45–58. Springer, 2003.
21. References (Image Credits)
• Facebook Open Graph
http://www.nanigans.com/2012/02/03/10-facebook-open-graph-apps-actions/
• Data Integration (Slide 3)
http://www.dbta.com/BigDataQuarterly/Articles/The-New-Newly-Democratized-Data-Integration-109144.aspx
• Trends in databases
https://www.linkedin.com/pulse/future-decentralized-data-processing-architecture-raunak-jhawar
https://www.google.com/trends/