Employing Graph Databases as a Standardization Model towards Addressing Heterogeneity
Dippy Aggarwal and Karen C. Davis
University of Cincinnati, Cincinnati, Ohio
IEEE 17th International Conference on Information Reuse and Integration
July 28-30, 2016, Pittsburgh, USA
Agenda
Motivation and Challenge
Our Proposed Approach: A Short Example, Architecture, Novelty
Results and Future Work
Integration of data from multiple sources lays the foundation for building rich and effective analytics systems.
Schema heterogeneity has been recognized as a major challenge to data integration and exchange for more than two decades.
Proliferation of data models
Relational databases: the de facto standard for decades
RDF databases: the standard for linked data
NoSQL family of data models
“Map/Reduce is a great hammer but not everything is a nail” – Benjamin Hindman (Co-Founder and Chief Architect at Mesosphere)
F. Özcan, N. Tatbul, D. J. Abadi, M. Kornacker, C. Mohan, K. Ramasamy, and J. Wiener. Are we experiencing a big data bubble? In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, pages 1407–1408, New York, NY, USA, 2014. ACM.
Our vision: it would be useful to have an approach that leverages both schema-based and schemaless data stores.
Our research question
Given the unique advantages of different classes of data stores, how can we bring them together under a homogeneous representation?
Image Credits: http://www.slideshare.net/jexp/intro-to-neo4j-presentation
Our Solution
Adopting graphs as a means of standardizing and integrating different data stores.
Why graphs?
1. A simple and flexible abstraction for modeling artifacts of different kinds (illustrated by Facebook Open Graph)
2. Attracting significant attention and interest in the past few years (illustrated by trends in databases)
Leveraging Neo4j for graph implementation
Nodes and relationships can have properties (key-value pairs).
Image Credits: “Exploiting RDF Open Data Using NoSQL Graph Databases” – R. Bouhali and A. Laurent
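To make the property-graph idea concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's API; the class names, the `LIVES_IN` relationship type, and the sample values are hypothetical. It only illustrates that both nodes and relationships carry key-value properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    node_id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)  # key-value pairs


@dataclass
class PropertyGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_node(self, label: str, **props: Any) -> int:
        """Create a node carrying the given properties and return its id."""
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, label, dict(props))
        return node_id

    def add_relationship(self, start: int, end: int, rel_type: str, **props: Any) -> None:
        """Create a directed, typed relationship; it can carry properties too."""
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must already exist")
        self.relationships.append(Relationship(start, end, rel_type, dict(props)))


# Hypothetical data for illustration only.
g = PropertyGraph()
person = g.add_node("Person", name="Jason Doe")
city = g.add_node("City", name="Cincinnati")
g.add_relationship(person, city, "LIVES_IN", since=2010)
```

The relationship itself stores `since=2010`, which is the feature that distinguishes a property graph from a plain labeled graph.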
Example of schema and data model heterogeneity
Relational schema excerpt
RDF excerpt
Addressing the schema heterogeneity challenge
Relational schema excerpt
Neo4j representation
Key-value properties for a node: Jason Doe
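A relational row can be carried over to a node whose properties are the row's column values. The sketch below assumes a hypothetical `person` table (columns `person_id`, `name`, `birth_date`, `skype_id`, `dept_id`) and a simple hypothetical rule that turns each foreign-key column into a relationship. It is one plausible mapping, not the authors' transformation rules.

```python
from typing import Any, Dict, List, Tuple

NodeKey = Tuple[str, Any]  # (table name, primary-key value)


def rows_to_graph(
    table: str,
    pk: str,
    rows: List[Dict[str, Any]],
    fk_columns: Dict[str, str],
) -> Tuple[List[Dict[str, Any]], List[Tuple[NodeKey, str, NodeKey]]]:
    """Map each row to a node; foreign-key columns become relationships.

    fk_columns maps a column name to the table it references (hypothetical rule).
    """
    nodes: List[Dict[str, Any]] = []
    rels: List[Tuple[NodeKey, str, NodeKey]] = []
    for row in rows:
        key: NodeKey = (table, row[pk])
        # Non-FK columns become key-value properties on the node.
        props = {col: val for col, val in row.items() if col not in fk_columns}
        nodes.append({"key": key, "label": table, "properties": props})
        # Each non-null FK value becomes an edge to the referenced row's node.
        for col, target in fk_columns.items():
            if row.get(col) is not None:
                rels.append((key, "REFERENCES_" + col.upper(), (target, row[col])))
    return nodes, rels


# Hypothetical sample row.
rows = [{"person_id": 1, "name": "Jason Doe", "birth_date": "1980-01-01",
         "skype_id": "jdoe", "dept_id": 10}]
nodes, rels = rows_to_graph("person", "person_id", rows, {"dept_id": "department"})
```

Keeping the table name as the node label is one way to preserve the relational concept (the table) in the graph; the `REFERENCES_<column>` naming is an arbitrary illustrative convention.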
Graph Representation for the RDF Schema Excerpt
What additional merit does the common graph representation offer compared with the knowledge that could be derived from the native model representations?
Name, homepage, gender, birthday, etc.
Advantage of the graph model for unification
By linking nodes on common attributes such as date of birth or SkypeId, each node can incorporate information from the other schema.
Maps_With
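The unification step can be sketched as a join on a shared attribute that emits a `Maps_With` edge for each match. This is a minimal illustration assuming exact equality of attribute values; real data would typically need normalization first. The node ids, the property keys (`skype_id`, `foaf:skypeID`), and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Props = Dict[str, Any]


def link_on_attribute(
    nodes_a: Dict[str, Props],
    nodes_b: Dict[str, Props],
    attr_a: str,
    attr_b: str,
) -> List[Tuple[str, str, str]]:
    """Emit a Maps_With edge for every pair whose attribute values are equal."""
    # Index the second source by attribute value so matching is one lookup per node.
    index: Dict[Any, List[str]] = defaultdict(list)
    for node_id, props in nodes_b.items():
        if props.get(attr_b) is not None:
            index[props[attr_b]].append(node_id)

    links: List[Tuple[str, str, str]] = []
    for node_id, props in nodes_a.items():
        value = props.get(attr_a)
        if value is None:
            continue
        for match in index.get(value, []):
            links.append((node_id, "Maps_With", match))
    return links


# Hypothetical nodes from a relational source and an RDF source.
relational = {"rel:person/1": {"name": "Jason Doe", "skype_id": "jdoe"}}
rdf = {
    "rdf:jason": {"foaf:name": "Jason Doe", "foaf:skypeID": "jdoe"},
    "rdf:other": {"foaf:skypeID": "someone_else"},
}
links = link_on_attribute(relational, rdf, "skype_id", "foaf:skypeID")
```

Because the two sources name the same attribute differently, the function takes a separate key for each side; the original nodes are left untouched, so each source's own structure is preserved alongside the new links.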
“Exploiting RDF Open Data Using NoSQL Graph Databases” – R. Bouhali and A. Laurent
R. Bouhali and A. Laurent. Exploiting RDF Open Data Using NoSQL Graph Databases. In Artificial Intelligence Applications and Innovations: 11th IFIP WG 12.5 International Conference, AIAI 2015, Bayonne, France, September 14-17, 2015, Proceedings, pages 177–190. Springer International Publishing, Cham, 2015.
Data expressed in RDF
RDF mapped to a property graph
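A common, simplified way to map RDF onto a property graph is: every subject becomes a node, a triple with a literal object becomes a key-value property on the subject, and a triple with an IRI object becomes a relationship typed by the predicate. The sketch below follows that generic scheme; it is an assumption for illustration, not necessarily Bouhali and Laurent's exact rules, and it ignores blank nodes, datatypes, and language tags. The triples are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """An RDF literal; a plain str in a triple is treated as an IRI."""
    value: str


Term = Union[str, Literal]


def rdf_to_property_graph(
    triples: List[Tuple[str, str, Term]],
) -> Tuple[Dict[str, Dict[str, str]], List[Tuple[str, str, str]]]:
    nodes: Dict[str, Dict[str, str]] = {}
    edges: List[Tuple[str, str, str]] = []
    for subj, pred, obj in triples:
        props = nodes.setdefault(subj, {})   # every subject becomes a node
        if isinstance(obj, Literal):
            # Literal object -> key-value property on the subject node.
            # Note: a repeated predicate overwrites the earlier value here.
            props[pred] = obj.value
        else:
            nodes.setdefault(obj, {})         # IRI object -> its own node
            edges.append((subj, pred, obj))   # predicate -> relationship type
    return nodes, edges


# Hypothetical triples using FOAF-style predicates.
triples = [
    ("ex:jason", "foaf:name", Literal("Jason Doe")),
    ("ex:jason", "foaf:gender", Literal("male")),
    ("ex:jason", "foaf:knows", "ex:jane"),
]
nodes, edges = rdf_to_property_graph(triples)
```

Folding literals into properties keeps the graph small; the trade-off is that multi-valued literal predicates need a list-valued property or separate nodes, which this sketch does not handle.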
Limitations: the work focuses only on converting RDF data into a graph model, whereas we envision an extensible approach that embraces model diversity by supporting multiple models.
Novelty of our model: it preserves the concepts of each native model.
Architecture of our approach
Employs our transformation rules.
Export user-defined relational schemas in CSV format.
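The first stage of the pipeline, reading an exported relational schema from CSV, can be sketched with Python's standard `csv` module. The CSV layout (columns `table_name`, `column_name`, `data_type`, `references`) and the rule applied (each table becomes a node with `{column: type}` properties, each foreign-key column becomes a relationship between table nodes) are hypothetical assumptions; the slides do not specify the authors' export format or rules.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical CSV export: one row per column of a user-defined relational schema.
SCHEMA_CSV = """table_name,column_name,data_type,references
person,person_id,INT,
person,name,VARCHAR,
person,dept_id,INT,department
department,dept_id,INT,
department,title,VARCHAR,
"""


def schema_csv_to_graph(
    text: str,
) -> Tuple[Dict[str, Dict[str, str]], List[Tuple[str, str, str]]]:
    """Hypothetical rule: each table -> node with {column: type} properties;
    each foreign-key column -> relationship between table nodes."""
    tables: Dict[str, Dict[str, str]] = defaultdict(dict)
    fks: List[Tuple[str, str, str]] = []
    for row in csv.DictReader(io.StringIO(text)):
        tables[row["table_name"]][row["column_name"]] = row["data_type"]
        if row["references"]:  # empty string means "not a foreign key"
            fks.append((row["table_name"], row["column_name"], row["references"]))
    return dict(tables), fks


tables, fks = schema_csv_to_graph(SCHEMA_CSV)
```

Going through a flat CSV decouples the transformation from any particular RDBMS: any system that can export its catalog in this shape could feed the same rules.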
Evaluation
Evaluation metrics (proposed by Bouhali et al.)
Conciseness: the total number of nodes and relationships, i.e., the size of the graph.
Connectivity: the number of relationships divided by the total number of nodes.
Sakila database in MySQL
Bouhali et al.: connectivity should be at least 1.5.
Our result (0.32) is lower than this benchmark. Why?
Sakila database: https://dev.mysql.com/doc/sakila/en/
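Both metrics are simple arithmetic over the node count |N| and relationship count |R|: conciseness = |N| + |R| and connectivity = |R| / |N|. The sketch below computes them and compares connectivity with the 1.5 benchmark. The counts are hypothetical, chosen only so that the ratio equals 0.32 for illustration; they are not the actual Sakila figures.

```python
BENCHMARK_CONNECTIVITY = 1.5  # minimum suggested by Bouhali et al.


def conciseness(num_nodes: int, num_relationships: int) -> int:
    """Graph size: total number of nodes plus relationships."""
    return num_nodes + num_relationships


def connectivity(num_nodes: int, num_relationships: int) -> float:
    """Number of relationships divided by number of nodes."""
    if num_nodes == 0:
        raise ValueError("connectivity is undefined for an empty graph")
    return num_relationships / num_nodes


# Hypothetical counts, NOT the actual Sakila figures.
nodes, rels = 50, 16
size = conciseness(nodes, rels)
ratio = connectivity(nodes, rels)
meets_benchmark = ratio >= BENCHMARK_CONNECTIVITY
```

A connectivity below 1 simply means the graph has fewer relationships than nodes, as happens when most data lives in node properties rather than in edges.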
Evaluation - trade-off between conciseness and connectivity
Modeling attributes as nodes
Increased conciseness
Evaluation metrics - trade-off between conciseness and connectivity
Conclusions:
• Connectivity depends on the nature of the original model.
• Higher connectivity may come at the cost of a larger graph.
Strong connectivity between nodes certainly helps processing, but it does not follow that a lower connectivity value is undesirable.
Increased conciseness
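The trade-off can be reproduced on a toy example by encoding the same records two ways: attributes as key-value properties (one node per record, no relationships) versus attributes as nodes (one extra node per distinct attribute value, linked from each record). The records and the value-deduplication choice are hypothetical assumptions for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def metrics(num_nodes: int, num_rels: int) -> Dict[str, float]:
    """Conciseness = nodes + relationships; connectivity = relationships / nodes."""
    return {"conciseness": num_nodes + num_rels, "connectivity": num_rels / num_nodes}


def attributes_as_properties(records: List[Record]) -> Dict[str, float]:
    # One node per record; attributes stay as key-value properties, no edges.
    return metrics(len(records), 0)


def attributes_as_nodes(records: List[Record]) -> Dict[str, float]:
    # One node per record plus one node per distinct (attribute, value) pair;
    # every record gets one edge to each of its attribute-value nodes.
    value_nodes = {(attr, val) for rec in records for attr, val in rec.items()}
    num_rels = sum(len(rec) for rec in records)
    return metrics(len(records) + len(value_nodes), num_rels)


# Hypothetical records.
records = [
    {"name": "Jason Doe", "city": "Cincinnati"},
    {"name": "Jane Roe", "city": "Cincinnati"},
]
as_props = attributes_as_properties(records)
as_nodes = attributes_as_nodes(records)
```

Here connectivity rises from 0.0 to 0.8 while the graph size grows from 2 to 9, matching the conclusion that higher connectivity may come at the cost of a larger graph.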
Contributions
• The idea of employing graph databases as a means of bridging the gap between schema-based and schemaless data stores.
• A concept-preserving yet integrated graph model that addresses model heterogeneity and has the potential to handle the variety dimension of the big data landscape.
• A proof of concept that illustrates the potential of graph-based solutions for addressing diversity in data representations.
• A software-oriented, automated approach to transforming a relational database into a graph database.
The Path Forward
1. Extending our work by incorporating additional data stores and illustrating their integration.
2. Evaluating the efficiency of the transformation process.
3. Studying the performance of querying an integrated graph schema versus the disconnected native schemas.
4. Reverse engineering the graph model to obtain the schemas in the original models.
Selected References
• P. Atzeni, P. Cappellari, and P. A. Bernstein. ModelGen: Model independent schema translation. In Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pages 1111–1112. IEEE, 2005.
• R. Bouhali and A. Laurent. Exploiting RDF Open Data Using NoSQL Graph Databases. In Artificial Intelligence Applications and Innovations: 11th IFIP WG 12.5 International Conference, AIAI 2015, Bayonne, France, September 14-17, 2015, Proceedings, pages 177–190. Springer International Publishing, Cham, 2015.
• S. Bowers and L. Delcambre. The uni-level description: A uniform framework for representing information in multiple data models. In Conceptual Modeling - ER 2003, pages 45–58. Springer, 2003.
References (Image Credits)
• Facebook Open Graph
http://www.nanigans.com/2012/02/03/10-facebook-open-graph-apps-actions/
• Data Integration (Slide 3)
http://www.dbta.com/BigDataQuarterly/Articles/The-New-Newly-Democratized-Data-Integration-109144.aspx
• Trends in databases
https://www.linkedin.com/pulse/future-decentralized-data-processing-architecture-raunak-jhawar
https://www.google.com/trends/
Thank you.
Questions?
