SlideShare ist ein Scribd-Unternehmen logo
1 von 23
Downloaden Sie, um offline zu lesen
Building a Graph based RDF Store for Apache Cassandra
Name: Ravindra Ranwala
ID: 138227T
Supervisor: Dr. Amal Shehan Perera
1
Agenda
● Introduction
● Basic Concepts
● The Problem
● Literature Review
● Methodology
● Demo
● Evaluation and Result
● Conclusion 2
Introduction
● RDFs are used to support queries in the semantic web.
● RDF stores contain trillions of triples.
● Today RDF data is everywhere - commercial search
engines proliferate RDF data ex. Google, yahoo, bing
etc.
● SPARQL - used as a query language.
● Different approaches exists to build a triple store.
● Main challenges are system scalability and generality.
3
Basic Concepts - RDF Triple
● RDF dataset consists of statements in the form of
(subject, predicate, object)
● Subject has a predicate property whose value is the
object.
● Examples: <Titanic, has award, Best picture>
● Core of the semantic web is built on top of the RDF data
model.
● These triples can be stored in different ways.
4
The Problem
● Apache Cassandra is a Nosql, multi tenant and multi
data centric database.
● Our objective is to build a scalable RDF store for
Apache Cassandra.
● Cassandra is used by eBay, Twitter, Cisco, etc.
● This will exponentially increase the value of Cassandra.
● The largest known Cassandra cluster has 300 TB of
data over 400 machines.
● This motivates us to build a distributed, scalable RDF
store to answer user queries on them efficiently. 5
Literature Review - Concepts
● A triple store can be built on top of any DBMS or File system.
● RDF dataset consists of statements in the form of <subject, predicate,
object>
● Subject has a predicate property whose value is object.
● Ex. <person1, name, Mike>
● A typical triple store holds a multi millions/billions of such triples.
● Efficient and scalable management of RDF data is a fundamental
challenge.
● SPARQL queries are submitted to the RDF store.
Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph
Engine for Web Scale RDF Data,"
6
Apache Cassandra
● Distributed, fault tolerant (i.e. no single point of failures),
post relational, Nosql database system.
● Peer to peer distributed architecture. Supports both strict
and eventual consistency.
● All the nodes are the same. There is no master and slave
nodes.
● Uses read/write anywhere style architecture.
DataStax Corporation. (2011, October) “Welcome to Apache Cassandra 1.0”
7
Triple store –approaches
● There are different approaches the exist to manage
RDF data.
● Each approach has it’s own advantages and
disadvantages.
8
Relational Approach
● Triples are stored using the relational model.
Justin J. Levandoski F. Mokbel, "RDF Data-Centric Storage,"
9
Relational Approach (contd.)
● Triple store - yields costly self joins of a huge RDF store
(trillions of triples)
● N-array - eliminates the need for joins, but leads to
higher number of nulls.
● reduces null storage, but introduces costly join.
10
Graph based approaches
● New approach that greatly improves the performance of
SPARQL query processing
● Graph exploration instead of joins.
● Unnecessary intermediate results can be pruned down.
● Models RDF data in it’s native graph form.
● Examples: Trinity, TripleRush etc.
11
Trinity RDF
● Graph based implementation. Models RDF as a DAG.
● Subjects and objects are represented as a node.
● Predicate is represented as a directed labelled edge.
● Graph is stored in memory for fast access.
H. Wang, and Y. Li B. Shao, "The Trinity graph engine. Technical Report 161291, Microsoft Research," 12
Trinity Architecture
● Distributed in memory key value store.
● Partitions RDF graph across multiple machines by hashing on the nodes.
● Each machine holds a disjoint part of the graph.
● Final result is assembled at the proxy.
Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph Engine for Web
Scale RDF Data,"
13
Methodology
● Use case Scenarios
○ Populating data into Cassandra Cluster
○ Building the RDF Graph
○ Querying the RDF Graph
○ Dropping the RDF Store
● Technologies used.
○ Apache Jena RDF API
○ Struts 2
○ Java/JSP/XSLT/XML/XPath
14
System Architecture
15
Demo
16
Evaluation and Result
● DBPedia benchmarking was used to compare.
● DBPedia geo-coordinates and homepages dataset was
used. Accounts for 0.7 million triples
● 4Store, Bigdata RDF stores were compared with our
implementation
● Queries used
○ Query One: Finds the homepage of the Metropolitan museum of Art
○ Query Two: Finds the Homepage of Kevin_Bacon
○ Query Three: Finds all the resources and their homepages which
reside near the area of Berlin.
○ Query Four: Finds all the resources and their homepages which reside
near the area of New York. 17
Benchmark Results
● Query complexity increases from Q1 through Q4.
● The execution time taken by different RDF stores, to execute above four queries.
● Query execution time is measured in ms.
Q1 Q2 Q3 Q4
Our
implementation
216ms 7ms 336ms 279ms
4Store 16ms 18ms 455ms 416ms
Bigdata 41ms 30ms 2sec, 355ms 1sec, 600ms
DBpedia. (2008, Jan 10.) RDF Store Benchmarks with DBpedia [Online]. Available:
http://wifo5-03.informatik.uni-mannheim.de/benchmarks-200801/ 18
Benchmarking Results
19
Benchmarking Results
20
Benchmarking Analysis
● Graph based approach yields more performance boosts
when query becomes more and more complex
● Complexity increases from Query 1 to 4 gradually.
● This implementation outperforms 4store and bigdata
especially when the complexity of the query increases.
● First query takes time, because it builds the index
structure.
21
Future Work
● Main limitation of the approach is Scalability.
● Larger datasets lead to OutOfMemory error while building the graph model.
● Solution: Distributed implementation
22
Conclusion
● Approaches used to model and retrieve RDF data.
● New approaches to manage RDF data efficiently.
● Graph based approach.
● New Implementation
○ Use case scenarios
○ Evaluation and result using DBPedia dataset
○ Benchmark Analysis
23

Weitere ähnliche Inhalte

Was ist angesagt?

Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaJeen Broekstra
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar PresentationMuntazir Mehdi
 
RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsGraph-TA
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIdatamantra
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeAngad Singh
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientistsStitch Fix Algorithms
 
Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)jottevanger
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtimearupmalakar
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataGraph-TA
 
Enabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MMEnabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MMThomas Kurz
 
Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011Juan Sequeda
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF Navid Sedighpour
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Wes McKinney
 
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentAre Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentHarsh Thakkar
 
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike LimcacoExtending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike LimcacoData Con LA
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & MarquezJulien Le Dem
 

Was ist angesagt? (20)

Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
 
Tracking data lineage at Stitch Fix
Tracking data lineage at Stitch FixTracking data lineage at Stitch Fix
Tracking data lineage at Stitch Fix
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar Presentation
 
RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL Platforms
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays Singapore
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientists
 
Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtime
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
 
Enabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MMEnabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MM
 
Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF
 
Data structures
Data structuresData structures
Data structures
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentAre Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
 
Drupal and the Semantic Web
Drupal and the Semantic WebDrupal and the Semantic Web
Drupal and the Semantic Web
 
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike LimcacoExtending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
 
Extending Analytic Reach
Extending Analytic ReachExtending Analytic Reach
Extending Analytic Reach
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
 

Andere mochten auch

WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2
 
Deploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on MesosDeploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on MesosImesh Gunaratne
 
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELKWSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELKWSO2
 
Deploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on KubernetesDeploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on KubernetesImesh Gunaratne
 
WSO2 Identity Server - Product Overview
WSO2 Identity Server - Product OverviewWSO2 Identity Server - Product Overview
WSO2 Identity Server - Product OverviewWSO2
 
Enhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus ToolingEnhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus ToolingWSO2
 
Resilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESBResilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESBRavindra Ranwala
 
Solution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital TransformationSolution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital TransformationWSO2
 
2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?WSO2
 

Andere mochten auch (9)

WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
 
Deploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on MesosDeploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on Mesos
 
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELKWSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELK
 
Deploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on KubernetesDeploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on Kubernetes
 
WSO2 Identity Server - Product Overview
WSO2 Identity Server - Product OverviewWSO2 Identity Server - Product Overview
WSO2 Identity Server - Product Overview
 
Enhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus ToolingEnhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
 
Resilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESBResilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESB
 
Solution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital TransformationSolution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital Transformation
 
2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?
 

Ähnlich wie Graph basedrdf storeforapachecassandra

Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionRUHULAMINHAZARIKA
 
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL DatabasesA Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL DatabasesLuiz Henrique Zambom Santana
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusBoldRadius Solutions
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Fabrizio Orlandi
 
HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemAdam Marcus
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem yarapavan
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012scorlosquet
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; pythonMaloy Manna, PMP®
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopJoe Drumgoole
 
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...Alexey Zinoviev
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013scorlosquet
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014mahchiev
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Michele Pasin
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)Ankit Rathi
 

Ähnlich wie Graph basedrdf storeforapachecassandra (20)

Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_Session
 
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL DatabasesA Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 
HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL Ecosystem
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; python
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB Workshop
 
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)
 

Kürzlich hochgeladen

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 

Kürzlich hochgeladen (20)

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 

Graph basedrdf storeforapachecassandra

  • 1. Building a Graph based RDF Store for Apache Cassandra Name: Ravindra Ranwala ID: 138227T Supervisor: Dr. Amal Shehan Perera 1
  • 2. Agenda ● Introduction ● Basic Concepts ● The Problem ● Literature Review ● Methodology ● Demo ● Evaluation and Result ● Conclusion 2
  • 3. Introduction ● RDFs are used to support queries in the semantic web. ● RDF stores contain trillions of triples. ● Today RDF data is everywhere - commercial search engines proliferate RDF data ex. Google, yahoo, bing etc. ● SPARQL - used as a query language. ● Different approaches exists to build a triple store. ● Main challenges are system scalability and generality. 3
  • 4. Basic Concepts - RDF Triple ● RDF dataset consists of statements in the form of (subject, predicate, object) ● Subject has a predicate property whose value is the object. ● Examples: <Titanic, has award, Best picture> ● Core of the semantic web is built on top of the RDF data model. ● These triples can be stored in different ways. 4
  • 5. The Problem ● Apache Cassandra is a Nosql, multi tenant and multi data centric database. ● Our objective is to build a scalable RDF store for Apache Cassandra. ● Cassandra is used by eBay, Twitter, Cisco, etc. ● This will exponentially increase the value of Cassandra. ● The largest known Cassandra cluster has 300 TB of data over 400 machines. ● This motivates us to build a distributed, scalable RDF store to answer user queries on them efficiently. 5
  • 6. Literature Review - Concepts ● A triple store can be built on top of any DBMS or File system. ● RDF dataset consists of statements in the form of <subject, predicate, object> ● Subject has a predicate property whose value is object. ● Ex. <person1, name, Mike> ● A typical triple store holds a multi millions/billions of such triples. ● Efficient and scalable management of RDF data is a fundamental challenge. ● SPARQL queries are submitted to the RDF store. Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph Engine for Web Scale RDF Data," 6
  • 7. Apache Cassandra ● Distributed, fault tolerant (i.e. no single point of failures), post relational, Nosql database system. ● Peer to peer distributed architecture. Supports both strict and eventual consistency. ● All the nodes are the same. There is no master and slave nodes. ● Uses read/write anywhere style architecture. DataStax Corporation. (2011, October) “Welcome to Apache Cassandra 1.0” 7
  • 8. Triple store –approaches ● There are different approaches the exist to manage RDF data. ● Each approach has it’s own advantages and disadvantages. 8
  • 9. Relational Approach ● Triples are stored using the relational model. Justin J. Levandoski F. Mokbel, "RDF Data-Centric Storage," 9
  • 10. Relational Approach (contd.) ● Triple store - yields costly self joins of a huge RDF store (trillions of triples) ● N-array - eliminates the need for joins, but leads to higher number of nulls. ● reduces null storage, but introduces costly join. 10
  • 11. Graph based approaches ● New approach that greatly improves the performance of SPARQL query processing ● Graph exploration instead of joins. ● Unnecessary intermediate results can be pruned down. ● Models RDF data in it’s native graph form. ● Examples: Trinity, TripleRush etc. 11
  • 12. Trinity RDF ● Graph based implementation. Models RDF as a DAG. ● Subjects and objects are represented as a node. ● Predicate is represented as a directed labelled edge. ● Graph is stored in memory for fast access. H. Wang, and Y. Li B. Shao, "The Trinity graph engine. Technical Report 161291, Microsoft Research," 12
  • 13. Trinity Architecture ● Distributed in memory key value store. ● Partitions RDF graph across multiple machines by hashing on the nodes. ● Each machine holds a disjoint part of the graph. ● Final result is assembled at the proxy. Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph Engine for Web Scale RDF Data," 13
  • 14. Methodology ● Use case Scenarios ○ Populating data into Cassandra Cluster ○ Building the RDF Graph ○ Querying the RDF Graph ○ Dropping the RDF Store ● Technologies used. ○ Apache Jena RDF API ○ Struts 2 ○ Java/JSP/XSLT/XML/XPath 14
  • 17. Evaluation and Result ● DBPedia benchmarking was used to compare. ● DBPedia geo-coordinates and homepages dataset was used. Accounts for 0.7 million triples ● 4Store, Bigdata RDF stores were compared with our implementation ● Queries used ○ Query One: Finds the homepage of the Metropolitan museum of Art ○ Query Two: Finds the Homepage of Kevin_Bacon ○ Query Three: Finds all the resources and their homepages which reside near the area of Berlin. ○ Query Four: Finds all the resources and their homepages which reside near the area of New York. 17
  • 18. Benchmark Results ● Query complexity increases from Q1 through Q4. ● The execution time taken by different RDF stores, to execute above four queries. ● Query execution time is measured in ms. Q1 Q2 Q3 Q4 Our implementation 216ms 7ms 336ms 279ms 4Store 16ms 18ms 455ms 416ms Bigdata 41ms 30ms 2sec, 355ms 1sec, 600ms DBpedia. (2008, Jan 10.) RDF Store Benchmarks with DBpedia [Online]. Available: http://wifo5-03.informatik.uni-mannheim.de/benchmarks-200801/ 18
  • 21. Benchmarking Analysis ● Graph based approach yields more performance boosts when query becomes more and more complex ● Complexity increases from Query 1 to 4 gradually. ● This implementation outperforms 4store and bigdata especially when the complexity of the query increases. ● First query takes time, because it builds the index structure. 21
  • 22. Future Work ● Main limitation of the approach is Scalability. ● Larger datasets lead to OutOfMemory error while building the graph model. ● Solution: Distributed implementation 22
  • 23. Conclusion ● Approaches used to model and retrieve RDF data. ● New approaches to manage RDF data efficiently. ● Graph based approach. ● New Implementation ○ Use case scenarios ○ Evaluation and result using DBPedia dataset ○ Benchmark Analysis 23