Virtuoso, The 
Prometheus of RDF 
By Orri Erling 
Virtuoso Program Manager, OpenLink Software
Linked Data at Dawn
• The Promise and the Practice
• The Science of Speed
• The Structure which Is
• Ongoing Research
License CC-BY-SA 4.0 (International).
Linked Data Promises
• RDF is a generic, minimalistic model for describing things
• RDF has global identifiers, and data is self-describing
• URIs may be dereferenceable
• RDF is flexible to query and does not force a single hierarchical view as XML does
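As a minimal sketch of the model described above, the following Python snippet holds statements as (subject, predicate, object) tuples and answers a pattern query in which `None` acts as a wildcard. The example.org URIs and the `match` helper are assumptions made for illustration; they come neither from the talk nor from any RDF library.

```python
# Hypothetical data: every URI below is made up for illustration.
EX = "http://example.org/"

triples = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "bob", EX + "name", "Bob"),
}

def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# The same data can be queried from any direction, with no fixed hierarchy.
names = match(triples, p=EX + "name")
about_alice = match(triples, s=EX + "alice")
who_knows_bob = match(triples, p=EX + "knows", o=EX + "bob")
```

Because every statement has the same three-part shape, a query can start from the subject, the predicate or the object, which is the flexibility the slide contrasts with a single hierarchical view.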
Linked Data Scenarios 
RDF is used because of
• schema flexibility
• global identifiers
Inference, if present, is usually trivial
• Subclass
• Sub-property
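To illustrate how small this kind of inference is, here is a hedged sketch of subclass reasoning as a plain transitive walk over a class hierarchy. The class names and the helper functions `superclasses` and `infer_types` are hypothetical and chosen only for this example.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via subclass edges (transitive)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def infer_types(type_pairs, subclass_of):
    """Add (instance, class) pairs implied by the subclass hierarchy."""
    inferred = set(type_pairs)
    for inst, cls in type_pairs:
        inferred |= {(inst, sup) for sup in superclasses(cls, subclass_of)}
    return inferred

# Hypothetical hierarchy: Student is a subclass of Person, Person of Agent.
subclass_of = {"Student": ["Person"], "Person": ["Agent"]}
types = infer_types({("alice", "Student")}, subclass_of)
```

Sub-property inference follows the same pattern, with property names in place of class names.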
Where Triples Come From
• Relational extracts or web content are converted to and stored as triples
• NLP extraction
• New applications with RDF as the primary data model
• Running SPARQL against data in RDBs is possible but rare, and does not deliver the flexibility
Linked Data Verticals and Patterns 
Publishing: tagging and annotations, evolving vocabularies 
Archives: self-description, long-term identifiers, many versions of the schema
Semantic search: structured, semi-structured and full text all in one
Business intelligence: many sources, ease of adding sources, no 6-month DW schema change cycle
E-science, often in life sciences: common interchange format, nano-publications, NLP extracts, different users cook their data differently, provenance
The Hopes and Perceptions 
The age of ad hoc 
Find insight in any data, when you need it, from any source, in any format
No data warehouse planning cycles; make your own from the pieces you need, when you need it
Still, data integration remains hard work; quality and coverage of sources vary
Flexibility may be there, but are performance and scalability on the level?
Yes, But ... 
Web and Big Data: everybody reinvents the triple. Self-description, long-term identifiers and key-value pairs appear in many non-RDF use cases
SPARQL and RDF would be the natural, standards-compliant choice if they beat SQL, information retrieval, custom big data, key-value and map-reduce solutions
Is this intrinsic to linked data, or is it a lack of engineering?
Linked data has unique advantages in breadth of coverage and expressivity, but performance must not lag behind.
What is the RDF Tax?
• 90% of bad performance comes from non-optimal query plans
• Some comes from indexing too much (e.g. SQL bulk load with no indices is 50x faster than the equivalent in RDF with everything indexed)
• Some comes from string operations on URIs and literals
• Some comes from having a join for every attribute. Vectoring and the right plans help, though
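The "join for every attribute" cost can be sketched as follows. This is a toy model under stated assumptions, not a description of Virtuoso internals: the data, the dictionary standing in for a (predicate, subject) index, and the lookup counters are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data: three attributes of one order, first as a row...
row_table = {"order1": {"price": 10, "qty": 2, "status": "open"}}

# ...then as triples in a dictionary keyed by (predicate, subject),
# standing in for an index ordered by predicate and subject.
triple_index = defaultdict(list)
for subj, attrs in row_table.items():
    for pred, obj in attrs.items():
        triple_index[(pred, subj)].append(obj)

def fetch_from_row(subject, attributes):
    """One lookup finds the row; attributes are then read in place."""
    row = row_table[subject]
    return {a: row[a] for a in attributes}, 1

def fetch_from_triples(subject, attributes):
    """One index lookup (in effect one join step) per attribute."""
    lookups, result = 0, {}
    for a in attributes:
        lookups += 1
        result[a] = triple_index[(a, subject)][0]
    return result, lookups

wanted = ["price", "qty", "status"]
row_result, row_lookups = fetch_from_row("order1", wanted)
triple_result, triple_lookups = fetch_from_triples("order1", wanted)
```

Both paths return the same values, but the triple path pays one lookup per attribute, so the gap widens with the number of attributes a query touches.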
The Bane of the Triple 
When data is stored as triples:
• There is still structure, but it is harder to exploit: schema re-emerges as correlations
• More joins make for more possible query plans and bigger errors in plan cost estimation
• More joining reduces locality
• Lack of schema causes needless indexing; data takes more space
• A URI for everything takes space and time
For the same workload, SQL can be 2-20x faster, even with Virtuoso
The Question is Raised
• LOD2 FP7, now ending: RDF performance parity with relational?
• SQL is the senior science; whoever ignores history is bound to repeat it
• Integral mastery of RDB science is a prerequisite, but do not forget the subtle twists of schema-lessness
Virtuoso Leadership in Linked Data
• 2000–06, V1.x–4.x: SQL row store with SQL federation and XML
• 2007–08, V5.x–6.x: SPARQL; adapted for RDF quads with more compression, bitmap indices, special data types, RDF awareness in query optimization
• 2009, V6.x: scale-out cluster capable
• 2010–13, V7.x: column store, vectored execution, 3x more space efficient, 10+x more speed
• 2013: Star Schema Benchmark with SPARQL at 100x MySQL SQL, 0.8x MonetDB SQL
• 2014: top-of-the-line SQL analytics, 500 Gtriples, structure awareness
Triples Are Done Right, so?
• Column store techniques are a good fit; index-based triple storage does not get much better
• RAM-only pointer-based techniques can be faster but cost 10-100x more to scale up
• To take RDF to SQL parity, Virtuoso must first be on the level with the best in SQL
• TPC-H is the checklist for mastery of DW and query optimization; whoever survives it shall not fear
• Parity is achieved when running with triples is just like running with tables
Structure is Everywhere 
CWI in LOD2:
• 90% of triples in Common Crawl fall into 20 tables
• All relational extractions are 100% tables
• Even DBpedia is 90% covered by 500 tables, though it is unusually heterogeneous, albeit not very large
The Glorious Dawn: 
Structure is the Servant, not the Tyrant
• A set of subjects that all have the same single-valued properties is in fact a table. So, store it as a table
• Allow exceptions, e.g. occasional multiple values, different values in different graphs, extra properties, etc.
• If it is big, it has repeating structure
• All RDF semantics are preserved; any triple is possible, but the common ones are SQL compact and SQL fast
• With tables, query optimization returns to SQL complexity and is much more reliable
• So, more tricks from the SQL analytics bag become safe and applicable
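The idea above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Virtuoso's implementation: the function `split_by_structure`, the `min_rows` threshold and the sample data are hypothetical, and irregular subjects are simply left as triples rather than being stored as exceptions inside a table.

```python
from collections import Counter, defaultdict

def split_by_structure(triples, min_rows=2):
    """Store regular subjects as table rows; keep the rest as triples."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        by_subject[s][p].append(o)

    # A subject's "shape" is its set of properties, if all are single-valued.
    def shape(props):
        if all(len(values) == 1 for values in props.values()):
            return frozenset(props)
        return None

    counts = Counter(shape(props) for props in by_subject.values())
    tables, leftovers = defaultdict(dict), []
    for s, props in by_subject.items():
        sh = shape(props)
        if sh is not None and counts[sh] >= min_rows:
            tables[sh][s] = {p: values[0] for p, values in props.items()}
        else:
            # Exceptions stay as plain triples, so no statement is lost.
            leftovers += [(s, p, o) for p, vs in props.items() for o in vs]
    return dict(tables), leftovers

# Hypothetical data: p1 and p2 are regular, p3 has two names.
triples = [
    ("p1", "name", "Ann"), ("p1", "age", 30),
    ("p2", "name", "Bob"), ("p2", "age", 41),
    ("p3", "name", "Cy"), ("p3", "name", "Cyrus"), ("p3", "age", 25),
]
tables, leftovers = split_by_structure(triples)
```

Subjects p1 and p2 share the shape {name, age} and land in one table, while p3 keeps all three of its triples, so every original statement is still recoverable from the tables and the leftovers together.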
Gains from Structure Awareness
• 3+x load speed
• 2x more space efficiency
• Queries against regular data within 10-20% of SQL speeds
• Just declare which properties tend to occur together; no strict schema first as with SQL
• Later, self-configuration
The Cycle of Adventure
• Rebels: SQL not cool, too rigid; drop ACID, go key-value, map-reduce; the triple is all there is; semantic web
• Pioneers: life on the frontier is hard, infrastructure missing or bad
• Same everyday problems also in Utopia
• Recognizing the objective values, e.g. schema freedom and identifiers, no AI. Do the job, forget dogma
• Reconciliation: schema-first and schema-last converge in structure awareness
Present FP7 Research
• LDBC - Transparency and relevance for graph DB and RDF performance
• GeoKnow - Geodata is everywhere; how to carry the planet in your pocket
• LOD2 - Where no triple has gone before (and come back)
• Open PHACTs - A data platform for drug discovery
LDBC - Linked Data Benchmark Council
• Rebels: SQL not cool, too rigid; drop ACID, go key-value, map-reduce; the triple is all there is; semantic web
• Pioneers: life on the frontier is hard, infrastructure missing or bad
• Same everyday problems also in Utopia
• Recognizing the objective values, e.g. schema freedom and identifiers, no AI. Do the job, forget dogma
• Reconciliation: some of the rebel thinking becomes mainstream, e.g. schema-first and schema-last converge in structure awareness
LDB Council, Independent Industry Forum for Benchmarking
• The TPC for the frontiers of database
• Bootstrapped in the LDBC FP7 project, continues as an independent industry association
• OpenLink, Ontotext, Neo Technology, Sparsity as founding members
• IBM, Oracle Labs, Systap, SPARQL City have already joined
• DB superstars Peter Boncz and Thomas Neumann as founders and scientific leads
LDBC Benchmarks 
Social Network
• Online - Lookups, updates, analysis of social environment
• Business Intelligence - Spotting trends, key players, big queries
• Graph analytics - Community detection, PageRank, graph metrics
Semantic Publishing
• Modeled after the BBC linked data portal: online lookups, drill-downs and updates
GeoKnow - The Planet in your Pocket 
Ms. Globe and Mr. Cube have a thing going on:
• Mr. Cube: Desiloization ... integrated metadata ... explicit semantics.
• Ms. Globe: I can feel it ... but are you man enough ... you need to show me.
Planet Scale Roadmap 
Jan 2014:
• Virtuoso SPARQL outperforms PostGIS in map lookups with planet-wide OpenStreetMap
• Virtuoso SQL adds 5x more power
Next: Jan 2015
• Parity between SPARQL and SQL via structure awareness
• Geospatial data clustering
• Graph analytics close to the data (Pregel, Giraph, etc.) in the DB itself
• Adding a fine-grained geo dimension to the LDBC Social Network benchmark
The LOD2 scaling adventures 
Experiments at CWI's SciLens cluster:
• 150 Gtriples in Jan 2013 (8 x 256 GB RAM)
• 500 Gtriples in Aug 2014 (12 x 256 GB RAM)
• Some trillion-triple claims exist but do not detail any query workload
BSBM Explore and BI workloads
• 10x speed gains for BI queries from 2013 to 2014
Bulk load at 6M triples/s
• All done in triples; structure awareness will go further still
Open PHACTs 
Partners: 
Virtuoso Now 
Snapshot of RDF Linked Data customers in the Enterprise:
• Data.Gov (U.S. Govt. Open Linked Data initiative)
• Bank of America
• Booz Allen Hamilton
• Northrop Grumman
• Elsevier
• French National Library
• Samsung
• Globo
• Daimler Benz
• Johnson & Johnson
• Bayer
• St. Jude Medical
• Fujitsu
• Syngenta
• and many more
Virtuoso Availability
• Most capabilities as open source
• Commercial edition adds:
  • Cluster scale-out
  • SQL federation
  • Replication (SQL & RDF)
  • Advanced RDF security, ABAC & RBAC (ACLs)
  • Wide tables
  • and more
• Up-to-the-minute tech previews via v7fasttrack on GitHub, e.g. a superfast TPC-H implementation
Virtuoso Future
• Preview of the structure-aware RDF store in fall 2014 via v7fasttrack
• Integrated graph analytics framework
• Embed complex graph algorithms, e.g. community detection and shortest path, inside SPARQL/SQL
• Comparison of SQL and SPARQL for big data analytics
Linked Data Now
• Adoption across major industries
• Superior flexibility and time to solution
• Dramatic performance gains in the last 5 years
• Benchmarking will continue to drive progress, to the benefit of users and vendors alike
• Run circles around most open source SQL in SPARQL: Virtuoso SPARQL beats MySQL in SSB by 100x
• With structure awareness, SPARQL is set to match the best in SQL for data warehousing and OLTP
• Linked Data is no longer a long shot but a technology that makes sense
About OpenLink Software 
OpenLink Software is a privately held company founded in 1992 by its President & CEO, Kingsley Idehen. The company is an industry-acclaimed technology innovator in the following areas:
• ODBC, JDBC, ADO.NET, and OLE-DB compliant Data Access Drivers for Oracle, SQL Server, Informix, Ingres, Sybase, Progress, MySQL, and PostgreSQL
• High-Performance & Scalable Multi-Model (Relational & Graph) Database Technology
• Data Integration Middleware (Data Virtualization Technology across a wide variety of Protocols & Formats)
• Web Application Server Technology
• Linked Data Deployment & Management
• Socially-enhanced Distributed Collaborative Applications Platforms (Weblogs, Wikis, Feed Aggregation and Syndication, Web File Systems, Discussion Forums, etc.)
• Identity Management
Office Locations 
USA 
OpenLink Software, Inc 
10 Burlington Mall Road 
Suite 265 
Burlington, MA 01803 
Tel.: +1 781 273 0900 
Fax: +1 781 229 8030 
UK 
OpenLink Software Ltd. 
Airport House 
Purley Way 
Croydon, Surrey CR0 0XZ 
Tel.: +44 (0)20 8681 7701 
Fax: +44 (0)20 8681 7702 
Additional Information 
Web Sites 
OpenLink Software 
YouID – Digital Identity Card (Certificate) Generator 
OpenLink Data Spaces – Semantically enhanced Personal & Enterprise Data Spaces & 
Collaboration Platform 
OpenLink Virtuoso - Hybrid Data Management, Integration, Application, and Identity Server 
Universal Data Access Drivers - High-Performance ODBC, JDBC, ADO.NET, and OLE-DB 
Drivers 
LDAP and NetID-TLS – How to use LDAP scheme URIs with NetID-TLS Authentication 
Social Media Data spaces 
http://kidehen.blogspot.com (weblog) 
http://www.openlinksw.com/blog/~kidehen/ (weblog) 
https://plus.google.com/112399767740508618350/posts (Google+) 
https://twitter.com/#!/kidehen (Twitter) 
Hashtag: #LinkedData (Anywhere). 

Weitere ähnliche Inhalte

Was ist angesagt?

Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
Marin Dimitrov
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked Data
EUCLID project
 

Was ist angesagt? (13)

Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
 
LDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and DiscussionLDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and Discussion
 
Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
 
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionLinking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
Linked Data, Ontologies and Inference
Linked Data, Ontologies and InferenceLinked Data, Ontologies and Inference
Linked Data, Ontologies and Inference
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked Data
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
 
Structured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product StackStructured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product Stack
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
 

Ähnlich wie Virtuoso -- The Prometheus of RDF

NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
Igor Moochnick
 
Web 3 Mark Greaves
Web 3 Mark GreavesWeb 3 Mark Greaves
Web 3 Mark Greaves
Mediabistro
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Server
webhostingguy
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options Compared
Sergey Bushik
 

Ähnlich wie Virtuoso -- The Prometheus of RDF (20)

Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
 
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
ESWC 2013 Panel - Semantic Technologies for Big Data Analytics: Opportunities...
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - Panorays
 
NoSQL Basics - a quick tour
NoSQL Basics - a quick tourNoSQL Basics - a quick tour
NoSQL Basics - a quick tour
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data Smarter
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Web 3 Mark Greaves
Web 3 Mark GreavesWeb 3 Mark Greaves
Web 3 Mark Greaves
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open Platform
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Server
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options Compared
 
Modèles de données et langages de description ouverts 6 - 2021-2022
Modèles de données et langages de description ouverts   6 - 2021-2022Modèles de données et langages de description ouverts   6 - 2021-2022
Modèles de données et langages de description ouverts 6 - 2021-2022
 
Agile data warehousing
Agile data warehousingAgile data warehousing
Agile data warehousing
 
GraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph DatabasesGraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph Databases
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Virtuoso -- The Prometheus of RDF

  • 1. Virtuoso, The Prometheus of RDF By Orri Erling Virtuoso Program Manager, OpenLink Software
  • 2. Linked Data at Dawn  The Promise and the Practice  The Science of Speed  The Structure which Is  Ongoing Research License CC-BY-SA 4.0 (International).
  • 3. Linked Data Promises  RDF is a generic, minimalistic model for describing things  RDF has global identifiers and data is self-describing  URI's may be dereferenceable  RDF is flexible to query, does not force a single hierarchical view like XML License CC-BY-SA 4.0 (International).
  • 4. Linked Data Scenarios RDF is used because of  schema flexibility  global identifiers Inference, if present, is usually trivial  Subclass  Sub-property License CC-BY-SA 4.0 (International).
  • 5. Where Triples Come From  Relational extracts or web content is converted to and stored as triples NLP extraction  New applications with RDF as primary data model  Doing SPARQL against data in RDB's is possible but is rare and does not deliver the flexibility License CC-BY-SA 4.0 (International).
  • 6. Linked Data Verticals and Patterns Publishing: tagging and annotations, evolving vocabularies Archives: self description, long term identifiers, many versions of schema Semantic search: structured, semi-structured and full text all in one Business intelligence: many sources, ease of adding sources, no 6 month DW schema change cycle E-science: often in life sciences: common interchange format, nano-publications, NLP extracts, different users cook their data differently, provenance License CC-BY-SA 4.0 (International).
  • 7. The Hopes and Perceptions The age of ad hoc Find insight in any data, when you need it, from any source, any format No data warehouse planning cycles, make your own from the pieces you need, when you need it Still, data integration remains hard work, quality and coverage of sources vary Flexibility may be there, but is performance and scalability on the level? License CC-BY-SA 4.0 (International).
  • 8. Yes, But ... Web and Big Data: Everybody reinvents the triple: Self-description, long term identifiers, key-value pairs in many non- RDF use cases SPARQL and RDF would be the natural, standards compliant choice if did beat SQL, information retrieval, custom big data, key value, map reduce solutions Is this intrinsic to linked data or is this lack of engineering? Linked data has unique advantages in breadth of coverage and expressivity but performance must not lag behind. License CC-BY-SA 4.0 (International).
  • 9. What is the RDF Tax?  90% of bad performance comes from non-optimal query plans  Some comes from indexing too much (e.g. SQL bulk load with no indices is 50x faster than the equivalent in RDF with all indexed)  Some comes from string ops on URI's, literals  Some comes from having a join for every attribute. Vectoring and right plans help, though License CC-BY-SA 4.0 (International).
  • 10. The Bane of the Triple When data is stored as triples:  There is structure still but it is harder to exploit: Schema re-emerges as correlations  More joins make more possible query plans, bigger errors in plan cost estimation  More joining reduces locality  Lack of schema causes needless indexing, data takes more space  A URI for everything takes space and time For the same workload, SQL can be 2 - 20x faster also with Virtuoso License CC-BY-SA 4.0 (International).
  • 11. The Question is Raised  LOD2 FP7, now ending: RDF Performance parity with relational?  SQL is the senior science, who ignores history is bound to repeat it  Integral mastery of RDB science is a prerequisite, but do not forget the subtle twists of schema-less’ness License CC-BY-SA 4.0 (International).
  • 12. Virtuoso Leadership in Linked Data  2000 – 06 V1.x - 4.x SQL row store with SQL federation and XML  2007 – 08 V 5.x - 6.x SPARQL, Adapted for RDF quads with more compression, bitmap indices, special data types, RDF awareness in query optimization  2009 - 6.x Scale out cluster capable  2010 – 13 V7.x Column store, vectored execution, 3x more space efficient, 10+x more speed  2013 Star Schema benchmark with SPARQL 100x MySQL SQL, 0.8x MonetDB SQL  2014 - Top of the line SQL analytics, 500Gtriples, Structure Awareness License CC-BY-SA 4.0 (International).
  • 13. Triples Are Done Right, so?  Column store techniques are a good fit, index based triple storage does not get much better  RAM-only pointer based techniques can be faster but cost 10-100x more to scale up  To take RDF to SQL parity, Virtuoso must first be on the level with the best in SQL  TPC-H is the checklist for mastery of DW and query optimization , who survives shall not fear  Parity is achieved when running with triples just like with tables License CC-BY-SA 4.0 (International).
  • 14. Structure is Everywhere CWI in LOD2:  90% of triples in Common Crawl fall in 20 tables  All relational extractions are 100% tables  Even Dbpedia is 90% covered by 500 tables, but is unusually heterogeneous, albeit not very large License CC-BY-SA 4.0 (International).
  • 15. The Glorious Dawn: Structure is the Servant, not the Tyrant  A set of subjects with all the same single valued properties is in fact a table. So, store it as a table  Allow exceptions, e.g. sometimes multiple values, different values In different graphs, extra properties etc  If it is big, it has repeating structure  All RDF semantics are preserved, any triple is possible but the common ones are SQL compact and SQL fast  With tables, query optimization returns to SQL complexity and is much more reliable  So, more tricks from the SQL analytics bad become safe and applicable License CC-BY-SA 4.0 (International).
  • 16. Gains from Structure Awareness  3+x Load Speed  2x more space efficiency  Queries against regular data within 10-20% of SQL speeds  Just declare which properties tend to occur together, no strict schema first like with SQL  Later, self configuration
  • 17. The Cycle of Adventure  Rebels: SQL not cool, too rigid, drop ACID, go key-value, map-reduce, the triple is all there is, semantic web  Pioneers: Life on the frontier is hard, infrastructure missing or bad  Same everyday problems also in Utopia  Recognizing the objective values, e.g. schema freedom and identifiers, no AI. Do the job, forget dogma  Reconciliation: schema-first and schema-last converge in structure awareness
  • 18. Present FP7 Research  LDBC - Transparency and Relevance for Graph DB, RDF performance  GeoKnow - GeoData is everywhere, how to carry the planet in your pocket  LOD2 - Where no triple has gone before (and come back)  Open PHACTS – A Data Platform for Drug Discovery
  • 19. LDBC - Linked Data Benchmark Council  Rebels: SQL not cool, too rigid, drop ACID, go key-value, map-reduce, the triple is all there is, semantic web  Pioneers: Life on the frontier is hard, infrastructure missing or bad  Same everyday problems also in Utopia  Recognizing the objective values, e.g. schema freedom and identifiers, no AI. Do the job, forget dogma  Reconciliation: Some of the rebel thinking becomes mainstream, e.g. schema-first and schema-last converge in structure awareness
  • 20. LDB Council, Independent Industry Forum for Benchmarking  The TPC for the frontiers of database  Bootstrapped in the LDBC FP7, continues as independent industry association  OpenLink, Ontotext, Neo Technology, Sparsity as founding members  IBM, Oracle Labs, Systap, SPARQL City have already joined  DB superstars Peter Boncz and Thomas Neumann as founders and scientific leads
  • 21. LDBC Benchmarks Social Network  Online - Lookups, updates, analysis of social environment  Business Intelligence - Spotting trends, key players, big query  Graph analytics - Community detection, PageRank, graph metrics Semantic Publishing  Modeled after the BBC linked data portal, online lookups, drill downs and updates
  • 22. GeoKnow - The Planet in your Pocket Ms. Globe and Mr. Cube have a thing going on:  Mr. Cube: Desiloization ... integrated metadata ... explicit semantics.  Ms. Globe: I can feel it ... but are you man enough ... you need to show me.
  • 23. Planet Scale Roadmap Jan 2014:  Virtuoso SPARQL outperforms PostGIS in map lookups with planet-wide OpenStreetMap  Virtuoso SQL adds 5x more power
  • 24. Next: Jan 2015  Parity between SPARQL and SQL via structure awareness  Geospatial data clustering  Graph analytics close to the data: Pregel, Giraph, etc. in the DB itself  Adding a fine-grained geo dimension to the LDBC social network benchmark
  • 25. The LOD2 scaling adventures Experiments at CWI’s Scilens cluster:  150 Gtriples in Jan 2013 (8 x 256G RAM)  500 Gtriples in Aug 2014 (12 x 256G RAM)  Some trillion triple claims exist but do not detail any query workload. BSBM explore and BI workloads:  10x speed gains for BI queries from 2013 to 2014. Bulk load at 6M triples/s.  All done in triples, structure awareness will go further still
  • 26. Open PHACTS Partners:
  • 27. Virtuoso Now Snapshot of RDF Linked Data customers in the Enterprise:  Data.Gov (U.S. Govt. Open Linked Data initiative)  Bank of America  Booz Allen Hamilton  Northrop Grumman  Elsevier  French National Library  Samsung  Globo  Daimler Benz  Johnson & Johnson  Bayer  St. Jude Medical  Fujitsu  Syngenta  and many more
  • 28. Virtuoso Availability  Most capabilities as open source  Commercial adds  Cluster scale-out  SQL Federation  Replication (SQL & RDF)  Advanced RDF security, ABAC & RBAC (ACLs)  Wide tables  and more  Up to the minute tech previews via v7fasttrack on GitHub, e.g. superfast TPC-H implementation
  • 29. Virtuoso Future  Preview of structure aware RDF store in fall 2014 via v7fasttrack  Integrated graph analytics framework  Embed complex graph algos, e.g. community detection, shortest path inside SPARQL/SQL  Comparison of SQL and SPARQL for big data analytics
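To make concrete what a shortest path computation over RDF data produces, here is a minimal breadth-first search sketch over (subject, predicate, object) triples. It only illustrates the algorithm; it says nothing about how Virtuoso embeds such algorithms in SPARQL/SQL, and the function name `shortest_path` and the sample `knows` edges are assumptions.

```python
from collections import deque


def shortest_path(triples, start, goal):
    """Return the shortest directed path from start to goal, or None.

    Hypothetical helper: every triple is treated as an unweighted edge
    from subject to object, and predicates are ignored.
    """
    adjacency = {}
    for s, _p, o in triples:
        adjacency.setdefault(s, []).append(o)

    previous = {start: None}  # node -> predecessor on the shortest path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Made-up sample graph with a long and a short route from a to d.
edges = [
    ("ex:a", "knows", "ex:b"),
    ("ex:b", "knows", "ex:c"),
    ("ex:c", "knows", "ex:d"),
    ("ex:a", "knows", "ex:d"),
]
print(shortest_path(edges, "ex:a", "ex:d"))
print(shortest_path(edges, "ex:d", "ex:a"))
```

Running such a traversal inside the database, as the slide proposes, avoids shipping the edge list to the client first.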
  • 30. Linked Data Now  Adoption across major industries  Superior flexibility and time to solution  Dramatic performance gains in the last 5 years  Benchmarking will continue to drive progress, to the benefit of users and vendors alike  Run circles around most open source SQL in SPARQL: Virtuoso SPARQL beats MySQL in SSB by 100x  With structure awareness, SPARQL to match the best in SQL for data warehousing, OLTP  Linked Data no longer a long shot but a technology that makes sense
  • 31. About OpenLink Software OpenLink Software is a privately-held company founded in 1992 by its President & CEO, Kingsley Idehen. The company is an industry acclaimed technology innovator in the following areas:  ODBC, JDBC, ADO.NET, and OLE-DB compliant Data Access Drivers for Oracle, SQL Server, Informix, Ingres, Sybase, Progress, MySQL, and PostgreSQL  High-Performance & Scalable Multi-Model (Relational & Graph) Database Technology  Data Integration Middleware (Data Virtualization Technology across a wide variety of Protocols & Formats)  Web Application Server Technology  Linked Data Deployment & Management  Socially-enhanced Distributed Collaborative Applications Platforms (Weblogs, Wikis, Feed Aggregation and Syndication, Web File Systems, Discussion Forums, etc.)  Identity Management.
  • 32. Office Locations USA OpenLink Software, Inc 10 Burlington Mall Road Suite 265 Burlington, MA 01803 Tel.: +1 781 273 0900 Fax: +1 781 229 8030 UK OpenLink Software Ltd. Airport House Purley Way Croydon, Surrey CR0 0XZ Tel.: +44 (0)20 8681 7701 Fax: +44 (0)20 8681 7702
  • 33. Additional Information Web Sites OpenLink Software YouID – Digital Identity Card (Certificate) Generator OpenLink Data Spaces – Semantically enhanced Personal & Enterprise Data Spaces & Collaboration Platform OpenLink Virtuoso - Hybrid Data Management, Integration, Application, and Identity Server Universal Data Access Drivers - High-Performance ODBC, JDBC, ADO.NET, and OLE-DB Drivers LDAP and NetID-TLS – How to use LDAP scheme URIs with NetID-TLS Authentication Social Media Data spaces http://kidehen.blogspot.com (weblog) http://www.openlinksw.com/blog/~kidehen/ (weblog) https://plus.google.com/112399767740508618350/posts (Google+) https://twitter.com/#!/kidehen (Twitter) Hashtag: #LinkedData (Anywhere).