SlideShare ist ein Scribd-Unternehmen logo
1 von 13
SIB . 23.03.2011 . Page 1                         http://lod2.eu




WP2
Storing and Querying
Very Large Knowledge Bases
                             Vienna Update
                             March 2012 – M18

                             Peter Boncz


                                                http://lod2.eu
SIB . 23.03.2011 . Page 2                                             http://lod2.eu




 Table of Contents

 • WP2 Refresher
 • LOD Cloud Hosted on the Knowledge Store Cluster
    * 50B mark reached, column-store Virtuoso deployed
 • State of the Art LOD Laboratory (“Benchmarking”)
    * LDBC – RDF Store Industry council
    * BSBM at large scale
    * RDF-H + Social Intelligence Benchmark (SIB)
 • Technical work
    * column-store Virtuoso  cluster version
    * recycling query results
 • Next up
   * LOD cloud @250B triples
    * Virtuoso: adaptive query optimizer (and more)
    * first MonetDB/SPARQL version (RDF clustering, graph indexing)
LOD2 Title . 02.09.2010 . Page 3                          http://lod2.eu




 WP2 Organization

 CWI (MonetDB):
 • Peter Boncz (also in VUA group of Frank v Harmelen)
 • Duc Pham Minh (Phd student)
 • Irini Fundulaki (1-year sabbatical from FORTH)

 OpenLink (Virtuoso):
 • Orri Erling
 • Hugh Williams
 • Ivan Mikhailov

 + FU Berlin (BSBM)
 + DERI (BSBM text+ LOD cloud + text retrieval/sindice)
 + ULEI (DBpedia benchmark)
SIB . 23.03.2011 . Page 4                              http://lod2.eu


      WP2
      Storing and Querying Very Large Knowledge Bases

Goal: enabling large-scale, feature-rich & enterprise-ready Linked
  Data management solutions

Database Partners in LOD2:
CWI: Leading open source analytics RDBMS
OpenLink: Leading Linked data deployment platform

Technological Excellence:
Creating and publishing metrics for choosing RDF solutions
Bringing Column Store Technology for Business Intelligence on RDF
Ground-breaking database innovations for RDF stores
   (Dynamic Query optimization, Adaptive Caching of Joins,
   Optimized Graph Processing, Cluster/Cloud scalability)
LOD2 Title . 02.09.2010 . Page 5                   http://lod2.eu




 Task 2.1: State of the Art, Evaluation & Benchmarking

 LOD cloud cache scalability
 • M0: 20B triples
 • M12: 50B triples
 • M24: 250B triples
 • M36: 1T triples

 D2.4 completed: 50B triples in LOD cache @ DERI
 First deployment of Virtuoso7 Cluster
 • Currently hosting about 55 billion triples
 • 8 node Virtuoso v7 (column store) Cluster
 • 384GB RAM
 • 2TB Disk Storage
 • 14B/quads, excl literals

 Next up:
 • hardware provisioning for 250B and 1T triples
  (need 512GB RAM resp. 2TB RAM somewhere)
LOD2 Title . 02.09.2010 . Page 6                         http://lod2.eu




 Task 2.1: State of the Art, Evaluation & Benchmarking

 Benchmarking

 • creating new benchmarks
      • BSBM-BI (FU Berlin)
      • DBpedia Benchmark (ULEI) – best paper award
      • RDF-H (OGL,CWI)
      • Social Intelligence Benchmark (OGL,CWI)
 • running benchmark evaluations
      • BSBM on a large cluster cluster (Lisa @ SARA)
      • BSBM on large single-server (40cores, 1TB RAM)
 • creating industry consensus
      • Benchmark Auditing Service
      • LOD Benchmark Council
LOD2 Title . 02.09.2010 . Page 7                               http://lod2.eu




 BSBM Large Scale Experiments (still ongoing..)

 New Aspects:
 • The Business Intelligence Use Case (BI)
 • Benchmark Rules
 • BSBM V3 Results
 • trying cluster versions

 SARA LISA cluster
 • experiments with up to 64 nodes

 VectorWise high-end server
 • 40-core machine with 1TB RAM

 Benchmarked at SARA and Vectorwise
 4store 1.1.2      Garlik       http://4store.org/
 BigData r4169     SYSTAP LLC   http://www.systap.com/bigdata.htm
 BigOwlim 3.4.3129 OntoText     http://www.ontotext.com/owlim/
 Jena TDB 0.8.9    openjena.org http://www.openjena.org/TDB/
 Fuseki 0.1.0      openjena.org http://openjena.org/wiki/Fuseki
 Virtuoso 7.0      OpenLink     http://virtuoso.openlinksw.com/
LOD2 Title . 02.09.2010 . Page 9                           http://lod2.eu




           Social Intelligence Benchmark




                                       14 dictionaries
                                        of real data
Facebook schema style
                                     Realistic scenario
                                        simulation

         Synthetic Generated Data                         Linked Open Data
LOD2 Title . 02.09.2010 . Page 11                                  http://lod2.eu




 Technical Work: Recycling (D2.4)

 Dynamic caching of intermediate query results
 • SPARQL problem: hard to index workload / expensive backward chaining
 Idea: compute once, re-use many times
LOD2 Title . 02.09.2010 . Page 13                           http://lod2.eu




 Technical Work: Virtuoso 7

 Major now upcoming release V7, due for release in 2012

 • column store technology:
       • aggressive compression  more data fits in RAM
       • vectored execution  things run faster
 • elastic cluster implementation
       • partitions can migrate across nodes
 • bringing computation to the data
       • arbitrary recursive functions in the cluster
 • geospatial support
       • full openGIS support, R-tree backed, EWKT format
 • future enhancements
       • adaptive query optimization (CWI ROX)
       •re-use of intermediates (CWI recycling)
       • using SSDs as cache
LOD2 Title . 02.09.2010 . Page 14                             http://lod2.eu




 Next 6 months


 Virtuoso: sampled query optimizer
 • query optimization in SPARQL is difficult (no stats)
 • use adaptive, run-time, query optimization with sampling

 MonetDB and SPARQL
 • First version in sight (cooperation with FORTH)
 • research tracks
       • RDF clustering on Characteristic Sets
       • correlated join path indexing

 LOD cache at 250B triples
 • what triples to use?
 • what hardware to use? (need 512GB RAM)
SIB . 23.03.2011 . Page 15            http://lod2.eu




      Contact

      Address

      Centrum Wiskunde Informatica (CWI)
      Science Park 123
      1098 XG Amsterdam
      The Netherlands

      monetdb.cwi.nl




Thanks for your attention!
LOD2 Title . 02.09.2010 . Page 16                                  http://lod2.eu




 LOD2 Benchmark Auditing Service

 Benchmarking needs of SPARQL engine vendors:
 • vendors want to publish in their own timescale
 • using new or upcoming releases (not yet public)
 • using properly tuned settings and hardware to their solution
 • yet need credibility (is it fair)

 Tournaments organized by one institution have
 • bad timing, wrong version, one more bug to fix, etc
 • not the right hardware or settings
 • may become a legal liability once matters become more serious

 LOD2 should reach out to the SPARQL technical community and
 provide independent benchmark auditing services
 • start with BSBM  working on Auditing Rules Document
 • maybe other benchmarks later

Weitere ähnliche Inhalte

Andere mochten auch

México Edelman Trust Barometer 2013
México Edelman Trust Barometer 2013México Edelman Trust Barometer 2013
México Edelman Trust Barometer 2013EdelmanMexico
 
Social Media for Small Business
Social Media for Small BusinessSocial Media for Small Business
Social Media for Small BusinessCaroline Cummings
 
Retribution Storyboard1 Pt 1
Retribution Storyboard1 Pt 1Retribution Storyboard1 Pt 1
Retribution Storyboard1 Pt 1dapaz93
 
How to create cned school servr 40000 total
How to create cned school servr 40000 totalHow to create cned school servr 40000 total
How to create cned school servr 40000 totalPrachoom Rangkasikorn
 
Ares José Luis - Juicio abreviado: lo que hay que saber - 2010
Ares José Luis -  Juicio abreviado: lo que hay que saber - 2010Ares José Luis -  Juicio abreviado: lo que hay que saber - 2010
Ares José Luis - Juicio abreviado: lo que hay que saber - 2010Departamento de Derecho UNS
 
バリューチェーン分析・VRIO分析
バリューチェーン分析・VRIO分析バリューチェーン分析・VRIO分析
バリューチェーン分析・VRIO分析0nly0
 
JMESI Bioethics Two Applications
JMESI Bioethics Two ApplicationsJMESI Bioethics Two Applications
JMESI Bioethics Two ApplicationsKevin Parrish
 

Andere mochten auch (9)

México Edelman Trust Barometer 2013
México Edelman Trust Barometer 2013México Edelman Trust Barometer 2013
México Edelman Trust Barometer 2013
 
Social Media for Small Business
Social Media for Small BusinessSocial Media for Small Business
Social Media for Small Business
 
Podcast
PodcastPodcast
Podcast
 
Retribution Storyboard1 Pt 1
Retribution Storyboard1 Pt 1Retribution Storyboard1 Pt 1
Retribution Storyboard1 Pt 1
 
How to create cned school servr 40000 total
How to create cned school servr 40000 totalHow to create cned school servr 40000 total
How to create cned school servr 40000 total
 
resum 2015
resum 2015resum 2015
resum 2015
 
Ares José Luis - Juicio abreviado: lo que hay que saber - 2010
Ares José Luis -  Juicio abreviado: lo que hay que saber - 2010Ares José Luis -  Juicio abreviado: lo que hay que saber - 2010
Ares José Luis - Juicio abreviado: lo que hay que saber - 2010
 
バリューチェーン分析・VRIO分析
バリューチェーン分析・VRIO分析バリューチェーン分析・VRIO分析
バリューチェーン分析・VRIO分析
 
JMESI Bioethics Two Applications
JMESI Bioethics Two ApplicationsJMESI Bioethics Two Applications
JMESI Bioethics Two Applications
 

Ähnlich wie LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases

The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oakMichael Dürig
 
osscon_mysql_redis_plugin
osscon_mysql_redis_pluginosscon_mysql_redis_plugin
osscon_mysql_redis_pluginhyeongchae lee
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionFlink Forward
 
Virtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDFVirtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDFOpenLink Software
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference KeynoteKingsley Uyi Idehen
 
Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Joachim Neubert
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
AWS Summit Berlin 2012 Talk on Web Data Commons
AWS Summit Berlin 2012 Talk on Web Data CommonsAWS Summit Berlin 2012 Talk on Web Data Commons
AWS Summit Berlin 2012 Talk on Web Data CommonsHannes Mühleisen
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012Amazon Web Services
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformMaris Elsins
 

Ähnlich wie LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases (20)

LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair
LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and RepairLOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair
LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair
 
The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oak
 
osscon_mysql_redis_plugin
osscon_mysql_redis_pluginosscon_mysql_redis_plugin
osscon_mysql_redis_plugin
 
LOD2: State of Play WP5 - Linked Data Visualization, Browsing and Authoring
LOD2: State of Play WP5 - Linked Data Visualization, Browsing and AuthoringLOD2: State of Play WP5 - Linked Data Visualization, Browsing and Authoring
LOD2: State of Play WP5 - Linked Data Visualization, Browsing and Authoring
 
LOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and Repair
LOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and RepairLOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and Repair
LOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and Repair
 
LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz
 
Publishing Linked Data from RDB
Publishing Linked Data from RDBPublishing Linked Data from RDB
Publishing Linked Data from RDB
 
LOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the StackLOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the Stack
 
Solr 4
Solr 4Solr 4
Solr 4
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
 
Virtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDFVirtuoso -- The Prometheus of RDF
Virtuoso -- The Prometheus of RDF
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge BasesLOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
 
Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
AWS Summit Berlin 2012 Talk on Web Data Commons
AWS Summit Berlin 2012 Talk on Web Data CommonsAWS Summit Berlin 2012 Talk on Web Data Commons
AWS Summit Berlin 2012 Talk on Web Data Commons
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
 

Mehr von LOD2 Creating Knowledge out of Interlinked Data

Mehr von LOD2 Creating Knowledge out of Interlinked Data (20)

LOD2 Webinar: SIREn
LOD2 Webinar: SIREnLOD2 Webinar: SIREn
LOD2 Webinar: SIREn
 
LOD2 Webinar: UnifiedViews
LOD2 Webinar: UnifiedViewsLOD2 Webinar: UnifiedViews
LOD2 Webinar: UnifiedViews
 
LOD2 Webinar Series FOX
LOD2 Webinar Series FOXLOD2 Webinar Series FOX
LOD2 Webinar Series FOX
 
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORELOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
 
LOD2 Webinar Series: Virtuoso 7
LOD2 Webinar Series: Virtuoso 7LOD2 Webinar Series: Virtuoso 7
LOD2 Webinar Series: Virtuoso 7
 
LOD2 Webinar Series: DBpedia Spotlight
LOD2 Webinar Series: DBpedia SpotlightLOD2 Webinar Series: DBpedia Spotlight
LOD2 Webinar Series: DBpedia Spotlight
 
LOD2 Webinar Series: publicdata.eu and CKAN
LOD2 Webinar Series: publicdata.eu and CKANLOD2 Webinar Series: publicdata.eu and CKAN
LOD2 Webinar Series: publicdata.eu and CKAN
 
LOD2 Webinar Series: Zemanta / Open refine
LOD2 Webinar Series: Zemanta / Open refine LOD2 Webinar Series: Zemanta / Open refine
LOD2 Webinar Series: Zemanta / Open refine
 
LOD2 Webinar Series: LOD2 in information and publishing industry
LOD2 Webinar Series: LOD2 in information and publishing industryLOD2 Webinar Series: LOD2 in information and publishing industry
LOD2 Webinar Series: LOD2 in information and publishing industry
 
LOD2 General Presentation 2012
LOD2 General Presentation 2012LOD2 General Presentation 2012
LOD2 General Presentation 2012
 
LOD2 Webinar Series: PoolParty
LOD2 Webinar Series: PoolPartyLOD2 Webinar Series: PoolParty
LOD2 Webinar Series: PoolParty
 
LOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and SparqlifyLOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and Sparqlify
 
LOD2 Webinar Series: LIMES
LOD2 Webinar Series: LIMESLOD2 Webinar Series: LIMES
LOD2 Webinar Series: LIMES
 
LOD2 Plenary Vienna 2012: WP12 - Project Management
LOD2 Plenary Vienna 2012: WP12 - Project ManagementLOD2 Plenary Vienna 2012: WP12 - Project Management
LOD2 Plenary Vienna 2012: WP12 - Project Management
 
LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...
LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...
LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...
 
LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...
LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...
LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...
 
LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...
LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...
LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...
 
LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web
LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data WebLOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web
LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web
 
LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing
LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing
LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing
 
LOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 Stack
LOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 StackLOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 Stack
LOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 Stack
 

Kürzlich hochgeladen

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Kürzlich hochgeladen (20)

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases

  • 1. SIB . 23.03.2011 . Page 1 http://lod2.eu WP2 Storing and Querying Very Large Knowledge Bases Vienna Update March 2012 – M18 Peter Boncz http://lod2.eu
  • 2. SIB . 23.03.2011 . Page 2 http://lod2.eu Table of Contents • WP2 Refresher • LOD Cloud Hosted on the Knowledge Store Cluster * 50B mark reached, column-store Virtuoso deployed • State of the Art LOD Laboratory (“Benchmarking”) * LDBC – RDF Store Industry council * BSBM at large scale * RDF-H + Social Intelligence Benchmark (SIB) • Technical work * column-store Virtuoso  cluster version * recycling query results • Next up * LOD cloud @250B triples * Virtuoso: adaptive query optimizer (and more) * first MonetDB/SPARQL version (RDF clustering, graph indexing)
  • 3. LOD2 Title . 02.09.2010 . Page 3 http://lod2.eu WP2 Organization CWI (MonetDB): • Peter Boncz (also in VUA group of Frank v Harmelen) • Duc Pham Minh (Phd student) • Irini Fundulaki (1-year sabbatical from FORTH) OpenLink (Virtuoso): • Orri Erling • Hugh Williams • Ivan Mikhailov + FU Berlin (BSBM) + DERI (BSBM text+ LOD cloud + text retrieval/sindice) + ULEI (DBpedia benchmark)
  • 4. SIB . 23.03.2011 . Page 4 http://lod2.eu WP2 Storing and Querying Very Large Knowledge Bases Goal: enabling large-scale, feature-rich & enterprise-ready Linked Data management solutions Database Partners in LOD2: CWI: Leading open source analytics RDBMS OpenLink: Leading Linked data deployment platform Technological Excellence: Creating and publishing metrics for choosing RDF solutions Bringing Column Store Technology for Business Intelligence on RDF Ground-breaking database innovations for RDF stores (Dynamic Query optimization, Adaptive Caching of Joins, Optimized Graph Processing, Cluster/Cloud scalability)
  • 5. LOD2 Title . 02.09.2010 . Page 5 http://lod2.eu Task 2.1: State of the Art, Evaluation & Benchmarking LOD cloud cache scalability • M0: 20B triples • M12: 50B triples • M24: 250B triples • M36: 1T triples D2.4 completed: 50B triples in LOD cache @ DERI First deployment of Virtuoso7 Cluster • Currently hosting about 55 billion triples • 8 node Virtuoso v7 (column store) Cluster • 384GB RAM • 2TB Disk Storage • 14B/quads, excl literals Next up: • hardware provisioning for 250B and 1T triples (need 512GB RAM resp. 2TB RAM somewhere)
  • 6. LOD2 Title . 02.09.2010 . Page 6 http://lod2.eu Task 2.1: State of the Art, Evaluation & Benchmarking Benchmarking • creating new benchmarks • BSBM-BI (FU Berlin) • DBpedia Benchmark (ULEI) – best paper award • RDF-H (OGL,CWI) • Social Intelligence Benchmark (OGL,CWI) • running benchmark evaluations • BSBM on a large cluster cluster (Lisa @ SARA) • BSBM on large single-server (40cores, 1TB RAM) • creating industry consensus • Benchmark Auditing Service • LOD Benchmark Council
  • 7. LOD2 Title . 02.09.2010 . Page 7 http://lod2.eu BSBM Large Scale Experiments (still ongoing..) New Aspects: • The Business Intelligence Use Case (BI) • Benchmark Rules • BSBM V3 Results • trying cluster versions SARA LISA cluster • experiments with up to 64 nodes VectorWise high-end server • 40-core machine with 1TB RAM Benchmarked at SARA and Vectorwise 4store 1.1.2 Garlik http://4store.org/ BigData r4169 SYSTAP LLC http://www.systap.com/bigdata.htm BigOwlim 3.4.3129 OntoText http://www.ontotext.com/owlim/ Jena TDB 0.8.9 openjena.org http://www.openjena.org/TDB/ Fuseki 0.1.0 openjena.org http://openjena.org/wiki/Fuseki Virtuoso 7.0 OpenLink http://virtuoso.openlinksw.com/
  • 8. LOD2 Title . 02.09.2010 . Page 9 http://lod2.eu Social Intelligence Benchmark 14 dictionaries of real data Facebook schema style Realistic scenario simulation Synthetic Generated Data Linked Open Data
  • 9. LOD2 Title . 02.09.2010 . Page 11 http://lod2.eu Technical Work: Recycling (D2.4) Dynamic caching of intermediate query results • SPARQL problem: hard to index workload / expensive backward chaining Idea: compute once, re-use many times
  • 10. LOD2 Title . 02.09.2010 . Page 13 http://lod2.eu Technical Work: Virtuoso 7 Major now upcoming release V7, due for release in 2012 • column store technology: • aggressive compression  more data fits in RAM • vectored execution  things run faster • elastic cluster implementation • partitions can migrate across nodes • bringing computation to the data • arbitrary recursive functions in the cluster • geospatial support • full openGIS support, R-tree backed, EWKT format • future enhancements • adaptive query optimization (CWI ROX) •re-use of intermediates (CWI recycling) • using SSDs as cache
  • 11. LOD2 Title . 02.09.2010 . Page 14 http://lod2.eu Next 6 months Virtuoso: sampled query optimizer • query optimization in SPARQL is difficult (no stats) • use adaptive, run-time, query optimization with sampling MonetDB and SPARQL • First version in sight (cooperation with FORTH) • research tracks • RDF clustering on Characteristic Sets • correlated join path indexing LOD cache at 250B triples • what triples to use? • what hardware to use? (need 512GB RAM)
  • 12. SIB . 23.03.2011 . Page 15 http://lod2.eu Contact Address Centrum Wiskunde Informatica (CWI) Science Park 123 1098 XG Amsterdam The Netherlands monetdb.cwi.nl Thanks for your attention!
  • 13. LOD2 Title . 02.09.2010 . Page 16 http://lod2.eu LOD2 Benchmark Auditing Service Benchmarking needs of SPARQL engine vendors: • vendors want to publish in their own timescale • using new or upcoming releases (not yet public) • using properly tuned settings and hardware to their solution • yet need credibility (is it fair) Tournaments organized by one institution have • bad timing, wrong version, one more bug to fix, etc • not the right hardware or settings • may become a legal liability once matters become more serious LOD2 should reach out to the SPARQL technical community and provide independent benchmark auditing services • start with BSBM  working on Auditing Rules Document • maybe other benchmarks later

Hinweis der Redaktion

  1. From the aforementioned reasons, we proposed an RDF and graph database benchmark, called Social Intelligence benchmark, that can exploit the advantages of RDF in graph representation. We are aiming at testing the graph database performance on a highly connected graph. As social network is a high profile for graph data management, we design our benchmark over the scenarios of a social network. We try to generate data as realistic as possible with correlations and offer challenging queries over the data correlations.Besides, since a very large amount of useful information is available in many linked-open datasets, we exploit these resources by linking to them.
  2. Now, I will describe the data specification of SIB. As Facebook is the most popular social network with more than 800 millions active users, we take the schema style of Facebook as the baseline for designing SIB. For generating realistic data, we use 14 dictionaries that we build from real data. These dictionaries cover various domains, for example, geographical information, personal names,..SIB data is designed so that it can simulate realistic scenario including the real behaviors of the users and the characteristics of data distributions in social networks.As we mention before, our synthetic data is linked with well-known linked open data. And here, SIB is linked with DBPedia, one of the largest linked open dataset.
  3. I think most of us know FB and even have a Facebook account. The logical schema of our benchmark simulates the Facebook schema in which a user can have many friends, and there are friendships between them. A user can provide many profile information such as his name, where he is studying at, where he is living at. He can also specify his current status, for example, in Relation ship with another user. The user can upload many photo, start a discussion by writing posts, and get a lot of comments from his friends.