SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Downloaden Sie, um offline zu lesen
1 / 15 
Database Systems Research 
in Dan Olteanu's Group @Oxford 
DBOnto Kick-O Sept 25, 2014
Factorized Databases 
Probabilistic Databases 
Datalog Engines 
2 / 15 
Outline
Factorized Databases by Example 
Orders 
customer day pizza 
Mario Monday Capricciosa 
Mario Friday Capricciosa 
Pietro Friday Hawaii 
Lucia Friday Hawaii 
Pizzas 
pizza item 
Capricciosa base 
Capricciosa ham 
Capricciosa mushrooms 
Hawaii base 
Hawaii ham 
Hawaii pineapple 
Items 
item price 
base 6 
ham 1 
mushrooms 1 
pineapple 2 
Consider the natural join of the three relations above: 
Orders 1 Pizzas 1 Items 
customer day pizza item price 
Mario Monday Capricciosa base 6 
Mario Monday Capricciosa ham 1 
Mario Monday Capricciosa mushrooms 1 
Mario Friday Capricciosa base 6 
Mario Friday Capricciosa ham 1 
Mario Friday Capricciosa mushrooms 1 
: : : : : : : : : : : : : : : 
3 / 15
Factorized Databases by Example 
Orders 1 Pizzas 1 Items 
customer day pizza item price 
Mario Monday Capricciosa base 6 
Mario Monday Capricciosa ham 1 
Mario Monday Capricciosa mushrooms 1 
Mario Friday Capricciosa base 6 
Mario Friday Capricciosa ham 1 
Mario Friday Capricciosa mushrooms 1 
: : : : : : : : : : : : : : : 
A 
at relational algebra expression encoding the above query result is: 
hMarioi  hMondayi  hCapricciosai  hbasei  h6i [ 
hMarioi  hMondayi  hCapricciosai  hhami  h1i [ 
hMarioi  hMondayi  hCapricciosai  hmushroomsi  h1i [ 
hMarioi  hFridayi  hCapricciosai  hbasei  h6i [ 
hMarioi  hFridayi  hCapricciosai  hhami  h1i [ 
hMarioi  hFridayi  hCapricciosai  hmushroomsi  h1i [ : : : 
It uses relational product (), union ([), and singleton relations (e.g., h1i). 
The attribute names are not shown to avoid clutter. 
4 / 15
Factorized Databases by Example 
The previous relational expression entails lots of redundancy due to the joins: 
hMarioi  hMondayi  hCapricciosai  hbasei  h6i [ 
hMarioi  hMondayi  hCapricciosai  hhami  h1i [ 
hMarioi  hMondayi  hCapricciosai  hmushroomsi  h1i [ 
hMarioi  hFridayi  hCapricciosai  hbasei  h6i [ 
hMarioi  hFridayi  hCapricciosai  hhami  h1i [ 
hMarioi  hFridayi  hCapricciosai  hmushroomsi  h1i [ : : : 
We can factorize the expression following the join structure, e.g.,: 
hCapricciosai  (hMondayi  hMarioi [ hFridayi  hMarioi) 
 (hbasei  h6i [ hhami  h1i [ hmushroomsi  h1i) 
[ hHawaiii  hFridayi  (hLuciai [ hPietroi) 
 (hbasei  h6i [ hhami  h1i [ hpineapplei  h2i) 
pizza 
day 
customer 
item 
price 
There are several algebraically equivalent factorized representations de
ned by 
distributivity of product over union and commutativity of product and union. 
5 / 15
Key Properties of Factorized Representations 
Factorized representations of results for queries with select, project, join, 
aggregate, groupby, and orderby operators: 
Very high compression rate 
I Can be exponentially more succinct than the relations they encode. 
I Arbitrarily better than generic compression schemes, e.g., bzip2 
I Factorized representations of asymptotically-tight size bounds computable 
directly from input database and query 
Querying in the compressed domain 
I Factorizations are relational expressions and can be composed with queries 
I We developed the FDB in-memory query engine for this purpose 
6 / 15
Current Focus 
Reduce communication cost in distributed database systems 
Factorization of temporary query results exchanged between nodes 
Many systems already employ limited factorizations 
Google MegaStore and F1, FoundationDB, Microsoft Cloud SQL Server 
Google Faculty Research Award 
Reduce space requirements of large-scale feature vectors in predictive modelling 
Feature vectors = relations with high cardinality 
Improvements of 10-100x on LogicBlox client data 
7 / 15
Factorized Databases 
Probabilistic Databases 
Datalog Engines 
8 / 15 
Outline 
gora ENFrame SPROUT2
Probabilistic Data is Commonplace 
Facts of life: 
Real-world data is often uncertain 
Currrent probabilistic databases are in the order of Billion records 
Generated from web data by NELL, Google Squared  Knowledge Vault 
Curating before processing is a time  money black hole 
We would like to query uncertain data asap! 
9 / 15
Probabilistic Data is Commonplace 
Facts of life: 
Real-world data is often uncertain 
Currrent probabilistic databases are in the order of Billion records 
Generated from web data by NELL, Google Squared  Knowledge Vault 
Curating before processing is a time  money black hole 
We would like to query uncertain data asap! 
MayBMS/SPROUT probabilistic database system 
Open-source, built on top of PostgreSQL 
3000+ downloads (as of Dec 2013) 
The PDB most benchmarked against 
SPROUT2 = SPROUT on Google Squared 
Caught the interest of UK Defence Science and Technology Lab 
9 / 15
A Squared Query Engine for Uncertain Web Data 
10 / 15
A Squared Query Engine for Uncertain Web Data 
11 / 15
Factorized Databases 
Probabilistic Databases 
Datalog Engines 
12 / 15 
Outline
A New Breed of Smart Database Systems 
Uni
ed and declarative programming model for the enterprise tech stack 
Can freely mix transactions, analytics, graph queries, mathematical 
programming and optimization, probabilistic programming 
Makes possible new classes of hybrid applications 
Typical app in retail sector: 
I 50K Datalog++ LOC (vs. millions of C++ LOC) 
I One system (vs. tens) 
13 / 15

Weitere ähnliche Inhalte

Was ist angesagt?

Semantic Web Science
Semantic Web ScienceSemantic Web Science
Semantic Web ScienceJames Hendler
 
R, HTTP, and APIs, with a preview of TopicWatchr
R, HTTP, and APIs, with a preview of TopicWatchrR, HTTP, and APIs, with a preview of TopicWatchr
R, HTTP, and APIs, with a preview of TopicWatchrPortland R User Group
 
Hadoop Case Studies in the Real World
Hadoop Case Studies in the Real WorldHadoop Case Studies in the Real World
Hadoop Case Studies in the Real WorldMobin Ranjbar
 
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...Brandi Davis-Dusenbery
 
Presentation at the EMBL-EBI Industry RDF meeting
Presentation at the EMBL-EBI  Industry RDF meetingPresentation at the EMBL-EBI  Industry RDF meeting
Presentation at the EMBL-EBI Industry RDF meetingJohannes Keizer
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncMartin Klein
 
3 Google Operators
3 Google Operators3 Google Operators
3 Google Operatorsaptwano
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big DataEdureka!
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioWinston Chen
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data AnalyticsEdureka!
 
Infovore: An Open Source MapReduce Framework For Processing Graph Data
Infovore: An Open Source MapReduce Framework For Processing Graph DataInfovore: An Open Source MapReduce Framework For Processing Graph Data
Infovore: An Open Source MapReduce Framework For Processing Graph DataPaul Houle
 
Elasticsearch in hatena bookmark
Elasticsearch in hatena bookmarkElasticsearch in hatena bookmark
Elasticsearch in hatena bookmarkShunsuke Kozawa
 
Searching for reliable business information: free versus fee
Searching for reliable business information: free versus feeSearching for reliable business information: free versus fee
Searching for reliable business information: free versus feevoginip
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archiveLewis Crawford
 

Was ist angesagt? (18)

Open Data
Open DataOpen Data
Open Data
 
Semantic Web Science
Semantic Web ScienceSemantic Web Science
Semantic Web Science
 
R, HTTP, and APIs, with a preview of TopicWatchr
R, HTTP, and APIs, with a preview of TopicWatchrR, HTTP, and APIs, with a preview of TopicWatchr
R, HTTP, and APIs, with a preview of TopicWatchr
 
Hadoop Case Studies in the Real World
Hadoop Case Studies in the Real WorldHadoop Case Studies in the Real World
Hadoop Case Studies in the Real World
 
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
 
Database Backup
Database BackupDatabase Backup
Database Backup
 
Presentation at the EMBL-EBI Industry RDF meeting
Presentation at the EMBL-EBI  Industry RDF meetingPresentation at the EMBL-EBI  Industry RDF meeting
Presentation at the EMBL-EBI Industry RDF meeting
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSync
 
3 Google Operators
3 Google Operators3 Google Operators
3 Google Operators
 
شرح مقدمة في أصول التفسير لابن تيمية (بازمول)
شرح مقدمة في أصول التفسير لابن تيمية (بازمول)شرح مقدمة في أصول التفسير لابن تيمية (بازمول)
شرح مقدمة في أصول التفسير لابن تيمية (بازمول)
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big Data
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudio
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
Infovore: An Open Source MapReduce Framework For Processing Graph Data
Infovore: An Open Source MapReduce Framework For Processing Graph DataInfovore: An Open Source MapReduce Framework For Processing Graph Data
Infovore: An Open Source MapReduce Framework For Processing Graph Data
 
Big data
Big dataBig data
Big data
 
Elasticsearch in hatena bookmark
Elasticsearch in hatena bookmarkElasticsearch in hatena bookmark
Elasticsearch in hatena bookmark
 
Searching for reliable business information: free versus fee
Searching for reliable business information: free versus feeSearching for reliable business information: free versus fee
Searching for reliable business information: free versus fee
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 

Andere mochten auch

SemFacet Poster
SemFacet PosterSemFacet Poster
SemFacet PosterDBOnto
 
Optique - poster
Optique - posterOptique - poster
Optique - posterDBOnto
 
Optique presentation
Optique presentationOptique presentation
Optique presentationDBOnto
 
Semantic Faceted Search with SemFacet presentation
Semantic Faceted Search with SemFacet presentationSemantic Faceted Search with SemFacet presentation
Semantic Faceted Search with SemFacet presentationDBOnto
 
ROSeAnn Presentation
ROSeAnn PresentationROSeAnn Presentation
ROSeAnn PresentationDBOnto
 
PAGOdA Presentation
PAGOdA PresentationPAGOdA Presentation
PAGOdA PresentationDBOnto
 
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...DBOnto
 
Aggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperAggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperDBOnto
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paperDBOnto
 
RDFox Poster
RDFox PosterRDFox Poster
RDFox PosterDBOnto
 
PDQ: Proof-driven Querying presentation
PDQ: Proof-driven Querying presentationPDQ: Proof-driven Querying presentation
PDQ: Proof-driven Querying presentationDBOnto
 
PAGOdA paper
PAGOdA paperPAGOdA paper
PAGOdA paperDBOnto
 
PAGOdA poster
PAGOdA posterPAGOdA poster
PAGOdA posterDBOnto
 
Diadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDiadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDBOnto
 
ArtForm - Dynamic analysis of JavaScript validation in web forms - Poster
ArtForm - Dynamic analysis of JavaScript validation in web forms - PosterArtForm - Dynamic analysis of JavaScript validation in web forms - Poster
ArtForm - Dynamic analysis of JavaScript validation in web forms - PosterDBOnto
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paperDBOnto
 
PDQ Poster
PDQ PosterPDQ Poster
PDQ PosterDBOnto
 
Welcome by Ian Horrocks
Welcome by Ian HorrocksWelcome by Ian Horrocks
Welcome by Ian HorrocksDBOnto
 
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DBOnto
 
Parallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox PresentationParallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox PresentationDBOnto
 

Andere mochten auch (20)

SemFacet Poster
SemFacet PosterSemFacet Poster
SemFacet Poster
 
Optique - poster
Optique - posterOptique - poster
Optique - poster
 
Optique presentation
Optique presentationOptique presentation
Optique presentation
 
Semantic Faceted Search with SemFacet presentation
Semantic Faceted Search with SemFacet presentationSemantic Faceted Search with SemFacet presentation
Semantic Faceted Search with SemFacet presentation
 
ROSeAnn Presentation
ROSeAnn PresentationROSeAnn Presentation
ROSeAnn Presentation
 
PAGOdA Presentation
PAGOdA PresentationPAGOdA Presentation
PAGOdA Presentation
 
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
 
Aggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperAggregating Semantic Annotators Paper
Aggregating Semantic Annotators Paper
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
 
RDFox Poster
RDFox PosterRDFox Poster
RDFox Poster
 
PDQ: Proof-driven Querying presentation
PDQ: Proof-driven Querying presentationPDQ: Proof-driven Querying presentation
PDQ: Proof-driven Querying presentation
 
PAGOdA paper
PAGOdA paperPAGOdA paper
PAGOdA paper
 
PAGOdA poster
PAGOdA posterPAGOdA poster
PAGOdA poster
 
Diadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDiadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meeting
 
ArtForm - Dynamic analysis of JavaScript validation in web forms - Poster
ArtForm - Dynamic analysis of JavaScript validation in web forms - PosterArtForm - Dynamic analysis of JavaScript validation in web forms - Poster
ArtForm - Dynamic analysis of JavaScript validation in web forms - Poster
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
 
PDQ Poster
PDQ PosterPDQ Poster
PDQ Poster
 
Welcome by Ian Horrocks
Welcome by Ian HorrocksWelcome by Ian Horrocks
Welcome by Ian Horrocks
 
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
 
Parallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox PresentationParallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox Presentation
 

Ähnlich wie Overview of Dan Olteanu's Research presentation

Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
llr+ cHApTEFt s Database Processing(2) Does this design e.docx
llr+ cHApTEFt s Database Processing(2) Does this design e.docxllr+ cHApTEFt s Database Processing(2) Does this design e.docx
llr+ cHApTEFt s Database Processing(2) Does this design e.docxsmile790243
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBigML, Inc
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataInside Analysis
 
Five Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BIFive Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BIInside Analysis
 
CCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud PlatformCCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud PlatformYaoyu Wang
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018VMware Tanzu
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Eli White
 
Fai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docx
Fai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docxFai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docx
Fai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docxssuser454af01
 
Information Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data LakesInformation Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data LakesDataWorks Summit
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliData Driven Innovation
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptxTarekHamdi8
 
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed REnd-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed RJorge Martinez de Salinas
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of MusicLars Albertsson
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data LösungenGuido Schmutz
 

Ähnlich wie Overview of Dan Olteanu's Research presentation (20)

Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
llr+ cHApTEFt s Database Processing(2) Does this design e.docx
llr+ cHApTEFt s Database Processing(2) Does this design e.docxllr+ cHApTEFt s Database Processing(2) Does this design e.docx
llr+ cHApTEFt s Database Processing(2) Does this design e.docx
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzML
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
Five Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BIFive Critical Success Factors for Big Data and Traditional BI
Five Critical Success Factors for Big Data and Traditional BI
 
Cognitive data
Cognitive dataCognitive data
Cognitive data
 
CCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud PlatformCCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud Platform
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
Pratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnectPratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnect
 
Fai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docx
Fai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docxFai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docx
Fai[ Away with Dynamo, Bigtabte, and Cassandra194 cHArlrEF.docx
 
Information Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data LakesInformation Virtualization: Query Federation on Data Lakes
Information Virtualization: Query Federation on Data Lakes
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed REnd-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of Music
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Overview of Dan Olteanu's Research presentation

  • 1. 1 / 15 Database Systems Research in Dan Olteanu's Group @Oxford DBOnto Kick-O Sept 25, 2014
  • 2. Factorized Databases Probabilistic Databases Datalog Engines 2 / 15 Outline
  • 3. Factorized Databases by Example Orders customer day pizza Mario Monday Capricciosa Mario Friday Capricciosa Pietro Friday Hawaii Lucia Friday Hawaii Pizzas pizza item Capricciosa base Capricciosa ham Capricciosa mushrooms Hawaii base Hawaii ham Hawaii pineapple Items item price base 6 ham 1 mushrooms 1 pineapple 2 Consider the natural join of the three relations above: Orders 1 Pizzas 1 Items customer day pizza item price Mario Monday Capricciosa base 6 Mario Monday Capricciosa ham 1 Mario Monday Capricciosa mushrooms 1 Mario Friday Capricciosa base 6 Mario Friday Capricciosa ham 1 Mario Friday Capricciosa mushrooms 1 : : : : : : : : : : : : : : : 3 / 15
  • 4. Factorized Databases by Example Orders 1 Pizzas 1 Items customer day pizza item price Mario Monday Capricciosa base 6 Mario Monday Capricciosa ham 1 Mario Monday Capricciosa mushrooms 1 Mario Friday Capricciosa base 6 Mario Friday Capricciosa ham 1 Mario Friday Capricciosa mushrooms 1 : : : : : : : : : : : : : : : A at relational algebra expression encoding the above query result is: hMarioi hMondayi hCapricciosai hbasei h6i [ hMarioi hMondayi hCapricciosai hhami h1i [ hMarioi hMondayi hCapricciosai hmushroomsi h1i [ hMarioi hFridayi hCapricciosai hbasei h6i [ hMarioi hFridayi hCapricciosai hhami h1i [ hMarioi hFridayi hCapricciosai hmushroomsi h1i [ : : : It uses relational product (), union ([), and singleton relations (e.g., h1i). The attribute names are not shown to avoid clutter. 4 / 15
  • 5. Factorized Databases by Example The previous relational expression entails lots of redundancy due to the joins: hMarioi hMondayi hCapricciosai hbasei h6i [ hMarioi hMondayi hCapricciosai hhami h1i [ hMarioi hMondayi hCapricciosai hmushroomsi h1i [ hMarioi hFridayi hCapricciosai hbasei h6i [ hMarioi hFridayi hCapricciosai hhami h1i [ hMarioi hFridayi hCapricciosai hmushroomsi h1i [ : : : We can factorize the expression following the join structure, e.g.,: hCapricciosai (hMondayi hMarioi [ hFridayi hMarioi) (hbasei h6i [ hhami h1i [ hmushroomsi h1i) [ hHawaiii hFridayi (hLuciai [ hPietroi) (hbasei h6i [ hhami h1i [ hpineapplei h2i) pizza day customer item price There are several algebraically equivalent factorized representations de
  • 6. ned by distributivity of product over union and commutativity of product and union. 5 / 15
  • 7. Key Properties of Factorized Representations Factorized representations of results for queries with select, project, join, aggregate, groupby, and orderby operators: Very high compression rate I Can be exponentially more succinct than the relations they encode. I Arbitrarily better than generic compression schemes, e.g., bzip2 I Factorized representations of asymptotically-tight size bounds computable directly from input database and query Querying in the compressed domain I Factorizations are relational expressions and can be composed with queries I We developed the FDB in-memory query engine for this purpose 6 / 15
  • 8. Current Focus Reduce communication cost in distributed database systems Factorization of temporary query results exchanged between nodes Many systems already employ limited factorizations Google MegaStore and F1, FoundationDB, Microsoft Cloud SQL Server Google Faculty Research Award Reduce space requirements of large-scale feature vectors in predictive modelling Feature vectors = relations with high cardinality Improvements of 10-100x on LogicBlox client data 7 / 15
  • 9. Factorized Databases Probabilistic Databases Datalog Engines 8 / 15 Outline gora ENFrame SPROUT2
  • 10. Probabilistic Data is Commonplace Facts of life: Real-world data is often uncertain Currrent probabilistic databases are in the order of Billion records Generated from web data by NELL, Google Squared Knowledge Vault Curating before processing is a time money black hole We would like to query uncertain data asap! 9 / 15
  • 11. Probabilistic Data is Commonplace Facts of life: Real-world data is often uncertain Currrent probabilistic databases are in the order of Billion records Generated from web data by NELL, Google Squared Knowledge Vault Curating before processing is a time money black hole We would like to query uncertain data asap! MayBMS/SPROUT probabilistic database system Open-source, built on top of PostgreSQL 3000+ downloads (as of Dec 2013) The PDB most benchmarked against SPROUT2 = SPROUT on Google Squared Caught the interest of UK Defence Science and Technology Lab 9 / 15
  • 12. A Squared Query Engine for Uncertain Web Data 10 / 15
  • 13. A Squared Query Engine for Uncertain Web Data 11 / 15
  • 14. Factorized Databases Probabilistic Databases Datalog Engines 12 / 15 Outline
  • 15. A New Breed of Smart Database Systems Uni
  • 16. ed and declarative programming model for the enterprise tech stack Can freely mix transactions, analytics, graph queries, mathematical programming and optimization, probabilistic programming Makes possible new classes of hybrid applications Typical app in retail sector: I 50K Datalog++ LOC (vs. millions of C++ LOC) I One system (vs. tens) 13 / 15
  • 17. Live Programming in the Database Flexible spreadsheet backed by scalable full- edged DBMS Users can de
  • 18. ne formulas or change schema I Triggers addition/deletion of datalog code on the DB server program! edbs! execution graph! revised execution graph! idbs! revised idbs! (meta-data)! (actual data)! 14 / 15
  • 19. Our Approach Use declarative programming to improve the implementation of declarative systems! Internal library for declarative and incremental maintenance of program state, using a small datalog engine. I In the LogicBlox engine since May 2014 15 / 15