ADBIS’2021
Optimizing execution plans in a multistore
Chiara Forresi, Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
University of Bologna, Cesena, Italy
25th European Conference on Advances in Databases and Information Systems (ADBIS 2021)
What is a multistore?
Chiara Forresi – University of Bologna 2
Forresi, C., Gallinucci, E., Golfarelli, M., et al.: A dataspace-based framework for OLAP analyses in a high-variety multistore. The VLDB Journal (2021)
Common problems in a multistore
- Data model heterogeneity: different conventions for representing data (normalized, denormalized, and nested data)
- Some systems handle this kind of heterogeneity:
- A naïve approach consists in transforming all the data into the same reference model [8]
- Multimodel systems [3] support several data models within the same platform
- Multistore and polystore systems [21] provide integrated access to, and querying over, several heterogeneous stores through a mediator layer (middleware), e.g., BIGDAWG [10], TATOOINE [6], CloudMDsQL [14]
[8] DiScala, M., Abadi, D.J.: Automatic generation of normalized relational schemas from nested key-value data. In: 2016 ACM SIGMOD Int. Conf. on Management of Data. pp. 295–310. ACM (2016)
[3] Bimonte, S., Gallinucci, E., Marcel, P., Rizzi, S.: Data variety, come as you are in multi-model data warehouses. Information Systems (2021)
[21] Tan, R., Chirkova, R., Gadepally, V., Mattson, T.G.: Enabling query processing across heterogeneous data models: A survey. In: 2017 IEEE Int. Conf. on Big Data. pp. 3211–3220. IEEE Computer Society (2017)
[10] Gadepally, V., et al.: The BigDAWG polystore system and architecture. In: 2016 IEEE High Performance Extreme Computing Conf. pp. 1–6. IEEE (2016)
[6] Bonaque, R., et al.: Mixed-instance querying: a lightweight integration architecture for data journalism. Proc. VLDB Endow. 9(13), 1513–1516 (2016)
[14] Kolev, B., et al.: CloudMdsQL: querying heterogeneous cloud data stores with a common language. Distributed and Parallel Databases 34(4), 463–503 (2016)
Common problems in a multistore
- Schema heterogeneity: missing attributes, or attributes with different names or types that represent the same concept
- Record overlapping: the same real-world object is described by records in more than one collection
- Data fusion [5, 16] is generally not applied directly in related works; QUEPA [15] does consider it
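Record overlapping example (same schema, ID 1 appears in both tables):
T1
ID  Name      Age
1   Denis     21
2   Anne      40
T2
ID  Name      Age
1   Philip    20
3   Michelle  25

Schema heterogeneity example (different attributes):
T1
ID  Name      Age
1   Denis     21
2   Anne      40
T2
ID  FirstName  Creation date
3   Philip     2021-12-09
4   Michelle   2020-02-18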
[5] Bleiholder, J., Naumann, F.: Data fusion. ACM Computing Surveys (CSUR) 41(1), 1–41 (2009)
[16] Mandreoli, F., Montangero, M.: Dealing with data heterogeneity in a data fusion perspective: Models, methodologies, and algorithms. In: Data Handling in Science and Technology, vol. 31, pp. 235–270. Elsevier (2019)
[15] Maccioni, A., Torlone, R.: Augmented access for querying and exploring a polystore. In: 34th IEEE Int. Conf. on Data Engineering, ICDE 2018. pp. 77–88. IEEE Computer Society (2018)
How to solve schema heterogeneity?
Pay-as-you-go integration [13]
- Overcomes traditional integration problems through an iterative, lightweight integration process
- Dataspace [9]: a logical view of the available datasets, consisting of mappings, features, and entities
- Computation is carried out at the middleware level
[Figure: collections T1 (ID, Name) and T2 (CustId, CustName) mapped onto the features ID and Name of the Customer entity]
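To make the mapping/feature/entity terminology concrete, here is a minimal Python sketch built on the T1/T2/Customer example. The `Dataspace` class and its `to_features` method are hypothetical names chosen for illustration; they are not the framework's actual API, and a real dataspace would also track types and relationships between entities.

```python
from dataclasses import dataclass, field


@dataclass
class Dataspace:
    """Hypothetical, minimal dataspace: entities with features, plus attribute-to-feature mappings."""
    # Entity name -> its features (e.g., "Customer" -> ["ID", "Name"])
    entities: dict = field(default_factory=dict)
    # (collection, attribute) -> feature: the "mapping" part of the dataspace
    mappings: dict = field(default_factory=dict)

    def to_features(self, collection: str, record: dict) -> dict:
        """Rename a record's attributes to dataspace features; unmapped attributes are dropped."""
        return {
            self.mappings[(collection, attr)]: value
            for attr, value in record.items()
            if (collection, attr) in self.mappings
        }


ds = Dataspace(
    entities={"Customer": ["ID", "Name"]},
    mappings={
        ("T1", "ID"): "ID", ("T1", "Name"): "Name",
        ("T2", "CustId"): "ID", ("T2", "CustName"): "Name",
    },
)

# Records from the two collections now share the same feature names
rows = [
    ds.to_features("T1", {"ID": 1, "Name": "Denis"}),
    ds.to_features("T2", {"CustId": 3, "CustName": "Michelle"}),
]
print(rows)
```

Queries can then be written once against `ID` and `Name`, regardless of which collection a record comes from.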
[13] Jeffery, S.R., Franklin, M.J., Halevy, A.Y.: Pay-as-you-go user feedback for dataspace systems. In: 2008 ACM SIGMOD Int. Conf. on Management of Data. pp. 847–860. ACM (2008)
[9] Franklin, M.J., Halevy, A.Y., Maier, D.: From databases to dataspaces: a new abstraction for information management. SIGMOD Record 34(4), 27–33 (2005)
Querying a dataspace
GPSJ (generalized projection, selection, join) queries [12] are formulated on features
[12] Golfarelli, M., Maio, D., Rizzi, S.: The dimensional fact model: A conceptual model for data warehouses. Int. J. Cooperative Inf. Syst. 7(2–3), 215–247 (1998)
The query "For each product name, compute the average quantity purchased by female customers starting from 2019" is equivalent to the GPSJ query with:
- projections: {product name}
- aggregations: {(quantity, avg)}
- selections: {(date, ≥, "2019/01/01"), (gender, =, "F")}
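A GPSJ query can be read as selection, then grouping on the projected features, then aggregation. The sketch below evaluates the example query over a handful of made-up, already-joined records; the function name `gpsj`, the record layout, and the representation of dates as `YYYY/MM/DD` strings (which compare correctly as plain text) are assumptions for illustration.

```python
import operator
from collections import defaultdict

# Comparison operators allowed in selection predicates (feature, op, value)
OPS = {"=": operator.eq, "≥": operator.ge, ">=": operator.ge}
# Aggregation functions applied to measures
AGGS = {"avg": lambda xs: sum(xs) / len(xs), "sum": sum, "count": len}


def gpsj(records, projections, aggregations, selections):
    """Evaluate a GPSJ-style query over records whose keys are dataspace features."""
    # Selection: keep records satisfying every predicate
    selected = [r for r in records
                if all(OPS[op](r[f], v) for f, op, v in selections)]
    # Group by the projected features
    groups = defaultdict(list)
    for r in selected:
        groups[tuple(r[f] for f in projections)].append(r)
    # Aggregation: apply each (measure, function) pair to every group
    return {
        key: {f"{fn}({m})": AGGS[fn]([r[m] for r in rows]) for m, fn in aggregations}
        for key, rows in groups.items()
    }


orders = [
    {"product name": "pen", "quantity": 4, "date": "2019/03/02", "gender": "F"},
    {"product name": "pen", "quantity": 2, "date": "2020/01/15", "gender": "F"},
    {"product name": "pen", "quantity": 9, "date": "2018/12/31", "gender": "F"},
    {"product name": "ink", "quantity": 5, "date": "2019/06/01", "gender": "M"},
]
res = gpsj(orders,
           projections=["product name"],
           aggregations=[("quantity", "avg")],
           selections=[("date", "≥", "2019/01/01"), ("gender", "=", "F")])
print(res)  # {('pen',): {'avg(quantity)': 3.0}}
```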
How to solve record overlapping?
Data fusion operations:
- Merge (⨆) and simultaneous unnest (Ā) operations
- Conflict resolution at each of the levels of heterogeneity described above
- Shape of the result
- The simultaneous unnest operator handles data fusion over nested data
T1
ID  Name      Age
1   Denis     21
2   Anne      40
T2
ID  FirstName  Creation date
1   Philip     2021-12-09
3   Michelle   2020-02-18
Result (T1 ⨆ T2)
ID  FirstName  Age  Creation date
1   Philip     21   2021-12-09
2   Anne       40   -
3   Michelle   -    2020-02-18
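The merge in this example behaves like a full outer join on ID in which conflicting values are resolved by a policy. The sketch below assumes that Name in T1 has already been mapped to the FirstName feature, and uses a simple hypothetical policy (prefer the non-null value from the right-hand collection) that happens to reproduce the Result table; the actual conflict-resolution rules of the approach may differ.

```python
def merge(left, right, key, prefer="right"):
    """Full outer merge on `key` of two record lists already mapped to common features.

    Conflict policy (hypothetical): when both sides have a non-null value, keep the
    `prefer` side; otherwise keep whichever value is present. Missing values are None.
    """
    merged = {}
    for r in left:
        merged[r[key]] = dict(r)
    for r in right:
        base = merged.setdefault(r[key], {})
        for attr, value in r.items():
            if value is not None and (prefer == "right" or base.get(attr) is None):
                base[attr] = value
    # Collect all attributes in first-seen order so every output record has the same shape
    attrs = []
    for rec in merged.values():
        for a in rec:
            if a not in attrs:
                attrs.append(a)
    return [{a: merged[k].get(a) for a in attrs} for k in sorted(merged)]


t1 = [{"ID": 1, "FirstName": "Denis", "Age": 21},
      {"ID": 2, "FirstName": "Anne", "Age": 40}]
t2 = [{"ID": 1, "FirstName": "Philip", "CreationDate": "2021-12-09"},
      {"ID": 3, "FirstName": "Michelle", "CreationDate": "2020-02-18"}]

for row in merge(t1, t2, key="ID"):
    print(row)
```

Passing `prefer="left"` would instead keep "Denis" for ID 1, which shows how the conflict-resolution choice directly shapes the fused result.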
How to build a query plan
- The rationale of our previous work was: "First solve heterogeneity at the entity level, then join all entities"
- Drawbacks of this approach:
- It does not take advantage of the original data modeling
- Local operations can be pushed down only to a limited extent
- Collections containing several entities are accessed multiple times
- The goal of this work is to study different strategies that take these factors into account
Query plans and optimization
Strategies are based on three reference schema representations:
- Normalized (NoS)
- Flat (FlS)
- Nested (NeS)
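The three representations can be illustrated on a toy customers/orders dataset. In this sketch the attribute names are made up, and the nested form assumes orders are embedded in customers (the opposite nesting is equally possible). Note that flattening behaves like an inner join, so customers without orders disappear.

```python
# Toy data in the normalized representation (NoS): one collection per entity
customers = [{"cid": 1, "name": "Denis"}, {"cid": 2, "name": "Anne"}]
orders = [{"oid": 10, "cid": 1, "qty": 3}, {"oid": 11, "cid": 1, "qty": 5}]


def to_flat(customers, orders):
    """FlS: one denormalized record per order, with the customer's attributes repeated."""
    by_cid = {c["cid"]: c for c in customers}
    return [{**by_cid[o["cid"]], **o} for o in orders]


def to_nested(customers, orders):
    """NeS: one record per customer, with its orders embedded as an array."""
    nested = {c["cid"]: {**c, "orders": []} for c in customers}
    for o in orders:
        nested[o["cid"]]["orders"].append({k: v for k, v in o.items() if k != "cid"})
    return list(nested.values())


def unnest(nested):
    """NeS -> FlS: one output record per embedded order; customers without orders vanish."""
    return [
        {**{k: v for k, v in c.items() if k != "orders"}, **o}
        for c in nested
        for o in c["orders"]
    ]


print(to_nested(customers, orders))
```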
Query plans and optimization
Focus on a scenario with two entities (customers and orders)
Collections can be joined in different ways, depending on:
- the schema representation
- the need to solve record overlapping
Normalized Join (NoJ)
[Figure: NoJ plans over collections CX1, CY1, CX2, CY2 for the cases (φX = true, φY = true), (φX = true, φY = false), and (φX = false, φY = false), built from union (∪) and join (⋈) operators]
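The union-then-join shape of normalized joins can be sketched as follows. This is a simplified illustration, not the operator implementation used in this work: it assumes X are customers and Y are orders, stored as normalized collections in two stores, and that no records overlap, so a plain union suffices (overlapping records would call for a data-fusion merge instead of the union).

```python
def union(*collections):
    """Bag union of collections that already share the same aligned schema."""
    return [record for collection in collections for record in collection]


def hash_join(build, probe, key):
    """Equi-join: hash the build side on `key`, then probe it with each record of the other side."""
    table = {}
    for b in build:
        table.setdefault(b[key], []).append(b)
    return [{**b, **p} for p in probe for b in table.get(p[key], [])]


# Hypothetical normalized collections: customers (X) and orders (Y) in two stores
cx1 = [{"cid": 1, "name": "Denis"}]
cx2 = [{"cid": 2, "name": "Anne"}]
cy1 = [{"oid": 10, "cid": 1, "qty": 3}]
cy2 = [{"oid": 11, "cid": 2, "qty": 7}]

# Union per entity, then a single join between the two entities
result = hash_join(union(cx1, cx2), union(cy1, cy2), key="cid")
print(result)
```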
Nested Join (NeJ)
[Figure: NeJ plans over nested collections CXY1, CXY2 for the cases (φX = true, φY = true), (φX = true, φY = false), and (φX = false, φY = false), built from unnest (µ) and union (∪) operators, among others]
Flat Join (FlJ)
[Figure: FlJ plans over flat collections CYX1, CYX2 for the cases (φX = true, φY = true), (φX = true, φY = false), and (φX = false, φY = false), built from ϒ, join (⋈), and union (∪) operators]
Query plans and optimization
Execution plans consist of two steps:
1. A schema alignment step
2. A join step
Several execution plans can be devised, depending on:
- the chosen join strategy
- the engines (local DBMS or middleware) that carry out schema alignment
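The size of the plan space follows from these two choices. The sketch below assumes, hypothetically, that the alignment engine can be chosen independently for each collection; with 3 join strategies and 2 collections this gives 3 × 2² = 12 candidate plans. The collection names are placeholders.

```python
from itertools import product

JOIN_STRATEGIES = ["NoJ", "NeJ", "FlJ"]
ALIGNMENT_ENGINES = ["local DBMS", "middleware"]


def enumerate_plans(collections):
    """Yield (join strategy, {collection: engine performing its schema alignment})."""
    for strategy in JOIN_STRATEGIES:
        # Assumption: each collection's alignment engine is chosen independently
        for engines in product(ALIGNMENT_ENGINES, repeat=len(collections)):
            yield strategy, dict(zip(collections, engines))


plans = list(enumerate_plans(["C1", "C2"]))  # hypothetical collection names
print(len(plans))  # 12
```

A cost model then ranks these candidates; in practice some combinations may be infeasible, for instance when an engine does not support a required operation.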
Our cost model
A cost model is necessary to choose the most efficient plan
Creating a cost model in a multistore context is not trivial:
- The query is computed by a heterogeneous set of engines
- Each engine has its own resources and capabilities
Our cost model:
- Is a custom cost model based on mathematical formulas
- Focuses on disk I/O (number of pages read and written)
- Enables simulations without the effort of generating data
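To illustrate an I/O-centric cost model that runs on statistics alone, the sketch below estimates page counts from record counts and record sizes. The 8 KB page size, the `CollectionStats` type, and the textbook Grace hash join estimate of 3(M + N) page I/Os are assumptions chosen for illustration; they are not the formulas of the cost model presented here.

```python
import math
from dataclasses import dataclass

PAGE_BYTES = 8192  # assumed page size (8 KB)


@dataclass
class CollectionStats:
    n_records: int     # number of records in the collection
    record_bytes: int  # average record size in bytes


def pages(stats: CollectionStats) -> int:
    """Number of disk pages occupied by a collection."""
    return math.ceil(stats.n_records * stats.record_bytes / PAGE_BYTES)


def align_cost(src: CollectionStats, dst: CollectionStats) -> int:
    """Schema alignment modeled as read-everything, write-everything: pages read + pages written."""
    return pages(src) + pages(dst)


def grace_hash_join_cost(r: CollectionStats, s: CollectionStats) -> int:
    """Textbook Grace hash join: partition (read + write) and probe (read) both inputs."""
    return 3 * (pages(r) + pages(s))


customers = CollectionStats(n_records=100_000, record_bytes=200)
orders = CollectionStats(n_records=1_000_000, record_bytes=100)
print(pages(customers), pages(orders), grace_hash_join_cost(customers, orders))  # 2442 12208 43950
```

Because only statistics are needed, sweeping `n_records` over a range yields cost curves without materializing any data.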
Our cost model
Cost calculation:
C$D( E =
C$D( E!
)!
+ +*G " # $,&,' C$D((E")
Experiments
We focus on a scenario with two entities (customers and orders):
- 3 plans, one for each join strategy
- Computation pushed down to the local DBMSs as much as possible
- More than 3000 configurations tested, varying:
- the number of records of each entity
- the ordering key
- the degree of record overlapping
- The plan with the lowest actual execution time is also the one with the lowest estimated cost in 83% of the cases; in the remaining cases, the chosen plan induces a 10% execution overhead
- Conversely, always choosing a single join strategy yields an average overhead between 77% and 127%
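The overhead figures compare the execution time of a chosen plan with that of the fastest plan for the same configuration. A minimal sketch of that metric, assuming overhead is measured relative to the fastest plan's time; the timings below are invented for illustration and are not the experimental results.

```python
def overhead(chosen_time: float, best_time: float) -> float:
    """Relative slowdown of the chosen plan with respect to the fastest one."""
    return (chosen_time - best_time) / best_time


def avg_overhead(configs, pick) -> float:
    """configs: list of {plan name: measured time}; pick: policy choosing one plan per config."""
    values = [overhead(c[pick(c)], min(c.values())) for c in configs]
    return sum(values) / len(values)


# Invented timings (seconds) for two configurations
configs = [
    {"NoJ": 10.0, "NeJ": 12.0, "FlJ": 20.0},
    {"NoJ": 30.0, "NeJ": 15.0, "FlJ": 10.0},
]
always_noj = avg_overhead(configs, lambda c: "NoJ")          # a single fixed strategy
oracle = avg_overhead(configs, lambda c: min(c, key=c.get))  # always the fastest plan
print(always_noj, oracle)  # 1.0 0.0
```

In the setting described above, `pick` would select the plan with the lowest estimated cost, so its average overhead sits between the oracle and a fixed strategy.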
Alignment evaluation
- NeS is the cheapest representation to align from and the most expensive one to align to
- FlS is the most expensive representation to align from
- Joins in MongoDB are too expensive
- Cassandra lacks support for certain operations
[Figure: alignment costs, unclustered and clustered]
Join/full evaluations
- Execution costs for NoJ are the most stable across all overlap scenarios
- In the no-overlap scenario, FlJ is clearly the winner
- In the partial-overlap scenario, NeJ wins (owing to the preceding alignment step); in the total-overlap scenario, NoJ wins, although this depends on how strongly the data is unbalanced towards a given schema representation
[Figure: execution costs, full plan and join step]
Future developments
- Work with a broader set of entities, composing the final result by combining the different join strategies
- Work with a broader set of queries
- Develop a dynamic cost model that takes into account aspects such as:
- CPU and network
- The distribution of resources within and across databases
- Changes in resources over time
Questions?
Thank you.

Edukaciniai dropshipping via API with DroFx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 

[ADBIS 2021] - Optimizing Execution Plans in a Multistore

  • 1. ADBIS’2021 Optimizing execution plans in a multistore Chiara Forresi, Matteo Francia, Enrico Gallinucci, Matteo Golfarelli University of Bologna, Cesena, Italy 25th European Conference on Advances in Databases and Information Systems (ADBIS 2021)
  • 2. What is a multistore? (Introduction) Chiara Forresi – University of Bologna. Reference: Forresi, C., Gallinucci, E., Golfarelli, M., et al.: A dataspace-based framework for OLAP analyses in a high-variety multistore. The VLDB Journal (2021)
  • 3. Common problems in a multistore (Introduction) - Data model heterogeneity: different conventions for representing data (normalization, denormalization, and nesting) - Some systems handle this kind of heterogeneity: - A naïve approach transforms all the data into the same reference model [8] - Multimodel systems [3] support several data models within the same platform - Multistore and polystore systems [21] provide integrated access to, and querying over, several heterogeneous stores through a mediator layer (middleware), e.g., BigDAWG [10], TATOOINE [6], CloudMdsQL [14]. References: [8] DiScala, M., Abadi, D.J.: Automatic generation of normalized relational schemas from nested key-value data. In: 2016 ACM SIGMOD Int. Conf. on Management of Data. pp. 295–310. ACM (2016) [3] Bimonte, S., Gallinucci, E., Marcel, P., Rizzi, S.: Data variety, come as you are in multi-model data warehouses. Information Systems (2021) [21] Tan, R., Chirkova, R., Gadepally, V., Mattson, T.G.: Enabling query processing across heterogeneous data models: A survey. In: 2017 IEEE Int. Conf. on Big Data. pp. 3211–3220. IEEE Computer Society (2017) [10] Gadepally, V., et al.: The BigDAWG polystore system and architecture. In: 2016 IEEE High Performance Extreme Computing Conf. pp. 1–6. IEEE (2016) [6] Bonaque, R., et al.: Mixed-instance querying: a lightweight integration architecture for data journalism. Proc. VLDB Endow. 9(13), 1513–1516 (2016) [14] Kolev, B., et al.: CloudMdsQL: querying heterogeneous cloud data stores with a common language. Distributed and Parallel Databases 34(4), 463–503 (2016)
  • 4. Common problems in a multistore (Introduction) - Schema heterogeneity: missing attributes, or attributes with different names or types that represent the same concept - Record overlapping: data fusion [5, 16] is not directly applied in related work; QUEPA [15] considers it. Example of schema heterogeneity: T1(ID, Name, Age) = {(1, Denis, 21), (2, Anne, 40)} and T2(ID, FirstName, Creation date) = {(3, Philip, 2021-12-09), (4, Michelle, 2020-02-18)}. Example of record overlapping: T1(ID, Name, Age) = {(1, Denis, 21), (2, Anne, 40)} and T2(ID, Name, Age) = {(1, Philip, 20), (3, Michelle, 25)} both contain ID 1. References: [5] Bleiholder, J., Naumann, F.: Data fusion. ACM Computing Surveys (CSUR) 41(1), 1–41 (2009) [16] Mandreoli, F., Montangero, M.: Dealing with data heterogeneity in a data fusion perspective: Models, methodologies, and algorithms. In: Data Handling in Science and Technology, vol. 31, pp. 235–270. Elsevier (2019) [15] Maccioni, A., Torlone, R.: Augmented access for querying and exploring a polystore. In: 34th IEEE Int. Conf. on Data Engineering, ICDE 2018. pp. 77–88. IEEE Computer Society (2018)
  • 5. How to solve schema heterogeneity? (Querying a dataspace) Pay-as-you-go integration [13] - Overcomes traditional integration problems through iterative, lightweight integration - Dataspace [9]: a logical view of the available datasets, consisting of mappings, features, and entities - The computation is done at the middleware level. Example: collections T1(ID, Name) and T2(CustId, CustName) both map to the entity Customer, whose features are ID and Name. References: [13] Jeffery, S.R., Franklin, M.J., Halevy, A.Y.: Pay-as-you-go user feedback for dataspace systems. In: 2008 ACM SIGMOD Int. Conf. on Management of Data. pp. 847–860. ACM (2008) [9] Franklin, M.J., Halevy, A.Y., Maier, D.: From databases to dataspaces: a new abstraction for information management. SIGMOD Record 34(4), 27–33 (2005)
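A dataspace can be pictured as a set of per-collection mappings from source attributes to entity features. The sketch below is a minimal illustration under that assumption, not the framework's implementation: the `MAPPINGS` dictionary and the `to_entity` helper are hypothetical, and the sample records borrow values from the earlier tables purely for illustration.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical mappings from source attributes to the features of the Customer entity.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "T1": {"ID": "ID", "Name": "Name"},
    "T2": {"CustId": "ID", "CustName": "Name"},
}


def to_entity(collection: str, records: List[Record]) -> List[Record]:
    """Rewrite the records of a source collection in terms of entity features."""
    mapping = MAPPINGS[collection]
    # Attributes without a mapping are not part of the entity and are dropped.
    return [
        {mapping[attr]: value for attr, value in rec.items() if attr in mapping}
        for rec in records
    ]


t1 = [{"ID": 1, "Name": "Denis"}, {"ID": 2, "Name": "Anne"}]
t2 = [{"CustId": 3, "CustName": "Michelle"}]

# A query on Customer reads both collections through their mappings.
customers = to_entity("T1", t1) + to_entity("T2", t2)
print(customers)
# [{'ID': 1, 'Name': 'Denis'}, {'ID': 2, 'Name': 'Anne'}, {'ID': 3, 'Name': 'Michelle'}]
```

Because the mappings live outside the sources, a new collection can be integrated later just by adding one more entry, which is the pay-as-you-go spirit described on the slide.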
  • 6. Querying a dataspace: GPSJ (generalized projection, selection, join) [12] queries are formulated on features. The query «For each product name, compute the average of the quantities purchased by female customers starting from 2019» is equivalent to the GPSJ query with: - projections: {product name} - aggregations: {(quantity, avg)} - selections: {(date, ≥, "2019/01/01"), (gender, =, "F")}. Reference: [12] Golfarelli, M., Maio, D., Rizzi, S.: The dimensional fact model: A conceptual model for data warehouses. Int. J. Cooperative Inf. Syst. 7(2–3), 215–247 (1998)
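To make the three parts of a GPSJ query concrete, the following sketch evaluates the example query over an in-memory list of records. It assumes, hypothetically, that the relevant features are already available in a single flat view; producing that view efficiently is exactly what the execution plans in this talk are about. The `GPSJQuery` class, the `evaluate` function, and the sample rows are illustrative only.

```python
import operator
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

OPS: Dict[str, Callable[[object, object], bool]] = {
    "=": operator.eq, ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
AGGS = {"avg": lambda xs: sum(xs) / len(xs), "sum": sum, "min": min, "max": max, "count": len}


@dataclass
class GPSJQuery:
    projections: List[str]                 # group-by features
    aggregations: List[Tuple[str, str]]    # (measure, aggregation function)
    selections: List[Tuple[str, str, object]] = field(default_factory=list)  # (feature, op, value)


def evaluate(q: GPSJQuery, rows: List[Record]) -> Dict[tuple, Dict[str, object]]:
    groups: Dict[tuple, List[Record]] = defaultdict(list)
    for r in rows:
        # Selections are a conjunction of simple predicates.
        if all(OPS[op](r[f], v) for f, op, v in q.selections):
            groups[tuple(r[p] for p in q.projections)].append(r)
    return {
        key: {f"{agg}({m})": AGGS[agg]([r[m] for r in rs]) for m, agg in q.aggregations}
        for key, rs in groups.items()
    }


query = GPSJQuery(
    projections=["product name"],
    aggregations=[("quantity", "avg")],
    selections=[("date", ">=", date(2019, 1, 1)), ("gender", "=", "F")],
)
rows = [
    {"product name": "Pen", "quantity": 2, "date": date(2019, 3, 1), "gender": "F"},
    {"product name": "Pen", "quantity": 4, "date": date(2020, 1, 5), "gender": "F"},
    {"product name": "Pen", "quantity": 9, "date": date(2018, 6, 1), "gender": "F"},
    {"product name": "Ink", "quantity": 5, "date": date(2021, 2, 2), "gender": "M"},
]
print(evaluate(query, rows))  # {('Pen',): {'avg(quantity)': 3.0}}
```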
  • 7. How to solve record overlapping? (Querying a dataspace) Data fusion operations: - Merge (⨆) and simultaneous unnest operations - Conflict resolution (at the various levels of heterogeneity exposed) - Shape of the result - The simultaneous unnest operator is used to manage data fusion with nested data. Example: T1(ID, Name, Age) = {(1, Denis, 21), (2, Anne, 40)} ⨆ T2(ID, FirstName, Creation date) = {(1, Philip, 2021-12-09), (3, Michelle, 2020-02-18)} yields Result(ID, FirstName, Age, Creation date) = {(1, Philip, 21, 2021-12-09), (2, Anne, 40, -), (3, Michelle, -, 2020-02-18)}
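The merge example can be approximated by a full outer merge on the key, after treating Name and FirstName as the same feature. The sketch assumes one simple conflict-resolution policy (non-null values from the second collection win), which happens to reproduce the slide's result; it does not cover nested data, which the slide handles with the simultaneous unnest operator. The `merge` function and its `aliases` parameter are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]


def merge(left: List[Record], right: List[Record], key: str,
          aliases: Dict[str, str]) -> List[Record]:
    """Full outer merge on `key`; conflicting values are taken from `right` (assumed policy)."""
    def align(rec: Record) -> Record:
        # Rename attributes that denote the same feature (e.g. Name -> FirstName).
        return {aliases.get(a, a): v for a, v in rec.items()}

    merged: Dict[object, Record] = {}
    for rec in map(align, left):
        merged[rec[key]] = dict(rec)
    for rec in map(align, right):
        target = merged.setdefault(rec[key], {})
        target.update({a: v for a, v in rec.items() if v is not None})

    # Result schema: union of all attributes, in order of first appearance.
    columns: List[str] = []
    for rec in merged.values():
        for a in rec:
            if a not in columns:
                columns.append(a)
    # Missing attributes become None (shown as "-" on the slide).
    return [{c: merged[k].get(c) for c in columns} for k in sorted(merged)]


t1 = [{"ID": 1, "Name": "Denis", "Age": 21}, {"ID": 2, "Name": "Anne", "Age": 40}]
t2 = [{"ID": 1, "FirstName": "Philip", "Creation date": "2021-12-09"},
      {"ID": 3, "FirstName": "Michelle", "Creation date": "2020-02-18"}]
result = merge(t1, t2, key="ID", aliases={"Name": "FirstName"})
for row in result:
    print(row)
# {'ID': 1, 'FirstName': 'Philip', 'Age': 21, 'Creation date': '2021-12-09'}
# {'ID': 2, 'FirstName': 'Anne', 'Age': 40, 'Creation date': None}
# {'ID': 3, 'FirstName': 'Michelle', 'Age': None, 'Creation date': '2020-02-18'}
```

Swapping the argument order would make T1 win the conflict on ID 1 instead, which shows why the conflict-resolution policy must be an explicit choice.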
  • 8. How to build a query plan (Querying a dataspace) - The rationale of our previous work was: “First solve heterogeneity at the entity level, then join all entities” - Drawbacks of this approach: - It does not take advantage of the original data modeling - Push-down of local operations is limited - Collections containing several entities are accessed multiple times - The goal of this work is to study different strategies that take these factors into account
  • 9–11. Query plans and optimization (Querying a dataspace) Strategies are based on three reference schema representations: - Normalized (NoS) - Flat (FlS) - Nested (NeS)
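A minimal sketch of the three representations for the customers/orders scenario may help. The attribute names (`cid`, `oid`, `qty`) and the `nest`/`unnest` helpers are hypothetical; the sketch only shows how the same data looks when normalized, nested (orders embedded in customers), and flat (customer attributes repeated on every order).

```python
from typing import Dict, List

Record = Dict[str, object]

# Normalized (NoS): one collection per entity; orders reference customers by key.
customers: List[Record] = [{"cid": 1, "name": "Anne"}, {"cid": 2, "name": "Denis"}]
orders: List[Record] = [
    {"oid": 10, "cid": 1, "qty": 3},
    {"oid": 11, "cid": 1, "qty": 5},
    {"oid": 12, "cid": 2, "qty": 1},
]


def nest(customers: List[Record], orders: List[Record]) -> List[Record]:
    """NoS -> NeS: embed each customer's orders as a nested array."""
    return [
        {**c, "orders": [{k: v for k, v in o.items() if k != "cid"}
                         for o in orders if o["cid"] == c["cid"]]}
        for c in customers
    ]


def unnest(nested: List[Record], attr: str = "orders") -> List[Record]:
    """NeS -> FlS: one flat record per (customer, order) pair.

    Customers with an empty array produce no rows (an inner unnest)."""
    return [
        {**{k: v for k, v in c.items() if k != attr}, **o}
        for c in nested
        for o in c[attr]
    ]


nes = nest(customers, orders)
fls = unnest(nes)
print(nes[0])  # {'cid': 1, 'name': 'Anne', 'orders': [{'oid': 10, 'qty': 3}, {'oid': 11, 'qty': 5}]}
print(fls[0])  # {'cid': 1, 'name': 'Anne', 'oid': 10, 'qty': 3}
```

The flat form duplicates Anne's attributes once per order, which hints at why moving data towards or away from each representation has a different cost.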
  • 12. Query plans and optimization (Querying a dataspace) We focus on a scenario with two entities (customers and orders). Collections can be joined in different ways: - depending on the schema representation - depending on the need to solve record overlapping
  • 13. Normalized Join (NoJ) Diagram: plans over the collections CX1, CX2 and CY1, CY2 for the cases (φX = true, φY = true), (φX = true, φY = false), and (φX = false, φY = false), built from union (∪) and join (⋈) operators.
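The NoJ diagram combines the collections of each entity and then joins the two results. The sketch below illustrates that shape under stated assumptions: sources of the same entity that may overlap are merged on their key (with a naive "later non-null value wins" policy), disjoint sources are simply unioned, and the join is a plain hash join. The `combine` and `hash_join` helpers and the sample data are hypothetical, not the authors' operators.

```python
from typing import Dict, List

Record = Dict[str, object]


def combine(a: List[Record], b: List[Record], key: str, overlapping: bool) -> List[Record]:
    """Union two sources of one entity; merge records sharing `key` if they may overlap."""
    if not overlapping:
        return a + b  # disjoint sources: a plain union suffices
    merged: Dict[object, Record] = {}
    for rec in a + b:
        # Naive conflict resolution (assumption): later non-null values win.
        merged.setdefault(rec[key], {}).update({k: v for k, v in rec.items() if v is not None})
    return list(merged.values())


def hash_join(left: List[Record], right: List[Record], lkey: str, rkey: str) -> List[Record]:
    """Equi-join: build a hash index on `right`, probe it with `left`."""
    index: Dict[object, List[Record]] = {}
    for r in right:
        index.setdefault(r[rkey], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[lkey], [])]


cx1 = [{"cid": 1, "name": "Anne"}]
cx2 = [{"cid": 1, "name": "Anne B."}, {"cid": 2, "name": "Denis"}]
cy1 = [{"oid": 10, "cid": 1, "qty": 3}]
cy2 = [{"oid": 11, "cid": 2, "qty": 5}]

customers = combine(cx1, cx2, "cid", overlapping=True)   # customer 1 appears in both sources
orders = combine(cy1, cy2, "oid", overlapping=False)      # order sources are disjoint
result = hash_join(orders, customers, "cid", "cid")
print(result)
# [{'oid': 10, 'cid': 1, 'qty': 3, 'name': 'Anne B.'}, {'oid': 11, 'cid': 2, 'qty': 5, 'name': 'Denis'}]
```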
  • 14. Nested Join (NeJ) Diagram: plans over the nested collections CXY1 and CXY2 for the cases (φX = true, φY = true), (φX = true, φY = false), and (φX = false, φY = false), built from unnest (µ) and union (∪) operators.
  • 15. Flat Join (FlJ) Diagram: plans over the flat collections CYX1 and CYX2 for the cases (φX = true, φY = true), (φX = true, φY = false), and (φX = false, φY = false), built from ϒ, union (∪), and join (⋈) operators.
  • 16. Query plans and optimization (Querying a dataspace) Execution plans consist of two steps: 1. a schema alignment step 2. a join step. Several execution plans can be devised, depending on: - the chosen join strategy - the engines (local DBMS or middleware) that carry out schema alignment
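In the simplest reading, the plan space is the Cartesian product of the two choices listed above. The sketch assumes a single alignment engine per plan (in practice each collection could be aligned on a different engine, which multiplies the options), and the cost numbers are made up; they stand in for the estimates a cost model would provide. `candidate_plans` and `choose_plan` are hypothetical helpers.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical plan descriptor: (join strategy, engine that performs schema alignment).
Plan = Tuple[str, str]

JOIN_STRATEGIES = ["NoJ", "NeJ", "FlJ"]
ALIGNMENT_ENGINES = ["local DBMS", "middleware"]


def candidate_plans() -> List[Plan]:
    """Every combination of join strategy and alignment engine."""
    return list(product(JOIN_STRATEGIES, ALIGNMENT_ENGINES))


def choose_plan(plans: List[Plan], estimate: Callable[[Plan], float]) -> Plan:
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=estimate)


# Made-up estimates, for illustration only.
toy_estimates = {
    ("NoJ", "local DBMS"): 40.0, ("NoJ", "middleware"): 55.0,
    ("NeJ", "local DBMS"): 35.0, ("NeJ", "middleware"): 60.0,
    ("FlJ", "local DBMS"): 50.0, ("FlJ", "middleware"): 70.0,
}
plans = candidate_plans()
print(len(plans))                                         # 6
print(choose_plan(plans, toy_estimates.__getitem__))      # ('NeJ', 'local DBMS')
```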
  • 17. Our cost model (Cost model) A cost model is necessary to choose the most efficient plan. Creating a cost model in a multistore context is not trivial: - the query is computed by a heterogeneous set of engines - each engine has its own resources and capabilities. Our cost model: - is a custom cost model, based on mathematical formulas - focuses on disk I/O (number of pages read and written) - enables simulations without the effort of generating data
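A page-based I/O model only needs cardinalities and record sizes, which is why it allows simulations without generating data. The sketch uses an assumed 8 KB page size and the textbook block nested-loop join formula (read the outer input once, and the inner input once per chunk of B − 2 outer pages). These are stand-ins chosen for illustration, not the formulas of the authors' cost model, and `CollectionStats` is a hypothetical name.

```python
import math
from dataclasses import dataclass

PAGE_SIZE = 8192  # bytes per disk page (assumed)


@dataclass
class CollectionStats:
    records: int      # number of records
    record_size: int  # average record size in bytes

    @property
    def pages(self) -> int:
        return math.ceil(self.records * self.record_size / PAGE_SIZE)


def scan_cost(c: CollectionStats) -> int:
    """Pages read by a full scan."""
    return c.pages


def write_cost(c: CollectionStats) -> int:
    """Pages written when a result is materialized."""
    return c.pages


def bnl_join_cost(outer: CollectionStats, inner: CollectionStats, buffer_pages: int) -> int:
    """Textbook block nested-loop join: read outer once, inner once per outer chunk."""
    chunks = math.ceil(outer.pages / (buffer_pages - 2))
    return outer.pages + chunks * inner.pages


customers = CollectionStats(records=100_000, record_size=200)
orders = CollectionStats(records=1_000_000, record_size=100)
print(customers.pages, orders.pages)                         # 2442 12208
print(scan_cost(customers) + write_cost(customers))          # 4884 (read, then rewrite aligned data)
print(bnl_join_cost(customers, orders, buffer_pages=102))    # 307642
```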
  • 18. Our cost model (Cost model) Cost calculation: the cost of an execution plan E, Cost(E), is computed from the costs of its sub-plans.
  • 19. Experiments (Evaluation & Conclusion) We focus on a scenario with two entities (customers and orders): - 3 plans, one for each join strategy - computation pushed down to the local DBMSs as much as possible - we tested more than 3000 configurations, varying: - the number of records of each entity - the ordering key - the record overlapping. Results: - in 83% of cases, the plan with the lowest actual execution time is also the one with the lowest estimated cost; in the remaining cases, the chosen plan induces a 10% execution overhead - conversely, always choosing a single join strategy yields an average overhead between 77% and 127%
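The two figures reported above can be computed from per-configuration estimates and measurements. The sketch assumes overhead is measured relative to the fastest plan of the same configuration and averaged over the configurations where the estimate picks the wrong plan; the `evaluate_choices` helper and the two toy configurations are made up for illustration and are not the experimental data.

```python
from typing import Dict, List, Tuple

Config = Dict[str, Dict[str, float]]  # {"estimated": {plan: cost}, "actual": {plan: seconds}}


def evaluate_choices(configs: List[Config]) -> Tuple[float, float]:
    """Return (fraction of correct choices, mean overhead on wrong choices)."""
    hits = 0
    overheads: List[float] = []
    for cfg in configs:
        chosen = min(cfg["estimated"], key=cfg["estimated"].get)
        best = min(cfg["actual"], key=cfg["actual"].get)
        if chosen == best:
            hits += 1
        else:
            # Relative slowdown of the chosen plan w.r.t. the fastest one.
            overheads.append(cfg["actual"][chosen] / cfg["actual"][best] - 1)
    mean_overhead = sum(overheads) / len(overheads) if overheads else 0.0
    return hits / len(configs), mean_overhead


toy = [
    {"estimated": {"NoJ": 10, "NeJ": 20, "FlJ": 30}, "actual": {"NoJ": 1.0, "NeJ": 2.0, "FlJ": 3.0}},
    {"estimated": {"NoJ": 10, "NeJ": 5, "FlJ": 30}, "actual": {"NoJ": 1.0, "NeJ": 1.1, "FlJ": 3.0}},
]
hit_rate, mean_overhead = evaluate_choices(toy)
print(f"{hit_rate:.0%} hit rate, {mean_overhead:.0%} mean overhead on misses")
# 50% hit rate, 10% mean overhead on misses
```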
  • 20. Alignment evaluation (Evaluation & Conclusion) - NeS is the cheapest representation to move away from and the most expensive to move towards - FlS is the most expensive to move away from - MongoDB joins are too expensive - Cassandra lacks support for certain operations. Chart panels: unclustered, clustered
  • 21. Join/full evaluations (Evaluation & Conclusion) - Execution costs for NoJ are the most stable across overlap scenarios - In the no-overlap scenario, FlJ is clearly the winner - In the partial-overlap scenario NeJ wins (as in the previous alignment step), and in the total-overlap scenario NoJ wins, but this depends on the imbalance of the data towards a certain schema representation. Chart panels: full, join
  • 22. Future developments (Evaluation & Conclusion) - Work with a broader set of entities and compose the final result by combining the different join strategies - Work with a broader set of queries - Develop a dynamic cost model that takes into account aspects such as: - CPU and network - distribution of resources within and across databases - changes in resources over time
  • 23. Questions? Thank you.