Pay-as-you-go Reconciliation in Schema Matching Networks
Nguyen Quoc Viet Hung^1, Nguyen Thanh Tam^1, Zoltán Miklós^2, Karl Aberer^1, Avigdor Gal^3, and Matthias Weidlich^4
^1 École Polytechnique Fédérale de Lausanne
^2 Université de Rennes 1
^3 Technion – Israel Institute of Technology
^4 Imperial College London
ICDE | 2014 2 
Schema Matching - Where? 
Schema matching is the process of establishing correspondences between the 
attributes of schemas, for the purpose of data integration 
Large enterprises 
Cloud 
WWW 
Collaborative Systems 
P2P Networks
Private PhD Thesis Defense | 12.2013 3 
Schema Matching Network 
A network of schemas that are matched against each other 
Traditional approach: 
Mediated schema 
Our approach: 
Schema Matching Network 
[Figure: schemas S1, S2, S3 under each approach]
Require consensus on schema 
Updated Frequently
Pay-as-you-go Reconciliation
- Reconciliation is the process of asking a human user for feedback on correspondences.
- Why reconciliation is needed: automatic techniques rely on heuristics, so their results are inherently uncertain.
[Figure: example network of three schemas (s1: EoverI, s2: BBC, s3: DVDizzy) with attributes a1: releaseDate, a2: screeningDate, a3: availabilityDate, a4: productionDate and candidate correspondences c1 to c5]
Attribute names are quite similar, so automatic matching tools often fail to identify the correct correspondences.
Pay-as-you-go reconciliation:
- Uncertainty reduction: incrementally improve matching quality with minimal user effort
- Instantiation of a selective matching: instantiate a single trusted set of correspondences
System Overview 
General approach:
1. Develop a probabilistic schema matching network (pSMN), which makes it possible to measure the overall uncertainty of the network
2. Reduce network uncertainty: guide user feedback so that it requires minimal effort
3. Instantiate a selective matching: maintain a good set of attribute correspondences so that the system is usable at any time
Outline
- Probabilistic Schema Matching Network (pSMN):
  - Model
  - Computation
- Uncertainty Reduction
- Instantiation of the selective matching
- Experimental results
- Conclusion and future work
pSMN - Modeling
- A probabilistic schema matching network is modeled as a tuple $N = \langle S, G_S, \Gamma, C, P \rangle$
  - $S$: set of schemas $s$
  - $G_S$: interaction graph, which represents the connections in the network
  - $C$: set of attribute correspondences
  - $\Gamma$: set of integrity constraints
    - An integrity constraint is the formalization of a natural property of the network
    - 1-1 constraint
    - Cycle constraint (transitivity)
    - Etc.
  - $P = \{p_c\}$: set of probabilities. Each probability $p_c$ is associated with a correspondence $c \in C$.
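As a rough illustration of the model above, the tuple could be held in a small Python structure. This is a minimal sketch, not the authors' implementation: the names `MatchingNetwork` and `violates_one_to_one`, the encoding of a correspondence as a pair of `(schema, attribute)` endpoints, and the attribute names in the usage note are all assumptions made for this example. Only the 1-1 constraint is shown.

```python
from dataclasses import dataclass, field


@dataclass
class MatchingNetwork:
    """Hypothetical container for the tuple <S, G_S, Gamma, C, P>."""
    schemas: dict            # S: schema name -> set of attribute names
    graph: set               # G_S: frozenset({s, t}) for each matched schema pair
    constraints: list        # Gamma: predicates that flag a violating selection
    correspondences: set     # C: ((schema, attr), (schema, attr)) pairs
    probs: dict = field(default_factory=dict)  # P: correspondence -> p_c


def violates_one_to_one(selected):
    """1-1 constraint: an attribute may match at most one attribute per other schema.

    Assumes each correspondence appears once, in a single orientation.
    """
    seen = set()
    for x, y in selected:
        for endpoint, other in ((x, y), (y, x)):
            key = (endpoint, other[0])  # (this attribute, schema on the other side)
            if key in seen:
                return True
            seen.add(key)
    return False
```

For instance, selecting both `(("s1", "date"), ("s2", "releaseDate"))` and `(("s1", "date"), ("s2", "screeningDate"))` violates the constraint, because `date` would be matched to two attributes of `s2`.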
pSMN - Computing
- Probability of a correspondence
  - Semantics: indicates how likely the correspondence is to be correct
  - Source: integrity constraints and user input. Idea: a correspondence that is involved in many violations has a high chance of being problematic.
  - Computation:
    - Step 1: construct all possible matching instances $\Omega = \{I_1, \ldots, I_n\}$. A matching instance is a maximal set of correspondences satisfying all integrity constraints and user input.
    - Step 2: compute the probability as the fraction of matching instances that contain $c$:
      $p_c = \frac{|\{I \in \Omega : c \in I\}|}{|\Omega|}$
  - Challenge: exact probability computation has high complexity, so we use non-uniform sampling and a view-maintenance technique to approximate the probabilities efficiently.
- Network uncertainty: quantify the uncertainty of the pSMN based on entropy:
  $H(C) = -\sum_{c \in C} \left[ p_c \log p_c + (1 - p_c) \log(1 - p_c) \right]$
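Steps 1 and 2 and the entropy formula above can be sketched by brute force for a very small network. This is only an illustration under stated assumptions: it enumerates all subsets (exponential cost) rather than using the sampling and view-maintenance techniques the slides mention, the function names are hypothetical, `consistent` is an assumed caller-supplied predicate standing in for the integrity constraints, and the logarithm is taken to base 2.

```python
from itertools import combinations
from math import log2


def matching_instances(C, consistent):
    """Enumerate every maximal subset of C accepted by `consistent` (brute force)."""
    C = list(C)
    found = []
    # Visit larger subsets first, so any maximal superset of a candidate is already known.
    for size in range(len(C), -1, -1):
        for subset in combinations(C, size):
            s = frozenset(subset)
            if consistent(s) and not any(s < other for other in found):
                found.append(s)
    return found


def probabilities(C, instances):
    """p_c = (number of instances containing c) / (number of instances)."""
    return {c: sum(c in I for I in instances) / len(instances) for c in C}


def network_uncertainty(probs):
    """H(C) = -sum over c of [p log p + (1 - p) log(1 - p)], with 0 log 0 taken as 0."""
    total = 0.0
    for p in probs.values():
        if 0.0 < p < 1.0:
            total -= p * log2(p) + (1 - p) * log2(1 - p)
    return total
```

With a toy rule in which `c2` and `c3` each conflict with `c4` and `c5`, the five correspondences `c1` to `c5` yield two matching instances, so `c1` gets probability 1 and the others 0.5 each.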
Reduce Network Uncertainty
- Goal: guide the user to give feedback with minimal effort
- Problem (Uncertainty minimization with limited effort budget). Given a probabilistic matching network $\langle S, G_S, \Gamma, C, P \rangle$ and a budget of user effort $k$, find a set of correspondences $C' \subseteq C$ with $|C'| \le k$ such that the network uncertainty $H(C, P)$ remaining after the user validates $C'$ is minimal.
Approach – Use heuristic ordering
- Idea: present the user with the correspondences that have the highest information gain first.
- Information gain: the reduction in network uncertainty achieved by validating a correspondence:
  $IG(c) = H(C) - H(C \mid c)$
  where $H(C \mid c)$ is the expected network uncertainty once the true value of $c$ is known.
Example: two possible solutions, {c1, c2, c3} and {c1, c4, c5}.
- Ask c1 first: the network is unchanged, so there is no uncertainty reduction.
- Ask c2 first: only one solution is left, so the network becomes certain.
[Figure: three views of a network over schemas SA, SB, SC: the initial network with c1 to c5, the same network after asking c1, and the network reduced to c1, c2, c3 after asking c2]
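The example with two possible solutions can be checked numerically. The sketch below makes two simplifying assumptions that are not stated in the slides: conditioning on the user's answer is modeled by keeping only the solutions that agree with it, and logarithms are base 2. The function names are hypothetical.

```python
from math import log2


def uncertainty(C, instances):
    """Network entropy H(C), with p_c taken as the share of instances containing c."""
    total = 0.0
    for c in C:
        p = sum(c in I for I in instances) / len(instances)
        if 0.0 < p < 1.0:
            total -= p * log2(p) + (1 - p) * log2(1 - p)
    return total


def information_gain(c, C, instances):
    """IG(c) = H(C) - H(C | c), averaging over the two possible answers for c."""
    with_c = [I for I in instances if c in I]
    without_c = [I for I in instances if c not in I]
    p = len(with_c) / len(instances)
    expected = 0.0
    if with_c:
        expected += p * uncertainty(C, with_c)
    if without_c:
        expected += (1 - p) * uncertainty(C, without_c)
    return uncertainty(C, instances) - expected


def next_question(C, instances, asked=()):
    """Heuristic ordering: ask about the correspondence with the highest gain."""
    candidates = [c for c in C if c not in asked]
    return max(candidates, key=lambda c: information_gain(c, C, instances))
```

For the solutions {c1, c2, c3} and {c1, c4, c5}, `information_gain("c1", ...)` is 0 because c1 is in both solutions, while asking about c2 removes all uncertainty, so `next_question` skips c1.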
Instantiate a selective matching
- Goal: maintain a single trusted set of correspondences
- Measures of goodness for a set of correspondences $I \subseteq C$:
  - Repair distance: the information lost by eliminating some correspondences to guarantee the integrity constraints:
    $\Delta(I) = |C \setminus I|$
  - Likelihood: represents the collective correctness of the correspondences:
    $u(I) = \prod_{c \in I} p_c$
- Instantiation problem: given a schema matching network, identify a set of correspondences $I \subseteq C$ with minimal repair distance (w.r.t. $C$) and maximal likelihood.
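The two measures are simple to compute once the probabilities are known. A minimal sketch follows; the function names and the probability values in the usage note are made up for illustration.

```python
from math import prod


def repair_distance(I, C):
    """Delta(I) = |C \\ I|: how many candidate correspondences were dropped."""
    return len(set(C) - set(I))


def likelihood(I, probs):
    """u(I) = product of p_c over all c in I."""
    return prod(probs[c] for c in I)
```

With five candidates and hypothetical probabilities `{"c1": 1.0, "c2": 0.8, "c3": 0.5, "c4": 0.2, "c5": 0.5}`, the set `["c1", "c2", "c3"]` has repair distance 2 and a likelihood of about 0.4.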
Approach
- The instantiation problem is NP-complete, so we use a heuristic approach
- Algorithm:
  - Step 1: Initialization. Pick a sampled matching instance with minimal repair distance
  - Step 2: Optimization. Randomized local search
[Figure: matching instances (sets that satisfy all constraints) plotted by repair distance and likelihood; the legend distinguishes non-sampled instances, sampled instances, and the sampled instance with minimal repair distance; randomized local search moves from the initial instance I0 to Iopt, which has minimal repair distance and maximal likelihood]
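The slides name the two steps but do not spell out the search itself, so the following is only one possible reading. It assumes a lexicographic objective (repair distance first, then likelihood), a neighbourhood move that forces in one random outside correspondence and greedily rebuilds a consistent set around it, and a caller-supplied `consistent` predicate that accepts every single correspondence on its own. All names are hypothetical.

```python
import random
from math import prod


def score(I, C, probs):
    # Assumed objective: smaller repair distance first, then higher likelihood.
    return (len(C) - len(I), -prod(probs[c] for c in I))


def instantiate(C, probs, consistent, samples, steps=200, seed=0):
    rng = random.Random(seed)
    C = list(C)
    # Step 1: start from the sampled instance with minimal repair distance.
    best = frozenset(min(samples, key=lambda I: len(C) - len(I)))
    # Step 2: randomized local search.
    for _ in range(steps):
        outside = [c for c in C if c not in best]
        if not outside:
            break
        c = rng.choice(outside)
        candidate = {c}
        # Rebuild around c: old members first (random order), then the other outsiders.
        rest = rng.sample(sorted(best), len(best)) + [d for d in outside if d != c]
        for d in rest:
            if consistent(candidate | {d}):
                candidate.add(d)
        candidate = frozenset(candidate)
        if score(candidate, C, probs) < score(best, C, probs):
            best = candidate
    return best
```

Because a move is accepted only when it strictly improves the score, the search can stop in a local optimum; a fixed `seed` keeps runs reproducible.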
Experiment – Dataset and Setting
- Datasets:
  - Business Partner: schemas from enterprise systems
  - Purchase Order: purchase order e-business schemas
  - University Application Form: schemas from Web interfaces of American university application forms
  - WebForm: schemas from Web forms of different domains
  - Thalia: schemas describing university courses
- Metrics:
  - Precision: measures the quality improvement at each user interaction step $i$, with $G$ being the exact match:
    $P_i = |D_i \cap G| / |D_i|$
  - User effort: the percentage of feedback steps relative to the size of the matcher output:
    $E_i = i / |C|$
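Both metrics reduce to set arithmetic. A minimal sketch, with hypothetical function names; returning 0.0 for an empty $D_i$ is a convention chosen here, not something the slides specify.

```python
def precision(D_i, G):
    """P_i = |D_i ∩ G| / |D_i|, where G is the exact match."""
    D_i, G = set(D_i), set(G)
    return len(D_i & G) / len(D_i) if D_i else 0.0


def user_effort(i, C):
    """E_i = i / |C|: feedback steps relative to the size of the matcher output."""
    return i / len(C)
```

For example, if two of three selected correspondences are in the exact match, precision is 2/3, and two feedback steps on a matcher output of five correspondences give a user effort of 0.4.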
Efficiency of guiding strategy on uncertainty reduction
- Goal: compare the guiding and non-guiding strategies on uncertainty reduction
- Evaluation procedure:
  - Increase the user effort
  - Upon each user input, measure the network uncertainty and precision
- Interesting finding: the heuristic ordering strategy saves up to 48% of user effort compared to random ordering.
Efficiency of guiding strategy on instantiation
- Goal: compare the guiding and non-guiding strategies on instantiation
- Evaluation procedure:
  - Increase the user effort
  - Measure the precision and recall of the instantiated matching
- Interesting finding: the heuristic ordering strategy outperforms the baseline, with an average difference of 15% in precision and 14% in recall.
Conclusions
- We introduce the concepts of schema matching networks and probabilistic matching networks.
- We define a model for pay-as-you-go reconciliation on top of matching networks.
- We propose a guiding technique to reduce network uncertainty and a heuristic approach to instantiate a selective matching.
- In experiments with real-world schemas, our guiding strategy outperforms the baseline:
  - Saving up to 48% of user effort
  - Increasing precision (by 15%) and recall (by 14%)
Future Work
- Generalizing pay-as-you-go reconciliation for crowdsourced models:
- Business process matching
- Ontology alignment
THANK YOU 
Q&A

Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 

Kürzlich hochgeladen (20)

Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 

Pay-as-you-go Reconciliation in Schema Matching Networks

  • 1. Pay-as-you-go Reconciliation in Schema Matching Networks. Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1), Avigdor Gal (3), and Matthias Weidlich (4). Affiliations: (1) École Polytechnique Fédérale de Lausanne; (2) Université de Rennes 1; (3) Technion – Israel Institute of Technology; (4) Imperial College London.
  • 2. ICDE | 2014. Schema Matching - Where? Schema matching is the process of establishing correspondences between the attributes of schemas for the purpose of data integration. Settings: large enterprises, cloud, WWW, collaborative systems, P2P networks.
  • 3. Private PhD Thesis Defense | 12.2013. Schema Matching Network: a network of schemas that are matched against each other. Traditional approach: mediated schema (S1, S2, S3). Our approach: schema matching network (S1, S2, S3). Slide annotations: requires consensus on schema; updated frequently.
  • 4. Pay-as-you-go Reconciliation. Reconciliation is the process of asking a human user to give feedback on correspondences. Need for reconciliation: automatic techniques use heuristics, so their results are inherently uncertain. Example: schemas s1: EoverI, s2: BBC, s3: DVDizzy; attributes a1: releaseDate, a2: screeningDate, a3: availabilityDate, a4: productionDate; candidate correspondences c1 to c5. The attribute names are quite similar, so automatic matching tools often fail to identify the correct correspondences. Pay-as-you-go reconciliation has two parts: uncertainty reduction (incrementally improve matching quality with minimal user effort) and instantiation of a selective matching (instantiate a single trusted set of correspondences).
  • 5. System Overview. General approach: (1) develop a probabilistic schema matching network (pSMN), which makes it possible to measure the overall uncertainty of the network; (2) reduce network uncertainty: guide user feedback with minimal effort; (3) instantiate a selective matching: maintain a good set of attribute correspondences so that the system is available at any time.
  • 6. Outline. Probabilistic Schema Matching Network (pSMN): model, computation. Uncertainty reduction. Instantiation of the selective matching. Experimental results. Conclusion and future work.
  • 7. pSMN - Modeling. A probabilistic schema matching network is modeled as a tuple $N = \langle S, G_S, \Gamma, C, P \rangle$, where $S$ is the set of schemas $s$; $G_S$ is the interaction graph, which represents the connections in the network; $C$ is the set of attribute correspondences; $\Gamma$ is the set of integrity constraints; and $P = \{p_c\}$ is a set of probabilities, each probability $p_c$ being associated with a correspondence $c \in C$. An integrity constraint is the formulation of a natural property, e.g. the 1-1 constraint or the cycle constraint (transitivity).
  • 8. pSMN - Computing. Probability of a correspondence. Semantics: it indicates the correctness of the correspondence. Source: integrity constraints and user input. Idea: a correspondence that is involved in many violations has a high chance of being problematic. Computation: Step 1: construct all possible matching instances $\Omega = \{I_1, \dots, I_n\}$; a matching instance is a maximal set of correspondences satisfying all integrity constraints and user input. Step 2: compute $p_c$ as the number of matching instances that contain $c$ divided by the number of all possible matching instances, i.e. $p_c = \frac{|\{I \in \Omega : c \in I\}|}{|\Omega|}$. Challenge: probability computation has a high complexity, so we use non-uniform sampling and a view-maintenance technique to approximate the probability efficiently. Network uncertainty: the uncertainty of the pSMN is quantified by entropy, $H(C) = -\sum_{c \in C} \left[ p_c \log p_c + (1 - p_c) \log(1 - p_c) \right]$.
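
The two computation steps and the entropy formula on this slide can be illustrated on a toy network. The following is a minimal sketch, not the authors' implementation: it enumerates instances by brute force instead of the sampling the slide mentions, it assumes that every integrity constraint can be written as a pair of correspondences that must not co-occur (the hypothetical `conflicts` list), and it uses base-2 logarithms, which the slide does not specify.

```python
from itertools import combinations
from math import log2


def matching_instances(correspondences, conflicts):
    """Enumerate maximal conflict-free subsets by brute force (toy sizes only)."""
    def consistent(subset):
        return not any(a in subset and b in subset for a, b in conflicts)

    candidates = [
        frozenset(combo)
        for size in range(len(correspondences) + 1)
        for combo in combinations(correspondences, size)
        if consistent(combo)
    ]
    # Keep only sets that are not strictly contained in another consistent set.
    return [s for s in candidates if not any(s < t for t in candidates)]


def probabilities(correspondences, instances):
    """p_c = (# instances containing c) / (# instances)."""
    return {
        c: sum(c in inst for inst in instances) / len(instances)
        for c in correspondences
    }


def network_uncertainty(probs):
    """H(C): sum of binary entropies in bits (log base 2 is an assumption)."""
    total = 0.0
    for p in probs.values():
        if 0.0 < p < 1.0:  # p = 0 or p = 1 contributes no uncertainty
            total -= p * log2(p) + (1 - p) * log2(1 - p)
    return total


# Hypothetical toy network: c2 and c3 each conflict with c4 and c5.
C = ["c1", "c2", "c3", "c4", "c5"]
conflicts = [("c2", "c4"), ("c2", "c5"), ("c3", "c4"), ("c3", "c5")]

omega = matching_instances(C, conflicts)
P = probabilities(C, omega)
H = network_uncertainty(P)
```

With these assumed conflicts the only matching instances are {c1, c2, c3} and {c1, c4, c5}, so c1 has probability 1 and every other correspondence has probability 0.5.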
  • 9. Outline. Probabilistic Schema Matching Network (pSMN): model, computation. Uncertainty reduction. Instantiation of the selective matching. Experimental results. Conclusion and future work.
  • 10. Reduce Network Uncertainty. Goal: guide the user to give feedback with minimal user effort. Problem (Uncertainty Minimization with Limited Effort Budget): given a probabilistic matching network $\langle S, C, G, \Gamma, P \rangle$ and a budget of user effort $k$, find a set of correspondences $C' \subseteq C$ with $|C'| \le k$ such that the network uncertainty $H(C, P)$ after user feedback on $C'$ is minimal.
  • 11. Approach - Use heuristic ordering. Idea: present the user with the correspondences that have the highest information gain first. Information gain: the uncertainty reduction before and after validation, $IG(c) = H(C) - H(C \mid c)$, where $H(C \mid c)$ is the expected network uncertainty when knowing the true value of $c$. Example (schemas SA, SB, SC; correspondences c1 to c5): there are two possible solutions, {c1, c2, c3} and {c1, c4, c5}. Ask c1 first: the network is unchanged, so there is no uncertainty reduction. Ask c2 first: only one solution is left, so the network becomes certain.
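
The example on this slide can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: the matching instances are given directly instead of being sampled, logarithms are base 2, and the expected uncertainty $H(C \mid c)$ is taken as the probability-weighted average over the two possible answers (approve or disapprove), which the slide does not spell out.

```python
from math import log2


def entropy(correspondences, instances):
    """Network uncertainty H(C) computed from a list of matching instances."""
    total = 0.0
    for c in correspondences:
        p = sum(c in inst for inst in instances) / len(instances)
        if 0.0 < p < 1.0:
            total -= p * log2(p) + (1 - p) * log2(1 - p)
    return total


def information_gain(c, correspondences, instances):
    """IG(c) = H(C) - H(C | c), with H(C | c) averaged over both answers."""
    approved = [inst for inst in instances if c in inst]
    disapproved = [inst for inst in instances if c not in inst]
    p = len(approved) / len(instances)
    expected = 0.0
    if approved:  # answer "c is correct" keeps the instances containing c
        expected += p * entropy(correspondences, approved)
    if disapproved:  # answer "c is wrong" keeps the instances without c
        expected += (1 - p) * entropy(correspondences, disapproved)
    return entropy(correspondences, instances) - expected


C = ["c1", "c2", "c3", "c4", "c5"]
omega = [{"c1", "c2", "c3"}, {"c1", "c4", "c5"}]

gains = {c: information_gain(c, C, omega) for c in C}
# Ask about the correspondence with the highest information gain first.
ordering = sorted(C, key=lambda c: gains[c], reverse=True)
```

Here `gains["c1"]` is zero because c1 appears in every solution, while asking about c2 removes all uncertainty, so c1 ends up last in `ordering`.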
  • 12. Instantiate a selective matching. Goal: maintain a single trusted set of correspondences. Goodness measures for a set of correspondences $I \subseteq C$: Repair distance: the information loss from eliminating some correspondences to guarantee the integrity constraints, $\Delta(I) = |C \setminus I|$. Likelihood: the collective correctness of the correspondences, $u(I) = \prod_{c \in I} p_c$. Instantiation problem: given a schema matching network, identify a set of correspondences $I \subseteq C$ with minimal repair distance (w.r.t. $C$) and maximal likelihood.
  • 13. Approach. The instantiation problem is NP-complete, so a heuristic approach is used. Algorithm: Step 1 (initialization): pick a sampled matching instance with minimal repair distance. Step 2 (optimization): randomized local search. [Figure: matching instances, which satisfy all constraints, plotted by repair distance and likelihood; legend: non-sampled instance, sampled instance, sampled instance with minimal repair distance; the randomized local search moves from I0 to Iopt, the instance with minimal repair distance and maximal likelihood.]
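
The two goodness measures and the initialization step can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the probabilities and sampled instances are made-up values, breaking ties in repair distance by likelihood is an assumption, and the randomized local search of Step 2 is not shown because the slide gives no details about it.

```python
from math import prod


def repair_distance(all_correspondences, instance):
    """Delta(I) = |C \\ I|: number of correspondences dropped from C."""
    return len(set(all_correspondences) - set(instance))


def likelihood(instance, probs):
    """u(I): product of p_c over the correspondences in I."""
    return prod(probs[c] for c in instance)


def initial_instance(all_correspondences, sampled_instances, probs):
    """Step 1: sampled instance with minimal repair distance.

    Ties are broken by higher likelihood (an assumption, not from the slide).
    """
    return min(
        sampled_instances,
        key=lambda inst: (
            repair_distance(all_correspondences, inst),
            -likelihood(inst, probs),
        ),
    )


# Hypothetical probabilities and sampled matching instances.
C = ["c1", "c2", "c3", "c4", "c5"]
probs = {"c1": 1.0, "c2": 0.6, "c3": 0.6, "c4": 0.4, "c5": 0.4}
samples = [
    frozenset({"c1", "c2"}),
    frozenset({"c1", "c4", "c5"}),
    frozenset({"c1", "c2", "c3"}),
]

I0 = initial_instance(C, samples, probs)
```

The first sample has the highest likelihood but drops three correspondences, so it loses to the two samples that drop only two; of those, {c1, c2, c3} is chosen because its likelihood is higher.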
  • 14. Outline. Probabilistic Schema Matching Network (pSMN): model, computation. Uncertainty reduction. Instantiation of the selective matching. Experimental results. Conclusion and future work.
  • 15. Experiment - Dataset and Setting. Datasets: Business Partner (schemas from enterprise systems); Purchase Order (purchase order e-business schemas); University Application Form (schemas from Web interfaces of American university application forms); WebForm (schemas from Web forms of different domains); Thalia (schemas describing university courses). Metrics: Precision measures the quality improvement at each user interaction step $i$, with $G$ being the exact match: $P_i = |D_i \cap G| / |D_i|$. User effort is the percentage of feedback steps relative to the size of the matcher output: $E_i = i / |C|$.
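
The two metrics are simple set ratios. The snippet below is a small sketch with made-up data; the slide does not define $D_i$, so it is assumed here to be the set of correspondences the system outputs after feedback step $i$.

```python
def precision(derived, exact_match):
    """P_i = |D_i ∩ G| / |D_i| (D_i is assumed to be the output at step i)."""
    return len(set(derived) & set(exact_match)) / len(derived)


def user_effort(step, matcher_output):
    """E_i = i / |C|: feedback steps relative to the matcher output size."""
    return step / len(matcher_output)


# Hypothetical data for one feedback step.
C = ["c1", "c2", "c3", "c4", "c5"]  # matcher output
G = {"c1", "c2", "c3"}              # exact match
D_2 = {"c1", "c2", "c4"}            # output after 2 feedback steps

p_2 = precision(D_2, G)
e_2 = user_effort(2, C)
```

In this made-up step, two of the three output correspondences are in the exact match, and two of five candidates have been validated.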
  • 16. Efficiency of guiding strategy on uncertainty reduction. Goal: compare the guiding and non-guiding strategies on uncertainty reduction. Evaluation procedure: increase the user effort step by step and, upon each user input, measure the network uncertainty and precision. Interesting finding: the heuristic ordering strategy saves up to 48% of user effort compared to random ordering.
  • 17. Efficiency of guiding strategy on instantiation. Goal: compare the guiding and non-guiding strategies on instantiation. Evaluation procedure: increase the user effort step by step and measure the precision and recall of the instantiated matching. Interesting finding: the heuristic ordering strategy outperforms the baseline with an average difference of 15% (precision) and 14% (recall).
  • 18. Conclusions. We introduce the concepts of schema matching networks and probabilistic matching networks. We define a model for pay-as-you-go reconciliation on top of matching networks. We propose a guiding technique to reduce network uncertainty and a heuristic approach to instantiate a selective matching. Experiments with real-world schemas show that our guiding strategy outperforms the baseline: it saves up to 48% of user effort and increases precision (15%) and recall (14%).
  • 19. Future Work. Generalizing pay-as-you-go reconciliation for crowdsourced models: business process matching, ontology alignment.
  • 20. Thank you. Q&A.