Institute for Web Science & Technologies – WeST
Of Sampling and Smoothing:
Approximating Distributions over
Linked Open Data
Thomas Gottron
May 26th, 2014
PROFILES Workshop, Crete
Thomas Gottron PROFILES 26.5.2014, 2Approximating Distributions over LOD
Distributions over Linked Data
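• Probability of observing a certain pattern k: $P(k) = ?$
• Pattern types (figure: example graph patterns over ?x, ?y, ?z):
  • Predicates, e.g. foaf:knows
  • RDF class types, e.g. rdf:type foaf:Person
  • Property Sets
  • Type Sets, e.g. rdf:type foaf:Person and dbpedia:Actor
  • ECS, e.g. dbpedia:Actor with foaf:knows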
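As a rough illustration of what pattern instances look like, the sketch below derives predicates, RDF class types, property sets and type sets from a handful of made-up triples. The definitions used here (a property set is the set of predicates attached to one subject, a type set is the set of its rdf:type classes), the function name `extract_patterns` and the toy data are assumptions for illustration, not taken from the slides; ECS is left out.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Toy triples (subject, predicate, object); invented for illustration.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "rdf:type", "dbpedia:Actor"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:name", '"Bob"'),
]


def extract_patterns(triples):
    """Return observed instances of four pattern types (assumed definitions)."""
    predicates = []             # one observation per triple
    class_types = []            # one observation per rdf:type triple
    props = defaultdict(set)    # subject -> predicates used with it
    types = defaultdict(set)    # subject -> rdf:type classes
    for s, p, o in triples:
        predicates.append(p)
        props[s].add(p)
        if p == RDF_TYPE:
            class_types.append(o)
            types[s].add(o)
    # One property set and one type set per subject, in first-seen order.
    property_sets = [frozenset(props[s]) for s in props]
    type_sets = [frozenset(types.get(s, ())) for s in props]
    return predicates, class_types, property_sets, type_sets


predicates, class_types, property_sets, type_sets = extract_patterns(triples)
print(type_sets)
```

Each returned list is a sequence of observations of one pattern type, so counting its elements gives the raw frequencies from which a distribution can be estimated.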
Distributions over Linked Data
• Effectively: estimate a distribution over pattern instances $k_i$
• Applications:
  • Query federation
  • Data Mining
  • Schema inferencing
[Figure: probability p plotted over pattern instances k_1, k_2, k_3, ..., k_n]
Distributions over Linked Data
• Using the entire LOD cloud becomes less and less feasible
• Solution:
  • Operate on a sample
• Challenges:
  • How to sample?
  • How to deal with unobserved instances of a pattern?
[Figure: probability p plotted over pattern instances k_1, k_2, k_3, ..., k_n]
Sampling Linked Open Data
Data Format
• Linked Data as N-Quads: (s p o) c
  • triple (s p o): what is the information?
  • context URI c: where does it come from?
Sampling Strategies
• Triple (Edge) Based Sampling: sample individual triples (s p o)
• Unique Subject URI (Node) Based Sampling: sample subject URIs s
• Context Based Sampling: sample context URIs c
• For all sampling approaches:
  • Unbiased sampling based on uniform distribution
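The three strategies can be sketched as follows, assuming quads are held in memory as (s, p, o, c) tuples. How a drawn subject or context is expanded (here: every quad that carries it is kept) is an assumption, as are the function names and the toy data; the slides only state that the draw is uniform.

```python
import random

# Toy N-Quads as (subject, predicate, object, context) tuples; invented data.
quads = [
    ("ex:a", "rdf:type", "foaf:Person", "ctx:1"),
    ("ex:a", "foaf:knows", "ex:b", "ctx:1"),
    ("ex:b", "rdf:type", "foaf:Person", "ctx:1"),
    ("ex:b", "foaf:knows", "ex:c", "ctx:2"),
    ("ex:c", "rdf:type", "dbpedia:Actor", "ctx:2"),
    ("ex:c", "foaf:knows", "ex:d", "ctx:3"),
    ("ex:d", "rdf:type", "dbpedia:Actor", "ctx:3"),
    ("ex:d", "foaf:knows", "ex:a", "ctx:4"),
]


def sample_triples(quads, rate, rng):
    """Triple (edge) based: draw individual quads uniformly without replacement."""
    return rng.sample(quads, round(rate * len(quads)))


def sample_by_key(quads, rate, rng, key_index):
    """Draw unique key values uniformly, then keep every quad with a drawn key."""
    keys = sorted({q[key_index] for q in quads})
    chosen = set(rng.sample(keys, round(rate * len(keys))))
    return [q for q in quads if q[key_index] in chosen]


def sample_subjects(quads, rate, rng):
    """Unique subject URI (node) based: the key is the subject."""
    return sample_by_key(quads, rate, rng, 0)


def sample_contexts(quads, rate, rng):
    """Context based: the key is the context URI."""
    return sample_by_key(quads, rate, rng, 3)


rng = random.Random(42)  # fixed seed so a run is repeatable
print(len(sample_triples(quads, 0.5, rng)))
print(len(sample_subjects(quads, 0.5, rng)))
print(len(sample_contexts(quads, 0.5, rng)))
```

Note that with subject or context sampling the rate applies to the number of keys, so the number of quads in the sample depends on how many quads each drawn key carries.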
Smoothing Distributions
Obtaining a Distribution from an Index
[Figure: an index in which each key element k_1, k_2, k_3, ..., k_n of K is mapped by σ to its data items d_{i,1}, d_{i,2}, d_{i,3}, ... in D]
https://github.com/gottron/lod-index-models
Obtaining a Distribution from an Index
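K      σ(k) count
k_1    4
k_2    2
k_3    10
...    ...
k_n    8

Relative frequencies: $P(k) = \frac{\sigma(k)}{M}$, where $\sigma(k)$ is the count for key element $k$ and $M$ is the sum of all counts.
[Figure: probability p plotted over the key elements K]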
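A minimal sketch of the relative-frequency estimate, assuming the index has already been reduced to a dictionary of counts per key element. The function name is hypothetical, and the toy example uses only the four counts visible on the slide (so M = 24 here), although the slide's "..." indicates more entries.

```python
def relative_frequencies(counts):
    """Maximum-likelihood estimate P(k) = count(k) / M, with M the sum of all counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot estimate a distribution from zero observations")
    return {k: c / total for k, c in counts.items()}


# Only the four counts shown on the slide; the elided entries are not guessed.
counts = {"k1": 4, "k2": 2, "k3": 10, "kn": 8}
p = relative_frequencies(counts)
print(p["k3"])
```

A key element that never occurs in the sample is simply absent from the result, which is the situation the following slides address.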
Unobserved patterns!
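• Unobserved pattern instance (e.g. predicate, type sets)

Adjusted relative frequencies:

K        count
k_1      4 + λ
k_2      2 + λ
k_3      10 + λ
...      ...
k_n      8 + λ
<new>    0 + λ

Each adjusted count is divided by $M + \lambda |K|$, where $M$ is the sum of the observed counts and $|K|$ is the number of pattern instances, including <new>.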
Unobserved patterns!
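• Unobserved pattern instance (e.g. predicate, type sets)
• Lidstone smoothing with parameter λ: $P(k) = \frac{\sigma(k) + \lambda}{M + \lambda |K|}$, with $\sigma(k)$ the observed count of $k$, $M$ the sum of all counts and $|K|$ the number of pattern instances
• Laplace smoothing (add-one) for λ = 1
[Figure: counts 4, 2, 10, ..., 8 for k_1, k_2, k_3, ..., k_n and 0 for <new>, each increased by λ]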
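Lidstone smoothing can be sketched in a few lines. The function name, the `unseen` parameter and the toy counts are assumptions for illustration; the key point is that the number of pattern instances in the denominator must include the unobserved ones so that the probabilities still sum to one.

```python
def lidstone(counts, lam, unseen=()):
    """Lidstone estimate (count(k) + lam) / (M + lam * |K|).

    `unseen` lists pattern instances known to exist but absent from the
    sample; they enter with count 0. lam = 1 gives Laplace (add-one) smoothing.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    keys = list(counts) + [k for k in unseen if k not in counts]
    total = sum(counts.values())
    denom = total + lam * len(keys)
    return {k: (counts.get(k, 0) + lam) / denom for k in keys}


# The four counts visible on the slide plus one unobserved instance.
counts = {"k1": 4, "k2": 2, "k3": 10, "kn": 8}
laplace = lidstone(counts, 1.0, unseen=["<new>"])
light = lidstone(counts, 0.01, unseen=["<new>"])
print(laplace["<new>"], light["<new>"])
```

A smaller λ moves less probability mass from observed to unobserved instances, which is the trade-off the evaluation varies.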
Evaluation
Experimental Evaluation
• Obtain different distributions based on:
  • Sampling:
    • Strategy (triple, USU (unique subject URI), context)
    • Rate: 5% to 90%
  • Smoothing:
    • Laplace
    • Lidstone with λ = 0.5, λ = 0.1 and λ = 0.01
• Compare to full data set
• 10 iterations
Comparing Distributions
• Information-theoretic measure for comparing distributions P and Q:
  • Kullback-Leibler divergence: $D_{KL}(P, Q) = H(P, Q) - H(P)$
  • Cross-entropy of P and Q: $H(P, Q) = -\sum_{x} P(x) \, \mathrm{ld}(Q(x))$, with ld the logarithm to base 2
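The two formulas translate directly into code. This is a sketch with hypothetical function names, assuming both distributions are dictionaries over the same pattern instances. It also shows why smoothing is needed: if Q assigns zero probability to an instance that P observes, the logarithm is undefined and the call fails.

```python
import math


def entropy(p):
    """H(P) = -sum_x P(x) * ld(P(x)); terms with P(x) = 0 contribute nothing."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) * ld(Q(x)).

    Raises ValueError (from math.log2) if Q(x) = 0 where P(x) > 0,
    and KeyError if Q lacks an instance that P contains.
    """
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)


def kl_divergence(p, q):
    """D_KL(P, Q) = H(P, Q) - H(P), measured in bits."""
    return cross_entropy(p, q) - entropy(p)


# Toy distributions; invented for illustration.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(kl_divergence(p, q))
```

The divergence is zero when both distributions agree and is not symmetric, so which distribution plays the role of P matters when reading the results.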
Experimental Setup
[Figure: index construction and estimation of distributions on samples of 5%, 10%, 20%, 30%, ..., 90% and on the full data set (100%); each sample distribution is compared to the full one to measure its "deviation"]
Impact of Sampling Strategy
[Figure panels: Predicates, RDF class types, Property sets, Type sets]
Impact of Smoothing
[Figure panels: Predicates with context sampling; Predicates with triple sampling; ECS with context sampling; ECS with USU sampling]
Conclusion
Summary
• Baseline for sampling and smoothing techniques
• Little difference between classical smoothing techniques
• Quality of context-based sampling as realistic scenario
• Other samplings suitable for generating VoID descriptions
Future Work
• Smarter smoothing techniques
  • Inspired by Language Modelling
  • Specific for LOD
Thanks!
Contact:
Thomas Gottron
WeST – Institute for Web Science and Technologies
Universität Koblenz-Landau
gottron@uni-koblenz.de

Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 

Of Sampling and Smoothing: Approximating Distributions over Linked Open Data

  • 1. Institute for Web Science & Technologies – WeST. Of Sampling and Smoothing: Approximating Distributions over Linked Open Data. Thomas Gottron, May 26th, 2014, PROFILES Workshop, Crete.
  • 2. Thomas Gottron, PROFILES, 26.5.2014, Approximating Distributions over LOD. Distributions over Linked Data: what is the probability P(k) of observing a certain pattern k? Pattern types illustrated: predicates (foaf:knows), RDF class types (rdf:type foaf:Person), property sets (?x), type sets (?y with rdf:type foaf:Person and dbpedia:Actor), ECS (?z with dbpedia:Actor and foaf:knows).
  • 3. Distributions over Linked Data: effectively, estimate a distribution p over pattern instances k1, k2, k3, ..., kn. Applications: query federation, data mining, schema inferencing.
  • 4. Distributions over Linked Data: using the entire LOD cloud becomes less and less feasible. Solution: operate on a sample. Challenges: How to sample? How to deal with unobserved instances of a pattern?
  • 5. Sampling Linked Open Data
  • 6. Data Format: Linked Data as N-Quads (s p o c). The triple (s p o): what is the information? The context URI c: where does it come from?
  • 7. Sampling Strategies: triple (edge) based sampling; unique subject URI (node) based sampling; context based sampling. For all sampling approaches: unbiased sampling based on a uniform distribution.
  • 8. Smoothing Distributions
  • 9. Obtaining a Distribution from an Index: each key k1, k2, k3, ..., kn of the index is associated with its data items (d1,1, d1,2, d1,3, ... for k1; d2,1, d2,2 for k2; and so on), i.e. a mapping s from the keys K to the data items D. https://github.com/gottron/lod-index-models
  • 10. Obtaining a Distribution from an Index: each key k in K has a count s(k) (in the example, 4, 2, 10 and 8 for k1, k2, k3, ..., kn). Relative frequencies give the distribution p over K: $P(k) = \frac{s(k)}{M}$, with M the sum of all counts.
  • 11. Unobserved patterns! An unobserved pattern instance (e.g. predicate, type sets) enters as a <new> entry with count 0 next to the observed counts 4, 2, 10, 8. Adjusted relative frequencies: λ is added to every count, $P(k) = \frac{s(k) + \lambda}{M + \lambda |K|}$.
  • 12. Unobserved patterns! Unobserved pattern instance (e.g. predicate, type sets): Lidstone smoothing with parameter λ; Laplace smoothing (add-one) for λ = 1. Every count, including the count 0 of <new>, is increased by λ: $P(k) = \frac{s(k) + \lambda}{M + \lambda |K|}$.
  • 13. Evaluation
  • 14. Experimental Evaluation: obtain different distributions based on sampling (strategy: triple, USU, context; rate: 5% to 90%) and smoothing (Laplace; Lidstone with λ = 0.5, λ = 0.1 and λ = 0.01). Compare to the full data set. 10 iterations.
  • 15. Comparing Distributions: information theoretic measure for comparing distributions p and q. Kullback-Leibler divergence: $D_{KL}(P,Q) = H(P,Q) - H(P)$. Cross-entropy of P and Q: $H(P,Q) = -\sum_{x} P(x)\,\mathrm{ld}(Q(x))$.
  • 16. Experimental Setup: index construction and estimation of distributions on samples of 5%, 10%, 20%, 30%, ..., 90% and on the full data (100%); the "deviation" of each sample-based distribution from the full-data distribution is measured.
  • 17. Impact of Sampling Strategy (plot panels: predicates, RDF class types, property sets, type sets)
  • 18. Impact of Smoothing (plot panels: predicates with context sampling; predicates with triple sampling; ECS with context sampling; ECS with USU sampling)
  • 19. Conclusion. Summary: baseline for sampling and smoothing techniques; little difference between classical smoothing techniques; quality of context-based sampling as realistic scenario; other samplings suitable for generating VoID descriptions. Future work: smarter smoothing techniques, inspired by language modelling and specific for LOD.
  • 20. Thanks! Contact: Thomas Gottron, WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau, gottron@uni-koblenz.de
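
The steps named in the slides (sample, count, smooth, compare against the full data) can be illustrated with a small Python sketch. It is not taken from the talk or from the lod-index-models repository. The function names, the toy triples, the 20% rate, the random seed, and the assumption that the set of keys is known from the full data are all made up for illustration. The choice of the full-data distribution as P in the Kullback-Leibler divergence is likewise an assumption.

```python
import math
import random
from collections import Counter


def relative_frequencies(counts):
    """Unsmoothed estimate P(k) = s(k) / M, with M the sum of all counts."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def lidstone(counts, keys, lam):
    """Lidstone estimate P(k) = (s(k) + lam) / (M + lam * |K|) over all keys."""
    total = sum(counts.get(k, 0) for k in keys)
    denom = total + lam * len(keys)
    return {k: (counts.get(k, 0) + lam) / denom for k in keys}


def kl_divergence(p, q):
    """D_KL(P, Q) = H(P, Q) - H(P) in bits; q must be > 0 wherever p is."""
    cross_entropy = -sum(px * math.log2(q[k]) for k, px in p.items() if px > 0)
    entropy = -sum(px * math.log2(px) for px in p.values() if px > 0)
    return cross_entropy - entropy


# Counts shown on the slides, plus one unobserved pattern instance.
observed = {"k1": 4, "k2": 2, "k3": 10, "kn": 8}
keys = list(observed) + ["<new>"]
laplace = lidstone(observed, keys, lam=1.0)  # Laplace = Lidstone with lam = 1

# Hypothetical toy data set of (subject, predicate, object) triples.
predicates = ["foaf:knows"] * 50 + ["rdf:type"] * 40 + ["foaf:name"] * 10
triples = [(f"s{i}", pred, f"o{i}") for i, pred in enumerate(predicates)]
full = relative_frequencies(Counter(p for _, p, _ in triples))

# Triple-based sampling at a 20% rate: uniform, without replacement.
rng = random.Random(42)
sample = rng.sample(triples, k=len(triples) // 5)
sample_counts = Counter(p for _, p, _ in sample)

# Smooth the sample-based estimate and measure its deviation from the full data.
estimate = lidstone(sample_counts, list(full), lam=0.5)
deviation = kl_divergence(full, estimate)
```

With λ > 0, every key receives a non-zero probability, so the cross-entropy term stays finite even when the sample misses a predicate entirely. Without smoothing, such a missed predicate would make the divergence undefined.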