SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Rules for Inducing Hierarchies
from Social Tagging Data
iConference 2018, Sheffield, UK, March 25-28, 2018
Hang Dong, Wei Wang, Frans Coenen
Department of Computer Science,
University of Liverpool
Social Tagging Data (Folksonomies)
• Users collaboratively generate “key
words” for their interests.
• The “key words” form a taxonomy
of resources online, called
Folksonomies (Vander Wal, 2007).
Social tags for movie “Forrest Gump” in MovieLens
https://movielens.org/movies/356
Issues in social tagging data
• (i) Noisy and ambiguous.
• Data cleaning (Dong, Wang & Coenen, 2017).
• (ii) Plain structure, lack of semantic relations among tags.
• This study focuses on hierarchical/subsumption relations between tags.
• This is a challenging problem:
• a cognitive task requiring much human effort (Weller, 2010, p. 139).
• distinct from mining relations from sentences.
Research Questions
• 1. Which rule can effectively capture the hierarchical relations
between social tags?
• - Not systematically discussed in previous studies, although
some approaches were proposed & evaluated (Garcia-Silva, 2012;
Strohmaier et al., 2012).
• - (Information science & Linguistics) definition of hierarchical relations
• - Rules in the previous study
• - Proposed two new rules: Fuzzy set inclusion, Probabilistic Association
Research Questions (2)
• 2. How do rules and data representations affect the quality of the
induced hierarchies?
• Data representation: resource-based representation, probabilistic topic
representation
• Experimental Design:
• Hierarchical Generation Algorithm
• Automated evaluation against three gold-standard hierarchies
Hierarchical Relations – information science
Acknowledgement to the image in Stock, W. G. (2010). Concepts and semantic relations in information science. Journal of the Association
for Information Science and Technology, 61(10), 1951-1969.
Hierarchical Relations
• Straightforward?
• (i) Apple is a [kind of] fruit. (ii) Library science is a part of Information Science.
• Abstraction, Generalisation.
• A type of paradigmatic Relation: fit into the same grammatical slot
(Cruse, 2003).
• Tagging data only provide syntagmatic relations, but are a great
source for paradigmatic relations (Peters, 2009; Stock, 2010).
Hierarchical Relations – linguistics (1)
Definitions in Cruse (2003)
• Logical: (extensional) X is a hyponym of Y iff the
extension/objects of X’ should be included in the
extension/objects of Y’.
• Unsymmetrical
(intensional) X is a hyponym of Y iff F(X) entails, but
is not entailed by F(Y), where F(-) is a sentential
function satisfied by X or Y.
Extension
apple
fruit
Intension
fruit
apple
Hierarchical Relations – linguistics (2)
• Collocational: X is a hyponym of Y iff the normal context of X is a
subset of the normal context of Y.
You shall know a word by the company it keeps… - Firth (1957)
• Componential: X is a hyponym of Y iff the features defining Y are a
proper subset of features defining X.
Definitions in Cruse (2003)
Hierarchical relations from tags – computational rules
• Set Inclusion (Mika, 2007; De Meo, 2009)
• Graph Centrality (Heymann, 2006)
• Information-Theoretic Condition (Wang, 2010)
• Fuzzy Set Inclusion
• Probabilistic Association
Resource-based
(Res-based)
representation
Probabilistic Topic
Modelling (PTM)
based
Representation
Representing a tag as a vector
Data Representation
• Resource-based Representation:
(Markines et al., 2009)
• Probabilistic Topic Modelling Representation:
Vt[i] = number of
times the tag t is
annotated to the ith
resource
R1 R2 R3
news 1 0 0
Web2.0 1 1 1
knowledge 0 0 1
Using a probabilistic generative model to infer
the p (tag | topic) and p (resource | topic)
Then calculate p(topic | tag) from p (tag |
topic) using Bayesian’s Theorem.
tags
resources
Topic 1 Topic 2 Topic 3
news 0.8 0.1 0.1
Web2.0 0.4 0.3 0.3
knowledge 0.2 0.2 0.6
tags
topics
Each row sums to 1.
Rule 1: Set inclusion (Mika, 2007; De Meo, 2009)
• Tag A is a hyponym of Tag B if set-inc(A, B) >= p ∧ set-inc(B, A) < p ∧
sim(A, B) > s. (p=0.5)
Sim(A, B) is a similarity measure: cosine similarity.
where RA means the resource set annotated using the tag A.
Resources of
Information
Science
Resources
of Library
Science
Assumption: The logical extension of a tag is
measured as its resource context, i.e. the
resources that tag is annotated.
Rule 2: Graph Centrality (Heymann, 2006)
• Tag A is a hyponym of Tag B if graph-cent(A) < graph-cent(B) ∧ sim(A,B)
> s.
Tag similarity
graph, where
each node is a tag
and edge is
established by
similarity of tags
over a threshold.
Assumption: popularity-generality
the more popular/influential a tag,
the more general it is.
(collocational)
graph-cent(A) is a graph centrality
measure (centrality, betweenness, etc.)
of a tag A in the tag similarity graph.
Rule 3: Information-Theoretic Condition (Wang, 2010)
• Tag A is a hyponym of Tag B if DKL(PB||PA)−DKL(PA||PB) < f ∧ sim(A,B) >
s. Here PA and PB are the probability distributions of A and B over topics. f is a
noise factor of a small value (f = 0.05 in this study).
• Kullback-Leibler divergence as a measure of “surprise” of receiving PB when PA is
expected.
Rule 4: Fuzzy set inclusion
• An extension of Set Inclusion, based on probabilistic topic
representation:
• Tag A is a hyponym of tag B if fuzzy-set-inc(SA,SB) >= p ∧ fuzzy-set-
inc(SB,SA)< p ∧ sim(A,B)>s, where p is set as 0.5.
, where SA is a fuzzy set for tag A as a pair (U, m), U is the set of topics for tag and
m:U → [0, 1] is a membership function: for each topic z ∈ U, m(z) = p(A|z).
Set inclusion vs Fuzzy set inclusion
• Resource-based Representation:
• Probabilistic Topic Modelling Representation:
Vt[i] = number of
times the tag t is
annotated to the ith
resource
R1 R2 R3
news 1 0 0
Web2.0 1 1 1
knowledge 0 0 1
Using a probabilistic generative model to infer
the p (topic | tag) and p (resource | topic)
Use p (topic | tag)
Note: this is different from the previous
p (tag | topic)
tags
resources
Topic 1 Topic 2 Topic 3
news 0.57 0.17 0.1
Web2.0 0.29 0.5 0.3
knowledge 0.14 0.33 0.6
tags
topics
Each column sums to 1.
Rule 5: Probabilistic Association
• Based on PTM representation
• Tag A is a hyponym of Tag B if p(A|B) < p(B|A) ∧ sim(A, B) > s,
• p(A|B) =
(Griffith & Steyvers, 2002)
• z is a member in the set of topics.
• Assumption: componential measure of hierarchical relation.
Information
science (A)
Information
literacy (B)
An example
p (A | B) = 1
p (B| A) = 0.25
Methodology: Algorithm to Hierarchy
Generation
• For RQ1 about rules:
Replacing the isHypo()
function to one of the five
rules each time, and
compare the results.
• For RQ2 about data
representations:
• using different
representation to
calculate sim(ti,tj).
• using the compatible
data representation
for each rule.
Experiments
• Data Collection and Processing
Bibsonomy dataset 2003-2015: 3,794,882 annotations,
868,015 resources, 283,858 tags, 11,103 users.
We used a streamline to clean academic social tagging data
(Dong, Wang & Coenen, 2017):
• Unified different variants of tags.
• Selected tags having user frequency >= 4.
• Removed resources with tags < 3
The cleaned dataset contains 7,846 tag concepts and 128,782
resources.
Standard tags
Users Resources
(with 3 concepts)
Tags
Users Resources
Reference-based evaluation
Measuring the similarity of a
learned hierarchy, L, to gold-
standard hierarchies.
• Gold-standard, denoted as G:
• DBpedia (6616 concepts overlap)
• Microsoft Concept Graph (6029
concepts overlap)
• ACM computing classification
system (691 concepts overlap)
Acknowledgment to Images in
http://dbpedia.org/page/Category:Information_retrieval and
https://dl.acm.org/ccs/ccs.cfm?id=10003317&lid=0.10002951.1
0003317
gold-standard hierarchy Glearned hierarchy L
Evaluation metrics (Dellschaft, Staab, 2006)
(i) Find common concepts between the learned hierarchy L and the gold-standard hierarchy G,
(ii) Extract a characteristic excerpt for each concept. We use common direct subsumption (cdsub)
as the characteristic excerpt.
(iii) The similarity of hierarchies is defined based on the characteristic excerpts.
Information
retrieval
indexing web
information
retrieval
cross-
language
Information
retrieval
Information
needs
web information
retrieval
cross-
languageindexing
cdsub(Information retrieval, L, G) = {indexing, web information
retrieval, cross-language}
cdsub(Information retrieval, G, L) = {web information retrieval}
…
• Taxonomic Precision (TP), Taxonomic Recall (TR) and Taxonomic F-measure (TF)
• Taxonomic Overlap (TO)
• Taxonomic F’-measure (TF’)
Results - DBpedia
Results – Microsoft Concept Graph
Results – ACM Computing Classification System
Learned Hierarchies
Probabilistic Association
Rule, with resource-based
representation.
Discussions
• Q1: regarding the rules
• Set Inclusion Rule results overall best & stable hierarchies in most experimental settings.
• Fuzzy Set Inclusion and Probabilistic Association rules have competitive results.
• Q2: regarding the data representation techniques
• The Res-based representation performs best in most experimental settings.
• Except the PTM representation with Set Inclusion rule had overall best results (TF and TF’).
• Issue:
• Not consistent among three gold-standard hierarchies, demonstration the distinction of the
nature of the chosen gold-standard hierarchies.
Future Studies
• Evaluation: Not just automated evaluation.
• Higher quality hierarchies through machine learning:
• Use the rules altogether to induce hierarchies: features in supervised learning
• Add further information/context: resource contents, external lexical resources, transfer
learning, etc.
• Use deep learning approaches:
• Forget about the rules?
• Using very rich data representations: word embedding
References
• Benz, D., Hotho, A., Stumme, G., Stutzer, S.: Semantics made by you and me: Self-emerging ontologies can capture the diversity of shared
knowledge. In: Proceedings of the 2nd Web Science Conference (WebSci10) (2010)
• Cruse, D.A.: Hyponymy and its varieties. In: Green, R., Bean, C.A., Myaeng, S.H. (eds.) The Semantics of Relationships: An Interdisciplinary
Perspective, pp. 3–21. Springer, Dordrecht (2002). https://doi.org/10.1007/978-94-017-0073-3 1
• Dellschaft, K., Staab, S.: On how to perform a gold standard based evaluation of ontology learning. In: Cruz, I., Decker, S., Allemang, D.,
Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 228–241. Springer, Heidelberg (2006).
https://doi.org/10.1007/11926078 17
• Dong, H., Wang, W., Frans, C.: Deriving dynamic knowledge from academic social tagging data: a novel research direction. In: iConference
2017 Proceedings (2017)
• Griffiths, T.L., Steyvers, M.: Prediction and semantic association. In: Proceedings of the 15th International Conference on Neural Information
Processing Systems, pp. 11–18. MIT Press (2002)
• Heymann, P., Garcia-Molina, H.: Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical report,
Stanford University (2006)
• Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., & Stumme, G. Evaluating similarity measures for emergent semantics of social
tagging. In Proceedings of the 18th international conference on World wide web, 641-650. ACM (2009, April).
• Meo, P. D., Quattrone, G., Ursino, D.: Exploitation of semantic relationships and hierarchical data structures to support a user in his
annotation and browsing activities in folksonomies. Inf. Syst. 34(6), 511–535 (2009)
• Mika, P.: Ontologies are us: a unified model of social networks and semantics. Web Semant.: Sci. Serv. Agents World Wide Web 5(1), 5–15
(2007)
• Peters, I., Becker, P.: Folksonomies: Indexing and Retrieval in Web 2.0. De Gruyter/Saur, Berlin (2009)
• Strohmaier, M., Helic, D., Benz, D., Korner, C., Kern, R.: Evaluation of folksonomy induction algorithms. ACM Trans. Intell. Syst. Technol. 3(4),
1–22 (2012)
• Vander Wal: Folksonomy Coinage and Definition. http://www.vanderwal.net/folksonomy.html (2007)
• Wang, W., Barnaghi, P.M., Bargiela, A.: Probabilistic topic models for learning terminological ontologies. IEEE Trans. Knowl. Data Eng. 22(7),
1028–1040 (2010)
• Weller, K.: Knowledge Representation in the Social Semantic Web. De Gruyter Saur, Berlin/New York (2010).
Thank you for your attention.
Hang Dong’s Home page: http://cgi.csc.liv.ac.uk/~hang/
Contact: hangdong@liverpool.ac.uk

Weitere ähnliche Inhalte

Was ist angesagt?

Lecture14 xing fei-fei
Lecture14 xing fei-feiLecture14 xing fei-fei
Lecture14 xing fei-feiTianlu Wang
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introductionYueshen Xu
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 
Composing (Im)politeness in Dependent Type Semantics
Composing (Im)politeness in Dependent Type SemanticsComposing (Im)politeness in Dependent Type Semantics
Composing (Im)politeness in Dependent Type SemanticsDaisuke BEKKI
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftSebastian Ruder
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet AllocationMarco Righini
 
Do Neural Models Learn Transitivity of Veridical Inference?
Do Neural Models Learn Transitivity of Veridical Inference?Do Neural Models Learn Transitivity of Veridical Inference?
Do Neural Models Learn Transitivity of Veridical Inference?Hitomi Yanaka
 
POPL 2012 Presentation
POPL 2012 PresentationPOPL 2012 Presentation
POPL 2012 Presentationagarwal1975
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentationSoojung Hong
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML
 
Max Entropy
Max EntropyMax Entropy
Max Entropyjianingy
 
Interactive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum TheoryInteractive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum TheoryIngo Frommholz
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalBhaskar Mitra
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionMichael Stumpf
 

Was ist angesagt? (20)

Lecture14 xing fei-fei
Lecture14 xing fei-feiLecture14 xing fei-fei
Lecture14 xing fei-fei
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
Composing (Im)politeness in Dependent Type Semantics
Composing (Im)politeness in Dependent Type SemanticsComposing (Im)politeness in Dependent Type Semantics
Composing (Im)politeness in Dependent Type Semantics
 
Word Embedding In IR
Word Embedding In IRWord Embedding In IR
Word Embedding In IR
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain Shift
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Do Neural Models Learn Transitivity of Veridical Inference?
Do Neural Models Learn Transitivity of Veridical Inference?Do Neural Models Learn Transitivity of Veridical Inference?
Do Neural Models Learn Transitivity of Veridical Inference?
 
POPL 2012 Presentation
POPL 2012 PresentationPOPL 2012 Presentation
POPL 2012 Presentation
 
Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4
 
Cluster
ClusterCluster
Cluster
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative Systems
 
Max Entropy
Max EntropyMax Entropy
Max Entropy
 
Interactive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum TheoryInteractive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum Theory
 
Topic Models
Topic ModelsTopic Models
Topic Models
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information Retrieval
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model Selection
 

Ähnlich wie Rules for inducing hierarchies from social tagging data

One Tag to bind them all: Measuring Term abstractness in Social Metadata
One Tag to bind them all: Measuring Term abstractness in Social MetadataOne Tag to bind them all: Measuring Term abstractness in Social Metadata
One Tag to bind them all: Measuring Term abstractness in Social MetadataInovex GmbH
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Mariana Damova, Ph.D
 
Contextual ontology alignment may 2011
Contextual ontology alignment may 2011Contextual ontology alignment may 2011
Contextual ontology alignment may 2011Mariana Damova, Ph.D
 
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...National Institute of Informatics
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modelingHiroyuki Kuromiya
 
Higher Order Learning
Higher Order LearningHigher Order Learning
Higher Order Learningbutest
 
Harnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social MediaHarnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social MediaAmparo Elizabeth Cano Basave
 
What makes a linked data pattern interesting?
What makes a linked data pattern interesting?What makes a linked data pattern interesting?
What makes a linked data pattern interesting?Szymon Klarman
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Formal Concept Analysis
Formal Concept AnalysisFormal Concept Analysis
Formal Concept AnalysisTzar Umang
 
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Simon Price
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...irjes
 
Text Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated DocumentsText Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated DocumentsNelson Auner
 
Learning Relations from Social Tagging Data
Learning Relations from Social Tagging DataLearning Relations from Social Tagging Data
Learning Relations from Social Tagging DataHang Dong
 
GDSC SSN - solution Challenge : Fundamentals of Decision Making
GDSC SSN - solution Challenge : Fundamentals of Decision MakingGDSC SSN - solution Challenge : Fundamentals of Decision Making
GDSC SSN - solution Challenge : Fundamentals of Decision MakingGDSCSSN
 
Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilaritySaswat Padhi
 

Ähnlich wie Rules for inducing hierarchies from social tagging data (20)

One Tag to bind them all: Measuring Term abstractness in Social Metadata
One Tag to bind them all: Measuring Term abstractness in Social MetadataOne Tag to bind them all: Measuring Term abstractness in Social Metadata
One Tag to bind them all: Measuring Term abstractness in Social Metadata
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011
 
Contextual ontology alignment may 2011
Contextual ontology alignment may 2011Contextual ontology alignment may 2011
Contextual ontology alignment may 2011
 
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 
Higher Order Learning
Higher Order LearningHigher Order Learning
Higher Order Learning
 
Harnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social MediaHarnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social Media
 
What makes a linked data pattern interesting?
What makes a linked data pattern interesting?What makes a linked data pattern interesting?
What makes a linked data pattern interesting?
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
LDA on social bookmarking systems
LDA on social bookmarking systemsLDA on social bookmarking systems
LDA on social bookmarking systems
 
Formal Concept Analysis
Formal Concept AnalysisFormal Concept Analysis
Formal Concept Analysis
 
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
 
Text Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated DocumentsText Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated Documents
 
Learning Relations from Social Tagging Data
Learning Relations from Social Tagging DataLearning Relations from Social Tagging Data
Learning Relations from Social Tagging Data
 
Token
TokenToken
Token
 
GDSC SSN - solution Challenge : Fundamentals of Decision Making
GDSC SSN - solution Challenge : Fundamentals of Decision MakingGDSC SSN - solution Challenge : Fundamentals of Decision Making
GDSC SSN - solution Challenge : Fundamentals of Decision Making
 
Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic Similarity
 

Mehr von Hang Dong

Excel VBA programming basics
Excel VBA programming basicsExcel VBA programming basics
Excel VBA programming basicsHang Dong
 
语义沙龙:如何自动建构社会标签中的语义关系
语义沙龙:如何自动建构社会标签中的语义关系语义沙龙:如何自动建构社会标签中的语义关系
语义沙龙:如何自动建构社会标签中的语义关系Hang Dong
 
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked DataEnrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked DataHang Dong
 
关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例
关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例
关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例Hang Dong
 
Learning structured knowledge from social tagging data: a critical review of ...
Learning structured knowledge from social tagging data: a critical review of ...Learning structured knowledge from social tagging data: a critical review of ...
Learning structured knowledge from social tagging data: a critical review of ...Hang Dong
 
Modeling health related topics in an online forum designed for the deaf & har...
Modeling health related topics in an online forum designed for the deaf & har...Modeling health related topics in an online forum designed for the deaf & har...
Modeling health related topics in an online forum designed for the deaf & har...Hang Dong
 
Identifying Evaluation Standards for Online Information Literacy Tutorials (O...
Identifying Evaluation Standards for Online Information Literacy Tutorials (O...Identifying Evaluation Standards for Online Information Literacy Tutorials (O...
Identifying Evaluation Standards for Online Information Literacy Tutorials (O...Hang Dong
 
My hometown -- Wuhan
My hometown -- WuhanMy hometown -- Wuhan
My hometown -- WuhanHang Dong
 

Mehr von Hang Dong (8)

Excel VBA programming basics
Excel VBA programming basicsExcel VBA programming basics
Excel VBA programming basics
 
语义沙龙:如何自动建构社会标签中的语义关系
语义沙龙:如何自动建构社会标签中的语义关系语义沙龙:如何自动建构社会标签中的语义关系
语义沙龙:如何自动建构社会标签中的语义关系
 
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked DataEnrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
 
关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例
关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例
关联数据的消费与应用构建: 以上海图书馆家谱开放数据为例
 
Learning structured knowledge from social tagging data: a critical review of ...
Learning structured knowledge from social tagging data: a critical review of ...Learning structured knowledge from social tagging data: a critical review of ...
Learning structured knowledge from social tagging data: a critical review of ...
 
Modeling health related topics in an online forum designed for the deaf & har...
Modeling health related topics in an online forum designed for the deaf & har...Modeling health related topics in an online forum designed for the deaf & har...
Modeling health related topics in an online forum designed for the deaf & har...
 
Identifying Evaluation Standards for Online Information Literacy Tutorials (O...
Identifying Evaluation Standards for Online Information Literacy Tutorials (O...Identifying Evaluation Standards for Online Information Literacy Tutorials (O...
Identifying Evaluation Standards for Online Information Literacy Tutorials (O...
 
My hometown -- Wuhan
My hometown -- WuhanMy hometown -- Wuhan
My hometown -- Wuhan
 

Kürzlich hochgeladen

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingSelcen Ozturkcan
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Kürzlich hochgeladen (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Rules for inducing hierarchies from social tagging data

  • 1. Rules for Inducing Hierarchies from Social Tagging Data iConference 2018, Sheffield, UK, March 25-28, 2018 Hang Dong, Wei Wang, Frans Coenen Department of Computer Science, University of Liverpool
  • 2. Social Tagging Data (Folksonomies) • Users collaboratively generate “key words” for their interests. • The “key words” form a taxonomy of resources online, called Folksonomies (Vander Wal, 2007). Social tags for movie “Forrest Gump” in MovieLens https://movielens.org/movies/356
  • 3. Issues in social tagging data • (i) Noisy and ambiguous. • Data cleaning (Dong, Wang & Coenen, 2017). • (ii) Plain structure, lack of semantic relations among tags. • This study focuses on hierarchical/subsumption relations between tags. • This is a challenging problem: • a cognitive task requiring much human effort (Weller, 2010, p. 139). • distinct from mining relations from sentences.
  • 4. Research Questions • 1. Which rule can effectively capture the hierarchical relations between social tags? • - Not systematically discussed in previous studies, although some approaches were proposed & evaluated (Garcia-Silva, 2012; Strohmaier et al., 2012). • - (Information science & Linguistics) definition of hierarchical relations • - Rules in the previous study • - Proposed two new rules: Fuzzy set inclusion, Probabilistic Association
  • 5. Research Questions (2) • 2. How do rules and data representations affect the quality of the induced hierarchies? • Data representation: resource-based representation, probabilistic topic representation • Experimental Design: • Hierarchical Generation Algorithm • Automated evaluation against three gold-standard hierarchies
  • 6. Hierarchical Relations – information science Acknowledgement to the image in Stock, W. G. (2010). Concepts and semantic relations in information science. Journal of the Association for Information Science and Technology, 61(10), 1951-1969.
  • 7. Hierarchical Relations • Straightforward? • (i) Apple is a [kind of] fruit. (ii) Library science is a part of Information Science. • Abstraction, Generalisation. • A type of paradigmatic Relation: fit into the same grammatical slot (Cruse, 2003). • Tagging data only provide syntagmatic relations, but are a great source for paradigmatic relations (Peters, 2009; Stock, 2010).
  • 8. Hierarchical Relations – linguistics (1) Definitions in Cruse (2003) • Logical: (extensional) X is a hyponym of Y iff the extension/objects of X’ should be included in the extension/objects of Y’. • Unsymmetrical (intensional) X is a hyponym of Y iff F(X) entails, but is not entailed by F(Y), where F(-) is a sentential function satisfied by X or Y. Extension apple fruit Intension fruit apple
  • 9. Hierarchical Relations – linguistics (2) • Collocational: X is a hyponym of Y iff the normal context of X is a subset of the normal context of Y. You shall know a word by the company it keeps… - Firth (1957) • Componential: X is a hyponym of Y iff the features defining Y are a proper subset of features defining X. Definitions in Cruse (2003)
  • 10. Hierarchical relations from tags – computational rules • Set Inclusion (Mika, 2007; De Meo, 2009) • Graph Centrality (Heymann, 2006) • Information-Theoretic Condition (Wang, 2010) • Fuzzy Set Inclusion • Probabilistic Association Resource-based (Res-based) representation Probabilistic Topic Modelling (PTM) based Representation Representing a tag as a vector
  • 11. Data Representation • Resource-based Representation: (Markines et al., 2009) • Probabilistic Topic Modelling Representation: Vt[i] = number of times the tag t is annotated to the ith resource R1 R2 R3 news 1 0 0 Web2.0 1 1 1 knowledge 0 0 1 Using a probabilistic generative model to infer the p (tag | topic) and p (resource | topic) Then calculate p(topic | tag) from p (tag | topic) using Bayesian’s Theorem. tags resources Topic 1 Topic 2 Topic 3 news 0.8 0.1 0.1 Web2.0 0.4 0.3 0.3 knowledge 0.2 0.2 0.6 tags topics Each row sums to 1.
  • 12. Rule 1: Set inclusion (Mika, 2007; De Meo, 2009) • Tag A is a hyponym of Tag B if set-inc(A, B) >= p ∧ set-inc(B, A) < p ∧ sim(A, B) > s. (p=0.5) Sim(A, B) is a similarity measure: cosine similarity. where RA means the resource set annotated using the tag A. Resources of Information Science Resources of Library Science Assumption: The logical extension of a tag is measured as its resource context, i.e. the resources that tag is annotated.
  • 13. Rule 2: Graph Centrality (Heymann, 2006) • Tag A is a hyponym of Tag B if graph-cent(A) < graph-cent(B) ∧ sim(A,B) > s. Tag similarity graph, where each node is a tag and edge is established by similarity of tags over a threshold. Assumption: popularity-generality the more popular/influential a tag, the more general it is. (collocational) graph-cent(A) is a graph centrality measure (centrality, betweenness, etc.) of a tag A in the tag similarity graph.
  • 14. Rule 3: Information-Theoretic Condition (Wang, 2010) • Tag A is a hyponym of Tag B if DKL(PB||PA)−DKL(PA||PB) < f ∧ sim(A,B) > s. Here PA and PB are the probability distributions of A and B over topics. f is a noise factor of a small value (f = 0.05 in this study). • Kullback-Leibler divergence as a measure of “surprise” of receiving PB when PA is expected.
  • 15. Rule 4: Fuzzy set inclusion • An extension of Set Inclusion, based on probabilistic topic representation: • Tag A is a hyponym of tag B if fuzzy-set-inc(SA,SB) >= p ∧ fuzzy-set- inc(SB,SA)< p ∧ sim(A,B)>s, where p is set as 0.5. , where SA is a fuzzy set for tag A as a pair (U, m), U is the set of topics for tag and m:U → [0, 1] is a membership function: for each topic z ∈ U, m(z) = p(A|z).
  • 16. Set inclusion vs Fuzzy set inclusion • Resource-based Representation: • Probabilistic Topic Modelling Representation: Vt[i] = number of times the tag t is annotated to the ith resource R1 R2 R3 news 1 0 0 Web2.0 1 1 1 knowledge 0 0 1 Using a probabilistic generative model to infer the p (topic | tag) and p (resource | topic) Use p (topic | tag) Note: this is different from the previous p (tag | topic) tags resources Topic 1 Topic 2 Topic 3 news 0.57 0.17 0.1 Web2.0 0.29 0.5 0.3 knowledge 0.14 0.33 0.6 tags topics Each column sums to 1.
  • 17. Rule 5: Probabilistic Association • Based on PTM representation • Tag A is a hyponym of Tag B if p(A|B) < p(B|A) ∧ sim(A, B) > s, • p(A|B) = (Griffith & Steyvers, 2002) • z is a member in the set of topics. • Assumption: componential measure of hierarchical relation. Information science (A) Information literacy (B) An example p (A | B) = 1 p (B| A) = 0.25
  • 18. Methodology: Algorithm to Hierarchy Generation • For RQ1 about rules: Replacing the isHypo() function to one of the five rules each time, and compare the results. • For RQ2 about data representations: • using different representation to calculate sim(ti,tj). • using the compatible data representation for each rule.
  • 19. Experiments • Data Collection and Processing Bibsonomy dataset 2003-2015: 3,794,882 annotations, 868,015 resources, 283,858 tags, 11,103 users. We used a streamline to clean academic social tagging data (Dong, Wang & Coenen, 2017): • Unified different variants of tags. • Selected tags having user frequency >= 4. • Removed resources with tags < 3 The cleaned dataset contains 7,846 tag concepts and 128,782 resources. Standard tags Users Resources (with 3 concepts) Tags Users Resources
  • 20. Reference-based evaluation Measuring the similarity of a learned hierarchy, L, to gold- standard hierarchies. • Gold-standard, denoted as G: • DBpedia (6616 concepts overlap) • Microsoft Concept Graph (6029 concepts overlap) • ACM computing classification system (691 concepts overlap) Acknowledgment to Images in http://dbpedia.org/page/Category:Information_retrieval and https://dl.acm.org/ccs/ccs.cfm?id=10003317&lid=0.10002951.1 0003317
  • 21. gold-standard hierarchy Glearned hierarchy L Evaluation metrics (Dellschaft, Staab, 2006) (i) Find common concepts between the learned hierarchy L and the gold-standard hierarchy G, (ii) Extract a characteristic excerpt for each concept. We use common direct subsumption (cdsub) as the characteristic excerpt. (iii) The similarity of hierarchies is defined based on the characteristic excerpts. Information retrieval indexing web information retrieval cross- language Information retrieval Information needs web information retrieval cross- languageindexing cdsub(Information retrieval, L, G) = {indexing, web information retrieval, cross-language} cdsub(Information retrieval, G, L) = {web information retrieval} …
  • 22. • Taxonomic Precision (TP), Taxonomic Recall (TR) and Taxonomic F-measure (TF) • Taxonomic Overlap (TO) • Taxonomic F’-measure (TF’)
  • 24. Results – Microsoft Concept Graph
  • 25. Results – ACM Computing Classification System
  • 26. Learned Hierarchies Probabilistic Association Rule, with resource-based representation.
  • 27.
  • 28.
  • 29. Discussions • Q1: regarding the rules • Set Inclusion Rule results overall best & stable hierarchies in most experimental settings. • Fuzzy Set Inclusion and Probabilistic Association rules have competitive results. • Q2: regarding the data representation techniques • The Res-based representation performs best in most experimental settings. • Except the PTM representation with Set Inclusion rule had overall best results (TF and TF’). • Issue: • Not consistent among three gold-standard hierarchies, demonstration the distinction of the nature of the chosen gold-standard hierarchies.
  • 30. Future Studies • Evaluation: Not just automated evaluation. • Higher quality hierarchies through machine learning: • Use the rules altogether to induce hierarchies: features in supervised learning • Add further information/context: resource contents, external lexical resources, transfer learning, etc. • Use deep learning approaches: • Forget about the rules? • Using very rich data representations: word embedding
  • 31. References • Benz, D., Hotho, A., Stumme, G., Stutzer, S.: Semantics made by you and me: Self-emerging ontologies can capture the diversity of shared knowledge. In: Proceedings of the 2nd Web Science Conference (WebSci10) (2010) • Cruse, D.A.: Hyponymy and its varieties. In: Green, R., Bean, C.A., Myaeng, S.H. (eds.) The Semantics of Relationships: An Interdisciplinary Perspective, pp. 3–21. Springer, Dordrecht (2002). https://doi.org/10.1007/978-94-017-0073-3 1 • Dellschaft, K., Staab, S.: On how to perform a gold standard based evaluation of ontology learning. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 228–241. Springer, Heidelberg (2006). https://doi.org/10.1007/11926078 17 • Dong, H., Wang, W., Frans, C.: Deriving dynamic knowledge from academic social tagging data: a novel research direction. In: iConference 2017 Proceedings (2017) • Griffiths, T.L., Steyvers, M.: Prediction and semantic association. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp. 11–18. MIT Press (2002) • Heymann, P., Garcia-Molina, H.: Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical report, Stanford University (2006) • Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., & Stumme, G. Evaluating similarity measures for emergent semantics of social tagging. In Proceedings of the 18th international conference on World wide web, 641-650. ACM (2009, April). • Meo, P. D., Quattrone, G., Ursino, D.: Exploitation of semantic relationships and hierarchical data structures to support a user in his annotation and browsing activities in folksonomies. Inf. Syst. 34(6), 511–535 (2009) • Mika, P.: Ontologies are us: a unified model of social networks and semantics. Web Semant.: Sci. Serv. Agents World Wide Web 5(1), 5–15 (2007) • Peters, I., Becker, P.: Folksonomies: Indexing and Retrieval in Web 2.0. De Gruyter/Saur, Berlin (2009) • Strohmaier, M., Helic, D., Benz, D., Korner, C., Kern, R.: Evaluation of folksonomy induction algorithms. ACM Trans. Intell. Syst. Technol. 3(4), 1–22 (2012) • Vander Wal: Folksonomy Coinage and Definition. http://www.vanderwal.net/folksonomy.html (2007) • Wang, W., Barnaghi, P.M., Bargiela, A.: Probabilistic topic models for learning terminological ontologies. IEEE Trans. Knowl. Data Eng. 22(7), 1028–1040 (2010) • Weller, K.: Knowledge Representation in the Social Semantic Web. De Gruyter Saur, Berlin/New York (2010).
  • 32. Thank you for your attention. Hang Dong’s Home page: http://cgi.csc.liv.ac.uk/~hang/ Contact: hangdong@liverpool.ac.uk

Hinweis der Redaktion

  1. There are 3794882 (around 3.8m) annotations on Bibsonomy till July 2015. On the right hand side, it is just a graphical form of knowledge structure, but we will build a knowledge base on it, and make it useful for real applications.
  2. https://en.wikipedia.org/wiki/Extensional_and_intensional_definitions In logic and mathematics, an intensional definition gives the meaning of a term by specifying necessary and sufficient conditions for when the term should be used. In the case of nouns, this is equivalent to specifying the properties that an object needs to have in order to be counted as a referent of the term. An extensional definition of a concept or term formulates its meaning by specifying its extension, that is, every object that falls under the definition of the concept or term in question.
  3. Didn’t use deep learning.