Improving Topic Modeling with
Knowledge Graph Embeddings
Marco Brambilla, Birant Altinel
marco.brambilla@polimi.it
marcobrambi
Formalizing new knowledge is hard
Only high-frequency content emerges
The long tail challenge
Key: Feature selection
To extract novel knowledge it’s crucial to find the appropriate way to
describe the source content. Features can be:
Syntactic
user profiles
tags, hashtags
BOW
Semantic
entity extraction
semantic features on images
• Topic Model: a statistical model used to discover the abstract «latent» topics in a collection of documents
• Example usage areas include information retrieval, classification, collaborative filtering, …
• The best-known topic model is LDA (Latent Dirichlet Allocation)
• Plate notation
Topics as new features: Why not?
• Topic modeling has been relatively successful using purely statistical approaches
• Unsupervised method that represents each document of a corpus as a distribution over a set of topics
Topic Modeling
Edwin Chen
"Introduction to Latent Dirichlet Allocation" (2011)
Given the sentences
1. I like to eat broccoli and bananas.
2. I ate a banana and spinach smoothie for breakfast.
3. Chinchillas and kittens are cute.
4. My sister adopted a kitten yesterday.
5. Look at this cute hamster munching on a piece of broccoli.
LDA might produce something like
• Sentences 1 and 2: 100% Topic A
• Sentences 3 and 4: 100% Topic B
• Sentence 5: 60% Topic A, 40% Topic B
• Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, …
• Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, …
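The toy example above can be approximated in a few lines of code. Below is a minimal collapsed Gibbs sampler for LDA, offered as an illustrative sketch and not as the implementation used in this work. The hand-tokenized documents (stop words dropped, words lemmatized by hand), the priors `alpha` and `beta`, and the iteration count are all assumptions; on a corpus this tiny the resulting split need not match the percentages quoted above.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (minimal, unoptimized sketch)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * num_topics for _ in docs]
    nkw = [defaultdict(int) for _ in range(num_topics)]
    nk = [0] * num_topics
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Posterior-mean estimates: theta (document-topic) and phi (topic-word).
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


# Hand-tokenized versions of the five sentences (hypothetical preprocessing).
DOCS = [
    ["eat", "broccoli", "banana"],
    ["eat", "banana", "spinach", "smoothie", "breakfast"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "adopt", "kitten"],
    ["cute", "hamster", "munch", "broccoli"],
]
theta, phi = lda_gibbs(DOCS, num_topics=2)
for d, row in enumerate(theta, start=1):
    print(f"Sentence {d}:", [round(p, 2) for p in row])
for k, dist in enumerate(phi):
    print(f"Topic {k}:", sorted(dist, key=dist.get, reverse=True)[:4])
```

Each row of `theta` is a document's topic mixture and each `phi[k]` is a topic's word distribution, which correspond to the two kinds of percentages in the example.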
• Improve the state of the art in topic modeling by integrating embedding methods over knowledge graphs
• Explore possible extensions of the Knowledge Graph to create a better structure for the knowledge embedding process
• Further explore the parametrization to clarify the effects of the most relevant parameters on topic modeling
Objective
Background (1): Representation Learning
• Process of encoding knowledge into low-dimensional vectors
• Used for Machine Learning/Deep Learning tasks over graphs
• Supervised / Unsupervised
• Text embedding is a form of representation learning that encodes textual content into vectors of real numbers
• Graph embeddings do the same for network models
Background (2): Embedding Nodes
Map each node to a d-dimensional embedding so that “similar” nodes in the graph have embeddings that are close together.
(Figure: input graph and output embedding space)
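What “close together” means can be sketched with cosine similarity. The 3-dimensional embeddings below are hand-written and hypothetical; producing such vectors from a graph is the job of the embedding methods discussed in these slides, and cosine similarity is only one common choice of closeness measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(node, embeddings, k=1):
    """Return the k nodes whose embeddings are most cosine-similar to `node`'s."""
    others = [n for n in embeddings if n != node]
    others.sort(key=lambda n: cosine(embeddings[node], embeddings[n]), reverse=True)
    return others[:k]


# Hypothetical embeddings: A/B and C/D stand for two pairs of "similar" nodes.
EMB = {
    "A": [0.9, 0.1, 0.0],
    "B": [0.8, 0.2, 0.1],
    "C": [0.0, 0.1, 0.9],
    "D": [0.1, 0.0, 0.8],
}
print(nearest("A", EMB))  # ['B']
print(nearest("C", EMB))  # ['D']
```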
Background (3): Knowledge Graphs
• Ontological representation of collected, structured, and organized information as a collective knowledge source. Describes real-world entities and the relations between them.
• Examples: DBpedia, Freebase, WordNet, Google Knowledge Graph.
• WordNet is an online lexical database for the English language in which words are linked by semantic relations.
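To make the triple structure concrete, here is a minimal sketch of a knowledge graph held as (head, relation, tail) triples and mapped to integer ids, the form KG-embedding methods usually consume. The triples are illustrative: real WN18 triples connect WordNet synset ids rather than plain words, and the underscore-prefixed relation names are assumed for readability.

```python
from collections import defaultdict

# Illustrative (head, relation, tail) triples; plain words stand in for synsets.
TRIPLES = [
    ("kitten", "_hypernym", "cat"),
    ("cat", "_hypernym", "feline"),
    ("hamster", "_hypernym", "rodent"),
    ("broccoli", "_hypernym", "vegetable"),
    ("feline", "_hyponym", "cat"),
]


def index_graph(triples):
    """Assign consecutive integer ids to entities and relations."""
    entities, relations = {}, {}
    encoded = []
    for h, r, t in triples:
        for e in (h, t):
            entities.setdefault(e, len(entities))
        relations.setdefault(r, len(relations))
        encoded.append((entities[h], relations[r], entities[t]))
    return entities, relations, encoded


def neighbors(triples):
    """Outgoing edges per entity: head -> list of (relation, tail)."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))
    return adj


entities, relations, encoded = index_graph(TRIPLES)
adj = neighbors(TRIPLES)
print(len(entities), "entities,", len(relations), "relation types")
print(encoded[0], adj["cat"])
```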
Embedding Methods on Knowledge Graphs
• TransE (2013): uses addition as the translation operator
• TransH (2014): extends TransE by modeling each relation as a translation on a relation-specific hyperplane
• DistMult (2014): uses (element-wise) multiplication instead of translation
• PTransE (2015): extends TransE with paths of multiple relations
• TransR (2015): extends TransE by creating separate semantic spaces for entities and relations
• HolE (2016): uses circular correlation as the composition operator
• Analogy (2017): optimizes the representations with respect to the analogical properties of entities and relations
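As a rough illustration of the operators listed above, the scoring functions of three of these methods can be written in a few lines of NumPy. This is a sketch under stated assumptions: the vectors are random rather than trained, TransE uses the L1 norm by default (L2 is also common), and no training loop (negative sampling, margin loss) is shown.

```python
import numpy as np


def transe_score(h, r, t, norm=1):
    """TransE: plausible triples satisfy h + r ≈ t, so score = -||h + r - t||."""
    return -np.linalg.norm(h + r - t, ord=norm)


def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i (a diagonal bilinear form)."""
    return float(np.sum(h * r * t))


def circular_correlation(a, b):
    """[a ⋆ b]_k = sum_i a_i * b_{(i + k) mod d}, computed in O(d log d) via the FFT."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))


def hole_score(h, r, t):
    """HolE: score = r · (h ⋆ t)."""
    return float(np.dot(r, circular_correlation(h, t)))


# Random (untrained) embeddings, only to exercise the functions.
rng = np.random.default_rng(0)
d = 8
h, r, t = rng.normal(size=(3, d))
print(transe_score(h, r, t), distmult_score(h, r, t), hole_score(h, r, t))
```

DistMult's score is symmetric in h and t, so it cannot tell a relation from its inverse; circular correlation is not commutative, which lets HolE model asymmetric relations at a similar cost.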
Embedding Methods Comparison
Legend: d: embedding dimension, n_e: number of entities, n_r: number of relations, h: head entity, r: relation, t: tail entity, w_r: vector representation of r, p: path
Related Work
• KGE-LDA
• A knowledge-based topic model
• Combines LDA with entity
embeddings obtained from
knowledge graphs using TransE
• Proposes two models for incorporating the embeddings into the topic model
Model A
Model B
Our Experiment
• Text corpus: 20-Newsgroups (20NG) public dataset
• 18,800 documents
• 21K distinct words
• WordNet18 (WN18) as graph
• 115K triples
• 40K entities
• 18 types of relations
Parameter Exploration and Evaluation
• Topic Number
• Embedding Dimension
• Topic Coherence
A quantitative measure of how semantically related the top words of each topic are
• Document Classification Scores
The accuracy of document classification that uses the topic model’s output (per-document topic proportions) as features
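The slides do not say which coherence measure was used. As one common option, the UMass measure scores a topic's top words by how often they co-occur in the same documents. The sketch below assumes document-level co-occurrence and toy documents loosely based on the LDA example sentences; higher (less negative) values indicate a more coherent topic.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's top-M words, ordered by decreasing probability.
    documents: iterable of word collections (only presence per document matters).
    Sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs with l < m.
    """
    docs = [set(d) for d in documents]

    def df(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = df(top_words[l])
            if denom == 0:
                raise ValueError(f"{top_words[l]!r} does not occur in any document")
            score += math.log((df(top_words[m], top_words[l]) + 1) / denom)
    return score


DOCS = [
    {"eat", "broccoli", "banana"},
    {"eat", "banana", "spinach", "breakfast"},
    {"chinchilla", "kitten", "cute"},
    {"sister", "adopt", "kitten"},
    {"cute", "hamster", "broccoli"},
]
print(umass_coherence(["banana", "eat"], DOCS))     # log(3/2) ≈ 0.405
print(umass_coherence(["banana", "kitten"], DOCS))  # log(1/2) ≈ -0.693
```

Some implementations average over word pairs instead of summing; either way, only comparisons made with the same setting are meaningful.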
Embedding Methods Comparison –
(some) Results
Embedding Methods Comparison –
(some) Results
Extending the Graph
• Dependency relations in sentences carry meaningful semantics by themselves
• KG merged with the Dependency Graph
Knowledge Graph Extension
• Semantic relations in the KG are merged with the syntactic dependency relations obtained from sentences.
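A minimal sketch of the merge step, assuming the dependency relations have already been extracted by a parser as (head lemma, label, dependent lemma) triples and that KG entities are identified by plain words. In WN18 they are synsets, so a word-to-synset mapping would be needed in practice; that step is omitted here. The `dep:` prefix and the example triples are hypothetical.

```python
def merge_graphs(kg_triples, dep_triples, dep_prefix="dep:"):
    """Union of KG triples and dependency triples.

    Dependency labels get a prefix so that a syntactic relation can never
    collide with a KG relation of the same name.
    """
    merged = set(kg_triples)
    for head, label, dependent in dep_triples:
        merged.add((head, dep_prefix + label, dependent))
    return merged


def relation_types(triples):
    """Sorted list of the distinct relation names in a set of triples."""
    return sorted({r for _, r, _ in triples})


KG = [
    ("kitten", "_hypernym", "cat"),
    ("hamster", "_hypernym", "rodent"),
    ("broccoli", "_hypernym", "vegetable"),
]
# Hypothetical parser output for sentences 1 and 4 of the LDA example.
DEPS = [
    ("eat", "dobj", "broccoli"),
    ("broccoli", "conj", "banana"),
    ("adopt", "nsubj", "sister"),
    ("adopt", "dobj", "kitten"),
]
merged = merge_graphs(KG, DEPS)
print(len(merged), relation_types(merged))
```

Words present in both graphs (here kitten and broccoli) are what connect the syntactic edges to the WordNet structure, so an embedding method trained on the merged graph sees both kinds of context for them.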
Our Experiment
• WordNet18 (WN18) as graph
• 115K triples
• 40K entities (only 9K in common with the dataset vocabulary)
• 18 types of relations
• Extended graph size
• 815K triples
• 55 types of relations
Knowledge Graph Extension – Results
Further parameter exploration
(embedding size = 100)
Execution Time: Embedding Dimension and Topics
• Runtime measured with respect to the topic number, embedding dimension, and incorporation model
• The topic number has a higher impact on runtime than the embedding dimension
Conclusions
• First attempt to systematically integrate KBs into topic analysis
• A content-based approach to extend the Knowledge Graph, transforming it into a domain-specific network in order to improve the embeddings
• Extended parametrization study (topic number and embedding dimension)
Future:
• Grid-search for parameter optimization
• Improvement of knowledge graph extension process
THANKS!
QUESTIONS?
Marco Brambilla @marcobrambi marco.brambilla@polimi.it
http://datascience.deib.polimi.it http://home.deib.polimi.it/marcobrambi
