Semantic Analysis in Language Technology
http://stp.lingfil.uu.se/~santinim/sais/2014/sais_2014.htm
Semantic Word Clouds
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2014
Lect 10: Semantic Word Clouds

Acknowledgements
• Some slides borrowed from Sergey Pupyrev.

Outline
• Word Clouds
• 3 early algorithms
• 3 new algorithms
• Metrics & Quantitative Evaluation

Word Clouds
• Word clouds have become a standard tool for abstracting, visualizing and comparing texts…
• We could apply the same or similar techniques to the huge amounts of tags produced by users interacting in social networks.

Comparison & Conceptualization Tool
• Word clouds as a tool for "conceptualizing" documents (cf. ontologies).
• Example: the 2008 comparison of speeches, Obama vs McCain.

Word Clouds and Tag Clouds…
• … are often used to represent importance among terms (e.g., band popularity) or serve as a navigation tool (e.g., Google search results).

The Problem…
• How to compute semantic-preserving word clouds, in which semantically related words are close to each other?

Wordle
http://www.wordle.net
• Practical tools, like Wordle, make word cloud visualization easy.
• Shortcoming: they do not capture the relationships between words in any way.

Many word clouds are arranged randomly (look also at the scattered colours).

Semantic Patterns
• Humans instinctively tend to pick up patterns.
• Instinctively, one could say that two words that are close to each other in a word cloud are semantically related.

So, it makes sense to place such related words close to each other (look also at the color distribution).

In linguistics and in LT…
• … if a pair of words often appears together in a sentence, then we can assume that the two words are semantically related.
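
The co-occurrence assumption above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the lecture: the naive sentence splitter, the lowercase letter-only tokenizer, and the function name `cooccurrence_counts` are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_counts(text):
    """Count how often each unordered pair of words shares a sentence."""
    counts = Counter()
    # Naive split on ., ! and ? (an assumption; real systems use a proper splitter).
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sorted(set(re.findall(r"[a-z]+", sentence)))
        # Each distinct pair is counted once per sentence.
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


text = "The cat chased the mouse. The mouse feared the cat. The dog slept."
counts = cooccurrence_counts(text)
print(counts[("cat", "mouse")])  # the pair shares two sentences
print(counts[("cat", "dog")])    # the pair never shares a sentence
```

Pairs are stored in alphabetical order, so a lookup should use the same ordering. The raw counts could then be normalized into a similarity score, but which normalization to use is left open here.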

Semantic word clouds have higher user satisfaction compared to other layouts…

All recent word cloud visualization tools aim to incorporate semantics in the layout…

… but none of them provides any guarantee about the quality of the layout in terms of semantics.

Early algorithms: Force-Directed Graph
• Most of the existing algorithms are based on force-directed graph layout.
• Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way:
– Attractive forces between pairs reduce empty space.
– Repulsive forces ensure that words do not overlap.
– A final force preserves semantic relations between words.
Force-directed graph drawing algorithms assign forces among the set of edges and the set of nodes of a graph drawing. Typically, spring-like attractive forces based on Hooke's law are used to attract pairs of endpoints of the graph's edges towards each other, while simultaneously repulsive forces like those of electrically charged particles based on Coulomb's law are used to separate all pairs of nodes.
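
The interplay of the two forces can be sketched as a single update step in Python. This is a minimal illustration under stated assumptions, not any particular tool's implementation: the function name `force_step`, the constants `k_spring`, `k_repel` and `rest_len`, and the plain "move by the net force" update are all chosen for the example.

```python
import math


def force_step(pos, edges, k_spring=0.1, k_repel=1.0, rest_len=1.0):
    """Return new positions after one force-directed iteration.

    pos   : dict node -> (x, y)
    edges : iterable of (u, v) pairs
    All constants are illustrative assumptions, not values from the lecture.
    """
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)

    # Coulomb-like repulsion between every pair of nodes: magnitude k / d^2.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k_repel / (d * d)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d

    # Hooke-like spring attraction along edges: magnitude k * (d - rest_len).
    for u, v in edges:
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        d = math.hypot(dx, dy) or 1e-9
        f = k_spring * (d - rest_len)
        force[u][0] += f * dx / d
        force[u][1] += f * dy / d
        force[v][0] -= f * dx / d
        force[v][1] -= f * dy / d

    return {n: (pos[n][0] + force[n][0], pos[n][1] + force[n][1]) for n in pos}


pos = {"a": (0.0, 0.0), "b": (4.0, 0.0)}
for _ in range(200):
    pos = force_step(pos, [("a", "b")])
dist = math.hypot(pos["a"][0] - pos["b"][0], pos["a"][1] - pos["b"][1])
print(round(dist, 3))  # distance at which spring pull and repulsion balance
```

With two connected nodes the layout settles where the spring pull equals the repulsion. In a word cloud the nodes would be word boxes rather than points, so the repulsion would have to account for box sizes.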

Newer Algorithms: rectangle representation of graphs
• Vertex-weighted and edge-weighted graph:
– The vertices of the graph are the words.
  • Their weights correspond to some measure of importance (e.g., word frequencies).
– The edges capture the semantic relatedness of pairs of words (e.g., co-occurrence).
  • Their weights correspond to the strength of the relation.
– Each vertex can be drawn as a box (rectangle) with dimensions determined by its weight.
– The realized adjacencies are the sum of the edge weights over all pairs of touching boxes.
– The goal is to maximize the realized adjacencies.
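
The realized-adjacencies objective can be computed directly from a layout. The sketch below assumes boxes are given as `(x, y, width, height)` tuples and that "touching" means sharing a boundary segment of positive length; both the representation and the helper names `touching` and `realized_adjacencies` are assumptions for illustration.

```python
def touching(a, b, eps=1e-9):
    """True if axis-aligned boxes a and b share a boundary segment without overlapping.

    Each box is (x, y, width, height) with (x, y) the lower-left corner.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Length of the overlap of the two boxes' projections on each axis.
    overlap_x = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_y = min(ay + ah, by + bh) - max(ay, by)
    side_by_side = abs(overlap_x) <= eps and overlap_y > eps
    stacked = abs(overlap_y) <= eps and overlap_x > eps
    return side_by_side or stacked


def realized_adjacencies(boxes, edge_weights):
    """Sum the weights of all edges whose two boxes touch."""
    return sum(
        w for (u, v), w in edge_weights.items() if touching(boxes[u], boxes[v])
    )


boxes = {
    "graph": (0, 0, 4, 2),
    "layout": (4, 0, 3, 2),   # shares a vertical side with "graph"
    "cloud": (0, 2, 2, 1),    # sits on top of "graph"
}
edge_weights = {
    ("graph", "layout"): 0.8,
    ("graph", "cloud"): 0.5,
    ("layout", "cloud"): 0.3,  # these two boxes do not touch
}
print(realized_adjacencies(boxes, edge_weights))
```

Only the first two edges are realized in this layout, so their weights are summed and the third is ignored.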

Experimental Setup:
1) Term Extraction
2) Ranking
3) Similarity Computation

Early Algorithms
1. Wordle (Random)
2. Context-Preserving Word Cloud Visualization (CPWCV)
3. Seam Carving

Wordle → Random
• The Wordle algorithm places one word at a time in a greedy fashion, aiming to use space as efficiently as possible.
• First the words are sorted by weight in decreasing order.
• Then for each word in the order, a position is picked at random.
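
The three steps above can be sketched as follows. This is a simplified illustration and not Wordle's actual code: the canvas size, the box-size formula, and the choice to redraw a random position whenever a box overlaps an already placed word are assumptions made for the example.

```python
import random


def overlaps(a, b):
    """True if axis-aligned boxes (x, y, w, h) share interior area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def random_layout(weights, width=100.0, height=60.0, seed=0, max_tries=10_000):
    """Place words greedily, heaviest first, at random non-overlapping spots.

    Box size is a made-up function of the weight and the word length.
    """
    rng = random.Random(seed)
    placed = {}
    for word in sorted(weights, key=weights.get, reverse=True):
        h = 2.0 * weights[word]
        w = 0.6 * h * len(word)
        for _ in range(max_tries):
            box = (rng.uniform(0, width - w), rng.uniform(0, height - h), w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break
        else:
            raise RuntimeError(f"no free position found for {word!r}")
    return placed


layout = random_layout({"semantic": 5, "word": 4, "cloud": 3, "layout": 2})
for word, (x, y, w, h) in layout.items():
    print(f"{word:10s} x={x:6.1f} y={y:6.1f} w={w:5.1f} h={h:4.1f}")
```

Because positions ignore the edge weights entirely, related words end up near each other only by chance, which is the shortcoming the semantic layouts try to address.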

[Figures 1 to 6: Random]

Context-Preserving Word Cloud Visualization (CPWCV)
• First, a dissimilarity matrix is computed and Multidimensional Scaling (MDS) is performed.
• Second, … effort to create a compact layout.
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.
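
To make the MDS step concrete, here is a dependency-free sketch of classical (Torgerson) MDS. The lecture does not say which MDS variant CPWCV uses, so the choice of the classical variant is an assumption, as are the helper names `top_eigenpairs` and `classical_mds`. Power iteration is used only to keep the example self-contained; in practice a linear algebra library would do the eigendecomposition.

```python
import math
import random


def top_eigenpairs(matrix, k, iters=500, seed=0):
    """Largest k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    rng = random.Random(seed)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [rng.uniform(-1, 1) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the unit vector found.
        val = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(n)) for i in range(n))
        pairs.append((val, vec))
        # Deflate: remove the found component before searching for the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= val * vec[i] * vec[j]
    return pairs


def classical_mds(dissim, dims=2):
    """Embed n items in `dims` dimensions from a symmetric dissimilarity matrix."""
    n = len(dissim)
    sq = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J, with J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(b, dims)
    return [tuple(math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs)
            for i in range(n)]


# Dissimilarities of a 3-4-5 right triangle; the 2D embedding should reproduce them.
dissim = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
coords = classical_mds(dissim)
for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, round(math.dist(coords[i], coords[j]), 6))
```

The resulting coordinates are determined only up to rotation and reflection. They give each word an initial position that reflects the dissimilarities, before any compaction step.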

[Figures 1 to 3: Context-Preserving]
2: repulsive force
3: attractive force

Seam Carving
• Seam carving is a content-aware image resizing technique.
• Basically, an algorithm for image resizing.
• It was invented at Mitsubishi's …
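
The core of classic seam carving is a dynamic program that finds the cheapest connected top-to-bottom path through an energy grid. The sketch below applies it to a small occupancy grid (1 = cell covered by a word, 0 = empty), which is an assumption about how empty paths in a word cloud could be found, not the exact adaptation shown in the lecture figures. The names `min_vertical_seam` and `remove_seam` are hypothetical.

```python
def min_vertical_seam(energy):
    """Return (cost, seam) where seam[r] is the column removed in row r.

    energy is a list of rows of non-negative numbers. Adjacent rows of the
    seam may differ by at most one column, as in classic seam carving.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [energy[0][:]]
    for r in range(1, rows):
        prev = cost[-1]
        cost.append([
            energy[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
            for c in range(cols)
        ])
    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=cost[-1].__getitem__)
    total = cost[-1][c]
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo = max(c - 1, 0)
        window = cost[r - 1][lo:min(c + 2, cols)]
        c = lo + window.index(min(window))
        seam.append(c)
    seam.reverse()
    return total, seam


def remove_seam(grid, seam):
    """Drop one cell per row, shrinking the grid's width by one."""
    return [row[:c] + row[c + 1:] for row, c in zip(grid, seam)]


grid = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1],
]
cost, seam = min_vertical_seam(grid)
print(cost, seam)               # a zero-cost seam passes only through empty cells
print(remove_seam(grid, seam))  # the grid is one column narrower
```

Repeating the search and removal while a zero-cost seam exists trims empty space without cutting through any word.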

[Figures 1 to 7: Seam Carving]
2: space is divided into regions
3: empty paths are trimmed out iteratively
6: space is divided into regions

3 New Algorithms
1. Inflate and Push
2. Star Forest
3. Cycle Cover

Inflate-and-Push
• Simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.

[Figures 1 to 5: Inflate]
2: scaling down
3: semantically related words are placed close to each other
4: repulsive force to resolve overlaps

Star Forest
• A star is a tree of depth at most 1 (one centre vertex joined to all the others), and a star forest is a forest whose connected components are all stars.

Star Forest: star = graph
• Dissimilarity matrix → disjoint stars = star forest
• Attractive force to get a compact layout

Cycle Cover
• This algorithm is based on a similarity matrix.
• First, a similarity path (= cycle) is created.
• Then, the optimal level of compactness is computed.

Quantitative Metrics

Criteria
1. Realized Adjacencies – how close are similar words to each other?
2. Distortion – how distant are dissimilar words?
3. Compactness – how well utilized is the drawing area?
4. Uniform Area Utilization – uniformity of the distribution (overpopulated vs sparse areas in the word cloud)
5. Aspect Ratio – width and height of the bounding box
6. Running Time – execution time
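
Two of these criteria are simple enough to sketch directly. The definitions below are plausible simplifications chosen for illustration (compactness as word area divided by bounding-box area, aspect ratio as bounding-box width over height); the exact formulas used in the evaluation are not given on the slide, so treat them as assumptions.

```python
def bounding_box(boxes):
    """Smallest axis-aligned rectangle (x, y, w, h) containing all boxes."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return x0, y0, x1 - x0, y1 - y0


def compactness(boxes):
    """Fraction of the bounding box covered by words (assumes no overlaps)."""
    _, _, bw, bh = bounding_box(boxes)
    return sum(w * h for _, _, w, h in boxes) / (bw * bh)


def aspect_ratio(boxes):
    """Width divided by height of the bounding box."""
    _, _, bw, bh = bounding_box(boxes)
    return bw / bh


# Three word boxes given as (x, y, width, height).
boxes = [(0, 0, 4, 2), (4, 0, 3, 2), (0, 2, 2, 1)]
print(bounding_box(boxes))
print(round(compactness(boxes), 3))
print(round(aspect_ratio(boxes), 3))
```

A compactness close to 1 means little wasted space, while the aspect ratio indicates how well the cloud would fit a typical screen or page.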

2 datasets
(1) WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words
(2) PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.

Cycle Cover wins
Seam Carving wins
Random wins
Inflate wins
Random and Seam Carving win
All ok except Seam Carving

Demo
Final Words
The end

Lecture: Semantic Word Clouds

  • 1. Semantic Analysis in Language Technology http://stp.lingfil.uu.se/~santinim/sais/2014/sais_2014.htm Semantic Word Clouds Marina Santini santinim@stp.lingfil.uu.se Department of Linguistics and Philology Uppsala University, Uppsala, Sweden Autumn 2014 Lect 10: Semantic Word Clouds 1
  • 2. Acknowledgements • Some slides borrowed from Sergey Pupyrev.
  • 3. Outline • Word Clouds • 3 early algorithms • 3 new algorithms • Metrics & Quantitative Evaluation
  • 4. Word Clouds • Word clouds have become a standard tool for abstracting, visualizing and comparing texts… • We could apply the same or similar techniques to the huge amounts of tags produced by users interacting in social networks
  • 5. Comparison & conceptualization Tool • Word Clouds as a tool for "conceptualizing" documents. Cf. Ontologies • Ex.: 2008, comparison of speeches: Obama vs McCain
  • 6. Word Clouds and Tag Clouds… • … are often used to represent importance among terms (e.g., band popularity) or serve as a navigation tool (e.g., Google search results).
  • 7. The Problem… • How to compute semantics-preserving word clouds in which semantically related words are close to each other?
  • 8. Wordle http://www.wordle.net • Practical tools, like Wordle, make word cloud visualization easy. • Shortcoming: they do not capture the relationships between words in any way
  • 9. Many word clouds are arranged randomly (look also at the scattered colours)
  • 10. Semantic Patterns • Humans instinctively tend to pick up patterns • Instinctively, one could say that two words that are close to each other in a word cloud are semantically related.
  • 11. So, it makes sense to place such related words close to each other (look also at the color distribution)
  • 12. In linguistics and in LT… • … if a pair of words often appear together in a sentence, then we can assume that this pair of words is related semantically.
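The co-occurrence assumption on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `cooccurrence_counts`, the naive sentence splitter, and the sample text are assumptions, not part of the lecture.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(text):
    """Count, for each unordered word pair, how many sentences contain both words."""
    counts = Counter()
    # Naive sentence split on '.', '!' and '?' (an assumption; a real tokenizer is better).
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sorted(set(re.findall(r"[a-z]+", sentence)))
        # Each pair is stored in alphabetical order, e.g. ("cat", "mouse").
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

text = "The cat chased the mouse. The mouse ate cheese. The cat slept."
counts = cooccurrence_counts(text)
print(counts[("cat", "mouse")])   # 1: they share one sentence
print(counts[("cat", "cheese")])  # 0: they never share a sentence
```

Higher counts would then be read as stronger semantic relatedness. Normalizing the counts (for example by word frequency) is a common refinement that this sketch leaves out.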
  • 13. Semantic word clouds have higher user satisfaction compared to other layouts…
  • 14. All recent word cloud visualization tools aim to incorporate semantics in the layout…
  • 15. … but none of them provide any guarantee about the quality of the layout in terms of semantics
  • 16. Early algorithms: Force-Directed Graph • Most of the existing algorithms are based on force-directed graph layout. • Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way – Attractive forces between pairs reduce empty space – Repulsive forces ensure that words do not overlap – A final force preserves semantic relations between words. Force-directed graph drawing algorithms assign forces among the set of edges and the set of nodes of a graph drawing. Typically, spring-like attractive forces based on Hooke's law are used to attract pairs of endpoints of the graph's edges towards each other, while simultaneously repulsive forces like those of electrically charged particles based on Coulomb's law are used to separate all pairs of nodes.
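As a rough illustration of the spring and repulsion forces described on this slide, the sketch below performs one update step on point-like nodes. The function name `force_step` and all constants (`k_spring`, `rest`, `k_repel`, `step`) are assumptions chosen for the example. Real word cloud layouts work with rectangles rather than points.

```python
import math

def force_step(pos, edges, k_spring=0.1, rest=1.0, k_repel=0.5, step=0.1):
    """One iteration: Hooke-style springs on edges, Coulomb-style repulsion on all pairs."""
    n = len(pos)
    force = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            ux, uy = dx / dist, dy / dist      # unit vector from i towards j
            # Repulsion between every pair, magnitude k_repel / dist^2 (negative = apart).
            f = -k_repel / dist ** 2
            if (i, j) in edges or (j, i) in edges:
                # Spring along an edge, magnitude k_spring * (dist - rest).
                f += k_spring * (dist - rest)
            force[i][0] += f * ux
            force[i][1] += f * uy
            force[j][0] -= f * ux
            force[j][1] -= f * uy
    return [[p[0] + step * f[0], p[1] + step * f[1]] for p, f in zip(pos, force)]

# Two connected nodes that start far apart are pulled together.
pos = force_step([[0.0, 0.0], [10.0, 0.0]], edges={(0, 1)})
print(pos)
```

Repeating the step until the positions stop changing gives the usual iterative layout. In practice a cooling schedule or a step limit is normally added.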
  • 17. Newer Algorithms: rectangle representation of graphs • Vertex-weighted and edge-weighted graph: – The vertices of the graph are the words • Their weights correspond to some measure of importance (e.g., word frequencies) – The edges capture the semantic relatedness of pairs of words (e.g., co-occurrence) • Their weights correspond to the strength of the relation – Each vertex can be drawn as a box (rectangle) with dimensions determined by its weight – The realized adjacency is the sum of the edge weights over all pairs of touching boxes. – The goal is to maximize the realized adjacencies.
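The realized-adjacency objective can be made concrete with a short sketch. It assumes axis-aligned boxes given as `(x, y, width, height)` and treats two boxes as touching when they share a boundary segment of positive length. The function names and the sample layout are illustrative only.

```python
def touching(a, b, eps=1e-9):
    """True if boxes a and b share a side segment of positive length."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x_overlap = min(ax + aw, bx + bw) - max(ax, bx)
    y_overlap = min(ay + ah, by + bh) - max(ay, by)
    # Side contact: zero gap along one axis, positive shared length along the other.
    return (abs(x_overlap) <= eps and y_overlap > eps) or \
           (abs(y_overlap) <= eps and x_overlap > eps)

def realized_adjacencies(boxes, weights):
    """Sum the edge weights over all pairs of touching boxes."""
    total = 0.0
    words = sorted(boxes)
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            if touching(boxes[u], boxes[v]):
                total += weights.get((u, v), weights.get((v, u), 0.0))
    return total

boxes = {"cloud": (0, 0, 4, 2), "word": (4, 0, 3, 1),
         "tag": (0, 2, 2, 1), "far": (10, 10, 1, 1)}
weights = {("cloud", "word"): 0.8, ("cloud", "tag"): 0.5,
           ("tag", "word"): 0.9, ("far", "word"): 0.3}
total = realized_adjacencies(boxes, weights)
print(total)  # counts only cloud-word and cloud-tag, the two touching pairs
```

The strong "tag"/"word" edge contributes nothing here because those two boxes do not touch. That is exactly the kind of loss a layout algorithm tries to avoid.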
  • 18. Experimental Setup: 1) Term Extraction 2) Ranking 3) Similarity Computation
  • 19. Early Algorithms 1. Wordle (Random) 2. Context-Preserving Word Cloud Visualization (CPWCV) 3. Seam Carving
  • 20. Wordle → Random • The Wordle algorithm places one word at a time in a greedy fashion, aiming to use space as efficiently as possible. • First the words are sorted by weight in decreasing order. • Then for each word in the order, a position is picked at random.
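A minimal sketch of this greedy random placement follows. The slide does not say what happens when a random position collides with an earlier word, so this version simply retries with a new random position. That rule, the names `random_layout` and `overlaps`, and the canvas size are all assumptions.

```python
import random

def overlaps(a, b):
    """True if the interiors of boxes a and b (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def random_layout(sizes, weights, width, height, seed=0, max_tries=10000):
    """Place words in decreasing weight order at random non-overlapping positions."""
    rng = random.Random(seed)
    placed = {}
    for word in sorted(sizes, key=lambda w: weights[w], reverse=True):
        w, h = sizes[word]
        for _ in range(max_tries):
            box = (rng.uniform(0, width - w), rng.uniform(0, height - h), w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break
        else:
            raise RuntimeError("could not place %r; try a larger canvas" % word)
    return placed

sizes = {"semantic": (40, 12), "word": (30, 10), "cloud": (30, 10), "tag": (20, 8)}
weights = {"semantic": 9, "word": 7, "cloud": 5, "tag": 2}
layout = random_layout(sizes, weights, width=200, height=100)
for word, box in layout.items():
    print(word, box)
```

Because positions are chosen without looking at word similarity, related words end up near each other only by chance, which is the shortcoming the semantic algorithms address.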
  • 21. 1: Random
  • 22. 2: Random
  • 23. 3: Random
  • 24. 4: Random
  • 25. 5: Random
  • 26. 6: Random
  • 27. Context-Preserving Word Cloud Visualization (CPWCV) • First, a dissimilarity matrix is computed and Multidimensional Scaling (MDS) is performed. MDS is a means of visualizing the level of similarity of individual cases of a dataset. • Second, an effort is made to create a compact layout
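To make the MDS step more tangible, here is a small pure-Python sketch of classical (Torgerson) MDS using power iteration. It assumes a symmetric dissimilarity matrix that behaves like Euclidean distances. The function name `classical_mds` and the test points are illustrative, and the lecture does not state which MDS variant CPWCV uses.

```python
import math

def classical_mds(D, dims=2, iters=200):
    """Embed n items in `dims` dimensions from a symmetric n x n dissimilarity matrix D."""
    n = len(D)
    # Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J.
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    B = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [float(i + 1) for i in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm  # approaches the eigenvalue as v converges
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(lam)
        # Deflate B so the next pass finds the next eigenvector.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Distances between the corners of a 3 x 4 rectangle are recovered up to rotation/reflection.
pts = [(0, 0), (3, 0), (0, 4), (3, 4)]
D = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in pts] for a in pts]
Y = classical_mds(D)
print(Y)
```

The resulting coordinates give each word an initial position in which dissimilar words are far apart. The later force-based steps shown on the following slides then compact the layout.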
  • 28. 1: Context-Preserving
  • 29. 2: Context-Preserving: repulsive force
  • 30. 3: Context-Preserving: attractive force
  • 31. Seam Carving • Seam carving is a content-aware image resizing technique • Basically, an algorithm for image resizing • It was invented at Mitsubishi’s
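The core of seam carving, in its general image-resizing form, is a dynamic program that finds a connected top-to-bottom path of minimum total energy and removes it. The sketch below assumes a small grid of energy values in which 0 means empty space. The function names are illustrative, and the word cloud variant from the lecture is not reproduced here.

```python
def min_vertical_seam(energy):
    """Return one column index per row forming a minimum-energy top-to-bottom seam."""
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # cost[r][c] = cheapest seam ending at (r, c), moving at most one column per row.
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            cost[r][c] += min(cost[r - 1][lo:hi + 1])
    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam

def remove_seam(grid, seam):
    """Delete the seam cell from every row, making the grid one column narrower."""
    return [row[:c] + row[c + 1:] for row, c in zip(grid, seam)]

energy = [[5, 0, 5],
          [5, 5, 0],
          [5, 0, 5]]
seam = min_vertical_seam(energy)
print(seam)                       # [1, 2, 1]: the zero-energy path
print(remove_seam(energy, seam))  # [[5, 5], [5, 5], [5, 5]]
```

Applied repeatedly, this trims empty paths while leaving high-energy (occupied) cells untouched.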
  • 32. 1: Seam Carving
  • 33. 2: Seam Carving: space is divided into regions
  • 34. 3: Seam Carving: empty paths trimmed out iteratively
  • 35. 4: Seam Carving
  • 36. 5: Seam Carving
  • 37. 6: Seam Carving: space divided into regions
  • 38. 7: Seam Carving
  • 39. 3 New Algorithms 1. Inflate and Push 2. Star Forest 3. Cycle Cover
  • 40. Inflate-and-Push • Simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.
  • 41. 1: Inflate
  • 42. 2: Inflate: scaling down
  • 43. 3: Inflate: semantically related words are placed close to each other
  • 44. 4: Inflate: repulsive force to resolve overlaps
  • 45. 5: Inflate
  • 46. Star Forest • A star is a tree and a star forest is a forest whose connected components are all stars.
  • 47. Star Forest: star = graph • Dissimilarity matrix → disjoint stars = star forest • Attractive force to get a compact layout
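The definition of a star forest can be checked mechanically. The sketch below assumes the graph is given as a list of undirected edges and uses the standard characterization of a star as a tree with at most one vertex of degree greater than one. The function name `is_star_forest` is an assumption, and the sketch only verifies the property rather than constructing a star forest from a matrix.

```python
from collections import defaultdict

def is_star_forest(edges):
    """True if every connected component of the graph is a star."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            return False  # a self-loop is not allowed in a forest
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    for start in list(adj):
        if start in seen:
            continue
        # Collect the connected component containing `start`.
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        n_edges = sum(len(adj[u]) for u in comp) // 2
        centers = [u for u in comp if len(adj[u]) > 1]
        # Tree: edges = vertices - 1. Star: at most one vertex with degree > 1.
        if n_edges != len(comp) - 1 or len(centers) > 1:
            return False
    return True

print(is_star_forest([("a", "b"), ("a", "c"), ("a", "d"), ("x", "y")]))  # True
print(is_star_forest([("a", "b"), ("b", "c"), ("c", "d")]))              # False: a path
```

In a word cloud, each star groups one central word with the words most related to it, so each star can be laid out independently before the stars are packed together.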
  • 48. Cycle Cover • This algorithm is based on a similarity matrix. • First, a similarity path (= cycle) is created • Then, the optimal level of compactness is computed
  • 49. Quantitative Metrics
  • 50. Criteria 1. Realized Adjacencies – how close are similar words to each other? 2. Distortion – how distant are dissimilar words? 3. Compactness – how well utilized is the drawing area? 4. Uniform Area Utilization – uniformity of the distribution (overpopulated vs sparse areas in the word cloud) 5. Aspect Ratio – width and height of the bounding box 6. Running Time – execution time
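Two of these criteria are simple enough to sketch directly. The definitions below are assumptions made for illustration: compactness is taken as the fraction of the bounding box covered by (non-overlapping) word boxes, and aspect ratio as bounding-box width divided by height. The lecture may use different formulas.

```python
def bounding_box(boxes):
    """Smallest axis-aligned rectangle (x, y, w, h) containing all boxes."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return x0, y0, x1 - x0, y1 - y0

def compactness(boxes):
    """Fraction of the bounding box covered by words (assumes no overlaps)."""
    _, _, bw, bh = bounding_box(boxes)
    used = sum(w * h for x, y, w, h in boxes)
    return used / (bw * bh)

def aspect_ratio(boxes):
    """Width divided by height of the bounding box."""
    _, _, bw, bh = bounding_box(boxes)
    return bw / bh

tight = [(0, 0, 4, 2), (4, 0, 2, 2), (0, 2, 6, 1)]
sparse = [(0, 0, 1, 1), (3, 3, 1, 1)]
print(compactness(tight), aspect_ratio(tight))    # 1.0 2.0
print(compactness(sparse), aspect_ratio(sparse))  # 0.125 1.0
```

A layout can score well on compactness while scoring poorly on realized adjacencies, which is why the evaluation reports the criteria separately.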
  • 51. 2 datasets (1) WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words (2) PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.
  • 52. Cycle Cover wins
  • 53. Seam Carving wins
  • 54. Random wins
  • 55. Inflate wins
  • 56. Random and Seam Carving win
  • 57. All ok except Seam Carving
  • 58. Demo
  • 59. Final Words
  • 60. The end