Project Presentation: Graph-based Analysis and Opinion Mining in Social Network
1. Graph-based Analysis and Opinion Mining in Social Network
- Opinion about an entity
- Groups of entities
Khan Mostafa
Graduate Student (Computer Science)
Stony Brook University
4. tweet
- entities: proper nouns
- keywords: let data decide
- polarity score
<TEkwPs>
<TEkwP i="699" pScore="0.460692807729435" marker="positive">
<T>? la familia RT @sheriishirlz: Lust was turnt up!!! @DeeAllova always takes care of Shirlz ? </T>
<E>Sheriishirlz,DeeAllova,Shirlz,</E>
<kw>turnt,up,always,takes,</kw>
</TEkwP>
<TEkwP i="701" pScore="-0.316666516666734" marker="negative">
<T>@IanSmall4 @acorns47 @newmelinda I'll second that, Ian. </T>
<E>IanSmall4 Acorns47 Newmelinda,Ian,</E>
<kw>second,</kw>
</TEkwP>
<TEkwP i="706" pScore="0.35" marker="positive">
<T>@ManMadeMoon is pa bear having a do ? enjoy and have a beer at my fav dive bar,doc holidays on 1st ave. it's the Star Wars bar on crack ? </T>
<E>ManMadeMoon,pa,Star Wars,</E>
<kw>do,enjoy,fav,1st,</kw>
</TEkwP>
<TEkwP i="711" pScore="-0.535463140011847" marker="negative">
<T>Photo: 90percentunrelated: I know I just included this in that last picture set. But, I like it and this is... http://t.co/E8CmT1In5L </T>
<E>Photo,</E>
<kw>know,just,included,last,like,</kw>
</TEkwP>
</TEkwPs>
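The listing above can be loaded into plain records with a few lines of Python. This is only a minimal sketch, assuming the file keeps the element and attribute names shown (`TEkwP`, `T`, `E`, `kw`, `i`, `pScore`, `marker`) and that `E` and `kw` are comma-separated lists with a trailing comma. The helper names `split_list` and `parse_tweets` are hypothetical, and the sample string is an abbreviated copy of the listing with shortened scores.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the same shape as the listing (values shortened).
SAMPLE = """<TEkwPs>
<TEkwP i="699" pScore="0.46" marker="positive">
<T>Lust was turnt up!!!</T>
<E>Sheriishirlz,DeeAllova,Shirlz,</E>
<kw>turnt,up,always,takes,</kw>
</TEkwP>
<TEkwP i="701" pScore="-0.32" marker="negative">
<T>I'll second that, Ian.</T>
<E>Ian,</E>
<kw>second,</kw>
</TEkwP>
</TEkwPs>"""


def split_list(text):
    # Lists end with a trailing comma, so drop the empty items.
    return [item.strip() for item in (text or "").split(",") if item.strip()]


def parse_tweets(xml_text):
    # One dict per <TEkwP> element: id, polarity score, marker, text, E, kw.
    records = []
    for node in ET.fromstring(xml_text).iter("TEkwP"):
        records.append({
            "id": int(node.get("i")),
            "pScore": float(node.get("pScore")),
            "marker": node.get("marker"),
            "text": (node.findtext("T") or "").strip(),
            "entities": split_list(node.findtext("E")),
            "keywords": split_list(node.findtext("kw")),
        })
    return records


tweets = parse_tweets(SAMPLE)
```

Each record then carries the three things the later slides rely on: the entity list, the keyword list, and the polarity score.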
5. Opinion about an entity
- word
- Overall polarity score
- Keywords describing it
<opinion entity='Kyles'>
<score>0.2</score>
<analysis
post-count='500'
percent-positive='52.03'
percent-negative='24.59'/>
<keywords count="3">calls, compelling, familiar</keywords>
</opinion>
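A per-entity summary with the same fields as the `<opinion>` element could be computed roughly as follows. This is a sketch under assumptions the slides do not confirm: the overall score is taken as the plain mean of the tweet scores, a tweet counts as positive or negative by the sign of its pScore, and the describing keywords are simply the most frequent ones. The function name `summarize_entity` and the toy tweets are hypothetical.

```python
from collections import Counter


def summarize_entity(entity, tweets, top_k=3):
    # Collect every tweet that mentions the entity.
    hits = [t for t in tweets if entity in t["entities"]]
    if not hits:
        return None
    scores = [t["pScore"] for t in hits]
    kw_counts = Counter(kw for t in hits for kw in t["keywords"])
    return {
        "entity": entity,
        # Assumption: overall score is the plain mean of tweet scores.
        "score": sum(scores) / len(scores),
        "post_count": len(hits),
        "percent_positive": 100.0 * sum(s > 0 for s in scores) / len(hits),
        "percent_negative": 100.0 * sum(s < 0 for s in scores) / len(hits),
        # Assumption: describing keywords are the most frequent ones.
        "keywords": [kw for kw, _ in kw_counts.most_common(top_k)],
    }


# Toy data, not taken from the project's corpus.
tweets = [
    {"entities": ["Ian"], "keywords": ["second"], "pScore": -0.5},
    {"entities": ["Ian", "Star Wars"], "keywords": ["enjoy", "fav"], "pScore": 0.5},
    {"entities": ["Ian"], "keywords": ["enjoy"], "pScore": 0.0},
    {"entities": ["Ian"], "keywords": ["enjoy"], "pScore": 0.75},
]
summary = summarize_entity("Ian", tweets)
```

Tweets with a score of exactly zero count toward neither percentage, which is consistent with the sample element, where the positive and negative shares do not add up to 100.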
[Figure: Distribution of Polarity Score over entire entity space]
[Figure: Polarity Score over ln(Occurrence) of entities]
6. E×kw bigraph
[Diagram: each tweet yields entities (E), keywords (kw), and a polarity score; every E-kw edge carries a weight and a pScore]
E×kw bigraph such that,
- There exists an edge between E_i and kw_j if one or more tweets contain both E_i and kw_j.
- The edge has a weight indicating the co-occurrence count of E_i and kw_j, i.e.
  weight_ij = Count({T_k | E_i ∈ T_k.E ∧ kw_j ∈ T_k.kw})
- The edge has a pScore that is the average of pScore (= P) over all such occurrences, i.e.
  pScore_ij = Sum({T_k.pScore | E_i ∈ T_k.E ∧ kw_j ∈ T_k.kw}) / weight_ij
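The edge definitions above translate fairly directly into code. The following is a minimal sketch, assuming each tweet is a dict with `entities`, `keywords`, and `pScore` keys. The dict layout and the name `build_bigraph` are illustrative choices, not the project's actual implementation.

```python
from collections import defaultdict


def build_bigraph(tweets):
    # weight[(E, kw)] counts the tweets containing both E and kw;
    # p_sum[(E, kw)] accumulates their polarity scores.
    weight = defaultdict(int)
    p_sum = defaultdict(float)
    for t in tweets:
        # Sets ensure a tweet is counted once per (E, kw) pair.
        for e in set(t["entities"]):
            for kw in set(t["keywords"]):
                weight[(e, kw)] += 1
                p_sum[(e, kw)] += t["pScore"]
    # Edge pScore is the sum of tweet scores divided by the weight.
    return {
        edge: {"weight": w, "pScore": p_sum[edge] / w}
        for edge, w in weight.items()
    }


# Toy data, not taken from the project's corpus.
tweets = [
    {"entities": ["A", "B"], "keywords": ["good", "fast"], "pScore": 0.5},
    {"entities": ["A"], "keywords": ["good"], "pScore": -0.25},
    {"entities": ["B"], "keywords": ["slow"], "pScore": -1.0},
]
graph = build_bigraph(tweets)
```

Here the edge between `A` and `good` appears in two tweets, so its weight is 2 and its pScore is the mean of 0.5 and -0.25.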
After this, a filter is run on the graph to eliminate links between an entity and a keyword where the keyword is not sufficiently descriptive of the entity. This is done by calculating freq such that,
freq_ij = weight_ij / Occurrence(E_i)
If freq_ij is smaller than a certain threshold, ε_freq, then keyword kw_j is filtered out for entity E_i.
[Figure: ln(Occurrence)]
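The frequency filter can be sketched as below. It assumes edges are given as a mapping from (entity, keyword) pairs to co-occurrence weights and that Occurrence(E) is the number of tweets mentioning E. The function name `filter_edges` and the threshold value 0.2 are hypothetical, since the slides do not state the value used for ε_freq.

```python
def filter_edges(edges, occurrence, eps_freq):
    # Keep an (entity, keyword) edge only if the keyword co-occurs with
    # the entity in at least eps_freq of that entity's tweets.
    kept = {}
    for (entity, kw), weight in edges.items():
        freq = weight / occurrence[entity]
        if freq >= eps_freq:
            kept[(entity, kw)] = weight
    return kept


# Toy data: edge weights and per-entity tweet counts.
edges = {("A", "good"): 5, ("A", "fast"): 1, ("B", "slow"): 2}
occurrence = {"A": 10, "B": 4}

# 0.2 is an illustrative threshold, not a value from the slides.
kept = filter_edges(edges, occurrence, eps_freq=0.2)
```

With these numbers, `fast` co-occurs with `A` in only 1 of 10 tweets (freq 0.1) and is dropped, while the other two edges have freq 0.5 and survive.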
7. E×E graph
[Diagram: entities E1 and E2 connected through shared keywords (kw), with edge labels pScore1, weight1 and pScore2, weight2]
E×E graph such that there exists an edge between E_i and E_j if
- Occurrence(E_i) > ε_eo ∧ Occurrence(E_j) > ε_eo
- {kw_x ∈ kw(E_i) | Occurrence(kw_x) < ε_kwo} ⋂ {kw_x ∈ kw(E_j) | Occurrence(kw_x) < ε_kwo} is not empty
- Polarity bias of both is similar
If a potential keyword occurs in the description of most entities, then it is not a keyword but a generic term.
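The three edge conditions can be combined into a single pass over entity pairs. This is a sketch only: the slides do not define "similar polarity bias", so the code assumes it means the two entities' average scores have the same sign. The name `build_ee_edges`, the input dictionaries, and the threshold values in the example are likewise hypothetical.

```python
from itertools import combinations


def build_ee_edges(entity_kws, entity_occ, kw_occ, entity_bias, eps_eo, eps_kwo):
    def specific(e):
        # Keywords of e that are rare enough not to be generic terms.
        return {kw for kw in entity_kws[e] if kw_occ[kw] < eps_kwo}

    def same_bias(a, b):
        # Assumption: "similar polarity bias" means same sign of mean score.
        return (entity_bias[a] >= 0) == (entity_bias[b] >= 0)

    # Condition 1: both entities occur more than eps_eo times.
    significant = sorted(e for e in entity_kws if entity_occ[e] > eps_eo)
    edges = set()
    for a, b in combinations(significant, 2):
        # Condition 2: shared non-generic keyword; condition 3: similar bias.
        if specific(a) & specific(b) and same_bias(a, b):
            edges.add((a, b))
    return edges


# Toy data, not taken from the project's corpus.
entity_kws = {"A": {"fast", "the"}, "B": {"fast"}, "C": {"the"}, "D": {"fast"}}
entity_occ = {"A": 5, "B": 4, "C": 6, "D": 1}
kw_occ = {"fast": 3, "the": 1000}
entity_bias = {"A": 0.4, "B": 0.2, "C": 0.1, "D": 0.3}

edges = build_ee_edges(entity_kws, entity_occ, kw_occ, entity_bias,
                       eps_eo=2, eps_kwo=350)
```

In this toy run `D` is dropped for occurring too rarely, and `C` gets no edge because its only keyword, `the`, exceeds the keyword occurrence threshold and is treated as a generic term.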
13.
Measure | Sample 1 | Large Sample | Very large Sample
Tweets | 160711 | 485447 | 847276
Time to analyze each | 48.91s | 148.53s | 262.01s
Build Bigraph | 9.29s | 34.24s | 66.45s
Generate EE graph | 1.54s | 3.49s | 4.99s
Time to Find Groups | 0.126s | 0.310s | 0.358s
Groups count | 157 | 334 | 457
Largest Group size | 136 | 183 | 162
Significant Entities | 1378 | 2627 | 3560
Legitimate Keywords | 14997 | 25818 | 35005

Kw threshold | 350 | 350 | 450
Minimum nodes | 2 | 2 | 2
Common Noun as keyword | false | true | false
Potential kw | 15108 | 31593 | 15108
Legitimate kw | 14967 | 31368 | 14997
Entities | 97147 | 97147 | 97147
E occurring > 2 | 7580 | 7580 | 7580
Significant E. | 1190 | 2012 | 1378
Groups | 170 | 92 | 157
Largest size | 70 | 1256 | 136
The polarity-invariant version generated 174 groups, with the largest group of size 598, for 1854 significant entities. The generated groups are also significantly different.