Graph-based Analysis and Opinion Mining in Social Network
- Opinion about an entity
- Groups of entities

Khan Mostafa
Graduate Student (Computer Science)
Stony Brook University
[Diagram: an entity, its keywords, and positive / negative / objective posts]
subjective vs. objective: PoS distribution

PoS tag   SUBJECTIVITY   BIAS
WRB        0.164         -0.282
VBN        0.140         -0.245
VBD        0.128         -0.227
RB         0.100         -0.182
RP         0.096         -0.175
TO         0.081         -0.149
VBP        0.078         -0.144
PRP        0.072         -0.135
PRP$       0.061         -0.114
CC         0.054         -0.102
MD         0.052         -0.099
EX         0.033         -0.064
VBZ        0.028         -0.055
NNPS       0.025         -0.050
VBG        0.017         -0.034
WDT        0.016         -0.031
RBR        0.012         -0.024
JJ         0.010         -0.019
NNS        0.008         -0.015
IN         0.006         -0.012
JJR        0.005         -0.010
NN         0.003         -0.007
UH         0.002         -0.004
VB         0.000          0.000
LS         0.000         (not shown)
DT        -0.016          0.032
CD        -0.033          0.068
NNP       -0.052          0.110
FW        -0.060          0.127
USR       -0.072          0.155
SYM       -0.081          0.176
JJS       -0.085          0.187
WP        -0.098          0.217
URL       -0.103          0.229
RBS       -0.123          0.280
PDT       -0.143          0.333
WP$       -0.200          0.500
POS       -0.231          0.600
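Per-tag values like the SUBJECTIVITY column above could feed a PoS-based scorer. The sketch below is an assumption, not the author's scorer: it averages the per-tag values over a tweet's tags, using a subset of the values copied from the chart, and treats unknown tags as 0. The tag sequence is assumed to come from some tagger that emits Penn-style tags plus Twitter tags such as USR and URL.

```python
# Subset of the per-tag SUBJECTIVITY values from the chart above.
# Averaging them per tweet is an illustrative assumption only.
SUBJECTIVITY = {
    "WRB": 0.164, "VBN": 0.140, "VBD": 0.128, "RB": 0.100,
    "PRP": 0.072, "JJ": 0.010, "NN": 0.003,
    "DT": -0.016, "CD": -0.033, "NNP": -0.052, "URL": -0.103, "POS": -0.231,
}


def pos_subjectivity(tags):
    """Hypothetical PoS-based score: mean per-tag value over a tweet's tags.

    Tags missing from the table contribute 0; an empty tweet scores 0.0.
    """
    if not tags:
        return 0.0
    return sum(SUBJECTIVITY.get(tag, 0.0) for tag in tags) / len(tags)


print(pos_subjectivity(["PRP", "VBD", "RB", "JJ"]))  # leans subjective
print(pos_subjectivity(["NNP", "CD", "URL"]))        # leans objective
```

A plain average keeps long and short tweets comparable; a weighted or learned combination would be an equally plausible design.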

Polarity Scorer
- PoS based
- N-gram based

Polarity score: positive vs. negative (n-gram counts)

n-gram                   Positive   Negative   Objective
'enjoying break'         1          328        1
'happy birthday'         22         207        1
'so happy'               106        53         1
'follow back'            10         132        1
'miss my'                93         10         1
'no one notices'         97         4          1
'notices my'             97         1          1
'good day'               5          82         1
'follow please'          47         38         1
'my phone'               64         18         1
'presenting emotional'   60         20         1
'please follow'          11         66         1
'follow love'            17         60         1
'am sorry'               71         4          1
'so sad'                 71         3          1
'miss u'                 65         7          1
'new followers'          53         17         1
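The slides name an n-gram-based polarity scorer but do not give its formula. As an illustration only, the sketch below assumes a hypothetical lookup of (positive, negative) counts per n-gram (made-up numbers, not the counts above) and scores a tweet as the mean of (pos - neg) / (pos + neg) over the n-grams it contains.

```python
# Hypothetical n-gram -> (positive count, negative count) table.
# These numbers are invented for illustration, not taken from the slides.
NGRAM_COUNTS = {
    ("so", "happy"): (90, 10),
    ("so", "sad"): (5, 95),
}


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_polarity(tokens, max_n=3):
    """Mean of (pos - neg) / (pos + neg) over known 1..max_n-grams; 0.0 if none match."""
    scores = []
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            if gram in NGRAM_COUNTS:
                pos, neg = NGRAM_COUNTS[gram]
                scores.append((pos - neg) / (pos + neg))
    return sum(scores) / len(scores) if scores else 0.0


print(ngram_polarity(["so", "happy", "today"]))
print(ngram_polarity(["so", "sad"]))
```

Each matching n-gram maps to a value in [-1, 1], so the tweet score stays in that range too.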
tweet → entities (proper nouns), keywords (let data decide), polarity score
<TEkwPs>
<TEkwP i="699" pScore="0.460692807729435" marker="positive">
<T>? la familia RT @sheriishirlz: Lust was turnt up!!! @DeeAllova always takes care of Shirlz ? </T>
<E>Sheriishirlz,DeeAllova,Shirlz,</E>
<kw>turnt,up,always,takes,</kw>
</TEkwP>
<TEkwP i="701" pScore="-0.316666516666734" marker="negative">
<T>@IanSmall4 @acorns47 @newmelinda I'll second that, Ian. </T>
<E>IanSmall4 Acorns47 Newmelinda,Ian,</E>
<kw>second,</kw>
</TEkwP>
<TEkwP i="706" pScore="0.35" marker="positive">
<T>@ManMadeMoon is pa bear having a do ? enjoy and have a beer at my fav dive bar,doc holidays on 1st ave. it's the Star Wars bar on crack ? </T>
<E>ManMadeMoon,pa,Star Wars,</E>
<kw>do,enjoy,fav,1st,</kw>
</TEkwP>
<TEkwP i="711" pScore="-0.535463140011847" marker="negative">
<T>Photo: 90percentunrelated: I know I just included this in that last picture set. But, I like it and this is... http://t.co/E8CmT1In5L </T>
<E>Photo,</E>
<kw>know,just,included,last,like,</kw>
</TEkwP>
</TEkwPs>
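Records in the TEkwP format above can be read back with Python's standard `xml.etree.ElementTree`. This is a minimal sketch; the field names in the returned dicts are chosen here for illustration, and the trailing-comma lists in `<E>` and `<kw>` are split on commas.

```python
import xml.etree.ElementTree as ET

# Two records copied from the output above.
SAMPLE = """<TEkwPs>
<TEkwP i="701" pScore="-0.316666516666734" marker="negative">
<T>@IanSmall4 @acorns47 @newmelinda I'll second that, Ian. </T>
<E>IanSmall4 Acorns47 Newmelinda,Ian,</E>
<kw>second,</kw>
</TEkwP>
<TEkwP i="706" pScore="0.35" marker="positive">
<T>@ManMadeMoon is pa bear having a do ? enjoy and have a beer at my fav dive bar,doc holidays on 1st ave. it's the Star Wars bar on crack ? </T>
<E>ManMadeMoon,pa,Star Wars,</E>
<kw>do,enjoy,fav,1st,</kw>
</TEkwP>
</TEkwPs>"""


def split_list(text):
    """Split a trailing-comma list such as 'a,b,' into ['a', 'b']."""
    return [item.strip() for item in (text or "").split(",") if item.strip()]


def parse_tekwps(xml_text):
    """Turn each <TEkwP> element into a dict of id, score, marker, text, entities, keywords."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter("TEkwP"):
        records.append({
            "id": int(node.get("i")),
            "pScore": float(node.get("pScore")),
            "marker": node.get("marker"),
            "text": (node.findtext("T") or "").strip(),
            "entities": split_list(node.findtext("E")),
            "keywords": split_list(node.findtext("kw")),
        })
    return records


for rec in parse_tekwps(SAMPLE):
    print(rec["id"], rec["marker"], rec["entities"], rec["keywords"])
```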
- Opinion about an entity

For each entity word: its overall polarity score and the keywords describing it
<opinion entity='Kyles'>
<score>0.2</score>
<analysis
post-count='500'
percent-positive='52.03'
percent-negative='24.59'/>
<keywords count="3">calls, compelling, familiar</keywords>
</opinion>
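The slides do not say how the `<opinion>` fields are derived. The sketch below assumes, for illustration only, that the score is the mean pScore of the posts mentioning the entity and that the percentages count posts by their positive/negative marker; `summarize` and `to_xml` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Opinion:
    entity: str
    score: float            # assumed: mean pScore of the posts
    post_count: int
    percent_positive: float  # assumed: share of posts marked "positive"
    percent_negative: float  # assumed: share of posts marked "negative"


def summarize(entity, posts):
    """posts: list of (pScore, marker) pairs for tweets that mention `entity`."""
    n = len(posts)
    if n == 0:
        return Opinion(entity, 0.0, 0, 0.0, 0.0)
    score = sum(p for p, _ in posts) / n
    pos = sum(1 for _, marker in posts if marker == "positive")
    neg = sum(1 for _, marker in posts if marker == "negative")
    return Opinion(entity, round(score, 2), n,
                   round(100 * pos / n, 2), round(100 * neg / n, 2))


def to_xml(op, keywords):
    """Serialize an Opinion in the shape of the <opinion> record above."""
    root = ET.Element("opinion", entity=op.entity)
    ET.SubElement(root, "score").text = str(op.score)
    ET.SubElement(root, "analysis", {
        "post-count": str(op.post_count),
        "percent-positive": f"{op.percent_positive:.2f}",
        "percent-negative": f"{op.percent_negative:.2f}",
    })
    kw = ET.SubElement(root, "keywords", count=str(len(keywords)))
    kw.text = ", ".join(keywords)
    return ET.tostring(root, encoding="unicode")


posts = [(0.5, "positive"), (0.3, "positive"), (-0.2, "negative"), (0.0, "objective")]
print(to_xml(summarize("Kyles", posts), ["calls", "compelling", "familiar"]))
```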
[Charts: Distribution of Polarity Score over the entire entity space; Polarity Score over ln(Occurrence) of entities]
E×kw bigraph
[Diagram: each tweet yields entities (E), keywords (kw) and a polarity score; E and kw nodes are joined by edges carrying a weight and a pScore]

The E×kw bigraph is built such that:
There exists an edge between $E_i$ and $kw_j$ if one or more tweets contain both $E_i$ and $kw_j$.
The edge has a weight counting the co-occurrences of $E_i$ and $kw_j$, i.e.
$$\mathrm{weight}_{ij} = \left|\{\, T_k \mid E_i \in T_k.E \land kw_j \in T_k.kw \,\}\right|$$
The edge has a pScore that is the average pScore (=P) over all such co-occurring tweets, i.e.
$$\mathrm{pScore}_{ij} = \frac{1}{\mathrm{weight}_{ij}} \sum_{T_k \,:\, E_i \in T_k.E \,\land\, kw_j \in T_k.kw} T_k.\mathrm{pScore}$$

After this, a filter is run on the graph to eliminate links between an entity and a keyword when the keyword is not descriptive enough of that entity. This is done by calculating

$$\mathrm{freq}_{ij} = \frac{\mathrm{weight}_{ij}}{\mathrm{Occurrence}(E_i)}$$

If $\mathrm{freq}_{ij}$ is smaller than a threshold $\varepsilon_{\mathrm{freq}}$, keyword $kw_j$ is filtered out for entity $E_i$.

[Charts: two plots with an ln(Occurrence) axis]
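A sketch of the freq filter, assuming the bigraph is a dict {(entity, keyword): (weight, pScore)} and that Occurrence(E_i) is the number of tweets mentioning E_i. The slides do not give a value for ε_freq, so the threshold below is a placeholder.

```python
from collections import Counter


def occurrence(tweets):
    """Occurrence(E_i): number of tweets mentioning each entity (assumed definition)."""
    return Counter(e for entities, _kw, _p in tweets for e in set(entities))


def filter_by_freq(bigraph, occ, eps_freq):
    """Keep edge (E_i, kw_j) only if weight_ij / Occurrence(E_i) >= eps_freq.

    Every entity in the bigraph must appear in occ with a count >= 1.
    """
    return {
        (e, kw): (w, p)
        for (e, kw), (w, p) in bigraph.items()
        if w / occ[e] >= eps_freq
    }


bigraph = {("A", "x"): (3, 0.2), ("A", "y"): (1, 0.5), ("B", "z"): (2, -0.1)}
occ = Counter({"A": 10, "B": 2})
print(filter_by_freq(bigraph, occ, eps_freq=0.2))  # drops ("A", "y"): 1/10 < 0.2
```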
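E×E graph
[Diagram: entities E1 and E2 linked through a shared keyword kw; each entity-keyword edge carries its own weight and pScore]

The E×E graph is built such that there exists an edge between $E_i$ and $E_j$ if
- $\mathrm{Occurrence}(E_i) > \varepsilon_{eo} \land \mathrm{Occurrence}(E_j) > \varepsilon_{eo}$
- $\{kw_x \in kw(E_i) \mid \mathrm{Occurrence}(kw_x) < \varepsilon_{kwo}\} \cap \{kw_x \in kw(E_j) \mid \mathrm{Occurrence}(kw_x) < \varepsilon_{kwo}\} \neq \emptyset$
- the polarity biases of both entities are similar

If a candidate word occurs in the descriptions of most entities, it is not a keyword but a generic term.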
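The three edge conditions can be sketched as below. Assumptions, not taken from the slides: "polarity bias" is a per-entity number (for example the mean pScore), "similar" means the biases differ by at most a hypothetical `eps_bias`, and `kw_occ` counts how often each keyword occurs (for example, how many entities it describes).

```python
from itertools import combinations


def build_ee_graph(entity_kws, entity_occ, entity_bias, kw_occ,
                   eps_eo, eps_kwo, eps_bias):
    """Return {(E_i, E_j): shared non-generic keywords} for qualifying pairs.

    entity_kws:  {entity: set of keywords kept after the freq filter}
    entity_occ:  {entity: Occurrence(E)}
    entity_bias: {entity: polarity bias}   (assumed: e.g. mean pScore)
    kw_occ:      {keyword: Occurrence(kw)} (assumed: e.g. entities described)
    """
    # Condition 1: both entities must occur more than eps_eo times.
    frequent = sorted(e for e in entity_kws if entity_occ.get(e, 0) > eps_eo)
    # Condition 2 prep: drop generic keywords with Occurrence(kw) >= eps_kwo.
    specific = {
        e: {kw for kw in entity_kws[e] if kw_occ.get(kw, 0) < eps_kwo}
        for e in frequent
    }
    edges = {}
    for ei, ej in combinations(frequent, 2):
        shared = specific[ei] & specific[ej]
        # Condition 3 (assumed form): polarity biases within eps_bias.
        if shared and abs(entity_bias[ei] - entity_bias[ej]) <= eps_bias:
            edges[(ei, ej)] = shared
    return edges
```

The pairwise loop is quadratic in the number of frequent entities; an inverted index from keyword to entities would restrict the comparison to pairs that actually share a keyword.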
Very big graph, mostly nodes with no edges!
Never built it.
Entities with neighbors,
but not even this one is built.
Filtered entities with very few neighbors
Keywords from data
Tweets analysis → E×kw bigraph → E×E graph → Community detection
- Tweets analysis: PoS tagging (JJ, RB, VB; NN?) gives the preliminary set
- E×kw bigraph: remove low-frequency links, leaving legitimate keywords
- E×E graph: remove high-frequency words, keeping only Occurrence(kw_x) < ε_kwo, so no generic word is used as a keyword
- Community detection: consolidate members to get the final set of keywords
  size := size of community := number of entities in it
  Threshold := ln(size)
  If (Occurrence(kw) < Threshold) then Remove(kw)
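The consolidation rule (Threshold := ln(size), remove keywords below it) can be sketched as follows. Assumption: Occurrence(kw) here is the number of community members whose keyword set contains kw. Under that reading, a community of 26 entities has threshold ln(26) ≈ 3.26, so a keyword needs at least 4 members, which is consistent with the smallest trapped-keyword count (acting:4) in the community sample.

```python
import math
from collections import Counter


def trapped_keywords(members, entity_kws):
    """Keep keywords whose in-community count is at least ln(community size).

    members:    list of entity names in one community
    entity_kws: {entity: set of keywords} (assumed input shape)
    """
    threshold = math.log(len(members))  # natural log, per Threshold := ln(size)
    counts = Counter(kw for e in members for kw in entity_kws.get(e, ()))
    return {kw: c for kw, c in counts.items() if c >= threshold}


members = ["a", "b", "c", "d", "e"]  # size 5, threshold ln(5) ≈ 1.61
kws = {"a": {"x", "y"}, "b": {"x"}, "c": {"x", "z"}, "d": set(), "e": {"y"}}
print(trapped_keywords(members, kws))  # "z" occurs once and is removed
```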
- Groups of entities

<Communities>
<Community id="1" size="26"
conductance="0.524193548387097"
pScore="0.30589296726828">
<trapped-keywords count="8">
turning:6,single:20,Download:12,using:56,
working:6,cut:9,acting:4,crazy:6,
</trapped-keywords>
<e>Rollsroycerizzy</e><e>ZexyZek</e><e>Minecraft Trolling</e>
<e>Mediafire</e><e>Tool</e><e>Uptime24/7</e><e>IbottaApp</e>
<e>Vlambeer RADICAL FISHING</e><e>PE</e><e>Obamacare Website</e>
<e>HDM</e><e>Asuu Strike</e><e>Stevie</e><e>UH UH</e>
<e>Waze</e><e>TemmyAFC</e><e>JESU</e><e>Yuri</e>
<e>Shaq</e><e>Yourmaintopicc</e><e>Ones</e><e>CFB</e>
<e>Yotpo</e><e>2xAwesome</e><e>Urbanaira</e>
</Community>
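Each community record reports a conductance. The slides do not state which variant was used; the sketch below computes the standard definition φ(S) = cut(S) / min(vol(S), vol(V minus S)) on an undirected weighted graph given as {(u, v): weight}.

```python
from collections import defaultdict


def conductance(edges, community):
    """Standard conductance of `community` in an undirected weighted graph.

    edges: {(u, v): weight}. vol(X) is the sum of weighted degrees in X and
    cut(S) is the total weight of edges with exactly one endpoint in S.
    """
    S = set(community)
    degree = defaultdict(float)
    cut = 0.0
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        if (u in S) != (v in S):
            cut += w
    vol_s = sum(d for node, d in degree.items() if node in S)
    vol_rest = sum(d for node, d in degree.items() if node not in S)
    denom = min(vol_s, vol_rest)
    # Conductance is undefined when either side has zero volume;
    # this sketch returns 0.0 in that case.
    return cut / denom if denom > 0 else 0.0


path = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1}
print(conductance(path, {"a", "b"}))  # one cut edge over volume 3
```

Lower values mean the group is better separated from the rest of the graph.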
                         Sample 1   Large Sample   Very large Sample
Tweets                   160711     485447         847276
Time to analyze tweets   48.91s     148.53s        262.01s
Build bigraph            9.29s      34.24s         66.45s
Generate E×E graph       1.54s      3.49s          4.99s
Time to find groups      0.126s     0.310s         0.358s
Groups count             157        334            457
Largest group size       136        183            162
Significant entities     1378       2627           3560
Legitimate keywords      14997      25818          35005

Effect of settings (the 450 column matches Sample 1):

Kw threshold             350        350            450
Minimum nodes            2          2              2
Common noun as keyword   false      true           false
Potential kw             15108      31593          15108
Legitimate kw            14967      31368          14997
Entities                 97147      97147          97147
E occurring > 2          7580       7580           7580
Significant E.           1190       2012           1378
Groups                   170        92             157
Largest size             70         1256           136

The polarity-invariant version generated 174 groups, the largest of size 598, for 1854 significant entities. The generated groups are also significantly different.
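The Tweets and analysis-time figures for Sample 1, Large Sample and Very large Sample allow a quick throughput check. This only divides the reported numbers; it says nothing about the hardware used.

```python
# Tweets and analysis time in seconds, as reported for the three samples.
SAMPLES = {
    "Sample 1": (160711, 48.91),
    "Large Sample": (485447, 148.53),
    "Very large Sample": (847276, 262.01),
}


def throughput(tweets: int, seconds: float) -> float:
    """Tweets analyzed per second."""
    return tweets / seconds


for name, (tweets, seconds) in SAMPLES.items():
    print(f"{name}: {throughput(tweets, seconds):.0f} tweets/s")
```

This prints about 3286, 3268 and 3234 tweets/s, so in these three samples the analysis time grows roughly linearly with the number of tweets.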
Thanks

To see the result sets, please visit http://meaningofdata.com/mining/
