Characterising the Emergent Semantics in Twitter Lists
1. Characterising the Emergent Semantics in Twitter Lists
Andrés García-Silva †, Jeon-Hyung Kang*, Kristina Lerman*,
Oscar Corcho †
† {hgarcia, ocorcho}@fi.upm.es
Facultad de Informática
Universidad Politécnica de Madrid, Spain
*{jeonhyuk,lerman}@isi.edu
Information Sciences Institute,
University of Southern California, USA
2. Introduction
Twitter Lists
3. Introduction
Curators and List Names
4. Introduction
Members and List Names
5. Introduction
Subscribers and List Names
6. Introduction
• Previous examples showed individual uses of lists
• Some list names were related to one another
• What if we group the lists?
7. Introduction
Lists where the Yahoo!Finance user is a member
grouped by frequency of membership
Lists where the NASDAQ user is a member
grouped by number of subscriptions
8. Introduction: Research questions
• Is it possible to identify related keywords from list names according to
the use given by the different user roles?
• Are two list names related if they have been used by a similar set of
curators?
• Are two list names related if a similar set of users have subscribed to the
corresponding lists?
• Are two list names related if their corresponding lists have a similar set of
members?
• Which user roles generate more related keywords?
• What types of relations between keywords can we obtain?
• Synonyms, is-a, siblings..?
Figure: example lists named Stocks, Investment, PersonalBanking, and Banks, shown with their curators (Curator 1, Curator 2), list members, and a subscriber (Subscriber 1)
9. Approach
Elicit related keywords from Twitter lists
• Input: Twitter lists
• Schema representation of keywords: based on curators, based on subscribers, or based on members
• Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
• Output: pairs of related keywords per schema representation and model
Characterise the semantics of the relations
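As a rough illustration of the vector space model step, the sketch below represents each list-name keyword as a vector over the users who play one role (members here) and compares keywords by cosine similarity. The toy data, the raw-count weighting, and the function names are assumptions made for this example; the slides do not specify how the vectors were weighted.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (keyword from a list name, member of that list).
memberships = [
    ("stocks", "YahooFinance"), ("stocks", "NASDAQ"), ("stocks", "NASDAQ"),
    ("investment", "YahooFinance"), ("investment", "NASDAQ"),
    ("comedy", "funland"),
]

def keyword_vectors(pairs):
    """Build one sparse count vector per keyword (assumed weighting: raw counts)."""
    vectors = defaultdict(lambda: defaultdict(int))
    for keyword, user in pairs:
        vectors[keyword][user] += 1
    # Return plain dicts so later lookups cannot create entries by accident.
    return {keyword: dict(counts) for keyword, counts in vectors.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_related(vectors, keyword, k=5):
    """Return up to k (other keyword, similarity) pairs, most similar first."""
    scores = [
        (other, cosine(vectors[keyword], vec))
        for other, vec in vectors.items()
        if other != keyword
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]

vectors = keyword_vectors(memberships)
print(most_related(vectors, "stocks"))
```

The same functions apply to the curator-based and subscriber-based representations by swapping the second element of each pair for the curator or subscriber of the list.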
10. Approach
Elicit related keywords from Twitter lists
• Output: pairs of related keywords per schema representation and model
Characterise the semantics of the relations
• Similarity based on WordNet
• Path length: synonyms, is-a, siblings, indirect is-a
• Wu & Palmer (hierarchical information) and Jiang & Conrath (distributional information): specificity of relations
• SPARQL queries over general KBs published as Linked Data (DBpedia, OpenCyc, and UMBEL)
• Synonyms (sameAs)
• Binary relations (TypeOf, BT)
• Object properties (Occupation)
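To make the taxonomy-based measures concrete, here is a minimal sketch on a hypothetical toy is-a hierarchy (not WordNet itself). It counts path length in nodes, which matches the convention used later in the deck (1 for synonyms, 2 for is-a, 3 for siblings), and computes the depth of the least common subsumer (LCS) and the Wu & Palmer similarity. A single-parent tree is assumed for simplicity; WordNet allows multiple parents.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "person": "entity",
    "writer": "person",
    "scientist": "person",
    "poet": "writer",
    "novelist": "writer",
}

def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def lcs(a, b):
    """Least common subsumer: the deepest node that is an ancestor of both."""
    up_b = set(ancestors(b))
    return next(n for n in ancestors(a) if n in up_b)

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))

def path_length(a, b):
    """Number of nodes on the shortest is-a path between a and b."""
    common = depth(lcs(a, b))
    return (depth(a) - common) + (depth(b) - common) + 1

def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

for pair in [("poet", "poet"), ("poet", "writer"), ("poet", "novelist")]:
    print(pair, path_length(*pair), depth(lcs(*pair)), wu_palmer(*pair))
```

A deeper LCS means the two keywords share a more specific common concept, which is why LCS depth can serve as an indicator of how specific a relation is.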
11. Experiment: Setup
• Data set
• Total
• 297,521 lists, 2,171,140 members, 215,599 curators, and
616,662 subscribers
• We extracted 5,932 unique keywords from list names; 55% of them
were found in WordNet.
• We use approximate matching of the list names with
dictionary entries
• The dictionary was created from Wikipedia article titles
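The slides do not say which approximate matching technique was used, so the following is only a sketch of one possible approach: normalise the list name and pick the closest dictionary entry with Python's standard `difflib`. The small `DICTIONARY` list stands in for Wikipedia article titles, and the `0.8` cutoff is an arbitrary assumption.

```python
import difflib
import re

# Hypothetical stand-in for a dictionary built from Wikipedia article titles.
DICTIONARY = ["biotechnology", "life science", "personal banking", "comedy", "investment"]

def normalise(list_name):
    """Split camelCase, turn hyphens/underscores into spaces, and lowercase."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", list_name)
    return re.sub(r"[-_]+", " ", spaced).strip().lower()

def match_keyword(list_name, cutoff=0.8):
    """Return the closest dictionary entry, or None if nothing is close enough."""
    candidates = difflib.get_close_matches(
        normalise(list_name), DICTIONARY, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None

for name in ["PersonalBanking", "life-science", "investmnt", "xyzzy"]:
    print(name, "->", match_keyword(name))
```

Raising the cutoff trades recall for precision: fewer list names are matched, but the matches are less likely to be spurious.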
12. Experiment: Execution
Elicit related keywords from Twitter lists
• Input: data set
• Schema representation of keywords: based on curators, based on subscribers, or based on members
• Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
• Output: pairs of related keywords per schema representation and model (each keyword with its 5 most related keywords)
Characterise the semantics of the relations
• WordNet similarity: path length, Wu & Palmer (hierarchical information), Jiang & Conrath (distributional information)
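For reference, the two named measures are usually defined as follows. These are the standard textbook formulations, not formulas taken from the slides; here lcs(c1, c2) is the least common subsumer of concepts c1 and c2, depth is measured from the taxonomy root, and p(c) is the probability of encountering concept c in a reference corpus.

```latex
\begin{align}
\mathrm{sim}_{WP}(c_1, c_2) &= \frac{2 \cdot \mathrm{depth}(\mathrm{lcs}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)} \\
\mathrm{IC}(c) &= -\log p(c) \\
\mathrm{dist}_{JC}(c_1, c_2) &= \mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2 \cdot \mathrm{IC}(\mathrm{lcs}(c_1, c_2))
\end{align}
```

Wu & Palmer is a similarity (higher means more related), while Jiang & Conrath is a distance (lower means more related), which is why the deck reports the highest W&P similarities alongside the shortest J&C distances.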
13. Experiment: Data Analysis
Pearson's correlation coefficient
Correlation Values (-1 to 1)
Average J&C distance and W&P similarity
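Pearson's coefficient can be computed directly from two equal-length lists of scores, for example the similarity a model assigns to each keyword pair and the WordNet-based score for the same pair. The sketch below uses made-up numbers purely for illustration; none of them come from the experiment.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for four keyword pairs (illustrative values only).
model_similarity = [0.91, 0.40, 0.75, 0.10]
wordnet_similarity = [0.88, 0.35, 0.60, 0.20]
print(pearson(model_similarity, wordnet_similarity))
```

A value near 1 indicates that the model ranks keyword pairs much as the WordNet measure does; when comparing against a distance such as J&C, a strongly negative value indicates agreement.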
14. Experiment: Data Analysis
Path Length in WordNet
% of relations found by each schema representation and model

Path length                 Members            Subscribers        Curators
                            VSM      LDA       VSM      LDA       VSM      LDA
1 (synonyms)                8.58%    10.87%    3.97%    3.24%     1.24%    0.50%
2 (is-a)                    3.42%    3.08%     1.93%    0.47%     0.70%    0.00%
3 (siblings, indirect is-a) 2.37%    3.77%     2.96%    2.06%     2.38%    4.03%
>3                          67.61%   65.5%     67.2%    67.5%     77.8%    75.8%
On average, 97.65% of the relations with a path length greater than 3
involve a common subsumer
15. Experiment: Data Analysis
Depth (LCS) and path length as indicators of specificity
Depth of the least common subsumer
Relations in WordNet
Length of the path setting up the relation
Relations with depth(LCS) >= 5
16. Experiment: Findings
Summary
• Similarity models based on members
• produce the results that are most correlated to the results of similarity measures
based on WordNet
• find more synonyms and direct is-a relations than the other
models (path length).
• The majority of relations found by any model have a path length >= 3 and
involve a common subsumer.
• Depth of LCS
• VSM based on subscribers produces the highest number of specific
relations (depth of LCS >= 5 or 6).
• Similarity models based on curators produce a lower number of relations.
17. Experiment: Execution
Elicit related keywords from Twitter lists
• Input: data set
• Schema representation of keywords: based on curators, based on subscribers, or based on members
• Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
• Output: pairs of related keywords per schema representation and model (each keyword with its 5 most related keywords)
Characterise the semantics of the relations
• Ontological relations between keywords: SPARQL queries over general KBs published as Linked Data (DBpedia, OpenCyc, and UMBEL)
18. Experiment
• We anchor 63.77% of the keywords extracted from
Twitter Lists to DBpedia resources
19. Experiment
Vector-space model based on members (direct relations)

Relation type   %     Example of keywords
Broader Term    26%   life-science, biotech
subClassOf      26%   writers, authors
developer       11%   google, google_apps
genre           11%   funland, comedy
largest city    6%    houston, texas
Others          20%   -

Vector-space model based on subscribers (relations of length 3)

Linked data pattern (54.73%): x -> object <- y
Relations                     %        Object               Keywords
type, type                    67.35%   company              nokia, intel
subClassOf, subClassOf        30.61%   activities           philanthropy, fundraising

Linked data pattern (43.49%): x <- object -> y
Relations                     %        Object               Keywords
genre, genre                  12.43%   Aesthetica           theater, film
occupation, genre             10.27%   Adam Maxwell         fiction, writer
occupation, occupation        8.11%    Alina Tugend         poet, writer
product, product              7.57%    ChenOne              clothes, fashion
industry, product             9.73%    UserLand Softw.      blogs, internet
known for, occupation         5.41%    Adeline Yen Mah      author, writing
known for, known for          3.78%    Rebecca Watson       skeptics, atheist
main interest, main interest  3.24%    Aristotle            politics, government
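The two linked data patterns above can be expressed as SPARQL graph patterns. The sketch below only builds the query strings; it does not contact any endpoint. The function names and the example DBpedia resource URIs are assumptions chosen for illustration, and the queries actually used in the experiment may have differed.

```python
def _iri(uri):
    """Wrap a URI in angle brackets, rejecting characters illegal in a SPARQL IRI."""
    if not uri or any(ch in uri for ch in '<>" {}|\\^`'):
        raise ValueError(f"not a usable IRI: {uri!r}")
    return f"<{uri}>"

def shared_object_query(uri_x, uri_y, limit=10):
    """Pattern x -> object <- y: both resources point to the same object."""
    return (
        "SELECT DISTINCT ?relX ?object ?relY WHERE {\n"
        f"  {_iri(uri_x)} ?relX ?object .\n"
        f"  {_iri(uri_y)} ?relY ?object .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )

def shared_subject_query(uri_x, uri_y, limit=10):
    """Pattern x <- object -> y: one object points to both resources."""
    return (
        "SELECT DISTINCT ?object ?relX ?relY WHERE {\n"
        f"  ?object ?relX {_iri(uri_x)} .\n"
        f"  ?object ?relY {_iri(uri_y)} .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )

# Illustrative resource URIs for two keyword pairs from the tables above.
NOKIA = "http://dbpedia.org/resource/Nokia"
INTEL = "http://dbpedia.org/resource/Intel"
THEATRE = "http://dbpedia.org/resource/Theatre"
FILM = "http://dbpedia.org/resource/Film"

print(shared_object_query(NOKIA, INTEL))
print(shared_subject_query(THEATRE, FILM))
```

The resulting strings can be sent to any SPARQL endpoint that serves the data set; the bindings of `?relX` and `?relY` give the relation pair, and `?object` gives the shared resource.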
20. Conclusions
• Different models to elicit related keywords from Twitter lists.
• Curators, subscribers, and members, each with VSM and LDA
• Characterise the semantics of relations: WordNet-based similarity
measures and SPARQL queries over linked data sets
21. Conclusions
• Vector-space and LDA models based on members produce the results most
correlated with those of WordNet-based metrics.
• Shortest JC distance and highest WP similarities
• According to the path length in WordNet
• Models based on members produce more synonyms and direct is-a relations
• Most of the relations have path length ≥ 3 and have a common subsumer
• Depth of LCS
• Vector-space model based on subscribers finds the highest
number of specific relations (depth LCS ≥ 5 and 4 ≤ path length ≤ 0)
• We confirm these results according to linked data sets