These are the slides presenting our full paper titled "Extracting Emerging Knowledge from Social Media" at the WWW 2017 conference.
The work is based on a rather obvious assumption: knowledge in the world continuously evolves, and ontologies are largely incomplete with respect to low-frequency data, the so-called long tail.
Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and it immediately reflects the relevant changes behind which emerging entities hide.
In the paper we propose a method and a tool for discovering emerging entities by extracting them from social media.
After a very simple initialization by domain experts, the method finds emerging entities through a mixed syntactic and semantic approach. It starts from seeds, i.e., prototypes of emerging entities provided by the experts, and uses them to generate candidates. It then associates each candidate with a feature vector built from the terms occurring in its social content, ranks the candidates by their distance from the centroid of the seeds, and returns the top candidates as the result.
The method can be continuously or periodically iterated, using the results as new seeds.
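To make the ranking step concrete, here is a minimal sketch of the centroid-based ranking described above. It is an illustration under stated assumptions, not the tool's actual code: the function names, the plain term-count feature vectors, and the choice of cosine distance are all hypothetical (the paper evaluates several feature strategies and distance functions).

```python
import math
from collections import Counter


def to_vector(terms, vocabulary):
    """Term-count feature vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(terms)
    return [counts[t] for t in vocabulary]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_candidates(seed_terms, candidate_terms, top_k=10):
    """Rank candidates by distance from the seed centroid, closest first.

    seed_terms / candidate_terms map an entity name to the list of terms
    found in its social content.
    """
    vocabulary = sorted(
        {t for terms in seed_terms.values() for t in terms}
        | {t for terms in candidate_terms.values() for t in terms}
    )
    seed_centroid = centroid(
        [to_vector(terms, vocabulary) for terms in seed_terms.values()]
    )
    scored = [
        (name, cosine_distance(to_vector(terms, vocabulary), seed_centroid))
        for name, terms in candidate_terms.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]
```

To iterate the method, the names returned by `rank_candidates` could be added to `seed_terms` for the next round, which mirrors the "results as new seeds" idea.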
The PDF of the full paper presented at WWW 2017 is available online (open access with Creative Commons license).
You can also check out the slides of my presentation on Slideshare.
A demo version of the tool is available online for free use, thanks also to our partners Dandelion and Microsoft Azure.
You can TRY THE TOOL online if you want.
Extracting emerging knowledge from social media - WWW2017
1. Extracting Emerging Knowledge
from Social Media
Marco Brambilla, Stefano Ceri, Emanuele Della Valle,
Riccardo Volonterio, Felix Acero Salazar
marco.brambilla@polimi.it
marcobrambi
WWW 2017, Perth, Australia
7. There are more things
In heaven and earth, Horatio,
Than are dreamt of in your philosophy.
Shakespeare (Hamlet Act 1, scene 5)
8. The Answer to the Great Question...
Of Life, the Universe and Everything
[Figure: the Data, Information, Knowledge, Wisdom progression, plotted against context independence and understanding. Transitions: understanding relations, understanding patterns, understanding principles.]
9. Our focus: The Evolving Knowledge
[Figure: regions of knowledge labeled known, social, factoid, potentially emerging, potentially decaying, and actual and solid, with example points a, b, c, ¬c, d.]
10. Heaven and Heart
How to peer into the world
through an effective window?
TWO INGREDIENTS
Social media – the data
Domain experts – the context
11. Can we use social media to discover and codify
emerging knowledge?
16. Input (1): Domain Specific Types
Types selected by the expert
Relevant for the domain
17. Input (2): Seeds (emerging entities)
Known and selected by the domain expert
Belonging to an expert type
Thoroughly Described
18. Objectives
(1) Discover candidate unknown emerging entities
(2) Determine the relevance of the candidate
(3) Determine the type of the candidate
19. Step (1): Social Media Sourcing
Collect content produced by the seeds
20. Step (2): Candidate Extraction
Potentially any entity extracted from the social
streams of the seeds
Resulting in huge sets of candidates
Our hypothesis: take only social network users as candidates
21. Step (3): Candidate Pruning
Initial pruning of candidates based on
TF-DF := df * ttf / (N - df + 1) (*)
where: df = number of seeds with which a candidate co-occurs;
ttf = total number of times a candidate occurs in the analyzed content;
N = number of seeds.
Ranking + threshold
(*) A variant of TF-IDF that does not discount document frequency, because frequent appearance is actually desirable here (we are not looking for information entropy).
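The pruning step can be sketched as follows. This is a minimal illustration of the TF-DF formula given on the slide; the function names, the input layout, and the threshold value are assumptions made for the example, not taken from the tool.

```python
def tf_df(df, ttf, n_seeds):
    """TF-DF score: df * ttf / (N - df + 1).

    df      -- number of seeds the candidate co-occurs with (0 <= df <= N)
    ttf     -- total occurrences of the candidate in the analyzed content
    n_seeds -- N, the number of seeds
    The denominator is at least 1 because df cannot exceed N.
    """
    return df * ttf / (n_seeds - df + 1)


def prune(candidate_stats, n_seeds, threshold):
    """Keep candidates scoring at or above the threshold, best first.

    candidate_stats maps a candidate name to a (df, ttf) pair.
    """
    scored = [
        (name, tf_df(df, ttf, n_seeds))
        for name, (df, ttf) in candidate_stats.items()
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical statistics for three candidates, with N = 5 seeds.
stats = {"@a": (5, 40), "@b": (1, 3), "@c": (3, 10)}
print(prune(stats, n_seeds=5, threshold=1.0))
# [('@a', 200.0), ('@c', 10.0)]
```

Note how a candidate that co-occurs with every seed gets the smallest denominator, so a high document frequency raises the score instead of lowering it as in TF-IDF.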
22. Step (4): Candidate Description
Repeat social media sourcing for candidates
A potentially good candidate is one that behaves
similarly to one or more of the seeds
Our hypothesis: it talks about the same things
24. Step (6): Feature selection
Purely syntactic
only user handles (accounts)
handles and hashtags
Semantic:
based on entity extraction / DBpedia
based on deep learning on images / Clarifai
25. Step (6): Semantic Feature selection for text
9 basic strategies
Generating 18 combinations of T + E strategies
26. 990 semantic strategies evaluated
18 alternative feature vectors
11 different weighting values for aggregations
5 levels of recall for entity extraction
( + 3 different distance functions analyzed)
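The figure of 990 strategies follows from combining the three dimensions listed on the slide (18 x 11 x 5). The sketch below enumerates such a grid; the concrete weight values (0.0 to 1.0 in steps of 0.1) and the integer labels for feature vectors and recall levels are placeholders assumed for illustration, since the slide gives only the counts.

```python
from itertools import product

feature_vectors = range(18)              # 18 alternative feature vectors (placeholder ids)
weights = [i / 10 for i in range(11)]    # 11 weighting values (assumed 0.0 .. 1.0)
recall_levels = range(5)                 # 5 recall levels (placeholder ids)

# Every (feature vector, weight, recall level) triple is one strategy to evaluate.
strategies = list(product(feature_vectors, weights, recall_levels))
print(len(strategies))  # 990
```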
33. Challenges ahead
Repeatability in time (years!)
Recursion (candidates to seeds)
Multi-source data collection
Multiple types
Emerging relations
Emerging types
34. You can try it yourself!
http://datascience.deib.polimi.it/social-knowledge
35. THANKS!
QUESTIONS?
Marco Brambilla @marcobrambi marco.brambilla@polimi.it
http://datascience.deib.polimi.it http://home.deib.polimi.it/marcobrambi