The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
7. For content interpretation and complex filtering
tasks we want to know who/what people talk about.
7d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
8. Entity Linking
8d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
TANK (VEHICLE)
Knowledge
Base (KB)
Document r
TANK
query q
?
?
TANK JOHNSON
14. Challenges
1. Entity "importance"
2. Noisy & short text (Twitter), updates in the KB
14d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
15. Challenge 1:
Entity Importance
Q: When should an entity exist in Wikipedia?
A: When it is important or has impact
!
!
15d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
16. Challenge 1:
Entity Importance
Q: When should an entity exist in Wikipedia?
A: When it is important or has impact
!
Q: How do you know an entity is important or has impact?
A: If it is in Wikipedia, it is/has
16d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
17. Challenge 1:
Entity Importance
Can we leverage today's entities to learn
to predict tomorrow's entities?
17d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
18. Challenge 1:
Entity Importance
Can we leverage today's entities to learn
to predict tomorrow's entities?
18d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
74
19. Challenge 1:
Entity Importance
Can we leverage today's entities to learn
to predict tomorrow's entities?
19d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
74/140
20. Challenge 2:
Noisy data & changing KB
Unsupervised method for generating
pseudo-ground truth (for training NER)
20d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
21. Assumption
A named-entity recognizer trained only on
KB entities will learn to recognize KB entities
21d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
22. 22d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
23. 23d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
24. 24d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
25. 25d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
26. 26d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
27. 27d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
28. d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
29. 29d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
30. 30d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
31. 31d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Hahaha! Are we sure Jillert Anema isn't
Canadian? RT @rzbh: Dutch Coach's Anti-
America Rant http://on.cc.com/1htk9Wo
32. 32d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
33. 33d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
34. 34d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
35. 35d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
36. 36d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
product
organization
organization
37. 37d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
38. 38d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
NERC
Model
39. 39d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
NERC
Model
40. 40d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
NERC
Model
Predictions
m1
, c1
m2, c2
…
41. 41d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
NERC
Model
Predictions
m1
, c1
m2, c2
…
Today's
KB
small KB
42. 42d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Future
KB
Unlabeled Tweet
?
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
NERC
Model
Predictions
m1
, c1
m2, c2
…
Today's
KB
small KB
43. Future
KB
43d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Unlabeled Tweet
?
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
NERC
Model
Predictions
m1
, c1
m2, c2
…
Today's
KB
full KB
small KB
44. 44d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
m1
, e1
m2
, e2
Unlabeled Tweet
?
Sample
Corpus
Training data
m1
, c1
m2
, c2
NERC
NERC
Model
Predictions
m1
, c1
m2, c2
…
Today's
KB
Future
KB
45. 45d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
Unlabeled Tweet
?
NERC
Model
Predictions
m1
, c1
m2, c2
…
Today's
KB
Future
KB
Ground Truth
m1
, c1
m2
, c2
…
46. 46d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet Entity
Linker
Unlabeled Tweet
?
NERC
Model
Predictions
m1
, c1
m2, c2
…
Today's
KB
Future
KB
Ground Truth
m1
, c1
m2
, c2
…
Evaluate
47. Evaluation
• Mention level (NER style)
• Entity level
!
47d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
48. 48d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Evaluation
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
Prediction
• Mention level (NER style)
• Entity level
!
49. • Mention level (NER style)
• Entity level
!
49d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Evaluation
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
Prediction
Ground Truth
50. 50d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Evaluation
• Mention level (NER style)
• Entity level
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
This is like IBM buying Apple after
the Homebrew Computing Club
demo of the Apple I.
Prediction
Ground Truth
51. 51d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweets Entities Tweets EntitiesPrediction
Ground Truth
52. 52d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Tweets Entities Tweets EntitiesPrediction
Ground Truth
53. Experimental setup
Data:
Corpus: Twitter (TREC'11 MB: 4,832,838 tweets)
KB: Wikipedia (Jan 4th, 2012)
!
Components:
EL: Semanticizer
NERC: Custom
53d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
55. NERC
Two-stage approach [1]
1. Recognition
• Predict entity span
• For each token predict B, I, or O tag.
• Structured perceptron
2. Classification
• Given entity span, predict entity class (PER/LOC/ORG)
• SVMs
55d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
[1] L. Buitinck and M. Marx. Two-stage named-entity recognition using averaged perceptrons. In NLDB'12, 2012.
56. NERC
Two-stage approach [1]
1. Recognition
• Predict entity span
• For each token predict B, I, or O tag.
• Structured perceptron
2. Classification
• Given entity span, predict entity class (PER/LOC/ORG)
• SVMs
56d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
[1] L. Buitinck and M. Marx. Two-stage named-entity recognition using averaged perceptrons. In NLDB'12, 2012.
"ASAP Rocky and Kendrick Lamar,
that's when I started listening again"
57. NERC
Two-stage approach [1]
1. Recognition
• Predict entity span
• For each token predict B, I, or O tag.
• Structured perceptron
2. Classification
• Given entity span, predict entity class (PER/LOC/ORG)
• SVMs
57d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
[1] L. Buitinck and M. Marx. Two-stage named-entity recognition using averaged perceptrons. In NLDB'12, 2012.
"ASAP Rocky and Kendrick Lamar,
that's when I started listening again"
B I O B I
O O O O O O
58. NERC
Two-stage approach [1]
1. Recognition
• Predict entity span
• For each token predict B, I, or O tag.
• Structured perceptron
2. Classification
• Given entity span, predict entity class (PER/LOC/ORG)
• SVMs
58d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
[1] L. Buitinck and M. Marx. Two-stage named-entity recognition using averaged perceptrons. In NLDB'12, 2012.
"ASAP Rocky and Kendrick Lamar,
that's when I started listening again"
person person
59. 59d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
m1
, e1
m2
, e2
Future
KB
NERC
Tweet
NERC
Model
Unlabeled Tweet
?
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Predictions
m1
, c1
m2, c2
…
Entity
Linker
Sample
Corpus
Training data
m1
, c1
m2
, c2
60. From tweet to training sample
1. Convert EL output (Wikipedia concepts) to NERC labels;
• Label entity span (B-I-O) & class (PER/LOC/ORG)
!
2. Pick "good" samples
• entity linker's confidence score
• textual quality
60d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
62. Entity Class
1. Map Wikipedia entity to DBpedia entity
2. Retrieve entity class (ontology);
• if Person: PER
• if Organisation, Company, or Non-ProfitOrganisation: ORG
• if Place, PopulatedPlace, City, Country: LOC
• …?
62d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
63. Sampling Methods
1. Entity linker confidence score
2. Textual quality
63d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
64. Sampling 1: Confidence Score
• Extract anchor text (a) to Wikipedia page (W)-mappings
• Confidence score combines two signals:
1. How common is it that a is used as a link
2. How commonly is a used as a link to W
64d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
65. Sampling 1: Confidence Score
• Higher threshold = fewer entities, less noise
• Lower threshold = fewer entities, more noise
65d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
66. Sampling 2: Textual quality
66d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
67. Sampling 2: Textual quality
67d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Highest scoring tweets
1. Watching the History channel, Hitler’”⁹s
Family. Hitler hid his true family
heritage, while others had to measure
up to Aryan purity.
2. When you sense yourself becoming
negative, stop and consider what it
would mean to apply that negative
energy in the opposite direction.
3. So. After school tomorrow, french
revision class. Tuesday, Drama
rehearsal and then at 8, cricket
training. Wednesday, Drama. Thursday
… (c)
Lowest scoring tweets
1. Toni Braxton ~ He Wasn't Man
Enough for Me _HASHTAG_
_HASHTAG_? _URL_ RT _Mention_
2. tell me what u think The GetMore
Girls, Part One _URL_
3. this girl better not go off on me rt
68. Sampling 2: Textual quality
• Compare different sampling strategies;
• top tweets
• medium tweets
• medium+top tweets
• low+medium+top tweets (no sampling)
68d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
70. 70d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
m1
, e1
m2
, e2
NERC
NERC
Model
Unlabeled Tweet
?Entity
Linker
Future
KB
Tweet
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus Sample
Corpus
Predictions
m1
, c1
m2, c2
…
RQ1:
What is the impact of our sampling methods for
generating
pseudo-ground truth?
Training data
m1
, c1
m2
, c2
72. Findings: EL confidence score threshold
2. Higher threshold, more predictions
72d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
73. Findings: Textual Quality Sampling
73d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
74. 74d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
Future
KB
Tweet
Unlabeled Tweet
?
NERC
Model
NERC
Training data
m1
, c1
m2
, c2
Sample
Corpus
m1
, e1
m2
, e2
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
Tweet
Corpus
RQ2:
What is the impact of the size of prior knowledge
on detecting unknown entities?
Entity
Linker
Today's
KB
Predictions
m1
, c1
m2, c2
…
76. Conclusions
Recall increases as amount of prior knowledge grows:
1. Able to deal with missing labels, justifying approach
2. Rate of unknown entity detection increases as KB grows
76d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media
77. Future Work
• Next step: Closing the loop
• Feed back to KB (entity normalization)
• From PER/LOC/ORG entities to other classes:
• Books, buildings, drugs, artists, …?
• Apply to other domains, languages
• From random sampling to time-based sampling
77d.p.graus@uva.nl | @dvdgrs | www.graus.co April 15 2014 | ECIR ‘14 | Mining Social Media