This document summarizes Besnik Fetahu's PhD defense presentation on approaches for improving and enriching textual knowledge bases like Wikipedia. It discusses what a textual knowledge base is using Wikipedia as an example. It then covers Wikipedia's editor collaboration dynamics and growth rates. The presentation addresses challenges like finding news citations to support statements in Wikipedia, determining the span of citations, and suggesting new information for Wikipedia pages from news articles. The thesis contributions include methods for citation recommendation, citation span identification, and news suggestion to address gaps in Wikipedia coverage.
3. Wikipedia as a textual knowledge base
[Figure: a Wikipedia article ("University of Hannover"), annotated with the labels Infobox, Section, Template, and Section Text.]
Wikipedia is a free online encyclopedia that aims to allow anyone to edit articles. It is the largest and most popular general reference work on the Internet and is ranked the fifth most popular website. Wikipedia is owned by the nonprofit Wikimedia Foundation.
4. Wikipedia Editor Collaboration Dynamics
• Wikipedia: ~40 million articles across 293 localized Wikipedias (languages)
• Wikipedia editors: 32 million (English Wikipedia only)
• Editor profiles, e.g. lang: {English}, topic: {Education, Politics}
Wikipedia revisions (example revision history):
• 02:51, 5 October 2017, Brilliantwiki2: 14,516 bytes (+69), →Rankings
• 00:30, 21 August 2017, Blueclaw: 14,447 bytes (+83), →Alumni: added Flügge-Lotz
• 09:03, 18 June 2017, 77.23.196.148: 14,364 bytes (-2), →History: pupils -> students, today and now in same sentence corrected
• 05:35, 10 June 2017, AnomieBOT: 14,366 bytes (+319), Rescuing orphaned refs ("Mitarbeiter und Etat" from rev 782668206; "Studierende" from rev 782668206)
• 01:24, 10 June 2017, Mephistolus (minor edit): 14,047 bytes (+9), Tag: Visual edit
• 01:21, 10 June 2017, Mephistolus: 14,038 bytes (-89), Update infobox, Tag: Visual edit
5. Wikipedia Dynamics and Growth
• Wikipedia and its sister projects develop at a rate of over 10 edits per second, performed by editors from all over the world.
• English Wikipedia has an average growth rate of 600 new articles per day.
[Figure: Wikipedia's daily growth rate.]
6. Editorial Policies in Wikipedia
• Wikipedia is written from a neutral point of view.
• Content in Wikipedia must be verifiable. The burden of evidence lies with the editor who adds content to a page.
• No original research: content (such as facts, allegations, and ideas) for which no reliable, published sources exist must not be added.
10. Importance and Use of Wikipedia
Rank  Site           Daily time on site  Daily pageviews per visitor  % of traffic from search  Total sites linking in
1     google.com     8:02                8.93                         4.30%                     3.56 mill.
2     youtube.com    8:27                4.98                         15.40%                    2.69 mill.
3     facebook.com   9:48                4.01                         8.30%                     7.6 mill.
4     baidu.com      7:56                6.36                         8.50%                     1.3 mill.
5     wikipedia.org  4:11                3.28                         68.40%                    1.7 mill.
12. From Wikipedia to Structured Data and Search
• Web search (Google Knowledge Cards)
• Voice search
13. Use Cases: Question Answering
Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes: Reading Wikipedia to Answer Open-Domain Questions. ACL 2017.
Q: When did the 1973 oil crisis begin?
A: October 1973
15. Issues with Verifiability in Wikipedia
https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia
17. Issues with Verifiability in Wikipedia
Unreliable news source:
https://www.theguardian.com/technology/2017/feb/08/wikipedia-bans-daily-mail-as-unreliable-source-for-website
18. Issues with Verifiability in Wikipedia
Question: Where was Obama born?
Passage 1: In 2012, Breitbart.com published a copy of a promotional booklet that Obama's literary agency, Acton & Dystel, printed in 1991 (and later posted to their website, in a biography in place until April 2007) which misidentified Obama's birthplace and states that Obama was "born in Kenya and raised in Indonesia and Hawaii."
Passage 2: Obama was born on August 4, 1961, at Kapiʻolani Maternity & Gynecological Hospital in Honolulu, Hawaii. He is the first President to have been born in Hawaii, making him the first President born outside of the contiguous 48 states. He was born to a white mother and a black father. His mother, Ann Dunham (1942–1995), was born in Wichita, Kansas, of mostly English descent, with some German, Irish, Scottish, Swiss, and Welsh ancestry.
Extracted answers: "Kenya" (passage 1); "Kapiʻolani Maternity & Gynecological Hospital in Honolulu, Hawaii" (passage 2).
Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi: Bidirectional Attention Flow for Machine Comprehension. CoRR abs/1611.01603 (2016)
20. Misalignment of Editor Efforts in Wikipedia
"Odisha Cyclone" vs. "Hurricane Katrina":
• Human fatalities: 10k vs. 1.8k
• Estimated damages: $4.5 billion vs. $108 billion
• "Odisha Cyclone" has no coverage or mention in the Wikipedia article "Odisha"
• "Hurricane Katrina" has broad coverage in the Wikipedia article "New Orleans"
22. Challenges and Contributions
• For an arbitrary statement in Wikipedia, how can we find citations that provide evidence for it?
• For a paragraph in Wikipedia and an existing citation, how can we determine the exact span of the citation?
• For a Wikipedia page and a given news corpus, how can we find and suggest important and novel information for the page?
23. Part (I): Finding news citations for Wikipedia entity pages
Besnik Fetahu, Katja Markert, Wolfgang Nejdl, Avishek Anand: “Finding News Citations for Wikipedia”. CIKM 2016: 337-346
24. News Collection
[Figure: overview of the tasks that connect a news collection and a textual knowledge base, each evolving over time (t1, t2, ..., tn): Citation Recommendation, Citation Span, and News Suggestion (Entity Placement and Section Placement).]
News article fields: publish date t, headline, body, entity mentions (e.g. “Barack Obama”, “Nobel Prize”).
Entity page fields: revision date t, entity title, sections, section text, categories, citations.
Example for the entity e: “Barack Obama”:
• Citation recommendation: a statement s1 is used as a query against the news collection ("news? query for s1") to find a citation such as c4. Statement: "Obama was born on August 4, 1961, at Kapiʻolani Maternity & Gynecological Hospital in Honolulu, Hawaii."
• Citation span: for citation c4, determine the text it covers ("citation c4 span").
• News suggestion: given e: “Barack Obama” at revision t2 and a news article nk published after t2, decide the entity and section placement. News excerpt: "The choice of Barack Obama on Friday as the recipient of the 2009 Nobel Peace Prize, [...] around the globe. [...] The Nobel committee’s embrace of Mr. Obama was viewed [as a rejection of the unpopular tenure, in] Europe especially, of his predecessor, George W. Bush. [...] “To be honest,” the president said in the Rose Garden, [...] Last year’s laureate, former President Martti Ahtisaari of Finland, saw the award as an endorsement of Mr. Obama’s goal of achieving Middle East peace."
Sections at t2: 1. Early life and career; 2. Political career; 3. 2008 presidential campaign; 4. Presidency; 5. Political positions; 6. Family and Personal life; 7. Cultural and political image.
Sections at t3: 1. Early life and career; 2. Political career; 3. 2008 presidential campaign; 4. Presidency; 5. Political positions; 6. Nobel Peace Prize; 7. Family and Personal life; 8. Cultural and political image.
27. Finding news citations for Wikipedia entity pages
Pipeline: Wikipedia articles → section chunking → statement extraction.
Example sections: 1. Early life and career; 2. Presidential campaigns; 3. Presidency (2009-2017); 4. Post-presidency (2017-present); 5. Legacy; 6. Books written.
Example section text:
Obama was born on August 4, 1961,[c] at Kapiʻolani Maternity & Gynecological Hospital in Honolulu, Hawaii.[c] He is the first President to have been born in Hawaii,[c] making him the first President born outside of the contiguous 48 states.[c] […] His father, Barack Obama Sr. (1936–1982), was a married Luo Kenyan man from Nyang'oma Kogelo. Obama's parents met in 1960 in a Russian language class at the University of Hawaii at Manoa, where his father was a foreign student on scholarship.[c] […]
Extracted statement:
Obama's parents met in 1960 in a Russian language class at the University of Hawaii at Manoa, where his father was a foreign student on scholarship.[c]
Find a citation for the statement!
• Task#1: Statement Categorization. Does the statement require a news citation? Yes.
• Task#2: Citation Discovery.
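The statement extraction step can be illustrated with a minimal sketch. It assumes citation markers look like `[c]` or `[12]`, and it takes the sentence directly in front of each marker as the statement; the function name `extract_statements` and the naive sentence splitting are illustrative assumptions, not the procedure used in the thesis.

```python
import re

# Assumption: citation markers appear as [c] or [<number>], possibly stacked.
CITATION_MARKERS = re.compile(r"(?:\[(?:c|\d+)\])+")
# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def extract_statements(paragraph):
    """Return (statement, markers) pairs, one per citation marker group.

    The statement is the last sentence (or sentence fragment) that
    precedes the marker group; uncited sentences in between are skipped.
    """
    statements = []
    start = 0
    for match in CITATION_MARKERS.finditer(paragraph):
        segment = paragraph[start:match.start()].strip()
        start = match.end()
        if not segment:
            continue
        sentences = SENTENCE_BOUNDARY.split(segment)
        statements.append((sentences[-1], match.group()))
    return statements
```

Abbreviations such as "Sr." would be treated as sentence boundaries by this splitter, so a real pipeline would need a proper sentence segmenter.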
30. Why Statement Categorization?
Citation types in Wikipedia:
type              description
arXiv             arXiv preprints
AV media          audio and visual
AV media notes    audio and visual liner notes
bioRxiv           bioRxiv preprints
book              books
conference        conference papers
encyclopedia      edited collections
episode           radio or television collections
interview         interviews
journal           academic journals and papers
magazine          magazines, periodicals
mailing list      public mailing lists
map               maps
news              news articles
newsgroup         online newsgroups
podcast           audio or video podcast
press release     press releases
report            reports
serial            audio or video serials
sign              signs, plaques
speech            speeches
techreport        technical reports
thesis            theses
web               any resource accessible through a […]
[Figure: Citation Types in Wikipedia. Bar chart on a 0 to 1 axis showing the citation types news, book, court, journal, web, and thesis for 47 entity types, from ComicsCreator to Location.]
Besnik Fetahu, Abhijit Anand, Avishek Anand: “How much is Wikipedia Lagging Behind News?” WebSci 2015: 28:1-28:9
• Citations of type web and news account for the absolute majority of citations in Wikipedia
• Citations of type news are considered a “reliable, published source”
• Depending on the context and the information added to Wikipedia, different citation types are preferred
31. Why Statement Categorization?
Statement: Obama emphasized issues of rapidly ending the Iraq War, increasing energy independence, and reforming the health care system,[1] in a campaign that projected themes of hope and change.[2]
cite type = “news”
1. “Barack Obama on the Issues: What Would Be Your Top Three Overall Priorities If Elected?". The Washington Post.
2. “The Obama promise of hope and change". The Independent. London. November 1, 2008.
Statement: On June 3, 2008, Senator Obama (along with Senators Tom Carper, Tom Coburn, and John McCain) introduced follow-up legislation: Strengthening Transparency and Accountability in Federal Spending Act of 2008.[1]
cite type = “report”
1. "S. 3077: Strengthening Transparency and Accountability in Federal Spending Act of 2008: 2007–2008 (110th Congress)". Govtrack.us. June 3, 2008.
Statement: In mid-1988, he traveled for the first time in Europe for three weeks and then for five weeks in Kenya, where he met many of his paternal relatives for the first time.[1][2]
cite type = “book”
1. Obama, Auma (2012). And then life happens: a memoir. New York: St. Martin's Press. pp. 189–208, 212–216. ISBN 978-1-250-01005-6.
36. Task#1: Statement Categorization
For a given Wikipedia statement, categorize it through a supervised model into one of the predefined citation types (e.g. “news”, “web”, etc.).
Example statement: Obama emphasized issues of rapidly ending the Iraq War, increasing energy independence, and reforming the health care system,[c] in a campaign that projected themes of hope and change.[c]
Pipeline: feature extraction (Wikipedia language style, Wikipedia entity structure) → feature representation → multi-class classification.
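To make the multi-class setup concrete, here is a minimal sketch of a supervised statement categorizer. It is a stand-in only: it uses a bag-of-words multinomial Naive Bayes written with the standard library, whereas the presentation relies on richer language-style and entity-structure features. The class name `NaiveBayesCategorizer` and the toy training statements are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes over bag-of-words features (Laplace smoothing)."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, statements, citation_types):
        for statement, citation_type in zip(statements, citation_types):
            tokens = tokenize(statement)
            self.class_counts[citation_type] += 1
            self.token_counts[citation_type].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, statement):
        tokens = tokenize(statement)
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_type, best_score = None, -math.inf
        for citation_type, count in self.class_counts.items():
            counts = self.token_counts[citation_type]
            denominator = sum(counts.values()) + vocab_size
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(count / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_type, best_score = citation_type, score
        return best_type


# Toy training data (hypothetical labels, for illustration only).
TRAIN = [
    ("Obama announced his candidacy for President in Springfield", "news"),
    ("The senator introduced follow-up legislation on federal spending", "report"),
    ("He traveled to Kenya and met his paternal relatives", "book"),
    ("The campaign projected themes of hope and change", "news"),
]
model = NaiveBayesCategorizer().fit([s for s, _ in TRAIN], [t for _, t in TRAIN])
```

Calling `model.predict(...)` on a new statement returns one of the citation types seen during training; with realistic data the classifier and features would be replaced by the ones described in the presentation.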
45. Task#2: Citation Discovery
For a Wikipedia statement that requires a citation of type news, find one or more news articles as citations from a given news corpus.
Wikipedia statement: On February 10, 2007, Obama announced his candidacy for President of the United States in front of the Old State Capitol building in Springfield, Illinois.
Pipeline: Wikipedia statement → query → news index → top-k retrieval → ranked news (1: doc1, 2: doc2, 3: doc3, ..., 100: doc100) → citation discovery.
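The query and top-k retrieval step can be sketched as follows. The slides do not name the retrieval model, so this sketch assumes a simple TF-IDF-weighted term overlap over an in-memory corpus; `rank_news` and the sample articles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_news(statement, articles, k=100):
    """Return up to k (article_id, score) pairs, best match first.

    `articles` maps an article id to its text. The statement is the query.
    """
    doc_tokens = {aid: Counter(tokenize(text)) for aid, text in articles.items()}
    n_docs = len(doc_tokens)
    document_frequency = Counter()
    for counts in doc_tokens.values():
        document_frequency.update(counts.keys())

    query_terms = set(tokenize(statement))
    scored = []
    for aid, counts in doc_tokens.items():
        # Term frequency times a smoothed inverse document frequency.
        score = sum(
            counts[term] * math.log(1 + n_docs / document_frequency[term])
            for term in query_terms
            if term in counts
        )
        if score > 0:
            scored.append((aid, score))
    # Sort by descending score; break ties by article id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

The ranked list produced here corresponds to the "ranked news" candidates that the citation discovery model then classifies.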
51. Task#2: Citation Discovery Properties
Each candidate pairs the Wikipedia statement with a sentence from a candidate news article.
Wikipedia statement: On February 10, 2007, Obama announced his candidacy for President of the United States in front of the Old State Capitol building in Springfield, Illinois.
News sentence: Speaking in a single-digit, morning chill and sunshine to thousands of supporters outside the Old State Capitol, the first-term Democratic senator delivered an address that built upon his biography as a community organizer in Chicago, state legislator and U.S. senator to call for quick action on issues ranging from bringing a close to the Iraq war to the need for universal health care and an end to foreign-oil dependence.
Entailment:
• Lexical similarity between sentence and statement
• Language model similarity
• TreeKernel similarity
• Query and news article
Centrality:
• TextRank for measuring sentence centrality in a news article
• Entailment feature scores w.r.t. the most central sentence in a news article
Authority:
• Entity type specific news citation suggestion
• Authority of news domains on specific entity types
Properties of a good citation:
1. the statement should be entailed by the news article
2. the statement is central in the news article
3. the cited news article should be from an authoritative source
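Two of the feature groups above can be illustrated with a small sketch: a lexical similarity between statement and sentence, and a TextRank-style centrality score per sentence. Using Jaccard overlap as the edge weight and a fixed number of power iterations are assumptions made here for brevity; the exact similarity functions used in the thesis are not given in the slides.

```python
import re


def token_set(sentence):
    """Set of lowercase alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sentence_centrality(sentences, damping=0.85, iterations=50):
    """TextRank-style scores: weighted PageRank over a sentence similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [token_set(s) for s in sentences]
    weights = [
        [jaccard(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

The sentence with the highest score is the "most central sentence" against which the entailment features can then be computed.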
53. Evaluation Datasets
Task#1: Statement Categorization Data
• 6.9 million Wikipedia statements
• 8.8 million citations to external references
• 1.6 million Wikipedia entities
• 1.88 million news articles cited from Wikipedia statements
Task#2: Citation Discovery Data
• 20 million news articles from a real-world news collection (GDelt), between 2013 and 2015
• 27k news articles cited from Wikipedia statements within the time range of GDelt
GDelt domain stats:
news domain      news articles
yahoo.com        1244781
allafrica.com    1035646
reuters.com      828133
dailymail.co.uk  815372
indiatimes.com   743991
wn.com           587607
[Figure: Wikipedia statement distribution by citation type.]
54. Task#1: Statement Categorization Results
[Figure: YAGO type hierarchy (e.g. yagoLegalActorGeo).]
Statement categorization results based on Random Forests, with separate models per entity type:
                                  1 ≤ τ ≤ 10          10 < τ ≤ 50         50 < τ ≤ 90
Parent type     Child type        P     R     F1      P     R     F1      P     R     F1
owl:Thing       LegalActorGeo     0.48  0.36  0.41    0.51  0.43  0.47    0.53  0.47  0.50
LegalActorGeo   LegalActor        0.51  0.34  0.41    0.54  0.41  0.47    0.56  0.45  0.50
LegalActorGeo   location          0.30  0.29  0.29    0.34  0.40  0.37    0.36  0.45  0.40
location        region            0.30  0.28  0.29    0.35  0.40  0.37    0.37  0.44  0.40
location        point             0.30  0.10  0.14    0.38  0.22  0.28    0.39  0.26  0.32
LegalActor      person            0.53  0.36  0.43    0.56  0.43  0.49    0.58  0.46  0.51
person          preserver         0.63  0.31  0.42    0.67  0.46  0.54    0.67  0.49  0.57
person          authority         0.53  0.20  0.29    0.62  0.24  0.35    0.65  0.33  0.44
person          contestant        0.59  0.43  0.50    0.62  0.52  0.57    0.64  0.56  0.60
person          leader            0.53  0.26  0.34    0.59  0.34  0.43    0.61  0.37  0.46
person          wc Living people  0.55  0.37  0.44    0.58  0.44  0.50    0.59  0.47  0.52
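The F1 columns in the results follow from precision (P) and recall (R) as their harmonic mean. The short check below recomputes two rows; the helper name `f1_score` is introduced here for illustration. Because the reported P and R are already rounded to two decimals, a recomputed F1 can differ from the reported value in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows owl:Thing -> LegalActorGeo and person -> preserver, first column group.
legal_actor_geo_f1 = round(f1_score(0.48, 0.36), 2)
preserver_f1 = round(f1_score(0.63, 0.31), 2)
```

Here `legal_actor_geo_f1` and `preserver_f1` reproduce the reported 0.41 and 0.42.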
60. Task#2: Citation Discovery Results
• B1: the top-1 retrieved article from the retrieval model.
• B2: a supervised model based on the rank and similarity score from the search engine.
• E1: our approach with entailment, centrality, and authority features, where the correct citations for a statement are the news articles originally cited by that statement in the Wikipedia page.
• E1+FP: similar to E1, but false positives that have very high similarity to ground-truth articles are also considered relevant.
• E2: builds on top of E1+FP by additionally assessing the relevance of FP articles below the similarity threshold.
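The E1+FP labeling rule can be sketched as a relevance check. The similarity measure (Jaccard over tokens) and the threshold value 0.8 are assumptions chosen for illustration; the slides only state that false positives with very high similarity to ground-truth articles count as relevant.

```python
import re


def token_set(text):
    """Set of lowercase alphanumeric tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(text_a, text_b):
    """Jaccard overlap between the token sets of two texts."""
    a, b = token_set(text_a), token_set(text_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_relevant(candidate, ground_truth_articles, threshold=0.8):
    """E1+FP-style label: exact ground truth or a near-duplicate of it.

    `threshold` is a hypothetical value, not one reported in the slides.
    """
    return any(
        candidate == truth or jaccard(candidate, truth) >= threshold
        for truth in ground_truth_articles
    )
```

Under E1 only the exact ground-truth articles would be accepted; E1+FP relaxes this to near-duplicates above the threshold.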
61. Part (I): Conclusion
• Specific citation types are preferred based on the statement, its context, and the Wikipedia page
• Statement categorization works fairly well for some entity types
• Challenging to distinguish between citation type web and news
• Citation discovery can be performed accurately across all entity types
62. Part (II): Fine Grained Citation Span for References in Wikipedia
Besnik Fetahu, Katja Markert, Avishek Anand: “Fine Grained Citation Span for References in Wikipedia”. EMNLP 2017: 1980-1989
65. Citation Span Cases
Citation marker placed at a sub-sentence level:
Obama was born on August 4, 1961,[5] at Kapiʻolani Maternity & Gynecological Hospital in Honolulu, Hawaii.[6][7][8]
Citation marker placed at the end of a sentence:
On February 10, 2007, Obama announced his candidacy for President of the United States in front of the Old State Capitol building in Springfield, Illinois.[158][159] […] Obama emphasized issues of rapidly ending the Iraq War, increasing energy independence, and reforming the health care system,[161] in a campaign that projected themes of hope and change.[162]
Citation marker placed after multiple sentences in a paragraph:
At the Democratic National Convention in Charlotte, North Carolina, Obama and Joe Biden were formally nominated by former President Bill Clinton as the Democratic Party candidates for president and vice president in the general election. Their main opponents were Republicans Mitt Romney, the former governor of Massachusetts, and Representative Paul Ryan of Wisconsin.[183]
67. Citation Span Task
40
Citing
Paragraph
He was reelected to the Illinois Senate in 1998, defeating Republican Yesse Yehudah in the
general election, and was re-elected again in 2002.[117] In 2000, he lost a Democratic primary
race for Illinois's 1st congressional district in the United States House of Representatives to four-
term incumbent Bobby Rush by a margin of two to one.[118]
Textual Fragments
Chunk the citing paragraph into text fragments at punctuation symbols.
Citing Span
The citation span for reference [117] is the set of text fragments in the citing paragraph that are covered by that citation.
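The chunking step on this slide (splitting the citing paragraph at punctuation symbols) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the helper name `chunk_paragraph`, the regular expression, and the choice to keep citation markers such as [117] attached to the fragment they follow are all assumptions. A pattern this simple would also split decimal numbers and abbreviations.

```python
import re

# Assumption: a fragment is a run of text up to and including the next
# punctuation symbol, plus any citation markers such as [117] that follow it.
FRAGMENT = re.compile(r"[^,.;:!?]+[,.;:!?]*(?:\[\d+\])*")


def chunk_paragraph(paragraph):
    """Split a citing paragraph into text fragments at punctuation symbols."""
    fragments = (m.group(0).strip() for m in FRAGMENT.finditer(paragraph))
    # Drop whitespace-only leftovers (e.g. trailing spaces).
    return [f for f in fragments if f]


paragraph = (
    "He was reelected to the Illinois Senate in 1998, defeating Republican "
    "Yesse Yehudah in the general election, and was re-elected again in "
    "2002.[117] In 2000, he lost a Democratic primary race for Illinois's "
    "1st congressional district in the United States House of "
    "Representatives to four-term incumbent Bobby Rush by a margin of two "
    "to one.[118]"
)
fragments = chunk_paragraph(paragraph)
for fragment in fragments:
    print(fragment)
```

Each printed line is one candidate fragment; the span task then decides, per fragment, whether it is covered by the target citation.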
70. Citation Span Approach
41
Determine span for citation c = [117]
Extract features for each text fragment
Paragraph Structure
• Citations other than c
• Same sentence as c
• Same sentence as previous text fragment
• Distance in terms of text fragments to c
Citation Features
• Language models per paragraph in the cited document
• Language model similarity between a fragment and each paragraph’s LM
Discourse/Temporal Features
• Explicit discourse sense annotation of sentences
• Fragments in a sentence with an explicit discourse sense (e.g. comparison) are likely to have the same label
• Fragments with different time points are unlikely to have the same label
Plain Classification
Sequence Classification (linear-chain CRF)
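As an illustration of the paragraph-structure features listed on this slide, the sketch below computes them for each text fragment relative to a target citation c. The `Fragment` class, the feature names, and the use of an absolute (unsigned) fragment distance are assumptions made for this example. The citation, discourse, and temporal features are not covered.

```python
import re
from dataclasses import dataclass

CITATION = re.compile(r"\[(\d+)\]")


@dataclass
class Fragment:
    text: str
    sentence_id: int  # index of the sentence the fragment belongs to


def structure_features(fragments, target):
    """Paragraph-structure features of every fragment w.r.t. citation `target`."""
    # Position of the fragment that carries the target citation marker.
    target_idx = next(
        i for i, f in enumerate(fragments) if target in CITATION.findall(f.text)
    )
    target_sentence = fragments[target_idx].sentence_id
    features = []
    for i, frag in enumerate(fragments):
        markers = CITATION.findall(frag.text)
        features.append({
            "has_other_citation": any(m != target for m in markers),
            "same_sentence_as_c": frag.sentence_id == target_sentence,
            "same_sentence_as_prev": i > 0
            and frag.sentence_id == fragments[i - 1].sentence_id,
            "distance_to_c": abs(i - target_idx),  # assumption: unsigned
        })
    return features


fragments = [
    Fragment("He was reelected to the Illinois Senate in 1998,", 0),
    Fragment("defeating Republican Yesse Yehudah in the general election,", 0),
    Fragment("and was re-elected again in 2002.[117]", 0),
    Fragment("In 2000,", 1),
    Fragment("he lost a Democratic primary race ... two to one.[118]", 1),
]
features = structure_features(fragments, "117")
```

The resulting dictionaries could be fed to either a per-fragment classifier or a sequence model; which library to use for that step is left open here.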
72. Citation Span Dataset
43
• 500 citing paragraphs, pointing to either web or news citations
• Manual annotation of whether each textual fragment is explicitly
supported or implied by the corresponding citation
• High inter-rater agreement on a 10% sample, with κ = 0.84
span      news                              web
          dist.   skip frag.  skip sent.    dist.   skip frag.  skip sent.
<= 0.5    11%     6%          -             6.8%    -           -
(.5,1]    63%     -           -             63%     -           1%
(1,2]     17%     -           8%            14%     -           19%
(2,5]     7%      5%          18%           13.1%   -           21%
> 5       1.8%    -           20%           3.1%    -           67%
73. Evaluation Metrics
44
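Mean Average Precision (MAP):
$$\mathrm{MAP} = \frac{1}{|N|} \sum_{p \in N} \frac{|S' \cap S_t|}{|S'|}$$
Recall (R):
$$R = \frac{1}{|N|} \sum_{p \in N} \frac{|S' \cap S_t|}{|S_t|}$$
Erroneous Span (word and text-fragment level), Δ; the word-level variant Δ_w:
$$\Delta_w = \frac{1}{|N|} \sum_{p \in N} \frac{\sum_{\delta \in S' \setminus S_t} \mathrm{words}(\delta)}{\sum_{\delta \in S_t} \mathrm{words}(\delta)}$$
• S': text fragments marked as covered by one of the approaches.
• S_t: text fragments marked as covered by the citation in our ground-truth.
• words(δ): number of words in a text fragment δ.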
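A small sketch of how these metrics could be computed, assuming each paragraph is given as its list of fragments plus two index sets: the predicted span S' and the ground-truth span S_t. Two points are assumptions of this example rather than statements from the slides: the erroneous-span numerator is taken over fragments in S' that are not in S_t, and a paragraph with an empty prediction contributes zero precision. The function name `span_metrics` is hypothetical, and the ground-truth span is assumed to be non-empty.

```python
def span_metrics(paragraphs):
    """Average precision-style score, recall, and word-level erroneous span.

    `paragraphs` is a list of (fragments, predicted, truth) triples, where
    `fragments` is a list of strings and the other two are sets of indices.
    """
    n = len(paragraphs)
    precision = recall = delta_w = 0.0
    for fragments, predicted, truth in paragraphs:
        overlap = predicted & truth
        if predicted:  # assumption: empty prediction scores zero precision
            precision += len(overlap) / len(predicted)
        recall += len(overlap) / len(truth)
        # Words in fragments that were predicted but are not in the truth.
        extra_words = sum(len(fragments[i].split()) for i in predicted - truth)
        truth_words = sum(len(fragments[i].split()) for i in truth)
        delta_w += extra_words / truth_words
    return precision / n, recall / n, delta_w / n


paragraphs = [
    (["a b", "c d e"], {0, 1}, {0}),  # one extra fragment of three words
    (["x", "y z"], {0}, {0, 1}),      # one ground-truth fragment missed
]
map_score, recall, delta_w = span_metrics(paragraphs)
```

In the first toy paragraph the approach over-extends the span (lower precision, positive erroneous span); in the second it under-extends (lower recall, no erroneous span).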
74. Citation Span Analysis — Accuracy
45
Citation Span results decomposed across the different span cases.
MRF: compute two functions, (i) potential and (ii) compatibility. In (i) we measure the similarity between sentences in a
citing paragraph, and in (ii) the similarity between a sentence and the sentences in the citing document.
IC: the span of a citation is the text between two citations; it can also start at the beginning of a paragraph and end at the
end of a paragraph.
CS/CSW: for CS the span is the citing sentence; for CSW the span is the citing sentence +/- 2 sentences, depending on whether they contain
specific cue words in specific locations in the sentence.
CSPC/CSPS: CSPC is our plain classifier with the proposed features; for CSPS we train a structured prediction model on the same feature set.
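To make the IC baseline concrete, the sketch below treats the span of a citation as all text fragments between the previous citation marker (or the start of the paragraph) and the fragment carrying the target marker. It is one reading of the one-line description on the slide, operating on pre-chunked fragments. The function name `ic_span` is hypothetical, and the case where a span extends past the last marker to the end of the paragraph is not handled.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def ic_span(fragments, target):
    """Indices of the fragments between the previous citation and `target`."""
    start = 0
    for i, fragment in enumerate(fragments):
        markers = CITATION.findall(fragment)
        if target in markers:
            return list(range(start, i + 1))
        if markers:
            # Another citation ends here; the next span starts afterwards.
            start = i + 1
    return []  # target citation not found in this paragraph


fragments = [
    "He was reelected to the Illinois Senate in 1998,",
    "defeating Republican Yesse Yehudah in the general election,",
    "and was re-elected again in 2002.[117]",
    "In 2000,",
    "he lost a Democratic primary race ... two to one.[118]",
]
span_117 = ic_span(fragments, "117")
span_118 = ic_span(fragments, "118")
```

Such a rule needs no training, which makes it a convenient reference point for the learned CSPC/CSPS models.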
80. Part (II): Conclusion
47
• Citation span can be accurately determined for web and news
citations
• Sequence classification achieves slightly better performance than
plain classification and outperforms baseline approaches
• Baseline approaches from the scientific domain do not generalize to
Wikipedia’s language style
81. Part (III): Automated News
Suggestion for Populating
Wikipedia Pages
Besnik Fetahu, Katja Markert, Avishek Anand: “Automated News
Suggestion for Populating Wikipedia Pages”. CIKM 2015: 323-332
82. News Collection
[Overview figure: a News Collection (t1, t2, …, tn) and a Textual Knowledge Base (t1, t2, …, tn), with the tasks Citation Recommendation, Citation Span, and News Suggestion (Entity Placement, Section Placement), illustrated on the entity “Barack Obama”.]
84. Automated News Suggestion to Entity Pages
50
Daily News Articles
Some half a million people were evacuated
from the southeastern Indian coast as Cyclone
Phailin, a tropical storm from the Bay of
Bengal, bore down on India. The states of
Orissa and Andhra Pradesh, both of which
have large coastal populations, were on high
alert ahead of the storm’s expected arrival. […]
WSJ: As it Happened: Cyclone Reaches Orissa
Task#1: Article Placement
Entity Mentions: Bay of Bengal, Odisha, India, Cyclone Phailin
Task#2: Section Placement (entity: Odisha)
Sections:
1. Etymology
2. History
3. Geography
4. Government and politics
5. Economy
6. Transportation
7. Demographics
8. Education
Add section in case it is missing: Catastrophes
97. News Suggestion Attributes
51
• The entity should be a central concept in the news article
• The information in the news article should be important for the
Wikipedia entity
• The news article should contain novel or missing
information for the Wikipedia entity
• A news article should be suggested for the exact section; if such a
section does not exist, it needs to be added
99. Article—Placement: News Suggestion Attributes
53
Entity Salience
• Reward entities appearing throughout the news article
• Reward entities appearing in top-paragraphs
• Weigh the entities w.r.t. the score of the co-occurring entities
Relative Authority
• Entry barrier is lower for information from news articles for entities with low authority
• Important information for an entity can be unveiled by measuring the relative importance of its co-occurring entities
Novelty
• Information from a news article should be novel w.r.t. the entity under consideration
• Measure information novelty against already cited news sources
• Measure information novelty against the already existing content in a Wikipedia entity
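The salience and novelty attributes can be illustrated with two toy scores. This is a deliberately simple heuristic sketch, not the feature computation used in the thesis: the function names, the substring matching of entity names, the equal 0.5 weights, and the word-overlap notion of novelty are all assumptions chosen for readability. Relative authority is not modeled.

```python
def salience(paragraphs, entity, top_k=2):
    """Reward entities mentioned throughout the article and in top paragraphs."""
    if not paragraphs:
        return 0.0
    hits = [entity.lower() in p.lower() for p in paragraphs]
    spread = sum(hits) / len(paragraphs)  # share of paragraphs with a mention
    top = sum(hits[:top_k]) / min(top_k, len(paragraphs))  # top-paragraph share
    return 0.5 * spread + 0.5 * top  # assumption: equal weights


def novelty(article_text, existing_text):
    """Share of the article's distinct words absent from the existing content."""
    words = set(article_text.lower().split())
    if not words:
        return 0.0
    known = set(existing_text.lower().split())
    return len(words - known) / len(words)


article = [
    "Cyclone Phailin bore down on India.",
    "Orissa was on high alert.",
    "Evacuations continued.",
]
score_india = salience(article, "India")
score_evacuations = salience(article, "Evacuations")
score_novelty = novelty("storm hits coast", "the coast is long")
```

An entity mentioned only in the last paragraph scores lower than one mentioned in the lead, and the novelty score drops as more of the article's vocabulary already appears on the entity page or in its cited news sources.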
108. Part (III): Conclusions
59
• Three main properties of a good news suggestion
• Through AEP and AES tasks, we can suggest important and novel
information for Wikipedia entities
• Entity profile expansion through section templates generated at entity
type level
110. Conclusions
61
• We account for the evolving nature of Wikipedia entities as new and
novel information becomes available on the Web
• We present a holistic approach for enriching and improving
Wikipedia entities
• Through our approach we enforce the core principles of Wikipedia
such as the “verifiability” principle
• Our automated approach provides accurate enrichments and
improvements, and furthermore accounts for long-tail entities, where
editor interest is low.
111. Future Work
62
• Wikipedia is a collaboratively edited and created data source; as such, it can have
pitfalls like “echo chambers”. We want to investigate how such “echo
chambers” are established and which factors (e.g. editors, sources, topic
interests) cause them.
• Quality issues such as NPOV violations in Wikipedia are coarse-grained, and
such quality indicators are nonexistent for long-tail entities. Investigating the editor,
language, and source biases that cause such NPOV violations is therefore an important
quality assurance step.
• Editor dynamics reflect the quality of Wikipedia pages. How can we provide a
mechanism for distributing Wikipedia pages “uniformly” across editors, such that
we satisfy their interests and at the same time increase the overall quality of
Wikipedia pages?