4. "cut": nearest neighbors under global vs local embeddings

   global     local*
   cutting    tax
   squeeze    deficit
   reduce     vote
   slash      budget
   reduction  reduction
   spend      house
   lower      bill
   halve      plan
   soften     spend
   freeze     billion

   global: trained using the full corpus
   local: trained using a topically-constrained corpus (*topic: gas)
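The contrast above can be sketched with a toy distributional model: the same word's nearest neighbors shift when co-occurrence statistics come from a topical subset instead of the full corpus. This is only an illustration, not the word2vec setup from the talk; the documents, window size, and the co-occurrence representation below are all invented for the sketch.

```python
import math
from collections import defaultdict

def context_vectors(docs, window=2):
    """Toy distributional representation: each word becomes a sparse
    vector of co-occurrence counts within a +/- `window` of tokens.
    Stands in for a trained embedding."""
    vecs = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        toks = doc.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def neighbors(vecs, word, k=3):
    """The k most cosine-similar words to `word`."""
    target = vecs[word]
    scored = sorted(((cosine(target, v), w) for w, v in vecs.items() if w != word),
                    reverse=True)
    return [w for _, w in scored[:k]]

# Invented documents: the full corpus mixes topics; the "gas" subset is topical.
gas_docs = [
    "senate vote to cut the gas tax and trim the deficit",
    "the plan to cut the gas tax faces a budget vote",
]
other_docs = [
    "the editor chose to cut the final scene from the film",
    "surgeons cut carefully during the operation",
]

global_vecs = context_vectors(gas_docs + other_docs)  # "global": all documents
local_vecs = context_vectors(gas_docs)                # "local": topical subset only
print("global:", neighbors(global_vecs, "cut"))
print("local: ", neighbors(local_vecs, "cut"))
```

Under the local statistics, every neighbor of "cut" necessarily comes from the topical vocabulary, which is the effect the slide's two columns illustrate.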
8. • local term clustering [Lesk, 1968; Attar and Fraenkel, 1977]
   • local latent semantic analysis [Hull, 1994; Hull, 1995; Schütze et al., 1995; Singhal et al., 1997]
   • local document clustering [Tombros and van Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985]
   • one sense per discourse [Gale et al., 1992]
18. U =
    \begin{cases}
      \text{uniform } p(d) & \text{on the target corpus} \\
      \text{uniform } p(d) & \text{on an external corpus} \\
      p(d \mid q) & \text{on the target corpus} \\
      p(d \mid q) & \text{on an external corpus}
    \end{cases}
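The four choices of U can be sketched in code. The retrieval score and the softmax used to form p(d|q) below are illustrative assumptions, not the exact ranker from the talk; `vocab_size` and the example documents are invented.

```python
import math
import random

def query_likelihood(doc, query, vocab_size=1000):
    """Illustrative retrieval score: add-one-smoothed log query likelihood.
    `vocab_size` is an assumed smoothing constant."""
    toks = doc.lower().split()
    return sum(math.log((toks.count(t) + 1) / (len(toks) + vocab_size))
               for t in query.lower().split())

def sampling_distribution(docs, query=None):
    """U from the slide: uniform p(d) when no query is given, otherwise
    p(d|q) via a softmax over retrieval scores. `docs` may come from
    either the target corpus or an external corpus."""
    if query is None:
        return [1.0 / len(docs)] * len(docs)
    scores = [query_likelihood(d, query) for d in docs]
    m = max(scores)                       # shift for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def sample_training_docs(docs, probs, k, seed=0):
    """Draw k documents (with replacement) to train a local embedding."""
    return random.Random(seed).choices(docs, weights=probs, k=k)

docs = [
    "gas tax cut plan passes the senate",
    "new recipe for summer salads",
    "film review of the latest drama",
]
uniform = sampling_distribution(docs)             # uniform p(d)
focused = sampling_distribution(docs, "gas tax")  # p(d|q)
print(uniform, focused)
```

With a query, the distribution concentrates on topically relevant documents, so the sampled training set is itself topical; without one, it degenerates to uniform sampling over the chosen corpus.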
19. collection        docs        words  queries
    trec12         469,949     438,338      150
    robust         528,155     665,128      250
    web         50,220,423  90,411,624      200
20. global                local
    target                target
    wikipedia+gigaword*   gigaword†
    google news*          wikipedia†

    *publicly available embedding; †publicly available external corpus
    [diagram: five query → results pipelines, each retrieving against either the target corpus or an external corpus]
21. [bar chart "local vs global": NDCG@10, 0.0 to 0.5, on trec12, robust, and web; bars compare expansion = none, global, local]
22. [bar chart "local embedding": NDCG@10, 0.0 to 0.5, on trec12, robust, and web; bars compare training corpus = target, gigaword, wikipedia]
23. • local embeddings provide a stronger representation than global embeddings
    • potential impact on other topic-specific natural language processing tasks
    • future work:
      • effectiveness improvements
      • efficiency improvements