SlideShare ist ein Scribd-Unternehmen logo
1 von 105
The Web as an Implicit Training Set:
Application to Noun Compound Syntax and Semantics
Preslav Nakov, Qatar Computing Research Institute, HBKU
Data Science Society
July 23, 2015
Sofia, Bulgaria
The Dream
The Big Dream
(2001: A Space Odyssey)
Dave Bowman: “Open the pod bay doors, HAL”
HAL 9000: “I’m sorry Dave. I’m afraid I can’t do that.”
Where Are We?
IR: Google
MT: Google
QA: IBM Watson won Jeopardy!
Dialog: Siri
ASR: Siri, Google
Generation: Computers Write News!
The Turing Test Passed?
Reality check…
Dialog: Siri
Dialog: Siri
MT: Google Fails
MT Fails?
Turing Test: Not Passed?
Brief History of NLP
•50s: early optimism
•1964: ALPAC report
- skeptical of MT so far
- basic research in CL needed
•90s: statistical revolution
- get large text collections
- compute statistics
Web-scale
Corpus Linguistics
Size Matters
Banko & Brill: “Scaling to Very, Very Large Corpora for Natural
Language Disambiguation”, ACL’2001
• Spelling correction
– Which word should we use?
<principal> <principle>
– In a given context:
• The teachers and the principal of 137th school "Angel Kanchev" in Sofia,
Bulgaria shared their experience with Jumpido over the last one year. The
school is one of the few mentor schools in innovation of Microsoft's
Microsoft's "Innovative schools program - Partners in learning".
• “Bulgaria plans to carry out ambitious reforms aimed at modernization and at
and at keeping the principle of the rule of law,” said President Plevneliev.
25
 Log-linear improvement even to a billion words!
 Getting more data is better than fine-tuning algorithms!
(Banko & Brill, 2001)
Great idea!
Can it be
extended to
other tasks?
For this problem,
one can get a lot
of training data.
Size Matters: Using Billions of Words
(Banko&Brill, ACL 2011)
26
Language Models for SMT at Google:
Using Quadrillions (1015) of Words!
(Brants&al., EMNLP 2007)
27
The Web as a Baseline
• “Web as a baseline” (Lapata & Keller 04;05):
n-gram models
– machine translation candidate selection
– article generation
– noun compound interpretation
– noun compound bracketing
– adjective ordering
– spelling correction
– countability detection
– prepositional phrase attachment
• Their conclusion:
– The Web should be used as a baseline.
Significantly better than the
best supervised algorithm.
Not significantly different from
the best supervised algorithm.
These are all UNSUPERVISED!
Wecandobetter…
(Lapata&Keller, HLT-NAACL 2004, TSLP 2005)
28
The Web as an Implicit Training Set
• Much more can be achieved using
– surface features
– paraphrases
– linguistic knowledge
• I will demonstrate this on noun compounds
(and on some other problems)
In Memoriam
portrait by Gregory Grefenstette
30
Noun Compounds
31
Noun Compound
• Def: Sequence of nouns that function as a single noun, e.g.
– healthcare reform
– plastic water bottle
– colon cancer tumor suppressor protein
– Korpuslinguistikkonferenz (German)
– къща-музей, човекомесец, нармаг (Bulgarian)
Three problems:
1. Segmentation
2. Syntax
3. Semantics
32
36
Noun Compound
Syntax
37
Noun Compound Syntax: The Problem
plastic water bottle plastic water bottle
OR ?
[ plastic [ water bottle ] ] [ [ plastic water ] bottle ]
right left
water bottle made of plastic bottle containing plastic water
38
Measuring Word Association
• Frequencies
– Dependency: #(w1,w2) vs. #(w1,w3)
– Adjacency: #(w1,w2) vs. #(w2,w3)
• Probabilities
– Dependency: Pr(w1w2|w2) vs. Pr(w1w3|w3)
– Adjacency: Pr(w1w2|w2) vs. Pr(w2w3|w3)
• Also: Pointwise Mutual Information, Chi Square, etc.
Simple Word-based Models
w1 w2 w3
adjacency
dependency
plastic water bottle
39
Web-derived Surface Features
• Observations
– Authors often disambiguate noun compounds using surface
markers.
– The size of the Web makes such markers frequent enough to be
useful.
• Ideas
– Look for instances where the compound occurs with surface
markers.
– Also try
• paraphrases
• linguistic knowledge
The Web as an Implicit Training Set
40
Web-derived Surface Features:
Dash (hyphen)
• Left dash
– cell-cycle analysis  left
• Right dash
– donor T-cell  right
CoNLL'05: Nakov&Hearst
41
Web-derived Surface Features:
Possessive Marker
• After the first word
– world’s food production  right
• After the second word
– cell cycle’s analysis  left
CoNLL'05: Nakov&Hearst
42
Web-derived Surface Features:
Capitalization
• don’t-care – lowercase – uppercase
– Plasmodium vivax Malaria  left
– plasmodium vivax Malaria  left
• lowercase – uppercase – don’t-care
– tumor Necrosis Factor  right
– tumor Necrosis factor  right
CoNLL'05: Nakov&Hearst
43
Web-derived Surface Features:
Embedded Slash
• Left embedded slash
– leukemia/lymphoma cell  right
CoNLL'05: Nakov&Hearst
44
Web-derived Surface Features:
Parentheses
• Single word
– growth factor (beta)  left
– (tumor) necrosis factor  right
• Two words
– (cell cycle) analysis  left
– adult (male rat)  right
CoNLL'05: Nakov&Hearst
45
Web-derived Surface Features:
Comma,dot,column,semi-column,…
• Following the second word
– lung cancer: patients  left
– health care, provider  left
• Following the first word
– home. health care  right
– adult, male rat  right
CoNLL'05: Nakov&Hearst
46
Web-derived Surface Features:
Abbreviation
• After the second word
– tumor necrosis (TN) factor  left
• After the third word
– tumor necrosis factor (NF)  right
CoNLL'05: Nakov&Hearst
47
Web-derived Surface Features:
Concatenation
Consider “health care reform”
• Dependency model
– healthcare vs. healthreform
• Adjacency model
– healthcare vs. carereform
• Triples
– “healthcare reform” vs. “health carereform”
w1 w2 w3
adjacency
dependency
health care reform
CoNLL'05: Nakov&Hearst
48
Web-derived Surface Features:
Internal Inflection Variability
• First word
– bone mineral density
– bones mineral density
• Second word
– bone mineral density
– bone minerals density
CoNLL'05: Nakov&Hearst
49
Web-derived Surface Features:
Switch The First Two Words
• Predict right if we can reorder
– adult male rat as
– male adult rat
CoNLL'05: Nakov&Hearst
50
• Prepositional
– cells in (the) bone marrow  left (61,700)
– cells from (the) bone marrow  left (16,500)
– marrow cells from (the) bone  right (12)
• Verbal
– cells extracted from (the) bone marrow  left (17)
– marrow cells found in (the) bone  right (1)
• Copula
– cells that are bone marrow  left (3)
“bone marrow cell”: left or right?
Paraphrases
CoNLL'05: Nakov&Hearst
“left”
sum
“right”
sum
compare
51
• Word associations
• Surface features and paraphrases
Evaluation Results
On 244 noun compounds from Grolier’s encyclopedia (Lauer dataset)
Size does matter!
Using MEDLINE instead of the Web (million times smaller)
• 9.43% Coverage (23 out of 244 NCs)
• 47.83% Accuracy (12 out of 23 wrong)
CoNLL'05: Nakov&Hearst
Acc. Cov.
Acc. Cov.
52
Application to Other
Syntactic Problems
53
Syntactic Application 1:
Prepositional Phrase (PP)
Attachment
54
HLT-ENMLP'05: Nakov&Hearst
(a) Peter spent millions of dollars. (noun)
(b) Peter spent time with his family. (verb)
Can be represented as a quadruple: (v, n1, p, n2)
(a) (spent, millions, of, dollars)
(b) (spent, time, with, family)
• Accuracy
– Surface features & paraphrases: 83.63%
– Best unsupervised (Lin&Pantel’00): 84.30%
Syntactic Application 1:
Prepositional Phrase Attachment
Human performance:
 quadruple: 88%
 whole sentence: 93%
55
PP Attachment: n-gram models
• (i) Pr(p|n1) vs. Pr(p|v)
• (ii) Pr(p,n2|n1) vs. Pr(p,n2|v)
– I eat/v spaghetti/n1 with/p a fork/n2.
– I eat/v spaghetti/n1 with/p sauce/n2.
HLT-ENMLP'05: Nakov&Hearst
56
PP Attachment:
Web-derived Surface Features
• Example features
– open the door / with a key  verb (100.00%, 0.13%)
– open the door (with a key)  verb (73.58%, 2.44%)
– open the door – with a key verb (68.18%, 2.03%)
– open the door , with a key  verb (58.44%, 7.09%)
– eat Spaghetti with sauce  noun (100.00%, 0.14%)
– eat ? spaghetti with sauce noun (83.33%, 0.55%)
– eat , spaghetti with sauce  noun (65.77%, 5.11%)
– eat : spaghetti with sauce  noun (64.71%, 1.57%)
Acc Cov
sumsum
compare
HLT-ENMLP'05: Nakov&Hearst
57
Paraphrases
v n1 p n2
• v n2 n1 (noun)
• v p n2 n1 (verb)
• p n2 * v n1 (verb)
• n1 p n2 v (noun)
• v PRONOUN p n2 (verb)
• BE n1 p n2 (noun)
58
PP Attachment Paraphrases (1)
(1) v n1 p n2  v n2 n1 (noun)
• Turn “n1 p n2” into a noun compound “n2 n1”
– meet/v demands/n1 from/p customers/n2 
meet/v the customer/n2 demands/n1
HLT-ENMLP'05: Nakov&Hearst
59
PP Attachment Paraphrases (2)
(2) v n1 p n2  v p n2 n1 (verb)
• Swap direct and indirect objects:
– had/v a program/n1 in/p place/n2 
had/v in/p place/n2 a program/n1
HLT-ENMLP'05: Nakov&Hearst
60
PP Attachment Paraphrases (3)
(3) v n1 p n2  p n2 * v n1 (verb)
• Look for apposition of “p n2”
– I gave/v an apple/n1 to/p him/n2 
(It was) to/p him/n2 (that) I gave/v an apple/n1
HLT-ENMLP'05: Nakov&Hearst
61
PP Attachment Paraphrases (4)
(4) v n1 p n2  n1 p n2 v (noun)
• Look for apposition of “n1 p n2”
– shaken/v confidence/n1 in/p markets/n2 
confidence/n1 in/p markets/n2 shaken/v
HLT-ENMLP'05: Nakov&Hearst
62
PP Attachment Paraphrases (5)
(5) v n1 p n2  v PRONOUN p n2 (verb)
• Substitute n1 with a pronoun (him, her)
– put/v a client/n1 at/p odds/n2 
put/v him at/p odds/n2
pronoun
HLT-ENMLP'05: Nakov&Hearst
63
PP Attachment Paraphrases (6)
(6) v n1 p n2  BE n1 p n2 (noun)
• Substitute v with is/are/was/were, e.g.
– eat/v spaghetti/n1 with/p sauce/n2 
– is spaghetti/n1 with/p sauce/n2
to be
HLT-ENMLP'05: Nakov&Hearst
64
Syntactic Application 2:
Noun Compound
Coordination & Ellipsis
65
Syntactic Application 2:
Noun Compound Coordination & Ellipsis
• Penn Treebank
– ellipsis:
(NP bar/NN and/CC pie/NN graph/NN)
– no ellipsis:
(NP
(NP president/NN)
and/CC
(NP chief/NN executive/NN))
• Accuracy
– Surface features & paraphrases: 80.61%
HLT-ENMLP'05: Nakov&Hearst
Real-world coordinations can be more complex:
The Department of Chronic Diseases and Health
Promotion leads and strengthens global efforts to
prevent and control chronic diseases or disabilities
and to promote health and quality of life.
66
NP Coordination: N-gram models
(n1,c,n2,h)
• (i) #(n1,h) vs. #(n2,h)
• (ii) #(n1,h) vs. #(n1,c,n2)
HLT-ENMLP'05: Nakov&Hearst
67
NP Coordination: Surface Featuressumsum
compare
HLT-ENMLP'05: Nakov&Hearst
68
Paraphrases
n1 c n2 h
• n2 c n1 h (ellipsis)
• n2 h c n1 (NO ellipsis)
• n1 h c n2 h (ellipsis)
• n2 h c n1 h (ellipsis)
69
NP Coordination Paraphrases (1)
(1) n1 c n2 h  n2 c n1 h (ellipsis)
• Swap n1 and n2
– bar/n1 and/c pie/n2 graph/h 
pie/n2 and/c bar/n1 graph/h
HLT-ENMLP'05: Nakov&Hearst
70
NP Coordination Paraphrases (2)
(2) n1 c n2 h  n2 h c n1 (NO ellipsis)
• Swap n1 and n2 h
– president/n1 and/c chief/n2 executive/h 
chief/n2 executive/h and/c president/n1
HLT-ENMLP'05: Nakov&Hearst
71
NP Coordination Paraphrases (3)
(3) n1 c n2 h  n1 h c n2 h (ellipsis)
• Insert the elided head h
– bar/n1 and/c pie/n2 graph/h 
– bar/n1 graph/h and/c pie/n2 graph/h
h
HLT-ENMLP'05: Nakov&Hearst
72
NP Coordination Paraphrases (4)
(4) n1 c n2 h  n2 h c n1 h (ellipsis)
• Insert the head h; also switch n1 and n2
– bar/n1 and/c pie/n2 graph/h 
– pie/n2 graph/h and/c bar/n1 graph/h
h
HLT-ENMLP'05: Nakov&Hearst
73
Other
Syntactic Applications
74
More Applications (1):
Bracketing the Penn Treebank NPs
• Augmenting the Penn Treebank
ACL’07: Adding Noun Phrase Structure to the Penn Treebank
David Vadas and James R. Curran
(Vadas&Curran, ACL 2007)
75
More Applications (1):
Bracketing the Penn Treebank NPs
(Vadas&Curran, ACL 2007)
76
More Applications (1):
Bracketing the Penn Treebank NPs
 Constituency Parsing
 0.5% drop in F1
 But
 useful for QA, etc.
 helps fix the CCG-bank
(Vadas&Curran, ACL 2007)
77
More Applications (2):
Search Engine Query Segmentation
• Query Segmentation
– [ tumor suppressor protein ]
– [ tumor suppressor ] [ protein ]
– [ tumor ] [ suppressor protein ]
– [ tumor ] [ suppressor ] [ protein ]
• Bracketing
[ [ tumor suppressor ] protein ]
[ tumor [ suppressor protein ] ]
EMNLP'07: Learning Noun Phrase Query Segmentation
Shane Bergsma and Qin Iris Wang
(Bergsma and Wang, EMNLP 2007)
78
 Learn features for
 all head-argument structures
 individual attachments (not competing pairs)
 Generalize over POS
 Use Google 1T 5-grams instead of Web
 Error reduction
 dependency parser: 7% (MSTParser)
 constituency parser 9.2% (Berkeley parser)
 re-ranker: 3.4%
More Applications (3):
Full Syntactic Parsing ACL’11: Web-Scale Features
for Full-Scale Parsing
Mohit Bansal and Dan Klein
(Bansal & Klein, ACL 2011)
79
More Applications (3):
Full Syntactic Parsing
ACL’11: Web-Scale Features
for Full-Scale Parsing
Mohit Bansal and Dan Klein
(Bansal & Klein, ACL 2011)
80
More Applications (3):
Full Syntactic Parsing
For constituency parsing,
improvement is due to
 40% affinity
 60% paraphrases
ACL’11: Web-Scale Features
for Full-Scale Parsing
Mohit Bansal and Dan Klein
(Bansal & Klein, ACL 2011)
81
Noun Compound
Semantics
82
Noun Compound Semantics
• Typically, choose one abstract relation
– Fixed set of abstract relations (Girju&al.,2005)
• malaria mosquito  CAUSE
• olive oil  SOURCE
– Prepositions (Lauer,1995)
• malaria mosquito  with
• olive oil  from
• Proposed approach: use multiple paraphrasing verbs
– Paraphrasing verbs
• malaria mosquito  carries, spreads, causes, transmits, brings, has
• olive oil  comes from, is obtained from, is extracted from
– Distribution over paraphrasing verbs
ACL'08: Nakov&Hearst
83
Extracting Paraphrasing Verbs
Using a Linguistic Paraphrasing Pattern
• Given “malaria mosquito”, query Google for
“mosquito THAT * malaria“
• Extract verbs
23 carry
16 spread
12 cause
9 transmit
7 bring
4 have
3 be infected with
3 infect with
2 give
post-modifier
pre-modifier
84
Comparing to Girju&al. (2005)
shown are 14 out of 21 relations
85
Amazon’s Mechanical Turk:
Malaria Mosquito
• 10 human judges:
– 8 carries
– 4 causes
– 2 transmits
– 2 is infected with
– 2 infects with
– 1 has
– 1 gives
– 1 spreads
– 1 supplies
– …
 The program:
23 carry
16 spread
12 cause
9 transmit
7 bring
4 have
3 be infected with
3 infect with
2 give
On 250 noun-noun compounds
and 25-30 human judges:
32% cosine correlation
SemEval-2010 task 9: The Interpretation of Noun Compounds
Using Paraphrasing Verbs and Prepositions
C. Butnariu, Su Nam Kim, P. Nakov, D. Ó Séaghdha, S. Szpakowicz, T. Veale
86
Relational Componential Analysis
• Classic componential analysis
• Relational componential analysis
87
Noun Compound Semantics
Using Abstract Relations
[colon cancer] [[tumor suppressor] protein]
ABSTRACT RELATIONS:
[
[colon cancer]/LOCATION
[ [tumor suppressor]/PURPOSE protein]/AGENT
]/LOCATION
88
Noun Compound Semantics
Using Prepositional Paraphrases
[colon cancer] [[tumor suppressor] protein]
PREPOSITIONS:
{
{protein that is a
{suppressor of tumors}
}
in
{cancer of/in the colon}
}
89
Noun Compound Semantics
Using Paraphrasing Verbs
[colon cancer] [[tumor suppressor] protein]
VERBS:
{
{protein that acts as a
{suppressor that inhibits tumors}
}
which is implicated in
{cancer that occurs in the colon}
}
prevent/stop/keep
from
developing/growing/arising
90
Free Paraphrasing of Noun Compounds:
Going Beyond Verbs and Prepositions
“onion tears”
tears from onions
tears due to cutting onion
tears induced when cutting onions
tears that onions induce
tears that come from chopping onions
tears that sometimes flow when onions are chopped
tears that raw onions give you
SemEval-2013 task 4: Free Paraphrases of Noun Compounds
C. Butnariu, I. Hendrickx, S. N. Kim, Z. Kozareva, P. Nakov, D. Ó Séaghdha, S. Szpakowicz, T. Veale
91
Application to Other
Semantic Tasks
92
The V+P+C Semantic Vector
• For “noun1 noun2”, query:
"noun2 * noun1"
"noun1 * noun2"
• Extract:
– V: verbs
– P: prepositions
– C: coordinating conjunctions
committee member
93
Semantic Application 1:
Predicting Levi’s 12 Recoverably Deletable Predicates
• Accuracy (212 noun-noun compounds)
– V+P+C: 50.0%±6.7% (baseline: 19.6%)
ACL'08: Nakov&Hearst
94
Semantic Application 2:
SAT Analogy Questions
• Accuracy (174 noun:noun examples)
– LRA : 67.4%±7.1%
– V+P+C : 71.3%±7.0%
ACL'08: Nakov&Hearst
95
Semantic Application 3:
Relations Between Complex Nominals
• Accuracy
– Task #4 winner : 66.0%
– V+P+C : 67.0%
– + web-based argument generalization : 71.3%
SemEval-2007 task 4:Classification of semantic relations between nominals
R. Girju, P. Nakov, V. Nastase, S. Szpakowicz, P. Turney, D. Yuret
Follow-up: SemEval-2010 task 8: Multi-Way
Classification of Semantic Relations
Between Pairs of Nominals
I. Hendrickx, Su Nam Kim, Z. Kozareva, P.
Nakov, D. Ó Séaghdha, S. Padó, M.
Pennacchiotti, L. Romano, S. Szpakowicz
ACL'08: Nakov&Hearst
RANLP‘11: Nakov&Kozareva
96
Semantic Application 4:
30 Head-Modifier Relations
• Accuracy (600 examples)
– LRA : 39.8%±3.8%
– V+P+C : 40.5%±3.9%
ACL'08: Nakov&Hearst
97
“WTO Geneva headquarters” =
“headquarters of the WTO are located in Geneva”
(1) Geneva headquarters of the WTO
(2) WTO headquarters are located in Geneva
Semantic Application 5:
Textual Entailment
98
Application
to Machine Translation
99
Statistical Machine Translation:
Trained on Parallel Text
100
Noun Compounds in Phrase-based SMT
English 
After the
oil price hikes
of 1974 and 1980 ,
Japan's economy
recovered through
export growth
.
Spanish
Después de las
alzas en los precios del petróleo
de 1974 y 1980 ,
la economía nipona
se recuperó a través del
crecimiento basado en las exportaciones
.
Idea: paraphrase the source phrase to increase coverage
oil price hikes  alzas en los precios del petróleo
hikes in oil prices  alzas en los precios del petróleo
hikes in prices of oil  alzas en los precios del petróleo
hikes in prices for oil  alzas en los precios del petróleo
hikes in the prices of oil  alzas en los precios del petróleo
hikes in the prices for oil  alzas en los precios del petróleo
101
Pair each new sentence
with the original translation,
thus generating a synthetic corpus.
Train an SMT system on it.
Paraphrasing a Source-Language Sentence
Improvement: equivalent to 33-50%
of what could be achieved by doubling
the amount of training data.
ECAI'08: Nakov
105
Summary
106
Summary
• Syntactic Tasks
– Noun Compound Syntax
– Prepositional Phrase Attachment
– Noun Compound Coordination
– Full syntactic parsing, etc.
• Semantic Tasks
– Noun Compound Semantics
– Predicting
• Abstract Semantic Relations
• Relations Between Complex Nominals
• Head-Modifier Relations
– Solving SAT Analogy Problems
• Application to a Real-World Task
– Machine Translation
Tapped the potential of very large
corpora for corpus linguistics by
going beyond the n-gram:
• surface markers
• paraphrases
• linguistic knowledge
108
References
• (Banko&Brill, ACL 2011) Michele Banko and Eric Brill. Scaling to Very, Very Large
Corpora for Natural Language Disambiguation, In Proceedings of ACL, 2001.
• (Bansal&Klein, ACL 2011) Mohit Bansal and Dan Klein. Web-Scale Features for Full-
Scale Parsing. In Proceedings of ACL, 2011.
• (Bergsma and Wang, EMNLP 2007) Shane Bergsma and Qin Iris Wang. Learning Noun
Phrase Query Segmentation. In Proceedings of EMNLP, 2007.
• (Brants&al., EMNLP 2007) Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och,
and Jeffrey Dean. Large Language Models in Machine Translation. In Proceedings of
EMNLP, 2007.
• (Butnariu&Veale, COLING 2008) Cristina Butnariu and Tony Veale. A concept-centered
approach to noun-compound interpretation. In Proceedings of COLING, 2008.
• (Butnariu&al., SemEval 2010) Cristina Butnariu, Su Nam Kim, Preslav Nakov, Diarmuid
and O Seaghdha, Stan Szpakowicz, and Tony Veale. SemEval-2010 Task 9: The
Interpretation of Noun Compounds Using Paraphrasing Verbs and Prepositions. In
Proceedings of SemEval, 2010.
109
References
• (Finin, 1980) Timothy Finin. The semantic interpretation of nominal compounds. In
Proceedings of the 1st National Conference on Artificial Intelligence, 1980.
• (Girju&al., CSL 2005) Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe. On
the semantics of noun compounds. Computer Speech & Language 19(4): 479-496, 2005.
• (Girju&al., ACL 2007) Roxana Girju. Improving the Interpretation of Noun Phrases with
Cross-linguistic Information. In Proceedings of ACL, 2007.
• (Girju&al., SemEval 2007) Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz,
Peter Turney, Deniz Yuret.. SemEval-2007 Task 04: Classification of Semantic Relations
between Nominals. In Proceedings of SemEval, 2007.
• (Hendrickx&al., SemEval 2010) Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav
Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano,
and Stan Szpakowicz. SemEval-2010 Task 8: Multi-Way Classification of Semantic
Relations Between Pairs of Nominals. In Proceedings of SemEval, 2010.
• (Hendrickx&al., SemEval 2013) Iris Hendrickx, Zornitsa Kozareva, Preslav Nakov,
Diarmuid Ó Séaghdha, Stan Szpakowicz, and Tony Veale. SemEval-2013 Task 4: Free
Paraphrases of Noun Compounds. Proceedings of SemEval, 2013.
110
References
• (Kim&Baldwin, 2005) Su Nam Kim and Timothy Baldwin. Automatic Interpretation of
noun compounds using WordNet::Similarity. In Proceedings of the 2nd International Joint
Conference on Natural Language Processing, Jeju Island, South Korea, 2005.
• (Kim&Baldwin, COLING 2006) Su Nam Kim and Timothy Baldwin. Interpreting semantic
relations in noun compounds via verb semantics. In Proceedings of COLING-ACL, 2006.
• (Kim&Nakov, EMNLP 2011) Su Nam Kim, Preslav Nakov. Large-Scale Noun Compound
Interpretation Using Bootstrapping and the Web as a Corpus. In Proceedings of EMNLP,
2011.
• (Lapata&Keller, HLT-NAACL 2004) Mirella Lapata and Frank Keller. 2004. The Web as a
Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of
NLP Tasks. In Proceedings of HLT-NAACL, 2004.
• (Lapata&Keller, TSLP 2005) Mirella Lapata and Frank Keller. Web-based Models for
Natural Language Processing. ACM Transactions on Speech and Language Processing
2:1, 1-31, 2005.
• (Lauer, ACL 1995) Mark Lauer. Corpus Statistics Meet The Compound Noun: Some
Empirical Results. In Proceedings of ACL, 1995.
111
References
• (Levi, 1978) Judith N. Levi. The Syntax and Semantics of Complex Nominals. Academic
Press, New York, 1978.
• (Nakov, 2007 PhD thesis) Preslav Nakov. Using the Web as an Implicit Training Set:
Application to Noun Compound Syntax and Semantics. PhD thesis. EECS Department,
University of California, Berkeley. Technical Report No. UCB/EECS-2007-173. 2007.
• (Nakov, AIMSA 2008) Preslav Nakov. Noun Compound Interpretation Using Paraphrasing
Verbs: Feasibility Study. In Proceedings of AIMSA, 2008.
• (Nakov, NLE 2013) Preslav Nakov. On the Interpretation of Noun Compounds: Syntax,
Semantics, and Entailment. Natural Language Engineering, 19(3):192-330, 2013.
• (Nakov&Hearst, CoNLL 2005) Preslav Nakov and Marti Hearst. Search Engine Statistics
Beyond the n-gram: Application to Noun Compound Bracketing. In Proceedings of CoNLL,
2005.
• (Nakov&Hearst, HLT-EMNLP 2005) Preslav Nakov and Marti Hearst. Using the Web as
an Implicit Training Set: Application to Structural Ambiguity Resolution. In Proceedings of
HLT/EMNLP, 2005.
112
References
• (Nakov&Hearst, AIMSA 2006) Preslav Nakov, Marti A. Hearst. Using Verbs to
Characterize Noun-Noun Relations. In Proceedings of AIMSA, 2006.
• (Nakov&Hearst, ACL 2008) Preslav Nakov and Marti Hearst. Solving Relational Similarity
Problems Using the Web as a Corpus. In Proceedings of ACL, 2008.
• (Nakov&Hearst, TSLP 2013) Preslav Nakov and Marti Hearst. Semantic Interpretation of
Noun Compounds Using Verbal and Other Paraphrases. ACM Transactions on Speech and
Language Processing, 10(3):13:1-13:51, 2013.
• (Nakov&Kozareva, RANLP 2011) Preslav Nakov and Zornitsa Kozareva. Combining
Relational and Attributional Similarity for Semantic Relation Classification. In Proceedings of
RANLP, 2011.
• (Nastase & Szpakowicz, 2003) Vivi Nastase and Stan Szpakowicz. Exploring noun-
modifier semantic relations. In Proceedings of the 6th International Workshop on
Computational Semantics, Tilburg, The Netherlands, 2003.
• (Pantel&Lin, ACL 2000) Patrick Pantel and Dekang Lin. 2000. An unsupervised
approach to prepositional phrase attachment using contextually similar words. In
Proceedings of ACL, 2000.
113
References
• (Resnik, 1993 PhD thesis) Philip Resnik. Selection and Information: A Class-Based
Approach to Lexical Relationships, Doctoral Dissertation, Department of Computer and
Information Science, University of Pennsylvania, 1993.
• (Vadas&Curran, ACL 2007) David Vadas and James R. Curran. Adding Noun Phrase
Structure to the Penn Treebank. In Proceedings of ACL, 2007.
• (Vanderwende, 1994) Lucy Vanderwende. Algorithm for the automatic interpretation of
noun sequences. In Proceedings of COLING, 1994.
114
Acknowledgments
• Some slides adapted from Marti Hearst, Dan Klein, Chris
Manning, and others

Weitere ähnliche Inhalte

Andere mochten auch

Big Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companiesBig Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companiesData Science Society
 
Real-time information analysis: social networks and open data
Real-time information analysis: social networks and open dataReal-time information analysis: social networks and open data
Real-time information analysis: social networks and open dataData Science Society
 
Demand model development for the retail sector of industry
Demand model development for the retail sector of industryDemand model development for the retail sector of industry
Demand model development for the retail sector of industryData Science Society
 
Location Intelligence - the Next Evolution of Business Applications
Location Intelligence - the Next Evolution of Business ApplicationsLocation Intelligence - the Next Evolution of Business Applications
Location Intelligence - the Next Evolution of Business ApplicationsMISNet - Integeo SE Asia
 

Andere mochten auch (6)

Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Big Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companiesBig Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companies
 
Real-time information analysis: social networks and open data
Real-time information analysis: social networks and open dataReal-time information analysis: social networks and open data
Real-time information analysis: social networks and open data
 
Demand model development for the retail sector of industry
Demand model development for the retail sector of industryDemand model development for the retail sector of industry
Demand model development for the retail sector of industry
 
Machine learning for NLP
Machine learning for NLPMachine learning for NLP
Machine learning for NLP
 
Location Intelligence - the Next Evolution of Business Applications
Location Intelligence - the Next Evolution of Business ApplicationsLocation Intelligence - the Next Evolution of Business Applications
Location Intelligence - the Next Evolution of Business Applications
 

Ähnlich wie Preslav Nakov - The Web as a Training Set Part 1

Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs:A DBpedia StudyCrowdsourcing the Quality of Knowledge Graphs:A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia StudyMaribel Acosta Deibe
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesBertram Ludäscher
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question AnsweringMarina Santini
 
2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible researchYannick Wurm
 
What’s wrong with research papers - and (how) can we fix it?
What’s wrong with research papers -  and (how) can we fix it?What’s wrong with research papers -  and (how) can we fix it?
What’s wrong with research papers - and (how) can we fix it?Anita de Waard
 
Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...
Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...
Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...Bene Rodriguez
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Roy Clariana
 
YesWorkflow: Yes, Scripts can be Workflows, Too!
YesWorkflow: Yes, Scripts can be Workflows, Too!YesWorkflow: Yes, Scripts can be Workflows, Too!
YesWorkflow: Yes, Scripts can be Workflows, Too!Bertram Ludäscher
 
NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdfSoha82
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesBertram Ludäscher
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stackjeykottalam
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...Maribel Acosta Deibe
 

Ähnlich wie Preslav Nakov - The Web as a Training Set Part 1 (17)

Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs:A DBpedia StudyCrowdsourcing the Quality of Knowledge Graphs:A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science Tales
 
All good things
All good thingsAll good things
All good things
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
 
Xerox2009
Xerox2009Xerox2009
Xerox2009
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research
 
What’s wrong with research papers - and (how) can we fix it?
What’s wrong with research papers -  and (how) can we fix it?What’s wrong with research papers -  and (how) can we fix it?
What’s wrong with research papers - and (how) can we fix it?
 
Nlp tech talk
Nlp tech talkNlp tech talk
Nlp tech talk
 
Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...
Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...
Alignment of Ontology Design Patterns: Class As Property Value, Value Partiti...
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016
 
YesWorkflow: Yes, Scripts can be Workflows, Too!
YesWorkflow: Yes, Scripts can be Workflows, Too!YesWorkflow: Yes, Scripts can be Workflows, Too!
YesWorkflow: Yes, Scripts can be Workflows, Too!
 
NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdf
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
 

Mehr von Data Science Society

[Data Meetup] Data Science in Finance - Factor Models in Finance
[Data Meetup] Data Science in Finance - Factor Models in Finance[Data Meetup] Data Science in Finance - Factor Models in Finance
[Data Meetup] Data Science in Finance - Factor Models in FinanceData Science Society
 
[Data Meetup] Data Science in Finance - Building a Quant ML pipeline
[Data Meetup] Data Science in Finance -  Building a Quant ML pipeline[Data Meetup] Data Science in Finance -  Building a Quant ML pipeline
[Data Meetup] Data Science in Finance - Building a Quant ML pipelineData Science Society
 
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MITData Science Society
 
ML in Proptech - Concept to Production
ML in Proptech  -  Concept to ProductionML in Proptech  -  Concept to Production
ML in Proptech - Concept to ProductionData Science Society
 
Lessons Learned: Linked Open Data implemented in 2 Use Cases
Lessons Learned: Linked Open Data implemented in 2 Use CasesLessons Learned: Linked Open Data implemented in 2 Use Cases
Lessons Learned: Linked Open Data implemented in 2 Use CasesData Science Society
 
AI methods for localization in noisy environment
AI methods for localization in noisy environment AI methods for localization in noisy environment
AI methods for localization in noisy environment Data Science Society
 
Object Identification and Detection Hackathon Solution
Object Identification and Detection Hackathon Solution Object Identification and Detection Hackathon Solution
Object Identification and Detection Hackathon Solution Data Science Society
 
Data Science for Open Innovation in SMEs and Large Corporations
Data Science for Open Innovation in SMEs and Large CorporationsData Science for Open Innovation in SMEs and Large Corporations
Data Science for Open Innovation in SMEs and Large CorporationsData Science Society
 
Air Pollution in Sofia - Solution through Data Science by Kiwi team
Air Pollution in Sofia - Solution through Data Science by Kiwi teamAir Pollution in Sofia - Solution through Data Science by Kiwi team
Air Pollution in Sofia - Solution through Data Science by Kiwi teamData Science Society
 
#AcademiaDatathon Finlists' Solution of Crypto Datathon Case
#AcademiaDatathon Finlists' Solution of Crypto Datathon Case#AcademiaDatathon Finlists' Solution of Crypto Datathon Case
#AcademiaDatathon Finlists' Solution of Crypto Datathon CaseData Science Society
 
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018Data Science Society
 
DNA Analytics - What does really goes into Sausages - Datathon2018 Solution
DNA Analytics - What does really goes into Sausages - Datathon2018 SolutionDNA Analytics - What does really goes into Sausages - Datathon2018 Solution
DNA Analytics - What does really goes into Sausages - Datathon2018 SolutionData Science Society
 
Relationships between research tasks and data structure (basic methods and a...
Relationships between research tasks and data structure (basic  methods and a...Relationships between research tasks and data structure (basic  methods and a...
Relationships between research tasks and data structure (basic methods and a...Data Science Society
 
Data science tools - A.Marchev and K.Haralampiev
Data science tools - A.Marchev and K.HaralampievData science tools - A.Marchev and K.Haralampiev
Data science tools - A.Marchev and K.HaralampievData Science Society
 
Problems of Application of Machine Learning in the CRM - panel
Problems of Application of Machine Learning in the CRM - panel Problems of Application of Machine Learning in the CRM - panel
Problems of Application of Machine Learning in the CRM - panel Data Science Society
 
Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...
Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...
Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...Data Science Society
 
Intelligent Question Answering Using the Wisdom of the Crowd, Preslav Nakov
Intelligent Question Answering Using the Wisdom of the Crowd, Preslav NakovIntelligent Question Answering Using the Wisdom of the Crowd, Preslav Nakov
Intelligent Question Answering Using the Wisdom of the Crowd, Preslav NakovData Science Society
 
Master class Hristo Hadjitchonev - Aubg
Master class Hristo Hadjitchonev - Aubg Master class Hristo Hadjitchonev - Aubg
Master class Hristo Hadjitchonev - Aubg Data Science Society
 

Mehr von Data Science Society (20)

[Data Meetup] Data Science in Finance - Factor Models in Finance
[Data Meetup] Data Science in Finance - Factor Models in Finance[Data Meetup] Data Science in Finance - Factor Models in Finance
[Data Meetup] Data Science in Finance - Factor Models in Finance
 
[Data Meetup] Data Science in Finance - Building a Quant ML pipeline
[Data Meetup] Data Science in Finance -  Building a Quant ML pipeline[Data Meetup] Data Science in Finance -  Building a Quant ML pipeline
[Data Meetup] Data Science in Finance - Building a Quant ML pipeline
 
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT
 
Computer Vision in Real Estate
Computer Vision in Real EstateComputer Vision in Real Estate
Computer Vision in Real Estate
 
ML in Proptech - Concept to Production
ML in Proptech  -  Concept to ProductionML in Proptech  -  Concept to Production
ML in Proptech - Concept to Production
 
Lessons Learned: Linked Open Data implemented in 2 Use Cases
Lessons Learned: Linked Open Data implemented in 2 Use CasesLessons Learned: Linked Open Data implemented in 2 Use Cases
Lessons Learned: Linked Open Data implemented in 2 Use Cases
 
AI methods for localization in noisy environment
AI methods for localization in noisy environment AI methods for localization in noisy environment
AI methods for localization in noisy environment
 
Object Identification and Detection Hackathon Solution
Object Identification and Detection Hackathon Solution Object Identification and Detection Hackathon Solution
Object Identification and Detection Hackathon Solution
 
Data Science for Open Innovation in SMEs and Large Corporations
Data Science for Open Innovation in SMEs and Large CorporationsData Science for Open Innovation in SMEs and Large Corporations
Data Science for Open Innovation in SMEs and Large Corporations
 
Air Pollution in Sofia - Solution through Data Science by Kiwi team
Air Pollution in Sofia - Solution through Data Science by Kiwi teamAir Pollution in Sofia - Solution through Data Science by Kiwi team
Air Pollution in Sofia - Solution through Data Science by Kiwi team
 
Machine Learning in Astrophysics
Machine Learning in AstrophysicsMachine Learning in Astrophysics
Machine Learning in Astrophysics
 
#AcademiaDatathon Finlists' Solution of Crypto Datathon Case
#AcademiaDatathon Finlists' Solution of Crypto Datathon Case#AcademiaDatathon Finlists' Solution of Crypto Datathon Case
#AcademiaDatathon Finlists' Solution of Crypto Datathon Case
 
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
 
DNA Analytics - What does really goes into Sausages - Datathon2018 Solution
DNA Analytics - What does really goes into Sausages - Datathon2018 SolutionDNA Analytics - What does really goes into Sausages - Datathon2018 Solution
DNA Analytics - What does really goes into Sausages - Datathon2018 Solution
 
Relationships between research tasks and data structure (basic methods and a...
Relationships between research tasks and data structure (basic  methods and a...Relationships between research tasks and data structure (basic  methods and a...
Relationships between research tasks and data structure (basic methods and a...
 
Data science tools - A.Marchev and K.Haralampiev
Data science tools - A.Marchev and K.HaralampievData science tools - A.Marchev and K.Haralampiev
Data science tools - A.Marchev and K.Haralampiev
 
Problems of Application of Machine Learning in the CRM - panel
Problems of Application of Machine Learning in the CRM - panel Problems of Application of Machine Learning in the CRM - panel
Problems of Application of Machine Learning in the CRM - panel
 
Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...
Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...
Disruptive as Usual: New Technologies and Data Value Professor Severino Mereg...
 
Intelligent Question Answering Using the Wisdom of the Crowd, Preslav Nakov
Intelligent Question Answering Using the Wisdom of the Crowd, Preslav NakovIntelligent Question Answering Using the Wisdom of the Crowd, Preslav Nakov
Intelligent Question Answering Using the Wisdom of the Crowd, Preslav Nakov
 
Master class Hristo Hadjitchonev - Aubg
Master class Hristo Hadjitchonev - Aubg Master class Hristo Hadjitchonev - Aubg
Master class Hristo Hadjitchonev - Aubg
 

Kürzlich hochgeladen

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制vexqp
 

Kürzlich hochgeladen (20)

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 

Preslav Nakov - The Web as a Training Set Part 1

  • 1. The Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics Preslav Nakov, Qatar Computing Research Institute, HBKU Data Science Society July 23, 2015 Sofia, Bulgaria
  • 3. The Big Dream (2001: A Space Odyssey) Dave Bowman: “Open the pod bay doors, HAL” HAL 9000: “I’m sorry Dave. I’m afraid I can’t do that.”
  • 7. QA: IBM Watson won Jeopardy!
  • 11. The Turing Test Passed?
  • 17. Turing Test: Not Passed?
  • 18.
  • 19.
  • 20. Brief History of NLP •50s: early optimism •1964: ALPAC report - skeptical of MT so far - basic research in CL needed •90s: statistical revolution - get large text collections - compute statistics
  • 22. Size Matters Banko & Brill: “Scaling to Very, Very Large Corpora for Natural Language Disambiguation”, ACL’2001 • Spelling correction – Which word should we use? <principal> <principle> – In a given context: • The teachers and the principal of 137th school "Angel Kanchev" in Sofia, Bulgaria shared their experience with Jumpido over the last one year. The school is one of the few mentor schools in innovation of Microsoft's Microsoft's "Innovative schools program - Partners in learning". • “Bulgaria plans to carry out ambitious reforms aimed at modernization and at and at keeping the principle of the rule of law,” said President Plevneliev.
  • 23. 25  Log-linear improvement even to a billion words!  Getting more data is better than fine-tuning algorithms! (Banko & Brill, 2001) Great idea! Can it be extended to other tasks? For this problem, one can get a lot of training data. Size Matters: Using Billions of Words (Banko&Brill, ACL 2011)
  • 24. 26 Language Models for SMT at Google: Using Quadrillions (1015) of Words! (Brants&al., EMNLP 2007)
  • 25. 27 The Web as a Baseline • “Web as a baseline” (Lapata & Keller 04;05): n-gram models – machine translation candidate selection – article generation – noun compound interpretation – noun compound bracketing – adjective ordering – spelling correction – countability detection – prepositional phrase attachment • Their conclusion: – The Web should be used as a baseline. Significantly better than the best supervised algorithm. Not significantly different from the best supervised algorithm. These are all UNSUPERVISED! Wecandobetter… (Lapata&Keller, HLT-NAACL 2004, TSLP 2005)
  • 26. 28 The Web as an Implicit Training Set • Much more can be achieved using – surface features – paraphrases – linguistic knowledge • I will demonstrate this on noun compounds (and on some other problems)
  • 27. In Memoriam portrait by Gregory Grefenstette
  • 29. 31 Noun Compound • Def: Sequence of nouns that function as a single noun, e.g. – healthcare reform – plastic water bottle – colon cancer tumor suppressor protein – Korpuslinguistikkonferenz (German) – къща-музей, човекомесец, нармаг (Bulgarian) Three problems: 1. Segmentation 2. Syntax 3. Semantics
  • 30. 32
  • 32. 37 Noun Compound Syntax: The Problem plastic water bottle plastic water bottle OR ? [ plastic [ water bottle ] ] [ [ plastic water ] bottle ] right left water bottle made of plastic bottle containing plastic water
  • 33. 38 Measuring Word Association • Frequencies – Dependency: #(w1,w2) vs. #(w1,w3) – Adjacency: #(w1,w2) vs. #(w2,w3) • Probabilities – Dependency: Pr(w1w2|w2) vs. Pr(w1w3|w3) – Adjacency: Pr(w1w2|w2) vs. Pr(w2w3|w3) • Also: Pointwise Mutual Information, Chi Square, etc. Simple Word-based Models w1 w2 w3 adjacency dependency plastic water bottle
  • 34. 39 Web-derived Surface Features • Observations – Authors often disambiguate noun compounds using surface markers. – The size of the Web makes such markers frequent enough to be useful. • Ideas – Look for instances where the compound occurs with surface markers. – Also try • paraphrases • linguistic knowledge The Web as an Implicit Training Set
  • 35. 40 Web-derived Surface Features: Dash (hyphen) • Left dash – cell-cycle analysis  left • Right dash – donor T-cell  right CoNLL'05: Nakov&Hearst
  • 36. 41 Web-derived Surface Features: Possessive Marker • After the first word – world’s food production  right • After the second word – cell cycle’s analysis  left CoNLL'05: Nakov&Hearst
  • 37. 42 Web-derived Surface Features: Capitalization • don’t-care – lowercase – uppercase – Plasmodium vivax Malaria  left – plasmodium vivax Malaria  left • lowercase – uppercase – don’t-care – tumor Necrosis Factor  right – tumor Necrosis factor  right CoNLL'05: Nakov&Hearst
  • 38. 43 Web-derived Surface Features: Embedded Slash • Left embedded slash – leukemia/lymphoma cell  right CoNLL'05: Nakov&Hearst
  • 39. 44 Web-derived Surface Features: Parentheses • Single word – growth factor (beta)  left – (tumor) necrosis factor  right • Two words – (cell cycle) analysis  left – adult (male rat)  right CoNLL'05: Nakov&Hearst
  • 40. 45 Web-derived Surface Features: Comma,dot,column,semi-column,… • Following the second word – lung cancer: patients  left – health care, provider  left • Following the first word – home. health care  right – adult, male rat  right CoNLL'05: Nakov&Hearst
  • 41. 46 Web-derived Surface Features: Abbreviation • After the second word – tumor necrosis (TN) factor  left • After the third word – tumor necrosis factor (NF)  right CoNLL'05: Nakov&Hearst
  • 42. 47 Web-derived Surface Features: Concatenation Consider “health care reform” • Dependency model – healthcare vs. healthreform • Adjacency model – healthcare vs. carereform • Triples – “healthcare reform” vs. “health carereform” w1 w2 w3 adjacency dependency health care reform CoNLL'05: Nakov&Hearst
  • 43. 48 Web-derived Surface Features: Internal Inflection Variability • First word – bone mineral density – bones mineral density • Second word – bone mineral density – bone minerals density CoNLL'05: Nakov&Hearst
  • 44. 49 Web-derived Surface Features: Switch The First Two Words • Predict right if we can reorder – adult male rat as – male adult rat CoNLL'05: Nakov&Hearst
  • 45. 50 • Prepositional – cells in (the) bone marrow  left (61,700) – cells from (the) bone marrow  left (16,500) – marrow cells from (the) bone  right (12) • Verbal – cells extracted from (the) bone marrow  left (17) – marrow cells found in (the) bone  right (1) • Copula – cells that are bone marrow  left (3) “bone marrow cell”: left or right? Paraphrases CoNLL'05: Nakov&Hearst “left” sum “right” sum compare
  • 46. 51 • Word associations • Surface features and paraphrases Evaluation Results On 244 noun compounds from Grolier’s encyclopedia (Lauer dataset) Size does matter! Using MEDLINE instead of the Web (million times smaller) • 9.43% Coverage (23 out of 244 NCs) • 47.83% Accuracy (12 out of 23 wrong) CoNLL'05: Nakov&Hearst Acc. Cov. Acc. Cov.
  • 49. 54 HLT-ENMLP'05: Nakov&Hearst (a) Peter spent millions of dollars. (noun) (b) Peter spent time with his family. (verb) Can be represented as a quadruple: (v, n1, p, n2) (a) (spent, millions, of, dollars) (b) (spent, time, with, family) • Accuracy – Surface features & paraphrases: 83.63% – Best unsupervised (Lin&Pantel’00): 84.30% Syntactic Application 1: Prepositional Phrase Attachment Human performance:  quadruple: 88%  whole sentence: 93%
  • 50. 55 PP Attachment: n-gram models • (i) Pr(p|n1) vs. Pr(p|v) • (ii) Pr(p,n2|n1) vs. Pr(p,n2|v) – I eat/v spaghetti/n1 with/p a fork/n2. – I eat/v spaghetti/n1 with/p sauce/n2. HLT-ENMLP'05: Nakov&Hearst
  • 51. 56 PP Attachment: Web-derived Surface Features • Example features – open the door / with a key  verb (100.00%, 0.13%) – open the door (with a key)  verb (73.58%, 2.44%) – open the door – with a key verb (68.18%, 2.03%) – open the door , with a key  verb (58.44%, 7.09%) – eat Spaghetti with sauce  noun (100.00%, 0.14%) – eat ? spaghetti with sauce noun (83.33%, 0.55%) – eat , spaghetti with sauce  noun (65.77%, 5.11%) – eat : spaghetti with sauce  noun (64.71%, 1.57%) Acc Cov sumsum compare HLT-ENMLP'05: Nakov&Hearst
  • 52. 57 Paraphrases v n1 p n2 • v n2 n1 (noun) • v p n2 n1 (verb) • p n2 * v n1 (verb) • n1 p n2 v (noun) • v PRONOUN p n2 (verb) • BE n1 p n2 (noun)
  • 53. 58 PP Attachment Paraphrases (1) (1) v n1 p n2  v n2 n1 (noun) • Turn “n1 p n2” into a noun compound “n2 n1” – meet/v demands/n1 from/p customers/n2  meet/v the customer/n2 demands/n1 HLT-ENMLP'05: Nakov&Hearst
  • 54. 59 PP Attachment Paraphrases (2) (2) v n1 p n2  v p n2 n1 (verb) • Swap direct and indirect objects: – had/v a program/n1 in/p place/n2  had/v in/p place/n2 a program/n1 HLT-ENMLP'05: Nakov&Hearst
  • 55. 60 PP Attachment Paraphrases (3) (3) v n1 p n2  p n2 * v n1 (verb) • Look for apposition of “p n2” – I gave/v an apple/n1 to/p him/n2  (It was) to/p him/n2 (that) I gave/v an apple/n1 HLT-ENMLP'05: Nakov&Hearst
  • 56. 61 PP Attachment Paraphrases (4) (4) v n1 p n2  n1 p n2 v (noun) • Look for apposition of “n1 p n2” – shaken/v confidence/n1 in/p markets/n2  confidence/n1 in/p markets/n2 shaken/v HLT-ENMLP'05: Nakov&Hearst
  • 57. 62 PP Attachment Paraphrases (5) (5) v n1 p n2  v PRONOUN p n2 (verb) • Substitute n1 with a pronoun (him, her) – put/v a client/n1 at/p odds/n2  put/v him at/p odds/n2 pronoun HLT-ENMLP'05: Nakov&Hearst
  • 58. 63 PP Attachment Paraphrases (6) (6) v n1 p n2  BE n1 p n2 (noun) • Substitute v with is/are/was/were, e.g. – eat/v spaghetti/n1 with/p sauce/n2  – is spaghetti/n1 with/p sauce/n2 to be HLT-ENMLP'05: Nakov&Hearst
  • 59. 64 Syntactic Application 2: Noun Compound Coordination & Ellipsis
  • 60. 65 Syntactic Application 2: Noun Compound Coordination & Ellipsis • Penn Treebank – ellipsis: (NP bar/NN and/CC pie/NN graph/NN) – no ellipsis: (NP (NP president/NN) and/CC (NP chief/NN executive/NN)) • Accuracy – Surface features & paraphrases: 80.61% HLT-ENMLP'05: Nakov&Hearst Real-world coordinations can be more complex: The Department of Chronic Diseases and Health Promotion leads and strengthens global efforts to prevent and control chronic diseases or disabilities and to promote health and quality of life.
  • 61. 66 NP Coordination: N-gram models (n1,c,n2,h) • (i) #(n1,h) vs. #(n2,h) • (ii) #(n1,h) vs. #(n1,c,n2) HLT-ENMLP'05: Nakov&Hearst
  • 62. 67 NP Coordination: Surface Featuressumsum compare HLT-ENMLP'05: Nakov&Hearst
  • 63. 68 Paraphrases n1 c n2 h • n2 c n1 h (ellipsis) • n2 h c n1 (NO ellipsis) • n1 h c n2 h (ellipsis) • n2 h c n1 h (ellipsis)
  • 64. 69 NP Coordination Paraphrases (1) (1) n1 c n2 h  n2 c n1 h (ellipsis) • Swap n1 and n2 – bar/n1 and/c pie/n2 graph/h  pie/n2 and/c bar/n1 graph/h HLT-ENMLP'05: Nakov&Hearst
  • 65. 70 NP Coordination Paraphrases (2) (2) n1 c n2 h  n2 h c n1 (NO ellipsis) • Swap n1 and n2 h – president/n1 and/c chief/n2 executive/h  chief/n2 executive/h and/c president/n1 HLT-ENMLP'05: Nakov&Hearst
  • 66. 71 NP Coordination Paraphrases (3) (3) n1 c n2 h  n1 h c n2 h (ellipsis) • Insert the elided head h – bar/n1 and/c pie/n2 graph/h  – bar/n1 graph/h and/c pie/n2 graph/h h HLT-ENMLP'05: Nakov&Hearst
  • 67. 72 NP Coordination Paraphrases (4) (4) n1 c n2 h  n2 h c n1 h (ellipsis) • Insert the head h; also switch n1 and n2 – bar/n1 and/c pie/n2 graph/h  – pie/n2 graph/h and/c bar/n1 graph/h h HLT-ENMLP'05: Nakov&Hearst
  • 69. 74 More Applications (1): Bracketing the Penn Treebank NPs • Augmenting the Penn Treebank ACL’07: Adding Noun Phrase Structure to the Penn Treebank David Vadas and James R. Curran (Vadas&Curran, ACL 2007)
  • 70. 75 More Applications (1): Bracketing the Penn Treebank NPs (Vadas&Curran, ACL 2007)
  • 71. 76 More Applications (1): Bracketing the Penn Treebank NPs  Constituency Parsing  0.5% drop in F1  But  useful for QA, etc.  helps fix the CCG-bank (Vadas&Curran, ACL 2007)
  • 72. 77 More Applications (2): Search Engine Query Segmentation • Query Segmentation – [ tumor suppressor protein ] – [ tumor suppressor ] [ protein ] – [ tumor ] [ suppressor protein ] – [ tumor ] [ suppressor ] [ protein ] • Bracketing [ [ tumor suppressor ] protein ] [ tumor [ suppressor protein ] ] EMNLP'07: Learning Noun Phrase Query Segmentation Shane Bergsma and Qin Iris Wang (Bergsma and Wang, EMNLP 2007)
  • 73. 78  Learn features for  all head-argument structures  individual attachments (not competing pairs)  Generalize over POS  Use Google 1T 5-grams instead of Web  Error reduction  dependency parser: 7% (MSTParser)  constituency parser 9.2% (Berkeley parser)  re-ranker: 3.4% More Applications (3): Full Syntactic Parsing ACL’11: Web-Scale Features for Full-Scale Parsing Mohit Bansal and Dan Klein (Bansal & Klein, ACL 2011)
  • 74. 79 More Applications (3): Full Syntactic Parsing ACL’11: Web-Scale Features for Full-Scale Parsing Mohit Bansal and Dan Klein (Bansal & Klein, ACL 2011)
  • 75. 80 More Applications (3): Full Syntactic Parsing For constituency parsing, improvement is due to  40% affinity  60% paraphrases ACL’11: Web-Scale Features for Full-Scale Parsing Mohit Bansal and Dan Klein (Bansal & Klein, ACL 2011)
  • 77. 82 Noun Compound Semantics • Typically, choose one abstract relation – Fixed set of abstract relations (Girju&al.,2005) • malaria mosquito  CAUSE • olive oil  SOURCE – Prepositions (Lauer,1995) • malaria mosquito  with • olive oil  from • Proposed approach: use multiple paraphrasing verbs – Paraphrasing verbs • malaria mosquito  carries, spreads, causes, transmits, brings, has • olive oil  comes from, is obtained from, is extracted from – Distribution over paraphrasing verbs ACL'08: Nakov&Hearst
  • 78. 83 Extracting Paraphrasing Verbs Using a Linguistic Paraphrasing Pattern • Given “malaria mosquito”, query Google for “mosquito THAT * malaria“ • Extract verbs 23 carry 16 spread 12 cause 9 transmit 7 bring 4 have 3 be infected with 3 infect with 2 give post-modifier pre-modifier
  • 79. 84 Comparing to Girju&al. (2005) shown are 14 out of 21 relations
  • 80. 85 Amazon’s Mechanical Turk: Malaria Mosquito • 10 human judges: – 8 carries – 4 causes – 2 transmits – 2 is infected with – 2 infects with – 1 has – 1 gives – 1 spreads – 1 supplies – …  The program: 23 carry 16 spread 12 cause 9 transmit 7 bring 4 have 3 be infected with 3 infect with 2 give On 250 noun-noun compounds and 25-30 human judges: 32% cosine correlation SemEval-2010 task 9: The Interpretation of Noun Compounds Using Paraphrasing Verbs and Prepositions C. Butnariu, Su Nam Kim, P. Nakov, D. Ó Séaghdha, S. Szpakowicz, T. Veale
  • 81. 86 Relational Componential Analysis • Classic componential analysis • Relational componential analysis
  • 82. 87 Noun Compound Semantics Using Abstract Relations [colon cancer] [[tumor suppressor] protein] ABSTRACT RELATIONS: [ [colon cancer]/LOCATION [ [tumor suppressor]/PURPOSE protein]/AGENT ]/LOCATION
  • 83. 88 Noun Compound Semantics Using Prepositional Paraphrases [colon cancer] [[tumor suppressor] protein] PREPOSITIONS: { {protein that is a {suppressor of tumors} } in {cancer of/in the colon} }
  • 84. 89 Noun Compound Semantics Using Paraphrasing Verbs [colon cancer] [[tumor suppressor] protein] VERBS: { {protein that acts as a {suppressor that inhibits tumors} } which is implicated in {cancer that occurs in the colon} } prevent/stop/keep from developing/growing/arising
  • 85. 90 Free Paraphrasing of Noun Compounds: Going Beyond Verbs and Prepositions “onion tears” tears from onions tears due to cutting onion tears induced when cutting onions tears that onions induce tears that come from chopping onions tears that sometimes flow when onions are chopped tears that raw onions give you SemEval-2013 task 4: Free Paraphrases of Noun Compounds C. Butnariu, I. Hendrickx, S. N. Kim, Z. Kozareva, P. Nakov, D. Ó Séaghdha, S. Szpakowicz, T. Veale
  • 87. 92 The V+P+C Semantic Vector • For “noun1 noun2”, query: "noun2 * noun1" "noun1 * noun2" • Extract: – V: verbs – P: prepositions – C: coordinating conjunctions committee member
  • 88. 93 Semantic Application 1: Predicting Levi’s 12 Recoverably Deletable Predicates • Accuracy (212 noun-noun compounds) – V+P+C: 50.0%±6.7% (baseline: 19.6%) ACL'08: Nakov&Hearst
  • 89. 94 Semantic Application 2: SAT Analogy Questions • Accuracy (174 noun:noun examples) – LRA : 67.4%±7.1% – V+P+C : 71.3%±7.0% ACL'08: Nakov&Hearst
  • 90. 95 Semantic Application 3: Relations Between Complex Nominals • Accuracy – Task #4 winner : 66.0% – V+P+C : 67.0% – + web-based argument generalization : 71.3% SemEval-2007 task 4:Classification of semantic relations between nominals R. Girju, P. Nakov, V. Nastase, S. Szpakowicz, P. Turney, D. Yuret Follow-up: SemEval-2010 task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals I. Hendrickx, Su Nam Kim, Z. Kozareva, P. Nakov, D. Ó Séaghdha, S. Padó, M. Pennacchiotti, L. Romano, S. Szpakowicz ACL'08: Nakov&Hearst RANLP‘11: Nakov&Kozareva
  • 91. 96 Semantic Application 4: 30 Head-Modifier Relations • Accuracy (600 examples) – LRA : 39.8%±3.8% – V+P+C : 40.5%±3.9% ACL'08: Nakov&Hearst
  • 92. 97 “WTO Geneva headquarters” = “headquarters of the WTO are located in Geneva” (1) Geneva headquarters of the WTO (2) WTO headquarters are located in Geneva Semantic Application 5: Textual Entailment
  • 95. 100 Noun Compounds in Phrase-based SMT English  After the oil price hikes of 1974 and 1980 , Japan's economy recovered through export growth . Spanish Después de las alzas en los precios del petróleo de 1974 y 1980 , la economía nipona se recuperó a través del crecimiento basado en las exportaciones . Idea: paraphrase the source phrase to increase coverage oil price hikes  alzas en los precios del petróleo hikes in oil prices  alzas en los precios del petróleo hikes in prices of oil  alzas en los precios del petróleo hikes in prices for oil  alzas en los precios del petróleo hikes in the prices of oil  alzas en los precios del petróleo hikes in the prices for oil  alzas en los precios del petróleo
  • 96. 101 Pair each new sentence with the original translation, thus generating a synthetic corpus. Train an SMT system on it. Paraphrasing a Source-Language Sentence Improvement: equivalent to 33-50% of what could be achieved by doubling the amount of training data. ECAI'08: Nakov
  • 98. 106 Summary • Syntactic Tasks – Noun Compound Syntax – Prepositional Phrase Attachment – Noun Compound Coordination – Full syntactic parsing, etc. • Semantic Tasks – Noun Compound Semantics – Predicting • Abstract Semantic Relations • Relations Between Complex Nominals • Head-Modifier Relations – Solving SAT Analogy Problems • Application to a Real-World Task – Machine Translation Tapped the potential of very large corpora for corpus linguistics by going beyond the n-gram: • surface markers • paraphrases • linguistic knowledge
  • 99. 108 References • (Banko&Brill, ACL 2011) Michele Banko and Eric Brill. Scaling to Very, Very Large Corpora for Natural Language Disambiguation, In Proceedings of ACL, 2001. • (Bansal&Klein, ACL 2011) Mohit Bansal and Dan Klein. Web-Scale Features for Full- Scale Parsing. In Proceedings of ACL, 2011. • (Bergsma and Wang, EMNLP 2007) Shane Bergsma and Qin Iris Wang. Learning Noun Phrase Query Segmentation. In Proceedings of EMNLP, 2007. • (Brants&al., EMNLP 2007) Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. Large Language Models in Machine Translation. In Proceedings of EMNLP, 2007. • (Butnariu&Veale, COLING 2008) Cristina Butnariu and Tony Veale. A concept-centered approach to noun-compound interpretation. In Proceedings of COLING, 2008. • (Butnariu&al., SemEval 2010) Cristina Butnariu, Su Nam Kim, Preslav Nakov, Diarmuid and O Seaghdha, Stan Szpakowicz, and Tony Veale. SemEval-2010 Task 9: The Interpretation of Noun Compounds Using Paraphrasing Verbs and Prepositions. In Proceedings of SemEval, 2010.
  • 100. 109 References • (Finin, 1980) Timothy Finin. The semantic interpretation of nominal compounds. In Proceedings of the 1st National Conference on Artificial Intelligence, 1980. • (Girju&al., CSL 2005) Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe. On the semantics of noun compounds. Computer Speech & Language 19(4): 479-496, 2005. • (Girju&al., ACL 2007) Roxana Girju. Improving the Interpretation of Noun Phrases with Cross-linguistic Information. In Proceedings of ACL, 2007. • (Girju&al., SemEval 2007) Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, Deniz Yuret.. SemEval-2007 Task 04: Classification of Semantic Relations between Nominals. In Proceedings of SemEval, 2007. • (Hendrickx&al., SemEval 2010) Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals. In Proceedings of SemEval, 2010. • (Hendrickx&al., SemEval 2013) Iris Hendrickx, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Stan Szpakowicz, and Tony Veale. SemEval-2013 Task 4: Free Paraphrases of Noun Compounds. Proceedings of SemEval, 2013.
  • 101. 110 References • (Kim&Baldwin, 2005) Su Nam Kim and Timothy Baldwin. Automatic Interpretation of noun compounds using WordNet::Similarity. In Proceedings of the 2nd International Joint Conference on Natural Language Processing, Jeju Island, South Korea, 2005. • (Kim&Baldwin, COLING 2006) Su Nam Kim and Timothy Baldwin. Interpreting semantic relations in noun compounds via verb semantics. In Proceedings of COLING-ACL, 2006. • (Kim&Nakov, EMNLP 2011) Su Nam Kim, Preslav Nakov. Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus. In Proceedings of EMNLP, 2011. • (Lapata&Keller, HLT-NAACL 2004) Mirella Lapata and Frank Keller. 2004. The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks. In Proceedings of HLT-NAACL, 2004. • (Lapata&Keller, TSLP 2005) Mirella Lapata and Frank Keller. Web-based Models for Natural Language Processing. ACM Transactions on Speech and Language Processing 2:1, 1-31, 2005. • (Lauer, ACL 1995) Mark Lauer. Corpus Statistics Meet The Compound Noun: Some Empirical Results. In Proceedings of ACL, 1995.
  • 102. 111 References • (Levi, 1978) Judith N. Levi. The Syntax and Semantics of Complex Nominals. Academic Press, New York, 1978. • (Nakov, 2007 PhD thesis) Preslav Nakov. Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics. PhD thesis. EECS Department, University of California, Berkeley. Technical Report No. UCB/EECS-2007-173. 2007. • (Nakov, AIMSA 2008) Preslav Nakov. Noun Compound Interpretation Using Paraphrasing Verbs: Feasibility Study. In Proceedings of AIMSA, 2008. • (Nakov, NLE 2013) Preslav Nakov. On the Interpretation of Noun Compounds: Syntax, Semantics, and Entailment. Natural Language Engineering, 19(3):192-330, 2013. • (Nakov&Hearst, CoNLL 2005) Preslav Nakov and Marti Hearst. Search Engine Statistics Beyond the n-gram: Application to Noun Compound Bracketing. In Proceedings of CoNLL, 2005. • (Nakov&Hearst, HLT-EMNLP 2005) Preslav Nakov and Marti Hearst. Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution. In Proceedings of HLT/EMNLP, 2005.
  • 103. 112 References • (Nakov&Hearst, AIMSA 2006) Preslav Nakov, Marti A. Hearst. Using Verbs to Characterize Noun-Noun Relations. In Proceedings of AIMSA, 2006. • (Nakov&Hearst, ACL 2008) Preslav Nakov and Marti Hearst. Solving Relational Similarity Problems Using the Web as a Corpus. In Proceedings of ACL, 2008. • (Nakov&Hearst, TSLP 2013) Preslav Nakov and Marti Hearst. Semantic Interpretation of Noun Compounds Using Verbal and Other Paraphrases. ACM Transactions on Speech and Language Processing, 10(3):13:1-13:51, 2013. • (Nakov&Kozareva, RANLP 2011) Preslav Nakov and Zornitsa Kozareva. Combining Relational and Attributional Similarity for Semantic Relation Classification. In Proceedings of RANLP, 2011. • (Nastase & Szpakowicz, 2003) Vivi Nastase and Stan Szpakowicz. Exploring noun- modifier semantic relations. In Proceedings of the 6th International Workshop on Computational Semantics, Tilburg, The Netherlands, 2003. • (Pantel&Lin, ACL 2000) Patrick Pantel and Dekang Lin. 2000. An unsupervised approach to prepositional phrase attachment using contextually similar words. In Proceedings of ACL, 2000.
  • 104. 113 References • (Resnik, 1993 PhD thesis) Philip Resnik. Selection and Information: A Class-Based Approach to Lexical Relationships, Doctoral Dissertation, Department of Computer and Information Science, University of Pennsylvania, 1993. • (Vadas&Curran, ACL 2007) David Vadas and James R. Curran. Adding Noun Phrase Structure to the Penn Treebank. In Proceedings of ACL, 2007. • (Vanderwende, 1994) Lucy Vanderwende. Algorithm for the automatic interpretation of noun sequences. In Proceedings of COLING, 1994.
  • 105. 114 Acknowledgments • Some slides adapted from Marti Hearst, Dan Klein, Chris Manning, and others