2. Abstract
• Target:
– Rapid construction of a concept and relation
extraction system
• Method:
– Extend an existing ACE system to new relations
– in a short time with minimal training data
• in a week (<50 person-hours) with <20 example pairs
– Evaluate via a question answering task
3. Phases
1. Ontology and resources
2. Extending system for new ontology
3. Extracting relations
4. Evaluation
4. 1. Ontology and resources
• possibleTreatment( Substance, Condition )
– SSRIs(S) are effective treatments for depression(C)
• expectedDateOnMarket( Substance, Date )
– More drugs for type 2(S) expected on market soon(D)
• responsibleForTreatment( Substance, Agent )
– Officials(A) Responsible for Treatment of War Dead(S)
• studiesDisease( Agent, Condition ) (not sure)
– cancer(C) researcher Dr. Henri Joyeux(A)
• hasSideEffect( Substance, Condition )
5. 2. Extending system for new ontology
• Add new relation/class detectors to “our”
existing ACE extraction system
– Details of the system are not made clear...
• Class detectors with unsupervised word clustering
• Bootstrap relation learner with a template and seeds
• Pattern learning for relation extraction
• Annotate words for 4 classes
• Coreference
6. Bootstrap relation learner
• DAP(Double-Anchored Pattern) (Kozareva+ 08)
– Web search with a query based on “<CLASS>
such as <SEED> and *”
– Add the word found at the “*” position in each
snippet to the class members as a new seed
– Repeat the bootstrapping loop while new seeds
remain
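The loop above can be sketched in Python. This is a toy rendering, not the paper's implementation: `fake_search` and its canned snippets stand in for a real web search API.

```python
import re

def dap_bootstrap(class_name, seeds, search, max_rounds=3):
    """Double-Anchored Pattern bootstrapping (Kozareva+ 08), toy version.

    Repeatedly queries '<CLASS> such as <SEED> and *', harvests the word
    at the '*' position in each snippet, and adds it as a new seed.
    """
    members = set(seeds)
    frontier = list(seeds)
    for _ in range(max_rounds):
        if not frontier:
            break  # stop when no unexpanded seeds remain
        next_frontier = []
        for seed in frontier:
            query = f"{class_name} such as {seed} and"
            for snippet in search(query):
                # The word right after the query string fills the '*' slot.
                m = re.search(re.escape(query) + r"\s+(\w+)", snippet)
                if m and m.group(1) not in members:
                    members.add(m.group(1))
                    next_frontier.append(m.group(1))
        frontier = next_frontier
    return members

# Canned "search results" standing in for real web snippets.
def fake_search(query):
    snippets = {
        "disease such as cold and": [
            "disease such as cold and flu (9). ...",
            "disease such as cold and pneumonia. ...",
        ],
        "disease such as flu and": ["disease such as flu and measles ..."],
    }
    return snippets.get(query, [])

print(sorted(dap_bootstrap("disease", ["cold"], fake_search)))
# → ['cold', 'flu', 'measles', 'pneumonia']
```

Note how `flu`, once harvested, is itself used as a seed in the next round; the slide 8 snippets below also show why filtering matters (noisy candidates like "heat").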
7. Relation detection with DAP
• CLASS = disease / SEED = cold
• Web search = “disease such as cold and”
8. Relation detection with DAP
– disease such as cold and flu (9). ...
– disease such as cold and heat, external ...
– disease such as cold and pneumonia. ...
– disease (such as cold and hot diseases), ...
– disease such as cold and flu viruses. ...
– disease such as cold and food poisoning. ...
9. Four classes to annotate
• Substance-Name
– medicine name
• Substance-Description
– e.g. “new drugs”
• Condition-Name
– name of disease
• Condition-Description
– e.g. “the illness”
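As an illustration of what such an annotation might look like: the paper does not specify a tag scheme, so a BIO encoding over the four classes is assumed here.

```python
# Illustrative only: BIO encoding is an assumption, not the paper's scheme.
tokens = ["Aspirin", "may", "relieve", "the", "illness", "."]
tags   = ["B-Substance-Name", "O", "O",
          "B-Condition-Description", "I-Condition-Description", "O"]

# One tag per token; names and descriptions get distinct labels.
assert len(tokens) == len(tags)
```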
10. Annotation
• Name tagging with active learning (Miller+ 04)
– Unsupervised word clustering on binary tree
(Brown+ 90)
– Tagging with clustering information
• Averaged Perceptron (Collins 02)
– Request annotation for selected sentences based on
a “confidence score”
• score = (highest perceptron score) - (second-highest score)
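The margin-based selection described above can be sketched as follows. The data structures and function names are invented for illustration, assuming each sentence carries perceptron scores for at least two candidate taggings.

```python
def selection_margin(scores):
    """Confidence = best perceptron score minus the runner-up.

    `scores` holds the perceptron score of each candidate tagging for one
    sentence (at least two); a small margin means the model is unsure.
    """
    top, second = sorted(scores, reverse=True)[:2]
    return top - second

def pick_for_annotation(sentences, k=1):
    """Request human labels for the k sentences with the smallest margin."""
    return sorted(sentences, key=lambda s: selection_margin(s["scores"]))[:k]

batch = [
    {"id": 1, "scores": [4.0, 1.0]},   # confident: margin 3.0
    {"id": 2, "scores": [2.1, 2.0]},   # uncertain: margin 0.1
]
print(pick_for_annotation(batch)[0]["id"])  # → 2, the low-margin sentence
```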
11. Results of Class Detection
• Results table from [Freedman+ 11]
– What’s GS (Gold Standard)?
• substances & conditions
– -Name / -Description respectively
• without/with lists of known substances and conditions
12. Coreference
• It took the most time (20 of 43 hours)
• But its details are not clear...
– domain independent heuristics
– appositive linking
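The paper only names “appositive linking” as one of the heuristics; below is one assumed, domain-independent rendering of it as a regular expression, not the system's actual rule.

```python
import re

# Assumed heuristic: "<the description>, <Name>," links a proper-name
# mention to the preceding description mention.
APPOSITIVE = re.compile(
    r"\b(the [a-z]+ [a-z]+), ([A-Z][\w.]+(?: [A-Z][\w.]+)*),"
)

def appositive_links(text):
    """Return (name, description) pairs found by the appositive pattern."""
    return [(m.group(2), m.group(1)) for m in APPOSITIVE.finditer(text)]

print(appositive_links(
    "Trials by the cancer researcher, Henri Joyeux, continue."
))  # → [('Henri Joyeux', 'the cancer researcher')]
```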
15. 4. Evaluation
• Question Answering with the extracted
information
• Query examples
– Find possible treatments for diabetes
– What is the expected date to market for Abilify?
16. Answer Example
• ACME produces a wide range of drugs
including treatments for malaria and
athlete’s foot
– responsibleForTreatment(drugs, ACME)
– possibleTreatment(drugs, malaria)
– possibleTreatment(drugs, athlete’s foot)
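A toy sketch of how the QA step might look up answers in the extracted tuples; the storage and query code is invented for illustration, with the facts mirroring the example above.

```python
# Toy knowledge base built from the extraction example above.
facts = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athlete's foot"),
]

def query(relation, arg1=None, arg2=None):
    """Return (arg1, arg2) tuples matching the relation; None is a wildcard."""
    return [(a, b) for r, a, b in facts
            if r == relation
            and (arg1 is None or a == arg1)
            and (arg2 is None or b == arg2)]

# "Find possible treatments for malaria"
print(query("possibleTreatment", arg2="malaria"))  # → [('drugs', 'malaria')]
```

The example also shows why answers like `(drugs, malaria)` count as non-useful: the first argument is only the description “drugs”, not a concrete substance name.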
18. When non-useful answers are removed
from [Freedman+ 11]
• annotator’s recall (A)
• using both combined (C)
• using only handwritten rules (H, HW)
• using only learned patterns (L)
21. Conclusions
• The combined system can achieve an
F1 of 0.51 in a new domain in a week.
• It requires very little training data.
• The learning algorithms are still not
competitive with handwritten patterns.
22. References
• [Freedman+ 11] Extreme Extraction – Machine
Reading in a Week
• [Kozareva+ 08] Semantic Class Learning from the
Web with Hyponym Pattern Linkage Graphs
• [Miller+ 04] Name Tagging with Word Clusters and
Discriminative Training
• [Brown+ 90] Class-based n-gram models of natural
language
• [Collins 02] Discriminative Training Methods for Hidden
Markov Models: Theory and Experiments with Perceptron
Algorithms