[Freedman+ EMNLP11] Extreme Extraction – Machine Reading in a Week

                23 Dec 2011
      Nakatani Shuyo @ Cybozu labs, Inc
               twitter : @shuyo
Abstract
• Target:
  – Rapid construction of concept and relation
    extraction system
• Method:
  – Extend an existing ACE system to new relations
  – in a short time with minimal training data
     • within a week (<50 person-hours) with <20 example pairs
  – Evaluate with a question answering task
Phases
1. Ontology and resources
2. Extending system for new ontology
3. Extracting relations
4. Evaluation
1. Ontology and resources
• possibleTreatment( Substance, Condition )
   – SSRIs(S) are effective treatments for depression(C)
• expectedDateOnMarket( Substance , Date )
   – More drugs for type 2(S) expected on market soon(D)
• responsibleForTreatment( Substance, Agent )
   – Officials(A) Responsible for Treatment of War Dead(S)
• studiesDisease( Agent , Condition )  (not sure)
   – cancer(C) researcher Dr. Henri Joyeux(A)
• hasSideEffect( Substance, Condition )
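The five relations above are typed: each argument must belong to a specific class. Below is a minimal sketch of that ontology as Python data types (Python 3.9+). The argument signatures are copied from the slide. The class names `Mention` and `Relation` and the type check in `__post_init__` are illustrative assumptions, not the paper's representation.

```python
from dataclasses import dataclass

# Argument classes for each target relation, as listed on the slide.
SIGNATURES: dict[str, tuple[str, str]] = {
    "possibleTreatment": ("Substance", "Condition"),
    "expectedDateOnMarket": ("Substance", "Date"),
    "responsibleForTreatment": ("Substance", "Agent"),
    "studiesDisease": ("Agent", "Condition"),
    "hasSideEffect": ("Substance", "Condition"),
}


@dataclass(frozen=True)
class Mention:
    text: str
    cls: str  # one of "Substance", "Condition", "Agent", "Date"


@dataclass(frozen=True)
class Relation:
    name: str
    arg1: Mention
    arg2: Mention

    def __post_init__(self) -> None:
        # Reject a relation whose argument classes do not match its signature.
        expected = SIGNATURES[self.name]
        actual = (self.arg1.cls, self.arg2.cls)
        if actual != expected:
            raise ValueError(f"{self.name} expects {expected}, got {actual}")


# "SSRIs(S) are effective treatments for depression(C)"
r = Relation("possibleTreatment",
             Mention("SSRIs", "Substance"),
             Mention("depression", "Condition"))
```

Swapping the arguments (Condition first) raises `ValueError`. A type check like this is one cheap way to filter extraction output before it reaches question answering.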
2. Extending system for new ontology
• Add new relation/class detectors to “our”
  extraction system for the ACE task
  – The paper does not describe the system in detail...
     • Class detectors with unsupervised word clustering
     • Bootstrap relation learner with a template and seeds
     • Pattern learning for relation extraction

• Annotate words for 4 classes
• Coreference
Bootstrap relation learner
• DAP (Double-Anchored Pattern) (Kozareva+ 08)
  – Web search with a query based on “<CLASS>
    such as <SEED> and *”
  – Add the words found at the “*” position in the
    returned snippets to the class members as new seeds
  – Repeat this “bootstrapping loop” while new seeds
    are available
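The bootstrapping loop described above can be sketched as follows, assuming Python 3.9+. No real search engine is called. `search` is a hypothetical callable that returns snippets for a query string, and `fake_search` stands in for it over a tiny made-up corpus. This simplified loop accepts every candidate. The original DAP work additionally ranks candidates, using a hyponym pattern linkage graph, and that ranking step is omitted here.

```python
import re
from collections.abc import Callable, Iterable
from typing import Optional


def dap_query(cls: str, seed: str) -> str:
    # Double-anchored pattern "<CLASS> such as <SEED> and *", without the "*".
    return f"{cls} such as {seed} and"


def extract_star(query: str, snippet: str) -> Optional[str]:
    # The word immediately following the query text fills the "*" slot.
    m = re.search(re.escape(query) + r"\s+([A-Za-z][\w-]*)", snippet, re.IGNORECASE)
    return m.group(1).lower() if m else None


def bootstrap(cls: str, seeds: Iterable[str],
              search: Callable[[str], list[str]], max_iter: int = 10) -> set[str]:
    members = set(seeds)
    frontier = list(members)
    for _ in range(max_iter):
        if not frontier:
            break  # no unexplored seeds left
        new_seeds = []
        for seed in frontier:
            query = dap_query(cls, seed)
            for snippet in search(query):
                word = extract_star(query, snippet)
                if word and word not in members:
                    members.add(word)
                    new_seeds.append(word)
        frontier = new_seeds  # only newly found words are queried next
    return members


# Hypothetical stand-in for web search over a toy corpus.
CORPUS = [
    "disease such as cold and flu are common",
    "disease such as flu and pneumonia can be serious",
    "disease such as pneumonia and tuberculosis ...",
]


def fake_search(query: str) -> list[str]:
    return [s for s in CORPUS if query.lower() in s.lower()]


print(sorted(bootstrap("disease", ["cold"], fake_search)))
```

Starting from the single seed "cold", the loop discovers "flu", then "pneumonia", then "tuberculosis", and stops when a round yields no new seeds.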
Relation detection with DAP
• CLASS = disease / SEED = cold
• Web search = “disease such as cold and”
  – disease such as cold and flu (9). ...
  – disease such as cold and heat, external ...
  – disease such as cold and pneumonia. ...
  – disease (such as cold and hot diseases), ...
  – disease such as cold and flu viruses. ...
  – disease such as cold and food poisoning. ...
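The snippets above show why candidates need filtering. Below is a sketch that counts the single word filling the "*" slot in exactly these six snippets. The regex deliberately omits the leading "disease" so that the parenthesized fourth snippet still matches. A one-word capture yields "hot" (from "hot diseases") and "food" (from "food poisoning"), which shows how noisy naive extraction is.

```python
import re
from collections import Counter

# The six snippets listed on the slide (trailing "..." kept as-is).
SNIPPETS = [
    "disease such as cold and flu (9). ...",
    "disease such as cold and heat, external ...",
    "disease such as cold and pneumonia. ...",
    "disease (such as cold and hot diseases), ...",
    "disease such as cold and flu viruses. ...",
    "disease such as cold and food poisoning. ...",
]

# Capture the single word right after "such as cold and".
STAR = re.compile(r"such as cold and\s+(\w+)", re.IGNORECASE)


def star_candidates(snippets: list[str]) -> Counter:
    counts: Counter = Counter()
    for s in snippets:
        m = STAR.search(s)
        if m:
            counts[m.group(1).lower()] += 1
    return counts


print(star_candidates(SNIPPETS).most_common())
# [('flu', 2), ('heat', 1), ('pneumonia', 1), ('hot', 1), ('food', 1)]
```

Frequency alone promotes "flu" but cannot separate "pneumonia" from "heat" or "hot". This is why some ranking or validation step is needed before adding words as new seeds.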
Four classes to annotate
• Substance-Name
  – medicine name
• Substance-Description
  – e.g. “new drugs”
• Condition-Name
  – disease name
• Condition-Description
  – e.g. “the illness”
Annotation
• Name tagging with active learning (Miller+ 04)
  – Unsupervised hierarchical word clustering into
    a binary tree (Brown+ 90)
  – Tagging with cluster information as features
     • Averaged Perceptron (Collins 02)

  – Request annotation for sentences selected by a
    “confidence score”
     • score = (highest perceptron score) - (second-highest score)  !?
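The confidence score above is a margin: the gap between the best and second-best perceptron scores. The sketch below assumes that the sentences with the smallest margin, where the tagger is least certain, are the ones sent to the annotator. It also assumes that each sentence comes with scores for at least two candidate taggings. The names `confidence`, `select_for_annotation` and the `pool` data are hypothetical.

```python
from collections.abc import Sequence


def confidence(scores: Sequence[float]) -> float:
    # Margin between the highest and second-highest perceptron scores.
    # Assumes at least two candidate scores per sentence.
    top = sorted(scores, reverse=True)
    return top[0] - top[1]


def select_for_annotation(pool: dict[str, Sequence[float]], k: int) -> list[str]:
    # Smallest margin first: these are the sentences the tagger is least sure about.
    return sorted(pool, key=lambda sent: confidence(pool[sent]))[:k]


# Hypothetical candidate-tagging scores per unlabeled sentence.
pool = {
    "s1": [5.0, 4.8, 1.0],  # margin 0.2: very uncertain
    "s2": [9.0, 2.0],       # margin 7.0: confident
    "s3": [3.0, 2.0],       # margin 1.0
}
print(select_for_annotation(pool, 2))  # ['s1', 's3']
```

A small margin means two competing taggings are nearly tied. Labeling such a sentence is expected to be more informative than labeling one the model already tags confidently.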
Results of Class Detection
[Table from [Freedman+ 11]]
• What does GS (Gold Standard) refer to here?
• Substances & conditions
   – -Name / -Description, respectively
• Without / with lists of known substances and conditions
Coreference
• It took the most time (20 of 43 hours)
• But the paper gives few details:
  – domain-independent heuristics
  – appositive linking
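The paper does not describe its appositive linking, so the following is only a toy illustration of the general idea and not the authors' heuristic. In a pattern of the form "NAME, a/an/the DESCRIPTION,", the capitalized name and the description are linked as coreferent. The regex and the function name `appositive_links` are hypothetical. The rule is crude: for example, it will also absorb a capitalized sentence-initial word into the name.

```python
import re

# "NAME, a/an/the DESCRIPTION,"  ->  link NAME and DESCRIPTION as coreferent.
APPOSITIVE = re.compile(
    r"(?P<name>(?:[A-Z][\w.]*\s)*[A-Z][\w.]*),\s+"  # capitalized token sequence
    r"(?P<desc>(?:a|an|the)\s+[^,]+),"              # determiner-led description
)


def appositive_links(sentence: str) -> list[tuple[str, str]]:
    return [(m.group("name"), m.group("desc")) for m in APPOSITIVE.finditer(sentence)]


print(appositive_links("Dr. Henri Joyeux, a cancer researcher, disagreed."))
# [('Dr. Henri Joyeux', 'a cancer researcher')]
```

A link like this lets a Condition-Description or Substance-Description mention ("the new drug") inherit the name it refers to ("Abilify").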
3. Extracting relations
• Learned Patterns vs. Handwritten Patterns
[Figure from [Freedman+ 11]]
[Figure from [Freedman+ 11]]
4. Evaluation
• Question answering with the extracted
  information
• Query examples
  – Find possible treatments for diabetes
  – What is the expected date to market for Abilify?
Answer Example
• ACME produces a wide range of drugs
  including treatments for malaria and
  athlete’s foot
  – responsibleForTreatment(drugs, ACME)
  – possibleTreatment(drugs, malaria)
  – possibleTreatment(drugs, athlete’s foot)
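To show how extracted triples like these can answer a query such as "Find possible treatments for X", here is a minimal sketch. It indexes each triple by (relation, second argument) and looks up the first argument. The triples come from the ACME example. The helpers `build_index` and `ask` are hypothetical and are not the paper's QA system.

```python
from collections import defaultdict

# Triples extracted from the ACME example sentence (argument order as on the slide).
TRIPLES = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athlete's foot"),
]


def build_index(triples):
    # (relation, second argument) -> list of first arguments
    index = defaultdict(list)
    for rel, arg1, arg2 in triples:
        index[(rel, arg2)].append(arg1)
    return index


def ask(index, relation, arg2):
    # Hypothetical query helper: "Find <relation> for <arg2>".
    # .get() avoids inserting empty entries into the defaultdict.
    return index.get((relation, arg2), [])


index = build_index(TRIPLES)
print(ask(index, "possibleTreatment", "malaria"))   # ['drugs']
print(ask(index, "possibleTreatment", "diabetes"))  # []
```

In this example, the answer to "possible treatments for malaria" is the generic description "drugs" rather than a specific substance name.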
[Figure from [Freedman+ 11]]
• “useful” = helps answer a complex query
When non-useful answers are removed
[Figure from [Freedman+ 11]]
• annotator’s recall (A)
• combining both (C)
• handwritten rules only (H, HW)
• learned patterns only (L)
[Figure from [Freedman+ 11]]
Discussion
[Figure from [Freedman+ 11]]
Conclusions
• The combined system can achieve an
  F1 of 0.51 in a new domain within a week.
• It requires very little training data.
• Learned patterns are still not competitive
  with handwritten patterns.
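The F1 score quoted in the conclusions is the harmonic mean of precision and recall. The one-function reminder below uses illustrative inputs; the paper's per-system precision and recall values are not reproduced here. Because it is a harmonic mean, F1 stays low whenever either precision or recall is low.

```python
def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1(0.6, 0.4))  # about 0.48 (illustrative inputs only)
print(f1(1.0, 0.0))  # 0.0
```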
References
• [Freedman+ 11] Extreme Extraction – Machine
  Reading in a Week (EMNLP 2011)
• [Kozareva+ 08] Semantic Class Learning from the
  Web with Hyponym Pattern Linkage Graphs
• [Miller+ 04] Name Tagging with Word Clusters and
  Discriminative Training
   – [Brown+ 90] Class-Based n-gram Models of Natural
     Language
   – [Collins 02] Discriminative Training Methods for Hidden
     Markov Models: Theory and Experiments with Perceptron
     Algorithms
