Extracting Complex Biological Events
with Rich Graph-Based Feature Sets


 Jari Björne, Juho Heimonen, Filip Ginter, Antti
 Airola, Tapio Pahikkala, Tapio Salakoski
 BioNLP 2009 Workshop

Farzaneh Sarafraz
18 June 2009
                           
BioNLP'09 Task 1
       • Events in abstracts
       • Given: genes and gene products (proteins)
       • Wanted: events
        −   type
        −   trigger
        −   participant(s)
        −   cause (if applicable)

                                     
Example
    "I kappa B/MAD-3 masks the nuclear localization
      signal of NF-kappa B p65 and requires the
      transactivation domain to inhibit NF-kappa B
      p65 DNA binding."


    Event: negative regulation
    Trigger: masks
    Theme: p65 (its first mention)
    Cause: MAD-3


                             
Event Types
       • Gene expression
       • Transcription
       • Protein catabolism
       • Localisation
       • Phosphorylation
       • Binding
       • Regulation
       • Positive regulation
       • Negative regulation
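The task restricts which arguments each event type can take. As a minimal sketch, assuming the Task 1 rules as we understand them (the five simple types take exactly one protein Theme, Binding takes one or more protein Themes, and the three regulation types take one Theme and an optional Cause, either of which may be a protein or a nested event), the example event above can be encoded and checked like this. The label strings and the names `Event` and `is_valid` are illustrative, not taken from the shared-task software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class EventType(Enum):
    GENE_EXPRESSION = "Gene_expression"
    TRANSCRIPTION = "Transcription"
    PROTEIN_CATABOLISM = "Protein_catabolism"
    LOCALIZATION = "Localization"
    PHOSPHORYLATION = "Phosphorylation"
    BINDING = "Binding"
    REGULATION = "Regulation"
    POSITIVE_REGULATION = "Positive_regulation"
    NEGATIVE_REGULATION = "Negative_regulation"


REGULATION_TYPES = {
    EventType.REGULATION,
    EventType.POSITIVE_REGULATION,
    EventType.NEGATIVE_REGULATION,
}


@dataclass
class Event:
    type: EventType
    trigger: str
    themes: List[Union[str, "Event"]]  # protein names or nested events
    cause: Optional[Union[str, "Event"]] = None


def is_valid(event: Event) -> bool:
    """Check an event against the (assumed) Task 1 argument rules."""
    if not event.themes:
        return False  # every event needs at least one Theme
    if event.type in REGULATION_TYPES:
        # One Theme and an optional Cause; nested events must be valid too.
        args = event.themes + ([event.cause] if event.cause is not None else [])
        return len(event.themes) == 1 and all(
            isinstance(a, str) or is_valid(a) for a in args
        )
    # Simple events and Binding take protein Themes only and no Cause.
    if event.cause is not None or not all(isinstance(t, str) for t in event.themes):
        return False
    return event.type is EventType.BINDING or len(event.themes) == 1


# The example sentence: MAD-3 negatively regulates ("masks") p65.
masks = Event(EventType.NEGATIVE_REGULATION, "masks", themes=["p65"], cause="MAD-3")
print(is_valid(masks))  # True
```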




                              
Training and Test Data
       • Training data: 800 abstracts
       • Development data: 150 abstracts
       • Test data: 260 abstracts




                               
The System
       • Trigger recognition
        −   Methods similar to NER
        −   Classification
       • Argument detection
        −   Graph edge selection
        −   Classification
       • Semantic post-processing
        −   Rule-based
                                    
Trigger Detection
       • Token labelling: one class for each event type, plus one negative class
       • 92% of triggers are single tokens
        −   Adjacent tokens are merged into one trigger if that token
            sequence appears as a trigger in the training data
       • Triggers that share a token:
        −   Get a combined class, e.g. gene expression/positive regulation
       • A graph node for each trigger
        −   Not yet duplicated (that happens in post-processing)
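A sketch of the trigger-span step above, assuming per-token labels from the classifier and a set `known_phrases` of trigger strings collected from training data (both hypothetical inputs). The widening rule (the longest known phrase of up to three tokens that contains the labelled token) and the helper names are our assumptions; the slide only says adjacent tokens are merged when the phrase occurs in the training data.

```python
NEGATIVE = "neg"  # label for tokens that are not triggers


def trigger_spans(tokens, labels, known_phrases, max_len=3):
    """Turn per-token labels into trigger spans (start, end, label).

    A positively labelled token is widened to the longest span of up to
    max_len adjacent tokens that contains it and whose lower-cased text is
    in known_phrases (trigger strings collected from training data).
    """
    spans = []
    covered_until = 0  # tokens before this index already belong to a span
    for i, label in enumerate(labels):
        if label == NEGATIVE or i < covered_until:
            continue
        best = (i, i + 1)  # default: single-token trigger
        for length in range(max_len, 1, -1):  # prefer longer known phrases
            match = None
            for start in range(max(covered_until, i - length + 1), i + 1):
                end = start + length
                if end > len(tokens):
                    break
                if " ".join(tokens[start:end]).lower() in known_phrases:
                    match = (start, end)
                    break
            if match:
                best = match
                break
        spans.append((best[0], best[1], label))
        covered_until = best[1]
    return spans


def node_types(label):
    """A combined class such as 'Gene_expression/Positive_regulation' stays one
    graph node for now; this lists the event types it stands for."""
    return label.split("/")


tokens = ["p65", "is", "up", "regulated"]
labels = ["neg", "neg", "neg", "Positive_regulation"]
print(trigger_spans(tokens, labels, {"up regulated"}))  # [(2, 4, 'Positive_regulation')]
```

Keeping a combined class as a single node, as `node_types` suggests, leaves the split into separate events to post-processing.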
                                       
Classification - SVM
       • Token features
        −   Binary: capitalisation, presence of punctuation or
            numeric characters
        −   Stem
        −   Character bigrams and trigrams
        −   Whether the token is a known trigger in the training data
        −   All the above for linear and dependency
            “neighbours”
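The token features listed above can be sketched as a sparse feature dictionary. This assumes dependency neighbours are supplied as `(dependency_type, token_index)` pairs; `crude_stem` is a hypothetical suffix stripper standing in for a proper stemmer, and the feature naming scheme is ours.

```python
SUFFIXES = ("ation", "ing", "ed", "es", "s")


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def token_features(token, known_triggers, prefix="tok"):
    """Binary/string features for one token, keyed with a prefix so the same
    function can describe the token itself and its neighbours."""
    lower = token.lower()
    feats = {f"{prefix}_stem={crude_stem(lower)}": 1}
    if token[:1].isupper():
        feats[f"{prefix}_capitalised"] = 1
    if any(not ch.isalnum() for ch in token):  # punctuation: any non-alphanumeric char
        feats[f"{prefix}_has_punct"] = 1
    if any(ch.isdigit() for ch in token):
        feats[f"{prefix}_has_digit"] = 1
    for n in (2, 3):  # character bigrams and trigrams
        for i in range(len(lower) - n + 1):
            feats[f"{prefix}_char{n}={lower[i:i + n]}"] = 1
    if lower in known_triggers:
        feats[f"{prefix}_known_trigger"] = 1
    return feats


def features_with_neighbours(tokens, i, dep_neighbours, known_triggers):
    """Features of token i plus its linear neighbours and its dependency
    neighbours, given as (dependency_type, token_index) pairs."""
    feats = token_features(tokens[i], known_triggers, "tok")
    if i > 0:
        feats.update(token_features(tokens[i - 1], known_triggers, "prev"))
    if i + 1 < len(tokens):
        feats.update(token_features(tokens[i + 1], known_triggers, "next"))
    for dep_type, j in dep_neighbours:
        feats.update(token_features(tokens[j], known_triggers, f"dep_{dep_type}"))
    return feats
```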

                                     
Classification - SVM
       • Frequency features
        −   Number of named entities
                · in the sentence
                · in a linear window around the token
        −   Bag-of-words counts of token texts in the sentence (?)
       • Dependency chains
        −   Constructed up to depth 3 from the token
        −   At each depth, both token and frequency features
        −   Plus the dependency type and the sequence of dependency types in the chain
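A sketch of the dependency-chain features, assuming the parse is given as `(head, dependent, dep_type)` triples over token indices and that chains may follow edges in either direction (an assumption; the slide does not say). At each depth it records the dependency type, the type sequence so far, the token reached, and a named-entity count as a simple frequency feature.

```python
from collections import defaultdict


def dependency_chain_features(i, tokens, deps, entities, max_depth=3):
    """Features from dependency chains starting at token i.

    deps: (head, dependent, dep_type) triples; entities: indices of tokens
    that are named entities. Edges are followed in both directions (an
    assumption), without revisiting tokens on the current chain.
    """
    adj = defaultdict(list)
    for head, dep, dep_type in deps:
        adj[head].append((dep, dep_type))
        adj[dep].append((head, dep_type))
    feats = defaultdict(int)

    def walk(node, depth, chain, visited):
        if depth == max_depth:
            return
        for nxt, dep_type in adj[node]:
            if nxt in visited:
                continue
            d, seq = depth + 1, chain + (dep_type,)
            feats[f"d{d}_dep={dep_type}"] += 1             # dependency type
            feats[f"d{d}_chain={'-'.join(seq)}"] += 1      # type sequence so far
            feats[f"d{d}_tok={tokens[nxt].lower()}"] += 1  # token feature
            if nxt in entities:
                feats[f"d{d}_entities"] += 1               # frequency feature
            walk(nxt, d, seq, visited | {nxt})

    walk(i, 0, (), {i})
    return dict(feats)
```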
                                         
Two SVMs
       • “Somewhat” different feature sets
       • Weighted combination of their results



    “This design should be considered an artifact of
      the time-constrained, experiment-driven
      development of the system rather than a
      principled design”
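The slide does not say how the two SVMs' outputs are weighted. One plausible reading, shown here purely as an assumption, is a linear interpolation of their per-class decision values followed by an argmax; `weight_a` is a hypothetical parameter that would be tuned on development data.

```python
def combine_scores(scores_a, scores_b, weight_a=0.5):
    """Hypothetical linear combination of two classifiers' per-class scores.

    scores_a, scores_b: dicts mapping class label -> decision value.
    Returns the winning label and the combined scores.
    """
    classes = scores_a.keys() & scores_b.keys()
    combined = {
        c: weight_a * scores_a[c] + (1.0 - weight_a) * scores_b[c] for c in classes
    }
    return max(combined, key=combined.get), combined


label, combined = combine_scores(
    {"neg": 0.2, "Binding": 0.1}, {"neg": -0.5, "Binding": 0.4}
)
print(label)  # Binding
```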

                               
Precision/Recall trade-off
       • Undetected trigger --> undetected event
       • All triggers have events in the training data -->
        bias towards reporting an event for every detected
        trigger
       • Adjust P/R explicitly
        −   multiply the negative-class score by β
        −   find β experimentally
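The β adjustment can be sketched as follows, assuming non-negative per-class scores (with signed SVM decision values, the direction of the effect would flip whenever the negative-class score is below zero). β is picked by a grid search that maximises F-score on development data; the function names and toy scores are hypothetical.

```python
def predict_with_beta(scores, beta, negative="neg"):
    """Pick the best class after scaling the negative-class score by beta.

    Assumes non-negative scores, so beta < 1 favours reporting a trigger
    (higher recall) and beta > 1 favours the negative class (higher precision).
    """
    adjusted = dict(scores)
    adjusted[negative] = scores[negative] * beta
    return max(adjusted, key=adjusted.get)


def f_score(gold, pred, negative="neg"):
    tp = sum(1 for g, p in zip(gold, pred) if p != negative and p == g)
    fp = sum(1 for g, p in zip(gold, pred) if p != negative and p != g)
    fn = sum(1 for g, p in zip(gold, pred) if g != negative and p != g)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_beta(dev_scores, dev_gold, betas):
    """Grid search: keep the beta with the best F-score on development data."""
    best_beta, best_f = None, -1.0
    for beta in betas:
        pred = [predict_with_beta(s, beta) for s in dev_scores]
        f = f_score(dev_gold, pred)
        if f > best_f:
            best_beta, best_f = beta, f
    return best_beta, best_f


dev_scores = [{"neg": 1.0, "Binding": 0.8}, {"neg": 1.0, "Binding": 0.2}]
dev_gold = ["Binding", "neg"]
print(tune_beta(dev_scores, dev_gold, [1.0, 0.7, 0.1]))  # (0.7, 1.0)
```

Here β = 1.0 misses the first trigger, β = 0.1 also reports a spurious one, and β = 0.7 gets both right.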


                                     
Edge Detection
       • Multi-class SVM
       • All potential directed edges
        −   Event node to named entity
        −   Event node to event node (nested event)
        −   Labelled as theme, cause, or negative
       • Each edge is predicted independently
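The candidate-edge enumeration is simple to write down. This sketch assumes nodes are identified by plain strings and takes the classifier as any callable returning `'theme'`, `'cause'` or `'neg'`; the lookup table stands in for the multi-class SVM. Because each edge is classified on its own, no decision depends on any other edge.

```python
def candidate_edges(trigger_nodes, entity_nodes):
    """Every directed edge an event node could have: to each named entity and
    to each other event node (a nested event)."""
    edges = []
    for src in trigger_nodes:
        edges.extend((src, dst) for dst in entity_nodes)
        edges.extend((src, dst) for dst in trigger_nodes if dst != src)
    return edges


def detect_edges(trigger_nodes, entity_nodes, classify):
    """Classify each candidate edge independently ('theme', 'cause' or 'neg')
    and keep the non-negative ones. classify is any callable edge -> label."""
    kept = []
    for src, dst in candidate_edges(trigger_nodes, entity_nodes):
        label = classify((src, dst))
        if label != "neg":
            kept.append((src, dst, label))
    return kept


# Toy stand-in for the multi-class SVM: a lookup table.
toy = {("masks", "p65"): "theme", ("masks", "MAD-3"): "cause"}
print(detect_edges(["masks", "inhibit"], ["MAD-3", "p65"], lambda e: toy.get(e, "neg")))
# [('masks', 'MAD-3', 'cause'), ('masks', 'p65', 'theme')]
```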



                                   
Feature Set – Central Concept

    The shortest undirected path of syntactic dependencies
     between the two nodes of a candidate edge, in the
     Stanford-scheme parse of the sentence.
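A breadth-first search over the parse, ignoring edge direction, finds this path. The sketch assumes the parse is given as `(head, dependent, dep_type)` triples and that each node of a candidate edge is represented by a single token index; it also records whether each step follows or opposes the dependency direction, which the direction bigram among the path features relies on. The example parse is illustrative.

```python
from collections import defaultdict, deque


def shortest_undirected_path(deps, start, goal):
    """Breadth-first search over the dependency graph, ignoring edge direction.

    deps: (head, dependent, dep_type) triples over token indices. Returns the
    path as (from, dep_type, direction, to) steps, where '->' means the step
    follows the dependency from head to dependent and '<-' means against it.
    Returns None if the tokens are not connected.
    """
    adj = defaultdict(list)
    for head, dep, dep_type in deps:
        adj[head].append((dep, dep_type, "->"))
        adj[dep].append((head, dep_type, "<-"))
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, dep_type, direction in adj[node]:
            if nxt not in prev:
                prev[nxt] = (node, dep_type, direction)
                queue.append(nxt)
    if goal not in prev:
        return None
    steps, node = [], goal
    while prev[node] is not None:  # walk back from goal to start
        parent, dep_type, direction = prev[node]
        steps.append((parent, dep_type, direction, node))
        node = parent
    return steps[::-1]


deps = [(1, 0, "nsubj"), (1, 2, "dobj"), (2, 4, "prep_of")]  # MAD-3 masks signal (of) p65
print(shortest_undirected_path(deps, 0, 4))
# [(0, 'nsubj', '<-', 1), (1, 'dobj', '->', 2), (2, 'prep_of', '->', 4)]
```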




                             
Feature Set
       • Token text, POS, entity/event class,
        dependency type (e.g. subject)
       • N-grams merging the attributes of:
        −   2-4 consecutive tokens
        −   2-4 consecutive dependencies
        −   Each token and its two neighbouring dependencies
        −   Each dependency and its two neighbouring tokens
        −   One bigram showing direction
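A sketch of the path n-grams above, using only token text (the same construction could be repeated for POS tags or entity/event classes). It assumes the path is given as its tokens, the dependency types between them, and their directions; reading "one bigram showing direction" as a bigram of consecutive edge directions is our assumption.

```python
def path_ngram_features(tokens, deps, directions, min_n=2, max_n=4):
    """N-gram features along a shortest dependency path.

    tokens: t0..tk on the path; deps: d0..d(k-1), where d_i links t_i and
    t_(i+1); directions: '->' or '<-' for each d_i.
    """
    if not (len(deps) == len(directions) == len(tokens) - 1):
        raise ValueError("a path with k+1 tokens has k dependencies")
    feats = set()

    def ngrams(seq, kind):
        for n in range(min_n, max_n + 1):
            for i in range(len(seq) - n + 1):
                feats.add(f"{kind}{n}=" + "_".join(seq[i:i + n]))

    ngrams(tokens, "tok")  # consecutive tokens
    ngrams(deps, "dep")    # consecutive dependencies
    for i in range(1, len(tokens) - 1):  # token with its two neighbouring dependencies
        feats.add(f"dep_tok_dep={deps[i - 1]}_{tokens[i]}_{deps[i]}")
    for i, dep in enumerate(deps):  # dependency with its two neighbouring tokens
        feats.add(f"tok_dep_tok={tokens[i]}_{dep}_{tokens[i + 1]}")
    for a, b in zip(directions, directions[1:]):  # direction bigrams (assumed reading)
        feats.add(f"dir2={a}|{b}")
    return feats
```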
                                  
Other Features
       • Individual component features
       • Semantic node features
       • Frequency features




                              
Semantic Post-Processing
       • Duplicate nodes
        −   Same class and same trigger
        −   Combined trigger
       • Remove improper arguments
       • Remove directed cycles by removing the
        weakest link
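Cycle removal can be sketched as: find a directed cycle with depth-first search, delete its weakest edge, and repeat until none remain. Here "weakest" is assumed to mean the edge with the lowest classifier score, and the event names and scores are hypothetical. The loop uses the `:=` operator, so it needs Python 3.8 or later.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return the edges of one directed cycle, or None. edges: {(src, dst): score}."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    state = {}  # node -> "active" (on the DFS stack) or "done"
    stack = []

    def dfs(node):
        state[node] = "active"
        stack.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "active":  # back edge closes a cycle
                cycle = stack[stack.index(nxt):] + [nxt]
                return list(zip(cycle, cycle[1:]))
            if nxt not in state:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        state[node] = "done"
        return None

    for node in list(graph):
        if node not in state:
            found = dfs(node)
            if found:
                return found
    return None


def break_cycles(edges):
    """Repeatedly delete the lowest-scoring edge of a remaining directed cycle."""
    edges = dict(edges)
    while (cycle := find_cycle(edges)) is not None:
        weakest = min(cycle, key=edges.get)
        del edges[weakest]
    return edges


scores = {("E1", "E2"): 0.9, ("E2", "E1"): 0.3, ("E1", "p65"): 0.8}
print(break_cycles(scores))  # {('E1', 'E2'): 0.9, ('E1', 'p65'): 0.8}
```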



                                  
Duplicating Event Nodes
       • Task restrictions
        −   Two causes,
        −   must have theme,
        −   etc.
       • Several heuristics
       • x-th first dependency
        in shortest path from
        the event for binding
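One reading of these restrictions, sketched under stated assumptions: an event with no Theme is dropped, and a non-Binding event with several Themes or Causes is copied so that each copy has exactly one Theme and at most one Cause. The slide's dependency-path heuristic for Binding is not reproduced; Binding themes are simply kept together. The event tuples and names are hypothetical.

```python
from itertools import product


def duplicate_event(event_type, trigger, themes, causes):
    """Split one predicted event node into task-conformant events.

    Assumed rules: an event without a Theme is dropped; a non-Binding event
    gets one copy per (Theme, Cause) combination; Binding keeps all its
    Themes together and takes no Cause.
    """
    if not themes:
        return []  # every event must have a Theme
    if event_type == "Binding":
        # Placeholder for the slide's dependency-path heuristic for Binding.
        return [(event_type, trigger, tuple(themes), None)]
    return [
        (event_type, trigger, (theme,), cause)
        for theme, cause in product(themes, causes or [None])
    ]


print(duplicate_event("Positive_regulation", "induces", ["p50", "p65"], ["TNF"]))
# [('Positive_regulation', 'induces', ('p50',), 'TNF'), ('Positive_regulation', 'induces', ('p65',), 'TNF')]
```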
                                  
Results




           
Compared to Us




                  
What Didn't Work/Wasn't Tried
       • CRF
       • HMM
       • Removing the strong independence assumption
       • Co-reference resolution (4.8%)




                               
End.




        
