BIONLP'09 Shared Task



 Farzaneh Sarafraz
 James Eales
 Reza Mohammadi
 Goran Nenadic
26 March 2009
                      
BioNLP'09 Task 1

    - Events in abstracts
    - Given: genes and gene products (proteins)
    - Wanted: events
        - type
        - trigger
        - participant(s)
        - cause (if applicable)



                                     
Example
    "I kappa B/MAD-3 masks the nuclear localization
      signal of NF-kappa B p65 and requires the
      transactivation domain to inhibit NF-kappa B
      p65 DNA binding."

    Event: negative regulation
    Trigger: masks
    Theme1: the first p65
    Cause: MAD-3

                             
Event Types

    - Gene expression              - Binding
    - Transcription                - Regulation
    - Protein catabolism           - Positive regulation
    - Localisation                 - Negative regulation
    - Phosphorylation
    




                              
Training and Test Data

    - Training data: 800 abstracts
    - Development data: 150 abstracts
    - Test data: 260 abstracts
    




                               
Our System

    1) Finding trigger and type
    2) Finding participants (themes)
    3) Post processing




                             
1) Finding Triggers and Types - CRF

"I kappa B/MAD-3 masks the nuclear localization..."

    I  kappa  B  MAD-3  masks  the  nuclear  localization
    0  0      0  0      9      0    0        0

"The binding of I kappa B/MAD-3 to NF-kappa B p65 is
sufficient to retarget NF-kappa B p65 from the
nucleus to the cytoplasm."

    The  binding  of  I  kappa  B  MAD-3  to  NF-kappa  B  p65  is
    0    0        0   0  0      0  0      0   0         0  0    0

    sufficient  to  retarget  NF-kappa  B  p65  from  the
    0           0   4         0         0  0    0     0

    nucleus  to  the  cytoplasm.
    0        0   0    0

9: negative regulation
4: localisation
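
Reading the triggers back out of such a label sequence could look like the sketch below. Only the two label codes shown on this slide are included; the function name `decode_triggers` and the fallback text for unknown codes are assumptions.

```python
# Label codes shown on the slide; the remaining codes are not given there.
LABEL_NAMES = {4: "Localisation", 9: "Negative regulation"}


def decode_triggers(tokens, labels):
    """Return (token, event type) pairs for every token with a non-zero label."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [
        (tok, LABEL_NAMES.get(lab, f"type {lab}"))
        for tok, lab in zip(tokens, labels)
        if lab != 0
    ]


tokens = ["sufficient", "to", "retarget", "NF-kappa", "B", "p65", "from", "the"]
labels = [0, 0, 4, 0, 0, 0, 0, 0]
print(decode_triggers(tokens, labels))  # [('retarget', 'Localisation')]
```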
                          
CRF features for each token

    - is-protein
    - is-PPI-word
    - generic POS tag
    - log-frequency of token being a trigger for each event type (10 features)
    - number of proteins in sentence (sentence-level)
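
A per-token feature dictionary along these lines might be assembled as follows. This is a sketch under several assumptions: the slides do not say how the log-frequency is computed (`log(1 + count)` is used here), the PPI word list (presumably protein-protein interaction words) is a made-up parameter, and all function and key names are hypothetical.

```python
import math
from collections import Counter


def trigger_counts(training_tokens, event_types):
    """Count how often each token was annotated as a trigger of each type.

    training_tokens: iterable of (token, event_type or None) pairs.
    """
    counts = {etype: Counter() for etype in event_types}
    for token, etype in training_tokens:
        if etype is not None:
            counts[etype][token.lower()] += 1
    return counts


def token_features(token, pos_tag, is_protein, ppi_words, counts, n_proteins):
    """Build the feature dictionary for one token."""
    feats = {
        "is_protein": is_protein,
        "is_ppi_word": token.lower() in ppi_words,
        "pos": pos_tag,
        "n_proteins_in_sentence": n_proteins,  # sentence-level feature
    }
    # One log-frequency feature per event type (smoothing is an assumption).
    for etype, counter in counts.items():
        feats[f"logfreq_{etype}"] = math.log1p(counter[token.lower()])
    return feats


counts = trigger_counts(
    [("masks", "Negative regulation"), ("masks", "Negative regulation"), ("the", None)],
    ["Negative regulation", "Localisation"],
)
print(token_features("masks", "VB", False, {"binds"}, counts, n_proteins=2))
```

Dictionaries of this shape are what common CRF toolkits accept per token, but the slides do not name the toolkit that was used.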
    




                                
Trigger Detection Post Processing

    - Positive discrimination
        - Manually looking at false negatives
        - Adding recurring triggers
    - Negative discrimination
        - Manually looking at false positives
        - Filtering out common mistaken tokens
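
The two correction steps could be applied to the CRF output roughly as below. The word lists in the example call are invented for illustration; the slides do not list the actual tokens that were added or filtered.

```python
def post_process(tokens, predicted, add_triggers, drop_tokens):
    """Apply manual corrections to predicted trigger labels.

    predicted:    one label per token, None meaning "not a trigger".
    add_triggers: dict token -> event type (recurring false negatives).
    drop_tokens:  set of tokens that are commonly mistaken for triggers.
    """
    result = []
    for tok, label in zip(tokens, predicted):
        key = tok.lower()
        if key in drop_tokens:
            label = None                 # negative discrimination
        elif label is None and key in add_triggers:
            label = add_triggers[key]    # positive discrimination
        result.append(label)
    return result


# Hypothetical word lists, for illustration only.
tokens = ["The", "binding", "of", "A", "requires", "B"]
predicted = [None, None, None, None, "Regulation", None]
print(post_process(tokens, predicted, {"binding": "Binding"}, {"requires"}))
```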




                                    
Trigger Detection Results

     Event Class          #Gold       R       P   F-score
     Localisation            40    77.5   47.69     59.05
     Binding                180   33.33   54.55     41.38
     Gene expression        282    76.6   58.54     66.36
     Transcription           68   58.82    18.6     28.27
     Protein catabolism      19   84.21   88.89     86.49
     Phosphorylation         40    97.5   81.25     88.64
     Non-reg total          629   63.91   48.73      55.3
     Regulation             138   13.04   62.07     21.56
     Positive regulation    462   13.85   54.24     22.07
     Neg. regulation        153   29.41   45.92     35.86
     All total             1382   38.28   49.44     43.15
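
The F-score values in this table are consistent with the balanced F-measure, the harmonic mean of recall and precision. A short check, assuming that definition:

```python
def f_score(recall, precision):
    """Balanced F-measure (harmonic mean of recall and precision)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Localisation row: R = 77.5, P = 47.69
print(round(f_score(77.5, 47.69), 2))  # 59.05
```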

                                  
2) Finding Participants

    Type and number of participants
        - 1 theme (protein)
            Gene expression
            Transcription
            Protein catabolism
            Localisation
            Phosphorylation
        - 1 or more themes (protein)
            Binding
        - 1 theme and 1 cause (proteins/other events)
            Regulation
            Positive regulation
            Negative regulation
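
These argument patterns can be written down as a small lookup table and used to sanity-check candidate events. The sketch treats the cause as optional for the regulation types, following the "cause (if applicable)" wording of the task description; the names `SCHEMA` and `is_well_formed` are assumptions.

```python
SINGLE_THEME = ["Gene expression", "Transcription", "Protein catabolism",
                "Localisation", "Phosphorylation"]
REGULATION = ["Regulation", "Positive regulation", "Negative regulation"]

# max_themes=None means "no upper limit" (Binding).
SCHEMA = {name: {"max_themes": 1, "allows_cause": False} for name in SINGLE_THEME}
SCHEMA["Binding"] = {"max_themes": None, "allows_cause": False}
SCHEMA.update({name: {"max_themes": 1, "allows_cause": True} for name in REGULATION})


def is_well_formed(event_type, themes, cause=None):
    """Check a candidate event against the argument pattern of its type."""
    rule = SCHEMA[event_type]
    if len(themes) < 1:
        return False
    if rule["max_themes"] is not None and len(themes) > rule["max_themes"]:
        return False
    if cause is not None and not rule["allows_cause"]:
        return False
    return True


print(is_well_formed("Binding", ["p65", "MAD-3"]))            # True
print(is_well_formed("Phosphorylation", ["p65"], "MAD-3"))    # False
```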
             



                                       
Parse Tree Distance




                   
Parse Tree Distance Analysis




                   
Theme in Subtree

    Single-theme events
        - Theme in subtree = 0.7054
        - Theme not in subtree = 0.2946
    Binding events
        - Any theme in subtree = 0.5435
        - Any theme not in subtree = 0.4565
    Regulation events
        - Either theme or cause in subtree = 0.5919
        - Either theme or cause not in subtree = 0.4081
Distance in Trigger Subtree




                    
Distances not in Trigger Subtree




                    
Rules Concerning Parse Tree Analysis

    For "binding", report as themes:
        - up to the two closest proteins in the subtree
        - and the closest protein in the rest of the tree

        "In contrast, gp41 failed to stimulate NF-kappaB
        binding activity in as much as no NF-kappaB bound to
        the main NF-kappaB-binding site 2 of the IL-10
        promoter after addition of gp41."

    This rule correctly leaves out the final gp41.
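
The binding rule can be sketched over a tree stored as child-to-parent links. The slides do not define "subtree" precisely, so this sketch assumes it means the nodes below the trigger, and it measures distance as the number of edges on the path between two nodes. The toy tree and node names are invented.

```python
def ancestors(node, parent):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between nodes a and b."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(a, parent))}
    for j, n in enumerate(ancestors(b, parent)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def binding_themes(trigger, proteins, parent):
    """Up to two closest proteins below the trigger, plus the closest elsewhere."""
    def distance(p):
        return tree_distance(trigger, p, parent)

    inside = [p for p in proteins if trigger in ancestors(p, parent)]
    outside = [p for p in proteins if trigger not in ancestors(p, parent)]
    return sorted(inside, key=distance)[:2] + sorted(outside, key=distance)[:1]


# Toy tree (child -> parent); protE and protD are too far away to be reported.
parent = {
    "trigger": "root", "X": "root",
    "protA": "trigger", "n1": "trigger", "protB": "n1", "n2": "n1", "protE": "n2",
    "protC": "X", "Y": "X", "protD": "Y",
}
proteins = ["protA", "protB", "protC", "protD", "protE"]
print(binding_themes("trigger", proteins, parent))  # ['protA', 'protB', 'protC']
```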
                                      
Example of a Missed (FN) Theme

    For gene expression
        - All the proteins in the subtree are reported as themes

        "The 15-lipoxygenase (lox) gene is expressed in a
          tissue-specific manner, predominantly in
          erythroid cells but also in airway epithelial
          cells and eosinophils."

                        is
                       /  \
                   gene    expressed
                     |
              15-lipoxygenase
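
The miss can be reproduced with the small tree drawn on this slide. In the sketch below the subtree is taken to be everything at or below the trigger node, which is an assumption about how the rule was applied; the function names are hypothetical.

```python
# The tree from the slide, as parent -> children links.
children = {
    "is": ["gene", "expressed"],
    "gene": ["15-lipoxygenase"],
}


def subtree(node, children):
    """All nodes at or below the given node."""
    nodes = [node]
    for child in children.get(node, []):
        nodes.extend(subtree(child, children))
    return nodes


def gene_expression_themes(trigger, proteins, children):
    """Report every protein found in the trigger's subtree."""
    in_subtree = set(subtree(trigger, children))
    return [p for p in proteins if p in in_subtree]


# The protein hangs under "gene", not under the trigger "expressed".
print(gene_expression_themes("expressed", ["15-lipoxygenase"], children))  # []
```

The empty result is the false negative: the theme sits in a sibling branch of the trigger, so a subtree-only rule cannot reach it.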
Evaluation on Development Data

      Event Class          #Gold       R       P   F-score
      Localisation            53   67.92   46.75     55.38
      Binding                312   21.47   63.81     32.13
      Gene expression        356   64.61   76.33     69.98
      Transcription           82   53.66    89.8     67.18
      Protein catabolism      21   90.48   67.86     77.55
      Phosphorylation         47   91.49   53.09     67.19
      Non-reg total          871    50.4   68.44     58.05
      Regulation             172    5.23   33.33      9.05
      Positive regulation    632    3.48   21.36      5.99
      Neg. regulation        201    9.45   15.08     11.62
      Regulatory total      1005    4.98   19.53      7.93
      All total             1876   26.07   54.46     35.26
                                  
Evaluation on Test Data

      Event Class          #Gold       R       P   F-score
      Localisation           174   44.83   53.06      48.6
      Binding                347   12.68   40.37      19.3
      Gene expression        722   52.63   69.34     59.84
      Transcription          137   15.33   67.74        25
      Protein catabolism      14   42.86      50     46.15
      Phosphorylation        135   78.52   53.81     63.86
      Non-reg total         1529   41.53   60.82     49.36
      Regulation             291    3.09   19.15      5.33
      Positive regulation    983    1.12    8.87      1.99
      Neg. regulation        379    12.4   20.52     15.46
      Regulatory total      1653    4.05   16.75      6.53
      All total             3182   22.06   48.61     30.35
                             
Results: Ranked 12th out of 24 teams

Rank     R       P     F-score       Rank     R       P     F-score
1      46.73   58.48    51.95        13     25.96   36.26    30.26
2      45.82   47.52    46.66        14     20.93   49.3     29.38
3      34.98   61.59    44.62        15     22.69   40.55     29.1
4      36.9    55.59    44.35        16     21.53   36.99    27.21
5      33.41   51.55    40.54        17     17.44   39.99    24.29
6      28.13   53.56    36.88        18     28.63   20.88    24.15
7      28.22   45.78    34.92        19     13.45   71.81    22.66
8      27.75   46.6     34.78        20     22.78   19.03    20.74
9      21.62   62.21    32.09        21     30.42   14.11    19.28
10     21.12   56.9      30.8        22     11.25   66.54    19.25
11     22.5    47.7     30.58        23     11.69   31.42    17.04
12     22.06   48.61    30.35        24      9.4    61.65    16.31
                                  
End.




        
Other Tasks

    - Event detection and characterization
    - Event argument recognition
    - Negations and speculations
    




                               
Example
    "I kappa B/MAD-3 masks the nuclear localization
      signal of NF-kappa B p65 and requires the
      transactivation domain to inhibit NF-kappa B
      p65 DNA binding."

    Event: negative regulation
    Trigger: masks
    Theme1: the first p65
    Cause: MAD-3
    Site: nuclear localization signal

                             
Example
    "In contrast, NF-kappa B p50 alone fails to
      stimulate kappa B-directed transcription, and
      based on prior in vitro studies, is not
      directly regulated by I kappa B."

    Event: regulation
    Theme1: this p50
    Trigger: regulated
    Negation: true for this event
    Speculation: none

                             
