Categorizing Epistemic
                               Segment Types in Biology
                                  Research Articles

                                       Anita de Waard
                                  Elsevier Labs, Amsterdam
                                 UiL-OTS, Utrecht University




Thursday, September 17, 2009                                   1
Introduction




Why Study Biological Discourse?

                    -      There is too much of it!

                    -      Text mining and ‘fact
                           extraction’ techniques are
                           gaining ground to tame this
                           tangle

                    -      Emerging area of biological
                           natural language processing
                           (BioNLP): subfield of computational linguistics

                    -      Main focus: identifying biological entities (genes,
                           proteins, drugs) and their relationships



Example state of the art: MEDIE

                                       without some idea of the status of the
                                        sentence, it cannot be interpreted!

    Alteration of nm23, P53, and S100A4 expression may
    contribute to the development of gastric



                     Previous studies have implicated miR-34a as a tumor
                     suppressor gene whose transcription is activated by p53.




How can linguistics help?
             Underlying model of text mining systems:

                      -        Scientific paper is a ‘statement of pertinent facts’
                      -        So: finding entities and relationships will give you a summary of
                               the knowledge within the paper
                      -        However, information extracted this way is not very useful...
             Proposed approach: treat the scientific paper as a persuasive text, a specific
             genre with its own characteristics and permitted persuasive techniques:
                      -        ‘these results suggest’ (depersonification)

                      -        ‘as fig. 2a shows’ (evidence is in the data)

                      -        ‘oncogenes produce a stress response [Serrano, 2003]’

             References and data form a “folded array of successive defense lines, behind
             which scientists ensconce themselves” [Latour, 1988]




Modality Dropping
                    -      Fact creation occurs through social acceptance: “[Y]ou can
                           transform .. fiction into fact just by adding or subtracting
                           references” [Latour, 1988]

                    -      When references are cited the modality is dropped:

                          -    A: ‘these results suggest/demonstrate/imply that’ X

                          -    B: ‘A et al. have shown that X [A, 2009]’

                          -    C: ‘X [2009]’

                          -    D: ‘Since X, we investigated the possibility that Y’




Overall Research Questions
              I. (How) can we add epistemic value to results from a
                 text mining system?
              II. How is a scientific fact created, as it moves from a
                  hedged claim to a fact through successive citations?
              III. Can we identify a rhetorically successful text (and
                   help authors create them)?




Present work:
            Perform discourse analysis on a few selected texts in
            biology:
            1. Parse text into discourse segments (elementary discourse units, EDUs) containing a
               single rhetorical move (if possible...)
            2. Determine categories or types of discourse segments
               that have similar rhetorical/pragmatic properties
            3. Look at a number of linguistic characteristics and see if
               these segments share those characteristics.




Present research questions:

          i. Can these segments indeed be grouped by linguistic
             characteristics (verb tense, verb registry, metadiscourse
             markers)?
          ii. Does this offer a useful version of the structure of a
              paper?
          iii. Is this useful for enabling automated epistemic markup?
          iv. Can this help us to trace evolution of a hypothesis?




Methods




Method
            1. Parse text into Discourse Segments (EDUs) according to
               syntactic criteria
            2. Define set of semantic segment types
            3. Identify semantic type for each segment
            4. Specify linguistic and structural properties for each
               segment
            5. Identify correlations between semantic type and
               structural/syntactic properties
            6. Trace a hypothesis through the process of fact creation
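
Step 5 can be pictured as a simple cross-tabulation of annotated segments. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the four example clauses and their segment types are taken from the segment-type examples later in this deck, and the tense labels are hand-assigned here rather than drawn from the study's data.

```python
from collections import Counter

# Toy annotations: (segment type, verb tense). The tense labels are
# hand-assigned for illustration and are NOT results from the study.
ANNOTATED_SEGMENTS = [
    ("Fact", "present"),         # mature miR-373 is a homolog of miR-372
    ("Hypothesis", "modal"),     # This could for instance be a result of ...
    ("Result", "past"),          # all constructs yielded high expression levels ...
    ("Implication", "present"),  # our procedure is sensitive enough to detect ...
]


def cross_tabulate(segments):
    """Count how often each (segment type, property value) pair occurs."""
    return Counter(segments)


def property_profile(segments, segment_type):
    """Return the relative frequency of each property value for one segment type."""
    counts = Counter(value for seg_type, value in segments if seg_type == segment_type)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}
```

With a real annotated corpus in place of the toy list, a skewed profile for a segment type (for example, mostly past tense) would be the kind of correlation that step 5 looks for.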




Segmentation Criteria
        Goal: ‘one new thought per segment’:
                Figure 4A shows that following RASV12 stimulation, p53
                was stabilized and activated, and its target gene, p21cip1,
                was induced in all cases, indicating an intact p53 pathway
                in these cells.
                a.     Figure 4A shows that
                b.     following RASV12 stimulation,
                c.     p53 was stabilized and activated,
                d.     and its target gene, p21cip1, was induced in all cases,
                e.     indicating an intact p53 pathway in these cells.



Segmentation Criteria (summary)
             Finite/Non-finite    Grammatical role                                          Segment?   Example

             Finite/Non-finite    Subject                                                   N          The extent to which miRNAs specifically affect metastasis
             Finite/Non-finite    Direct object                                             Y          these miRNAs are potential novel oncogenes
             Non-finite           Phrase-level adjunct (restrictive and non-restrictive)    N          spanning a given miRNA genomic region
             Non-finite           Clause-level adjunct                                      Y          by cloning eight miR-Vec plasmids
             Finite               Non-restrictive phrase-level adjunct                      Y          which is only active when tamoxifen is added (De Vita et al, 2005) […]
             Finite               Restrictive phrase-level adjunct                          N          that we examined
             Finite               Clause-level adjunct                                      Y          which correlates with the reported ES-cell expression pattern of the miR-371-3 cluster (Suh et al, 2004)
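
The summary table reads naturally as a decision rule. The following is a minimal sketch that transcribes its rows into a function; the function name, argument names, and role labels are assumptions made for illustration, and the syntactic analysis that would supply those arguments is out of scope.

```python
from typing import Optional


def is_segment(finite: bool, role: str, restrictive: Optional[bool] = None) -> bool:
    """Decide whether a clause starts its own discourse segment.

    role is one of 'subject', 'direct_object', 'phrase_adjunct', 'clause_adjunct'
    (hypothetical labels). restrictive only matters for finite phrase-level adjuncts.
    """
    if role == "subject":
        return False  # finite or non-finite subjects stay with their clause
    if role == "direct_object":
        return True   # finite or non-finite direct objects are split off
    if role == "clause_adjunct":
        return True   # clause-level adjuncts are segments, finite or not
    if role == "phrase_adjunct":
        if not finite:
            return False  # non-finite: restrictive and non-restrictive alike
        if restrictive is None:
            raise ValueError("finite phrase-level adjuncts need restrictive=True/False")
        return not restrictive  # only non-restrictive ones become segments
    raise ValueError(f"unknown grammatical role: {role!r}")
```

For instance, the table's example 'that we examined' corresponds to `is_segment(True, "phrase_adjunct", restrictive=True)`, which yields no new segment.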




Basic Segment Types
             Segment        Description                                          Example

             Fact           a known fact, generally without explicit citation    mature miR-373 is a homolog of miR-372
             Hypothesis     a proposed idea, not supported by evidence           This could for instance be a result of high mdm2 levels
             Problem        unresolved, contradictory, or unclear issue          However, further investigation is required to demonstrate the exact mechanism of LATS2 action
             Goal           research goal                                        To identify novel functions of miRNAs,
             Method         experimental method                                  Using fluorescence microscopy and luciferase assays,
             Result         a restatement of the outcome of an experiment        all constructs yielded high expression levels of mature miRNAs
             Implication    an interpretation of the results, in light of        our procedure is sensitive enough to detect mild growth differences
                            earlier hypotheses and facts




Two Types of Derived Segment Types
                ‘Other-segments’, related to (referenced) other work:

                -      other-result: ‘they are also found in the FCX and other cortical structures
                       [Sokoloff et al., 1990]’

                -      other-goal: ‘the role of D3 receptors in the control of motivation and affect
                       has been intensively studied [Heidbreder et al., 2005]’

                -      other-implication: ‘D1 or, more likely, D5, receptors have been implicated in
                       mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

                Regulatory segments, acting as matrix sentences framing other segments:

                -      reg-hypothesis: ‘we hypothesized that ’

                -      reg-implication: ‘These observations suggest that’

                -      intratextual: ‘Fig 4 shows that’

                -      intertextual: ‘reviewed in (Serrano, 1997)’
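
Because regulatory segments are short, formulaic matrix clauses, they lend themselves to simple cue matching. The sketch below is a rough illustration, not the annotation procedure used in this work: the pattern list and the function name are hypothetical, and the patterns are tuned only to the examples on this slide.

```python
import re

# Hypothetical cue patterns, one per regulatory segment type named on the slide.
REGULATORY_PATTERNS = [
    ("reg-hypothesis", re.compile(r"\bwe hypothesi[sz]ed? that\b", re.I)),
    ("reg-implication", re.compile(
        r"\bthese (observations|results|data) (suggest|indicate|imply) that\b", re.I)),
    ("intratextual", re.compile(r"\bfig(?:ure|\.)?\s*\d+\w*\s+shows that\b", re.I)),
    ("intertextual", re.compile(r"\breviewed in\b", re.I)),
]


def tag_regulatory(segment):
    """Return the first regulatory type whose cue matches, or None if no cue matches."""
    for label, pattern in REGULATORY_PATTERNS:
        if pattern.search(segment):
            return label
    return None
```

A segment such as 'p53 was stabilized and activated' matches none of the cues and comes back as `None`, which leaves it to be typed as one of the basic or other-segments by other means.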




My categories vs. Latour (1979)




Linguistic and structural properties
                        1. Position in text

                               -   Section of the paper (Introduction, Results, Discussion)
                               -   Beginning/middle/end of section
                               -   First/second/third part of sentence
                        2. Verb:

                               -   Tense, aspect, voice
                               -   Verb class (idiosyncratic)
                               -   Lexicon

                        3. Metadiscourse markers [Hyland, 2003]:

                               -   Connectives
                               -   Endophorics, Evidentials
                               -   Hedges, Boosters
                               -   Person markers


Verb class
    Two types of entities interact in biology texts:
    -       Thing:
              -       Thing -> Increase, die, etc
              -       Thing-thing: affect, stimulate etc.
    -       People:
              -       People -> Thing:
                    -          Examine (Goal)
                    -          Operate (Method)

                    -          Observe (Result)
                    -          Implicate (Implication)
              -       People - people: Report
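
The pairing of People -> Thing verb classes with segment types can be written down as a lookup. In this sketch the class-to-type mapping comes from the slide, while the small verb lexicon and the function name are purely hypothetical placeholders for whatever verb classification is actually used.

```python
# Verb class -> segment type, as paired on the slide for People -> Thing verbs.
VERB_CLASS_TO_SEGMENT = {
    "Examine": "Goal",
    "Operate": "Method",
    "Observe": "Result",
    "Implicate": "Implication",
}

# Hypothetical mini-lexicon assigning verb lemmas to classes (an assumption,
# not the lexicon from the study).
VERB_LEXICON = {
    "investigate": "Examine",
    "examine": "Examine",
    "transduce": "Operate",
    "clone": "Operate",
    "observe": "Observe",
    "detect": "Observe",
    "suggest": "Implicate",
    "indicate": "Implicate",
}


def expected_segment_type(verb_lemma):
    """Return the segment type suggested by a verb's class, or None if unknown."""
    verb_class = VERB_LEXICON.get(verb_lemma.lower())
    return VERB_CLASS_TO_SEGMENT.get(verb_class)
```

Verbs outside the lexicon, and classes such as Report that the slide does not pair with a segment type, simply return `None`.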



Results




Two texts
                    1. Voorhoeve, 2006: Cell

                          -    Cell biology text, written by group in Amsterdam

                          -    Dealing with microRNAs - hot topic

                          -    290 citations in Google Scholar: successful paper!


                    2. Louiseau, 2008: European Neuropsychopharmacology

                          -    Text on schizophrenia

                          -    Prompted by interest from Pharma company

                          -    Adjacent subfield of biology (neuropharmacology)




Segment vs. Section




Segment vs. Verb Type




Segment vs. verb tense




Segments vs. markers




Segment Order




Discussion




Interpretation: 3 Realms of Science:
       Conceptual realm:
         (1)  Oncogene-induced senescence is characterized by the appearance of cells with a
              flat morphology that express senescence associated (SA)-β-Galactosidase.
         (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced
              growth arrest in primary human cells.

       Between conceptual and experimental realm:
         (2a) Indeed,
         (4a) Altogether, these data show that

       Experimental realm:
         (2b) control RASV12-arrested cells showed relatively high abundance of flat cells
              expressing SA-β-Galactosidase
         (3a) Consistent with the cell growth assay,
         (3b) very few cells showed senescent morphology when transduced with either
              miR-Vec-371&2, miR-Vec-373, or control p53kd.

       Data realm (Figures):
         (2c) (Figures 2G and 2H).




Tense 1: Concepts vs. Experiment
       Concept realm:
         (1)  Oncogene-induced senescence is characterized by the appearance of cells with a
              flat morphology that express senescence associated (SA)-β-Galactosidase.
         (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced
              growth arrest in primary human cells.

       Between concept and experimental realm:
         (2a) Indeed,
         (4a) Altogether, these data show that

       Experimental realm (personal, past):
         (2b) control RASV12-arrested cells showed relatively high abundance of flat cells
              expressing SA-β-Galactosidase
         (3a) Consistent with the cell growth assay,
         (3b) very few cells showed senescent morphology when transduced with either
              miR-Vec-371&2, miR-Vec-373, or control p53kd.

       Data realm (nonverbal), (Figures):
         (2c) (Figures 2G and 2H).




Tense 2: Referral

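       [Figure: timeline running from past through present to future]

       Own paper:      Introduction, Current work (= Results section), Discussion
       Other papers:   Other Work

       Tense labels along the timeline:
         -   After other work: past
         -   Before current work: present
         -   Current work (= Results section)
         -   After current work: past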




Tense 1+ 2 = 3:


       [Figure: two-axis diagram]

       Vertical axis:     Conceptual (Claim, fact) / Experiential (Experiment)
       Horizontal axis:   Reading time (past, present, future)




Discourse Fact-ory
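       [Figure: flow diagram of segment types across the sections of a paper]

       Realms:
         -   hypothetical realm (might, would): hypothesis
         -   realm of activity (to test, to see): goal
         -   realm of experience (past): method, result
         -   realm of models (present): fact, fact, fact, implication

       Connecting phrases:   ‘to’ (goal), ‘we’ (method), ‘resulting in’ (result), ‘suggests that’ (implication)
       Other labels:         problem; introduction; results; discussion
       Views:                Shared view (facts); Own view (implication)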

Citation and fact creation:

       Voorhoeve, 2006:
         -   ‘To investigate the possibility that miR-372 and miR-373 suppress the
             expression of LATS2, we...’
         -   ‘Therefore, these results point to LATS2 as a mediator of the miR-372 and
             miR-373 effects on cell proliferation and tumorigenicity,’

       Yabuta, JBioChem 2007:
         -   ‘miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)’

       Raver-Shapira et al., JMolCell 2007:
         -   ‘two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in
             testicular germ cell tumors by inhibition of LATS2 expression, which suggests
             that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).’

       [Figure: segment types by level for two experiments]
       Experiment 1:   Concepts (KnownFact, KnownFact, Hypothesis, Implication); Goal; Method; Result; Data
       Experiment 2:   Concepts (Fact); Goal; Method; Result; Data
Answers to current research questions:
    i.     Can these segments indeed be identified?
                    ✓      yes, adequate evidence, probably ok segments:
                    ‣      need more annotators!
    ii. Does this offer a useful version of the structure of a paper?
                    ✓      yes, offers insight, and a possible model
                    ‣      need to validate whether this structure holds across more
                           papers and different subcategories
    iii. Is this useful for enabling automated epistemic markup?
                    ✓      first efforts seem promising: simple markers (‘suggest’ verbs,
                           connectives, etc.) already help
                    ‣      ongoing research! (Sandor, XRCE; Buitelaar, DERI)
    iv. Can this help us to trace the evolution of a hypothesis?
                    ✓      anecdotal: promising
                    ‣      need to scale up!


Where are we on overall research questions?
              I. (How) can we add epistemic value to results from a
                 text mining system?
              ‣       Segment types help - need to expand + verify
              II. How is a scientific fact created, as it moves from a
                  hedged claim to a fact through successive citations?
              ‣       Model is developing, also spurt of other work!
              III. Can we identify a rhetorically successful text (and
                   help authors create them)?
              ‣       Not addressed yet - verb tense, hedging seem
                      important.


Work on (biological) scientific discourse

                    -      Is a growing field of interest!

                    -      Several projects are developing that go ‘beyond the facts’

                    -      Epistemic modality is becoming a term
                           bioinformaticians are exploring

                    -      Room for people who know about discourse
                           analysis!





Weitere ähnliche Inhalte

Ähnlich wie Epistemics

Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...GigaScience, BGI Hong Kong
 
A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)
A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)
A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)Daisuke BEKKI
 
SBML (the Systems Biology Markup Language), model databases, and other resources
SBML (the Systems Biology Markup Language), model databases, and other resourcesSBML (the Systems Biology Markup Language), model databases, and other resources
SBML (the Systems Biology Markup Language), model databases, and other resourcesMike Hucka
 
Luciano pr 08-849_ontology_evaluation_methods_metrics
Luciano pr 08-849_ontology_evaluation_methods_metricsLuciano pr 08-849_ontology_evaluation_methods_metrics
Luciano pr 08-849_ontology_evaluation_methods_metricsJoanne Luciano
 
Luciano pr 08-849_ontology_evaluation_methods_metrics
Luciano pr 08-849_ontology_evaluation_methods_metricsLuciano pr 08-849_ontology_evaluation_methods_metrics
Luciano pr 08-849_ontology_evaluation_methods_metricsJoanne Luciano
 
Dr.saleem gul assignment summary
Dr.saleem gul assignment summaryDr.saleem gul assignment summary
Dr.saleem gul assignment summaryJaved Riza
 
A Computational Framework for Concept Representation in Cognitive Systems and...
A Computational Framework for Concept Representation in Cognitive Systems and...A Computational Framework for Concept Representation in Cognitive Systems and...
A Computational Framework for Concept Representation in Cognitive Systems and...Antonio Lieto
 
15methods for Qualitative Research
15methods for Qualitative Research15methods for Qualitative Research
15methods for Qualitative Researchmrinalwkh
 
Afl 521 interpretive
Afl 521 interpretiveAfl 521 interpretive
Afl 521 interpretiveRandy Nobleza
 
A model for epistemic modality and knowledge attribution
A model for epistemic modality and knowledge attributionA model for epistemic modality and knowledge attribution
A model for epistemic modality and knowledge attributionAnita de Waard
 
download
downloaddownload
downloadbutest
 
download
downloaddownload
downloadbutest
 
Psychological processes in language acquisition
Psychological processes in language acquisitionPsychological processes in language acquisition
Psychological processes in language acquisitionsrnaz
 
Representation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object modelRepresentation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object modelMihika Shah
 
Philosophy of science summary presentation engelby
Philosophy of science summary presentation engelbyPhilosophy of science summary presentation engelby
Philosophy of science summary presentation engelbyDavid Engelby
 
BITS: Basics of sequence analysis
BITS: Basics of sequence analysisBITS: Basics of sequence analysis
BITS: Basics of sequence analysisBITS
 
'These Results Suggest That...', Knowledge Attribution in Scientific Discourse
'These Results Suggest That...', Knowledge Attribution in Scientific Discourse'These Results Suggest That...', Knowledge Attribution in Scientific Discourse
'These Results Suggest That...', Knowledge Attribution in Scientific DiscourseAnita de Waard
 

Ähnlich wie Epistemics (20)

Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu...
 
Epistemics

  • 1. Categorizing Epistemic Segment Types in Biology Research Articles Anita de Waard Elsevier Labs, Amsterdam UiL-OTS, Utrecht University Thursday, September 17, 2009 1
  • 3. Why Study Biological Discourse?
      - There is too much of it!
      - Text mining and ‘fact extraction’ techniques are gaining ground to tame this tangle
      - Emerging area of biological natural language processing (BioNLP): subfield of computational linguistics
      - Main focus: identifying biological entities (genes, proteins, drugs) and their relationships
  • 4. Example state of the art: MEDIE. Without some idea of the status of the sentence, it cannot be interpreted!
      - ‘Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric […]’
      - ‘Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.’
  • 5. How can linguistics help?
      Underlying model of text mining systems:
      - Scientific paper is ‘statement of pertinent facts’
      - So: finding entities and relationships will give you a summary of the knowledge within the paper
      - However, information extracted this way is not very useful...
      Proposed approach: treat the scientific paper as a persuasive text, a specific genre with genre characteristics and allowed persuasive techniques:
      - ‘these results suggest’ (depersonification)
      - ‘as fig. 2a shows’ (evidence is in the data)
      - ‘oncogenes produce a stress response [Serrano, 2003]’
      References and data form a “folded array of successive defense lines, behind which scientists ensconce themselves” [Latour, 1988]
  • 6. Modality Dropping
      - Fact creation occurs through social acceptance: “[Y]ou can transform .. fiction into fact just by adding or subtracting references” [Latour, 1988]
      - When references are cited, the modality is dropped:
          - A: ‘these results suggest/demonstrate/imply that’ X
          - B: ‘A et al. have shown that X [A, 2009]’
          - C: ‘X [2009]’
          - D: ‘Since X, we investigated the possibility that Y’
  • 7. Overall Research Questions
      I. (How) can we add epistemic value to results from a text mining system?
      II. How is a scientific fact created, as it moves from a hedged claim to a fact through successive citations?
      III. Can we identify a rhetorically successful text (and help authors create them)?
  • 8. Present work: perform discourse analysis on a few selected texts in biology:
      1. Parse text into discourse segments (EDUs) containing a single rhetorical move (if possible...)
      2. Determine categories or types of discourse segments that have similar rhetorical/pragmatic properties
      3. Look at a number of linguistic characteristics and see if these segments share those characteristics.
  • 9. Present research questions:
      i. Can these segments indeed be grouped by linguistic characteristics (verb tense, verb registry, metadiscourse markers)?
      ii. Does this offer a useful version of the structure of a paper?
      iii. Is this useful for enabling automated epistemic markup?
      iv. Can this help us to trace the evolution of a hypothesis?
  • 11. Method
      1. Parse text into Discourse Segments (EDUs) according to syntactic criteria
      2. Define set of semantic segment types
      3. Identify semantic type for each segment
      4. Specify linguistic and structural properties for each segment
      5. Identify correlations between semantic type and structural/syntactic properties
      6. Trace a hypothesis through the process of fact creation
  • 12. Segmentation Criteria
      Goal: ‘one new thought per segment’. Example: ‘Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.’
      a. Figure 4A shows that
      b. following RASV12 stimulation
      c. p53 was stabilized and activated
      d. and its target gene, p21cip1, was induced in all cases,
      e. indicating an intact p53 pathway in these cells.
  • 13. Segmentation Criteria (summary)
      Finite/Non-finite | Grammatical role | Segment? | Example
      Finite/Non-finite | Subject | N | The extent to which miRNAs specifically affect metastasis
      Finite/Non-finite | Direct Object | Y | these miRNAs are potential novel oncogenes
      Nonfinite | Phrase-level adjunct (restrictive and non-restrictive) | N | spanning a given miRNA genomic region
      Nonfinite | Clause-level adjunct | Y | by cloning eight miR-Vec plasmids
      Finite | Non-restrictive Phrase-level adjunct | Y | which is only active when tamoxifen is added (De Vita et al, 2005) […]
      Finite | Restrictive Phrase-level adjunct | N | that we examined
      Finite | Clause-level adjunct | Y | which correlates with the reported ES-cell expression pattern of the miR-371-3 cluster (Suh et al, 2004)
  • 14. Basic Segment Types
      Segment | Description | Example
      Fact | a known fact, generally without explicit citation | mature miR-373 is a homolog of miR-372
      Hypothesis | a proposed idea, not supported by evidence | This could for instance be a result of high mdm2 levels
      Problem | unresolved, contradictory, or unclear issue | However, further investigation is required to demonstrate the exact mechanism of LATS2 action
      Goal | research goal | To identify novel functions of miRNAs,
      Method | experimental method | Using fluorescence microscopy and luciferase assays,
      Result | a restatement of the outcome of an experiment | all constructs yielded high expression levels of mature miRNAs
      Implication | an interpretation of the results, in light of earlier hypotheses and facts | our procedure is sensitive enough to detect mild growth differences
  • 15. Two Types of Derived Segment Types
      ‘Other-segments’, related to (referenced) other work:
      - other-result: ‘they are also found in the FCX and other cortical structures [Sokoloff et al., 1990]’
      - other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’
      - other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’
      Regulatory segments, acting as matrix sentences framing other segments:
      - reg-hypothesis: ‘we hypothesized that’
      - reg-implication: ‘These observations suggest that’
      - intratextual: ‘Fig 4 shows that’
      - intertextual: ‘reviewed in (Serrano, 1997)’
  • 16. My categories vs. Latour (1979)
  • 17. Linguistic and structural properties
      1. Position in text
          - Section of the paper (Introduction, Results, Discussion)
          - Beginning/middle/end of section
          - First/second/third part of sentence
      2. Verb
          - Tense, aspect, voice
          - Verb class (idiosyncratic)
          - Lexicon
      3. Metadiscourse markers [Hyland, 2003]
          - Connectives
          - Endophorics, Evidentials
          - Hedges, Boosters
          - Person markers
  • 18. Verb class
      Two types of entities interact in biology texts:
      - Thing:
          - Thing ->: increase, die, etc.
          - Thing -> Thing: affect, stimulate, etc.
      - People:
          - People -> Thing: Examine (Goal), Operate (Method), Observe (Result), Implicate (Implication)
          - People -> People: Report
  • 20. Two texts
      1. Voorhoeve, 2006: Cell
          - Cell biology text, written by a group in Amsterdam
          - Dealing with microRNAs, a hot topic
          - 290 citations in Google Scholar: successful paper!
      2. Louiseau, 2008: European Neuropsychopharmacology
          - Text on schizophrenia
          - Prompted by interest from a pharma company
          - Adjacent subfield of biology (neuropharmacology)
  • 21. Segment vs. Section
  • 22. Segment vs. Verb Type
  • 23. Segment vs. verb tense
  • 24. Segments vs. markers
  • 27. Interpretation: 3 Realms of Science (conceptual realm, experimental realm, data realm)
      (1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase. [conceptual realm]
      (2a) Indeed, (2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase [experimental realm] (2c) (Figures 2G and 2H). [data realm (Figures)]
      (3a) Consistent with the cell growth assay, (3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd. [experimental realm]
      (4a) Altogether, these data show that (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells. [conceptual realm]
  • 28. Tense 1: Concepts vs. Experiment [Diagram: the example segments (1) to (4b) of slide 27, grouped under three labels: Concept realm; Experimental realm (personal, past); Data realm (Figures; nonverbal)]
  • 29. Tense 2: Referral [Diagram. Recoverable labels: timeline (past, present, future); own paper (Introduction, Results section, Discussion); other papers (Other Work); work before, during (= Results section) and after the current work, each marked with a tense (past or present)]
  • 30. Tense 1 + 2 = 3: Claim, fact [Diagram. Recoverable labels: Conceptual, Experiment, Experiential; past, present, future; Reading time]
  • 31. Discourse Fact-ory [Diagram. Recoverable labels: hypothetical realm (might, would): hypothesis; realm of activity (to test, to see): goal; realm of experience (past): method, result; realm of models (present): fact, implication. Other labels: problem, introduction, discussion, ‘we’, ‘resulting in’, ‘result suggests that’, shared view, own view]
  • 32. Citation and fact creation:
      - Voorhoeve, 2006: ‘To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...’; ‘Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,’
      - Yabuta, JBioChem 2007: ‘miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)’
      - Raver-Shapira et al., JMolCell 2007: ‘two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).’
      [Diagram labels: Concepts (KnownFact, Hypothesis, Implication, Fact); Experiment 1 and Experiment 2 (Goal, Method, Result, Data)]
  • 33. Answers to current research questions:
      i. Can these segments indeed be identified?
          ✓ yes, adequate evidence, probably ok segments
          ‣ need more annotators!
      ii. Does this offer a useful version of the structure of a paper?
          ✓ yes, offers insight, and a possible model
          ‣ needs to be validated whether this structure holds over more papers, different subcategories
      iii. Is this useful for enabling automated epistemic markup?
          ✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help
          ‣ ongoing research! (Sandor, XRCE; Buitelaar, DERI)
      iv. Can this help us to trace the evolution of a hypothesis?
          ✓ anecdotal: promising
          ‣ need to scale up!
  • 34. Where are we on overall research questions?
      I. (How) can we add epistemic value to results from a text mining system?
          ‣ Segment types help; need to expand and verify
      II. How is a scientific fact created, as it moves from a hedged claim to a fact through successive citations?
          ‣ Model is developing, also a spurt of other work!
      III. Can we identify a rhetorically successful text (and help authors create them)?
          ‣ Not addressed yet; verb tense and hedging seem important.
  • 35. Work on (biological) scientific discourse
      - Is a growing field of interest!
      - Several projects are developing that go ‘beyond the facts’
      - Epistemic modality is becoming a term bioinformaticians are exploring
      - Room for people who know about discourse analysis!
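
The slides report that simple markers (‘suggest’ verbs, connectives, etc.) already help with automated epistemic markup. As a rough illustration of that idea, the Python sketch below tags a segment with one of the segment types named in the slides by matching surface cues. The cue patterns and the function name `tag_segment` are assumptions made up for this example, loosely modelled on the example segments quoted in the slides; they are not the annotation rules used in the study. Types such as Fact and Result would probably need verb tense and verb class information, so the sketch returns None for them.

```python
import re
from typing import Optional

# Hypothetical cue patterns (an assumption for illustration only), ordered
# from most to least specific. Labels follow the segment types in the slides.
CUES = [
    ("intratextual", re.compile(r"^fig(ure)?s?\.?\s*\d+\w*\s+shows?\s+that", re.I)),
    ("intertextual", re.compile(r"^reviewed in\b", re.I)),
    ("reg-hypothesis", re.compile(r"^we hypothesi[sz]ed that\b", re.I)),
    ("reg-implication",
     re.compile(r"\b(suggest|imply|implies|indicate|demonstrate)s?\s+that\s*$", re.I)),
    ("goal", re.compile(r"^to\s+\w+", re.I)),
    ("method", re.compile(r"^(using|by)\s+\w+", re.I)),
    ("problem", re.compile(r"\b(however|further investigation|unclear)\b", re.I)),
    ("hypothesis", re.compile(r"\b(could|might|may|would)\b", re.I)),
]


def tag_segment(segment: str) -> Optional[str]:
    """Return the first segment type whose cue matches, or None if no cue fires."""
    text = segment.strip()
    for label, pattern in CUES:
        if pattern.search(text):
            return label
    return None


if __name__ == "__main__":
    # Example segments taken from the slides.
    for seg in [
        "Fig 4 shows that",
        "These observations suggest that",
        "To identify novel functions of miRNAs,",
        "This could for instance be a result of high mdm2 levels",
        "mature miR-373 is a homolog of miR-372",
    ]:
        print(f"{tag_segment(seg)!s:16} {seg}")
```

Because the first matching cue wins, the order of the list matters: the regulatory and textual-reference cues are checked before the broader modal-verb cue for hypotheses. A real system would need annotated data to validate any such rules.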