How Scientists Read,
  How Computers Read,
 and What We Should Do
     (= not what it says in the abstract!)


         Anita de Waard
Disruptive Technologies Director
          Elsevier Labs
Outline
1. How do scientists read?
2. How do computers read?
3. What should we do?
How we read
• Letter < syllable < word < clause < sentence < discourse:
   This is how linguistics is structured.
   But it is not how we understand text!
Scientists read:

• Why do scientists read?
  – They want to ingest knowledge:
  – read, integrate with their current knowledge
• What do scientists read?
  – Things that are ‘interesting’ :
  – Pertinent (within their ‘shell of interest’)
  – Possibly or probably true
  – Novel, but in agreement with what we know
What is this paper about?
                    NOUN PHRASES
transiently expressed miRNA sponges; human breast cancer; high-grade malignancy;
miR-31; noninvasive MCF7-Ras; antisense oligonucleotides; cell viability; cloned;
retroviral vector

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
What is this paper about?
                         TRIPLES

      miR-31 expression DEPRIVE metastatic cells
miR-31 PREVENT acquisition of aggressive traits
   miR-31 INHIBIT noninvasive MCF7-Ras cells
             miR-31 ENHANCE invasion
                 cell viability AFFECT inhibitor

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
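A minimal sketch of how a reader (or a program) might use the triples above to guess a paper's topic: count which subject recurs most often. Splitting each line into (subject, PREDICATE, object) at the upper-case verb is an assumption of this sketch, and `main_topics` is a hypothetical helper, not part of any tool named in the talk.

```python
from collections import Counter

# Triples copied from the slide, split at the upper-case verb (an assumption).
TRIPLES = [
    ("miR-31 expression", "DEPRIVE", "metastatic cells"),
    ("miR-31", "PREVENT", "acquisition of aggressive traits"),
    ("miR-31", "INHIBIT", "noninvasive MCF7-Ras cells"),
    ("miR-31", "ENHANCE", "invasion"),
    ("cell viability", "AFFECT", "inhibitor"),
]


def main_topics(triples, top=1):
    """Return the most frequent subjects with their counts."""
    counts = Counter(subject for subject, _, _ in triples)
    return counts.most_common(top)


print(main_topics(TRIPLES))  # [('miR-31', 3)]
```

The count suggests what the paper is about, which helps with pertinence, but it says nothing about whether any triple is true or new.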
What is this paper about?
                             METADISCOURSE
The preceding observations demonstrated that X expression deprives Y cells of
attributes associated with Z.
We next asked whether X also prevents the acquisition of A traits by B cells.
To do so, we transiently inhibited X in C cells with either D or E.
Both approaches inhibited X function by > 4.5-fold (Figure S7A).
Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was
unaffected by either inhibitor (Figure 3A; Figure S7B).
The E sponge reduced X function by 2.5-fold, but did not affect the activity of other
known Js (Figures S8A and S8B).
Collectively, these data indicated that sustained X activity is necessary to prevent the
acquisition of Z traits by both K and untransformed B cells.
    Is it pertinent? -> Need content
    Is it true? -> Sounds likely! I know this stuff!
    Is it new, but in agreement with what I know? -> Need content
What is this paper about?
                        CLAIMS AND EVIDENCE
Claim:
• sustained miR-31 activity is necessary to prevent the acquisition of aggressive
   traits by both tumor cells and untransformed breast epithelial cells.
Evidence: Method:
• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either
   antisense oligonucleotides or miRNA sponges.
Evidence: Result:
• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold,
   but cell viability was unaffected by either inhibitor (Figure 3A; Figure
   S7B).
• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect
   the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
 Is it pertinent? -> Probably
 Is it true? -> Sounds likely!
 Is it new, but in agreement with what I know? -> Check/know
What is this paper about?
                            DATA




Is it pertinent? -> Need content
Is it true? -> Need methods
Is it new, but in agreement with what I know? -> Check/know
What is this paper about?
                         METADATA




Is it pertinent? -> Possibly
Is it true?      -> Probably!
Is it new, but in agreement with what I know? -> Need background
How scientists read:
Representation         Pertinence   Truth   Fit with knowledge
Noun phrases               x
Triples                    x
Metadiscourse                         x
Claims and evidence        x          x             x
Data                       x          x             x
Metadata                              x
Slide annotations: Text mining; Publishing; Data-centric science
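The table can be read as a small lookup: which representations answer all three reader questions? The sketch below encodes the x marks as sets; the short keys `pertinence`, `truth` and `fit`, and the helper `answers_all`, are names made up for this illustration.

```python
# Reader questions each representation helps with, as marked in the table.
SUPPORTS = {
    "Noun phrases": {"pertinence"},
    "Triples": {"pertinence"},
    "Metadiscourse": {"truth"},
    "Claims and evidence": {"pertinence", "truth", "fit"},
    "Data": {"pertinence", "truth", "fit"},
    "Metadata": {"truth"},
}

QUESTIONS = {"pertinence", "truth", "fit"}


def answers_all(supports):
    """Return the representations that cover every reader question."""
    return [name for name, covered in supports.items() if covered >= QUESTIONS]


print(answers_all(SUPPORTS))  # ['Claims and evidence', 'Data']
```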
Outline
1. How do scientists read?
2. How do computers read?
3. What should we do?
Noun Phrases: some issues
• Problem 1: disambiguating terms (© GoPubMed):
  – Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix
    destabilizing protein = single-strand binding protein = hnRNP core
    protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.
  – Cellulose 1,4-beta-cellobiosidase = exoglucanase
  – COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
• Problem 2: disambiguating entities (© M. Martone):
  – 95 antibodies were (manually!) identified in 8 articles
  – 52 did not contain enough information to determine the antibody
    used
  – Some provided details in other papers
  – Missing details included species, clonality, vendor, or catalog number
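Problem 1 can be sketched as a lookup from surface forms to candidate concepts. This is only an illustration built from the examples on the slide: treating the first listed name as "canonical" is an assumption, and `build_index` and `resolve` are hypothetical helpers, not GoPubMed functions.

```python
from collections import defaultdict

# Canonical name -> surface forms, taken from the slide's examples.
# Picking the first listed name as canonical is an assumption of this sketch.
CANONICAL = {
    "Hnrpa1": [
        "Hnrpa1", "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
        "helix destabilizing protein", "single-strand binding protein",
        "hnRNP core protein A1", "HDP-1", "topoisomerase-inhibitor suppressed",
    ],
    "Cellulose 1,4-beta-cellobiosidase": [
        "Cellulose 1,4-beta-cellobiosidase", "exoglucanase",
    ],
    "cold (runny nose)": ["cold"],
    "cold (low T)": ["cold"],
}


def build_index(canonical):
    """Invert the table: surface form -> set of candidate canonical names."""
    index = defaultdict(set)
    for name, forms in canonical.items():
        for form in forms:
            index[form].add(name)
    return index


def resolve(term, index):
    """Return sorted candidates; more than one means the term is ambiguous."""
    return sorted(index.get(term, set()))


INDEX = build_index(CANONICAL)
print(resolve("HDP-1", INDEX))  # one candidate: a synonym resolved
print(resolve("cold", INDEX))   # two candidates: context is needed
print(resolve("COLD", INDEX))   # no candidate: lookup is case-sensitive
```

Synonyms collapse cleanly, but an ambiguous form such as "cold" returns several candidates, and a table alone cannot choose between them.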
Noun Phrases: some progress
• Despite these difficulties, noun phrase recall/precision is
  quite high, e.g. i2b2 2011 [1], [2], others: 90%-98%
• Many tools, see [3] for a list; e.g. GoPubMed:
Triples: some issues:
• Contingent on good NP & VP detection
• Hard to parse text! E.g. a commercial tool gave:
  – Extracted: insulin maintaining glucose homeostasis
    Source: When insulin secretion cannot be increased adequately (type I
    diabetes defect) to overcome insulin resistance in maintaining
    glucose homeostasis, hyperglycemia and glucose intolerance
    ensues.
  – Extracted: insulin may be involved glucose homeostasis
    Source: Because PANDER is expressed by pancreatic beta-cells and in
    response to glucose in a similar way to those of insulin, PANDER
    may be involved in glucose homeostasis.
Triples: some progress:
Biological Expression Language [4]:
We provide evidence that these miRNAs are potential novel oncogenes participating in the development
of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in
the presence of wild-type p53.
Increased abundance of miR-372 decreases activity of TP53
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
Context: cancer
SET Disease = "Cancer"
Activity of TP53 decreases cell growth
tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
Use biological pathway visualizations
as a user interface for knowledge discovery.
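To show how statements like the ones above could be fed into a visualization, here is a minimal sketch that splits a statement into subject, relation and object at the first `-|` outside parentheses. It is not the OpenBEL parser and does not validate BEL syntax; `split_statement` is a hypothetical helper, and reading `-|` as "decreases" follows the plain-language glosses on the slide.

```python
def split_statement(statement, operator="-|"):
    """Split 'subject -| object' at the first operator outside parentheses."""
    depth = 0
    for i, ch in enumerate(statement):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and statement.startswith(operator, i):
            subject = statement[:i].strip()
            obj = statement[i + len(operator):].strip()
            return subject, operator, obj
    raise ValueError(f"no top-level {operator!r} in {statement!r}")


# Statements as written on the slide.
edges = [
    split_statement('r(MIR:miR-372) -| tscript(p(HUGO:Trp53))'),
    split_statement('tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")'),
]
for subject, relation, obj in edges:
    print(subject, "decreases", obj)
```

Tracking parenthesis depth keeps the hyphen inside `miR-372` from being mistaken for part of the relation.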




Author-created triples:
   MSR ActiveText
Metadiscourse: why it matters:
    “[Y]ou can transform .. fiction into fact just by adding or
    subtracting references”, Bruno Latour [5]

• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK
  inhibition, possibly through direct inhibition of the expression of the tumor
  suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373
  were found to allow proliferation of primary human cells that express
  oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor
  LATS2 (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly
  inhibit the expression of Lats2, thereby allowing tumorigenic growth in the
  presence of p53 (Voorhoeve et al., 2006).”
Adding metadiscourse to triples:
Biological statement with BEL/epistemic markup:
• Statement: These miRNAs neutralize p53-mediated CDK inhibition, possibly
  through direct inhibition of the expression of the tumor-suppressor LATS2.
• BEL representation:
   – r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
   – Increased abundance of miR-372 decreases abundance of LATS2:
     r(MIR:miR-372) -| r(HUGO:LATS2)
• Epistemic evaluation: Value = Possible; Source = Unknown; Basis = Unknown

Biological statement with MedScan/epistemic markup:
• Statement: Furthermore, we present evidence that the secretion of nesfatin-1
  into the culture media was dramatically increased during the differentiation
  of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments
  with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
• MedScan analysis: IL-6 -> NUCB2 (nesfatin-1); Relation: MolTransport;
  Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
• Epistemic evaluation: Value = Probable; Source = Author; Basis = Data
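One way to carry the epistemic evaluation along with a triple is a small record with Value, Source and Basis fields. The sketch below is an assumption about how such a record could look; the class `EpistemicStatement` and the method `is_grounded` are hypothetical names, and the second statement string is a free-text summary of the MedScan row rather than real MedScan output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpistemicStatement:
    statement: str  # formal or semi-formal representation
    value: str      # e.g. "Possible", "Probable"
    source: str     # e.g. "Author", "Unknown"
    basis: str      # e.g. "Data", "Unknown"

    def is_grounded(self):
        """True when both the source and the basis of the claim are known."""
        return "Unknown" not in (self.source, self.basis)


lats2 = EpistemicStatement(
    "r(MIR:miR-372) -| r(HUGO:LATS2)", "Possible", "Unknown", "Unknown"
)
nesfatin = EpistemicStatement(
    "IL-6 -> NUCB2 (nesfatin-1) [MolTransport, Positive]",
    "Probable", "Author", "Data",
)

grounded = [s.statement for s in (lats2, nesfatin) if s.is_grounded()]
print(grounded)
```

Keeping these fields attached makes it harder for a hedged claim to drift into a bare fact when the triple is reused.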
Claims and Evidence, some examples:
                Data2Semantics [11]
• Linking clinical guidelines to evidence in a linked data form
• Goal: improve speed of integration of research > practice
• Issue: is the evidence even represented correctly within the guideline?
   •   Studies have demonstrated inconsistent results regarding the
       use of such markers of inflammation as C-reactive protein (CRP),
       interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in
       neutropenic patients with cancer [55–57].
       • [55]: PCT and IL-6 are more reliable markers than CRP for
           predicting bacteremia in patients with febrile neutropenia
       • [56]: In conclusion, daily measurement of PCT or IL-6
           could help identify neutropenic patients with a stable
           course when the fever lasts >3 d. …,
           it would reduce adverse events and treatment costs.
       • [57]: Our study supports the value of PCT as a reliable tool to
           predict clinical outcome in febrile neutropenia.
Claims and Evidence, example:
    Drug Interaction Knowledgebase [12]
• Extracting adverse drug interactions (ADIs) from the literature
  and publishing them as a linked data node
• Goal: improve the speed and coverage of ADI extraction and
  improve access for patients and doctors
• Issue: how to identify evidence?
   – Claim:
     R-citalopram_is_not_substrate_of_cyp2c19
   – Evidence:
     At 10 µM R- or S-CT, ketoconazole reduced reaction velocity to 55-60%
     of control, quinidine to 80%, and omeprazole to 80-85% of
     control (Fig. 6)
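The claim and evidence pair above can be sketched as linked records, so that claims without evidence are easy to spot. This is an illustrative structure only, not the Drug Interaction Knowledgebase schema; `Claim`, `Evidence`, `unsupported` and the placeholder claim are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    text: str
    pointer: str  # where the evidence lives, e.g. a figure


@dataclass
class Claim:
    identifier: str
    evidence: list = field(default_factory=list)

    def is_supported(self):
        """A claim counts as supported once at least one evidence item is linked."""
        return len(self.evidence) > 0


def unsupported(claims):
    """Return identifiers of claims that have no linked evidence."""
    return [c.identifier for c in claims if not c.is_supported()]


citalopram = Claim("R-citalopram_is_not_substrate_of_cyp2c19")
citalopram.evidence.append(
    Evidence(
        "At 10uM R- or S-CT, ketoconazole reduced reaction velocity to "
        "55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control",
        "Fig. 6",
    )
)
# Made-up claim with nothing attached, to show the check.
placeholder = Claim("hypothetical_claim_without_evidence")

print(unsupported([citalopram, placeholder]))
```

The hard part named on the slide, finding the evidence sentence in the first place, is not addressed here; the sketch only shows where it would be stored once found.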
Data, e.g. Web Science 2.0:
        Mark Wilkinson (SADI, Madrid)




Using what is known about interactions in fly & yeast:
predict new interactions with a human protein
Wilkinson: doing science ON the web:




                              These are different
                              Web services!

                              ...selected at run-time based
                              on the same model
Data
• All this evidence is based on data
• Increasingly: science is distributed between
   – Groups creating data
   – Groups using data – creating tools
   – Groups using tools on data – ideas
• All of these groups need to communicate!
In summary:
1. How do scientists read?
2. How do computers read?
3. What should we do?
How we read vs. computers:
Level:              People read:               Computers read:
Noun phrases        Know topic                 Pretty well
Triples             Know topic                 Pretty well
Metadiscourse       Trust method               Not very well
Claims and evidence Understand and trust       Not very well
Data                Trust - and new science!   Can enable!
Is this the future of publishing? [17]
1. Research: Each item in the system has metadata (including provenance) and
   relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a (lab-owned)
   workflow system.
3. Authoring: A paper is written in an authoring tool which can pull data with
   provenance from the workflow tool in the appropriate representation into
   the document.
4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the
   editors, who in turn expose it to reviewers. Reports are stored in the
   authoring/editing system, the paper gets updated, until it is validated.
5. Publishing and distribution: When a paper is published, a collection of
   validated information is exposed to the world. It remains connected to its
   related data item, and its heritage can be traced.
6. User applications: distributed applications run on this ‘exposed data’
   universe.
[Figure residue: items labeled “metadata”; sample paper text: “Rats were
subjected to two grueling tests (click on fig 2 to see underlying data). These
results suggest that the neurological pain pro-”; labels: Review, Revise, Edit,
Publisher runs service (‘app’).]
What should we do?
• Experiment! All over the place. Scientists get it!
• Support scientists working on these (e.g. text
  miners, web science evangelists, data repositories, etc.):
  great return for your investment!
• Join forums where scientists, publishers, libraries, etc.
  interact, e.g. Force11.org:
   – Collective, sponsored by Sloan, aimed at
     enabling/supporting this discussion
   – Planning workshop,
     innovative projects for 2013
   – Please join us at
     http://force11.org!
Thank you!



         Anita de Waard
    a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita/
References
[1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518, http://dx.doi.org/10.1136/jamia.2010.003947
[2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679
[3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
[4] Biological Expression Language: http://www.openbel.org
[5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Pander Maat, H. (2012). Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 47–55, Jeju, Republic of Korea, 12 July 2012.
[11] Data2Semantics project: http://www.data2semantics.org/
[12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
[13] Sándor, Àgnes and de Waard, Anita, (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles, Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[14] Blake, C. (2010) Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles, Journal of Biomedical Informatics, 43(2):173-189
[15] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins
[16] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop, ISWC 2012
[17] de Waard, A. (2010). The Future of the Journal? Integrating research data with scientific discourse, LOGOS: The Journal of the World Book Community, Volume 21, Numbers 1-2, 2010, pp. 7-11(5), also published in Nature Precedings, http://precedings.nature.com/documents/4742/version/1

Weitere ähnliche Inhalte

Andere mochten auch

Why life is so complicated
Why life is so complicatedWhy life is so complicated
Why life is so complicatedAnita de Waard
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New ScienceAnita de Waard
 
Why Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItWhy Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItAnita de Waard
 
The Future of the Journal And Applications in an Open Scientific Ecosystem
The Future of the Journal And Applications in an Open Scientific Ecosystem The Future of the Journal And Applications in an Open Scientific Ecosystem
The Future of the Journal And Applications in an Open Scientific Ecosystem Anita de Waard
 
Linking data to publications: Towards the execution of papers
Linking data to publications: Towards the execution of papersLinking data to publications: Towards the execution of papers
Linking data to publications: Towards the execution of papersAnita de Waard
 
On the research paper, and the knowledge within
On the research paper, and the knowledge withinOn the research paper, and the knowledge within
On the research paper, and the knowledge withinAnita de Waard
 
The Future of the Journal
The Future of the JournalThe Future of the Journal
The Future of the JournalAnita de Waard
 
The habits of highly successful data:
The habits of highly successful data: The habits of highly successful data:
The habits of highly successful data: Anita de Waard
 
A model for epistemic modality and knowledge attribution
A model for epistemic modality and knowledge attributionA model for epistemic modality and knowledge attribution
A model for epistemic modality and knowledge attributionAnita de Waard
 
'Stories that persuade with data' - talk at CENDI meeting January 9 2014
'Stories that persuade with data' - talk at CENDI meeting January 9 2014'Stories that persuade with data' - talk at CENDI meeting January 9 2014
'Stories that persuade with data' - talk at CENDI meeting January 9 2014Anita de Waard
 
How to persuade with data
How to persuade with dataHow to persuade with data
How to persuade with dataAnita de Waard
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
How cool is cold: strange things that happen at ultraow temperatures
How cool is cold: strange things that happen at ultraow temperaturesHow cool is cold: strange things that happen at ultraow temperatures
How cool is cold: strange things that happen at ultraow temperaturesAnita de Waard
 
Data CItation Principles in Practice
Data CItation Principles in PracticeData CItation Principles in Practice
Data CItation Principles in PracticeAnita de Waard
 

Andere mochten auch (20)

Why life is so complicated
Why life is so complicatedWhy life is so complicated
Why life is so complicated
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New Science
 
Elpub
ElpubElpub
Elpub
 
Why Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItWhy Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About It
 
The Future of the Journal And Applications in an Open Scientific Ecosystem
The Future of the Journal And Applications in an Open Scientific Ecosystem The Future of the Journal And Applications in an Open Scientific Ecosystem
The Future of the Journal And Applications in an Open Scientific Ecosystem
 
Ncbo webinar force11
Ncbo webinar force11Ncbo webinar force11
Ncbo webinar force11
 
deWaardAAMC2012
deWaardAAMC2012deWaardAAMC2012
deWaardAAMC2012
 
Torsten Reimer
Torsten ReimerTorsten Reimer
Torsten Reimer
 
Linking data to publications: Towards the execution of papers
Linking data to publications: Towards the execution of papersLinking data to publications: Towards the execution of papers
Linking data to publications: Towards the execution of papers
 
On the research paper, and the knowledge within
On the research paper, and the knowledge withinOn the research paper, and the knowledge within
On the research paper, and the knowledge within
 
The Future of the Journal
The Future of the JournalThe Future of the Journal
The Future of the Journal
 
Unknown Unknowns
Unknown UnknownsUnknown Unknowns
Unknown Unknowns
 
KNDI Toronto panel
KNDI Toronto panelKNDI Toronto panel
KNDI Toronto panel
 
The habits of highly successful data:
The habits of highly successful data: The habits of highly successful data:
The habits of highly successful data:
 
A model for epistemic modality and knowledge attribution
A model for epistemic modality and knowledge attributionA model for epistemic modality and knowledge attribution
A model for epistemic modality and knowledge attribution
 
'Stories that persuade with data' - talk at CENDI meeting January 9 2014
'Stories that persuade with data' - talk at CENDI meeting January 9 2014'Stories that persuade with data' - talk at CENDI meeting January 9 2014
'Stories that persuade with data' - talk at CENDI meeting January 9 2014
 
How to persuade with data
How to persuade with dataHow to persuade with data
How to persuade with data
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
How cool is cold: strange things that happen at ultraow temperatures
How cool is cold: strange things that happen at ultraow temperaturesHow cool is cold: strange things that happen at ultraow temperatures
How cool is cold: strange things that happen at ultraow temperatures
 
Data CItation Principles in Practice
Data CItation Principles in PracticeData CItation Principles in Practice
Data CItation Principles in Practice
 

Ähnlich wie How Scientists Read, How Computers Read, and What We Should Do

The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...Anita de Waard
 
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Neuroscience Information Framework
 
AI Systems @ Manchester
AI Systems @ ManchesterAI Systems @ Manchester
AI Systems @ ManchesterAndre Freitas
 
Biomarkers brain regions
Biomarkers brain regionsBiomarkers brain regions
Biomarkers brain regionsAnn-Marie Roche
 
Accelerating Scientific Research Through Machine Learning and Graph
Accelerating Scientific Research Through Machine Learning and GraphAccelerating Scientific Research Through Machine Learning and Graph
Accelerating Scientific Research Through Machine Learning and GraphNeo4j
 
dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024dkNET
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple nadeem akhter
 
Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...Vall d'Hebron Institute of Research (VHIR)
 
Albert pujol reingeneering the human biology
Albert pujol   reingeneering the human biologyAlbert pujol   reingeneering the human biology
Albert pujol reingeneering the human biologyAlbert Pujol Torras
 
Characterization of microRNA expression profiles in normal human tissues
Characterization of microRNA expression profiles in normal human tissuesCharacterization of microRNA expression profiles in normal human tissues
Characterization of microRNA expression profiles in normal human tissuesYu Liang
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talkc.titus.brown
 
Research in Progress April 2014
Research in Progress April 2014Research in Progress April 2014
Research in Progress April 2014Vanessa S
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challengesinside-BigData.com
 

Ähnlich wie How Scientists Read, How Computers Read, and What We Should Do (20)

The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
 
2013 alumni-webinar
2013 alumni-webinar2013 alumni-webinar
2013 alumni-webinar
 
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
 
AI Systems @ Manchester
AI Systems @ ManchesterAI Systems @ Manchester
AI Systems @ Manchester
 
Biomarkers brain regions
Biomarkers brain regionsBiomarkers brain regions
Biomarkers brain regions
 
Accelerating Scientific Research Through Machine Learning and Graph
Accelerating Scientific Research Through Machine Learning and GraphAccelerating Scientific Research Through Machine Learning and Graph
Accelerating Scientific Research Through Machine Learning and Graph
 
dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024
 
Open Minds Bring Open Collaborations
Open Minds Bring Open CollaborationsOpen Minds Bring Open Collaborations
Open Minds Bring Open Collaborations
 
Xerox2009
Xerox2009Xerox2009
Xerox2009
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple
 
Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...
 
Albert pujol reingeneering the human biology
Albert pujol   reingeneering the human biologyAlbert pujol   reingeneering the human biology
Albert pujol reingeneering the human biology
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
 
Characterization of microRNA expression profiles in normal human tissues
Characterization of microRNA expression profiles in normal human tissuesCharacterization of microRNA expression profiles in normal human tissues
Characterization of microRNA expression profiles in normal human tissues
 
Biology Lab Report
Biology Lab ReportBiology Lab Report
Biology Lab Report
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talk
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
2014 naples
2014 naples2014 naples
2014 naples
 
Research in Progress April 2014
Research in Progress April 2014Research in Progress April 2014
Research in Progress April 2014
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
 

How Scientists Read, How Computers Read, and What We Should Do

  • 1. How Scientists Read, How Computers Read, and What We Should Do (= not what it says in the abstract!) Anita de Waard Disruptive Technologies Director Elsevier Labs
  • 2. Outline 1. How do scientists read? 2. How do computers read? 3. What should we do?
  • 3. Outline 1. How do scientists read? 2. How do computers read? 3. What should we do?
  • 4. How we read • Letter < syllable < word < clause < sentence < discourse: This is how linguistics is structured. But it is not how we understand text!
  • 10. Scientists read: • Why do scientists read? – They want to ingest knowledge: – read, integrate with their current knowledge • What do scientists read? – Things that are ‘interesting’: – Pertinent (within their ‘shell of interest’) – Possibly or probably true – Novel, but in agreement with what we know
  • 11. What is this paper about? NOUN PHRASES: transiently expressed miRNA sponges; human breast cancer; high-grade malignancy; miR-31; noninvasive MCF7-Ras; antisense oligonucleotides; cell viability; cloned; retroviral vector. Is it pertinent? -> Possibly… Is it true? -> ? Is it new, but in agreement with what I know? -> ?
  • 12. What is this paper about? TRIPLES: miR-31 expression DEPRIVE metastatic cells; miR-31 PREVENT acquisition of aggressive traits; miR-31 INHIBIT noninvasive MCF7-Ras cells; miR-31 ENHANCE invasion; cell viability AFFECT inhibitor. Is it pertinent? -> Possibly… Is it true? -> ? Is it new, but in agreement with what I know? -> ?
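The triples on this slide can be held in a very simple data structure. The following is a minimal sketch, not the tool that produced them: the `Triple` type and the `about` helper are hypothetical names introduced here for illustration, and only the subject/predicate/object strings come from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement (hypothetical container)."""
    subject: str
    predicate: str
    obj: str


# The five triples listed on the slide, copied as-is.
TRIPLES = [
    Triple("miR-31 expression", "DEPRIVE", "metastatic cells"),
    Triple("miR-31", "PREVENT", "acquisition of aggressive traits"),
    Triple("miR-31", "INHIBIT", "noninvasive MCF7-Ras cells"),
    Triple("miR-31", "ENHANCE", "invasion"),
    Triple("cell viability", "AFFECT", "inhibitor"),
]


def about(entity, triples):
    """Return every triple that mentions `entity` as subject or object."""
    return [t for t in triples if entity in t.subject or entity in t.obj]


for t in about("miR-31", TRIPLES):
    print(t.subject, t.predicate, t.obj)
```

Such a lookup answers "is it pertinent?" quickly, but, as the slide notes, it carries nothing that helps a reader judge whether a statement is true.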
  • 13. What is this paper about? METADISCOURSE The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z. We next asked whether X also prevents the acquisition of A traits by B cells. To do so, we transiently inhibited X in C cells with either D or E. Both approaches inhibited X function by > 4.5-fold (Figure S7A). Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B). The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B). Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells. Is it pertinent? -> Need content Is it true? -> Sounds likely! I know this stuff! Is it new, but in agreement with what I know? -> Need content
  • 14. What is this paper about? CLAIMS AND EVIDENCE Claim: • sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells Evidence: Method: • We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges. Evidence: Result: • Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A). • Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B). • The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B). Is it pertinent? -> Probably Is it true? -> Sounds likely! Is it new, but in agreement with what I know? -> Check/know
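The claim-and-evidence breakdown on this slide maps naturally onto a small record structure. This is a sketch under stated assumptions: the `Claim` and `Evidence` classes and the `results` method are hypothetical names, not part of any system mentioned in the talk, and the claim text is shortened from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str  # "Method" or "Result", the two evidence types on the slide
    text: str


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)

    def results(self):
        """Return only the evidence items of kind "Result"."""
        return [e for e in self.evidence if e.kind == "Result"]


claim = Claim(
    "sustained miR-31 activity is necessary to prevent the acquisition "
    "of aggressive traits",
    [
        Evidence("Method", "We transiently inhibited miR-31 in noninvasive "
                           "MCF7-Ras cells with either antisense "
                           "oligonucleotides or miRNA sponges."),
        Evidence("Result", "Both approaches inhibited miR-31 function by "
                           ">4.5-fold (Figure S7A)."),
        Evidence("Result", "Suppression of miR-31 enhanced invasion by "
                           "20-fold and motility by 5-fold (Figure 3A; "
                           "Figure S7B)."),
        Evidence("Result", "The miR-31 sponge reduced miR-31 function by "
                           "2.5-fold (Figures S8A and S8B)."),
    ],
)

print(len(claim.results()), "results support the claim")
```

Keeping the method and results attached to the claim is what lets a reader answer all three questions (pertinent, true, fits with what I know) from one record.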
  • 15. What is this paper about? DATA Is it pertinent? -> Need content Is it true? -> Need methods Is it new, but in agreement with what I know? -> Check/know
  • 16. What is this paper about? METADATA Is it pertinent? -> Possibly Is it true? -> Probably! Is it new, but in agreement with what I know? -> Need background
  • 17. How scientists read: Columns: Representation, Pertinence, Truth, Fit with knowledge. Rows: Noun phrases (x); Triples (x); Metadiscourse (x); Claims and evidence (x, x, x); Data (x, x, x); Metadata (x). Side labels: Text mining; Publishing; Data-centric science
  • 18. Outline 1. How do scientists read? 2. How do computers read? 3. What should we do?
  • 19. Noun Phrases: some issues • Problem 1: disambiguating terms (© GoPubMed): – Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed. – Cellulose 1,4-beta-cellobiosidase = exoglucanase – COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T) • Problem 2: disambiguating entities (© M. Martone): – 95 antibodies were (manually!) identified in 8 articles – 52 did not contain enough information to determine the antibody used – Some provided details in other papers – Failed to give species, clonality, vendor, or catalog number
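Problem 1 on this slide can be illustrated with a synonym lookup. This is only a minimal sketch with a hand-built table; it is not how GoPubMed works, and `CANONICAL`, `SYNONYMS` and `normalize` are hypothetical names. The aliases themselves are copied from the slide.

```python
# Aliases listed on the slide for one gene; choosing "Hnrpa1" as the
# canonical name is an assumption made for this illustration.
CANONICAL = "Hnrpa1"
ALIASES = [
    "Hnrpa1",
    "Tis",
    "Fli-2",
    "nuclear ribonucleoprotein A1",
    "helix destabilizing protein",
    "single-strand binding protein",
    "hnRNP core protein A1",
    "HDP-1",
    "topoisomerase-inhibitor suppressed",
]
SYNONYMS = {alias: CANONICAL for alias in ALIASES}


def normalize(term):
    """Return the canonical name for a known alias, or None if unknown."""
    return SYNONYMS.get(term.strip())


print(normalize("HDP-1"))  # Hnrpa1
print(normalize("cold"))   # None
```

An exact-match table handles the first bullet (many names, one entity) but not the third: the string "cold" alone cannot say whether a runny nose or a low temperature is meant, which needs context that a lookup does not have.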
  • 20. Noun Phrases: some progress • Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98% • Many tools, see [3] for a list; e.g. GoPubMed:
  • 21. Triples: some issues: • Contingent on good NP & VP detection • Hard to parse text! E.g. a commercial tool gave: – "insulin maintaining glucose homeostasis" from: "When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues." – "insulin may be involved glucose homeostasis" from: "Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis."
  • 22. Triples: some progress: Biological Expression Language [4]: "We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53." Increased abundance of miR-372 decreases activity of TP53: r(MIR:miR-372) -| tscript(p(HUGO:Trp53)). Context: cancer: SET Disease = "Cancer". Activity of TP53 decreases cell growth: tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
  • 23. Use biological pathway visualizations as a user interface for knowledge discovery.
  • 24. Author-created triples: MSR ActiveText
  • 25. Metadiscourse: why it matters: “[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [5] • Voorhoeve et al., 2006: “These miRNAs neutralize p53- mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.” • Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).” • Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
  • 26. Adding metadiscourse to triples:
      Biological statement with BEL/epistemic markup | BEL representation | Epistemic evaluation
      "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2." | r(MIR:miR-372) -|(tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family"))); Increased abundance of miR-372 decreases abundance of LATS2: r(MIR:miR-372) -| r(HUGO:LATS2) | Value = Possible; Source = Unknown; Basis = Unknown
      Biological statement with MedScan/epistemic markup | MedScan analysis | Epistemic evaluation
      "Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01)." | IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1 | Value = Probable; Source = Author; Basis = Data
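The Value / Source / Basis evaluation on this slide can be attached to each extracted statement as three extra fields. The sketch below is illustrative only: `EpistemicStatement`, `CERTAINTY_ORDER` and `at_least` are hypothetical names, and the ordering Possible < Probable is an assumption covering just the two values that appear on the slide, not the full taxonomy of [10].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpistemicStatement:
    statement: str  # the extracted triple or relation, as text
    value: str      # how certain: "Possible", "Probable", ...
    source: str     # who holds the knowledge: "Author", "Unknown", ...
    basis: str      # what it rests on: "Data", "Unknown", ...


# The two examples from the slide.
STATEMENTS = [
    EpistemicStatement("r(MIR:miR-372) -| r(HUGO:LATS2)",
                       "Possible", "Unknown", "Unknown"),
    EpistemicStatement("nesfatin-1 secretion increased (MolTransport, Positive)",
                       "Probable", "Author", "Data"),
]

# Assumed ordering from least to most certain (only the slide's two values).
CERTAINTY_ORDER = ["Possible", "Probable"]


def at_least(statements, minimum):
    """Keep statements whose certainty value is `minimum` or stronger."""
    threshold = CERTAINTY_ORDER.index(minimum)
    return [s for s in statements
            if CERTAINTY_ORDER.index(s.value) >= threshold]


for s in at_least(STATEMENTS, "Probable"):
    print(s.statement, "| source:", s.source, "| basis:", s.basis)
```

Filtering on these fields is one way a knowledge base could avoid the drift shown on the previous slide, where a "possibly" in the original paper had become a plain fact two citations later.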
  • 27. Claims and Evidence, some examples: Data2Semantics [11] • Linking clinical guidelines to evidence in a linked data form • Goal: improve speed of integration of research > practice • Issue: evidence is not even correct within guideline? • Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57]. • [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia • [56] In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs. • [57] Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
  • 28. Claims and Evidence, example: Drug Interaction Knowledgebase [12] • Extracting adverse drug interactions (ADIs) from literature and creating a linked data node from them • Goal: improve speed and coverage of ADIs and allow improved access for patients and doctors • Issue: how to identify evidence? – Claim: R-citalopram_is_not_substrate_of_cyp2c19: – Evidence: At 10 µM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6)
  • 29. Data, e.g. Web Science 2.0: Mark Wilkinson (SADI, Madrid) Using what is known about interactions in fly & yeast: predict new interactions with a human protein
  • 30. Wilkinson: doing science ON the web: These are different Web services! ...selected at run-time based on the same model
  • 31. Data • All this evidence is based on data • Increasingly: science is distributed between – Groups creating data – Groups using data – creating tools – Groups using tools on data – ideas • All of these groups need to communicate!
  • 32. In summary: 1. How do scientists read? 2. How do computers read? 3. What should we do?
  • 33. How we read vs. computers:
      Level | People read | Computers read
      Noun phrases | Know topic | Pretty well
      Triples | Know topic | Pretty well
      Metadiscourse | Trust method | Not very well
      Claims and evidence | Understand and trust | Not very well
      Data | Trust - and new science! | Can enable!
  • 34. Is this the future of publishing? [17]
      1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.
      2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
      3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.
      4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, the paper gets updated, until it is validated.
      5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.
      6. User applications: distributed applications run on this ‘exposed data’ universe.
      [Figure labels: metadata; Review; Edit; Revise; Publisher runs service (‘app’). Sample paper text in figure: "Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-"]
  • 35. What should we do? • Experiment! All over the place. Scientists get it! • Support scientists working on these (e.g. text miners, web science evangelists, data repositories, etc.): great return for your investment! • Join forums where interactions happen between scientists, publishers, libraries, etc., e.g. Force11.org: – Collective, sponsored by Sloan, aimed at enabling/supporting this discussion – Planning workshop, innovative projects for 2013 – Please join us at http://force11.org!
  • 36. Thank you! Anita de Waard a.dewaard@elsevier.com http://elsatglabs.com/labs/anita/
  • 37. References
      [1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518, http://dx.doi.org/10.1136/jamia.2010.003947
      [2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679
      [3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
      [4] Biological Expression Language: http://www.openbel.org
      [5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
      [6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
      [7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
      [8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
      [9] Kim, S-M. and Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
      [10] de Waard, A. and Pander Maat, H. (2012). Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 47–55, Jeju, Republic of Korea, 12 July 2012.
      [11] Data2Semantics project: http://www.data2semantics.org/
      [12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
      [13] Sándor, Ágnes and de Waard, Anita (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles, Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
      [14] Blake, C. (2010) Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles, Journal of Biomedical Informatics, 43(2):173-189
      [15] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins
      [16] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop, ISWC 2012
      [17] de Waard, A. (2010). The Future of the Journal? Integrating research data with scientific discourse, LOGOS: The Journal of the World Book Community, Volume 21, Numbers 1-2, 2010, pp. 7-11(5), also published in Nature Precedings, http://precedings.nature.com/documents/4742/version/1