Data-driven Papers and
Grand Challenges



Anita de Waard, a.dewaard@elsevier.com
Disruptive Technologies Director, Elsevier Labs


August 26, 2010
Science is made of information...
...that gets created... and destroyed.
What is the problem?



1. Researchers can’t keep track of their data.


2. Data is not stored in a way that is easy for authors.


3. For readers, article text is not linked to the underlying data.
The Vision
Work done with Ed Hovy, Phil Bourne, Gully Burns and Cartic Ramakrishnan

1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.
4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, the paper gets updated, until it is validated.
5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.
6. User applications: distributed applications run on this ‘exposed data’ universe.

[Figure: data items, each labelled "metadata", feed a sample paper ("Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-") that goes through a Review / Edit / Revise cycle. "Some other publisher" also appears in the diagram.]
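Points 1 and 5 above (metadata with provenance on every item, and a published paper whose heritage can be traced) can be illustrated with a minimal sketch. It is not the system described in the talk: the `DataItem` class, the `heritage` function and the item names are hypothetical, chosen only to show what "tracing heritage" through provenance links could look like.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataItem:
    """One item in a hypothetical lab workflow system."""
    item_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    derived_from: Optional["DataItem"] = None  # provenance link to the parent item


def heritage(item: DataItem) -> List[str]:
    """Walk provenance links back to the original item, nearest ancestor first."""
    chain = []
    current = item.derived_from
    while current is not None:
        chain.append(current.item_id)
        current = current.derived_from
    return chain


# Illustrative items only: raw data -> figure -> paper draft.
raw = DataItem("raw-measurements", {"kind": "dataset"})
fig2 = DataItem("fig-2", {"kind": "figure"}, derived_from=raw)
paper = DataItem("paper-draft", {"kind": "document"}, derived_from=fig2)

print(heritage(paper))
```

A reader clicking on fig 2 in the paper would, in this sketch, follow the same `derived_from` link back to the underlying data item.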
What is needed to get there?
Workflow tools: Linked-data-based workflow tools for all sciences: scalable, safe, and user-friendly (tool builders)
Authoring and reviewing tools: that enable use of rich and provenance-tracked elements (tool builders)
Metadata standards: Standards that allow exchange of information on any knowledge item created in a lab, including provenance/privacy/IPR rights (standards bodies)
Semantic/Linked Data XML repositories (publishers)
Publishing systems that are application servers (publishers)
Social change: Scientists store, track and annotate their work (institutes, funding bodies, individuals)
A. Workflow tools are emerging
http://VisTrails.org
http://MyExperiment.org
http://wings.isi.edu/
B. Authoring ‘ecosystems’: e.g., SWAN
[Figure: SWAN Semantic Relationships. Node labels: person, group, hypothesis, Claim, publication, comment, concept, gene, PDFs, Excel file, MSWORD file. Relation labels: authoredBy, authorOf, shareWith, makes, hasEvidence, annotates, discussedIn, describes. The diagram is split into a Public and a Private area. Slide by Tim Clark.]
C. Example of Metadata: Harvard’s Annotation Ontology
[Figure: an annotation on the document http://anyurl.com/sf_pat01.html. Labels in the diagram: rdf:Type Atomic; pav:createdBy http://www.ht.org/foaf.rdf#me (rdf:Type foaf:person); pav:createdOn June 1, 2010; ann:annotates; hasTag, Tag, tag “Linear skull fracture”; hasTopic FMA:skull; ann:context; onDocument; InitEndCornerSelector (rdf:Type; rdfs:SubClassOf ImageSelector) with init (304, 507) and end (380, 618). Slide by Tim Clark.]

Other annotations on the same document:
1. Atomic annotation on image (tag: “hematoma”)
2. General annotation (tag: “injury”)
Other annotations on similar documents:
1. General annotation (tag: “skull fracture”)
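The annotation in this slide can be read as a small set of subject/predicate/object triples. The sketch below is one possible reading of the diagram labels, not the Annotation Ontology's official serialization: the identifiers `ex:annotation1` and `ex:selector1` are made up, and the way each label is attached to a subject is an assumption based on the figure layout.

```python
# Each triple is (subject, predicate, object).
# "ex:" identifiers are hypothetical; the wiring is an assumed reading of the slide.
ANNOTATION = "ex:annotation1"
SELECTOR = "ex:selector1"

triples = [
    (ANNOTATION, "rdf:type", "Atomic"),
    (ANNOTATION, "pav:createdBy", "http://www.ht.org/foaf.rdf#me"),
    (ANNOTATION, "pav:createdOn", "June 1, 2010"),
    (ANNOTATION, "ann:annotates", "http://anyurl.com/sf_pat01.html"),
    (ANNOTATION, "hasTopic", "FMA:skull"),
    (ANNOTATION, "hasTag", "Linear skull fracture"),
    (ANNOTATION, "ann:context", SELECTOR),
    (SELECTOR, "rdf:type", "InitEndCornerSelector"),
    (SELECTOR, "init", "(304, 507)"),
    (SELECTOR, "end", "(380, 618)"),
]


def objects(subject, predicate):
    """Return every object stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(ANNOTATION, "hasTag"))
print(objects(SELECTOR, "init"))
```

Keeping the image region (the selector) as its own subject is what lets several annotations point at different parts of the same document.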
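D. Linked Data at Elsevier
[Figure: a chain of statements about a piece of article XML. <ce:section id=#123> "this says" mice like cheese; "said @anita on May 31 2010"; "but we all know she was jetlagged then". The article side is labelled "immutable, $$, proprietary"; the statement side is labelled "dynamic, personal, task-driven, - open?".]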
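The chain in this slide is a statement about a statement about a statement. A minimal sketch of that nesting follows; the `Statement` class and `render` function are hypothetical and say nothing about how Elsevier actually stores linked data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Statement:
    """A statement whose subject is a document fragment or another statement."""
    subject: Union[str, "Statement"]
    predicate: str
    obj: str


def render(st: Statement) -> str:
    """Flatten a nested statement into one readable line."""
    subject = render(st.subject) if isinstance(st.subject, Statement) else st.subject
    return f"[{subject} | {st.predicate} | {st.obj}]"


# The immutable article fragment is only referenced, never changed.
claim = Statement("ce:section #123", "this says", "mice like cheese")
attribution = Statement(claim, "said", "@anita on May 31 2010")
caveat = Statement(attribution, "but we all know", "she was jetlagged then")

print(render(caveat))
```

The article section stays fixed while the layers on top of it can be added, attributed and disputed independently, which matches the immutable versus dynamic contrast drawn on the slide.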
E. ScienceDirect Application Server
F. Social Change. Some next Steps:
• 2010 - 2011: Try to gather resources, current leaders, etc. for
  ‘Future of Research Communication’ effort
   –Fall 2010: Develop virtual community (with Harvard)
   –August 2011: Dagstuhl Workshop:
     • Involve key people (include funding bodies, libraries,
       institutions) to see where bottlenecks are
     • Write white paper, implement
• 2011: ICCS ‘Executable Paper Challenge’?
Scope: Tools and processes to:
- Improve the process of creating, reviewing and
  editing scientific content
- Interpret, visualize or connect science knowledge
- Provide tools/ideas for measuring the impact of these
  improvements.
June 2008: 71 Submissions from 15 countries.
August 2008: 10 semi-finalist teams, with access to:
   -   500,000 full-text articles
   -   Plus EMTREE, EmBase, Scopus
Each team:
   -   Created tool/demo
   -   Presented to the Judges
   -   Wrote a paper (accepted for Journal of Web Semantics)
April 2009: Judges selected 4 Finalist teams.
And the winners were:

Enabling FAIR Data: TAG B Authoring Guidelines
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
 
History of the future
History of the futureHistory of the future
History of the future
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly Publishing
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest Group
 

Workflows and challenges

  • 1. Data-driven Papers and Grand Challenges Anita de Waard, a.dewaard@elsevier.com Disruptive Technologies Director, Elsevier Labs August 26, 2010
  • 4. Science is made of information... that gets created... and destroyed.
  • 8. What is the problem? 1. Researchers can’t keep track of their data. 2. Data is not stored in a way that is easy for authors. 3. For readers, article text is not linked to the underlying data.
  • 15. The Vision (work done with Ed Hovy, Phil Bourne, Gully Burns and Cartic Ramakrishnan): 1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it. 2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system. 3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document. 4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, and the paper gets updated until it is validated. 5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data items, and its heritage can be traced. 6. User applications: Distributed applications run on this ‘exposed data’ universe. [Figure labels: “metadata” tags on each data item; sample paper text “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”; Review, Revise, Edit; Some other publisher.]
  • 28. What is needed to get there? Workflow tools: Linked-data-based workflow tools for all sciences: scalable, safe, and user-friendly (tool builders). Authoring and reviewing tools that enable use of rich and provenance-tracked elements (tool builders). Metadata standards: Standards that allow exchange of information on any knowledge item created in a lab, including provenance/privacy/IPR rights (standards bodies). Semantic/Linked Data XML repositories (publishers). Publishing systems that are application servers (publishers). Social change: Scientists store, track and annotate their work (institutes, funding bodies, individuals).
  • 32. A. Workflow tools are emerging http://VisTrails.org http://MyExperiment.org http://wings.isi.edu/
  • 34. B. Authoring ‘ecosystems’: e.g., SWAN. [Diagram: SWAN Semantic Relationships. Public and private areas connect person, group, Claim, hypothesis, concept, gene, publication and comment nodes with files (Excel file, MSWORD file, PDFs) through the relations makes, hasEvidence, annotates, authoredBy, authorOf, shareWith, describes and discussedIn.] Slide by Tim Clark
  • 35. C. Example of Metadata: Harvard’s Annotation Ontology. [Diagram labels: foaf:person; rdf:Type; http://www.ht.org/foaf.rdf#me; June 1, 2010; pav:createdBy; pav:createdOn; ann:annotates http://anyurl.com/sf_pat01.html; hasTag; hasTopic; Tag; Atomic tag; FMA:skull; ann:context; onDocument; Linear skull fracture; InitEndCornerSelector with init (304, 507) and end (380, 618); rdfs:SubClassOf; ImageSelector.] Other annotations on the same document: 1. Atomic annotation on image (tag: “hematoma”) 2. General annotation (tag: “injury”). Other annotations on similar documents: 1. General annotation (tag: “skull fracture”). Slide by Tim Clark
  • 42. D. Linked Data at Elsevier. A chain of statements, each about the previous one: <ce:section id=#123>; “this says mice like cheese”; “said @anita on May 31 2010”; “but we all know she was jetlagged then”. Labels: “immutable, $$, proprietary” and “dynamic, personal, task-driven, open?”
  • 50. F. Social Change. Some next steps: 2010-2011: Try to gather resources, current leaders, etc. for the ‘Future of Research Communication’ effort. Fall 2010: Develop virtual community (with Harvard). August 2011: Dagstuhl Workshop: involve key people (including funding bodies, libraries, institutions) to see where the bottlenecks are; write white paper, implement. 2011: ICCS ‘Executable Paper Challenge’?
  • 51.
  • 58. Scope: Tools and processes to improve the process of creating, reviewing and editing scientific content; interpret, visualize or connect science knowledge; provide tools/ideas for measuring the impact of these improvements. June 2008: 71 submissions from 15 countries. August 2008: 10 semi-finalist teams, with access to 500,000 full-text articles plus EMTREE, EmBase and Scopus; each created a tool/demo, presented to the judges, and wrote a paper (accepted for JWeb Semantics). April 2009: Judges selected 4 finalist teams. And the winners were:
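
Steps 1, 3 and 5 of The Vision (slide 15) describe data items that carry provenance metadata and relations to other items, so that a published paper's "heritage can be traced". The Python sketch below is one illustrative way to model that idea, not the system described in the talk. The class `DataItem`, its fields, and the item ids are all hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataItem:
    """A lab data item with provenance metadata (hypothetical model)."""
    item_id: str
    created_by: str
    created_on: datetime
    # Relations to the items this one was derived from.
    derived_from: list[DataItem] = field(default_factory=list)

    def heritage(self) -> list[str]:
        """Return the ids of all ancestor items, nearest first."""
        seen: list[str] = []
        queue = list(self.derived_from)
        while queue:
            parent = queue.pop(0)
            if parent.item_id not in seen:
                seen.append(parent.item_id)
                queue.extend(parent.derived_from)
        return seen


# Illustrative chain: raw lab data -> figure -> paper (ids are made up).
when = datetime(2010, 6, 1, tzinfo=timezone.utc)
raw = DataItem("raw-recording-1", "lab-tech", when)
fig = DataItem("fig-2", "author", when, derived_from=[raw])
paper = DataItem("paper-draft", "author", when, derived_from=[fig])

print(paper.heritage())
```

Because every item keeps a reference to its sources, a reader-facing tool could in principle walk from a figure in the paper back to the underlying data, which is the "click on fig 2 to see underlying data" behaviour shown on the slide.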