AI-SDV 2022: Scientific publishing in the age of data mining and artificial intelligence Dieter Küry (Kuery Knowledge Management, Switzerland)

Most scientific journals request that the complete set of research data be published together with the peer-reviewed paper. The research data is usually published either as "Supplementary Material" attached to the original paper or in a "Research Data Repository". In both cases the data is usually unstructured and not in a uniform, machine-processable format, which makes its further use in AI or data mining tools unnecessarily difficult or even impossible. The presentation proposes a concept in which the data is digitally recorded as part of the publication process, following the FAIR data principles. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, and diploma and doctoral theses, and is particularly suitable for open access publications. The presentation also highlights corresponding activities described in recent scientific publications.
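
To make the idea of digitally captured data that links back to its publication more concrete, here is a minimal sketch in Python. The record layout, the field names (`dataset_id`, `publication_doi`, `measurements`) and the identifiers are hypothetical and chosen only for illustration; the presentation does not prescribe a schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Measurement:
    """One observed value with an explicit name and unit (hypothetical layout)."""
    name: str
    value: float
    unit: str


@dataclass
class DatasetRecord:
    """Hypothetical structured record for a data set deposited with a paper."""
    dataset_id: str        # identifier of the data set in the repository
    publication_doi: str   # link back to the publication (documents the origin)
    measurements: list = field(default_factory=list)


def to_json(record: DatasetRecord) -> str:
    """Serialize the record to JSON so tools can read it without parsing prose."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text: str) -> DatasetRecord:
    """Rebuild the record from its JSON form."""
    raw = json.loads(text)
    raw["measurements"] = [Measurement(**m) for m in raw["measurements"]]
    return DatasetRecord(**raw)


record = DatasetRecord(
    dataset_id="example-dataset-001",        # made-up identifier
    publication_doi="10.1234/example.5678",  # made-up DOI, for illustration only
    measurements=[Measurement("yield", 82.5, "percent")],
)
restored = from_json(to_json(record))
```

Because every value carries its name and unit, a data mining or AI tool can load such a record directly, whereas the same numbers inside a PDF table would first have to be extracted and interpreted.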

  1. Scientific publishing in the age of data mining and artificial intelligence
     October 10, 2022, AI-SDV Conference, Vienna
     Dr. Dieter Küry, Küry Knowledge Management GmbH, Riehen, Switzerland
  2. Abstract
     Most scientific journals request that the complete set of research data be published together with the peer-reviewed paper. The research data is usually published either as "Supplementary Material" attached to the original paper or in a "Research Data Repository". In both cases the data is usually unstructured and not in a uniform, machine-processable format, which makes its further use in AI tools or in text and data mining unnecessarily difficult or even impossible. A concept for a publication process is presented in which the data is digitally recorded, following the FAIR data principles. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, and diploma and doctoral theses, and is particularly suitable for open access publications. The presentation also highlights corresponding activities described in recent scientific publications.
     11.10.2022, Kuery Knowledge Management GmbH
  3. Recently published
     • Kearnes S. M., Maser M. R., Wleklinski M., Kast A., Doyle A. G., Dreher S. D., Hawkins J. M., Jensen K. F., and Coley C. W. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021).
     • Baldi P. Call for a Public Open Database of All Chemical Reactions. J. Chem. Inf. Model. (2021). doi:10.1021/acs.jcim.1c01140.
     • Jablonka K. M., Patiny L., and Smit B. Making the collective knowledge of chemistry open and machine actionable. Nat. Chem. 14, 365–376 (2022).
  4. Excerpt
     From: Kearnes S. M., Maser M. R., Wleklinski M., Kast A., Doyle A. G., Dreher S. D., Hawkins J. M., Jensen K. F., and Coley C. W. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021). Publication date: November 2, 2021.
     “To be clear: we believe that PDFs without accompanying structured data should no longer clear the bar for publication.”
  5. Where can research data be found? Current solutions (I)
     Universal solutions
     • Supporting/supplementary material for many journal titles
     • Unstructured repositories such as Dryad or FigShare (PLOS)
     • Sequences in WIPO applications, to be listed in a standardized form
     Country-specific solutions
     • RADAR by FIZ Karlsruhe for Germany
  6. Where can research data be found? Current solutions (II)
     Subject area repositories
     • Crystallographic data in the Cambridge Structural Database (CSD) of the Cambridge Crystallographic Data Centre (CCDC) (founded 1965)
     • GenBank for sequences
     • ClinicalTrials.gov for clinical trials data
     • PubChem for chemical structures
  7. Where can research data be found? Supporting services
     Directories
     • Registry of Research Data Repositories https://www.re3data.org/
     Aggregator
     • CORE https://core.ac.uk/: aggregator of open access research papers from repositories and journals
     Guidelines
     • FAIRsharing.org https://fairsharing.org/: catalogue of standards, databases and policies to help ensure that 'experiments are reported with enough information to be comprehensible and (in principle) reproducible, compared or integrated'
  8. Research data issues
     DOI
     • Enable easy citation
     • Provide a permanent link to the data
     Structure and formatting
     • Data published in an unstructured way, e.g., as PDF documents
     • No minimum standardization, e.g., FAIR data principles
     Inconsistent guidelines by publishers
     • Authors can choose how and where to deposit
     • Poor linking between data and publication
     Storage of data
     • In repositories hosted by various organizations: government-funded institutions, non-profit societies and organizations, commercial publishers
  9. Purpose of scientific publishing
     Demand so far: sharing research results with the scientific community
     Goal: contribution to the advancement of science
  10. Purpose of scientific publishing
      Additional demands today
      • Provide data for analysis and for text and data mining
      • Provide data for AI tools
      Goals
      • New insights from scientific findings
      • Use of available, published data for one's own scientific research
  11. Area of conflict
      • Structured data: easily reusable
      • Unstructured data: arduous to adapt
  12. Area of conflict
      • Structured data: easily reusable
      • Unstructured data: arduous to adapt
      Reinforce with a modified publication process
  13. Fragmentation of scientific publishing
      • Publication with peer review
      • Publication with peer review; supplementary material
      • Preprint; publication with peer review; supplementary material / research data
      • Preprint; publication with peer review; supplementary material / research data; notification and discussion in social media
  14. Elements of a scientific publication
      • Preprint
      • Comments/discussion in social media
      • Research data
      • Peer review
      • Final paper
      • Notifications in social media by author
  15. Toward a new process of publishing
      • Integrates all elements of a publication
      • Addresses challenges and trends
      • Increases value of research data
      Forward-looking process of publishing: data, final paper, peer review, preprint, feedback
  16. The forward-looking process of publishing
      Elements: final paper, peer review, community feedback, revision, comments, research data, preprint
      Roles: scientist, author, data scientist
      All elements hosted by one organization/publisher
  17. Goals/advantages: publication process
      • All elements integrated in one process, managed by one organization/publisher
      • Full interlinking between all elements
      • Early publication of results as preprint
      • Community comments and feedback complement peer review
      • Transparent peer review with reviewers' comments linked to the publication
      • Final paper, authoritative for the advancement of science
      • A missing final paper is apparent; a retraction includes preprints and data in the repository
  18. Goals/advantages: research data
      • Standardized data repositories
      • Source of data identifiable
      • Following FAIR data principles
      • Managed and supported by publisher
      • Data suitable for handling by AI and other software tools
      • Open data or DaaS subscription
  19. Thank you
      Dr. Dieter Küry, Kuery Knowledge Management GmbH, Riehen, Switzerland
      dieter.kuery@kuery-km.ch
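
The research data issues on slide 8 (DOIs as permanent links, poor linking between data and publication) and the goals on slide 18 (source of data identifiable) can be illustrated with a small sketch. It assumes a hypothetical minimal set of metadata fields, `REQUIRED_FIELDS`, which is not an official FAIR checklist, and made-up example records. The only external fact relied on is that a DOI resolves through `https://doi.org/`.

```python
# Hypothetical minimal metadata fields; not an official FAIR checklist.
REQUIRED_FIELDS = ("identifier", "publication_doi", "license", "format")


def doi_to_url(doi: str) -> str:
    """Turn a DOI, with or without a common prefix, into a doi.org link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# Made-up records for illustration only.
complete = {
    "identifier": "example-dataset-001",
    "publication_doi": "10.1234/example.5678",
    "license": "CC-BY-4.0",
    "format": "text/csv",
}
incomplete = {"identifier": "example-dataset-002", "format": "application/pdf"}

# DOI string as cited on slide 3.
link = doi_to_url("doi:10.1021/acs.jcim.1c01140")
gaps = missing_fields(incomplete)
```

A publisher-managed repository could run a check of this kind at submission time, so that a data set without a link to its publication is flagged before it is released.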
