Aggregating Research Papers from Publishers' Systems to Support Text and Data Mining


Discusses the challenges in the interoperability of databases providing access to research papers, and why text mining needs that interoperability.

Data & Analytics

Aggregating Research Papers from Publishers’
Systems to Support Text and Data Mining
Deliberate Lack of Interoperability or
Not?
@openminted_eu
Dr. Petr Knoth
Knowledge Media Institute, The Open University
United Kingdom
@petrknoth

Goal
Achieve seamless harmonised access to full
texts of open access research papers
originating from thousands of systems around
the world for machines to process and extract
knowledge from.

What we are doing
- Aggregating full texts of open access
research papers from all over the world
  - Institutional, subject-based open
    repositories & journals
  - Publisher systems
- Pre-processing millions of research papers,
making them ready to text-mine (API, data
dumps)
- Working with researchers around the world
to extract knowledge from these data

Challenges
- Standardisation (OAI-PMH, ResourceSync,
bespoke APIs, nothing, etc.)
- Inconsistent implementation of standards
(referencing of full texts from metadata,
variation in fields’ semantics, OpenAIRE
guidelines/RIOXX, etc.)
- Lack of incentives to adopt standards +
legal & ethical issues
- Scalability problems (due to inadequate standards
or bad practices such as robots exclusion rules, etc.)
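
To make the first two challenges concrete, here is a minimal sketch of one OAI-PMH ListRecords step in Python. It assumes the provider exposes the `oai_dc` metadata format; the endpoint `https://example.org/oai`, the sample response and the ".pdf suffix" heuristic are illustrative assumptions, not any particular provider's behaviour. The point is that `dc:identifier` may hold a landing page, a direct full-text link, both or neither, so a harvester has to guess.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH responses and Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix alongside it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, next_token) extracted from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        links = [e.text.strip() for e in rec.iterfind(".//dc:identifier", NS) if e.text]
        # Heuristic (an assumption): treat links ending in .pdf as full text.
        pdf_links = [u for u in links if u.lower().endswith(".pdf")]
        records.append({"id": oai_id, "links": links, "pdf_links": pdf_links})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, (token or None)

# Hypothetical response from a made-up repository, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>https://example.org/record/1</dc:identifier>
          <dc:identifier>https://example.org/files/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", token)
```

Harvesting proceeds one page at a time, because each request needs the token returned by the previous one. This sequential design is one plausible reading of the scalability concern above.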

Approach
- Surveying publishers for machine
accessibility of OA content and technically
validating their answers
- Encouraging providers to follow good
practices (validation tools, advocacy)
- Implementing connectors to publishers'
systems
- Addressing scalability issues
- Pragmatic approach
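
One way to technically validate a provider's claim of machine accessibility is to check its robots exclusion rules. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the user agent name `example-harvester` and the `publisher.example` URLs are invented for illustration, and this is not a description of the validation tooling actually used.

```python
from urllib.robotparser import RobotFileParser

def check_access(robots_txt, user_agent, urls):
    """Report which URLs a crawler may fetch, plus any requested crawl delay."""
    parser = RobotFileParser()
    # Parse the rules from text so the check works without network access.
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": {u: parser.can_fetch(user_agent, u) for u in urls},
        "crawl_delay": parser.crawl_delay(user_agent),
    }

# Hypothetical robots.txt for a made-up publisher platform.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /download/
Crawl-delay: 10
"""

report = check_access(
    SAMPLE_ROBOTS,
    "example-harvester",
    [
        "https://publisher.example/article/1",
        "https://publisher.example/download/1.pdf",
    ],
)
```

In this invented case the landing page is open to crawlers but the full-text path is not, so the content is open access for people yet closed to machines. The delay matters at scale too: at 10 seconds per request, fetching one million papers from a single platform would take roughly 116 days.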

Conclusion
Seamless access to the world’s research papers is
needed to enable the creation of text-mining
applications that will transform the way we do
research.
While we have already managed to provide this
for millions of research papers, we are still
facing a number of technical, organisational,
legal and ethical challenges in making seamless
machine access to the world’s research papers a
reality.

Editor's notes

= don’t say Text and Data Mining (TDM) of research literature has the potential to revolutionise the way we do research. It can improve the ways in which we discover, access, read, disseminate and evaluate research. However, to realise the full potential of text mining scientific data, text miners need seamless, unrestricted access to the underlying data.
With more than 1.5 million new research papers a year and more than 100 million research papers published, no one can read all the relevant information in their field. Consequently, Text and Data Mining (TDM) of research literature has the potential to revolutionise the way we do research. It can improve the ways in which we discover, access, read, disseminate and evaluate research. To realise the full potential of text mining scientific information, we need seamless, unrestricted access to the underlying data. We need infrastructure not just for people to access papers, but in particular for machines to be able to read scientific data at scale. This is an essential building block that will make it possible to increase the effectiveness of science, help us to find new treatments, enable businesses to innovate faster, etc. In this lightning talk I will introduce the challenges we are facing in working towards achieving this.
Include a slide explaining we are specifically looking for publisher platforms