Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining:
Deliberate Lack of Interoperability or Not?
@openminted_eu
Dr. Petr Knoth
Knowledge Media Institute, The Open University
United Kingdom
@petrknoth
Goal
Achieve seamless harmonised access to full
texts of open access research papers
originating from thousands of systems around
the world for machines to process and extract
knowledge from.
What are we doing
- Aggregating full texts of open access
research papers from all over the world
- Institutional, subject-based open
repositories & journals
- Publisher systems
- Pre-processing millions of research papers,
making them ready to text-mine (API, data
dumps)
- Working with researchers around the world
to extract knowledge from these data
Challenges
- Standardisation (OAI-PMH, ResourceSync,
bespoke APIs, no machine interface at all, etc.)
- Inconsistent implementation of standards
(referencing of full-texts from metadata,
variation in fields’ semantics, OpenAIRE
guidelines/RIOXX, etc.)
- Lack of incentives to adopt standards +
legal & ethical issues
- Scalability (limited by inadequate standards
or bad practices such as robots exclusion, etc.)
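The inconsistent referencing of full texts from metadata can be illustrated with a minimal sketch. It assumes records exposed as OAI-PMH `oai_dc` metadata; the sample response, the `repo.example.org` URLs, and the `fulltext_links` helper are hypothetical, and the ".pdf suffix" test is only an illustrative heuristic, not the aggregator's actual method.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by OAI-PMH responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response: record 1 links straight to a PDF,
# record 2 only links to a landing page, so the full text is not directly reachable.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with a direct full-text link</dc:title>
          <dc:identifier>https://repo.example.org/files/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with a landing page only</dc:title>
          <dc:identifier>https://repo.example.org/record/2</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def fulltext_links(xml_text):
    """Map each OAI identifier to the dc:identifier values that look like PDFs."""
    root = ET.fromstring(xml_text)
    links = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        identifiers = [
            element.text.strip()
            for element in record.iterfind(".//dc:identifier", NS)
            if element.text
        ]
        # Assumed heuristic: treat URLs ending in ".pdf" as full-text links.
        links[oai_id] = [u for u in identifiers if u.lower().endswith(".pdf")]
    return links


if __name__ == "__main__":
    for oai_id, urls in fulltext_links(SAMPLE_RESPONSE).items():
        print(oai_id, urls if urls else "no direct full-text link")
```

A record with an empty list is one where a harvester would have to fall back to scraping the landing page, which is the kind of per-provider variation the bullet points above describe.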
Approach
- Surveying publishers about the machine
accessibility of their OA content and technically
validating their answers
- Encouraging providers to follow good
practices (validation tools, advocacy)
- Implementing connectors to publishers’
systems
- Addressing scalability issues
- Pragmatic approach
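One way to technically validate a claim of machine accessibility is to check a provider's robots exclusion rules. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `example-aggregator` user agent, the URL, and the `machine_accessible` helper are hypothetical and only illustrate the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that admits one commercial crawler and excludes all others.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Hypothetical full-text URL on the provider's site.
PAPER_URL = "https://publisher.example.org/articles/123/fulltext.pdf"


def machine_accessible(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "example-aggregator"):
        print(agent, machine_accessible(SAMPLE_ROBOTS_TXT, agent, PAPER_URL))
```

A provider whose rules admit a single named crawler but exclude every other agent may describe its content as open access while remaining closed to text-mining aggregators, so a check like this complements the survey answers.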
Conclusion
Seamless access to the world’s research papers is
needed to enable the creation of text-mining
applications that will transform the way we do
research.
While we have already managed to provide this
for millions of research papers, we are still
facing a number of technical, organisational,
legal and ethical challenges in making seamless
machine access to the world’s research papers a
reality.

Weitere ähnliche Inhalte

Was ist angesagt?

OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeopenminted_eu
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)Gregor Hagedorn
 
Voa3r Identification Analysis Technical Requirements
Voa3r Identification Analysis Technical RequirementsVoa3r Identification Analysis Technical Requirements
Voa3r Identification Analysis Technical Requirementsalbertoabian
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.FAIRDOM
 
Overview of the NIH BD2K CEDAR centre, on metadata and standards
Overview of the NIH BD2K CEDAR centre, on metadata and standardsOverview of the NIH BD2K CEDAR centre, on metadata and standards
Overview of the NIH BD2K CEDAR centre, on metadata and standardsSusanna-Assunta Sansone
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyFAIRDOM
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...Catherine Canevet
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.FAIRDOM
 
Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications Trish Whetzel
 
LIBER on the path towards Open Science: Libraries as enablers
LIBER on the path towards Open Science:  Libraries as enablers LIBER on the path towards Open Science:  Libraries as enablers
LIBER on the path towards Open Science: Libraries as enablers LIBER Europe
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIRDOM
 
Opportunities in chemical structure standardization
Opportunities in chemical structure standardizationOpportunities in chemical structure standardization
Opportunities in chemical structure standardizationValery Tkachenko
 
OpenAIRE at EIFL General Assembly, Lund, August 2010
OpenAIRE at EIFL General Assembly, Lund, August 2010OpenAIRE at EIFL General Assembly, Lund, August 2010
OpenAIRE at EIFL General Assembly, Lund, August 2010OpenAIRE
 
Abel L Packer – SciELO advances as an Open Science program
Abel L Packer – SciELO advances as an Open Science programAbel L Packer – SciELO advances as an Open Science program
Abel L Packer – SciELO advances as an Open Science programPlatforma Otwartej Nauki
 
Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...Peter Webster
 
OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010OpenAIRE
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchUniversity Medicine Greifswald
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIMartin Scharm
 

Was ist angesagt? (20)

OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledge
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
 
Voa3r Identification Analysis Technical Requirements
Voa3r Identification Analysis Technical RequirementsVoa3r Identification Analysis Technical Requirements
Voa3r Identification Analysis Technical Requirements
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
 
Overview of the NIH BD2K CEDAR centre, on metadata and standards
Overview of the NIH BD2K CEDAR centre, on metadata and standardsOverview of the NIH BD2K CEDAR centre, on metadata and standards
Overview of the NIH BD2K CEDAR centre, on metadata and standards
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems Biology
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.
 
Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications
 
LIBER on the path towards Open Science: Libraries as enablers
LIBER on the path towards Open Science:  Libraries as enablers LIBER on the path towards Open Science:  Libraries as enablers
LIBER on the path towards Open Science: Libraries as enablers
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.
 
Opportunities in chemical structure standardization
Opportunities in chemical structure standardizationOpportunities in chemical structure standardization
Opportunities in chemical structure standardization
 
Open Science in a European Perspective
Open Science in a European PerspectiveOpen Science in a European Perspective
Open Science in a European Perspective
 
Pl data science october 2017
Pl data science october 2017Pl data science october 2017
Pl data science october 2017
 
OpenAIRE at EIFL General Assembly, Lund, August 2010
OpenAIRE at EIFL General Assembly, Lund, August 2010OpenAIRE at EIFL General Assembly, Lund, August 2010
OpenAIRE at EIFL General Assembly, Lund, August 2010
 
Abel L Packer – SciELO advances as an Open Science program
Abel L Packer – SciELO advances as an Open Science programAbel L Packer – SciELO advances as an Open Science program
Abel L Packer – SciELO advances as an Open Science program
 
Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...
 
OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical research
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBI
 

Andere mochten auch

My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?petrknoth
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)petrknoth
 
Semantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research EvaluationSemantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research Evaluationpetrknoth
 
Towards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific PublicationsTowards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific Publicationspetrknoth
 
Amicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource companyAmicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource companyrachna1122
 
The murder of a student.
The murder of a student.The murder of a student.
The murder of a student.selimkaradag
 
DEVCSI Core Mobile
DEVCSI Core MobileDEVCSI Core Mobile
DEVCSI Core Mobilepetrknoth
 
Snail 12345
Snail 12345Snail 12345
Snail 12345reblyn1
 
Core presentation
Core presentationCore presentation
Core presentationpetrknoth
 
CORE projects family
CORE projects familyCORE projects family
CORE projects familypetrknoth
 
From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...petrknoth
 
DiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected RepositoriesDiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected Repositoriespetrknoth
 
Ali’S Careers Power Point
Ali’S Careers Power PointAli’S Careers Power Point
Ali’S Careers Power Pointguestb4db5a8
 
Text mining in CORE (OR2012)
Text mining in CORE (OR2012)Text mining in CORE (OR2012)
Text mining in CORE (OR2012)petrknoth
 
CORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open AccessCORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open Accesspetrknoth
 
93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-final93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-finalZarobiza
 

Andere mochten auch (20)

My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)
 
Semantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research EvaluationSemantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research Evaluation
 
RFringe15GS
RFringe15GSRFringe15GS
RFringe15GS
 
Towards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific PublicationsTowards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific Publications
 
Amicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource companyAmicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource company
 
All Joke Photos
All Joke PhotosAll Joke Photos
All Joke Photos
 
The murder of a student.
The murder of a student.The murder of a student.
The murder of a student.
 
DEVCSI Core Mobile
DEVCSI Core MobileDEVCSI Core Mobile
DEVCSI Core Mobile
 
Snail 12345
Snail 12345Snail 12345
Snail 12345
 
Core presentation
Core presentationCore presentation
Core presentation
 
CORE projects family
CORE projects familyCORE projects family
CORE projects family
 
From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...
 
DiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected RepositoriesDiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected Repositories
 
Ali’S Careers Power Point
Ali’S Careers Power PointAli’S Careers Power Point
Ali’S Careers Power Point
 
Text mining in CORE (OR2012)
Text mining in CORE (OR2012)Text mining in CORE (OR2012)
Text mining in CORE (OR2012)
 
CORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open AccessCORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open Access
 
Suman Pandit
Suman PanditSuman Pandit
Suman Pandit
 
The Clown Doctor
The Clown DoctorThe Clown Doctor
The Clown Doctor
 
93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-final93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-final
 

Ähnlich wie Aggregating Research papers from Publishers' Systems to Support Text and Data Mining

Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management Oscar Corcho
 
New trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsNew trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsMaría Poveda Villalón
 
OpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE
 
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)OpenAIRE
 
OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"
OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"
OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"OpenAIRE
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesOla Spjuth
 
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)OpenAIRE
 
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...OpenAIRE
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchBlue BRIDGE
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?Carole Goble
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
OpenAIRE at the Open Access Tage 2010, Göttingen
OpenAIRE at the Open Access Tage 2010, GöttingenOpenAIRE at the Open Access Tage 2010, Göttingen
OpenAIRE at the Open Access Tage 2010, GöttingenOpenAIRE
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOMCarole Goble
 
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...e-ROSA
 

Ähnlich wie Aggregating Research papers from Publishers' Systems to Support Text and Data Mining (20)

Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
 
New trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsNew trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and tools
 
OpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open Science
 
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
 
OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"
OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"
OpenAIRE presentation at EuroCRIS Seminar "Evaluation of Research using a CRIS"
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life Sciences
 
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
 
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Data and Research Infrastructures and Open Science
Data and Research Infrastructures and Open ScienceData and Research Infrastructures and Open Science
Data and Research Infrastructures and Open Science
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
OpenAIRE at the Open Access Tage 2010, Göttingen
OpenAIRE at the Open Access Tage 2010, GöttingenOpenAIRE at the Open Access Tage 2010, Göttingen
OpenAIRE at the Open Access Tage 2010, Göttingen
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
 

Mehr von petrknoth

Qui Bono? Cumulative advantage in open access publishing
Qui Bono? Cumulative advantage in open access publishingQui Bono? Cumulative advantage in open access publishing
Qui Bono? Cumulative advantage in open access publishingpetrknoth
 
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
OAI Identifiers: Decentralised PIDs for Research Outputs in RepositoriesOAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositoriespetrknoth
 
UKRI OA policy requirements for repositories and how to meet them
UKRI OA policy requirements for repositories and how to meet themUKRI OA policy requirements for repositories and how to meet them
UKRI OA policy requirements for repositories and how to meet thempetrknoth
 
Enabling Educators to Locate High-Quality Teaching Resources
Enabling Educators to LocateHigh-Quality Teaching ResourcesEnabling Educators to LocateHigh-Quality Teaching Resources
Enabling Educators to Locate High-Quality Teaching Resourcespetrknoth
 
Tracking compliance of the REF2021 policy with the CORE Repository Dashboard
Tracking compliance of the REF2021 policy with the CORE Repository DashboardTracking compliance of the REF2021 policy with the CORE Repository Dashboard
Tracking compliance of the REF2021 policy with the CORE Repository Dashboardpetrknoth
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...petrknoth
 
CORE Analytics Dashboard
CORE Analytics DashboardCORE Analytics Dashboard
CORE Analytics Dashboardpetrknoth
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...petrknoth
 
Analysing the performance of open access papers discovery tools
Analysing the performance of open access papers discovery toolsAnalysing the performance of open access papers discovery tools
Analysing the performance of open access papers discovery toolspetrknoth
 
Assessing Compliance with the UK REF 2021 Open Access Policy
Assessing Compliance with the UK REF 2021 Open Access PolicyAssessing Compliance with the UK REF 2021 Open Access Policy
Assessing Compliance with the UK REF 2021 Open Access Policypetrknoth
 
Data interoperability toolkit (OpenMinTeD)
Data interoperability toolkit (OpenMinTeD)Data interoperability toolkit (OpenMinTeD)
Data interoperability toolkit (OpenMinTeD)petrknoth
 
Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure petrknoth
 
Towards effective research recommender systems for repositories
Towards effective research recommender systems for repositoriesTowards effective research recommender systems for repositories
Towards effective research recommender systems for repositoriespetrknoth
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...petrknoth
 
Seamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncSeamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncpetrknoth
 

Aggregating Research Papers from Publishers' Systems to Support Text and Data Mining

  • 1. Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not? @openminted_eu Dr. Petr Knoth, Knowledge Media Institute, The Open University, United Kingdom @petrknoth
  • 2. Goal Achieve seamless, harmonised access to the full texts of open access research papers originating from thousands of systems around the world, so that machines can process them and extract knowledge from them.
  • 3. What are we doing @openminted_eu - Aggregating full texts of open access research papers from all over the world - Institutional, subject-based open repositories & journals - Publisher systems - Pre-processing millions of research papers, making them ready to text-mine (API, data dumps) - Working with researchers around the world to extract knowledge from these data
  • 4. Challenges @openminted_eu - Standardisation (OAI-PMH, ResourceSync, bespoke APIs, nothing, etc.) - Inconsistent implementation of standards (referencing of full texts from metadata, variation in fields’ semantics, OpenAIRE guidelines/RIOXX, etc.) - Lack of incentives to adopt standards + legal & ethical issues - Scalability (due to inadequate standards) or bad practices (Robots exclusion, etc.)
  • 5. Approach @openminted_eu - Surveying publishers for machine accessibility of OA content and technically validating their answers - Encouraging providers to follow good practices (validation tools, advocacy) - Implementing connectors to publishers’ systems - Addressing scalability issues - Pragmatic approach
  • 6. Conclusion Seamless access to the world’s research papers is needed to enable the creation of text-mining applications that will transform the way we do research. While we have already managed to provide this for millions of research papers, we are still facing a number of technical, organisational, legal and ethical challenges in making seamless machine access to the world’s research papers a reality.
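
To make the challenges on slide 4 more concrete, here is a minimal Python sketch of what a harvester has to do when a system exposes OAI-PMH. It is an illustration under stated assumptions, not the aggregator described in the talk: the host repo.example.org, the sample record, the agent name and the ".pdf suffix" heuristic are all hypothetical. It builds a ListRecords request, looks for full-text links among the dc:identifier values of each record, and checks a robots.txt policy before fetching.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def candidate_fulltext_links(oai_xml):
    """Map each OAI identifier to the dc:identifier values that look like PDFs.

    Assumption: a link ending in ".pdf" points to the full text. Real systems
    vary widely here, which is the inconsistency the slide refers to.
    """
    root = ET.fromstring(oai_xml)
    result = {}
    for record in root.iter(OAI_NS + "record"):
        oai_id = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        result[oai_id] = [
            el.text.strip()
            for el in record.iter(DC_NS + "identifier")
            if el.text and el.text.strip().lower().endswith(".pdf")
        ]
    return result


def may_fetch(robots_txt, user_agent, url):
    """Check a URL against robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical response: record 1 exposes a PDF link, record 2 does not.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:repo.example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:identifier>https://repo.example.org/record/1</dc:identifier>
     <dc:identifier>https://repo.example.org/files/1.pdf</dc:identifier>
    </oai_dc:dc>
   </metadata>
  </record>
  <record>
   <header><identifier>oai:repo.example.org:2</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:identifier>https://repo.example.org/record/2</dc:identifier>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>"""

# Hypothetical policy that blocks crawlers from the directory holding the PDFs.
ROBOTS = "User-agent: *\nDisallow: /files/"

if __name__ == "__main__":
    print(list_records_url("https://repo.example.org/oai"))
    for oai_id, links in candidate_fulltext_links(SAMPLE).items():
        for link in links:
            print(oai_id, link, may_fetch(ROBOTS, "example-harvester", link))
```

In this made-up case the metadata for record 2 gives no machine-readable pointer to the full text, and the only full-text link that is exposed sits behind a robots exclusion rule, so neither paper can be collected without a bespoke connector or a conversation with the provider.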

Editor's notes

  1. [Don’t say] Text and Data Mining (TDM) of research literature has the potential to revolutionise the way we do research. It can improve the ways in which we discover, access, read, disseminate and evaluate research. However, to realise the full potential of text mining scientific data, text miners need seamless, unrestricted access to the underlying data.
  2. With more than 1.5 million new research papers a year and more than 100 million research papers published, no one can read all the relevant information in their field. Consequently, Text and Data Mining (TDM) of research literature has the potential to revolutionise the way we do research. It can improve the ways in which we discover, access, read, disseminate and evaluate research. To realise the full potential of text mining scientific information, we need seamless, unrestricted access to the underlying data. We need infrastructure not just for people to access papers, but in particular for machines to be able to read scientific data at scale. This is an essential building block that will make it possible to increase the effectiveness of science, help us to find new treatments, enable businesses to innovate faster, etc. In this lightning talk I will introduce the challenges we are facing in working towards achieving this.
  3. Include a slide explaining we are specifically looking for publisher platforms