Aggregating Research Papers from Publishers' Systems to Support Text and Data Mining
1. Aggregating Research Papers from Publishers’
Systems to Support Text and Data Mining
Deliberate Lack of Interoperability or
Not?
@openminted_eu
Dr. Petr Knoth
Knowledge Media Institute, The Open University
United Kingdom
@petrknoth
2. Goal
Achieve seamless harmonised access to full
texts of open access research papers
originating from thousands of systems around
the world for machines to process and extract
knowledge from.
3. What are we doing
- Aggregating full texts of open access
research papers from all over the world
- Institutional, subject-based open
repositories & journals
- Publisher systems
- Pre-processing millions of research papers,
making them ready to text-mine (API, data
dumps)
- Working with researchers around the world
to extract knowledge from these data
4. Challenges
- Standardisation (OAI-PMH, ResourceSync,
bespoke APIs, nothing, etc.)
- Inconsistent implementation of standards
(referencing of full-texts from metadata,
variation in fields’ semantics, OpenAIRE
guidelines/RIOXX, etc.)
- Lack of incentives to adopt standards +
legal & ethical issues
- Scalability (due to inadequate standards or
bad practices such as robots exclusion)
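To make the "referencing of full-texts from metadata" challenge concrete, here is a minimal sketch of how an aggregator might look for full-text links in an OAI-PMH response carrying Dublin Core metadata. The sample record, the example.org URLs, and the function name are hypothetical, and the ".pdf" suffix test is only an assumed heuristic: a standard oai_dc record gives no dedicated field for the full-text location, which is one reason implementations differ.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response, invented for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>https://repo.example.org/handle/123</dc:identifier>
          <dc:identifier>https://repo.example.org/files/123.pdf</dc:identifier>
          <dc:identifier>doi:10.0000/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def candidate_fulltext_links(oai_xml):
    """Map each record identifier to the dc:identifier URLs that look like PDFs.

    Assumed heuristic: a URL ending in ".pdf" is treated as a full-text link.
    Landing pages, DOIs and handles are skipped.
    """
    root = ET.fromstring(oai_xml.strip())
    links = {}
    for record in root.iterfind(".//oai:record", NS):
        record_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        urls = [
            (el.text or "").strip()
            for el in record.iterfind(".//dc:identifier", NS)
        ]
        links[record_id] = [
            u for u in urls
            if u.startswith(("http://", "https://")) and u.lower().endswith(".pdf")
        ]
    return links


if __name__ == "__main__":
    print(candidate_fulltext_links(SAMPLE_RESPONSE))
```

A record whose metadata exposes only a landing page or a DOI would come back with an empty list here, which is the situation where an aggregator has to fall back on provider-specific logic.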
5. Approach
- Surveying publishers for machine
accessibility of OA content and technically
validating their answers
- Encouraging providers to follow good
practices (validation tools, advocacy)
- Implementing connectors to publishers'
systems
- Addressing scalability issues
- Pragmatic approach
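One way a provider's stated machine accessibility could be checked technically is to test its robots exclusion rules against the URLs an aggregator needs. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `example-aggregator` user agent, and the URLs are hypothetical, and this is not presented as the validation procedure actually used in the project.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /files/

User-agent: example-aggregator
Allow: /files/
Disallow: /admin/
"""


def is_machine_accessible(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    pdf_url = "https://publisher.example.org/files/123.pdf"
    for agent in ("example-aggregator", "other-bot"):
        print(agent, is_machine_accessible(SAMPLE_ROBOTS_TXT, agent, pdf_url))
```

In this sample the full-text directory is open only to one named crawler, so any other text-mining client is excluded from it even though the content may be open access.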
6. Conclusion
Seamless access to the world’s research papers is
needed to enable the creation of text-mining
applications that will transform the way we do
research.
While we have already managed to provide this
for millions of research papers, we are still
facing a number of technical, organisational,
legal and ethical challenges in making seamless
machine access to the world’s research papers a
reality.
Editor's notes
= don’t say
Text and Data Mining (TDM) of research literature has the potential to revolutionise the way we do research. It can improve the ways in which we discover, access, read, disseminate and evaluate research. However, to realise the full potential of text mining scientific data, text miners need seamless, unrestricted access to the underlying data.
With more than 1.5 million new research papers a year and more than 100 million research papers published, there is no one who can read all relevant information in their field. Consequently, Text and Data Mining (TDM) of research literature has the potential to revolutionise the way we do research. It can improve the ways in which we discover, access, read, disseminate and evaluate research.
To realise the full potential of text mining scientific information, we need seamless, unrestricted access to the underlying data. We need infrastructure not just for people to access papers, but in particular for machines to be able to read scientific data at scale. This is an essential building block that will make it possible to increase the effectiveness of science, help us to find new treatments, enable businesses to innovate faster, etc.
In this lightning talk I will introduce the challenges we are facing in working towards achieving this.
Include a slide explaining we are specifically looking for publisher platforms