How can repositories support the
text-mining of their content and
why?
@openminted_eu
Dr. Petr Knoth and Dr. Nancy Pontika
Knowledge Media Institute, The Open University
United Kingdom
Twitter: @oacore
Why should repositories
support TDM?
In the UK
Repositories and TDM
Institutional
Repositories
Subject
Repositories
Publishers/
OA journals
Other sources:
Research
Networking
Services
Primary Research
Data...
Text Mining Services
TDM & Repositories
Managers
• Establish and maintain close collaboration with
researchers
• Extensive experience in advocacy, e.g. for open access
• Knowledgeable about the repository’s collection
• Participate in the academic institution’s research
committees
• Familiar with copyright issues and Creative Commons
licenses
How can repositories support
TDM?
TDM is all about processing text and data at
scale. The role of repositories is to facilitate the
aggregation of research papers at a full-text level
(and beyond) effectively enabling TDM services
to operate seamlessly on all available research
content.
What is the problem?
• A small study (Knoth, 2013)
• 83 repositories, mainly EPrints, with PDF research
outputs
• 1,461,016 metadata records
                     Metadata linked   Content        Content
                     to content        downloadable   machine readable
Mean                 54.1%             34.4%          27.6%
Median               39.5%             16.7%          13.0%
Standard deviation   39.2%             34.2%          31.0%
How is content aggregated
today?
• Dublin Core (DC) over OAI-PMH: used by the vast majority of
repositories, but never intended to support content harvesting.
The main problem: linking metadata with content.
“The nature of a resource identifier is outside the scope of the OAI-
PMH. To facilitate access to the resource associated with harvested
metadata, repositories should use an element in metadata records
to establish a linkage between the record (and the identifier of its
item) and the identifier (URL, URN, DOI, etc.) of the associated
resource. The mandatory Dublin Core format provides the identifier
element that should be used for this purpose.”
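The linkage described in the quote can be sketched from a harvester's point of view. This is a minimal illustration, not part of the original talk: the sample record, the host repository.example.org and the helper name extract_identifiers are all made up. It only shows that a Dublin Core record may carry several dc:identifier values and that nothing in the record says which one points to the full text.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace of the Dublin Core element set used by oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: a jump-off page, a PDF and a DOI, all as dc:identifier.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:identifier>https://repository.example.org/id/eprint/42/</dc:identifier>
  <dc:identifier>https://repository.example.org/42/1/paper.pdf</dc:identifier>
  <dc:identifier>doi:10.1234/example</dc:identifier>
</oai_dc:dc>
"""


def extract_identifiers(record_xml: str) -> List[str]:
    """Return every non-empty dc:identifier value, in document order."""
    root = ET.fromstring(record_xml)
    return [
        element.text.strip()
        for element in root.iter(f"{{{DC_NS}}}identifier")
        if element.text and element.text.strip()
    ]


if __name__ == "__main__":
    for identifier in extract_identifiers(SAMPLE_RECORD):
        print(identifier)
```

A harvester that receives this record still has to guess which of the three values resolves to the document itself, which is the linking problem named above.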
• RIOXX: allows just one identifier and recommends that it
point to the actual resource being described.
• OpenAIRE Guidelines: identifier links to either the
resource or a jump-off page. Does allow multiple
identifiers.
• ResourceSync
• CrossRef: commercial publishers/journals
The content referencing
problem
Principle 1: content
referencing
Repositories should always establish a link from
the metadata record to the item the metadata
record describes using a dereferencable identifier
pointing to the version held locally in the
repository. The dereferencable identifier should
be provided in the appropriate metadata element
of the metadata format used (e.g. dc:identifier in
the case of Dublin Core). If multiple identifiers are
used, it is recommended to list the local
dereferencable identifier first.
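The ordering recommended in Principle 1 could be applied when a repository serialises its records. The sketch below is an assumption-laden illustration rather than any platform's actual behaviour: the function name order_identifiers and the host repository.example.org are hypothetical, and "local" is approximated as an http(s) URL whose host equals the repository's own host.

```python
from typing import List
from urllib.parse import urlparse


def order_identifiers(identifiers: List[str], repository_host: str) -> List[str]:
    """Return the identifiers with locally hosted URLs first.

    Relative order within the local and non-local groups is preserved.
    """

    def is_local(identifier: str) -> bool:
        parsed = urlparse(identifier)
        # Only http(s) URLs on the repository's own host count as local here.
        return parsed.scheme in ("http", "https") and parsed.hostname == repository_host

    local = [i for i in identifiers if is_local(i)]
    other = [i for i in identifiers if not is_local(i)]
    return local + other


if __name__ == "__main__":
    ordered = order_identifiers(
        [
            "doi:10.1234/example",
            "https://doi.org/10.1234/example",
            "https://repository.example.org/42/1/paper.pdf",
        ],
        repository_host="repository.example.org",
    )
    print(ordered[0])
```

With the local URL listed first, a harvester that reads only the first identifier reaches the copy the repository itself holds.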
The accessibility of
repositories to harvesting
systems
Principle 2: Content
accessibility to machines
Repositories must give machines the same level of
access as humans have. It is the role of repositories to
allow aggregators to harvest the entire content of
the repository in a reasonable time, so that they can
acquire and maintain up-to-date information
about the repository content.
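One place where machine access is commonly restricted is the repository's robots.txt file. The following sketch uses Python's standard urllib.robotparser to test whether a given crawler may fetch a full-text URL. The robots.txt text, the user-agent name ExampleHarvester and the URL are invented for illustration; a real check would read the repository's own file.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical policy that admits one named crawler and blocks all others.
RESTRICTIVE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Hypothetical policy that leaves every path open to every crawler.
OPEN = """\
User-agent: *
Disallow:
"""

if __name__ == "__main__":
    pdf_url = "https://repository.example.org/42/1/paper.pdf"
    for name, policy in (("restrictive", RESTRICTIVE), ("open", OPEN)):
        print(name, crawler_allowed(policy, "ExampleHarvester", pdf_url))
```

A policy like the first one gives a single search engine more access than any aggregator, which runs against the principle that all machines should have the same access as humans.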
What can repositories do?
• Ensure correct referencing of content from metadata:
• Dereferencable link which resolves to content
• Locally held (content under its control)
• Using a standard repository platform can help
• Check robots.txt
• Register your repository
• Advocate for good PDF (media) quality of deposited content
• Use monitoring tools
• CORE Repository Dashboard
• OpenAIRE Repository Manager Dashboard
• Machine readable licensing
beyond Open Access
MAKING SENSE OF
LARGE VOLUMES OF
SCIENTIFIC CONTENT
Interested in how to TDM
research papers?
We have 3 more talks tomorrow!
• Developer track 1, 11:00: Mining Open Access publications with CORE
• Developer track 1, 11:20: Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking
• Papers 4, 4:00: Exploring Semantometrics: full text-based research evaluation for open repositories
Thank you
Dr. Petr Knoth, Research Fellow
petr.knoth@open.ac.uk
Dr. Nancy Pontika, Open Access Aggregation
Officer
nancy.pontika@open.ac.uk
Developing Infrastructure to Support Closer Collaboration of Aggregators with...Developing Infrastructure to Support Closer Collaboration of Aggregators with...
Developing Infrastructure to Support Closer Collaboration of Aggregators with...Nancy Pontika
 
Putting Open Access into Practice
Putting Open Access into Practice Putting Open Access into Practice
Putting Open Access into Practice Nancy Pontika
 
Reusing Open Access content & HEFCE policy on Open Access
 Reusing Open Access content & HEFCE policy on Open Access Reusing Open Access content & HEFCE policy on Open Access
Reusing Open Access content & HEFCE policy on Open AccessNancy Pontika
 
REF2020 and Open Access : How to comply?
REF2020 and Open Access : How to comply?REF2020 and Open Access : How to comply?
REF2020 and Open Access : How to comply?Nancy Pontika
 
Managing Open Access in the Library
Managing Open Access in the Library Managing Open Access in the Library
Managing Open Access in the Library Nancy Pontika
 
Open Access Publishing: Understanding the implications for the Arts and Human...
Open Access Publishing: Understanding the implications for the Arts and Human...Open Access Publishing: Understanding the implications for the Arts and Human...
Open Access Publishing: Understanding the implications for the Arts and Human...Nancy Pontika
 

Mehr von Nancy Pontika (19)

Closing the scientific literature access gap with CORE - how to gain free acc...
Closing the scientific literature access gap with CORE - how to gain free acc...Closing the scientific literature access gap with CORE - how to gain free acc...
Closing the scientific literature access gap with CORE - how to gain free acc...
 
The future of scholarly communications professionals
The future of scholarly communications professionalsThe future of scholarly communications professionals
The future of scholarly communications professionals
 
CORE: Recommender and Publisher Connector
CORE: Recommender and Publisher Connector CORE: Recommender and Publisher Connector
CORE: Recommender and Publisher Connector
 
CORE Recommender: a plug in suggesting open access content
CORE Recommender: a plug in suggesting open access contentCORE Recommender: a plug in suggesting open access content
CORE Recommender: a plug in suggesting open access content
 
General introduction to Open Data Policies H2020, influence of OD policies on...
General introduction to Open Data Policies H2020, influence of OD policies on...General introduction to Open Data Policies H2020, influence of OD policies on...
General introduction to Open Data Policies H2020, influence of OD policies on...
 
Open Science: Tools and platforms
Open Science: Tools and platformsOpen Science: Tools and platforms
Open Science: Tools and platforms
 
Understanding Open Science: Definitions and framework
Understanding Open Science: Definitions and framework Understanding Open Science: Definitions and framework
Understanding Open Science: Definitions and framework
 
What is Open Science
What is Open ScienceWhat is Open Science
What is Open Science
 
Open Science, Why not?
Open Science, Why not?Open Science, Why not?
Open Science, Why not?
 
Open Science: Application and Benefits
Open Science: Application and BenefitsOpen Science: Application and Benefits
Open Science: Application and Benefits
 
Fostering Open Science to Research Using a Taxonomy and an eLearning Portal
Fostering Open Science to Research Using a Taxonomy and an eLearning PortalFostering Open Science to Research Using a Taxonomy and an eLearning Portal
Fostering Open Science to Research Using a Taxonomy and an eLearning Portal
 
Benefits of Open Access to Early Career Researchers
Benefits of Open Access to Early Career Researchers Benefits of Open Access to Early Career Researchers
Benefits of Open Access to Early Career Researchers
 
What young researchers can do to promote open access
What young researchers can do to promote open accessWhat young researchers can do to promote open access
What young researchers can do to promote open access
 
Developing Infrastructure to Support Closer Collaboration of Aggregators with...
Developing Infrastructure to Support Closer Collaboration of Aggregators with...Developing Infrastructure to Support Closer Collaboration of Aggregators with...
Developing Infrastructure to Support Closer Collaboration of Aggregators with...
 
Putting Open Access into Practice
Putting Open Access into Practice Putting Open Access into Practice
Putting Open Access into Practice
 
Reusing Open Access content & HEFCE policy on Open Access
 Reusing Open Access content & HEFCE policy on Open Access Reusing Open Access content & HEFCE policy on Open Access
Reusing Open Access content & HEFCE policy on Open Access
 
REF2020 and Open Access : How to comply?
REF2020 and Open Access : How to comply?REF2020 and Open Access : How to comply?
REF2020 and Open Access : How to comply?
 
Managing Open Access in the Library
Managing Open Access in the Library Managing Open Access in the Library
Managing Open Access in the Library
 
Open Access Publishing: Understanding the implications for the Arts and Human...
Open Access Publishing: Understanding the implications for the Arts and Human...Open Access Publishing: Understanding the implications for the Arts and Human...
Open Access Publishing: Understanding the implications for the Arts and Human...
 

Kürzlich hochgeladen

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 

Kürzlich hochgeladen (20)

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 

How can repositories support the text-mining of their content and why?

  • 1. How can repositories support the text-mining of their content and why? @openminted_eu Dr. Petr Knoth and Dr. Nancy Pontika, Knowledge Media Institute, The Open University, United Kingdom. Twitter: @oacore
  • 2. Why should repositories support TDM? @openminted_eu
  • 4. Repositories and TDM @openminted_eu Institutional Repositories Subject Repositories Publishers/ OA journals Other sources: Research Networking Services Primary Research Data... Text Mining Services
  • 5.
  • 6. TDM & Repository Managers @openminted_eu
      • Establish and maintain a close collaboration with researchers
      • Extensive experience in advocacy, e.g. for open access
      • Knowledgeable about the repository’s collection
      • Participate in the academic institution’s research committees
      • Familiar with copyright issues and Creative Commons licenses
  • 7. How can repositories support TDM? TDM is all about processing text and data at scale. The role of repositories is to facilitate the aggregation of research papers at a full-text level (and beyond), effectively enabling TDM services to operate seamlessly on all available research content.
  • 8. What is the problem? @openminted_eu
      • A small study (Knoth, 2013)
      • 83 repositories, mainly EPrints with PDF research outputs
      • 1,461,016 metadata records

                            Metadata linked   Content        Content machine
                            to content        downloadable   readable
      Mean                  54.1%             34.4%          27.6%
      Median                39.5%             16.7%          13.0%
      Standard deviation    39.2%             34.2%          31.0%
  • 9. How is content aggregated today? @openminted_eu • DC over OAI-PMH: used by the vast majority of repositories, but never intended to support content harvesting. The main problem: linking metadata with content. “The nature of a resource identifier is outside the scope of the OAI-PMH. To facilitate access to the resource associated with harvested metadata, repositories should use an element in metadata records to establish a linkage between the record (and the identifier of its item) and the identifier (URL, URN, DOI, etc.) of the associated resource. The mandatory Dublin Core format provides the identifier element that should be used for this purpose.”
  • 10. How is content aggregated today? @openminted_eu
      • RIOXX: just one identifier; recommends that the identifier point to the actual resource being described.
      • OpenAIRE Guidelines: the identifier links to either the resource or a jump-off page. Multiple identifiers are allowed.
      • ResourceSync
      • CrossRef: commercial publishers/journals
  • 12. Principle 1: Content referencing. Repositories should always establish a link from the metadata record to the item the metadata record describes, using a dereferencable identifier pointing to the version held locally in the repository. The dereferencable identifier should be provided in the appropriate metadata element of the metadata format in use (e.g. dc:identifier in the case of Dublin Core). If multiple identifiers are used, it is recommended to list the local dereferencable identifier first.
  • 13. The accessibility of repositories to harvesting systems @openminted_eu
  • 14. Principle 2: Content accessibility to machines. Repositories must give machines the same level of access as humans have. It is the role of repositories to allow aggregators to harvest the entire content of the repository in a reasonable time, so that they can acquire and maintain up-to-date information about the repository content.
  • 15. What can repositories do? @openminted_eu
      • Ensure correct referencing of content from metadata:
          • Dereferencable link which resolves to content
          • Locally held (content under its control)
      • Using a standard repository platform can help
      • Check robots.txt
      • Register your repository
      • Advocate for good PDF (media) quality of deposited content
      • Use monitoring tools:
          • CORE Repository Dashboard
          • OpenAIRE Repository Manager Dashboard
      • Machine-readable licensing
  • 16. beyond Open Access MAKING SENSE OF LARGE VOLUMES OF SCIENTIFIC CONTENT
  • 17. Interested in how to TDM research papers? @openminted_eu We have 3 more talks tomorrow! Developer track 1, 11:00 Mining Open Access publications with CORE
  • 18. Interested in how to TDM research papers? @openminted_eu We have 3 more talks tomorrow! Developer track 1, 11:20 Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking
  • 19. Interested in how to TDM research papers? @openminted_eu We have 3 more talks tomorrow! Papers 4, 4:00 Exploring Semantometrics: full text-based research evaluation for open repositories
  • 20. Thank you. Dr. Petr Knoth, Research Fellow, petr.knoth@open.ac.uk. Dr. Nancy Pontika, Open Access Aggregation Officer, nancy.pontika@open.ac.uk
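
The "Check robots.txt" advice on slide 15 and the "reasonable time" requirement of Principle 2 can be tried out with Python's standard `urllib.robotparser`. The following is only a minimal sketch: the robots.txt content, the harvester name `ExampleHarvester`, the URLs, and the collection size of 1,000,000 items are all hypothetical values chosen for illustration, not figures from the talk.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 15
Disallow: /cgi/
"""


def harvest_report(robots_txt: str, agent: str, url: str, n_items: int) -> dict:
    """Check whether `agent` may fetch `url` and estimate a full harvest.

    The estimate assumes one request per item and that the harvester
    waits for the declared crawl delay between requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None if no Crawl-delay applies
    allowed = parser.can_fetch(agent, url)
    days = (delay or 0) * n_items / 86_400  # 86,400 seconds per day
    return {
        "allowed": allowed,
        "crawl_delay_s": delay,
        "days_for_full_harvest": days,
    }


# Hypothetical harvester name, URL, and collection size.
report = harvest_report(
    ROBOTS_TXT,
    "ExampleHarvester",
    "https://repository.example.org/id/eprint/1/1/paper.pdf",
    1_000_000,
)
blocked = harvest_report(
    ROBOTS_TXT,
    "ExampleHarvester",
    "https://repository.example.org/cgi/export",
    1_000_000,
)
print(report)
```

Under these assumptions, a 15-second delay turns a one-million-item harvest into roughly half a year of continuous crawling, which is one way to make "reasonable time" concrete when reviewing a repository's robots.txt.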

Editor's notes

  1. Mining individual repositories is not interesting. TDM is about processing at scale. The role of repositories is: …
  2. So why am I talking about what the role of the repositories is? Well I think we have a slight problem here … We have done a study to …
  3. The main problem: linking metadata with content.
  4. OpenAIRE guidelines: https://guidelines.openaire.eu/en/latest/literature/field_resourceidentifier.html The ideal use of this element is to use a direct link or a link to a jump-off page (persistent URL) from dc:identifier in the metadata record to the digital resource or a jump-off page.
  5. <dc:identifier> field: The aim of the Dublin Core Metadata tags is to ensure online interoperability of metadata standards. The importance of the <dc:identifier> tag is that it describes the resource of the harvested output. CORE expects to find the direct URL of the PDF in this field. When the information in this field is not presented properly, the CORE crawler needs to crawl for the PDF, and the success of finding it cannot be guaranteed. This also costs additional server processing time and bandwidth, both for the harvester and for the hosting institution. There are three additional points to consider with regard to <dc:identifier>: a) this field should describe an absolute path to the file, b) it should contain an appropriate file name extension, for example “.pdf”, and c) the full-text items should be stored under the same repository domain.
  6. The problem is not multiple metadata formats, but the fact that none of them is good enough! Thinking that by supporting the guidelines you allow content aggregation is an issue. Locally means within the repository’s control.
  7. arXiv now has a slightly nicer robots.txt where anyone is allowed access with a 15s delay. Still not doable …
  8. Platform: For those who haven’t deployed a repository yet, it is highly advised not to build the repository platform in house, but to choose one of the industry-standard platforms. The benefit of choosing an existing platform is that it provides frequent updates and constant support, and extends repository functionality through plug-ins.
  9. Our ultimate goal is to put in place infrastructure that will enable anyone to make sense of large volumes of scientific data. The infrastructure is open and transparent.
  10. If you are interested in how we make sense of the large volumes of scientific content.
  11. If you are interested in how we make sense of the large volumes of scientific content.
  12. If you are interested in how we make sense of the large volumes of scientific content.
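
The three points on <dc:identifier> in note 5 (absolute path, a file name extension such as “.pdf”, same repository domain) can be checked mechanically. Below is a minimal sketch, assuming a Dublin Core record as delivered over OAI-PMH; the sample record, the host `repository.example.org`, the DOI, and the function names are all hypothetical, and this is not how CORE itself validates records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace of the Dublin Core element set used by oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record, for illustration only.
RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:identifier>https://repository.example.org/id/eprint/42/1/paper.pdf</dc:identifier>
  <dc:identifier>https://doi.org/10.9999/example</dc:identifier>
</oai_dc:dc>
"""


def check_identifier(identifier: str, repository_host: str) -> dict:
    """Apply points a), b) and c) from the note to one identifier."""
    parsed = urlparse(identifier)
    return {
        # a) an absolute URL rather than a relative path
        "absolute": parsed.scheme in ("http", "https") and bool(parsed.netloc),
        # b) an appropriate file name extension, here ".pdf"
        "has_pdf_extension": parsed.path.lower().endswith(".pdf"),
        # c) stored under the repository's own domain
        "same_domain": parsed.hostname == repository_host,
    }


def check_record(record_xml: str, repository_host: str) -> list:
    """Return (identifier, checks) pairs in the order they appear."""
    root = ET.fromstring(record_xml)
    return [
        (el.text.strip(), check_identifier(el.text.strip(), repository_host))
        for el in root.iter(DC_NS + "identifier")
        if el.text
    ]


results = check_record(RECORD, "repository.example.org")
for identifier, checks in results:
    print(identifier, checks)
```

In this sample the first identifier passes all three checks and the DOI link does not, which matches the recommendation to list the local dereferencable identifier first when several identifiers are present.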