twitter.com/openminted_eu
beyond Open Access
MAKING SENSE OF LARGE VOLUMES OF SCIENTIFIC CONTENT
Stelios Piperidis
Athena Research & Innovation Centre
spip@ilsp.athena-innovation.gr
The global research community generates over 1.5 million new
scholarly articles per annum.
The STM report (2009)
Lokman I. Meho, The rise and rise of
citation analysis, 2007
… some 90% of papers … are never cited.
… 50% of papers are never read by anyone other than their
authors, referees and journal editors
…about scientific literature?
… one paper published every 30 seconds
… 70,000 papers published on a single protein, the tumor
suppressor p53
Spangler et al, Automated Hypothesis
Generation based on Mining Scientific
Literature, 2014
Emerging solution(s)
Machine reading
process textual sources, organise and classify them along various
dimensions, extract the main (indexical) information items
… and understanding
identify and extract entities and relations between entities, facilitate
the transformation of unstructured textual sources into structured
data
… and predicting
enable the multidimensional analysis of structured data to extract
meaningful insights and improve the ability to predict
Structuring and mining
textual data
many examples from medical research
An example from the social sciences:
study social confrontation in Greek society,
with a focus on the crisis years,
based on newspaper corpora
what claims social agents (parties, unions, professional
associations, etc.) made, against which government or state
bodies, which instruments they used, and how the claims were
reported in different newspapers
Study social confrontation
example
Κατάληψη στα Υποθηκοφυλακεία Πειραιώς και
Σαλαμίνας αποφάσισε ο Δικηγορικός Σύλλογος
Πειραιώς (ΔΣΠ), στις 26 και 27 Απριλίου 2011,
διαμαρτυρόμενος για τα σοβαρότατα
προβλήματα λειτουργίας που παρουσιάζουν.
The Piraeus Bar Association (DSP) decided to
occupy the land registries of Piraeus and
Salamis on 26 and 27 April 2011,
protesting against the serious operational
problems they present.
Study social confrontation
example
[Figure: IE workflow on the ILSP-NLP stack]
Annotation labels: Form, Actor/Addressee, Issue, Time/Location, Claims
Workflow steps: Named Entity Recognition, Chunking, Dependency Parsing, Co-reference Resolution, Aggregation, Analytics
Outputs: Summarize/Export, Visualise statistics
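As a rough illustration of what the workflow produces, the example sentence about the Piraeus Bar Association could end up as a structured claim record with the fields named in the figure (Form, Actor/Addressee, Issue, Time/Location). The sketch below is an assumption throughout: the `Claim` class, its field names and the `count_by` helper are hypothetical, and the record is filled in by hand rather than produced by the ILSP-NLP stack.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """One protest event taken from a newspaper sentence (hypothetical schema)."""
    form: str         # type of action
    actor: str        # who makes the claim
    addressee: str    # body the action is directed at
    issue: str        # what the claim is about
    dates: tuple      # when the action takes place
    locations: tuple  # where the action takes place


# Filled in by hand from the example sentence. A real pipeline would derive
# these values from NER, chunking, parsing and co-reference output.
claims = [
    Claim(
        form="occupation",
        actor="Piraeus Bar Association",
        # Reading the land registries as the addressee is an assumption.
        addressee="land registries",
        issue="serious operational problems",
        dates=(date(2011, 4, 26), date(2011, 4, 27)),
        locations=("Piraeus", "Salamis"),
    )
]


def count_by(records, field):
    """Aggregation step: count claims per value of one field."""
    return Counter(getattr(record, field) for record in records)


forms = count_by(claims, "form")
print(forms)
```

With many such records, the same `count_by` call over `actor` or `form` would give the kind of per-category counts that a statistics view could plot.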
Main objective
Establish an open and sustainable Text and
Data Mining (TDM) platform and infrastructure
where researchers can collaboratively create,
discover, share and re-use knowledge from a
wide range of text based scientific and
scholarly related sources.
Key aspects
infrastructure - focus on interoperability
build on existing TDM tools - no new algorithms
service oriented - discovery, re-use of content & tools
community driven - user centric requirements
open science - openness at all levels
The landscape
Text Mining Researchers
Content Providers
End Users
Computing Infrastructures
the project
• Started: June 2015
• Duration: 3 years
• Total budget: 6,068,074 euros
16 Partners
• 6 mining research groups
• 3 content providers
• 1 data center
• 1 library association
• 2 legal experts
• 6 community related partners
• 2 SMEs
Partners
Athena RIC
Univ. of Manchester (NaCTeM)
Univ. of Darmstadt
INRA
EMBL-EBI
Agro-Know
LIBER
Univ. of Amsterdam
Open University UK
EPFL
CNIO
Univ. of Sheffield (GATE)
GESIS
GRNET
Frontiers
Univ. of Stirling
the challenges
Content
Barriers and obstacles due to non-availability, technical restrictions,
copyright law or licensing issues.
No uniform way to search for, retrieve and access content for TDM.
Services
How to identify the most fitting one? Do I have permission to use it?
How to combine with other services I have access to or I need? How
to use them on my content?
Processing
Where to deploy? Are my machines powerful enough? How can I
get access to powerful machines? Where to store intermediate and
final results? How to ensure persistence of storage?
Bring all stakeholders together!
Main routes
accessible content
Metadata and transfer protocols
• Document literature content, language resources, data categories taxonomies, provenance information
• Generic and domain-specific metadata descriptions
• Identify standards for metadata harvesting and federated search in distributed repositories
IPR and licensing
• Study IPR restrictions for reuse of sources
  • Exceptions?
  • What about non-commercial research?
• Translate the legal & policy aspects into authentication and authorization specifications (GÉANT's eduGAIN, …)
  • User-to-service and service-to-service interactions
Starting with repositories and OA
publishers
via OpenAIRE and CORE
In close collaboration with the
FUTURETDM project
http://project.futuretdm.eu/
Community driven
Scholarly communication, life sciences, agriculture, social sciences
From the very beginning…
Requirements, content, barriers, expected outcomes.
… to the very end
Create applications, validate and evaluate the results.
Use cases (1)
Scholarly communication analytics
OpenAIRE, CORE, Frontiers
• Semantic search and discovery of open scientific outcomes
• Map of academia – scholarly communication network
• Research monitoring and analytics
Life sciences
EMBL-EBI, Human Brain Project
• Assisted curation of the EMBL-EBI chemical databases for metabolomics
• Curation of the neuroscience resources KnowledgeBase and NeuroLex
Use cases (2)
Agriculture and biodiversity
INRA, AGRO-KNOW, EFSA
• Enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls
• Image, figure and dataset discovery in the AGRIS FAO online service
Social sciences
GESIS
• Develop and evaluate methods for the automatic detection and linking of named entities, citation traces and intentions in social science publications
Expectations from today's workshop
• Establish contact and dialogue with content providers, especially OA content providers
• Understand current practices, problems and limitations
• Look into the emerging requirements
• Explore the technical, legal, policy and organisational challenges content providers face in making their data open for text and data mining
• Develop a common vision and strategy
twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus
THANK YOU!
21

Weitere ähnliche Inhalte

Was ist angesagt?

Library Science Talk: Tensions between copyright and knowledge discovery
Library Science Talk: Tensions between copyright and knowledge discoveryLibrary Science Talk: Tensions between copyright and knowledge discovery
Library Science Talk: Tensions between copyright and knowledge discoveryLIBER Europe
 
FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...
FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...
FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...EUDAT
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
 
Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...Peter Webster
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...BigData_Europe
 
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...OpenAIRE
 
The Regulation of Text and Data Mining
The Regulation of Text and Data MiningThe Regulation of Text and Data Mining
The Regulation of Text and Data MiningLIBER Europe
 
Horizon 2020: Outline of a Pilot for Open Research Data
Horizon 2020: Outline of a Pilot for Open Research Data  Horizon 2020: Outline of a Pilot for Open Research Data
Horizon 2020: Outline of a Pilot for Open Research Data LIBER Europe
 
Eva Méndez: Política europea y EOSC
Eva Méndez: Política europea y EOSCEva Méndez: Política europea y EOSC
Eva Méndez: Política europea y EOSCmaredata
 
Gobinda Chowdhury
Gobinda ChowdhuryGobinda Chowdhury
Gobinda Chowdhurymaredata
 
Open Science in HORIZON Grant Agreement
Open Science in HORIZON Grant AgreementOpen Science in HORIZON Grant Agreement
Open Science in HORIZON Grant AgreementMilan Zdravković
 
Sessions presentation slides - 8th OpenAIRE workshop
Sessions presentation slides - 8th OpenAIRE workshopSessions presentation slides - 8th OpenAIRE workshop
Sessions presentation slides - 8th OpenAIRE workshopOpenAIRE
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...petrknoth
 
Zenodo - The catch-all repository
Zenodo - The catch-all repository Zenodo - The catch-all repository
Zenodo - The catch-all repository OpenAccessBelgium
 
OpenAIRE short presentation
OpenAIRE short presentationOpenAIRE short presentation
OpenAIRE short presentationOpenAIRE
 
Linking Collections Through Linked Open Data
Linking Collections Through Linked Open DataLinking Collections Through Linked Open Data
Linking Collections Through Linked Open DataThe European Library
 

Was ist angesagt? (20)

Library Science Talk: Tensions between copyright and knowledge discovery
Library Science Talk: Tensions between copyright and knowledge discoveryLibrary Science Talk: Tensions between copyright and knowledge discovery
Library Science Talk: Tensions between copyright and knowledge discovery
 
FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...
FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...
FREYA - Connected Open Identifiers for Discovery, Access and Use of Research ...
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...Understanding the users of the Parliamentary Web Archive: a user research pro...
Understanding the users of the Parliamentary Web Archive: a user research pro...
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
 
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
 
The Regulation of Text and Data Mining
The Regulation of Text and Data MiningThe Regulation of Text and Data Mining
The Regulation of Text and Data Mining
 
Horizon 2020: Outline of a Pilot for Open Research Data
Horizon 2020: Outline of a Pilot for Open Research Data  Horizon 2020: Outline of a Pilot for Open Research Data
Horizon 2020: Outline of a Pilot for Open Research Data
 
Eva Méndez: Política europea y EOSC
Eva Méndez: Política europea y EOSCEva Méndez: Política europea y EOSC
Eva Méndez: Política europea y EOSC
 
Gobinda Chowdhury
Gobinda ChowdhuryGobinda Chowdhury
Gobinda Chowdhury
 
Scholze goportis 4-11-14
Scholze goportis 4-11-14Scholze goportis 4-11-14
Scholze goportis 4-11-14
 
Open Science in HORIZON Grant Agreement
Open Science in HORIZON Grant AgreementOpen Science in HORIZON Grant Agreement
Open Science in HORIZON Grant Agreement
 
Sessions presentation slides - 8th OpenAIRE workshop
Sessions presentation slides - 8th OpenAIRE workshopSessions presentation slides - 8th OpenAIRE workshop
Sessions presentation slides - 8th OpenAIRE workshop
 
Open Access In Biomedical Research
Open Access In Biomedical ResearchOpen Access In Biomedical Research
Open Access In Biomedical Research
 
Scholze imcw 2014-11-25
Scholze imcw 2014-11-25Scholze imcw 2014-11-25
Scholze imcw 2014-11-25
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...
 
Zenodo - The catch-all repository
Zenodo - The catch-all repository Zenodo - The catch-all repository
Zenodo - The catch-all repository
 
Opendata repository-v2
Opendata repository-v2Opendata repository-v2
Opendata repository-v2
 
OpenAIRE short presentation
OpenAIRE short presentationOpenAIRE short presentation
OpenAIRE short presentation
 
Linking Collections Through Linked Open Data
Linking Collections Through Linked Open DataLinking Collections Through Linked Open Data
Linking Collections Through Linked Open Data
 

Ähnlich wie OpenMinTeD: Making Sense of Large Volumes of Data

Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019heila1
 
OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010OpenAIRE
 
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)OpenAIRE
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
 
Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewAngelo Salatino
 
UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...
UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...
UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...UKSG: connecting the knowledge community
 
The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...
The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...
The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...OpenAIRE
 
A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining Chris Shillum
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Carole Goble
 
Open Science, Open Data: towards a new transparent and reproducible ecosystem
Open Science, Open Data:   towards a new transparent and reproducible ecosystemOpen Science, Open Data:   towards a new transparent and reproducible ecosystem
Open Science, Open Data: towards a new transparent and reproducible ecosystemLIBER Europe
 
Elsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing IndustryElsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing IndustryAntonio Gulli
 
OpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open scienceOpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open scienceJisc
 
Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries? Robin Rice
 
New trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsNew trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsMaría Poveda Villalón
 
Open science / open research
Open science / open researchOpen science / open research
Open science / open researchheila1
 
Connecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open scienceConnecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open scienceOpenAIRE
 
20190527_Karen Hytteballe Ibanez _ The OPERA project
 20190527_Karen Hytteballe Ibanez _ The OPERA project 20190527_Karen Hytteballe Ibanez _ The OPERA project
20190527_Karen Hytteballe Ibanez _ The OPERA projectOpenAIRE
 
OpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE
 

Ähnlich wie OpenMinTeD: Making Sense of Large Volumes of Data (20)

Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
 
OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010OpenAIRE at Workshop on CRIS and OAR, May 2010
OpenAIRE at Workshop on CRIS and OAR, May 2010
 
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
 
Data and Research Infrastructures and Open Science
Data and Research Infrastructures and Open ScienceData and Research Infrastructures and Open Science
Data and Research Infrastructures and Open Science
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 
Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an Overview
 
UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...
UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...
UKSG webinar - Introduction to Text-Mining Research Papers with Petr Knoth an...
 
The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...
The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...
The CRIS-Repository connection: possibilities and values – Ed Simons and Dani...
 
A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
Open Science, Open Data: towards a new transparent and reproducible ecosystem
Open Science, Open Data:   towards a new transparent and reproducible ecosystemOpen Science, Open Data:   towards a new transparent and reproducible ecosystem
Open Science, Open Data: towards a new transparent and reproducible ecosystem
 
Elsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing IndustryElsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing Industry
 
OpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open scienceOpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open science
 
Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?
 
New trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsNew trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and tools
 
Open science / open research
Open science / open researchOpen science / open research
Open science / open research
 
Connecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open scienceConnecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open science
 
20190527_Karen Hytteballe Ibanez _ The OPERA project
 20190527_Karen Hytteballe Ibanez _ The OPERA project 20190527_Karen Hytteballe Ibanez _ The OPERA project
20190527_Karen Hytteballe Ibanez _ The OPERA project
 
OpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open Science
 

Mehr von openminted_eu

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDMopenminted_eu
 
OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017openminted_eu
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...openminted_eu
 
Seamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncopenminted_eu
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...openminted_eu
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Miningopenminted_eu
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK thesesopenminted_eu
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilitiesopenminted_eu
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesopenminted_eu
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProopenminted_eu
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveopenminted_eu
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlandsopenminted_eu
 

Mehr von openminted_eu (12)

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
 
OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Seamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources sync
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Mining
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
 

Kürzlich hochgeladen

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD


OpenMinTeD: Making Sense of Large Volumes of Data

  • 1. Beyond Open Access: Making Sense of Large Volumes of Scientific Content. Stelios Piperidis, Athena Research & Innovation Centre, spip@ilsp.athena-innovation.gr. twitter.com/openminted_eu
  • 2. …about scientific literature? The global research community generates over 1.5 million new scholarly articles per annum (The STM report, 2009). … some 90% of papers … are never cited; … 50% of papers are never read by anyone other than their authors, referees and journal editors (Lokman I. Meho, The rise and rise of citation analysis, 2007). … one paper published every 30 seconds; … 70,000 papers published on a single protein, the tumor suppressor p53 (Spangler et al., Automated Hypothesis Generation based on Mining Scientific Literature, 2014).
  • 3. Emerging solution(s). Machine reading: process textual sources, organise and classify them in various dimensions, extract the main (indexical) information items. … and understanding: identify and extract entities and relations between entities, facilitate the transformation of unstructured textual sources into structured data. … and predicting: enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict.
  • 4. Structuring and mining textual data. Many examples from medical research. An example from social sciences: study social confrontation in Greek society, with a focus on the years of the crisis, based on newspaper corpora: what the claims of the social agents (parties, unions, different professional associations, etc.) have been, against which government/state bodies, which instruments were used, and how they were reported in different newspapers.
  • 5. Study social confrontation: example. Κατάληψη στα Υποθηκοφυλακεία Πειραιώς και Σαλαμίνας αποφάσισε ο Δικηγορικός Σύλλογος Πειραιώς (ΔΣΠ), στις 26 και 27 Απριλίου 2011, διαμαρτυρόμενος για τα σοβαρότατα προβλήματα λειτουργίας που παρουσιάζουν. The Piraeus Bar Association (SAB) decided to go for the occupation of land registries in Piraeus and Salamis on 26 and 27 April 2011, protesting about the serious operational problems they present.
  • 9. Main objective: Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly-related sources.
  • 10. Key aspects: infrastructure - focus on interoperability; build on existing TDM tools - no new algorithms; service oriented - discovery, re-use of content & tools; community driven - user-centric requirements; open science - openness at all levels.
  • 11. The landscape: Text Mining Researchers, Content Providers, End Users, Computing Infrastructures.
  • 12. The project. Started: June 2015; Duration: 3 years; Total budget: 6,068,074 Euros. 16 partners: 6 mining research groups; 3 content providers; 1 data center; 1 library association; 2 legal experts; 6 community related partners; 2 SMEs. Partners: Athena RIC, Univ. of Manchester (NaCTeM), Univ. of Darmstadt, INRA, EMBL-EBI, Agro-Know, LIBER, Univ. of Amsterdam, Open University UK, EPFL, CNIO, Univ. of Sheffield (GATE), GESIS, GRNET, Frontiers, Univ. of Stirling.
  • 13. The challenges. Content: barriers and obstacles due to non-availability, technical restrictions, copyright law or licensing issues; no uniform way to search for, retrieve and access content for TDM. Services: How to identify the most fitting one? Do I have permission to use it? How to combine it with other services I have access to or need? How to use them on my content? Processing: Where to deploy? Are my machines powerful enough? How can I get access to powerful machines? Where to store intermediate and final results? How to ensure persistence of storage? Bring all stakeholders together!
  • 15. Accessible content. Metadata and transfer protocols: document literature content, language resources, data category taxonomies, provenance information; generic and domain-specific metadata descriptions; identify standards for metadata harvesting and federated search in distributed repositories. IPR and licensing: study IPR restrictions for reuse of sources (Exceptions? What about non-commercial research?); translate the legal & policy aspects into authentication and authorization specifications (GEANT’s EduGain, …) for user-to-service and service-to-service interactions. Starting with repositories and OA publishers via OpenAIRE and CORE. In close collaboration with the FUTURETDM project http://project.futuretdm.eu/
  • 16. Community driven: scholarly communication, life sciences, agriculture, social sciences. From the very beginning… requirements, content, barriers, expected outcomes. … to the very end: create applications, validate and evaluate the results.
  • 17. Use cases (1). Scholarly communication analytics (OpenAIRE, CORE, Frontiers): semantic search and discovery of open scientific outcomes; map of academia, i.e. the scholarly communication network; research monitoring and analytics. Life sciences (EBI, Human Brain Project): assisted curation of the EMBL-EBI chemical databases for metabolomics; curation of the neurosciences resources KnowledgeBase and Neurolex.
  • 18. Use cases (2). Agriculture and biodiversity (INRA, AGRO-KNOW, EFSA): enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls; image, figure and dataset discovery in the AGRIS FAO online service. Social sciences (GESIS): develop and evaluate methods for the automatic detection and linking of named entities, citation traces and intentions in social science publications.
  • 19. Expectations from today’s workshop: establish contact and dialogue with content providers, especially OA content providers; understand current practices, problems and limitations; look into the emerging requirements; explore the challenges content providers face at the technical, legal, policy and organisational levels in making their data open for text and data mining; develop a common vision and strategy.
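
As an illustration of what slide 3 calls transforming unstructured textual sources into structured data, the newspaper sentence from slide 5 can be mapped to a record with an actor, an action, a target, dates and a grievance. The sketch below is only a toy under stated assumptions: the `ProtestEvent` fields, the `PATTERN` regular expression and the `extract_event` helper are hypothetical names invented here, the pattern is hand-written for a lightly tidied copy of the English translation of this one sentence, and it is not the method used in the project, which builds on existing TDM tools.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtestEvent:
    """Hypothetical structured record for one reported protest claim."""
    actor: str
    action: str
    target: str
    dates: str
    grievance: str


# Hand-written pattern for this single example sentence (an assumption,
# not a general grammar of protest reports).
PATTERN = re.compile(
    r"^(?P<actor>The .+?) decided to go for the "
    r"(?P<action>\w+) of (?P<target>.+?) on "
    r"(?P<dates>.+?\d{4})\s*, protesting about "
    r"(?P<grievance>.+?)\.$"
)


def extract_event(sentence: str) -> Optional[ProtestEvent]:
    """Return a ProtestEvent if the sentence fits the pattern, else None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return ProtestEvent(**match.groupdict())


SENTENCE = (
    "The Piraeus Bar Association (SAB) decided to go for the occupation "
    "of land registries in Piraeus and Salamis on 26 and 27 April 2011, "
    "protesting about the serious operational problems they present."
)

event = extract_event(SENTENCE)
print(event)
```

A real pipeline would replace the single regular expression with named-entity recognition and relation extraction over the Greek source text. The point here is only the shape of the output, one row per reported claim that can then be counted and compared across newspapers.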