Integration of research literature and data (InFoLiS)

Katarina Boland (1), Philipp Zumstein (2)
(1) GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
(2) Mannheim University Library, Mannheim, Germany
CNI 2015 Spring Membership Meeting
April 14th, 2015
Introduction
The InFoLiS project: Integration of research data and publications
InFoLiS I: 05/2011 - 05/2013
InFoLiS II: 08/2014 - 08/2016
InFoLiS is funded by the DFG (SU 647/2-1)
InFoLiS Project Goals
[Diagram: two information systems answer user queries. Catalogue (publications): SSOAR (GESIS), Primo (UB MA), ... Data catalogue (research data): da|ra (GESIS), ... Each receives a Query and returns a Response; Links connect the two.]
Outline
1. Part I: Generation of Links
2. Part II: How can you reuse it?
Part I: Generation of Links
Citation of Research Data
Recommendation [1]:
Creator (Publication Date): Title. Publication Agent. Identifier
Creator (Publication Date): Title. Version. Publication Agent. Type of Resource. Identifier.
→ Extraction based on these patterns?
[1] see http://auffinden-zitieren-dokumentieren.de/zitieren/empfohlene-datenzitation/
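One way to read the question "Extraction based on these patterns?" is as a regular-expression match against the recommended format. The sketch below covers only the short form of the recommendation; the helper name `parse_data_citation` and the example citation are made up for illustration and are not part of InFoLiS. Free-text references like those on the following slides do not fit such a pattern.

```python
import re

# Short form of the recommended data citation:
#   Creator (Publication Date): Title. Publication Agent. Identifier
CITATION_RE = re.compile(
    r"^(?P<creator>[^()]+?)\s*"
    r"\((?P<date>\d{4})\):\s*"
    r"(?P<title>[^.]+)\.\s*"
    r"(?P<agent>[^.]+)\.\s*"
    r"(?P<identifier>\S+)$"
)


def parse_data_citation(text):
    """Return the citation fields as a dict, or None if the text does not fit."""
    match = CITATION_RE.match(text.strip())
    return match.groupdict() if match else None


# Invented citation string, for illustration only.
example = "Example Institute (2011): Example Survey 2010. Example Archive. doi:10.0000/example"
fields = parse_data_citation(example)

# A free-text reference does not follow the recommendation and is not parsed.
free_text = parse_data_citation("Source: Estimations based on SOEP, wave 2002.")
```

Here `fields` holds the five named parts of the invented citation, while `free_text` is `None`.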
References to Datasets
... presentation and discussion of the empirical findings. For this purpose, data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used and for both periods, the impact factors are estimated using linear regression models.
Pattern: data from the <title> of the years <year> are used
References to Datasets
Table 1: Population forecast for Germany depending on age cohorts - proportion in percent.
Data base: 10th Population Forecast of the Federal Statistical Office, variant 5.
Pattern: (Data base: <number> <title> of the <publication agent>, variant <variant>)
References to Datasets
Consulted were furthermore ...
Pattern: Consulted were furthermore <title1>, <title2>, <title3>, ..., <titleN>.
References to Datasets
Table 3: Sample of the surveys conducted in the years 2003 and 2004 as well as size of the sample, with valid data from both surveys
(Source: Ditton et al. 2005a)
Pattern: (Source: <citation of descriptive publication>)
References to Datasets
...are hard to detect!
see also...
Green, Toby (2009). We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper. doi: 10.1787/603233448430
Altman, Micah and Gary King (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. In: D-Lib Magazine 13.3. url: http://www.dlib.org/dlib/march07/altman/03altman.html
Automatic Identification of References
Why not simply search for study titles in publications?
“ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)”
“ALLBUS 96”
“Youth 2010”
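A small sketch can make the problem with plain title search concrete. It assumes the slide's examples stand for two failure modes: authors abbreviate long registered titles ("ALLBUS 96"), and short generic titles ("Youth 2010") also occur in text that is not a study reference. The second reading is an interpretation, and the function name and both sentences are invented for illustration.

```python
def naive_title_search(text, titles):
    """Return every catalogue title that occurs verbatim (case-insensitive) in the text."""
    lowered = text.lower()
    return [title for title in titles if title.lower() in lowered]


titles = [
    "ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der "
    "Sozialwissenschaften/German General Social Survey 1996)",
    "Youth 2010",
]

# Invented sentences, for illustration only.
abbreviated = "Our analysis is based on ALLBUS 96."
unrelated = "Unemployment among youth 2010 was lower than in 2009."

# The registered title never appears in full, so the reference is missed.
missed = naive_title_search(abbreviated, titles)

# The generic title matches although the sentence does not cite the study.
false_hit = naive_title_search(unrelated, titles)
```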
General idea
How do humans recognize study references?
Source: Estimations based on SOEP, wave 2002.
Source: Estimations based on xyz, wave 2002.
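The idea on this slide, that the context around a title identifies a study reference even when the title itself is unknown, can be sketched as follows. This is a simplified illustration and not the published algorithm (see the paper cited on the Reference Extraction slide): the function names are hypothetical, the whole sentence is used as context, and generalising four-digit years is an added assumption.

```python
import re


def induce_pattern(sentence, known_title):
    """Turn a sentence that mentions a known study title into a regex
    in which the title is replaced by a capture group."""
    before, found, after = sentence.partition(known_title)
    if not found:
        raise ValueError("known title does not occur in the sentence")

    def generalise(context):
        # Escape regex metacharacters, then let any four-digit year match
        # (assumption: the concrete year is not part of the pattern).
        return re.sub(r"\d{4}", lambda m: r"\d{4}", re.escape(context))

    return re.compile(generalise(before) + r"(?P<title>.+?)" + generalise(after))


def apply_pattern(pattern, text):
    """Return all strings found in the title position of the pattern."""
    return [m.group("title") for m in pattern.finditer(text)]


# Learn the context from a sentence with a known title ...
pattern = induce_pattern("Source: Estimations based on SOEP, wave 2002.", "SOEP")

# ... and use it to find a title that was not known before.
candidates = apply_pattern(pattern, "Source: Estimations based on ALLBUS, wave 1996.")
```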
Algorithm
Reference Extraction
for details see...
Katarina Boland, Dominique Ritze, Kai Eckert & Brigitte Mathiak (2012). Identifying References to Datasets in Publications. In: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science Volume 7489, pp. 150-161. Berlin: Springer. doi:10.1007/978-3-642-33290-6_17
Mapping to Datasets in da|ra
Mapping to Datasets in da|ra: granularity of registration vs. citation
Strategies: 1) greedy; 2) exact; 3) best
Mapping to Datasets in da|ra
[Diagram: ALLBUS-related dataset titles. Recoverable labels: ALLBUS; ALLBUS 1996; ALLBUS 1998; ALLBUS 2000; ALLBUS 2000 CAPI/PAPI; ALLBUScompact; ALLBUScompact 2000; ALLBUScompact 2000 CAPI/PAPI; ALLBUScompact 2000 CAPI; ALLBUS - Cumulation 1980-2006; ALLBUS - Cumulation 1980-2008; ALLBUScompact - Cumulation 1980-2010; ...]
→ use ontology
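To illustrate why an ontology helps when a citation is coarser or finer than the registered datasets, the sketch below stores "is part of" relations between dataset titles and walks them in both directions. The parent links are guessed from the titles for illustration only; they are not taken from da|ra or from the InFoLiS ontology, and the function names are hypothetical.

```python
# ASSUMPTION: these parent links are guessed from the dataset titles;
# the real relations in da|ra may differ.
PARENT = {
    "ALLBUS 1996": "ALLBUS",
    "ALLBUS 1998": "ALLBUS",
    "ALLBUS 2000": "ALLBUS",
    "ALLBUS 2000 CAPI/PAPI": "ALLBUS 2000",
    "ALLBUScompact 2000 CAPI": "ALLBUS 2000",
}


def descendants(title):
    """All titles below `title`, depth-first, children in sorted order."""
    result = []
    for child in sorted(t for t, parent in PARENT.items() if parent == title):
        result.append(child)
        result.extend(descendants(child))
    return result


def ancestors(title):
    """All titles above `title`, nearest first."""
    chain = []
    while title in PARENT:
        title = PARENT[title]
        chain.append(title)
    return chain


# A citation of "ALLBUS 2000" can be expanded to the finer-grained entries ...
finer = descendants("ALLBUS 2000")

# ... and a citation of a specific version can be related to the whole study.
coarser = ancestors("ALLBUScompact 2000 CAPI")
```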
Ontology: Approach
Vocabulary: e.g. DDI-RDF Discovery Vocabulary [2]
[2] Thomas Bosch, Richard Cyganiak, Arofan Gregory, Joachim Wackerow (2013): DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In: Proceedings of the 6th Linked Data on the Web (LDOW) Workshop at the 22nd International World Wide Web Conference (WWW). CEUR Workshop Proceedings, pp. 46-55
Links
Integration of Links into Information Systems
Example: da|ra
Example: SSOAR
Thank you for your attention!
katarina.boland@gesis.org
Next part: How can you reuse it?
Part II
How can you reuse it?
Work in Progress
Internals, Data Structure, Technology
(Internal) Data structure
Entities: Document, Pattern, Execution of Algorithm, Study Title, Study URI
• Which studies are found in a document?
• How was a pattern derived?
• Which other study titles are found with the new configuration of the algorithm?
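The entities and the three questions can be sketched as plain records. All class names, field names and relations below are assumptions chosen so that the questions can be answered; they do not describe the actual InFoLiS data model, and the `threshold` setting is a made-up configuration value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pattern:
    regex: str
    derived_by: Optional["Execution"] = None  # answers "How was a pattern derived?"


@dataclass
class StudyLink:
    study_title: str
    study_uri: Optional[str] = None  # None until the title is mapped to a dataset


@dataclass
class Execution:
    algorithm: str
    config: dict
    document_uri: str
    patterns_used: List[Pattern] = field(default_factory=list)
    found: List[StudyLink] = field(default_factory=list)


def studies_in_document(executions, document_uri):
    """Which studies are found in a document?"""
    return sorted({link.study_title
                   for execution in executions
                   if execution.document_uri == document_uri
                   for link in execution.found})


def new_titles(old_run, new_run):
    """Which other study titles are found with the new configuration?"""
    old = {link.study_title for link in old_run.found}
    return sorted({link.study_title for link in new_run.found} - old)


# Two runs over the same (hypothetical) document with different settings.
run1 = Execution("pattern search", {"threshold": 0.5}, "doc:1",
                 found=[StudyLink("SOEP")])
run2 = Execution("pattern search", {"threshold": 0.3}, "doc:1",
                 found=[StudyLink("SOEP"), StudyLink("ALLBUS 96")])

all_studies = studies_in_document([run1, run2], "doc:1")
gained = new_titles(run1, run2)
```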
Technology stack
Web Services
RESTful API (web services)
• GET, POST, PUT, DELETE, PATCH resources
• Search, perform algorithms, upload files
• open for integration into other workflows, e.g. in
  • resource discovery systems
  • research data catalogues
  • digital repositories
• possible to orchestrate over a web interface for individual use
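As a rough picture of how a client might talk to such a REST API, the sketch below only builds the HTTP requests and does not send them. The base URL, the paths `/links` and `/executions`, and the JSON fields are placeholders invented for this example; the slides do not document the actual endpoints.

```python
import json
from urllib.request import Request

# Placeholder base URL, not a real InFoLiS endpoint.
BASE_URL = "https://infolis.example.org/api"


def build_request(method, path, payload=None):
    """Build (but do not send) a JSON request for the given HTTP method."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# Hypothetical calls: read the links of a publication, start an algorithm run.
get_links = build_request("GET", "/links?publication=doc-1")
start_run = build_request("POST", "/executions",
                          {"algorithm": "extract-study-titles", "input": "file-1"})
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a real server.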
Lookup services
lookup service: publication URI → DB (links) → study URI
reverse lookup service: study URI → DB (links) → publication URI
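The two lookup directions can be sketched with an in-memory stand-in for the links database. The class name `LinkStore` and all URIs are invented for illustration; the real services sit behind the web API.

```python
from collections import defaultdict


class LinkStore:
    """In-memory stand-in for the links database."""

    def __init__(self, links):
        self._by_publication = defaultdict(set)
        self._by_study = defaultdict(set)
        for publication_uri, study_uri in links:
            self._by_publication[publication_uri].add(study_uri)
            self._by_study[study_uri].add(publication_uri)

    def lookup(self, publication_uri):
        """Lookup service: which studies does this publication reference?"""
        return sorted(self._by_publication.get(publication_uri, ()))

    def reverse_lookup(self, study_uri):
        """Reverse lookup service: which publications reference this study?"""
        return sorted(self._by_study.get(study_uri, ()))


# Hypothetical URIs, for illustration only.
store = LinkStore([
    ("pub:1", "study:allbus-1996"),
    ("pub:1", "study:soep"),
    ("pub:2", "study:soep"),
])

studies = store.lookup("pub:1")
publications = store.reverse_lookup("study:soep")
```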
Extraction of study URIs from a PDF
pdf (fulltext) → pdf2txt → txt (fulltext) → extract study titles (using DB (patterns)) → study titles → linking → study URI
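The processing chain can be sketched as three small functions. Everything here is a stand-in: `pdf_to_text` merely decodes bytes instead of running a real PDF converter, the pattern is a hand-written example rather than one from the pattern database, and the study URI is hypothetical.

```python
import re


def pdf_to_text(pdf_bytes):
    # Stand-in for the pdf2txt step: assumes the bytes already are UTF-8 text.
    return pdf_bytes.decode("utf-8")


def extract_study_titles(text, patterns):
    """Apply every pattern and collect the distinct titles in order of appearance."""
    titles = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            title = match.group("title")
            if title not in titles:
                titles.append(title)
    return titles


def link_titles(titles, title_to_uri):
    """Linking step: keep only titles that can be mapped to a study URI."""
    return {title: title_to_uri[title] for title in titles if title in title_to_uri}


def study_uris_for_pdf(pdf_bytes, patterns, title_to_uri):
    text = pdf_to_text(pdf_bytes)
    return link_titles(extract_study_titles(text, patterns), title_to_uri)


# Hand-written pattern and hypothetical URI, for illustration only.
patterns = [r"based on (?P<title>[A-Z]+), wave \d{4}"]
title_to_uri = {"SOEP": "study:soep"}
fake_pdf = "Source: Estimations based on SOEP, wave 2002.".encode("utf-8")

result = study_uris_for_pdf(fake_pdf, patterns, title_to_uri)
```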
Recognizing patterns
pdfs (fulltext) + seed → pattern recognizer → DB (pattern)
Integration of publications and
research data
Quoting the Horizon Report 2014
“Visionary leadership for research data management
models is also required to determine how to best
incorporate data connections into library catalogs” (NMC
Horizon Report 2014 - Library Edition, p. 7)
Current situation: Several steps needed
• Common situation today:
  • Search online catalogue
  • Evaluate search results
  • Find the full text of a relevant source
  • Read the publication
  • Spot the research data
• Moreover, often the reverse information is missing completely
  • Which publications are built on some specific research data?
Two Approaches
Client side: load additional data in the catalogue view (e.g. via Ajax)
• enrich view, links
• up-to-date data
• Embed data in the web presentation
Server side: add additional data to your catalogue database (e.g. Primo enrichment process)
• enrich view, links, search, sort, filter
• time-lagged because of the update mechanism
• Do the data fit into the existing infrastructure? (fields, tables, database)
Integration as links
• Link from catalogue entry ...
• … to the corresponding research data
Integration as popup
Cited research data: 2
• ALLBUS 2010 (used in 512 publications)
• part of ALLBUS (used in 13,456 publications)
• own research data (used in 1 publication)
Integration in search/sort
Cited data sets 4
Cited data sets 1
Sort by data citation
Integration in search/filter
Research data available
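Once links are stored with the catalogue records, sorting by data citation and filtering for available research data become simple operations. The record layout (`title`, `cited_datasets`) and the entries are invented for this sketch; a real catalogue such as Primo would hold this in its own fields.

```python
# Hypothetical enriched catalogue records.
records = [
    {"title": "Publication A", "cited_datasets": ["study:1"]},
    {"title": "Publication B", "cited_datasets": ["study:1", "study:2", "study:3", "study:4"]},
    {"title": "Publication C", "cited_datasets": []},
]


def sort_by_data_citation(records):
    """Order records by the number of cited data sets, most first."""
    return sorted(records, key=lambda record: len(record["cited_datasets"]), reverse=True)


def with_research_data(records):
    """Filter 'Research data available': keep records that cite at least one data set."""
    return [record for record in records if record["cited_datasets"]]


sorted_titles = [record["title"] for record in sort_by_data_citation(records)]
filtered_titles = [record["title"] for record in with_research_data(records)]
```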
Enrich your research data catalogue
Cited in: Ritze, D., Paulheim, H., & Eckert, K. (2013). Evaluation Measures for Ontology Matchers in Supervised Matching Scenarios. In The Semantic Web – ISWC 2013 (p. 392–407).
Tags from Publication: Supervised Ontology Matching, Evaluation, Recall, Precision, F-Measure, Precision@N-Curves, ROC-Curves, Precision-Recall-Curves
Current Goals of the Project
1. Expansion to other disciplines and languages
2. Linked data based infrastructure
3. Improve the reusability of generated links
Dissemination
• our web services will be open for everyone
• project webpage
  • http://infolis.github.io/
  • background information, slides, publications, news
• Additionally our code is open source
  • https://github.com/infolis
  • you can install/try out everything locally
  • development of code
Questions, Discussions, Feedback
• Questions?
• Discussions
• Give us feedback
• Small online survey: http://t1p.de/infolis
  http://wiki.bib.uni-mannheim.de/limesurvey/index.php?sid=55594
1 von 51

Recomendados

Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research von
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUniversity of California Curation Center
1.3K views32 Folien
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future von
Poster RDAP13: Research Data in eCommons @ Cornell: Present and FuturePoster RDAP13: Research Data in eCommons @ Cornell: Present and Future
Poster RDAP13: Research Data in eCommons @ Cornell: Present and FutureASIS&T
1.4K views1 Folie
Libraries and Research Data Curation: Barriers and Incentives for Preservatio... von
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...Libraries and Research Data Curation: Barriers and Incentives for Preservatio...
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...University of California Curation Center
1K views20 Folien
Supporting UC Research Data Management von
Supporting UC Research Data ManagementSupporting UC Research Data Management
Supporting UC Research Data Managementslabrams
284 views21 Folien
Poster: Very Open Data Project von
Poster: Very Open Data ProjectPoster: Very Open Data Project
Poster: Very Open Data ProjectEdward Blurock
590 views1 Folie
Poster RDAP13: Data information literacy multiple paths to a single goal von
Poster RDAP13: Data information literacy multiple paths to a single goalPoster RDAP13: Data information literacy multiple paths to a single goal
Poster RDAP13: Data information literacy multiple paths to a single goalASIS&T
1.2K views2 Folien

Más contenido relacionado

Was ist angesagt?

RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema... von
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...ASIS&T
1.3K views1 Folie
NISO Training Thursday Crafting a Scientific Data Management Plan von
NISO Training Thursday Crafting a Scientific Data Management PlanNISO Training Thursday Crafting a Scientific Data Management Plan
NISO Training Thursday Crafting a Scientific Data Management PlanNational Information Standards Organization (NISO)
1.9K views32 Folien
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine Felden von
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine FeldenIntroduction to PANGAEA & EURO-BASIN Data Management, by Janine Felden
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine FeldenDTU - Technical University of Denmark
1.4K views22 Folien
Repository Fringe 2016 - Survey Documentation and Analysis von
Repository Fringe 2016 - Survey Documentation and AnalysisRepository Fringe 2016 - Survey Documentation and Analysis
Repository Fringe 2016 - Survey Documentation and AnalysisEDINA, University of Edinburgh
400 views20 Folien
Open Data and Institutional Repositories von
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional RepositoriesRobin Rice
1.1K views23 Folien
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti... von
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...National Information Standards Organization (NISO)
2.5K views26 Folien

Was ist angesagt?(20)

RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema... von ASIS&T
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
ASIS&T1.3K views
Open Data and Institutional Repositories von Robin Rice
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional Repositories
Robin Rice1.1K views
Edinburgh DataShare: Tackling research data in a DSpace institutional repository von Robin Rice
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Robin Rice1.7K views
Overcoming obstacles to sharing data about human subjects von Robin Rice
Overcoming obstacles to sharing data about human subjectsOvercoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjects
Robin Rice1.2K views
An analysis and characterization of DMPs in NSF proposals from the University... von Megan O'Donnell
An analysis and characterization of DMPs in NSF proposals from the University...An analysis and characterization of DMPs in NSF proposals from the University...
An analysis and characterization of DMPs in NSF proposals from the University...
Megan O'Donnell401 views

Destacado

Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer... von
Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer...Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer...
Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer...Karen Estlund
1.5K views29 Folien
Software curation as a digital preservation service von
Software curation as a digital preservation serviceSoftware curation as a digital preservation service
Software curation as a digital preservation serviceKeith Webster
2.8K views75 Folien
ResourceSync Overview von
ResourceSync OverviewResourceSync Overview
ResourceSync OverviewHerbert Van de Sompel
5.7K views66 Folien
Piloting Linked Data to Connect Library and Archive Resources to the New Worl... von
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Laura Akerman
1.2K views42 Folien
Carpenter/Lagace: NISO Recommended Practices to Support Adoption of Altmetric... von
Carpenter/Lagace: NISO Recommended Practices to Support Adoption of Altmetric...Carpenter/Lagace: NISO Recommended Practices to Support Adoption of Altmetric...
Carpenter/Lagace: NISO Recommended Practices to Support Adoption of Altmetric...National Information Standards Organization (NISO)
1K views44 Folien
Student-Driven Innovation von
Student-Driven InnovationStudent-Driven Innovation
Student-Driven InnovationKevin Rundblad
4.2K views41 Folien

Destacado(6)

Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer... von Karen Estlund
Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer...Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer...
Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer...
Karen Estlund1.5K views
Software curation as a digital preservation service von Keith Webster
Software curation as a digital preservation serviceSoftware curation as a digital preservation service
Software curation as a digital preservation service
Keith Webster2.8K views
Piloting Linked Data to Connect Library and Archive Resources to the New Worl... von Laura Akerman
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Laura Akerman1.2K views

Similar a Integration of research literature and data (InFoLiS)

Connecting GESIS research data and publication information systems – Katarina... von
Connecting GESIS research data and publication information systems – Katarina...Connecting GESIS research data and publication information systems – Katarina...
Connecting GESIS research data and publication information systems – Katarina...OpenAIRE
1K views46 Folien
Riding the wave - Paradigm shifts in information access von
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessdatacite
8.3K views35 Folien
Semantic Linking & Retrieval for Digital Libraries von
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesStefan Dietze
787 views34 Folien
Decomposing Social and Semantic Networks in Emerging “Big Data” Research von
Decomposing Social and Semantic Networks in Emerging “Big Data” ResearchDecomposing Social and Semantic Networks in Emerging “Big Data” Research
Decomposing Social and Semantic Networks in Emerging “Big Data” ResearchHan Woo PARK
883 views24 Folien
euclid_linkedup WWW tutorial (Besnik Fetahu) von
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)Besnik Fetahu
2.7K views18 Folien
Thinking About the Making of Data von
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of DataPaul Groth
440 views59 Folien

Similar a Integration of research literature and data (InFoLiS)(20)

Connecting GESIS research data and publication information systems – Katarina... von OpenAIRE
Connecting GESIS research data and publication information systems – Katarina...Connecting GESIS research data and publication information systems – Katarina...
Connecting GESIS research data and publication information systems – Katarina...
OpenAIRE1K views
Riding the wave - Paradigm shifts in information access von datacite
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information access
datacite8.3K views
Semantic Linking & Retrieval for Digital Libraries von Stefan Dietze
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
Stefan Dietze787 views
Decomposing Social and Semantic Networks in Emerging “Big Data” Research von Han Woo PARK
Decomposing Social and Semantic Networks in Emerging “Big Data” ResearchDecomposing Social and Semantic Networks in Emerging “Big Data” Research
Decomposing Social and Semantic Networks in Emerging “Big Data” Research
Han Woo PARK883 views
euclid_linkedup WWW tutorial (Besnik Fetahu) von Besnik Fetahu
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu2.7K views
Thinking About the Making of Data von Paul Groth
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
Paul Groth440 views
RDA-WDS Publishing Data Interest Group von Anita de Waard
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest Group
Anita de Waard401 views
Tools für das Management von Forschungsdaten von Heinz Pampel
Tools für das Management von ForschungsdatenTools für das Management von Forschungsdaten
Tools für das Management von Forschungsdaten
Heinz Pampel1.2K views
Sci 2011 big_data(30_may13)2nd revised _ loet von Han Woo PARK
Sci 2011 big_data(30_may13)2nd revised _ loetSci 2011 big_data(30_may13)2nd revised _ loet
Sci 2011 big_data(30_may13)2nd revised _ loet
Han Woo PARK1.9K views
Interlinking educational data to Web of Data (Thesis presentation) von Enayat Rajabi
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)
Enayat Rajabi1.1K views
A metadata scheme of the software-data relationship: A proposal von Kai Li
A metadata scheme of the software-data relationship: A proposalA metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposal
Kai Li101 views
The web of data: how are we doing so far von Elena Simperl
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
Elena Simperl196 views
David Shotton - Research Integrity: Integrity of the published record von Jisc
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published record
Jisc2.4K views
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016) von dri_ireland
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
dri_ireland1.1K views
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ... von Natsuko Nicholls
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Natsuko Nicholls721 views
Current and emerging scientific data curation practices von Michael Day
Current and emerging scientific data curation practicesCurrent and emerging scientific data curation practices
Current and emerging scientific data curation practices
Michael Day2.5K views
Experimental research data quality in von ijait
Experimental research data quality inExperimental research data quality in
Experimental research data quality in
ijait385 views
Metadata for digital long-term preservation von Michael Day
Metadata for digital long-term preservationMetadata for digital long-term preservation
Metadata for digital long-term preservation
Michael Day1.2K views
Research Objects: more than the sum of the parts von Carole Goble
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
Carole Goble1.5K views

Último

CRIJ4385_Death Penalty_F23.pptx von
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptxyvettemm100
6 views24 Folien
3196 The Case of The East River von
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East RiverErickANDRADE90
11 views4 Folien
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docx von
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docxRIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docxJaysonGarabilesEspej
6 views3 Folien
How Leaders See Data? (Level 1) von
How Leaders See Data? (Level 1)How Leaders See Data? (Level 1)
How Leaders See Data? (Level 1)Narendra Narendra
13 views76 Folien
Chapter 3b- Process Communication (1) (1)(1) (1).pptx von
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptxayeshabaig2004
5 views30 Folien
ColonyOS von
ColonyOSColonyOS
ColonyOSJohanKristiansson6
9 views17 Folien

Último(20)

CRIJ4385_Death Penalty_F23.pptx von yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1006 views
Chapter 3b- Process Communication (1) (1)(1) (1).pptx von ayeshabaig2004
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptx
ayeshabaig20045 views
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx von DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
Cross-network in Google Analytics 4.pdf von GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
UNEP FI CRS Climate Risk Results.pptx von pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
Supercharging your Data with Azure AI Search and Azure OpenAI von Peter Gallagher
Supercharging your Data with Azure AI Search and Azure OpenAISupercharging your Data with Azure AI Search and Azure OpenAI
Supercharging your Data with Azure AI Search and Azure OpenAI
Peter Gallagher37 views
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation von DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
RuleBookForTheFairDataEconomy.pptx von noraelstela1
RuleBookForTheFairDataEconomy.pptxRuleBookForTheFairDataEconomy.pptx
RuleBookForTheFairDataEconomy.pptx
noraelstela167 views
Building Real-Time Travel Alerts von Timothy Spann
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
Timothy Spann111 views
Understanding Hallucinations in LLMs - 2023 09 29.pptx von Greg Makowski
Understanding Hallucinations in LLMs - 2023 09 29.pptxUnderstanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptx
Greg Makowski17 views
Organic Shopping in Google Analytics 4.pdf von GA4 Tutorials
Organic Shopping in Google Analytics 4.pdfOrganic Shopping in Google Analytics 4.pdf
Organic Shopping in Google Analytics 4.pdf
GA4 Tutorials11 views
Short Story Assignment by Kelly Nguyen von kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 views
Survey on Factuality in LLM's.pptx von NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra15 views

Integration of research literature and data (InFoLiS)

  • 1. Integration of research literature and data (InFoLiS) Katarina Boland1 Philipp Zumstein2 1 GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany 2 Mannheim University Library, Mannheim, Germany CNI 2015 Spring Membership Meeting April 14th, 2015
  • 2. the InFoLiS project: Integration of research data and publications InFoLiS I: 05/2011 - 05/2013 InFoLiS II: 08/2014 - 08/2016 InFoLiS is funded by the DFG (SU 647/2-1) Integration of research literature and data (InFoLiS) 2/22 Introduction
  • 3. Catalogue: Publications SSOAR (GESIS), Primo (UB MA), ... DataCatalogue: Research Data da|ra (GESIS), ... Query Query Response Links Response Response Response Integration of research literature and data (InFoLiS) 3/22 InFoLiS Project Goals
  • 4. 1 Part I: Generation of Links 2 Part II: How can you reuse it? Integration of research literature and data (InFoLiS) 4/22 Outline
  • 5. Integration of research literature and data (InFoLiS) 5/22 Part 1: Generation of Links
  • 6. Recommendation:1 : Creator (Publication Date): Title. Publication Agent. Identifier Creator (Publication Date): Title. Version. Publication Agent. Type of Resource. Identifier. → Extraction based on these patterns? 1 see http://auffinden-zitieren-dokumentieren.de/zitieren/empfohlene-datenzitation/ Integration of research literature and data (InFoLiS) 6/22 Citation of Research Data
  • 7. presentation and discussion of the empirical findings. For this purpose, data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used and for both periods, the impact factors are estimated using linear regression models. data from the title of the years year are used Integration of research literature and data (InFoLiS) 7/22 References to Datasets
  • 8. Table 1: Population forecast for Germany depending on age cohorts - proportion in percent. Data base: 10th Population Forecast of the Federal Statistical Office , variant 5. (Data base: number title of the publication agent, variant variant) Integration of research literature and data (InFoLiS) 8/22 References to Datasets
  • 9. Consulted were furthermore ... Consulted were furthermore title1, title2, title3, ..., titleN. Integration of research literature and data (InFoLiS) 9/22 References to Datasets
  • 10. Table 3: Sample of the surveys conducted in the years 2003 and 2004 as well as size of the sample, with valid data from both surveys (Source: Ditton et al. 2005a) (Source: citation of descriptive publication) Integration of research literature and data (InFoLiS) 10/22 References to Datasets
  • 11. ...are hard to detect! see also... Green, Toby (2009). We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper. doi: 10.1787/603233448430 Altman, Micah and Gary King (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. In: D-Lib Magazine 13.3. url: http://www.dlib.org/dlib/march07/altman/03altman.html Integration of research literature and data (InFoLiS) 11/22 References to Datasets
  • 12. Integration of research literature and data (InFoLiS) 12/22 Automatic Identification of References Why not simply search for study titles in publications?
  • 13. Integration of research literature and data (InFoLiS) 12/22 Automatic Identification of References Why not simply search for study titles in publications? “ALLBUS/GGSS 1996 (Allgemeine Bev¨olkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)”
  • 14. Automatic Identification of References: Why not simply search for study titles in publications? “ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)” vs. “ALLBUS 96”
  • 15. Automatic Identification of References: Why not simply search for study titles in publications? “Youth 2010”
  • 16. General idea: How do humans recognize study references? "Source: Estimations based on SOEP, wave 2002."
  • 17. General idea: How do humans recognize study references? "Source: Estimations based on xyz, wave 2002."
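
The idea on the two slides above (the wording around "SOEP" still signals a study reference when the name is swapped for "xyz") can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the InFoLiS implementation: the function names `derive_pattern` and `apply_pattern` are hypothetical, and capturing a single whitespace-free token is a simplification chosen for the example.

```python
import re
from typing import List


def derive_pattern(sentence: str, known_title: str) -> "re.Pattern[str]":
    """Turn a sentence that mentions a known study into a regex.

    The known title is replaced by a capture group; the surrounding
    words are kept literally (a deliberate simplification).
    """
    before, _, after = sentence.partition(known_title)
    return re.compile(re.escape(before) + r"(\S+)" + re.escape(after))


def apply_pattern(pattern: "re.Pattern[str]", text: str) -> List[str]:
    """Return every candidate study name the pattern captures in text."""
    return pattern.findall(text)


# Seed: a sentence in which the study name is already known.
seed_sentence = "Source: Estimations based on SOEP, wave 2002."
pattern = derive_pattern(seed_sentence, "SOEP")

# The same context around an unknown name yields a new candidate.
new_text = "Source: Estimations based on ALLBUS, wave 2002."
candidates = apply_pattern(pattern, new_text)
```

A pattern this literal only matches identical contexts; a practical system would presumably generalize the context (for example the year), which is outside the scope of this sketch.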
  • 18. Algorithm
  • 19. Reference Extraction: for details see Katarina Boland, Dominique Ritze, Kai Eckert & Brigitte Mathiak (2012). Identifying References to Datasets in Publications. In: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science Volume 7489, pp. 150-161. Berlin: Springer. doi:10.1007/978-3-642-33290-6_17
  • 20. Mapping to Datasets in da|ra
  • 21. Mapping to Datasets in da|ra: granularity of registration vs. citation. Strategies: 1) greedy; 2) exact; 3) best
  • 22. Mapping to Datasets in da|ra: [Figure: related ALLBUS datasets, with node labels ALLBUS, ALLBUS 1996, ALLBUS 1998, ALLBUS 2000, ALLBUS 2000 CAPI/PAPI, ALLBUScompact, ALLBUScompact 2000, ALLBUScompact 2000 CAPI/PAPI, ALLBUScompact 2000 CAPI, ALLBUS - Cumulation 1980-2006, ALLBUS - Cumulation 1980-2008, ALLBUScompact - Cumulation 1980-2010, ...] → use ontology
  • 23. Ontology: Approach. Vocabulary: e.g. DDI-RDF Discovery Vocabulary; see Thomas Bosch, Richard Cyganiak, Arofan Gregory, Joachim Wackerow (2013): DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In: Proceedings of the 6th Linked Data on the Web (LDOW) Workshop at the 22nd International World Wide Web Conference (WWW). CEUR Workshop Proceedings, pp. 46-55
  • 24. Links
  • 25. Integration of Links into Information Systems: Example: da|ra; Example: SSOAR
  • 26. Thank you for your attention! katarina.boland@gesis.org Next part: How can you reuse it?
  • 27. Part II: How can you reuse it?
  • 33. (Internal) Data structure: Document, Pattern, Execution of Algorithm, Study Title, Study URI. Questions it can answer: Which studies are found in a document? How was a pattern derived? Which other study titles are found with the new configuration of the algorithm?
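
One way to picture how the five entities on this slide could answer the three questions is as a small set of linked records. The Python sketch below is an assumption made for illustration, not the project's actual schema; every class, field, and function name is hypothetical, and the document identifiers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Execution:
    """One run of the algorithm with a given configuration (hypothetical)."""
    config: str


@dataclass
class Pattern:
    """A textual pattern together with the execution that derived it."""
    regex: str
    derived_by: Execution


@dataclass
class StudyTitle:
    """A study title found in a document, with its provenance."""
    title: str
    document: str                      # identifier of the source document
    found_by: Pattern
    study_uri: Optional[str] = None    # set once the title has been linked


def studies_in(document: str, titles: List[StudyTitle]) -> List[str]:
    """Which studies are found in a document?"""
    return [t.title for t in titles if t.document == document]


def derivation_of(title: StudyTitle) -> str:
    """How was the pattern behind this title derived?"""
    return title.found_by.derived_by.config


def titles_found_by(run: Execution, titles: List[StudyTitle]) -> List[str]:
    """Which study titles were found with a given configuration?"""
    return [t.title for t in titles if t.found_by.derived_by == run]


# Made-up example data.
run_a = Execution(config="seed=SOEP")
run_b = Execution(config="seed=ALLBUS")
pattern_a = Pattern(regex=r"based on (\S+), wave", derived_by=run_a)
pattern_b = Pattern(regex=r"data from the (\S+) of the years", derived_by=run_b)
titles = [
    StudyTitle("ALLBUS", "doc-1", pattern_a),
    StudyTitle("SOEP", "doc-2", pattern_b),
]
```

Keeping the execution on the pattern, and the pattern on the title, is what lets each extracted title be traced back to the configuration that produced it.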
  • 36. RESTful API (web services): GET, POST, PUT, DELETE, PATCH resources; search, perform algorithms, upload files; open for integration into other workflows, e.g. in resource discovery systems, research data catalogues, digital repositories; possible to orchestrate over a web interface for individual use
  • 37. Lookup services: lookup service: publication URI → DB (links) → study URI; reverse lookup service: study URI → DB (links) → publication URI
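
The two lookup directions can be illustrated with an in-memory index over (publication URI, study URI) pairs. This is only a sketch: the class `LinkIndex`, its method names, and the example URIs are hypothetical and stand in for the link database and web services, whose real interface is not shown on the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class LinkIndex:
    """In-memory stand-in for the link database (hypothetical)."""

    def __init__(self, links: Iterable[Tuple[str, str]]) -> None:
        self._by_publication: Dict[str, Set[str]] = defaultdict(set)
        self._by_study: Dict[str, Set[str]] = defaultdict(set)
        # Index every link in both directions.
        for publication_uri, study_uri in links:
            self._by_publication[publication_uri].add(study_uri)
            self._by_study[study_uri].add(publication_uri)

    def lookup(self, publication_uri: str) -> Set[str]:
        """Lookup service: publication URI -> study URIs."""
        return set(self._by_publication.get(publication_uri, ()))

    def reverse_lookup(self, study_uri: str) -> Set[str]:
        """Reverse lookup service: study URI -> publication URIs."""
        return set(self._by_study.get(study_uri, ()))


# Made-up URIs for illustration only.
index = LinkIndex([
    ("http://example.org/pub/1", "http://example.org/study/A"),
    ("http://example.org/pub/2", "http://example.org/study/A"),
    ("http://example.org/pub/1", "http://example.org/study/B"),
])
```

Calling `index.reverse_lookup("http://example.org/study/A")` returns both publications, which is the "which publications are built on this research data?" question from the reverse direction.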
  • 38. Extraction of study URIs from a PDF: pdf (fulltext) → pdf2txt → txt (fulltext) → extract study titles (using DB (patterns)) → study titles → linking → study URI
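
The last two stages of this pipeline (extract study titles, then link them to study URIs) might look roughly as follows once the PDF has been converted to plain text. The sketch makes several assumptions: each stored pattern is a regular expression with exactly one capture group, linking is done by exact title match against a registry, and the function names, patterns, and URI are all hypothetical.

```python
import re
from typing import Dict, List


def extract_study_titles(text: str, patterns: List[str]) -> List[str]:
    """Apply each stored pattern and collect captured titles.

    Titles are returned in order of discovery, without duplicates.
    Each pattern is assumed to contain exactly one capture group.
    """
    found: List[str] = []
    for pattern in patterns:
        for title in re.findall(pattern, text):
            if title not in found:
                found.append(title)
    return found


def link_titles(titles: List[str], registry: Dict[str, str]) -> Dict[str, str]:
    """Map titles to study URIs by exact match; unknown titles are skipped."""
    return {title: registry[title] for title in titles if title in registry}


# Made-up patterns, registry and text for illustration only.
patterns = [r"based on (\S+), wave", r"data from the (\S+) survey"]
registry = {"SOEP": "http://example.org/study/soep"}
text = (
    "Source: Estimations based on SOEP, wave 2002. "
    "We also use data from the XYZ survey."
)

titles = extract_study_titles(text, patterns)
links = link_titles(titles, registry)
```

Here "XYZ" is extracted but has no registry entry, so it stays unlinked; exact matching is the simplest possible linking rule and ignores the granularity problem discussed in Part I.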
  • 40. Integration of publications and research data
  • 41. Quoting the Horizon Report 2014 “Visionary leadership for research data management models is also required to determine how to best incorporate data connections into library catalogs” (NMC Horizon Report 2014 - Library Edition, p. 7)
  • 42. Current situation: Several steps needed. Common situation today: search the online catalogue; evaluate the search results; find the full text of a relevant source; read the publication; spot the research data. Moreover, the reverse information is often missing completely: Which publications are built on some specific research data?
  • 43. Two Approaches. Client-side: load additional data in the catalogue view (e.g. over Ajax); enrich view, links; up-to-date data; embed data in the web presentation. Server-side: add additional data to your catalogue database (e.g. Primo enrichment process); enrich view, links, search, sort, filter; time-lagged because of the update mechanism; do the data fit into the existing infrastructure (fields, tables, database)?
  • 44. Integration as links: Link from catalogue entry ... to the corresponding research data
  • 45. Integration as popup: Cited research data: 2 • ALLBUS 2010 (used in 512 publications) • part of ALLBUS (used in 13,456 publications) • own research data (used in 1 publication)
  • 46. Integration in search/sort: "Cited data sets: 4"; "Cited data sets: 1"; "Sort by data citation"
  • 48. Enrich your research data catalogue: Cited in: Ritze, D., Paulheim, H., & Eckert, K. (2013). Evaluation Measures for Ontology Matchers in Supervised Matching Scenarios. In The Semantic Web – ISWC 2013 (pp. 392–407). Tags from Publication: Supervised Ontology Matching, Evaluation, Recall, Precision, F-Measure, Precision@N-Curves, ROC-Curves, Precision-Recall-Curves
  • 49. Current Goals of the Project 1. Expansion to other disciplines and languages 2. Linked data based infrastructure 3. Improve the reusability of generated links
  • 50. Dissemination: Our web services will be open for everyone. Project webpage: http://infolis.github.io/ (background information, slides, publications, news). Additionally, our code is open source: https://github.com/infolis (you can install/try out everything locally; development of code)
  • 51. Questions, Discussions, Feedback: Questions? Discussions. Give us feedback. Small online survey: http://t1p.de/infolis http://wiki.bib.uni-mannheim.de/limesurvey/index.php?sid=55594