Aleksandar Kapisoda: The semantic approach for tracking scientific publications
1. Boehringer Ingelheim Pharma GmbH & Co. KG
Scientific Information Center - Aleksandar Kapisoda
Semantics 2015 - Vienna, Austria
The semantic approach for tracking
scientific publications
2. Content
1. Intro
2. Goal
3. Overview Data & Technology
4. Workflow / Pipeline
5. Challenges
6. Data Curation
7. Conclusions
8. Outlook
9. Acknowledgement
4. 1895
We don't live in that kind of world
1895
Paul Otlet
http://www.mondotheque.be/wiki/index.php/Here
5. Paul Otlet
The Father of Information Science
He is one of several people who have been considered the father of information science, a field he called "documentation".
As a young man, THE UTOPIAN started to think about a system that could represent the multiple networked relations between objects of various formats with various objectives.
Paul Otlet's design explicitly mapped multiple relations between multimedia objects (not just books) and allowed for constant transformation and modification.
THE UTOPIAN imagined his universal information structure by making 'symbolic links' from document to document.
6. 1895 - Universal Decimal Classification system
http://www.mondotheque.be/wiki/index.php/Here
In 1895, Paul Otlet created the Universal Decimal Classification (based on the Dewey Decimal Classification), one of the most prominent examples of faceted classification.
7. 1895 - Universal Decimal Classification system
2015 - Taxonomies, Dictionaries & Ontologies
https://blog.semantic-web.at/wp-content/uploads/2011/02/GICS_PP.jpg
In 2015 we are creating, editing and using taxonomies, dictionaries & ontologies.
8. 1934 - Radiated Library
Paul Otlet's vision
The Book of the Books
A great network of knowledge centered on documents, which included the notions of books, journals, radio, television...
In 1934, Paul Otlet laid out this vision in what he called the "Radiated Library".
http://www.mondotheque.be/wiki/index.php/Here
9. 1934 - Radiated Library
2015 - World Wide Web / Internet
Otlet's writings have sometimes been called prescient of the current World Wide Web / Internet.
His vision of a great network of knowledge was centered on documents and included the notions of hyperlinks, search engines, remote access, and social networks, although these notions were described by different names.
In 1934, Otlet laid out this vision of the computer and internet in what he called the "Radiated Library".
http://www.mondotheque.be/wiki/index.php/Here
10. 1934 - Universal Information Structure
https://s-media-cache-ak0.pinimg.com/736x/7d/71/0f/7d710ffe8ad97234ebc4867546d68a28.jpg
Paul Otlet imagined his universal information structure by making 'symbolic links' from document to document.
11. 1934 - Universal Information Structure
2015 - Semantic Web
Paul Otlet imagined his universal information structure by making 'symbolic links' from document to document, a system that looks surprisingly similar to what we now might call a 'Semantic Web'.
https://s-media-cache-ak0.pinimg.com/736x/7d/71/0f/7d710ffe8ad97234ebc4867546d68a28.jpg
12. Semantics @ BI - Evolution of Information Management
Comparison: Web Technology vs. BI-internal Information Management
1990 - 2000: Web 1.0
• Web technology: World Wide Web (Portals, Internet; Databases, File Servers, SQL)
• BI: Scientific Library (E-Journals, LinkSolver)
2000 - 2010: Web 2.0
• Web technology: Social Web (Blogs, Wikis; Keyword Search)
• BI: Scientific Information Center (Text Mining, BI-internal Wikis (MediaWiki))
2010 - 2020: Web 3.0
• Web technology: Semantic Web (Semantic Databases, Linked Data; Semantic Search, RDF, Text Mining)
• BI: Scientific Information Center ("Expert Searches" based on Text Mining Technologies; Data Analysis based on Semantic Technologies)
2020 - 2030: Web 4.0
• Web technology: Ubiquitous Web / Symbiotic Web (interaction between humans and machines in symbiosis)
13. BI - Publication Tracker
Goal
Why does BI need a publication tracking system?
14. BI - Publication Tracker
Goals
Scientific Publication Database (state: July 2015)
• Manually populated database
• No content curation
• Primitive visualisation
• Storage
• No data analysis possible
BI Scientific Publication Tracking (going live September 2015)
• Automatic data import
• Content curation
• State-of-the-art visualisation
• Storage in a semantic database
• Data analysis possible
15. Goal - Data Analysis
Number of BI Research Publications in 2015 (Q1, Q2)
Sample Data
TA: Therapeutic Area
16. Goal - Data Analysis
Impact Factors 2015 (Q1 + Q2) & Published Articles
Sample Data
TA: Therapeutic Area
17. Goal - Analysing data
Based on Impact Factor Journal Ranking
https://sciencetechblog.files.wordpress.com/2011/05/journal-impact-factors-2008_1.jpg
18. BI - Publication Tracker
Why is BI using semantic technology for publication tracking?
19. Scientific Publication
What does it look like?
http://www.ncbi.nlm.nih.gov/pubmed/26210363
20. BI - Publication Tracker
Overview Data & Technology
• Data & Data Storage
  • .xml files from Ovid http://www.ovid.com
  • MS Excel (sheet)
  • Virtuoso Universal Server as a triple store http://virtuoso.openlinksw.com/
• Systems
  • PoolParty (Thesaurus Server) https://www.poolparty.biz/portfolio-item/poolparty-thesaurus-server
  • PoolParty Graph Search https://www.poolparty.biz/tag/graph-search
  • SPARQL http://www.w3.org/TR/sparql11-query/#docResultDesc
  • Spring https://de.wikipedia.org/wiki/Spring_(Framework)
  • Maven http://maven.apache.org
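To make the "triple store" item concrete, the following is a minimal sketch of how one publication record could be serialised as N-Triples before loading it into a store such as Virtuoso. The namespace, the property names and the helper `to_ntriples` are assumptions made for illustration only; the slides do not show the actual data model.

```python
from typing import Dict, List

# Hypothetical namespaces, chosen only for this illustration.
BASE = "http://example.org/publication/"
PROP = "http://example.org/schema/"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def to_ntriples(pub_id: str, fields: Dict[str, str]) -> List[str]:
    """Turn one publication record into N-Triples lines (one per field)."""
    subject = f"<{BASE}{pub_id}>"
    return [
        f'{subject} <{PROP}{name}> "{escape_literal(value)}" .'
        for name, value in sorted(fields.items())
    ]


if __name__ == "__main__":
    # Made-up record, not taken from the BI data.
    record = {"title": 'A "sample" article', "journal": "Example Journal"}
    for line in to_ntriples("pub-1", record):
        print(line)
```

Plain string literals keep the sketch short; a real model would probably use typed literals and IRIs for journals and authors so that they can be linked.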
21. BI - Publication Tracker
Workflow / Pipeline
Current awareness searches
Alert Profile (Search Terms)
Scheduled Alerts
Auto-alerts from Ovid (.xml file)
SIC Crawler
Content Enrichment
Data Curation
Thesaurus Management System
Admin User Interface
Impact Factor List: "reflects the average number of citations to recent articles published in a journal"
Virtuoso Database
BI Publication Tracker User Interface
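The first pipeline step, reading an alert file, can be sketched as follows. The XML layout (`records`, `record`, `title`, `journal`, `author`) is a hypothetical stand-in: the real Ovid export uses its own element names, which the slides do not show, and `parse_alert` is not the SIC Crawler itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical alert layout; a real Ovid export will look different.
SAMPLE_ALERT = """
<records>
  <record id="r1">
    <title>Example article</title>
    <journal>Example Journal</journal>
    <author>Doe J</author>
    <author>Roe A</author>
  </record>
</records>
"""


def parse_alert(xml_text: str) -> List[Dict[str, object]]:
    """Extract id, title, journal and author list from each record element."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iter("record"):
        records.append({
            "id": rec.get("id"),
            "title": (rec.findtext("title") or "").strip(),
            "journal": (rec.findtext("journal") or "").strip(),
            "authors": [a.text.strip() for a in rec.findall("author") if a.text],
        })
    return records


if __name__ == "__main__":
    for record in parse_alert(SAMPLE_ALERT):
        print(record)
```

Each resulting dictionary is the kind of intermediate record that the later enrichment and curation steps would work on.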
22. Data Curation & Analysis
Challenges
Cleaning noisy data from ovid.xml
• Authors
• Institutions
Adding BI internal data
• Division
• Therapeutic Area
• Location
Adding external data
• Impact Factors
Lightweight, highly scalable user interface
23. Data Curation - Challenge
Data from .xml: noisy data
Cleaning noisy data from ovid.xml
• Authors
• Institutions
Adding BI internal data
• Division
• Therapeutic Area
• Location
Building a web application
• Lightweight, highly scalable user interface
PoolParty Thesaurus Server, Admin GUI, User GUI
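As an illustration of what "noisy" author and institution data means in practice, the sketch below groups spelling variants of a name under one comparison key. This is not the PoolParty-based curation described in the slides; the functions `normalize_name` and `group_variants` and the sample names are assumptions used only to show the kind of variants that have to be merged.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_name(raw: str) -> str:
    """Build a comparison key: drop accents, punctuation, case and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = re.sub(r"[^\w\s]", " ", without_accents)
    return " ".join(cleaned.lower().split())


def group_variants(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw name strings that share the same normalized key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Made-up variants of the same author name.
    print(group_variants(["Müller, H.", "Muller H", "Doe J."]))
```

A purely string-based key like this will also merge different people with the same name, which is one reason a curated thesaurus with a human-facing admin interface is useful.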
33. Data Curation - Challenge
BI internal data: missing BI internal data
Cleaning noisy data from ovid.xml
• Authors
• Institutions
Adding BI internal data
• Division
• Therapeutic Area
• Location
Building a web application
• Lightweight, highly scalable user interface
PoolParty Thesaurus Server, Admin GUI, User GUI
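Adding BI internal data amounts to joining each publication's authors against an internal source that knows division, therapeutic area and location. A minimal sketch of such a join follows; the directory structure, its keys and all values are hypothetical placeholders, since the slides do not describe the internal data source.

```python
from typing import Dict, List

# Hypothetical internal directory: lower-cased author name -> internal attributes.
INTERNAL_DIRECTORY: Dict[str, Dict[str, str]] = {
    "doe j": {
        "division": "Research",
        "therapeutic_area": "TA-1",
        "location": "Site-A",
    },
}


def enrich(publication: Dict[str, object],
           directory: Dict[str, Dict[str, str]]) -> Dict[str, object]:
    """Return a copy of the publication with internal data attached per author."""
    enriched = dict(publication)
    affiliations: List[Dict[str, str]] = []
    missing: List[str] = []
    for author in publication.get("authors", []):
        info = directory.get(author.lower())
        if info is None:
            # No internal record found: flag the author for manual curation.
            missing.append(author)
        else:
            affiliations.append({"author": author, **info})
    enriched["affiliations"] = affiliations
    enriched["missing_internal_data"] = missing
    return enriched


if __name__ == "__main__":
    print(enrich({"id": "r1", "authors": ["Doe J", "Roe A"]}, INTERNAL_DIRECTORY))
```

Keeping the unmatched authors in a separate list reflects the "missing BI internal data" challenge: those entries still need a curator to complete them.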
35. Data Visualisation & Analysis
Challenges
Cleaning noisy data from ovid.xml
• Authors
• Institutions
Adding BI internal data
• Division
• Therapeutic Area
• Location
Adding external data
• Impact Factors
Lightweight, highly scalable user interface
PoolParty Thesaurus Server, Admin GUI, Visualisation & Analysis
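The kind of analysis named in the goals (publications and impact factors per therapeutic area) can be sketched as a simple aggregation. The function name, the input shapes and every value in the example are invented for illustration and are not BI data; treating a journal without a listed impact factor as 0.0 is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_by_ta(
    publications: Iterable[Tuple[str, str]],
    impact_factors: Dict[str, float],
) -> Dict[str, Tuple[int, float]]:
    """Map each therapeutic area to (publication count, summed impact factor).

    `publications` holds (therapeutic_area, journal) pairs. Journals missing
    from `impact_factors` contribute 0.0 to the sum (an assumption).
    """
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for ta, journal in publications:
        counts[ta] += 1
        totals[ta] += impact_factors.get(journal, 0.0)
    return {ta: (counts[ta], totals[ta]) for ta in counts}


if __name__ == "__main__":
    # Made-up sample values.
    pubs = [("TA-1", "J1"), ("TA-1", "J2"), ("TA-2", "J1")]
    factors = {"J1": 2.0, "J2": 3.5}
    print(summarize_by_ta(pubs, factors))
```

With the data in a triple store, the same grouping would more naturally be expressed as a SPARQL aggregate query; the Python version only shows the shape of the result.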
37. Conclusions
Cleaning noisy data from ovid.xml
• Authors
• Institutions
Adding & linking BI internal data
• Division
• Therapeutic Area
• Location
Adding & linking external data
• Impact Factors
Lightweight, highly scalable user interface
PoolParty Thesaurus Server, Admin GUI, Visualisation & Analysis
38. Conclusions
• Linked Data:
  Reuse of the data (SPARQL endpoint)
  → Domain Expert
• Data Network Solution
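Reuse through a SPARQL endpoint means that any client can send a query over HTTP. The sketch below only builds such a request URL using the standard `query` parameter of the SPARQL protocol. The endpoint address is an assumed local Virtuoso-style URL, and the `ex:` vocabulary in the query is hypothetical; neither is taken from the BI system.

```python
from urllib.parse import urlencode

# Assumed endpoint address, for illustration only.
ENDPOINT = "http://localhost:8890/sparql"

# Hypothetical vocabulary: counts publications per therapeutic area.
QUERY = """
PREFIX ex: <http://example.org/schema/>
SELECT ?ta (COUNT(?pub) AS ?publications)
WHERE { ?pub ex:therapeuticArea ?ta . }
GROUP BY ?ta
ORDER BY DESC(?publications)
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL with the query URL-encoded."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
```

A client would then fetch this URL and ask for a result format through the HTTP Accept header, for example `application/sparql-results+json`.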
39. Outlook:
What we want to achieve in the next steps
User Perspective
• GUI
• Export of search results
Technology
• Optimization of data import
• Using Ovid RSS feeds for updates