Life science’s main asset is its data. Data forms the basis of scientific decision making, and its availability via electronic systems is a prerequisite for collaborative work and successful innovation. While more data is being published as linked (open) data, huge amounts remain unused in internal data silos, such as various ELNs, because of substantial integration efforts and data quality issues. Since the overwhelming majority of data is unstructured, information extraction with corresponding classification and semantic labeling of content is required. To generate value from your ELN data, a solid informatics strategy is needed to ensure data quality and streamline analytics. Semantic technologies are a key enabler for overcoming existing limitations.
4. Slide 4
OSTHUS Group
Connecting Data, People and Organizations
Onsite Support
Lab & Data
Implementation & Integration
Business Process
Science
Installation & Roll-Out
Maintenance & Support
Requirements Engineering
Cutting edge in R&D
Consulting, Solutions and Services
Global Partner for industry
Independent from vendors and platforms
Hosted Services
Digital Life Science Consulting Integrated Solutions Managed Services
Data Science Lab Informatics
12. Slide 12
Use of ELN
Simple documentation of performed experiments
Capturing and processing of measurement data
Import of result data from instruments
Compare measurements
Check consistency
Search and retrieval
Reporting & IP
14. Slide 14
Value of ELN data
Documentation and IP
Answering scientific questions
Answering business questions
Multiple use within different scenarios
15. Slide 15
The Drug Discovery Life-Cycle
publications
organisms
substances
molecules
devices
LinkedCT
DrugBank
drug side effects
genes
proteins
pharmacology
social media
patient reports
locations, organizations
market analysis
"Since I take medication X my stomach feels better, however I am always so tired."
experiments (pre-clinical)
clinical trials
production
use of medication
16. Slide 16
Roadmap
Code (Lists) Terms (Soil, Plant, etc.)
Controlled Vocabulary (Agreed Upon Terms)
Taxonomy (Hierarchy)
Thesaurus (Preferred Labels, Synonyms, etc.)
RDF Models (Triples as Graphs)
OWL Ontologies (RDF + Axioms)
Reasoning (Rule-based Logics: Discover New Patterns)
Ontologies and Reasoning add Axioms and Advanced Logic
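The last two roadmap steps (triples as graphs, then rule-based reasoning) can be illustrated with a minimal sketch. It assumes plain Python tuples in place of a real RDF store, and all class and instance names (Aphid, aphid_sample_1) are hypothetical; the two rules shown are only a small subset of what an OWL reasoner would apply.

```python
# Triples as (subject, predicate, object). All names are illustrative assumptions.
triples = {
    ("SuckingInsect", "subClassOf", "Insect"),
    ("Aphid", "subClassOf", "SuckingInsect"),
    ("aphid_sample_1", "type", "Aphid"),
}

def infer(triples):
    """Apply two rules repeatedly until no new triple appears (forward chaining)."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    # Rule 1: subClassOf is transitive.
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    # Rule 2: an instance of a class is also an instance of its superclasses.
                    new.add((a, "type", d))
        if new <= inferred:
            return inferred
        inferred |= new

closure = infer(triples)
derived = sorted(closure - triples)  # only the triples that were not stated explicitly
for triple in derived:
    print(triple)
```

The derived triples are facts nobody entered by hand, which is the sense in which reasoning can "discover new patterns" on top of a plain graph.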
17. Slide 17
The 4V’s of Big Data
Volume: normally the focus, but Big Data analysis is more than just size
Velocity: performance is critical to success
Variety: data complexity is increasing (model complexity)
Veracity: uncertainty abounds and requires statistics and probabilities
Majority of Big Data analytics approaches treat these two V’s
Semantic technologies provide clear advantages
Mathematical clustering techniques provide clear advantages
18. Slide 18
Words, Terms and Concepts
isobutylphenylpropanoic acid
word
term = A compound of words with a specific meaning in a certain context.
concept = “An abstract entity signifying a general characterizing idea or universal which acts as a category for instances. The unit of semantics (meaning), the node in some mental or knowledge organization system.” [Obrst2010]
19. Slide 19
Synonyms are …
different terms which represent the same concept:
TraumaDolgit Gel
IP82
Ibuprofen, Copper (2+) Salt
Calcium Salt Ibuprofen
Ibuprofen, Sodium Salt
Ibuprofen-Zinc
Magnesium Salt Ibuprofen
isobutylphenylpropanoic acid
IP-82
Ibuprofen, Zinc Salt
Motrin
Benzeneacetic acid, alpha-methyl-4-(2-methylpropyl)-
Ibumetin
Ibuprofen I.V. Solution
Potassium Salt Ibuprofen
Rufen
alpha-Methyl-4-(2-methylpropyl)benzeneacetic Acid
Trauma Dolgit Gel
Nuprin
Brufen
…
Sources: MeSH Thesaurus, ChEBI Ontology
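Several entries in the list differ only in spacing or hyphenation (IP82 / IP-82, TraumaDolgit Gel / Trauma Dolgit Gel). A minimal sketch of how such variants could be mapped to one concept is shown here; the concept identifier and the normalization rule are assumptions for illustration, not taken from MeSH or ChEBI.

```python
import re

def normalize(term: str) -> str:
    """Lowercase and drop everything except letters and digits so spelling variants collide."""
    return re.sub(r"[^a-z0-9]", "", term.lower())

# Hypothetical concept identifier; a real thesaurus would supply its own.
CONCEPT = "concept:ibuprofen"

synonyms = ["Motrin", "Nuprin", "Brufen", "Rufen", "Ibumetin", "IP-82",
            "Trauma Dolgit Gel", "isobutylphenylpropanoic acid"]
index = {normalize(s): CONCEPT for s in synonyms}

def lookup(term: str):
    """Return the concept for a term, or None if the term is unknown."""
    return index.get(normalize(term))

print(lookup("IP82"))
print(lookup("TraumaDolgit Gel"))
print(lookup("Aspirin"))
```

Normalization this aggressive can also merge terms that are genuinely different, so a curated synonym list remains the authority and the normalized key is only a lookup aid.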
20. Slide 20
Abbreviation example
• Semantics has its origins in philosophy and is generally understood as the abstract study of meaning
• It is distinguished from syntax, which is the rules-based grammar of a language
“Washington”
22. Slide 22
Textual Description
Ibuprofen, from isobutylphenylpropanoic acid, is a
nonsteroidal anti-inflammatory drug (NSAID) used for treating
pain, fever, and inflammation. This includes painful menstrual
periods, migraines, and rheumatoid arthritis. About 60% of
people improve with any given NSAID, and it is recommended
that if one does not work then another should be tried. It may
also be used to close a patent ductus arteriosus in a premature
baby. It can be used by mouth or intravenously. It typically
begins working within an hour.
Common side effects include heartburn and a rash. Compared
to other NSAIDs it may have fewer side effects such as
gastrointestinal bleeding. It increases the risk of heart failure,
kidney failure, and liver failure…
Source: https://en.wikipedia.org/wiki/Ibuprofen
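Semantic labeling of a text like this one can be sketched with a simple dictionary lookup. The example assumes a tiny hand-made lexicon with hypothetical labels (Drug, DrugClass, Symptom, SideEffect) and a shortened version of the text above; production pipelines typically use full thesauri and more robust matching.

```python
import re

# Hypothetical term-to-label dictionary; a real pipeline would load it from a thesaurus or ontology.
LEXICON = {
    "ibuprofen": "Drug",
    "nsaid": "DrugClass",
    "pain": "Symptom",
    "fever": "Symptom",
    "heartburn": "SideEffect",
    "rash": "SideEffect",
}

# One alternation over all lexicon terms, matched as whole words and case-insensitively.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, LEXICON)) + r")\b", re.IGNORECASE)

def annotate(text):
    """Return (start, end, matched text, label) for every lexicon term found in the text."""
    return [(m.start(), m.end(), m.group(0), LEXICON[m.group(0).lower()])
            for m in PATTERN.finditer(text)]

text = ("Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) used for treating "
        "pain, fever, and inflammation. Common side effects include heartburn and a rash.")
annotations = annotate(text)
for start, end, surface, label in annotations:
    print(f"{start:3d}-{end:3d}  {surface:10s}  {label}")
```

Because matching is done on whole words, "painful" would not be tagged as "pain"; whether that is desirable depends on the use case.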
25. Slide 25
Semantic Networks
A simple, non-formal way to express the meaning of a concept
through relations (links) to other concepts.
Nodes: ibuprofen, medication, antipyretics, ibuproxam, Motrin, C13H18O2, pain, rash, symptom, cyclooxygenase 2
Relations: is-a, treats, broader, narrower, has-formula, trade name, may-has-side-effect, inhibitor-of
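A semantic network of this kind can be held as a list of labeled edges. The sketch below uses the node and relation labels from the slide, but which node each relation connects is an assumption, since the diagram layout is not preserved in the text.

```python
from collections import defaultdict

# Edge list (source, relation, target). Labels come from the slide;
# the pairing of relations with nodes is an assumption.
EDGES = [
    ("ibuprofen", "is-a", "medication"),
    ("ibuprofen", "treats", "pain"),
    ("pain", "is-a", "symptom"),
    ("ibuprofen", "has-formula", "C13H18O2"),
    ("ibuprofen", "trade name", "Motrin"),
    ("ibuprofen", "may-has-side-effect", "rash"),
    ("ibuprofen", "inhibitor-of", "cyclooxygenase 2"),
    ("ibuprofen", "broader", "antipyretics"),
    ("ibuprofen", "narrower", "ibuproxam"),
]

# Adjacency list: concept -> list of (relation, target).
network = defaultdict(list)
for source, relation, target in EDGES:
    network[source].append((relation, target))

def related(concept, relation):
    """Return all targets linked from a concept by the given relation."""
    return [t for r, t in network.get(concept, []) if r == relation]

print(related("ibuprofen", "treats"))
print(related("pain", "is-a"))
```

Unlike an ontology, nothing here constrains what a relation means or which nodes it may connect, which is why the slide calls the approach non-formal.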
26. Slide 26
Taxonomies and Ontologies
Opportunity:
• Many existing taxonomies available
• Company-specific adaptations: additional classes, synonyms, relations etc.
Insect
Sucking Insect
Leaf Miner
has pest
28. Slide 28
Why Semantics Matters for Data Analytics
Big Data approaches require proper metadata and terminologies to integrate information well
Relationships matter in the data
Understanding perspective (context) is crucial for success in today’s world
Semantics provides better data models/schemas
29. Slide 29
Smart Labs for the 21st Century
Smart labs in the future will provide the enterprise with:
Integrated Data: common reference data structures (vocabularies)
Sharable Data: easier interaction across teams and business units
Scalability: Big data applications that can be highly elastic
Conceptual Representations: context and perspective are captured
Advanced Analytics: complex & automated problem-solving capabilities
30. Slide 30
Reference Data Management: ensure a common language between your applications
ELN
DWH
LIMS
Instruments
Inventory
Reporting tools
Reference Data Service
• provides shared vocabulary
• provides synonyms
• provides mapping
• …
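The three functions listed above (shared vocabulary, synonyms, mapping) can be sketched as a toy in-memory service. The class name, method names, concept ID, and local codes are all hypothetical; an actual reference data service would sit behind an API and a persistent store.

```python
class ReferenceDataService:
    """Toy in-memory service: one shared vocabulary plus per-application code mappings."""

    def __init__(self):
        self._preferred = {}  # concept id -> preferred label (shared vocabulary)
        self._synonyms = {}   # lowercased name -> concept id
        self._local = {}      # (application, local code) -> concept id

    def add_concept(self, concept_id, label, synonyms=()):
        self._preferred[concept_id] = label
        for name in (label, *synonyms):
            self._synonyms[name.lower()] = concept_id

    def map_code(self, application, code, concept_id):
        self._local[(application, code)] = concept_id

    def resolve(self, name):
        """Shared vocabulary and synonyms: any known name -> concept id, else None."""
        return self._synonyms.get(name.lower())

    def translate(self, source_app, code, target_app):
        """Mapping: a code used in one application -> the code used in another, else None."""
        concept = self._local.get((source_app, code))
        if concept is None:
            return None
        for (app, local_code), cid in self._local.items():
            if app == target_app and cid == concept:
                return local_code
        return None

# Hypothetical identifiers for illustration only.
service = ReferenceDataService()
service.add_concept("C001", "ibuprofen", synonyms=["Motrin", "Brufen"])
service.map_code("ELN", "cmpd-17", "C001")
service.map_code("LIMS", "SUB_0042", "C001")

print(service.resolve("motrin"))
print(service.translate("ELN", "cmpd-17", "LIMS"))
```

Each application keeps its own local codes, and only the mapping to the shared concept has to be maintained centrally.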
31. Slide 31
References and more information
OSTHUS Webinar
(https://www.youtube.com/watch?v=Drm3r3BVkxE)
Allotrope Foundation
(http://www.allotrope.org/)
SmartLab 2016
(https://www.youtube.com/watch?v=maA1nQEedos)
32. Slide 33
Thank you for your attention!
Heiner Oberkampf
Tel.: +49 241-94314-490
Fax: +49 241-94314-19
Email: heiner.oberkampf@osthus.com
Web: www.osthus.com
Friedrich Hübner
Tel.: +49 241-94314-476
Fax: +49 241-94314-19
Email: friedrich.huebner@osthus.com
Web: www.osthus.com