1. Elsevier Health Sciences
Smart Content Drives Smart Applications
Linked Data in HCLS for Commercial Applications
Semantic Web for Health Care and Life Sciences Summer School, W3C@MIT
August 29, 2012
Iker Huerga
Sr. Semantic Software Engineer
i.huerga@elsevier.com
@ihuerga
Elsevier Proprietary and Confidential
3. The Challenge
Providing doctors/researchers with
the right information at the right
moment to make the best decisions
Elsevier Health Sciences | Proprietary and Confidential
4. How to solve it?
5. How to Solve it
•Step 1
Making Elsevier’s authoritative Health Care and Life
Sciences content “smarter”
•Step 2
Enriching Elsevier’s content by integration with third party
data
•Step 3
Creating interfaces to provide fast discoverability of the
most relevant answers and more intuitive searching.
We Need Semantic Web for all This
6. Introducing
Smart Content
7. Taxonomy-Powered Content = Smart Content
Content with applied taxonomy
Content today with
structured XML
Copyright 2011 Outsell Gilbane Services, Inc. http://www.outsellinc.com
http://gilbane.com/xml/2009/11/what-is-smart-content.html#ixzz0hnuRhaBc
8. Smart Content At Elsevier
Inputs
•Elsevier content: Text, Tables, Images
•Entities, concepts and relationships
•Elsevier knowledge organization systems
•Linked data from partners and the Web
Smart Content Applications
Better discovery through semantic search & navigation
•Faceted search & browse
•Ontology-driven navigation
•Task-specific results
•Personalized/localized results
•Question answering
•Link to evidence-based content
Better understanding through analysis and visualization
•Tag clouds
•Heatmaps
•Streamgraphs
•Scatterplots
•Time series
•Animations
New knowledge through aggregation and synthesis
•Topic pages
•Social network maps
•Geolocation maps
•Data mashups
•Text mining reports
9. Introducing EMMeT (Elsevier Merged Medical Taxonomy)
Synonyms
•Medical Name: Malignant Neoplasm of the Breast
•Consumer Friendly Name: Breast Cancer
•Synonyms: Malignant Tumor of Breast, Malignant Breast Neoplasm, Breast Ca
•Codes: ICD9 – 174.9, MeSH – D001943, SNOMED-CT – 190121004
•Semantic Type/Group: Neoplastic Process/Disease
Parent Terms
• Breast Disorders
• Cancer of the Thorax
• Mammary Neoplasms
• More…
Children Terms
• Breast Sarcoma
• Familial Breast Cancer
• Malignant lymphoma of the Breast
• Malignant Neoplasm of the breast outer quadrant
• More…
Semantic Relationships
•Symptoms: Breast Lump, Nipple Retraction, …
•Diagnostic Procedures: Mammography, Breast Biopsy, …
•Treatment Procedures: Chemotherapy, Mastectomy, …
•Medications: Tamoxifen, Doxorubicin, …
•Risk Factors: Family History, Genetics, Predisposition, …
•Prevention: Screening, Preemptive Mastectomy, …
•Complications: Metastatic Cancer, …
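As a rough illustration of how one such concept might be held in memory, the sketch below models the Breast Cancer entry from this slide as a Python dataclass. The class name, field names and the `labels` helper are assumptions made for illustration only; they are not EMMeT's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical in-memory view of one taxonomy concept (not the real EMMeT model)."""
    medical_name: str
    consumer_name: str
    synonyms: list = field(default_factory=list)
    codes: dict = field(default_factory=dict)          # coding system -> code
    semantic_type: str = ""
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    relationships: dict = field(default_factory=dict)  # relation name -> related terms

    def labels(self):
        """Every name a search box could match, lower-cased."""
        names = [self.medical_name, self.consumer_name, *self.synonyms]
        return {name.lower() for name in names}


# Values copied from the slide; elided items ("…", "More…") are left out.
breast_cancer = Concept(
    medical_name="Malignant Neoplasm of the Breast",
    consumer_name="Breast Cancer",
    synonyms=["Malignant Tumor of Breast", "Malignant Breast Neoplasm", "Breast Ca"],
    codes={"ICD9": "174.9", "MeSH": "D001943", "SNOMED-CT": "190121004"},
    semantic_type="Neoplastic Process/Disease",
    parents=["Breast Disorders", "Cancer of the Thorax", "Mammary Neoplasms"],
    children=["Breast Sarcoma", "Familial Breast Cancer"],
    relationships={
        "Symptoms": ["Breast Lump", "Nipple Retraction"],
        "Medications": ["Tamoxifen", "Doxorubicin"],
    },
)

print(sorted(breast_cancer.labels()))
```

Collecting all names in one set is one simple way to let a consumer-friendly query and a clinical term resolve to the same concept.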
10. Automated Indexing: Weighted Tags for Better Search
Article-level SMART Content tags help
confirm relevance and provide a topical
overview about a piece of content.
Paragraph-level SMART Content tags
uncover highly relevant information not
necessarily evident from the title or
abstract alone.
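A minimal sketch of how weighted tags at both levels could feed a ranking is given below. The article ids, the weights and the rule of scoring an article by its best matching tag are all assumptions for illustration; the slide does not describe the actual scoring.

```python
# Hypothetical tag data: each article carries article-level tags plus
# one tag dictionary per paragraph. Weights are made-up values in [0, 1].
ARTICLES = {
    "article-1": {
        "article": {"Breast Cancer": 0.9},
        "paragraphs": [{"Tamoxifen": 0.4}],
    },
    "article-2": {
        "article": {"Osteoporosis": 0.8},
        "paragraphs": [{"Tamoxifen": 0.7}, {"Osteoporosis": 0.6}],
    },
}


def score(entry, concept):
    """Best weight for `concept` across article-level and paragraph-level tags."""
    weights = [entry["article"].get(concept, 0.0)]
    weights += [tags.get(concept, 0.0) for tags in entry["paragraphs"]]
    return max(weights)


def rank(articles, concept):
    """Article ids that mention `concept`, ordered by descending score."""
    scored = {aid: score(entry, concept) for aid, entry in articles.items()}
    hits = [aid for aid, value in scored.items() if value > 0]
    return sorted(hits, key=lambda aid: scored[aid], reverse=True)


print(rank(ARTICLES, "Tamoxifen"))
```

In this toy data "article-2" is tagged as an osteoporosis article, yet a paragraph-level tag still surfaces it for a Tamoxifen query, which is the effect the slide describes.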
11. Standards
The Key Piece
12. The Satellite: a Linked Data Compliant Data Format
• Motivations:
–Help answer research questions
–Direct material to interested readers
–Extract disparate facts from the literature to create knowledge bases
• Technical Requirements:
–Use of open-standards-based metadata frameworks: SKOS, DCMI and SWAN
–Need for a common model to represent ontological annotations
–Data will be transferred from suppliers to Elsevier and back
–QA of tags (aka Provenance)
–Some people have RDF knowledge, but they are a small proportion
• Satellite Specification First Version
–Use RDF/XML serialization
–Use XML Schemas to validate the syntax so that documents which validate will produce correct RDF
–Use the extensive XML-capable infrastructure, QA tools, etc.
13. The Satellite Format: a Linked Data Compliant Data
14. The Satellite Format: a Linked Data Compliant Data
•What we have learned so far
–RDF/XML has some limitations
• Not all RDF graphs can be serialized in XML (QNames, Unicode characters)
• RDF/XML has no support for multiple (named) RDF graphs; at the moment one satellite is one RDF graph in the LDR
• Complexity of RDF/XML abbreviation rules
• Can’t put attributes on the predicates
–An XML-capable infrastructure does not necessarily entail an RDF/XML-capable infrastructure
• Many XML tools can’t be used with RDF/XML
• Multiple different serializations for the same RDF Graph exist
• XML Schema validation makes the specification less flexible
It’s time to move towards a more “RDF friendly” serialization
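The point about multiple serializations can be illustrated with a small sketch. The two documents below are textually different RDF/XML (one uses a property element, the other the property-attribute abbreviation) but describe the same triple. The `triples` function is a deliberately tiny reader that understands only these two shapes, not a real RDF/XML parser, and the example.org URI is made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Same statement, written with a property element ...
DOC_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dct:title>Breast Cancer</dct:title>
  </rdf:Description>
</rdf:RDF>"""

# ... and with the property-attribute abbreviation.
DOC_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/doc/1" dct:title="Breast Cancer"/>
</rdf:RDF>"""


def expand(clark_name):
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    return clark_name[1:].replace("}", "")


def triples(rdfxml):
    """Read literal-valued triples from the two shapes above (toy subset only)."""
    found = set()
    root = ET.fromstring(rdfxml)
    for node in root.findall(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        for name, value in node.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):       # skip rdf:about itself
                found.add((subject, expand(name), value))
        for child in node:
            found.add((subject, expand(child.tag), child.text))
    return found


print(DOC_A == DOC_B)                    # the XML text differs
print(triples(DOC_A) == triples(DOC_B))  # the RDF content does not
```

A generic XML tool such as an XPath query or an XML Schema sees two different trees here, which is one reason XML tooling does not carry over to RDF/XML automatically.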
15. The Satellite Format: a Linked Data Compliant Data
•Turtle as the RDF serialization format
–It is becoming the de facto serialization for RDF
–It makes RDF much more ‘human friendly’
–Gives us the flexibility we need for the next satellite generation
–All the Libraries we are currently using support Turtle
–It follows the triple pattern syntax of SPARQL, more convenient for querying
•Steps to the transition
–Both serializations will coexist for a period of time
–Internal tools, Validation, QA, etc., need to be adapted to ‘understand’ Turtle
–Tools for transforming RDF/XML into Turtle need to be provided to the suppliers
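To give a feel for why Turtle reads more easily and mirrors SPARQL's triple patterns, the sketch below writes a small description as Turtle text using only the Python standard library. The `to_turtle` helper, the example.org URIs and the chosen properties are illustrative assumptions and not the satellite specification; objects are passed already formatted as Turtle terms, and no literal escaping is done.

```python
# Prefix table used for the @prefix header (illustrative choice).
PREFIXES = {"dct": "http://purl.org/dc/terms/"}


def to_turtle(subject, statements):
    """Serialize (predicate, object) pairs about one subject as Turtle text.

    Objects must already be Turtle terms: '<uri>' or '"literal"'.
    """
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{subject}>")
    body = [f"    {predicate} {obj}" for predicate, obj in statements]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


satellite = to_turtle(
    "http://example.org/doc/1",
    [
        ("dct:title", '"Breast Cancer"'),
        ("dct:subject", "<http://example.org/concept/breast-cancer>"),
    ],
)
print(satellite)
```

Each output line is a subject, predicate, object pattern, the same shape used inside a SPARQL WHERE clause.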
16. How is all this transformed
into Commercial applications
17. The Linked Data Repository
• The LDR stores metadata describing Non Information Resources [httpRange14]
• The LDR provides a rich semantic layer on top of IR and enables search and discovery of
metadata
• Extends Elsevier extracted knowledge by interlinking data with other related sources of
content from partners and the Web, using the Web as its API
• Optimized for a high volume of RDF I/O operations
• Provides service layer APIs for ease of integration
• Opens up discovery and utility of content beyond searchable documents
18. Represent Enhancements and Vocabularies In RDF
•Creation of Satellite Standards
–Linked data compliant RDF representing metadata objects
–Leverage common namespaces from dct, pav, rdf, skos
–Taxonomies in SKOS to enhance portability in the linked data world
–Subject tagging against a vocabulary representing extracted knowledge
–Concept URIs that can be equated to URIs in linked data
•Example RDF Statements
–Tags from a taxonomy for a given document
–Document sections relevant to a given concept
–Document sections providing answers to a given question
–Genes mentioned in a given document
–Documents supporting or disputing conclusions of a given document
–Concepts in the areas of expertise for a given author
[Diagram labels: Satellites, LDR]
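Some of the example statements above can be pictured as plain subject, predicate, object triples. The sketch below stores a few in a Python list and answers "which documents are tagged with a given concept" by pattern matching. All identifiers and data are invented for illustration, the `ex:` predicates are hypothetical, and only `dct:subject` corresponds to a real Dublin Core term.

```python
# Illustrative triples; prefixed names stand in for full URIs.
TRIPLES = [
    ("doc:1", "dct:subject", "concept:breast-cancer"),     # tag from a taxonomy
    ("doc:1#sec2", "ex:relevantTo", "concept:tamoxifen"),  # section relevant to a concept
    ("doc:1", "ex:mentionsGene", "gene:BRCA1"),            # gene mentioned in a document
    ("doc:2", "dct:subject", "concept:breast-cancer"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    ]


tagged = [s for s, _, _ in match(TRIPLES, p="dct:subject", o="concept:breast-cancer")]
print(tagged)
```

The wildcard pattern plays the role of a variable in a SPARQL triple pattern; a triplestore would answer the same question over far larger data.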
19. LDR Semantic Infrastructure
[Architecture diagram; component labels recoverable from the slide:]
•Content sources: Elsevier Content, 3rd Party Content, Instit. Content
•Vocabularies and data: EMMeT Vocabulary (SKOS), 3rd Party Vocab, Linked Data
•Smart Content Indexing Pipeline: Tagging and Indexing Services (Concepts, Chapters, Articles, Guidelines, etc.), RDF Generation
•Satellites: Vocab & Annotation, Linked Data Annotation, Asset RDF Data
•Linked Data Pipeline Services (Hadoop): Extract, Transform, RDF Validation, Interlinking, Reasoning, Ontology Svcs, N-Quads, JSON, … (further labels: Semantic, Network, Generation)
•Linked Data Loader (REST), Data Space Services
•Discovery Services (Semantic Knowledgebase): Amazon S3, MongoDB NoSQL, SOLR/SIRE, Virtuoso Triplestore
•Services and APIs: Product-specific Search Index, Smart Content Atom Feed, Access & Entitlements, Admin & Monitoring, Analytics, Discovery Svc API (REST), Ontology Service, SPARQL, Alerts
•AWS Cloud Management
20. Clinical Key - the most clinically relevant answers
21. Clinical Key - the most clinically relevant answers
22. Comprehensive Drug Research
• Moving world-class content online to Point of Care.
• Extracted knowledge is linked for further enrichment.
• Information is condensed, immediate and actionable.
23. Linking Patient Data To Evidence-Based Research
- Discover knowledge from research relevant to a
patient profile
- Alerts on FDA Announcements.
24. SciVerse Widgets Powered by Smart Content
Article search on ScienceDirect results in related
specialty content recommendations available from
The Lancet Journal.
25. Questions
Iker Huerga
i.huerga@elsevier.com