NISO Webinar: Return on Investment (ROI) in Linking the Semantic Web

Linked Data for Smart Content

Ellen Hays, Elsevier Labs
e.hays@elsevier.com

Presented at: NISO Webinar on Semantic Web Linking
28 September 2011

Why Smart Content?

Elsevier’s readers want more than text and images, that is, more than simply an online rendition of what we print. They want:

•  Semantically enhanced content, such as mashups that combine information from diverse sources and in diverse media
•  The ability to do semantically-motivated search
•  Source data, and the tools to mine it effectively for more information
•  I.e., information, presented in ways that make it straightforward to use and understand

The challenge

How to do semantic enhancement at scale for STM publishing?

•  In harmony with our culture and legacy
•  Across the breadth of our content
•  Within an ecosystem of authors, institutions, publishers, content suppliers, and funding agencies

Smarter Content

•  Elsevier content: Text, Tables, Images
•  Related Elsevier content and data
•  Linked data from partners and the Web

Concepts: Metadata, Entities, Relationships

Applied Smart Content

Better discovery
•  Faceted search & browse
•  Ontology-driven navigation
•  Task-specific results
•  Personalized/localized results
•  Question answering

Better understanding
•  Tag clouds
•  Heatmaps
•  Streamgraphs
•  Scatterplots
•  Time series
•  Animations

Actionable, persuasive knowledge
•  Topic pages
•  Social network maps
•  Geolocation maps
•  Data mashups
•  Text mining reports

Content enrichment

•  Concepts and relations between concepts are identified in text, compared to a controlled vocabulary or semantic model, and the resulting information is stored as RDF in annotation files
•  The storage mechanism for this information is the Elsevier Linked Data Repository (LDR)

Annotation labels shown on the example: Title, Disease, Clinical finding, Drugs, Source

Example text:

Evaluation and management of delirium in hospitalized older patients

Delirium is common in hospitalized older patients and may be a symptom of a medical emergency, such as hypoxia or hypoglycemia. It is characterized by an acute change in cognition and attention, although the symptoms may be subtle and usually fluctuate throughout the day. This heterogeneous syndrome requires prompt recognition and evaluation, because the underlying medical condition may be life threatening. Risk factors for delirium include visual impairment, previous cognitive impairment, severe illness, and an elevated blood urea nitrogen/serum creatinine ratio. Interventions that have been shown to reduce the incidence of delirium in at-risk hospitalized patients include repeated reorientation of the patient to person and place, promotion of good sleep hygiene, early mobilization, correction of dehydration, and the minimization of unnecessary noise and stimuli. The treatment of delirium centers on the identification and management of the medical condition that triggered the delirious state. Nonpharmacologic interventions may be beneficial, but antipsychotic agents may be needed when the cause is nonspecific and other interventions do not sufficiently control symptoms such as severe agitation or psychosis. Although delirium is a temporary condition, it may persist for several months in the most vulnerable patients. Patient outcomes at one year include a higher mortality rate and a lower level of functioning compared with age-matched control patients. Copyright © 2008 American Academy of Family Physicians.
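As a rough illustration of how identified concepts could be recorded as RDF statements, the sketch below turns a few mentions from the delirium abstract into subject-predicate-object triples and prints them in an N-Triples-like form. The `http://example.org/` namespace, the `mentions` and `inCategory` predicates, and the pairing of each term with a label are all assumptions made for this example; the slides do not show the actual annotation schema.

```python
from urllib.parse import quote

EX = "http://example.org/"  # hypothetical namespace, not Elsevier's

# Assumed pairing of mentions with the label names shown on the slide.
MENTIONS = [
    ("delirium", "Disease"),
    ("hypoxia", "Clinical finding"),
    ("antipsychotic agents", "Drugs"),
]

def mention_triples(doc_uri, mentions):
    """Build (subject, predicate, object) triples linking a document to concepts."""
    triples = []
    for term, label in mentions:
        concept = EX + "concept/" + quote(term)
        category = EX + "category/" + quote(label)
        # The document mentions the concept (hypothetical predicate).
        triples.append((doc_uri, EX + "mentions", concept))
        # The concept belongs to a category (hypothetical predicate).
        triples.append((concept, EX + "inCategory", category))
    return triples

def to_ntriples(triples):
    """Serialize triples as one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

doc = EX + "doc/delirium-review"
annotation = mention_triples(doc, MENTIONS)
print(to_ntriples(annotation))
```

Keeping the annotation as plain statements about the document, rather than as markup inside it, is what allows it to be stored and queried separately from the text.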

Guiding principles

•  Leverage our existing content production workflow and infrastructure
•  Acknowledge a deep dependence on subject matter expertise, third parties and the Web for content enhancement and knowledge organization systems
•  Deliver benefits across the complementary use cases of researcher and practitioner

Current approach

•  Embrace linked data principles
•  Reuse Web-standard vocabularies, taxonomies, ontologies and entity resources where possible
•  Start with a focus on standards and infrastructure
•  Leverage partners and acquisitions for content enhancement algorithms/capabilities
•  Build out linked data design patterns for application development
•  Explore new product opportunities around linked data

Linked data principles

1.  Use URIs to name things
2.  Use HTTP URIs so they can be looked up
3.  Return useful data when things are looked up
4.  Include links to other things in the returned data
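The four principles can be sketched with a small in-memory stand-in for the Web. In the example below, a dictionary keyed by HTTP URIs plays the role of the servers, `lookup` stands in for an HTTP GET, and `follow_links` walks the links found in each returned description. Every URI, property name, and function name here is invented for illustration.

```python
from collections import deque

# Stand-in for the Web: each HTTP URI names a thing (principles 1 and 2) and
# "dereferences" to a description (principle 3) that links to other URIs
# (principle 4). All URIs and properties below are hypothetical.
WEB = {
    "http://example.org/article/1": {
        "title": "Evaluation and management of delirium",
        "links": ["http://example.org/concept/delirium"],
    },
    "http://example.org/concept/delirium": {
        "label": "Delirium",
        "links": ["http://example.org/concept/hypoxia"],
    },
    "http://example.org/concept/hypoxia": {"label": "Hypoxia", "links": []},
}

def lookup(uri):
    """Stand-in for an HTTP GET; returns None for unknown URIs."""
    return WEB.get(uri)

def follow_links(start_uri):
    """Breadth-first traversal: URIs reachable from start_uri, in visit order."""
    seen, order, queue = {start_uri}, [], deque([start_uri])
    while queue:
        uri = queue.popleft()
        description = lookup(uri)
        if description is None:
            continue  # nothing useful came back for this URI
        order.append(uri)
        for target in description["links"]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

reachable = follow_links("http://example.org/article/1")
print(reachable)
```

Starting from a single article URI, the traversal discovers related concepts purely by following links in the returned data, which is the behaviour the fourth principle is meant to enable.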

“Linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.”

Tennison J, 2010. Why Linked Data for data.gov.uk? http://www.jenitennison.com/blog/node/140

Shotton D, Portwin K, Klyne G, Miles A, 2009. Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol 5(4): e1000361. doi:10.1371/journal.pcbi.1000361

Standards: Content satellites

Content satellites are XML documents containing RDF statements; for example:

•  Tags from a taxonomy for a given document
•  Document sections relevant to a given concept
•  Document sections providing answers to a given question
•  Learning objects compliant with a given state educational standard
•  Genes mentioned in a given document
•  Documents supporting or disputing conclusions of a given document
•  Concepts that are in the areas of expertise for a given author

Goal is to balance expressivity and manageability for semantic enhancement

•  Constrain the RDF serialization to allow existing XML-centric staff, tools, and workflows to accommodate RDF modeling for specific application use cases
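To make the idea of a satellite concrete, here is a minimal sketch of the first case (taxonomy tags for a document) written as RDF/XML using Python's standard `xml.etree.ElementTree`. The RDF namespace is the standard one; using `dcterms:subject` to carry the tags, and the `http://example.org/` document and taxonomy URIs, are assumptions for this example. The constrained serialization Elsevier actually uses is not shown in the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"  # assumed vocabulary choice for tags
ET.register_namespace("rdf", RDF)
ET.register_namespace("dcterms", DCTERMS)

def build_satellite(doc_uri, tag_uris):
    """Build an rdf:RDF element stating that doc_uri has each tag as a subject."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": doc_uri}
    )
    for tag_uri in tag_uris:
        ET.SubElement(
            description, f"{{{DCTERMS}}}subject", {f"{{{RDF}}}resource": tag_uri}
        )
    return root

def read_tags(root):
    """Read the tag URIs back out with ordinary XML tooling."""
    return [el.get(f"{{{RDF}}}resource") for el in root.iter(f"{{{DCTERMS}}}subject")]

# Hypothetical document and taxonomy URIs.
tags = [
    "http://example.org/taxonomy/marine-geology",
    "http://example.org/taxonomy/sedimentology",
]
satellite = build_satellite("http://example.org/doc/42", tags)
xml_text = ET.tostring(satellite, encoding="unicode")
print(xml_text)
```

Because the statements always take the same XML shape, `read_tags` can recover them with a plain element search and no RDF parser, which is the kind of trade-off between expressivity and manageability the slide describes.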

Infrastructure: Linked Data Repository

•  Allows Elsevier platforms and applications to retrieve and store content enhancements
   •  About Elsevier content
   •  About third party content
•  Allows third parties to store content enhancements
   •  About primary and secondary content
•  Provides a REST API for
   •  CRUD operations on satellites as RDF named graphs
   •  Simple, low-expressivity queries across stored named graphs
      •  For <subject>, give me all objects for <property>
      •  Give me all subjects that have <object> for <property>
      •  These can be for sets of subjects and objects
•  Supports content negotiation
•  Optimized for high-volume read-write of RDF named graphs
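The two query shapes listed above are simple enough to sketch with an in-memory store. The class below keeps each satellite as a named graph (a set of triples keyed by a graph URI), supports whole-graph create, read, and delete, and answers the two patterns for sets of subjects or objects. The class name, method names, and sample identifiers are hypothetical; the actual LDR REST endpoints are not described in the slides.

```python
class NamedGraphStore:
    """In-memory stand-in for a repository of RDF named graphs (hypothetical API)."""

    def __init__(self):
        self._graphs = {}  # graph URI -> set of (subject, property, object)

    def put(self, graph_uri, triples):
        """Create or replace a whole named graph."""
        self._graphs[graph_uri] = set(triples)

    def get(self, graph_uri):
        """Read a named graph; an unknown graph yields an empty set."""
        return set(self._graphs.get(graph_uri, set()))

    def delete(self, graph_uri):
        """Delete a named graph; returns True if it existed."""
        return self._graphs.pop(graph_uri, None) is not None

    def _all_triples(self):
        for triples in self._graphs.values():
            yield from triples

    def objects(self, subjects, prop):
        """For a set of subjects, all objects of the given property."""
        wanted = set(subjects)
        return {o for s, p, o in self._all_triples() if s in wanted and p == prop}

    def subjects(self, prop, objects):
        """All subjects that have any of the given objects for the property."""
        wanted = set(objects)
        return {s for s, p, o in self._all_triples() if p == prop and o in wanted}

store = NamedGraphStore()
store.put("urn:graph:tags-doc1", [("doc1", "hasTag", "geology"), ("doc1", "hasTag", "marine")])
store.put("urn:graph:tags-doc2", [("doc2", "hasTag", "geology")])

print(sorted(store.objects(["doc1"], "hasTag")))
print(sorted(store.subjects("hasTag", ["geology"])))
```

Restricting queries to these fixed patterns means each one is a single filter over stored triples, which fits the stated aim of high-volume reads and writes better than a general-purpose query language would.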

Benefits of the LDR

•  Unprecedented access to Elsevier content
•  Key enabler for providing advanced semantic search across products
•  Provides links to other data sources to provide further contextual enrichment
•  Allow others to discover and integrate with Elsevier content
•  Link content across domains
•  Data can be pulled out of large amounts of text and organized for review and action
•  Information mining for compliance and research
•  Create mashups from multiple data sources
•  Present information with enhanced visualization

Mining text for semantic data

Building the databases that support content enrichment includes extracting from unstructured text:

― mentions of concepts
― mentions of relations between concepts
― other semantic information, such as document metadata and context indicators

http://www.ifs.tuwien.ac.at/dm/

Mining text for semantic data

•  We’re exploring a range of tools and techniques to do text mining, including:
   •  Rule-based information extraction
   •  Statistical information extraction
   •  Mapping terms in text to thesauri (Ei Thesaurus, EMTREE) or other sources of lexical/semantic information
•  Working with GATE and UIMA components to design and implement language processing pipelines, and with a number of text mining vendors
•  Because Elsevier publishes in a broad range of subject areas, content types, and languages, no one approach is appropriate for all uses
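Of the techniques listed, mapping terms in text to a thesaurus is the easiest to show in a few lines. The sketch below is a plain dictionary lookup using Python's `re` module, far cruder than a GATE or UIMA pipeline. The mini-thesaurus entries are invented for this example and are not taken from EMTREE or the Ei Thesaurus.

```python
import re

# Hypothetical mini-thesaurus: entry term (lowercase) -> preferred term.
THESAURUS = {
    "blood urea nitrogen": "blood urea nitrogen",
    "bun": "blood urea nitrogen",
    "delirium": "delirium",
    "acute confusional state": "delirium",
}

def build_pattern(terms):
    """Compile one case-insensitive pattern matching any entry term as a whole word."""
    # Longest terms first, so multiword entries win over shorter overlapping ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

PATTERN = build_pattern(THESAURUS)

def tag_terms(text):
    """Return (start, end, matched text, preferred term) for each thesaurus hit."""
    return [
        (m.start(), m.end(), m.group(0), THESAURUS[m.group(0).lower()])
        for m in PATTERN.finditer(text)
    ]

sentence = "Risk factors for delirium include an elevated BUN/serum creatinine ratio."
hits = tag_terms(sentence)
for start, end, matched, preferred in hits:
    print(start, end, matched, "->", preferred)
```

Mapping variants such as an abbreviation to one preferred term is what lets documents that use different wording be retrieved under the same concept. A lookup like this cannot handle ambiguity or context, which is one reason no single approach suits all content.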

Semantic and lexical models

Supporting our text mining efforts is an increased focus on acquiring, building, and maintaining vocabularies and semantic models, including:

•  Dictionaries/thesauri
•  Taxonomies
•  Ontologies

We reuse Web-standard semantic and lexical resources wherever possible, but also create application-specific domain models, sometimes by hand, for narrow domains.

These semantic resources are also stored in the LDR, which links semantic data to documents, to non-text content, and to other resources, to create a web of meaningful and re-usable information.

Smart Content design patterns

•  Linked data browser: Link-following navigation over linked graph of RDF resources
•  Mashup: Integrated presentation of content and data across multiple sources
•  Semantic search: Free text/faceted search over document/data sets
•  Semantic query: Relational query over aggregated/federated sets of RDF statements

Example: Marine Geology

[Two figure slides; images not reproduced]

Linking data to support enriched content is an essential part of the future of STM publishing

Ellen Hays, Elsevier Labs
e.hays@elsevier.com
