1. Leveraging the Semantic
Web with Drupal 7
Stéphane Corlosquet, Paolo Ciccarese
MIND Informatics
SemTechBiz San Francisco 2012
June 4th, 2012
2. About the speakers
● Stéphane Corlosquet
● 6 years with Drupal
● Drupal core maintainer (RDF)
● Drupal Security Team member
● Co-authored the
Definitive Guide to Drupal 7
● Co-maintain RDF Extensions,
SPARQL, schema.org
● Member of the RDFa WG
3. About the speakers
● Paolo Ciccarese, PhD
● Assistant in Neurology at Mass General Hospital
● Research faculty at Harvard Medical School
● Author of 30+ scientific publications
● Senior software and knowledge engineer
● Member of W3C HCLS Interest Group
● Co-chair of the W3C Open Annotation Community
Group
4. Tutorial outline
● Introduction to Drupal
● What is it good for
● Installation / Hosted Drupal
● Semantic Web and Drupal
● Technology stack
● Use cases, hands on session
● Domeo & Drupal
5. Drupal
● Dries Buytaert - small news site in 2000
● Open Source - 2001
● Content Management System
● LAMP stack
● Non-developers can build sites
and publish content
● Control panels instead of code
http://www.flickr.com/photos/funkyah/2400889778/
6. Drupal
● Open & modular
architecture
● Extensible by modules
● Standards-based
● Low resource hosting
● Scalable
7. Building a Drupal site
http://www.flickr.com/photos/toomuchdew/3792159077/
8. Building a Drupal site
● Create the content types
you need
Blog, article, wiki, forum, polls,
image, video, podcast,
e-commerce... (be creative)
http://www.flickr.com/photos/georgivar/4795856532/
9. Building a Drupal site
● Enable the features you
want
Comments, tags, voting/rating,
location, translations, revisions,
search...
http://www.flickr.com/photos/skip/42288941/
11. Building a Drupal site
Thousands of free
contributed modules
● Google Analytics
● Wysiwyg
● Captcha
● Calendar
● XML sitemap
● Fivestar
● Twitter
● ...
http://www.flickr.com/photos/kaptainkobold/1422600992/
13. The Drupal Community
“It’s really the Drupal community and not so much
the software that makes the Drupal project what it
is. So fostering the Drupal community is actually
more important than just managing the code base.” -
Dries Buytaert
http://webchick.net/node/80
29. Why Structured Data in HTML
● Help machines extract relevant
data from HTML
● Can make use of this data in
amazing ways (e.g. enhanced
search results)
30. Structured Data in HTML
● Add or alter HTML attributes
● Syntaxes
– Microformats (@class, @rel)
– RDFa (@property, @about, @typeof, …)
– Microdata (@itemscope, @itemtype, @itemprop, …)
– RDFa 1.1 & RDFa Lite
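To make the idea of attribute-based markup concrete, here is a minimal sketch of how a machine might read RDFa Lite attributes out of a page, using only Python's standard `html.parser`. The sample markup, the `PropertyCollector` class and the values are made up for illustration; a real consumer would rely on a full RDFa processor rather than this simplified scan.

```python
from html.parser import HTMLParser

# Hypothetical RDFa Lite snippet using schema.org terms (illustrative only).
SAMPLE = """
<div vocab="http://schema.org/" typeof="Event">
  <span property="name">SemTechBiz tutorial</span>
  <span property="location">San Francisco</span>
</div>
"""

class PropertyCollector(HTMLParser):
    """Collects @typeof values and (property, text) pairs from the markup."""

    def __init__(self):
        super().__init__()
        self.types = []        # values of @typeof seen in the markup
        self.pairs = []        # (property name, text content) tuples
        self._current = None   # property whose text we are still waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.types.append(attrs["typeof"])
        if "property" in attrs:
            self._current = attrs["property"]

    def handle_data(self, data):
        # Attach the first non-blank text after an element carrying @property.
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None

collector = PropertyCollector()
collector.feed(SAMPLE)
print(collector.types)
print(collector.pairs)
```

The sketch ignores nesting, `@about`, `@resource` and prefix handling, all of which a conforming RDFa parser has to deal with.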
33. Schema.org
● Describe the type of your content (Person,
Event, Recipe, Product, Book, Movie, etc.)
– 290 types and counting
● Each type has a set of properties
– Common properties: name, description, image, url
– Specific properties depending on the type (see type page
on schema.org)
– 240 properties and counting
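As a small illustration of "a type plus a set of properties", the sketch below renders a flat dictionary as Microdata markup. The helper name `to_microdata` and the sample recipe are hypothetical, and the sketch only covers text-valued properties; in real markup, properties such as `url` or `image` usually sit on `a` or `img` elements.

```python
from html import escape

def to_microdata(item_type, properties):
    """Render a flat dict of text-valued schema.org properties as Microdata."""
    lines = [f'<div itemscope itemtype="http://schema.org/{escape(item_type)}">']
    for name, value in properties.items():
        lines.append(f'  <span itemprop="{escape(name)}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)

# Illustrative data only: a Recipe described with two common properties.
recipe = {"name": "Apple pie", "description": "A classic dessert"}
markup = to_microdata("Recipe", recipe)
print(markup)
```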
39. Examples in the wild
● Events
– “force11 events”: http://goo.gl/VVhNM
– DrupalCon Munich: http://goo.gl/jgMvw
● Recipes
– “delicious lemon coconut squares”: http://goo.gl/ORdl1
– Apple pie with ingredients: http://goo.gl/wCO1w
40. Examples in the wild
● University of Waterloo
– School of Public Health and Health Systems launch:
http://goo.gl/Df9hp
● Curling tournament calendar
– European Curling Championships 2012:
http://goo.gl/YXgXl
– World Women’s Curling Championships 2013:
http://goo.gl/BDNZW
45. Drupal 7 and RDF
● Drupal 7 core is RDFa enabled
● RDFa output by default on blogs, forums,
comments, etc. using FOAF, SIOC, DC, SKOS
46. Architecture
● User driven data model
● Content type => RDF class
● Field => RDF property
● Node => RDF resource
http://en.wikipedia.org/wiki/File:Oriente_Station_Lisboa_roof.jpg
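The mapping on this slide (content type to class, field to property, node to resource) can be sketched in a few lines of Python. This is not Drupal's implementation: the `RDF_MAPPING` structure, the `node_to_triples` helper and the chosen terms are assumptions for illustration, and the literal quoting is deliberately naive.

```python
# Hypothetical mapping for one content type (illustrative values only).
RDF_MAPPING = {
    "rdf_class": "sioc:Item",                      # content type => RDF class
    "fields": {"title": "dc:title",                # field => RDF property
               "body": "content:encoded"},
}

def node_to_triples(base_url, node, mapping):
    """Turn one node (a plain dict here) into a list of (s, p, o) triples."""
    subject = f"<{base_url}/node/{node['nid']}>"   # node => RDF resource
    triples = [(subject, "rdf:type", mapping["rdf_class"])]
    for field_name, predicate in mapping["fields"].items():
        if field_name in node:
            # Naive literal quoting: escapes backslashes and double quotes only.
            value = str(node[field_name]).replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, predicate, f'"{value}"'))
    return triples

triples = node_to_triples("http://example.org", {"nid": 1, "title": "Hello"}, RDF_MAPPING)
for triple in triples:
    print(" ".join(triple), ".")
```

Because the data model is user driven, adding a field to the content type only requires a new entry in the mapping; the conversion code stays the same.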
51. Drupal 7 and RDF
● Contributed modules for more features
● RDF Extensions
– Serialization formats: RDF/XML, Turtle, N-Triples
● SPARQL
– Expose Drupal RDF data in a SPARQL endpoint
● SPARQL Views
– Display remote RDF data in Drupal using SPARQL
● JSON-LD
– Expose Drupal RDF data as JSON-LD (CORS-enabled)
● Features and packaging
– Build distributions / deployment workflow
53. SPARQL Endpoint
● Public endpoint available at /sparql
● http://prefix.cc/sioc,rnews.sparql
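A client could address such an endpoint over HTTP by passing the query in the `query` parameter, as in the SPARQL protocol. The sketch below only builds the request URL and sends nothing; the site `http://example.org/`, the helper `build_sparql_url` and the query itself are assumptions for illustration.

```python
from urllib.parse import urlencode

def build_sparql_url(site, query):
    """Build a GET URL for the /sparql endpoint of a site (no request is sent)."""
    return f"{site.rstrip('/')}/sparql?{urlencode({'query': query})}"

# Illustrative query: list up to ten resources typed as sioc:Post.
QUERY = """PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?post WHERE { ?post a sioc:Post } LIMIT 10"""

url = build_sparql_url("http://example.org/", QUERY)
print(url)
```

The resulting URL can be handed to any HTTP client; which result formats the endpoint returns depends on the site's configuration.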
54. JSON-LD in Drupal
● Client side as well as server side friendly
● Browser Scripting:
– Native JavaScript format
– RDFa API in the DOM
● Data can be fetched from anywhere:
– Cross-Origin Resource Sharing (CORS) enabled
● Client can mash data
● http://drupal.org/project/jsonld
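Since JSON-LD is plain JSON, server-side consumers can read it with a standard JSON parser. The sketch below parses a made-up document and expands prefixed keys against its `@context`. The document body is hypothetical (actual module output may differ), and `expand_term` is a toy helper, not the full JSON-LD expansion algorithm.

```python
import json

# Hypothetical response body; real output from a Drupal site may differ.
BODY = """{
  "@context": {"dc": "http://purl.org/dc/terms/"},
  "@id": "http://example.org/node/1",
  "@type": "http://rdfs.org/sioc/ns#Item",
  "dc:title": "Hello world"
}"""

def expand_term(term, context):
    """Expand 'prefix:local' using the context; leave other terms untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term

doc = json.loads(BODY)
context = doc.get("@context", {})
# Keep only data properties, skipping JSON-LD keywords such as @id and @type.
properties = {expand_term(key, context): value
              for key, value in doc.items() if not key.startswith("@")}
print(doc["@id"], properties)
```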
58. Demos
● Occupy Directory
– http://directory.occupy.net/occupations
– JSON-LD: http://directory.occupy.net/node/19652.jsonld
● Federated General Assembly
– Drupal distribution for occupy movement
– http://wiki.occupy.net/wiki/Federated_General_Assembly
59. DOMEO: a web-based tool for
semantic annotation of online
documents
60. As (biomedical) scientists…
• We deal with an increasing amount of
digital resources (documents, images,
videos, datasets, databases… )
• We commonly use annotation but…
– are we really efficient?
– can we leverage machine computation?
– can we share it easily with our
colleagues?
– can we capitalize on the work of
colleagues?
61. Annotation Framework
(Components)
• Annotation Ontology (AO): OWL
vocabulary for representing and sharing
annotation of digital resources and their
fragments
– Website http://purl.org/ao/home
– Paper http://www.jbiomedsem.com/content/2/S2/S4
• DOMEO client: web application for
producing and sharing manual, semi-automatic
and automatic annotation
– Website http://annotationframework.org
– Paper http://www.jbiomedsem.com/content/3/S1/S1
62. Annotation of digital resources
Visually and effectively annotate - better
semantically annotate - any digital resource
and resource fragment, while performing our
regular browsing/reading activities
http://antibodyregistry.org/antibody17/antibodyform.html?gui_type=advanced&ab_id=2266850
antibodyregistry.org
63. Leverage text mining and
community curation
Run text mining and entity recognition
algorithms on scientific documents and
persist the results in a standard format
Benefit from crowdsourcing by supporting
curation of manual and automatic annotation
64. … and more
• Efficiently search and reuse the annotation
– Semantic inference
• Subscribe to feeds related to topics of
interest
– Proteins, Cells, Authors, Papers…
• Retrieve additional content (mashups)
– Entrez Gene, UniProt, …
65. Semantic tagging through
ontologies
Semantic Tag
http://purl.obolibrary.org/obo/PR_000004168
Label ‘amyloid beta A4 protein’
Exact synonyms ‘APP’, ‘amyloidogenic glycoprotein’, …
Related synonyms ‘A4’, ‘ABPP’,
Is a
http://purl.obolibrary.org/obo/PR_000000001
Label ‘protein’
Definition ‘An amino acid chain that…’
Source: Protein Ontology (PRO)
https://pir5.georgetown.edu/wiki/PRO
66. APPs for the Semantic Resources Project, May 2010
67. Zooming in
APPs for the Semantic Resources Project, May 2010
68. Annotation Ontology (AO)
OWL vocabulary for representing and sharing
annotation of digital resources and their fragments
Not only for biomedicine!
– Website http://purl.org/ao/home
– Paper http://www.jbiomedsem.com/content/2/S2/S4
69. A simplified view of AO
AO allows annotating:
Resources: Documents (HTML, PDF, Word, Excel), Images,
Databases, Web Services... (and their fragments)
Specifying (or not) an:
Annotation Type: through one of the already available
types (errata, highlight, qualifiers...) or the ones the users
will define.
With (or without) a:
Topic: free text, structured text, URIs, RDF entities,
RDF graphs, domain ontologies…
Tracing:
Provenance: who created what, when, with which
software, with what expectations…
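The four ingredients of this simplified view (resource, optional annotation type, optional topic, provenance) can be sketched as a small data structure. The class and field names below are hypothetical and are not terms from the AO vocabulary; the example values are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Provenance:
    """Who created the annotation, with which software, and when."""
    created_by: str
    created_with: str
    created_on: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Annotation:
    """One annotation on a resource or on a fragment of it."""
    resource: str                          # URI of the annotated document
    provenance: Provenance                 # always traced
    fragment: Optional[str] = None         # e.g. the selected text, if any
    annotation_type: Optional[str] = None  # "specifying (or not)" a type
    topic: Optional[str] = None            # "with (or without)" a topic

# Illustrative values: a protein mention tagged with a Protein Ontology term.
note = Annotation(
    resource="http://example.org/paper.html",
    fragment="amyloid beta A4 protein",
    annotation_type="qualifier",
    topic="http://purl.obolibrary.org/obo/PR_000004168",
    provenance=Provenance(created_by="http://example.org/users/alice",
                          created_with="Domeo"),
)
print(note)
```

Keeping the type and topic optional mirrors the "or not" and "or without" wording of the slide, while provenance stays mandatory.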
74. Open Annotation Community Group
Annotation Ontology is going to be replaced
in our applications by the Open Annotation
Model developed through the W3C Open
Annotation Community Group
– Website http://www.w3.org/community/openannotation/
– Core Model http://www.openannotation.org/spec/core/
– Extensions http://www.openannotation.org/spec/extension/
79. Domeo and the NCBO
Annotator
annotator-service
Domeo allows automatic/manual annotation with
terms coming from selected ontologies managed by
the BioPortal
http://www.bioontology.org/
80. Running NCBO Annotator
Additional text mining services
will be listed here
81. NCBO Annotator Results in
Domeo
List of recognized
entities
85. UIMA, Clerezza and AO
[Diagram labels: Text Mining Results; Curated Text Mining Results; AO RDF; Publishing; Applications; Evaluating Performance; Comparing Algorithms; Learning; …]
http://www.slideshare.net/paolociccarese/domeo-and-text-mining