Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Austria License
Plone Conference 2010
Bristol
Plone is so semantic,
isn't it?
Jens W. Klein <jk@kleinundpartner.at>
2010-10-28
Store Knowledge?
● Humans have different kinds of knowledge:
tacit and explicit
● only explicit knowledge can be transferred:
explicit knowledge (brain) =>
reduce to information =>
store information in ICT system
● creation of documents (html-form, doc,
powerpoint, pdf, html, audio, video, CAD/CAM,
and hundreds more)
● sometimes collaborative
IN PLONE...
● create documents (HTML),
● upload or place links to documents.
● Excellent collaboration
– sharing,
– versioning,
– workflow.
Problem:
● stupid ICT systems: they don't 'know' anything
about the document.
● we need to provide algorithms to fetch
information from the ones and zeros.
● Algorithms are only as good as the stored information.
● Full-text search: cut into words, create index.
● Organize documents in folders and subfolders, give
them meaningful names.
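A minimal sketch of "cut into words, create index", assuming a plain in-memory inverted index. The function names and the sample documents are made up for illustration; this is not how any particular system implements its index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Cut a text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents.
documents = {
    "doc1": "Paris is a city.",
    "doc2": "Paris is part of France.",
    "doc3": "Bristol hosts the Plone Conference 2010.",
}
index = build_index(documents)
```

Searching for "paris" finds doc1 and doc2, but "cities in France" finds nothing: the index only knows strings, not what they mean.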
IN PLONE...
● portal_catalog
● full text index and search
● hierarchical folderish structure
BUT ...
● this is not enough. We want to know the author,
intellectual property information, dates, ...
● So we need additional information.
● One solution: Store additional information with
the document: Metadata.
● Dublin Core (DC) Metadata, Learning Objects
Metadata (LOM), ..
BUT ... lazy Humans!
*sigh*
● metadata need to be created/added by editors.
● Humans are lazy,
● so most of the time
– NO METADATA ENTERED
● Helpers:
– Extrinsic motivation (pay, required fields, ...)
– Automatic adding from context: e.g. username
as creator, dates of creation or publishing.
IN PLONE...
● Dublin-Core Metadata on any document.
● Some fields are filled automatically,
– e.g. author, dates of creation and publication, and,
with some limits, language.
● Others need to be entered manually.
● With add-on products that use schemaextender,
other metadata fields can be added, e.g. the
Dublin Core Terms extension.
● Exposed in HTML-header (if switched on)
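A rough sketch of what "exposed in HTML header" can look like, assuming the common convention of `<meta name="DC.…">` tags. The helper name and the sample values are hypothetical; this is not Plone's own rendering code.

```python
from html import escape


def dublin_core_meta_tags(metadata):
    """Render a dict of Dublin Core fields as HTML <meta> tags.

    List or tuple values produce one tag per item.
    """
    lines = []
    for name, value in metadata.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            lines.append(
                '<meta name="DC.%s" content="%s" />'
                % (escape(name, quote=True), escape(str(item), quote=True))
            )
    return "\n".join(lines)


# Hypothetical metadata of one document.
example = {
    "creator": "John Smith",
    "date": "2010-10-10",
    "subject": ["Plone", "semantics"],
}
html_header = dublin_core_meta_tags(example)
```

Printing `html_header` gives one `<meta>` line per value, so the two subjects become two `DC.subject` tags.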
Semantics (from Greek sēmantikós)
is the study of meaning.
It typically focuses on the
relation between signifiers,
such as words, phrases, signs and
symbols, and what they stand for.
Wikipedia
so no semantics so far
in Plone?
● we have hierarchy
– A is parent of B => relation
● we have metadata
– John Smith is author of B
– B was published 2010-10-10 10:10:10
● we have general references
– B is related to C
Limitations in Plone
● If the word "Paris" is in the text, we don't know
that it is a city ("Paris is a city").
● We can search for the string "Paris", but not for
articles about cities in France ("Paris is part of
France").
● No way to connect with articles outside Plone.
● Naked Plone only exposes a tiny set of its
limited semantic information.
RDF helps
Resource Description
Framework
● all information is broken down into triples of
– subject => Paris
– predicate => is part of
– object => France
● a triple is an element of a graph
● multiple triples form complex RDF graphs
● RDF is a family of W3C specifications to work with
these graphs.
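The triple idea can be sketched in plain Python, assuming tuples of strings instead of real RDF resources (which would be identified by URIs). All names and sample facts below are made up for illustration.

```python
# A tiny graph: a set of (subject, predicate, object) triples.
triples = {
    ("Paris", "is a", "City"),
    ("Paris", "is part of", "France"),
    ("Bristol", "is a", "City"),
    ("Bristol", "is part of", "United Kingdom"),
    ("Article B", "is about", "Paris"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in graph
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }


def cities_in(graph, country):
    """Subjects that are a City and also part of the given country."""
    cities = {s for (s, _, _) in match(graph, predicate="is a", obj="City")}
    located = {s for (s, _, _) in match(graph, predicate="is part of", obj=country)}
    return cities & located


def articles_about_cities_in(graph, country):
    """Subjects that are about any city of the given country."""
    places = cities_in(graph, country)
    return {s for (s, _, o) in match(graph, predicate="is about") if o in places}
```

With these triples, "articles about cities in France" becomes answerable: the question is a combination of three patterns, not a string search.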
IN PLONE?
● No RDF dialect out of the box
● eea.rdfmarshaller (add-on) builds
– RDF/XML from Archetypes content, hierarchy and
relations
– RDF-Schemas from FTI
● no triple storage
● no possibility to query
● no auto-enhancement, e.g. finding geo-names
Introducing IKS
● IKS (Interactive Knowledge Stack, ICT-231527) is
– a semantic-based Open Source platform for
small to medium CMS providers
– meant to raise the semantic capability of European
software houses to develop intelligent content
management solutions for their customers.
– an Integrated Project (IP) of the European Union's
7th Framework Programme: ICT – Call 3. From
2009-01-01 to 2012-12-31 (48 months).
– 13 participants from 7 countries involved, and
the EC contribution is 6.57 million Euros (total
cost: €8.55m).
– http://[www|wiki].iks-project.org
Introducing FISE
● FISE, a major IKS outcome, is
– an Open Source RESTful Semantic Engine:
a software component extracting the meaning of
electronic documents to organize it as
partially structured knowledge.
– semantic middleware with pluggable enhancers,
– triple store,
– SPARQL endpoint (query mechanism)
– ... more to come
– alpha, Java, easy to integrate, nice devs behind.
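A hedged sketch of what querying a SPARQL endpoint from Python could look like. It only builds the HTTP request and sends nothing. The endpoint URL and the `ex:` vocabulary are assumptions for illustration; the slides do not state FISE's actual endpoint path. Passing the query in a `query` parameter of a GET request follows the W3C SPARQL Protocol convention.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint URL, not taken from FISE itself.
SPARQL_ENDPOINT = "http://localhost:8080/sparql"

# "Cities that are part of France", using a made-up example vocabulary.
QUERY = """
PREFIX ex: <http://example.org/terms/>
SELECT ?city WHERE {
  ?city a ex:City .
  ?city ex:isPartOf ex:France .
}
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a GET request carrying the SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(SPARQL_ENDPOINT, QUERY)

# Round-trip check: decoding the URL gives back the unchanged query text.
sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Sending it would be a matter of `urllib.request.urlopen(request)` against a running endpoint.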
FISE
current enhancers
● categorize documents,
● suggest meaningful tags from a controlled
taxonomy and assert their relative importance,
● find related documents in the local database or
on the web,
● extract and recognize mentions of known
entities,
● detect yet unknown entities of the same
aforementioned types to enrich the knowledge base,
● more and more to come.
not IKS, but related:
Aloha-Editor
● WYSIWYG editor using contenteditable
(xHTML5)
● very fast (loading, init, multiple instances)
● pluggable - possible to create semantic
plugins, e.g. for microformats.
● Open Source (GPL, initially made by Gentics)
http://www.aloha-editor.com/
YES. We need it in Plone.
But: Out of my scope
Who will integrate it?
back to FISE:
planned Python/Plone
Integration
Klein & Partner KG became a (funded)
IKS Early Adopter
● Create a generic Python API to communicate
with FISE over its RESTful API
● Integrate with Plone, "index" into FISE.
● Create a Plone Portlet (UI) showing some
enhancements.
● Present the results to the Plone Community.
Spread the word.
FISE Integration
work done so far
● buildout for FISE http://github.com/collective/fise-buildout
● fise.client http://github.com/collective/fise.client
● started to spread the word
● research done:
– RDFlib
– SuRF
– restful client APIs
– SPARQL and Python
fise.client
Initialize:
>>> from fise.client import FISE
>>> fise = FISE('http://localhost:8080/')
Use the engines:
>>> somedoc = u"This is an example text."
>>> fise.engines(somedoc)
<xml...>
>>> fise.engines(somedoc, format='rdfjson')
jsonresponse
Use the store, first store content
(only plain text is accepted for now):
>>> id = 'test123'
>>> fise.store.content[id] = somedoc
Next get the text back:
>>> fise.store.content[id]
u"This is an example text."
Then get the metadata:
>>> fise.store.metadata(id)
<RDF>
And a special FISE feature: get an HTML
page about the content:
>>> fise.store.page(id)
<HTML>
Work to do
● support passing SPARQL queries to FISE (easy)
● write fise.plone and index content in FISE.
● write some visualization (e.g. viewlet/portlet) to
show the enhancements found
● document all this
● sprint on FISE in Bristol
● organize a semantic sprint in Innsbruck