2. 01/30/15 2
About Recognos
• Established 1999 (www.recognos.com)
• California S-Corporation – Offices in San Rafael,
San Mateo
• In 2000 created Recognos Romania
• Office in Romania situated in Cluj (www.cluj4all.com)
• 70 employees
• Semantic technologies R&D
• Started a meetup: http://www.meetup.com/Cluj-Semantic-WEB/
• Applications in Finance, CRM, Life Sciences, etc.
3. 01/30/15 3
What is Semantic Technology
• WEB 3.0 ?
• Gives meaning through relationships
• Building block – statements
• The statements describe: concepts, logic, restrictions
and individuals (instances)
• WWW is for human consumption
• Semantic WEB – for machines
• Relationships: definitions, associations, aggregations
and restrictions
5. 01/30/15 5
Major Difficulty
Open World vs Closed World
Anybody can say ANYTHING about ANYTHING!
You don’t know what you don’t know!
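A minimal sketch of the difference, assuming a toy set of statements. The function names `closed_world` and `open_world` and the facts are illustrative and not from the slides. Under the closed-world assumption a missing statement counts as false; under the open-world assumption it is only unknown.

```python
# Toy fact base of (subject, predicate, object) statements. Illustrative data only.
facts = {
    ("Troy", "hasActor", "Brad Pitt"),
}

def closed_world(statement):
    """Closed world (typical database): anything not stated is false."""
    return statement in facts

def open_world(statement):
    """Open world (Semantic Web): anything not stated is unknown, not false."""
    return True if statement in facts else None  # None stands for "unknown"

known = ("Troy", "hasActor", "Brad Pitt")
missing = ("Troy", "hasActor", "Eric Bana")  # true in reality, but nobody has said it here

print(closed_world(known), open_world(known))
print(closed_world(missing), open_world(missing))
```

The second line of output shows the difficulty: the closed-world reading answers "no", while the open-world reading can only answer "not known yet".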
6. 01/30/15 6
Semantic Technology vs. Semantic WEB
• Semantic Technology – “machines” try to understand:
– Natural Language Text
– Images
– Sounds
– Machine learning
• Semantic WEB Technology – part of Semantic
Technology (semantic search, semantic tagging,
microformats (FOAF), web site federation), Linked Open
Data
8. 01/30/15 8
How to represent the knowledge
• Gives meaning through relationships
• Everybody understands the same thing
• Machines can understand it
• Eliminates ambiguities through URI – Uniform Resource
Identifier – and PURL – Persistent Uniform Resource Locator
• Need software that will be able to read these and
“understand”
• Describe things on the internet using such a universal
language
9. 01/30/15 9
Building Block RDF
“There is a Person identified by http://www.w3.org/People/EM/contact#me, whose name
is Eric Miller, whose email address is em@w3.org, and whose title is Dr.".
Triples:
(i) http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#fullName, "Eric Miller"
(ii) http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#personalTi, "Dr."
(iii) http://www.w3.org/People/EM/contact#me, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2000/10/swap/pim/contact#Person
(iv) http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#mailbox, em@w3.org
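The statements above can be sketched as plain (subject, predicate, object) tuples. This is a minimal illustration in standard Python, not an RDF library; the `Triple` type and the `match` helper are assumptions made for the example. The title statement is left out because its predicate URI is cut off on the slide.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

ME = "http://www.w3.org/People/EM/contact#me"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Statements (i), (iii) and (iv) from the slide.
graph = [
    Triple(ME, CONTACT + "fullName", "Eric Miller"),
    Triple(ME, RDF_TYPE, CONTACT + "Person"),
    Triple(ME, CONTACT + "mailbox", "em@w3.org"),
]

def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

names = [t.obj for t in match(graph, subject=ME, predicate=CONTACT + "fullName")]
print(names)
```

Every statement has the same three-part shape, so one generic pattern matcher can answer any question about the data.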
11. 01/30/15 11
Ontologies - OWL
http://www.fao.org/countryprofiles/geoinfo.asp?lang=en
An ontology is a kind of dictionary that describes information in a certain domain using
concepts and relationships. It is often implemented using OWL.
• A concept is defined as abstract knowledge (examples: Movie, Country, Organization).
Concepts are explicitly implemented in the ontology with individuals and classes:
– An individual is defined as an object perceived from the real world. (The Sound of
Music is a Movie and belongs to the musical genre.)
– A class is defined as a set of individuals sharing common properties. In the
geopolitical domain, Ethiopia, Republic of Korea and Italy are individuals of the class
Country.
• Relationships between concepts are explicitly implemented by:
– Object properties between individuals of two classes. For example, the "has member"
and "is in group" properties.
– Datatype properties between individuals and literals or XML datatypes. For
example, the individual “United States” has the datatype property CodeISO3 with
the value “USA”.
– Restrictions on classes and/or properties. For example, the property spoken
Language of the class Movie has been restricted to have only one value, which means
that a movie can have only one spoken language.
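The pieces described above (classes, individuals, datatype properties, restrictions) can be sketched in plain Python. This is only an illustration under stated assumptions: the dictionaries, the `violations` helper and the second language value are hypothetical, and the restriction is treated as a simple validation check, which is a simplification of how an OWL reasoner handles restrictions.

```python
# Classes are sets of individuals.
classes = {
    "Country": {"Ethiopia", "Republic of Korea", "Italy", "United States"},
    "Movie": {"The Sound of Music"},
}

# Properties are (individual, property, value) statements.
statements = [
    ("United States", "CodeISO3", "USA"),                 # datatype property with a literal value
    ("The Sound of Music", "spokenLanguage", "English"),
]

# Restriction: an individual of class Movie has at most one spokenLanguage value.
max_cardinality = {("Movie", "spokenLanguage"): 1}

def violations(classes, statements, max_cardinality):
    """List (individual, property, values) entries that exceed a restriction."""
    found = []
    for (cls, prop), limit in max_cardinality.items():
        for individual in sorted(classes[cls]):
            values = {v for s, p, v in statements if s == individual and p == prop}
            if len(values) > limit:
                found.append((individual, prop, sorted(values)))
    return found

print(violations(classes, statements, max_cardinality))

# Hypothetical second value, added only to trigger the restriction.
statements.append(("The Sound of Music", "spokenLanguage", "German"))
print(violations(classes, statements, max_cardinality))
```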
15. 01/30/15 15
• The main entities can be represented as Classes using an
ontology language (Movie, Person, Role)
• Other attributes (movie rating, movie genres, …) can be
represented as Properties of the appropriate Classes
[Diagram: classes Movie, Person and Role connected by the properties "acted" and "film"; example instances: Brad Pitt (Person), Troy (Movie), Achilles (Role)]
17. 01/30/15 17
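[Diagram: example instance graph]
Troy (m1): title "Troy", year 2004, runtime 163, language English, rating 6.9, votes 85463
Alejandro Avendano (p1): longname "Alejandro Avendano"; linked to movies through the properties stuntPerformer, setDecorator and acted (via a Role, r1, and its film)
Other movies shown, each with a title: Jack El Despertador, Romero
i.e. Alejandro Avendano as
• Actor
• Stunt Performer
• Set Decorator
Nodes: p1, m1, m2, m3, r1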
p1:http://www.imdb.com/Person/Avendano
m1:http://www.imdb.com/Movie/Troy
m2:http://www.imdb.com/Movie/Romero
m3:http://www.imdb.com/Movie/JackElDespertador
r1:http://www.imdb.com/Role/DeathSquadMember
18. 01/30/15 18
• find resources according to specific criteria
– e.g. Find movies with Roger Bratt as a cinematographer, or movies
with producer Halle Berry’s spouse
• and simpler queries
– e.g. Find movies with genre = War, Romance, etc.
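A rough sketch of both kinds of query over a small triple list built from the IMDB example URIs. The helper names are hypothetical, and which credit belongs to which movie is an assumption made for the example. In a real RDF store these lookups would be written in SPARQL; the comments show the approximate shape of such a query.

```python
IMDB = "http://www.imdb.com/"
p1 = IMDB + "Person/Avendano"
m1 = IMDB + "Movie/Troy"
m2 = IMDB + "Movie/Romero"
m3 = IMDB + "Movie/JackElDespertador"

triples = [
    (m1, "title", "Troy"),
    (m1, "year", 2004),
    (m1, "language", "English"),
    (m2, "title", "Romero"),
    (m3, "title", "Jack El Despertador"),
    # Assumption for this sketch: which credit goes with which movie.
    (p1, "stuntPerformer", m3),
    (p1, "setDecorator", m2),
]

def movies_where(triples, person, credit):
    """Criteria query: titles of movies in which `person` holds `credit`.

    Roughly: SELECT ?title WHERE { <person> :credit ?m . ?m :title ?title }
    """
    movie_uris = {o for s, p, o in triples if s == person and p == credit}
    return sorted(o for s, p, o in triples if s in movie_uris and p == "title")

def movies_by(triples, prop, value):
    """Simpler query: titles of movies whose property equals a value."""
    uris = {s for s, p, o in triples if p == prop and o == value}
    return sorted(o for s, p, o in triples if s in uris and p == "title")

print(movies_where(triples, p1, "stuntPerformer"))
print(movies_by(triples, "year", 2004))
```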
19. 01/30/15 19
How to represent the knowledge
Feature | Relational Database | Knowledgebase
Structure | Schema | Ontology Statements
Data | Rows | Instance Elements
Admin language | DDL | Ontology Statements (OWL)
Query language | SQL | SPARQL
Relationship | Foreign Keys | Multidimensional
Logic | External to DB / triggers | Formal logic statements
Uniqueness | Key for table | Uniqueness Restriction
20. 01/30/15 20
How to store the knowledge
RDF Stores
•These are “referential databases”
•Oracle 11g – stores RDF in a relational database
•http://www.franz.com/agraph/allegrograph/ – AllegroGraph
•AllegroGraph RDFStore is a high-performance, persistent
RDF graph database. AllegroGraph uses disk-based storage,
enabling it to scale to billions of triples. AllegroGraph
supports SPARQL, RDFS++, and Prolog reasoning.
•Sesame
•Virtuoso
21. 01/30/15 21
Applications
• Are used to solve complicated problems
• All problems could be solved manually or with
conventional applications but with much more effort
• The Semantic WEB core idea is to “teach” the machine
to “mimic” human reasoning – a simplistic approach
• These are in fact “recycled AI techniques”
• Alternative to data warehouses
• Using inference to find new facts
• Integrates formatted with non-formatted docs
• Cross technology queries
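"Using inference to find new facts" can be illustrated with a single rule applied repeatedly: if X is of type C and C is a subclass of D, then X is also of type D. The sketch below is a minimal forward-chaining loop in plain Python; the class names (including "Agent") and the `infer_types` helper are hypothetical, and a real reasoner supports far more rules than this one.

```python
def infer_types(subclass_of, types):
    """Apply the rule (X type C) and (C subClassOf D) => (X type D) until nothing changes."""
    inferred = set(types)
    changed = True
    while changed:
        changed = False
        for individual, cls in list(inferred):
            for sub, sup in subclass_of:
                if sub == cls and (individual, sup) not in inferred:
                    inferred.add((individual, sup))
                    changed = True
    return inferred

# Hypothetical class hierarchy and one stated fact.
subclass_of = {("Actor", "Person"), ("Director", "Person"), ("Person", "Agent")}
types = {("Brad Pitt", "Actor")}

new_facts = infer_types(subclass_of, types) - types
print(sorted(new_facts))
```

Only one fact was stated, yet the store can now answer questions about Persons and Agents as well.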
22. 01/30/15 22
Potential for Netflix
Application Categories:
1) Data Integration of heterogeneous data silos
2) Semantic Search
a) Semantic Tagging
b) Faceted Search
c) NL Queries
3) Use of Open Linked Data
4) (Others: Market Sentiment Analysis – blogs, forums;
Advertising)
23. 01/30/15 23
Data Integration using Ontologies
[Diagram: common ontology, namespace n]
n:Movie n:hasIdentifier n:MovieId
n:Documentary isA n:Movie
n:Director isA n:Person
n:Actor isA n:Person
n:Movie hasDirector n:Director
n:Movie hasActor n:Actor
IMDB Movie Database (namespace a, RDF Store 1)
a. Character
a. Cast Member
a. Picture
a. IMDB Id
...
Paramount Movie Database (namespace b, RDF Store 2)
b. Role
b. Person
b. Motion Picture
b. Other fields
...
Warner Bros Movie Database (namespace c, RDF Store 3)
c. RoleName
c. PersonName
c. MovieName
c. Other fields
...
Data Mapping:
n:Movie owl:sameAs a:Picture
n:Actor owl:sameAs a:Character
n:Actor owl:sameAs c:PersonName
….
Data Federation using SPARQL
The fields of the integrated dataset consist of the union of the fields in the
federated data sources.
It is very easy to add new data sources.
Unformatted text: blogs, forums, RSS feeds…
RDF Store 3
Knowledge extraction from text
Data sources can be in different technologies: Oracle, MySQL,
XLS, CSV, etc.
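The mapping and federation idea can be sketched in a few lines of Python. The rows, the `federate` helper, and most of the mapping entries are assumptions made for illustration (only `a:Picture` and `c:PersonName` mappings appear on the slide); a real deployment would express the mapping in OWL and run a federated SPARQL query instead.

```python
# Hypothetical rows from three sources, each using its own field names.
source_a = [{"a:Picture": "Troy", "a:CastMember": "Brad Pitt"}]
source_b = [{"b:MotionPicture": "Troy", "b:Role": "Achilles"}]
source_c = [{"c:MovieName": "Romero", "c:PersonName": "Alejandro Avendano"}]

# Mapping from source fields to the common namespace n (partly assumed).
mapping = {
    "a:Picture": "n:Movie", "a:CastMember": "n:Actor",
    "b:MotionPicture": "n:Movie", "b:Role": "n:Role",
    "c:MovieName": "n:Movie", "c:PersonName": "n:Actor",
}

def federate(sources, mapping):
    """Rename each source field to its common term and collect the union of fields."""
    rows = []
    for source in sources:
        for row in source:
            rows.append({mapping.get(field, field): value for field, value in row.items()})
    fields = sorted({field for row in rows for field in row})
    return rows, fields

rows, fields = federate([source_a, source_b, source_c], mapping)
print(fields)
for row in rows:
    print(row)
```

Adding a new data source only requires new mapping entries; the `federate` function and the existing sources stay untouched.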
25. 01/30/15 25
Semantic Search
• Wolfram Alpha, Semantifi
• Faceted Search (www.needlebase.com)
• Microformats
• GoodRelations
• Open Linked Data
• Using natural language as a query language
26. 01/30/15 26
Deep WEB vs. Shallow WEB
• www.wolframalpha.com, www.google.com
• www.semantifi.com
30. 01/30/15 30
Microformats
A microformat (sometimes abbreviated μF) is a web-based approach to
semantic markup which seeks to re-use existing HTML/XHTML tags to convey
metadata and other attributes in web pages and other contexts that support
(X)HTML, such as RSS. This approach allows software to process information
intended for end-users (such as contact information, geographic coordinates,
calendar events, and the like) automatically. Examples:
hAtom – for marking up Atom feeds from within standard HTML
hCalendar – for events
hCard – for contact information; includes:
  adr – for postal addresses
  geo – for geographical coordinates (latitude, longitude)
hNews – for news content
hProduct – for products
hRecipe – for recipes and foodstuffs
hResume – for resumes or CVs
hReview – for reviews
rel-directory – for distributed directory creation and inclusion
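To show how software can process such markup automatically, here is a minimal sketch that reads a few hCard properties (`fn`, `org`, `email`) from an HTML fragment using Python's standard `html.parser` module. The `HCardParser` class and the sample markup are written for this illustration and cover only a small part of hCard.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collect the text of elements whose class names a known hCard property."""
    PROPERTIES = {"fn", "org", "email"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

# Sample markup: ordinary HTML tags, with class names carrying the meaning.
page = """
<div class="vcard">
  <span class="fn">George Roth</span>,
  <span class="org">Recognos Inc</span>,
  <a class="email" href="mailto:groth@recognos.com">groth@recognos.com</a>
</div>
"""

parser = HCardParser()
parser.feed(page)
print(parser.card)
```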
32. 01/30/15 32
Open Linked Data - Folksonomies
http://linkeddata.org/
Open Linked Data is "a term used to describe a recommended best
practice for exposing, sharing, and connecting pieces of data,
information, and knowledge on the Semantic Web using URIs and
RDF."
http://esw.w3.org/DataSetRDFDumps
www.wikipedia.com
www.freebase.com – created by Metaweb, bought by Google in July 2010
Folksonomy – the result of personal free tagging of
information and objects (anything with a URL) for one's own retrieval. The
tagging is done in a social environment (usually shared and open to
others). Folksonomy is created from the act of tagging by the person
consuming the information. (Thomas Vander Wal – 2004)
39. 01/30/15 39
The Future: Using NL as a query language
Comedies with John Travolta filmed in the US
All movies with Clint Eastwood as director
Coppola family movies
Documentaries about the genocide in Africa
Movies filmed in San Francisco Marina
Where can I buy the music from Love Story?
Is there any tour based on The Da Vinci Code?
Movies based on novels written by 19th-century British writers
46. 01/30/15 46
How Recognos Can Help
•Recognos is a Semantic Applications Developer
•Works with vendors to develop applications
•Help Netflix create a Semantic Group
•Help select technologies
•Build search applications for Linked Data, Faceted Search
•Detect similarities between film descriptions
•Data integration
•Leverage 3 years of experience in developing semantic
applications (data integration, NLP, semantic search)
• etc.
47. 01/30/15 47
Contact Info
George Roth – CEO Recognos Inc
Skype Id: grecognos
eMail: groth@recognos.com
WEB Site: www.recognos.com
Adonis Damian – Senior Semantic Application Architect
eMail: adonis@recognos.com