Here is the deck we shared with the SF and LA Semantic Web Meetups this past week (March, '09). It covers Calais 4.0 and its connection to the Linked Data cloud. Please join us at OpenCalais.com
2. Overview
• Going to discuss five basic topics
– What is Calais?
– Why we’re doing it & what our goals are
– How it works / What’s under the hood?
– A few examples
– Where it’s headed
3. Calais…
• Calais extracts smart metadata from unstructured
text and links that metadata to the Linked Data
cloud.
4. Calais progress to date
• Launched in late January, 2008
• 9,500 developers have joined
OpenCalais.com
• 1-3 million content ‘transactions’ per day
• Delivered four major update releases
• Free (as in free) for commercial or non-commercial use
5. [Diagram: the Calais workflow]
1. Unstructured text
2. Calais extracts entities, facts and events
3. Metadata returned to the user with keys
4. Keys provide access to the Calais Linked Data cloud
5. Which provides information and other Linked Data pointers
6. To a range of open and partner Linked Data assets, including Thomson Reuters
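The numbered steps in this diagram (text in, metadata with keys out, keys leading to further Linked Data pointers) can be sketched in a few lines of Python. This is only a toy illustration and does not call the Calais service: the lexicon, the lookup table, and the names `LEXICON`, `LINKED_DATA`, `extract_entities` and `follow_key` are all hypothetical. The IBM summary-page URL from the Quick Demo slide stands in for a key, and the DBpedia page stands in for a Linked Data pointer.

```python
import re
from dataclasses import dataclass

# Stand-in key: the IBM summary-page URL quoted in the Quick Demo slide.
IBM_KEY = (
    "http://d.opencalais.com/er/company/ralg-tr1r/"
    "9e3f6c34-aa6b-3a3b-b221-a07aa7933633.html"
)

# Hypothetical toy lexicon: surface form -> (entity type, key).
LEXICON = {"IBM": ("Company", IBM_KEY)}

# Hypothetical toy "cloud": key -> pointers into other datasets.
LINKED_DATA = {IBM_KEY: ["http://dbpedia.org/page/IBM"]}


@dataclass
class Entity:
    name: str
    type: str
    key: str


def extract_entities(text):
    """Steps 1-3: scan the text for known names and return keyed metadata."""
    found = []
    for name, (etype, key) in LEXICON.items():
        # Whole-word match only, so "IBM" does not match inside other words.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append(Entity(name, etype, key))
    return found


def follow_key(key):
    """Steps 4-5: use a key to look up further Linked Data pointers."""
    return LINKED_DATA.get(key, [])


entities = extract_entities("IBM reported quarterly earnings on Tuesday.")
for e in entities:
    print(e.type, e.name, follow_key(e.key))
```

The point of the sketch is the shape of the data: each extracted entity carries a key, and the key is what connects the document to everything else.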
6. Quick Demo
You can find the Calais Viewer demonstration tool here:
http://viewer.opencalais.com (Note that the Calais Viewer is not the
Calais service. It is merely a demonstration of how the service works.)
– Copy and paste the text of a business news article from AP, Dow Jones
or Reuters.com into the viewer, and press submit. The article is sent to
the Calais engine, which tags the content and returns it marked up.
– The tags appear on the left-hand rail, and you can click on the plus (+)
sign to expand them.
– Since we are now on Calais 4.0, you can also use the viewer to see the
Linked Data assets related to the tags Calais returns.
• Click on a company name on the left-hand rail to find a Calais summary page
featuring a basic description for that company, as well as a number of links.
• Follow those links to see the other data entries on that company that are
available for public use in the Linked Data Cloud.
– For example, here is the Calais summary page for IBM:
http://d.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633.html
– And here is the summary page for IBM in DBpedia (structured data
extracted from Wikipedia): http://dbpedia.org/page/IBM
7. Why & What
1. Derive semantic metadata from textual assets
2. Use that semantic metadata to create entry points into
the linked data ecosystem
3. Provide a simple mechanism for the sharing of semantic
metadata about textual content assets
4. And just why are you doing this…
8. 1: Semantics from Text: The Text Problem
• People consume text
• Most of it isn’t semantically enabled
• Most of it won’t be semantically
enabled
• This isn’t about standards –
microformats vs. RDFa vs.
whatever.
• Why: Latency, cost and short shelf
life
9. 1: Semantics from Text: The Text Problem
• Target areas where:
– The economics don’t support metadata creation
– The value of metadata is potentially high
– The value of aggregated metadata is potentially extremely high
[Chart: Shelf Life (Seconds to Years) against Latency (Seconds to Years), plotting Tweets, New Gen News, Legacy News, Scient. Pubs and Great Novels]
12. 3: Semantic Metadata Transport Layer
• I’m a content producer.
We’ve loaded the car
with rich semantic
metadata
– I’m sharing it within my
four walls
– How do I transport it to
my consumers?
– RSS / Atom, XML,
proprietary data feeds,
content APIs
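As one illustration of the transport question, semantic tags can ride along in a feed that consumers already read. The sketch below uses Atom's standard `category` element (with its `term` and `label` attributes) to carry a tag and its Linked Data key. It builds a fragment only, not a complete valid entry (a real Atom entry also needs `id` and `updated`), and the function name `build_entry`, the article title, and the choice of `term` for the key are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements without a namespace prefix.
ET.register_namespace("", ATOM_NS)


def build_entry(title, tags):
    """Build an Atom <entry> fragment whose <category> elements carry tags.

    `tags` is a list of (label, key_uri) pairs; the values are illustrative.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    for label, key_uri in tags:
        # term holds the machine-readable key, label the human-readable name.
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}category",
            {"term": key_uri, "label": label},
        )
    return entry


entry = build_entry(
    "IBM reports earnings",  # hypothetical article title
    [("IBM", "http://dbpedia.org/page/IBM")],
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A consumer that understands the keys gets the semantics; one that does not still sees an ordinary feed entry.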
13. 4: Why We’re Doing It
• Two simple answers:
– Hyper-evolution of capabilities – better, faster, stronger
– The walled garden content world
15. How it Works – Under the Hood of Calais
[Diagram: components of the Calais architecture]
• Calais Web Service
• ClearForest NLP Engine
• Rule Base
• Lexicons
• Stat Tools
• Disambig. Engine
• Reference Data Assets
• Metadata Management
• Document Level Metadata
• Entity Level Linked Data and …
• Output Formatting
• RDF
16. Where From Here?
• We’ve seen examples of first generation uses.
• Where does this go in the future?
• Beyond the document
– Social Resume analysis
– Museum Content Coalitions
– Knowledge Management Applications
– Investigative Journalism*
17. Investigative Journalism
[Diagram]
• FOIA Contract Documents → Calais Web Service → Company:Contract, Company:Affiliation
• News → Calais Web Service → Company:Person, FamilyRelation
• Both feed a Big Fuzzy Graph
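To make the "big fuzzy graph" idea concrete, here is a minimal sketch of merging relations from two sources into one graph and then asking how two nodes are connected. The relation types are the ones named on the slide; every company, person and contract name is made up, and `build_graph` and `find_path` are hypothetical helpers, not part of Calais.

```python
from collections import defaultdict, deque

# Hypothetical relations as (relation type, subject, object) tuples.
foia_relations = [
    ("Company:Contract", "Acme Corp", "Contract 001"),
    ("Company:Affiliation", "Acme Corp", "Beta Holdings"),
]
news_relations = [
    ("Company:Person", "Beta Holdings", "Jane Doe"),
    ("FamilyRelation", "Jane Doe", "John Doe"),
]


def build_graph(*relation_sets):
    """Merge relation lists into one undirected, labeled adjacency map."""
    graph = defaultdict(set)
    for relations in relation_sets:
        for rel_type, subj, obj in relations:
            graph[subj].add((obj, rel_type))
            graph[obj].add((subj, rel_type))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of nodes or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # sorted() keeps the traversal order deterministic.
        for neighbor, _rel_type in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(foia_relations, news_relations)
print(find_path(graph, "John Doe", "Contract 001"))
```

Neither source alone links the person to the contract; the connection only appears once the two sets of relations share a graph.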
18. What’s in the Pipeline?
• 2009 (this is a fuzzy list)
– Person disambiguation @ domain level?
– Other disambiguation
– Continued expansion of URIs (entities & events)
– Calais as hub
– Exposure of the IDE?
– User managed lexicons
– Languages
– Opt-in SPARQL Endpoint?
19. • www.opencalais.com
– Gallery – code and applications examples
– Forums
– Documentation
• Twitter @opencalais, Facebook Group