SlideShare ist ein Scribd-Unternehmen logo
1 von 4
Downloaden Sie, um offline zu lesen
1




                               Semantically enriching content
                                    using Open Calais
                                               Marius-Gabriel Butuc, MSc Student, UAIC



   Abstract—One of the key challenges of the Semantic Web
is how to go from today’s unstructured web to a web rich
with semantic information. Following the bottom up approach,
OpenCalais is an automated system intended to annotate data
and information, allowing us to gradually build semantically
enabled systems. By semantically enriching the published content,
we help users enjoy their on-line experiences and reduce the
frustration of dealing with voluminous amounts of information
that is incoherently organized and often irrelevant to a particular
person’s need.
   Index Terms—API, Linked Data, resource management, web
service.
                                                                             Fig. 1.   Overview of Calais applications and tools.


                         I. I NTRODUCTION                                    A. How does it work?
      ALAIS identifies and automatically tags entities (e.g.,
C     people, places, companies, geographies), facts (e.g., rela-
tionships) and events (e.g., things that happened) in text. It then
                                                                               The basic interaction delivered by the OpenCalais API
                                                                             (Figure 2) can be summed up in the following steps:
                                                                               1) First of all, we need an API key – a string that uniquely
fabricates connections between those entities and significant                       identifies the specific instances of the application that
data sets, media files, and similar entries using open data from                    uses the service.
sources like Wikipedia, DBpedia, GeoNames, the Internet                        2) After obtaining the key, we can submit content to the
Movie Database (IMDB), Shopping.com, Reuters.com, etc.                             Calais Web service. The methods supported by the
   With the original goal - help developers, bloggers and                          OpenCalais Web Service API are: SOAP, REST and
publishers automatically tag their content to improve search                       HTTP Traffic Compression.
and navigation, and enable new reader engagement features -                    3) Calais tags each person, place, fact and event in the
the team behind Calais has developed a tool to automate time                       content, making it machine-readable and interoperable
consuming content operations and increase productivity.                            on the Web.
   This paper will look into the possibilities of semantically                 4) Each piece of content – and each entity or event in that
enriching published content [1] using Calais.                                      content – is assigned a unique identifier (a document
                                                                                   ID and many URIs) that ties back to the Linked Data
                                                                                   Cloud.
     II. C ALAIS AND THE O PEN C ALAIS W EB S ERVICE
                                                                               5) We can use the metadata returned by Calais (tags,
   Calais is described by it’s developers as a ”big initiative                     document IDs and URIs) in our application, e.g., we can
with a lot of components” [2], yet at it’s core lays the                           use OpenCalais to scan a set of Web pages and build a
OpenCalais web service. The web service is an API that                             local RDF store for querying and display.
accepts unstructured text, processes it using Natural Language
Processing (NLP) and Machine Learning (ML) algorithms,                       B. LinkedData Entities
returns RDF-formatted entities, facts and events and is used
to fuel the applications that build the Calais initiative.                      As of March 2009, Calais is oficially part of the Linked
                                                                             Open Data (LOD) Cloud (Figure 3).
   An overview of the Calais applications and tools is depicted
                                                                                Linked Data is a paradigm of exposing, sharing, and con-
in Fig. 1
                                                                             necting data via dereferenceable URIs (Uniform Resource
   Recently nominated one of the Top 10 Semantic Web prod-
                                                                             Identifier) on the Web. The method includes four princi-
ucts of 2009 [3], OpenCalais uses semantic technology and
                                                                             ples [6]:
natural language processing to analyze text and add metadata
by drawing out entities from documents, blog posts, news                        1) Use URIs to identify things that you expose to the Web
stories, etc. In some cases [4], this type of data can identify                    as resources.
or help identify relationships between people, businesses, etc.                 2) Use HTTP URIs so that people/machines can locate and
                                                                                   dereference these things.
  Marius-Gabriel Butuc is with the Department of Computer Science, Alexan-      3) Provide useful information about the resource when its
dru Ioan Cuza University, Iasi, Romania, e-mail: mbutuc@info.uaic.ro
                            ,                                                      URI is dereferenced.
2



                                                                   there is one in the North of France and several in
                                                                   the USA. The URI of the ambiguous city “Calais” is:
                                                                   http://d.opencalais.com/genericHasher-1/
                                                                   82b41c38-0b5e-301f-8a84-839d4d78e78e.html
                                                                   If we try to dereference this URI, we’ll get disambiguation
                                                                   options, each leading to another resource known to the
                                                                   API that can disambiguate “Calais”. If we follow any of
                                                                   those resources, we will find useful information such as
                                                                   geographical coordinates and links to other Linked Data or
                                                                   Web assets.
                                                                      Currently the list of disambiguated entities is limited to
                                                                   Company, Product (Electronics) and Geographies:
                                                                   City, Country and ProvinceOrState. Company, for
                                                                   example, is the richest of endpoints providing rich informa-
                                                                   tion, such as ticker symbol, officers and directors, corporate
                                                                   website, industry codes, etc.

                                                                   C. Interpreting the API Response
Fig. 2.   OpenCalais web service
                                                                      The Calais response can be easily understood [7] from
                                                                   RDF, JSON, microformats [8], [9] – a web-based approach to
                                                                   semantic markup that seeks to re-use existing (X)HTML tags
                                                                   to convey metadata and other attributes – or a simple HTML
                                                                   format. Another example of the implementation of the LOD
                                                                   principles is given when we look into the RDF representation
                                                                   for Moscow, Russia. This entity is linked via owl:sameAs
                                                                   to
                                                                      • http://dbpedia.org/resource/Moscow
                                                                      • http://sws.geonames.org/524901/
                                                                      • http://rdf.freebase.co/ns/guid.
                                                                        9202a8c04000641f800000000002636c
                                                                    III. E XAMPLES OF APPLICATIONS THAT USE O PEN C ALAIS
                                                                   A. Calais Viewer
                                                                      Calais Viewer [10] is the best way to get a quick glimpse of
                                                                   OpenCalais. We can manually try the web service by typing
Fig. 3.   The Linked Open Data Cloud – July 2009 [5]
                                                                   in or pasting text into the viewer box, without needing an
                                                                   API key. For example, we used a Wikinews teaser about
                                                                   Barack Obama and got the results depicted in Figure 4. It
   4) Include links to other, related URIs in the exposed data     identifies the correct topic of the article – Politics – and
      as a means of improving information discovery on the         also Washington, United States as a City that the
      Web.                                                         article is about, and two Person entities: Barack Obama
   The Calais initiative has exposed it’s data via Linked Data     and George W. Bush.
endpoints. When Calais extracts an entity from a given text
it also returns a dereferenceable entity URI. All the naming       B. Tagaroo
conventions for entities, facts, events and their corresponding       As we saw in Figure 1, Calais provides Content Manage-
attributes are rationalized and the schema is published on the     ment Tools for Drupal and Wordpress. Tagaroo [11] is one
official site as RDFS.                                              of those tools aimed to make any WordPress blog better for
   There     are     several    classes    of    entities   that   the publisher, for the readers and more accessible to search
are considered meaningful to Calais and they                       engines. Similar to Zemanta [12], Tagaroo analyzes the text
include              Currency, MarketIndex, Movie,                 in our post and suggests intelligent tags for the things and
MusicGroup, MusicAlbum, OperatingSystem,                           events we’re writing about. Before using it, we need to have
ProgrammingLanguage and many others. The breadth                   a valid API key (Figure 5). The user interface is intuitive and
and depth of information provided in each URI varies               seamlessly integrated in WordPress. In order to semantically
based on how well Calais can unambiguously identify this           enrich our content, all we need to do is to start writing our
entity, automatically using it’s specific NLP algorithms.           post. After writing at least 64 characters, Tagaroo searches for
As an example, ”Calais” is an ambiguous name: we                   tags. We can add them to our post, ignore them, look up Flickr
have both Calais – the semantic tool and Calais – the              images for any of them as illustrated in Figure 6 or even add
city. Even as a city name Calais is ambigous, because              our own tags.
3



                                                                              C. SemanticProxy
                                                                                 Following the standard for publishing Linked Data on the
                                                                              Web [13], the Calais team developed SemanticProxy [14] to
                                                                              translate the content of any URL on the web to its semantic
                                                                              representation in RDF, JSON, microformats or HTML (Fig-
                                                                              ure 8). It is optimized for performance on 30 of the world’s
                                                                              largest English-language news sites, but it also works well with
                                                                              other sites like in Figure 7. Intended for the “little semantic
                                                                              machines love to come by and spend a few high-quality
                                                                              milliseconds” [14], it is meant to leverage both current and
                                                                              future semantic applications.




Fig. 4.   Calais Viewer in action: semantically enriching a Wikinews teaser




                                                                              Fig. 7.   SemanticProxy: Interface for the human user




Fig. 5.   Installing Tagaroo




                                                                              Fig. 8.   SemanticProxy in action: semantically enriched HTML results




Fig. 6.   Tagaroo in action: semantically enriching our content as we type
4



                      IV. C ONCLUSION                                                           R EFERENCES
   OpenCalais currently only supports content in English,            [1] D. Allemang and J. Hendler, Semantic Web for the Working Ontologist:
French and Spanish and it tries to identify automatically                Effective Modeling in RDFS and OWL.
                                                                         Morgan Kaufmann, 2008.
among the given languages by applying a Language Identifica-          [2] K. Thomas, “Simple OpenCalais Whitepaper,”
tion module before processing the text for entities, events and          http://slidesha.re/qoPWP, 2009.
facts. Supporting more languages could be a good direction           [3] R. MacManus, “Top 10 Semantic Web Products of 2009,”
                                                                         http://www.readwriteweb.com/archives/top 10 semantic web
for development, opening Calais even more. The web service               products of 2009.php, 2009.
is free for both commercial and non-commercial use. The              [4] T. Segaran, C. Evans, and J. Taylor, Programming the Semantic Web.
current limitations are 50,000 transactions per day with a               O’Reilly Media, 2009.
                                                                     [5] ***, “Linked Data - Connect Distributed Data across the web,”
frequency of maximum 4 transactions per second while an                  http://linkeddata.org/.
average transaction is expected to take from 0.5 to 1 second.        [6] S. Buraga, “Arhitectura aplicatiilor Web-ului semantic. Linked Data,”
                                                                                                         ,
   Using a semantic-based tool, the users are able to easily             Alexandru Ioan Cuza University Iasi, Tech. Rep., 2009,
                                                                                                              ,
                                                                         http://profs.info.uaic.ro/∼busaco/teach/courses/wade/presentations/
automate content operations, increase productivity and cut               web08ArhitecturaAplicatiilorWebSemantic-LinkedData.pdf.
costs. Since the Semantic Web goal is not to build a web             [7] ***, “Understanding the OpenCalais RDF Response,”
of data, but to enrich lives through access to information [15],         http://blog.3kbo.com/2009/09/26/opencalais-response/.
                                                                     [8] ——, “Microformats,” http://microformats.org/.
we can now approach the bottom up paradigm [16], enhancing           [9] J. Allsopp, Microformats: Empowering Your Markup for Web 2.0.
the value of our content and improving the user experience.              Friends of Ed, 2007.
   As stated in Section II-A, we can use OpenCalais, or even        [10] ***, “Calais viewer,” http://viewer.opencalais.com/.
                                                                    [11] ——, “Tagaroo – make blogging better!” http://tagaroo.opencalais.com/.
better SemanticProxy, to scan a set of Web pages and build a        [12] ——, “Zemanta: Blog smarter,” http://www.zemanta.com/.
local RDF store for querying and display.                           [13] C. Bizer, R. Cyganiak, and T. Heath, “How to Publish Linked Data on
   Another application could consist in a visualization tool that                                   a
                                                                         the Web,” Freie Universit¨t Berlin, Tech. Rep., 2007,
                                                                         http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/.
could make OpenCalais even more powerful. For example it            [14] ***, “Semanticproxy,” http://semanticproxy.com/.
might be interesting for visualization tools like Muckety [17]      [15] I. Davis, “If you love something... set it free,”
or NNDB Mapper [18] and to quickly ”see” relationships that              http://slidesha.re/haCax, 2009, at Code4Lib.
                                                                    [16] A. Iskold, “Semantic Web Patterns: A Guide to Semantic Technologies,”
might go unnoticed without tools like OpenCalais.                        http://www.readwriteweb.com/archives/semantic web patterns.php,
   We intend to experiment the use of OpenCalais for enhanc-             2008.
ing the content of the academic social Web applications (e.g.,      [17] ***, “Muckety - Mapping relations and measuring influence,”
                                                                         http://muckety.com/.
blogs, wikis) in the context of the semantic e-learning.            [18] ——, “NNDB Mapper: Tracking the entire world,”
                                                                         http://mapper.nndb.com/.

Weitere ähnliche Inhalte

Was ist angesagt?

The Semantic Web
The Semantic WebThe Semantic Web
The Semantic WebAnil Mishra
 
Semantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebSemantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebEditor IJCATR
 
IRJET- Semantic Web Mining and Semantic Search Engine: A Review
IRJET- Semantic Web Mining and Semantic Search Engine: A ReviewIRJET- Semantic Web Mining and Semantic Search Engine: A Review
IRJET- Semantic Web Mining and Semantic Search Engine: A ReviewIRJET Journal
 
Mapping Corporate Networks With OpenCorporates
Mapping Corporate Networks With OpenCorporatesMapping Corporate Networks With OpenCorporates
Mapping Corporate Networks With OpenCorporatesTony Hirst
 
IP, OA and DAMs! Oh My! (Speakers Notes)
IP, OA and DAMs! Oh My! (Speakers Notes)IP, OA and DAMs! Oh My! (Speakers Notes)
IP, OA and DAMs! Oh My! (Speakers Notes)John ffrench
 

Was ist angesagt? (7)

The Semantic Web
The Semantic WebThe Semantic Web
The Semantic Web
 
Semantic we bnext
Semantic we bnextSemantic we bnext
Semantic we bnext
 
Semantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebSemantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic Web
 
IRJET- Semantic Web Mining and Semantic Search Engine: A Review
IRJET- Semantic Web Mining and Semantic Search Engine: A ReviewIRJET- Semantic Web Mining and Semantic Search Engine: A Review
IRJET- Semantic Web Mining and Semantic Search Engine: A Review
 
Mapping Corporate Networks With OpenCorporates
Mapping Corporate Networks With OpenCorporatesMapping Corporate Networks With OpenCorporates
Mapping Corporate Networks With OpenCorporates
 
Paper9
Paper9Paper9
Paper9
 
IP, OA and DAMs! Oh My! (Speakers Notes)
IP, OA and DAMs! Oh My! (Speakers Notes)IP, OA and DAMs! Oh My! (Speakers Notes)
IP, OA and DAMs! Oh My! (Speakers Notes)
 

Ähnlich wie Semantically enriching content using OpenCalais

OpenCalais in Linked Data context
OpenCalais in Linked Data contextOpenCalais in Linked Data context
OpenCalais in Linked Data contexteldorina
 
OpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry CouncilOpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry CouncilKrista Thomas
 
The Social Semantic Web
The Social Semantic WebThe Social Semantic Web
The Social Semantic WebJohn Breslin
 
An imperative focus on semantic
An imperative focus on semanticAn imperative focus on semantic
An imperative focus on semanticijasa
 
Security-Challenges-in-Implementing-Semantic-Web-Unifying-Logic
Security-Challenges-in-Implementing-Semantic-Web-Unifying-LogicSecurity-Challenges-in-Implementing-Semantic-Web-Unifying-Logic
Security-Challenges-in-Implementing-Semantic-Web-Unifying-LogicNana Kwame(Emeritus) Gyamfi
 
Semantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information SpacesSemantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information SpacesJohn Breslin
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Cataloguesdavid-read
 
semantic web tech.ppt
semantic web tech.pptsemantic web tech.ppt
semantic web tech.pptNaglaaFathy42
 
Providing geospatial information as Linked Open Data
Providing geospatial information as Linked Open DataProviding geospatial information as Linked Open Data
Providing geospatial information as Linked Open DataPat Kenny
 
Web of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case StudiesWeb of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case StudiesSabin Buraga
 
Linked dataresearch
Linked dataresearchLinked dataresearch
Linked dataresearchTope Omitola
 
Discovering Resume Information using linked data  
Discovering Resume Information using linked data  Discovering Resume Information using linked data  
Discovering Resume Information using linked data  dannyijwest
 
Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database  Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database dannyijwest
 

Ähnlich wie Semantically enriching content using OpenCalais (20)

OpenCalais in Linked Data context
OpenCalais in Linked Data contextOpenCalais in Linked Data context
OpenCalais in Linked Data context
 
San diego
San diegoSan diego
San diego
 
San diego
San diegoSan diego
San diego
 
OpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry CouncilOpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry Council
 
The Social Semantic Web
The Social Semantic WebThe Social Semantic Web
The Social Semantic Web
 
Metadata and Scotland’s information environment: potential benefits of Web 2.0
Metadata and Scotland’s information environment: potential benefits of Web 2.0Metadata and Scotland’s information environment: potential benefits of Web 2.0
Metadata and Scotland’s information environment: potential benefits of Web 2.0
 
An imperative focus on semantic
An imperative focus on semanticAn imperative focus on semantic
An imperative focus on semantic
 
Security-Challenges-in-Implementing-Semantic-Web-Unifying-Logic
Security-Challenges-in-Implementing-Semantic-Web-Unifying-LogicSecurity-Challenges-in-Implementing-Semantic-Web-Unifying-Logic
Security-Challenges-in-Implementing-Semantic-Web-Unifying-Logic
 
Semantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information SpacesSemantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information Spaces
 
Edu.03 assignment
Edu.03 assignment Edu.03 assignment
Edu.03 assignment
 
Edu.03
Edu.03 Edu.03
Edu.03
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Catalogues
 
semantic web tech.ppt
semantic web tech.pptsemantic web tech.ppt
semantic web tech.ppt
 
Sup documentation
Sup documentationSup documentation
Sup documentation
 
Providing geospatial information as Linked Open Data
Providing geospatial information as Linked Open DataProviding geospatial information as Linked Open Data
Providing geospatial information as Linked Open Data
 
Web of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case StudiesWeb of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case Studies
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
Linked dataresearch
Linked dataresearchLinked dataresearch
Linked dataresearch
 
Discovering Resume Information using linked data  
Discovering Resume Information using linked data  Discovering Resume Information using linked data  
Discovering Resume Information using linked data  
 
Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database  Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database
 

Mehr von Marius Butuc

Modern PHP RDF toolkits: a comparative study
Modern PHP RDF toolkits: a comparative studyModern PHP RDF toolkits: a comparative study
Modern PHP RDF toolkits: a comparative studyMarius Butuc
 
IHM et Genie Logiciel: Plasticite
IHM et Genie Logiciel: PlasticiteIHM et Genie Logiciel: Plasticite
IHM et Genie Logiciel: PlasticiteMarius Butuc
 
Interfaces adaptatives. Agents adaptatifs.
Interfaces adaptatives. Agents adaptatifs.Interfaces adaptatives. Agents adaptatifs.
Interfaces adaptatives. Agents adaptatifs.Marius Butuc
 
Next HR - Interviul
Next HR - InterviulNext HR - Interviul
Next HR - InterviulMarius Butuc
 
Next HR - Competentele
Next HR - CompetenteleNext HR - Competentele
Next HR - CompetenteleMarius Butuc
 
Rochi2008 Microwler
Rochi2008 MicrowlerRochi2008 Microwler
Rochi2008 MicrowlerMarius Butuc
 
How to fit 1000 words into an image?
How to fit 1000 words into an image?How to fit 1000 words into an image?
How to fit 1000 words into an image?Marius Butuc
 

Mehr von Marius Butuc (8)

Modern PHP RDF toolkits: a comparative study
Modern PHP RDF toolkits: a comparative studyModern PHP RDF toolkits: a comparative study
Modern PHP RDF toolkits: a comparative study
 
IHM et Genie Logiciel: Plasticite
IHM et Genie Logiciel: PlasticiteIHM et Genie Logiciel: Plasticite
IHM et Genie Logiciel: Plasticite
 
Interfaces adaptatives. Agents adaptatifs.
Interfaces adaptatives. Agents adaptatifs.Interfaces adaptatives. Agents adaptatifs.
Interfaces adaptatives. Agents adaptatifs.
 
Next HR - Interviul
Next HR - InterviulNext HR - Interviul
Next HR - Interviul
 
Next HR - Competentele
Next HR - CompetenteleNext HR - Competentele
Next HR - Competentele
 
Rochi2008 Microwler
Rochi2008 MicrowlerRochi2008 Microwler
Rochi2008 Microwler
 
How to fit 1000 words into an image?
How to fit 1000 words into an image?How to fit 1000 words into an image?
How to fit 1000 words into an image?
 
XHTML 2.0
XHTML 2.0XHTML 2.0
XHTML 2.0
 

Kürzlich hochgeladen

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 

Kürzlich hochgeladen (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 

Semantically enriching content using OpenCalais

  • 1. 1 Semantically enriching content using Open Calais Marius-Gabriel Butuc, MSc Student, UAIC Abstract—One of the key challenges of the Semantic Web is how to go from today’s unstructured web to a web rich with semantic information. Following the bottom up approach, OpenCalais is an automated system intended to annotate data and information, allowing us to gradually build semantically enabled systems. By semantically enriching the published content, we help users enjoy their on-line experiences and reduce the frustration of dealing with voluminous amounts of information that is incoherently organized and often irrelevant to a particular person’s need. Index Terms—API, Linked Data, resource management, web service. Fig. 1. Overview of Calais applications and tools. I. I NTRODUCTION A. How does it work? ALAIS identifies and automatically tags entities (e.g., C people, places, companies, geographies), facts (e.g., rela- tionships) and events (e.g., things that happened) in text. It then The basic interaction delivered by the OpenCalais API (Figure 2) can be summed up in the following steps: 1) First of all, we need an API key – a string that uniquely fabricates connections between those entities and significant identifies the specific instances of the application that data sets, media files, and similar entries using open data from uses the service. sources like Wikipedia, DBpedia, GeoNames, the Internet 2) After obtaining the key, we can submit content to the Movie Database (IMDB), Shopping.com, Reuters.com, etc. Calais Web service. The methods supported by the With the original goal - help developers, bloggers and OpenCalais Web Service API are: SOAP, REST and publishers automatically tag their content to improve search HTTP Traffic Compression. and navigation, and enable new reader engagement features - 3) Calais tags each person, place, fact and event in the the team behind Calais has developed a tool to automate time content, making it machine-readable and interoperable consuming content operations and increase productivity. on the Web. This paper will look into the possibilities of semantically 4) Each piece of content – and each entity or event in that enriching published content [1] using Calais. content – is assigned a unique identifier (a document ID and many URIs) that ties back to the Linked Data Cloud. II. C ALAIS AND THE O PEN C ALAIS W EB S ERVICE 5) We can use the metadata returned by Calais (tags, Calais is described by it’s developers as a ”big initiative document IDs and URIs) in our application, e.g., we can with a lot of components” [2], yet at it’s core lays the use OpenCalais to scan a set of Web pages and build a OpenCalais web service. The web service is an API that local RDF store for querying and display. accepts unstructured text, processes it using Natural Language Processing (NLP) and Machine Learning (ML) algorithms, B. LinkedData Entities returns RDF-formatted entities, facts and events and is used to fuel the applications that build the Calais initiative. As of March 2009, Calais is oficially part of the Linked Open Data (LOD) Cloud (Figure 3). An overview of the Calais applications and tools is depicted Linked Data is a paradigm of exposing, sharing, and con- in Fig. 1 necting data via dereferenceable URIs (Uniform Resource Recently nominated one of the Top 10 Semantic Web prod- Identifier) on the Web. The method includes four princi- ucts of 2009 [3], OpenCalais uses semantic technology and ples [6]: natural language processing to analyze text and add metadata by drawing out entities from documents, blog posts, news 1) Use URIs to identify things that you expose to the Web stories, etc. In some cases [4], this type of data can identify as resources. or help identify relationships between people, businesses, etc. 2) Use HTTP URIs so that people/machines can locate and dereference these things. Marius-Gabriel Butuc is with the Department of Computer Science, Alexan- 3) Provide useful information about the resource when its dru Ioan Cuza University, Iasi, Romania, e-mail: mbutuc@info.uaic.ro , URI is dereferenced.
  • 2. 2 there is one in the North of France and several in the USA. The URI of the ambiguous city “Calais” is: http://d.opencalais.com/genericHasher-1/ 82b41c38-0b5e-301f-8a84-839d4d78e78e.html If we try to dereference this URI, we’ll get disambiguation options, each leading to another resource known to the API that can disambiguate “Calais”. If we follow any of those resources, we will find useful information such as geographical coordinates and links to other Linked Data or Web assets. Currently the list of disambiguated entities is limited to Company, Product (Electronics) and Geographies: City, Country and ProvinceOrState. Company, for example, is the richest of endpoints providing rich informa- tion, such as ticker symbol, officers and directors, corporate website, industry codes, etc. C. Interpreting the API Response Fig. 2. OpenCalais web service The Calais response can be easily understood [7] from RDF, JSON, microformats [8], [9] – a web-based approach to semantic markup that seeks to re-use existing (X)HTML tags to convey metadata and other attributes – or a simple HTML format. Another example of the implementation of the LOD principles is given when we look into the RDF representation for Moscow, Russia. This entity is linked via owl:sameAs to • http://dbpedia.org/resource/Moscow • http://sws.geonames.org/524901/ • http://rdf.freebase.co/ns/guid. 9202a8c04000641f800000000002636c III. E XAMPLES OF APPLICATIONS THAT USE O PEN C ALAIS A. Calais Viewer Calais Viewer [10] is the best way to get a quick glimpse of OpenCalais. We can manually try the web service by typing Fig. 3. The Linked Open Data Cloud – July 2009 [5] in or pasting text into the viewer box, without needing an API key. For example, we used a Wikinews teaser about Barack Obama and got the results depicted in Figure 4. It 4) Include links to other, related URIs in the exposed data identifies the correct topic of the article – Politics – and as a means of improving information discovery on the also Washington, United States as a City that the Web. article is about, and two Person entities: Barack Obama The Calais initiative has exposed it’s data via Linked Data and George W. Bush. endpoints. When Calais extracts an entity from a given text it also returns a dereferenceable entity URI. All the naming B. Tagaroo conventions for entities, facts, events and their corresponding As we saw in Figure 1, Calais provides Content Manage- attributes are rationalized and the schema is published on the ment Tools for Drupal and Wordpress. Tagaroo [11] is one official site as RDFS. of those tools aimed to make any WordPress blog better for There are several classes of entities that the publisher, for the readers and more accessible to search are considered meaningful to Calais and they engines. Similar to Zemanta [12], Tagaroo analyzes the text include Currency, MarketIndex, Movie, in our post and suggests intelligent tags for the things and MusicGroup, MusicAlbum, OperatingSystem, events we’re writing about. Before using it, we need to have ProgrammingLanguage and many others. The breadth a valid API key (Figure 5). The user interface is intuitive and and depth of information provided in each URI varies seamlessly integrated in WordPress. In order to semantically based on how well Calais can unambiguously identify this enrich our content, all we need to do is to start writing our entity, automatically using it’s specific NLP algorithms. post. After writing at least 64 characters, Tagaroo searches for As an example, ”Calais” is an ambiguous name: we tags. We can add them to our post, ignore them, look up Flickr have both Calais – the semantic tool and Calais – the images for any of them as illustrated in Figure 6 or even add city. Even as a city name Calais is ambigous, because our own tags.
  • 3. 3 C. SemanticProxy Following the standard for publishing Linked Data on the Web [13], the Calais team developed SemanticProxy [14] to translate the content of any URL on the web to its semantic representation in RDF, JSON, microformats or HTML (Fig- ure 8). It is optimized for performance on 30 of the world’s largest English-language news sites, but it also works well with other sites like in Figure 7. Intended for the “little semantic machines love to come by and spend a few high-quality milliseconds” [14], it is meant to leverage both current and future semantic applications. Fig. 4. Calais Viewer in action: semantically enriching a Wikinews teaser Fig. 7. SemanticProxy: Interface for the human user Fig. 5. Installing Tagaroo Fig. 8. SemanticProxy in action: semantically enriched HTML results Fig. 6. Tagaroo in action: semantically enriching our content as we type
  • 4. 4 IV. C ONCLUSION R EFERENCES OpenCalais currently only supports content in English, [1] D. Allemang and J. Hendler, Semantic Web for the Working Ontologist: French and Spanish and it tries to identify automatically Effective Modeling in RDFS and OWL. Morgan Kaufmann, 2008. among the given languages by applying a Language Identifica- [2] K. Thomas, “Simple OpenCalais Whitepaper,” tion module before processing the text for entities, events and http://slidesha.re/qoPWP, 2009. facts. Supporting more languages could be a good direction [3] R. MacManus, “Top 10 Semantic Web Products of 2009,” http://www.readwriteweb.com/archives/top 10 semantic web for development, opening Calais even more. The web service products of 2009.php, 2009. is free for both commercial and non-commercial use. The [4] T. Segaran, C. Evans, and J. Taylor, Programming the Semantic Web. current limitations are 50,000 transactions per day with a O’Reilly Media, 2009. [5] ***, “Linked Data - Connect Distributed Data across the web,” frequency of maximum 4 transactions per second while an http://linkeddata.org/. average transaction is expected to take from 0.5 to 1 second. [6] S. Buraga, “Arhitectura aplicatiilor Web-ului semantic. Linked Data,” , Using a semantic-based tool, the users are able to easily Alexandru Ioan Cuza University Iasi, Tech. Rep., 2009, , http://profs.info.uaic.ro/∼busaco/teach/courses/wade/presentations/ automate content operations, increase productivity and cut web08ArhitecturaAplicatiilorWebSemantic-LinkedData.pdf. costs. Since the Semantic Web goal is not to build a web [7] ***, “Understanding the OpenCalais RDF Response,” of data, but to enrich lives through access to information [15], http://blog.3kbo.com/2009/09/26/opencalais-response/. [8] ——, “Microformats,” http://microformats.org/. we can now approach the bottom up paradigm [16], enhancing [9] J. Allsopp, Microformats: Empowering Your Markup for Web 2.0. the value of our content and improving the user experience. Friends of Ed, 2007. As stated in Section II-A, we can use OpenCalais, or even [10] ***, “Calais viewer,” http://viewer.opencalais.com/. [11] ——, “Tagaroo – make blogging better!” http://tagaroo.opencalais.com/. better SemanticProxy, to scan a set of Web pages and build a [12] ——, “Zemanta: Blog smarter,” http://www.zemanta.com/. local RDF store for querying and display. [13] C. Bizer, R. Cyganiak, and T. Heath, “How to Publish Linked Data on Another application could consist in a visualization tool that a the Web,” Freie Universit¨t Berlin, Tech. Rep., 2007, http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/. could make OpenCalais even more powerful. For example it [14] ***, “Semanticproxy,” http://semanticproxy.com/. might be interesting for visualization tools like Muckety [17] [15] I. Davis, “If you love something... set it free,” or NNDB Mapper [18] and to quickly ”see” relationships that http://slidesha.re/haCax, 2009, at Code4Lib. [16] A. Iskold, “Semantic Web Patterns: A Guide to Semantic Technologies,” might go unnoticed without tools like OpenCalais. http://www.readwriteweb.com/archives/semantic web patterns.php, We intend to experiment the use of OpenCalais for enhanc- 2008. ing the content of the academic social Web applications (e.g., [17] ***, “Muckety - Mapping relations and measuring influence,” http://muckety.com/. blogs, wikis) in the context of the semantic e-learning. [18] ——, “NNDB Mapper: Tracking the entire world,” http://mapper.nndb.com/.