Apache Stanbol (Incubating) and the Web of Data Olivier Grisel, Nuxeo ogrisel@apache.org, 2011-11-11 11/7/11
My Background
  Olivier Grisel - R&D Engineer
  Nuxeo: Open Source ECM
  European project: IKS
  Stuff I do: Machine Learning, Natural Language Processing, All things data
Agenda
  The Web of Data: what, why, how?
  CMS integration demo
  Semantic Components in Stanbol
  Building models for Stanbol
The Web of Data What, Why, How?
 
“To a computer, then, the web is a flat, boring world devoid of meaning” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
The Web of Data – What?
The Web of Data – Why?
Decoding User Intents
  Next Generation User Interfaces
    Siri - conversational interface
    IBM DeepQA: Watson for Health Care
  Tell Google about your stuff
    Publish structured prediction of your products
    "3 bedrooms flat near Montmartre"
  Useful for non-public data as well
    Intranet query: "ApacheCon slides"
    Intranet query: "Xerox invoices"
    Intranet query: "Xerox salesperson email"
The Web of Data - How?
  RDF / Triple Stores / SPARQL
    Graph stores with dynamic schemas
    Strong interoperability
  JSON-LD
    Upgrade your JSON with scoped vocabularies
    Web / Mobile / JS developer friendly
  RDFa + schema.org & rNews
    Publish annotations in structured markup
    Vocabulary understood by Search Engines
HTML example
<p>
  My name is Manu Sporny and you can give me a ring via 1-800-555-0155.
  <img src="http://manu.sporny.org/images/manu.png" />
  I have a <a href="http://manu.sporny.org/">blog</a>.
</p>
RDFa example
<p vocab="http://schema.org/"
   prefix="foaf: http://xmlns.com/foaf/0.1/"
   about="#manu" typeof="Person">
  My name is <span property="name">Manu Sporny</span>
  and you can give me a ring via
  <span property="telephone">1-800-555-0155</span>.
  <img rel="image"
       src="http://manu.sporny.org/images/manu.png" />
  I have a <a rel="foaf:weblog"
              href="http://manu.sporny.org/">blog</a>.
</p>
JSON-LD example
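A minimal sketch of how the facts in the RDFa example might look in JSON-LD, built here as a Python dictionary and serialized with the standard library. This is an illustration using present-day JSON-LD keywords (`@context`, `@vocab`, `@id`, `@type`), not the content of the original slide; the function name `manu_jsonld` is hypothetical.

```python
import json


def manu_jsonld():
    """Return a JSON-LD document mirroring the RDFa example (assumed mapping)."""
    return {
        "@context": {
            # Default vocabulary and prefix, as in the RDFa vocab/prefix attributes.
            "@vocab": "http://schema.org/",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@id": "#manu",
        "@type": "Person",
        "name": "Manu Sporny",
        "telephone": "1-800-555-0155",
        # Links to other resources are expressed as objects carrying an @id.
        "image": {"@id": "http://manu.sporny.org/images/manu.png"},
        "foaf:weblog": {"@id": "http://manu.sporny.org/"},
    }


if __name__ == "__main__":
    print(json.dumps(manu_jsonld(), indent=2))
```

The result is plain JSON, so any JSON parser can consume it, while the `@context` tells a Linked Data consumer how to read the keys as vocabulary terms.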
[Image slides: 2007, 2008, 2009, 2010, 2011]
Bridging the Web of Data and my CMS
 
Apache Stanbol
  Enhancer
    Text analysis with Apache OpenNLP / Tika
  EntityHub / ContentHub
    Linked Data Indexing with Apache Solr
    Graph Storage with Apache Clerezza / Jena
  Reasoner / Rules
    Inference with Apache Jena & OWL API
  Components / HTTP Services
    OSGi with Apache Felix / JAX-RS with Jersey
 
 
 
 
RESTful is Beautiful
Minimalist HTTP Client
curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
     --data "John Smith was born in London." \
     http://stanbol.demo.nuxeo.com/engines
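The same request can be sketched with Python's standard library. The endpoint and headers are taken from the curl command; the function names are hypothetical, the demo server may no longer be reachable, and decoding the response as UTF-8 is an assumption.

```python
import urllib.request

# Endpoint taken from the curl command; availability is not guaranteed.
ENGINES_URL = "http://stanbol.demo.nuxeo.com/engines"


def build_enhance_request(text, url=ENGINES_URL):
    """Build (but do not send) the POST request that the curl command issues."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": "text/turtle", "Content-Type": "text/plain"},
        method="POST",
    )


def enhance(text, url=ENGINES_URL):
    """Send the text to the enhancer and return the response body as a string."""
    with urllib.request.urlopen(build_enhance_request(text, url)) as response:
        # Assumption: the server answers with UTF-8 encoded Turtle.
        return response.read().decode("utf-8")


if __name__ == "__main__":
    request = build_enhance_request("John Smith was born in London.")
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the sketch checkable without network access.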
 
 
[Diagram: Local IT infrastructure (LAN); Nuxeo DM addon; Apache Stanbol; Engine 1, Engine 2, Engine 3; DBpedia, Freebase, Geonames, LDAP]
Stanbol Enhancer
  Chain of Enhancement Engines
    Language Detection (Tika)
    Named Entity Detection (OpenNLP)
    Linked Data dereferencing (Solr)
    Refactoring / Translation (Jena)
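The chain idea can be illustrated with a toy pipeline in which each engine reads a shared content item and adds its own annotations. Everything below is a hypothetical stand-in: the stopword check replaces Tika, the capitalization regex replaces OpenNLP, and the dictionary lookup replaces the Solr index. It is not Stanbol's API.

```python
import re


def detect_language(item):
    # Toy stand-in for language detection: look for a few English stopwords.
    words = set(re.findall(r"[a-z]+", item["text"].lower()))
    item["language"] = "en" if words & {"the", "was", "in", "of", "and"} else "unknown"
    return item


def detect_entities(item):
    # Toy stand-in for named entity detection: runs of capitalized words.
    item["entities"] = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", item["text"])
    return item


def link_entities(item, index):
    # Toy stand-in for Linked Data dereferencing: exact label match in a dict.
    item["links"] = {e: index[e] for e in item["entities"] if e in index}
    return item


def run_chain(text, index):
    """Pass one content item through the engines in order."""
    item = {"text": text}
    engines = (detect_language, detect_entities, lambda i: link_entities(i, index))
    for engine in engines:
        item = engine(item)
    return item


if __name__ == "__main__":
    index = {"London": "http://dbpedia.org/resource/London"}
    print(run_chain("John Smith was born in London.", index))
```

Later engines depend on what earlier ones produced (linking needs the detected entities), which is why the order of the chain matters.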
Stanbol EntityHub
  Referenced Sites
    DBpedia
    Geonames
    (NY Times, MusicBrainz, ProductDB, UniProt...)
  Fast local offline indices (Solr)
    Batch indexing utilities for RDF dumps
    Multilingual fulltext search in labels & descriptions
  Vocabulary mapping / merging
Stanbol Reasoner
  RDFS / OWL Lite / OWL 2
  Consistency checks
    Cardinality checks: each person has 1 birth date
    Range constraints: birth dates are valid dates
  Materializing types / properties
    Types from subclass: Musician > Artist > Person
    Symmetric property: A worked with B
    Transitive property: A is located in B
  Query-time expansion / inference?
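Materialization can be sketched as forward chaining over a set of triples until nothing new is derived. The sketch below covers only the three cases listed on the slide (subclass typing, symmetric and transitive properties); the predicate names and data are hypothetical, and this is not how Jena or the OWL API implement inference.

```python
def materialize(triples, subclass_of, symmetric, transitive):
    """Forward-chain until no new triple can be derived (a fixed point)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            # Types from subclass: x is a Musician, so x is also an Artist.
            if p == "type" and o in subclass_of:
                new.add((s, "type", subclass_of[o]))
            # Symmetric property: A worked with B implies B worked with A.
            if p in symmetric:
                new.add((o, p, s))
            # Transitive property: A in B and B in C implies A in C.
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


if __name__ == "__main__":
    facts = materialize(
        {("miles", "type", "Musician"),
         ("a", "workedWith", "b"),
         ("montmartre", "locatedIn", "paris"),
         ("paris", "locatedIn", "france")},
        subclass_of={"Musician": "Artist", "Artist": "Person"},
        symmetric={"workedWith"},
        transitive={"locatedIn"},
    )
    for fact in sorted(facts):
        print(fact)
```

Materializing up front trades storage for simpler queries; the alternative raised on the slide is to expand at query time instead.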
Stanbol Rules
  Simple Prolog-like language
    uncleRule[
      has(<http://example.org/family.owl#hasParent>, ?x, ?z) .
      has(<http://example.org/family.owl#hasSibling>, ?z, ?y)
      -> has(<http://example.org/family.owl#hasUncle>, ?x, ?y)
    ]
  SPARQL CONSTRUCT or SWRL
    PREFIX family: <http://example.org/family.owl#>
    CONSTRUCT { ?x family:hasUncle ?y }
    WHERE { ?x family:hasParent ?z . ?z family:hasSibling ?y }
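What the uncle rule computes is a join of the hasParent and hasSibling triples on the shared variable ?z. A small sketch over plain Python tuples, using the property URIs from the rule; the function name and the sample individuals are hypothetical.

```python
FAMILY = "http://example.org/family.owl#"


def apply_uncle_rule(triples):
    """Derive hasUncle triples: ?x hasParent ?z and ?z hasSibling ?y => ?x hasUncle ?y."""
    # Index siblings by subject so the join on ?z is a dictionary lookup.
    siblings = {}
    for s, p, o in triples:
        if p == FAMILY + "hasSibling":
            siblings.setdefault(s, set()).add(o)
    derived = set()
    for s, p, o in triples:
        if p == FAMILY + "hasParent":
            for y in siblings.get(o, ()):
                derived.add((s, FAMILY + "hasUncle", y))
    return derived


if __name__ == "__main__":
    data = [
        ("alice", FAMILY + "hasParent", "bob"),
        ("bob", FAMILY + "hasSibling", "carl"),
    ]
    print(apply_uncle_rule(data))
```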
Online Demos
  Simple analyzer with small index: https://stanbol.demo.nuxeo.com
  All services deployed: http://dev.iks-project.eu:8081
Building Stanbol Enhancer models from Wikipedia with the Apache data tools
Universal Topic Classification
  Use Apache Lucene / Solr MoreLikeThis to perform a truncated nearest neighbors query in the TF-IDF vector space of Wikipedia
Universal Topic Classification
  Index text of all articles grouped by topic
  Solr MoreLikeThis query on new document
  DBpedia dumps provide:
    Text summaries for each article
    “subject” relationships between articles and topics
    “broader” / “narrower” SKOS hierarchy between topics
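The nearest-neighbor idea can be shown on a toy scale in pure Python: one TF-IDF vector per topic, then cosine similarity against the new document. This is a brute-force stand-in for illustration only; it does not reproduce Lucene's scoring or the truncation MoreLikeThis applies, and the topic texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(topic_texts):
    """topic_texts maps a topic name to the concatenated text of its articles."""
    counts = {topic: Counter(tokenize(text)) for topic, text in topic_texts.items()}
    n = len(counts)
    # Document frequency: in how many topics each term occurs.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = {t: {term: tf * idf[term] for term, tf in c.items()}
               for t, c in counts.items()}
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def classify(text, idf, vectors, k=3):
    """Return up to k topics whose vectors are closest to the text."""
    query = {term: tf * idf[term]
             for term, tf in Counter(tokenize(text)).items() if term in idf}
    scored = sorted(((cosine(query, v), t) for t, v in vectors.items()), reverse=True)
    return [t for score, t in scored[:k] if score > 0]


if __name__ == "__main__":
    idf, vectors = build_index({
        "Astronauts": "astronaut space nasa moon apollo mission",
        "Pensions": "pension retirement workers benefits strike",
        "Kidney": "renal kidney sodium transport hypertension",
    })
    print(classify("NASA sues former astronaut over moon mission camera", idf, vectors))
```

At Wikipedia scale the exhaustive comparison above is impractical, which is where an inverted index such as Lucene / Solr comes in.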
About the Data
  500k purely technical categories
    “People_with_missing_birth_place”, “Rivers_in_Romania”
  70k “semantically grounded” categories
  Paths to roots require both “technical” and “grounded” categories
  Scale:
    1.2M topic / topic links
    30M topic / article links
Some results (Wikinews)
  US children who celebrate Independence Day more likely to become Republicans, says Harvard study
    Fireworks
    Voting theory
    Republican Party (United States)
    Statistics
    Electoral systems
Some results (Wikinews)
  U.S. space agency NASA sues ex-astronaut
    American astronauts
    Aviation halls of fame
    Edwards Air Force Base
    Apollo program
    Exploration of the Moon
Some results (Wikinews)
  Hundreds of thousands of British public sector workers strike over planned pension changes
    Retirement in the United Kingdom
    United Kingdom pensions and benefits
    Pensions in the United Kingdom
    Labor disputes by country
    Labor disputes
Some results (PLoS One)
  Metabolic Programming during Lactation Stimulates Renal Na+ Transport in the Adult Offspring Due to an Early Impact on Local Angiotensin II Pathways
    Renal physiology
    Kidney
    Nephrology
    Hypertension
    Membrane biology
Wrap Up
  Web of Data brings Structured Context
    Frame to decode User Intention
  NLP + Entities & Topics indices
    to automate Content Enrichment
    to provide Disambiguation
Resources
  Documentation, svn, mailing list: http://incubator.apache.org/stanbol
  IKS project blog: http://blog.iks-project.eu
  Blog posts about Semantic ECM: http://blogs.nuxeo.com/dev/semantic/
Thank you for your attention!
  Olivier Grisel [email_address] https://twitter.com/ogrisel
Training models for NER from Wikipedia
  Extract sentences with link positions in Wikipedia articles
  DBpedia to find the type of the target entity (Person, Location, Organization)
  Apache Pig scripts to compute the join + format the result as training files for OpenNLP
  Apache OpenNLP to build and evaluate the models
  Apache Hadoop / Apache Whirr for distributed processing
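The formatting step can be sketched as follows: given a tokenized sentence and the token spans of its links together with the entity type looked up for each link target, emit one line in the `<START:type> ... <END>` markup used by the OpenNLP name finder training files. The function name and input layout are hypothetical, spans are assumed not to overlap, and the talk itself performs this step with Pig scripts rather than Python.

```python
def to_opennlp_line(tokens, spans):
    """Format one tokenized sentence as an OpenNLP name finder training line.

    spans is a list of (start, end, entity_type) token offsets, end exclusive.
    Assumption: spans do not overlap.
    """
    starts = {start: etype for start, end, etype in spans}
    ends = {end for start, end, etype in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the span that finished before this token
        if i in starts:
            out.append("<START:%s>" % starts[i])
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # a span that runs to the end of the sentence
    return " ".join(out)


if __name__ == "__main__":
    tokens = ["John", "Smith", "was", "born", "in", "London", "."]
    spans = [(0, 2, "person"), (5, 6, "location")]
    print(to_opennlp_line(tokens, spans))
```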
 
 
 
 

 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Kürzlich hochgeladen (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Apache Stanbol and the Web of Data - ApacheCon 2011

  • 1. Apache Stanbol (Incubating) and the Web of Data Olivier Grisel, Nuxeo ogrisel@apache.org, 2011-11-11 11/7/11
  • 2. My Background: Olivier Grisel, R&D Engineer at Nuxeo (Open Source ECM). European project: IKS. Stuff I do: Machine Learning, Natural Language Processing, all things data.
  • 3. Agenda: The Web of Data: what, why, how?; CMS integration demo; Semantic Components in Stanbol; Building models for Stanbol
  • 4. The Web of Data What, Why, How?
  • 5.  
  • 6. “To a computer, then, the web is a flat, boring world devoid of meaning” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 7. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 8. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 9. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 10.
  • 11.
  • 13. Decoding User Intents: Next Generation User Interfaces (Siri: conversational interface; IBM DeepQA: Watson for Health Care). Tell Google about your stuff (publish structured prediction of your products: "3 bedrooms flat near Montmartre"). Useful for non-public data as well (intranet queries: "ApacheCon slides", "Xerox invoices", "Xerox salesperson email").
  • 14. The Web of Data - How? RDF / Triple Stores / SPARQL: graph stores with dynamic schemas, strong interoperability. JSON-LD: upgrade your JSON with scoped vocabularies; Web / Mobile / JS developer friendly. RDFa + schema.org & rNews: publish annotations in structured markup; vocabulary understood by search engines.
  • 15. HTML example: <p> My name is Manu Sporny and you can give me a ring via 1-800-555-0155. <img src="http://manu.sporny.org/images/manu.png" /> I have a <a href="http://manu.sporny.org/">blog</a>. </p>
  • 16. RDFa example: <p vocab="http://schema.org/" prefix="foaf: http://xmlns.com/foaf/0.1/" about="#manu" typeof="Person"> My name is <span property="name">Manu Sporny</span> and you can give me a ring via <span property="telephone">1-800-555-0155</span>. <img rel="image" src="http://manu.sporny.org/images/manu.png" /> I have a <a rel="foaf:weblog" href="http://manu.sporny.org/">blog</a>. </p>
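
The "How?" slide also lists JSON-LD as a developer-friendly carrier for the same kind of statements. As a rough sketch only, the statements in the RDFa markup of slide 16 could be written as a JSON-LD document like the one built below with the Python standard library. The `@context` layout and the function name `person_as_jsonld` are illustrative assumptions, not something shown in the talk.

```python
import json


def person_as_jsonld():
    """Build a JSON-LD dict mirroring the RDFa example (illustrative sketch)."""
    return {
        "@context": {
            # Default vocabulary, like vocab="http://schema.org/" in the RDFa.
            "@vocab": "http://schema.org/",
            # Scoped prefix, like prefix="foaf: ..." in the RDFa.
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@id": "#manu",
        "@type": "Person",
        "name": "Manu Sporny",
        "telephone": "1-800-555-0155",
        # Values that are links are wrapped in {"@id": ...} nodes.
        "image": {"@id": "http://manu.sporny.org/images/manu.png"},
        "foaf:weblog": {"@id": "http://manu.sporny.org/"},
    }


doc = person_as_jsonld()
serialized = json.dumps(doc, indent=2)
```

Because the result is plain JSON, any web or mobile client can read `doc["name"]` directly, while a JSON-LD processor can expand the keys to full IRIs through the context.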
  • 18. 2007 2008 2009 2010
  • 19. 2011
  • 20. Bridging the Web of Data and my CMS
  • 21.  
  • 22. Apache Stanbol: Enhancer: text analysis with Apache OpenNLP / Tika. EntityHub / ContentHub: Linked Data indexing with Apache Solr; graph storage with Apache Clerezza / Jena. Reasoner / Rules: inference with Apache Jena & OWL API. Components / HTTP services: OSGi with Apache Felix / JAX-RS with Jersey.
  • 23.  
  • 24.  
  • 25.  
  • 26.  
  • 28. Minimalist HTTP Client: curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" --data "John Smith was born in London." http://stanbol.demo.nuxeo.com/engines
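
For readers who prefer a script to curl, the same request can be sketched with the Python standard library. The URL and headers are the ones on the slide; the helper names `build_enhance_request` and `enhance` are hypothetical, and the demo server may no longer be reachable, so only the request construction is exercised here.

```python
import urllib.request

# Endpoint taken from the slide; availability today is not guaranteed.
STANBOL_ENGINES_URL = "http://stanbol.demo.nuxeo.com/engines"


def build_enhance_request(text, url=STANBOL_ENGINES_URL):
    """Build the POST request from the slide without sending it."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": "text/turtle", "Content-type": "text/plain"},
        method="POST",
    )


def enhance(text, url=STANBOL_ENGINES_URL):
    """Send the text and return the response body (Turtle) as a string."""
    with urllib.request.urlopen(build_enhance_request(text, url)) as response:
        return response.read().decode("utf-8")


request = build_enhance_request("John Smith was born in London.")
```

Calling `enhance("John Smith was born in London.")` would perform the actual network call against whichever Stanbol instance `url` points to.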
  • 29.  
  • 30.  
  • 31. [Diagram: Local IT infrastructure (LAN) containing the Nuxeo DM addon and Apache Stanbol (Engine 1, Engine 2, Engine 3); data sources: DBpedia, Freebase, Geonames, LDAP]
  • 32. Stanbol Enhancer: Chain of Enhancement Engines: Language Detection (Tika); Named Entity Detection (OpenNLP); Linked Data dereferencing (Solr); Refactoring / Translation (Jena)
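
The chain-of-engines idea on this slide can be illustrated with a toy pipeline in which each engine adds annotations to a shared content item. This is a sketch of the pattern only: the functions below are simple regular-expression stand-ins, not the Tika, OpenNLP or Solr engines, and none of the names correspond to the Stanbol API.

```python
import re


def detect_language(item):
    # Toy stand-in for language detection: looks for a few English words.
    has_english = re.search(r"\b(the|was|in)\b", item["text"])
    item["language"] = "en" if has_english else "unknown"


def detect_entities(item):
    # Toy stand-in for named entity detection: runs of capitalized words.
    item["mentions"] = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", item["text"])


def link_entities(item, index):
    # Toy stand-in for Linked Data dereferencing: exact label lookup.
    item["links"] = {m: index[m] for m in item["mentions"] if m in index}


def run_chain(text, index):
    """Run the engines in order; each one enriches the same content item."""
    item = {"text": text}
    detect_language(item)
    detect_entities(item)
    link_entities(item, index)
    return item


# Hypothetical one-entry index standing in for a local entity index.
TOY_INDEX = {"London": "http://dbpedia.org/resource/London"}
result = run_chain("John Smith was born in London.", TOY_INDEX)
```

Later engines rely on what earlier ones produced (linking needs the detected mentions), which is why the order of the chain matters.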
  • 33. Stanbol EntityHub: Referenced Sites: DBpedia, Geonames (NY Times, MusicBrainz, ProductDB, UniProt...). Fast local offline indices (Solr). Batch indexing utilities for RDF dumps. Multilingual fulltext search in labels & descriptions. Vocabulary mapping / merging.
  • 34. Stanbol Reasoner: RDFS / OWL-lite / OWL2. Consistency checks: cardinality checks (each person has 1 birth date); range constraints (birth dates are valid dates). Materializing types / properties: types from subclass (Musician > Artist > Person); symmetric property (A worked with B); transitive property (A is located in B). Query-time expansion / inference?
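
To make "materializing types / properties" concrete, here is a small sketch in plain Python of two of the inferences listed on the slide: types implied by a subclass chain and the closure of a transitive property. It is a hand-rolled illustration, not how Jena or the OWL API implement reasoning, and the place names are invented example data.

```python
# Subclass chain from the slide: Musician > Artist > Person.
SUBCLASS_OF = {"Musician": "Artist", "Artist": "Person"}


def materialize_types(direct_type, subclass_of=SUBCLASS_OF):
    """Return the direct type followed by every inferred superclass."""
    types = [direct_type]
    # Stop at the root, or if the hierarchy accidentally loops.
    while types[-1] in subclass_of and subclass_of[types[-1]] not in types:
        types.append(subclass_of[types[-1]])
    return types


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until stable."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Example "is located in" facts (illustrative data).
located_in = {("Montmartre", "Paris"), ("Paris", "France")}
inferred = transitive_closure(located_in)
musician_types = materialize_types("Musician")
```

Materializing these facts at indexing time trades storage for simpler queries; the slide's closing question about query-time expansion is the opposite trade-off.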
  • 35. Stanbol Rules: Simple Prolog-like language: uncleRule[ has(<http://example.org/family.owl#hasParent>, ?x, ?z) . has(<http://example.org/family.owl#hasSibling>, ?z, ?y) -> has(<http://example.org/family.owl#hasUncle>, ?x, ?y) ] SPARQL CONSTRUCT or SWRL: PREFIX family: <http://example.org/family.owl#> CONSTRUCT { ?x family:hasUncle ?y } WHERE { ?x family:hasParent ?z . ?z family:hasSibling ?y }
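
What the uncle rule computes can be shown with a few lines of Python over an in-memory list of triples. This is only a sketch of the join expressed by the rule; it does not use the Stanbol Rules engine or a SPARQL processor, and the individuals in the sample data are made up.

```python
FAMILY = "http://example.org/family.owl#"


def apply_uncle_rule(triples):
    """Derive hasUncle triples: hasParent(x, z) and hasSibling(z, y) -> hasUncle(x, y)."""
    parents = [(s, o) for s, p, o in triples if p == FAMILY + "hasParent"]
    siblings = [(s, o) for s, p, o in triples if p == FAMILY + "hasSibling"]
    return {
        (x, FAMILY + "hasUncle", y)
        for x, z in parents
        for z2, y in siblings
        if z == z2  # join on the shared variable ?z
    }


# Hypothetical sample data.
facts = [
    ("alice", FAMILY + "hasParent", "bob"),
    ("bob", FAMILY + "hasSibling", "carl"),
    ("dave", FAMILY + "hasParent", "erin"),
]
derived = apply_uncle_rule(facts)
```

The comprehension mirrors the WHERE clause of the CONSTRUCT query: two triple patterns joined on `?z`, with the new triple built from `?x` and `?y`.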
  • 36. Online Demos: Simple analyzer with small index: https://stanbol.demo.nuxeo.com. All services deployed: http://dev.iks-project.eu:8081
  • 37. Building Stanbol Enhancer models from Wikipedia with the Apache data tools
  • 38. Universal Topic Classification: Use Apache Lucene / Solr MoreLikeThis to perform a truncated nearest neighbors query in the TF-IDF vector space of Wikipedia
  • 39. Universal Topic Classification: Index text of all articles grouped by topic. Solr MoreLikeThis query on new document. DBpedia dumps provide: text summaries for each article; “subject” relationships between articles and topics; “broader” / “narrower” SKOS hierarchy between topics.
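
The idea behind these two slides, a nearest-neighbor lookup in a TF-IDF vector space where each topic is represented by the text of its articles, can be sketched in pure Python. This is a simplified stand-in for a Solr MoreLikeThis query, not a reproduction of it: the weighting formula, the whitespace tokenizer and the three tiny topic texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_index(topic_texts):
    """Build one TF-IDF vector per topic from the text grouped under it."""
    docs = {topic: Counter(tokenize(text)) for topic, text in topic_texts.items()}
    n = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed inverse document frequency (one common variant).
    idf = {term: math.log((1 + n) / (1 + freq)) + 1.0 for term, freq in df.items()}
    vectors = {
        topic: {term: count * idf[term] for term, count in counts.items()}
        for topic, counts in docs.items()
    }
    return idf, vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_topics(text, idf, vectors, k=2):
    """Return the k topics whose vectors are closest to the new document."""
    counts = Counter(tokenize(text))
    query = {term: count * idf[term] for term, count in counts.items() if term in idf}
    ranked = sorted(vectors, key=lambda t: cosine(query, vectors[t]), reverse=True)
    return ranked[:k]


# Hypothetical miniature "Wikipedia": topic -> concatenated article text.
TOPIC_TEXTS = {
    "Nephrology": "kidney renal disease blood pressure",
    "Pensions": "pension retirement workers benefits",
    "Astronauts": "nasa astronaut space moon",
}
idf, vectors = build_index(TOPIC_TEXTS)
top = suggest_topics("workers strike over pension changes", idf, vectors, k=1)
```

At Wikipedia scale the ranking step is what the search engine's inverted index makes cheap: only topics sharing high-weight terms with the query need to be scored, which is the "truncated" part of the nearest-neighbor query.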
  • 40. About the Data: 500k purely technical categories (“People_with_missing_birth_place”, “Rivers_in_Romania”). 70k “semantically grounded” categories. Paths to roots require both “technical” and “grounded” categories. Scale: 1.2M topic / topic links; 30M topic / article links.
  • 41. Some results (Wikinews): “US children who celebrate Independence Day more likely to become Republicans, says Harvard study”. Topics: Fireworks; Voting theory; Republican Party (United States); Statistics; Electoral systems.
  • 42. Some results (Wikinews): “U.S. space agency NASA sues ex-astronaut”. Topics: American astronauts; Aviation halls of fame; Edwards Air Force Base; Apollo program; Exploration of the Moon.
  • 43. Some results (Wikinews): “Hundreds of thousands of British public sector workers strike over planned pension changes”. Topics: Retirement in the United Kingdom; United Kingdom pensions and benefits; Pensions in the United Kingdom; Labor disputes by country; Labor disputes.
  • 44. Some results (PLoS One): “Metabolic Programming during Lactation Stimulates Renal Na+ Transport in the Adult Offspring Due to an Early Impact on Local Angiotensin II Pathways”. Topics: Renal physiology; Kidney; Nephrology; Hypertension; Membrane biology.
  • 45. Wrap Up: Web of Data brings a Structured Context Frame to decode User Intention. NLP + Entities & Topics indices to automate Content Enrichment and to provide Disambiguation.
  • 46. Resources: Documentation, svn, mailing list: http://incubator.apache.org/stanbol. IKS project blog: http://blog.iks-project.eu. Blog posts about Semantic ECM: http://blogs.nuxeo.com/dev/semantic/
  • 47. Thank you for your attention! Olivier Grisel [email_address] https://twitter.com/ogrisel
  • 48. Training models for NER from Wikipedia: Extract sentences with link positions in Wikipedia articles. DBpedia to find the type of the target entity (Person, Location, Organization). Apache Pig scripts to compute the join + format the result as training files for OpenNLP. Apache OpenNLP to build and evaluate the models. Apache Hadoop / Apache Whirr for distributed processing.
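
The "format the result as training files for OpenNLP" step can be pictured with a short Python sketch that turns a tokenized sentence plus typed link spans into the `<START:type> ... <END>` annotation style used by the OpenNLP name finder trainer. The talk does this with Apache Pig at scale; the function name, the span representation and the sample sentence below are assumptions made for illustration.

```python
def to_opennlp_sentence(tokens, spans):
    """Format one sentence for OpenNLP name finder training.

    spans is a list of (start, end, entity_type) token offsets, end exclusive,
    assumed to be non-overlapping (for example, derived from link positions).
    """
    starts = {start: entity_type for start, _end, entity_type in spans}
    ends = {end for _start, end, _entity_type in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the span that stops before this token
        if i in starts:
            out.append("<START:%s>" % starts[i])
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # span running to the end of the sentence
    return " ".join(out)


tokens = ["John", "Smith", "was", "born", "in", "London", "."]
spans = [(0, 2, "person"), (5, 6, "location")]
line = to_opennlp_sentence(tokens, spans)
```

Each output line is one training sentence; the entity type would come from the DBpedia type of the link target, as described on the slide.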
  • 49.  
  • 50.  
  • 51.  
  • 52.