Apache Stanbol (Incubating) and the Web of Data Olivier Grisel, Nuxeo ogrisel@apache.org, 2011-11-11 11/7/11
My Background
Olivier Grisel - R&D Engineer
Nuxeo: Open Source ECM
European project: IKS
Stuff I do: Machine Learning, Natural Language Processing, all things data
Agenda
The Web of Data: what, why, how?
CMS integration demo
Semantic Components in Stanbol
Building models for Stanbol
The Web of Data What, Why, How?
 
“To a computer, then, the web is a flat, boring world devoid of meaning” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
The Web of Data – What?
The Web of Data – Why?
Decoding User Intents
Next Generation User Interfaces: Siri - conversational interface; IBM DeepQA: Watson for Health Care
Tell Google about your stuff: publish structured descriptions of your products; "3 bedrooms flat near Montmartre"
Useful for non-public data as well: intranet queries such as "ApacheCon slides", "Xerox invoices", "Xerox salesperson email"
The Web of Data - How?
RDF / triple stores / SPARQL: graph stores with dynamic schemas; strong interoperability
JSON-LD: upgrade your JSON with scoped vocabularies; Web / mobile / JS developer friendly
RDFa + schema.org & rNews: publish annotations in structured markup; vocabulary understood by search engines
HTML example
<p>
  My name is Manu Sporny and you can give me a ring via
  1-800-555-0155.
  <img src="http://manu.sporny.org/images/manu.png" />
  I have a <a href="http://manu.sporny.org/">blog</a>.
</p>
RDFa example
<p vocab="http://schema.org/"
   prefix="foaf: http://xmlns.com/foaf/0.1/"
   about="#manu" typeof="Person">
  My name is <span property="name">Manu Sporny</span>
  and you can give me a ring via
  <span property="telephone">1-800-555-0155</span>.
  <img rel="image"
       src="http://manu.sporny.org/images/manu.png" />
  I have a <a rel="foaf:weblog"
     href="http://manu.sporny.org/">blog</a>.
</p>
JSON-LD example
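The JSON-LD slide carried no extractable text. As a rough illustration only, the sketch below expresses the same person as the RDFa example in present-day JSON-LD syntax, built as a Python dictionary and serialized with the standard library. The choice of `@vocab`, the `#manu` identifier and the property layout are assumptions that mirror the RDFa attributes; this is not the document shown in the talk.

```python
import json

# Hypothetical JSON-LD rendering of the person from the RDFa slide.
# The keys mirror the RDFa attributes (vocab, prefix, about, typeof);
# the original slide content is not available.
person = {
    "@context": {
        "@vocab": "http://schema.org/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "#manu",
    "@type": "Person",
    "name": "Manu Sporny",
    "telephone": "1-800-555-0155",
    "image": {"@id": "http://manu.sporny.org/images/manu.png"},
    "foaf:weblog": {"@id": "http://manu.sporny.org/"},
}

# Plain JSON tooling is enough to produce and consume the document.
serialized = json.dumps(person, indent=2)
print(serialized)
```

Because the result is ordinary JSON, a client that ignores the `@`-prefixed keys can still read `name` and `telephone` directly.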
[Image slides: 2007, 2008, 2009, 2010, 2011]
Bridging the Web of Data and my CMS
 
Apache Stanbol
Enhancer: text analysis with Apache OpenNLP / Tika
EntityHub / ContentHub: Linked Data indexing with Apache Solr; graph storage with Apache Clerezza / Jena
Reasoner / Rules: inference with Apache Jena & OWL API
Components / HTTP Services: OSGi with Apache Felix / JAX-RS with Jersey
 
 
 
 
RESTful is Beautiful
Minimalist HTTP Client
curl -X POST -H "Accept: text/turtle" \
     -H "Content-type: text/plain" \
     --data "John Smith was born in London." \
     http://stanbol.demo.nuxeo.com/engines
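The same call can be sketched with Python's standard library. The function names `build_enhancer_request` and `enhance` are hypothetical helpers introduced here; the endpoint is the one from the slide and may no longer be reachable, so the example only builds the request and leaves sending it to the caller.

```python
from urllib.request import Request, urlopen

# Endpoint taken from the slide; the demo server may no longer be online.
ENGINES_URL = "http://stanbol.demo.nuxeo.com/engines"


def build_enhancer_request(text, url=ENGINES_URL):
    """Build the POST request from the curl example without sending it."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": "text/turtle", "Content-type": "text/plain"},
        method="POST",
    )


def enhance(text, url=ENGINES_URL, timeout=10):
    """Send the text and return the response body (Turtle) as a string."""
    with urlopen(build_enhancer_request(text, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


request = build_enhancer_request("John Smith was born in London.")
print(request.get_method(), request.full_url)
```

Calling `enhance("John Smith was born in London.")` against a running Stanbol instance would be the equivalent of the curl command, assuming that instance exposes the same endpoint.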
 
 
[Diagram labels: Local IT infrastructure (LAN); Nuxeo DM addon; Apache Stanbol with Engine 1, Engine 2, Engine 3; DBpedia, Freebase, Geonames, LDAP]
Stanbol Enhancer
Chain of Enhancement Engines: Language Detection (Tika); Named Entity Detection (OpenNLP); Linked Data dereferencing (Solr); Refactoring / Translation (Jena)
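The idea of a chain of engines can be illustrated with a toy pipeline in which each engine receives the content item, adds its annotations and passes it on. Everything below is a stand-in written for this illustration: the engine functions, the capitalized-word heuristic and the one-entry `KNOWN_ENTITIES` index are assumptions and do not reflect Stanbol's actual Java API.

```python
import re

# Hypothetical local index (label -> URI) standing in for a Solr lookup.
KNOWN_ENTITIES = {
    "London": "http://dbpedia.org/resource/London",
}


def detect_language(content):
    # Placeholder: a real engine would inspect the text.
    content["language"] = "en"
    return content


def detect_named_entities(content):
    # Naive heuristic: runs of capitalized words count as entity mentions.
    pattern = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
    content["mentions"] = re.findall(pattern, content["text"])
    return content


def dereference_entities(content):
    # Link each mention that has an entry in the local index.
    content["links"] = {
        m: KNOWN_ENTITIES[m] for m in content["mentions"] if m in KNOWN_ENTITIES
    }
    return content


def run_chain(text, engines):
    """Pass one content item through the engines in order."""
    content = {"text": text}
    for engine in engines:
        content = engine(content)
    return content


result = run_chain(
    "John Smith was born in London.",
    [detect_language, detect_named_entities, dereference_entities],
)
print(result["mentions"], result["links"])
```

The ordering matters in this sketch: dereferencing relies on the mentions produced by the previous engine, which is one reason to model the enhancer as a chain.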
Stanbol EntityHub
Referenced Sites: DBpedia, Geonames (NY Times, MusicBrainz, ProductDB, UniProt...)
Fast local offline indices (Solr)
Batch indexing utilities for RDF dumps
Multilingual fulltext search in labels & descriptions
Vocabulary mapping / merging
Stanbol Reasoner
RDFS / OWL-lite / OWL2
Consistency checks: cardinality checks (each person has 1 birth date); range constraints (birth dates are valid dates)
Materializing types / properties: types from subclass (Musician > Artist > Person); symmetric property (A worked with B); transitive property (A is located in B)
Query-time expansion / inference?
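Materialization can be pictured as adding every statement that the schema implies to the stored facts. The sketch below covers only two of the cases listed above, types inherited through the subclass hierarchy and a symmetric property, over plain Python tuples. The names `SUBCLASS_OF`, `SYMMETRIC`, `alice` and `bob` are made up for the example; a real reasoner such as the ones Stanbol wraps works on RDF graphs and OWL axioms.

```python
# Toy schema: each class maps to its direct superclass.
SUBCLASS_OF = {"Musician": "Artist", "Artist": "Person"}
# Properties declared symmetric in this toy schema.
SYMMETRIC = {"workedWith"}


def materialize(triples):
    """Return the triples plus inferred types and symmetric statements."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "type":
            # Walk up the subclass chain and assert every ancestor type.
            parent = SUBCLASS_OF.get(o)
            while parent is not None:
                inferred.add((s, "type", parent))
                parent = SUBCLASS_OF.get(parent)
        elif p in SYMMETRIC:
            # A worked with B implies B worked with A.
            inferred.add((o, p, s))
    return inferred


facts = {("alice", "type", "Musician"), ("alice", "workedWith", "bob")}
closure = materialize(facts)
for triple in sorted(closure):
    print(triple)
```

Materializing at indexing time trades storage for simpler queries; the open question on the slide is whether the same expansion should instead happen at query time.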
Stanbol Rules
Simple Prolog-like language:
uncleRule[
  has(<http://example.org/family.owl#hasParent>, ?x, ?z) .
  has(<http://example.org/family.owl#hasSibling>, ?z, ?y)
  -> has(<http://example.org/family.owl#hasUncle>, ?x, ?y)
]
SPARQL CONSTRUCT or SWRL:
PREFIX family: <http://example.org/family.owl#>
CONSTRUCT { ?x family:hasUncle ?y }
WHERE { ?x family:hasParent ?z . ?z family:hasSibling ?y }
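To make the semantics of the uncle rule concrete, here is a small Python sketch that applies the same join to a set of triples: whenever x has parent z and z has sibling y, it emits x hasUncle y. The function name and the sample individuals (`tom`, `ann`, `joe`) are hypothetical; only the property URIs come from the slide.

```python
FAMILY = "http://example.org/family.owl#"


def apply_uncle_rule(triples):
    """Return hasUncle triples implied by hasParent followed by hasSibling."""
    # Index siblings by subject so the join is a dictionary lookup.
    siblings = {}
    for s, p, o in triples:
        if p == FAMILY + "hasSibling":
            siblings.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in triples:
        if p == FAMILY + "hasParent":
            for uncle in siblings.get(o, ()):
                inferred.add((s, FAMILY + "hasUncle", uncle))
    return inferred


triples = {
    ("tom", FAMILY + "hasParent", "ann"),
    ("ann", FAMILY + "hasSibling", "joe"),
}
uncles = apply_uncle_rule(triples)
print(uncles)
```

Like the rule on the slide, the sketch does not check the sibling's gender; it simply mirrors the two-pattern join.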
Online Demos
Simple analyzer with small index: https://stanbol.demo.nuxeo.com
All services deployed: http://dev.iks-project.eu:8081
Building Stanbol Enhancer models from Wikipedia with the Apache data tools
Universal Topic Classification
Use Apache Lucene / Solr MoreLikeThis to perform a truncated nearest-neighbors query in the TF-IDF vector space of Wikipedia
Index text of all articles grouped by topic
Solr MoreLikeThis query on new document
DBpedia dumps provide: text summaries for each article; “subject” relationships between articles and topics; “broader” / “narrower” SKOS hierarchy between topics
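The nearest-neighbor idea can be shown on a toy scale without Solr. The sketch below builds one TF-IDF vector per topic from a tiny made-up corpus and ranks topics by cosine similarity to a new document. It is a brute-force stand-in, not the MoreLikeThis implementation: the corpus, the smoothing used for IDF and all function names are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Tiny made-up corpus: one text blob per topic (the talk used Wikipedia text).
TOPIC_TEXTS = {
    "Astronomy": "planet star galaxy telescope orbit moon",
    "Cooking": "recipe oven flour sugar bake kitchen",
    "Politics": "election vote party parliament law government",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(topic_texts):
    """Return IDF weights and one TF-IDF vector per topic."""
    docs = {topic: Counter(tokenize(text)) for topic, text in topic_texts.items()}
    df = Counter(term for counts in docs.values() for term in counts)
    n = len(docs)
    # Smoothed IDF so that every indexed term keeps a positive weight.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = {
        topic: {term: tf * idf[term] for term, tf in counts.items()}
        for topic, counts in docs.items()
    }
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, idf, vectors, k=2):
    """Return the k topics whose vectors are closest to the text."""
    counts = Counter(tokenize(text))
    query = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    ranked = sorted(vectors, key=lambda t: cosine(query, vectors[t]), reverse=True)
    return ranked[:k]


idf, vectors = build_index(TOPIC_TEXTS)
top = classify("The telescope tracked the moon in its orbit", idf, vectors, k=1)
print(top)
```

A search engine index avoids the full scan done here by only scoring topics that share at least one of the query's most informative terms, which is roughly what "truncated" refers to above.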
About the Data
500k purely technical categories: “People_with_missing_birth_place”, “Rivers_in_Romania”
70k “semantically grounded” categories
Paths to roots require both “technical” and “grounded” categories
Scale: 1.2M topic / topic links; 30M topic / article links
Some results (Wikinews)
Article: US children who celebrate Independence Day more likely to become Republicans, says Harvard study
Topics: Fireworks; Voting theory; Republican Party (United States); Statistics; Electoral systems
Article: U.S. space agency NASA sues ex-astronaut
Topics: American astronauts; Aviation halls of fame; Edwards Air Force Base; Apollo program; Exploration of the Moon
Article: Hundreds of thousands of British public sector workers strike over planned pension changes
Topics: Retirement in the United Kingdom; United Kingdom pensions and benefits; Pensions in the United Kingdom; Labor disputes by country; Labor disputes
Some results (PLoS One)
Article: Metabolic Programming during Lactation Stimulates Renal Na+ Transport in the Adult Offspring Due to an Early Impact on Local Angiotensin II Pathways
Topics: Renal physiology; Kidney; Nephrology; Hypertension; Membrane biology
Wrap Up
Web of Data brings Structured Context
Frame to decode User Intention
NLP + Entities & Topics indices to automate Content Enrichment, to provide Disambiguation
Resources
Documentation, svn, mailing list: http://incubator.apache.org/stanbol
IKS project blog: http://blog.iks-project.eu
Blog posts about Semantic ECM: http://blogs.nuxeo.com/dev/semantic/
Thank you for your attention! Olivier Grisel [email_address] https://twitter.com/ogrisel
Training models for NER from Wikipedia
Extract sentences with link positions in Wikipedia articles
Use DBpedia to find the type of the target entity (Person, Location, Organization)
Apache Pig scripts to compute the join + format the result as training files for OpenNLP
Apache OpenNLP to build and evaluate the models
Apache Hadoop / Apache Whirr for distributed processing
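The formatting step can be sketched as follows: given a tokenized sentence and the token spans of its links, each tagged with the entity type looked up in DBpedia, emit one line in the OpenNLP name finder training format (`<START:type> ... <END>`). The input representation and the function name are assumptions for this illustration; the talk performed this step with Apache Pig scripts, which are not reproduced here.

```python
def to_opennlp_line(tokens, spans):
    """Render one sentence in the OpenNLP name finder training format.

    spans is a list of (start_token, end_token_exclusive, entity_type)
    tuples and is assumed to contain no overlapping spans.
    """
    starts = {start: etype for start, _, etype in spans}
    ends = {end for _, end, _ in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the span that stops before token i
        if i in starts:
            out.append("<START:%s>" % starts[i])
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # span running to the end of the sentence
    return " ".join(out)


tokens = ["John", "Smith", "was", "born", "in", "London", "."]
spans = [(0, 2, "person"), (5, 6, "location")]
line = to_opennlp_line(tokens, spans)
print(line)
# prints: <START:person> John Smith <END> was born in <START:location> London <END> .
```

One such line per sentence, written to a text file, is the kind of input the OpenNLP name finder trainer reads.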
 
 
 
 

"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Apachecon 2011 stanbol_ogrisel

  • 1. Apache Stanbol (Incubating) and the Web of Data Olivier Grisel, Nuxeo ogrisel@apache.org, 2011-11-11 11/7/11
  • 2. My Background: Olivier Grisel, R&D Engineer at Nuxeo (Open Source ECM); European project: IKS. Stuff I do: Machine Learning, Natural Language Processing, all things data.
  • 3. Agenda: (1) The Web of Data: what, why, how? (2) CMS integration demo (3) Semantic components in Stanbol (4) Building models for Stanbol
  • 4. The Web of Data What, Why, How?
  • 5.  
  • 6. “To a computer, then, the web is a flat, boring world devoid of meaning” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 7. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 8. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 9. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 10.
  • 11.
  • 13. Decoding User Intents. Next generation user interfaces: Siri (conversational interface); IBM DeepQA: Watson for Health Care. Tell Google about your stuff: publish structured descriptions of your products; "3 bedrooms flat near Montmartre". Useful for non-public data as well: intranet query "ApacheCon slides"; intranet query "Xerox invoices"; intranet query "Xerox salesperson email".
  • 14. The Web of Data - How? RDF / Triple Stores / SPARQL: graph stores with dynamic schemas; strong interoperability. JSON-LD: upgrade your JSON with scoped vocabularies; Web / Mobile / JS developer friendly. RDFa + schema.org & rNews: publish annotations in structured markup; vocabulary understood by search engines.
  • 15. HTML example: <p> My name is Manu Sporny and you can give me a ring via 1-800-555-0155. <img src="http://manu.sporny.org/images/manu.png" /> I have a <a href="http://manu.sporny.org/">blog</a>. </p>
  • 16. RDFa example: <p vocab="http://schema.org/" prefix="foaf: http://xmlns.com/foaf/0.1/" about="#manu" typeof="Person"> My name is <span property="name">Manu Sporny</span> and you can give me a ring via <span property="telephone">1-800-555-0155</span>. <img rel="image" src="http://manu.sporny.org/images/manu.png" /> I have a <a rel="foaf:weblog" href="http://manu.sporny.org/">blog</a>. </p>
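For comparison with the RDFa markup on slide 16, the same description could be expressed in JSON-LD, the format listed on slide 14. This is a minimal sketch using only Python's standard `json` module; the choice of `@vocab` plus a `foaf` prefix in the `@context`, and the helper name `to_json_ld`, are assumptions made for illustration and do not come from the slides.

```python
import json

# Same facts as the RDFa example: a schema.org Person with a FOAF weblog link.
# The @context layout below is an illustrative assumption, not taken from the deck.
person = {
    "@context": {
        "@vocab": "http://schema.org/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "#manu",
    "@type": "Person",
    "name": "Manu Sporny",
    "telephone": "1-800-555-0155",
    # Values that are links (not plain strings) are wrapped in {"@id": ...}.
    "image": {"@id": "http://manu.sporny.org/images/manu.png"},
    "foaf:weblog": {"@id": "http://manu.sporny.org/"},
}


def to_json_ld(document):
    """Serialize the dictionary as indented JSON text (hypothetical helper)."""
    return json.dumps(document, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json_ld(person))
```

The document stays plain JSON for consumers that ignore `@context`, which is the "upgrade your JSON" point made on slide 14.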
  • 18. 2007 2008 2009 2010
  • 19. 2011
  • 20. Bridging the Web of Data and my CMS
  • 21.  
  • 22. Apache Stanbol. Enhancer: text analysis with Apache OpenNLP / Tika. EntityHub / ContentHub: Linked Data indexing with Apache Solr; graph storage with Apache Clerezza / Jena. Reasoner / Rules: inference with Apache Jena & OWL API. Components / HTTP Services: OSGi with Apache Felix / JAX-RS with Jersey.
  • 23.  
  • 24.  
  • 25.  
  • 26.  
  • 28. Minimalist HTTP Client: curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" --data "John Smith was born in London." http://stanbol.demo.nuxeo.com/engines
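The curl call on slide 28 can be reproduced from a script. The following is a minimal sketch with Python's standard `urllib.request`; the function names `build_enhance_request` and `enhance` and the 30-second timeout are assumptions for illustration, and the demo host from the slide may no longer be reachable.

```python
import urllib.request

# Endpoint taken from slide 28; it was a public demo server and may be offline.
STANBOL_ENGINES_URL = "http://stanbol.demo.nuxeo.com/engines"


def build_enhance_request(text, url=STANBOL_ENGINES_URL):
    """Build the same POST request as the curl command (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": "text/turtle", "Content-Type": "text/plain"},
        method="POST",
    )


def enhance(text, url=STANBOL_ENGINES_URL, timeout=30):
    """Send the text and return the response body as a string (RDF in Turtle)."""
    request = build_enhance_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and inspects the request; call enhance(...) to contact a server.
    request = build_enhance_request("John Smith was born in London.")
    print(request.get_method(), request.full_url)
```

Pointing `url` at a locally running Stanbol instance would be the natural substitution if the demo host does not answer.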
  • 29.  
  • 30.  
  • 31. [Diagram: local IT infrastructure (LAN) with the Nuxeo DM addon and Apache Stanbol (Engine 1, Engine 2, Engine 3); data sources shown: DBpedia, Freebase, Geonames, LDAP]
  • 32. Stanbol Enhancer. Chain of Enhancement Engines: Language Detection (Tika); Named Entity Detection (OpenNLP); Linked Data dereferencing (Solr); Refactoring / Translation (Jena).
  • 33. Stanbol EntityHub. Referenced Sites: DBpedia; Geonames; (NY Times, MusicBrainz, ProductDB, UniProt...). Fast local offline indices (Solr); batch indexing utilities for RDF dumps; multilingual fulltext search in labels & descriptions; vocabulary mapping / merging.
  • 34. Stanbol Reasoner. RDFS / OWL Lite / OWL2. Consistency checks: cardinality checks (each person has 1 birth date); range constraints (birth dates are valid dates). Materializing types / properties: types from subclass (Musician > Artist > Person); symmetric property (A worked with B); transitive property (A is located in B). Query-time expansion / inference?
  • 35. Stanbol Rules. Simple Prolog-like language: uncleRule[ has(<http://example.org/family.owl#hasParent>, ?x, ?z) . has(<http://example.org/family.owl#hasSibling>, ?z, ?y) -> has(<http://example.org/family.owl#hasUncle>, ?x, ?y) ] SPARQL CONSTRUCT or SWRL: PREFIX family: <http://example.org/family.owl#> CONSTRUCT { ?x family:hasUncle ?y } WHERE { ?x family:hasParent ?z . ?z family:hasSibling ?y }
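To make the semantics of the uncle rule on slide 35 concrete, here is a minimal sketch in plain Python that applies the same pattern (hasParent followed by hasSibling implies hasUncle) to a set of triples. It does not use Stanbol, Jena or any RDF library; the function name `apply_uncle_rule` and the `ex:` sample individuals are assumptions made for illustration.

```python
# Property URIs taken from the rule on the slide.
HAS_PARENT = "http://example.org/family.owl#hasParent"
HAS_SIBLING = "http://example.org/family.owl#hasSibling"
HAS_UNCLE = "http://example.org/family.owl#hasUncle"


def apply_uncle_rule(triples):
    """Return the hasUncle triples implied by the input (hypothetical helper).

    For every pair (?x hasParent ?z) and (?z hasSibling ?y),
    emit (?x hasUncle ?y), mirroring the CONSTRUCT query.
    """
    # Index siblings by subject so the join on ?z is a dictionary lookup.
    siblings = {}
    for subject, predicate, obj in triples:
        if predicate == HAS_SIBLING:
            siblings.setdefault(subject, set()).add(obj)

    inferred = set()
    for subject, predicate, obj in triples:
        if predicate == HAS_PARENT:
            for uncle in siblings.get(obj, ()):
                inferred.add((subject, HAS_UNCLE, uncle))
    return inferred


# Made-up sample facts; the individuals are not from the slides.
facts = {
    ("ex:alice", HAS_PARENT, "ex:bob"),
    ("ex:bob", HAS_SIBLING, "ex:carl"),
    ("ex:carl", HAS_PARENT, "ex:dora"),
}

if __name__ == "__main__":
    for triple in sorted(apply_uncle_rule(facts)):
        print(triple)
```

Like the rule on the slide, the sketch treats every sibling of a parent as an uncle; a real ontology would probably add a gender constraint.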
  • 36. Online Demos. Simple analyzer with small index: https://stanbol.demo.nuxeo.com All services deployed: http://dev.iks-project.eu:8081
  • 37. Building Stanbol Enhancer models from Wikipedia with the Apache data tools
  • 38. Universal Topic Classification: use Apache Lucene / Solr MoreLikeThis to perform a truncated nearest neighbors query in the TF-IDF vector space of Wikipedia
  • 39. Universal Topic Classification. Index text of all articles grouped by topic; Solr MoreLikeThis query on new document. DBpedia dumps provide: text summaries for each article; “subject” relationships between articles and topics; “broader” / “narrower” SKOS hierarchy between topics.
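The idea on slides 38 and 39 (one indexed document per topic, then a nearest neighbors query for a new text) can be sketched without Solr. The code below is a toy stand-in in pure Python using TF-IDF weights and cosine similarity; it does not reproduce the actual MoreLikeThis scoring, and the three topic texts, the smoothed IDF formula and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vector):
    """Scale a sparse vector (dict) to unit length; empty if the norm is zero."""
    norm = math.sqrt(sum(weight * weight for weight in vector.values()))
    return {term: weight / norm for term, weight in vector.items()} if norm else {}


def build_index(topic_texts):
    """Build IDF weights and one unit-length TF-IDF vector per topic."""
    counts = {topic: Counter(tokenize(text)) for topic, text in topic_texts.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n_docs = len(counts)
    # Smoothed IDF; one common variant, chosen here for simplicity.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        topic: normalize({t: c * idf[t] for t, c in term_counts.items()})
        for topic, term_counts in counts.items()
    }
    return idf, vectors


def top_topics(text, idf, vectors, k=3):
    """Return up to k (topic, cosine similarity) pairs with a positive score."""
    query_counts = Counter(tokenize(text))
    query = normalize({t: c * idf[t] for t, c in query_counts.items() if t in idf})
    scores = {
        topic: sum(w * vector.get(t, 0.0) for t, w in query.items())
        for topic, vector in vectors.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(topic, score) for topic, score in ranked[:k] if score > 0]


# Toy "aggregated article text" per topic; invented for this sketch.
TOPIC_TEXTS = {
    "Apollo program": "apollo moon landing astronaut nasa saturn rocket",
    "Pensions in the United Kingdom": "pension retirement workers united kingdom benefits",
    "Nephrology": "kidney renal hypertension nephrology sodium transport",
}

if __name__ == "__main__":
    idf, vectors = build_index(TOPIC_TEXTS)
    print(top_topics("NASA sues former astronaut over moon mission camera", idf, vectors))
```

The `k` cut-off plays the role of the "truncated" query: only the best-scoring topics are returned instead of a score for every category.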
  • 40. About the Data. 500k purely technical categories “People_with_missing_birth_place”, “Rivers_in_Romania” 70k “semantically grounded” categories. Paths to roots require both “technical” and “grounded” categories. Scale: 1.2M topic / topic links; 30M topic / article links.
  • 41. Some results (Wikinews). Article: US children who celebrate Independence Day more likely to become Republicans, says Harvard study. Topics: Fireworks; Voting theory; Republican Party (United States); Statistics; Electoral systems.
  • 42. Some results (Wikinews). Article: U.S. space agency NASA sues ex-astronaut. Topics: American astronauts; Aviation halls of fame; Edwards Air Force Base; Apollo program; Exploration of the Moon.
  • 43. Some results (Wikinews). Article: Hundreds of thousands of British public sector workers strike over planned pension changes. Topics: Retirement in the United Kingdom; United Kingdom pensions and benefits; Pensions in the United Kingdom; Labor disputes by country; Labor disputes.
  • 44. Some results (PLoS One). Article: Metabolic Programming during Lactation Stimulates Renal Na+ Transport in the Adult Offspring Due to an Early Impact on Local Angiotensin II Pathways. Topics: Renal physiology; Kidney; Nephrology; Hypertension; Membrane biology.
  • 45. Wrap Up: Web of Data brings Structured Context Frame to decode User Intention; NLP + Entities & Topics indices to automate Content Enrichment to provide Disambiguation
  • 46. Resources. Documentation, svn, mailing list: http://incubator.apache.org/stanbol IKS project blog: http://blog.iks-project.eu Blog posts about Semantic ECM: http://blogs.nuxeo.com/dev/semantic/
  • 47. Thank you for your attention! Olivier Grisel [email_address] https://twitter.com/ogrisel
  • 48. Training models for NER from Wikipedia: extract sentences with link positions in Wikipedia articles; DBpedia to find the type of the target entity (Person, Location, Organization); Apache Pig scripts to compute the join + format the result as training files for OpenNLP; Apache OpenNLP to build and evaluate the models; Apache Hadoop / Apache Whirr for distributed processing.
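The formatting step on slide 48 (turning sentences with known link positions into OpenNLP training files) can be illustrated as follows. The deck uses Apache Pig for this; the Python below is only a stand-in sketch of the output format, where each line is one whitespace-tokenized sentence with entities wrapped in `<START:type> ... <END>` markers. The function name `to_opennlp_line`, the token-offset span representation and the lowercase type labels are assumptions made for illustration.

```python
def to_opennlp_line(tokens, spans):
    """Render one sentence in the OpenNLP name finder training format.

    tokens: list of token strings for the sentence.
    spans:  list of (start, end, entity_type) with token offsets, end exclusive.
            Spans are assumed not to overlap (hypothetical input format).
    """
    starts = {start: entity_type for start, _end, entity_type in spans}
    ends = {end for _start, end, _entity_type in spans}

    parts = []
    for index, token in enumerate(tokens):
        if index in starts:
            parts.append("<START:%s>" % starts[index])
        parts.append(token)
        # A span ending after this token closes right behind it.
        if index + 1 in ends:
            parts.append("<END>")
    return " ".join(parts)


if __name__ == "__main__":
    sentence = ["John", "Smith", "was", "born", "in", "London", "."]
    annotations = [(0, 2, "person"), (5, 6, "location")]
    print(to_opennlp_line(sentence, annotations))
```

In the pipeline described on the slide, the entity type for each span would come from the DBpedia type of the link target rather than from hand-written annotations.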
  • 49.  
  • 50.  
  • 51.  
  • 52.