Making the Web Searchable. Peter Mika, Senior Researcher and Data Architect, Yahoo! Inc.
Agenda
Convergence of  Search and Online Media
It used to be pretty simple…
Yahoo! today is a global network of online media sites
... with search as an important entry point to content: information box with content from and links to Yahoo! Travel; points of interest in Vienna, Austria; shopping results from Yahoo! Shopping. Since August 2010, ‘regular’ search results are ‘Powered by Bing’.
Conversely, online media as an entry point to search: hovering over an underlined phrase triggers a search for related news items.
Aggregation across space: hyperlocal pages. Hyperlocal: showing content from across Yahoo that is relevant to a particular neighbourhood.
Aggregation across entity types: special events
Personalization: Yahoo’s Content Optimization Relevance Engine (CORE) technology uses machine learning to predict click behavior based on user profile. Display advertising is also personalized by default. Users can opt out of behavioral targeting through AdChoices.
Contextualization: show related content. Social discovery: connect with friends watching the same
Convergence of search and online media
Semantic technologies for Search
Search is really fast, without necessarily being intelligent
State of Search
Not just search…
What it’s like to be a machine? Roi Blanco
What it’s like to be a machine?  ✜ Θ ♬♬ţğ   ✜ Θ ♬♬ţğ √∞  ®ÇĤĪ ✜★  ♬☐ ✓✓ ţğ  ★  ✜   ✪✚✜ Δ ΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ Γ ≠ =⅚ ©§ ★✓♪ ΒΓΕ  ℠   ✖ Γ ♫⅜±  ⏎ ↵⏏  ☐ģğğğμλκσςτ   ⏎  ⌥ °¶§ΥΦΦΦ ✗✕ ☐ 
If machines are dumb, how to make their job easier?
Enter the Semantic Web
History of metadata in HTML
HTML meta tags
Microformats (μf)
Example: the hCard microformat
    <cite class="vcard">
      <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a>
    </cite>
    wrote a post
    (<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>)
    about an unintentionally humorous letter he received from the
    <span class="vcard">
      <a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a>
    </span>.
    <div class="vcard">
      <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
      <div class="tel">+1-919-555-7878</div>
      <div class="title">Area Administrator, Assistant</div>
    </div>
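To make the hCard markup above concrete from a consumer's point of view, here is a minimal sketch of how a crawler might pull the formatted names (`fn`) out of `vcard` elements. It uses only Python's standard `html.parser`; the class name `HCardNameParser` is made up for this illustration, and the sketch ignores most of the hCard rules (nested cards, the other properties), so it is not a conforming microformats parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are skipped so the
# open-element stack stays balanced.
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta"}


class HCardNameParser(HTMLParser):
    """Collect the text of "fn" elements that sit inside a "vcard" element."""

    def __init__(self):
        super().__init__()
        self.stack = []        # class-name sets of the currently open elements
        self.names = []        # formatted names found so far
        self._buffer = None    # text pieces of the "fn" element being read
        self._fn_depth = None  # stack depth at which that "fn" element opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        inside_vcard = any("vcard" in open_classes for open_classes in self.stack)
        if "fn" in classes and inside_vcard and self._buffer is None:
            self._buffer = []
            self._fn_depth = len(self.stack)

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._buffer is not None and len(self.stack) == self._fn_depth:
            # Collapse runs of whitespace before storing the name.
            self.names.append(" ".join("".join(self._buffer).split()))
            self._buffer = None
        if self.stack:
            self.stack.pop()


sample = """
<cite class="vcard">
  <a class="fn url" href="http://meyerweb.com/">Eric Meyer</a>
</cite>
<span class="vcard">
  <a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a>
</span>
"""

hcard_parser = HCardNameParser()
hcard_parser.feed(sample)
print(hcard_parser.names)
```

The point of the sketch is that the semantics ride on ordinary `class` attributes, so a consumer has to know the microformat's vocabulary in advance to interpret them.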
Microformats: limitations
RDFa
RDFa evolution
Example: Yahoo! Enhanced Results (was: SearchMonkey)
Example: Google’s Rich Snippets
Example: Facebook’s Like and the Open Graph Protocol
Example: Facebook’s Open Graph Protocol
    <html xmlns:og="http://opengraphprotocol.org/schema/">
    <head>
      <title>The Rock (1996)</title>
      <meta property="og:title" content="The Rock" />
      <meta property="og:type" content="movie" />
      <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
      <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
      …
    </head>
    ...
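As a rough illustration of how a consumer could read the Open Graph markup above, the following sketch collects every `<meta>` element whose `property` starts with `og:`. It relies only on Python's standard `html.parser`; the class name `OpenGraphParser` is an assumption of this example, not part of any Facebook tooling.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed through this method.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and attributes.get("content") is not None:
            self.properties[name] = attributes["content"]


page = """
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>
"""

og_parser = OpenGraphParser()
og_parser.feed(page)
for name, value in og_parser.properties.items():
    print(name, "=", value)
```

Because all the metadata lives in the document head as flat key/value pairs, extraction needs no knowledge of the page body.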
Example: rNews
Microdata
    <div itemscope itemid="http://www.yahoo.com/resource/person">
      <p>My name is <span itemprop="name">Neil</span>.</p>
      <p>My band is called <span itemprop="band">Four Parts Water</span>.
      I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
      <img itemprop="image" src="me.png" alt="me">
      </p>
    </div>
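The Microdata example above can be read back with a few lines of code. This is a simplified sketch, assuming a single flat item with no nested `itemscope`: it takes the value of an `itemprop` from `src` or `href` for link-like elements, from `datetime` for `<time>`, and from the element text otherwise. The class name `ItemPropParser` is hypothetical, and the full Microdata algorithm covers more cases than are handled here.

```python
from html.parser import HTMLParser


class ItemPropParser(HTMLParser):
    """Extract itemprop name/value pairs from one flat Microdata item."""

    # Elements whose property value is a URL held in an attribute.
    URL_ATTRIBUTES = {"img": "src", "a": "href", "link": "href"}

    def __init__(self):
        super().__init__()
        self.item = {}
        self._open = None  # (property name, tag) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("itemprop")
        if name is None:
            return
        if tag in self.URL_ATTRIBUTES:
            self.item[name] = attributes.get(self.URL_ATTRIBUTES[tag])
        elif tag == "time" and "datetime" in attributes:
            # Prefer the machine-readable date over the display text.
            self.item[name] = attributes["datetime"]
        else:
            self._open = (name, tag)
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            self.item[self._open[0]] = "".join(self._text).strip()
            self._open = None


markup = """
<div itemscope itemid="http://www.yahoo.com/resource/person">
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
  I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
  <img itemprop="image" src="me.png" alt="me">
  </p>
</div>
"""

microdata_parser = ItemPropParser()
microdata_parser.feed(markup)
print(microdata_parser.item)
```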
Competing formats, competing schemas
schema.org
1st schema.org workshop (Sept 21, 2011)
Current state of semantic search
RDFa on the rise. [Chart: percentage of URLs with embedded metadata in various formats] 510% increase between March 2009 and October 2010.
Semantic Search development
Semantic technologies for Data Integration
Today’s world is a Web of Pages
All these pages come from structured knowledge about people, places, and things. [Diagram: entity graph with nodes MLB team, Chicago Cubs, Chicago, Barack Obama, Carlos Zambrano, and 10% off tickets; edge labels: is a, for, plays for, plays in, from]
This underlying world is WOO, the Web of Objects. [Diagram: the same entity graph]
Today our knowledge of this world is siloed, incomplete, inconsistent, inaccurate, and hard to reuse. [Diagram: silos labeled Sports, Entertainment, Finance, Local, Shopping, and Upcoming; entity graph with nodes MLB team, Chicago Cubs, Chicago, Scott Roy, Carlos Zambrano, and 10% off tickets]
Our vision is a single shared knowledge base: accurate, scalable, and easy to reuse. [Diagram: the same entity graph as on the Web of Objects slide]
Knowledge comes from many sources. [Diagram, entities by attributes: show times and other information for US movies from source B. Example entity: Harry Potter and the Deathly Hallows part II. Example attribute: show times. Example cell: show times for Harry Potter and the Deathly Hallows part II.]
Combining these requires working with complementary, parallel, and overlapping sources. [Diagram, entities by attributes: cast information for global movies from Wikipedia; cast information for US movies from source A; cast and show time information for global movies from licensed feeds.]
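One way to picture the combination of complementary and overlapping sources is a priority-ordered merge of attribute records per entity. The sketch below is an illustration only, not the WOO implementation: the function name `blend`, the "earlier source wins" rule, and the sample records (including the placeholder show time) are all assumptions made for this example.

```python
def blend(sources):
    """Merge per-entity attribute dicts; earlier sources win on conflicts."""
    merged = {}
    for source in sources:  # highest-priority source first
        for entity, attributes in source.items():
            record = merged.setdefault(entity, {})
            for name, value in attributes.items():
                # Keep the first value seen for each attribute.
                record.setdefault(name, value)
    return merged


MOVIE = "Harry Potter and the Deathly Hallows part II"

# Illustrative records; the show time is a made-up placeholder.
licensed_feed = {
    MOVIE: {"cast": ["Daniel Radcliffe", "Emma Watson"], "show_times": ["19:30"]},
}
wikipedia = {
    MOVIE: {
        "cast": ["Daniel Radcliffe", "Emma Watson", "Rupert Grint"],
        "director": "David Yates",
    },
}

knowledge_base = blend([licensed_feed, wikipedia])
print(knowledge_base[MOVIE])
```

Here the two sources overlap on `cast` (the higher-priority feed wins) and complement each other on `show_times` and `director`. A real system would also need entity matching before records can be keyed this way.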
There is a tremendous opportunity to do this directly from Web pages, reverse engineering the Web. [Diagram, entities by attributes: information from structured data extraction on billions of Web pages.]
Semantic technologies for data integration
Components
WOO ontology
WOO ontology cntd.
Value #1: Breadth, depth, and accuracy at scale. [Diagram labels: real entities; dups, errors, and outdated entities; up-to-date correct entities. Callouts: incorrect store URL; no photo; we show many entities we shouldn’t; no business hours.] WOO improves our breadth, depth, and accuracy by combining knowledge from alternative sources, and by modernizing how we do matching, blending, and de-duping.
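De-duping and gap filling can be sketched in a few lines. The example below is a deliberately naive illustration: it treats two records as duplicates when their names match after normalization, and fills missing fields (such as a photo or business hours) from the duplicate. The function names, the matching rule, and the sample business records are hypothetical; production matching would use far richer signals.

```python
import re


def normalize(name):
    """Lower-case a name and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def dedupe(records):
    """Keep one record per normalized name, filling gaps from duplicates."""
    unique = {}
    for record in records:
        key = normalize(record["name"])
        kept = unique.setdefault(key, dict(record))
        for field, value in record.items():
            if kept.get(field) is None:
                kept[field] = value
    return list(unique.values())


# Made-up listings for the same business from two sources.
listings = [
    {"name": "Joe's Pizza", "hours": None, "photo": "joes.jpg"},
    {"name": "JOE'S  PIZZA", "hours": "11-22", "photo": None},
]

print(dedupe(listings))
```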
Value #2: Agility launching new experiences. Answers instead of links: WOO lets us quickly create entity-centric DD modules using the existing knowledge in the KB. Related knowledge in context: the integrated KB lets us show relevant knowledge from one Yahoo property on other properties and off network. Emerging markets and tail pages: the KB gets us deep into the tail by combining and blending knowledge from many sources.
Other potential benefits
Innovative media companies are moving in this direction. Courtesy of Silver Oliver (BBC).
Innovative media companies are moving in this direction. Courtesy of Evan Sandhaus (NYT).
Take home: use what works!
The End

Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Making the Web searchable

  • 1. Making the Web Searchable Peter Mika Senior Researcher and Data Architect Yahoo! Inc.
  • 3. Convergence of Search and Online Media
  • 4. It used to be pretty simple…
  • 5. Yahoo! today is a global network of online media sites
  • 6. ... with search as an important entry point to content. Callouts: information box with content from and links to Yahoo! Travel; points of interest in Vienna, Austria; shopping results from Yahoo! Shopping. Since August 2010, ‘regular’ search results are ‘Powered by Bing’.
  • 7. Conversely, online media as an entry point to search. Hovering over an underlined phrase triggers a search for related news items.
  • 8. Aggregation across space: hyperlocal pages. Hyperlocal: showing content from across Yahoo that is relevant to a particular neighbourhood.
  • 9. Aggregation across entity types: special events
  • 10. Personalization. Yahoo’s Content Optimization Relevance Engine (CORE) technology uses machine learning to predict click behavior based on the user profile. Display advertising is also personalized by default; users can opt out of behavioral targeting through AdChoices.
  • 11. Contextualization. Show related content. Social discovery: connect with friends watching the same
  • 14. Search is really fast, without necessarily being intelligent
  • 17. What it’s like to be a machine? Roi Blanco
  • 18. What it’s like to be a machine?  ✜ Θ ♬♬ţğ   ✜ Θ ♬♬ţğ √∞  ®ÇĤĪ ✜★  ♬☐ ✓✓ ţğ  ★  ✜   ✪✚✜ Δ ΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ Γ ≠ =⅚ ©§ ★✓♪ ΒΓΕ  ℠   ✖ Γ ♫⅜±  ⏎ ↵⏏  ☐ģğğğμλκσςτ   ⏎  ⌥ °¶§ΥΦΦΦ ✗✕ ☐ 
  • 24. Example: the hCard microformat
      <cite class="vcard"> <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/"> Eric Meyer</a> </cite>
      wrote a post (<cite> <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/"> Tax Relief</a></cite>)
      about an unintentionally humorous letter he received from the
      <span class="vcard"> <a class="fn org url" href="http://irs.gov/"> Internal Revenue Service</a> </span>.
      <div class="vcard"> <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a> <div class="tel">+1-919-555-7878</div> <div class="title">Area Administrator, Assistant</div> </div>
  • 38. RDFa on the rise. [Chart: percentage of URLs with embedded metadata in various formats.] 510% increase between March 2009 and October 2010.
  • 40. Semantic technologies for Data Integration
  • 41. Today’s world is a Web of Pages
  • 42. All these pages come from structured knowledge about people, places, and things. [Diagram: entity graph with nodes MLB team, Chicago Cubs, Chicago, Barack Obama, Carlos Zambrano, and “10% off tickets”; edge labels: is a, for, plays for, plays in, from.]
  • 43. This underlying world is WOO, the Web of Objects. [Diagram: the same entity graph as on slide 42.]
  • 44. Today our knowledge of this world is siloed, incomplete, inconsistent, inaccurate, and hard to reuse. Silos: Sports, Entertainment, Finance, Local, Shopping, Upcoming. [Diagram: the entity graph from slide 42, here labelled “isa” and “Scott Roy” where slide 42 has “Is a” and “Barack Obama”.]
  • 45. Our vision is a single shared knowledge base: accurate, scalable, and easy to reuse. [Diagram: the entity graph from slide 42.]
  • 46. Knowledge comes from many sources. [Diagram: grid of entities by attributes. Source B provides show times and other information for US movies. Example entity: Harry Potter and the Deathly Hallows part II; example attribute: show times; example cell: show times for Harry Potter and the Deathly Hallows part II.]
  • 47. Combining these requires working with complementary, parallel, and overlapping sources. [Diagram: grid of entities by attributes, covered by cast information for global movies from Wikipedia, cast information for US movies from source A, and cast and show time information for global movies from licensed feeds.]
  • 48. There is a tremendous opportunity to do this directly from Web pages, reverse engineering the Web. [Diagram: grid of entities by attributes, covered by information from structured data extraction on billions of Web pages.]
  • 53. Value #1: Breadth, depth, and accuracy at scale. WOO improves our breadth, depth, and accuracy by combining knowledge from alternative sources, and by modernizing how we do matching, blending, and de-duping. [Diagram labels: real entities; dups, errors, and outdated entities; up-to-date correct entities. Callouts: incorrect store URL; no photo; no business hours; we show many entities we shouldn’t.]
  • 54. Value #2: Agility launching new experiences. Answers instead of links: WOO lets us quickly create entity-centric DD modules using the existing knowledge in the KB. Related knowledge in context: the integrated KB lets us show relevant knowledge from one Yahoo property on other properties and off network. Emerging markets and tail pages: the KB gets us deep into the tail by combining and blending knowledge from many sources.
  • 56. Innovative media companies are moving in this direction Courtesy of Silver Oliver (BBC)
  • 57. Innovative media companies are moving in this direction Courtesy of Evan Sandhaus (NYT).
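
The hCard markup on slide 24 can be read back out mechanically, which is the point of giving the machine hints. The following is a minimal sketch, not a conforming microformats parser: it uses only Python's standard `html.parser`, and the names `HCardParser`, `extract_hcards`, `TEXT_PROPS`, and `SAMPLE_HTML` are hypothetical. It assumes properly closed tags, no void elements (such as `<br>`) inside a vcard, and no nested vcards.

```python
from html.parser import HTMLParser

# Slide 24's example markup, with HTML entities decoded.
SAMPLE_HTML = """
<cite class="vcard"> <a class="fn url" rel="friend colleague met"
  href="http://meyerweb.com/"> Eric Meyer</a> </cite> wrote a post
(<cite> <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">
  Tax Relief</a></cite>) about a letter from the
<span class="vcard"> <a class="fn org url" href="http://irs.gov/">
  Internal Revenue Service</a> </span>.
<div class="vcard"> <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div> </div>
"""

# hCard properties whose value is taken from the element's text (a simplification).
TEXT_PROPS = {"fn", "org", "tel", "title"}


class HCardParser(HTMLParser):
    """Collects one dict per element carrying class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._open = []   # for each open element inside the current vcard: its text properties

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if not self._open:
            # Outside any vcard: only a vcard root is interesting.
            if "vcard" not in classes:
                return
            self.cards.append({})
        self._open.append([c for c in classes if c in TEXT_PROPS])
        card = self.cards[-1]
        href = attr.get("href") or ""
        if href and "url" in classes:
            card["url"] = href
        if href and "email" in classes:
            card["email"] = href[len("mailto:"):] if href.startswith("mailto:") else href

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text or not self._open:
            return
        card = self.cards[-1]
        for props in self._open:
            for prop in props:
                card[prop] = (card.get(prop, "") + " " + text).strip()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


for card in extract_hcards(SAMPLE_HTML):
    print(card)
```

The unmarked "Tax Relief" link is ignored because it sits outside any vcard element; only the three annotated spans yield records.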

Editor's notes

  1. Everything is search: search and online media are converging businesses
  2. Yahoo serves over 600 million users in 25 countries. 38% of O&O revenue comes from search advertising, 53% from display advertising, and 9% from listings and other marketing services (Q3 2010).
  3. Search is a form of content aggregation
  4. Improvements in search are harder and harder to come by. The current search paradigm has reached a plateau: we have solved large classes of queries, and what remains is difficult to solve in the current paradigm.
  5. With ads, the situation is even worse due to the sparsity problem. Note how poor the ads are…
  6. This is how a human sees the world.
  7. This is how a machine sees the world… Machines are not ‘intelligent’ and cannot ‘read’; they just see a string of symbols and try to match the user’s input to that stream.
  8. However, we can make the job of the machine easier by giving some hints…
  9. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns.
  10. Facebook was invited, but continues to pursue OGP (the Open Graph protocol).
  11. Publisher: schema.org-enable your website and publish Linked Data. Developer: build standard APIs using Linked Data technology.
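
As one possible illustration of the publisher side of note 11, the contact details from the hCard example on slide 24 can be expressed with schema.org vocabulary. This is a sketch under assumptions: the notes do not say which serialization to use, so JSON-LD is chosen here only for brevity, and the helper names `person_jsonld` and `to_script_tag` are hypothetical. `Person`, `name`, `url`, `jobTitle`, and `telephone` are existing schema.org terms.

```python
import json


def person_jsonld(name, url=None, job_title=None, telephone=None):
    """Build a schema.org Person description as a plain dict (JSON-LD)."""
    doc = {"@context": "https://schema.org", "@type": "Person", "name": name}
    optional = {"url": url, "jobTitle": job_title, "telephone": telephone}
    # Only emit properties that were actually supplied.
    doc.update({key: value for key, value in optional.items() if value is not None})
    return doc


def to_script_tag(doc):
    """Wrap the JSON-LD in the script element a publisher would embed in a page."""
    payload = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so the payload cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Values taken from the hCard example on slide 24.
joe = person_jsonld(
    "Joe Friday",
    job_title="Area Administrator, Assistant",
    telephone="+1-919-555-7878",
)
print(to_script_tag(joe))
```

A developer consuming such pages can then read the embedded block with any JSON parser instead of scraping the visible text.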