1. Build your Own Search Service
Chris Heilmann
Saurabh Sahni
Open Hack Day 2009 - Bangalore
http://www.slideshare.net/saurabhsahni/
2. Outline
• Search engines using BOSS
• About BOSS API
– What?
– Why?
– Features
• How to use it
– BOSS API
– Code example
– BOSS Mashup framework
17. What?
• Open Yahoo’s core search features via web services to
let 3rd parties revolutionize Search
http://developer.yahoo.com/search/boss
18. Opening the search technology stack
[Diagram: the search technology stack. Labels: Rank, Assist, Extract, Retrieve, Spam <-> Gold Usage, Crawl, Web Map, Analyze, Index]
50B pages * 20ms page download = 31 years
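The 31-year figure can be checked with a quick calculation. This is a minimal sketch in Python 3, assuming pages are downloaded strictly one after another and a 365-day year; the constant and function names are illustrative.

```python
# Sequential crawl time for the figures quoted on the slide.
PAGES = 50_000_000_000                  # 50B pages
SECONDS_PER_PAGE = 0.020                # 20 ms per page download
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: 365-day year


def crawl_years(pages, seconds_per_page):
    """Years needed to download every page one at a time."""
    return pages * seconds_per_page / SECONDS_PER_YEAR


years = crawl_years(PAGES, SECONDS_PER_PAGE)
print(f"{years:.1f} years")
```

The total is 10^9 seconds of download time, which is why crawling the web is not something a small team can redo on its own.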
19. Opening the search technology stack
[Diagram: the search technology stack. Labels: Your App here, Web API, Rank, Assist, Extract, Retrieve, Spam <-> Gold Usage, Crawl, Web Map, Analyze, Index]
50B pages * 20ms page download = 31 years
21. BOSS API features
• No branding or attribution
• Ability to change presentation style
• Ability to re-order results and blend-in additional content
• Access to multiple verticals (web search, image, news)
• Keyword suggestions, spell checks
• Semantic data, in-links, abstracts
• Ability to monetize
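Re-ordering and blending can be illustrated without calling the API at all. The sketch below works on plain dictionaries; the field names (title, url), the helper names and the sample data are hypothetical and stand in for whatever the application has already fetched.

```python
def rerank(results, boost_domains):
    """Move results from boosted domains to the front.

    sorted() is stable, so the original order is kept within each group.
    """
    def is_not_boosted(result):
        return not any(domain in result["url"] for domain in boost_domains)

    return sorted(results, key=is_not_boosted)


def blend(results, extra, position):
    """Insert additional content into the result list at a given position."""
    return results[:position] + extra + results[position:]


# Hypothetical search results, already parsed into dictionaries.
results = [
    {"title": "Review A", "url": "http://a.example/review"},
    {"title": "Movie page", "url": "http://movies.yahoo.com/slumdog"},
    {"title": "Review B", "url": "http://b.example/review"},
]
own_content = [{"title": "Our editor's pick", "url": "http://myapp.example/pick"}]

page = blend(rerank(results, ["movies.yahoo.com"]), own_content, 1)
for item in page:
    print(item["title"])
```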
23. Get Started
• Register for an application id
http://developer.yahoo.com/wsregapp/
• Documentation
http://developer.yahoo.com/search/boss/boss_guide/
• Code samples: JavaScript, PHP and Python
http://www.saurabhsahni.com/boss-examples.zip
24. BOSS API
Searching
Slumdog
Millionaire
(Source: http://en.wikipedia.org/wiki/File:Slumdog_Millionaire_poster.jpg)
25. BOSS API
• Search for slumdog millionaire:
– http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&format=xml
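A request URL of this shape can be assembled with the Python 3 standard library. This is a minimal sketch: the function name web_search_url is made up for illustration, and "xyz" is the placeholder application id from the slide.

```python
from urllib.parse import quote_plus, urlencode

BOSS_BASE = "http://boss.yahooapis.com/ysearch"


def web_search_url(query, appid, fmt="xml"):
    """Build a BOSS v1 web search URL; spaces in the query become '+'."""
    path = f"{BOSS_BASE}/web/v1/{quote_plus(query)}"
    return f"{path}?{urlencode({'appid': appid, 'format': fmt})}"


url = web_search_url("slumdog millionaire", appid="xyz")
print(url)
```

The query is part of the URL path rather than a parameter, so it has to be encoded separately from the query string.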
26. BOSS API: XML response
http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&format=xml
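The response body is not reproduced in this transcript. As a rough sketch of how such a response could be consumed, the code below assumes each hit arrives as a result element with child fields such as title, url and abstract. The sample document is hand-written for illustration and is not captured API output; the element names are assumptions to check against the BOSS guide.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names are assumptions, not captured output.
SAMPLE = """<ysearchresponse responsecode="200">
  <resultset_web count="1" start="0">
    <result>
      <title>Slumdog Millionaire</title>
      <url>http://example.com/slumdog</url>
      <abstract>Illustrative abstract text.</abstract>
    </result>
  </resultset_web>
</ysearchresponse>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}result' -> 'result'."""
    return tag.rsplit("}", 1)[-1]


def parse_results(xml_text):
    """Return one dictionary per <result> element, keyed by child tag name."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter():
        if local_name(elem.tag) == "result":
            results.append(
                {local_name(child.tag): (child.text or "").strip() for child in elem}
            )
    return results


for hit in parse_results(SAMPLE):
    print(hit["title"], hit["url"])
```

Matching on the local tag name keeps the parser working whether or not the real response declares a default XML namespace.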
27. Site Restrict Search
• Search for slumdog millionaire on selected movie sites
– Add param sites=indiatimes.com,movies.yahoo.com,imdb.com
– http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&sites=indiatimes.com%2Cmovies.yahoo.com&format=xml
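The sites parameter is a comma-separated list, and the comma is percent-encoded as %2C in the URL. A minimal Python 3 sketch, with a hypothetical helper name and the placeholder appid "xyz":

```python
from urllib.parse import quote_plus, urlencode


def site_restricted_url(query, appid, sites, fmt="xml"):
    """Build a BOSS v1 web search URL limited to the given list of sites."""
    params = {"appid": appid, "sites": ",".join(sites), "format": fmt}
    # urlencode percent-encodes the commas between site names as %2C.
    return (
        "http://boss.yahooapis.com/ysearch/web/v1/"
        f"{quote_plus(query)}?{urlencode(params)}"
    )


url = site_restricted_url(
    "slumdog millionaire", "xyz", ["indiatimes.com", "movies.yahoo.com"]
)
print(url)
```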
45. BOSS Search API REST Interface
http://boss.yahooapis.com/ysearch/{vert}/v1/{query}
• {query}: term to look for (url-encoded)
• {vert} := {web, news, images, spelling}
• Required parameter
– appid
• Optional parameters
– start, count, lang, region, format, callback, sites, view
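The URL template and parameter lists above can be wrapped in one small builder. This is a sketch under the assumption that only the verticals and optional parameters listed on this slide are allowed; the function name boss_url is illustrative, and no request is sent.

```python
from urllib.parse import quote_plus, urlencode

VERTICALS = {"web", "news", "images", "spelling"}
OPTIONAL_PARAMS = {
    "start", "count", "lang", "region", "format", "callback", "sites", "view",
}


def boss_url(vert, query, appid, **params):
    """Build http://boss.yahooapis.com/ysearch/{vert}/v1/{query}?appid=..."""
    if vert not in VERTICALS:
        raise ValueError(f"unknown vertical: {vert!r}")
    unknown = set(params) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    query_string = urlencode({"appid": appid, **params})
    return (
        f"http://boss.yahooapis.com/ysearch/{vert}/v1/"
        f"{quote_plus(query)}?{query_string}"
    )


print(boss_url("news", "open hack day", "xyz", count=5, format="xml"))
```

Rejecting unknown verticals and parameters early makes typos show up in the application instead of as an error response from the service.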
46. Site Explorer
• Get page inlinks
– http://boss.yahooapis.com/ysearch/se_inlink/v1/{URL}?appid={APPID}
• Page data: collection of subpages in a domain
– http://boss.yahooapis.com/ysearch/se_pagedata/v1/{URL}?appid={APPID}
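Both Site Explorer calls follow the same template, so one helper can build either URL. In this sketch the function name is hypothetical, and percent-encoding the target URL before placing it in the path is an assumption; the slide only shows the {URL} placeholder.

```python
from urllib.parse import quote, urlencode

SERVICES = ("se_inlink", "se_pagedata")


def site_explorer_url(service, target_url, appid):
    """Build a Site Explorer URL for inlinks or page data."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    # Assumption: the target URL is percent-encoded as a single path segment.
    encoded = quote(target_url, safe="")
    return (
        f"http://boss.yahooapis.com/ysearch/{service}/v1/"
        f"{encoded}?{urlencode({'appid': appid})}"
    )


print(site_explorer_url("se_inlink", "www.yahoo.com", "xyz"))
print(site_explorer_url("se_pagedata", "www.yahoo.com", "xyz"))
```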
47. BOSS Mashup Framework
• Python (v2.5+) library
• BOSS Search SDK plus …
• SQL for remixing arbitrary XML/JSON sources
http://developer.yahoo.com/search/boss/mashup.html
48. BMF + Google App Engine
• Enhanced version of BMF for the Google App Engine (GAE) platform
• http://zooie.wordpress.com/2008/08/04/yahoo-boss-google-app-engine-integrated/
• Enables quick deployment of BOSS applications online
49. More BOSS Implementations
• http://mashable.com/boss/
• http://delicious.com/tag/bossmashup
• Add yours by tagging it with “bossmashup” on
Del.icio.us!
51. BOSS Custom
[Diagram: the search technology stack. Labels: Your App here, Web API, Rank, Assist, Extract, Retrieve, Spam <-> Gold Usage, Crawl, Web Map, Analyze, Index]
50B pages * 20ms page download = 31 years
54. Search UI Templates Are Included in the BOSS Mashup Framework
http://www.yahoo.com
BOSS Mashup Framework simplifies aggregating and presenting multiple data sources
55. BMF Features
• select, group, sort, union, join, where, and user-defined functions (UDFs)
• Text normalization and duplicate removal
• Auto-transformation of resource-oriented API results
into tables w/o parsing
• All-in-memory storage and retrieval operations
• Ability to join lists of tables via an arbitrary predicate
function (map-like)
• Search UI template framework
• Single search function provides total access to
BOSS REST API
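Three of the ideas listed above (text normalization, duplicate removal, and joining tables via an arbitrary predicate) can be shown in plain Python 3. The sketch below does not use the BMF API; the function names, field names and sample rows are hypothetical and only illustrate the operations on in-memory lists of dictionaries.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe(rows, key="title"):
    """Keep the first row for each normalized value of `key`."""
    seen, unique = set(), []
    for row in rows:
        marker = normalize(row[key])
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique


def join(left, right, predicate):
    """Nested-loop join: merge every (l, r) pair the predicate accepts."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


# Hypothetical in-memory tables from two different sources.
web = [
    {"title": "Slumdog Millionaire!", "url": "http://a.example/1"},
    {"title": "slumdog  millionaire", "url": "http://b.example/2"},
    {"title": "Other film", "url": "http://c.example/3"},
]
news = [{"headline": "Slumdog Millionaire wins", "source": "Example News"}]

unique_web = dedupe(web)
joined = join(
    unique_web,
    news,
    lambda w, n: normalize(w["title"]) in normalize(n["headline"]),
)
print(joined)
```

The nested-loop join is quadratic in the table sizes, which is acceptable for the small result pages a search mashup typically handles in memory.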
56. BOSS in Academic Research
• The biggest dataset available on the web
• Very useful for Web-mining research experiments
– Natural language processing
– Semantic extraction
– Related keywords
– Similarity detection
– Clustering algorithms
– Spelling corrections