A Service-Based Architecture for Multi-domain Search on the Web

ICSOC 2010 - 8th International Conference on Service Oriented Computing
San Francisco
December 7th - 10th 2010

Motivation

• Search is a fundamental activity for information management

• Search engines are superb at retrieving the documents that most
closely match a user’s query

• But they fail to go beyond this simple paradigm!

• Example:

• “Find a theater close to Union Square, San Francisco,
showing a recent thriller movie, close to a steak house”

Search for a solution using keywords

Not useful....

Let’s split the tasks, and search for
theaters using keywords

We got results

Check for showtimes

down in the result set... no thriller movies! Search for a new one

Now let’s look for a steak house

What is Google doing?

• Guessing the user’s intention

• Solving her atomic task (find - and book - a restaurant, find a theatre)

• Yahoo: We’re Moving From Web Of Pages To Web Of Objects

• http://techcrunch.com/2010/09/16/live-from-yahoos-product-runway-event/

• “There’s going to be a blurring between typing in a query and getting a
bunch of links… People just want answers. Let you buy a ticket right
from a movie result.”

What should we - as users - do?

• Work on subproblems

• Start a search process

• Use available search services ... and there are a lot of them!
• ProgrammableWeb
• YQL
• GoogleBase
• Linked Data
• ...

• Try to be smart!

The Search Computing Project

• Build theories, methods and tools to support search
processes
• Given a multi-domain query
• Given a set of search services
• Build several “solutions” that already integrate all the dimensions
of the search process
• Rank “solutions” according to a global rank function and output
results in rank order
• Support user-friendly query definition and result browsing
• Add search domains while the search process proceeds
• Possibly change the relative weight of each ranking
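
A minimal sketch of the global rank function mentioned above, assuming it is a weighted sum of per-domain scores normalized to [0, 1]. The `Solution` class, the function names, and the sample scores are hypothetical and only illustrate how changing the relative weights reorders the combined results.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """One combined result: an item per domain plus its local ranking score."""
    items: dict   # domain name -> item label
    scores: dict  # domain name -> local score, assumed to lie in [0, 1]


def global_rank(solution, weights):
    """Weighted sum of the per-domain scores (assumed form of the rank function)."""
    return sum(weight * solution.scores[domain] for domain, weight in weights.items())


def rank_solutions(solutions, weights):
    """Return the solutions in decreasing order of global rank."""
    return sorted(solutions, key=lambda s: global_rank(s, weights), reverse=True)


solutions = [
    Solution({"movie": "M1", "theater": "T1", "restaurant": "R1"},
             {"movie": 0.9, "theater": 0.5, "restaurant": 0.4}),
    Solution({"movie": "M2", "theater": "T2", "restaurant": "R2"},
             {"movie": 0.6, "theater": 0.7, "restaurant": 0.9}),
]

# Restaurant-heavy weights, then movie-heavy weights for the same solutions.
restaurant_first = rank_solutions(
    solutions, {"restaurant": 0.4, "theater": 0.3, "movie": 0.3})
movie_first = rank_solutions(
    solutions, {"restaurant": 0.1, "theater": 0.1, "movie": 0.8})
```

With the first weighting the second solution comes out on top; shifting weight to the movie score promotes the first one, without calling any service again.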

Why do search services deserve special handling?

• Ranking
• Chunked results
• Data- and control-driven orchestration
• Need for ad-hoc optimization strategies of query plans [1]
• Algorithms for the computation of composite results (based on
different join strategies) [2]

• Novel user interaction paradigms [3]

[1] Braga, D., Ceri, S., Corcoglioniti, F., Grossniklaus, M. 2010. Panta Rhei: Flexible Execution Engine
for Search Computing Queries. In Ceri, S., Brambilla, M. (eds.) 2010. Search Computing Challenges and
Directions. Springer LNCS vol. 5950.

[2] Martinenghi, D., Tagliasacchi, M. 2010. Proximity Rank Join. VLDB 2010, Singapore, September 2010.

[3] Bozzon, A., Brambilla, M., Ceri, S., Fraternali, P. 2010. Liquid Query: Multi-Domain Exploratory Search
on the Web. WWW 2010, Raleigh, North Carolina, April 2010.
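
To illustrate why chunked results matter, here is a rough sketch of joining two services that return their results page by page, stopping as soon as enough combinations exist. It is a plain nested-loop join over chunks, not the rank join algorithm of [2]; the function names, the sample data, and the way calls are counted are all assumptions made for this example.

```python
from itertools import islice


def chunked(results, chunk_size):
    """Yield a result list chunk by chunk, as a paginated search service would."""
    iterator = iter(results)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def join_chunks(left_chunks, right_chunks_factory, match, limit):
    """Nested-loop join over chunks; stops once `limit` combinations are found.

    Returns the combinations and the number of chunk fetches (service calls).
    """
    combinations, calls = [], 0
    for left_chunk in left_chunks:
        calls += 1  # one call to the left service
        for right_chunk in right_chunks_factory():
            calls += 1  # one call to the right service
            for left in left_chunk:
                for right in right_chunk:
                    if match(left, right):
                        combinations.append((left, right))
                        if len(combinations) == limit:
                            return combinations, calls
    return combinations, calls


# Hypothetical data: movie titles and (theater, title) showings.
movies = ["Inception", "Shutter Island", "The Town", "Salt"]
showings = [("Theater A", "Salt"), ("Theater B", "Inception"),
            ("Theater C", "The Town"), ("Theater D", "Inception")]

combinations, calls = join_chunks(
    chunked(movies, 2),
    lambda: chunked(showings, 2),
    match=lambda movie, showing: movie == showing[1],
    limit=2,
)
```

Here the join stops after three chunk fetches instead of reading both result lists to the end, which is the kind of saving a call budget makes necessary.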

An architecture for multi-domain search

[Figure: architecture diagram with an on-line and an off-line side. Components shown: Client, Client Tools, Load Balancer, Server, Cache, Query Processor, Profiler / Analyzer, Service Wrappers, Materializer, Services.]

Client Tier

[Figure: client-tier diagram. Labels shown: Browser, Liquid Query and Result Application, View, Model, Controller, Widget, Liquid Query Module, Liquid Result Module, Transient and Persistent Data Management, Interaction Controller, Application State Manager, UI Builder, Communication.]

• “Search as a process” enactment

• Query submission

• Result browsing and query refinement

• Interaction primitives that support result manipulation,
exploration and expansion of the search results

• Alternative data visualizations

www.search-computing.net/demo/UI

Server Tier

[Figure: server-tier diagram. Query Analyzer: Query Analyzer API, Query Parser, Query Planner, Query Optimizer. Execution Engine: Engine API, Session Manager, Engine Controller, Cache Manager, Chain of Responsibility, Persistency Manager, Local Functs, Service Invoker, Custom Invoker. Service Repository: Service Repository API, Service Repository Controller, Persistent Service Repository. Query Repository: Query Repository API, Query Repository Controller, Persistent Query Repository. Wrappers: Custom Wrappers, Legacy Wrappers, Web Standard Wrappers (REST, YQL, GBase, DB, WSDL, SPARQL). Also labeled: Custom QEP.]

Server Tier
• Query analyzer and optimizer
• translate SQL-like queries into an executable plan

DEFINE QUERY NightPlan($X:String, $Y: Integer , $U:String, $V:String, $W:String) AS
SELECT M.*, T.*, R.*, TotalPrice=T.Price + R.AvgPrice
FROM ((Movie (iGenre: $X, iYear: $Y) AS M USING IMDB_MOVIES,
JOIN
Theatre (iAddress: $U, iCity: $V, iCountry: $W) AS T USING GOOGLE_DISPLAYING
ON M.Title=T.Title)
JOIN
Restaurant (iCountry: $W, iCategory: "Italian Restaurant") AS R USING YQL_LOCAL
ON T.address=R.Address AND T.city=R.City)
WHERE R.Rating>3
RANK BY (R=0.4, T=0.3, M=0.3)
LIMIT 20 TUPLES AND 50 CALLS

• the transformation takes into account data dependencies, execution
constraints, and service profile information
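
One way to picture how data dependencies shape a plan is to order the service invocations topologically. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The dependency sets are assumptions made for illustration: the first pipes Movie into Theatre into Restaurant, the second lets Movie and Theatre run in parallel. Neither is taken from the actual planner.

```python
from graphlib import TopologicalSorter


def plan_stages(dependencies):
    """Group services into stages; services in the same stage can run in parallel.

    `dependencies` maps each service to the set of services whose output it needs.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Assumed pipe topology: each service is fed by the previous one.
pipe_plan = plan_stages({
    "Movie": set(),
    "Theatre": {"Movie"},
    "Restaurant": {"Theatre"},
})

# Assumed alternative: Movie and Theatre are invoked in parallel and joined afterwards.
parallel_plan = plan_stages({
    "Movie": set(),
    "Theatre": set(),
    "Restaurant": {"Theatre"},
})
```

The two topologies yield three sequential stages versus two, which hints at why the choice of plan affects both the number of service calls and the execution time.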

Server Tier

[Figure: example plan connecting the Movie, Theater, and Restaurant services.]

• Execution Engine
• Executes service execution plans while satisfying the execution
constraints
• Synchronous and asynchronous search mechanism
• Service Repository
• Abstraction of actual search services
• Input/Output/Ranking
• Set of connection patterns: pairwise coupling of service
attributes to pre-define joins
• Reference to actual search services
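
The repository entries described above could be modeled roughly as follows. This is a sketch under stated assumptions, not the project's schema: the class names, the field names, and the pattern name `DinnerPlace` are hypothetical, while the coupled attributes come from the join condition of the NightPlan query (T.address=R.Address AND T.city=R.City).

```python
from dataclasses import dataclass


@dataclass
class ServiceInterface:
    """Abstract description of a search service (hypothetical fields)."""
    name: str
    inputs: tuple      # input attribute names
    outputs: tuple     # output attribute names
    ranked: bool       # whether results arrive in ranking order
    reference: str     # pointer to the actual service implementation


@dataclass
class ConnectionPattern:
    """Pairwise coupling of attributes of two services, pre-defining a join."""
    name: str
    left: str
    right: str
    attribute_pairs: tuple  # ((left_attribute, right_attribute), ...)

    def matches(self, left_tuple, right_tuple):
        """True when every coupled attribute pair has equal values."""
        return all(left_tuple[a] == right_tuple[b] for a, b in self.attribute_pairs)


dinner_place = ConnectionPattern(
    name="DinnerPlace",
    left="Theatre",
    right="Restaurant",
    attribute_pairs=(("address", "Address"), ("city", "City")),
)

theatre = {"Title": "Inception", "address": "1 Main St", "city": "San Francisco"}
near = {"Address": "1 Main St", "City": "San Francisco", "Rating": 4}
far = {"Address": "9 Elm St", "City": "Oakland", "Rating": 5}

joinable = [r for r in (near, far) if dinner_place.matches(theatre, r)]
```

Keeping the join condition in the repository means a query only has to name the pattern, and the engine can reuse it across queries.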

Some thoughts on distribution

[Figure: architecture diagram with an on-line and an off-line side. Components shown: Client, Client Tools, Load Balancer, Server, Cache, Query Processor, Profiler / Analyzer, Service Wrappers, Materializer, Services.]

Experimental Evaluation
• Query
• “Find a theater close to Union Square, San Francisco, showing a
recent thriller movie, close to a steak house”

• Services
• IMDB Archive, Google Movie Search, Yahoo Local Search

• Demo
• http://www.search-computing.com/UIMovie

Results
• Service invocation is the most time-consuming task
• The number of service invocations depends on the topology of the
execution plan
• Caching can lead to huge performance improvements

[Figure: two plots. Axis labels: Service invocations and Execution time (sec) on the vertical axes (0 to 40); Combinations (0 to 100) and Cache hit probability (0 to 1) on the horizontal axes. Legends: movie, theater, restaurant; 0.25 s, 0.5 s, 1 s, 2 s, 4 s.]
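
A back-of-the-envelope sketch of why caching helps so much. The timing model is an assumption: invocations run one after another, a cache hit costs nothing, and a miss costs one full service latency. `CachingInvoker` is a hypothetical wrapper and does not reproduce the engine's Cache Manager.

```python
def expected_time(invocations, latency_s, hit_probability):
    """Expected total time when only cache misses pay the service latency."""
    return invocations * (1.0 - hit_probability) * latency_s


class CachingInvoker:
    """Wraps a service-invocation function and memoizes results per request."""

    def __init__(self, invoke):
        self._invoke = invoke
        self._cache = {}
        self.remote_calls = 0

    def __call__(self, service, **params):
        key = (service, tuple(sorted(params.items())))
        if key not in self._cache:
            self.remote_calls += 1  # cache miss: pay for a real invocation
            self._cache[key] = self._invoke(service, **params)
        return self._cache[key]


def fake_service(service, **params):
    """Stand-in for a remote search service (returns a canned result)."""
    return [f"{service} result for {sorted(params.items())}"]


invoker = CachingInvoker(fake_service)
first = invoker("Restaurant", city="San Francisco", category="Steak House")
second = invoker("Restaurant", city="San Francisco", category="Steak House")

no_cache = expected_time(30, 2.0, 0.0)    # 60.0 seconds
half_hits = expected_time(30, 2.0, 0.5)   # 30.0 seconds
```

Under this model the expected time falls linearly with the hit probability, and repeated requests with the same parameters never reach the remote service.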

Conclusion and Future Work

• We propose an architecture supporting the execution of
multi-domain queries using search services in a service-based
environment

• Our implementation, demonstrations, and experimental
results show the feasibility of the approach

• Architecturally speaking, there is a lot of room for
improvement
• Smart control strategies, such as top-k joins
• Efficient search service pre-fetching and
materialization
• Dynamic evolution of execution plans according to an
exploratory search approach

Thanks!

Questions?

www.search-computing.net

bozzon@elet.polimi.it


