This document proposes an architecture for multi-domain search across the web using search services. It describes a motivation for going beyond simple keyword search to support more complex queries. An example complex query is provided. The architecture supports splitting complex queries into subproblems, using available search services, and integrating results. Key aspects include ranking, handling of chunked/distributed results, and novel user interactions. Experimental results demonstrate feasibility and opportunities for improvements like smart control strategies and efficient pre-fetching.
A Service-Based Architecture for Multi-domain Search on the Web
1. A Service-Based Architecture for Multi-domain Search on the Web
ICSOC 2010 - 8th International Conference on Service Oriented Computing
San Francisco
December 7th - 10th 2010
2. Motivation
• Search is a fundamental activity for information management
• Search engines excel at retrieving the documents that most closely match the user’s query
• But they fail to go beyond this simple paradigm!
• Example: “Find a theater close to Union Square, San Francisco, showing a recent thriller movie, close to a steak house”
3. Search for a solution using keywords
Not useful....
4. Let’s split the tasks, and search for
theaters using keywords
We got results
9. What is Google doing?
• Guessing the user’s intention
• Solving her atomic task (find - and book - a restaurant, find a theatre)
• Yahoo: We’re Moving From Web Of Pages To Web Of Objects
• http://techcrunch.com/2010/09/16/live-from-yahoos-product-runway-event/
• “There’s going to be a blurring between typing in a query and getting a
bunch of a links… People just want answers. Let you buy a ticket right
from a movie result.”
10. What should we - as users - do?
• Work on subproblems
• Start a search process
• Use available search services ... and there are a lot of them!
• ProgrammableWeb
• YQL
• GoogleBase
• Linked Data
• ...
• Try to be smart!
11. The Search Computing Project
• Build theories, methods and tools to support search processes
• Given a multi-domain query
• Given a set of search services
• Build several “solutions” which already integrate all the search process dimensions
• Rank “solutions” according to a global rank function and output results in rank order
• Support user-friendly query definition and result browsing
• Add search domains while the search process proceeds
• Possibly change the relative weight of each ranking
12. Why do search services deserve special handling?
• Ranking
• Chunked results
• Data- and control-driven orchestration
• Need for ad-hoc optimization strategies of query plans [1]
• Algorithms for the computation of composite results (based on different join strategies) [2]
• Novel user interaction paradigms [3]
[1] Braga, D., Ceri, S., Corcoglioniti, F., Grossniklaus, M. 2010. Panta Rhei: Flexible Execution Engine for Search Computing Queries, in Ceri, S., Brambilla, M. (eds.) 2010. Search Computing Challenges and Directions. Springer LNCS vol. 5950.
[2] Martinenghi, D., Tagliasacchi, M. 2010. Proximity Rank Join. VLDB 2010, Singapore, September 2010.
[3] Bozzon, A., Brambilla, M., Ceri, S., Fraternali, P. 2010. Liquid Query: Multi-Domain Exploratory Search on the Web. WWW 2010, Raleigh, North Carolina, April 2010.
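To make the "chunked results" and "composite results" points more concrete, here is a minimal Python sketch of a pipe join over a chunked service. It is only an illustration under stated assumptions: the services are in-memory stand-ins, the names `chunked`, `pipe_join`, and `theaters_showing` are hypothetical, and the join is a naive nested loop, not the Proximity Rank Join of [2] or the Panta Rhei engine of [1].

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, object]


def chunked(rows: List[Row], chunk_size: int) -> Iterator[List[Row]]:
    """Simulate a search service that returns its ranked results in chunks."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def pipe_join(
    left_chunks: Iterator[List[Row]],
    right_service: Callable[[object], List[Row]],
    key: str,
    max_calls: int,
) -> Tuple[List[Tuple[Row, Row]], int]:
    """Feed each left row into the right service until the call budget is spent."""
    combos: List[Tuple[Row, Row]] = []
    calls = 0
    for chunk in left_chunks:
        calls += 1  # fetching one chunk counts as one service invocation
        for left in chunk:
            if calls >= max_calls:
                return combos, calls
            calls += 1  # one invocation of the piped (right) service
            for right in right_service(left[key]):
                combos.append((left, right))
    return combos, calls


# Toy data standing in for real services (illustrative only).
MOVIES: List[Row] = [
    {"title": "A", "score": 0.9},
    {"title": "B", "score": 0.8},
    {"title": "C", "score": 0.7},
]
SHOWINGS: Dict[object, List[Row]] = {
    "A": [{"theater": "T1"}, {"theater": "T2"}],
    "C": [{"theater": "T3"}],
}


def theaters_showing(title: object) -> List[Row]:
    """Hypothetical piped service: theaters showing a given movie title."""
    return SHOWINGS.get(title, [])


combos, calls = pipe_join(chunked(MOVIES, 2), theaters_showing, "title", max_calls=10)
```

The call counter shows why the number of invocations depends on where a service sits in the plan: the piped service is called once per upstream row, while the upstream service is called once per chunk.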
13. An architecture for multi-domain search
[Figure: architecture overview, split into an on-line and an off-line part. Components shown: Client, Client Tools, Load Balancer, Server, Cache, Query Processor, Profiler / Analyzer, Service Wrappers, Materializer, Services.]
14. Client Tier
“Search as a process” enactment
• Query submission
• Result browsing and query refinement
• Interaction primitives that support result manipulation, exploration and expansion of the search results
• Alternative data visualizations
www.search-computing.net/demo/UI
[Figure: client tier diagram. Labels shown: Browser, Liquid Query and Result Application, View, Model, Controller, Widget, Liquid Query Module, Liquid Result Module, UI Builder, Data Management (Transient, Persistent), Interaction Controller, Application State Manager, Communication.]
15. Server Tier
[Figure: server tier diagram. Labels shown: QUERY ANALYZER (Query Analyzer API, Query Parser, Query Planner, Query Optimizer); EXECUTION ENGINE (Engine API, Session Manager, Engine Controller, Cache Manager, Chain of Responsibility, Persistency Manager, Local Functs, Service Invoker, Custom Invoker); Service Repository (Service Repository API, Service Repository Controller, Persistent Service Repository); Query Repository (Query Repository API, Query Repository Controller, Persistent Query Repository); QEP; Custom Wrappers, Legacy Wrappers, Web Standard Wrappers; REST, YQL, GBase, DB, WSDL, SPARQL.]
16. Server Tier
• Query analyzer and optimizer
• translate SQL-like queries into an executable plan
DEFINE QUERY NightPlan($X:String, $Y: Integer , $U:String, $V:String, $W:String) AS
SELECT M.*, T.*, R.*, TotalPrice=T.Price + R.AvgPrice
FROM ((Movie (iGenre: $X, iYear: $Y) AS M USING IMDB_MOVIES
JOIN
Theatre (iAddress: $U, iCity: $V, iCountry: $W) AS T USING GOOGLE_DISPLAYING
ON M.Title=T.Title)
JOIN
Restaurant (iCountry: $W, iCategory: "Italian Restaurant") AS R USING YQL_LOCAL
ON T.address=R.Address AND T.city=R.City)
WHERE R.Rating>3
RANK BY (R=0.4, T=0.3, M=0.3)
LIMIT 20 TUPLES AND 50 CALLS
• The transformation takes into account data dependencies, execution constraints, and service profile information
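As a rough illustration of what the RANK BY (R=0.4, T=0.3, M=0.3) and LIMIT 20 TUPLES clauses could mean, the Python sketch below scores each combination with a weighted sum of per-service scores and keeps the best k. This is an assumption: the slides only mention a "global rank function" with per-service weights, and the names `global_score` and `top_k`, as well as the normalization of scores to [0, 1], are hypothetical.

```python
from typing import Dict, List

# Weights taken from the RANK BY clause of the example query.
WEIGHTS: Dict[str, float] = {"R": 0.4, "T": 0.3, "M": 0.3}


def global_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-service scores (assumed to be normalized to [0, 1])."""
    return sum(weight * scores[alias] for alias, weight in weights.items())


def top_k(combinations: List[Dict[str, float]], k: int,
          weights: Dict[str, float] = WEIGHTS) -> List[Dict[str, float]]:
    """Return the k combinations with the highest global score, best first."""
    ranked = sorted(combinations, key=lambda c: global_score(c, weights), reverse=True)
    return ranked[:k]


# Made-up per-service scores for three Movie/Theatre/Restaurant combinations.
combos = [
    {"M": 0.9, "T": 0.5, "R": 0.2},
    {"M": 0.4, "T": 0.6, "R": 0.9},
    {"M": 0.7, "T": 0.7, "R": 0.7},
]
best = top_k(combos, k=2)
```

Passing a different `weights` dictionary re-ranks the same combinations, which is one simple way to picture a user changing the relative weight of each ranking.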
17. Server Tier
• Execution Engine
• Executes service execution plans while satisfying the execution constraints
• Synchronous and asynchronous search mechanism
• Service Repository
• Abstraction of actual search services
• Input/Output/Ranking
• Set of connection patterns: pairwise coupling of service attributes to pre-define joins
• Reference to actual search services
[Figure: example execution plan over the Movie, Theater, and Restaurant services.]
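One way to picture the service repository entries is as plain records: a service interface with its inputs, outputs, ranking, and a reference to the actual service, plus connection patterns that couple attributes pairwise. The Python sketch below is a guess at such a model, not the project's actual schema. The class names, the `DinnerPlace` pattern name, the output attribute lists, and the placeholder endpoints are all hypothetical; only the service names and join attributes come from the example query.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class ServiceInterface:
    """Abstraction of an actual search service (hypothetical model)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    ranking: str   # description of how results are ranked (assumed)
    endpoint: str  # reference to the actual search service (placeholder)


@dataclass(frozen=True)
class ConnectionPattern:
    """Pre-defined join: pairwise coupling of attributes of two services."""
    name: str
    left: ServiceInterface
    right: ServiceInterface
    attribute_pairs: Tuple[Tuple[str, str], ...]

    def matches(self, left_row: Row, right_row: Row) -> bool:
        """True if every coupled attribute pair has equal values."""
        return all(left_row[l] == right_row[r] for l, r in self.attribute_pairs)


theatre = ServiceInterface(
    name="GOOGLE_DISPLAYING",
    inputs=("iAddress", "iCity", "iCountry"),
    outputs=("Title", "Address", "City", "Price"),
    ranking="assumed: service-provided order",
    endpoint="<placeholder>",
)
restaurant = ServiceInterface(
    name="YQL_LOCAL",
    inputs=("iCountry", "iCategory"),
    outputs=("Address", "City", "Rating", "AvgPrice"),
    ranking="assumed: Rating",
    endpoint="<placeholder>",
)
dinner_place = ConnectionPattern(
    name="DinnerPlace",
    left=theatre,
    right=restaurant,
    attribute_pairs=(("Address", "Address"), ("City", "City")),
)
```

With such records, a join condition like the Theatre/Restaurant one in the example query can be referenced by name instead of being spelled out attribute by attribute.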
18. Some thoughts on distribution
[Figure: the architecture overview from slide 13, repeated.]
19. Experimental Evaluation
• Query
• “Find a theater close to Union Square, San Francisco, showing a
recent thriller movie, close to a steak house”
• Services
• IMDB Archive, Google Movie Search, Yahoo Local Search
• Demo
• http://www.search-computing.com/UIMovie
20. Results
• Service invocation is the most time-consuming task
• The number of service invocations depends on the topology of the execution plan
• Caching can lead to huge performance improvements
[Figure, left chart: service invocations (0 to 40) against combinations (0 to 100), with one series each for movie, theater, and restaurant.]
[Figure, right chart: execution time in seconds (0 to 40) against cache hit probability (0 to 1), with one series each for 0.25 s, 0.5 s, 1 s, 2 s, and 4 s.]
21. Conclusion and Future Work
• We propose an architecture supporting the execution of multi-domain queries using search services in a service-based environment
• Our implementation, demonstrations, and experimental results show the feasibility of the approach
• Architecturally speaking, there is a lot of room for improvement
• Smart control strategies, such as top-k joins
• Efficient search service pre-fetching and materialization
• Dynamic evolution of execution plans according to an exploratory search approach
All these aspects also challenge the performance and scalability of traditional SOA solutions.
The cache works for both search service results and queries.
The load balancer distributes the execution among several engine instances, which can share the same cache.
Service profiling can enhance the performance of the system by updating run-time information about services and the query processor.
Data prefetching and service materialization can reduce latency.
Left chart: average number of service invocations required to get the top-k results, with k ranging from 1 to 100, averaged over several inputs.
The number of Restaurant invocations grows almost linearly, since the service is invoked in a pipe and returns on average 10 results per call (the chunk size). Fewer invocations are required for Movie and Theater, which are on the left side of the pipe.
Right chart: impact of caching on the average execution time, for cache hit probabilities ranging from 0 to 1 (20 concurrent clients, top 10 results, synthetic services).