Search Computing
Stefano Ceri, Keynote talk at CAISE, Hammamet, June 9, 2010
Joint work with: Adnan Abid, Mamoun Abu Helu, Davide Barbieri,
Daniele Braga, Marco Brambilla, Alessandro Bozzon, Alessandro
Campi, Sofia Ceppi, Francesco Corcoglioniti, Emanuele Della Valle,
Davide Eynard, Piero Fraternali, Nicola Gatti, Giorgio Ghisalberghi,
Michael Grossniklaus, Davide Martinenghi, Marco Masseroli,
Maristella Matera, Chiara Pasini, Elena Pellizzotti, Stefania Ronchi,
Marco Tagliasacchi, Luca Tettamanti, Salvatore Vadacca, Riccardo
Volonterio, Serge Zagorac
Genesis of Search Computing


• My “Gong Show” challenge at the 2003 Lowell Workshop:
   “Find an ethnic restaurant in a nice place close to Milano.”
• Logically, a composition of domains:
    – Restaurants (ethnic)
    – Geo-locations (nice place close to Milano)
• Composing maps with “geo-located” information is now
   solved by all search engines …

   … but in general no system is capable of composing
   arbitrary semantic domains




      Database Management   Prof. Stefano Ceri
Motivating Examples

 “Who are the strongest candidates in Europe for
  competing on software ideas?”
 “Who is the best doctor who can cure insomnia in a
  close-by hospital?”
 “Where can I attend an interesting scientific conference in
  my field and at the same time relax on a beautiful beach
  nearby?”




Their Common Aspect

• Multi-domain queries
• Individual answers are on the Web
• A knowledgeable user would run the query step by step:
   – Search database conferences, get their city
   – Check that the city's average temperature is warm enough
   – Search low-cost flights to that city via a broker
   – Search luxury hotels via another broker

• We want a system that supports this search process:
   – Build several “solutions” that already integrate all dimensions
   – Rank “solutions” according to a global rank function and output
     results in rank order
   – Support user-friendly query definition and result browsing
   – Add search domains while the search proceeds
   – Possibly change the relative weight of each ranking criterion
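The slides do not define the global rank function. As a minimal sketch, assuming every domain returns a normalized score in [0, 1] and the global rank is a weighted sum of those scores, building combined “solutions” and ranking them could look like this. All conference, flight and hotel names, scores and weights are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Hypothetical per-domain results with normalized scores in [0, 1].
conferences = [("ConfA", "CityA", 0.9), ("ConfB", "CityB", 0.6)]  # (name, city, score)
flights = {"CityA": [("F1", 0.4)], "CityB": [("F2", 0.9), ("F3", 0.7)]}
hotels = {"CityA": [("H1", 0.8)], "CityB": [("H2", 0.5)]}


def solutions(weights: dict[str, float]) -> list[tuple[float, str, str, str]]:
    """Combine conference, flight and hotel results; rank by a weighted sum (an assumption)."""
    ranked = []
    for conf, city, s_conf in conferences:
        # Only flights and hotels for the conference's city can be combined with it.
        for (flt, s_flt), (htl, s_htl) in product(flights.get(city, []), hotels.get(city, [])):
            score = (weights["conf"] * s_conf
                     + weights["flight"] * s_flt
                     + weights["hotel"] * s_htl)
            ranked.append((round(score, 3), conf, flt, htl))
    return sorted(ranked, reverse=True)  # best solution first


print(solutions({"conf": 0.5, "flight": 0.25, "hotel": 0.25}))
print(solutions({"conf": 0.1, "flight": 0.8, "hotel": 0.1}))
```

Changing the weights reorders the same set of solutions: with the second weighting, which favours cheap-flight quality, both ConfB combinations move ahead of ConfA.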





OVERALL FRAMEWORK


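Search Computing architecture: overall view

[Figure: Search Computing architecture. Main query flow: the Front End sends a High-Level Query to Query Analysis, which produces Sub-queries; the Query To Domain Mapper turns them into Low-level queries; the Query Planner produces a Concrete Query Plan, which the Query Engine executes through operators OP 1 … OP N, invoking services in the WS World through the WS-Framework. Merged Results go through Result Transformation and reach the Final User as Results. Supporting components (<Uses> relation): Domain Framework, Domain Repository, Service Repository. Most components have an associated cache.]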
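To make the main query flow concrete, here is a minimal Python sketch in which each component is a function and `functools.lru_cache` stands in for the per-component caches. The stage bodies are hypothetical placeholders (for example, splitting a query on “ and ”), not the actual SeCo logic.

```python
from __future__ import annotations

from functools import lru_cache


@lru_cache(maxsize=128)
def query_analysis(high_level_query: str) -> tuple[str, ...]:
    # Placeholder analysis: split the high-level query into sub-queries on " and ".
    return tuple(part.strip() for part in high_level_query.split(" and "))


@lru_cache(maxsize=128)
def query_to_domain(sub_query: str) -> str:
    # Placeholder mapping: wrap each sub-query into a low-level domain query.
    return f"Search({sub_query!r})"


@lru_cache(maxsize=128)
def query_planner(low_level_queries: tuple[str, ...]) -> tuple[str, ...]:
    # Placeholder plan: the low-level queries in a deterministic order.
    return tuple(sorted(low_level_queries))


def query_engine(plan: tuple[str, ...]) -> list[str]:
    # Placeholder execution: real operators (OP 1 ... OP N) would invoke services.
    return [f"result of {step}" for step in plan]


def result_transformation(merged: list[str]) -> list[str]:
    # Number the merged results for presentation to the final user.
    return [f"{i}. {r}" for i, r in enumerate(merged, start=1)]


def run(high_level_query: str) -> list[str]:
    sub_queries = query_analysis(high_level_query)
    low_level = tuple(query_to_domain(q) for q in sub_queries)
    plan = query_planner(low_level)
    return result_transformation(query_engine(plan))


print(run("DB conference and beach"))
print(run("DB conference and beach"))  # hits the analysis, mapping and planning caches
print(query_analysis.cache_info())
```

Caching at each stage means a repeated or refined query can reuse earlier analysis and planning work; only the stages whose inputs change have to be recomputed.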




Search Computing architecture: overall view

High-level query: “Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?”

Query Analysis splits it into sub-queries:
• Sub query 1: “Where can I attend a DB scientific conference?”
• Sub query 2: “place close to a beautiful beach?”
• Sub query 3: “place reachable with cheap flight?”

[Figure: Search Computing architecture diagram, highlighting the Query Analysis step]




Search Computing architecture: overall view

The Query To Domain Mapper translates the sub-queries into low-level queries:
• Low level query 1: ConfSearch(“DB”, placeX, dateY)
• Low level query 2: TourSearch(“Beach”, PlaceX)
• Low level query 3: Flight(“cost<200”, PlaceX, DateY)

[Figure: Search Computing architecture diagram, highlighting the Query To Domain Mapper step]




Search Computing architecture: overall view

The Query Planner produces a concrete query plan; the Query Engine executes it (service invocations and operator execution), and the merged results go through Result Transformation before being presented to the final user:
• ESWC – Crete – Olympic
• CAISE – Hammamet – Alitalia
• TOOLS – Malaga – EasyJet

[Figure: Search Computing architecture diagram, highlighting query planning, execution and result presentation]
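The three low-level queries share the place and date variables (placeX/PlaceX and dateY/DateY), so answering the high-level query amounts to joining the service results on them. A minimal sketch, assuming Python stubs in place of the real ConfSearch, TourSearch and Flight services (function names and all data are hypothetical) and a simple nested-loop strategy in which each conference binds the place and date passed to the other two services:

```python
from __future__ import annotations

from typing import Iterator

# Hypothetical stand-ins for the low-level queries; all data is made up.
CONFERENCES = [
    {"conf": "ConfA", "place": "PlaceA", "date": "D1", "topic": "DB"},
    {"conf": "ConfB", "place": "PlaceB", "date": "D2", "topic": "DB"},
    {"conf": "ConfC", "place": "PlaceC", "date": "D3", "topic": "AI"},
]
BEACHES = {"PlaceA": ["BeachA"], "PlaceC": ["BeachC"]}
FLIGHTS = [
    {"carrier": "AirX", "place": "PlaceA", "date": "D1", "cost": 150},
    {"carrier": "AirY", "place": "PlaceA", "date": "D1", "cost": 300},
    {"carrier": "AirZ", "place": "PlaceB", "date": "D2", "cost": 120},
]


def conf_search(topic: str) -> list[dict]:
    return [c for c in CONFERENCES if c["topic"] == topic]


def tour_search(kind: str, place: str) -> list[str]:
    return BEACHES.get(place, []) if kind == "Beach" else []


def flight(max_cost: int, place: str, date: str) -> list[dict]:
    return [f for f in FLIGHTS
            if f["place"] == place and f["date"] == date and f["cost"] < max_cost]


def answer(topic: str, max_cost: int) -> Iterator[tuple[str, str, str]]:
    # Nested-loop join: each conference binds placeX and dateY,
    # which then become inputs of the beach and flight services.
    for c in conf_search(topic):
        place_x, date_y = c["place"], c["date"]
        if not tour_search("Beach", place_x):
            continue  # no beach near this place
        for f in flight(max_cost, place_x, date_y):
            yield (c["conf"], place_x, f["carrier"])


print(list(answer("DB", 200)))  # [('ConfA', 'PlaceA', 'AirX')]
```

Each answer is a (conference, place, carrier) triple, the same shape as the presented results above. This is only one possible plan; a planner could also call services in parallel and join their results afterwards.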




Search Computing architecture: incremental prototyping

[Figure: Search Computing architecture diagram (with an Admin Interface), annotated with the layers delivered by each prototype, from the Query Engine up to the Front End]

Prototype 1: Core behaviour of the system
• Engine-based execution of queries
• Domain repository
• Service repository
• Coarse result presentation

Prototype 2: Planning
• Automatic optimized query planning

Prototype 3: Mapping and presentation
• Mapping to domains
• Presentation of results

Prototype 4: High-level queries




CAISE FOCUS on: Service Registration

Service Marts:
• Conceptual representation of resources as entities and connections
• Logical representation of signatures
• Physical representation as service implementations




CAISE FOCUS on: Front-end

Liquid Query
• Client-side framework for configuration and automatic rendering of query and result interfaces
• User interaction primitives that support exploratory search

[Figure: Search Computing architecture diagram, highlighting the Front End]

CAISE FOCUS on: Development Process

Development Support Environment: tools supporting
• Service Registration
• Query Design
• Performance Monitoring

[Figure: SeCo development process, organized by phase]
• Deploy time: the Service Developer implements the Search Services; the SeCo Expert deploys the SeCo platform.
• Service publishing time: the Service Publisher implements wrapping, defines materialization/normalization, and performs the registration of the Service Mart, which feeds the Service Mart Repository.
• Configuration time: the Expert User defines a Liquid Query Template, based on the Service Mart Repository, from which a User Interface Specification is produced.
• Execution time: the Final User submits a Liquid Query and manipulates the Liquid Result, both relying on the User Interface Specification.








SERVICE REGISTRATION


Service Registration in SeCo
Objective: providing a framework for registering services as
  first-class citizens within SeCo

=> Service Marts
  • High-level abstractions of “real world entities” that provide a simple
    interface to users and hide implementation details
  • Inspired by Data Marts, a data modeling pattern used in data
    warehousing
  • Each Service Mart can have multiple modalities of data access and
    can be mapped to multiple service implementations, possibly
    offered by different providers

=> Connection Patterns
  • High-level abstractions of “real world relationships” that provide a
    simple interface to users and hide implementation details
  • Built by means of attributes that share the same domains
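Since connection patterns are built on attributes that share the same domains, candidate connections between marts can be found by matching attribute domains. A minimal sketch; the marts, attribute names and domain names below are hypothetical:

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical marts: attribute name -> domain name.
marts: dict[str, dict[str, str]] = {
    "Movie": {"Title": "MovieTitle", "Director": "PersonName", "Year": "Year"},
    "Theatre": {"Name": "TheatreName", "City": "City", "ShowTitle": "MovieTitle"},
    "Restaurant": {"Name": "RestaurantName", "City": "City"},
}


def connection_candidates(marts: dict[str, dict[str, str]]) -> list[tuple[str, str, str, str]]:
    """Return (mart1, attr1, mart2, attr2) for every attribute pair sharing a domain."""
    found = []
    for (m1, attrs1), (m2, attrs2) in combinations(marts.items(), 2):
        for a1, d1 in attrs1.items():
            for a2, d2 in attrs2.items():
                if d1 == d2:
                    found.append((m1, a1, m2, a2))
    return found


for candidate in connection_candidates(marts):
    print(candidate)
```

Note that matching is on domains, not on attribute names: the two `Name` attributes are not connectable because their domains differ. Whether a candidate becomes a named connection pattern remains a modeling decision; sharing a domain only makes the connection possible.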

Service Marts – Conceptual Level
Every SM definition includes a name and a collection of exposed
  attributes, i.e., the attributes of the real-world object described by the SM:
          Movie(Title, Director, Year, Language, Genres(Genre), Actors(Name, Sex))

    • Atomic, single-valued, typed attributes
    • Repeating groups (multi-valued, typed attributes)
           – Each “repeating group” is a non-empty set of typed sub-attributes
             that collectively define a property of the service mart

The model choices are:
           – To support structural complexity with only one level of nesting
             (rather than arbitrary levels of nesting)
           – To avoid explicit descriptions of relationships (using repeating
             groups for M:N relationships)
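One way to encode these modeling choices is sketched below, assuming Python dataclasses as the representation (the class names `Attribute`, `RepeatingGroup` and `ServiceMart` are hypothetical). Repeating groups must be non-empty and may contain only atomic attributes, which enforces the single level of nesting.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Atomic, single-valued, typed attribute."""
    name: str
    py_type: type


@dataclass(frozen=True)
class RepeatingGroup:
    """Multi-valued attribute: a non-empty set of atomic sub-attributes."""
    name: str
    sub_attributes: tuple[Attribute, ...]

    def __post_init__(self) -> None:
        if not self.sub_attributes:
            raise ValueError(f"repeating group {self.name!r} must not be empty")
        # Only one level of nesting: sub-attributes must themselves be atomic.
        if not all(isinstance(s, Attribute) for s in self.sub_attributes):
            raise TypeError(f"repeating group {self.name!r} may contain only atomic attributes")


@dataclass(frozen=True)
class ServiceMart:
    name: str
    attributes: tuple[Attribute | RepeatingGroup, ...]

    def signature(self) -> str:
        parts = []
        for a in self.attributes:
            if isinstance(a, RepeatingGroup):
                parts.append(f"{a.name}({', '.join(s.name for s in a.sub_attributes)})")
            else:
                parts.append(a.name)
        return f"{self.name}({', '.join(parts)})"


movie = ServiceMart("Movie", (
    Attribute("Title", str), Attribute("Director", str),
    Attribute("Year", int), Attribute("Language", str),
    RepeatingGroup("Genres", (Attribute("Genre", str),)),
    RepeatingGroup("Actors", (Attribute("Name", str), Attribute("Sex", str))),
))
print(movie.signature())
# Movie(Title, Director, Year, Language, Genres(Genre), Actors(Name, Sex))
```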




Service Marts – Logical Level

At this level, each SM is associated with one or more Access Patterns, e.g.:


Movie1(TitleO, DirectorO, ScoreRO, YearO, LanguageI, Genres.GenreO,
  Actors.NameO, Actors.SexO, Genres.GenreI)

Movie2(TitleI, DirectorO, YearO, LanguageO, Genres.GenreO, Actors.NameO,
  Actors.SexO)

    • Access patterns contain adorned attributes, i.e., attributes tagged with one of
      the following:
        – I, if they are input attributes
        – O, if they are output attributes
        – R, if they are used for ranking; they may or may not be visible in the output

    • Movie1 accesses movies by Language and Genre (e.g., “action movies
      in English”), and results are ranked by Score (a new attribute).
    • Movie2 accesses movies by Title (e.g., “Ben Hur”). We expect few
      (zero, one, or more) results, which are not ranked.
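A minimal sketch of how adornments could be used, assuming an access pattern is stored as a mapping from attribute to its tags (the `AccessPattern` class and its methods are hypothetical). Since Movie1 lists Genres.Genre both as output and as input, the sketch records it as "IO". An access pattern can be invoked only when all of its input attributes are bound.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AccessPattern:
    name: str
    # Attribute -> adornment string built from "I" (input), "O" (output), "R" (ranking).
    adornments: dict[str, str]

    def inputs(self) -> set[str]:
        return {a for a, tags in self.adornments.items() if "I" in tags}

    def is_ranked(self) -> bool:
        return any("R" in tags for tags in self.adornments.values())

    def can_invoke(self, bindings: dict[str, object]) -> bool:
        # A call is possible only when every input attribute has a value.
        return self.inputs().issubset(bindings)


movie1 = AccessPattern("Movie1", {
    "Title": "O", "Director": "O", "Score": "RO", "Year": "O", "Language": "I",
    "Genres.Genre": "IO", "Actors.Name": "O", "Actors.Sex": "O",
})
movie2 = AccessPattern("Movie2", {
    "Title": "I", "Director": "O", "Year": "O", "Language": "O",
    "Genres.Genre": "O", "Actors.Name": "O", "Actors.Sex": "O",
})

print(movie1.can_invoke({"Language": "English", "Genres.Genre": "action"}))  # True
print(movie2.can_invoke({"Language": "English"}))  # False: Title is missing
```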


Service Marts – Physical Level


At this level, every Access Pattern can match different
  Service Implementations, having:
        Physical URI to be called
        Physical properties which are specific to the implementation
        Mapping between logical and physical parameters

    IMDBMovie1(MovieTitleO, DirectorO, StarsRO, YearO, LanguageI,
          Genres.GenreI, Actors.NameO, Actors.GenderO)

    Service interface IMDBMovie    AP: Movie1    URI: http://...
    Properties: TTL=6000, chunksize=10, cacheable=true, exposed=false, ...
    Logical attribute:    Title        Director   Score   Year   Language   ...
    Physical parameter:   MovieTitle   Director   Stars   Year   Lang       ...
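The logical-to-physical mapping amounts to renaming parameters in both directions. A minimal sketch, assuming a plain dictionary from logical attribute to physical parameter; the `ServiceInterface` class and its methods are hypothetical, and the URI stays elided as on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceInterface:
    """Hypothetical registry entry binding a physical service to an access pattern."""
    name: str
    access_pattern: str
    uri: str
    properties: dict = field(default_factory=dict)
    # logical attribute -> physical parameter; names not listed are identical
    mapping: dict = field(default_factory=dict)

    def to_physical(self, logical: dict) -> dict:
        """Rename logical input bindings to the parameter names the service expects."""
        return {self.mapping.get(k, k): v for k, v in logical.items()}

    def to_logical(self, physical: dict) -> dict:
        """Rename a physical result row back to logical attribute names."""
        reverse = {phys: log for log, phys in self.mapping.items()}
        return {reverse.get(k, k): v for k, v in physical.items()}


imdb = ServiceInterface(
    name="IMDBMovie",
    access_pattern="Movie1",
    uri="http://...",  # elided on the slide; left as-is
    properties={"TTL": 6000, "chunksize": 10, "cacheable": True, "exposed": False},
    mapping={"Title": "MovieTitle", "Score": "Stars", "Language": "Lang"},
)

print(imdb.to_physical({"Language": "English"}))         # {'Lang': 'English'}
print(imdb.to_logical({"MovieTitle": "X", "Stars": 7}))  # {'Title': 'X', 'Score': 7}
```

Keeping the mapping as data rather than code means that a new implementation of the same access pattern only needs a new dictionary.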




External and Selector Attributes



        External attributes, for supporting access and ranking:

SM                   Movie(Title, Director, Year, Language, …)
    AP               Movie1: TitleO | DirectorO | YearO | … | ScoreRO | GenreI
    AP               Movien: TitleO | DirectorO | YearO | … | TitleI

        Selector attributes, for supporting choices among service implementations:

SM                   Movie(Title, Director, Year, Language, …)
    SI               Movie Implementation 1
                     ...
    SI               Movie Implementation n
    (selector attribute: Language)
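Selector attributes let the engine route a call to a suitable service implementation. A minimal sketch, assuming that each implementation declares which values of the selector attribute Language it covers; the registry contents and the function `select_implementations` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    name: str
    languages: frozenset  # selector values this implementation can serve


# Hypothetical registry for the Movie service mart; the coverage is invented.
REGISTRY = [
    Implementation("Movie Implementation 1", frozenset({"English"})),
    Implementation("Movie Implementation 2", frozenset({"Italian", "French"})),
]


def select_implementations(registry, query: dict) -> list:
    """Return the implementations whose coverage includes the query's Language."""
    language = query.get("Language")
    return [si.name for si in registry if language in si.languages]


print(select_implementations(REGISTRY, {"Language": "French"}))  # ['Movie Implementation 2']
```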



Connection Patterns


Connections between marts exist only in terms of attributes that share the same
domains, at different levels of abstraction:
       Conceptually, by an undirected edge with a name:
     PlayingMovie(Movie, Theatre)
     (edge between the service marts Movie and Theatre)

       Logically, by an edge (possibly directed) with a name and a join condition:
     PlayingMovie(Movie, Theatre): (Title=Movie.Title)
     (edge between the access patterns Movie4 and Theatre2)




Connection Patterns – Logical Level

Directed edge: information is “piped” from one access pattern to another
   along connection attributes that are output attributes of the first service
   and input attributes of the second service -> PIPE JOIN



Movie1:   Title | Director | Score | Year | Language | A.Name | A.Sex | G.Genre

Theatre1: Name | Address | M.Start | M.Title
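A pipe join can be sketched as a loop in which each output value of the first service becomes the input of a call to the second. This is a minimal Python sketch under stated assumptions: the services are local stand-ins (`MOVIES`, `theatre1`), the connection attribute is Title feeding Theatre1's M.Title, and `pipe_join` is a hypothetical name.

```python
from typing import Callable, Iterable


def pipe_join(first: Iterable[dict],
              second: Callable[[str], Iterable[dict]],
              out_attr: str) -> list:
    """Pipe join: each output value of the first service becomes the input
    of a call to the second service; matching pairs are concatenated."""
    results = []
    for left in first:
        for right in second(left[out_attr]):  # one call per left item
            results.append((left, right))
    return results


# Mock data standing in for Movie1 results and the Theatre1 service.
MOVIES = [{"Title": "A", "Score": 0.9}, {"Title": "B", "Score": 0.7}]
SHOWS = {"A": [{"Name": "Odeon", "M.Title": "A", "M.Start": "20:00"}]}


def theatre1(title: str) -> list:
    """Fake Theatre1 call whose input attribute is M.Title."""
    return SHOWS.get(title, [])


for movie, theatre in pipe_join(MOVIES, theatre1, "Title"):
    print(movie["Title"], theatre["Name"], theatre["M.Start"])  # A Odeon 20:00
```

Because the second service is called once per result of the first, the cost of a pipe join grows with the number of items the first service returns.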




Connection Patterns – Logical Level

Undirected edge: both access patterns produce their results in output,
  and the results are then joined -> PARALLEL JOIN


Movie1:   Title | Director | Score | … | … | G.Genre

Theatre1: Name | Address | M.Start | M.Title
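In a parallel join both services are invoked independently and their outputs are matched afterwards. The sketch below uses an in-memory hash join on equal values as the matching step; that choice, the sample rows and the name `parallel_join` are assumptions for illustration, not the SeCo engine's join operator.

```python
from collections import defaultdict


def parallel_join(left_rows, right_rows, left_attr: str, right_attr: str) -> list:
    """Parallel join: both services have already produced their outputs
    independently; pairs are formed where left_attr equals right_attr."""
    index = defaultdict(list)
    for right in right_rows:  # hash the right-hand results by join value
        index[right[right_attr]].append(right)
    return [(left, right)
            for left in left_rows
            for right in index.get(left[left_attr], [])]


movies = [{"Title": "A", "G.Genre": "Action"}, {"Title": "B", "G.Genre": "Drama"}]
theatres = [{"Name": "Odeon", "M.Title": "A"}, {"Name": "Rex", "M.Title": "C"}]

print(parallel_join(movies, theatres, "Title", "M.Title"))
```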




Join of two Services, Pipe Version, NY City
                                              Search only in NY

Service marts:      Movie, Theatre
Access patterns:    Movie1, Movie2 (for Movie); Theatre1 (for Theatre)
Service interfaces: IMDB1 and Hyperrev1 (for Movie1); IMDB2 (for Movie2);
                    Google1 and NYLocalSearch (for Theatre1)




JOIN OF TWO SEARCH SERVICES

JOIN of Web Services

  Input: items resulting from TWO web service calls,
   possibly ranked
  Output: composed items resulting from the
   concatenation of matching items, presented in a
   “global ranking order”
  Matching conditions may use:
   – value equality
   – partial set matching
   – term matching within a vocabulary
   – …
  Services are known and their matching function is
   predefined: this is not service discovery!
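The three kinds of matching condition can each be written as a predicate over a pair of attribute values. A minimal sketch, assuming that partial set matching means "at least `min_overlap` shared values" and that the vocabulary is a plain term-to-concept dictionary; the function names and the vocabulary (built from WordNet synonyms of "movie") are illustrative, not SeCo APIs.

```python
def value_equality(a, b) -> bool:
    return a == b


def partial_set_match(a, b, min_overlap: int = 1) -> bool:
    """Multi-valued attributes match when they share at least min_overlap values."""
    return len(set(a) & set(b)) >= min_overlap


def vocabulary_match(a: str, b: str, vocabulary: dict) -> bool:
    """Terms match when they map to the same concept in the vocabulary."""
    def concept(term: str) -> str:
        key = term.lower()
        return vocabulary.get(key, key)
    return concept(a) == concept(b)


# Hypothetical vocabulary: synonyms of "movie" taken from its WordNet synset.
VOCABULARY = {"film": "movie", "motion picture": "movie", "flick": "movie"}

print(partial_set_match({"Action", "Drama"}, {"Drama", "War"}))  # True
print(vocabulary_match("Film", "flick", VOCABULARY))             # True
```

Since the services are known in advance, the predicate for each connection can be fixed at registration time rather than discovered at query time.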



Join

[Figure: results bx1 … bx5 from Service X and by1 … by5 from Service Y are combined into the join results r1, r2, r3.]
Matching items




Choice of the join strategies
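 The join search space
    – Different explorations for different join methods, under different
      assumptions and with different guarantees

[Figure: the join search space as a grid whose axes are the chunks (of size chunksize) retrieved from the two services; each tile tij combines chunk i of one service with chunk j of the other and holds candidate join results.]

Any exploration trajectory for this space is a join strategy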
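Two exploration trajectories over the tiles tij can be written as visiting orders. A minimal sketch, assuming 1-based tile indices; the rectangular order follows the nested-loop idea, while the triangular order (tiles visited by increasing i + j) is one plausible way to realize a triangular exploration, not necessarily the exact schedule of the merge-scan method.

```python
def nested_loop(rows: int, cols: int) -> list:
    """Rectangular exploration: chunk i of X is combined with chunks 1..cols
    of Y before moving on to chunk i + 1."""
    return [(i, j) for i in range(1, rows + 1) for j in range(1, cols + 1)]


def triangular(n: int) -> list:
    """Diagonal exploration: tiles are visited by increasing i + j, up to the
    anti-diagonal i + j = n + 1, so the explored region is a triangle."""
    order = []
    for s in range(2, n + 2):
        for i in range(1, s):
            order.append((i, s - i))
    return order


print(nested_loop(2, 3))  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
print(triangular(3))      # [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

With ranked services, the triangular order reaches tiles that combine high-ranked chunks on both sides (small i + j) early, whereas the rectangular order exhausts one side's chunks before advancing the other.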
Nested Loop - Rectangular




Merge scan - Triangular




Parallel and Pipe Joins

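   Parallel join of two search services
   [Figure: search services S1 and S2 feed the join unit C1; annotations: (1,2)n, period: 150 ms; C1 stop: 10, excess: (1,1).]

   Pipe join of two search services
   [Figure: search service S1 pipes its results into S2; annotations: (1,10)5(0,1)n, period: 500 ms; size: 20, stop: 1.]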







SUPPORT OF “SIMILARITY JOINS”

Supporting value similarity


   The concept of “nearness” is widely implemented, with a meaning that
    depends on the context, for example:
        Lexical near (similar strings)
        Spatial near (between addresses/geo-locations)
        Temporal near (between dates/times)
        Economic near (between costs)

   The context is defined by the attributes involved


=> The semantics of nearness is built bottom-up, starting from the
  physical layer (available services) up to the conceptual
  one.
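One way to picture context-dependent nearness is a dispatch table that picks a distance function from the context. A minimal sketch under assumptions: every function returns a distance (0 means identical), lexical distance is derived from Python's `difflib.SequenceMatcher`, and the economic measure is a relative difference. Spatial nearness is left out because it is typically delegated to a dedicated service. All names are hypothetical.

```python
from datetime import date
from difflib import SequenceMatcher


def lexical_distance(a: str, b: str) -> float:
    """0.0 for identical strings (case-insensitive), up to 1.0 for unrelated ones."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def temporal_distance(a: date, b: date) -> int:
    """Distance in days."""
    return abs((a - b).days)


def economic_distance(a: float, b: float) -> float:
    """Relative cost difference: 0.0 for equal costs."""
    return abs(a - b) / max(abs(a), abs(b), 1e-9)


# The context (derived from the attributes involved) selects the notion of nearness.
NEARNESS = {
    "lexical": lexical_distance,
    "temporal": temporal_distance,
    "economic": economic_distance,
}


def distance(context: str, a, b):
    return NEARNESS[context](a, b)


print(distance("temporal", date(2010, 6, 9), date(2010, 6, 12)))  # 3
```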


Similarity comes from Shared Domains




 The attribute “address” is shared by the 4 entities (restaurant, apartment,
 hotel, theatre). Its semantic type, describing a location, enables “nearness”
 connections between each pair of entities (e.g., addresses can be compared
 for “nearness” within the same city, country, …)

 [Figure: restaurant, apartment, hotel and theatre, each with an Address attribute, linked pairwise through Spatial Near.]
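The observation that a shared semantic type yields a nearness connection for every pair of entities can be checked mechanically. A minimal sketch, assuming each entity's attributes are annotated with a semantic type; the schema below (including the "Location" and "Money" type names) and the function `near_connections` are hypothetical.

```python
from itertools import combinations

# Hypothetical schemas: entity -> {attribute: semantic type}.
SCHEMA = {
    "restaurant": {"Name": "String", "Address": "Location"},
    "apartment": {"Address": "Location", "Price": "Money"},
    "hotel": {"Name": "String", "Address": "Location"},
    "theatre": {"Name": "String", "Address": "Location"},
}


def near_connections(schema: dict, semantic_type: str) -> list:
    """Every pair of entities exposing an attribute of the given semantic type."""
    entities = sorted(e for e, attrs in schema.items() if semantic_type in attrs.values())
    return list(combinations(entities, 2))


pairs = near_connections(SCHEMA, "Location")
print(len(pairs))  # 6: each pair of the four entities
```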




Supporting Nearness within Services
Several physical services natively support ranking
  by distance (e.g. GoogleMovies)
   For instance, GoogleMovies receives the user address as input
    and returns theatres ranked by distance, each with
    its address as output. UserAddress and Distance are
    external attributes.
GoogleMovies(UserAddressI, DistanceR | NameO, AddressO, Movie.TitleI,
  Movie.StartTimeO)

              Service interface GoogleMovies    AP: Theatre1    URI: http://...
              Properties: TTL=6000, chunksize=10, cacheable=true, provides=Spatial Near
              Logical attribute:    UserAddress   Name   Address   M.Title    M.StartTime   ...
              Physical parameter:   IAddr         Name   OAddr     MovieTit   MovieTime     ...
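A service that natively ranks by distance can be imitated locally to show how the external attributes and the chunked output fit together. A minimal sketch, assuming planar (x, y) coordinates in km instead of geocoded addresses and a hypothetical in-memory theatre catalogue; `theatres_near` is not a GoogleMovies API.

```python
import math

# Hypothetical theatre catalogue; addresses are planar (x, y) positions in km,
# a simplification of real geocoded addresses.
THEATRES = [
    {"Name": "T1", "Address": (0.0, 3.0)},
    {"Name": "T2", "Address": (1.0, 0.0)},
    {"Name": "T3", "Address": (0.0, 2.0)},
]


def theatres_near(user_address, chunk: int = 0, chunksize: int = 10) -> list:
    """Return one chunk of theatres ranked by distance from user_address.

    UserAddress (input) and Distance (ranking) are external attributes:
    they describe the request, not the theatre itself.
    """
    ranked = sorted(
        ({**t, "Distance": math.dist(user_address, t["Address"])} for t in THEATRES),
        key=lambda t: t["Distance"],
    )
    start = chunk * chunksize
    return ranked[start:start + chunksize]


print([t["Name"] for t in theatres_near((0.0, 0.0), chunk=0, chunksize=2)])  # ['T2', 'T3']
```

Each further chunk is obtained by incrementing `chunk`, which is how the engine can ask for more results without recomputing the ranking criterion.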




“Nearness” Support within Services


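Conceptual level: Theatre and Restaurant, connected by Spatial Near.

Logical level: Restaurant2 (Address | Name | Cuisine | Price) and
  Theatre1 (UserAddress | Name | Address | M.Title | M.StartTime | Distance),
  connected by a Spatial near edge.

Physical level:
  Service interface GoogleMovies    AP: Theatre1    URI: http://...
  Properties: TTL=600, chunksize=10, cache=1, provides=Spatial Near
  Logical attribute:    UserAddr   Name   Address   M.Title    ...
  Physical parameter:   Addr       Name   Addr      MovieTit   ...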


Nearness Services within the Execution Engine

Ad-hoc services providing the notion of distance at the physical level take
   two domain values as input and produce their distance as output:
       two input attributes specify the two values of the domain
       one output attribute specifies the distance in given units

           Service interface SpatialNear    System    URI: http://...
           Properties: TTL=600, chunksize=1, cacheable=1, ...
           Input1, Input2: Coordinates       Output: Distance (Km)
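A SpatialNear-style system service takes two coordinates and returns one distance in km. The sketch below assumes (latitude, longitude) inputs in degrees and uses the haversine great-circle formula with a mean Earth radius of 6371 km; `functools.lru_cache` plays the role of the `cacheable` property. The function name is hypothetical and this is not the SeCo service itself.

```python
import math
from functools import lru_cache

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@lru_cache(maxsize=1024)  # stands in for cacheable=1: repeated pairs are not recomputed
def spatial_near(input1: tuple, input2: tuple) -> float:
    """Great-circle distance in km between two (lat, lon) coordinates (haversine)."""
    lat1, lon1 = map(math.radians, input1)
    lat2, lon2 = map(math.radians, input2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


print(round(spatial_near((0.0, 0.0), (1.0, 0.0)), 1))  # 111.2: one degree of latitude
```

With chunksize=1 the service returns exactly one value per call, so caching by input pair is what keeps repeated comparisons cheap.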




Supporting Nearness within the Execution Engine


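Conceptual level: Theatre and Restaurant, connected by Spatial Near.

Logical level: Theatre1 (Address | Name | M.Title | M.StartTime) and
  Restaurant2 (Address | Name | Cuisine | Price), connected through the
  Spatial Near access pattern (Addr1 | Addr2 | Distance).

Physical level:
  Service interface SpatialNear    System    URI: http://...
  Properties: TTL=600, chunksize=1, cacheable=1, ...
  Input1, Input2: Coordinates      Output: Distance (Km)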
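When nearness is provided by the execution engine, the distance service sits between the two result sets. A minimal sketch, assuming a planar stand-in (`planar_km`) for the SpatialNear system service and a distance threshold `max_km`; the function names, the threshold and the sample rows are hypothetical.

```python
import math
from typing import Callable


def planar_km(a, b) -> float:
    """Stand-in for the SpatialNear system service (planar coordinates in km)."""
    return math.dist(a, b)


def nearness_join(theatres, restaurants, distance: Callable, max_km: float) -> list:
    """Call the distance service for each theatre/restaurant pair, keep the
    pairs within max_km and rank them by increasing distance."""
    pairs = []
    for t in theatres:
        for r in restaurants:
            d = distance(t["Address"], r["Address"])  # Addr1, Addr2 -> Distance
            if d <= max_km:
                pairs.append((d, t["Name"], r["Name"]))
    return sorted(pairs)


theatres = [{"Name": "T1", "Address": (0.0, 0.0)}]
restaurants = [
    {"Name": "R1", "Address": (0.0, 1.0)},
    {"Name": "R2", "Address": (0.0, 5.0)},
    {"Name": "R3", "Address": (0.5, 0.0)},
]
print(nearness_join(theatres, restaurants, planar_km, max_km=2.0))
```

Passing the distance function as a parameter keeps the join independent of which nearness service is plugged in.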


Join of three Services at the three Levels in NY
                             Search only in NY

Service marts:      Movie, Theatre, Restaurant (Theatre and Restaurant connected by Spatial Near)
Access patterns:    Movie1, Movie2 (for Movie); Theatre1 (for Theatre);
                    Rest1 (an AP providing spatial near) and Rest2 (for Restaurant)
Service interfaces: IMDB1, Hyperrev1, IMDB2 (movies); Google1, NYLocalSearch (theatres);
                    Yahoo1, Yahoo2 (restaurants)




Three Levels with Connection Semantics




Level        Services                                       Connections

Conceptual   Service Mart                                   Name (with associated semantics)
             ↓ bindings between SM and AP attributes, plus definition of extra attributes
Logical      Access Pattern                                 Join attributes, directed vs. undirected edge
                                                            (with nearness service APs added as needed)
             ↓ bindings between AP attributes and SI parameters
Physical     Service Interface (with associated semantics   Nearness Services
             and with system services)



Resource graph
        A specialized way of describing the search-service-based knowledge
          available on the Web [ER model, ontology, class diagram?]

[Figure: example resource graph with nodes such as Exhibition, Piece, Artist, Movie, Theatre, Concert, Photo, News, Restaurant, Hotel, Landmark, Metro Station, ShoppingCenter, …]
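A resource graph can be stored as an adjacency list, which makes it easy to find how two service marts can be connected. A minimal sketch: the edges are limited to connections named elsewhere in this deck (PlayingMovie between Movie and Theatre, Spatial Near among entities with an address), the full graph's edges are not given on the slide, and breadth-first search is a swapped-in technique for finding the shortest chain; function names are hypothetical.

```python
from collections import deque

# Edges drawn only from connections named in this deck: PlayingMovie(Movie, Theatre)
# and Spatial Near between entities that expose an address.
EDGES = [
    ("Movie", "Theatre", "PlayingMovie"),
    ("Theatre", "Restaurant", "Spatial Near"),
    ("Theatre", "Hotel", "Spatial Near"),
    ("Restaurant", "Hotel", "Spatial Near"),
]


def build_graph(edges) -> dict:
    """Undirected adjacency list: node -> [(neighbour, connection name)]."""
    graph = {}
    for a, b, label in edges:
        graph.setdefault(a, []).append((b, label))
        graph.setdefault(b, []).append((a, label))
    return graph


def connect(graph: dict, start: str, goal: str):
    """Shortest chain of connections between two service marts (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour, _ in graph.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(EDGES)
print(connect(graph, "Movie", "Hotel"))  # ['Movie', 'Theatre', 'Hotel']
```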




APPLICATION DEVELOPMENT PROCESS

SeCo development process




 Main roles:
  • Service developer
  • Service publisher
  • Expert user
  • SeCo expert

 Dichotomies:
  • Top-down vs. bottom-up
  • Run time vs. design time

 Phases and activities:
  • Search Service Development and Registration: the service developer implements the search service
  • Service Adaptation: the service publisher wraps or materializes the service and registers the service mart and interface (Service Mart model)
  • Application Configuration: the expert user designs the Liquid Query template (Liquid Query model); manual optimization needed? (Y/N)
  • Query Plan Refinement: the SeCo expert performs Panta Rhei plan refinement (Query Plan model)


The service registration process

Service Description
  → SM Identification
  → Some SM retrieved?
      NO  → Bottom-up strategy: SM CREATION
      YES → Modification of the SM structure?
              YES → Hybrid strategy: SM UPDATE, then associated SI update (new connections)
              NO  → Top-down strategy: SM MAPPING
  → AP CREATION
  → Service Physical Description
  → END



The SM Creation process, with semantic hints
  SM CREATION

  • Type conventions → SM name and attributes schema definition, e.g.
       Movie(Title, Director, Score, Year, Genres(Genre), Openings(Country, Date), Actors(Name))
  • WN (WordNet) → SM and attributes semantic description: synsets (and tags?), e.g.
       Movie: S: (n) movie, film, picture, moving picture, moving-picture show, motion picture,
         motion-picture show, picture show, pic, flick (a form of entertainment that enacts a story
         by sound and a sequence of images giving the illusion of continuous movement)
         "they went to a movie every Saturday night"
       Director: S: (n) film director, director (the person who directs the making of a film)
  • Automatic recommendation of connectable SMs (SM1 … SMn, e.g. Theatres)
       → connection patterns (CP) definition, e.g. Shows(Movie, Theatre): [(Title=Title)]
  • Composition language operators association (Spatial_near, Textual_near, Temporal_near), e.g.
       Defined CP: Shows → Textual_near
       Possible CP: Title (String) → Textual_near; Year (Date) → Temporal_near; …
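The "possible CP" hints follow from attribute types, so they can be suggested automatically. A minimal sketch: the String → Textual_near and Date → Temporal_near hints come from the slide, while Address → Spatial_near, the sample types (including "Float") and the function names `possible_cps` and `connectable` are assumptions for illustration.

```python
# Type-to-operator hints; String -> Textual_near and Date -> Temporal_near follow
# the slide, Address -> Spatial_near is an assumption.
OPERATORS = {"String": "Textual_near", "Date": "Temporal_near", "Address": "Spatial_near"}


def possible_cps(attributes: dict) -> dict:
    """Suggest a composition operator for each typed attribute of a service mart."""
    return {name: OPERATORS[t] for name, t in attributes.items() if t in OPERATORS}


def connectable(sm1: dict, sm2: dict) -> set:
    """Operators that could connect two service marts: those both of them support."""
    return set(possible_cps(sm1).values()) & set(possible_cps(sm2).values())


movie = {"Title": "String", "Year": "Date", "Score": "Float"}
theatre = {"Name": "String", "Address": "Address"}
print(possible_cps(movie))          # {'Title': 'Textual_near', 'Year': 'Temporal_near'}
print(connectable(movie, theatre))  # {'Textual_near'}
```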



The SM Mapping procedure
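    SM MAPPING

    Original SM: Movie(Title, Director, Score, Year, …)
      each attribute carries a type and a semantic description, e.g.
      Director: String; Director: S: (the person who directs the making of a film)

    A mapping function f associates the SM attributes with the SI attributes:
      SI ImdbMovie: Title | Director | Score | Year | …

    The SI attributes are classified as:
      selector attributes, corresponding SM attributes, and
      auxiliary attributes (i.e. query attributes)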
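The classification of service-interface attributes into three groups can be expressed as a small function over the mapping f. A minimal sketch: `sm_to_si` stands in for f, and the attribute names "Lang" (as selector) and "Keywords" (as an auxiliary query attribute) are hypothetical examples, not taken from the ImdbMovie interface.

```python
def classify(si_attributes: list, sm_to_si: dict, selectors: set) -> dict:
    """Split SI attributes into selector, corresponding-SM and auxiliary groups.

    sm_to_si plays the role of the mapping f from SM attributes to SI attributes.
    """
    mapped = set(sm_to_si.values())
    groups = {"selector": [], "corresponding": [], "auxiliary": []}
    for attr in si_attributes:
        if attr in selectors:
            groups["selector"].append(attr)
        elif attr in mapped:
            groups["corresponding"].append(attr)
        else:
            groups["auxiliary"].append(attr)  # e.g. query attributes
    return groups


groups = classify(
    ["Lang", "Title", "Director", "Score", "Year", "Keywords"],
    {"Title": "Title", "Director": "Director", "Score": "Score", "Year": "Year"},
    selectors={"Lang"},
)
print(groups)
```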




SeCo Tools
• Online tool suite that covers the whole development
  process
• Mashup-based
• Built using state-of-the-art technologies:

   1. MVC on the client: JavaScriptMVC
   2. UI organization and panels: Yahoo! User Interface Library (YUI)
   3. Diagram drawing and editing: WireIt




Service Mart Registration




Mapping editor




Query Registration Interface




Query Registration Editor, Logical Connections








LIQUID QUERY INTERFACE


Liquid Query


“A new paradigm allowing users to formulate and get responses
   to multi-domain queries through an exploratory information
   seeking approach, based upon structured information
   sources exposed as software services…”

• Composite answers obtained by aggregating search results
  from various domains
• Highlight the contribution of each search service
• Join of results based on the structural information afforded by
  the search service interfaces
• Refine the user query
• Re-shape the result list


Liquid query definition
        It consists of subsetting and parametrizing the resource
            graph...

[Figure: the resource graph (Exhibition, Piece, Artist, Movie, Theatre, Concert, Photo, News, Restaurant, Hotel, Landmark, Metro Station, ShoppingCenter, …), with markers for inputs and outputs; GR = global ranking.]
Liquid query definition
... and then characterizing the user interaction

[Figure: the selected subgraph (Exhibition, Artist, Concert, Photo, News, Restaurant, Hotel, Metro Station) with an Expand interaction.]

Plus:
•   Parametrization of global ranking
•   Data visualization options
•   … and so on
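Parametrizing the global ranking can be pictured as letting the user weight each service's contribution. A minimal sketch, assuming a weighted average of per-service scores in [0, 1]; this aggregation rule, the sample scores and the name `global_rank` are assumptions, not the Liquid Query ranking function.

```python
def global_rank(results: list, weights: dict) -> list:
    """Order composite results by a weighted average of per-service scores.

    results: each item maps a service name to that service's score in [0, 1].
    weights: user-chosen weight per service (the ranking parameters).
    """
    total = sum(weights.values())

    def score(item: dict) -> float:
        return sum(weights.get(s, 0.0) * v for s, v in item.items()) / total

    return sorted(results, key=score, reverse=True)


results = [{"Concert": 0.9, "Hotel": 0.2}, {"Concert": 0.5, "Hotel": 0.9}]
print(global_rank(results, {"Concert": 0.5, "Hotel": 0.5})[0])  # {'Concert': 0.5, 'Hotel': 0.9}
print(global_rank(results, {"Concert": 0.9, "Hotel": 0.1})[0])  # {'Concert': 0.9, 'Hotel': 0.2}
```

Changing only the weights reorders the same composite results, which is the kind of re-shaping of the result list that the interaction aims at.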
Query Submission




[Screenshot: query conditions for Concert and for Hotels.]



Query Execution & Result Presentation








SECO ENGINE


Overview
   The tool is aimed at developers and lets them compose, plan and
    run a SeCo query
   Four panels, one for each query processing phase:

         Query composition → Logical planning → Physical planning → Query execution

   [Screenshot: splash screen]




Query composition (1)




                 Service interface browser
                 • lists registered service interfaces
                 • input and output parameters are listed

                 Selected service’s statistics
                 • collected service statistics are displayed
                 • statistics may be edited for testing purposes




Query composition (2)




     User-entered Datalog-like query
     • joins implicitly encoded by Datalog variables
     • $vars encode query inputs provided at runtime

     Query optimisation parameters
     • control the behaviour of the planner
     • trigger the planning process
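How shared variables implicitly encode joins, and how $vars mark runtime inputs, can be shown by analysing a query string. A minimal sketch: the concrete query syntax (a `head :- body` rule with capitalised variables) is a hypothetical Datalog-like notation, not the SeCo engine's grammar, and `analyse` is a hypothetical helper.

```python
import re

ATOM = re.compile(r"(\w+)\(([^)]*)\)")


def analyse(query: str):
    """Return (join variables, runtime inputs) of a Datalog-like query body.

    Capitalised variables shared by two or more atoms encode joins; names
    starting with '$' are inputs supplied when the query is executed.
    """
    body = query.split(":-", 1)[-1]  # ignore the head, if any
    atoms_per_var = {}
    inputs = set()
    for _, args in ATOM.findall(body):
        for arg in {a.strip() for a in args.split(",")}:  # count each atom once
            if arg.startswith("$"):
                inputs.add(arg)
            elif arg[:1].isupper():
                atoms_per_var[arg] = atoms_per_var.get(arg, 0) + 1
    joins = {v for v, n in atoms_per_var.items() if n > 1}
    return joins, inputs


q = "q(Title, Name) :- Movie1($genre, $lang, Title, Score), Theatre1(Name, Address, Title)"
print(analyse(q))
```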




Logical planning




Physical planning




Query execution (1)



                Execution session management
                • a session corresponds to a single query execution, in which
                  multiple user commands may be issued
                • query input parameters are specified at
                  session initialisation

                Execution status
                • displays the current session status
                • displays the status of the execution commands issued so
                  far

                Execution command forms
                • a more-all command requests more query results
                • a more-one command requests more results by extracting
                  more data from a specific service invoked by the query
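The session and its two commands can be modelled with one cursor per service. A minimal sketch under assumptions: services are mock result lists fetched in fixed-size chunks, the join over the fetched data is omitted, and the `Session` class with its `more_one`/`more_all` methods is hypothetical, not the engine's API.

```python
class Session:
    """One query execution; the query inputs are fixed when the session starts.

    services maps a service name to its (mock) full result list; a real engine
    would call remote services and recompute the join, which is omitted here.
    """

    def __init__(self, inputs: dict, services: dict, chunksize: int = 2):
        self.inputs = inputs
        self.services = services
        self.chunksize = chunksize
        self.cursor = {name: 0 for name in services}
        self.fetched = {name: [] for name in services}
        self.log = []  # commands issued so far, as shown by the execution status

    def _next_chunk(self, name: str) -> list:
        start = self.cursor[name]
        chunk = self.services[name][start:start + self.chunksize]
        self.cursor[name] += len(chunk)
        self.fetched[name].extend(chunk)
        return chunk

    def more_one(self, name: str) -> list:
        """Extract more data from one specific service."""
        self.log.append(("more-one", name))
        return self._next_chunk(name)

    def more_all(self) -> dict:
        """Ask every service for its next chunk."""
        self.log.append(("more-all", None))
        return {name: self._next_chunk(name) for name in self.services}


s = Session({"$city": "NY"}, {"Movie": ["m1", "m2", "m3"], "Theatre": ["t1", "t2", "t3", "t4"]})
print(s.more_all())           # {'Movie': ['m1', 'm2'], 'Theatre': ['t1', 't2']}
print(s.more_one("Theatre"))  # ['t3', 't4']
```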




Query execution (2)




Query results
• displays ranked results as soon as
  they are computed

Execution timeline
• displays the activation of execution
  units (e.g. service calls)
• useful to fine-tune the engine
  and the join strategies




Query execution (3)




                          Service calls log
                          • displays service calls at chunk granularity
                          • shows response times, statistics, cache behaviour




DEMO




http://demo.search-computing.eu




SUMMARY OF SECO RESULTS


Results after 18 months

 Concepts
   – Service marts, rank join methods, Panta Rhei, Liquid Query

 Research results
   – Springer LNCS: Search Computing: Challenges and Directions
   – Many publications (with VLDB, WWW), many ongoing submissions
   – Filing of a US patent (top-k method, random & sequential services)

 Prototypes
   – Execution environment, focus on Liquid Query and on integration
   – Design support environment, focus on mashups

 Dissemination
   – Fifteen keynote talks, twelve articles in the Italian press
   – SeCo Web site, SeCo blog, Facebook, LinkedIn, Twitter communities
   – Search Computing graduate course at PoliMi

 Temporary research positions (1 PhD, 5 post-MS, 3 post-doc)

Publications
 SeCo
 - D. Braga, A. Campi, S. Ceri, A. Raffio, Joining the results of heterogeneous search engines, Information Systems, Vol. 33, Issues 7-8 (November-December 2008), pp. 658-680
 - D. Braga, S. Ceri, F. Daniel, D. Martinenghi, Optimization of Multi-Domain Queries on the Web, VLDB 2008: 562-573, Auckland, New Zealand, August 2008
 - D. Braga, S. Ceri, F. Daniel, D. Martinenghi, Mashing Up Search Services, IEEE Internet Computing 12(5): 16-23 (2008)
 - D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone, NGS: a framework for multi-domain query answering, ICDE Workshops 2008: 254-261
 - S. Ceri, Search Computing, Invited Paper, 25th International Conference on Data Engineering, Shanghai, March 29 - April 2, 2009
 - D. Barbieri, A. Bozzon, D. Braga, M. Brambilla, A. Campi, S. Ceri, E. Della Valle, P. Fraternali, M. Grossniklaus, D. Martinenghi, S. Ronchi, M. Tagliasacchi, Data-driven optimization of search service composition for answering multi-domain queries, USETIM 2009 workshop at VLDB 2009, Lyon, France, August 24-28, 2009
 - M. Brambilla, S. Ceri, Engineering Search Computing Applications: Vision and Challenges, 7th joint meeting of the European Software Engineering Conference (ESEC) and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Amsterdam, The Netherlands, August 24-28, 2009
 - S. Ceri, Search Computing, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Milan, Italy, September 15-18, 2009
 - S. Ceppi, N. Gatti, An Automated Mechanism Design Approach for Sponsored Search Auctions with Federated Search Engines, Proceedings of the 12th Workshop on Agent-Mediated Electronic Commerce (AMEC) at the 9th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Toronto, Canada, May 10, 2010
 - D. Martinenghi, M. Tagliasacchi, S. Ceri, Top-k pipe-join, International Workshop on Ranking in Databases, Long Beach, USA, March 2010
 - A. Bozzon, M. Brambilla, S. Ceri, P. Fraternali, Liquid Query: Multi-Domain Exploratory Search on the Web, WWW 2010, 19th International World Wide Web Conference, Raleigh, North Carolina, April 26-30, 2010
 - A. Campi, S. Ceri, A. Maesani, S. Ronchi, Designing Service Marts for Engineering Search Computing Applications, Tenth International Conference on Web Engineering (ICWE 2010), Vienna, Austria, July 5-9, 2010



 Related
 - M. Brambilla, S. Ceri, I. Celino, D. Cerizza, E. Della Valle, F. M. Facca, A. Turati, C. Tziviskou, Experiences in the Design of Semantic Services Using Web Engineering Methods and Tools, Journal on Data Semantics, 2008
 - A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez, Clip: a Visual Language for Explicit Schema Mappings, International Conference on Data Engineering (ICDE), April 2008
 - D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone, A New Generation Search Engine Supporting Cross Domain Queries, Italian Symposium on Advanced Database Systems (SEBD), June 2008
 - D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone, NGS: a Framework for Multi-Domain Query Answering, IIMAS, International Conference on Data Engineering Workshops (ICDE), April 2008
 - A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez, Clip: a Tool for Mapping Hierarchical Schemas, ACM SIGMOD/PODS Conference, Demo Session, June 2008
 - A. Bozzon, M. Brambilla, P. Fraternali, Conceptual Modeling of Multimedia Search Applications Using Rich Process Models, ICWE 2009, Springer LNCS, vol. 5648, ISBN 978-3-642-02817-5
 - E. Della Valle, S. Ceri, D. F. Barbieri, D. Braga, A. Campi, A First Step Towards Stream Reasoning, Future Internet Symposium (FIS) 2008, pp. 72-81
 - A. Bozzon, M. Brambilla, F. M. Facca, G. Toffetti Carughi, A Conceptual Modeling Approach to Business Service Mashup Development, IEEE International Conference on Web Services (ICWS 2009), Los Angeles, IEEE Press, July 2009, pp. 751-758
 - P. Fraternali, M. Brambilla, A. Bozzon, Model-Driven Design of Audiovisual Indexing Processes for Search-Based Applications, Content-Based Multimedia Indexing (CBMI '09), IEEE Press, 2009, ISBN 978-1-4244-4265-2, pp. 120-125
 - D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus, C-SPARQL: SPARQL for Continuous Querying, Proceedings of WWW 2009, 18th International World Wide Web Conference (Poster), Madrid, Spain, April 2009
 - D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus, Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL, Proceedings of SDoW 2009, 2nd ISWC Workshop on Social Data on the Web, Washington, DC, USA, October 2009
 - D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus, C-SPARQL: A Continuous Query Language for RDF Data Streams, International Journal of Semantic Computing (IJSC), World Scientific Publishing, 2010
 - D. F. Barbieri, D. Braga, S. Ceri, M. Grossniklaus, An Execution Environment for C-SPARQL Queries, Proceedings of EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 2010


Web Site & Blog                                                  76

 Web Site




 TechWatch Blog




Blog stats: ~900 absolute unique visitors in the last two months




Accesses to Web Site & Blog                                                    77
                           Visits: 20% USA, 18% Italy, 6% UK, 4% India, 4% Canada
 Provenance




 Sources




Search Computing First Workshop
June 17-19, 2009                               78




Search Computing Challenges and Directions
 (LNCS, vol. 5950, Ceri-Brambilla eds.)                                         79

 Part 1: Vision
   –   Ceri: Search computing
   –   Baeza-Yates: Next generation search
   –   Weikum: Search for knowledge

 Part 2: Technology Watch
   –   Della Valle-Buganza-Gatti: The search engine industry
   –   Casati-Daniel-Soi: Mashup technologies
   –   Baumgartner-Campi-Gottlob-Herzog: Web data extraction
   –   Hedeler-Belhajjame-Campi-Embury-Fernandez-Paton: Dataspaces
   –   Bozzon-Fraternali: Multimedia and multimodal information retrieval

 Part 3: Issues in Search Computing
   –   Campi-Ceri-Gottlob-Ronchi: Service marts
   –   Braga-Campi-Grossniklaus: Join methods and query optimization
   –   Ilyas-Martinenghi-Tagliasacchi: Rank aggregation
   –   Braga-Grossniklaus-Ceri: Panta Rhei, a query execution environment
   –   Brambilla-Ceri-Fraternali-Manolescu: Liquid queries and liquid results
   –   Brambilla-Ceri: Software engineering of search computing applications
   –   Masseroli-Paton-Spasic: Search computing and the life sciences
Second Workshop: Design Principles                                80

 Consolidate several ongoing research chapters touching the
   various aspects of the project
 Develop connections to other research projects so as to share
   knowledge, and possibly build cooperations based on
   mutual complementarity
 Set internal deadlines for project evolution
   – Being ready for the workshop
   – Delegate organisational responsibility to session chairs
 Try a more discussion-oriented format
   – Our view
   – Guests’ views
   – Panel/discussion (sometimes driven, sometimes not)
 Produce proceedings as Springer LNCS, each session contributing
   a short part

Second SeCo Workshop Last Week                 81




Second Workshop: Sessions                                    82


 Pre-Workshop (Milano, May 25)
  – Search as a Process
  – Business Models

 Workshop (Como, May 26-28)
  –     Semantic Resource Framework
  –     Wrapping Technology and Ontological Annotation
  –     Design Tools and Mashup Languages
  –     Search Computing and Research Evaluation
  –     Query Processing
  –     Rank Join
  –     Search Computing for Biomedical Applications
  –     User-Centered Approach to Search Computing Applications

 Post-Workshop (Milano, May 31)
  – Visual Interfaces for Complex Search

Looking Forward                                                       83

 Establish stronger co-operation with other projects
   – Both for technology and applications

 Strengthen SeCo “core research”
   –     Cover the process lifecycle with methods & tools
   –     Improve result visualization and user interaction
   –     Use semantics in service registration and query processing
   –     Turn Panta Rhei into a full Service Base Management System
         (SBMS) with new rank join methods, proximity, uncertainty…

 Strengthen the prototypes
   – Fully develop the registration environment
   – Extend the execution environment, make it scalable over clouds
   – Extend the liquid interface, cover mobile interfaces

 Put a “killer” application online (usable!)
 Explore exploitation options

Search Computing Overview

  • 1. SearchComputing Stefano Ceri, Keynote talk at CAISE, Hammamet, June 9, 2010 Joint work with: Adnan Abid, Mamoun Abu Helu, Davide Barbieri, Daniele Braga, Marco Brambilla, Alessandro Bozzon, Alessandro Campi, Sofia Ceppi, Francesco Corcoglioniti, Emanuele Della Valle, Davide Eynard, Piero Fraternali, Nicola Gatti, Giorgio Ghisalberghi, Michael Grossniklaus, Davide Martinenghi, Marco Masseroli, Maristella Matera, Chiara Pasini, Elena Pellizzotti, Stefania Ronchi, Marco Tagliasacchi, Luca Tettamanti, Salvatore Vadacca, Riccardo Volonterio, Serge Zagorac
  • 2. Genesis of Search Computing  My “Gong Show” challenge at 2003 Lowell Workshop: “Find an ethnical restaurant in a nice place close to Milano” .  Logically a composition of domains: – Restaurants (ethnical) – Geo-locations (nice place close to Milano)  Composing maps with “geo-located” information is now solved by all search engines … … but in general no system is capable of composing arbitrary semantic domains Database Management Prof. Stefano Ceri
  • 3. Motivating Examples 3  “Who are the strongest candidates in Europe for competing on software ideas?”  “Who is the best doctor who can cure insomnia in a close-by hospital?”  “Where can I attend an interesting scientific conference in my field and at the same time relax on a beautiful beach nearby?” Database Management Prof. Stefano Ceri
  • 4. Their Common Aspect 4  Multi-domain queries  Individual answers are on the Web  A knowledgeable user would do the query step-by-step: – Search database conferences, get their city – Check that the city average temperature is warm enough – Search low-cost flights via a broker for that city – Search luxury hotels via another broker  We want a system for supporting this search process – Build several “solutions” which already integrate all dimensions – Rank “solutions” according to a global rank function and output results in rank order – Support user-friendly query definition and result browsing – Add search domains while the search proceeds – Possibly change the relative weight of each ranking Database Management Prof. Stefano Ceri
  • 5. 5 OVERALL FRAMEWORK Database Management Prof. Stefano Ceri
  • 6. Search Computing architecture: overall view 6 Front End High-Level Query Final User Query Analysis Results Cache Sub-queries Cache Query To Domain Result Mapper Transformation Cache Low-level queries Merged Results Query Planner Cache Concrete Query Plan Query Engine WS-Framework Main Query flow OP 1 OP 2 ... OP N Cache Cache <Uses> relation Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 7. Search Computing architecture: overall view 7 Front End High level query “Where can I attend a DB Sub query 1 scientific conference close to High-Level Query “Where can I attend a beautiful beach reachable Final User Query Analysis with cheap flights?” Results a DB scientific Sub query 2 conference?” Cache “place close to a beautiful Sub-queries beach?” Sub query 3 Cache Query To Domain Result Mapper “place reachable Transformation with cheap flight?” Cache Low-level queries Merged Results Query Planner Cache Concrete Query Plan Query Engine WS-Framework Main Query flow OP 1 OP 2 ... OP N Cache Cache <Uses> relation Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 8. Search Computing architecture: overall view 8 Front End High-Level Query Final User Query Analysis Results Cache Sub-queries Low level query 1 Cache ConfSearch(“DB”,placeX,dateY) Query To Domain Result Mapper Transformation Low level query 2Cache TourSearch(“Beach”,PlaceX) queries Low-level Merged Results Query Planner Low level query 3 Flight(“cost<200”,PlaceX,DateY) Cache Concrete Query Plan Query Engine WS-Framework Main Query flow OP 1 OP 2 ... OP N Cache Cache <Uses> relation Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 9. Search Computing architecture: overall view 9 Front End High-Level Query Presented results Final User Query Analysis Results ESWC-Crete-Olympic Cache CAISE- Hammamet – Alitalia TOOLS-Malaga-EasyJet Sub-queries Cache Query To Domain Result Mapper Transformation Cache Low-level queries Merged Results Query Planner Results Cache Query plan Concrete Query Plan Query Engine WS-Framework Main Query flow Cache OP 1 OP 2 ... OP N Services invocations Cache and operators execution <Uses> relation Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 10. Search Computing architecture: incremental prototyping 11 Front End Concrete Query Plan Low-level queries Sub-queries High-Level Query Final User Query Analysis Results Cache Sub-queries Cache Admin Interface Query To Domain Result Mapper Transformation Cache Low-level queries Merged Results Query Planner Cache Concrete Prototype 1: Query Plan Core behaviour of the Query Engine system. WS-Framework OP 1 OP 2 ... OP N Cache Cache • Engine-based execution of queries • Domain repository <Uses> relation • Service repository • Coarse result presentation Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 11. Search Computing architecture: incremental prototyping 12 Front End Concrete Query Plan Low-level queries Sub-queries High-Level Query Final User Query Analysis Results Cache Sub-queries Cache Admin Interface Query To Domain Result Prototype 2: Mapper Transformation Cache Planning Low-level queries Merged Results • Automatic optimized query planning Query Planner Cache Concrete Prototype 1: Query Plan Core behaviour of the Query Engine system. WS-Framework OP 1 OP 2 ... OP N Cache Cache • Engine-based execution of queries • Domain repository <Uses> relation • Service repository • Coarse result presentation Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 12. Search Computing architecture: incremental prototyping 13 Prototype 4: Front End High level queries Concrete Query Plan Low-level queries Prototype 3: Sub-queries High-Level Query Mapping and Final User presentation Query Analysis Results Cache • mapping to domains • presentation of results Sub-queries Cache Admin Interface Query To Domain Result Prototype 2: Mapper Transformation Cache Planning Low-level queries Merged Results • Automatic optimized query planning Query Planner Cache Concrete Prototype 1: Query Plan Core behaviour of the Query Engine system. WS-Framework OP 1 OP 2 ... OP N Cache Cache • Engine-based execution of queries • Domain repository <Uses> relation • Service repository • Coarse result presentation Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 13. CAISE FOCUS on: Service Registration 14 Service Marts: • Conceptualrepresentatio nofresourcesasentities and connections • Logicalrepresentationofs ignatures • Physicalrepresentationa s service implementations Database Management Prof. Stefano Ceri
  • 14. CAISE FOCUS on: Front-end 15 Liquid Query Front End  Client- High-Level Query sideframeworkforconfi Final User guration and Query Analysis Results automaticrenderingof Cache query and Sub-queries resultinterfaces Query To Domain Result Cache Mapper Transformation  User interaction Cache Low-level queries primitives that allow to Merged Results perform explanatory Query Planner search Cache Concrete Query Plan Query Engine WS-Framework OP 1 OP 2 ... OP N Cache Cache Domain Domain Service WS Framework Repository Repository World Cache Database Management Prof. Stefano Ceri
  • 15. CAISE FOCUS on: Development Process 16 Development Support Deploy <<implements>> <<deploys>> Time Search Services SeCo platform Environment Service Developer SeCo Expert Tools supporting Service Publishing Time Wrapping  Service Registration <<implements>>  Query Design <<defines>> Materialization / Normalization  Performance Monitoring Service Publisher <<uses>> <<performs>> Registration of Service Mart <<produces>> Service Mart <<uses>> Repository Config. Time <<defines>> Liquid Query <<produces>> Template Expert User Execution Time <<uses>> <<submits>> Liquid Query User Interface Specification <<manipulates>> <<uses>> Final User Liquid Result Database Management Prof. Stefano Ceri
  • 16. 17 SERVICE REGISTRATION Database Management Prof. Stefano Ceri
  • 17. Service Registration in SeCo Objective: providing a framework for registering services as first-class citizens within SeCo => Service Marts  High-level abstractions of “real world entities” that provide a simple interface to users and hide implementation details  Inspired by Data Marts, a data modeling pattern used in data warehousing  Each Service Mart can have multiple modalities of data access and can be mapped to multiple service implementations, possibly offered by different providers =>Connection Patterns  High-level abstractions of “real world relationships” that provide a simple interface to users and hide implementation details  Built by means of attributes that share the same domains Database Management Prof. Stefano Ceri
  • 18. Service Marts – Conceptual Level Every SM definition includes a name and a collection of the exposed attributes,i.e. the attributes of the real world object described by the SM Movie(Title, Director, Year, Language, Genres(Genre), Actors(Name, Sex))  Atomic, single valued, typed attributes  Repeating groups (multi-valued, typed attributes)  Each “repeating group” is a non-empty set of typed sub-attributes that collectively defines a property of the service mart The model choices are:  To support structural complexity with only one level of nesting (rather than an arbitrary level of nestings)  To avoid explicit descriptions of relationship (using repeating groups for M:N relationships) Database Management Prof. Stefano Ceri
  • 19. Service Marts – Logical Level
    At this level, each SM is associated with one or more Access Patterns, e.g.:
    Movie1(Title^O, Director^O, Score^RO, Year^O, Language^I, Genres.Genre^O, Actors.Name^O, Actors.Sex^O, Genres.Genre^I)
    Movie2(Title^I, Director^O, Year^O, Language^O, Genres.Genre^O, Actors.Name^O, Actors.Sex^O)
    Access patterns contain adorned attributes, i.e. attributes tagged with one of the following:
    – I, if they are input attributes
    – O, if they are output attributes
    – R, if they are attributes used for ranking; they may or may not be visible in output
    Movie1 accesses movies by Language and Genre (e.g. “action movies in English”), and results are ranked by Score (a new attribute).
    Movie2 accesses movies by Title (e.g. “Ben Hur”). We expect few results (zero, one or more), which are not ranked.
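Here is a small parser for adorned attributes. It assumes two things: the adornment is written as a trailing I, O, R or RO after the attribute name (with or without a caret), and attribute names end in a lowercase letter. `AccessPattern` and `parse_access_pattern` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Tuple

# Name followed by its adornment, written as a suffix ("ScoreRO") or with a
# caret ("Score^RO"). Assumes names end in a lowercase letter, so the trailing
# capitals are unambiguously the adornment.
ADORNED = re.compile(r"(?P<name>.*[a-z])\^?(?P<adorn>RO|R|I|O)")


@dataclass(frozen=True)
class AccessPattern:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    ranking: Tuple[str, ...]

    @property
    def is_ranked(self) -> bool:
        return bool(self.ranking)


def parse_access_pattern(name: str, adorned: Iterable[str]) -> AccessPattern:
    inputs, outputs, ranking = [], [], []
    for token in adorned:
        m = ADORNED.fullmatch(token.strip())
        if m is None:
            raise ValueError(f"not an adorned attribute: {token!r}")
        attr, adorn = m.group("name"), m.group("adorn")
        if adorn == "I":
            inputs.append(attr)
        if "O" in adorn:      # O or RO: visible in output
            outputs.append(attr)
        if "R" in adorn:      # R or RO: used for ranking
            ranking.append(attr)
    return AccessPattern(name, tuple(inputs), tuple(outputs), tuple(ranking))


movie1 = parse_access_pattern("Movie1", [
    "TitleO", "DirectorO", "ScoreRO", "YearO", "LanguageI",
    "Genres.GenreO", "Actors.NameO", "Actors.SexO", "Genres.GenreI"])
movie2 = parse_access_pattern("Movie2", [
    "TitleI", "DirectorO", "YearO", "LanguageO",
    "Genres.GenreO", "Actors.NameO", "Actors.SexO"])

print(movie1.inputs, movie1.ranking)
print(movie2.inputs, movie2.is_ranked)
```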
  • 20. Service Marts – Physical Level
    At this level, every Access Pattern can match different Service Implementations, each having:
    – a physical URI to be called
    – physical properties that are specific to the implementation
    – a mapping between logical and physical parameters
    IMDBMovie1(MovieTitle^O, Director^O, Stars^RO, Year^O, Language^I, Genres.Genre^I, Actors.Name^O, Actors.Gender^O)
    [Figure: service interface IMDBMovie for AP Movie1; URI: http://...; TTL=6000, chunksize=10, cacheable=true, exposed=false, ...; parameter mapping Title → MovieTitle, Director → Director, Score → Stars, Year → Year, Language → Lang, ...]
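The next minimal sketch shows a service interface record. It assumes that the logical-to-physical mapping is a plain renaming of parameters; the slide may allow richer transformations. The URI is a hypothetical placeholder, since the slide elides the real one. The properties and the physical parameter names (MovieTitle, Stars, Lang) follow the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ServiceInterface:
    """A physical service implementation bound to one logical access pattern."""
    name: str
    access_pattern: str
    uri: str                                                # placeholder: the slide elides the real URI
    properties: Dict[str, Any] = field(default_factory=dict)
    mapping: Dict[str, str] = field(default_factory=dict)  # logical name -> physical name

    def to_physical(self, logical_inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Rename logical input attributes to the parameter names the service expects."""
        return {self.mapping.get(k, k): v for k, v in logical_inputs.items()}

    def to_logical(self, physical_row: Dict[str, Any]) -> Dict[str, Any]:
        """Rename a physical result row back to access-pattern attribute names."""
        reverse = {phys: logical for logical, phys in self.mapping.items()}
        return {reverse.get(k, k): v for k, v in physical_row.items()}


imdb = ServiceInterface(
    name="IMDBMovie",
    access_pattern="Movie1",
    uri="http://example.invalid/imdb",   # hypothetical
    properties={"TTL": 6000, "chunksize": 10, "cacheable": True, "exposed": False},
    mapping={"Title": "MovieTitle", "Director": "Director", "Score": "Stars",
             "Year": "Year", "Language": "Lang"},
)

print(imdb.to_physical({"Language": "English", "Genres.Genre": "action"}))
print(imdb.to_logical({"MovieTitle": "Ben Hur", "Stars": 4, "Year": 1959}))
```

Attributes without an entry in `mapping` (here `Genres.Genre`) pass through unchanged. That keeps the sketch short but hides mapping gaps, so a stricter version might reject them instead.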
  • 21. External and Selector Attributes
    – External attributes, for supporting access and ranking:
      SM Movie(Title, Director, Year, Language, …)
      AP Movie1: Title^O | Director^O | Year^O | … | Score^RO | Genre^I
      AP Movie_n: Title^O | Director^O | Year^O | … | Title^I
    – Selector attributes, for supporting choices among service implementations:
      SM Movie(Title, Director, Year, Language, …), where the selector attribute Language chooses among SI Movie Implementation 1 … SI Movie Implementation n
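Selector attributes can be sketched as a routing step. It picks the service implementations whose declared selector values cover the value bound in the query. The implementation names and language sets below are hypothetical. When the query leaves the selector unbound, the sketch falls back to all implementations.

```python
from typing import Dict, List, Mapping, Optional, Set


def select_implementations(query: Mapping[str, str], selector: str,
                           implementations: Dict[str, Set[str]]) -> List[str]:
    """Pick the service implementations able to answer a query.

    `implementations` maps an implementation name to the set of selector
    values it serves. If the query does not bind the selector attribute,
    every implementation is a candidate.
    """
    value: Optional[str] = query.get(selector)
    if value is None:
        return sorted(implementations)
    return sorted(name for name, served in implementations.items() if value in served)


# Hypothetical implementations of the Movie service mart, selected by Language.
MOVIE_IMPLEMENTATIONS = {
    "MovieImplEnglish": {"English"},
    "MovieImplItalian": {"Italian"},
    "MovieImplEuropean": {"English", "French", "German", "Italian"},
}

print(select_implementations({"Language": "Italian"}, "Language", MOVIE_IMPLEMENTATIONS))
```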
  • 22. Connection Patterns
    Connections between marts exist only in terms of attributes that share the same domains, at different levels of abstraction:
    – Conceptually, by an undirected edge with a name: PlayingMovie(Movie, Theatre), between Movie and Theatre
    – Logically, by an edge (possibly directed) with a name and a join condition: PlayingMovie(Movie, Theatre): (Title = Movie.Title), between Movie4 and Theatre2
  • 23. Connection Patterns – Logical Level
    Directed edge: information is “piped” from one access pattern to another along connection attributes, which are output attributes of the first service and input attributes of the second -> PIPE JOIN
    [Figure: Movie1 (Title, Director, Score, Year, Language, A.Name, A.Sex, G.Genre) piped into Theatre1 (Name, Address, M.Start, M.Title)]
  • 24. Connection Patterns – Logical Level
    Undirected edge: results are produced in output by both access patterns and then joined -> PARALLEL JOIN
    [Figure: Movie1 (Title, Director, Score, …, G.Genre) joined with Theatre1 (Name, Address, M.Start, M.Title)]
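The difference between the two edges can be sketched with plain Python generators. For the parallel join, each access pattern is modelled as a list of result rows. For the pipe join, the second one is a function from input bindings to rows. `theatre1`, `MOVIES` and `SHOWS` are hypothetical stand-ins for real services, with toy data.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Dict[str, object]


def pipe_join(first: Iterable[Row], second: Callable[[Row], List[Row]],
              bindings: Dict[str, str]) -> Iterator[Row]:
    """PIPE JOIN (directed edge): every output tuple of the first access pattern
    supplies the inputs of a call to the second one.
    `bindings` maps an input of `second` to an output of `first`."""
    for left in first:
        call_inputs = {inp: left[out] for inp, out in bindings.items()}
        for right in second(call_inputs):
            yield {**left, **right}


def parallel_join(left_rows: Iterable[Row], right_rows: Iterable[Row],
                  on: Sequence[Tuple[str, str]]) -> Iterator[Row]:
    """PARALLEL JOIN (undirected edge): both access patterns are invoked on
    their own, then their outputs are matched on the connection attributes."""
    right_list = list(right_rows)
    for left in left_rows:
        for right in right_list:
            if all(left[lk] == right[rk] for lk, rk in on):
                yield {**left, **right}


# Toy data standing in for Movie1 and Theatre1 results (hypothetical values).
MOVIES = [{"Title": "Avatar", "Director": "J. Cameron"},
          {"Title": "Up", "Director": "P. Docter"}]
SHOWS = [{"Name": "Theatre A", "M.Title": "Avatar", "M.Start": "20:00"},
         {"Name": "Theatre B", "M.Title": "Up", "M.Start": "18:30"},
         {"Name": "Theatre C", "M.Title": "Avatar", "M.Start": "22:15"}]


def theatre1(inputs: Row) -> List[Row]:
    """Stand-in for Theatre1 called with M.Title as an input attribute."""
    return [s for s in SHOWS if s["M.Title"] == inputs["M.Title"]]


piped = list(pipe_join(MOVIES, theatre1, {"M.Title": "Title"}))
joined = list(parallel_join(MOVIES, SHOWS, on=[("Title", "M.Title")]))
print([row["Name"] for row in piped])
```

Both strategies return the same rows here. The difference lies in the calls. The pipe join invokes Theatre1 once per movie with a bound input. The parallel join calls each service independently and matches afterwards.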
  • 25. Join of two Services, Pipe Version, NY City
    [Figure: search only in NY. Service marts Movie and Theatre; access patterns Movie1, Movie2 and Theatre1; service interfaces IMDB1, IMDB2, Hyperrev1, Google1 and NYLocalSearch.]
  • 26. JOIN OF TWO SEARCH SERVICES
  • 27. JOIN of Web Services
    – Input: items resulting from TWO web service calls, possibly ranked
    – Output: composed items resulting from the concatenation of matching items, presented in a “global ranking order”
    – Matching condition using value equality, partial set matching, term matching within a vocabulary, …
    – Services are known and their matching function is predefined: this is not service discovery!
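The following is a naive reference sketch of a join of two ranked result lists. Every pair is tested with a matching predicate, either value equality or vocabulary-based term matching. Matching pairs are then sorted by a global score. The weighted sum used as the global ranking function and the toy conference/hotel data are assumptions. Real rank-join methods avoid testing every pair.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]
Ranked = List[Tuple[Item, float]]
Match = Callable[[Item, Item], bool]


def equal_on(ax: str, ay: str) -> Match:
    """Matching by value equality."""
    return lambda x, y: x[ax] == y[ay]


def vocabulary_match(ax: str, ay: str, vocab: Dict[str, str]) -> Match:
    """Matching by terms normalised through a vocabulary of synonyms."""
    def norm(term: object) -> str:
        t = str(term).lower()
        return vocab.get(t, t)
    return lambda x, y: norm(x[ax]) == norm(y[ay])


def join_ranked(xs: Ranked, ys: Ranked, match: Match,
                combine: Callable[[float, float], float]):
    """Concatenate matching items and order them by a global score.
    Exhaustive (every pair is tested), so a reference for small inputs only."""
    out = [((x, y), combine(sx, sy)) for x, sx in xs for y, sy in ys if match(x, y)]
    out.sort(key=lambda pair: pair[1], reverse=True)
    return out


# Toy ranked results: (item, local score in [0, 1]).
conferences = [({"Conf": "C1", "City": "Lyon"}, 0.9), ({"Conf": "C2", "City": "Milano"}, 0.6)]
hotels = [({"Hotel": "H1", "City": "Milan"}, 0.9), ({"Hotel": "H2", "City": "Lyon"}, 0.5)]

results = join_ranked(conferences, hotels,
                      match=vocabulary_match("City", "City", {"milano": "milan"}),
                      combine=lambda sx, sy: 0.5 * sx + 0.5 * sy)
for (conf, hotel), score in results:
    print(conf["Conf"], hotel["Hotel"], round(score, 2))
```

With plain `equal_on("City", "City")`, the Milano/Milan pair would not match. That is why term matching within a vocabulary appears among the matching conditions.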
  • 28. Join
    [Figure: ranked items bx1 … bx5 of Service X and by1 … by5 of Service Y are combined into join results r1, r2, r3.]
  • 29. Matching items
  • 30. Choice of the join strategies
    – The join search space: different join methods explore it differently, under different assumptions and with different guarantees
    – [Figure: a grid of tiles t_ij, each pairing a chunk of one service with a chunk of the other (chunks have a given chunksize); each tile holds candidate join results.] Any exploration trajectory for this space is a join strategy.
  • 31. Nested Loop - Rectangular
  • 32. Merge scan - Triangular
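The two exploration shapes can be sketched by listing the order in which tiles t_ij of the chunk grid are visited. This illustration makes simplifying assumptions (a square grid, one tile per step). It is not the actual scheduling of the SeCo join operators.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (i, j): chunk i of service X joined with chunk j of service Y


def nested_loop_order(n_x: int, n_y: int) -> List[Tile]:
    """Rectangular exploration: for each chunk i of X, scan chunks 0..n_y-1 of Y."""
    return [(i, j) for i in range(n_x) for j in range(n_y)]


def merge_scan_order(n: int) -> List[Tile]:
    """Triangular exploration: visit tiles by anti-diagonals i + j = 0, 1, ..., n-1,
    so both services advance at the same pace."""
    return [(i, k - i) for k in range(n) for i in range(k + 1)]


def render(tiles: List[Tile], size: int) -> str:
    """Draw a size x size grid; each visited tile shows its step number (mod 10)."""
    grid = [["." for _ in range(size)] for _ in range(size)]
    for step, (i, j) in enumerate(tiles):
        if i < size and j < size:
            grid[i][j] = str(step % 10)
    return "\n".join(" ".join(row) for row in grid)


print(render(nested_loop_order(2, 4), 4))
print()
print(render(merge_scan_order(4), 4))
```

After the same number of steps, the nested loop has seen only two chunks of X but four of Y. The merge scan has fetched chunks from both services evenly, which is why its explored region grows as a triangle.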
  • 33. Parallel and Pipe Joins
    – Parallel join of two search services S1 and S2 [figure annotations: (1,2)^n, C1, period: 150 ms, stop: 10, excess: (1,1)]
    – Pipe join of two search services, S1 → S2 [figure annotations: (1,10)^5 (0,1)^n, size: 20, period: 500 ms, stop: 1]
  • 34. SUPPORT OF “SIMILARITY JOINS”
  • 35. Supporting value similarity
    – The concept of “nearness” is widely implemented, depending on the context, e.g.: lexical nearness (similar strings), spatial nearness (between addresses/geo-locations), temporal nearness (between dates/times), economic nearness (between costs)
    – The context is defined according to the attributes involved
    => The semantics of nearness is built bottom-up, starting from the physical layer (available services) up to the conceptual one.
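Below is a minimal sketch of per-domain nearness functions. Each function returns a distance, where smaller means nearer. The stand-ins are:
- `difflib.SequenceMatcher` for lexical nearness;
- day differences for temporal nearness;
- relative price differences for economic nearness.

Spatial nearness is omitted, because in SeCo it is delegated to a dedicated service. The domain names in `NEARNESS` and the thresholds are assumptions.

```python
from datetime import date
from difflib import SequenceMatcher


def lexical_distance(a: str, b: str) -> float:
    """0.0 for identical strings (case-insensitive), up to 1.0 for unrelated ones."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def temporal_distance(a: date, b: date) -> int:
    """Distance between two dates, in days."""
    return abs((a - b).days)


def economic_distance(a: float, b: float) -> float:
    """Relative difference between two costs (0.0 means the same price)."""
    largest = max(abs(a), abs(b))
    return abs(a - b) / largest if largest else 0.0


# Nearness chosen by the domain of the attributes involved (assumed domain names).
NEARNESS = {"String": lexical_distance, "Date": temporal_distance, "Cost": economic_distance}


def near(domain: str, a, b, threshold) -> bool:
    """True if a and b are within `threshold` according to the domain's distance."""
    return NEARNESS[domain](a, b) <= threshold


print(near("String", "Milano", "Milan", 0.1))
print(near("Date", date(2009, 6, 17), date(2009, 6, 19), 3))
```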
  • 36. Similarity comes from Shared Domains
    The attribute “address” is shared by four entities (restaurant, apartment, hotel, theatre). Its semantic type, describing a location, enables “nearness” connections between each pair of entities (i.e. addresses can be compared for “nearness” within the same city, country, …).
    [Figure: restaurant, apartment, hotel and theatre, each with an Address attribute, pairwise linked by Spatial Near]
  • 37. Supporting Nearness within Services
    – Several physical services natively support ranking by distance (e.g. GoogleMovies)
    – E.g.: GoogleMovies receives the user address as input and returns theatres ranked by distance, each with its address as output. UserAddress and Distance are external attributes.
    GoogleMovies(UserAddress^I, Distance^R | Name^O, Address^O, Movie.Title^I, Movie.StartTime^O)
    [Figure: service interface GoogleMovies for AP Theatre1; URI: http://...; TTL=6000, chunksize=10, cacheable=true, provides=Spatial Near; parameter mapping UserAddress → IAddr, Name → Name, Address → OAddr, M.Title → MovieTit, M.StartTime → MovieTime, ...]
  • 38. “Nearness” Support within Services
    [Figure: conceptually, Theatre and Restaurant are connected by Spatial Near. Logically, Restaurant2 (Address, Name, Cuisine, Price) is connected by spatial near to Theatre1 (UserAddress, Name, Address, M.Title, M.StartTime, Distance). Physically, Theatre1 is implemented by GoogleMovies (URI: http://...; TTL=600, chunksize=10, cache=1, provides=Spatial Near; mapping UserAddr → Addr, Name → Name, Address → Addr, M.Title → MovieTit, ...).]
  • 39. Nearness Services within the Execution Engine
    Ad-hoc services providing the notion of distance at the physical level take two domain values as input and produce their distance as output:
    – two input attributes specify two values of the domain
    – one output attribute specifies the distance in given units
    [Figure: SpatialNear system service; URI: http://...; TTL=600, chunksize=1, cacheable=1, ...; Input1, Input2: Coordinates; Output: Distance (km)]
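Suppose the SpatialNear service receives coordinates as (latitude, longitude) pairs in decimal degrees. Its contract (Input1, Input2 → Distance in km) could then be realised with the haversine great-circle formula. This is one plausible implementation, not the service SeCo actually calls. The helper `rank_by_distance` is hypothetical; it shows how a GoogleMovies-style ranking by distance could be built on top of it.

```python
import math
from typing import List, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0           # mean Earth radius


def spatial_near(input1: Coordinates, input2: Coordinates) -> float:
    """Great-circle (haversine) distance in km between two coordinates."""
    lat1, lon1 = map(math.radians, input1)
    lat2, lon2 = map(math.radians, input2)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_by_distance(origin: Coordinates,
                     places: List[Tuple[str, Coordinates]]) -> List[Tuple[str, Coordinates]]:
    """Order (name, coordinates) pairs by distance from origin, nearest first."""
    return sorted(places, key=lambda p: spatial_near(origin, p[1]))


print(round(spatial_near((0.0, 0.0), (0.0, 1.0)), 1))
```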
  • 40. Supporting Nearness within the Execution Engine
    [Figure: conceptually, Theatre and Restaurant are connected by Spatial Near. Logically, Restaurant2 (Address, Name, Cuisine, Price) and Theatre1 (Address, Name, M.Title, M.StartTime) are connected through the Spatial Near access pattern (Addr1, Addr2, Distance). That pattern is implemented by the SpatialNear system service (URI: http://...; TTL=600, chunksize=1, cacheable=1, ...; Input1, Input2: Coordinates; Output: Distance (km)).]
  • 41. Join of three Services at the three Levels in NY
    [Figure: search only in NY. Service marts Movie, Theatre and Restaurant, connected through Spatial Near; access patterns Movie1, Movie2, Theatre1, Rest1, Rest2, plus an AP providing spatial near; service interfaces IMDB1, IMDB2, Hyperrev1, Google1, NYLocalSearch, Yahoo1 and Yahoo2.]
  • 42. Three Levels with Connection Semantics
    – Conceptual level: services are Service Marts; connections are names (with associated semantics)
    – From conceptual to logical: bindings between SM and AP attributes, plus definition of extra attributes
    – Logical level: services are Access Patterns; connections are join attributes on directed vs. undirected edges (with nearness-service APs added as needed)
    – From logical to physical: bindings between AP attributes and SI parameters
    – Physical level: services are Service Interfaces (with associated semantics); connections are nearness services, provided as system services
  • 43. Resource graph
    A specialized way of describing the knowledge available on the Web through search services [ER model, ontology, class diagram?]
    [Figure: resource graph with nodes such as News, Restaurant, Exhibition, Piece, Concert, Artist, Photo, Hotel, Movie, Metro Station, Theatre, Landmark, ShoppingCenter, ...]
  • 44. APPLICATION DEVELOPMENT PROCESS
  • 45. SeCo development process
    – Main roles: service developer, service publisher, expert user, SeCo expert
    – Dichotomies: top-down vs. bottom-up; run time vs. design time
    [Figure: process flow in three phases.
    – Search service development and registration: the service developer implements the search service; the service publisher adapts it (wraps or materializes the service) and registers the service mart and interface.
    – Application configuration: the expert user designs the liquid query template.
    – Plan refinement: if manual optimization is needed, the SeCo expert refines the query plan (Panta Rhei plan refinement).
    Models involved: Service Mart model, Liquid Query model, Query Plan model.]
  • 46. The service registration process
    [Figure: flowchart. Starting from the service description, SM identification checks whether some SM is retrieved.
    – If none is, the bottom-up strategy applies (SM creation).
    – If some SM is retrieved and its structure must be modified, the hybrid strategy applies (SM update).
    – Otherwise the top-down strategy applies (associated SI update, new connections).
    The flow continues with SM mapping, AP creation and the service physical description, then ends.]
  • 47. The SM Creation process, with semantic hints
    – SM name and attributes schema definition (with type conventions): Movie(Title, Director, Score, Year, Genres(Genre), Openings(Country, Date), Actors(Name))
    – Semantic description of the SM and its attributes through WordNet (WN) synsets (and tags?), e.g. Movie: S: (n) movie, film, picture, moving picture, moving-picture show, motion picture, motion-picture show, picture show, pic, flick (a form of entertainment that enacts a story by sound and a sequence of images giving the illusion of continuous movement) "they went to a movie every Saturday night"; Director: S: (n) film director, director (the person who directs the making of a film)
    – Connection patterns (CP) definition, with automatic recommendation of connectable SMs (SM1 … SMn, e.g. Theatres): Shows(Movie, Theatre): [(Title=Title)]; defined CP: Shows; possible CPs: Textual_near, Spatial_near, Temporal_near
    – Association of composition-language operators: Title (String) → Textual_near; Year (Date) → Temporal_near; …
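Here is a rough sketch of the "possible CP" recommendation step. It assumes a connection is suggested whenever two service marts expose attributes of the same type, provided that type has an associated nearness operator. The String → Textual_near and Date → Temporal_near pairs follow the slide. Two things are hypothetical: mapping an `Address` type to Spatial_near, and the `theatre` and `restaurant` marts themselves.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed association between attribute types and composition-language
# nearness operators (String and Date follow the slide; Address is assumed).
NEAR_OPERATORS = {"String": "Textual_near", "Date": "Temporal_near", "Address": "Spatial_near"}


def candidate_connections(sm_a: Dict[str, str],
                          sm_b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Suggest (attribute of A, attribute of B, operator) triples for every pair
    of attributes sharing a type for which a nearness operator is known."""
    return sorted(
        (a, b, NEAR_OPERATORS[ta])
        for (a, ta), (b, tb) in product(sm_a.items(), sm_b.items())
        if ta == tb and ta in NEAR_OPERATORS
    )


movie = {"Title": "String", "Director": "String", "Year": "Date"}
theatre = {"Name": "String", "Address": "Address", "MovieTitle": "String"}  # hypothetical SM
restaurant = {"Address": "Address", "Cuisine": "Cuisine"}                   # hypothetical SM

for suggestion in candidate_connections(movie, theatre):
    print(suggestion)
print(candidate_connections(restaurant, theatre))
```

A type-only match over-generates (Director/Name is suggested too). This is where the WordNet-based semantic descriptions on the slide would help a designer prune candidates.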
  • 48. The SM Mapping procedure
    [Figure: SM mapping.
    – The original SM Movie(Title, Director, Score, Year, …) is mapped onto the service interface ImdbMovie: Title | Director | Score | Year | ….
    – Each SM attribute carries a type and a semantic description (e.g. Director: String; “the person who directs the making of a film”) and is bound through a mapping function f (e.g. f Director (String)).
    – The service interface attributes are classified as selector attributes, auxiliary attributes and attributes corresponding to SM attributes (i.e. query attributes).]
  • 49. SeCo Tools
    – Online tool suite that covers the whole development process
    – Mashup-based
    – Built using state-of-the-art technologies:
      1. MVC on the client: JavaScriptMVC
      2. UI organization and panels: Yahoo! User Interface Library (YUI)
      3. Diagram drawing and editing: WireIt
  • 50. Service Mart Registration
  • 51. Mapping editor
  • 52. Query Registration Interface
  • 53. Query Registration Editor, Logical Connections
  • 54. LIQUID QUERY INTERFACE
  • 55. Liquid Query
    “A new paradigm allowing users to formulate and get responses to multi-domain queries through an exploratory information seeking approach, based upon structured information sources exposed as software services…”
    – Composite answers obtained by aggregating search results from various domains
    – Highlight the contribution of each search service
    – Join of results based on the structural information afforded by the search service interfaces
    – Refine the user query
    – Re-shape the result list
  • 56. Liquid query definition
    It consists of subsetting and parametrizing the resource graph...
    [Figure: resource graph (News, Restaurant, Exhibition, Piece, Concert, Artist, Photo, Hotel, Movie, Metro Station, Theatre, Landmark, ShoppingCenter, ...) with the selected nodes annotated with inputs and outputs, plus GR = global ranking]
  • 57. Liquid query definition
    ... and then characterizing the user interaction
    [Figure: selected subgraph with News, Restaurant, Exhibition, Concert, Artist, Photo, Hotel and Metro Station, annotated with Expand interactions]
    Plus:
    – parametrization of global ranking
    – data visualization options
    – ... and so on
  • 58. Query Submission
    [Figure: query form with query conditions for Concert and for Hotels]
  • 59. Query Execution & Result Presentation
  • 60. SECO ENGINE
  • 61. Overview
    – The tool is aimed at developers and allows them to compose, plan and run a SeCo query
    – Four panels, one for each query processing phase: query composition, logical planning, physical planning, query execution
    [Figure: splash screen]
  • 62. Query composition (1)
    – Service interface browser: lists the registered service interfaces, with their input and output parameters
    – Selected service's statistics: collected service statistics are displayed; statistics may be edited for testing purposes
  • 63. Query composition (2)
    – User-entered datalog-like query: joins are implicitly encoded by datalog variables; $vars encode query inputs provided at runtime
    – Query optimisation parameters: control the behaviour of the planner and trigger the planning process
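The slide does not show the concrete query syntax, so this sketch assumes a hypothetical datalog-like form `head :- Service(args), ...`. It shows two things: how joins can be read off variables shared by several atoms, and how `$`-prefixed names become runtime inputs. `analyse_query` is an illustrative name, not part of the SeCo tools.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

ATOM = re.compile(r"(\w+)\(([^)]*)\)")


def analyse_query(query: str) -> Tuple[List[str], List[str], Dict[str, List[str]]]:
    """Return (services used, runtime inputs, join variable -> services).

    Assumes a hypothetical syntax 'head :- Atom1(args), Atom2(args)'.
    Variables shared by two or more atoms encode joins; '$' marks inputs;
    '_' is an anonymous variable and quoted arguments are constants.
    """
    _head, body = query.split(":-", 1)
    atoms = [(name, [a.strip() for a in args.split(",") if a.strip()])
             for name, args in ATOM.findall(body)]
    seen_in: Dict[str, Set[str]] = defaultdict(set)
    inputs: Set[str] = set()
    for name, args in atoms:
        for arg in args:
            if arg.startswith("$"):
                inputs.add(arg)
            elif arg != "_" and not arg.startswith('"'):
                seen_in[arg].add(name)
    joins = {var: sorted(names) for var, names in seen_in.items() if len(names) > 1}
    return [name for name, _ in atoms], sorted(inputs), joins


q = "q(T, N, A) :- Movie1($language, $genre, T, S), Theatre1(T, N, A)"
print(analyse_query(q))
```

Variables are keyed by service name in this analysis. A query that uses the same service twice (a self-join) would therefore need atom positions instead of names.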
  • 64. Logical planning
  • 65. Physical planning
  • 66. Query execution (1)
    – Execution session management: a session corresponds to a single query execution, in which multiple user commands may be issued; query input parameters are specified at session initialisation
    – Execution status: displays the current session status and the status of the execution commands issued so far
    – Execution command forms: a more-all command requests more query results; a more-one command requests more results by extracting more data from a specific service invoked by the query
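The following toy sketch illustrates the session semantics. It uses hypothetical names (`Session`, `more_one`, `more_all`, `chunked`) and assumes services return one chunk per call; it is not the SeCo engine's API. Each command fetches one more chunk, either from one service or from all of them, and then recomputes the joined results. The `log` list loosely mirrors a service-call log at chunk granularity.

```python
from itertools import product
from typing import Callable, Dict, List

Row = Dict[str, object]
Service = Callable[[Row, int], List[Row]]   # (query inputs, chunk index) -> rows


class Session:
    """One execution session = one query execution. Inputs are fixed at start;
    the user then issues more-one / more-all commands."""

    def __init__(self, inputs: Row, services: Dict[str, Service],
                 match: Callable[..., bool]) -> None:
        self.inputs = dict(inputs)
        self.services = services
        self.match = match
        self.fetched: Dict[str, List[Row]] = {name: [] for name in services}
        self.calls: Dict[str, int] = {name: 0 for name in services}
        self.log: List[tuple] = []   # (service, chunk index, rows returned)

    def more_one(self, name: str) -> List[tuple]:
        """Extract one more chunk from a single service, then recompute results."""
        chunk = self.services[name](self.inputs, self.calls[name])
        self.log.append((name, self.calls[name], len(chunk)))
        self.calls[name] += 1
        self.fetched[name].extend(chunk)
        return self.results()

    def more_all(self) -> List[tuple]:
        """Extract one more chunk from every service in the query."""
        for name in self.services:
            self.more_one(name)
        return self.results()

    def results(self) -> List[tuple]:
        """All combinations of fetched rows that satisfy the join predicate."""
        rows = [self.fetched[name] for name in self.services]
        return [combo for combo in product(*rows) if self.match(*combo)]


def chunked(rows: List[Row], size: int) -> Service:
    """Toy service returning rows[size*i : size*(i+1)] for chunk i."""
    return lambda inputs, i: rows[size * i: size * (i + 1)]


movies = [{"Title": "A"}, {"Title": "B"}, {"Title": "C"}]
shows = [{"M.Title": "C"}, {"M.Title": "A"}]
session = Session({"Language": "English"},
                  {"Movie1": chunked(movies, 1), "Theatre1": chunked(shows, 1)},
                  match=lambda m, s: m["Title"] == s["M.Title"])
first = session.more_all()              # one chunk from each service
second = session.more_one("Theatre1")   # one more chunk from Theatre1 only
print(len(first), len(second), session.log)
```

Recomputing the full product after every command keeps the sketch short, but it repeats work. An engine would rather join only the newly fetched tiles.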
  • 67. Query execution (2)
    – Query results: displays ranked results as soon as they are computed
    – Execution timeline: displays the activation of execution units (e.g. service calls); useful to fine-tune the engine and the join strategies
  • 68. Query execution (3)
    – Service calls log: displays service calls at chunk granularity and shows response times, statistics and cache behaviour
  • 69. DEMO http://demo.search-computing.eu
  • 70. SUMMARY OF SECO RESULTS
  • 71. Results after 18 months
    – Concepts: service marts, rank join methods, Panta Rhei, liquid query
    – Research results: Springer LNCS “Search Computing: Challenges and Directions”; many publications (with VLDB, WWW), many ongoing submissions; filing of a US patent (top-k method, random & sequential services)
    – Prototypes: execution environment, focused on liquid query and on integration; design support environment, focused on mashups
    – Dissemination: fifteen keynote talks, twelve articles in the Italian press; SeCo Web site, SeCo blog, Facebook, LinkedIn and Twitter communities; Search Computing graduate course at PoliMi
    – Temporary research positions (1 PhD, 5 post-MS, 3 post-doc)
  • 72. Publications
    SeCo
    - D. Braga, A. Campi, S. Ceri, A. Raffio. Joining the results of heterogeneous search engines. Information Systems, Vol. 33, Issues 7-8 (November-December 2008), pp. 658-680.
    - D. Braga, S. Ceri, F. Daniel, D. Martinenghi. Optimization of Multi-Domain Queries on the Web. VLDB 2008: 562-573, Auckland, New Zealand, August 2008.
    - D. Braga, S. Ceri, F. Daniel, D. Martinenghi. Mashing Up Search Services. IEEE Internet Computing 12(5): 16-23 (2008).
    - D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone. NGS: a framework for multi-domain query answering. ICDE Workshops 2008: 254-261.
    - S. Ceri. Search Computing. Invited paper, 25th International Conference on Data Engineering, Shanghai, March 29 - April 2, 2009.
    - D. Barbieri, A. Bozzon, D. Braga, M. Brambilla, A. Campi, S. Ceri, E. Della Valle, P. Fraternali, M. Grossniklaus, D. Martinenghi, S. Ronchi, M. Tagliasacchi. Data-driven optimization of search service composition for answering multi-domain queries. USETIM 2009 workshop at VLDB 2009, Lyon, France, August 24-28, 2009.
    - M. Brambilla, S. Ceri. Engineering Search Computing Applications: Vision and Challenges. 7th joint meeting of the European Software Engineering Conference (ESEC) and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Amsterdam, The Netherlands, August 24-28, 2009.
    - S. Ceri. Search Computing. 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Milan, Italy, September 15-18, 2009.
    - S. Ceppi, N. Gatti. An Automated Mechanism Design Approach for Sponsored Search Auctions with Federated Search Engines. Proceedings of the 12th Workshop on Agent-Mediated Electronic Commerce (AMEC) at the 9th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Toronto, Canada, May 10, 2010.
    - D. Martinenghi, M. Tagliasacchi, S. Ceri. Top-k pipe-join. International Workshop on Ranking in Databases, Long Beach, USA, March 2010.
    - A. Bozzon, M. Brambilla, S. Ceri, P. Fraternali. Liquid Query: Multi-Domain Exploratory Search on the Web. WWW 2010, 19th International World Wide Web Conference, Raleigh, North Carolina, April 26-30, 2010.
    - A. Campi, S. Ceri, A. Maesani, S. Ronchi. Designing Service Marts for Engineering Search Computing Applications. Tenth International Conference on Web Engineering (ICWE 2010), Vienna, Austria, July 5-9, 2010.
    Related
    - M. Brambilla, S. Ceri, I. Celino, D. Cerizza, E. Della Valle, F. M. Facca, A. Turati, C. Tziviskou. Experiences in the Design of Semantic Services Using Web Engineering Methods and Tools. Journal on Data Semantics, 2008.
    - A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez. Clip: a Visual Language for Explicit Schema Mappings. International Conference on Data Engineering (ICDE), April 2008.
    - D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone. A New Generation Search Engine Supporting Cross Domain Queries. Italian Symposium on Advanced Database Systems (SEBD), June 2008.
    - D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone. NGS: a Framework for Multi-Domain Query Answering. IIMAS, International Conference on Data Engineering Workshops (ICDE), April 2008.
    - A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez. Clip: a Tool for Mapping Hierarchical Schemas. ACM SIGMOD/PODS Conference, Demo Session, June 2008.
    - A. Bozzon, M. Brambilla, P. Fraternali. Conceptual Modeling of Multimedia Search Applications Using Rich Process Models. ICWE 2009, Springer LNCS, vol. 5648, ISBN 978-3-642-02817-5.
    - E. Della Valle, S. Ceri, D. F. Barbieri, D. Braga, A. Campi. A First Step Towards Stream Reasoning. Future Internet Symposium (FIS) 2008, pp. 72-81.
    - A. Bozzon, M. Brambilla, F. M. Facca, G. Toffetti Carughi. A Conceptual Modeling Approach to Business Service Mashup Development. IEEE International Conference on Web Services (ICWS 2009), Los Angeles, IEEE Press, July 2009, pp. 751-758.
    - P. Fraternali, M. Brambilla, A. Bozzon. Model-Driven Design of Audiovisual Indexing Processes for Search-Based Applications. Content-Based Multimedia Indexing (CBMI '09), IEEE Press, 2009, ISBN 978-1-4244-4265-2, pp. 120-125.
    - D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus. C-SPARQL: SPARQL for Continuous Querying. Proceedings of WWW 2009, 18th International World Wide Web Conference (Poster), Madrid, Spain, April 2009.
    - D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus. Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL. Proceedings of SDoW 2009, 2nd ISWC Workshop on Social Data on the Web, Washington, DC, USA, October 2009.
    - D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus. C-SPARQL: A Continuous Query Language for RDF Data Streams. International Journal of Semantic Computing (IJSC), World Scientific Publishing, 2010.
    - D. F. Barbieri, D. Braga, S. Ceri, M. Grossniklaus. An Execution Environment for C-SPARQL Queries. Proceedings of EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 2010.
  • 73. Web Site & Blog
    – Web Site
    – TechWatch Blog
    Blog stats: ~900 absolute unique visitors in the last two months
  • 74. Accesses to Web Site & Blog
    Visits: 20% USA, 18% Italy, 6% UK, 4% India, 4% Canada
    [Charts: provenance and sources of visits]
  • 75. Search Computing First Workshop, June 17-19, 2009
  • 76. Search Computing Challenges and Directions (LNCS, vol. 5950, Ceri-Brambilla eds.)
    – Part 1, Vision: Ceri: Search computing; Baeza-Yates: Next generation search; Weikum: Search for knowledge
    – Part 2, Technology Watch: Della Valle-Buganza-Gatti: The search engine industry; Casati-Daniel-Soi: Mashup technologies; Baumgartner-Campi-Gottlob-Herzog: Web data extraction; Hedeler-Belhajjame-Campi-Embury-Fernandez-Paton: Dataspaces; Bozzon-Fraternali: Multimedia and multimodal information retrieval
    – Part 3, Issues in Search Computing: Campi-Ceri-Gottlob-Ronchi: Service marts; Braga-Campi-Grossniklaus: Join methods and query optimization; Ilyas-Martinenghi-Tagliasacchi: Rank aggregation; Braga-Grossniklaus-Ceri: Panta Rhei, a query execution environment; Brambilla-Ceri-Fraternali-Manolescu: Liquid queries and liquid results; Brambilla-Ceri: Software engineering of search computing applications; Masseroli-Paton-Spasic: Search computing and the life sciences
  • 77. Second Workshop: Design Principles
    – Consolidate several ongoing research chapters touching the various aspects of the project
    – Develop connections to other research projects so as to share knowledge, and possibly build cooperations based on mutual complementarity
    – Set internal deadlines for the project's evolution: being ready for the workshop; delegating organisational responsibility to session chairs
    – Try a more discussion-oriented format: our view; guests' views; panel/discussion (sometimes driven, sometimes not)
    – Produce proceedings as Springer LNCS, each session contributing a short part
  • 78. Second SeCo Workshop Last Week
  • 79. Second Workshop: Sessions
    – Pre-Workshop (Milano, May 25): Search as a Process; Business Models
    – Workshop (Como, May 26-28): Semantic Resource Framework; Wrapping Technology and Ontological Annotation; Design Tools and Mashup Languages; Search Computing and Research Evaluation; Query Processing; Rank Join; Search Computing for BioMedical Applications; User-Centered Approach to Search Computing Applications
    – Post-Workshop (Milano, May 31): Visual Interfaces for Complex Search
  • 80. Looking forward
    – Establish stronger co-operation with other projects, both for technology and applications
    – Strengthen SeCo “core research”: cover the process lifecycle with methods & tools; improve result visualization and user interaction; use semantics in service registration and query processing; turn Panta Rhei into a full Service Base Management System (SBMS) with new rank join methods, proximity, uncertainty…
    – Strengthen the prototypes: fully develop the registration environment; extend the execution environment and make it scalable over clouds; extend the liquid interface and cover mobile interfaces
    – Put a “killer” application online (usable!)
    – Explore exploitation options

Editor's notes

  1. SITE DATA: 2353 unique visits since January. In terms of content, the blog aims to be an aggregator of information related to Search Computing in all its facets, including the technological one. We intend, in fact, to periodically publish tutorials and reviews of the technologies used in developing the system and its demonstrators. BLOG DATA: the number of visitors shows a steady growth trend. The negative peaks in the cycles correspond to weekends, which also suggests that the blog attracts professionals.