How to build the next 1000 search engines?!

         Arjen P. de Vries
          arjen@acm.org
      Centrum Wiskunde & Informatica
       Delft University of Technology
                Spinque B.V.
Search is everywhere
• Yet it only works well on the web…
Complications
• Heterogeneous data sources
  - WWW, Wikipedia, news, e-mail, patents, Twitter, personal information, …
• Varying result types
  - "Documents", tweets, courses, people, experts, gene expressions, temperatures, …
• Multiple dimensions of relevance
  - Topicality, recency, reading level, …
Complications
• Many search tasks require a mix within these dimensions:
  - News and patents
  - Companies and their CEOs
  - Recent and on topic
• Many search tasks also require a mix across these dimensions:
  - Patents assigned to our top 3 competitors in market segments mentioned in the recent press releases issued by our top 10 clients
• System's internal information representation
  - Linguistic annotations
    - Named entities, sentiment, dependencies, …
  - Knowledge resources
    - Wikipedia, Freebase, ICD9, IPTC, …
  - Links to related documents
    - Citations, URLs
• Anchors that describe the URI
  - Anchor text
• Queries that lead to clicks on the URI
  - Session, user, dwell time, …
• Tweets that mention the URI
  - Time, location, user, …
• Other social media that describe the URI
  - User, rating
  - Tag, organisation of 'folksonomy'
+ UNCERTAINTY ALL OVER!
What goes in the black box?
Inputs: the document collection (anchors, entity types, sentiment, tweets, cited documents, …), the user, and the context
Black box: BM25, BM25F, LM, RM, VSM, DFR, QIR?, learning to rank?
Output: a ranked list of answers
ECIR / CIKM / SIGIR / ICTIR / WSDM papers!
Rarely & scarcely addressed…
  Student: How do I build it?
  Professor: Who will build it for me?
Last session of the conference…
Search System
Parameterised Search System
  Cornacchia, De Vries, ECIR 2007
  A Parametrised Search System
Can we not ‘remove’ this IR engineer (or scientist!) from the loop, just as DBMS software removes the data engineer from the loop?
And three (four?) children, a startup and 5 years later, a PhD defense!
Search by Strategy
• Visually construct search strategies by connecting building blocks
• Each block describes either data or actions upon that data
  - Connection points ("pins") are typed: doc / sec / term / ne (named entity) / tuple
  - Actions are expressed as scripts (more on this later)
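A minimal Python sketch of typed pins, for illustration only: `PinType`, `Block`, `Strategy` and `connect` are hypothetical names, not Spinque's API. Each block declares typed input pins and one typed output pin, and a connection is accepted only when the types match, so a strategy cannot wire, say, named entities into a block that expects terms.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class PinType(Enum):
    """The pin types listed on the slide."""
    DOC = "doc"
    SEC = "sec"
    TERM = "term"
    NE = "ne"  # named entity
    TUPLE = "tuple"


@dataclass
class Block:
    """Hypothetical building block: a data source or an action on data."""
    name: str
    inputs: dict[str, PinType] = field(default_factory=dict)
    output: PinType | None = None


@dataclass
class Strategy:
    """Hypothetical strategy: blocks plus typed connections between them."""
    blocks: dict[str, Block] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, target, target pin)

    def add(self, block: Block) -> None:
        self.blocks[block.name] = block

    def connect(self, source: str, target: str, pin: str) -> None:
        """Wire source's output to target's named input pin; refuse type mismatches."""
        produced = self.blocks[source].output
        expected = self.blocks[target].inputs[pin]
        if produced is not expected:
            raise TypeError(f"{source} produces {produced}, but {target}.{pin} expects {expected}")
        self.edges.append((source, target, pin))


# Usage: patents (doc) -> inventors (ne) is accepted; ne -> term would be rejected.
strategy = Strategy()
strategy.add(Block("patents", output=PinType.DOC))
strategy.add(Block("inventors", inputs={"docs": PinType.DOC}, output=PinType.NE))
strategy.add(Block("expand_terms", inputs={"terms": PinType.TERM}, output=PinType.TERM))
strategy.connect("patents", "inventors", "docs")
```

Checking types at connection time means an ill-typed strategy is rejected while it is being built, before any query is generated from it.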
Strategy Builder
From Patent to Inventor
Reports




          Visits
Generate Search Engine!




Or, really, generate a REST API from the strategy specification!
Demo
(Showed demo of children's search engine)
How Strategies Help
• Strategies improve communication between search intermediary and user
  - Encapsulate domain expert knowledge
  - Abstract representation of search expert knowledge
  - Analyze the information-seeking process at any stage
• Strategies facilitate knowledge management
  - Store / share / publish / refine
• Strategies mix exact (DB) and ranked (IR) searches
  - Avoid the need for "human (probabilistic) joins"
Search Intermediaries (arranged along a task-complexity axis)
• Travel agency
• Real estate agents
• Recruiters
• Librarians
• Archivists
• Digital forensics detectives
• Patent information specialists
Exploratory Search
• Search & (faceted) browsing
  - Help discover schema, ontology, etc.
  - Help discover the relevant sources
    - Within-collection (by year/location, by type, …)
    - Across multiple collections (by source)
Probabilistic faceted browsing
Both panels show the same facets: Price (100K - 200K, 200K - 300K, 300K - 400K); Rooms (3, 4, 5); Size (100 - 150 m2, 150 - 200 m2, 200 - 250 m2).
• Traditional (boolean filters)
  - Good when the user knows exactly which filters to apply
  - Will see perfect-match results
  - Won't see "interesting" results
• Probabilistic
  - Good for exploratory search
  - Will see perfect-match results
  - Will also see "interesting" results
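The deck does not specify how probabilistic facets are scored. Under our own assumptions (a selected range matches with probability 1.0 and the probability decays linearly outside it; facets are independent, so their probabilities multiply), the contrast can be sketched in Python as follows. Houses A to D and the tolerances are toy values.

```python
from dataclasses import dataclass


@dataclass
class House:
    name: str
    price: float  # in thousands (K)
    rooms: int


def range_prob(value, low, high, tolerance):
    """Match probability for one facet range: 1.0 inside [low, high], then a
    linear decay to 0.0 at `tolerance` outside it (an illustrative assumption)."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / tolerance)


def boolean_facets(houses, price, rooms):
    """Traditional faceting: only perfect matches survive."""
    return [h.name for h in houses
            if price[0] <= h.price <= price[1] and rooms[0] <= h.rooms <= rooms[1]]


def probabilistic_facets(houses, price, rooms):
    """Rank by the product of per-facet probabilities (facets assumed independent)."""
    scored = []
    for h in houses:
        p = range_prob(h.price, *price, tolerance=50) * range_prob(h.rooms, *rooms, tolerance=2)
        if p > 0:
            scored.append((h.name, round(p, 2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


houses = [House("A", 250, 4), House("B", 310, 4), House("C", 260, 5), House("D", 500, 3)]
selection = {"price": (200, 300), "rooms": (3, 4)}
print(boolean_facets(houses, **selection))        # ['A']
print(probabilistic_facets(houses, **selection))  # [('A', 1.0), ('B', 0.8), ('C', 0.5)]
```

With boolean filters only A survives; the soft version still ranks A first but also surfaces the near misses B and C, the kind of "interesting" results the slide refers to.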
Dynamic facets
Both panels show the same facets: Price (100K - 200K, 200K - 300K, 300K - 400K); Rooms (3, 4, 5); Size (100 - 150 m2, 150 - 200 m2, 200 - 250 m2).
• Pre-indexed
  - Pre-defined ad-hoc indices intersected with the result set
  - Challenge: many indices to maintain
• Dynamic
  - Facets decided from the result set
  - Challenge: dynamically adapt granularity
    - Different price ranges for villa/garage!
  - Challenge: heavy concurrent queries to the DB
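A sketch of facets decided from the result set, assuming equal-frequency (quantile) cut points as the granularity rule; the deck does not say which rule Spinque uses, and the price lists are toy data. Because the ranges come from the results themselves, villas and garages get different price ranges.

```python
import statistics


def dynamic_ranges(values, buckets=3):
    """Derive facet ranges for one attribute from the current (non-empty) result set,
    using equal-frequency (quantile) cut points instead of pre-defined ranges."""
    if len(values) < buckets:
        return [(min(values), max(values))]
    cuts = statistics.quantiles(values, n=buckets, method="inclusive")
    edges = [min(values), *cuts, max(values)]
    return list(zip(edges, edges[1:]))


villa_prices = [400, 450, 500, 600, 700, 800, 900]  # toy data, in K
garage_prices = [10, 12, 15, 20, 25, 30]            # toy data, in K
print(dynamic_ranges(villa_prices))   # [(400, 500.0), (500.0, 700.0), (700.0, 900)]
print(dynamic_ranges(garage_prices))  # much narrower ranges between 10 and 30
```

Computing ranges per query is what makes the approach flexible, and also why the slide lists heavy concurrent queries to the DB as a challenge.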
Demo
(Showed Spinque's real-estate search demo)
Limitations of Search & Browse
• Faceted exploration does not include joins
  - Cannot construct new data sources from existing ones!
  - Only the pre-defined paths through the information space can actually be traversed
Who needs a Join?
• You!!!
  … whenever 'relevance cues' are typed:
  - People (e.g., inventors)
  - Companies (e.g., assignees)
  - Categories (e.g., IPTC)
  - Time (e.g., expiry date)
  - Location (e.g., country)
• … or whenever multiple sources are to be combined
  - E.g., patents & news, patents & Wikipedia, …
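A typed relevance cue such as "assignee" means joining the ranked documents with a second, possibly uncertain, relation. A minimal Python sketch with toy data (the company names are made up): probabilities of joined tuples are multiplied, which assumes independence and mirrors the `prob * prob` in the generated SQL under "Real Code".

```python
def prob_join(left, right, left_col, right_col):
    """Equi-join two probabilistic relations, each a list of (tuple, prob) pairs.
    Joined tuples get prob_left * prob_right, i.e. the events are assumed independent."""
    index = {}
    for row, p in right:
        index.setdefault(row[right_col], []).append((row, p))
    return [(lrow + rrow, lp * rp)
            for lrow, lp in left
            for rrow, rp in index.get(lrow[left_col], [])]


# Toy data: patents ranked for topic X, and an uncertain doc -> assignee relation.
patents_on_x = [(("d1",), 0.9), (("d2",), 0.6), (("d3",), 0.3)]
assignee = [(("d1", "Acme"), 1.0), (("d2", "Globex"), 0.8), (("d3", "Acme"), 0.5)]

joined = prob_join(patents_on_x, assignee, 0, 0)  # tuples: (doc, doc, company)
by_acme = [(row[0], round(p, 2)) for row, p in joined if row[2] == "Acme"]
print(by_acme)  # [('d1', 0.9), ('d3', 0.15)]
```

This is the "Patents on X by Y(y)" pattern: rank on topic X, join on the typed cue Y, then select the value y.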
Patents on X by Y(y)
Interactive Information Access
• Feedback:
  - Interaction improves the information representation
• Faceted browsing:
  - Interaction can let the user take over where the machine would fail
• Search by Strategy:
  - Interaction can let the user take over where the system designer would fail
Conclusion
• "No idealized one-shot search engine"
• Empower the user!
Under the Hood
From Strategies to DB Queries
• Strategy: data flow (Spinque: strategy)
    BB1(in1, in2, in3, u1, u2) → out
    BB2(in1) → out
• Query: strategy made operational (Spinque: PRA)
    CREATE VIEW a AS SELECT ..
    CREATE VIEW b AS SELECT ..
    CREATE VIEW c AS SELECT ..
• Database: relational DB (Spinque: RDBMS, MonetDB)
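The compilation from strategy to queries can be sketched as follows, with Python's built-in sqlite3 standing in for MonetDB; the block names, the `docs` table and the SQL bodies are hypothetical. Each block becomes one CREATE VIEW, emitted in topological order of the data flow so that a downstream block can select from the view of its upstream block.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical strategy: block name -> (upstream blocks, SELECT defining its output).
strategy = {
    "bb2": (["bb1"], "SELECT doc, prob * 0.5 AS prob FROM bb1"),
    "bb1": ([], "SELECT doc, prob FROM docs WHERE prob > 0.5"),
}


def compile_strategy(blocks):
    """Emit one CREATE VIEW per block, upstream blocks first (topological order)."""
    graph = {name: set(upstream) for name, (upstream, _) in blocks.items()}
    return [f"CREATE VIEW {name} AS {blocks[name][1]}"
            for name in TopologicalSorter(graph).static_order()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc TEXT, prob REAL)")  # stands in for a stored relation
conn.executemany("INSERT INTO docs VALUES (?, ?)", [("d1", 0.9), ("d2", 0.4), ("d3", 0.7)])
for statement in compile_strategy(strategy):
    conn.execute(statement)  # defining a view computes nothing yet
rows = conn.execute("SELECT doc, prob FROM bb2 ORDER BY prob DESC").fetchall()
print(rows)  # [('d1', 0.45), ('d3', 0.35)]
```

Only the final SELECT triggers evaluation, so the database optimizer sees the whole strategy as one query.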
Probabilistic Relational Algebra
Strategy → PRA → SQL → Relational DB
• PRA: probabilistic relational algebra (Fuhr and Roelleke, TOIS 2001)
    x = Project DISTINCT [$1,$3](y);
• SQL: explicit probabilities
    CREATE VIEW x AS
    SELECT a1, a3,
           1-prod(1-prob) AS prob
    FROM y
    GROUP BY a1, a3;
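The SQL above realises the probabilistic projection with 1-prod(1-prob): a projected tuple holds if at least one of the merged input tuples holds, assuming they are independent. SQLite has no product aggregate, so this Python sketch registers a hypothetical `noisy_or` aggregate to run the same view; the table contents are toy data.

```python
import sqlite3


class NoisyOr:
    """Aggregate computing 1 - prod(1 - p) over a group (independence assumed)."""

    def __init__(self):
        self.none_true = 1.0  # running product of (1 - p)

    def step(self, prob):
        self.none_true *= 1.0 - prob

    def finalize(self):
        return 1.0 - self.none_true


conn = sqlite3.connect(":memory:")
conn.create_aggregate("noisy_or", 1, NoisyOr)
conn.execute("CREATE TABLE y (a1 TEXT, a2 TEXT, a3 TEXT, prob REAL)")
conn.executemany(
    "INSERT INTO y VALUES (?, ?, ?, ?)",
    [("t1", "sec1", "d1", 0.5), ("t1", "sec2", "d1", 0.5), ("t1", "sec1", "d2", 0.2)],
)
# x = Project DISTINCT [$1,$3](y): drop column a2 and merge duplicate (a1, a3) pairs.
conn.execute("CREATE VIEW x AS SELECT a1, a3, noisy_or(prob) AS prob FROM y GROUP BY a1, a3")
rows = conn.execute("SELECT a1, a3, prob FROM x ORDER BY a3").fetchall()
print([(a1, a3, round(p, 2)) for a1, a3, p in rows])  # [('t1', 'd1', 0.75), ('t1', 'd2', 0.2)]
```

The two 0.5 tuples for d1 merge into 1 - 0.5 × 0.5 = 0.75, while d2 keeps its single probability.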
What's in the DB?
• Text-based ranking
  - term-doc-freq relations (inverted file)
    - One per language, stemming, section
  - Domain-independent, click and index
        T    D    f
        t0   d3   3
        t0   d5   10
        t1   d2   4
• Entity ranking
  - Probabilistic triples
  - Domain-aware
    - Needs supervised indexing
        subj     pred/attr   obj/value   p
        Arjen    speaks_to   you         0.95
        you      follow      Arjen       0.5
        speech   minutes     45          0.8
• Content-based (MM) retrieval
  - Feature vectors, click and index
        Img_id   f1     …   fN
        0        0.12   …   0.84
        1        0.54   …   0.31
        2        0.23   …   0.1
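As an illustration of text-based ranking computed from a term-doc-freq relation, here is BM25 (one of the models on the black-box slide) over the three (T, D, f) rows of the example. This is a sketch, not Spinque's formula: document lengths are derived by summing f per document, the idf is the non-negative Lucene-style variant, and k1 = 1.2, b = 0.75 are common defaults.

```python
import math
from collections import defaultdict

# The (T, D, f) rows from the slide.
TDF = [("t0", "d3", 3), ("t0", "d5", 10), ("t1", "d2", 4)]


def bm25(tdf, query, k1=1.2, b=0.75):
    """Score documents for `query` (a list of terms) from (term, doc, freq) triples."""
    doc_len = defaultdict(int)
    postings = defaultdict(dict)  # term -> {doc: freq}
    for term, doc, freq in tdf:
        doc_len[doc] += freq
        postings[term][doc] = freq
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query:
        docs = postings.get(term, {})
        df = len(docs)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc, freq in docs.items():
            norm = k1 * (1 - b + b * doc_len[doc] / avg_len)
            scores[doc] += idf * freq * (k1 + 1) / (freq + norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print([doc for doc, _ in bm25(TDF, ["t0"])])  # ['d5', 'd3']
```

Because the relation is just a table, the same computation could equally be written as a SQL view over it.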
VIEWS and TABLES
    CREATE VIEW/TABLE a AS SELECT … FROM term-doc … ;         (reads a stored relation)
    CREATE VIEW       b AS SELECT … FROM a WHERE a.x = u1 ;   (user parameter)
    CREATE VIEW/TABLE c AS SELECT … FROM a WHERE a.x = 42 ;   (no user parameter: pre-computable relation)
    CREATE VIEW       d AS SELECT … FROM b … ;
• BB content: sequence of VIEW definitions
• A VIEW is pre-computable when
  - All the relations addressed are pre-computable / stored
  - There is no dependency on user parameters
• Pre-computable VIEWs can become TABLEs (or MATERIALIZED VIEWs)
  - Query-independent computations are performed only once, then read from TABLEs at each query
  - Recognition of these patterns is fully automatic
  - Extends MonetDB's per-session caching to cross-session caching
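The pre-computability rule can be sketched as a recursive check over view dependencies. The dependencies below mirror the example views (a reads a stored relation, b filters on user parameter u1, c on the constant 42, d reads b); `STORED`, `VIEWS` and `precomputable` are hypothetical names.

```python
from functools import lru_cache

STORED = {"term_doc"}  # stored base relations
# view -> (relations it reads, whether it depends on a user parameter)
VIEWS = {
    "a": ({"term_doc"}, False),
    "b": ({"a"}, True),   # WHERE a.x = u1
    "c": ({"a"}, False),  # WHERE a.x = 42
    "d": ({"b"}, False),
}


@lru_cache(maxsize=None)
def precomputable(relation: str) -> bool:
    """Stored relations are pre-computable; a view is pre-computable if it has no
    user parameter and every relation it reads is pre-computable."""
    if relation in STORED:
        return True
    inputs, uses_user_param = VIEWS[relation]
    return not uses_user_param and all(precomputable(r) for r in inputs)


materialize = sorted(v for v in VIEWS if precomputable(v))
print(materialize)  # ['a', 'c']: these can become TABLEs; b and d stay VIEWs
```

Note that d has no user parameter of its own but still cannot be materialized, because it reads b.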
What Next?
Current Situation
    index ;            (schema definition)
    repeat {
        specify ;      (search & explore)
        retrieve
    } until ☺
Traditional Indexing
• Preprocessing largely determines how a search request will be processed
  - Especially regarding tokenization, stemming, etc.
• Fast and scalable, but inflexible
  - E.g., entity search hard-coded on top of the engine, advertisements matched on different data, etc.
Search by Strategy
• Flexible: generate an arbitrary engine on the fly
• Not as fast as highly optimized, very well engineered inverted-file-based systems
Desirable Situation (mixed initiative)
    repeat {
        index ;        (schema definition)
        specify ;      (search & explore)
        retrieve
    } until ☺
Non-Indexed Search
• Grep
  - Very flexible
    - I use it all the time on my mh mail folders when Gmail fails me!
  - Not scalable, little or no structure
Minimal Indexing
• How can we reduce the pre-processing needed to create a search engine over a new collection?
  - Can we do without a keyword index?
  - Can we avoid hardwired decisions for tokenization, language detection, stemming, …?
Suffix Array
• Pros:
  - Provides many core search functions: term statistics, keyword search, phrase search
  - No upfront tokenization needed (access at character level)
  - No upfront language detection needed
• Cons:
  - Difficult to build for large corpora
  - Expensive w.r.t. disk space
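A minimal Python sketch of the idea, for illustration only: the suffix array is built naively by sorting suffix start positions (far too slow and memory-hungry for large corpora, which is the first con above), and a pair of binary searches then finds every occurrence of an arbitrary character string, with no tokenization or language detection. The number of hits doubles as a term or phrase frequency.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of `pattern` in `text`, via two binary searches."""
    m = len(pattern)

    def bound(strict: bool) -> int:
        # First suffix whose m-character prefix is >= pattern (> pattern if strict).
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return sorted(sa[bound(False):bound(True)])


text = "patent pending; patents"
sa = build_suffix_array(text)
print(find(text, sa, "patent"))  # [0, 16]  (matches inside "patents" too)
print(find(text, sa, "en"))      # [3, 8, 19]  (sub-word fragments work as well)
```

Matching happens on raw characters, so the same index answers word, phrase and fragment queries.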
Demo
(Showed patent search demo)
“Real Code”
Patents on X by Y(y)
PRA
s__STRATEGY___filter_DOC_with_NE_nes =
Project [$2,$3](
 Join [$1 = $2](
    s__STRATEGY___clef_ip_patents_DATA_result,
    Project [$1,$3](
       Select [$2 = "ipcr-classification"](
          s__STRATEGY___clef_ip_patents_DATA_ne_doc
       )
    )
 )
);
CREATE TABLE s__STRATEGY___filter_DOC_with_NE_nes AS
   SELECT
    tmp_1814091754.a2 AS a1,
    tmp_1814091754.a3 AS a2,
    tmp_1814091754.prob AS prob
   FROM
   (
     SELECT
         s__STRATEGY___clef_ip_patents_DATA_result.a1 AS a1,
         tmp__1652836708.a1 AS a2,
         tmp__1652836708.a2 AS a3,
        s__STRATEGY___clef_ip_patents_DATA_result.prob
           * tmp__1652836708.prob AS prob
    FROM
        s__STRATEGY___clef_ip_patents_DATA_result,
        (
            SELECT
                 tmp_1444787941.a1 AS a1,
                 tmp_1444787941.a3 AS a2,
                 tmp_1444787941.prob AS prob
             FROM
                 (
                      SELECT
                          s__STRATEGY___clef_ip_patents_DATA_ne_doc.a1 AS a1,
                          s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2 AS a2,
                          s__STRATEGY___clef_ip_patents_DATA_ne_doc.a3 AS a3,
                          s__STRATEGY___clef_ip_patents_DATA_ne_doc.prob AS prob
                  FROM
                     s__STRATEGY___clef_ip_patents_DATA_ne_doc
                  WHERE
                     s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2
                             = 'ipcr-classification'
              ) AS tmp_1444787941
       ) AS tmp__1652836708
    WHERE
       s__STRATEGY___clef_ip_patents_DATA_result.a1
              = tmp__1652836708.a2
   ) AS tmp_1814091754
   ORDER BY a1
   WITH DATA;
info@spinque.com
    www.spinque.com
facebook.com/spinque

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Recently uploaded (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

How to build the next 1000 search engines?!

  • 1. How to build the next 1000 search engines?! Arjen P. de Vries arjen@acm.org Centrum Wiskunde & Informatica Delft University of Technology Spinque B.V.
  • 3. Search is everywhere. Yet it only works well on the web…
  • 4. Complications: heterogeneous data sources (WWW, Wikipedia, news, e-mail, patents, Twitter, personal information, …); varying result types (“Documents”, tweets, courses, people, experts, gene expressions, temperatures, …); multiple dimensions of relevance (topicality, recency, reading level, …).
  • 5. Complications: many search tasks require a mix within these dimensions (news and patents; companies and their CEOs; recent and on topic). Many search tasks also require a mix across these dimensions, e.g. patents assigned to our top 3 competitors in market segments mentioned in the recent press releases issued by our top 10 clients.
  • 6. The system's internal information representation: linguistic annotations (named entities, sentiment, dependencies, …); knowledge resources (Wikipedia, Freebase, ICD-9, IPTC, …); links to related documents (citations, URLs); anchors that describe the URI (anchor text); queries that lead to clicks on the URI (session, user, dwell time, …); tweets that mention the URI (time, location, user, …); other social media that describe the URI (user, rating; tag, organisation of ‘folksonomy’). + UNCERTAINTY ALL OVER!
  • 7. What goes in the black box? In: the document collection (plus anchors, entity types, sentiment, tweets, cited documents, …), the user and the context. Inside: BM25, BM25F, LM, RM, VSM, DFR, QIR?, learning to rank? Out: a ranked list of answers. (ECIR / CIKM / SIGIR / ICTIR / WSDM papers!)
  • 8. Rarely & scarcely addressed… Student: How do I build it? Professor: Who will build it for me? Last session of the conference…
  • 10. Parameterised Search System Cornacchia, De Vries, ECIR 2007 A Parametrised Search System
  • 11. Parameterised Search System. Can we not ‘remove’ this IR engineer (or scientist!) from the loop, just as DBMS software removes the data engineer from the loop? Cornacchia, De Vries, ECIR 2007, A Parametrised Search System. And three (four?) children, a startup and 5 years later: a PhD defense!
  • 12. Search by Strategy  Visually construct search strategies by connecting building blocks
  • 13.
  • 14. Search by Strategy: visually construct search strategies by connecting building blocks. Each block describes either data or actions upon that data. Connection points (“pins”) are typed: doc / sec / term / ne (named entity) / tuple. Actions are expressed as scripts (more on this later).
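The slide names the pin types but not how connections are checked. A minimal Python sketch, assuming each block has named, typed input pins and a single typed output pin, and that a connection is legal only when the types match exactly. The block names (`patents`, `rank_by_text`, `inventors_of`) are hypothetical and not taken from Spinque.

```python
from dataclasses import dataclass, field

# Pin types listed on the slide.
PIN_TYPES = {"doc", "sec", "term", "ne", "tuple"}


@dataclass
class Block:
    """A building block with named, typed input pins and one typed output pin (assumed model)."""
    name: str
    inputs: dict[str, str]   # input pin name -> pin type
    output: str              # output pin type

    def __post_init__(self):
        for pin_type in [*self.inputs.values(), self.output]:
            if pin_type not in PIN_TYPES:
                raise ValueError(f"unknown pin type: {pin_type}")


@dataclass
class Strategy:
    """A data-flow graph; an edge feeds one block's output into another block's input pin."""
    blocks: dict[str, Block] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, dst, dst pin)

    def add(self, block: Block) -> None:
        self.blocks[block.name] = block

    def connect(self, src: str, dst: str, pin: str) -> None:
        produced = self.blocks[src].output
        expected = self.blocks[dst].inputs[pin]
        if produced != expected:
            raise TypeError(f"{src} produces '{produced}', but {dst}.{pin} expects '{expected}'")
        self.edges.append((src, dst, pin))


# Hypothetical "from patent to inventor" strategy.
s = Strategy()
s.add(Block("patents", {}, "doc"))                                     # data block
s.add(Block("rank_by_text", {"docs": "doc", "query": "term"}, "doc"))  # action block
s.add(Block("inventors_of", {"docs": "doc"}, "ne"))                    # doc -> named entities
s.connect("patents", "rank_by_text", "docs")
s.connect("rank_by_text", "inventors_of", "docs")
```

Rejecting mismatched pins at connection time means a strategy that type-checks can be compiled without further runtime checks; connecting `inventors_of` (an `ne` output) to a `doc` pin raises `TypeError`.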
  • 16. From Patent to Inventor
  • 17. Reports Visits
  • 18. Generate Search Engine! Or, really, generate a REST API from the strategy specification!
  • 19. Demo (Showed demo of children's search engine)
  • 20. How Strategies Help. Strategies improve communication between search intermediary and user: they encapsulate domain expert knowledge, give an abstract representation of search expert knowledge, and let you analyze the information seeking process at any stage. Strategies facilitate knowledge management: store / share / publish / refine. Strategies mix exact (DB) and ranked (IR) searches, which avoids the need for “human (probabilistic) joins”.
  • 21.
  • 22. Search Intermediaries (ordered by task complexity): travel agency, real estate agents, recruiters, librarians, archivists, digital forensics detectives, patent information specialists.
  • 23. Exploratory Search: search & (faceted) browsing. Help discover schema, ontology, etc. Help discover the relevant sources, both within a collection (by year/location, by type, …) and across multiple collections (by source).
  • 24. Probabilistic faceted browsing: traditional (boolean filters) vs. probabilistic. Both panels show the same facets: Price (100K - 200K, 200K - 300K, 300K - 400K), Rooms (3, 4, 5), Size (100 - 150 m2, 150 - 200 m2, 200 - 250 m2).
      Traditional: good when the user knows exactly which filters to apply; will see perfect-match results; won't see “interesting” results.
      Probabilistic: good for exploratory search; will see perfect-match results; will also see “interesting” results.
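To make the contrast concrete, here is a minimal Python sketch. It assumes a selected facet range scores 1.0 for values inside it and decays exponentially with the distance outside it, and that an item's score is the product over facets. The decay shape, the scale values and the toy houses are illustrative assumptions, not the scoring the slide's system uses.

```python
import math


def facet_match(value, lo, hi, scale):
    """1.0 inside [lo, hi]; decays exponentially with distance outside it (assumed shape)."""
    if lo <= value <= hi:
        return 1.0
    distance = lo - value if value < lo else value - hi
    return math.exp(-distance / scale)


def boolean_filter(items, filters):
    """Traditional faceting: keep only items that satisfy every selected range."""
    return [it for it in items
            if all(lo <= it[f] <= hi for f, (lo, hi, _scale) in filters.items())]


def probabilistic_rank(items, filters):
    """Probabilistic faceting: keep every item, ranked by the product of its facet matches."""
    scored = []
    for it in items:
        p = 1.0
        for f, (lo, hi, scale) in filters.items():
            p *= facet_match(it[f], lo, hi, scale)
        scored.append((p, it))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


houses = [
    {"id": "A", "price": 180_000, "rooms": 4, "size": 140},
    {"id": "B", "price": 205_000, "rooms": 4, "size": 145},  # just over budget
    {"id": "C", "price": 390_000, "rooms": 5, "size": 240},
]
# facet -> (low, high, decay scale); ranges picked from the slide's facet values
filters = {"price": (100_000, 200_000, 20_000), "rooms": (4, 4, 1), "size": (100, 150, 25)}

print([h["id"] for h in boolean_filter(houses, filters)])
print([(round(p, 3), h["id"]) for p, h in probabilistic_rank(houses, filters)])
```

The boolean filter returns only house A. The probabilistic ranking still puts A first, but it also shows B, which misses the price range by only 5K: an “interesting” result that boolean filters hide.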
  • 25. Dynamic facets: pre-indexed vs. dynamic. Both panels show the same facets: Price (100K - 200K, 200K - 300K, 300K - 400K), Rooms (3, 4, 5), Size (100 - 150 m2, 150 - 200 m2, 200 - 250 m2).
      Pre-indexed: pre-defined ad-hoc indices intersected with the result set. Challenge: many indices to maintain.
      Dynamic: facets decided from the result set. Challenge: dynamically adapt granularity (different price ranges for villa/garage!). Challenge: heavy concurrent queries to the DB.
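A minimal Python sketch of dynamic facets, assuming facet counts are computed directly from the current result set and that numeric facets use k equal-width buckets over the range observed in that set. The bucketing policy and the toy garage/villa data are assumptions; the slide only names the granularity challenge.

```python
from collections import Counter


def categorical_facet(results, field):
    """Facet counts computed from the current result set only (no pre-built index)."""
    return Counter(r[field] for r in results if field in r)


def numeric_facet(results, field, k=3):
    """Split the range observed in the result set into k equal-width buckets (assumed policy).
    Returns a list of (bucket start, bucket end, count)."""
    values = [r[field] for r in results]
    lo, hi = min(values), max(values)
    if lo == hi:
        return [(lo, hi, len(values))]
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        counts[min(int((v - lo) / width), k - 1)] += 1   # the maximum goes in the last bucket
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


garages = [{"price": 15_000}, {"price": 22_000}, {"price": 30_000}]
villas = [{"price": 450_000, "rooms": 5}, {"price": 600_000, "rooms": 6},
          {"price": 900_000, "rooms": 5}]

print(numeric_facet(garages, "price"))   # buckets a few thousand wide
print(numeric_facet(villas, "price"))    # buckets a hundred thousand or more wide
print(categorical_facet(villas, "rooms"))
```

Because the buckets come from the result set, a garage query and a villa query get price ranges on very different scales without maintaining one index per granularity. The cost is that the facets must be recomputed for every query.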
  • 27. Limitations of Search & Browse: faceted exploration does not include joins. It cannot construct new data sources from existing ones! Only the pre-defined paths through the information space can actually be traversed.
  • 28. Who needs a Join? You!!! … whenever ‘relevance cues’ are typed: people (e.g., inventors), companies (e.g., assignees), categories (e.g., IPTC), time (e.g., expiry date), location (e.g., country); … or whenever multiple sources are to be combined, e.g., patents & news, patents & Wikipedia, …
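A minimal Python sketch of a join over typed, uncertain relevance cues, assuming each relation is a list of (tuple, probability) pairs and that joined probabilities multiply (independence). The relations, company names and probabilities are invented for illustration. The example answers "patents on X assigned to a top competitor", which faceted browsing alone cannot express.

```python
def prob_join(left, right, left_col, right_col):
    """Equi-join two probabilistic relations given as lists of (tuple, probability).
    Joined tuples are concatenated and their probabilities multiplied, which assumes
    the two events are independent."""
    return [(lt + rt, lp * rp)
            for lt, lp in left
            for rt, rp in right
            if lt[left_col] == rt[right_col]]


# Hypothetical relations, all uncertain:
on_topic = [(("p1",), 0.9), (("p2",), 0.6), (("p3",), 0.8)]      # P(patent is about X)
assigned_to = [(("p1", "Acme"), 1.0), (("p2", "Globex"), 0.7),
               (("p3", "Initech"), 1.0)]                          # extracted assignees
competitor = [(("Acme",), 0.9), (("Globex",), 0.5)]              # P(company is a top competitor)

patents_by_assignee = prob_join(on_topic, assigned_to, 0, 0)   # (patent, patent, company)
answers = prob_join(patents_by_assignee, competitor, 2, 0)     # ... + (company,)
for tup, p in sorted(answers, key=lambda a: a[1], reverse=True):
    print(tup[0], tup[2], round(p, 3))
```

Patent p3 drops out because its assignee is not a competitor. The uncertainty of every cue carries through to the final ranking instead of being resolved by a person doing the “human join”.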
  • 29. Patents on X by Y(y) by Y(y)
  • 30. Interactive Information Access. Feedback: interaction improves the information representation. Faceted browsing: interaction can let the user take over where the machine would fail. Search by Strategy: interaction can let the user take over where the system designer would fail.
  • 31. Conclusion: “No idealized one-shot search engine”. Empower the user!
  • 33. From Strategies to DB Queries:
      Strategy (Spinque: strategy): a data flow of building blocks, e.g. BB1(in1, in2, in3, u1, u2) with output out, feeding input in1 of BB2(in1).
      Query (Spinque: PRA): the strategy made operational, e.g. CREATE VIEW a AS SELECT ..; CREATE VIEW b AS SELECT ..; CREATE VIEW c AS SELECT ..
      Database (Spinque: RDBMS, MonetDB): a relational DB.
  • 34. Probabilistic Relational Algebra. PRA: probabilistic relational algebra (Fuhr and Roelleke, TOIS 2001). A strategy step such as
      x = Project DISTINCT [$1,$3](y);
    becomes SQL with explicit probabilities, executed on a relational DB:
      CREATE VIEW x AS SELECT a1, a3, 1-prod(1-prob) AS prob FROM y GROUP BY a1, a3;
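The slide's SQL relies on a `prod` aggregate. SQLite has none, so this minimal sketch uses Python's standard `sqlite3` module as a stand-in for MonetDB and registers a custom aggregate (the name `noisy_or` is our own) that computes `1-prod(1-prob)`. This is the probability that at least one of several independent duplicate tuples holds. The table contents are toy data, and the query is run directly instead of being wrapped in `CREATE VIEW`.

```python
import sqlite3


class NoisyOr:
    """SQLite aggregate for 1 - prod(1 - prob): the probability that at least one of
    several independent tuples holds, as in the slide's Project DISTINCT translation."""

    def __init__(self):
        self.all_false = 1.0          # running product of (1 - prob)

    def step(self, prob):
        self.all_false *= 1.0 - prob

    def finalize(self):
        return 1.0 - self.all_false


conn = sqlite3.connect(":memory:")
conn.create_aggregate("noisy_or", 1, NoisyOr)   # SQLite has no built-in prod()
conn.execute("CREATE TABLE y (a1 TEXT, a2 TEXT, a3 TEXT, prob REAL)")
conn.executemany("INSERT INTO y VALUES (?, ?, ?, ?)", [
    ("d1", "s1", "t1", 0.5),
    ("d1", "s2", "t1", 0.5),   # same (a1, a3) reached a second time
    ("d2", "s1", "t2", 0.3),
])

# x = Project DISTINCT [$1,$3](y), run as a plain query rather than CREATE VIEW
rows = conn.execute("""
    SELECT a1, a3, noisy_or(prob) AS prob
    FROM y
    GROUP BY a1, a3
    ORDER BY a1
""").fetchall()
print(rows)
```

The two 0.5 tuples for `(d1, t1)` merge into 1 - 0.5 × 0.5 = 0.75 instead of being counted twice or summed past 1.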
  • 35. What's in the DB?
      Text-based ranking: term-doc-freq relations (inverted file); one per language, stemming, section; domain-independent, click and index.
        T    D    f
        t0   d3   3
        t0   d5   10
        t1   d2   4
      Entity ranking: probabilistic triples; domain-aware; needs supervised indexing.
        subj     pred/attr   obj/value   p
        Arjen    speaks_to   you         0.95
        you      follow      Arjen       0.5
        speech   minutes     45          0.8
      Content-based (MM) retrieval: feature vectors, click and index.
        Img_id   f1     …   fN
        0        0.12   …   0.84
        1        0.54   …   0.31
        2        0.23   …   0.1
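A minimal Python sketch of the first relation on this slide: an inverted file stored as (term, doc, freq) rows rather than as a dedicated index structure. It assumes naive lower-case whitespace tokenisation. The slide keeps one such relation per language, stemming variant and section, and the toy documents are invented.

```python
from collections import Counter


def term_doc_freq(docs):
    """Build the (term, doc, freq) relation, i.e. an inverted file stored as rows.
    Tokenisation is a naive lower-case whitespace split (an assumption)."""
    rows = []
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            rows.append((term, doc_id, freq))
    return sorted(rows)


def postings(rows, term):
    """All (doc, freq) pairs for one term, like reading one inverted-file list."""
    return [(d, f) for t, d, f in rows if t == term]


docs = {"d1": "Search engines search", "d2": "database engines"}
tdf = term_doc_freq(docs)
print(tdf)
print(postings(tdf, "engines"))
```

Once the index is just a relation, ranking functions become joins and aggregates over it. A real system would use an indexed table rather than the linear scan in `postings`.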
  • 36. VIEWS and TABLES. A building block's content is a sequence of VIEW definitions, e.g.:
      CREATE TABLE a AS SELECT … FROM term-doc … ;        (VIEW turned into TABLE: reads only a stored relation)
      CREATE VIEW b AS SELECT … FROM a WHERE a.x = u1 ;   (depends on user parameter u1)
      CREATE TABLE c AS SELECT … FROM a WHERE a.x = 42 ;  (VIEW turned into TABLE: no user parameter)
      CREATE VIEW d AS SELECT … FROM b … ;
    A VIEW is pre-computable when all the relations it addresses are pre-computable or stored, and it has no dependency on user parameters. Pre-computable VIEWs can become TABLEs (or MATERIALIZED VIEWs): query-independent computations are performed only once, then read from TABLEs at each query. Recognition of these patterns is fully automatic, and extends MonetDB's per-session caching to across-sessions caching.
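The pre-computability rule on this slide is a single forward pass over the view definitions. A minimal Python sketch, assuming each view is described only by the relations it reads and whether it reads a user parameter, listed in definition order. The dependency encoding and the name `term_doc` (for the slide's stored term-doc relation) are assumptions for illustration.

```python
def precomputable(views, stored):
    """Return the names of views that can be materialised once, as TABLEs.

    `views` maps a view name to (relations it reads, reads_user_parameter) and must be in
    definition order, as in a building block's sequence of VIEW definitions. A view is
    pre-computable when it reads no user parameter and every relation it reads is stored
    or itself pre-computable."""
    available = set(stored)
    result = []
    for name, (reads, uses_param) in views.items():
        if not uses_param and all(r in available for r in reads):
            available.add(name)
            result.append(name)
    return result


# The four views a, b, c, d from the slide.
views = {
    "a": (["term_doc"], False),
    "b": (["a"], True),    # WHERE a.x = u1 (user parameter)
    "c": (["a"], False),   # WHERE a.x = 42 (constant)
    "d": (["b"], False),   # reads b, which depends on u1
}
pre = precomputable(views, {"term_doc"})
for name in views:
    print(name, "TABLE" if name in pre else "VIEW")
```

View `d` has no user parameter of its own, but it stays a VIEW because it reads `b`. Propagating the property through dependencies is what makes the recognition automatic.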
  • 38. Current Situation:
      index ;            (schema definition)
      repeat {
        specify ;
        retrieve         (search & explore)
      } until
  • 39. Traditional Indexing: preprocessing determines to a large extent how a search request will be processed, especially regarding tokenization, stemming, etc. Fast and scalable, but inflexible: e.g., entity search is hard-coded on top of the engine, advertisements are matched on different data, etc.
  • 40. Search by Strategy: flexible, as it generates an arbitrary engine on the fly, but not as fast as highly optimized and very well engineered inverted-file-based systems.
  • 41. Desirable Situation (mixed initiative):
      repeat {
        index ;          (schema definition)
        specify ;
        retrieve         (search & explore)
      } until
  • 42. Non-Indexed Search: grep. Very flexible (I use it all the time on my MH mail folders when Gmail fails me!), but not scalable, with little or no structure.
  • 43. Minimal Indexing: how can we reduce the pre-processing necessary to create a search engine over a new collection? Can we do without a keyword index? Can we avoid hardwired decisions for tokenization, language detection, stemming, …?
  • 44. Suffix Array. Pros: provides many core search functions (term statistics, keyword search, phrase search); no upfront tokenization needed (access at character level); no upfront language detection needed. Cons: difficult to build for large corpora; expensive w.r.t. disk space.
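A minimal Python sketch of the suffix array properties listed above. Phrase search and term frequencies both come from two binary searches over character positions, with no tokenizer or language detector involved. The construction is deliberately naive, which illustrates the listed drawbacks: it is slow and memory-hungry, and dedicated construction algorithms are needed for real corpora.

```python
def build_suffix_array(text):
    """Start positions of all suffixes, sorted lexicographically by suffix.
    Naive construction: fine for a demo, far too slow and memory-hungry for large corpora."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find(text, sa, pattern):
    """Start positions of every occurrence of `pattern`, via two binary searches.
    Works on raw characters: no tokenisation or language detection beforehand."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:                      # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "to be or not to be"
sa = build_suffix_array(text)
print(find(text, sa, "to be"))        # phrase search
print(len(find(text, sa, "be")))      # term statistic: frequency of "be"
```

Because matching is on characters, a search for "be" would also hit a word like "bee". Word boundaries, like stemming, become a query-time decision instead of an indexing-time one. The array holds one integer per character of the collection, which is where the disk-space cost comes from.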
  • 45.
  • 48. Patents on X by Y(y) by Y(y)
  • 49. PRA
      s__STRATEGY___filter_DOC_with_NE_nes =
        Project [$2,$3](
          Join [$1 = $2](
            s__STRATEGY___clef_ip_patents_DATA_result,
            Project [$1,$3](
              Select [$2 = "ipcr-classification"](
                s__STRATEGY___clef_ip_patents_DATA_ne_doc
              )
            )
          )
        );
  • 50.
      CREATE TABLE s__STRATEGY___filter_DOC_with_NE_nes AS
      SELECT tmp_1814091754.a2 AS a1,
             tmp_1814091754.a3 AS a2,
             tmp_1814091754.prob AS prob
      FROM (
        SELECT s__STRATEGY___clef_ip_patents_DATA_result.a1 AS a1,
               tmp__1652836708.a1 AS a2,
               tmp__1652836708.a2 AS a3,
               s__STRATEGY___clef_ip_patents_DATA_result.prob * tmp__1652836708.prob AS prob
        FROM s__STRATEGY___clef_ip_patents_DATA_result,
             (
               SELECT tmp_1444787941.a1 AS a1,
                      tmp_1444787941.a3 AS a2,
                      tmp_1444787941.prob AS prob
               FROM (
                 SELECT s__STRATEGY___clef_ip_patents_DATA_ne_doc.a1 AS a1,
                        s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2 AS a2,
                        s__STRATEGY___clef_ip_patents_DATA_ne_doc.a3 AS a3,
                        s__STRATEGY___clef_ip_patents_DATA_ne_doc.prob AS prob
                 FROM s__STRATEGY___clef_ip_patents_DATA_ne_doc
                 WHERE s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2 = 'ipcr-classification'
               ) AS tmp_1444787941
             ) AS tmp__1652836708
        WHERE s__STRATEGY___clef_ip_patents_DATA_result.a1 = tmp__1652836708.a2
      ) AS tmp_1814091754
      ORDER BY a1
      WITH DATA;
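Slides 49 and 50 show a PRA expression and the SQL generated from it. Below is a minimal Python sketch of such a compiler for just Select, Project (duplicates kept, no DISTINCT) and Join. It assumes every relation has columns a1..aN plus prob, and that joined probabilities multiply. It mirrors the nesting of the generated SQL, not its exact text. Table names are shortened, and the ne_doc layout (entity, type, doc) is inferred from the expression. The rows are invented, and SQLite stands in for MonetDB.

```python
import itertools
import sqlite3

_alias = itertools.count()


def to_sql(expr):
    """Compile a tiny PRA subset to SQL; returns (query, arity).
    Every relation has columns a1..aN plus prob."""
    kind = expr[0]
    if kind == "rel":                                    # ("rel", table, arity)
        _, table, n = expr
        cols = ", ".join(f"a{i} AS a{i}" for i in range(1, n + 1))
        return f"SELECT {cols}, prob AS prob FROM {table}", n
    t = f"tmp_{next(_alias)}"
    if kind == "select":                                 # Select [$col = "value"](sub)
        _, col, value, sub = expr
        q, n = to_sql(sub)
        cols = ", ".join(f"{t}.a{i} AS a{i}" for i in range(1, n + 1))
        literal = "'" + value.replace("'", "''") + "'"
        return f"SELECT {cols}, {t}.prob AS prob FROM ({q}) AS {t} WHERE {t}.a{col} = {literal}", n
    if kind == "project":                                # Project [$i,...](sub), duplicates kept
        _, keep, sub = expr
        q, _n = to_sql(sub)
        cols = ", ".join(f"{t}.a{c} AS a{k}" for k, c in enumerate(keep, 1))
        return f"SELECT {cols}, {t}.prob AS prob FROM ({q}) AS {t}", len(keep)
    if kind == "join":                                   # Join [$l = $r](left, right)
        _, lcol, rcol, left, right = expr
        (lq, ln), (rq, rn) = to_sql(left), to_sql(right)
        r = f"tmp_{next(_alias)}"
        cols = [f"{t}.a{i} AS a{i}" for i in range(1, ln + 1)]
        cols += [f"{r}.a{i} AS a{ln + i}" for i in range(1, rn + 1)]
        return (f"SELECT {', '.join(cols)}, {t}.prob * {r}.prob AS prob "
                f"FROM ({lq}) AS {t}, ({rq}) AS {r} WHERE {t}.a{lcol} = {r}.a{rcol}"), ln + rn
    raise ValueError(f"unsupported PRA operator: {kind}")


# The slide-49 strategy, with shortened table names.
strategy = ("project", [2, 3],
            ("join", 1, 2,
             ("rel", "result", 1),
             ("project", [1, 3],
              ("select", 2, "ipcr-classification", ("rel", "ne_doc", 3)))))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (a1 TEXT, prob REAL)")                    # ranked patents
conn.execute("CREATE TABLE ne_doc (a1 TEXT, a2 TEXT, a3 TEXT, prob REAL)")  # (entity, type, doc)
conn.executemany("INSERT INTO result VALUES (?, ?)", [("EP1", 0.8), ("EP2", 0.5)])
conn.executemany("INSERT INTO ne_doc VALUES (?, ?, ?, ?)", [
    ("H04L", "ipcr-classification", "EP1", 1.0),
    ("G06F", "ipcr-classification", "EP2", 0.9),
    ("Acme", "assignee", "EP1", 1.0),
])
sql, arity = to_sql(strategy)
rows = conn.execute(sql + " ORDER BY a1").fetchall()
print(rows)
```

Every operator wraps its input in a fresh aliased subquery, which is why generated SQL like slide 50's nests so deeply and uses machine-made names such as `tmp_1444787941`. The database optimizer is left to flatten the nesting.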
  • 51. info@spinque.com www.spinque.com facebook.com/spinque

Editor's Notes

  1. Does “Entity-based ranking” make sense?
  2. NOTE: MATERIALIZED VIEWs, where supported (not in MonetDB), can be used instead of TABLEs when the stored relations (index) are expected to be updated.