Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise

Andrzej Białecki
ab@lucidimagination.com
About the speaker
§  Started using Lucene in 2003 (1.2-dev…)
§  Created Luke – the Lucene Index Toolbox
§  Apache Nutch, Hadoop, Solr committer, Lucene PMC member
§  Apache Nutch PMC Chair
§  LucidWorks Enterprise developer
Agenda
§  Click-through concepts
§  Apache Solr click-through scoring
  •  Model
  •  Integration options
§  LucidWorks Enterprise
  •  Click Scoring Framework
  •  Unsupervised feedback
Click-through concepts
Improving relevance of top-N hits
§  N < 10, first page counts the most
   •  N = 3, first three results count the most
§  Many techniques available in Solr / Lucene
   •  Indexing-time
      §  text analysis, morphological analysis, synonyms, ...
   •  Query-time
      §  boosting, rewriting, synonyms, DisMax, function queries …
   •  Editorial ranking (QueryElevationComponent)
§  No direct feedback from users on relevance ☹
§  What user actions do we know about?
   •  Search, navigation, click-through, other actions…
Query log and click-through events
Click-through: a user selects an item at a certain position among a list of results for a query
§  Why this information may be useful
   •    “Indicates” user's interest in a selected result
   •    “Implies” that the result is relevant to the query
   •    “Significant” when low-ranking results selected
   •    “May be” considered as user's implicit feedback
§  Why this information may be useless
   •  Many strong assumptions about user’s intent
   •  “Average user’s behavior” could be a fiction
§  “Careful with that axe, Eugene”
Click-through in context
§  Query log, click positions and click intervals provide context
§  Source of spell-checking data
   •  Query reformulation until a click event occurs
§  Click events per user – total or during a session
   •  Building a user profile (e.g. topics of interest)
§  Negative click events
   •  User did NOT click the top 3 results è demote?
§  Clicks of all users for an item (or a query, or both)
   •  Item popularity or relevance to queries
§  Goal: analysis and modification of result ranking
Click to add title…
§    Clicking through == adding labels!
§    Collaborative filtering, recommendation system
§    Topic discovery & opinion mining
§    Tracking the topic / opinion drift over time
§    Click-stream is sparse and noisy – caveat emptor
      •  Changing intent – “hey, this reminds me of smth…”
      •  Hidden intent – remember the “miserable failure”?
      •  No intent at all – “just messing around”
What’s in the click-through data?
§  Query log, with unique id=f(user,query,time)!
   •  User id (or group)
   •  Query (+ facets, filters, origin, etc.)
   •  Number of returned results
   •  Context (suggestions, autocomplete, “more like this” terms …)
§  Click-through log
   •  Query id, document id, click position & click timestamp
§  What data would we like to get?
   •  Map of docId =>
       §  Aggregated queries, aggregated users
       §  Weight factor f(clickCount, positions, intervals)
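The correlation step described above, which joins the click log with the query log on the query id to build the map of docId => aggregated data, can be sketched in Python. This is a minimal illustration, not LucidWorks code. The tuple layouts of both logs, the SHA-1-based `query_id` helper, and the `DocClickData` and `aggregate` names are all assumptions. The weight factor itself is left out: the aggregate only keeps the raw inputs such a factor would need.

```python
import hashlib
from collections import Counter, defaultdict
from dataclasses import dataclass, field


def query_id(user: str, query: str, timestamp: int) -> str:
    """Hypothetical unique query id = f(user, query, time)."""
    key = f"{user}\x00{query}\x00{timestamp}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:16]


@dataclass
class DocClickData:
    """Aggregated click-through data for one document (assumed layout)."""
    queries: Counter = field(default_factory=Counter)  # query text -> clicks
    users: set = field(default_factory=set)            # users who clicked
    positions: list = field(default_factory=list)      # ranks that were clicked
    timestamps: list = field(default_factory=list)     # click times


def aggregate(query_log, click_log) -> dict:
    """Join the click log with the query log on the query id.

    query_log: iterable of (query_id, user, query, num_results)
    click_log: iterable of (query_id, doc_id, position, timestamp)
    Returns a dict docId -> DocClickData.
    """
    queries = {qid: (user, q) for qid, user, q, _num in query_log}
    per_doc = defaultdict(DocClickData)
    for qid, doc_id, position, ts in click_log:
        if qid not in queries:  # click without a logged query: skip it
            continue
        user, q = queries[qid]
        data = per_doc[doc_id]
        data.queries[q] += 1
        data.users.add(user)
        data.positions.append(position)
        data.timestamps.append(ts)
    return dict(per_doc)
```

Orphan clicks are dropped rather than guessed at. In a real pipeline you would probably want to count them as a data-quality signal.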
Other aggregations / reports
§  User profiles
   •  Document types / categories viewed most often
   •  Population profile for a document
   •  User’s sophistication, education level, locations, interests, vices … (scary!)
§  Query re-formulations
   •  Spell-checking or “did you mean”
§  Corpus of the most useful queries
   •  Indicator for caching of results and documents
§  Zeitgeist – general user interest over time
Documents with click-through data
   Original document:
      -  documentWeight
      -  field1 : weight1
      -  field2 : weight2
      -  field3 : weight3

   Document with click-through data:
      -  documentWeight
      -  field1 : weight1
      -  field2 : weight2
      -  field3 : weight3
      -  labels : weight4
      -  users : weight5


§  Modified document and field weights
§  Added / modified fields
   •  Top-N labels aggregated from successful queries
   •  User “profile” aggregated from click-throughs
§  Changes over time as new clicks arrive
Desired effects
§  Improvement in relevance of top-N results
  •  Non-query specific:
     f(clickCount)         (or “popularity”)
  •  Query-specific:
     f([query] Ÿ [labels])
  •  User-specific (personalized ranking):
     f([userProfile] Ÿ [docProfile])
§  Observed phenomena
   •  Top-10 better matches user expectations
   •  Inversion of ranking (oft-clicked > TF-IDF)
   •  Positive feedback
      clicked -> highly ranked -> clicked -> even higher ranked …
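Here `[a] · [b]` is read as a dot product of sparse term -> weight vectors; that is our interpretation of the slide notation. Under that reading, the three kinds of factors could look roughly like the sketch below. The function names are hypothetical, and `log1p` is only one possible choice for f(clickCount).

```python
import math


def popularity(click_count: int) -> float:
    """Non-query-specific f(clickCount); log1p grows sub-linearly."""
    return math.log1p(click_count)


def dot(a: dict, b: dict) -> float:
    """Dot product of two sparse term -> weight vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def query_specific(query_terms: list, labels: dict) -> float:
    """f([query] . [labels]): query terms (weight 1.0) against click labels."""
    return dot({t: 1.0 for t in query_terms}, labels)


def user_specific(user_profile: dict, doc_profile: dict) -> float:
    """f([userProfile] . [docProfile]): personalized ranking factor."""
    return dot(user_profile, doc_profile)
```

For example, `query_specific(["luke", "solr"], {"luke": 0.5, "stempel": 0.3})` only credits the label that the query actually mentions.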
Undesired effects
§  Unbounded positive feedback
   •  Top-10 dominated by popular but irrelevant
      results, self-reinforcing due to user expectations
      about the Top-10 results
§  Everlasting effects of past click-storms
   •  Top-10 dominated by old documents once
      extremely popular for no longer valid reasons
§  Off-topic (noisy) labels
§  Conclusions:
   •  f(click data) should be sub-linear
   •  f(click data, time) should discount older clicks
   •  f(click data) should be sanitized and bounded
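One way to satisfy all three conclusions at once is sketched below in Python. Each click is discounted exponentially by its age. The discounted total x is then squashed with x / (x + k), which is sub-linear and never exceeds `cap`. The function, its parameters and their defaults are illustrative assumptions. Discarding negative ages (e.g. caused by clock skew) stands in for sanitizing.

```python
def click_factor(click_ages_days, half_life_days=30.0, k=5.0, cap=1.0) -> float:
    """Bounded, sub-linear, time-discounted click factor (illustrative).

    click_ages_days: age of each click in days (0 = today).
    A click that is half_life_days old counts half as much as a fresh one.
    """
    # Time discount; negative ages are treated as bad data and dropped.
    x = sum(0.5 ** (age / half_life_days) for age in click_ages_days if age >= 0)
    # x / (x + k) is concave (sub-linear) and stays below 1.0, so the result
    # is bounded by cap no matter how large a click-storm gets.
    return cap * x / (x + k)
```

The parameter `k` sets how many fresh clicks it takes to reach half of `cap`. A larger `k` makes the factor react more slowly to bursts.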
Implementation
Click-through scoring in Solr
§  Not out of the box – you need:
   •    A component to log queries
   •    A component to record click-throughs
   •    A tool to correlate and aggregate the logs
   •    A tool to manage click-through history


§  …let’s (conveniently) assume the above is handled by a user-facing app… and that we have the resulting map of docId => click data

§  How to integrate this map into a Solr index?
Via ExternalFileField
§  Pros:
   •  Simple to implement
   •  Easy to update – no need to do full re-indexing
      (just core reload)
§  Cons:
   •  Only docId => field : boost
   •  No user-generated labels attached to docs L L
§  Still useful if a simple “popularity” metric is
    sufficient
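Solr's ExternalFileField reads its values from a file named `external_<fieldname>` in the index data directory. The file holds one `key=value` line per document, where the key is the field type's `keyField`. Below is a sketch of exporting the docId => boost map in that format. The helper names are hypothetical, as is the example field name `click`.

```python
import os


def format_external_file(boosts: dict) -> str:
    """Render docId => boost pairs as ExternalFileField key=value lines."""
    return "".join(f"{doc_id}={boost:.6f}\n"
                   for doc_id, boost in sorted(boosts.items()))


def write_external_file(data_dir: str, field_name: str, boosts: dict) -> str:
    """Write external_<field_name> into data_dir and return its path.

    Content goes to a temporary file first and is then renamed, so a core
    reload during the export never reads a half-written file. The temporary
    name deliberately does not start with "external_".
    """
    path = os.path.join(data_dir, f"external_{field_name}")
    tmp = os.path.join(data_dir, f".tmp_{field_name}")
    with open(tmp, "w", encoding="utf-8") as out:
        out.write(format_external_file(boosts))
    os.replace(tmp, path)  # atomic rename when both are on the same filesystem
    return path
```

For example, `write_external_file(data_dir, "click", boosts)` would produce `external_click`. As the slide says, the core still has to be reloaded before Solr uses the new values.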
Via full re-index
§  If the corpus is small, or click data updates are infrequent… just re-index everything
§  Pros:
   •  Relatively easy to implement – join source docs
      and click data by docId + reindex
   •  Allows adding all click data, including labels as
      searchable text
§  Cons:
   •  Infeasible for larger corpora or frequent updates,
      time-wise and cost-wise
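For the full re-index route, the join by docId might look like the sketch below. The `click_boost` and `click_labels` field names and the shape of the click data are assumptions. Sending the merged documents to Solr is left to whatever indexing client you already use.

```python
def join_for_reindex(source_docs, click_data):
    """Attach click data to source documents by docId for a full re-index.

    source_docs: iterable of dicts, each with an "id" key.
    click_data: dict docId -> {"boost": float, "labels": list of str}
    Yields new dicts; documents without click data get neutral defaults.
    """
    for doc in source_docs:
        clicks = click_data.get(doc["id"], {})
        merged = dict(doc)  # do not mutate the caller's document
        merged["click_boost"] = clicks.get("boost", 0.0)
        # Labels become searchable text, which ExternalFileField cannot offer.
        merged["click_labels"] = list(clicks.get("labels", []))
        yield merged
```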
Via incremental field updates




§  Oops! Under construction, come back later…

§  … much later …
  •  Some discussions on the mailing lists
  •  No implementation yet, design in flux
Via ParallelReader
   [Figure: click data (c1, c2, ...) is kept per document in D1 to D6 order, while the main index (f1, f2, ...) holds the same documents in its own internal order 1: D4, 2: D2, 3: D6, 4: D1, 5: D3, 6: D5; the click data is rebuilt in that internal order so both indexes line up document by document]


§  Pros:
      •  All click data (e.g. searchable labels) can be added
§  Cons:
      •  Complicated and fragile (rebuild on every update)
              §  Though only the click index needs a rebuild
      •  No tools to manage this parallel index in Solr
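Most of the fragility comes from alignment. Lucene's ParallelReader requires all parallel indexes to contain the same number of documents, added in the same order, because it pairs them by internal document number. Below is a sketch of rebuilding the click rows in the main index's order. The helper name and the row representation are hypothetical.

```python
def align_click_rows(main_index_order, click_data, empty=None):
    """Return click rows in the main index's internal document order.

    main_index_order: docIds listed by internal doc number (0, 1, 2, ...).
    click_data: dict docId -> click row (e.g. a dict of click fields).
    Never-clicked documents still get a row (`empty`), so both indexes end
    up with exactly the same number of documents. Also returns the docIds
    that have click data but no longer exist in the main index.
    """
    rows = [click_data.get(doc_id, empty) for doc_id in main_index_order]
    known = set(main_index_order)
    orphans = sorted(doc_id for doc_id in click_data if doc_id not in known)
    return rows, orphans
```

Any change to the main index's document order (new documents, merges, deletions) invalidates the rows, which is why the click index must be rebuilt on every update.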
LucidWorks Enterprise implementation
Click Scoring Framework
§  LucidWorks Enterprise feature
§  Click-through log collection & analysis
   •  Query logs and click-through logs (when using Lucid's search UI)
   •  Analysis of click-through events
   •  Maintenance of historical click data
   •  Creation of a query phrase dictionary (-> autosuggest)
§  Modification of ranking based on click events:
   •  Modifies query rewriting & field boosts
   •  Adds top query phrases associated with a document
   Example: http://getopt.org/   0.13   luke:0.5,stempel:0.3,murmur:0.2
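The sample line above appears to hold a document id, its boost factor, and its top query phrases with weights. That matches the "boost factor plus top phrases per document" format mentioned later in the deck. Assuming whitespace-separated columns and comma-separated `phrase:weight` pairs (our reading, not a documented spec), the line could be parsed like this:

```python
from typing import NamedTuple


class ClickRecord(NamedTuple):
    doc_id: str
    boost: float
    phrases: dict  # phrase -> weight


def parse_click_line(line: str) -> ClickRecord:
    """Parse '<docId> <boost> [phrase:weight,phrase:weight,...]'.

    Assumes phrases contain no commas. A phrase may contain spaces, so only
    the first two columns are split on whitespace, and each pair is split
    on its last ':' only.
    """
    parts = line.split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"malformed click data line: {line!r}")
    doc_id, boost = parts[0], float(parts[1])
    phrases = {}
    if len(parts) == 3:
        for item in parts[2].split(","):
            phrase, sep, weight = item.strip().rpartition(":")
            if sep:  # ignore items without a weight
                phrases[phrase] = float(weight)
    return ClickRecord(doc_id, boost, phrases)
```

The document id itself may contain colons (as in the URL above), which is harmless because it is taken from the first column before any `:` splitting happens.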
Aggregation of click events
§  Relative importance of clicks:
   •  Clicks on lower-ranking documents are more important
      §  Plateau after the second page
   •  The more clicks, the more important a document
      §  Sub-linear to counter click-storms
   •  “Reading time” weighting factor
      §  Intervals between clicks on the same result list
§  Association of query terms with target document
   •  Top-N successful queries considered
   •  Top-N frequent phrases (shingles) extracted from
      queries, sanitized
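The weighting rules above can be sketched as follows. All constants and function names are illustrative assumptions, not LucidWorks defaults. That includes the plateau at rank 20 (end of page two at 10 results per page), the 5 to 120 second reading-time window, and the tiny stopword list.

```python
import math
from collections import Counter

PLATEAU_RANK = 20  # end of page 2, assuming 10 results per page
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "or", "to", "in"})


def position_weight(position: int) -> float:
    """Clicks on lower-ranked results count more; flat after PLATEAU_RANK.

    position is 1-based; rank 1 gets weight 1.0.
    """
    return 1.0 + math.log(min(position, PLATEAU_RANK))


def reading_time_weight(seconds_to_next_click, min_s=5.0, max_s=120.0) -> float:
    """'Reading time' factor from the interval until the next click on the
    same result list: quick bounces count less, long reads saturate at 1.0."""
    if seconds_to_next_click is None:  # last click on the list: no interval
        return 1.0
    return max(min_s, min(seconds_to_next_click, max_s)) / max_s


def doc_click_score(clicks) -> float:
    """clicks: (position, seconds_to_next_click) pairs for one document.
    log1p keeps the score sub-linear in the weighted click count."""
    return math.log1p(sum(position_weight(p) * reading_time_weight(t)
                          for p, t in clicks))


def shingles(query: str, max_n: int = 3) -> list:
    """Word n-grams of length 1..max_n from a lower-cased query."""
    words = query.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def top_phrases(successful_queries, n: int = 3) -> list:
    """Top-n most frequent shingles; bare stopwords are dropped (sanitizing)."""
    counts = Counter(s for q in successful_queries for s in shingles(q)
                     if s not in STOPWORDS)
    return [phrase for phrase, _ in counts.most_common(n)]
```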
Aggregation of click-through history
§  Needs to reflect document popularity over time
   •  Should react quickly to bursts (topics of the day)
   •  Has to avoid documents being “stuck” at the top
      due to past popularity
§  Solution: half-life decay model
   •  Adjustable period & rate
   •  Adjustable length of history (affects smoothing)
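A half-life decay model with an adjustable rate and history length could be computed as below. This assumes click counts are bucketed into fixed periods, for example one per scheduled analysis run. The function names and parameters are hypothetical. The incremental form avoids keeping the whole history: age the previous value by one period, then add the newest count.

```python
def decayed_history(period_counts, half_life_periods=4.0, max_periods=None):
    """Aggregate per-period click counts with half-life decay.

    period_counts: click counts per period, most recent last.
    half_life_periods: after this many periods a click's weight halves
        (the adjustable decay rate).
    max_periods: keep only the most recent periods (the adjustable history
        length); None keeps everything.
    """
    if max_periods is not None:
        period_counts = period_counts[-max_periods:] if max_periods > 0 else []
    n = len(period_counts)
    return sum(count * 0.5 ** ((n - 1 - i) / half_life_periods)
               for i, count in enumerate(period_counts))


def decayed_update(previous, newest_count, half_life_periods=4.0):
    """Incremental form: age the previous value by one period, add the newest."""
    return previous * 0.5 ** (1.0 / half_life_periods) + newest_count
```

A short half-life reacts quickly to bursts but forgets them just as fast. A longer history mainly smooths the curve, because very old periods contribute almost nothing anyway.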
   [Figure: half-life decay of click history; x-axis: time]
Click scoring in practice
§  Query log and click log generated by the LucidWorks search UI
   •  Logs and intermediate data files in plain text, with well-documented formats and locations
§  Scheduled click-through analysis activity
§  Final click data in open formats
   •  Boost factor plus top phrases per document (plain text)
§  Click data is integrated with the main index
   •  No need to re-index the main corpus (ParallelReader trick)
      §  Where are the incremental field updates when you need them?!!!
   •  Also works with Solr replication (rsync or Java)
Click Scoring – added fields
§  Fields added to the main index
   •  click – a field with a constant value of 1, but with a boost relative to the aggregated click history
      §  Indexed, with norms
   •  click_val – a “string” (not analyzed) field containing the numerical value of the boost
      §  Stored, indexed, not analyzed
   •  click_terms – top-N terms and phrases from queries that caused click events on this document
      §  Stored, indexed and analyzed
Click scoring – query modifications
§  Using click in queries (or DisMax’s bq)
  •  Constant term “1” with boost value
  •  Example: term1 OR click:1
§  Using click_val in function queries
  •  Floating point boost value as a string
  •  Example: term1 OR _val_:click_val
§  Using click_terms in queries (e.g. DisMax)
  •  Add click_terms to the list of query fields (qf)
     in DisMax handler (default in /lucid)
  •  Matches on click_terms are scored like matches on any other field
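The three options above translate into request parameters roughly as in this sketch. The `title body` query fields are placeholders for your own schema, and the helper itself is hypothetical. Passing `click:1` as `bq` is one way to apply the click boost under DisMax, as the slide suggests.

```python
from urllib.parse import urlencode


def click_query_params(user_query: str, mode: str) -> str:
    """Return a Solr query string for one of the click-scoring options."""
    if mode == "click":        # constant term "1", scored via its index-time boost
        params = {"q": f"{user_query} OR click:1"}
    elif mode == "click_val":  # function query on the boost value
        params = {"q": f"{user_query} OR _val_:click_val"}
    elif mode == "dismax":     # click_terms as one more query field, plus bq
        params = {"defType": "dismax", "q": user_query,
                  "qf": "title body click_terms", "bq": "click:1"}
    else:
        raise ValueError(f"unknown mode: {mode}")
    return urlencode(params)
```

Usage: append the returned string to a select URL, e.g. `.../select?` + `click_query_params("solr", "dismax")`.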
Click Scoring – impact
§  Configuration options of the click analysis tools
   •  max normalization
      §  The highest click boost value becomes 1; all other values are proportionally lower
      §  Controlled maximum impact on any given result list
   •  total normalization
      §  The total of all boosts is constant
      §  Limits the total impact of click scoring on all result lists
   •  raw – whatever value is in the click data
§  Controlled impact is the key to improving the top-N results
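The three normalization modes can be sketched like this; the function name and signature are assumptions. With `max`, the largest boost is always 1 wherever it occurs. With `total`, the boosts always add up to the same constant, so a click-storm on one document leaves less boost for all the others.

```python
def normalize(boosts: dict, method: str = "max", total: float = 1.0) -> dict:
    """Apply one of the click-boost normalization modes.

    max:   the largest boost becomes 1.0, the rest scale proportionally
    total: boosts are rescaled so that they always sum to `total`
    raw:   values are returned unchanged
    """
    if method == "raw" or not boosts:
        return dict(boosts)
    if method == "max":
        divisor = max(boosts.values())
    elif method == "total":
        divisor = sum(boosts.values()) / total
    else:
        raise ValueError(f"unknown normalization: {method}")
    if divisor <= 0:
        return dict(boosts)  # nothing sensible to scale by
    return {doc: value / divisor for doc, value in boosts.items()}
```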
LucidWorks Enterprise – Unsupervised Feedback
Unsupervised feedback
§  LucidWorks Enterprise feature
§  Unsupervised – no need to train the system
§  Enhances quality of top-N results
   •  Well-researched topic
   •  Several strategies for keyword extraction and combining with the original query
§  Automatic feedback loop:
   •  Submit the original query and take the top 5 docs
   •  Extract some keywords (“important” terms)
   •  Combine the original query with the extracted keywords
   •  Submit the modified query & return results
Unsupervised feedback options
§  “Enhance precision” option (tighter fit)
   •  Extracted terms are AND-ed with the original query
         dog AND (cat OR mouse)
   •  Filters out documents less similar to the original top-5
   [Figure: precision vs. recall]
§  “Enhance recall” option (more documents)
   •  Extracted terms are OR-ed with the original query
         dog OR cat OR mouse
   •  Adds more documents loosely similar to the original top-5
   [Figure: precision vs. recall]
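The two options differ only in the operator that joins the original query with the extracted terms. Below is a hypothetical helper that reproduces the slide's examples. Multi-word original queries are parenthesized so that the operator applies to the query as a whole; that is a design choice of this sketch.

```python
def combine(original: str, terms: list, mode: str) -> str:
    """Combine the original query with extracted feedback terms.

    precision: results must also match at least one extracted term (AND)
    recall:    extracted terms alone are enough to match (OR)
    """
    if not terms:
        return original
    query = f"({original})" if len(original.split()) > 1 else original
    extra = " OR ".join(terms)
    if mode == "precision":
        return f"{query} AND ({extra})"
    if mode == "recall":
        return f"{query} OR {extra}"
    raise ValueError(f"unknown mode: {mode}")
```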
Summary & QA
§  Click-through concepts
§  Apache Solr click-through scoring
  •  Model
  •  Integration options
§  LucidWorks Enterprise
  •  Click Scoring Framework
  •  Unsupervised feedback


§  More questions? ab@lucidimagination.com



More related content

Trending

Scaling Recommendations, Semantic Search, & Data Analytics with solr (Trey Grainger)
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks (Lucidworks)
Webinar: Solr 6 Deep Dive - SQL and Graph (Lucidworks)
Solr 6.0 Graph Query Overview (Kevin Watters)
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T... (Lucidworks)
Boosting Documents in Solr by Recency, Popularity, and User Preferences (Lucidworks (Archived))
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub (Lucidworks)
Building a real time, solr-powered recommendation engine (Trey Grainger)
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente... (Lucidworks)
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ... (lucenerevolution)
Whitepaper- Real World Search
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... (Christian Posse)
Semantic & Multilingual Strategies in Lucene/Solr (Trey Grainger)
Enhance discovery Solr and Mahout (lucenerevolution)
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks (Lucidworks)
Solr Graph Query: Presented by Kevin Watters, KMW Technology (Lucidworks)
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite... (Lucidworks)
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,... (Lucidworks)


Similar to Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise

Making Session Stores More Intelligent (Kyle Davis)
SDSC18 and DSATL Meetup March 2018 (CareerBuilder.com)
It4 Coursework Help (JTHSICT)
Stc preso2012 b (prboswell)
Adaptable Information Workshop slides (Louis Rosenfeld)
Alla ricerca della User Story perduta (Edoardo Schepis)
Alla ricerca della user story perduta (Better Software)
Revolutionizing the hypatia metadata experience (Kat Chuang)
Usability and Salesforce - Dallas Salesforce.com User Group September 2011 (Shell Black)
F8 tech talk_pinterest_v4
Disrupting Data Discovery (markgrover)
Data council sf amundsen presentation (Tao Feng)
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ... (museums and the web)
One day Course On Agile
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan... (Lucidworks)
Gauge March 2012
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com (Simon Hughes)
Exploring Data Preparation and Visualization Tools for Urban Forestry (Azavea)
Implimenting and Mitigating Change with all of this Newfangled Technology (Indiana Online Users Group)
Ch 3


More from Lucidworks (Archived)

Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCLucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCLucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCLucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCLucidworks (Archived)
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKLucidworks (Archived)
 

Mehr von Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 

Kürzlich hochgeladen

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise

  • 1. Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
    Andrzej Białecki
    ab@lucidimagination.com
  • 2. About the speaker
    §  Started using Lucene in 2003 (1.2-dev…)
    §  Created Luke – the Lucene Index Toolbox
    §  Apache Nutch, Hadoop, Solr committer, Lucene PMC member
    §  Apache Nutch PMC Chair
    §  LucidWorks Enterprise developer
  • 3. Agenda
    §  Click-through concepts
    §  Apache Solr click-through scoring
       •  Model
       •  Integration options
    §  LucidWorks Enterprise
       •  Click Scoring Framework
       •  Unsupervised feedback
  • 5. Improving relevance of top-N hits
    §  N < 10: the first page counts the most
       •  N = 3: the first three results count the most
    §  Many techniques available in Solr / Lucene
       •  Indexing-time
          §  text analysis, morphological analysis, synonyms, ...
       •  Query-time
          §  boosting, rewriting, synonyms, DisMax, function queries, ...
       •  Editorial ranking (QueryElevationComponent)
    §  No direct feedback from users on relevance ☹
    §  What user actions do we know about?
       •  Search, navigation, click-through, other actions…
  • 6. Query log and click-through events
    Click-through: a user selects an item at some position among the results returned for a query
    §  Why this information may be useful
       •  “Indicates” user's interest in a selected result
       •  “Implies” that the result is relevant to the query
       •  “Significant” when low-ranking results are selected
       •  “May be” considered as user's implicit feedback
    §  Why this information may be useless
       •  Many strong assumptions about user’s intent
       •  “Average user’s behavior” could be a fiction
    §  “Careful with that axe, Eugene”
  • 7. Click-through in context
    §  Query log, click positions, click intervals provide a context
    §  Source of spell-checking data
       •  Query reformulation until a click event occurs
    §  Click events per user – total or during a session
       •  Building a user profile (e.g. topics of interest)
    §  Negative click events
       •  User did NOT click the top 3 results → demote?
    §  Clicks of all users for an item (or a query, or both)
       •  Item popularity or relevance to queries
    §  Goal: analysis and modification of result ranking
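The “query reformulation until a click event occurs” idea can be sketched as follows. This is a minimal sketch, not the talk's implementation: it assumes a hypothetical per-session event list of ("query", text) and ("click", doc_id) tuples in time order, and treats every query that received no click before the next query as a candidate misspelling of the query that eventually did get a click.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # hypothetical: ("query", text) or ("click", doc_id), in time order

def reformulation_pairs(session: List[Event]) -> List[Tuple[str, str]]:
    """Map queries that got no click to the later query in the chain that did."""
    pairs: List[Tuple[str, str]] = []
    failed: List[str] = []          # earlier queries in the chain with no clicks
    current: Optional[str] = None   # most recent query
    clicked = False                 # has `current` received a click yet?
    for kind, value in session:
        if kind == "query":
            if current is not None and not clicked:
                failed.append(current)
            current, clicked = value, False
        elif kind == "click" and current is not None:
            if not clicked:
                # first click on `current`: treat it as the successful reformulation
                pairs.extend((f, current) for f in failed if f != current)
                failed = []
            clicked = True
    return pairs
```

Pairs collected over many sessions would still need counting and filtering before they are useful as spell-checking data; those thresholds are left open here.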
  • 8. Click to add title…
    §  Clicking through == adding labels!
    §  Collaborative filtering, recommendation system
    §  Topic discovery & opinion mining
    §  Tracking the topic / opinion drift over time
    §  Click-stream is sparse and noisy – caveat emptor
       •  Changing intent – “hey, this reminds me of smth…”
       •  Hidden intent – remember the “miserable failure”?
       •  No intent at all – “just messing around”
  • 9. What’s in the click-through data?
    §  Query log, with unique id = f(user, query, time)!
       •  User id (or group)
       •  Query (+ facets, filters, origin, etc.)
       •  Number of returned results
       •  Context (suggestions, autocomplete, “more like this” terms, …)
    §  Click-through log
       •  Query id, document id, click position & click timestamp
    §  What data would we like to get?
       •  Map of docId =>
          §  Aggregated queries, aggregated users
          §  Weight factor f(clickCount, positions, intervals)
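Correlating the two logs into the “map of docId =>” aggregated data can be sketched as below. The record layouts (QueryLogEntry, ClickLogEntry) and the DocClickData container are hypothetical; they carry only the fields listed on the slide. The weight factor f(clickCount, positions, intervals) is deliberately left out, since the slide does not define it.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class QueryLogEntry:
    query_id: str   # unique id = f(user, query, time), assigned when logging
    user_id: str
    query: str

@dataclass
class ClickLogEntry:
    query_id: str
    doc_id: str
    position: int      # 1-based rank of the clicked result
    timestamp: float   # seconds since the epoch

@dataclass
class DocClickData:
    queries: Counter = field(default_factory=Counter)  # query text -> clicks
    users: Counter = field(default_factory=Counter)    # user id -> clicks
    positions: List[int] = field(default_factory=list)
    timestamps: List[float] = field(default_factory=list)

def aggregate(queries: List[QueryLogEntry], clicks: List[ClickLogEntry]) -> Dict[str, DocClickData]:
    """Correlate the click log with the query log and group the result by docId."""
    by_id = {q.query_id: q for q in queries}
    per_doc: Dict[str, DocClickData] = defaultdict(DocClickData)
    for click in clicks:
        q = by_id.get(click.query_id)
        if q is None:
            continue  # click without a matching query log entry: skip it
        data = per_doc[click.doc_id]
        data.queries[q.query] += 1
        data.users[q.user_id] += 1
        data.positions.append(click.position)
        data.timestamps.append(click.timestamp)
    return dict(per_doc)
```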
  • 10. Other aggregations / reports
    §  User profiles
       •  Document types / categories viewed most often
       •  Population profile for a document
       •  User’s sophistication, education level, locations, interests, vices … (scary!)
    §  Query re-formulations
       •  Spell-checking or “did you mean”
    §  Corpus of the most useful queries
       •  Indicator for caching of results and documents
    §  Zeitgeist – general user interest over time
  • 11. Documents with click-through data
    Original document:
       -  documentWeight
       -  field1 : weight1
       -  field2 : weight2
       -  field3 : weight3
    Document with click-through data:
       -  documentWeight
       -  field1 : weight1
       -  field2 : weight2
       -  field3 : weight3
       -  labels : weight4
       -  users : weight5
    §  Modified document and field weights
    §  Added / modified fields
       •  Top-N labels aggregated from successful queries
       •  User “profile” aggregated from click-throughs
    §  Changing in time – new clicks arrive
  • 12. Desired effects
    §  Improvement in relevance of top-N results
       •  Non-query specific: f(clickCount) (or “popularity”)
       •  Query-specific: f([query] · [labels])
       •  User-specific (personalized ranking): f([userProfile] · [docProfile])
    §  Observed phenomena
       •  Top-10 better matches user expectations
       •  Inversion of ranking (oft-clicked > TF-IDF)
       •  Positive feedback: clicked -> highly ranked -> clicked -> even higher ranked …
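The “·” in f([query] · [labels]) and f([userProfile] · [docProfile]) reads naturally as a similarity between two weighted term vectors. A minimal sketch, assuming sparse vectors stored as term => weight dictionaries and a plain dot product as the similarity (cosine or another measure could be swapped in); the query_vector tokenizer is a hypothetical simplification:

```python
from typing import Dict

SparseVector = Dict[str, float]  # term -> weight

def dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product of two sparse vectors; only shared terms contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[term] for term, weight in a.items() if term in b)

def query_vector(query: str) -> SparseVector:
    """Hypothetical tokenizer: each distinct lower-cased token gets weight 1.0."""
    return {token: 1.0 for token in query.lower().split()}

# Query-specific factor: the query against the labels aggregated for a document
# (weights borrowed from the sample click record on slide 21).
labels = {"luke": 0.5, "stempel": 0.3, "murmur": 0.2}
query_boost = dot(query_vector("Luke index toolbox"), labels)
```

The user-specific variant is the same call with a user profile and a document profile as the two vectors.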
  • 13. Undesired effects
    §  Unbounded positive feedback
       •  Top-10 dominated by popular but irrelevant results, self-reinforcing due to user expectations about the Top-10 results
    §  Everlasting effects of past click-storms
       •  Top-10 dominated by old documents once extremely popular for no longer valid reasons
    §  Off-topic (noisy) labels
    §  Conclusions:
       •  f(click data) should be sub-linear
       •  f(click data, time) should discount older clicks
       •  f(click data) should be sanitized and bounded
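One way to meet the “sub-linear”, “sanitized” and “bounded” conclusions is sketched below (the time discount is the subject of the half-life model on slide 23). The log10 curve, the cap of 2.0, the stop-word list and the allowed-character rule are illustrative assumptions, not values from the talk.

```python
import math
import re
from typing import Iterable, List

STOPWORDS = frozenset({"a", "an", "and", "or", "the", "of"})  # illustrative only
_ALLOWED = re.compile(r"\w[\w \-]*")  # word characters, inner spaces and hyphens

def click_boost(click_count: int, cap: float = 2.0) -> float:
    """Sub-linear (log10) in the click count and never larger than `cap`."""
    if click_count <= 0:
        return 0.0
    return min(cap, math.log10(1 + click_count))

def sanitize_labels(labels: Iterable[str], max_len: int = 50) -> List[str]:
    """Normalize labels; drop empty, stop-word, overlong, odd or duplicate ones."""
    clean: List[str] = []
    for raw in labels:
        label = " ".join(raw.lower().split())  # lower-case, collapse whitespace
        if not label or label in STOPWORDS or len(label) > max_len:
            continue
        if not _ALLOWED.fullmatch(label) or label in clean:
            continue
        clean.append(label)
    return clean
```

With these defaults the boost reaches 1.0 at 9 clicks and saturates at 2.0 from 99 clicks on, so a click-storm cannot push a document arbitrarily high.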
  • 15. Click-through scoring in Solr
    §  Not out of the box – you need:
       •  A component to log queries
       •  A component to record click-throughs
       •  A tool to correlate and aggregate the logs
       •  A tool to manage click-through history
    §  …let’s (conveniently) assume the above is handled by a user-facing app… and we got that map of docId => click data
    §  How to integrate this map into a Solr index?
  • 16. Via ExternalFileField
    §  Pros:
       •  Simple to implement
       •  Easy to update – no need to do full re-indexing (just core reload)
    §  Cons:
       •  Only docId => field : boost
       •  No user-generated labels attached to docs ☹ ☹
    §  Still useful if a simple “popularity” metric is sufficient
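For the ExternalFileField route, the click map only has to be written out as a plain-text file. To my knowledge Solr reads a file named external_<fieldname> from the core's data directory, with one key=value line per document, where the key is the value of the field configured as keyField (usually the uniqueKey); check the reference guide for your Solr version. The sketch below writes such a file from a docId => boost map; the field name popularity is a hypothetical example.

```python
from pathlib import Path
from typing import Mapping

def write_external_file(data_dir: str, field_name: str, boosts: Mapping[str, float]) -> Path:
    """Write one docId=boost line per document for an ExternalFileField."""
    lines = []
    for doc_id, boost in sorted(boosts.items()):
        # refuse keys that would make the key=value line ambiguous
        if "=" in doc_id or "\n" in doc_id:
            raise ValueError(f"doc id cannot be written as a key: {doc_id!r}")
        lines.append(f"{doc_id}={boost:g}")
    path = Path(data_dir) / f"external_{field_name}"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

After the file is replaced, the new values become visible once Solr reloads it, which matches the “just core reload” point above.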
  • 17. Via full re-index
    §  If the corpus is small, or click data updates are infrequent… just re-index everything
    §  Pros:
       •  Relatively easy to implement – join source docs and click data by docId + reindex
       •  Allows adding all click data, including labels as searchable text
    §  Cons:
       •  Infeasible for larger corpora or frequent updates, time-wise and cost-wise
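The “join source docs and click data by docId” step of the full re-index option might look like this, assuming source documents are plain dicts with an id field and click data is a (boost, labels) pair per docId. The output field names click_boost and click_labels are hypothetical; the enriched documents would then be sent to Solr with whatever indexing client is already in use.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

ClickData = Tuple[float, List[str]]  # (boost, top labels) for one document

def join_for_reindex(docs: Iterable[dict], clicks: Dict[str, ClickData],
                     id_field: str = "id") -> Iterator[dict]:
    """Yield copies of the source documents enriched with their click data."""
    for doc in docs:
        enriched = dict(doc)  # leave the source document untouched
        boost, labels = clicks.get(doc[id_field], (0.0, []))
        enriched["click_boost"] = boost            # hypothetical field name
        enriched["click_labels"] = list(labels)    # hypothetical searchable text field
        yield enriched
```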
  • 18. Via incremental field updates
    §  Oops! Under construction, come back later…
    §  … much later …
       •  Some discussions on the mailing lists
       •  No implementation yet, design in flux
  • 19. Via ParallelReader
    [Diagram: a separate click data index (fields c1, c2, … per document) kept aligned document-by-document with the main index (fields f1, f2, … per document); the click records for D1…D6 are re-ordered to follow the main index's internal order D4, D2, D6, D1, D3, D5]
    §  Pros:
       •  All click data (e.g. searchable labels) can be added
    §  Cons:
       •  Complicated and fragile (rebuild on every update)
          §  Though only the click index needs a rebuild
       •  No tools to manage this parallel index in Solr
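What makes this option fragile is that Lucene's ParallelReader requires every parallel index to contain the same number of documents, paired by internal document number. The sketch below assumes you can list the uniqueKey of every main-index document in internal docId order (None for a deleted slot) and builds the click records in exactly that order; writing them into the click index is not shown.

```python
from typing import Dict, List, Optional, Sequence

def align_click_records(main_index_keys: Sequence[Optional[str]],
                        clicks: Dict[str, dict],
                        empty: dict) -> List[dict]:
    """One click record per main-index document number, in the same order.

    main_index_keys[n] is the uniqueKey stored in main-index document n, or
    None for a deleted slot. Documents without click data and deleted slots
    get a copy of `empty`, so click document n always pairs with main document n.
    """
    aligned: List[dict] = []
    for key in main_index_keys:
        record = clicks.get(key) if key is not None else None
        aligned.append(dict(record if record is not None else empty))
    return aligned
```

Because any change to the main index can shift document numbers, this list has to be regenerated and the click index rebuilt after every update, which is the “rebuild on every update” caveat above.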
  • 21. Click Scoring Framework
    §  LucidWorks Enterprise feature
    §  Click-through log collection & analysis
       •  Query logs and click-through logs (when using Lucid's search UI)
       •  Analysis of click-through events
       •  Maintenance of historical click data
       •  Creation of a query phrase dictionary (-> autosuggest)
    §  Modification of ranking based on click events:
       •  Modifies query rewriting & field boosts
       •  Adds top query phrases associated with a document
    Example click data record: http://getopt.org/ 0.13 luke:0.5,stempel:0.3,murmur:0.2
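The sample record above (http://getopt.org/ 0.13 luke:0.5,stempel:0.3,murmur:0.2) suggests a document id, a boost, and a comma-separated list of phrase:weight pairs. A parser under that assumption is sketched below; the actual LucidWorks file layout may differ, so treat the field order and separators as guesses.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ClickRecord:
    doc_id: str
    boost: float
    phrases: Dict[str, float] = field(default_factory=dict)

def parse_click_record(line: str) -> ClickRecord:
    """Parse '<docId> <boost> phrase:weight,phrase:weight,...' (assumed layout)."""
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 2:
        raise ValueError(f"malformed click record: {line!r}")
    record = ClickRecord(doc_id=parts[0], boost=float(parts[1]))
    if len(parts) == 3:
        for item in parts[2].split(","):
            # rpartition: a phrase may itself contain ':', the weight never does
            phrase, sep, weight = item.rpartition(":")
            if not sep:
                raise ValueError(f"missing weight in {item!r}")
            record.phrases[phrase] = float(weight)
    return record
```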
  • 22. Aggregation of click events
    §  Relative importance of clicks:
       •  Clicks on lower-ranking documents are more important
          §  Plateau after the second page
       •  The more clicks, the more important a document
          §  Sub-linear to counter click-storms
       •  “Reading time” weighting factor
          §  Intervals between clicks on the same result list
    §  Association of query terms with target document
       •  Top-N successful queries considered
       •  Top-N frequent phrases (shingles) extracted from queries, sanitized
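The aggregation rules above can be sketched as follows. Only their shape comes from the slide (more weight for lower-ranked clicks, a plateau after the second page, sub-linear growth with the number of clicks, top-N phrases as shingles); the log2 curve, the page size of 10 and the bigram limit are assumptions, and the reading-time factor is not modeled.

```python
import math
from collections import Counter
from typing import Iterable, List

def position_weight(position: int, page_size: int = 10) -> float:
    """Weight of one click at a 1-based result position; flat beyond page two."""
    capped = min(max(position, 1), 2 * page_size)
    return 1.0 + math.log2(capped)

def doc_weight(click_positions: Iterable[int], page_size: int = 10) -> float:
    """Sub-linear (log1p) in the accumulated click weight, to damp click-storms."""
    return math.log1p(sum(position_weight(p, page_size) for p in click_positions))

def shingles(query: str, max_n: int = 2) -> List[str]:
    """Word n-grams of length 1..max_n from a lower-cased query."""
    tokens = query.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def top_phrases(queries: Iterable[str], top_n: int = 3, max_n: int = 2) -> List[str]:
    """Most frequent shingles over the successful queries for one document."""
    counts = Counter(s for q in queries for s in shingles(q, max_n))
    return [phrase for phrase, _ in counts.most_common(top_n)]
```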
  • 23. Aggregation of click-through history
    §  Needs to reflect document popularity over time
       •  Should react quickly to bursts (topics of the day)
       •  Has to avoid documents being “stuck” at the top due to past popularity
    §  Solution: half-life decay model
       •  Adjustable period & rate
       •  Adjustable length of history (affects smoothing)
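The slide names the knobs of the half-life model but not its formula. One plausible reading, sketched below as an assumption: clicks are aggregated per period (e.g. per day), each older period is multiplied by a per-period decay rate, and only a fixed number of periods is kept as history.

```python
from typing import Sequence

def rate_for_half_life(half_life_periods: float) -> float:
    """Per-period decay factor that halves a period's weight after `half_life_periods`."""
    return 0.5 ** (1.0 / half_life_periods)

def decayed_score(per_period: Sequence[float], rate: float = 0.5, history: int = 10) -> float:
    """Combine per-period click weights, newest first (per_period[0] = current period).

    Period k (k periods old) is multiplied by rate**k; periods beyond `history`
    are dropped, which also bounds how long a past click-storm can matter.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return sum(weight * rate ** age for age, weight in enumerate(per_period[:history]))
```

With rate=0.5 a period's clicks lose half their weight each period; if periods are days, rate_for_half_life(7) gives a one-week half-life instead. A burst in the current period counts in full immediately, while an old burst fades and then drops out of the history window.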
  • 24. Click scoring in practice
    §  Query log and click log generated by the LucidWorks search UI
       •  Logs and intermediate data files in plain text, well-documented formats and locations
    §  Scheduled click-through analysis activity
    §  Final click data – open formats
       •  Boost factor plus top phrases per document (plain text)
    §  Click data is integrated with the main index
       •  No need to re-index the main corpus (ParallelReader trick)
       •  Where are the incremental field updates when you need them?!!!
       •  Also works with Solr replication (rsync or Java)
  • 25. Click Scoring – added fields
    §  Fields added to the main index
       •  click – a field with a constant value of 1, but with a boost relative to the aggregated click history
          §  Indexed, with norms
       •  click_val – “string” (not analyzed) field containing the numerical value of the boost
          §  Stored, indexed, not analyzed
       •  click_terms – top-N terms and phrases from queries that caused click events on this document
          §  Stored, indexed and analyzed
  • 26. Click scoring – query modifications
    §  Using click in queries (or DisMax’s bq)
       •  Constant term “1” with a boost value
       •  Example: term1 OR click:1
    §  Using click_val in function queries
       •  Floating point boost value as a string
       •  Example: term1 OR _val_:click_val
    §  Using click_terms in queries (e.g. DisMax)
       •  Add click_terms to the list of query fields (qf) in the DisMax handler (the default in /lucid)
       •  Matches on click_terms are scored like matches on any other field
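The query-side options above can be combined into one DisMax request. The sketch below only builds the parameters: defType, q, qf and bq are standard DisMax parameters, while the base query fields (title^2 body), the click_terms boost and the localhost /select URL are assumptions for illustration. The standard-parser variant wraps the user query in parentheses, a small deviation from the slide's example, so that multi-term queries combine cleanly with the function-query clause.

```python
from urllib.parse import urlencode

def dismax_click_params(user_query: str, base_qf: str = "title^2 body",
                        click_terms_boost: float = 1.0) -> dict:
    """DisMax parameters that use the click and click_terms fields."""
    return {
        "defType": "dismax",
        "q": user_query,
        # matches on click_terms are scored like matches on any other qf field
        "qf": f"{base_qf} click_terms^{click_terms_boost:g}",
        # optional boost query: the index-time boost on the constant term "1"
        # in the click field lifts frequently clicked documents
        "bq": "click:1",
    }

def lucene_click_query(user_query: str) -> str:
    """Standard-parser variant using click_val in a function query."""
    return f"({user_query}) OR _val_:click_val"

params = dismax_click_params("solr relevance")
url = "http://localhost:8983/solr/select?" + urlencode(params)  # hypothetical endpoint
```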
  • 27. Click Scoring – impact
    §  Configuration options of the click analysis tools
       •  max normalization
          §  The highest value of click boost will be 1, all other values are proportionally lower
          §  Controlled max impact on any given result list
       •  total normalization
          §  Total value of all boosts will be constant
          §  Limits the total impact of click scoring on all lists of results
       •  raw – whatever value is in the click data
    §  Controlled impact is the key to improving the top-N results
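The three normalization modes can be sketched as a single function over the docId => boost map. The mode names follow the slide; the target sum used by total normalization is a hypothetical parameter, since the slide only says that it is constant.

```python
from typing import Dict

def normalize_boosts(boosts: Dict[str, float], mode: str = "max",
                     total: float = 1.0) -> Dict[str, float]:
    """Apply 'max', 'total' or 'raw' normalization to a docId => boost map."""
    if mode not in ("max", "total", "raw"):
        raise ValueError(f"unknown normalization mode: {mode!r}")
    if mode == "raw" or not boosts:
        return dict(boosts)  # whatever value is in the click data
    if mode == "max":
        denom, target = max(boosts.values()), 1.0    # highest boost becomes 1
    else:
        denom, target = sum(boosts.values()), total  # boosts sum to `total`
    scale = target / denom if denom > 0 else 0.0
    return {doc_id: boost * scale for doc_id, boost in boosts.items()}
```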
  • 29. Unsupervised feedback
    §  LucidWorks Enterprise feature
    §  Unsupervised – no need to train the system
    §  Enhances quality of top-N results
       •  Well-researched topic
       •  Several strategies for keyword extraction and combining with the original query
    §  Automatic feedback loop:
       •  Submit the original query and take the top 5 docs
       •  Extract some keywords (“important” terms)
       •  Combine the original query with the extracted keywords
       •  Submit the modified query & return results
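The slide does not say which keyword-extraction strategy LucidWorks Enterprise uses. As one common option, the sketch below scores terms from the top-ranked documents by their frequency there times a smoothed inverse document frequency, skipping terms already in the query. Whitespace tokenization and the doc_freq / num_docs inputs (which would come from the index's term statistics) are simplifying assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

def extract_keywords(top_docs: Sequence[str], doc_freq: Dict[str, int], num_docs: int,
                     query_terms: Sequence[str], k: int = 2) -> List[str]:
    """Pick up to k 'important' terms from the top-ranked documents (TF x IDF)."""
    tf = Counter(t for doc in top_docs for t in doc.lower().split())
    exclude = {t.lower() for t in query_terms}
    scores = {
        term: count * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
        if term not in exclude
    }
    # highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:k] if score > 0]
```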
  • 30. Unsupervised feedback options
    §  “Enhance precision” option (tighter fit)
       •  Extracted terms are AND-ed with the original query: dog AND (cat OR mouse)
       •  Filters out documents less similar to the original top-5
    §  “Enhance recall” option (more documents)
       •  Extracted terms are OR-ed with the original query: dog OR cat OR mouse
       •  Adds more documents loosely similar to the original top-5
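The two combination options can be written as a small query builder that reproduces the slide's examples. Parenthesizing a multi-term original query is an assumption added here so that the boolean operators bind as intended in Lucene query syntax.

```python
from typing import Sequence

def expand_query(original: str, keywords: Sequence[str], mode: str) -> str:
    """Combine the original query with feedback keywords (Lucene query syntax)."""
    if mode not in ("precision", "recall"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not keywords:
        return original
    # parenthesize multi-term queries so the boolean operators bind as intended
    base = f"({original})" if len(original.split()) > 1 else original
    if mode == "precision":
        # tighter fit: a result must match the original AND some extracted term
        return f"{base} AND ({' OR '.join(keywords)})"
    # more documents: any extracted term alone is enough to match
    return " OR ".join([base, *keywords])
```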
  • 31. Summary & QA
    §  Click-through concepts
    §  Apache Solr click-through scoring
       •  Model
       •  Integration options
    §  LucidWorks Enterprise
       •  Click Scoring Framework
       •  Unsupervised feedback
    §  More questions? ab@lucidimagination.com