
Webinar: Modern Techniques for Better Search Relevance with Fusion


  1. Modern Techniques for Better Search Relevance in Fusion. Grant Ingersoll, CTO, Lucidworks. November 15, 2017
  2. 😊 iPad case
  3. 😊 iPad case → "ipad accessory"~3 OR "ipad case"~5
  4. 1. 15.
  5. 👎
  6. So, what do you do?
  7. RT(F)M!
  8. Index Time: if (doc.name.contains("Vikings")) { doc.boost = 100 }
     OR
     Query Time: q:(MAIN QUERY) OR (name:Vikings)^10
  9. TF*IDF
     • Term Frequency: “How well a term describes a document”
       • Measure: how often a term occurs per document
     • Inverse Document Frequency: “How important is a term overall”
       • Measure: how rare the term is across all documents
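The two measures on this slide can be sketched in a few lines. This is a minimal illustration, not Lucene's implementation: the function name `tf_idf`, the toy corpus, and the choice of the smoothed IDF variant `1 + log(numDocs / (docFreq + 1))` are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """Score one term for one document with a plain TF*IDF product."""
    tf = Counter(doc_tokens)[term]                      # how often the term occurs in this document
    doc_freq = sum(1 for d in corpus if term in d)      # how many documents contain the term
    idf = 1.0 + math.log(len(corpus) / (doc_freq + 1))  # rarer terms get a larger weight
    return tf * idf

# Hypothetical four-document catalog, already tokenized.
corpus = [
    ["ipad", "case", "leather"],
    ["ipad", "charger"],
    ["phone", "case"],
    ["ipad", "case", "ipad", "stand"],
]
common = tf_idf("ipad", corpus[0], corpus)     # "ipad" appears in three documents
rare = tf_idf("leather", corpus[0], corpus)    # "leather" appears in one document
```

Both terms occur once in the first document, so the difference between `rare` and `common` comes entirely from the IDF factor.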
  10. BM25 (aka Okapi)

      $$\mathrm{Score}(q, d) = \sum_{t \in q} \mathrm{idf}(t) \cdot \frac{\mathrm{tf}(t \in d) \cdot (k + 1)}{\mathrm{tf}(t \in d) + k \cdot \left(1 - b + b \cdot |d| / \mathrm{avgdl}\right)}$$

      Where: t = term; d = document; q = query; i = index
      • $\mathrm{tf}(t \in d) = \mathrm{numTermOccurrencesInDocument}^{1/2}$
      • $\mathrm{idf}(t) = 1 + \log\left(\mathrm{numDocs} / (\mathrm{docFreq} + 1)\right)$
      • $|d| = \sum_{t \in d} 1$
      • $\mathrm{avgdl} = \left(\sum_{d \in i} |d|\right) / \left(\sum_{d \in i} 1\right)$
      • k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point.
      • b = Free parameter. Usually ~0.75. Increases impact of document normalization.
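The formula on this slide can be traced with a short sketch. It follows the slide's own definitions (square-root term frequency, the `1 + log` IDF) rather than any particular library's BM25; the function name `bm25_score` and the toy corpus are assumptions for illustration.

```python
import math

def bm25_score(query_terms, doc_tokens, corpus, k=1.2, b=0.75):
    """Sum the per-term BM25 contributions as written on the slide."""
    num_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / num_docs      # average document length
    doc_len = len(doc_tokens)                           # |d|
    score = 0.0
    for t in query_terms:
        occurrences = doc_tokens.count(t)
        if occurrences == 0:
            continue                                    # term absent: contributes nothing
        tf = math.sqrt(occurrences)                     # slide's tf definition
        doc_freq = sum(1 for d in corpus if t in d)
        idf = 1.0 + math.log(num_docs / (doc_freq + 1))
        length_norm = k * (1.0 - b + b * doc_len / avgdl)
        score += idf * (tf * (k + 1.0)) / (tf + length_norm)
    return score

# Hypothetical corpus: the same two query terms in a short and a long document.
corpus = [
    ["ipad", "case"],
    ["ipad", "case", "with", "keyboard", "and", "stand", "for", "travel"],
    ["phone", "charger"],
]
query = ["ipad", "case"]
short_doc = bm25_score(query, corpus[0], corpus)
long_doc = bm25_score(query, corpus[1], corpus)
no_match = bm25_score(query, corpus[2], corpus)
```

With `b` at 0.75 the longer document is penalized, so `short_doc` comes out above `long_doc` even though both contain each query term once. Setting `b=0` removes the length normalization.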
  11. Measure, Measure, Measure
      • Capture and log pretty much everything
        • Searches, clicks, time on page, seen/not, etc.
      • Precision: Of those shown, what’s relevant?
      • Recall: Of all that’s relevant, what was found?
      • NDCG: Account for position
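The three metrics can be computed from a ranked result list and a set of relevance judgments. The sketch below assumes graded judgments stored in a dict (document id to gain, where any judged document counts as relevant) and uses the common log2 position discount for NDCG; all names and data are illustrative.

```python
import math

def precision(shown, relevant):
    """Of those shown, what fraction is relevant?"""
    if not shown:
        return 0.0
    return sum(1 for d in shown if d in relevant) / len(shown)

def recall(shown, relevant):
    """Of all that is relevant, what fraction was found?"""
    if not relevant:
        return 0.0
    return sum(1 for d in shown if d in relevant) / len(relevant)

def dcg(gains):
    """Discounted cumulative gain: later positions count for less."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(shown, graded):
    """DCG of the shown ranking divided by the DCG of the ideal ranking."""
    gains = [graded.get(d, 0) for d in shown]
    ideal = sorted(graded.values(), reverse=True)[:len(shown)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments: doc id -> graded relevance.
graded = {"a": 3, "c": 1, "e": 2}
shown = ["a", "b", "c", "d"]
p = precision(shown, set(graded))
r = recall(shown, set(graded))
n = ndcg(shown, graded)
```

Precision and recall ignore order entirely, which is why a position-aware measure such as NDCG is listed alongside them.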
  12. Lather, Rinse, Repeat
  13. 💡
  14. WWGD?
  15. [Slide graphic: a grid of people emoji; no text]
  16. • Magic Guessing
      • Core Information Theory (aka Lucene/Solr)
      • Search Aids (Facets, Did You Mean, Highlighting)
      • Machine Learning/NLP (Clicks, Crowd Sourcing, Recs, Personalization, User feedback)
      • Rules, Domain Specific Knowledge
      • fuhgeddaboudit
  17. Next Generation Relevance
      • Content: Core Solr capabilities: text matching, faceting, spell checking, highlighting. Business Rules for content: landing pages, boost/block, promotions, etc.
      • Collaboration: Leverage collective intelligence to predict what users will do based on historical, aggregated data. Recommenders, Popularity, Search Paths
      • Context: Who are you? Where are you? What have you done previously? User/Market Segmentation, Roles, Security, Personalization
  18. But What About the Real World? Indexing Edition
      • Machine Learning/NLP: NER, Topic Detection, Clustering, Word2Vec, etc.
      • Domain Rules: Synonyms, Regexes, Lexical Resources
      • Extraction
      • Load Into Spark
      • Build W2V, PageRank, Topic, Clustering Models
      • Offline
      • Content
      • Models
  19. Real World? Query Edition
      • Query Intent: Strategic, Tactical, Semantic
      • 😊 iPad case
      • Head/Tail/Clickstream/Recommenders
      • User Factors: Segmentation, Location, History, Profile, Security
      • Parse
      • Domain Specific Rules
      • Transform Results
      • …
      • Cascading Rerankers: Learn To Rank (multi-model), Bias corrections
  20. Real World? Users Edition
      • Load Into Spark
      • Signals
      • Query Analysis
      • Recommenders/Personalization
      • 😊 iPad case
      • Query Edition
      • Raw Models
      • Clickstream Models
  21. The Perfect(?!?) Query* YMMV! Caveat Emptor!
      • (Exact/Original Match)^X
      • (Sloppy Phrase)~M^Y
      • (AND Q)^Z
      • (OR Q)^XX
      • (Expansions/Click/Head/Tail Boosts)^YY
      • (Personalization Biases)^ZZ
      • ({!ltr model=…})
      • Filters+Options: security, rules, hard preferences, categories
      Annotations: Precision, Recall, Learn to Rank
      X > Y > Z > XX. All weights can be learned.
      * Note: there are a lot of variations on this. edismax handles most.
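The first four clauses of this layered query can be assembled as a plain Lucene-syntax string. The sketch below is an assumption-heavy illustration: the function name `build_query`, the default weights, and the slop value are invented for the example, no special-character escaping is done, and the expansion, personalization, and `{!ltr}` clauses are left out.

```python
def build_query(user_query, exact=20.0, sloppy=10.0, slop=3, all_terms=5.0, any_term=1.0):
    """Combine exact, sloppy-phrase, AND, and OR clauses with descending boosts."""
    # The slide's ordering constraint: X > Y > Z > XX.
    if not exact > sloppy > all_terms > any_term:
        raise ValueError("boosts must satisfy exact > sloppy > all_terms > any_term")
    terms = user_query.split()
    phrase = " ".join(terms)
    clauses = [
        f'"{phrase}"^{exact}',                          # (Exact/Original Match)^X
        f'"{phrase}"~{slop}^{sloppy}',                  # (Sloppy Phrase)~M^Y
        "(" + " AND ".join(terms) + f")^{all_terms}",   # (AND Q)^Z
        "(" + " OR ".join(terms) + f")^{any_term}",     # (OR Q)^XX
    ]
    return " OR ".join(clauses)

q = build_query("ipad case")
```

The precision-oriented clauses carry the largest boosts so that exact matches rank first, while the OR clause keeps recall from collapsing when nothing matches exactly. In practice the weights would be tuned or learned rather than hard-coded.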
  22. Experimentation, Not Editorialization
      • Don’t take my word for it, experiment!
        • A/B Tests, Multi-arm Bandits
        • Good primer:
          • http://www.slideshare.net/InfoQ/online-controlled-experiments-introduction-insights-scaling-and-humbling-statistics
      • Rules are fine, as long as they are contained, have a lifespan and are measured for effectiveness
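As one possible reading of "Multi-arm Bandits", the sketch below shows an epsilon-greedy bandit choosing between ranking variants based on observed click rates. The class name, arm names, and the epsilon-greedy strategy itself are assumptions for illustration; the slides do not say which bandit algorithm is meant.

```python
import random

class EpsilonGreedyBandit:
    """Mostly serve the best-performing variant, occasionally explore the others."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon                       # fraction of traffic used for exploration
        self.shows = {a: 0 for a in self.arms}
        self.clicks = {a: 0 for a in self.arms}
        self.rng = random.Random(seed)

    def click_rate(self, arm):
        """Observed click-through rate; 0.0 until the arm has been shown."""
        return self.clicks[arm] / self.shows[arm] if self.shows[arm] else 0.0

    def choose(self):
        """Pick a variant for the next search request."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)        # explore
        return max(self.arms, key=self.click_rate)   # exploit

    def record(self, arm, clicked):
        """Log the outcome of showing a variant."""
        self.shows[arm] += 1
        self.clicks[arm] += int(clicked)

# Hypothetical variants: the current ranking and one with a boosting rule.
bandit = EpsilonGreedyBandit(["baseline", "boosted"], epsilon=0.0, seed=1)
bandit.record("baseline", clicked=False)
bandit.record("boosted", clicked=True)
best = bandit.choose()
```

Unlike a fixed-split A/B test, a bandit shifts traffic toward the better variant while the experiment is still running, at the cost of a less clean statistical comparison.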
  23. Show Us Already, Will You!
  24. Lucidworks Fusion Product Suite. The Lucidworks platform provides all of the components needed to create and run smart enterprise and consumer applications.
      • Create rich UI with modular components for web and mobile
      • Surface the insights that matter most with the power of machine learning and artificial intelligence
      • Highly scalable search engine and NoSQL datastore that gives you instant access to all your data
      • Combine the power of the Fusion stack with the simplicity you’d expect in a SaaS-based application
  25. Lucidworks Fusion Architecture: Web App; Mobile; BI/Analytics; Logs; File; Web; Database; Box/Dropbox; Elasticsearch; SDK; Sharepoint; Slack; Jive; Connectors; Admin UI; Search; Analytics; Visualization; REST/SQL; Hadoop; Google Drive; Security Built In; Proven; Speed; CDCR; Extensible; Scalable; Responsive; NLP: NER, Phrases, POS; Query Intent & Doc Classification; Recommenders; Anomaly Detection; Signals and Query Analytics; Clustering; A/B Testing; ETL and Query Pipelines; Alerting and Messaging; SQL and Catalog; Scheduling; Connectors/Federation; Import/Export; Custom Jobs; Rules; Topic Detection; Custom; Search Devs; Data Scientists; Business Users; Cross Cutting Features; HDFS (Optional)
  26. Fusion: Meeting the Search Challenge: Relevance and Discovery; Business Support; Intelligence; Open & Scalable; Signal Proc.; Machine Learning; NLP; Math/Stats; Proven Search; Extensible; Simplified DevOps; Real Time; Query/Doc Simulations; Rules; Analytics; User Interface; Personalization; Recommendations; Query Intent; Experimentation (A/B)
  27. Demo Details
      • Ecommerce Data Set
        - Product Catalog: ~1.3M
        - Signals: 1 month of query, document logs
      • Fusion 3.1 + Rules (open source add-on module) + Solr LTR contrib
      • Spark 2.x, Solr 6.5
      • App Studio UI (http://twigkit.com)
  28. Resources
      • http://lucidworks.com
      • grant@
      • http://lucene.apache.org/solr
      • http://spark.apache.org/
      • https://github.com/lucidworks/spark-solr
      • https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank
      • Bloomberg talk on LTR: https://www.youtube.com/watch?v=M7BKwJoh96s
