October 11-14, 2016
Boston, MA
http://lucenerevolution.com
Search and Recommenders
Grant Ingersoll
@gsingers
CTO, Lucidworks
Jake Mannix
@pbrane
Lead Data Engineer, Lucidworks
Agenda
• Vision, motivations and definitions
• Use cases for ecommerce, compliance, fraud and customer support
• Fusion and the evolution of recommenders
• Demo
• Future Directions
Search-Driven Everything
• Customer Service
• Customer Insights
• Fraud Surveillance
• Research Portal
• Online Retail
• Digital Content
Three Sides of the Same Coin
• Many companies treat search, recommendations/discovery and analytics as different beasts, yet:
  • The same inputs that make search better can also drive recommendations and better analytics
• Engagement analytics is the key:
  • Your users give you engagement signals about the content that is relevant to them
  • Over time, patterns emerge in similarities of behavior (the simplest possible pattern is just “popularity”)
  • These signals are often the biggest factor in both search relevance AND recommendations
• In the enterprise, this is still the case, but the types of signals are often different (email, IM)
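As a minimal sketch of the simplest pattern mentioned above, popularity, the snippet below rolls raw engagement signals up into one score per document. The (user, document, signal type) tuples, the signal types and their weights are assumptions made for illustration; they are not Fusion's signal schema.

```python
from collections import Counter

# Hypothetical weight per signal type (assumed for illustration only).
SIGNAL_WEIGHTS = {"click": 1.0, "reply": 2.0, "purchase": 5.0}


def popularity(signals, weights=SIGNAL_WEIGHTS):
    """Sum the weighted engagement events for each document."""
    scores = Counter()
    for _user_id, doc_id, signal_type in signals:
        # Unknown signal types contribute nothing.
        scores[doc_id] += weights.get(signal_type, 0.0)
    return scores


# Toy signals: (user_id, doc_id, signal_type)
signals = [
    ("u1", "doc_a", "click"),
    ("u2", "doc_a", "purchase"),
    ("u1", "doc_b", "click"),
    ("u3", "doc_b", "click"),
    ("u3", "doc_c", "reply"),
]

scores = popularity(signals)
top = scores.most_common(1)  # the single most popular document
print(top)
```

The same aggregated scores could serve as a ranking boost in search or as a naive "most popular" recommendation, which is the sense in which one input feeds both.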
Defining Moments
• Content: documents that are textually similar often make good “similar items” to recommend
• Collaborative: documents that have been engaged with by the same people (and/or in the same search context) are also similar, in a more subtle but often more powerful way
• Multi-Modal: why choose one? Try a smooth interpolation between a content-based similarity metric and an engagement-based one!
Search-Driven Online Retail
Increase conversions with a personalized shopping experience with best-in-class reliability and performance.
Diagram labels: CATALOG; PROMOTIONS; USER HISTORY; DYNAMIC NAVIGATION AND LANDING PAGES; INSTANT INSIGHTS AND ANALYTICS; PERSONALIZED SHOPPING EXPERIENCE; Data Acquisition; Data Processing; Smart Access API
Search-Driven Compliance and Surveillance
Detect and investigate activity for regulatory compliance, from one unified view.
Diagram labels: DATABASE; MESSAGES; LOGS; ACCURATE REAL-TIME INFORMATION; CONTEXTUALLY-ENRICHED INFORMATION; DATA EXPLORATION AND VISUALIZATION; Data Acquisition; Indexing & Streaming; Smart Access API
Search-Driven Customer Service
Resolve customer issues quickly with immediate access to relevant answers.
Diagram labels: CUSTOMER SELF-SERVICE; KNOWLEDGE BASE; PROACTIVE ALERTS AND RECOMMENDATIONS; EXPERT TUNED RELEVANCY DRIVEN BY ANALYTICS AND INSIGHTS; CRM SUPPORT TICKETS & ISSUE TRACKING; Data Acquisition; Data Processing; Smart Access API
Fusion and Recommenders
Lucidworks Fusion Is Search-Driven Everything
• Drive next-generation relevance via Content, Collaboration and Context
• Harness best-in-class open source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance
Access data from anywhere to build intelligent, data-driven applications.
Diagram labels: CATALOG; PROMOTIONS; USER HISTORY; DYNAMIC NAVIGATION AND LANDING PAGES; INSTANT INSIGHTS AND ANALYTICS; PERSONALIZED SHOPPING EXPERIENCE; Data Acquisition; Indexing & Streaming; Smart Access API; Recommendations & Alerts; Analytics & Insights; Extreme Relevancy
Fusion Architecture
[Architecture diagram; recoverable labels follow]
• REST API, Admin UI, Lucidworks View; SECURITY BUILT-IN
• Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
• Apache Spark: Worker, Worker, Cluster Mgr.
• Apache Solr: Shards, Shards
• HDFS (Optional)
• Apache ZooKeeper (ZK 1 … ZK N): Shared Config Mgmt, Leader Election, Load Balancing
• Data sources: DATABASE, WEB, FILE, LOGS, HADOOP, CLOUD
Key Platform Tech
• Fusion:
  • Recommenders API
  • Machine Learning pipeline stages
  • Scheduling
• Solr:
  • More Like This + Signals
• Spark:
  • MLlib, Mahout, custom
Content-focused
• Solr has a built-in query parser, MoreLikeThis, which takes a given document and:
  • Extracts nontrivial terms from specified fields in it
  • Builds an “OR” query to search for the closest matches (like a cosine similarity computation)
  • Has many knobs to tune for “data-cleaning” non-useful terms out of the query
• TF-IDF is great, but other metrics are possible: LSI, LDA, W2V (word2vec)
{!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
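To make the knobs in the query above concrete, here is a simplified sketch of MoreLikeThis-style term selection. It is not Solr's implementation: the tokenizer, the TF-IDF formula, the toy corpus and the helper names `mlt_terms` and `or_query` are all assumptions for illustration. Only the parameter names `mintf`, `mindf` and `minwl` come from the slide, and the example call uses a lower `mindf` than the slide because the toy corpus is tiny.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mlt_terms(doc_text, corpus, mintf=2, mindf=5, minwl=3, max_terms=25):
    """Select 'interesting' terms from doc_text and weight them by TF-IDF.

    mintf: minimum term frequency within the source document
    mindf: minimum number of corpus documents containing the term
    minwl: minimum word length
    """
    tf = Counter(tokenize(doc_text))
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))
    n_docs = len(corpus)

    scored = []
    for term, freq in tf.items():
        if freq < mintf or df[term] < mindf or len(term) < minwl:
            continue  # "data-cleaning": drop rare, tiny or one-off terms
        idf = math.log((n_docs + 1) / (df[term] + 1)) + 1.0
        scored.append((term, freq * idf))
    # Highest weight first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_terms]


def or_query(field, terms):
    """Build a boosted OR query string from (term, weight) pairs."""
    return " OR ".join(f"{field}:{term}^{weight:.2f}" for term, weight in terms)


corpus = [
    "solr search relevance tuning",
    "solr spark signals",
    "spark jobs for signals",
    "search relevance with signals",
    "the solr query parser",
]
doc_text = "solr signals improve solr search; signals drive search relevance in solr"

terms = mlt_terms(doc_text, corpus, mintf=2, mindf=2, minwl=3)
query = or_query("body", terms)
print(query)
```

Raising `mintf` or `mindf` shrinks the generated query toward terms that are both prominent in the source document and established in the corpus, which is the tuning trade-off the slide alludes to.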
Collaborative Filtering
“People who bought X also bought Y” / “Movies recommended for you”
[Flow diagram; recoverable labels follow]
Query time: Current Doc → Search User/Item Index → Top K users who’ve interacted with this Item → Search and Rollup on User/Item Index → Top Y docs → Filter by context → Profit
Offline: User/Item Signals → Offline Tasks (Math!) → User/Item Index
Collaborative Filtering: step by step in Fusion
• Fusion CF-based “documents like this” pipeline stages:
  • Sub-query: search the aggregated signals index for the current doc_id, extracting the top-K pairs of (user_id, weight)
  • Sub-query: search that index again with a weighted OR query:
    (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
  • Roll-up: topN(sum(score_i * weight_i))
  • Sub-query: fetch the documents for these top N doc_ids from the primary Solr index
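The stages above can be sketched in memory as a two-hop lookup. This is an illustration under stated assumptions, not the Fusion pipeline: the aggregated signals are a plain list of (user_id, doc_id, weight) rows, the query score of a row is approximated by the boost of the matching user, and the current document is excluded from its own results. The function name `similar_docs` is hypothetical.

```python
import heapq
from collections import defaultdict


def similar_docs(doc_id, signals, top_k=2, top_n=3):
    """Two-hop collaborative filtering over aggregated (user, doc, weight) rows."""
    # Hop 1: top-K (weight, user) pairs for the current document.
    top_users = heapq.nlargest(
        top_k, ((w, u) for u, d, w in signals if d == doc_id)
    )
    boosts = {u: w for w, u in top_users}

    # Hop 2: "weighted OR" over those users, rolled up per document.
    # score_i is approximated by the user's boost; weight_i is the row weight.
    totals = defaultdict(float)
    for u, d, w in signals:
        if u in boosts and d != doc_id:
            totals[d] += boosts[u] * w

    # Roll-up: keep the top-N documents by summed score.
    return heapq.nlargest(top_n, totals.items(), key=lambda kv: kv[1])


# Toy aggregated signals: (user_id, doc_id, weight)
signals = [
    ("u1", "d1", 3.0),
    ("u2", "d1", 1.0),
    ("u3", "d1", 0.5),
    ("u1", "d2", 2.0),
    ("u2", "d2", 1.0),
    ("u2", "d3", 4.0),
    ("u3", "d4", 9.0),
]

recs = similar_docs("d1", signals, top_k=2, top_n=3)
print(recs)
```

In this toy data, `d4` never appears in the result because its only user, `u3`, falls outside the top-K for `d1`; the top-K cut is what keeps the second query bounded. The final stage, fetching the full documents by id, is omitted here.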
Multi-modal
• Both content-based and CF recommenders use features of the documents to generate a similarity metric:
  • Content uses the tokens in the document
  • CF uses the ids of the users who have engaged with it
• The two metrics can be combined as a weighted sum, allowing a “slider” between them
• Fancy similarity techniques that work on a (doc, token) matrix can often be applied to a (doc, userId) matrix, or even to a joint (doc, (token or userId)) concatenated matrix
• There is a cost to such techniques: they are harder to maintain, and variations are harder to A/B test
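A minimal sketch of the “slider” described above: cosine similarity is computed once over token features and once over user-id features, then mixed with a single weight. The dictionary layout of a document and the names `cosine`, `blended_similarity` and `alpha` are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def blended_similarity(doc_a, doc_b, alpha):
    """alpha=1.0 is content-only, alpha=0.0 is collaborative-only."""
    content = cosine(doc_a["tokens"], doc_b["tokens"])
    collaborative = cosine(doc_a["users"], doc_b["users"])
    return alpha * content + (1.0 - alpha) * collaborative


# Toy documents: token counts and per-user engagement weights.
doc_a = {"tokens": {"solr": 1, "spark": 1}, "users": {"u1": 1, "u2": 1}}
doc_b = {"tokens": {"solr": 1, "fusion": 1}, "users": {"u1": 1, "u2": 1}}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(blended_similarity(doc_a, doc_b, alpha), 3))
```

Keeping the two similarities separate until the final weighted sum means `alpha` can be changed per request, which makes A/B testing a slider position cheaper than retraining a joint model.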
Demo
• Basics:
  • 26 Apache projects registered so far, plus LW web properties
  • 93 datasources* including email, GitHub, JIRA*, website and wiki
  • Fusion 2.4
  • Signals everywhere
  • UI based on Lucidworks View
• ASF mail archives mirrored at: http://asfmail.lucidworks.io
http://searchhub.lucidworks.com
Implementation Details
http://github.com/lucidworks/searchhub
Branch: GH-28-doc-view
Key Source Code
• UI
  • Angular directives: perdocument, recommendations
• Offline Tasks
  • Spark jobs: mail_thread_signal_creation_job.json, SimpleTwoHopRecommender.scala
• Fusion Pipelines
  • Query: lucidfind-recommendations, cf-similar-items-batch-rec, cf-similar-items-rec
Future Work
• Ensemble and click-based approaches
• https://github.com/lucidworks/searchhub/issues/40
• https://github.com/lucidworks/searchhub/issues/28
• https://github.com/lucidworks/searchhub/issues/22
• Deploy live
• User registrations
• https://github.com/lucidworks/searchhub/issues/30
Resources
Fusion: http://www.lucidworks.com/products/fusion
Search Hub: http://searchhub.lucidworks.com
Company: http://www.lucidworks.com
Our blog: http://www.lucidworks.com/blog
Twitter: @gsingers, @pbrane

Webinar: Search and Recommenders

  • 1.
  • 3. Search and Recommenders Grant Ingersoll @gsingers CTO, Lucidworks Jake Mannix @pbrane Lead Data Engineer, Lucidworks
  • 4. • Vision, motivations and definitions • Use cases for ecommerce, compliance, fraud and customer support • Fusion and the evolution of recommenders • Demo • Future Directions Agenda
  • 6. • Many companies treat search, recommendations/discovery and analytics as different beasts, yet: • The same inputs that make search better can also drive recommendations and better analytics • Engagement analytics is the key: • Your users give you engagement signals regarding the content that is relevant to them • Over time, patterns emerge in similarities of behavior (simplest possible pattern is just “popularity”) • These signals are often the biggest factor in both search relevance AND recommendations • In the enterprise, this is still the case, but the types of signals are often different (email, IM) Three Sides of the Same Coin
  • 7. • Content — documents which are textually similar are often good as “similar items” to be recommended • Collaborative — documents which have been engaged with by the same people (and/or in the same search context) are also similar in a more subtle, but often more powerful way • Multi-Modal — why choose one? Try a smooth interpolation between using a content-based similarity metric, and an engagement based one! Defining Moments
  • 8. Search-Driven Online Retail  Increase conversions with a personalized shopping experience with best in class reliability and performance. CATALOG DYNAMIC NAVIGATION AND LANDING PAGES INSTANT INSIGHTS AND ANALYTICS PERSONALIZED SHOPPING EXPERIENCE PROMOTIONS USER HISTORY Data Acquisition Data Processing Smart Access API
  • 9. Search-Driven Compliance and Surveillance Detect and investigate activity for regulatory compliance, from one unified view. DATABASE ACCURATE REAL-TIME INFORMATION CONTEXTUALLY- ENRICHED INFORMATION MESSAGESLOGS DATA EXPLORATION AND VISUALIZATION Data Acquisition Indexing & Streaming Smart Access API
  • 10. Search-Driven Customer Service
      • Resolve customer issues quickly with immediate access to relevant answers.
      • Diagram labels: CUSTOMER SELF-SERVICE; KNOWLEDGE BASE; PROACTIVE ALERTS AND RECOMMENDATIONS; EXPERT TUNED RELEVANCY DRIVEN BY ANALYTICS AND INSIGHTS; CRM; SUPPORT TICKETS & ISSUE TRACKING; Data Acquisition; Data Processing; Smart Access API
  • 12. Lucidworks Fusion Is Search-Driven Everything
      • Drive next generation relevance via Content, Collaboration and Context
      • Harness best-in-class Open Source: Apache Solr + Spark
      • Simplify application development and reduce ongoing maintenance
      • Access data from anywhere to build intelligent, data-driven applications.
      • Diagram labels: CATALOG; DYNAMIC NAVIGATION AND LANDING PAGES; INSTANT INSIGHTS AND ANALYTICS; PERSONALIZED SHOPPING EXPERIENCE; PROMOTIONS; USER HISTORY; Data Acquisition; Indexing & Streaming; Smart Access API; Recommendations & Alerts; Analytics & Insights; Extreme Relevancy
  • 13. Fusion Architecture
      • REST API
      • Apache Spark: Worker, Worker, Cluster Mgr.
      • Apache Solr: Shards, Shards; HDFS (Optional)
      • Apache Zookeeper (ZK 1, ZK N): Shared Config Mgmt, Leader Election, Load Balancing
      • Data sources: DATABASE, WEB, FILE, LOGS, HADOOP, CLOUD
      • Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
      • Admin UI; SECURITY BUILT-IN; Lucidworks View
  • 14. Key Platform Tech
      • Fusion:
          • Recommenders API
          • Machine Learning pipeline stages
          • Scheduling
      • Solr:
          • More Like This + Signals
      • Spark:
          • MLlib, Mahout, custom
  • 15. Content-focused
      • Solr ships with a built-in query parser, MoreLikeThis, which takes a given document and:
          • Extracts nontrivial terms from specified fields in it
          • Builds an “OR” query to search for the closest matches (like a cosine similarity computation)
          • Has many knobs to tune for “data-cleaning” non-useful terms out of the query
      • TF-IDF is great, but other metrics are possible: LSI, LDA, W2V
      • Example: {!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
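The term-extraction and OR-query steps on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Solr's or Lucene's actual implementation: the function names, the smoothed IDF formula, the `max_terms` cap and the sample data are all assumptions made for the example. Only `mintf`, `mindf` and `minwl` mirror the parameters shown in the slide's query.

```python
import math
from collections import Counter

def interesting_terms(doc_tokens, doc_freq, num_docs,
                      mintf=2, mindf=5, minwl=3, max_terms=25):
    """Rank one document's terms by TF-IDF after MLT-style cutoffs.

    doc_freq maps term -> number of documents containing it (hypothetical
    corpus statistics supplied by the caller).
    """
    scored = []
    for term, tf in Counter(doc_tokens).items():
        df = doc_freq.get(term, 0)
        # Drop terms that are too rare in the doc, too rare in the corpus,
        # or too short to be meaningful.
        if tf < mintf or df < max(mindf, 1) or len(term) < minwl:
            continue
        idf = math.log(num_docs / df) + 1.0  # assumed IDF variant
        scored.append((tf * idf, term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(term, score) for score, term in scored[:max_terms]]

def build_or_query(field, terms):
    """Join the selected terms into a boosted OR query string."""
    return " OR ".join(f"{field}:{term}^{score:.2f}" for term, score in terms)

# Hypothetical document and corpus statistics.
tokens = ["solr", "solr", "spark", "spark", "spark", "of", "of", "of", "fusion"]
corpus_df = {"solr": 10, "spark": 50, "of": 1000, "fusion": 8}

terms = interesting_terms(tokens, corpus_df, num_docs=1000)
query = build_or_query("body", terms)
print(query)
```

Here "of" is dropped by the word-length cutoff and "fusion" by the term-frequency cutoff, which is the kind of "data-cleaning" the knobs are meant for.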
  • 16. Collaborative Filtering
      • “People who bought X also bought Y” / “Movies recommended for you”
      • Diagram, query time: Current Doc → Search User/Item Index → Top K users who’ve interacted with this Item → Search and Rollup on User/Item Index → Top Y docs → Filter by context → Profit
      • Diagram, offline: User/Item Signals → Offline Tasks (Math!) → User/Item Index
  • 17. Collaborative Filtering: step by step in Fusion
      • Fusion CF-based “documents like this” pipeline stages:
          • Sub-query: search the aggregated signals index for the current doc_id, extracting the top-K pairs of (user_id, weight)
          • Sub-query: search the same signals index again with a weighted OR query: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
          • Roll-up: topN(sum(score_i * weight_i))
          • Sub-query: fetch the documents for these top N doc_ids from the primary Solr index
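The four stages above can be mimicked in memory to see how the two hops fit together. The sketch below is a simplification under stated assumptions, not Fusion's pipeline code: the aggregated signals index is modelled as a list of hypothetical (user_id, doc_id, weight) rows, the stored signal weight stands in for the Solr score of the second sub-query, the current document is excluded from its own results, and `catalog` plays the role of the primary Solr index.

```python
from collections import defaultdict

def similar_docs(signals, doc_id, top_k=2, top_n=3):
    """Two-hop 'documents like this' over (user_id, doc_id, weight) rows."""
    # Stage 1: top-K users who engaged with the current document.
    hits = sorted(((w, u) for u, d, w in signals if d == doc_id),
                  key=lambda pair: (-pair[0], pair[1]))[:top_k]
    user_weight = {u: w for w, u in hits}

    # Stage 2 + roll-up: score every other doc those users touched,
    # summing (row weight * user weight) per doc.
    totals = defaultdict(float)
    for u, d, w in signals:
        if u in user_weight and d != doc_id:
            totals[d] += w * user_weight[u]

    # Keep the N best documents.
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]

# Hypothetical aggregated signals and document store.
signals = [
    ("alice", "d1", 3.0), ("alice", "d2", 2.0), ("alice", "d3", 1.0),
    ("bob", "d1", 1.0), ("bob", "d3", 4.0),
    ("carol", "d2", 5.0),
]
catalog = {"d1": "Intro to Solr", "d2": "Spark basics", "d3": "Fusion signals"}

ranked = similar_docs(signals, "d1", top_k=2, top_n=2)
print(ranked)  # [('d3', 7.0), ('d2', 6.0)]

# Stage 4: fetch the full documents for the winning ids.
recommended = [catalog[d] for d, _ in ranked]
```

Note that carol's strong signal on d2 does not count, because she never engaged with d1 and so is not among its top-K users.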
  • 18. Multi-modal
      • Both content-based and CF recommenders use features of the documents to generate a similarity metric:
          • Content uses the tokens in the document
          • CF uses the ids of users who have engaged with it
      • The two metrics can be combined as a weighted sum, allowing a “slider” between them
      • Fancy similarity techniques that can be applied to a (doc, token) matrix can often be applied to a (doc, userId) matrix, or even to a joint (doc, (token or userId)) concatenated matrix
      • There is a cost to such techniques: they are harder to maintain and harder to A/B test in variations
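One way to picture the "slider" is as a convex combination of two cosine similarities, one over token features and one over user-id features. The following is a minimal sketch under that assumption; the function names, the `alpha` parameter, the key-prefixing scheme for the joint vector and the sample vectors are all hypothetical and not taken from Fusion.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def blended_similarity(tokens_a, tokens_b, users_a, users_b, alpha):
    """alpha=0 is purely content-based, alpha=1 is purely collaborative."""
    return ((1.0 - alpha) * cosine(tokens_a, tokens_b)
            + alpha * cosine(users_a, users_b))

def joint_vector(tokens, users):
    """Concatenate token and user features into one sparse vector."""
    joint = {("tok", k): v for k, v in tokens.items()}
    joint.update({("usr", k): v for k, v in users.items()})
    return joint

# Hypothetical feature vectors for two documents.
doc_a_tokens = {"solr": 1.0, "spark": 1.0}
doc_b_tokens = {"solr": 1.0}
doc_a_users = {"alice": 1.0}
doc_b_users = {"bob": 1.0}

for alpha in (0.0, 0.5, 1.0):
    score = blended_similarity(doc_a_tokens, doc_b_tokens,
                               doc_a_users, doc_b_users, alpha)
    print(alpha, round(score, 4))

joint_score = cosine(joint_vector(doc_a_tokens, doc_a_users),
                     joint_vector(doc_b_tokens, doc_b_users))
```

The weighted sum keeps the two signals separately tunable, while the joint vector folds them into a single metric, which is simpler to compute but gives up the explicit slider.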
  • 19. Demo: http://searchhub.lucidworks.com
      • Basics:
          • 26 Apache Projects registered so far plus LW web properties
          • 93 datasources* including email, GitHub, JIRA*, Website and Wiki
          • Fusion 2.4
          • Signals everywhere
          • UI based on Lucidworks View
      • ASF Mail archives mirrored at: http://asfmail.lucidworks.io
  • 20. Implementation Details
      • http://github.com/lucidworks/searchhub
      • Branch: GH-28-doc-view
      • Key Source Code:
          • UI: Angular Directives: perdocument recommendations
          • Offline Tasks: Spark Jobs: mail_thread_signal_creation_job.json, SimpleTwoHopRecommender.scala
          • Fusion Pipelines: Query: lucidfind-recommendations, cf-similar-items-batch-rec, cf-similar-items-rec
  • 21. Future Work
      • Ensemble and Click-based approaches
      • https://github.com/lucidworks/searchhub/issues/40
      • https://github.com/lucidworks/searchhub/issues/28
      • https://github.com/lucidworks/searchhub/issues/22
      • Deploy live
      • User registrations
      • https://github.com/lucidworks/searchhub/issues/30
  • 22. Resources
      • Fusion: http://www.lucidworks.com/products/fusion
      • Search Hub: http://searchhub.lucidworks.com
      • Company: http://www.lucidworks.com
      • Our blog: http://www.lucidworks.com/blog
      • Twitter: @gsingers, @pbrane