SlideShare a Scribd company logo
1 of 43
Download to read offline
Intelligent Search
Intelligent Search
   (or at least really clever)
Some Preliminaries
• Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns
Some Preliminaries
• Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns

     for each document d:
      for each term t:
        sd += adt qt
Some Preliminaries
• Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns

     sd = Σt adt qt
Some Preliminaries
• Text retrieval = matrix multiplication
     A: our corpus
     documents are rows
     terms are columns

     s=Aq
More Preliminaries
• Recommendation = Matrix multiply
    A: our users’ histories
    users are rows
    items are columns
More Preliminaries
• Recommendation = Matrix multiply
    A: our users’ histories
    users are rows
    items are columns

   Users who bought items
   in the list h also bought
   items in the list r
More Preliminaries
• Recommendation = Matrix multiply
    A: our users’ histories
    users are rows
    items are columns

   for each user u:
    for each item t1:
      for each item t2:
         rt1 += au,t1 au,t2 ht2
More Preliminaries
• Recommendation = Matrix multiply
    A: our users’ histories
    users are rows
    items are columns

   sd = Σt2 Σu au,t1 au,t2 qt2
More Preliminaries
• Recommendation = Matrix multiply
    A: our users’ histories
    users are rows
    items are columns

   s = A’ (A q)
More Preliminaries
• Recommendation = Matrix multiply
    A: our users’ histories
    users are rows
    items are columns

   s = (A’ A) q
More Preliminaries
• Recommendation = Matrix multiply
    A: our users’ histories
    users are rows
    items are columns

   s = (A’ A) q   ish!
Why so ish?

• In real life, ish happens because:
 • Big data ... so we selectively sample
 • Sparse data ... so we smooth
 • Finite computers ... so we sparsify
 • Top-40 effect ... so we use some stats
The same in spite of ish

• The shape of the computation is unchanged
• The cost of the computation is unchanged
• Broad algebraic conclusions still hold
Back to recommendations ...
Dyadic Structure
●   Functional
    – Interaction:   actor -> item*
●   Relational
    – Interaction    ⊆ Actors x Items
●   Matrix
    – Rows    indexed by actor, columns by item
    – Value   is count of interactions
●   Predict missing observations
Fundamental Algorithmics
●   Cooccurrence

●   A is actors x items, K is items x items
●   Product has general shape of matrix
●   K tells us “users who interacted with x also
    interacted with y”
Fundamental Algorithmic Structure
●   Cooccurrence

●   Matrix approximation by factoring




●   LLR
But Wait ...
But Wait ...
Does it have to be that way?
What we have:

For a user who watched/bought/listened to this
What we have:

  For a user who watched/bought/listened to this



Sum over all other users who watched/bought/...
What we have:

  For a user who watched/bought/listened to this



Sum over all other users who watched/bought/...



  Add up what they watched/bought/listened to
What we have:

  For a user who watched/bought/listened to this



Sum over all other users who watched/bought/...



  Add up what they watched/bought/listened to


                      And recommend that
What we have:

  For a user who watched/bought/listened to this



Sum over all other users who watched/bought/...



  Add up what they watched/bought/listened to


                      And recommend that


                               ish
What we have:

Add up what they watched/bought/listened to
What we have:

Add up what they watched/bought/listened to



               But wait, we can do that faster
What we have:

Add up what they watched/bought/listened to



               But wait, we can do that faster
But why not ...
But why not ...
But why not ...




Why just dyadic learning?
But why not ...




Why just dyadic learning?

Why not triadic learning?
But why not ...




Why just dyadic learning?

Why not p-adic learning?
For example
●   Users enter queries (A)
    – (actor   = user, item=query)
●   Users view videos (B)
    – (actor   = user, item=video)
●   AʼA gives query recommendation
    – “did   you mean to ask for”
●   BʼB gives video recommendation
    – “you   might like these videos”
The punch-line
●   BʼA recommends videos in response to a query
    – (isnʼt   that a search engine?)
    – (not   quite, it doesnʼt look at content or meta-data)
Real-life example
●   Query: “Paco de Lucia”
●   Conventional meta-data search results:
    – “hombres   del paco” times 400
    – not   much else
●   Recommendation based search:
    – Flamenco    guitar and dancers
    – Spanish   and classical guitar
    – Van   Halen doing a classical/flamenco riff
Real-life example
Real-life example
System Diagram
     Viewing Logs      selective
                                   count
       t user video    sampler


Search Logs            selective                       llr +    Related videos
                                     count
  t user query-term    sampler                       sparsify    v => v1 v2...


                                                                Related terms
                                   join on                       v => t1 t2...
                                             count
                                     user


                      Hadoop
Indexing
        Related terms
         v => t1 t2...


    Related videos
     v => v1 v2...
                         join on
                                    Lucene Index
                          video
 Video meta
   v => url title...



Hadoop                             Lucene (+Katta?)
Hypothetical Example
●   Want a navigational ontology?
●   Just put labels on a web page with traffic
    – This   gives A = users x label clicks
●   Remember viewing history
    – This   gives B = users x items
●   Cross recommend
    – BʼA =   click to item mapping
●   After several users click, results are whatever
    users think they should be
Resources
●   My blog
    – http://tdunning.blogspot.com/

●   The original LLR in NLP paper
    –   Accurate Methods for the Statistics of Surprise and Coincidence
        (check on citeseer)
●   Source code
    – Mahout      project
    – contact     me (tdunning@apache.org)

More Related Content

Similar to Intelligent Search

Recommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarRecommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarPanos Gemos
 
Buidling large scale recommendation engine
Buidling large scale recommendation engineBuidling large scale recommendation engine
Buidling large scale recommendation engineKeeyong Han
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisSeth Grimes
 
CSE545 sp23 (2) Streaming Algorithms 2-4.pdf
CSE545 sp23 (2) Streaming Algorithms 2-4.pdfCSE545 sp23 (2) Streaming Algorithms 2-4.pdf
CSE545 sp23 (2) Streaming Algorithms 2-4.pdfAlexanderKyalo3
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 
Recsys 2018 overview and highlights
Recsys 2018 overview and highlightsRecsys 2018 overview and highlights
Recsys 2018 overview and highlightsSandra Garcia
 
Balancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem FormulationBalancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem FormulationAlex D. Gaudio
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationArjen de Vries
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Simplilearn
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetCrossing Minds
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ SpotifyNikhil Tibrewal
 
Graph processing at scale using spark & graph frames
Graph processing at scale using spark & graph framesGraph processing at scale using spark & graph frames
Graph processing at scale using spark & graph framesRon Barabash
 
Getting the most out of your Ruby on Rails applications: from zero to hero
Getting the most out of your Ruby on Rails applications: from zero to heroGetting the most out of your Ruby on Rails applications: from zero to hero
Getting the most out of your Ruby on Rails applications: from zero to heroFilip Tepper
 
Recommendations play @flipkart
Recommendations play @flipkartRecommendations play @flipkart
Recommendations play @flipkarthava101
 
Tag And Tag Based Recommender
Tag And Tag Based RecommenderTag And Tag Based Recommender
Tag And Tag Based Recommendergu wendong
 

Similar to Intelligent Search (20)

Recommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarRecommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum Radar
 
Buidling large scale recommendation engine
Buidling large scale recommendation engineBuidling large scale recommendation engine
Buidling large scale recommendation engine
 
Recommender lecture
Recommender lectureRecommender lecture
Recommender lecture
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
 
CSE545 sp23 (2) Streaming Algorithms 2-4.pdf
CSE545 sp23 (2) Streaming Algorithms 2-4.pdfCSE545 sp23 (2) Streaming Algorithms 2-4.pdf
CSE545 sp23 (2) Streaming Algorithms 2-4.pdf
 
Let's Get to the Rapids
Let's Get to the RapidsLet's Get to the Rapids
Let's Get to the Rapids
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 
Recsys 2018 overview and highlights
Recsys 2018 overview and highlightsRecsys 2018 overview and highlights
Recsys 2018 overview and highlights
 
Balancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem FormulationBalancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem Formulation
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and Recommendation
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ Spotify
 
Graph processing at scale using spark & graph frames
Graph processing at scale using spark & graph framesGraph processing at scale using spark & graph frames
Graph processing at scale using spark & graph frames
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Getting the most out of your Ruby on Rails applications: from zero to hero
Getting the most out of your Ruby on Rails applications: from zero to heroGetting the most out of your Ruby on Rails applications: from zero to hero
Getting the most out of your Ruby on Rails applications: from zero to hero
 
Recommender systems session b
Recommender systems session bRecommender systems session b
Recommender systems session b
 
Recommendations play @flipkart
Recommendations play @flipkartRecommendations play @flipkart
Recommendations play @flipkart
 
Tag And Tag Based Recommender
Tag And Tag Based RecommenderTag And Tag Based Recommender
Tag And Tag Based Recommender
 

More from Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Ted Dunning
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data SecurelyTed Dunning
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 

More from Ted Dunning (20)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
T digest-update
T digest-updateT digest-update
T digest-update
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 

Recently uploaded (20)

Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 

Intelligent Search

  • 2. Intelligent Search (or at least really clever)
  • 3. Some Preliminaries • Text retrieval = matrix multiplication A: our corpus documents are rows terms are columns
  • 4. Some Preliminaries • Text retrieval = matrix multiplication A: our corpus documents are rows terms are columns for each document d: for each term t: sd += adt qt
  • 5. Some Preliminaries • Text retrieval = matrix multiplication A: our corpus documents are rows terms are columns sd = Σt adt qt
  • 6. Some Preliminaries • Text retrieval = matrix multiplication A: our corpus documents are rows terms are columns s=Aq
  • 7. More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns
  • 8. More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns Users who bought items in the list h also bought items in the list r
  • 9. More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns for each user u: for each item t1: for each item t2: rt1 += au,t1 au,t2 ht2
  • 10. More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns sd = Σt2 Σu au,t1 au,t2 qt2
  • 11. More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns s = A’ (A q)
  • 12. More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns s = (A’ A) q
  • 13. More Preliminaries • Recommendation = Matrix multiply A: our users’ histories users are rows items are columns s = (A’ A) q ish!
  • 14. Why so ish? • In real life, ish happens because: • Big data ... so we selectively sample • Sparse data ... so we smooth • Finite computers ... so we sparsify • Top-40 effect ... so we use some stats
  • 15. The same in spite of ish • The shape of the computation is unchanged • The cost of the computation is unchanged • Broad algebraic conclusions still hold
  • 17. Dyadic Structure ● Functional – Interaction: actor -> item* ● Relational – Interaction ⊆ Actors x Items ● Matrix – Rows indexed by actor, columns by item – Value is count of interactions ● Predict missing observations
  • 18. Fundamental Algorithmics ● Cooccurrence ● A is actors x items, K is items x items ● Product has general shape of matrix ● K tells us “users who interacted with x also interacted with y”
  • 19. Fundamental Algorithmic Structure ● Cooccurrence ● Matrix approximation by factoring ● LLR
  • 21. But Wait ... Does it have to be that way?
  • 22. What we have: For a user who watched/bought/listened to this
  • 23. What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/...
  • 24. What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/... Add up what they watched/bought/listened to
  • 25. What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/... Add up what they watched/bought/listened to And recommend that
  • 26. What we have: For a user who watched/bought/listened to this Sum over all other users who watched/bought/... Add up what they watched/bought/listened to And recommend that ish
  • 27. What we have: Add up what they watched/bought/listened to
  • 28. What we have: Add up what they watched/bought/listened to But wait, we can do that faster
  • 29. What we have: Add up what they watched/bought/listened to But wait, we can do that faster
  • 30. But why not ...
  • 31. But why not ...
  • 32. But why not ... Why just dyadic learning?
  • 33. But why not ... Why just dyadic learning? Why not triadic learning?
  • 34. But why not ... Why just dyadic learning? Why not p-adic learning?
  • 35. For example ● Users enter queries (A) – (actor = user, item=query) ● Users view videos (B) – (actor = user, item=video) ● AʼA gives query recommendation – “did you mean to ask for” ● BʼB gives video recommendation – “you might like these videos”
  • 36. The punch-line ● BʼA recommends videos in response to a query – (isnʼt that a search engine?) – (not quite, it doesnʼt look at content or meta-data)
  • 37. Real-life example ● Query: “Paco de Lucia” ● Conventional meta-data search results: – “hombres del paco” times 400 – not much else ● Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 40. System Diagram Viewing Logs selective count t user video sampler Search Logs selective llr + Related videos count t user query-term sampler sparsify v => v1 v2... Related terms join on v => t1 t2... count user Hadoop
  • 41. Indexing Related terms v => t1 t2... Related videos v => v1 v2... join on Lucene Index video Video meta v => url title... Hadoop Lucene (+Katta?)
  • 42. Hypothetical Example ● Want a navigational ontology? ● Just put labels on a web page with traffic – This gives A = users x label clicks ● Remember viewing history – This gives B = users x items ● Cross recommend – BʼA = click to item mapping ● After several users click, results are whatever users think they should be
  • 43. Resources ● My blog – http://tdunning.blogspot.com/ ● The original LLR in NLP paper – Accurate Methods for the Statistics of Surprise and Coincidence (check on citeseer) ● Source code – Mahout project – contact me (tdunning@apache.org)