Probabilistic Information Retrieval
Search - Week 6
Keerthi Nuthi
Vipul Munot
Arun Ram Sankaranarayanan
Basic Probability Theory
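For events A and B
● $P(A,B)$ : Joint probability
● $P(A|B)$ : Conditional probability
● Chain rule : $P(A,B) = P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)$
● Partition rule : $P(B) = P(A,B) + P(\bar{A},B)$
● Bayes rule : $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} = \left[\frac{P(B|A)}{\sum_{X \in \{A,\bar{A}\}} P(B|X)\,P(X)}\right] P(A)$
● Prior probability $P(A)$ : initial estimate of how likely event A is in the absence of any other information.
● Posterior probability $P(A|B)$ : estimate of how likely A is after having seen the evidence B, based on the likelihood of B occurring in the two cases that A does or does not hold.
● The odds of an event provide a kind of multiplier for how probabilities change.
Odds : $O(A) = \frac{P(A)}{P(\bar{A})} = \frac{P(A)}{1 - P(A)}$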
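A minimal sketch of the rules above for a binary event A and evidence B, using exact fractions so the arithmetic is easy to check. The prior and likelihoods are made-up numbers and the function names are illustrative only. It also shows the sense in which odds act as a multiplier: the posterior odds equal the prior odds times the likelihood ratio $P(B|A)/P(B|\bar{A})$.

```python
from fractions import Fraction

def partition(p_b_given_a, p_b_given_not_a, p_a):
    """Partition rule: P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

def bayes(p_b_given_a, p_b_given_not_a, p_a):
    """Bayes rule: posterior P(A|B) = P(B|A)P(A) / P(B)."""
    return p_b_given_a * p_a / partition(p_b_given_a, p_b_given_not_a, p_a)

def odds(p):
    """Odds O(A) = P(A) / (1 - P(A))."""
    return p / (1 - p)

prior = Fraction(1, 5)        # P(A), invented
like_a = Fraction(9, 10)      # P(B | A), invented
like_not_a = Fraction(3, 10)  # P(B | not A), invented

posterior = bayes(like_a, like_not_a, prior)
print(posterior, odds(prior), odds(posterior))  # 3/7 1/4 3/4
```

Here the evidence is three times as likely when A holds, so it triples the odds of A, from 1/4 to 3/4.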
The 1/0 loss case
● Ranked retrieval setup: given a collection of documents, the user issues a query, and an ordered list of documents is returned.
● Assume a binary notion of relevance: $R_{d,q}$ is a random dichotomous variable, such that
○ $R_{d,q} = 1$ if document d is relevant w.r.t. query q
○ $R_{d,q} = 0$ otherwise
The Probability Ranking Principle
● If a retrieval system responds to a user's query by returning documents in decreasing order of their probability of relevance, the overall effectiveness of the system will be the best that is obtainable.
● This assumes that the probabilities are estimated as accurately as possible from whatever data is available to the system.
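Under 1/0 loss, the expected number of relevant documents in a result set is the sum of their probabilities of relevance, so returning the top k documents by $P(R=1|d,q)$ maximizes it at every cutoff k. This is the sense in which the PRP ranking is optimal in this setting. The sketch below checks this by brute force over all k-subsets; the probability values are invented and the helper names are hypothetical.

```python
import math
from itertools import combinations

def prp_ranking(probs):
    """Order document ids by decreasing estimated P(R=1 | d, q)."""
    return sorted(probs, key=probs.get, reverse=True)

def expected_relevant(docs, probs):
    """Expected number of relevant documents in a set: sum of P(R=1 | d, q)."""
    return sum(probs[d] for d in docs)

# Hypothetical probability-of-relevance estimates for one query.
probs = {"d1": 0.2, "d2": 0.9, "d3": 0.55, "d4": 0.7}
ranking = prp_ranking(probs)  # ['d2', 'd4', 'd3', 'd1']

# At every cutoff k, the top k of the PRP ranking is as good as the best k-subset.
for k in range(1, len(probs) + 1):
    best = max(expected_relevant(c, probs) for c in combinations(probs, k))
    assert math.isclose(expected_relevant(ranking[:k], probs), best)
```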
The Binary Independence Model (BIM)
● Binary (equivalent to Boolean): documents and queries are both represented as binary term incidence vectors.
● E.g., document d is represented by the vector $\vec{x} = (x_1, \ldots, x_M)$, where $x_t = 1$ if term t occurs in d and $x_t = 0$ otherwise.
● Different documents may have the same vector representation.
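A small sketch of the BIM document representation, assuming a fixed vocabulary and naive lowercase whitespace tokenization (both assumptions made only for illustration). Term frequency and word order are discarded, so two different documents can map to the same vector.

```python
def binary_vector(text, vocabulary):
    """Map a document to its binary term incidence vector over a fixed vocabulary."""
    terms = set(text.lower().split())  # naive whitespace tokenization (assumption)
    return tuple(1 if t in terms else 0 for t in vocabulary)

vocabulary = ["bayes", "retrieval", "probabilistic", "ranking"]
d1 = binary_vector("probabilistic retrieval retrieval ranking", vocabulary)
d2 = binary_vector("ranking probabilistic retrieval", vocabulary)
d3 = binary_vector("bayes ranking", vocabulary)

print(d1, d2, d3)  # (0, 1, 1, 1) (0, 1, 1, 1) (1, 0, 0, 1)
```

d1 and d2 differ in word order and in how often "retrieval" occurs, but the BIM cannot tell them apart.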
Binary Independence Model
To make a probabilistic retrieval strategy precise, we need to estimate how terms in documents contribute to relevance:
● Find measurable statistics (term frequency, document frequency, document length) that affect judgments about document relevance
● Combine these statistics to estimate the probability of document relevance
● Order documents by decreasing estimated probability of relevance $P(R|d,q)$
● Assume that the relevance of each document is independent of the relevance of other documents (this is not true in practice; for example, it allows duplicate or near-duplicate results to be returned).
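In the BIM, $P(R|d,q)$ is computed through the term incidence vector $\vec{x}$ of d. Bayes rule gives $P(R=1|\vec{x},q) = P(\vec{x}|R=1,q)\,P(R=1|q)/P(\vec{x}|q)$, and the model assumes terms occur independently given relevance, so $P(\vec{x}|R,q)$ is a product of per-term probabilities. The sketch below follows that formulation (as presented in Manning et al., Ch. 11); the per-term probabilities and the prior are invented numbers, and the function names are hypothetical.

```python
import math

def likelihood(x, probs):
    """P(x | R, q) assuming terms occur independently given relevance.
    probs[t] is P(x_t = 1 | R, q)."""
    return math.prod(p if xt == 1 else 1 - p for xt, p in zip(x, probs))

def posterior_relevance(x, p, u, prior):
    """Bayes rule: P(R=1|x,q) = P(x|R=1,q) P(R=1|q) / P(x|q),
    with P(x|q) expanded by the partition rule over R=1 and R=0."""
    rel = likelihood(x, p) * prior
    nonrel = likelihood(x, u) * (1 - prior)
    return rel / (rel + nonrel)

# Hypothetical estimates for a 3-term vocabulary.
p = [0.8, 0.6, 0.3]   # P(x_t = 1 | R = 1, q)
u = [0.2, 0.5, 0.3]   # P(x_t = 1 | R = 0, q)
prior = 0.1           # P(R = 1 | q)

x = (1, 1, 0)
post_rel = posterior_relevance(x, p, u, prior)
post_nonrel = 1 - post_rel
print(round(post_rel, 4))  # 0.3478
```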
Deriving a ranking function for query terms
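● Aim: Given a query q,
○ return documents in descending order of $P(R=1|d,q)$ in the BIM
○ As we are interested only in ranking the documents, we rank them by their odds of relevance, $O(R|\vec{x},q) = \frac{P(R=1|\vec{x},q)}{P(R=0|\vec{x},q)}$; the odds are a monotonically increasing function of the probability, so the ranking is unchanged.
○ Applying Bayes rule and assuming that terms occur independently of each other given relevance (the "Naive Bayes" assumption) gives $O(R|\vec{x},q) = O(R|q) \cdot \prod_{t=1}^{M} \frac{P(x_t|R=1,q)}{P(x_t|R=0,q)}$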
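Because $O = P/(1-P)$ is strictly increasing for $0 \le P < 1$, sorting by odds gives the same order as sorting by probability. A tiny check with invented probabilities:

```python
def odds(p):
    """O = p / (1 - p); strictly increasing for 0 <= p < 1."""
    return p / (1 - p)

# Hypothetical P(R=1 | d, q) estimates.
probs = {"d1": 0.35, "d2": 0.05, "d3": 0.8, "d4": 0.5}
by_prob = sorted(probs, key=lambda d: probs[d], reverse=True)
by_odds = sorted(probs, key=lambda d: odds(probs[d]), reverse=True)
print(by_prob == by_odds)  # True
```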
Deriving a ranking function for query terms
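● Since each $x_t$ is either 0 or 1, we can separate the terms to give:
$O(R|\vec{x},q) = O(R|q) \cdot \prod_{t:\,x_t=1} \frac{P(x_t=1|R=1,q)}{P(x_t=1|R=0,q)} \cdot \prod_{t:\,x_t=0} \frac{P(x_t=0|R=1,q)}{P(x_t=0|R=0,q)}$
● Let $p_t = P(x_t=1|R=1,q)$ be the probability of a term appearing in a document relevant to the query
● Let $u_t = P(x_t=1|R=0,q)$ be the probability of a term appearing in a nonrelevant document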
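The separated form can be checked numerically: computing the odds term by term from $p_t = P(x_t=1|R=1,q)$ and $u_t = P(x_t=1|R=0,q)$ gives the same value as taking the ratio of the two Bayes-rule posteriors directly, where $P(\vec{x}|q)$ cancels. The probabilities are invented and the helper names are hypothetical.

```python
import math

def odds_separated(x, p, u, prior):
    """O(R|x,q) = O(R|q) * prod_{x_t=1} p_t/u_t * prod_{x_t=0} (1-p_t)/(1-u_t)."""
    result = prior / (1 - prior)  # O(R|q)
    for xt, pt, ut in zip(x, p, u):
        result *= pt / ut if xt == 1 else (1 - pt) / (1 - ut)
    return result

def odds_direct(x, p, u, prior):
    """Ratio P(R=1|x,q) / P(R=0|x,q) from Bayes rule; P(x|q) cancels."""
    like_rel = math.prod(pt if xt else 1 - pt for xt, pt in zip(x, p))
    like_non = math.prod(ut if xt else 1 - ut for xt, ut in zip(x, u))
    return (like_rel * prior) / (like_non * (1 - prior))

p = [0.8, 0.6, 0.3]   # hypothetical p_t
u = [0.2, 0.5, 0.3]   # hypothetical u_t
prior = 0.1           # hypothetical P(R=1|q)
x = (1, 1, 0)
print(round(odds_separated(x, p, u, prior), 4), round(odds_direct(x, p, u, prior), 4))
```

The third term has $p_t = u_t$, so its factor is 1 whether or not it occurs, which is why the BIM can ignore such terms.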
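Continuing the derivation as it is usually presented (e.g. Manning et al., Ch. 11), with $p_t = P(x_t=1|R=1,q)$, $u_t = P(x_t=1|R=0,q)$ and $q_t = 1$ when term t occurs in the query: if we also assume that $p_t = u_t$ for every term that does not occur in the query, those terms contribute a factor of 1. The remaining product can then be rearranged so that its right-hand factor runs over all query terms:

```latex
O(R|\vec{x},q) = O(R|q) \cdot \prod_{t:\,x_t=q_t=1} \frac{p_t}{u_t} \cdot \prod_{t:\,x_t=0,\,q_t=1} \frac{1-p_t}{1-u_t}

O(R|\vec{x},q) = O(R|q) \cdot \prod_{t:\,x_t=q_t=1} \frac{p_t(1-u_t)}{u_t(1-p_t)} \cdot \prod_{t:\,q_t=1} \frac{1-p_t}{1-u_t}
```

For a given query, $O(R|q)$ and the last product are the same for every document, so only the middle product matters for ranking.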
Retrieval Status Value (RSV)
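Taking the logarithm of the document-dependent part of the BIM odds gives the retrieval status value, $RSV_d = \sum_{t:\,x_t=q_t=1} c_t$ with $c_t = \log\frac{p_t(1-u_t)}{u_t(1-p_t)}$. When relevance judgments are available, $p_t$ and $u_t$ can be estimated from a contingency table over the N documents: of the S relevant documents, s contain term t, and $df_t$ documents contain t overall, so $p_t \approx s/S$ and $u_t \approx (df_t - s)/(N - S)$. The sketch below adds 1/2 to each of the four cell counts to avoid zeros (a common smoothing choice). The counts are invented for illustration, and `term_weight` and `rsv` are hypothetical helper names.

```python
import math

def term_weight(N, df, S, s):
    """c_t from the relevance contingency table, adding 1/2 to each of the four cells.

    N: documents in the collection, df: documents containing t,
    S: relevant documents, s: relevant documents containing t.
    """
    rel_with, rel_without = s + 0.5, (S - s) + 0.5
    non_with, non_without = (df - s) + 0.5, (N - df - S + s) + 0.5
    # c_t = log[p_t/(1-p_t)] + log[(1-u_t)/u_t], written as a ratio of odds.
    return math.log((rel_with / rel_without) / (non_with / non_without))

def rsv(doc_terms, query_terms, weights):
    """RSV_d: sum of c_t over terms that occur in both the query and the document."""
    return sum(weights[t] for t in set(query_terms) if t in doc_terms)

# Invented counts: N = 1000 documents, S = 10 of them judged relevant.
N, S = 1000, 10
weights = {
    "bayes": term_weight(N, df=20, S=S, s=8),   # mostly in relevant documents
    "model": term_weight(N, df=500, S=S, s=5),  # spread evenly: weight 0
}
score = rsv({"bayes", "model", "network"}, ["bayes", "model"], weights)
print(round(weights["bayes"], 3), weights["model"], round(score, 3))
```

A term that occurs in the same proportion of relevant and nonrelevant documents gets weight 0 and so does not affect the ranking.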
Probability Estimates in Practice
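When no relevance information is available, one common practice (described in Manning et al., Ch. 11) is to assume that relevant documents are a tiny fraction of the collection. Then $u_t \approx df_t/N$ and $\log\frac{1-u_t}{u_t} = \log\frac{N-df_t}{df_t} \approx \log\frac{N}{df_t}$, an idf weight. With the constant estimate $p_t = 0.5$ (Croft and Harper), the $p_t$ part of $c_t$ is $\log 1 = 0$. The sketch compares the two quantities for a few document frequencies in a hypothetical collection of one million documents.

```python
import math

def ct_no_relevance_info(N, df, p_t=0.5):
    """c_t = log(p_t/(1-p_t)) + log((1-u_t)/u_t), with u_t estimated as df/N."""
    u_t = df / N
    return math.log(p_t / (1 - p_t)) + math.log((1 - u_t) / u_t)

N = 1_000_000  # hypothetical collection size
for df in (10, 1_000, 100_000):
    c_t = ct_no_relevance_info(N, df)
    idf = math.log(N / df)
    print(f"df={df:>7}  c_t={c_t:.4f}  log(N/df)={idf:.4f}")
```

The approximation is very close for rare terms and drifts for common ones, where $df_t/N$ is no longer negligible.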
An appraisal and some extensions
Tree structured dependencies between terms
Okapi BM25, a non binary model
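Okapi BM25 goes beyond binary term incidence by also using term frequency $tf_{td}$ and document length $L_d$. A common form of the scoring function (the one given in Manning et al., Ch. 11) is $RSV_d = \sum_{t \in q} \log\frac{N}{df_t} \cdot \frac{(k_1+1)\,tf_{td}}{k_1\left((1-b) + b \cdot \frac{L_d}{L_{ave}}\right) + tf_{td}}$, with tuning parameters $k_1 \ge 0$ and $0 \le b \le 1$. The sketch uses $k_1 = 1.2$ and $b = 0.75$, typical values but an assumption here, a toy corpus, and a hypothetical function name.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25
    (idf = log(N/df_t), no query-term-frequency factor)."""
    N = len(docs)
    avg_len = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * ((1 - b) + b * len(d) / avg_len)  # length normalization
        score = 0.0
        for t in set(query):
            if tf[t] == 0:
                continue  # absent query terms contribute nothing
            idf = math.log(N / df[t])
            score += idf * (k1 + 1) * tf[t] / (norm + tf[t])
        scores.append(score)
    return scores

docs = [
    "probabilistic retrieval model".split(),
    "probabilistic ranking principle probabilistic retrieval".split(),
    "vector space model".split(),
]
scores = bm25_scores("probabilistic retrieval".split(), docs)
print([round(s, 3) for s in scores])
```

In this toy example the three-word first document scores slightly higher than the second, even though the second repeats "probabilistic": with b = 0.75 the longer document's term frequencies are discounted by length normalization. The third document scores 0 because it contains no query term.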
Bayesian network approaches to IR
Thank You
References
● http://nlp.stanford.edu/IR-book/html/htmledition/probabilistic-information-retrieval-1.html
● nlp.stanford.edu/IR-book/ppt/11prob.pptx
