The Economics in Interactive Information Retrieval

Leif Azzopardi
http://www.dcs.gla.ac.uk/~leif

[Figure: the three quantities considered throughout the talk – Interaction, Cost and Benefit]
Interactive and Iterative Search
A simplified, abstracted representation

[Figure: the user, driven by an information need, issues queries to the system; the system returns documents; the user extracts relevant information, and the cycle repeats]
Observational & Empirical vs. Theoretical & Formal

[Figure: models of search – ASK, Berry Picking and the IS&R Framework sit at the observational & empirical end; Information Foraging Theory (Pirolli, 1999) sits at the theoretical & formal end]
Theoretical & Formal: A Major Research Challenge

Interactive Information Retrieval needs formal models to:
   • describe, explain and predict the interaction of users with systems,
   • provide a basis on which to reason about interaction,
   • understand the relationships between interaction, performance and cost,
   • help guide the design, development and research of information systems, and
   • derive laws and principles of interaction.

Belkin (2008), Jarvelin (2011)
How do users behave? Why do users behave like this?

• User queries tend to be short (only 2-3 terms)
• Users will often pose a series of short queries
• Web searchers typically only examine the first page of results
• Patent searchers typically examine 100-200 documents per query (using a Boolean system)
• Patent searchers usually express longer and more complex queries
• Users adapt to degraded systems by issuing more queries
• Users rarely provide explicit relevance feedback
So why do users pose short queries?


User queries tend to be short




  But longer queries tend to be more effective!
So why do users pose short queries?

[Figure: Total Performance and Marginal Performance plotted against Query Length (No. of Terms, 0-30). Exponentially diminishing returns kick in after 2 query terms; around 2-3 terms is where the user gets the most bang for their buck. Azzopardi (2009)]
How can we use microeconomics to model the search process?

Microeconomics
• Production Theory
• Consumer Theory
• Utility Maximization
• Cost Minimization
Production Theory
a.k.a. Theory of Firms

[Figure: the firm takes inputs (capital, labor) and produces output (widgets); it utilizes a technology, which also constrains what can be produced. Varian (1987)]
Production Functions

Quantity = F(Capital, Labor)

[Figure, built up over several slides: isoquants plotted in capital-labor space, one curve per output level (Quantity 1, 2, 3). The production set is every input combination on or above an isoquant; the technology constrains the production set]
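As a concrete illustration (a hypothetical example, not one taken from the talk), a Cobb-Douglas form – the same family fitted to search later on – makes an isoquant explicit:

   F(Capital, Labor) = Capital^0.5 · Labor^0.5
   Isoquant for output level q:  Capital = q^2 / Labor

Every (Capital, Labor) combination on or above this curve produces at least q; that set of combinations is the production set described above.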
Applying Production Theory to
Interactive Information Retrieval
Interactive and Iterative Search
A simplified, abstracted representation

[Figure, repeated from earlier: information need → user → queries → system → documents returned → relevant information]
Search as Production

[Figure: the searcher-as-firm takes inputs (queries, assessments) and produces output (relevance gain); the search engine technology is utilized, and also constrains what can be produced]
Search Production Function

Gain = F(Q, A)

[Figure: gain isoquants (Gain = 10, 20, 30) plotted over No. of Assessments per Query (A) and No. of Queries (Q)]

The function represents how well a system could be used, i.e. the minimum input required to achieve that level of gain.
What strategies can the user employ when interacting with the search system to achieve their end goal?

• Few queries, lots of assessments?
• Lots of queries, few assessments?
• Or some other way?

What is the most cost-efficient way for a user to interact with an IR system?
Modeling Caveats of an economic model of the search process

• Abstracted
• Simplified
• Representative
What does the model
tell us about search & interaction?
Search Scenario
• Task: Find news articles about ….
• Goal: To find a number of relevant documents and reach
  the desired level of Cumulative Gain.
• Output: Total Cumulative Gain (G) across the session
• Inputs:
   – Y No. of Queries, and
   – X No. of Assessments per Query
• Collections:
   – TREC News Collections (AP, LA, Aquaint)
   – Each topic had about 30 or more relevant documents
• Simulation: built using C++ and the Lemur IR toolkit
Simulating User Interaction

[Figure: the simulation loop. Queries of length 3 are generated from the relevant set (TREC documents marked relevant) for each TREC Aquaint topic. The simulated user selects the best query first/next, issues Y queries, and assesses X documents per query; X & Y are recorded for each level of gain. Retrieval models: Probabilistic, Vector Space, Boolean]

The simulation assumes the user has perfect information, in order to find out how well the system could be used.
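A minimal sketch of this simulation loop (the original was built in C++ with the Lemur IR toolkit; Python is used here for brevity, and generate_queries, retrieve, is_relevant and topic.num_relevant are hypothetical stand-ins for that machinery):

def simulate_session(topic, depth, target_gain, generate_queries, retrieve, is_relevant):
    """Simulate an idealised searcher on one topic.

    depth       -- X, the number of documents assessed per query
    target_gain -- the desired level of normalised cumulative gain
    The simulated user has perfect information: queries are generated
    from the relevant set and issued best-first.
    """
    queries = generate_queries(topic, length=3)        # high-quality, 3-term queries
    gain, num_queries, seen = 0.0, 0, set()
    for query in queries:                               # best query first/next
        num_queries += 1
        for doc in retrieve(query)[:depth]:             # assess X documents per query
            if doc not in seen:
                seen.add(doc)
                if is_relevant(topic, doc):
                    gain += 1.0 / topic.num_relevant    # normalised cumulative gain
        if gain >= target_gain:
            break                                       # stop once the desired gain is reached
    return num_queries, depth, gain                     # Y, X and the gain achieved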
Search Production Curves
Same Retrieval Model, Different Gain (TREC Aquaint Collection)

[Figure: empirical production curves for BM25 at NCG = 0.2 and NCG = 0.4, plotted over No. of Assessments per Query (0-300) and No. of Queries (0-20)]

• 8 Q & 15 A/Q gets NCG = 0.4;  4 Q & 40 A/Q gets NCG = 0.4
• 7.7 Q & 5 A/Q gets NCG = 0.2;  3.6 Q & 15 A/Q gets NCG = 0.2
• To double the gain requires more than double the no. of assessments.
Search Production Curves
Different Retrieval Models, Same Gain (TREC Aquaint Collection)

[Figure: production curves for BM25, BOOL and TFIDF at NCG = 0.4, plotted over No. of Assessments per Query (0-300) and No. of Queries (0-20). No input combinations with depth less than the curve are technically feasible!]

• For the same gain, BOOL and TFIDF require a lot more interaction.
• BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF.
• User adaptation: BM25 reaches the gain with 5 Q @ 25 A/Q, BOOL with 10 Q @ 25 A/Q – more queries on the degraded systems.
Search Production Function
Cobb-Douglas Production Function

f(Q, A) = K · Q^α · A^(1-α)

where Q is the no. of queries issued, A is the no. of assessments per query, α is a mixing parameter determined by the technology, and K is the efficiency of the technology used.

Example values on Aquaint when NCG = 0.6:

   Model    K      α      Goodness of Fit
   BM25     5.39   0.58   0.995
   BOOL     3.47   0.58   0.992
   TFIDF    1.69   0.50   0.997
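As a small sanity check, the fitted function is trivial to compute; a Python sketch using the BM25 example values from the table above (the function name and the two example strategies are illustrative only):

def search_gain(Q, A, K=5.39, alpha=0.58):
    """Cobb-Douglas search production function: f(Q, A) = K * Q^alpha * A^(1-alpha)."""
    return K * (Q ** alpha) * (A ** (1 - alpha))

# Two strategies trading queries against assessment depth
# (absolute values depend on how gain is scaled in the fit):
print(search_gain(Q=10, A=20))   # many queries, shallow assessment
print(search_gain(Q=5, A=45))    # fewer queries, deeper assessment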
Using the Cobb-Douglas Search Function
We can differentiate the function to find the rates of change of the input variables.

Marginal Product of Querying: ∂f(Q, A)/∂Q
   – the change in gain over the change in querying
   – i.e. how much more gain do we get if we pose extra queries

Marginal Product of Assessing: ∂f(Q, A)/∂A
   – the change in gain over the change in assessing
   – i.e. how much more gain do we get if we assess extra documents
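For the Cobb-Douglas form f(Q, A) = K · Q^α · A^(1-α) above, these marginal products have a simple closed form (a routine derivation, not shown on the slide):

   ∂f/∂Q = α · K · Q^(α-1) · A^(1-α) = α · f(Q, A) / Q
   ∂f/∂A = (1-α) · K · Q^α · A^(-α) = (1-α) · f(Q, A) / A

So the extra gain from one more query is proportional to the average gain per query already obtained, and likewise for assessments; both marginal products shrink as the corresponding input grows, which is the diminishing returns seen earlier.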
Technical Rate of Substitution
How many more assessments per query are needed, if one less query was posed?

TRS(A, Q) = ∂A/∂Q

[Figure: TRS of Assessments for Queries along the BM25 NCG = 0.4 production curve (No. of Assessments per Query, 0-300, vs No. of Queries, 0-20); the substitution rate grows down the curve: 0.4, 1.2, 2.5, 4.2, 8.3]

At the point marked 1.2, if you gave up one query you'd need to assess 1.2 extra docs/query.

EXAMPLE: if 5 queries are submitted instead of 6, then 24.2 docs/query need to be assessed instead of 20 docs/query:
   6 Q @ 20 A/Q = 120 A
   5 Q @ 24.2 A/Q = 121 A
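With the Cobb-Douglas form the technical rate of substitution also reduces to a simple ratio (again a derivation from the fitted function, not something stated on the slide):

   TRS(A, Q) = (∂f/∂Q) / (∂f/∂A) = (α / (1-α)) · (A / Q)

That is, the deeper the user already assesses per query (large A) and the fewer queries they issue (small Q), the more extra assessments each forgone query costs – consistent with the rates growing from 0.4 to 8.3 down the curve above.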
What about the cost of
    interaction?
User Search Cost Function
A linear cost function:

c(Q, A) = β·Q + Q·A

where Q is the no. of queries issued, A is the no. of assessments per query (so Q·A is the total no. of documents assessed), and β is the relative cost of a query to an assessment.

What is the relative cost of a query?
Using cognitive costs of querying and assessing taken from Gwizdka (2010):
   • The average cost of querying was 2628 ms
   • The average cost of assessing was 2226 ms
   • So β was set to 2628/2226 = 1.1598
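The cost function is equally easy to compute; a one-line Python helper using the β value quoted on the slide (the function name is illustrative):

def search_cost(Q, A, beta=1.1598):
    """Linear search cost: c(Q, A) = beta*Q + Q*A, measured in assessment units."""
    return beta * Q + Q * A

print(search_cost(Q=6, A=20))   # e.g. 6 queries at 20 docs per query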
Cost Efficient Strategies
BM25, 0.4 and 0.6 Gains

[Figure: top – BM25 production curves at gains 0.4 and 0.6 (No. of Queries, 0-50, vs No. of Assessments per Query, 0-30); bottom – the corresponding cost curves (roughly 130-380), with the minimum cost marked]

On BM25, to increase gain, pose more queries but examine the same no. of docs per query.
Cost Efficient Strategies
BOOL, 0.4 & 0.6 Gains

[Figure: top – BOOL production curves at gains 0.4 and 0.6 (No. of Queries, 0-12, vs No. of Assessments per Query, 0-200); bottom – the corresponding cost curves (roughly 300-1500), with the minimum cost marked]

On Boolean, to increase gain, issue about the same no. of queries, but examine more docs per query.
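Putting the production and cost functions together, the minimum-cost strategy for a target gain can be found numerically. A rough sketch, assuming the Cobb-Douglas fit and the linear cost function above (the parameter values and grid are only illustrative):

def min_cost_strategy(target_gain, K, alpha, beta=1.1598):
    """Find the (Q, A) combination reaching target_gain at least cost.

    For a fixed Q, the Cobb-Douglas isoquant gives the depth needed:
        A = (target_gain / (K * Q**alpha)) ** (1 / (1 - alpha))
    so a simple grid search over Q suffices.
    """
    best = None
    for q_tenths in range(10, 501):                     # Q from 1.0 to 50.0 queries
        Q = q_tenths / 10.0
        A = (target_gain / (K * Q ** alpha)) ** (1 / (1 - alpha))
        cost = beta * Q + Q * A
        if best is None or cost < best[0]:
            best = (cost, Q, A)
    return best                                         # (minimum cost, Q*, A*)

# e.g. with BM25-style parameters from the earlier table (illustrative only):
print(min_cost_strategy(target_gain=30, K=5.39, alpha=0.58))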
Contrasting Systems

[Figure: the BM25 (0.4 and 0.6 gains) and BOOL (0.4 and 0.6 gains) production and cost curves shown side by side]

• On BM25, issue more queries but examine fewer docs per query.
• BM25 is less costly to use than BOOL.
A Hypothetical Experiment
What happens if…

• Querying costs go down?  →  more queries issued, a decrease in assessments per query
• Querying costs go up?  →  a decrease in queries issued, an increase in assessments per query
Changing the Relative Query Cost

c(Q, A) = β·Q + Q·A

[Figure: cost curves plotted over No. of Assessments per Query for increasing values of β]

As β increases, the relative cost of querying goes up: it is cheaper to assess more documents per query and consequently query less!
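Re-using the min_cost_strategy sketch from the cost-efficient strategies section, this hypothetical experiment takes a few lines: sweeping β and re-solving shows the optimum sliding towards fewer queries and more assessments per query as querying becomes relatively more expensive (parameter values again illustrative):

for beta in (0.5, 1.0, 2.0, 4.0):
    cost, Q, A = min_cost_strategy(target_gain=30, K=5.39, alpha=0.58, beta=beta)
    print(f"beta={beta}: Q*={Q:.1f}, A*={A:.1f}, cost={cost:.1f}")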
Implications for Design
• Knowing how benefit, interaction and cost
  relate can help guide how we design systems
  – We can theorize about how changes to the
    system will affect the user’s interaction
     • Is this desirable? Do we want the user to query more?
       Or for them to assess more?
  – We can categorize the type of user
     • Is this a savvy rational user? Or is this a user behaving
       irrationally?
   – We can scrutinize the introduction of new features
      • Are they going to be of any use? Are they worth it for
        the user? i.e. how much more performance must they
        deliver, or how little must they cost?
Future Directions
• Validate the theory by conducting
  observational & empirical research
  – Do the predictions about user behavior hold?
• Incorporate other inputs into the model
  – Find Similar, Relevance Feedback, Browsing,
  – Query length, Query Type, etc
• Develop more accurate cost functions
  – Obtain Better Estimates of Costs
• Model other search tasks
Questions

  Contact Details

Email: Leifos@acm.org

Skype: Leifos

Twitter: @leifos
Selected References
• Varian, H., Intermediate Microeconomics, 1987
• Varian, H., Economics and Search, ACM SIGIR Forum, 1999
• Pirolli, P., Information Foraging Theory, 1999
• Belkin, N., Some (what) grand challenges of Interactive Information Retrieval, ACM SIGIR Forum, 2008
• Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009
   – http://dl.acm.org/citation.cfm?doid=1571941.1572037
• Azzopardi, L., The Economics of Interactive Information Retrieval, ACM SIGIR 2011
   – http://dl.acm.org/citation.cfm?doid=2009916.2009923
• Jarvelin, K., IR Research: Systems, Interaction, Evaluation and Theories, ACM SIGIR Forum, 2011
Search Production Function
Example

G = F(X, Y)

[Figure: isoquants for G plotted over two generic interactions, Interaction X and Interaction Y]
Search Production Function
Example application for web search

P@10 = F(L, A)

[Figure: isoquants for P@10 = 0.1, 0.2 and 0.3 plotted over Length of Query (L) and No. of Assessments (A)]


Editor's Notes

  1. In this talk I will discuss how we can use microeconomics to describe how users interact with a retrieval system – essentially, I will model the benefit/gain/performance a user obtains from a system, the interactions which they perform, and the cost of these interactions.
  2. Note: related to this work is the work by Pirolli, Card and Ed Chi on Information Foraging Theory.
  3. Belkin (2008) outlined some of the challenges within IIR. Jarvelin (2011) also argued the need to understand information systems through the development of formal models and testable theories to describe the interaction between users and systems. It is a major research challenge because of all the complexities involved with users, their interactions with information and the systems that they employ.
  4. So this provides an economic justification for posing short queries..
  5. Microeconomics might give us the right tools to model IIR. We have built a formal model based on production theory from economics, which explains, predicts, etc.
  6. A firm produces output (such as goods or services). A firm requires inputs (such as capital and labor). A firm utilizes some form of technology to then transform the inputs into outputs.
  7. Technological Constraints – the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set – the set of all possible combinations of inputs that yield the desired output. Production Function – a set of points where the desired output is obtained for the minimum combination of inputs.
  8. Technological Constraints – the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set – the set of all possible combinations of inputs that yield the desired output. Production Function – a set of points where the desired output is obtained for the minimum combination of inputs.
  9. Technological Constraints – the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set – the set of all possible combinations of inputs that yield the desired output. Production Function – a set of points where the desired output is obtained for the minimum combination of inputs.
  10. Technological Constraints – the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set – the set of all possible combinations of inputs that yield the desired output. Production Function – a set of points where the desired output is obtained for the minimum combination of inputs.
  11. Inputs: the number of queries, the length of queries, the number of documents assessed per query, etc. Output: a number of relevant documents (or gain from the relevant information found). Technology used: a search engine.
  12. Technological Constraints – the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set – the set of all possible combinations of inputs that yield the desired output. Production Function – a set of points where the desired output is obtained for the minimum combination of inputs.
  13. And if we map a cost function to the interactions then we can ask, “what is the most cost-efficient way for a user to interact with an IR system?” What strategy should a user employ to achieve their goal?
  14. Yes, the model is abstracted and general. Search has many more inputs and outputs – lots more variables – but we have abstracted away these details. We have simplified the search process to two core variables that affect the output. But this doesn’t mean the model doesn’t have any explanatory power. It is representative, but not necessarily wholly realistic.
  15. Now that we have framed the search process as an economics problem, and we have an economic model that describes the output given the inputs and the technology, the big question is: WHAT CAN WE DO WITH IT? So to explore the application of this theory to IIR we perform an economic analysis of search.
  16. Example topics: Airbus subsidies by European governments; cases of insider trading; tropical storms where people were killed. Simulated interaction: i.e. to determine the minimum inputs for the desired output – and thus obtain the production function.
  17. Simulated interaction: i.e. to determine the minimum inputs for the desired output – and thus obtain the production function. To explore the range of possible user strategies, i.e. examine all the combinations of inputs. Queries of length 3 were generated for each topic given the relevant documents, i.e. to create high quality queries. Simulated interaction: a session was comprised of a series of queries and a given assessment depth; the session ended when the desired gain was achieved. A best-first approach was used to obtain an approximation for an empirical production function.
  18. The blue and purple lines converge because I stopped the simulation when ncg > 0.2, and not (ncg > 0.2 and < 0.4). So that is why they converge, i.e. by the time A = 200 and the same query is submitted, the gain is the same, > 0.2 and > 0.4. I really should have fixed the simulation and stopped when the gain was greater than 0.25, so that the production curve for 0.2 gain would stop at about A = 75.
  19. So far we have only examined empirical estimates of the production curves/functions. It would be good if we could fit a mathematical function to these curves to have a well defined model.
  20. So far we have empirically estimated the production function. However, it is common in economics to fit a functional form to the production function – so that we can mathematically describe the production process.
  21. So far, we have obtained a model which, given the inputs and a particular technology, estimates the total cumulative gain. However, given that we want to determine what strategy minimizes the user’s cost, we need to formulate a cost function to represent the cost of interaction, assuming that the cost of assessing one document is equal to one.
  22. Now that we have a way to frame the interaction between a user and a system when searching, we can hypothesise about the user’s behavior if variables or parameters in the model change. For example…
  23. Insert graph here from ECON-IIR paper.
  24. Technological Constraints – the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set – the set of all possible combinations of inputs that yield the desired output. Production Function – a set of points where the desired output is obtained for the minimum combination of inputs.
  25. Technological Constraints – the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set – the set of all possible combinations of inputs that yield the desired output. Production Function – a set of points where the desired output is obtained for the minimum combination of inputs.