SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
A    Crowdsourceable
         QoE Evaluation Framework for 
             Multimedia Content


                Kuan‐Ta Chen     Academia Sinica
                Chen‐Chi Wu      National Taiwan University
                Yu‐Chun Chang    National Taiwan University
                Chin‐Laung Lei   National Taiwan University 

ACM Multimedia 2009
What is QoE?

                        Quality of Experience = 

                        Users’ satisfaction 
                           about a service 
                    (e.g., multimedia content)


A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    2
Quality of Experience




               Poor                                                   Good
           (underexposed)                                          (exposure OK)

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    3
Challenges
        How to quantify the QoE of multimedia content 
        efficiently and reliably?


                                          Q=?                                       Q=?



                                          Q=?                                       Q=?

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen          4
Mean Opinion Score (MOS)
        Idea: Single Stimulus Method (SSM) + Absolute Categorial 
        Rating (ACR)
                            MOS        Quality                 Impairment
                             5      Excellent     Imperceptible
                             4      Good          Perceptible but not annoying
                             3      Fair          Slightly annoying
                             2      Poor          Annoying
                             1      Bad           Very annoying



                                                               Excellent?
                                                               Good?
                                                               Fair?                vote
                                                               Poor?
                                                               Bad?
                                                                                           Fair

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen                  5
Drawbacks of MOS‐based Evaluations
        ACR‐based
              Concepts of the scales cannot be concretely defined
              Dissimilar interpretations of the scale among users
              Only an ordinal scale, not an interval scale                          Solve all 
              Difficult to verify users’ scores                                     these 
                                                                                    drawbacks
        Subjective experiments in laboratory
              Monetary cost (reward, transportation)
              Labor cost (supervision)
              Physical space/time/hardware constraints



A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen             6
Drawbacks of MOS‐based Evaluations
        ACR‐based
              Concepts of the scales cannot be concretely defined                   Paired 
              Dissimilar interpretations of the scale among users                   Comparison
              Only an ordinal scale, not an interval scale
              Difficult to verify users’ scores


        Subjective experiments in laboratory
              Monetary cost (reward, transportation)
                                                                             Crowdsourcing
              Labor cost (supervision)
              Physical space/time/hardware constraints



A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen             7
Contribution




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    8
Talk Progress
                  Overview
                  Methodology
                        Paired Comparison
                        Crowdsourcing Support
                        Experiment Design
                  Case Study & Evaluation
                        Acoustic QoE
                        Optical QoE
                  Conclusion

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    9
Current Approach: MOS Rating


                                                                    Excellent?
                                                                    Good?
                                                                    Fair?
                                                                    Poor?           Vote
                                                                    Bad?                   ?




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen               10
Our Proposal: Paired Comparison




                       A                                                            B
                                                                          Which one 
                                                                          is better?

                                                     Vote



                                                      B
A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen        11
Properties of Paired Comparison
        Generalizable across different content types and 
        applications

        Simple comparative judgment
              dichotomous decision easier than 5‐category rating


        Interval‐scale QoE scores can be inferred

        The users’ inputs can be verified

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    12
Choice Frequency Matrix
    10 experiments, each containing C(4,2)=6 paired comparisons
                      A           B           C            D




A
                      0           9           10           9
B                     1           0            7           8
C                     0           3            0           6
D
                      1           2            4           0
Inference of QoE Scores
        Bradley‐Terry‐Luce (BTL) model
              input: choice frequency matrix
              output: an interval‐scale score for each content 
              (based on maximum likelihood estimation)
                n content: T1,…, Tn
                Pij : the probability of choosing Ti over Tj
                u(Ti) is the estimated QoE score of the quality level Ti
                                                        u (Ti ) − u (T j )
                                   π (Ti )          e
                         Pij =                   =
                               π (Ti ) + π (T j ) 1 + eu (T ) −u (T )
                                                                i            j




           Basic Idea
                 P12 = P23            u(T1) - u(T2) = u(T2) - u(T3)
A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    14
Inferred QoE Scores

                        0                   0.63                0.91                1




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen        15
Talk Progress
                  Overview
                  Methodology
                        Paired Comparison
                        Crowdsourcing Support
                        Experiment Design
                  Case Study & Evaluation
                        Acoustic QoE
                        Optical QoE
                  Conclusion

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    16
Crowdsourcing
                  = Crowd + Outsourcing
                   “soliciting solutions via open calls 
                       to large‐scale communities”




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    17
Image Understanding
        Reward: 0.04 USD

    main theme?
    key objects?
  unique attributes?




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    18
Linguistic Annotations
        Word similarity (Snow et al. 2008)




               USD 0.2 for labeling 30 word pairs

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    19
More Examples
        Document relevance evaluation
            Alonso et al. (2008)
        Document rating collection
            Kittur et al. (2008)
        Noun compound paraphrasing
            Nakov (2008)
        Person name resoluation
            Su et al. (2007)




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    20
The Risk

      Not every Internet user is trustworthy!
                      Users may give erroneous feedback perfunctorily, 
                      carelessly, or dishonestly
                      Dishonest users have more incentives to perform tasks




             Need to have an ONLINE algorithm to detect 
                        problematic inputs!

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    21
Verification of Users’ Inputs (1)
        Transitivity property
              If A > B and B > C               A should be > C


        Transitivity Satisfaction Rate (TSR)
                  # of triples satisfy the transitivity rule
               # of triples the transitivity rule may apply to




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    22
Verification of Users’ Inputs (2)
        Detect inconsistent judgments from problematic 
        users
              TSR = 1  perfect consistency
              TSR >= 0.8  generally consistent
              TSR < 0.8  judgments are inconsistent



                                           TSR‐based reward / punishment
                                           (e.g., only pay a reward if TSR > 0.8)



A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    23
Experiment Design
        For n algorithms (e.g., speech encoding)
           1.    a source content as the evaluation target 
           2.    apply the n algorithms to generate n content w/ different Q
                                       ⎛n⎞
           3.    ask a user to perform       paired comparisons
                                       ⎜ ⎟
                                       ⎜ 2⎟
                                       ⎝ ⎠
           4.    compute TSR after an experiment




                reward a user ONLY if his inputs are self‐consistent
                       (i.e., TSR is higher than a certain threshold)


A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    24
Concept Flow in Each Round




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    25
Audio QoE Evaluation
        Which one is better?




         (SPACE key released)                                  (SPACE key pressed)


A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen     26
Video QoE evaluation
        Which one is better?




         (SPACE key released)                                  (SPACE key pressed)


A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen     27
Talk Progress
                  Overview
                  Methodology
                        Paired Comparison
                        Crowdsourcing Support
                        Experiment Design
                  Case Study & Evaluation
                        Acoustic QoE
                        Optical QoE
                  Conclusion

A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    28
Audio QoE Evaluation
        MP3 compression level
              Source clips: one fast‐paced and one slow‐paced song
              MP3 CBR format with 6 bit rate levels: 32, 48, 64, 80, 96, and 
              128 Kbps
              127 participants and 3,660 paired comparisons


        Effect of packet loss rate on VoIP
              Two speech codecs: G722.1 and G728
              Packet loss rate: 0%, 4%, and 8%
              62 participants and 1,545 paired comparisons


A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    29
Inferred QoE Scores

       MP3 Compression Level                                   VoIP Packet Loss Rate




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen       30
Video QoE Evaluation
        Video codec
              Source clips: one fast‐paced and one slow‐paced video clip
              Three codecs: H.264, WMV3, and XVID
              Two bit rates: 400 and 800 Kbps
              121 participants and 3,345 paired comparisons


        Loss concealment scheme
              Source clips: one fast‐paced and one slow‐paced video clip
              Two schemes: Frame copy (FC) and FC with frame skip (FCFS) 
              Packet loss rate: 1%, 5%, and 8% 
              91 participants and 2,745 paired comparisons




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    31
Inferred QoE Scores
       Video Codec                                           Concealment Scheme




A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    32
Participant Source
        Laboratory
              Recruit part‐time workers at an hourly rate of 8 USD
        MTurk
              Post experiments on the Mechanical Turk web site
              Pay the participant 0.15 USD for each qualified experiment
        Community
              Seek participants on the website of an Internet community 
              with 1.5 million members
              Pay the participant an amount of virtual currency that was 
              equivalent to one US cent for each qualified experiment


A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen    33
Participant Source Evaluation
With crowdsourcing…
  lower monetary cost
  wider participant diversity 
  maintaining the evaluation results’ quality




   Crowdsourcing seems a good strategy for 
        multimedia QoE assessment!
http://mmnet.iis.sinica.edu.tw/link/qoe
Conclusion
Crowdsourcing is not without limitations
   physical contact
   environment control
   media


With paired comparison and user input verification, 
   less monetary cost
   wider participant diversity
   shorter experiment cycle 
   evaluation quality maintained
May the crowd Force
                     with you!

                      Thank You!
                          Kuan‐Ta Chen
                         Academia Sinica


ACM Multimedia 2009

Weitere ähnliche Inhalte

Ähnlich wie A Crowdsourceable QoE Evaluation Framework for Multimedia Content

Qo E E2 E5 User Centric Approach Katrien De Moor
Qo E E2 E5   User Centric Approach   Katrien De MoorQo E E2 E5   User Centric Approach   Katrien De Moor
Qo E E2 E5 User Centric Approach Katrien De Moor
imec.archive
 
Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...
Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...
Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...
Yuwei Lin
 

Ähnlich wie A Crowdsourceable QoE Evaluation Framework for Multimedia Content (10)

Quality of Multimedia Experience: Past, Present and Future
Quality of Multimedia Experience: Past, Present and FutureQuality of Multimedia Experience: Past, Present and Future
Quality of Multimedia Experience: Past, Present and Future
 
Serge astm-presentation-chicago-2014-final
Serge astm-presentation-chicago-2014-finalSerge astm-presentation-chicago-2014-final
Serge astm-presentation-chicago-2014-final
 
Qo E E2 E5 User Centric Approach Katrien De Moor
Qo E E2 E5   User Centric Approach   Katrien De MoorQo E E2 E5   User Centric Approach   Katrien De Moor
Qo E E2 E5 User Centric Approach Katrien De Moor
 
Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...
Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...
Some Methodological Thoughts on Using Text Mining for Frame Analysis of Media...
 
PDK Slide Share
PDK Slide SharePDK Slide Share
PDK Slide Share
 
LAK13 linkedup tutorial_evaluation_framework
LAK13 linkedup tutorial_evaluation_frameworkLAK13 linkedup tutorial_evaluation_framework
LAK13 linkedup tutorial_evaluation_framework
 
Assessing the Quality of Opinion Retrieval Systems
Assessing the Quality of Opinion Retrieval SystemsAssessing the Quality of Opinion Retrieval Systems
Assessing the Quality of Opinion Retrieval Systems
 
Trusted networks for Open Education - Online Educa 20121128
Trusted networks for Open Education - Online Educa 20121128Trusted networks for Open Education - Online Educa 20121128
Trusted networks for Open Education - Online Educa 20121128
 
Quality of Experience: Measuring Quality from the End-User Perspective
Quality of Experience: Measuring Quality from the End-User PerspectiveQuality of Experience: Measuring Quality from the End-User Perspective
Quality of Experience: Measuring Quality from the End-User Perspective
 
NCTM Kanold Chicago
NCTM Kanold ChicagoNCTM Kanold Chicago
NCTM Kanold Chicago
 

Mehr von Academia Sinica

量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值
量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值
量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值
Academia Sinica
 
GamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming SystemGamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming System
Academia Sinica
 
Identifying MMORPG Bots: A Traffic Analysis Approach
Identifying MMORPG Bots: A Traffic Analysis ApproachIdentifying MMORPG Bots: A Traffic Analysis Approach
Identifying MMORPG Bots: A Traffic Analysis Approach
Academia Sinica
 
Improving Reliability of Web 2.0-based Rating Systems Using Per-user Trustiness
Improving Reliability of Web 2.0-based Rating Systems Using Per-user TrustinessImproving Reliability of Web 2.0-based Rating Systems Using Per-user Trustiness
Improving Reliability of Web 2.0-based Rating Systems Using Per-user Trustiness
Academia Sinica
 
A Collusion-Resistant Automation Scheme for Social Moderation Systems
A Collusion-Resistant Automation Scheme for Social Moderation SystemsA Collusion-Resistant Automation Scheme for Social Moderation Systems
A Collusion-Resistant Automation Scheme for Social Moderation Systems
Academia Sinica
 

Mehr von Academia Sinica (20)

Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
 
Games on Demand: Are We There Yet?
Games on Demand: Are We There Yet?Games on Demand: Are We There Yet?
Games on Demand: Are We There Yet?
 
Detecting In-Situ Identity Fraud on Social Network Services: A Case Study on ...
Detecting In-Situ Identity Fraud on Social Network Services: A Case Study on ...Detecting In-Situ Identity Fraud on Social Network Services: A Case Study on ...
Detecting In-Situ Identity Fraud on Social Network Services: A Case Study on ...
 
Cloud Gaming Onward: Research Opportunities and Outlook
Cloud Gaming Onward: Research Opportunities and OutlookCloud Gaming Onward: Research Opportunities and Outlook
Cloud Gaming Onward: Research Opportunities and Outlook
 
Quantifying User Satisfaction in Mobile Cloud Games
Quantifying User Satisfaction in Mobile Cloud GamesQuantifying User Satisfaction in Mobile Cloud Games
Quantifying User Satisfaction in Mobile Cloud Games
 
量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值
量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值
量化「樂趣」-以心理生理量測探究數位娛樂商品之市場價值
 
On The Battle between Online Gamers and Lags
On The Battle between Online Gamers and LagsOn The Battle between Online Gamers and Lags
On The Battle between Online Gamers and Lags
 
Understanding The Performance of Thin-Client Gaming
Understanding The Performance of Thin-Client GamingUnderstanding The Performance of Thin-Client Gaming
Understanding The Performance of Thin-Client Gaming
 
Quantifying QoS Requirements of Network Services: A Cheat-Proof Framework
Quantifying QoS Requirements of Network Services: A Cheat-Proof FrameworkQuantifying QoS Requirements of Network Services: A Cheat-Proof Framework
Quantifying QoS Requirements of Network Services: A Cheat-Proof Framework
 
Online Game QoE Evaluation using Paired Comparisons
Online Game QoE Evaluation using Paired ComparisonsOnline Game QoE Evaluation using Paired Comparisons
Online Game QoE Evaluation using Paired Comparisons
 
GamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming SystemGamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming System
 
Are All Games Equally Cloud-Gaming-Friendly? An Electromyographic Approach
Are All Games Equally Cloud-Gaming-Friendly? An Electromyographic ApproachAre All Games Equally Cloud-Gaming-Friendly? An Electromyographic Approach
Are All Games Equally Cloud-Gaming-Friendly? An Electromyographic Approach
 
Forecasting Online Game Addictiveness
Forecasting Online Game AddictivenessForecasting Online Game Addictiveness
Forecasting Online Game Addictiveness
 
Identifying MMORPG Bots: A Traffic Analysis Approach
Identifying MMORPG Bots: A Traffic Analysis ApproachIdentifying MMORPG Bots: A Traffic Analysis Approach
Identifying MMORPG Bots: A Traffic Analysis Approach
 
Toward an Understanding of the Processing Delay of Peer-to-Peer Relay Nodes
Toward an Understanding of the Processing Delay of Peer-to-Peer Relay NodesToward an Understanding of the Processing Delay of Peer-to-Peer Relay Nodes
Toward an Understanding of the Processing Delay of Peer-to-Peer Relay Nodes
 
Inferring Speech Activity from Encrypted Skype Traffic
Inferring Speech Activity from Encrypted Skype TrafficInferring Speech Activity from Encrypted Skype Traffic
Inferring Speech Activity from Encrypted Skype Traffic
 
Game Bot Detection Based on Avatar Trajectory
Game Bot Detection Based on Avatar TrajectoryGame Bot Detection Based on Avatar Trajectory
Game Bot Detection Based on Avatar Trajectory
 
Improving Reliability of Web 2.0-based Rating Systems Using Per-user Trustiness
Improving Reliability of Web 2.0-based Rating Systems Using Per-user TrustinessImproving Reliability of Web 2.0-based Rating Systems Using Per-user Trustiness
Improving Reliability of Web 2.0-based Rating Systems Using Per-user Trustiness
 
A Collusion-Resistant Automation Scheme for Social Moderation Systems
A Collusion-Resistant Automation Scheme for Social Moderation SystemsA Collusion-Resistant Automation Scheme for Social Moderation Systems
A Collusion-Resistant Automation Scheme for Social Moderation Systems
 
Tuning Skype’s Redundancy Control Algorithm for User Satisfaction
Tuning Skype’s Redundancy Control Algorithm for User SatisfactionTuning Skype’s Redundancy Control Algorithm for User Satisfaction
Tuning Skype’s Redundancy Control Algorithm for User Satisfaction
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

A Crowdsourceable QoE Evaluation Framework for Multimedia Content

  • 1. A Crowdsourceable QoE Evaluation Framework for  Multimedia Content Kuan‐Ta Chen Academia Sinica Chen‐Chi Wu National Taiwan University Yu‐Chun Chang National Taiwan University Chin‐Laung Lei National Taiwan University  ACM Multimedia 2009
  • 2. What is QoE? Quality of Experience =  Users’ satisfaction  about a service  (e.g., multimedia content) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  2
  • 3. Quality of Experience Poor Good (underexposed) (exposure OK) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  3
  • 4. Challenges How to quantify the QoE of multimedia content  efficiently and reliably? Q=? Q=? Q=? Q=? A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  4
  • 5. Mean Opinion Score (MOS) Idea: Single Stimulus Method (SSM) + Absolute Categorial  Rating (ACR) MOS Quality Impairment 5 Excellent Imperceptible 4 Good Perceptible but not annoying 3 Fair Slightly annoying 2 Poor Annoying 1 Bad Very annoying Excellent? Good? Fair? vote Poor? Bad? Fair A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  5
  • 6. Drawbacks of MOS‐based Evaluations ACR‐based Concepts of the scales cannot be concretely defined Dissimilar interpretations of the scale among users Only an ordinal scale, not an interval scale Solve all  Difficult to verify users’ scores these  drawbacks Subjective experiments in laboratory Monetary cost (reward, transportation) Labor cost (supervision) Physical space/time/hardware constraints A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  6
  • 7. Drawbacks of MOS‐based Evaluations ACR‐based Concepts of the scales cannot be concretely defined Paired  Dissimilar interpretations of the scale among users Comparison Only an ordinal scale, not an interval scale Difficult to verify users’ scores Subjective experiments in laboratory Monetary cost (reward, transportation) Crowdsourcing Labor cost (supervision) Physical space/time/hardware constraints A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  7
  • 9. Talk Progress Overview Methodology Paired Comparison Crowdsourcing Support Experiment Design Case Study & Evaluation Acoustic QoE Optical QoE Conclusion A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  9
  • 10. Current Approach: MOS Rating Excellent? Good? Fair? Poor? Vote Bad? ? A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  10
  • 11. Our Proposal: Paired Comparison A B Which one  is better? Vote B A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  11
  • 12. Properties of Paired Comparison Generalizable across different content types and  applications Simple comparative judgment dichotomous decision easier than 5‐category rating Interval‐scale QoE scores can be inferred The users’ inputs can be verified A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  12
  • 13. Choice Frequency Matrix 10 experiments, each containing C(4,2)=6 paired comparisons A B C D A 0 9 10 9 B 1 0 7 8 C 0 3 0 6 D 1 2 4 0
  • 14. Inference of QoE Scores Bradley‐Terry‐Luce (BTL) model input: choice frequency matrix output: an interval‐scale score for each content  (based on maximum likelihood estimation) n content: T1,…, Tn Pij : the probability of choosing Ti over Tj u(Ti) is the estimated QoE score of the quality level Ti u (Ti ) − u (T j ) π (Ti ) e Pij = = π (Ti ) + π (T j ) 1 + eu (T ) −u (T ) i j Basic Idea P12 = P23 u(T1) - u(T2) = u(T2) - u(T3) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  14
  • 15. Inferred QoE Scores 0 0.63 0.91 1 A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  15
  • 16. Talk Progress Overview Methodology Paired Comparison Crowdsourcing Support Experiment Design Case Study & Evaluation Acoustic QoE Optical QoE Conclusion A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  16
  • 17. Crowdsourcing = Crowd + Outsourcing “soliciting solutions via open calls  to large‐scale communities” A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  17
  • 18. Image Understanding Reward: 0.04 USD main theme? key objects? unique attributes? A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  18
  • 19. Linguistic Annotations Word similarity (Snow et al. 2008) USD 0.2 for labeling 30 word pairs A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  19
  • 20. More Examples Document relevance evaluation Alonso et al. (2008) Document rating collection Kittur et al. (2008) Noun compound paraphrasing Nakov (2008) Person name resoluation Su et al. (2007) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  20
  • 21. The Risk Not every Internet user is trustworthy! Users may give erroneous feedback perfunctorily,  carelessly, or dishonestly Dishonest users have more incentives to perform tasks Need to have an ONLINE algorithm to detect  problematic inputs! A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  21
  • 22. Verification of Users’ Inputs (1) Transitivity property If A > B and B > C  A should be > C Transitivity Satisfaction Rate (TSR) # of triples satisfy the transitivity rule # of triples the transitivity rule may apply to A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  22
  • 23. Verification of Users’ Inputs (2) Detect inconsistent judgments from problematic  users TSR = 1  perfect consistency TSR >= 0.8  generally consistent TSR < 0.8  judgments are inconsistent TSR‐based reward / punishment (e.g., only pay a reward if TSR > 0.8) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  23
  • 24. Experiment Design For n algorithms (e.g., speech encoding) 1. a source content as the evaluation target  2. apply the n algorithms to generate n content w/ different Q ⎛n⎞ 3. ask a user to perform       paired comparisons ⎜ ⎟ ⎜ 2⎟ ⎝ ⎠ 4. compute TSR after an experiment reward a user ONLY if his inputs are self‐consistent (i.e., TSR is higher than a certain threshold) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  24
  • 26. Audio QoE Evaluation Which one is better? (SPACE key released) (SPACE key pressed) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  26
  • 27. Video QoE evaluation Which one is better? (SPACE key released) (SPACE key pressed) A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  27
  • 28. Talk Progress Overview Methodology Paired Comparison Crowdsourcing Support Experiment Design Case Study & Evaluation Acoustic QoE Optical QoE Conclusion A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  28
  • 29. Audio QoE Evaluation MP3 compression level Source clips: one fast‐paced and one slow‐paced song MP3 CBR format with 6 bit rate levels: 32, 48, 64, 80, 96, and  128 Kbps 127 participants and 3,660 paired comparisons Effect of packet loss rate on VoIP Two speech codecs: G722.1 and G728 Packet loss rate: 0%, 4%, and 8% 62 participants and 1,545 paired comparisons A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  29
  • 30. Inferred QoE Scores MP3 Compression Level VoIP Packet Loss Rate A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  30
  • 31. Video QoE Evaluation Video codec Source clips: one fast‐paced and one slow‐paced video clip Three codecs: H.264, WMV3, and XVID Two bit rates: 400 and 800 Kbps 121 participants and 3,345 paired comparisons Loss concealment scheme Source clips: one fast‐paced and one slow‐paced video clip Two schemes: Frame copy (FC) and FC with frame skip (FCFS)  Packet loss rate: 1%, 5%, and 8%  91 participants and 2,745 paired comparisons A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  31
  • 32. Inferred QoE Scores Video Codec Concealment Scheme A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  32
  • 33. Participant Source Laboratory Recruit part‐time workers at an hourly rate of 8 USD MTurk Post experiments on the Mechanical Turk web site Pay the participant 0.15 USD for each qualified experiment Community Seek participants on the website of an Internet community  with 1.5 million members Pay the participant an amount of virtual currency that was  equivalent to one US cent for each qualified experiment A Crowdsourceable QoE Evaluation Framework for Multimedia Content / Kuan‐Ta Chen  33
  • 34. Participant Source Evaluation With crowdsourcing… lower monetary cost wider participant diversity  maintaining the evaluation results’ quality Crowdsourcing seems a good strategy for  multimedia QoE assessment!
  • 36. Conclusion Crowdsourcing is not without limitations physical contact environment control media With paired comparison and user input verification,  less monetary cost wider participant diversity shorter experiment cycle  evaluation quality maintained
  • 37. May the crowd Force with you! Thank You! Kuan‐Ta Chen Academia Sinica ACM Multimedia 2009