Improving Low Quality StackOverflow Post Detection
Luca Ponzanelli, Andrea Mocci, Michele Lanza (University of Lugano, Switzerland)
Alberto Bacchelli (Delft University of Technology, Netherlands)
David Fullerton (StackExchange Inc., New York, USA)
StackOverflow
[Diagram: questions and answers]
6,000+ daily questions
StackOverflow Review Process
[Diagram: questions (Q), Moderator, System]
StackOverflow Review Process
Suggested Edits
Late Answers and First Posts
Low Quality Posts
StackOverflow Review Process: Low Quality Posts
Identified by the system
An inefficient approach increases the review queue size
An efficient approach saves reviewers time
Refine the review queue to remove misclassified posts
StackOverflow Quality Metrics: Pure Textual Metrics
Body Length
Capital Title
Emails Count
Lowercase Percentage
Spaces Count
Tags Count
Text Speak Count
Title Body Similarity
Title Length
Uppercase Percentage
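A few of the textual metrics listed above could be computed roughly as in the sketch below. The slides only name the metrics and do not define them, so every formula here is an assumption (for instance, the case percentages are taken as fractions of alphabetic characters in the body), and `textual_metrics` is a hypothetical helper, not the authors' implementation.

```python
def textual_metrics(title: str, body: str, tags: list[str]) -> dict[str, float]:
    """Compute a handful of simple textual metrics under assumed definitions."""
    letters = [ch for ch in body if ch.isalpha()]
    n_letters = len(letters) or 1  # avoid division by zero on bodies without letters
    return {
        "body_length": len(body),            # characters in the body
        "title_length": len(title),          # characters in the title
        "tags_count": len(tags),
        "spaces_count": body.count(" "),
        # Fractions in [0, 1] rather than 0-100 percentages (assumption).
        "uppercase_percentage": sum(ch.isupper() for ch in letters) / n_letters,
        "lowercase_percentage": sum(ch.islower() for ch in letters) / n_letters,
        # 1.0 if the title starts with a capital letter, else 0.0 (assumption).
        "capital_title": float(title[:1].isupper()),
    }


example = textual_metrics("How to sort?", "PLEASE help me", ["python"])
```

Metrics such as Text Speak Count or Title Body Similarity would need a word list or a similarity measure that the slides do not specify, so they are left out.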
StackOverflow Quality Metrics
Textual Metrics
Readability Metrics
Popularity Metrics
Readability Metrics
Average Term Entropy
Automated Readability Index
Coleman-Liau Index
Flesch-Kincaid Grade Level
Flesch Reading Ease Score
Gunning Fog Index
LOC Percentage
Metric Entropy
Sentences Count
SMOG Grade
Words Count
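Two of the readability indices listed above have standard closed-form definitions that need no syllable counting, so they make a convenient illustration. The sketch below uses the published formulas for the Automated Readability Index and the Coleman-Liau Index; the tokenization (regular expressions for words and sentences) is a simplifying assumption and `readability` is a hypothetical helper.

```python
import re


def readability(text: str) -> dict[str, float]:
    """Word/sentence counts plus two standard readability indices."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)          # guard against empty input
    n_sentences = max(len(sentences), 1)
    n_chars = sum(ch.isalnum() for w in words for ch in w)    # letters and digits
    n_letters = sum(ch.isalpha() for w in words for ch in w)  # letters only

    # Automated Readability Index.
    ari = 4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43

    # Coleman-Liau Index: letters and sentences per 100 words.
    letters_per_100 = 100 * n_letters / n_words
    sentences_per_100 = 100 * n_sentences / n_words
    cli = 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

    return {
        "words_count": len(words),
        "sentences_count": len(sentences),
        "automated_readability_index": ari,
        "coleman_liau_index": cli,
    }


scores = readability("The cat sat. The dog ran.")
```

Real posts mix prose with code, so a practical version would probably strip code blocks before scoring; how the authors handled this is not stated on the slides.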
Popularity Metrics
Accepted by Originator Votes
Approved Edit Suggestion
Answer Badges Count
Badges-Tags Coverage
Bounty Start (End) Votes
Close Votes
Deletion Votes
Down Votes
Favorite Votes
Moderator Review Votes
Offensive Votes
Reopen Votes
Question Badges Count
Spam Votes
Total Badges
Undeletion Votes
Up Votes
Classification Approach
StackOverflow Public Dump: 5,648,975 questions (September 2013)
Classification Approach
StackOverflow Public Dump, four quality classes:
Very Good (A): neither closed nor deleted, with an accepted answer, score > 7
Good (B): neither closed nor deleted, with an accepted answer, 1 < score < 6
Bad (C): neither closed nor deleted, with an accepted answer, score < 0
Very Bad (D): closed or deleted
L. Ponzanelli, A. Mocci, A. Bacchelli, M. Lanza
Understanding and Classifying the Quality of Technical Forum Questions
In Proceedings of QSIC 2014 (14th International Conference on Quality Software)
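The labeling criteria stated on the slides can be written as a small function. This is a sketch that follows the slide text literally: the argument names are hypothetical, and questions that match none of the four stated rules (for example a score of 0, or no accepted answer) get `None`, because the slides do not say how such posts are treated.

```python
from typing import Optional


def label_question(score: int, closed_or_deleted: bool,
                   has_accepted_answer: bool) -> Optional[str]:
    """Return 'A', 'B', 'C', 'D', or None if no stated rule applies."""
    if closed_or_deleted:
        return "D"   # Very Bad: closed or deleted
    if not has_accepted_answer:
        return None  # classes A-C all require an accepted answer on the slides
    if score > 7:
        return "A"   # Very Good
    if 1 < score < 6:
        return "B"   # Good
    if score < 0:
        return "C"   # Bad
    return None      # scores not covered by the slides (0, 1, 6, 7)


labels = [label_question(10, False, True), label_question(-2, False, True)]
```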
Classification Function
Genetic Algorithm
$QF = \sum_{i=1}^{n} w_i \cdot m_i$, with $w_i \in [-1, 1]$ and $m_i \in [0, 1]$
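The quality function is a weighted sum of normalized metric values. A minimal sketch follows; on the slides the weights come from a genetic algorithm, whereas the values used here are made up purely for illustration, and `quality_function` is a hypothetical name.

```python
def quality_function(weights: list[float], metrics: list[float]) -> float:
    """QF = sum_i w_i * m_i, with w_i in [-1, 1] and m_i in [0, 1]."""
    if len(weights) != len(metrics):
        raise ValueError("weights and metrics must have the same length")
    if any(not -1.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight must lie in [-1, 1]")
    if any(not 0.0 <= m <= 1.0 for m in metrics):
        raise ValueError("each metric must be normalized to [0, 1]")
    return sum(w * m for w, m in zip(weights, metrics))


# Illustrative values only: two metrics, one rewarded and one penalized.
score = quality_function([0.5, -0.25], [1.0, 1.0])
```

A positive weight lets a metric push the score toward "good", and a negative weight lets it push toward "bad".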
Classification Function
[Diagram: Data, Metrics]
The function assigns a positive value if the post is good and a negative value if it is bad
Classification Function
[Figure: frequency y = freq(x) of quality scores x = QF(post) over the range -1 to 1, divided into classes D, C, B, A by a lower and an upper quantile; successive builds show tails of 25% (q = 0.25), 10%, and 40%]
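One plausible reading of the quantile figure is that, for a chosen q, the posts in the lowest q share of QF scores are treated as class D and those in the highest q share as class A. The sketch below implements only that reading; it is an assumption, not a documented procedure, and `tail_sets` and the post identifiers are hypothetical.

```python
def tail_sets(scores: dict[str, float], q: float) -> tuple[set[str], set[str]]:
    """Return (lowest-q share, highest-q share) of post ids ranked by QF score."""
    if not 0.0 <= q <= 0.5:
        raise ValueError("q must lie in [0, 0.5] so the tails do not overlap")
    ranked = sorted(scores, key=scores.get)  # ascending by score
    k = int(len(ranked) * q)                 # number of posts in each tail
    predicted_d = set(ranked[:k])
    predicted_a = set(ranked[len(ranked) - k:])
    return predicted_d, predicted_a


# Eight made-up posts with scores -1.0, -0.75, ..., 0.75.
qf_scores = {f"p{i}": (i - 4) / 4 for i in range(8)}
predicted_d, predicted_a = tail_sets(qf_scores, 0.25)
```

Raising q widens both tails, so more posts are confidently labeled at the cost of more mistakes near the middle of the distribution.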
Review Queue Refinement
StackOverflow Public Dump
StackOverflow Private Dump
Low Quality Post
Review Queue Refinement
[Diagram: three refinement operations on the Review Queue (RQ), illustrated with posts labeled D, C, B, A:
removing from RQ the posts classified as A at q = 0.25;
intersecting RQ with the posts classified as D at q = 0.1 (RQ ∩ D);
removing from RQ the union (∪) of two sets of posts classified as A, one at q = 0.25 and one at q = 0.1]
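The refinement operations in the diagrams map directly onto set operations. The sketch below assumes each classifier output is available as a set of post ids; all identifiers and function names are hypothetical.

```python
def remove_predicted_good(rq: set[str], *predicted_a: set[str]) -> set[str]:
    """RQ minus the union of one or more sets of posts predicted as class A."""
    good = set().union(*predicted_a)
    return rq - good


def keep_predicted_bad(rq: set[str], predicted_d: set[str]) -> set[str]:
    """RQ intersected with the posts predicted as class D."""
    return rq & predicted_d


review_queue = {"p1", "p2", "p3", "p4", "p5"}
a_model_1 = {"p4", "p9"}  # predicted A by one model (made-up data)
a_model_2 = {"p5"}        # predicted A by another model (made-up data)
d_model = {"p1", "p2", "p8"}

refined_by_removal = remove_predicted_good(review_queue, a_model_1, a_model_2)
refined_by_intersection = keep_predicted_bad(review_queue, d_model)
```

Removal is the conservative option, since a post leaves the queue only when a model is confident it is good; intersection is more aggressive, since a post stays only when a model is confident it is bad.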
Review Queue Refinement
Hard Precision (HP): the percentage of posts in the review queue belonging to class D
Soft Precision (SP): the percentage of posts in the review queue belonging to class D or C
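Both precision measures can be computed from the ground-truth class labels of the posts in a queue. The following is a minimal sketch; `precisions` is a hypothetical helper and the example queue is made up.

```python
def precisions(queue_labels: list[str]) -> tuple[float, float]:
    """Return (hard precision, soft precision) in percent for a review queue."""
    n = len(queue_labels)
    if n == 0:
        return 0.0, 0.0  # convention chosen here for an empty queue
    hard = sum(label == "D" for label in queue_labels) / n
    soft = sum(label in ("C", "D") for label in queue_labels) / n
    return 100 * hard, 100 * soft


hp, sp = precisions(["D", "D", "C", "B", "A", "A", "A", "A"])
```

Soft precision can never be lower than hard precision, because every class D post also counts toward it.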
Review Queue Refinement
Without refinement: Review Queue (RQ) size 3,416, Hard Precision (HP) 41.90%, Soft Precision (SP) 64.31%
Without refinement: RQ size 3,416, HP 41.90%, SP 64.31%

Model                            RQ Size  RQ Reduction  HP      SP      A Red.  B Red.  C Red.  D Red.
RQ \ A(MP, 0.25)                 3,108    9.02%         45.30%  68.95%  27.19%  19.37%  3.80%   1.74%
RQ \ (A(MP,0.25) ∪ A(MSO,0.05))  2,650    22.42%        49.81%  74.87%  53.07%  43.59%  13.09%  7.89%
RQ \ (A(MP,0.25) ∪ A(MSO,0.05))  2,529    25.97%        51.60%  74.02%  53.51%  44.40%  25.79%  8.93%
RQ \ A(MP, 0.33)                 2,552    25.29%        50.90%  76.65%  60.96%  48.84%  14.01%  9.35%
RQ \ A(MR, 0.33)                 2,505    26.67%        51.62%  73.81%  50.88%  45.11%  27.23%  9.77%
RQ \ A(MR, 0.40)                 2,300    32.67%        54.91%  77.91%  65.79%  56.61%  30.76%  11.86%
RQ \ A(MP, 0.40)                 2,421    29.13%        51.92%  78.40%  67.54%  54.69%  16.10%  12.28%
RQ ∩ D(MP, 0.40)                 2,244    34.31%        52.90%  79.63%  71.93%  60.34%  21.47%  17.17%
RQ ∩ D(MR, 0.40)                 1,912    44.03%        60.67%  85.15%  85.53%  74.67%  38.74%  19.05%
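The RQ Reduction column appears to be the relative shrinkage with respect to the unrefined queue of 3,416 posts. The short check below recomputes it for a few rows of the table under that assumption; `rq_reduction` is a hypothetical helper.

```python
BASELINE_RQ_SIZE = 3416  # review queue size without refinement, from the slides


def rq_reduction(refined_size: int, baseline: int = BASELINE_RQ_SIZE) -> float:
    """Percentage of the baseline queue removed by a refinement."""
    return 100 * (baseline - refined_size) / baseline


# Refined queue sizes taken from the table.
for size in (3108, 2650, 2300, 1912):
    print(f"{size}: {rq_reduction(size):.2f}%")
```

Read alongside the D Red. column, the same rows show the cost of each refinement: the strongest queue reduction (44.03%) also removes 19.05% of the class D posts that reviewers should have seen.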
Lessons Learned
Readability and Popularity Metrics are the most effective for queue refinement
Tradeoff between review queue reduction and bad post reduction

Java Programming :Event Handling(Types of Events)
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 

Improving Low Quality Stack Overflow Post Detection

  • 1. Improving Low Quality StackOverflow Post Detection. Luca Ponzanelli, Andrea Mocci, Michele Lanza (University of Lugano, Switzerland); Alberto Bacchelli (Delft University of Technology, Netherlands); David Fullerton (StackExchange Inc., New York, USA)
  • 2. Answer Answer Question StackOverflow
  • 4. Answer Answer Question 6,000+ daily questions StackOverflow
  • 5. StackOverflow Review Process [diagram: questions (Q), Moderator, System]
  • 7. StackOverflow Review Process: Suggested Edits; Late Answers and First Posts; Low Quality Posts
  • 8. StackOverflow Review Process: Low Quality Posts, identified by the system
  • 9. StackOverflow Review Process: Low Quality Posts. An inefficient approach increases the review queue size
  • 10. StackOverflow Review Process: Low Quality Posts. An efficient approach saves reviewers time
  • 11. StackOverflow Review Process: Low Quality Posts. Refine the review queue to remove misclassified posts
  • 12. StackOverflow Quality Metrics: Body Length, Capital Title, Emails Count, Lowercase Percentage, Spaces Count, Tags Count, Text Speak Count, Title Body Similarity, Title Length, Uppercase Percentage
  • 13. StackOverflow Quality Metrics (Pure Textual Metrics): Body Length, Capital Title, Emails Count, Lowercase Percentage, Spaces Count, Tags Count, Text Speak Count, Title Body Similarity, Title Length, Uppercase Percentage
  • 14. StackOverflow Quality Metrics: Textual Metrics, Readability Metrics, Popularity Metrics
  • 15. Readability Metrics: Average Term Entropy, Automated Reading Index, Coleman Liau Index, Flesch Kincaid Grade Level, Flesch Reading Ease Score, Gunning Fog Index, LOC Percentage, Metric Entropy, Sentences Count, SMOG Grade, Words Count
  • 18. Popularity Metrics: Accepted by Originator Votes, Approved Edit Suggestion, Answer Badges Count, Badges-Tags Coverage, Bounty Start (End) Votes, Close Votes, Deletion Votes, Down Votes, Favorite Votes, Moderator Review Votes, Offensive Votes, Reopen Votes, Question Badges Count, Spam Votes, Total Badges, Undeletion Votes, Up Votes
  • 19. Classification Approach: StackOverflow Public Dump
  • 20. Classification Approach: StackOverflow Public Dump, 5,648,975 Questions (September 2013)
  • 21. Classification Approach: StackOverflow Public Dump. Classes: Very Good (A), Good (B), Bad (C), Very Bad (D). Reference: L. Ponzanelli, A. Mocci, A. Bacchelli, M. Lanza. Understanding and Classifying the Quality of Technical Forum Questions. In Proceedings of QSIC 2014 (14th International Conference on Quality Software)
  • 23. Classification Approach, Very Good (A): Neither Closed nor Deleted, With an Accepted Answer, Score > 7
  • 24. Classification Approach, Good (B): Neither Closed nor Deleted, With an Accepted Answer, 1 < Score < 6
  • 25. Classification Approach, Bad (C): Neither Closed nor Deleted, With an Accepted Answer, Score < 0
  • 26. Classification Approach, Very Bad (D): Closed or Deleted
  • 28. Classification Function (Genetic Algorithm): $QF = \sum_{i=1}^{n} w_i \cdot m_i$, with $w_i \in [-1, 1]$ and $m_i \in [0, 1]$
  • 29. Classification Function: Data, Metrics
  • 31. Classification Function: a function assigns a Positive Value if Good, a Negative Value if Bad
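  • 32. Classification Function [plot: x = QF(post) from -1 to 1, y = freq(x); classes D, C, B, A; quantiles q = 0.25 and q = 0.25, marked 25% and 25%]
  • 33. Classification Function [plot: x = QF(post) from -1 to 1, y = freq(x); classes D, C, B, A; quantiles labelled q = 0.25 and q = 0.25, marked 10% and 10%]
  • 34. Classification Function [plot: x = QF(post) from -1 to 1, y = freq(x); classes D, C, B, A; quantiles labelled q = 0.25 and q = 0.25, marked 40% and 40%]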
  • 35. StackOverflow Public Dump Review Queue Refinement
  • 36. StackOverflow Public Dump StackOverflow Private Dump Low Quality Post Review Queue Refinement
  • 37. x x x Review Queue Refinement
  • 39. Review Queue (RQ) D D D D C C B B A A A A A q=0.25 D C C B A A A A A Review Queue Refinement
  • 40. Review Queue (RQ) D D D C B B A A q=0.25 D C C B A A A A A Review Queue Refinement
  • 41. Review Queue (RQ) D D D D C C B B A A A A ∩ D D D C C C B A A A A A D q=0.1 Review Queue Refinement
  • 42. Review Queue (RQ) D D B ∩ D D D C C C B A A A A A D q=0.1 Review Queue Refinement
  • 43. Review Queue (RQ) D D D D C C B B A A A A A A q=0.25 D C C B A A A A A q=0.1 ∪ Review Queue Refinement
  • 44. Review Queue (RQ) D D D D C B A D C C B A A A A A A A ∪ q=0.25 q=0.1 Review Queue Refinement
  • 45. Review Queue Refinement. Hard Precision (HP): the percentage of posts in the review queue belonging to class D. Soft Precision (SP): the percentage of posts in the review queue belonging to class D or class C
  • 46. Review Queue Refinement, Without Refinement: Review Queue (RQ) Size 3,416; Hard Precision (HP) 41.90%; Soft Precision (SP) 64.31%
  • 47. Results. Baseline: RQ Size 3,416; HP 41.90%; SP 64.31%
       Model                          | RQ Size | RQ Reduction | Hard Precision | Soft Precision | A Red. | B Red. | C Red. | D Red.
       RQ A(MP, 0.25)                 | 3,108   | 9.02%        | 45.30%         | 68.95%         | 27.19% | 19.37% | 3.80%  | 1.74%
       RQ (A(MP,0.25) ∪ A(MSO,0.05))  | 2,650   | 22.42%       | 49.81%         | 74.87%         | 53.07% | 43.59% | 13.09% | 7.89%
       RQ (A(MP,0.25) ∪ A(MSO,0.05))  | 2,529   | 25.97%       | 51.60%         | 74.02%         | 53.51% | 44.40% | 25.79% | 8.93%
       RQ A(MP, 0.33)                 | 2,552   | 25.29%       | 50.90%         | 76.65%         | 60.96% | 48.84% | 14.01% | 9.35%
       RQ A(MR, 0.33)                 | 2,505   | 26.67%       | 51.62%         | 73.81%         | 50.88% | 45.11% | 27.23% | 9.77%
       RQ A(MR, 0.40)                 | 2,300   | 32.67%       | 54.91%         | 77.91%         | 65.79% | 56.61% | 30.76% | 11.86%
       RQ A(MP, 0.40)                 | 2,421   | 29.13%       | 51.92%         | 78.40%         | 67.54% | 54.69% | 16.10% | 12.28%
       RQ ∩ D(MP, 0.40)               | 2,244   | 34.31%       | 52.90%         | 79.63%         | 71.93% | 60.34% | 21.47% | 17.17%
       RQ ∩ D(MR, 0.40)               | 1,912   | 44.03%       | 60.67%         | 85.15%         | 85.53% | 74.67% | 38.74% | 19.05%
  • 53. Lessons Learned: Readability and Popularity Metrics are the most effective for queue refinement
  • 54. Lessons Learned: Readability and Popularity Metrics are the most effective for queue refinement; there is a tradeoff between review queue reduction and bad post reduction
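
The labelling criteria on slides 23 to 26 can be read as a small decision rule. The sketch below is one possible reading, not the authors' code: it assumes the four criteria map to classes A, B, C, and D in slide order, and the function and parameter names are hypothetical. Posts that match none of the stated criteria (for example a score of 0, 1, 6, or 7) are left unlabelled.

```python
from typing import Optional


def label_question(closed_or_deleted: bool,
                   has_accepted_answer: bool,
                   score: int) -> Optional[str]:
    """Return 'A', 'B', 'C', 'D', or None, following the slide criteria literally.

    Assumption: the criteria on slides 23-26 correspond to classes A-D
    in that order. Names here are illustrative only.
    """
    if closed_or_deleted:
        return "D"  # Very Bad: closed or deleted
    if not has_accepted_answer:
        return None  # the slides list an accepted answer for A, B and C
    if score > 7:
        return "A"  # Very Good
    if 1 < score < 6:
        return "B"  # Good
    if score < 0:
        return "C"  # Bad
    return None  # scores not covered by any stated range


print(label_question(False, True, 10))   # open, answered, high score
print(label_question(True, False, 50))   # closed or deleted
```

Returning None for the uncovered ranges keeps the sketch faithful to the thresholds as transcribed instead of guessing how the gaps were handled.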
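
The classification function on slide 28 is a weighted sum of normalised metrics, and slides 32 to 34 split the resulting score distribution by a quantile q. A minimal sketch of both steps follows. The weights would come from the genetic algorithm, which is not reproduced here. Two things are assumptions for illustration: the bottom and top q fractions of posts become D and A, and the remaining posts are split into C and B at zero (following slide 31, where a negative value means bad). All names are hypothetical.

```python
from typing import List, Sequence


def quality_function(weights: Sequence[float], metrics: Sequence[float]) -> float:
    """QF = sum_i w_i * m_i, with w_i in [-1, 1] and m_i in [0, 1]."""
    if len(weights) != len(metrics):
        raise ValueError("weights and metrics must have the same length")
    return sum(w * m for w, m in zip(weights, metrics))


def classify_by_quantile(scores: Sequence[float], q: float) -> List[str]:
    """Label each score A, B, C or D.

    Assumption: the lowest q fraction of posts is D, the highest q fraction
    is A, and the rest is C (negative score) or B (non-negative score).
    """
    n = len(scores)
    tail = int(q * n)  # number of posts in each tail
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        if rank < tail:
            labels[i] = "D"
        elif rank >= n - tail:
            labels[i] = "A"
        elif scores[i] < 0:
            labels[i] = "C"
        else:
            labels[i] = "B"
    return labels


example_scores = [-0.9, -0.4, -0.1, 0.2, 0.3, 0.5, 0.7, 0.95]
print(classify_by_quantile(example_scores, q=0.25))
```

A larger q widens both tails, which is the knob the results table varies (0.25, 0.33, 0.40).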
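
The refinement strategies and the two precision measures on slides 39 to 47 reduce to set operations and ratios. The sketch below uses made-up toy data and hypothetical names. Treating the "RQ A(M, q)" rows as removing predicted-A posts from the queue is an assumption, since the operator is missing from the transcript; "RQ ∩ D(M, q)" keeps only the posts a model predicts as D.

```python
from typing import Dict, Set


def hard_precision(queue: Set[int], true_class: Dict[int, str]) -> float:
    """Share of queued posts whose true class is D."""
    if not queue:
        raise ValueError("queue is empty")
    return sum(true_class[p] == "D" for p in queue) / len(queue)


def soft_precision(queue: Set[int], true_class: Dict[int, str]) -> float:
    """Share of queued posts whose true class is D or C."""
    if not queue:
        raise ValueError("queue is empty")
    return sum(true_class[p] in ("C", "D") for p in queue) / len(queue)


def queue_reduction(original_size: int, refined_size: int) -> float:
    """Fraction of the original queue removed by refinement."""
    return (original_size - refined_size) / original_size


# Toy data (invented for illustration): post id -> true class.
true_class = {1: "D", 2: "D", 3: "D", 4: "D", 5: "C",
              6: "C", 7: "B", 8: "B", 9: "A", 10: "A"}
review_queue = set(true_class)

predicted_a = {8, 9, 10, 11}    # posts a model labels as A
predicted_d = {1, 2, 3, 5, 12}  # posts a model labels as D

without_a = review_queue - predicted_a  # assumed reading of "RQ A(M, q)"
only_d = review_queue & predicted_d     # "RQ ∩ D(M, q)"

print(hard_precision(review_queue, true_class), soft_precision(review_queue, true_class))
print(hard_precision(without_a, true_class), soft_precision(without_a, true_class))
print(queue_reduction(len(review_queue), len(without_a)))
```

The same reduction formula reproduces the table's figures from the queue sizes: going from 3,416 to 3,108 posts is a 9.02% reduction, and from 3,416 to 1,912 is 44.03%.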