Improving Low Quality Stack Overflow Post Detection

1. Improving Low Quality StackOverflow Post Detection. Luca Ponzanelli, Andrea Mocci, Michele Lanza (University of Lugano, Switzerland); Alberto Bacchelli (Delft University of Technology, Netherlands); David Fullerton (StackExchange Inc., New York, USA)

2-4. StackOverflow: questions and answers, with 6,000+ daily questions.

5-6. StackOverflow Review Process (diagram): questions (Q), Moderator, System.

7. StackOverflow Review Process: review queues for Suggested Edits, Late Answers and First Posts, and Low Quality Posts.

8. StackOverflow Review Process: Low Quality Posts are identified by the system.

9. StackOverflow Review Process, Low Quality Posts: an inefficient approach increases the review queue size.

10. StackOverflow Review Process, Low Quality Posts: an efficient approach saves reviewers' time.

11. StackOverflow Review Process, Low Quality Posts: refine the review queue to remove misclassified posts.

12-13. StackOverflow Quality Metrics, Pure Textual Metrics: Body Length, Capital Title, Emails Count, Lowercase Percentage, Spaces Count, Tags Count, Text Speak Count, Title Body Similarity, Title Length, Uppercase Percentage.
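The slides name the textual metrics but do not define them. A minimal Python sketch of how a few of them might be computed, assuming simple character-level definitions; the function name `textual_metrics`, the e-mail pattern, and the choice to express percentages as fractions of alphabetic characters are all illustrative assumptions, not the authors' definitions:

```python
import re

# Hypothetical, deliberately loose e-mail pattern used only for counting.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def textual_metrics(title: str, body: str) -> dict:
    """Compute a handful of purely textual metrics for one post."""
    letters = [c for c in body if c.isalpha()]
    n_letters = len(letters)
    upper = sum(1 for c in letters if c.isupper())
    return {
        "body_length": len(body),
        "title_length": len(title),
        "spaces_count": body.count(" "),
        # Fractions in [0, 1], computed over alphabetic characters only.
        "uppercase_percentage": upper / n_letters if n_letters else 0.0,
        "lowercase_percentage": (n_letters - upper) / n_letters if n_letters else 0.0,
        # True when the title starts with an uppercase letter.
        "capital_title": title[:1].isupper(),
        "emails_count": len(EMAIL_RE.findall(body)),
    }
```

Keeping every value in a bounded range makes the metrics easier to combine later in a weighted sum.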

14. StackOverflow Quality Metrics: Readability Metrics, Popularity Metrics, Textual Metrics.

15-17. Readability Metrics: Average Term Entropy, Automated Readability Index, Coleman-Liau Index, Flesch-Kincaid Grade Level, Flesch Reading Ease Score, Gunning Fog Index, LOC Percentage, Metric Entropy, Sentences Count, SMOG Grade, Words Count.

18. Popularity Metrics: Accepted by Originator Votes, Approved Edit Suggestion, Answer Badges Count, Badges-Tags Coverage, Bounty Start (End) Votes, Close Votes, Deletion Votes, Down Votes, Favorite Votes, Moderator Review Votes, Offensive Votes, Reopen Votes, Question Badges Count, Spam Votes, Total Badges, Undeletion Votes, Up Votes.

19-20. Classification Approach: StackOverflow Public Dump, 5,648,975 questions (September 2013).

21-22. Classification Approach: questions from the StackOverflow Public Dump fall into four classes: Very Good (A), Good (B), Bad (C), Very Bad (D). Source: L. Ponzanelli, A. Mocci, A. Bacchelli, M. Lanza, "Understanding and Classifying the Quality of Technical Forum Questions", in Proceedings of QSIC 2014 (14th International Conference on Quality Software).

23. Very Good (A): neither closed nor deleted, with an accepted answer, score > 7.

24. Good (B): neither closed nor deleted, with an accepted answer, 1 < score < 6.

25. Bad (C): neither closed nor deleted, with an accepted answer, score < 0.

26. Very Bad (D): closed or deleted.
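The four class definitions can be written as a small labeling function. This is a sketch that follows the thresholds exactly as they appear on the slides; the function name and parameters are hypothetical, and returning `None` for questions that match no class (for example, scores of 0, 1, 6, or 7) is an assumption, since the slides do not say how such questions are handled:

```python
from typing import Optional


def quality_class(score: int, has_accepted_answer: bool,
                  closed_or_deleted: bool) -> Optional[str]:
    """Label a question A, B, C, or D, or None if no class applies."""
    if closed_or_deleted:
        return "D"  # Very Bad
    if not has_accepted_answer:
        return None  # the slides list an accepted answer for A, B, and C
    if score > 7:
        return "A"  # Very Good
    if 1 < score < 6:
        return "B"  # Good
    if score < 0:
        return "C"  # Bad
    return None  # score falls in a gap between the stated thresholds
```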

27. Classification Function: Genetic Algorithm.

28. Classification Function (Genetic Algorithm): $QF = \sum_{i=1}^{n} w_i \cdot m_i$, with $w_i \in [-1, 1]$ and $m_i \in [0, 1]$.
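The quality function is a weighted sum of normalized metric values. A minimal sketch, assuming the weights and metrics arrive as equal-length sequences already scaled to the ranges given on the slide; the function name `quality_function` is illustrative, and how the genetic algorithm searches for the weights is not shown here:

```python
from typing import Sequence


def quality_function(weights: Sequence[float], metrics: Sequence[float]) -> float:
    """Compute QF = sum_i w_i * m_i with w_i in [-1, 1] and m_i in [0, 1]."""
    if len(weights) != len(metrics):
        raise ValueError("weights and metrics must have the same length")
    if any(not -1.0 <= w <= 1.0 for w in weights):
        raise ValueError("every weight must lie in [-1, 1]")
    if any(not 0.0 <= m <= 1.0 for m in metrics):
        raise ValueError("every metric must lie in [0, 1]")
    return sum(w * m for w, m in zip(weights, metrics))
```

A negative weight lets a metric count against a post, which is how the sum can take either sign.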

29-30. Classification Function: Data and Metrics (L. Ponzanelli et al., QSIC 2014).

31. Classification Function: the function assigns a positive value if a post is good and a negative value if it is bad (L. Ponzanelli et al., QSIC 2014).

32-34. Classification Function (plot): x = QF(post) on [-1, 1], y = freq(x). Quantiles split the distribution into classes D, C, B, A, from left to right. The three slides show tails of 25%, 10%, and 40%; the label q = 0.25 appears on all three.
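The plot suggests that classes are assigned by cutting the distribution of QF values at quantiles. The slides do not spell out the exact rule, so the sketch below is one plausible reading and not the authors' confirmed method: the lowest fraction q of posts is labeled D, the highest fraction q is labeled A, and the remaining posts are labeled C or B by the sign of QF. The nearest-rank quantile and both function names are assumptions:

```python
import math
from typing import List, Sequence, Tuple


def quantile_thresholds(qf_values: Sequence[float], q: float) -> Tuple[float, float]:
    """Return (low, high) cut points using a nearest-rank quantile."""
    ordered = sorted(qf_values)
    n = len(ordered)
    low = ordered[max(math.ceil(q * n) - 1, 0)]
    high = ordered[min(math.floor((1 - q) * n), n - 1)]
    return low, high


def classify(qf: float, low: float, high: float) -> str:
    """Assumed rule: tails are D and A, the middle is split at zero."""
    if qf <= low:
        return "D"
    if qf >= high:
        return "A"
    return "C" if qf < 0 else "B"


def classify_all(qf_values: Sequence[float], q: float) -> List[str]:
    low, high = quantile_thresholds(qf_values, q)
    return [classify(v, low, high) for v in qf_values]
```

Under this reading, a larger q assigns more posts to the extreme classes D and A and fewer to C and B.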

35-38. Review Queue Refinement: StackOverflow Public Dump; StackOverflow Private Dump (Low Quality Post review queue).

39-44. Review Queue Refinement (diagrams): the Review Queue (RQ) contains posts of classes A, B, C, and D. The diagrams show RQ combined with classifier output at q = 0.25, RQ intersected (∩) with the posts classified as D at q = 0.1, and RQ combined with the union (∪) of two classifier outputs at q = 0.25 and q = 0.1.
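The refinement diagrams can be read as set operations on post identifiers. The sketch below assumes two strategies: dropping posts that a classifier predicts as A (possibly using the union of two classifiers' A sets), and keeping only posts that a classifier predicts as D. The function names and the interpretation of the first strategy as a set difference are assumptions drawn from the diagrams, not something the slides state outright:

```python
from typing import Hashable, Iterable, List, Set


def remove_predicted_good(review_queue: Iterable[Hashable],
                          predicted_a: Set[Hashable]) -> List[Hashable]:
    """Keep queue order, dropping posts predicted as class A."""
    return [post for post in review_queue if post not in predicted_a]


def keep_predicted_bad(review_queue: Iterable[Hashable],
                       predicted_d: Set[Hashable]) -> List[Hashable]:
    """Keep queue order, retaining only posts predicted as class D."""
    return [post for post in review_queue if post in predicted_d]


# Hypothetical post ids and classifier outputs, for illustration only.
queue = [1, 2, 3, 4, 5]
a_from_model_one = {1, 2}
a_from_model_two = {2, 3}
refined = remove_predicted_good(queue, a_from_model_one | a_from_model_two)
```

Removing predicted-A posts is the conservative option, since a post leaves the queue only when a classifier considers it very good; intersecting with predicted-D posts shrinks the queue more aggressively.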

45. Review Queue Refinement. Hard Precision (HP): the percentage of posts in the review queue belonging to class D. Soft Precision (SP): the percentage of posts in the review queue belonging to class D or C.
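Both precision measures follow directly from their definitions. A short sketch, assuming the review queue is given as a list of class labels, one per post; the function names are illustrative:

```python
from typing import Sequence


def hard_precision(classes: Sequence[str]) -> float:
    """Percentage of posts in the queue that belong to class D."""
    if not classes:
        return 0.0
    return 100.0 * sum(c == "D" for c in classes) / len(classes)


def soft_precision(classes: Sequence[str]) -> float:
    """Percentage of posts in the queue that belong to class D or C."""
    if not classes:
        return 0.0
    return 100.0 * sum(c in ("C", "D") for c in classes) / len(classes)
```

For a queue labeled `["D", "D", "C", "B", "A"]`, hard precision is 40% and soft precision is 60%.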

46. Review Queue Refinement, without refinement: Review Queue (RQ) size 3,416; Hard Precision (HP) 41.90%; Soft Precision (SP) 64.31%.

47. Review Queue Refinement results. Baseline without refinement: RQ Size 3,416, HP 41.90%, SP 64.31%. "Red." stands for reduction.

Model | RQ Size | RQ Reduction | Hard Precision | Soft Precision | A Red. | B Red. | C Red. | D Red.
RQ A(Mp, 0.25) | 3,108 | 9.02% | 45.30% | 68.95% | 27.19% | 19.37% | 3.80% | 1.74%
RQ (A(MP,0.25) ∪ A(MSO,0.05)) | 2,650 | 22.42% | 49.81% | 74.87% | 53.07% | 43.59% | 13.09% | 7.89%
RQ (A(MP,0.25) ∪ A(MSO,0.05)) | 2,529 | 25.97% | 51.60% | 74.02% | 53.51% | 44.40% | 25.79% | 8.93%
RQ A(MP, 0.33) | 2,552 | 25.29% | 50.90% | 76.65% | 60.96% | 48.84% | 14.01% | 9.35%
RQ A(MR, 0.33) | 2,505 | 26.67% | 51.62% | 73.81% | 50.88% | 45.11% | 27.23% | 9.77%
RQ A(MR, 0.40) | 2,300 | 32.67% | 54.91% | 77.91% | 65.79% | 56.61% | 30.76% | 11.86%
RQ A(MP, 0.40) | 2,421 | 29.13% | 51.92% | 78.40% | 67.54% | 54.69% | 16.10% | 12.28%
RQ ∩ D(MP, 0.40) | 2,244 | 34.31% | 52.90% | 79.63% | 71.93% | 60.34% | 21.47% | 17.17%
RQ ∩ D(MR, 0.40) | 1,912 | 44.03% | 60.67% | 85.15% | 85.53% | 74.67% | 38.74% | 19.05%
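The RQ Reduction column is consistent with the relative shrinkage of the queue against the unrefined size of 3,416. A small sketch of that arithmetic; the function name is illustrative:

```python
def rq_reduction(original_size: int, refined_size: int) -> float:
    """Percentage by which the review queue shrank after refinement."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - refined_size) / original_size


# Two rows from the results: refined sizes 3,108 and 1,912.
smallest_cut = round(rq_reduction(3416, 3108), 2)
largest_cut = round(rq_reduction(3416, 1912), 2)
```

These evaluate to 9.02 and 44.03, matching the first and last rows of the results.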

53-54. Lessons Learned: Readability and Popularity Metrics are the most effective for queue refinement. There is a tradeoff between review queue reduction and bad post reduction.
