Predicting Question Quality in Community Question Answering Sites


       Juan M. Caicedo C.                    Seshadri Sridharan                      Aarti Singh
       AndrewID: jcaicedo                    AndrewID: seshadrs                    (Project mentor)
       jcaicedo@cs.cmu.edu                   seshadrs@andrew.cmu.edu


                                              Abstract
         We present a model to predict question quality, an abstract measure, using only the
         content-level features available at the time a new question is posted. We first
         predict asker satisfaction using preexisting labels, and then predict different
         aspects of question quality using human annotations. For the former task, we use
         features from question text and community interaction to improve on the baseline
         model of Liu et al. For the latter, we hypothesize that question content and
         community response can independently model question quality, and enrich the
         content-based model using co-training.


1   Introduction
Community Question Answering (CQA) web sites allow users to post questions, submit answers and
interact with other users by voting, writing comments or through other participation mechanisms.
These sites have become a popular source for seeking information of many kinds, from general-purpose
sites, such as Yahoo Answers or Answer Bag, to more specialized ones, like Quora and StackOverflow.
One key element of the success of a CQA site is the quality of the content generated by its community
of users. In particular, the quality of the questions directly affects the relevance of the content,
the willingness of the community to participate and the likelihood that visitors to the site will
engage in the process. For this reason, we think it is important to understand the factors that affect
the quality of questions and, if possible, to assess that quality automatically.
Detecting the quality of questions also benefits the users of a CQA site. First, askers can know in
advance whether the questions they ask will be regarded as high quality. This would allow them to
learn to ask better questions and, ideally, improve the satisfaction they get from the site. Second,
the moderators of a CQA site can monitor the quality of recently posted questions; this would allow
them to detect and improve low-quality questions and to highlight high-quality ones so that they
receive more attention from the community. We call the first application the online scenario, when
the question is being asked, and the second the offline scenario, after the question has been posted
and the community has started to participate.
Although several CQA problems have been addressed with diverse machine learning techniques,
predicting question quality poses challenges that have not been covered in much detail. The main
difficulty arises from the nature of the two possible applications. In the offline scenario, machine
learning algorithms can use features extracted from the community's reaction to the question, which
is a reliable indicator of content quality. In the online scenario this information is not available,
and the algorithms have to rely on the asker's profile and the text of the question, which requires
NLP techniques to extract informative features about quality.
In this project we present a model for predicting question quality in the online scenario. First,
we extend the existing work on predicting asker satisfaction [3] and test its applicability on a
different dataset. Second, we improve it by using richer linguistic features extracted from the
question content. Then, after showing the strong predictive performance of the models on this task,
we move to the related problem of predicting question quality. For this task we retrain the models
on manually labeled questions. To overcome the cost of labeling a large set of questions, we use
co-training to generate additional training instances that allow us to improve both models.

2   Related Work
Interest in CQA sites has also grown within research areas related to information retrieval.
Much of that work has focused on content ranking and recommendation, content analysis, social
network analysis, user modeling and quality evaluation. [1] and [3] present an overview of the
research done in those areas. Here we discuss the work most closely related to our tasks.
A framework for automatically classifying high-quality content in social media sites is described
in [1], where quality is modeled in terms of the content itself, the relations between users and
the content they generate, and its usage statistics. However, the authors treat the features
extracted from the content and the features derived from the community as complementary, and they
do not study the differences between the online and offline scenarios.
The problem of predicting user satisfaction is studied in [3]. It can be argued that user
satisfaction serves as an approximate measure of the quality of the content users create. We
believe that the satisfaction of an asker depends on the community response generated by his or
her questions, which in turn depends on the quality of the question itself. Liu et al. present a
prediction model that uses features based on the content and the community structure and evaluate
it in the two scenarios that we are considering. We extend their work by using richer text-based
features and exploiting additional interactions from the community. In [5], Shah and Pomerantz
present a study in which they train a classifier that accurately predicts the quality of an answer
based on human judgment. We take a subset of the criteria used by the human judges who participated
in their study and use it to assess the quality of questions.

3   Predicting Question Quality
Assessing the quality of a question is subjective, since it depends on several factors that vary
with the context of the evaluation. For this reason, we address two related tasks: (1) predict asker
satisfaction as an indicator of question quality and (2) predict the quality assessments assigned
by humans.
Predicting Asker Satisfaction: We define an asker as satisfied if he or she selects one of the
received answers as the best one for the question and that answer has at least 2 votes. We base this
definition on the one proposed by Liu et al. [3] and adapt it to the data available for this task;
the constraint on the number of votes also takes the judgment of the community into account. This
yields a binary classification task: predict whether the asker of a question was satisfied.
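For illustration, a minimal sketch of how this label can be derived from a question record and its
answers; the field names (accepted_answer_id, id, score) are assumptions made for the example, not
the actual schema of the data dump described in Section 4.1.

def is_satisfied(question, answers):
    # Label the question as 'satisfied' if the asker accepted an answer
    # and that accepted answer received at least 2 votes.
    accepted_id = question.get("accepted_answer_id")
    if accepted_id is None:
        return False
    accepted = next((a for a in answers if a["id"] == accepted_id), None)
    return accepted is not None and accepted["score"] >= 2
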
Predicting Question Quality based on Human Assessments: We use human judgments of five aspects of
question quality: readability, conciseness, detail, politeness and appropriateness. These are a
subset of the criteria used by [5] to measure answer quality on a CQA site. The questions were
annotated with a value on a 1-to-5 scale for each of the selected criteria, and we aggregate these
values into an indicator of overall question quality. Under this aggregated measure, we define a
question to be of high quality if at least 3 of the criteria have values greater than or equal to
three. This is again a binary classification task, where the target label is this aggregated measure.

4   Task 1: Predicting Asker Satisfaction
4.1 Experimental Setup

In this section we present the experimental setup for the asker satisfaction task. We describe the
dataset, features, classification algorithms and the evaluation metrics used for each of the experi-
ments.


 Online Features
   Question Content Features:
     Title length; Content punctuation density; Text spacing density; Content body
     length; Code block counts and total length; Time (hour) posted; Tag count
   Extended Question Content Features:
     Text misspelling ratio; Text capitalization ratio; Text blacklisted word count;
     Words per sentence; Uppercase word length ratio; Number of sentences; Text
     similarity with the text of questions where the user is satisfied; Similarity
     of the sequence of POS tags with the questions where the user is satisfied
   Asker Profile Features:
     Answers to questions ratio; Answers received; Membership age; Solved question
     count; Average past score; Recent past score

 Offline Features
   Community Interaction Features:
     Question favourite count; Question's community score; Number of question
     revisions; Question new tag change count
   Community Answers Features:
     Answers count; Answers score max; Answers score total; Best answer body length;
     Best answer body spacing density; Accepted count; Accepted ratio; Answers to
     question ratio; Answers reputation
   Answerers Profile Features:
     Average answerer membership age; Most voted / most reputed answerer's accepted
     answer count; Most voted / most reputed answerer's answer acceptance ratio;
     Most voted / most reputed answerer's reputation; Most voted / most reputed
     answerer's question solved count


                                         Table 1: Feature classes and their features


Dataset: Our dataset is based on the Stack Exchange network of CQA sites1. It contains 2.2 million
questions and 4.8 million answers, along with the complete information for the questions, the answers
posted, the answer selected by the asker, the user information (askers and answerers) and the
community response (votes, comments and modifications) for 35 of their sites. We selected
StackOverflow, a site dedicated to computer programming. For our experiments we randomly selected
5,000 questions that were at least 2 months old, together with their 10,902 answers. There are
1,734 (34.68%) questions where the user is satisfied; this distribution is similar to that of the
original dataset (33.72%).
Features: We use two sets of features corresponding to the two scenarios of our task. For the online
scenario, we extract features from the text of the question and the profile of the asker; for the
offline scenario, we add features from the posted answers and from the community's reaction to the
question. As our baseline model we use the features proposed in [3]; we extend them by adding richer
features extracted from the text. Table 1 lists the features used.
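To make the content features concrete, the sketch below computes a few of the simpler ones from
Table 1 directly from a question body; it is an illustration only (the misspelling ratio, code-block
counts and the POS and similarity features require additional resources and are omitted here).

import re

def content_features(text):
    # Simplified extraction of a handful of the question-content features in Table 1.
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = max(len(text), 1)
    return {
        "body_length": len(text),
        "punctuation_density": sum(c in ",.;:!?" for c in text) / n_chars,
        "spacing_density": sum(c.isspace() for c in text) / n_chars,
        "capitalization_ratio": sum(w[0].isupper() for w in words) / max(len(words), 1),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "number_of_sentences": len(sentences),
    }
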
Algorithms: We trained three classifiers based on decision trees, logistic regression and Naive
Bayes. We chose these algorithms because they have been used successfully in CQA-related problems.
We are also particularly interested in decision trees for their interpretability, given the intended
applications of our task. We used the algorithm implementations provided by the Scikit-learn
toolkit [4].
Metrics: We report the overall accuracy of the classifiers, along with the averaged measures of
precision, recall and the F1 score over the two classes. We perform 10-fold cross-validation over
5,000 training instances.
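As a minimal sketch of this setup with Scikit-learn (the hyper-parameters shown are illustrative;
the maximum depth of 8 and the L1-regularized logistic regression are discussed in Section 4.2, and
the feature matrix X and label vector y are assumed to come from the extraction step above):

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_validate

def evaluate_classifiers(X, y):
    # 10-fold cross-validation of the three classifiers on one feature set.
    classifiers = {
        "decision_tree": DecisionTreeClassifier(max_depth=8),
        "logistic_l1": LogisticRegression(penalty="l1", solver="liblinear"),
        # Rescale features to [0, 1] so Naive Bayes sees non-negative,
        # comparable values (the normalization mentioned in Section 4.2).
        "multinomial_nb": make_pipeline(MinMaxScaler(), MultinomialNB()),
    }
    scoring = ["accuracy", "precision", "recall", "f1"]
    results = {}
    for name, clf in classifiers.items():
        scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
        results[name] = {m: scores["test_" + m].mean() for m in scoring}
    return results
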

4.2 Experiment Results

In this section we present the results of the asker satisfaction prediction task. First, we evaluated
the performance of the algorithms while varying the number of training instances and, for decision
trees, their parameters. We then evaluated the best algorithm, logistic regression, using the
different sets of features for each scenario. Finally, we report the features with the highest
predictive power according to their information gain.
Algorithm evaluation: Since we are mainly interested in the online scenario, we compared the
algorithms using the corresponding set of features. For decision trees, we first evaluated the
maximum depth of the learned tree in order to choose an appropriate value for our task. Figure 1
shows how the complexity of the tree affects accuracy on the training and test sets. We chose a
maximum depth of 8 for the rest of our experiments.
   1 http://stackexchange.com/


                      Features             Accuracy    Precision    Recall    F1
                      Baseline offline     0.8177      0.8028       0.6411    0.7126
                      Offline              0.8175      0.7994       0.6451    0.7135
                      Baseline online      0.6841      0.5822       0.3747    0.4544
                      Online               0.6887      0.5886       0.3912    0.4692
                      Question only        0.6607      0.5337       0.2976    0.3811


                     Table 2: Classification results for the different sets of features


The algorithm performs poorly below depth 8 and starts to over-fit beyond it. The error bars in this
and the upcoming figures correspond to the standard error of the sample mean.
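A sketch of the sweep behind Figure 1; the depth grid matches the values on the figure's x-axis,
and the use of validation_curve is an implementation choice of this sketch rather than the exact
script we ran.

from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

def depth_sweep(X, y, depths=(1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 30, 50)):
    # Mean train/test accuracy of a decision tree as max_depth varies.
    train_scores, test_scores = validation_curve(
        DecisionTreeClassifier(), X, y,
        param_name="max_depth", param_range=depths,
        cv=10, scoring="accuracy")
    return {d: (train_scores[i].mean(), test_scores[i].mean())
            for i, d in enumerate(depths)}
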
Although decision trees perform poorly in this scenario (66.06% accuracy), they achieve better
results in the offline scenario, where a tree of depth 8 reaches 87% accuracy. This can be explained
by examining the features with the highest Information Gain, presented in Table 3.
We compared all the algorithms while varying the number of training instances. The logistic
regression learner performed best overall. Using L1 regularization we achieve the highest accuracy
(70%), which suggests that some of the features may be redundant and that further experiments with
feature selection would be worthwhile. For the Naive Bayes learner, we normalized the feature values
in order to adjust their scales and to use the same smoothing value for all features when zero
values occur.
Different feature sets: We evaluated the logistic regression classifier (L1) using five sets of
features: two sets based on [3], considered as baselines for each scenario; two sets with our
extended features; and one set containing only the features extracted from the question content.
Table 2 presents these results. Using the new features, the classifiers are slightly better than
the baseline.
Most informative features: Of the new text-based features we added to our satisfaction prediction
model, only the misspelling ratio appears (with a weak value of 0.008) among the top Information
Gain features presented in Table 3. The baseline features, such as the asker information, are much
more informative than the ones derived from the text. This could be one reason why we observe a
decrease in performance when supplementing the baseline with richer textual features.
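For reference, the ranking in Table 3 can be approximated with Scikit-learn's mutual information
estimator; this is a sketch, and the exact estimator and discretization behind our Information Gain
values may differ.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_by_information_gain(X, y, feature_names, top_k=10):
    # Rank features by estimated mutual information with the label.
    mi = mutual_info_classif(X, y, random_state=0)
    order = np.argsort(mi)[::-1][:top_k]
    return [(feature_names[i], float(mi[i])) for i in order]
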
We also observed that all algorithms were better at predicting the unsatisfied questions than the
satisfied ones. This can be attributed to the skewed class distribution: there are 1.85 times as
many unsatisfied questions as satisfied ones. It is also possible that for many questions the asker
never selected an answer despite having received one.
Another interesting observation is that the seniority and past success of users convey the most
information about asker satisfaction: the total number of questions solved by the asker and the
asker's membership age are the two most important features in the online scenario.



                [Figure 1: Decision Trees accuracy varying the tree depth (training and test accuracy
                as a function of the maximum depth).]


                [Figure 2: Performance on the Online Satisfaction Prediction task (accuracy,
                precision, recall and F1 versus the number of training instances, for Decision Tree,
                Logistic Regression L1/L2 and Multinomial NB).]

    Online
    Feature                                              Information Gain
    Asker's total number of questions solved             0.0453
    Asker's membership age                               0.0424
    Asker's answers to question ratio                    0.0338
    Asker's average past question score                  0.0307
    Asker's recent questions score                       0.0166
    Question code length                                 0.0112
    Question text unigram "entity"                       0.0083
    Question text misspelling ratio                      0.0080
    Question text bigram "the top"                       0.0059
    Question tag "android"                               0.0057

    Offline
    Feature                                              Information Gain
    Best answer score                                    0.5449
    Highest answerer reputation                          0.1139
    Community score for question                         0.11168
    Answerer's reputation                                0.10283
    Top value of answerers' accepted answer count        0.09611
    Top value of answerers' question solved count        0.09234
    Top value of answerers' answer count                 0.09218
    Answerers' answer accepted count                     0.09153
    Reputation of most voted answerer                    0.09129
    Top answerer's answer count                          0.08176

    Question content only
    Feature                                              Information Gain
    Question code length                                 0.01122
    Question text unigram "entity"                       0.00838
    Question text misspelling ratio                      0.00809
    Question text bigram "the top"                       0.00594
    Question tag "android"                               0.00571
    Question body length                                 0.00515
    Question text unigram "url"                          0.00481
    Question text bigram "using this"                    0.00458
    Question text bigram "reference to"                  0.00455
    Question text bigram "any ideas"                     0.00444


                                           Table 3: Feature sets ranked by the Information Gain


Importantly, this is consistent with the general patterns of success and satisfaction we would
expect in CQA sites.


5       Task 2: Predicting Question Quality based on human assessment

Our main interest in the question quality task is to evaluate the classifiers in a semi-supervised
setting as a solution to the scarcity of training data. We continue the experiments performed for
the previous task, but this time we evaluate the classifiers on the dataset of annotated questions.
In addition to evaluating the different algorithms using the features available in each scenario
(online and offline), we apply co-training in order to expand the set of training examples. In the
following sections we present these experiments and their results.
Definition of Question Quality
The merit or quality of a question is a highly subjective notion that is difficult to quantify.
Since it cannot be measured directly or derived from any available features, we hand-annotated it.
We define the quality of a question as a combination of five metrics, each rated on a scale of
1 to 5: conciseness, politeness, readability, detail and relevance.


In addition to these metrics, we also annotated each question with a quality label that represents
our overall judgment of its quality. We used this label to understand the importance of the five
metrics and their combined influence on question quality. We found that conciseness, readability,
politeness and detail are reliable estimators of question quality. We then looked at the different
patterns of values the metrics took; the most consistent one was that high-quality questions had at
least three metrics with values greater than or equal to 3, and we used this rule as our label for
question quality. This rough labeling rule was adopted because high-quality questions occurred in
different forms: concise and readable but not detailed, detailed and polite but not concise,
readable and polite but not relevant, and so on. In addition, the labeling rule correlated
considerably with our hand-annotated quality label.
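As a concrete statement of the rule, the fragment below marks a question as high quality when at
least three of the five annotated metrics are rated 3 or higher; the metric names are those listed
above.

def is_high_quality(ratings):
    # ratings: dict mapping the five annotated metrics to scores on a 1-5 scale.
    return sum(score >= 3 for score in ratings.values()) >= 3

# Concise, polite and readable but neither detailed nor relevant -> high quality.
print(is_high_quality({"conciseness": 4, "politeness": 3, "readability": 5,
                       "detail": 2, "relevance": 2}))  # True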

5.1 Experimental Setup

We use the algorithms, features and evaluation metrics defined for the asker satisfaction task, but
apply them to the dataset of manually labeled questions. This dataset contains 172 instances, of
which 127 (73.83%) are labeled as high quality and 45 (26.16%) as low quality. Another difference
in this task is that we perform 4-fold cross-validation, since the number of training instances is
much smaller.
Co-training: To increase the number of training examples, we apply this technique as presented in
[2], with one adjustment for our task: since the target classes are not balanced, we ensure that the
classifiers add instances in proportions that preserve the class distribution. We train two
classifiers, each using one of the feature sets of the online and offline scenarios. The following
are the values we assign to the four parameters of the co-training algorithm (a sketch of the
resulting procedure follows the list):

      • p and n: number of positive and negative examples labeled by the classifiers and added to
        the training pool. We set these values to p = 3 and n = 1.
      • k: number of iterations. We set k = 100.
      • u: number of unlabeled examples made available to the classifiers for labeling in each
        iteration. We perform our experiments using different values.
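
A minimal sketch of this loop under the stated parameters. It assumes two Scikit-learn-style
classifiers, one per view, binary labels (1 = high quality, 0 = low quality), and reads confidence
from predict_proba, an implementation detail we assume rather than one prescribed by [2].

import numpy as np

def co_train(clf_online, clf_offline, X_online, X_offline, y, labeled_idx,
             unlabeled_idx, p=3, n=1, k=100, u=75, seed=0):
    # X_online / X_offline are the two feature views of the same questions;
    # y holds labels for labeled_idx and placeholders elsewhere.
    rng = np.random.default_rng(seed)
    y = np.array(y, copy=True)
    labeled = list(labeled_idx)
    remaining = list(unlabeled_idx)
    take = sorted(rng.choice(len(remaining), size=min(u, len(remaining)),
                             replace=False), reverse=True)
    pool = [remaining.pop(int(j)) for j in take]

    for _ in range(k):
        for clf, X_view in ((clf_online, X_online), (clf_offline, X_offline)):
            clf.fit(X_view[labeled], y[labeled])
            if not pool:
                continue
            proba = clf.predict_proba(X_view[pool])      # rows align with `pool`
            # p most confident positives and n most confident negatives
            chosen = {int(j): 1 for j in np.argsort(proba[:, 1])[::-1][:p]}
            chosen.update({int(j): 0 for j in np.argsort(proba[:, 0])[::-1][:n]})
            for j in sorted(chosen, reverse=True):
                idx = pool.pop(j)
                y[idx] = chosen[j]
                labeled.append(idx)
        # Replenish the pool from the remaining unlabeled examples.
        while remaining and len(pool) < u:
            pool.append(remaining.pop(int(rng.integers(len(remaining)))))
    return clf_online, clf_offline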

For the evaluation, we use 4-fold cross-validation as follows: we partition the set of labeled
instances into 4 subsets, maintaining the same class distribution in each subset. We select 3
subsets for training the classifiers and leave the remaining subset for testing them after each
iteration. The training subsets are extended with the new instances labeled by the classifiers,
while the test subset is never modified. In each iteration we evaluate the accuracy, precision,
recall and F1 score of each classifier.
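A sketch of this evaluation protocol; StratifiedKFold preserves the 127/45 class ratio in every
fold, and run_cotraining stands for the co-training procedure sketched above (a hypothetical
wrapper, not a library function).

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

def evaluate_cotraining(X_online, X_offline, y, run_cotraining, n_splits=4):
    y = np.asarray(y)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    fold_results = []
    for train_idx, test_idx in skf.split(X_online, y):
        # Co-training may only extend the training part of the fold.
        clf_on, clf_off = run_cotraining(X_online, X_offline, y, train_idx)
        scores = {}
        for name, clf, X_view in (("online", clf_on, X_online),
                                  ("offline", clf_off, X_offline)):
            pred = clf.predict(X_view[test_idx])
            scores[name] = {"accuracy": accuracy_score(y[test_idx], pred),
                            "precision": precision_score(y[test_idx], pred),
                            "recall": recall_score(y[test_idx], pred),
                            "f1": f1_score(y[test_idx], pred)}
        fold_results.append(scores)
    return fold_results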

5.2 Experiment Results

Question quality and feature sets: We evaluated the same classifiers used for the asker satisfaction
task, but now for predicting the label assigned during annotation; we again compared them using the
two sets of features (online and offline). The averaged results are presented in Figure 3. Again,
the best results are obtained with the logistic regression learner using L1 regularization, with an
accuracy of 0.74418 and an F1 score of 0.83851. However, for this task the classifiers are more
accurate and the features from the online scenario are more predictive.
Co-training: We evaluated the overall improvement across iterations and the effect of varying the
parameter u (the number of unlabeled instances available for labeling) on the performance of each
classifier. Figure 4 shows the F1 score of each classifier at each iteration of five experiments
running co-training with different values of u.
We note that the accuracy of both classifiers improved; however, the offline classifier, which was
initially the weaker one, improved by larger margins and eventually matched the performance of the
online classifier (when u = 75 and u = 100). Regarding the number of iterations, the improvements
generally occur within the first 50. At that point the training set has grown to between 213 and
255 instances, depending on the value of u; the variation is an effect of the random sampling of
the unlabeled data, which this parameter controls.


                [Figure 3: Evaluation of the classifiers on the Question Quality task (accuracy and
                F1 for Decision Tree, Logistic Regression L1/L2 and Multinomial NB, using the online,
                offline and combined feature sets).]

                [Figure 4: F1 score of each classifier across co-training iterations, for different
                values of the parameter u.]
                                                                                                                                                                                                                                                                      ● ●
                                    ● ●             ●               ●                             ● ●                 ● ●                       ● ● ● ● ● ●                         ● ● ● ●   ●   ●
                                                                                                        ● ●




                                                                                                                                                                                                                                                                                                                                                                                                         100
                                                        ● ● ●                                         ● ●   ● ● ● ● ●     ●
                                                                                                                                ● ●
                                                                                                                                    ● ● ●
                                                                                                                                                  ●     ● ● ● ●     ●   ●     ●             ●   ●   ●
                                                                                                                                                                                                      ● ● ● ● ● ● ● ● ●   ● ●                     ●   ● ●   ●     ● ●
                                                                                                                                                ●   ● ●         ● ●   ●     ●   ● ●                                             ● ●     ● ●         ●         ●
                                                                                                    ●     ●           ● ●   ● ●           ● ●                             ●                                             ●     ●     ● ●     ●             ●
                                                                                                            ● ● ●   ●                                                                                                                                           ●

                      0.82                                                      ●
                                                                                    ●
                                                                                        ●
                                                                                            ●
                                                                                                ●                 ●
                                                                                                                                              ●
                                                                                                                                                                                                                                              ● ●


                                                ● ●                     ● ●

                      0.80              ●
                                            ●
                                                        ● ● ●
                                                              ●




                      0.78          ●




                      0.88
                      0.86                                      ●                                                   ●       ● ● ●
                                                        ● ●                 ●                               ● ●         ●               ●                                                                                                                                      ●

                      0.84          ●       ● ● ●                   ● ●         ● ● ●               ● ●
                                                                                                                                            ● ● ● ●
                                                                                                                                                            ● ●
                                                                                                                                                                              ●
                                                                                                                                                                                  ●
                                                                                                                                                                                              ● ●
                                                                                                                                                                                                      ●
                                                                                                                                                                                                          ● ● ● ● ● ● ●
                                                                                                                                                                                                                                  ● ● ● ● ● ● ● ● ●
                                                                                                                                                                                                                                                                 ● ● ● ●           ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
                                                                                                                                                                                                                                                                                                                                     ●                                                           ● ●




                                                                                                                                                                                                                                                                                                                                                                                                         200
                                                                                            ● ●                                                                   ● ● ●               ● ●                                                                                                                                                ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
                                        ●


                      0.82                                                                                                                                                                                                                                                                                                       ●       ●
                                                                                                                                                                                                                                                                                                                                             ●
                                                                                                                                                                                                                                                                                                                                                 ● ● ● ● ● ● ● ● ● ● ●
                                                                                                                                                                                                                                                                                                                                                                                     ●
                                                                                                                                                                                                                                                                                                                                                                                         ● ● ●
                                                                                                                                                                                                                                                                                                                                                                                                   ●
                                                                                                                                                                                                                                                                                                                           ● ●       ●
                                                                                                                                                                                                                                                                               ●       ● ●               ● ● ● ● ● ● ● ●

                      0.80                      ●
                                                    ● ● ● ● ●               ● ●
                                                                                    ● ●         ● ●                                                                                                                       ●            ●                 ● ● ● ● ● ●
                                                                                                                                                                                                                                                                                   ●
                                                                                                                                                                                                                                                                                             ●
                                                                                                                                                                                                                                                                                                 ● ● ●


                                                                        ●                   ●                                                                         ●               ●                       ●               ● ● ●        ● ● ● ●

                      0.78          ●
                                        ● ●
                                                                                                        ●
                                                                                                            ●
                                                                                                                ● ● ●
                                                                                                                            ● ● ●
                                                                                                                                        ● ● ● ● ● ● ● ●                   ●
                                                                                                                                                                              ● ●
                                                                                                                                                                                          ●
                                                                                                                                                                                              ●
                                                                                                                                                                                                  ●
                                                                                                                                                                                                      ● ●         ● ● ●




                      0.88
                      0.86                              ●       ● ● ●
                                                                            ●       ●
                                                    ●
                                                            ●
                                                                                        ●                           ●       ● ● ● ● ● ● ●                                                                                                                                                                ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
                                                                                                                        ●                               ● ● ● ● ●                                                                                                                      ● ● ● ● ● ●                                                                         ● ● ● ● ● ● ● ● ●
                                                                                ●           ●                   ●                                                         ● ● ● ● ● ● ● ● ● ●                                                                            ● ● ● ●
                                            ● ●                                                 ●                                                                                                                 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
                                                                                                    ● ● ●
                      0.84          ●
                                                                                                                                                                                                                                                                                                                                                                                                         500




                                        ●

                      0.82                                                              ●                           ●
                                                                                                                        ●
                                                                                                        ●                   ● ●                             ●                 ●                       ●                                                                                                                                                                  ● ● ● ● ● ● ● ● ● ●
                                                                                                                ●                                                                                         ● ● ●                        ● ● ● ●
                      0.80
                                                                                                                                    ● ● ● ● ● ●                 ● ● ● ●           ● ● ● ● ●                                                                                                  ●
                                                                                                ●           ●                                                                                                         ● ● ● ● ● ●                    ●       ●                                                                   ●           ●       ●               ●
                                                                                                    ●                                                                                                                                                                        ● ● ● ● ●           ● ● ● ● ● ● ● ● ● ● ● ● ●           ● ●         ●           ●
                                                        ●                           ●                                                                                                                                                                            ●                                                                                       ●       ●
                                                                    ●       ●               ●                                                                                                                                                            ●               ●
                                        ●       ●           ● ●         ●       ●

                      0.78          ●
                                            ●       ●                                                                                                                                                                                                                ●




                                0                                   10                                      20                                      30                                    40                                  50                                 60                                70                      80                                    90                              100
                                                                                                                                                                                                                  Iteration

                                                                                                                                                                                              ●
                                                                                                                                                                                                          Offline                  ●
                                                                                                                                                                                                                                           Online




                 Figure 4: Evaluation of the classifiers in co-training varying the parameter u.




The best performance was obtained with u = 75. In this case, the F1 score of the classifier that
uses the offline features increased from 0.77662 to 0.85222, while the online classifier improved
from 0.83852 to 0.86456. This agrees with the remark made in [2] about the training pool size:
selecting the unlabeled instances from a smaller pool forces the classifiers to learn from examples
that are more representative of the underlying distribution. We also examined the confidence of the
classifiers for each target class, i.e., the minimum probability assigned by the classifiers to the
instances that were added to the training set. For the positive class (high quality), the lowest
value was 0.92328, while for the negative class it was 0.20913. This uncertainty on the negative
class can be attributed to the class imbalance. Table 4 presents a summary of the results of the
experiments.
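To make the instance-selection step concrete, the sketch below shows one co-training iteration under the settings described earlier: each view's classifier pseudo-labels its p = 3 most confident positive and n = 1 most confident negative examples drawn from a pool of u unlabeled questions, and the minimum probability of the added instances is tracked per class, as reported above. The function name, the index bookkeeping, the synthetic data and the choice of L1-regularized logistic regression for both views are our own illustrative assumptions, not the exact project code.

```python
# Minimal sketch of one co-training iteration (Blum & Mitchell [2]) with our
# class-ratio adjustment (p positives, n negatives per classifier).
import numpy as np
from sklearn.linear_model import LogisticRegression


def cotraining_round(clf_on, clf_off, X_on, X_off, y, labeled, unlabeled,
                     u=75, p=3, n=1, rng=None):
    """Return updated labeled/unlabeled index sets and, per class, the minimum
    probability among the instances that were pseudo-labeled this round."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Each classifier sees only its own view (online or offline features).
    clf_on.fit(X_on[labeled], y[labeled])
    clf_off.fit(X_off[labeled], y[labeled])
    # Draw a small pool of u unlabeled examples; a smaller pool yields
    # selections that are more representative of the distribution [2].
    pool = rng.choice(unlabeled, size=min(u, len(unlabeled)), replace=False)
    added, min_conf = [], {1: 1.0, 0: 1.0}
    for clf, X in ((clf_on, X_on), (clf_off, X_off)):
        proba = clf.predict_proba(X[pool])
        pos_col = list(clf.classes_).index(1)
        for cls, k in ((1, p), (0, n)):           # preserve the 3:1 class ratio
            col = pos_col if cls == 1 else 1 - pos_col
            for i in np.argsort(proba[:, col])[::-1][:k]:
                y[pool[i]] = cls                  # pseudo-label (y mutated in place)
                added.append(pool[i])
                min_conf[cls] = min(min_conf[cls], proba[i, col])
    labeled = np.concatenate([labeled, np.unique(added)])
    unlabeled = np.setdiff1d(unlabeled, added)
    return labeled, unlabeled, min_conf


# Hypothetical usage with synthetic data standing in for the two feature views.
rng = np.random.default_rng(1)
X_on, X_off = rng.normal(size=(172, 10)), rng.normal(size=(172, 12))
y = rng.integers(0, 2, size=172)
labeled, unlabeled = np.arange(40), np.arange(40, 172)
clf_on = LogisticRegression(penalty="l1", solver="liblinear")
clf_off = LogisticRegression(penalty="l1", solver="liblinear")
for _ in range(100):                              # k = 100 iterations
    if len(unlabeled) == 0:
        break
    labeled, unlabeled, min_conf = cotraining_round(
        clf_on, clf_off, X_on, X_off, y, labeled, unlabeled, u=75)
```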


Feature set   Initial F1      u    Max. F1   Iteration with Max. F1    Gain
Offline       0.77662        25    0.83277   30                        0.05615
                             50    0.82055   15                        0.04393
                             75    0.85222   80                        0.07560
                            100    0.83854   17                        0.06192
                            200    0.81701   84                        0.04040
                            500    0.81256   22                        0.03594
Online        0.83852        25    0.86227   76                        0.02375
                             50    0.85710   11                        0.01858
                             75    0.86456   27                        0.02604
                            100    0.85415   70                        0.01563
                            200    0.85130    8                        0.01278
                            500    0.86388    8                        0.02536


       Table 4: Maximum F1 values achieved in co-training for different values of the parameter u.
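Here the gain is the difference between the maximum and the initial F1 of the corresponding feature set, e.g. 0.85222 − 0.77662 = 0.07560 for the offline view at u = 75.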


6   Conclusions
From the above experiments we have learnt several important things about the performance of the
algorithms on asker satisfaction and question quality prediction. We saw how the class skew in asker
satisfaction affects the classification accuracy, and the feature analysis reflected several well-known
CQA trends. The offline features clearly predict asker satisfaction better. Our approach of treating
a question as having offline and online views was successful and gave us insights into question
quality.
We characterized question quality, an abstract measure, as a combination of particular aspects.
Quantifying it in this way showed the individual importance of the contributing metrics, revealed
the high subjectivity of the notion, and allowed us to train classifiers that predict the values of
the annotations.
We used co-training to expand the training data; this increased the predictive performance of both
classifiers, especially the one based on the offline feature set. We also learnt that the quality and
consistency of the annotations limit this technique in this scenario.
Although we experimented with different learning algorithms and feature sets, we found that asker
satisfaction and question quality can be modeled similarly. Our experiments show that question
quality prediction can be improved by defining more specific features and by expanding the training
set with semi-supervised learning.

References
[1] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne. Finding
    high-quality content in social media. In Proceedings of the International Conference on Web
    Search and Web Data Mining (WSDM '08), page 183, 2008.
[2] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In
    Proceedings of the Workshop on Computational Learning Theory (COLT '98), Morgan Kaufmann
    Publishers, 1998.
[3] Yandong Liu, Jiang Bian, and Eugene Agichtein. Predicting information seeker satisfaction
    in community question answering. In Proceedings of the 31st Annual International ACM SIGIR
    Conference on Research and Development in Information Retrieval (SIGIR '08), page 483, 2008.
[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
    P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,
    M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine
    Learning Research, 12:2825–2830, 2011.
[5] C. Shah and J. Pomerantz. Evaluating and predicting answer quality in community QA. In
    Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in
    Information Retrieval (SIGIR '10), pages 411–418, 2010.






Seshadri ML Report

  • 1. Predicting Question Quality in Community Question Answering Sites Juan M. Caicedo C. Seshadri Sridharan Aarti Singh AndrewID: jcaicedo AndrewID: seshadrs (Project mentor) jcaicedo@cs.cmu.edu seshadrs@andrew.cmu.edu Abstract We present a model to predict question quality, an abstract measure, using only the content level features available at the time a new question is posted. We first pre- dict asker satisfaction using preexisting labels, and then predict different aspects of question quality using human annotations. For the former task, we use features from question text and community interaction to improve on the baseline model of Liu et al. For the latter, we hypothesize that question content and community response can independently model question quality, and enrich the content based model using co-training. 1 Introduction Community Question Answering (CQA) web sites allow the users to post questions, submit an- swers and interact with other users by voting, writing comments or by using other mechanisms of participation. These sites have become a popular source for seeking information of different kinds, both for general topics, such as Yahoo Answers or Answer Bag, and for more specialized ones, like Quora and StackOverflow. One key element to the success of a CQA site is the quality of the con- tent generated by the community of users. Particularly, the quality of the questions affects directly the relevance of the content, the willingness of the community to participate and the likelihood that visitors to the site want to engage in the process. For this reason, we think that it is important to understand the factors that affect the quality of questions and, if possible, to be able to assess its quality automatically. Detecting the quality of the questions also benefits the users of a CQA site. First, the askers can know in advance if the questions they ask will be graded as high quality. This would allow them to learn to ask better questions and, ideally, improve the satisfaction that they get from the site. Second, the moderators of a CQA can monitor the quality of the recently posted questions; this would allow them to detect and improve those that are low quality and to highlight the high quality questions so that they receive more attention by the community. We call the first application the online scenario, when the question is being asked, and the second one the offline scenario, after the question has been posted and the community has started to participate. Although several problems of CQA have been addressed using diverse machine learning techniques, predicting question quality poses challenges that have not been covered in much detail. The main difficulty arises from the nature of the two possible applications. In the offline scenario, machine learning algorithms can use features extracted from the community reaction to the question, which is a reliable indicator of the quality of the content, whereas in the online scenario this information is not available and the algorithms have to rely on the asker’s profile and the text of the question, which requires techniques from NLP to extract informative features about the quality. In this project we present a model for predicting the question quality in the online scenario. First, we extend the existing work for predicting asker satisfaction [3] and we test its applicability across a different dataset. Second, we improve it by using richer linguistic features extracted from the 1
  • 2. question content. Then, after showing the high predictability of the models on this task, we move to the related problem of predicting question quality. For this task we use manually labeled questions to train the models again. To overcome the problem of labeling a large set of questions, we use co-training to generate more training instances that allow us to improve both models. 2 Related Work The interest in CQA sites has also increased within research areas related to information retrieval. Much of that work has focused on content ranking and recommendation, content analysis, social network analysis, user modeling and quality evaluation. [1] and [3] present an overview of the research done in those areas. We will discuss here the works that are closely related to our tasks. A framework for automatically classifying content of high quality in social media sites is described in [1], where the quality is modeled in terms of the content itself and of the relations between the user and the generated content and its usage statistics. However, they see the features extracted from the content and the features derived from the community as complementary, and they do not study the differences of the online and offline scenarios. The problem of predicting user satisfaction is studied in [3]. It can be argued that modeling user satisfaction can be used as an approximate measure of the quality of the content they create. We believe that the satisfaction of an asker depends on the response from the community generated by his or her questions, which depends in turn on the quality of the question itself. Liu et al. present a prediction model that uses features based on the content and the community structure and evaluate it the two scenarios that we are considering. We extend their work by using richer text based features and exploiting additional interactions from the community. In [5], Shah and Pomerantz present a study where they train a classifier that accurately predicts the quality of an answer based on human judgment. We take a subset of the criteria used by the human judges that participated in their study, and we use it to assess the quality of the questions. 3 Predicting Question Quality The problem of assessing the quality of a question is subjective, since it depends on several factors that can vary in the context of the evaluation. For this reason, we decide to address two related tasks: (1) predict the asker satisfaction as an indicator of the question quality and (2) predict the quality assessments assigned by humans. Predicting Asker Satisfaction: We define that an asker is satisfied if he selects one of the posted answers he received as the best one for his question; additionally, this answer must have at least 2 votes. We based this definition on the proposed by Liu et al. [3], and we adapt it to the data that we have for this task. We add the constraint on the number of votes to also consider the judgment of the community. Thus, we have a binary classification task where we have to predict whether the asker of a question was satisfied or not. Predicting Question Quality based on human assessments We use human judgments to assess the question quality based on five aspects of question quality: readability, conciseness, detail, politeness and appropriateness. They are a subset of the criteria used by [5] to measure answer quality on a CQA site. 
The questions were annotated giving a value on a 1 to 5 scale for each of the selected criteria and we aggregate them to define an indicator of the overall question quality. Under this aggregated measure, we define that a high quality question has values greater than or equal to three for at least 3 of the criteria. This is again a binary classification task, where our target label is this aggregated measure. 4 Task 1: Predicting Asker Satisfaction 4.1 Experimental Setup In this section we present the experimental setup for the asker satisfaction task. We describe the dataset, features, classification algorithms and the evaluation metrics used for each of the experi- ments. 2
  • 3. Online Features Offline Features Question Content Features Community Interaction Features Title length Question favourite count Content punctuation density Question’s community score Text spacing density Number of question revisions Content body length Question new tag change count Code block counts, total length Community Answers Features Time(hour) posted Answers count Tag count Answers score max Extended Question Content Features Answers score total Text misspelling ratio Best Answer body length Text capitalization ratio Best Answer body spacing density Text blacklisted word count Accepted count Words per sentence Accepted ratio Uppercase word length ratio Answers to question ratio Number of sentences Answers reputation Text similarity with the text of questions Answerers Profile Features where the user is satisfied. Average Answerer membership age Similarity of the sequence of POS tags with the Most voted, Most reputed answerer’s answer accepted answer count questions where the user is satisfied. Most voted, Most reputed answerer’s answer acceptance ratio Asker Profile Features Most voted, Most reputed answerer’s reputation Answers to Questions ratio Most voted, Most reputed answerer’s question solved count Answers received Membership age Solved Question count Average past score Recent past score Table 1: Feature classes and their features Dataset: Our dataset is based on the Stack Exchange network of CQA sites 1 . It contains 2.2 mil- lion of questions and 4.8 million of answers, along with the complete information for the questions, the answers posted, the selected answer by the asker, the user information (askers and answerers) and the community response (votes, comments and modifications) for 35 of their sites. We selected StackOverflow, a site dedicated to computer programming. For our experiments we randomly se- lected 5,000 questions that were at least 2 month old and their corresponding 10,902 answers. There are 1,734 (34.68 %) questions where the user is satisfied; this distribution is similar to the one of the original dataset (33.72 %). Features: We use two sets of features corresponding to the scenarios of our task. For the online scenario, we extract features from the text of the question and the profile of the asker; for the offline scenario, we add features from the answers posted and the reaction of the community to the question. As our baseline model we use the features proposed in [3]; we extend them by adding more richer features extracted from the text. Table 1 presents the list of the features used. Algorithms: We trained three classifiers based on decision trees, logistic regression and Naive Bayes. We chose these algorithms since they have been used successfully in CQA related problems. We are also particularly interested in using decision trees for their readability, given the application of our task. We used the algorithm implementations provided in Scikit-learn toolkit [4]. Metrics: We report the overall accuracy of the classifiers, along with the averaged measures of precision, recall and the F1 score over the two classes. We perform 10-fold cross-validation over 5,000 training instances. 4.2 Experiment Results We present in this section the results of the predicting asker satisfaction task. First, we evaluated the performance of the algorithms varying the number of training instances and, for decision trees, vary- ing their parameters. 
We then evaluated the best algorithm, logistic regression, using the different sets of features for each of the scenarios. Finally, we report the features with higher predictability according to their information gain. Algorithm evaluation: Since we are mainly interested in the online scenario, we compared the algorithms using the corresponding set of features. In the case of decision trees, we evaluated first the maximum depth of the learned tree in order to choose the appropriate value for our task. Figure 1 presents how the complexity of the tree affects the accuracy of the training and the test set. We chose a maximum depth of 8 for the rest of our experiments. The algorithm performs poorly below 1 http://stackexchange.com/ 3
  • 4. Features Accuracy Precision Recall F1 Baseline offline 0.8177 0.8028 0.6411 0.7126 Offline 0.8175 0.7994 0.6451 0.7135 Baseline online 0.6841 0.5822 0.3747 0.4544 Online 0.6887 0.5886 0.3912 0.4692 Question only 0.6607 0.5337 0.2976 0.3811 Table 2: Classification results for the different sets of features depth 8 and it starts to over-fit beyond it. The error bars in this and the upcoming figures correspond to the standard error of the sample mean. Although decision trees have a bad performance in this scenario (accuracy 66.06%), they achieve better results in the offline scenario, where a tree of depth 8 was 87% accurate. This can be explained by examining the features with most Information Gain presented in Table 3. We compared all the algorithms varying the number of training instances. The performance of logistic regression based learner has been the best overall. In fact, using L1 regularization we achieve the highest accuracy (70%), this shows that some of the features might be redundant and we can perform more experiments using feature selection. For using Naive Bayes learner, we normalized the values of the features in order to adjust their scales and to use the same value for smoothing all the features when there are zero values. Different feature sets: We evaluated the logistic based classifier (L1) using five sets of features: two sets based on [3] are considered as baselines for each scenario, then two sets with extended features and another one considering only the features extracted from the question content. The Table 2 presents these results. Using the new features, the classifiers are slightly better than the baseline. Most informative features: Of the new text based features we added to our satisfaction prediction model, only the misspelling ratio appears (with a weak value of 0.008) in the top Information Gain features presented in Table 3. The features in the baseline, such as asker information, are much superior to the ones from the text. This could be a reason as to why we observe a decrease in performance by supplementing the baseline with richer textual features. We also observed that all algorithms performed better at predicting the unsatisfied questions better than the satisfied ones. This could be attributed to the inherent skewed distribution: there are 1.85 times unsatisfied questions as the satisfied ones. Its also possible that there exist questions for which many users haven’t selected an answer, in spite of having gotten an answer. Another interesting phenomenon we observe is that the seniority and successfulness of users seems to divulge most information about the asker satisfaction. We can see that total number of questions solved by asker and his membership age are the two most important features in the online scenario. 1.0 ● ● ● ● ● ● 0.9 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Accuracy 0.8 0.7 0.6 0.5 1 2 3 4 5 6 7 8 9 20 30 50 Tree depth Dataset ● Test ● Train Figure 1: Decision Trees accuracy varying the tree depth 4
[Figure 2: Performance on the online satisfaction prediction task (accuracy, precision, recall and F1) as the number of training instances varies, for decision trees, logistic regression with L1 and L2 regularization, and multinomial Naive Bayes.]

Most informative features: Of the new text-based features we added to our satisfaction prediction model, only the misspelling ratio appears (with a weak value of 0.008) among the top information gain features presented in Table 3. The baseline features, such as asker information, are much stronger than the ones derived from the text. This may explain why we observe a decrease in performance when supplementing the baseline with richer textual features. We also observed that all algorithms were better at predicting unsatisfied questions than satisfied ones. This can be attributed to the skewed class distribution: there are 1.85 times as many unsatisfied questions as satisfied ones. It is also possible that some askers never selected a best answer in spite of having received one. Another interesting observation is that the seniority and past success of users carry the most information about asker satisfaction: the asker's total number of questions solved and their membership age are the two most important features in the online scenario. Importantly, this is in accord with the general trends of success and satisfaction we would expect in CQA sites.

Online features                               Information gain
Asker's total number of questions solved      0.0453
Asker's membership age                        0.0424
Asker's answers to questions ratio            0.0338
Asker's average past question score           0.0307
Asker's recent questions score                0.0166
Question code length                          0.0112
Question text unigram "entity"                0.0083
Question text misspelling ratio               0.0080
Question text bigram "the top"                0.0059
Question tag "android"                        0.0057

Offline features                                  Information gain
Best answer score                                 0.5449
Highest answerer reputation                       0.1139
Community score for question                      0.11168
Answerer's reputation                             0.10283
Top value of answerers' accepted answer count     0.09611
Top value of answerers' questions solved count    0.09234
Top value of answerers' answer count              0.09218
Answerers' answer accepted count                  0.09153
Reputation of most voted answerer                 0.09129
Top answerer's answer count                       0.08176

Question content only                         Information gain
Question code length                          0.01122
Question text unigram "entity"                0.00838
Question text misspelling ratio               0.00809
Question text bigram "the top"                0.00594
Question tag "android"                        0.00571
Question body length                          0.00515
Question text unigram "url"                   0.00481
Question text bigram "using this"             0.00458
Question text bigram "reference to"           0.00455
Question text bigram "any ideas"              0.00444

Table 3: Feature sets ranked by information gain.
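Rankings such as those in Table 3 can be computed by estimating the mutual information between each feature and the class label, which for discrete features coincides with information gain. The sketch below uses scikit-learn [4]; whether each feature is treated as discrete, and the exact estimator, are assumptions that may differ from the tool we actually used.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_by_information_gain(X, y, feature_names, top_k=10):
    """Rank features by their estimated mutual information with the label."""
    gains = mutual_info_classif(X, y, discrete_features="auto", random_state=0)
    order = np.argsort(gains)[::-1][:top_k]
    return [(feature_names[i], float(gains[i])) for i in order]
```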
5 Task 2: Predicting Question Quality based on human assessment

Our main interest in the question quality task is to evaluate the classifiers in a semi-supervised setting as a solution to training data scarcity. We continue the experiments performed for the previous task, but this time we evaluate the classifiers on the dataset of annotated questions. In addition to evaluating the different algorithms with the features available in each scenario (online and offline), we apply co-training in order to expand the set of training examples. In the following sections we present these experiments and their results.

Definition of question quality: The merit or quality of a question is a highly subjective factor that is difficult to quantify. Since it cannot be measured directly or derived from any available feature, we had to annotate it by hand. We define the quality of a question as a combination of five metrics: conciseness, politeness, readability, detail and relevance, each rated on a scale of 1 to 5. In addition to these metrics, we annotated each question with a quality label that represents our overall judgment of its quality. We used this label to understand the importance of the five metrics and their combined influence on question quality, and found that conciseness, readability, politeness and detail are reliable estimators of it. We examined different patterns in the metric values; the most consistent one was that high-quality questions had at least three metrics with values greater than or equal to 3, and we adopted this rule as our label for question quality. We followed this rough labeling rule because high-quality questions occur in different forms: concise and readable but not detailed, detailed and polite but not concise, readable and polite but not relevant, and so on. In addition, the labeling rule correlated considerably with our hand-annotated quality label.

5.1 Experimental Setup

We use the algorithms, features and evaluation metrics defined in the asker satisfaction task, but apply them to the dataset of manually labeled questions. This dataset contains 172 instances, of which 127 (73.83%) are labeled as high quality and the remaining 45 (26.16%) as low quality. Another difference in this task is that we perform 4-fold cross validation, since the number of training instances is much smaller.

Co-training: To increase the number of training examples, we apply this technique as presented in [2], with one adjustment for our task: since the target classes are not balanced, we ensure that the classifiers add a number of instances that preserves the class distribution. We train two classifiers, each using one of the feature sets of the online and offline scenarios. The co-training algorithm has four parameters, which we set as follows:

• p and n: the number of positive and negative examples labeled by the classifiers and added to the training pool in each iteration. We set p = 3 and n = 1.
• k: the number of iterations. We set k = 100.
• u: the number of unlabeled examples sampled into the pool from which the classifiers select new instances in each iteration. We run our experiments with different values of u.

For the evaluation, we use 4-fold cross validation as follows: we partition the set of labeled instances into 4 subsets, maintaining the same class distribution in each subset. We select 3 subsets for training the classifiers and leave the remaining subset for testing them after each iteration. The subsets selected for training are extended with the new instances labeled by the classifiers, while the test subset is never modified. In each iteration we evaluate the accuracy, precision, recall and F1 score of each classifier.
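As an illustration, the loop below sketches the co-training procedure just described over the two feature views. The function and variable names are ours, and details such as tie-breaking, pool replenishment and the exact confidence criterion are simplified with respect to the actual experiments.

```python
import random
import numpy as np

def cotrain(clf_online, clf_offline, X_online, X_offline, y,
            labeled_idx, unlabeled_idx, p=3, n=1, k=100, u=75, seed=0):
    """Co-training (after Blum and Mitchell [2]) over two feature views.

    y holds the gold labels for labeled_idx; entries for unlabeled_idx are
    overwritten with pseudo-labels as instances are added. Adding p = 3
    positive and n = 1 negative instances per classifier roughly preserves
    the class distribution of the annotated data.
    """
    rng = random.Random(seed)
    labeled, unlabeled = list(labeled_idx), list(unlabeled_idx)
    y = np.array(y, copy=True)

    for _ in range(k):
        if not unlabeled:
            break
        pool = rng.sample(unlabeled, min(u, len(unlabeled)))
        clf_online.fit(X_online[labeled], y[labeled])
        clf_offline.fit(X_offline[labeled], y[labeled])
        for clf, X in ((clf_online, X_online), (clf_offline, X_offline)):
            proba = clf.predict_proba(X[pool])[:, 1]           # P(high quality)
            ranked = sorted(range(len(pool)), key=proba.__getitem__)
            picks = [(j, 1) for j in ranked[-p:]] + [(j, 0) for j in ranked[:n]]
            for j, label in picks:
                idx = pool[j]
                if idx in unlabeled:                           # not yet taken by the other view
                    y[idx] = label
                    labeled.append(idx)
                    unlabeled.remove(idx)
    return clf_online, clf_offline
```

Any pair of probabilistic classifiers (for instance, the logistic regression learners from the previous task) can play the two roles; the held-out fold is evaluated after every iteration, as described above.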
5.2 Experiment Results

Question quality and feature sets: We evaluated the same classifiers used in the asker satisfaction task, but now for predicting the label assigned during annotation; we again compared them using the two sets of features (online and offline). The averaged results are presented in Figure 3. Again, the best results are obtained with the logistic regression learner using L1 regularization, with an accuracy of 0.74418 and an F1 score of 0.83851. However, for this task the classifiers are more accurate and the features from the online scenario have higher predictive power.

Co-training: We evaluated the overall improvement across iterations and the effect of varying the parameter u (the size of the pool of unlabeled instances sampled in each iteration) on the performance of each classifier. Figure 4 shows the F1 score of each classifier at each iteration, for runs of co-training with different values of u. The accuracies of both classifiers improved; however, the classifier for the offline scenario, which was initially the weakest, improved by larger margins, to the point of reaching the performance of the other classifier (when u = 75 and u = 100). Regarding the number of iterations, the improvements generally occur within the first 50. At that point the training set has grown to between 213 and 255 instances, depending on the value of u; this variation is the effect of the random sampling of the unlabeled pool, which is controlled by this parameter.
[Figure 3: Accuracy and F1 of the classifiers on the question quality task, using the online, offline and combined feature sets.]

[Figure 4: F1 score of the online and offline classifiers over 100 co-training iterations, for u = 25, 50, 75, 100, 200 and 500.]

The best performance was obtained with u = 75. In this case, the F1 score of the classifier using the offline features increased from 0.77662 to 0.85222, while for the online classifier the improvement was from 0.83852 to 0.86456. This coincides with the remark made in [2] about the training pool size, which notes that selecting the unlabeled instances from a smaller pool forces the classifiers to learn from examples that are more representative of the underlying distribution. We also examined the confidence of the classifiers for each target class, i.e. the minimum probability predicted by the classifiers for the instances that were added to the training set.
For the positive class (high quality), the lowest such value was 0.92328, while for the negative class it was only 0.20913. This uncertainty on the negative side can be attributed to the class imbalance.
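These per-class figures can be collected by recording, inside the selection step of the loop sketched earlier, the probability the classifier assigned to the label of each added instance. A minimal helper might look as follows; the list of (label, P(high quality)) pairs is an assumed by-product of that loop, and the exact quantity reported above may be computed slightly differently.

```python
def minimum_confidence(added):
    """added: (label, p_positive) pairs for every pseudo-labeled instance."""
    pos = [p for label, p in added if label == 1]
    neg = [1.0 - p for label, p in added if label == 0]   # confidence in the negative label
    return (min(pos) if pos else None), (min(neg) if neg else None)
```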
Table 4 presents a summary of the results of these experiments.

Feature set   Initial F1   u     Max. F1   Iteration with max. F1   Gain
Offline       0.77662      25    0.83277   30                       0.05615
                           50    0.82055   15                       0.04393
                           75    0.85222   80                       0.07560
                           100   0.83854   17                       0.06192
                           200   0.81701   84                       0.04040
                           500   0.81256   22                       0.03594
Online        0.83852      25    0.86227   76                       0.02375
                           50    0.85710   11                       0.01858
                           75    0.86456   27                       0.02604
                           100   0.85415   70                       0.01563
                           200   0.85130   8                        0.01278
                           500   0.86388   8                        0.02536

Table 4: Maximum F1 score achieved in co-training for each value of the parameter u.

6 Conclusions

From the above experiments we have learnt vital insights about the performance of the algorithms on the asker satisfaction and question quality prediction tasks. We saw the effect of the skew in asker satisfaction on the classification accuracy, and several well-known CQA trends manifested themselves in the feature analysis. The offline features clearly predict asker satisfaction better. Our approach of treating a question as having offline and online views was successful and gave us insight into question quality.

We characterized question quality, an abstract measure, as a combination of particular aspects. Our approach to quantifying question quality showed the importance of the individual contributing metrics, besides revealing its high subjectivity, and we were able to train classifiers that predicted the annotated labels. We used co-training to expand the training data; this increased the predictive performance of both classifiers, but mainly of the one based on the offline feature set. We also learnt that the quality and consistency of the annotations are a limitation of this technique in this scenario. While we experimented with different learning algorithms and feature sets, we found that both asker satisfaction and question quality can be modeled similarly. Through our experiments we were able to show that question quality prediction can be improved by defining more specific features and by expanding the training set with semi-supervised learning.

References

[1] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne. Finding high-quality content in social media. In Proceedings of the International Conference on Web Search and Web Data Mining (WSDM '08), page 183, 2008.

[2] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann Publishers, 1998.

[3] Yandong Liu, Jiang Bian, and Eugene Agichtein. Predicting information seeker satisfaction in community question answering. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '08), page 483, 2008.

[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[5] C. Shah and J. Pomerantz. Evaluating and predicting answer quality in community QA. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '10), pages 411–418, 2010.