Question Answering (Q&A) websites are a popular source of information for finding answers to all kinds of questions. Because of this popularity, it is critical to support the identification of best answers to existing questions in order to simplify access to relevant information.
Although best answers can be identified relatively accurately by using binary classifiers coupled with user, content and thread features, existing works have generally failed to incorporate the thread-like structure of Q&A communities in the design of best answer identification predictors.
This paper investigates this issue by studying structural normalisation techniques for improving the accuracy of feature-based best answer identification models.
Thread-based normalisation methods are introduced: a systematic approach that normalises predictors by taking into account the relations between features and the thread-like structure of Q&A communities. Compared to similar non-normalised models, better results are obtained for each of the three communities studied. These results show that structural normalisation methods can improve the identification of best answers.
Structural Normalisation Methods for Improving Best Answer Identification in Question Answering Communities
1. GRÉGOIRE BUREL, PAUL MULLHOLAND, HARITH ALANI
Knowledge Media Institute, The Open University, Milton Keynes, UK.
WWW’16, Montréal, Canada.
11th April 2016
2. Outline
Identifying Best Answers in Q&A Communities
- Question Answering Communities
- Automatic Best Answer Identification
- Reference Model
Structural Normalisation
- No Normalisation Vs. Structural Normalisation
- Normalisation Methods
- Normalisation Method Selection
Best Answer Identification with Structural Normalisation
- Models Comparison
- Features Comparison
Conclusions and Future Work
3. Q&A Communities
[Figure: a question thread, made of a question followed by answers #1 to #n]
Q&A communities are composed of askers and answerers looking for solutions to particular issues.
When looking for answers, users need to check whether similar questions have already been answered correctly and whether a best answer exists.
Unfortunately, not all questions have a labelled best answer (43.2% of questions do not have one).
Existing works on best answer identification have also generally ignored the structure of Q&A websites when designing feature normalisation methods.
6. Feature-based Best Answers Identification and Reference Model
To identify best answers, features are extracted and associated with each answer, and a binary classifier is trained (Burel et al. 2012, 2013, 2014):
- Such features are divided into User, Content and Thread features.
- 30 features are used for the reference model.
- The Alternating Decision Tree algorithm is used.
Reference Model Results:
- The reference model achieves an F1 of 0.817 on average.
- Thread features, in particular score ratios, are strongly associated with best answers.
What methods could be used to improve such a model?
8. Contributions
Can the thread-like structure of Q&A communities help the automatic identification of best answers?
- Introduce the concept of structural normalisation, a method for normalising features based on the structure of Q&A communities.
- Present three feature normalisation methods: 1) Min/Max Normalisation; 2) Order Normalisation; and 3) Normalised Order Normalisation.
- Identify the best normalisation method for automatically identifying best answers.
- Compare different models and feature sets and show that structural normalisation improves best answer identification.
9. Structural Normalisation / No Normalisation
[Figure: several question threads, each made of a question followed by answers #1 to #n]
Standard classifiers such as the reference model compare all answers with each other, even when they do not belong to the same thread.
Thread-wise normalisation normalises features relative to the other answers within the same thread.
11. Normalisation Methods
1. Min/Max Normalisation (thread-wise min/max): Normalise features within a thread using the minimum and maximum value of a particular feature.
2. Order Normalisation: Generalises the approach of Gkotsis et al. (2014). Features within a thread are ranked according to their value. The lowest value is transformed to the length of the thread while the highest value is transformed to 1.
3. Normalised Order Normalisation: Same as Order Normalisation but the values are normalised between 0 and 1.
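The three methods can be illustrated with a minimal Python sketch that operates on the values of one feature for the answers of a single thread. It is not the authors' implementation. The function names, the handling of ties (broken by answer position), the value returned for a constant feature, and the direction of the 0 to 1 rescaling in the third method are all assumptions made for illustration.

```python
from typing import List


def min_max_normalise(values: List[float]) -> List[float]:
    """Thread-wise min/max: rescale one feature to [0, 1] within a thread."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a feature that is constant in the thread maps to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def order_normalise(values: List[float]) -> List[int]:
    """Rank within a thread: highest value -> 1, lowest -> thread length."""
    # Assumption: ties are broken by answer position (sorted() is stable).
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def normalised_order_normalise(values: List[float]) -> List[float]:
    """Order normalisation rescaled to [0, 1].

    Assumption: rank 1 maps to 0.0 and rank n maps to 1.0.
    """
    n = len(values)
    if n == 1:
        return [0.0]
    return [(rank - 1) / (n - 1) for rank in order_normalise(values)]


# Hypothetical thread with three answers, using answer length as the feature.
answer_lengths = [120, 45, 300]
print(min_max_normalise(answer_lengths))
print(order_normalise(answer_lengths))
print(normalised_order_normalise(answer_lengths))
```

In this sketch each function only sees the answers of one thread, so an answer is compared with its direct competitors rather than with every answer in the dataset.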
Adaptive Feature Normalisation
Not all features need to be normalised. For instance, the number of question views does not change within a question thread but does across questions. If a given feature has zero variance within every thread (that is, it is constant inside each thread), the feature is not normalised. Otherwise, the feature is normalised with one of the previous functions.
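The adaptive rule can be sketched as follows, assuming each thread is a list of per-answer feature dictionaries. The data layout, the function names and the `divide_by_max` stand-in normaliser are hypothetical and only serve to make the example runnable; any thread-wise normalisation function could be passed instead.

```python
from typing import Callable, Dict, List, Sequence

Thread = List[Dict[str, float]]  # assumed layout: one feature dict per answer


def varies_within_threads(threads: Sequence[Thread], feature: str) -> bool:
    """True if the feature takes more than one value inside at least one thread."""
    return any(len({answer[feature] for answer in thread}) > 1 for thread in threads)


def adaptive_normalise(
    threads: Sequence[Thread],
    normalise: Callable[[List[float]], List[float]],
) -> List[Thread]:
    """Normalise only the features that vary inside at least one thread."""
    first = next((answer for thread in threads for answer in thread), None)
    if first is None:
        return [[] for _ in threads]
    to_normalise = [f for f in first if varies_within_threads(threads, f)]
    result = []
    for thread in threads:
        new_thread = [dict(answer) for answer in thread]  # leave the input untouched
        if thread:
            for feature in to_normalise:
                scaled = normalise([answer[feature] for answer in thread])
                for answer, value in zip(new_thread, scaled):
                    answer[feature] = value
        result.append(new_thread)
    return result


def divide_by_max(values: List[float]) -> List[float]:
    """Hypothetical stand-in normaliser (assumes a positive maximum)."""
    top = max(values)
    return [v / top for v in values]


# Hypothetical data: "views" is a question-level feature, "length" an answer-level one.
example_threads = [
    [{"views": 100, "length": 50}, {"views": 100, "length": 200}],
    [{"views": 30, "length": 10}, {"views": 30, "length": 40}],
]
normalised = adaptive_normalise(example_threads, divide_by_max)
print(normalised)
```

Here `views` is left untouched because it is constant inside every thread, while `length` is rescaled thread by thread.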
13. Normalisation Method Selection
Which normalisation method is the most useful?
Approach
1. Calculate the average Information Gain (IG) of each feature associated with each
normalisation method.
2. Select the normalisation method that has the highest average IG across all the datasets.
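The two steps above can be sketched in Python under some assumptions: features are treated as already discretised (continuous features would first need binning, which the slides do not describe), Information Gain is computed as the reduction in label entropy, and the per-feature, per-dataset IG values of a method are simply averaged. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


def select_method(ig_by_method: Dict[str, List[float]]) -> str:
    """Return the method whose IG values have the highest mean."""
    return max(
        ig_by_method,
        key=lambda method: sum(ig_by_method[method]) / len(ig_by_method[method]),
    )


# Toy check: a feature that separates best answers (1) from the others (0)
# perfectly yields one bit of information; an unrelated feature yields none.
is_best = [1, 0, 1, 0]
print(information_gain(["top", "low", "top", "low"], is_best))
print(information_gain(["a", "a", "b", "b"], is_best))
```

`select_method` expects, for each normalisation method, the list of IG values collected over all features and datasets, and returns the name with the highest average.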
15. Best Answers Identification Models Comparisons
General Results
- Thread normalisation approaches improve best answer identification (p = 2.817e-05).
- On average, an increase in F1 of +5.3% is observed compared to the non-normalised models.
Baseline Models Results
- In general, thread normalisation approaches improve best answer identification: on average, F1 = 0.718 (+26.8%).
- The longest answer within a thread is more likely to be the best answer.
Core Features Models Results
- Unsurprisingly, there is no increase in F1 for thread features, but there is a significant increase for the other feature sets (+20% F1).
- Content features are more important within a thread, whereas user features are better for discriminating best answers globally.
Extended Features Models Results
- When scores are present, normalisation does not improve results, as score ratios already perform very well.
20. Features Importance
Approach
1. Calculate the Information Gain (IG) of each feature for each dataset with and without
normalisation.
2. Order the features by IG across all the datasets. The higher the IG the more important the
feature.
Which features are the most useful? How does order normalisation impact feature importance?
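One possible way to carry out the ordering step is sketched below. It assumes the IG of each feature has already been computed per dataset and aggregates it with a plain mean; the slides do not state the aggregation used, so this choice and the function names are assumptions. Comparing the ranking obtained without normalisation with the one obtained with normalisation shows which features gain or lose importance.

```python
from typing import Dict, List


def rank_features(ig_per_dataset: Dict[str, Dict[str, float]]) -> List[str]:
    """Order features by mean IG across datasets, most important first."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for scores in ig_per_dataset.values():
        for feature, ig in scores.items():
            totals[feature] = totals.get(feature, 0.0) + ig
            counts[feature] = counts.get(feature, 0) + 1
    means = {feature: totals[feature] / counts[feature] for feature in totals}
    # Ties are broken alphabetically so that the ordering is deterministic.
    return sorted(means, key=lambda feature: (-means[feature], feature))


def rank_shift(before: List[str], after: List[str]) -> Dict[str, int]:
    """Positions gained (positive) or lost (negative) by each feature."""
    return {f: before.index(f) - after.index(f) for f in before if f in after}
```

`rank_features` would be called once on the IG values computed without normalisation and once on those computed with normalisation, and `rank_shift` then reports how far each feature moved between the two orderings.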
22. Features Importance - Results
General Results
- As previously observed, score-based features are the most important best answer predictors.
- Similarly, user features are more important than content features when features are not normalised. When features are normalised, the difference diminishes.
Features Comparison
- The number of comments is associated with best answers when normalised. Therefore, the relative number of comments may be a good indicator of best answers.
- The relative number of answers posted by a user is an important indicator for identifying best answers (i.e. good answers come from users who post a lot of answers).
- Answers with relatively original content are more likely to be best answers.
- Relative topic reputation is not very important, meaning that this feature only helps when distinguishing best answers in a global context.
23. Conclusions and Future Work
Conclusions
- Compared to the reference models, structural approaches improve automatic best answer identification by around +5% F1.
- Thread normalisation changes the importance of features (e.g. content length).
- Score features are overwhelmingly important.
- Structural methods may be used successfully for other classification tasks where the analysed communities are highly structured.
Future/Current Work
- Evaluate other types of normalisation (e.g. signed feature ratios).
- Consider other structural optimisation methods (e.g. Learning-to-Rank models).
- Consider applicability to other types of communities.
25. Questions and Discussion
Email: g.burel@open.ac.uk
Twitter: @evhart