2. Scenario
• Learning to Rank (LtR) for modern IR systems
• Supervised approach exploiting graded-relevance labels
• Multi-Stage (2+) IR system
• First stage optimizes Recall
• Following stage(s) deal with Ranking
• First stage of production systems
outputs thousands of documents
• Only a small fraction of them are relevant [1]
• Ranking models should be trained accordingly
Credit: “Efficient and Effective Retrieval using Selective Pruning”
Tonellotto, N., Macdonald, C., & Ounis, I. ACM WSDM 2013
[1] Yin, D. et al. Ranking Relevance in Yahoo Search. In ACM SIGKDD 2016.
3. Problem
• With a high number of negative instances
• Higher class imbalance
• Better coverage of negative cases
• Research Questions:
• Does the volume of negative instances impact the performance of state-of-the-art LtR algorithms?
• Is it possible to improve the effectiveness of the learned model by exploiting the diversity and variety of the negative instances?
4. Intuition
• Key idea: sampling of negative instances at each iteration of a gradient boosting process
• State-of-the-art algorithms learn a forest of regression trees
incrementally through gradient boosting
• λ-MART [1] does not use sampling
• Stochastic Gradient Boosting [2]: uniform sampling
• Ground-truth labels are not taken into account
[1] Christopher JC Burges. 2010. From ranknet to lambdarank to lambdamart: An overview. Learning 11, 23-581 (2010)
[2] J. H. Friedman. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 4 (2002)
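For contrast with the proposed label-aware approach, a minimal sketch of the uniform subsampling step used by Stochastic Gradient Boosting [2] (function name and interface are illustrative, not Friedman's original code):

```python
import random

def sgb_subsample(n_instances, fraction, rng=None):
    """Uniform subsampling as in Stochastic Gradient Boosting [2]:
    before fitting each tree, draw a random fraction of the training
    instances; ground-truth labels play no role in the draw."""
    rng = rng or random.Random(0)
    k = int(n_instances * fraction)
    return sorted(rng.sample(range(n_instances), k))
```

Note that every instance is equally likely to be drawn, so with 99%+ negatives almost every subsample is made of uninformative negatives.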
5. Proposal
• Smart sampling of the most “useful” instances on a per-query basis
• All the positive instances
• A fraction of the top-ranked negative instances
• Rationale:
• The negative instances with the highest scores are exactly those most likely to be ranked above relevant instances, thus severely hindering model effectiveness
• Selected instances are close to the decision boundary between positives and negatives
[Figure: a ranked list of relevant and irrelevant documents]
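The per-query selection described above can be sketched as follows (a simplified illustration of the idea, not the actual QuickRank implementation; the function name is an assumption):

```python
def select_per_query(labels, scores, sampling_rate):
    """Per-query selective sampling: keep every positive instance plus
    the top-scored fraction of the negatives. Returns the indices of
    the instances the next tree is trained on."""
    positives = [i for i, l in enumerate(labels) if l > 0]
    negatives = [i for i, l in enumerate(labels) if l == 0]
    # Rank negatives by the current model's score, highest first: these
    # are the ones most likely to outrank a relevant document.
    negatives.sort(key=lambda i: scores[i], reverse=True)
    k = int(len(negatives) * sampling_rate)
    return positives + negatives[:k]
```

The scores come from the partially built ensemble, so the selected negatives shift toward the decision boundary as boosting proceeds.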
7. Experimental Settings
• LtR algorithm extended: λ-MART
• Metric: NDCG@10
• Software: QuickRank1
• Ensembles with:
• Up to 1,000 trees
• Each tree with up to 64 leaves
• Hyper-parameters tested:
• Sampling rate in {0.1%, 0.25%, 0.5%, 1%, 2.5%, 5%, 10%, 25%}
• Sampling frequency in {1, 10, 50, 100}
1 http://quickrank.isti.cnr.it/
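The evaluation metric, NDCG@10, can be computed per query as below (a standard formulation with exponential gain and log2 discount; QuickRank's internals may differ in detail):

```python
import math

def ndcg_at_k(ranked_labels, k=10):
    """NDCG@k for a single query. `ranked_labels` are graded-relevance
    labels listed in the order the model ranked the documents
    (exponential gain, log2 rank discount)."""
    def dcg(labels):
        return sum((2 ** l - 1) / math.log2(rank + 2)
                   for rank, l in enumerate(labels[:k]))
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0
```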
8. Dataset
• Novel Istella-X (eXtended) dataset1:
• Up to 5k query/doc pairs ranked according to BM25F
• Highly unbalanced:
• Only 4.64 relevant documents per query on average
1 Publicly available at http://quickrank.isti.cnr.it/istella-dataset/
Table 1: Datasets properties.
Properties IX5k IX2.5k IX1k IX500 IX100
# queries 10,000
# features 220
# query-doc pairs 26,791,447 15,778,399 7,363,902 3,995,063 906,782
max # docs/query 5,000 2,500 1,000 500 100
avg. # docs/query 2,679 1,578 736 400 91
# pos. query-doc pairs 46,371 (0.17%) 46,371 (0.29%) 46,371 (0.63%) 46,371 (1.16%) 46,371 (5.11%)
# neg. query-doc pairs 26,745,076 (99.83%) 15,732,028 (99.71%) 7,317,531 (99.37%) 3,948,692 (98.84%) 860,411 (94.89%)
results show that, by exploiting a large number of negative examples, SGB is able to build ranking models that are more accurate than those learned with state-of-the-art ranking algorithms.
This section is organized as follows. First we introduce the methodology used for the experimental evaluation, and the challenging
To investigate to which extent the presence of negative examples may influence the performance of LtR algorithms, we also created three scaled-down variants of IX5k. For each query in the training set, we first sorted all negative examples in descending order of BM25F scores, and then we discarded everything beyond a given rank. We used this methodology to produce three datasets, IX2.5k, IX1k, and IX500.
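The truncation step for one query can be sketched as follows (a sketch of the methodology just described; the dictionary field names are assumptions):

```python
def build_variant(query_docs, cutoff):
    """Scaled-down variant of one query's candidate list: keep all
    positives, sort the negatives by BM25F score in descending order,
    and discard every negative beyond rank `cutoff`."""
    positives = [d for d in query_docs if d["label"] > 0]
    negatives = sorted((d for d in query_docs if d["label"] == 0),
                       key=lambda d: d["bm25f"], reverse=True)
    return positives + negatives[:cutoff]
```

This is consistent with Table 1: the number of positive query-doc pairs (46,371) is identical across all variants, while the negatives shrink with the cutoff.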
16. Conclusions and Future Work
• We proposed a novel LtR algorithm:
• Selectively choose the most informative negative instances
• Mitigate dataset class imbalance
• Exploit diversity and variety of negative instances
• Outperform λ-MART both in effectiveness and efficiency
• Future directions:
• Different sampling rates on a per query basis
• Novel selection strategy
• Evaluate improvement in terms of efficiency