2. 0. Paper
• GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
• Authors: Kexin Wang, Nandan Thakur, Nils Reimers, Iryna Gurevych
• Published: 2021.12 (arXiv)
• https://arxiv.org/abs/2112.07577
3. 0. Preliminaries
• Information Retrieval
• The task of finding documents relevant to a query (relevant = able to answer it)
• Open-domain QA: IR + MRC
• Method: select the document with the highest score (similarity) with respect to the query
• Sparse embedding vs Dense embedding
• Sparse works well for keywords/proper nouns; dense works well for synonyms/paraphrases
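The contrast above can be made concrete with a toy example (not from the paper; the two-dimensional "embeddings" below are made up): sparse scoring only rewards exact term overlap, while dense scoring can reward a synonym whose learned vector points in a similar direction.

```python
# Toy sparse vs. dense scoring. The embedding table is a hypothetical
# stand-in for a learned encoder, chosen so that synonyms are close.

def sparse_score(query_tokens, doc_tokens):
    """Bag-of-words overlap: only exact term matches contribute."""
    return sum(1 for t in query_tokens if t in doc_tokens)

# Hypothetical 2-d embeddings where "car" and "automobile" align.
EMB = {
    "car": [1.0, 0.0],
    "automobile": [0.95, 0.1],
    "banana": [0.0, 1.0],
}

def dense_score(query_tokens, doc_tokens):
    """Sum of dot products between query and document token vectors."""
    return sum(
        sum(a * b for a, b in zip(EMB[q], EMB[d]))
        for q in query_tokens
        for d in doc_tokens
    )

print(sparse_score(["car"], ["automobile"]))  # 0: no exact keyword overlap
print(dense_score(["car"], ["automobile"]))   # 0.95: synonym still scores high
```

The sparse scorer misses the paraphrase entirely, which is exactly the gap dense retrieval is meant to close.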
4. 0. Preliminaries
- Retriever (bi-encoder): fast search via Maximum Inner Product Search, but weaker performance
- Reranker (cross-encoder): strong performance, but very slow
- Pipeline: Retriever -> Reranker -> Reader
6. 1. Introduction
• Recently, information retrieval methods based on dense vector spaces have become popular to
address the limitations of sparse vectors.
• Dense retrieval methods require large amounts of training data to work well.
• Dense retrieval methods are extremely sensitive to domain shifts.
• Models trained on MS MARCO perform rather poorly on questions about COVID-19 scientific
literature.
• Models did not learn how to represent this topic well in a vector space.
• We present Generative Pseudo Labeling (GPL), an unsupervised domain adaptation for dense
retrieval models.
7. 2. Method
• For a given target corpus, we generate three queries for each passage using a T5 encoder-decoder
model.
• For each of the generated queries, we use an existing retrieval system to retrieve 50 negative
passages.
• For each (query, positive, negative) – tuple we compute the margin score using cross-encoder.
• Train the bi-encoder with margin score.
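The four steps above can be sketched as a data-generation loop. The functions below are toy stand-ins for the real components (docT5query, the negative-mining dense retrievers, and the cross-encoder teacher), not the paper's actual models; only the control flow mirrors GPL.

```python
# Sketch of GPL training-data generation with stub models.
import random

def generate_queries(passage, n=3):
    # Stand-in for docT5query: sample n synthetic queries per passage.
    return [f"query {i} about: {passage[:20]}" for i in range(n)]

def mine_negatives(query, corpus, k=50):
    # Stand-in for a dense retriever: GPL retrieves k similar passages
    # as hard negatives for each generated query.
    return random.sample(corpus, min(k, len(corpus)))

def cross_encoder_score(query, passage):
    # Stand-in for the cross-encoder teacher's relevance score.
    return len(set(query.split()) & set(passage.split()))

def build_gpl_training_data(corpus, queries_per_passage=3):
    examples = []
    for pos in corpus:
        for q in generate_queries(pos, queries_per_passage):
            for neg in mine_negatives(q, corpus):
                # Teacher margin = CE(q, pos) - CE(q, neg); the student
                # bi-encoder is later trained to reproduce this margin.
                margin = cross_encoder_score(q, pos) - cross_encoder_score(q, neg)
                examples.append((q, pos, neg, margin))
    return examples

corpus = ["passage about covid vaccines", "passage about dense retrieval"]
data = build_gpl_training_data(corpus)
print(len(data))  # 12: 2 passages x 3 queries x 2 mined negatives
```

In the real pipeline, each component is a trained neural model and `k` is 50; the structure of the emitted (query, positive, negative, margin) tuples is the same.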
8. 2. Method
• Multiple Negatives Ranking loss considers only the coarse relationship between queries and
passages, i.e. the matching passage is considered relevant while all other passages are
considered irrelevant.
• However, the query generator might generate queries that are not answerable by the passage.
Further, other passages might actually be relevant as well for a given query.
• MarginMSE loss uses a powerful cross-encoder to soft-label (query, passage) pairs. It then teaches
the dense retriever to mimic the score margin between the positive and negative query-passage
pairs.
In GPL,
- Bad query -> low positive score from the cross-encoder -> the pair is kept distant
- False negative -> high negative score from the cross-encoder -> the pair is kept similar
MarginMSE Loss
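A miniature worked example of MarginMSE (numbers are made up): the student is trained to reproduce the teacher's score *margin* rather than hard 1/0 labels, which is what makes bad queries and false negatives harmless.

```python
# MarginMSE in miniature: squared error between the student's and the
# teacher's (positive - negative) score margins.

def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return (student_margin - teacher_margin) ** 2

# Bad generated query: the teacher gives the "positive" a low score, so
# the target margin is small and the student is not forced to pull the
# pair together.
print(margin_mse(student_pos=0.2, student_neg=0.1,
                 teacher_pos=0.3, teacher_neg=0.2))  # ~0

# False negative: the teacher scores the "negative" almost as high as
# the positive, so keeping them similar incurs almost no penalty.
print(margin_mse(student_pos=0.9, student_neg=0.8,
                 teacher_pos=0.9, teacher_neg=0.85))
```

With a hard-labeled loss like MNRL, both cases would push the model in the wrong direction; with soft margins, the penalty shrinks toward zero.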
9. 3. Experiments
• Query generator: docT5query
• Negative miner(Retriever): msmarco-distilbert-base-v3, msmarco-MiniLM-L-6-v3
• Mine 50 negatives with each retriever, then uniformly sample from the pool
• Cross encoder: msmarco-MiniLM-L-6-v2
• Student: MS MARCO DistilBERT + Mean pooling + Dot product
• 140k training steps, batch size 32 (no need for a large batch size!)
Experimental Setup
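The student bi-encoder above scores with mean pooling plus dot product; a minimal pure-Python version of that scoring head (the vectors are illustrative, not real model outputs):

```python
# Mean pooling over token embeddings, then dot-product similarity,
# as used by the MS MARCO DistilBERT student.

def mean_pool(token_embeddings):
    """Average token vectors into one fixed-size text embedding."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]

def dot(u, v):
    """Unnormalized dot-product similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))

query_vec = mean_pool([[1.0, 0.0], [0.0, 1.0]])  # -> [0.5, 0.5]
doc_vec = mean_pool([[1.0, 1.0], [1.0, 1.0]])    # -> [1.0, 1.0]
print(dot(query_vec, doc_vec))                   # 1.0
```

Using an unnormalized dot product (rather than cosine) lets the model express the absolute score magnitudes that MarginMSE distillation needs.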
10. 3. Experiments
• Six domain-specific text retrieval tasks from the BeIR benchmark
• Evaluation is done using nDCG@10
• Goal: rank more relevant documents higher!
Evaluation
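The nDCG@10 metric used above can be computed in a few lines: gains are discounted by rank and normalized against the ideal ordering, so a perfect ranking scores 1.0.

```python
# nDCG@10: discounted cumulative gain at cutoff k, normalized by the
# DCG of the ideal (relevance-sorted) ranking.
import math

def dcg_at_k(relevances, k=10):
    """relevances[i] is the graded relevance of the doc ranked i-th."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 0, 0]))  # 1.0: relevant docs ranked first
print(ndcg_at_k([0, 2, 3, 0]))  # < 1.0: relevant docs pushed down
```

The log-rank discount is what makes the metric care most about the top of the ranking, matching the "rank more relevant documents higher" goal.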
• Zero-Shot
• MS MARCO: DistilBERT dense retriever trained with MarginMSE
• BM25: lexical matching from Elasticsearch
• Pre-Training based Domain Adaptation
• SimCSE: encode the same sentence with different dropout masks + MNRL loss
• ICT: sample one sentence from the passage as the pseudo query
• TSDAE: denoising autoencoder
• Generation-based Domain Adaptation
• QGen: generated queries + Multiple Negatives Ranking loss
Baselines
12. 5. Analysis
• GPL begins to saturate after around 100K steps.
• With TSDAE pre-training, performance improves consistently.
Influence of Training Steps
Influence of Corpus Size
• We find that with more than 10K passages, GPL can already outperform the zero-shot baseline
13. 5. Analysis
• Generating 3 queries per passage appears to be optimal; generating more queries per passage
does not yield further improvements.
Robustness against Query Generation
Sensitivity to Starting Checkpoints
• We also evaluate directly fine-tuning a DistilBERT model using QGen
14. 6. Conclusion
• In this work we propose GPL, a novel unsupervised domain adaptation method
for dense retrieval models.
• Pseudo-labeling overcomes two important shortcomings of previous methods.
• Not all generated queries are of high quality
• Training with mined hard negatives can be noisy
• We observe GPL performs well on all the datasets and significantly outperforms
other approaches.
• As a limitation, GPL requires a relatively complex training setup; future work
can focus on simplifying this training pipeline.
Editor's Notes
The bi-encoder trained on MS MARCO performs worse than BM25.
Even with a cross-encoder reranker, the BM25 retriever beats the MS MARCO retriever.
Among pre-training-based domain adaptation methods, TSDAE is the best.
In all other comparisons, GPL is the best.
Training DistilBERT with TSDAE first and then GPL is even better.
Reranking is better still.