Deep Neural Methods for Retrieval
Bhaskar Mitra
Principal Applied Scientist, Microsoft
PhD candidate, University College London
@UnderdogGeek
Topics
Last week
Fundamentals of learning to rank
This week
Deep neural methods for retrieval
Reading material
An Introduction to
Neural Information Retrieval
Foundations and Trends® in Information Retrieval
(December 2018)
Download PDF: http://bit.ly/fntir-neural
The state of neural information retrieval
Growing publication popularity at top
IR conferences
Strong performance against
traditional methods in TREC 2019
Latent representation learning for text
Inspecting non-query terms in the document may reveal important clues about whether the
document is relevant to the query
Query: albuquerque
[Figure: two example passages, one about Albuquerque and one not about Albuquerque]
Deep Structured
Semantic Model
• Learn latent dense vector
representation of query and
document text
• Relevance is estimated by cosine
similarity between query and
document embeddings
• Relevant document embeddings
should be more similar to query
embeddings than non-relevant
document embeddings
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning deep structured semantic models for web search using clickthrough data. In CIKM, 2013.
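A minimal sketch of this siamese idea in PyTorch is below; the two-layer feed-forward encoder, bag-of-words inputs, and dimensions are illustrative assumptions, not the exact DSSM architecture (which uses character-trigram hashing of the input text).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseTextScorer(nn.Module):
    """Toy DSSM-style scorer: encode query and document into dense
    vectors and estimate relevance by cosine similarity (sketch only)."""
    def __init__(self, vocab_size=10000, hidden=300, dim=128):
        super().__init__()
        # Bag-of-words input projected through a small MLP (assumed architecture)
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.Tanh(),
            nn.Linear(hidden, dim), nn.Tanh(),
        )

    def forward(self, query_bow, doc_bow):
        q = self.encoder(query_bow)   # [batch, dim]
        d = self.encoder(doc_bow)     # [batch, dim]
        return F.cosine_similarity(q, d, dim=-1)  # [batch]

scorer = SiameseTextScorer()
q = torch.rand(2, 10000)  # stand-in bag-of-words query vectors
d = torch.rand(2, 10000)  # stand-in bag-of-words document vectors
print(scorer(q, d))       # one relevance score per (query, document) pair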
But how can we input text into a neural
model?
Different modalities of input text representation
Deep Structured
Semantic Model
To train the model we can use any of the loss
functions we learned about in the last lecture
Cross-entropy loss against randomly sampled
negative documents is commonly used
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning deep structured semantic models for web search using clickthrough data. In CIKM, 2013.
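To make that objective concrete, here is a hedged sketch of a softmax cross-entropy loss over the cosine similarity of one relevant document and a few randomly sampled negatives; the smoothing factor gamma and the number of negatives are assumed values.

```python
import torch
import torch.nn.functional as F

def dssm_loss(sim_pos, sim_negs, gamma=10.0):
    """Cross-entropy over one positive and k sampled negative documents.

    sim_pos:  [batch]     cosine similarity of (query, relevant doc)
    sim_negs: [batch, k]  cosine similarities of (query, sampled negatives)
    gamma:    smoothing factor applied before the softmax (assumed value)
    """
    logits = gamma * torch.cat([sim_pos.unsqueeze(1), sim_negs], dim=1)  # [batch, 1+k]
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits, labels)

print(dssm_loss(torch.rand(8), torch.rand(8, 4)))
```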
Shift-invariant neural
operations
Detecting a pattern in one part of the input space is similar to
detecting it in another
Leverage redundancy by moving a window over the whole
input space and then aggregate
On each instance of the window a kernel—also known as a
filter or a cell—is applied
Different aggregation strategies lead to different architectures
Convolution
Move the window over the input space each time applying
the same cell over the window
A typical cell operation can be,
h = σ(WX + b)
Full Input [words x in_channels]
Cell Input [window x in_channels]
Cell Output [1 x out_channels]
Full Output [1 + (words – window) / stride x out_channels]
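A hedged PyTorch sketch of the windowed cell operation and the resulting shapes (note that Conv1d expects the channel dimension before the word dimension; all sizes are illustrative):

```python
import torch
import torch.nn as nn

words, in_channels, out_channels, window, stride = 20, 100, 300, 3, 1
conv = nn.Conv1d(in_channels, out_channels, kernel_size=window, stride=stride)

x = torch.rand(1, in_channels, words)   # full input: [words x in_channels]
h = torch.tanh(conv(x))                 # cell: h = sigma(W·X + b) applied per window
print(h.shape)                          # [1, out_channels, 1 + (words - window) // stride]
```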
Pooling
Move the window over the input space, each time applying an
aggregate function over each dimension within the window
h_j = max_{i∈win} X_{i,j} (max-pooling) or h_j = avg_{i∈win} X_{i,j} (average-pooling)
Full Input [words x channels]
Cell Input [window x channels]
Cell Output [1 x channels]
Full Output [1 + (words – window) / stride x channels]
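A corresponding sketch of windowed max- and average-pooling, assuming the same [words x channels] layout as above:

```python
import torch
import torch.nn as nn

words, channels, window, stride = 20, 300, 3, 1
x = torch.rand(1, channels, words)           # full input: [words x channels]

max_pool = nn.MaxPool1d(kernel_size=window, stride=stride)
avg_pool = nn.AvgPool1d(kernel_size=window, stride=stride)

# both outputs: [1, channels, 1 + (words - window) // stride]
print(max_pool(x).shape, avg_pool(x).shape)
```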
Convolution w/
Global Pooling
Stacking a global pooling layer on top of a convolutional layer
is a common strategy for generating a fixed length embedding
for a variable length text
Full Input [words x in_channels]
Full Output [1 x out_channels]
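A short sketch of convolution followed by global max-pooling over the word positions, producing a fixed-length embedding regardless of the input length (sizes are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=100, out_channels=300, kernel_size=3)

for words in (10, 50):                  # two variable-length texts
    x = torch.rand(1, 100, words)
    h = torch.tanh(conv(x))             # [1, 300, words - 2]
    emb = h.max(dim=2).values           # global max-pool over positions -> [1, 300]
    print(words, emb.shape)             # fixed-length embedding either way
```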
Recurrence
Similar to a convolution layer but additional dependency on
previous hidden state
A simple cell operation is shown below, but others like LSTM and
GRU cells are more popular in practice:
h_i = σ(WX_i + Uh_{i−1} + b)
Full Input [words x in_channels]
Cell Input [window x in_channels] + [1 x out_channels]
Cell Output [1 x out_channels]
Full Output [1 x out_channels]
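A hedged sketch of the simple recurrent cell above, plus the GRU that is more common in practice (dimensions are illustrative):

```python
import torch
import torch.nn as nn

words, in_channels, out_channels = 20, 100, 300
x = torch.rand(1, words, in_channels)             # [batch, words, in_channels]

# Simple recurrence: h_i = sigma(W·X_i + U·h_{i-1} + b)
rnn = nn.RNN(in_channels, out_channels, nonlinearity='tanh', batch_first=True)
outputs, h_last = rnn(x)

# Gated cells such as GRU (or LSTM) are usually preferred in practice
gru = nn.GRU(in_channels, out_channels, batch_first=True)
_, h_last = gru(x)
print(h_last.shape)                               # final state: [1 x out_channels] per sequence
```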
Convolutional
DSSM (CDSSM)
Replace the bag-of-words assumption by concatenating
term vectors in sequence at the input
Convolution followed by global max-pooling
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Gregoire Mesnil. A latent semantic model with convolutional-pooling structure for information retrieval. In CIKM, 2014.
Interaction-based networks
Typically a document is relevant if some part of the
document contains information relevant to the query
Interaction matrix 𝑋—where 𝑥𝑖𝑗 is obtained by
comparing the ith window over query terms with the jth
window over the document terms—captures evidence of
relevance from different parts of the document
Additional neural network layers can inspect the
interaction matrix and aggregate the evidence to
estimate overall relevance
Zhengdong Lu and Hang Li. A deep architecture for matching short texts. In NIPS, 2013.
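A hedged sketch of building such an interaction matrix from (assumed) pre-computed term embeddings, here comparing single terms rather than windows and using cosine similarity as the comparison function:

```python
import torch
import torch.nn.functional as F

def interaction_matrix(query_emb, doc_emb):
    """query_emb: [q_terms, dim], doc_emb: [d_terms, dim].
    Returns X where x_ij is the cosine similarity of query term i and doc term j."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    return q @ d.t()                               # [q_terms, d_terms]

X = interaction_matrix(torch.rand(3, 128), torch.rand(50, 128))
print(X.shape)   # downstream layers (e.g., convolutions) aggregate this evidence
```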
Kernel pooling
Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. End-to-end neural ad-hoc ranking with kernel pooling. In SIGIR, 2017.
Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. Convolutional neural networks for soft-matching n-grams in ad-hoc search. In WSDM, 2018.
Lexical and semantic
matching networks
Mitra et al. [2016] argue that both lexical and
semantic matching are important for
document ranking
The Duet model is a linear combination of two
DNNs—focusing on lexical and semantic
matching, respectively—jointly trained on
labelled data
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. Learning to match using local and distributed representations of text for web search. In WWW, 2017.
Lexical and semantic
matching networks
Lexical sub-model operates over input matrix 𝑋
x_{i,j} = 1 if t_{q,i} = t_{d,j}, and 0 otherwise
In relevant documents,
1. Many matches, typically in clusters
2. Matches localized early in document
3. Matches for all query terms
4. In-order (phrasal) matches
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. Learning to match using local and distributed representations of text for web search. In WWW, 2017.
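A small sketch of the binary exact-match matrix the lexical sub-model consumes (the term lists are toy examples):

```python
import torch

def exact_match_matrix(query_terms, doc_terms):
    """x_ij = 1 if the i-th query term equals the j-th document term, else 0."""
    X = torch.zeros(len(query_terms), len(doc_terms))
    for i, qt in enumerate(query_terms):
        for j, dt in enumerate(doc_terms):
            if qt == dt:
                X[i, j] = 1.0
    return X

X = exact_match_matrix(["uk", "prime", "minister"],
                       ["the", "prime", "minister", "of", "the", "uk"])
print(X)   # clusters, position, coverage, and order of matches are visible to the model
```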
Many other neural architectures
(Palangi et al., 2015)
(Kalchbrenner et al., 2014)
(Denil et al., 2014)
(Kim, 2014)
(Severyn and Moschitti, 2015)
(Zhao et al., 2015) (Hu et al., 2014)
(Tai et al., 2015)
(Guo et al., 2016)
(Hui et al., 2017)
(Pang et al., 2017)
(Jaech et al., 2017)
(Dehghani et al., 2017)
Impact across both academia and industry
BERT for Ranking
Attention
Given a set of n items and an input context, produce a
probability distribution {a1, …, ai, …, an} of attending to each item
as a function of similarity between a learned representation (q)
of the context and learned representations (ki) of the items
a_i = φ(q, k_i) / Σ_{j=1}^{n} φ(q, k_j)
The aggregated output is given by Σ_{i=1}^{n} a_i · v_i
Full Input [words x in_channels], [1 x ctx_channels]
Full Output [1 x out_channels]
* When attending over a sequence (and not a set), the key k and value
v are typically a function of the item and some encoding of the position
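A hedged sketch of this attention step, using a scaled dot product as the similarity function φ (one common choice; dimensions are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def attend(q, K, V):
    """q: [dim] context representation; K: [n, dim] keys; V: [n, dim_v] values.
    Returns the attention-weighted sum of the values."""
    scores = K @ q / math.sqrt(q.size(-1))   # phi(q, k_i) as a scaled dot product
    a = F.softmax(scores, dim=0)             # normalize into attention weights a_1..a_n
    return a @ V                             # sum_i a_i * v_i  -> [dim_v]

out = attend(torch.rand(64), torch.rand(10, 64), torch.rand(10, 64))
print(out.shape)
```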
Self attention
Given a sequence (or set) of n items, treat each item as the
context at a time and attend over the whole sequence (or set),
and repeat for all n items
Full Input [words x in_channels]
Full Output [words x out_channels]
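The same operation with every item in the sequence acting as the context in turn; a sketch using PyTorch's built-in multi-head attention module with query = key = value (sizes are illustrative):

```python
import torch
import torch.nn as nn

words, channels = 10, 64
x = torch.rand(1, words, channels)          # [batch, words, in_channels]

self_attn = nn.MultiheadAttention(embed_dim=channels, num_heads=4, batch_first=True)
out, weights = self_attn(x, x, x)           # query = key = value = x
print(out.shape)                            # [1, words, out_channels]
```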
Transformers
A transformer layer consists of a combination of a self-attention
layer and multiple fully-connected or
convolutional layers, with residual connections
A transformer-based encoder can consist of multiple
transformers stacked in sequence
Full Input [words x in_channels]
Full Output [words x out_channels]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
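A sketch of stacking such layers with PyTorch's built-in transformer encoder (the hyper-parameters here are illustrative, not the original configuration):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, dim_feedforward=256, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=3)   # multiple transformers stacked in sequence

x = torch.rand(1, 10, 64)                              # [batch, words, in_channels]
print(encoder(x).shape)                                # [batch, words, out_channels]
```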
Language modeling
A family of language modeling tasks has been
explored in the literature, including:
• Predict next word in a sequence
• Predict masked word in a sequence
• Predict next sentence
Fundamentally the same idea as word2vec and older
neural LMs—but with deeper models and considering
dependencies across longer distances between terms
[Figure: the model receives w1, w2, [MASK], w4 and the loss is computed on predicting the masked word w3]
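For a concrete feel of the masked-word task, here is a short sketch using the Hugging Face transformers library's fill-mask pipeline; the checkpoint name is an example and the call downloads a pretrained model.

```python
# Requires the `transformers` package; downloads a pretrained BERT checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital city of New Mexico is [MASK]."):
    print(prediction["token_str"], prediction["score"])
```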
Contextualized deep word embeddings
http://jalammar.github.io/illustrated-bert/
Jacob Devlin, Ming-Wei Chang, et al. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 2018.
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In NAACL-HLT, 2018.
BERT
Stacked transformer layers
Pretrained on two tasks:
• Masked language
modeling
• Next sentence prediction
Input: WordPiece embedding
+ position embedding +
segment embedding
Jacob Devlin, Ming-Wei Chang, et al. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 2018.
BERT for Ranking
BERT and other large-scale unsupervised language models are
demonstrating dramatic performance improvements on many IR tasks
Rodrigo Nogueira, and Kyunghyun Cho. Passage Re-ranking with BERT. In arXiv, 2019.
[Figure: a query-passage pair from MS MARCO is fed into BERT, which outputs a relevance score]
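A hedged sketch of this cross-encoder set-up: the query and passage are concatenated into a single input and a classification head on the pooled representation produces the relevance score. The checkpoint name and label index are assumptions; in practice a checkpoint fine-tuned on MS MARCO would be used, as the untrained head below only illustrates the input/output shape.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

query = "what is the population of albuquerque"
passage = "Albuquerque is the most populous city in the U.S. state of New Mexico."

inputs = tokenizer(query, passage, return_tensors="pt", truncation=True)  # [CLS] q [SEP] p [SEP]
with torch.no_grad():
    logits = model(**inputs).logits
score = torch.softmax(logits, dim=-1)[0, 1]   # probability of the "relevant" class (assumed label)
print(float(score))
```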
Retrieving, not just reranking, with deep
neural networks
Deep ranking models are compute-intensive and are practically employed only to rerank the top-k candidates retrieved by more efficient traditional IR methods
IR performance may improve significantly if we can also use deep models for candidate generation
Option 1: Query independent document
representation
Employ a Siamese network architecture
Compute document representations offline
and query representation at inference time
Efficient online but large offline
computation cost
Effectiveness degrades without interaction
features and lexical term matching
Fast approx. k-NN search with ANNOY
https://github.com/spotify/annoy
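A short sketch of indexing precomputed document embeddings with ANNOY and retrieving approximate nearest neighbours for a query embedding; the dimensionality, tree count, and random vectors are illustrative stand-ins.

```python
import numpy as np
from annoy import AnnoyIndex

dim = 128
index = AnnoyIndex(dim, "angular")            # angular distance, related to cosine similarity

doc_embeddings = np.random.rand(1000, dim)    # stand-in for offline document embeddings
for doc_id, vec in enumerate(doc_embeddings):
    index.add_item(doc_id, vec)
index.build(10)                               # offline: build the ANN index (10 trees)

query_embedding = np.random.rand(dim)         # online: encode the query, then search
top_docs = index.get_nns_by_vector(query_embedding, 10)
print(top_docs)
```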
Efficient online but large offline
computation cost
Can scale to tail queries but at
higher computation cost—we
can trade off the two
experimentally
Option 2: Assume query term independence
Bhaskar Mitra et al. Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks. In arXiv, 2019.
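Under the query term independence assumption, a document's score decomposes into a sum of per-term scores, which can be precomputed offline and stored in an inverted index. A minimal sketch (names and toy scores are assumptions):

```python
def score_document(query_terms, doc_id, term_doc_score):
    """Query term independence: score(q, d) = sum over query terms t of score(t, d).
    term_doc_score maps (term, doc_id) -> a precomputed, model-estimated score."""
    return sum(term_doc_score.get((t, doc_id), 0.0) for t in query_terms)

# Toy precomputed scores (in practice produced offline by the neural model)
term_doc_score = {("uk", "d1"): 0.7, ("prime", "d1"): 0.4, ("minister", "d1"): 0.5,
                  ("uk", "d2"): 0.1}
print(score_document(["uk", "prime", "minister"], "d1", term_doc_score))  # 1.6
print(score_document(["uk", "prime", "minister"], "d2", term_doc_score))  # 0.1
```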
What did your model
really learn?
While we celebrate the recent performance bumps on
IR tasks from neural methods, it is also important to
recognize when and how they fail
Clever Hans was a horse claimed to have been
capable of performing arithmetic and other
intellectual tasks.
"If the eighth day of the month comes on a
Tuesday, what is the date of the following Friday?"
Hans would answer by tapping his hoof.
In fact, the horse was purported to have been
responding directly to involuntary cues in the
body language of the human trainer, who had the
faculties to solve each problem. The trainer was
entirely unaware that he was providing such cues.
(source: Wikipedia)
What corpus statistics does your model depend on?
BM25 depends on the inverse document frequency of terms
BERT depends on a language model of term co-occurrences
What changed
between train and
test?
Terms often change meaning
across domains or over time
Robust retrieval performance is
important (e.g., enterprise search
across multiple tenants)
Query: uk prime minister
[Figure: the answer differs in older (1990s) TREC data, recent data, and today]
Optimizing for cross domain performance
[Figure: a model trained on domains A, B, and C is tested on a held-out domain X]
Train model on multiple domains
During training, an adversarial
discriminator inspects the hidden
states of the model and tries to
predict the source corpus of the
training sample
[Figure: query and doc each pass through convolution and pooling layers, are combined via a Hadamard product and dense layers into the score y, while an adversarial discriminator (dense) reads the hidden representation z]
The duet model, in addition to optimizing for the
ranking loss, also tries to “fool” the adversarial
discriminator – and in the process learns more
domain independent representations
Daniel Cohen, Bhaskar Mitra, Katja Hofmann, and W. Bruce Croft. Cross domain regularization for neural ranking models using adversarial learning. In SIGIR, 2018.
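A hedged sketch of the combined objective: the ranker minimizes its ranking loss while trying to maximize the discriminator's domain-classification loss on the shared hidden state. Here this is shown with a simple sign flip on the adversarial term; Cohen et al. use a gradient reversal set-up, and the weight lam and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

discriminator = nn.Linear(128, 3)   # predicts which of 3 training domains produced the sample

def combined_loss(ranking_loss, hidden_state, domain_label, lam=0.1):
    """hidden_state: [batch, 128] shared representation z; domain_label: [batch]."""
    domain_logits = discriminator(hidden_state)
    adv_loss = F.cross_entropy(domain_logits, domain_label)
    # The ranker is rewarded for *fooling* the discriminator, hence the minus sign.
    return ranking_loss - lam * adv_loss

loss = combined_loss(torch.tensor(0.5), torch.rand(8, 128), torch.randint(0, 3, (8,)))
print(loss)
```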
Deep Learning
@ TREC
If you are looking for interesting
research topics at the intersection of
machine learning and search, come
participate in the track!
Goal: Large, human-labeled, open IR data
Past: Proprietary data: 200K queries, human-labeled, proprietary (Mitra, Diaz and Craswell. Learning to match using local and distributed representations of text for web search. WWW 2017)
Past: Weak supervision: 1+M queries, weak supervision, open (Dehghani, Zamani, Severyn, Kamps and Croft. Neural ranking models with weak supervision. SIGIR 2017)
Here: Two new datasets: 300+K queries, human-labeled, open
[Figure: chart axes are "More data" and "Better search results"]
TREC 2019 Deep Learning Track
Dataset availability
• Corpus + train + dev data for both tasks
available now from the DL Track site*
• NIST test sets available to participants now
• [Broader availability in Feb 2020]
* https://microsoft.github.io/TREC-2019-Deep-Learning/
Questions?
@UnderdogGeek bmitra@microsoft.com
Speaker notes
  1. Clever Hans was a horse. It was claimed that he could do simple arithmetic. If you asked Hans a question he would respond by tapping his hoof. After a thorough investigation, it was, however, determined that what Clever Hans was really good at was reading very subtle and, in fact, unintentional clues that his trainer was giving him via his body language. Hans didn't know arithmetic at all. But he was very good at spotting body language that CORRELATED highly with the right answer.
  2. A traditional IR model, such as BM25, makes very few assumptions about the target collection. You can argue that the inverse document frequencies (and a couple of the BM25 hyper-parameters) are all that you would learn from your collection, which is why you can throw BM25 at most retrieval tasks (e.g., TREC or Web ranking in Bing) and it would give you pretty reasonable performance in most cases out-of-the-box. On the other hand, take a deep neural model and train it on the Bing Web ranking task and then evaluate it on TREC data and I bet it falls flat on its face.