3. Relevance: the challenge of 'aboutness'

Query: bitcoin regulation

[Scatter plot: x-axis = 'bitcoin' aboutness, y-axis = 'regulation' aboutness; Doc 1, Doc 2, and the Query plotted as points]

Doc 2 is more about the query's terms, so it is ranked 'higher'.
4. One hypothesis: TF*IDF vector similarity (and BM25 ...)

Query: bitcoin regulation

[Scatter plot: x-axis = 'bitcoin' TF*IDF, y-axis = 'regulation' TF*IDF; Doc 1, Doc 2, and the Query plotted as points]

Doc 2 is more 'about' the query because it has a higher concentration of the corpus's 'bitcoin' and 'regulation' terms.
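The TF*IDF weighting behind this picture can be sketched in a few lines. A minimal sketch, assuming the classic tf * log(N/df) form (the function name and toy corpus are illustrative; BM25 and Lucene use smoothed, length-normalized variants):

```python
import math
from collections import Counter

def tf_idf_vector(doc_tokens, query_terms, corpus):
    """TF*IDF weight for each query term in one document.

    tf = raw count of the term in the document;
    idf = log(N / df), where df = number of docs containing the term.
    """
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    vec = []
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        idf = math.log(n_docs / df) if df else 0.0
        vec.append(counts[term] * idf)
    return vec
```

A doc that mentions the query terms more often (relative to how common they are corpus-wide) lands farther along those axes, and so scores higher.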
5. Sadly, 'aboutness' is hard

[Plot: search quality vs. effort. After minutes: simple TF*IDF. After years of relevance / feature engineering work: still short of what's possible.]

Why!?
● Searcher vocab mismatches
● Searchers expect context (personal, emotional, ...)
● Different entities to match to
● Language evolves
● Misspellings, typos, etc.
● TF*IDF doesn't always matter
6. There's Something About Aboutness?
Neural Search Frontier
Hunt for the About October?
The Last Sharknado: It's *About* Time?
7. Embeddings: another vector similarity

An embedding is a vector storing information about the contexts a word appears in.

Docs:
“Have you heard the hype about bitcoin currency?”
“Bitcoin a currency for hackers?”
“Hacking on my Bitcoin server! It's not hyped”

Encode contexts → [0.6,-0.4] bitcoin
8. Easier to see as a graph

[Graph from Desmos: points plotted at [0.5,-0.5] cryptocurrency and [0.6,-0.4] bitcoin]

These two terms occur in similar contexts.
9. Document embeddings

Encode the full text as the context for the document:

“Have you heard of cryptocurrency? It's the cool currency being worked on by hackers! Bitcoin is one prominent example. Some say it's all hype ...”

Encode → [0.5,-0.4] doc1234
10. Is shared 'context' the same as 'aboutness'!?

Embeddings:
[0.5,-0.5] cryptocurrency
[0.6,-0.4] bitcoin
[0.5,-0.4] doc1234
[0.4,-0.2] doc5678

For the query 'bitcoin', closer docs are ranked higher:
1. doc1234
2. doc5678
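The ranking above is just nearest-neighbor search in embedding space. A minimal sketch using cosine similarity and the slide's toy vectors (Euclidean distance gives the same ordering here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

embeddings = {
    "doc1234": [0.5, -0.4],
    "doc5678": [0.4, -0.2],
}
query = [0.6, -0.4]  # the 'bitcoin' embedding

# Rank docs by similarity to the query embedding, highest first
ranked = sorted(embeddings, key=lambda d: cosine(query, embeddings[d]), reverse=True)
```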
11. Sadly, no :(

“I want to cancel my reservation”
“I want to confirm my reservation”

Words that occur in similar contexts can have opposite meanings!!
A doc about canceling is not relevant for 'confirm'.
12. Fundamentally, there's still a mismatch

Crypto expert creates content in nerd-speak: “Fiat currencies have a global conspiracy against dogecoin”
Average citizen searches in lay-speak: “bit dollar?”

Embeddings are derived from the corpus, not our searcher's language.
15. Trying to predict: a word2vec model

Term embedding tables (initialized with random values):
'Bitcoin' embedding: [5.5, 8.1, ..., 0.6]
'Energy' embedding: [6.0, 1.7, ..., 8.4]

Dot product ('similarity'):
5.5*6.0 + 8.1*1.7 + ... + 0.6*8.4 = 102.65

Sigmoid (forces to 0..1, a 'probability'): 0.67
(0 = out of context, 1 = true context)

Term | Other Term | In Same Context? | Model Prediction
bitcoin | energy | 1 | 0.67
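The model on this slide is tiny: a dot product squashed by a sigmoid. A minimal sketch (the function name is mine; note the slide's numbers are illustrative, since a sigmoid saturates quickly and a dot product of 102.65 would map to essentially 1.0):

```python
import math

def predict_same_context(v_term, v_other):
    """Model prediction: sigmoid of the dot product of two term embeddings.

    Values near 1 mean the model believes the terms share contexts;
    values near 0 mean out-of-context.
    """
    dot = sum(a * b for a, b in zip(v_term, v_other))
    return 1.0 / (1.0 + math.exp(-dot))
```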
16. Tweak…

(showing the skipgram / negative sampling method)

Term | Other Term | True Context? | Prediction
bitcoin | energy | 1 | 0.67 ← terms sharing context
bitcoin | dongles | 0 | 0.78 ← random unrelated terms
bitcoin | relevance | 0 | 0.56
bitcoin | cables | 0 | 0.34

Goal: tweak 'bitcoin' to get the prediction closer to 1 for terms sharing context,
and closer to 0 for the random unrelated terms.

To get into the weeds on the loss function & gradient descent for word2vec:
Stanford CS224D Deep Learning for NLP Lecture Notes (Chaubard, Mundra, Socher)
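One "tweak" is a single gradient step on the log-loss of the sigmoid prediction. A simplified sketch (real word2vec keeps separate input/output embedding tables and samples negatives by frequency; `sgns_step` and the learning rate are my own choices):

```python
import math

def sgns_step(vecs, term, other, label, lr=0.1):
    """One skipgram-negative-sampling style update.

    label is 1 for a true context pair, 0 for a sampled negative.
    Returns the prediction *before* the update.
    """
    v, u = vecs[term], vecs[other]
    dot = sum(a * b for a, b in zip(v, u))
    pred = 1.0 / (1.0 + math.exp(-dot))
    grad = pred - label  # derivative of log-loss w.r.t. the dot product
    # Nudge both embeddings along the negative gradient
    vecs[term] = [a - lr * grad * b for a, b in zip(v, u)]
    vecs[other] = [b - lr * grad * a for a, b in zip(v, u)]
    return pred
```

Repeated over the corpus, true pairs drift together and negative pairs drift apart, which is exactly the "tweak toward 1 / tweak toward 0" goal above.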
17. Maximize Me!

Just the model (sigmoid of the dot product, e.g. 'bitcoin' [5.5, 8.1, ..., 0.6] · 'energy' [6.0, 1.7, ..., 8.4]) applied to both kinds of pairs:

F = Σ_(true contexts) log σ(u · v) + Σ_(false contexts) log σ(−u' · v)

Maximize the 'true' contexts; minimize the 'false' contexts (the −u' inside the sigmoid rewards pushing false-context similarity down).
18. Did you just neural network?

This is a tiny neural net: the Word1 and Word2 embeddings feed a single neuron (sigmoid; the dot product is implicit).

F is our fitness function. Backpropagate tweaks to the weights to reduce error: compute ∂F/∂v[0], ∂F/∂v[1], ∂F/∂v[2], ... and tweak each component to maximize fit.

More complex / 'deep' neural nets can propagate the error back to earlier layers to learn their weights too.
19. Doc2Vec: train a paragraph vector with the term vectors

Term embedding matrices (as before) plus a paragraph embedding matrix:
'Bitcoin' embedding: [5.5, 8.1, ..., 0.6]
'Energy' embedding: [6.0, 1.7, ..., 8.4]
Paragraph embedding: [2.0, 0.5, ..., 1.4]

Dot product ('similarity'), now weighted elementwise by the paragraph embedding:
5.5*6.0*2.0 + 8.1*1.7*0.5 + ... + 0.6*8.4*1.4 = 102.65

Sigmoid (forces to 0..1): 0.67 (true context vs. out of context)

Backpropagate into the paragraph embedding as well as the term embeddings.
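The slide's arithmetic suggests an elementwise three-way product. A sketch of that illustration (note this is a simplification: the original doc2vec paper by Le & Mikolov instead averages or concatenates the paragraph vector with the word vectors before scoring):

```python
import math

def doc2vec_score(v_term, v_other, v_para):
    """Paragraph-weighted similarity, as the slide's arithmetic suggests:
    an elementwise three-way product, squashed by a sigmoid.
    """
    dot = sum(a * b * c for a, b, c in zip(v_term, v_other, v_para))
    return 1.0 / (1.0 + math.exp(-dot))
```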
20. The same negative sampling can apply here

Term | Para | True Context? | Prediction
bitcoin | Doc 1234 Para 1 | 1 | 0.67 ← terms sharing context
bitcoin | Doc 5678 Para 5 | 0 | 0.78 ← random unrelated docs
bitcoin | Doc 1234 Para 8 | 0 | 0.56
bitcoin | Doc 1537 Para 1 | 0 | 0.34

Tweak the embeddings to get the prediction closer to 1 for true contexts, and closer to 0 for the random unrelated docs.

(I'm continuing the thread of negative sampling, but doc2vec need not use negative sampling.)
21. Embedding Modeling: your new clay to mold

[0.5,-0.4] doc1234

...For years you've been manipulating this...

Manipulate the construction of embeddings to measure something *domain specific*.
24. Embeddings from judgments?

Query Term | Doc | Relevant? (derived from clicks, etc.) | Relevance Score
bitcoin | Doc 1234 | 1 | 0.67 ← relevant doc for the term
bitcoin | Doc 5678 | 0 | 0.78 ← random unrelated docs
bitcoin | Doc 1234 | 0 | 0.56
bitcoin | Doc 1537 | 0 | 0.34

Tweak embeddings to get the query-doc score closer to 1 for relevant docs, and closer to 0 for the random unrelated docs.
25. Pretrain word2vec with the corpus/sessions?

Query Term | Doc | Corpus Model | True Relevance | Tweaked Score
bitcoin | Doc 1234 | 0 | 1 | 0.01
bitcoin | Doc 5678 | 1 | 0 | 0.99
bitcoin | Doc 1234 | 0 | 0 | 0.56
bitcoin | Doc 1537 | 0 | 0 | 0.34

The corpus model acts as a kind of prior. It takes a lot to overcome the original unified model, and it's a case-by-case basis.

An online / evolutionary approach to converge on improved embeddings?
27. Embeddings go beyond language

User | Item | Purchased? | Recommended?
Tom | Jeans | 1 | 0.67
Tom | Khakis | 0 | 0.78
Tom | iPad | 0 | 0.56
Tom | Dress Shoes | 0 | 0.34

Tweak embeddings to get the user-item score closer to 1 for purchases, and closer to 0 otherwise (use them to find similar items / users).
28. More features beyond just 'context'

Query Term | Doc | Sentiment | Same Context? | Tweaked Score
cancel | Doc 1234 | Angry | 1 | 0.01
confirm | Doc 5678 | Angry | 0 | 0.99
cancel | Doc 1234 | Happy | 0 | 0.56
confirm | Doc 1537 | Happy | 0 | 0.34

See: https://arxiv.org/abs/1805.07966
29. Evolutionary / contextual bandit embeddings?

Query Term | Doc | Corpus Model | True Relevance | Tweaked Score
bitcoin | Doc 1234 | 0 | 1 | 0.01
bitcoin | Doc 5678 | 1 | 0 | 0.99
bitcoin | Doc 1234 | 0 | 0 | 0.56
bitcoin | Doc 1537 | 0 | 0 | 0.34

Pull the embedding in the direction your KPIs seem to say is successful (the corpus model acts as a kind of prior).

An online / evolutionary approach to converge on improved embeddings?
30. Embedding Frontier...
There's a lot we could do:
- Model specificity: high-specificity terms cluster differently than low-specificity terms
- You often data-model/clean A LOT before computing your embeddings (just like you do for your search index)
- See Simon Hughes' "Vectors in Search" talk on encoding vectors and using them in Solr
33. Language Model

Can seeing text and guessing what would "come next" help us on our quest for 'aboutness'?

“eat __?__”
pizza: 0.02 ← highest probability
cat: 0.001
chair: 0.001
nap: 0.001
...

Obvious applications: autosuggest, etc.
34. Markov language model, vocab size V

From the corpus we can build a V×V transition matrix reflecting the frequency of word transitions:

     | pizza  | chair  | nap    | ...
eat  | 0.02   | 0.0002 | 0.0001 | ...
cat  | 0.0001 | 0.0001 | 0.02   | ...
...  | ...    | ...    | ...    | ...

E.g. 0.02 is the probability of 'pizza' following 'eat' (as in “eat pizza”).
35. Transition Matrix Math

     | pizza  | chair  | nap    | ...
eat  | 0.02   | 0.0002 | 0.0001 | ...
cat  | 0.0001 | 0.0001 | 0.02   | ...
...  | ...    | ...    | ...    | ...

Let's say we don't 100% know the 'current' word. Can we still guess the 'next' word?
Multiply the current-word probability (row) vector by the V×V transition matrix:

Pcurr: eat 0.25, cat 0.40, ...

Pnext = Pcurr × T
P('pizza' next) = 0.25*0.02 + 0.40*0.0001 + ...
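That vector-matrix product is easy to sketch directly. A toy V=2 version using the slide's illustrative numbers (the vocabulary and probabilities are made up for illustration):

```python
# Toy transition matrix. Rows: current word ['eat', 'cat'];
# columns: next word ['pizza', 'nap']. T[i][j] = P(word j follows word i).
T = [
    [0.02,   0.0001],  # eat -> pizza, nap
    [0.0001, 0.02],    # cat -> pizza, nap
]

# We're uncertain about the current word (the slide's numbers)
p_curr = [0.25, 0.40]

# Pnext = Pcurr x T: a vector-matrix product
p_next = [
    sum(p_curr[i] * T[i][j] for i in range(len(T)))
    for j in range(len(T[0]))
]
# p_next[0] = P('pizza' next) = 0.25*0.02 + 0.40*0.0001 = 0.00504
```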
36. Language model for embeddings?

'eat' embedding from the term embedding matrix: [5.5, 8.1, ..., 0.6] ('h' dimensions)

Multiply by an h×h transition matrix, with weights learned through backpropagation:

   | 0    | 1    | 2     | ... | h
0  | 0.2  | -0.4 | 0.1   |     |
1  | -0.1 | 0.3  | -0.24 |     |
2  |      |      |       |     |
...
h

= [0.5, 6.1, ..., -4.8], the embedding of the next word
(probably clusters near 'pizza', etc.)
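A minimal sketch of that step, with a toy h=2 embedding and made-up weights (in practice the matrix entries would be learned by backpropagation, not hand-set):

```python
def next_word_embedding(v, W):
    """Predict the next word's embedding: an h-dim vector times an
    h x h 'transition' matrix of learned weights.
    """
    h = len(v)
    return [sum(v[i] * W[i][j] for i in range(h)) for j in range(h)]

W = [
    [0.2, -0.4],
    [-0.1, 0.3],
]
v_eat = [5.5, 8.1]
v_next = next_word_embedding(v_eat, W)
# The term embedding nearest to v_next is then the predicted next word.
```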
41. Not a silver bullet

[RNN diagram: input embeddings fed through W_xh, hidden state carried through W_hh, output produced through W_hy]

“A dog is loose! Please call the animal control” (corpus language)
“dog catcher” (searcher language)

The model won't learn this ideal language model if the corpus never says 'dog catcher'.
44. 'Translate' documents to queries?

Encoder RNN reads the document; Decoder RNN emits predicted queries:

“Animal Control Law. Herefore let it be … And therefore… with much to be blah blah blah … The End” → Predicted Queries

Using judgments/clicks as training data:

Doc | Query | Relevant?
Animal Control Law | Dog catcher | 1
Littering Law | Dog Catcher | 0
45. Can we use graded judgments?

Same Encoder RNN / Decoder RNN, but construct the training data to reflect the weighting:

Doc | Query | Relevance Grade
Animal Control Law | Dog catcher | 4
Animal Leash Law | Dog Catcher | 2
46. Skip-Thought Vectors (a kind of 'sentence2vec')

Encoder RNN reads a sentence: “Mary had a little lamb”
Decoder RNN predicts the sentence before: “Its fleece was white as snow”
Decoder RNN predicts the sentence after: “And everywhere that Mary went, that lamb was sure to go”

The encoder output becomes an embedding for the sentence, encoding semantic meaning.
47. Skip-Thought Vectors for queries? ('doc2queryVec')

Encoder RNN reads the doc: “Animal Control Law. Herefore let it be … And therefore… with much to be blah blah blah … The End”
Decoder RNNs predict relevant queries: q=dog catcher, q=loose dog, q=dog fight, ...

The encoder output becomes an embedding mapping docs into the queries' semantic space.

My Thoughts on Skip Thoughts
49. Anything2vec

Deep learning is especially good at learning representations of:
- Images
- User History
- Songs
- Video
- Text

If everything can be 'tokenized' into some kind of token space for retrieval,
everything can be 'vectorized' into some embedding space for retrieval / ranking.
50. Lucene community needs to get better first-class vector support

Similarity API and vectors?

long computeNorm(FieldInvertState state)
SimWeight computeWeight(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
SimScorer simScorer(SimWeight weight, LeafReaderContext context)
51. Some progress
- Delimited Term Frequency Filter Factory (use term frequency to encode vector values)
- Payloads & Payload Score
Check out Simon Hughes' talk: "Vectors in Search"
52. It’s not magic, it’s math!
Join the Search Relevancy Slack Community
http://o19s.com/slack
(projects, chat, conferences, help, book authors, and more…)
53. Matching: vocab mismatch!

Taxonomical Semantical Magical Search — Doug Turnbull, Lucene/Solr Revolution 2017

Searcher: “Dog catcher” (lay-speak) vs. Doc: "Animal Control Law" (legalese)
→ No match, despite relevant results!

A taxonomist / librarian maps both “Dog Catcher” and “Animal Control” to Concept 1234, but it's costly to manually maintain the mapping between searcher & corpus vocabulary.
54. Ranking optimization is hard!

Query “Dog catcher” → score doc (“Animal Control Law. Herefore let it be … The End”) → Ranked Results

Ranking can be a hard optimization problem:
- Fine-tuning heuristics: TF*IDF, Solr/ES queries, analyzers, etc...
- LTR? LTR is only as good as your features: Solr/ES queries, based on your skill with Solr/ES queries.
55. How can deep learning help?

Can we learn, quickly at scale, the semantic relationships between concepts?
(“Dog catcher” ↔ “Animal control”)

What other forms of every-day enrichment could deep learning accelerate?
57. Can we translate between searcher & corpus?

Search term embedding [5.5, 8.1, ..., 0.6] + Doc embedding [2.0, 0.5, ..., 1.4]
→ Insert Machine Learning Here → Ranking Prediction

(classic LTR could go here)
58. Embeddings

Words with similar contexts share close embeddings:

“Cryptocurrency is not hyped! It's the future”
“Hackers invent their own cryptocurrency”
“Cryptocurrency is a digital currency”
“Hyped cryptocurrency is for hackers!”

Encode contexts →
[0.5,-0.5] cryptocurrency
[0.6,-0.4] bitcoin
Editor's notes:
- This doesn't sufficiently generalize the queries
- We likely don't have as much training data in relevance judgments as in the unlabeled data from the corpus
- This is a kind of point-wise learning to rank
- Alternatively, the relevance-based embeddings are a kind of prior
- We can leverage information about possibly relevant documents (clicks, query logs, etc.) w.r.t. queries while learning embeddings. We can adjust query vectors and mitigate the possible vocabulary mismatch between the doc and query sets