After the LLM Big Bang just over a year ago, the field of (word) embeddings has, paradoxically, gained a lot of momentum. That is remarkable because, as transformers seamlessly process natural language into natural language (or, more generally, anything into anything), what should one gain by looking into their “guts” (i.e., their sub-symbolic “thoughts”)? In this presentation, we go beyond the King-Queen example and nearest-neighbor search to illustrate how, based on convex optimization, it is possible to make high-quality, explainable inferences with sets of embeddings. An application of the technique will be illustrated with an example from the domain of knowledge graphs.
1. Convex optimization and text embeddings
Explainable inferences with embeddings
Ivan Herreros, CAIML, Jan 24
2. Motivational example: Furniture e-commerce
User wants to filter by “outdoor”

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
3. Option 1: 1-to-1 keyword matching
Result is completely “meaning”-agnostic

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
4. Option 2: Direct retrieval including synonyms
Needs an “ontology” marking synonyms (e.g., concepts with alternative labels)
All “inference” occurred at the creation of the ontology

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
5. The 2023/24 way: ask ChatGPT

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
6. Can we obtain ChatGPT’s output without an LLM?
To perform the same inferences that GPT made, we need to:
• Derive similarities and set meaningful thresholds
  – “Weather resistant” implies “outdoor”
  – “Waterproof” is similar, but not enough to imply “outdoor”
• Combine concepts
  – “Waterproof” and “UV resistant” combined imply “outdoor”
7. Embeddings
Representations of concepts (documents, images, graphs, etc.) as vectors of real numbers.
Similar concepts are mapped to nearby positions in a high-dimensional space.
“the study of [X] is becoming a problem of vector space mathematics” — MIT Tech Review, Sept 2015
8. Embeddings in language processing
• Distributional word embeddings (capturing co-occurrence statistics), powerful since the mid-2010s (Mikolov et al., 2013)
• Transformer-based “word-in-context” embeddings (BERT: Devlin et al., 2018)
• Word, sentence and document embeddings, etc.
Today: OpenAI (among others) offers endpoints to encode text as embeddings.
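As a minimal sketch of such an endpoint, the snippet below fetches embeddings via the `openai` Python package; the model name and feature strings are illustrative assumptions, not taken from the talk:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts, model="text-embedding-3-small"):
    """Return one embedding vector (a list of floats) per input string."""
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

# e.g., encode the product features from the motivational example
vectors = embed(["outdoor", "weather resistant", "waterproof", "UV resistant"])
```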
9. Similarities in embedding space
Feature-to-feature similarities to “outdoor”

Feature           | Similarity
outdoor           | 1.0
weather resistant | 0.859
waterproof        | 0.845
UV resistant      | 0.834
wood              | 0.804
scratch resistant | 0.795

Order looks good, but how to interpret the values themselves?
11. Calibrating similarities in embedding space
Feature-to-feature similarities to “outdoor”
Indeed, in percentile terms, weather resistant is 2x closer to outdoor than waterproof

Feature           | Similarity | Percentile
outdoor           | 1.0        | 100
weather resistant | 0.859      | 97
waterproof        | 0.845      | 94
UV resistant      | 0.834      | 91
wood              | 0.804      | 69
scratch resistant | 0.795      | 56
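One possible way to obtain such percentiles, sketched below, is to place each raw cosine similarity on the empirical distribution of similarities between background feature pairs; the helper names and the choice of background set are assumptions, as the talk does not specify the calibration procedure:

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two embedding vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def percentile_of(sim, background_sims):
    # share of background similarities below `sim`, on a 0-100 scale
    return 100.0 * float(np.mean(np.asarray(background_sims) < sim))

# e.g., background_sims = [cosine(a, b) for many random pairs of feature embeddings]
```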
12. To perform the same inferences that GPT made, we need to:
• Derive similarities and set meaningful thresholds
  – “Weather resistant” implies “outdoor”
  – “Waterproof” is similar, but not enough to imply “outdoor”
• Combine concepts
  – “Waterproof” and “UV resistant” combined imply “outdoor”
Can we obtain ChatGPT’s output without an LLM?
18. Recap: What were we trying to solve?
Product Owner
If a user filters content by some feature, use all relevant features to compute the match.
Data Scientist
Abstract problem: given a target “feature”, find out whether a set of “features” implies it.
Idea: instead of either including or excluding each feature, why not “weight” each feature’s contribution?
19. Geometrical understanding of the problem
Three “embeddings” (a, b and c).
Points: (normalized) non-negative combinations of them.
Each “point” corresponds to an embedding that is “implied” by a, b and c.
20. Closest “implied” vector
Three “embeddings” (a, b and c).
Points: (normalized) non-negative combinations of them.
v: “query” embedding.
v_proj: the vector most similar to v that can be generated as a combination of a, b and c.
21. “Explainable” decomposition
Three “embeddings” (a, b and c).
v: “query” embedding.
v_proj: the vector most similar to v that can be generated as a combination of a, b and c.
We can extract how a, b and c contributed to the “best match”.
Finding these projections can be formulated as a convex optimization problem.
22. Convex optimization

minimize   f_0(x)
subject to f_i(x) ≤ 0,  i = 1, …, m
           g_i(x) = 0,  i = 1, …, p

f_0: loss or cost function
f_i, g_i: constraints (define the domain)
x: vector with the variables to optimize
23. Convex optimization

minimize   f_0(x)
subject to f_i(x) ≤ 0,  i = 1, …, m
           Ax = b

f_0, f_i: convex loss and inequality constraint functions
Ax = b: linear equality constraints
Minimizing a convex function over a convex set.
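As a minimal illustration, the standard form above maps directly onto CVXPY’s declarative syntax; the toy data here is assumed purely for the sketch:

```python
import cvxpy as cp
import numpy as np

# toy instance of the standard form: convex loss, inequality and equality constraints
x = cp.Variable(3)
A = np.ones((1, 3))
b = np.array([1.0])
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(x)),   # f_0(x): convex loss
    [x <= 0.8,                        # f_i(x) <= 0 (elementwise)
     A @ x == b],                     # Ax = b (linear equality)
)
problem.solve()
print(x.value)                        # -> approximately [1/3, 1/3, 1/3]
```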
25. Combine embeddings to maximize cosine similarity

minimize −vᵀMx
s.t.     ∥Mx∥₂ ≤ 1
         x ≥ 0

Mx: the “mixed” embedding; the constraint keeps its L2-norm at most 1
x ≥ 0: non-negative mixing coefficients
v: target embedding (column vector)
M: matrix that contains the “source” embeddings (e.g., a, b and c)
x: vector with the mixing coefficients
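A minimal CVXPY sketch of this “cone projection” follows; the function name and the toy data are illustrative, and if v is unit-norm the optimal objective equals the best achievable cosine similarity:

```python
import cvxpy as cp
import numpy as np

def cone_project(v, M):
    """Find the non-negative combination of the columns of M (the "source"
    embeddings) most similar to the target embedding v."""
    x = cp.Variable(M.shape[1], nonneg=True)   # mixing coefficients, x >= 0
    objective = cp.Maximize(v @ (M @ x))       # maximize v^T M x, i.e. minimize -v^T M x
    constraints = [cp.norm(M @ x, 2) <= 1]     # "mixed" embedding has at most unit L2-norm
    cp.Problem(objective, constraints).solve()
    return x.value, objective.value            # weights and best similarity

# toy usage with random unit-norm embeddings
rng = np.random.default_rng(0)
v = rng.normal(size=8); v /= np.linalg.norm(v)   # unit-norm target
M = rng.normal(size=(8, 3))
M /= np.linalg.norm(M, axis=0)                   # unit-norm source columns
weights, sim = cone_project(v, M)
```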
27. Option 3: Matching through CVX projection

Product            | Features                            | Weights             | Sim
Big dining table   | wood, outdoor                       | [0. 1.]             | 1.0
Foldable chair     | wood, UV-resistant, waterproof      | [0.332 0.346 0.394] | 0.878
Bench              | wood, weather resistant             | [0.39 0.667]        | 0.877
Small dining table | wood, waterproof, scratch resistant | [0.346 0.505 0.224] | 0.872

The ranking is correct, and so is the ordering of the “relevance” weights.
However, fine-tuning is needed (to compensate for “rest-similarity”, the length of the feature list, etc.).
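A table like this could be reproduced by projecting the query embedding onto each product’s feature embeddings, reusing the illustrative embed() and cone_project() helpers sketched earlier (both are assumptions, not the talk’s actual code):

```python
import numpy as np

query = np.array(embed(["outdoor"])[0])
products = {
    "Big dining table": ["wood", "outdoor"],
    "Foldable chair": ["wood", "UV-resistant", "waterproof"],
    "Bench": ["wood", "weather resistant"],
    "Small dining table": ["wood", "waterproof", "scratch resistant"],
}
for name, features in products.items():
    M = np.array(embed(features)).T        # columns hold the feature embeddings
    weights, sim = cone_project(query, M)  # per-feature weights and match score
    print(name, np.round(weights, 3), round(sim, 3))
```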
28. Nice theory, but how could we bring it to production?
Data Scientist
Compute the match between a feature and a set of features with a “cone projection”.
Back-end Engineer / Data Engineer / Data Scientist
Inference at “write time”: compute which “features” imply other features (for each profile or at the “ontology” level).
Inference at “query time”: too costly, but you can pre-filter items with a kNN in the aggregate embedding space.
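A minimal sketch of that query-time pre-filter is below; pooling each product’s feature embeddings into one aggregate row vector (e.g., by taking their mean) is an assumption, as the talk does not specify the aggregation:

```python
import numpy as np

def knn_prefilter(query_vec, product_matrix, k=50):
    """Indices of the k products whose aggregate embedding is most similar to
    the query; only these candidates get the costlier cone projection."""
    sims = product_matrix @ query_vec / (
        np.linalg.norm(product_matrix, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(-sims)[:k]

# product_matrix: one row per product, e.g., the mean of its feature embeddings
```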
29. Summary
• It is possible to exploit compositionality in embeddings to support inferences beyond 1-to-1 similarity / nearest neighbours.
• Convex optimization / CVXPY: a declarative framework that allows one to define and efficiently solve “convex” problems.
• Many domains of application for the CVX+embeddings framework:
  – HR domain: matching applicant profiles against job requirements.
  – RAG (idea): apply this kind of “search” in the R-step of RAG architectures.