After the LLM Big Bang just over a year ago, the field of (word) embeddings has, paradoxically, gained a lot of momentum. That is remarkable because, as transformers seamlessly process natural language into natural language (or, more generally, anything into anything), what should one gain by looking into their “guts” (i.e., their sub-symbolic “thoughts”)? In this presentation, we go beyond the King-Queen example and nearest-neighbor search to illustrate how, based on convex optimization, it is possible to make high-quality, explainable inferences with sets of embeddings. An application of the technique will be illustrated with an example from the domain of knowledge graphs.
1. Convex optimization and text embeddings
Explainable inferences with embeddings
Ivan Herreros, CAIML, Jan 24
2. Motivational example: Furniture e-commerce
User wants to filter by “outdoor”

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
3. Option 1: 1-to-1 keyword matching
Result is completely “meaning”-agnostic

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
4. Option 2: Direct retrieval including synonyms
Needs an “ontology” marking synonyms (e.g., concepts with alternative labels)
All “inference” occurred at the creation of the ontology

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
5. The 2023/24 way: ask ChatGPT

Product            | Features
Big dining table   | wood, outdoor
Foldable chair     | wood, UV-resistant, waterproof
Bench              | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
6. Can we obtain ChatGPT’s output without an LLM?
To perform the same inferences that GPT made, we need to:
• Derive similarities and set meaningful thresholds
  – “Weather resistant” implies “outdoor”
  – “Waterproof” is similar, but not enough to imply “outdoor”
• Combine concepts
  – “Waterproof” and “UV resistant” combined imply “outdoor”
7. Embeddings
Representations of concepts (documents, images, graphs, etc.) as vectors of real numbers.
Similar concepts are mapped to nearby positions in a high-dimensional space.
“the study of [X] is becoming a problem of vector space mathematics” — MIT Tech Review, Sept 2015
8. Embeddings in language processing
• Distributional word embeddings (capturing co-occurrence statistics), powerful since the mid-2010s (Mikolov et al., 2013)
• Transformer-based “word-in-context” embeddings (BERT: Devlin et al., 2018)
• Word, sentence and document embeddings, etc.
Today: OpenAI (among others) offers endpoints to encode text as embeddings.
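As a minimal sketch of such an endpoint, the snippet below fetches embeddings via the `openai` Python package; the model name and feature strings are illustrative assumptions, not taken from the talk:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts, model="text-embedding-3-small"):
    """Return one embedding vector (a list of floats) per input string."""
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

# e.g., encode the product features from the motivational example
vectors = embed(["outdoor", "weather resistant", "waterproof", "UV resistant"])
```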
9. Similarities in embedding space
Feature-to-feature similarities to “outdoor”

Feature           | Similarity
outdoor           | 1.0
weather resistant | 0.859
waterproof        | 0.845
UV resistant      | 0.834
wood              | 0.804
scratch resistant | 0.795

Order looks good, but how to interpret the values themselves?
11. Calibrating similarities in embedding space
Feature-to-feature similarities to “outdoor”
Indeed, in percentile terms, weather resistant is 2x closer to outdoor than waterproof

Feature           | Similarity | Percentile
outdoor           | 1.0        | 100
weather resistant | 0.859      | 97
waterproof        | 0.845      | 94
UV resistant      | 0.834      | 91
wood              | 0.804      | 69
scratch resistant | 0.795      | 56
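One possible way to obtain such percentiles, sketched below, is to place each raw cosine similarity on the empirical distribution of similarities between background feature pairs; the helper names and the choice of background set are assumptions, as the talk does not specify the calibration procedure:

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two embedding vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def percentile_of(sim, background_sims):
    # share of background similarities below `sim`, on a 0-100 scale
    return 100.0 * float(np.mean(np.asarray(background_sims) < sim))

# e.g., background_sims = [cosine(a, b) for many random pairs of feature embeddings]
```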
12. To perform the same inferences that GPT made, we need to:
• Derive similarities and set meaningful thresholds
  – “Weather resistant” implies “outdoor”
  – “Waterproof” is similar, but not enough to imply “outdoor”
• Combine concepts
  – “Waterproof” and “UV resistant” combined imply “outdoor”
Can we obtain ChatGPT’s output without an LLM?
18. Recap: What were we trying to solve?
Product Owner
If a user filters content by some feature, use all relevant features to compute the match.
Data Scientist
Abstract problem: given a target “feature”, find out whether a set of “features” implies it.
Idea: instead of either including or excluding each feature, why not “weight” each feature’s contribution?
19. Geometrical understanding of the problem
Three “embeddings” (a, b and c).
Points: (normalized) non-negative combinations of them.
Each “point” corresponds to an embedding that is “implied” by a, b and c.
20. Closest “implied” vector
Three “embeddings” (a, b and c).
Points: (normalized) non-negative combinations of them.
v: “query” embedding.
v_proj: the vector most similar to v that can be generated as a combination of a, b and c.
21. “Explainable” decomposition
Three “embeddings” (a, b and c).
v: “query” embedding.
v_proj: the vector most similar to v that can be generated as a combination of a, b and c.
We can extract how a, b and c contributed to the “best match”.
Finding these projections can be formulated as a convex optimization problem.
22. Convex optimization

minimize   f_0(x)
subject to f_i(x) ≤ 0,  i = 1, …, m
           g_i(x) = 0,  i = 1, …, p

f_0: loss or cost function
f_i, g_i: constraints (define the domain)
x: vector with the variables to optimize
23. Convex optimization

minimize   f_0(x)
subject to f_i(x) ≤ 0,  i = 1, …, m
           Ax = b

f_0, f_i: convex loss and inequality constraint functions
Ax = b: linear equality constraints
Minimizing a convex function over a convex set.
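As a minimal illustration, the standard form above maps directly onto CVXPY’s declarative syntax; the toy data here is assumed purely for the sketch:

```python
import cvxpy as cp
import numpy as np

# toy instance of the standard form: convex loss, inequality and equality constraints
x = cp.Variable(3)
A = np.ones((1, 3))
b = np.array([1.0])
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(x)),   # f_0(x): convex loss
    [x <= 0.8,                        # f_i(x) <= 0 (elementwise)
     A @ x == b],                     # Ax = b (linear equality)
)
problem.solve()
print(x.value)                        # -> approximately [1/3, 1/3, 1/3]
```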
25. Combine embeddings to maximize cosine similarity

minimize −vᵀMx
s.t.     ∥Mx∥₂ ≤ 1
         x ≥ 0

Mx: the “mixed” embedding; the constraint keeps its L2-norm at most 1
x ≥ 0: non-negative mixing coefficients
v: target embedding (column vector)
M: matrix that contains the “source” embeddings (e.g., a, b and c)
x: vector with the mixing coefficients
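A minimal CVXPY sketch of this “cone projection” follows; the function name and the toy data are illustrative, and if v is unit-norm the optimal objective equals the best achievable cosine similarity:

```python
import cvxpy as cp
import numpy as np

def cone_project(v, M):
    """Find the non-negative combination of the columns of M (the "source"
    embeddings) most similar to the target embedding v."""
    x = cp.Variable(M.shape[1], nonneg=True)   # mixing coefficients, x >= 0
    objective = cp.Maximize(v @ (M @ x))       # maximize v^T M x, i.e. minimize -v^T M x
    constraints = [cp.norm(M @ x, 2) <= 1]     # "mixed" embedding has at most unit L2-norm
    cp.Problem(objective, constraints).solve()
    return x.value, objective.value            # weights and best similarity

# toy usage with random unit-norm embeddings
rng = np.random.default_rng(0)
v = rng.normal(size=8); v /= np.linalg.norm(v)   # unit-norm target
M = rng.normal(size=(8, 3))
M /= np.linalg.norm(M, axis=0)                   # unit-norm source columns
weights, sim = cone_project(v, M)
```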
27. Option 3: Matching through CVX projection

Product            | Features                            | Weights             | Sim
Big dining table   | wood, outdoor                       | [0. 1.]             | 1.0
Foldable chair     | wood, UV-resistant, waterproof      | [0.332 0.346 0.394] | 0.878
Bench              | wood, weather resistant             | [0.39 0.667]        | 0.877
Small dining table | wood, waterproof, scratch resistant | [0.346 0.505 0.224] | 0.872

The ranking is correct, and so is the ordering of the “relevance” weights.
However, fine-tuning is needed (to compensate for “rest-similarity”, the length of the feature list, etc.).
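A table like this could be reproduced by projecting the query embedding onto each product’s feature embeddings, reusing the illustrative embed() and cone_project() helpers sketched earlier (both are assumptions, not the talk’s actual code):

```python
import numpy as np

query = np.array(embed(["outdoor"])[0])
products = {
    "Big dining table": ["wood", "outdoor"],
    "Foldable chair": ["wood", "UV-resistant", "waterproof"],
    "Bench": ["wood", "weather resistant"],
    "Small dining table": ["wood", "waterproof", "scratch resistant"],
}
for name, features in products.items():
    M = np.array(embed(features)).T        # columns hold the feature embeddings
    weights, sim = cone_project(query, M)  # per-feature weights and match score
    print(name, np.round(weights, 3), round(sim, 3))
```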
28. Nice theory, but how could we bring it to production?
Data Scientist
Compute the match between a feature and a set of features with a “cone projection”.
Back-end Engineer / Data Engineer / Data Scientist
Inference at “write time”: compute which “features” imply other features (for each profile or at the “ontology” level).
Inference at “query time”: too costly, but you can pre-filter items with a kNN in the aggregate embedding space.
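A minimal sketch of that query-time pre-filter is below; pooling each product’s feature embeddings into one aggregate row vector (e.g., by taking their mean) is an assumption, as the talk does not specify the aggregation:

```python
import numpy as np

def knn_prefilter(query_vec, product_matrix, k=50):
    """Indices of the k products whose aggregate embedding is most similar to
    the query; only these candidates get the costlier cone projection."""
    sims = product_matrix @ query_vec / (
        np.linalg.norm(product_matrix, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(-sims)[:k]

# product_matrix: one row per product, e.g., the mean of its feature embeddings
```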
29. Summary
• It is possible to exploit compositionality in embeddings to support inferences beyond 1-to-1 similarity / nearest neighbours.
• Convex optimization / CVXPY: a declarative framework that allows one to define and efficiently solve “convex” problems.
• Many domains of application for the CVX+embeddings framework:
  – HR domain: matching applicant profiles against job requirements.
  – RAG (idea): apply this kind of “search” in the R-step of RAG architectures.