Ivan Herreros, CAIML, Jan 24
Convex optimization and
text-embeddings
Explainable inferences with embeddings
Motivational example: Furniture e-commerce
User wants to filter by “outdoor”
Product | Features
Big dining table | wood, outdoor
Foldable chair | wood, UV-resistant, waterproof
Bench | wood, weather resistant
Small dining table | wood, waterproof, scratch resistant
Option 1: 1-to-1 keyword matching
Result is completely “meaning”-agnostic
Option 2: Direct retrieval including synonyms
Needs an “ontology” that marks synonyms (e.g. concepts with alternative labels)
All “inference” happens when the ontology is created
The 2023/24 way: ask ChatGPT
Can we obtain ChatGPT’s output without an LLM?
To perform the same inferences that GPT made, we need to:
Derive similarities and set meaningful thresholds
“Weather resistant” implies “outdoor”
“Waterproof” is similar, but not enough to imply “outdoor”
Combine concepts
“Waterproof” and “UV resistant” combined imply “outdoor”
Embeddings
Representations of concepts (documents, images, graphs, etc.) as vectors of real numbers
Similar concepts are mapped to nearby positions in a high-dimensional space.
“the study of [X] is becoming a problem of vector space mathematics” (MIT Tech Review, Sept. 2015)
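A minimal sketch of how “nearby” is usually measured: the cosine similarity between two embedding vectors. The 3-dimensional vectors below are made-up placeholders, not real text embeddings (which typically have hundreds or thousands of dimensions).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings (illustrative values only).
outdoor = np.array([0.9, 0.1, 0.2])
weather_resistant = np.array([0.8, 0.2, 0.3])
scratch_resistant = np.array([0.2, 0.9, 0.1])

print(round(cosine_similarity(outdoor, outdoor), 3))  # 1.0
print(cosine_similarity(outdoor, weather_resistant) >
      cosine_similarity(outdoor, scratch_resistant))   # True
```

Cosine similarity ignores vector length, which is why it is the usual choice when only the direction of an embedding carries meaning.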
Embeddings in language processing
Distributional word embeddings (capturing co-occurrence statistics), powerful since the mid-2010s (Mikolov et al., 2013)
Transformer-based “word-in-context” embeddings (BERT: Devlin et al., 2018)
Word, sentence and document embeddings, etc.
Today: OpenAI (among others) offers endpoints to encode text as embeddings.
Similarities in embedding space
Feature-to-feature similarities to “outdoor”
Feature | Similarity
outdoor | 1.0
weather resistant | 0.859
waterproof | 0.845
UV resistant | 0.834
wood | 0.804
scratch resistant | 0.795
The order looks right, but how should the values themselves be interpreted?
Calibrating similarities in embedding space
Feature-to-feature similarities to “outdoor”
Indeed, in percentile terms “weather resistant” is 2x closer to “outdoor” than “waterproof” is (97th vs. 94th percentile, i.e. top 3% vs. top 6%).
Feature | Similarity | Percentile
outdoor | 1.0 | 100
weather resistant | 0.859 | 97
waterproof | 0.845 | 94
UV resistant | 0.834 | 91
wood | 0.804 | 69
scratch resistant | 0.795 | 56
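The slides do not say how the percentiles are computed. One plausible reading, sketched here purely as an assumption, is to rank each similarity against a background distribution of similarities between the target term and many other terms. The background scores below are randomly generated placeholders; a real setup would embed some reference vocabulary instead.

```python
import numpy as np

def similarity_percentile(score: float, background: np.ndarray) -> float:
    """Percentage (0-100) of background similarities that are <= score."""
    return 100.0 * float(np.mean(background <= score))

# Hypothetical background: similarities between "outdoor" and many other
# terms. These are synthetic numbers, not measured embedding similarities.
rng = np.random.default_rng(0)
background = rng.normal(loc=0.78, scale=0.03, size=10_000)

for feature, sim in [("weather resistant", 0.859), ("waterproof", 0.845),
                     ("wood", 0.804), ("scratch resistant", 0.795)]:
    pct = similarity_percentile(sim, background)
    print(f"{feature:18s} sim={sim:.3f} percentile={pct:.0f}")
```

A percentile turns a raw cosine score into a statement relative to the rest of the vocabulary, which is easier to threshold than the raw values that all sit between roughly 0.8 and 0.9.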
Can we obtain ChatGPT’s output without an LLM?
To perform the same inferences that GPT made, we need to:
Derive similarities and set meaningful thresholds
“Weather resistant” implies “outdoor”
“Waterproof” is similar, but not enough to imply “outdoor”
Combine concepts
“Waterproof” and “UV resistant” combined imply “outdoor”
The “surprising” properties of embeddings
King − Man + Woman ≈ Queen
[Figure: embedding vectors for Man, Woman, King and Queen]
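The analogy can be reproduced with plain vector arithmetic followed by a nearest-neighbour lookup. The 3-dimensional vectors below are hypothetical, hand-made so that one axis loosely encodes “royalty” and another “gender”; real word vectors are learned from data. Excluding the three input words from the candidates is a common convention in analogy evaluations.

```python
import numpy as np

# Hypothetical toy "embeddings" (axes loosely: royalty, gender, furniture).
vocab = {
    "man":   np.array([0.1, 0.9, 0.0]),
    "woman": np.array([0.1, -0.9, 0.0]),
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "table": np.array([0.0, 0.0, 1.0]),
}

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a: str, b: str, c: str) -> str:
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    query = vocab[a] - vocab[b] + vocab[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(query, vocab[w]))

print(analogy("king", "man", "woman"))  # queen
```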
Compositionality of meaning with embeddings
If King − Man + Woman ≈ Queen, then (maybe): UV resistant + waterproof ≈ outdoor
Can we combine “features”?
Feature(s) | Similarity | Percentile
outdoor | 1.0 | 100
UV resistant + waterproof | 0.871 | 98
weather resistant | 0.859 | 97
waterproof | 0.845 | 94
UV resistant | 0.834 | 91
wood | 0.804 | 69
scratch resistant | 0.795 | 56
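The slides do not state exactly how “UV resistant + waterproof” is formed. A simple version, assumed here, is to sum the unit-normalized feature embeddings and renormalize the result before comparing it with the target. With the hypothetical vectors below, the combination ends up closer to “outdoor” than either feature alone, mirroring the pattern in the table.

```python
import numpy as np

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(unit(a) @ unit(b))

def combine(*features: np.ndarray) -> np.ndarray:
    """Sum of unit-normalized feature embeddings, renormalized to length 1."""
    return unit(sum(unit(f) for f in features))

# Hypothetical toy embeddings (not real model outputs).
outdoor = np.array([1.0, 1.0, 0.0])
uv_resistant = np.array([1.0, 0.0, 0.5])
waterproof = np.array([0.0, 1.0, 0.5])

print(round(cos(outdoor, uv_resistant), 3), round(cos(outdoor, waterproof), 3))
print(round(cos(outdoor, combine(uv_resistant, waterproof)), 3))
```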
Can we obtain ChatGPT’s output without an LLM?
To perform the same inferences that GPT made, we need to:
Derive similarities and set meaningful thresholds
“Weather resistant” implies “outdoor”
“Waterproof” is similar, but not enough to imply “outdoor”
Combine concepts
“Waterproof” and “UV resistant” combined imply “outdoor”
Recap: What were we trying to solve?
Product Owner
If a user filters content by some feature, use all relevant features to compute the match.
Data Scientist
Abstract problem: given a target “feature”, find out whether a set of “features” implies it.
Idea: instead of either including or excluding each feature, why not “weight” each feature’s contribution?
Geometrical understanding of the problem
Three “embeddings” (a, b and c)
Points: (normalized) non-negative combinations of them.
Each “point” corresponds to an embedding that is “implied” by a, b and c.
Closest “implied” vector
Three “embeddings” (a, b and c)
Points: (normalized) non-negative combinations of them.
v: “query” embedding.
v_proj: most similar vector to v generated as a combination of a, b and c.
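To make the geometry concrete, the sketch below approximates v_proj by brute force: it samples many random non-negative weight vectors for hypothetical columns a, b and c, normalizes each mixture onto the unit sphere, and keeps the one with the highest cosine similarity to v. This sampling is only an illustration of “closest implied vector”; it is not the method used in the talk, which solves the problem exactly with convex optimization.

```python
import numpy as np

def best_sampled_mix(v: np.ndarray, M: np.ndarray, n_samples: int = 50_000, seed: int = 0):
    """Approximate the unit-norm non-negative combination of M's columns closest to v."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, M.shape[1]))                # non-negative weights, one row per sample
    mixes = X @ M.T                                        # each row is M @ x for one sample
    mixes /= np.linalg.norm(mixes, axis=1, keepdims=True)  # project mixtures onto the unit sphere
    sims = mixes @ (v / np.linalg.norm(v))                 # cosine similarity of each mixture with v
    i = int(np.argmax(sims))
    return mixes[i], float(sims[i])

# Hypothetical sources a, b, c (columns of M) and a query v that partly
# points outside the cone they span (its 4th coordinate).
M = np.column_stack([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])
v = np.array([1.0, 1.0, 0.0, 1.0])
v_proj, sim = best_sampled_mix(v, M)
print(np.round(v_proj, 2), round(sim, 3))
```

For this toy case the exact optimum is (1, 1, 0, 0)/√2 with similarity √(2/3) ≈ 0.816; the sampled answer approaches it from below.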
“Explainable” decomposition
Three “embeddings” (a, b and c).
v: “query” embedding.
v_proj: most similar vector to v generated as a combination of a, b and c.
We can extract how a, b and c contributed to the “best match”.
Finding these projections can be formulated as a convex optimization problem.
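Once the mixing weights x are known, one natural way to read off how a, b and c contributed (an assumption here; the slides do not give a formula) is to split the score v·(Mx) into per-source terms x_i·(v·m_i), where m_i is the i-th column of M. These terms add up exactly to the total score. The weights below are the known optimum for this hypothetical toy case, not the output of a solver.

```python
import numpy as np

def contributions(v: np.ndarray, M: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Per-source share of the score v @ (M @ x): term i is x_i * (v @ M[:, i])."""
    return x * (M.T @ v)

# Hypothetical unit-norm sources a, b, c (columns of M) and unit-norm query v.
M = np.eye(4)[:, :3]
v = np.array([1.0, 1.0, 0.0, 1.0]) / np.sqrt(3)
x = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)  # optimal weights for this toy case

parts = contributions(v, M, x)
print(np.round(parts, 3), round(float(parts.sum()), 3))  # parts sum to v @ (M @ x)
```

Here c gets a zero contribution because v has no component along it, while a and b share the score equally.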
Convex optimization
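$$
\begin{aligned}
\text{minimize}\quad & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& g_i(x) = 0, \quad i = 1, \dots, p
\end{aligned}
$$
f_0: loss or cost function
f_i, g_i: constraints (define the domain)
x: vector with the variables to optimize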
Convex optimization
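$$
\begin{aligned}
\text{minimize}\quad & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}
$$
f_0, f_i: convex loss and inequality-constraint functions
Ax = b: linear equality constraints
Minimizing a convex function over a convex set.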
Convex functions and convex sets
[Figure: an example of a convex function and an example of a convex set]
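For reference, the standard definitions behind the two pictures (for a convex function, its domain dom f must itself be a convex set):

```latex
\begin{aligned}
&\text{Convex set } C: && \theta x + (1-\theta)\,y \in C
  && \forall\, x, y \in C,\ \theta \in [0, 1] \\
&\text{Convex function } f: && f(\theta x + (1-\theta)\,y) \le \theta f(x) + (1-\theta)\,f(y)
  && \forall\, x, y \in \operatorname{dom} f,\ \theta \in [0, 1]
\end{aligned}
```

The set {x : ‖Mx‖₂ ≤ 1} (the preimage of a norm ball under a linear map) and the non-negative orthant are both convex, and so is their intersection; with a linear objective, the embedding-mixing problem that follows is therefore a convex problem.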
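Combine embeddings to maximize cosine similarity
$$
\begin{aligned}
\text{minimize}\quad & -v^\top M x \\
\text{s.t.}\quad & \|Mx\|_2 \le 1 \\
& x \ge 0
\end{aligned}
$$
Mx: the “mixed” embedding
‖Mx‖₂ ≤ 1: the “mixed” embedding has an L2 norm of at most 1
x ≥ 0: non-negative mixing coefficients
v: target embedding (column vector)
M: matrix whose columns are the “source” embeddings (e.g. a, b and c)
x: vector with the mixing coefficients
For a unit-norm v, whenever the optimal value of vᵀMx is positive the norm constraint is tight (‖Mx‖₂ = 1), so that value is exactly the cosine similarity between v and the best “mixed” embedding.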
CVXPY: Convex optimization in Python
Option 3: Matching through CVX projection
Product | Features | Weights | Similarity
Big dining table | wood, outdoor | [0. 1.] | 1.0
Foldable chair | wood, UV-resistant, waterproof | [0.332 0.346 0.394] | 0.878
Bench | wood, weather resistant | [0.39 0.667] | 0.877
Small dining table | wood, waterproof, scratch resistant | [0.346 0.505 0.224] | 0.872
The order is correct, and so is the order of the “relevances” (weights).
However, fine-tuning is needed (to compensate for “rest-similarity”, the length of the feature list, etc.).
Nice theory, but how could we bring it to production?
Data Scientist
Compute the match between a feature and a set of features with a “cone projection”.
Back-end Engineer/Data Engineer/Data Scientist
Inference at “write time”: compute which “features” imply other features (for each profile or at the “ontology” level).
Inference at “query time”: too costly, but you can pre-filter items with a kNN in the aggregate embedding space.
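The query-time shortcut can be sketched as a brute-force kNN over per-item aggregate embeddings. The slides do not define the aggregate; this sketch assumes it is the unit-normalized mean of an item's feature embeddings, and all names and data are hypothetical. Only the top-k candidates would then go through the more expensive cone projection.

```python
import numpy as np

def aggregate(features: np.ndarray) -> np.ndarray:
    """Hypothetical item embedding: unit-normalized mean of its feature embeddings (rows)."""
    mean = features.mean(axis=0)
    return mean / np.linalg.norm(mean)

def knn_prefilter(query: np.ndarray, item_vectors: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k items whose aggregate embeddings are most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    sims = item_vectors @ q         # rows are unit-norm, so this is cosine similarity
    return np.argsort(-sims)[:k]    # brute force; an approximate-NN index could replace this

# Hypothetical catalogue: 3 items, each with a few 3-d feature embeddings.
items = [
    np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    np.array([[0.0, 1.0, 0.0]]),
    np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]),
]
item_vectors = np.vstack([aggregate(f) for f in items])
candidates = knn_prefilter(np.array([1.0, 0.1, 0.0]), item_vectors, k=2)
print(candidates)  # only these items would be passed on to the cone projection
```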
Summary
Compositionality in embeddings can be exploited to support inferences beyond 1-to-1 similarity/nearest neighbours.
Convex optimization/CVXPY: a declarative framework for defining and efficiently solving “convex” problems.
Many domains of application of the CVX+embeddings framework:
HR domain: matching applicant profiles against job requirements.
RAG (idea): apply this kind of “search” in the R-step (retrieval) of RAG architectures.
Thank you
https://www.linkedin.com/in/ivan-herreros-b64a204/
GitHub Ivan Herreros, PhD
ivanherreros@gmail.com
Embedding-cvx-projection
