Fashion product de-duplication with image similarity and LSH

My talk from PyData London, 12/2014

Published in: Data & Analytics

  1. Matching Fashion Products With LSH Image Similarity
  2. Hi, I’m Eddie. @ejbell
  3. We collect the world of fashion into a customisable shopping experience.
  4. What makes us different? All data is scraped from retailers: 500 spiders (Scrapy), 9,000 designers. Almost everything is automated: SEO, recommendation, classification, sales. This architecture comes with a few problems.
  5. Why do we get duplicates? There is no ISBN for fashion. [Figure labels: Burberry, Selfridges; inter-retailer, intra-retailer]
  6. How We Used to Find Duplicates: Lucene fuzzy string matching. Doesn’t really work. Yoox.com: 3,000 products called “dress”, 7,000 products called “shirt”.
  7. How We Detect Duplicates Now: BRISK image descriptors. Leutenegger, Chli and Siegwart. BRISK: Binary Robust Invariant Scalable Keypoints. ICCV 2011: 2548-2555.
  8. How We Detect Duplicates Now: BRISK octaves. [Figure]
  9. From Descriptors to Image Similarity: descriptors are clustered with k-means into a bag of words, giving a 1 × k vector per image.
  10. Architecture: Started in Storm/Java (very painful). Ended up in Celery (much nicer). Matching is done in Elasticsearch.
  11. Results
  12. Results
  13. Results
  14. Results
  15. Speed: We apply some filters, but there is still a lot of data. Could be doing 1,000,000+ comparisons per image. Speed it up with locality-sensitive hashing.
  16. Locality-Sensitive Hashing
  17. Locality-Sensitive Hashing
  18. Locality-Sensitive Hashing: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0
  19. Locality-Sensitive Hashing: 10 10 10 10 11 10 10 11 11 11 11 01 00 01 00 01 00 00 00 10 10 00
  20. Locality-Sensitive Hashing: 101 101 101 101 111 101 100 111 110 110 110 010 000 010 000 010 000 000 000 001 101 101
  21. Locality-Sensitive Hashing: 1011 1011 1011 1011 1111 1011 1001 1111 1101 1101 1101 0101 0001 0101 0001 0101 0000 0000 0000 0010 1010 1010
  22. Locality-Sensitive Hashing: 1011 1011 1011 1011 1111 1011 1001 1111 1101 1101 1101 0101 0001 0101 0001 0101 0000 0000 0000 0010 1010 1010
  23. LSH (Random Projections). Split the n-dimensional code into q r-bit integers, e.g. the 16-bit code 0001 1110 1110 0010 splits into 0001 = 01, 1110 = 14, 1110 = 14, 0010 = 02. Map d-dimensional space into n-dimensional binary code:
      0 0 0 1 1 1 1 0 1 1 1 0 0 0 1 0
      1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1
      0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0
      1 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0
      1 1 1 1 0 0 0 1 0 1 1 1 0 1 0 1
      0 1 0 1 0 1 1 1 0 1 1 0 0 1 0 1
      Use the integer codes (i.e. the hash) to filter for similarity.
  24. Parameters: The Johnson-Lindenstrauss lemma states that n points in m dimensions (an n×m matrix) can be projected into d = O(log n) dimensions, for a fixed distortion tolerance, while still approximately preserving pairwise distances.
  25. Performance …?!
  26. Other “useful” applications
  27. Color Variants
  28. Matching Sets
  29. Outfits
  30. Model Faces
  31. What’s Next: Reverse image search (this works! We tried it during a hackathon). Similar textual features, i.e. word embeddings. Dual image / text vector embeddings.
  32. thank you @ejbell
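
The bag-of-words step on slide 9 can be sketched roughly as follows. This is a minimal illustration, not the talk's actual pipeline: it assumes the k centroids have already been learned with k-means, treats descriptors as plain numeric vectors (BRISK descriptors are binary, so a real system may handle them differently), and compares histograms with cosine similarity, which the slides do not specify. The names `nearest`, `bag_of_words`, `cosine_similarity` and all toy data are hypothetical.

```python
import math

def nearest(centroids, descriptor):
    """Index of the centroid closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], descriptor)),
    )

def bag_of_words(descriptors, centroids):
    """Turn an image's descriptors into a normalised 1 x k histogram of visual words."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        hist[nearest(centroids, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def cosine_similarity(a, b):
    """Cosine similarity between two histograms (assumed metric, not from the slides)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy data: k = 3 hypothetical centroids in 2-D and three tiny "images".
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image_a = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.9), (0.0, 0.1)]
image_b = [(0.0, 0.1), (1.0, 1.0), (0.9, 0.9), (0.1, 0.0)]
image_c = [(0.0, 0.9), (0.1, 1.0), (0.0, 1.1)]

hist_a = bag_of_words(image_a, centroids)
hist_b = bag_of_words(image_b, centroids)
hist_c = bag_of_words(image_c, centroids)

print(cosine_similarity(hist_a, hist_b))  # near-duplicates score close to 1
print(cosine_similarity(hist_a, hist_c))  # images sharing no visual words score 0
```

Normalising the histogram keeps images with different numbers of keypoints comparable; whether the original system normalised is not stated in the slides.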

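
The random-projection hashing on slide 23 can be sketched like this. It is a sketch under stated assumptions: Gaussian random hyperplanes and a sign test are one common way to get a binary code, and the rule "two items are candidates if enough chunks match at the same position" is an assumed reading of "use the integer codes to filter for similarity". The function names and the `min_matches` parameter are hypothetical.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def binary_code(vector, hyperplanes):
    """Map a d-dimensional vector to an n-bit code: one sign bit per hyperplane."""
    return [
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    ]

def split_code(bits, r):
    """Split an n-bit code into q = n / r integers of r bits each."""
    if len(bits) % r:
        raise ValueError("code length must be a multiple of r")
    chunks = []
    for start in range(0, len(bits), r):
        value = 0
        for bit in bits[start:start + r]:
            value = (value << 1) | bit
        chunks.append(value)
    return chunks

def is_candidate(chunks_a, chunks_b, min_matches=1):
    """Assumed filter: keep a pair if at least min_matches chunks agree by position."""
    matches = sum(1 for a, b in zip(chunks_a, chunks_b) if a == b)
    return matches >= min_matches

# The 16-bit code from the slide, split into four 4-bit integers.
slide_bits = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
print(split_code(slide_bits, 4))  # [1, 14, 14, 2]

# Hashing a toy 3-dimensional vector into a 16-bit code and its integer chunks.
planes = make_hyperplanes(n_bits=16, dim=3)
vector = [0.2, 0.5, 0.3]
print(split_code(binary_code(vector, planes), 4))
```

Only pairs that pass the cheap integer filter need a full similarity comparison, which is how this kind of hashing cuts down the 1,000,000+ comparisons per image mentioned on slide 15.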