3. What is Shutterstock?
• Shutterstock sells stock images, videos & music.
• Content is crowdsourced from artists around the world
• Shutterstock reviews and indexes them for search
• Customers buy a subscription and download them
11. Calculating Distances Between Colors
• Euclidean distance works reasonably well in any color space

distRGB = sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)
distHSL = sqrt((h1-h2)^2 + (s1-s2)^2 + (l1-l2)^2)
distLCH = sqrt((L1-L2)^2 + (C1-C2)^2 + (H1-H2)^2)

• More sophisticated equations that better account for human perception can be found at
http://en.wikipedia.org/wiki/Color_difference

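The RGB distance formula above can be sketched in a few lines of Python. The helper names (`hex_to_rgb`, `dist_rgb`, `nearest_color`) are illustrative and not from the talk; colors are assumed to be 6-digit hex strings like the ones used later in the deck.

```python
import math

def hex_to_rgb(hex_color):
    """Parse a 6-digit hex string such as 'ff9900' into an (r, g, b) tuple."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def dist_rgb(c1, c2):
    """Euclidean distance between two (r, g, b) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def nearest_color(target, palette):
    """Return the palette entry (hex string) closest to target under dist_rgb."""
    target_rgb = hex_to_rgb(target)
    return min(palette, key=lambda c: dist_rgb(target_rgb, hex_to_rgb(c)))
```

Hue is an angle, so for HSL or LCH one may want the hue term to take the shorter way around the color wheel; that refinement is not shown here.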
13. Any operation you can do on a set of numbers, you can do on an image
• getting histograms
• computing median values
• standard deviations / variance
• other statistics
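As a rough illustration of treating an image as a set of numbers, the sketch below computes a histogram, median and standard deviation for the lightness channel using only the standard library. The pixel list stands in for real image data, and the 0-100 lightness scale and bin width of 10 are assumptions made for the example.

```python
import colorsys
import statistics
from collections import Counter

def lightness_values(pixels):
    """Convert (r, g, b) pixels in 0-255 to HSL lightness values in 0-100."""
    values = []
    for r, g, b in pixels:
        _, lightness, _ = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        values.append(round(lightness * 100))
    return values

def channel_stats(values, bin_width=10):
    """Histogram, median and population standard deviation of one channel."""
    histogram = Counter((v // bin_width) * bin_width for v in values)
    return {
        "histogram": dict(histogram),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
    }

# Hypothetical four-pixel "image": black, white, pure red, mid gray.
pixels = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (128, 128, 128)]
stats = channel_stats(lightness_values(pixels))
```

The same two functions would apply to hue or saturation by picking a different component of the `colorsys` result.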
19. Indexing color histograms
• index colors just like you would index text
• volume of color == frequency of the term
color_txt = "cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 2e6b2e 2e6b2e 2e6b2e ff0000 …"
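One possible way to produce such a field value is to repeat each hex color in proportion to its share of the image. The function name, the `total_tokens` parameter and the sample histogram below are assumptions for illustration, not the pipeline described in the talk.

```python
def histogram_to_color_text(histogram, total_tokens=100):
    """Turn {hex_color: pixel_count} into a whitespace-separated token string.

    Each color is repeated in proportion to its share of the pixels, so a
    text index sees "volume of color" as term frequency. Colors whose share
    rounds to zero tokens are dropped.
    """
    total_pixels = sum(histogram.values())
    if total_pixels == 0:
        return ""
    tokens = []
    # Most frequent colors first, purely for readability of the output.
    for color, count in sorted(histogram.items(), key=lambda kv: -kv[1]):
        repeats = round(total_tokens * count / total_pixels)
        tokens.extend([color] * repeats)
    return " ".join(tokens)

color_txt = histogram_to_color_text(
    {"cfebc2": 500, "2e6b2e": 300, "ff0000": 200}, total_tokens=10
)
```

A larger `total_tokens` gives finer-grained frequencies at the cost of a longer field value.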
20. Solr Fields & Queries
<field name="color" type="text_ws" …>
• Easy to query
• Can use Solr's default ranking effectively

/solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax…

• or access term frequencies directly to create specific sort functions:

sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc
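The query and sort parameters above could be assembled programmatically along these lines. This only builds the parameter string; the helper names are made up for the example, and nothing here talks to a Solr server.

```python
from urllib.parse import urlencode

def color_query_params(colors, field="color"):
    """edismax parameters that search `field` for the given hex colors."""
    return {"q": " ".join(colors), "qf": field, "defType": "edismax"}

def tf_sort(colors, field="color"):
    """Sort clause based on the term frequency of each requested color."""
    terms = [f'tf({field},"{c}")' for c in colors]
    if len(terms) == 1:
        # A single color needs no product(); sort on its frequency directly.
        return f"{terms[0]} desc"
    return f"product({','.join(terms)}) desc"

params = color_query_params(["ff0000", "e2c2d2"])
params["sort"] = tf_sort(["ff0000", "e2c2d2"])
query_string = urlencode(params)  # append to /solr/select? on your own host
```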
21. Indexing color statistics
Represent aggregate statistics of each image
lightness:
  median: 2
  standard dev: 1
  largest bin: 0
  largest bin size: 50
saturation:
  median: 0
  standard dev: 0
  largest bin: 0
  largest bin size: 100
…
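A minimal sketch of how such per-channel statistics might be derived from a binned histogram follows. It assumes the histogram maps a bin index to the percentage of pixels in that bin, and the `<channel>_<statistic>` field naming is a guess that simply generalizes the `hue_median` field name used in the deck.

```python
import math

def histogram_stats(histogram):
    """Summarise a non-empty {bin_index: weight} histogram."""
    total = sum(histogram.values())
    # Weighted median: first bin where the running weight reaches half the total.
    running = 0
    median = None
    for bin_index in sorted(histogram):
        running += histogram[bin_index]
        if running * 2 >= total:
            median = bin_index
            break
    mean = sum(b * w for b, w in histogram.items()) / total
    variance = sum(w * (b - mean) ** 2 for b, w in histogram.items()) / total
    largest_bin = max(histogram, key=histogram.get)
    return {
        "median": median,
        "stdev": math.sqrt(variance),
        "largest_bin": largest_bin,
        "largest_bin_size": histogram[largest_bin],
    }

def to_solr_fields(channel, stats):
    """Flatten stats into field names such as 'hue_median' (naming is assumed)."""
    return {f"{channel}_{name}": value for name, value in stats.items()}

# An image whose pixels all fall in saturation bin 0.
saturation_fields = to_solr_fields("saturation", histogram_stats({0: 100}))
```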
22. Solr Fields & Queries
<field name="hue_median" type="int" …>
• Sort by the distance between the input parameter and the median value

/solr/select?q=*:*&sort=abs(sub($query,hue_median)) asc
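To see what that sort does, the same ordering can be mimicked in plain Python on a handful of documents. The documents and ids below are invented for the example; only the `hue_median` field name comes from the slide.

```python
def sort_by_hue_distance(docs, query_hue):
    """Order docs by |query_hue - hue_median|, smallest distance first."""
    return sorted(docs, key=lambda d: abs(query_hue - d["hue_median"]))

# Hypothetical indexed documents.
docs = [
    {"id": "a", "hue_median": 200},
    {"id": "b", "hue_median": 35},
    {"id": "c", "hue_median": 120},
]
ranked = [d["id"] for d in sort_by_hue_distance(docs, query_hue=30)]
```

Like the Solr expression, this treats hue as a plain number and ignores the wrap-around at the ends of the hue scale.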
29. How much of the image contains the selected color?
• Score each color by number/percentage of pixels

sort=tf(color,"ff9900") desc
30. Color Accuracy
• As you reduce your color space, you also reduce precision
• Reducing the color space too much increases recall and lowers precision.
• Not reducing it enough lowers recall and raises precision.
• Reducing your color space down to ~100 to ~300 colors works well
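One simple way to reduce the color space is to snap each RGB channel to a few evenly spaced levels. The sketch below uses 6 levels per channel, which gives 6^3 = 216 colors and so lands inside the ~100 to ~300 range; the choice of uniform RGB quantization is an assumption, and the talk may well have used a different reduction.

```python
def quantize_channel(value, levels):
    """Snap a 0-255 channel value to the nearest of `levels` evenly spaced values."""
    step = 255 / (levels - 1)
    return round(round(value / step) * step)

def quantize_color(rgb, levels=6):
    """Map an (r, g, b) color onto a levels**3 palette and return it as hex."""
    return "".join(f"{quantize_channel(c, levels):02x}" for c in rgb)

# Two slightly different oranges collapse onto the same indexed term.
a = quantize_color((255, 150, 10))
b = quantize_color((250, 160, 5))
```

Fewer levels merge more colors into each term (more matches, less exact); more levels keep near-identical shades apart.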
31. Weighting Multiple Colors Equally
• If you search for 2 or more colors, the top result should have the most even distribution of those colors
• simple option:

sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc

• more complex: compute the stdev or variance of the matching color values in your Solr sort function, and sort the results with the lowest variance first.

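Both options can be compared offline on made-up term frequencies. The image names and counts below are invented, and the scoring functions are a sketch of the idea rather than the Solr function syntax.

```python
import statistics

def product_score(color_tf, colors):
    """Product of term frequencies; zero if any requested color is missing."""
    score = 1
    for c in colors:
        score *= color_tf.get(c, 0)
    return score

def variance_score(color_tf, colors):
    """Population variance of the matching frequencies; lower means more even."""
    return statistics.pvariance([color_tf.get(c, 0) for c in colors])

# Hypothetical per-image term frequencies for the two searched colors.
images = {
    "even": {"ff9900": 40, "2280e2": 40},
    "skewed": {"ff9900": 75, "2280e2": 5},
}
colors = ["ff9900", "2280e2"]
by_product = sorted(images, key=lambda k: product_score(images[k], colors), reverse=True)
by_variance = sorted(images, key=lambda k: variance_score(images[k], colors))
```

Variance on its own ignores how much of the image the colors cover (two colors at 1% each also have zero variance), so in practice it would probably be combined with a volume term.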
32. Accounting for Similar & Different Colors
• The score for a particular color should reflect all the colors in the image.
• At indexing time, increase the score based on similar colors; decrease it based on differing colors.
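The slide does not give a formula, so the following is only one way the idea might be realized: each color's score becomes a sum over every color in the image, weighted by a similarity that runs from +1 (identical) to -1 (as far apart as RGB allows). The linear similarity function and the clamp at zero are assumptions.

```python
import math

MAX_RGB_DISTANCE = math.sqrt(3 * 255 ** 2)  # distance between black and white

def hex_to_rgb(hex_color):
    """Parse a 6-digit hex string into an (r, g, b) tuple."""
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def similarity(c1, c2):
    """+1 for identical colors, falling linearly to -1 at the maximum RGB distance."""
    return 1 - 2 * math.dist(hex_to_rgb(c1), hex_to_rgb(c2)) / MAX_RGB_DISTANCE

def adjusted_scores(histogram):
    """Re-score each color in {hex_color: share} using every color in the image."""
    scores = {}
    for color in histogram:
        total = sum(similarity(color, other) * share
                    for other, share in histogram.items())
        scores[color] = max(total, 0.0)  # never index a negative score
    return scores

# Red next to a near-identical red is boosted; red next to cyan is cancelled out.
warm = adjusted_scores({"ff0000": 50, "ee1111": 50})
contrast = adjusted_scores({"ff0000": 50, "00ffff": 50})
```

The adjusted scores could then replace raw pixel counts before the colors are written to the index.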
34. Conclusion
• This talk provided a rough guide to building a basic search-by-color application
• Lots of opportunity to do more sophisticated things in image search:
• matching colors in certain parts of an image
• identifying visual styles (blur vs sharp, high contrast, etc.)
• patterns & textures
• analyzing content in images (object detection)