Imada presentation
Martin R. Ehmsen
martin@colourbox.com
 www.colourbox.com
Outline
• Personal introduction
• What is Colourbox?
• Why is Colourbox interesting?
  • Similar images
  • Search result ranking
  • Recommendations
• Why Colourbox?
  • Open position
• Questions

Who am I? Why am I here?
• Me
  • Graduated from IMADA, 2010
  • Ph.D. in Computer Science
  • Online Algorithms
  • Technical Project Manager
    & System Architect
• Why this talk?
  • Promote Colourbox
  • There are interesting jobs on Funen

Colourbox
• Microstock photography company
  • Resell images, vector graphics, videos
• March 2006
  • 3 employees, 50 users, 50,000 images,
    150 new images daily
• November 2011
  • 21 employees, 65,000 users, 2,000,000 images,
    5,000 new images daily
• Currently in the top 10 of all stock sites, aiming for #1

Colourbox
• The only stock site that offers
  a flat rate
  • Download all you want
     for €249 per month
• Search, find, download
• Browse, get inspired,
  download




The Tech
• Built using open-source software
  • HTML(5), CSS(3), and JavaScript (jQuery) front-end
  • Varnish, Lighttpd, and Memcached
  • MySQL (Percona) database
  • PHP backend
  • PHP, Python, and C++ scripts
  • Self-developed search engine (Colourit)
    • Using Python and C
• Cloud-based, on Amazon EC2 and S3

The Setup




The Geek Side
• Techniques from mathematics and computer science
  • Distributed/parallel computing
  • Vector mathematics
  • Various tree structures
  • Set intersection
  • Cache-oblivious algorithms
  • Clustering algorithms
  • Ranking algorithms
  • Markov chains
  • etc...
Similar images
• Given an image, what other images look similar to it?
  • Inspire
  • Browse
• All images have keywords
• The keyword-to-image association is weighted
  • How pronounced is the keyword for the image?
  • Calculated automatically (more later)




Similar images




Similar images
• Each keyword is a dimension in the keyword vector space
• Each image is then represented as a vector in this space
  • The projection onto each dimension is the weight of
     the corresponding keyword
• Example (x, y, z, w = goat, white, outside, day; weights rescaled from 0-100 to 0-1)
  • (goat, 96), (white, 94), (outside, 50)
    • Vector (x, y, z, w) = (0.96, 0.94, 0.5, 0)
  • (goat, 47), (white, 81), (day, 19)
    • Vector (x, y, z, w) = (0.47, 0.81, 0, 0.19)
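A minimal Python sketch of the mapping in the example, assuming keyword weights are stored on a 0 to 100 scale and divided by 100 (which is what the example numbers suggest). The function name `to_vector` and the fixed four-keyword vocabulary are illustrative assumptions; a real vocabulary would contain every keyword.

```python
def to_vector(keywords, vocabulary):
    """Project an image's (keyword, weight) pairs onto the given keyword dimensions.

    Weights are assumed to be on a 0-100 scale and are rescaled to 0-1.
    Keywords the image does not have get 0 in that dimension.
    """
    weights = {keyword: weight / 100 for keyword, weight in keywords}
    return tuple(weights.get(dim, 0.0) for dim in vocabulary)


# Dimensions x, y, z, w from the example (hypothetical four-keyword vocabulary).
VOCABULARY = ("goat", "white", "outside", "day")

image_a = to_vector([("goat", 96), ("white", 94), ("outside", 50)], VOCABULARY)
image_b = to_vector([("goat", 47), ("white", 81), ("day", 19)], VOCABULARY)
print(image_a)  # (0.96, 0.94, 0.5, 0.0)
print(image_b)  # (0.47, 0.81, 0.0, 0.19)
```

With a realistic vocabulary almost every dimension is 0, so storing only the non-zero entries (for example a dict keyed by the numeric keyword ID) is the more practical layout.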

Similar images
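• Similarity is then the angle between two vectors
  • Easily calculated using high school math
      $\mathbf{u} \cdot \mathbf{v} = \cos(\theta)\,|\mathbf{u}|\,|\mathbf{v}|$, so $\theta = \arccos\left(\frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}\right)$
• Result is between 0 and 90 degrees, since all keyword weights are non-negative
• Example (cont.)
  • (0.96, 0.94, 0.5, 0) and (0.47, 0.81, 0, 0.19)
  • Approx 27.73 degrees
• Do two images at an angle of 27.73 degrees look similar?
  • Experiments determined the cut-off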
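The angle formula can be sketched in Python on sparse vectors, stored as `{keyword_id: weight}` dicts so that only keywords present on both images contribute to the dot product (a set intersection). The numeric keyword IDs below are hypothetical, and treating a keyword-less image as 90 degrees away is a choice made for this sketch only.

```python
import math


def angle_degrees(u, v):
    """Angle in degrees between two sparse keyword vectors {keyword_id: weight}."""
    # Only keywords shared by both images contribute to the dot product.
    shared = u.keys() & v.keys()
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 90.0  # sketch choice: an image without keywords is unrelated
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keyword IDs: 1 = goat, 2 = white, 3 = outside, 4 = day
u = {1: 0.96, 2: 0.94, 3: 0.5}
v = {1: 0.47, 2: 0.81, 4: 0.19}
print(round(angle_degrees(u, v), 2))  # 27.73
```

The experimentally chosen cut-off would then be applied to this angle; its value is not given in the slides.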

Similar images




Similar images
• 2,000,000 images yield about 2,000,000,000,000 pairwise comparisons (n(n-1)/2 pairs)
• No job dependencies
• No data modifications
• Relatively small data size
  • Each keyword is identified by a number
• Very easy to parallelize and distribute
• Speed up using a trick from cache-oblivious algorithms
• This is not a one-time thing
  • Keywords and weights change
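The slides do not say which cache-oblivious trick is used. As a hypothetical illustration of the general idea, the Python sketch below recursively halves the set of image pairs (i < j) into independent jobs. Each job reads only two contiguous ranges of images, so jobs can be shipped to separate workers, and the recursive halving is the divide-and-conquer pattern cache-oblivious algorithms rely on: at some depth each sub-problem fits in whatever cache is available, without the code knowing the cache size. The job tuple format and the `leaf` parameter (maximum pairs per job) are assumptions for this example.

```python
def rect_jobs(r0, r1, c0, c1, leaf):
    """Split comparisons of images [r0, r1) against images [c0, c1) into jobs.

    The two ranges are disjoint with r1 <= c0, so every pair has i < j.
    `leaf` must be >= 1.
    """
    if (r1 - r0) * (c1 - c0) <= leaf:
        return [("rect", r0, r1, c0, c1)]
    if r1 - r0 >= c1 - c0:  # halve the longer side
        mid = (r0 + r1) // 2
        return rect_jobs(r0, mid, c0, c1, leaf) + rect_jobs(mid, r1, c0, c1, leaf)
    mid = (c0 + c1) // 2
    return rect_jobs(r0, r1, c0, mid, leaf) + rect_jobs(r0, r1, mid, c1, leaf)


def triangle_jobs(lo, hi, leaf):
    """Split all pairs i < j of images in [lo, hi) into jobs of at most `leaf` pairs."""
    n = hi - lo
    if n * (n - 1) // 2 <= leaf:
        return [("tri", lo, hi)]
    mid = (lo + hi) // 2
    # Pairs within each half, plus every pair that crosses the two halves.
    return (triangle_jobs(lo, mid, leaf) + triangle_jobs(mid, hi, leaf)
            + rect_jobs(lo, mid, mid, hi, leaf))


def pairs_in_job(job):
    """List the (i, j) image-index pairs a worker would compare for one job."""
    if job[0] == "tri":
        _, lo, hi = job
        return [(i, j) for i in range(lo, hi) for j in range(i + 1, hi)]
    _, r0, r1, c0, c1 = job
    return [(i, j) for i in range(r0, r1) for j in range(c0, c1)]


# Ten images, at most four comparisons per job.
jobs = triangle_jobs(0, 10, leaf=4)
total = sum(len(pairs_in_job(job)) for job in jobs)
print(total)  # 45, i.e. 10 * 9 / 2: every pair exactly once
```

Because the jobs share no state and never modify data, rerunning them after keywords or weights change is just a matter of dispatching them again.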

Ranking of results
• How to rank search results?
  • Want the “best” results first
• First solution: rank by number of downloads
• Problems
  • Old good images rank over new excellent images
  • Wrong keywords distort the results




Ranking of results
• Harvest information from the users
  • A clicked/downloaded image
    • Matched the search string well
    • Is a “good” image
  • A shown-but-not-clicked image either
    • Does not match the search string well, or
    • Is a “bad” image




Ranking of results
• The keyword-to-image association is weighted
• Keyword weights are updated when
  • a keyworder assigns a keyword (high weight)
  • a supplier assigns a keyword (high weight)
  • a user clicks on an image shown in search results
  • a user does NOT click on an image shown in search results




Ranking of results
• Search “Summer Lemon”
• User clicks first result
• Keyword weights before and after the click

                 First image (clicked)      Second image (not clicked)
                 before      after          before      after
    Lemon        0.9         0.95           0.7         0.65
    Summer       0.8         0.86           0.9         0.8
    Apple        0.1         0.1            0.0         0.0

• Pros
  • Second image ranked lower for “Lemon”
• Cons
  • “Summer” ranked lower on second image
  • Fixed by subsequent searches
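The slides do not give the exact update rule. The Python sketch below is one hypothetical rule that reproduces the direction of the "Summer Lemon" example (searched keywords go up on the clicked image and down on the shown-but-skipped one, while "Apple" is untouched), but not its exact numbers. The step sizes `alpha` and `beta` and the choice to update only keywords the image already has are assumptions.

```python
def update_weights(weights, query, clicked, alpha=0.1, beta=0.1):
    """Nudge one image's weights for the searched keywords after it was shown.

    weights: {keyword: weight in [0, 1]} for the image; updated in place.
    query:   keywords from the search string (already lower-cased).
    clicked: True if the user clicked the image, False if it was shown but skipped.
    alpha, beta: hypothetical step sizes for reward and penalty.
    """
    for keyword in query:
        if keyword not in weights:
            continue  # this sketch only adjusts keywords the image already has
        w = weights[keyword]
        if clicked:
            weights[keyword] = w + alpha * (1.0 - w)  # move toward 1, never above
        else:
            weights[keyword] = w - beta * w  # move toward 0, never below
    return weights


# Weights from the "Summer Lemon" example, before the click.
first = {"lemon": 0.9, "summer": 0.8, "apple": 0.1}
second = {"lemon": 0.7, "summer": 0.9, "apple": 0.0}
query = "Summer Lemon".lower().split()
update_weights(first, query, clicked=True)
update_weights(second, query, clicked=False)
```

Moving a fraction of the remaining distance toward 1 or 0 keeps every weight inside [0, 1] no matter how many searches an image takes part in.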

Ranking of results
• Images with
  • Wrong keywords are ranked very low over time
  • Good keywords are ranked higher
• Great images are ranked higher overall
• New excellent images can rank over old mediocre images
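The slides do not spell out how the weights become a result order. One simple possibility, shown here purely as an assumption, is to score each image by summing its weights for the searched keywords. Under that score, keywords whose weights decay through repeated skips pull an image down over time, independent of its age or download count. The function `rank` and the "mislabelled" image are hypothetical.

```python
def rank(images, search):
    """Return image IDs ordered by a hypothetical relevance score.

    images: {image_id: {keyword: weight in [0, 1]}}
    search: the raw search string, e.g. "Summer Lemon".
    Score = sum of the image's weights for the searched keywords (0 if missing).
    """
    terms = search.lower().split()

    def score(image_id):
        weights = images[image_id]
        return sum(weights.get(term, 0.0) for term in terms)

    return sorted(images, key=score, reverse=True)


images = {
    "first": {"lemon": 0.9, "summer": 0.8, "apple": 0.1},
    "second": {"lemon": 0.7, "summer": 0.9, "apple": 0.0},
    "mislabelled": {"lemon": 0.05, "summer": 0.1},  # hypothetical image with decayed keywords
}
print(rank(images, "Summer Lemon"))  # ['first', 'second', 'mislabelled']
```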




Recommendations
• “You are currently looking at image X,
  and you might be interested in image Y, Z, and W”




Recommendations
• What images are connected?
  • Let’s track our users to find out




Recommendations




#2364906   #2964241   #2684393


Recommendations




Recommendations




Recommendations
• Enter Markov chains




• Using a Markov chain of order 1, the probability of
   going from image X to image Y is
  • the number of times users followed the path X → Y, divided by
  • the total number of paths followed out of image X
  • $P(Y \mid X) = \dfrac{n(X \to Y)}{\sum_{Z} n(X \to Z)}$
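The first-order estimate above can be sketched in Python by counting consecutive image views in browsing sessions. How Colourbox actually logs sessions is not described, so the session lists of image IDs and the `recommend` helper (top-k most likely next images) are assumptions, and the letters "A", "B", "C" are made-up IDs.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from browsing sessions.

    sessions: lists of image IDs in the order a user viewed them.
    Returns {x: {y: P(next = y | current = x)}}.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for x, y in zip(session, session[1:]):
            counts[x][y] += 1  # the path x -> y was followed once more
    probs = {}
    for x, outgoing in counts.items():
        total = sum(outgoing.values())  # all paths followed out of x
        probs[x] = {y: c / total for y, c in outgoing.items()}
    return probs


def recommend(probs, current, k=3):
    """Return up to k images most likely to be viewed next after `current`."""
    nxt = probs.get(current, {})
    return sorted(nxt, key=nxt.get, reverse=True)[:k]


probs = transition_probabilities([["A", "B", "C"], ["A", "B"], ["A", "C"]])
print(recommend(probs, "A"))  # ['B', 'C']  (P(B|A) = 2/3, P(C|A) = 1/3)
```

Order 1 means only the current image matters, which keeps the counts table no larger than the number of distinct X → Y paths actually observed.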

Why Colourbox?
• We are
  • small - 15 people, no more than 15 steps apart
  • flat - no long chains of command
  • flexible - we can act on a good idea immediately
  • a 2011 Gazelle - we are still hiring while others are firing
• We have
  • Relaxed atmosphere
  • Flexible work hours
  • Candy cabinet, world-class coffee machine, and stunning view :-)
  • etc...

Why Colourbox?
• You get
  • to work on fun problems
  • great colleagues
  • an international outlook
  • to serve customers who are excited about us
  • to be part of a company which aims to be #1
• New projects
  • SkyFish - Company Colourbox
  • Zulubox - to articles what Colourbox is to images

We are hiring!
• Software Developer – front-end systems
  • Focus on HTML5, JS, PHP, SQL, etc.
  • Can implement a pixel-perfect design from a PSD
  • Can write scalable code that still performs
     well when executed 50 times per second
  • You know your way around Linux
  • Start August 1st
    • We are constructing a new office building
• Unsolicited applications are always welcome

Thank you!

Questions?


More Related Content

Recently uploaded

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Featured

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Featured (20)

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 

Imada presentation

  • 2. Outline • Personal introduction • What is Colourbox? • Why is Colourbox interesting? • Similar images • Search result ranking • Recommendations • Why Colourbox? • Open position • Questions Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 3. Who am I? Why am I here? • Me • Graduated from IMADA, 2010 • Ph.D. in Computer Science • Online Algorithms • Technical Project Manager & System Architect • Why this talk? • Promote Colourbox • There are interesting jobs on Funen Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 4. Colourbox • Microstock photography company • Resell images, vector graphics, videos • March 2006 • 3 employees, 50 users, 50,000 images, 150 new images daily • November 2011 • 21 employees, 65,000 users, 2,000,000 images, 5,000 new images daily • Currently in top 10 of all stock sites, aiming at #1 Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 5. Colourbox • Only stock site that offers flat rate • Download all you want for €249,- per month • Search, find, download • Browse, get inspired, download Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 6. The Tech • Build using open source software • HTML(5), CSS(3), and Javascript (jQuery) front-end • Varnish, Lighttpd, and Memcached • MySQL (Percona) database • PHP backend • PHP, Python, and C++ scripts • Self-developed search engine (Colourit) • Using Python and C • Cloud based on Amazon EC2 and S3 Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 7. The Setup Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 8. The Geek Side • Techniques from mathematics and computer science • Distributed/parallel computing • Vector mathematics • Various tree structures • Set intersection • Cache oblivious algorithms • Clustering algorithms • Ranking algorithms • Markov chains • etc... Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 9. Similar images • Given an image, what other images look similar to it? • Inspire • Browse • All images have keywords • The keyword-to-image association is weighted • How pronounced is the keyword for the image? • Calculated automatically (more later) Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 10. Similar images Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 11. Similar images • Each keyword is a dimension in keyword vector space • Each image is then represented as a vector in this space • The projection onto each dimension is the weight of the corresponding keyword • Example • (goat, 96), (white, 94), (outside, 50) • Vector (x, y, z, w) = (0.96, 0.94, 0.5, 0) • (goat, 47), (white, 81), (day, 19) • Vector (x, y, z, w) = (0.47, 0.81, 0, 0.19) Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 12. Similar images • Similarity is then the angle between two vectors • Easily calculated using high school math · = cos(θ)| || | u v u v • Result between 0 and 90 degrees • Example (cont.) • (0.96, 0.94, 0.5, 0) and (0.47, 0.81, 0, 0.19) • Approx 27.73 degrees • Do two images with similarity of 27.73 degrees look similar? • Experiments determined the cut-off Martin R. Ehmsen martin@colourbox.com www.colourbox.com
  • 13. Similar images
  • 14. Similar images • 2,000,000 images yield roughly 2,000,000,000,000 pairwise comparisons ($n(n-1)/2$ with $n$ = 2,000,000) • No job dependencies • No data modifications • Relatively small data size • Each keyword is identified by a number • Very easy to parallelize and distribute • Speed up using a trick from cache oblivious algorithms • This is not a one-time thing • Keywords and weights change
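One way the all-pairs job on slide 14 could look, as a sketch under assumptions: each image is stored sparsely as a `(keyword_id, weight)` list sorted by keyword ID (the slide only says keywords are numbered), and the "trick from cache oblivious algorithms" is assumed to be recursive blocking of the pair grid; the slides do not say which trick was actually used. `sparse_cosine`, `compare_blocks`, and the keyword IDs in the example are hypothetical.

```python
import math


def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors given as (keyword_id, weight)
    lists sorted by keyword_id; the dot product walks both lists like a merge."""
    i = j = 0
    dot = 0.0
    while i < len(a) and j < len(b):
        ka, wa = a[i]
        kb, wb = b[j]
        if ka == kb:
            dot += wa * wb
            i += 1
            j += 1
        elif ka < kb:
            i += 1
        else:
            j += 1
    norm_a = math.sqrt(sum(w * w for _, w in a))
    norm_b = math.sqrt(sum(w * w for _, w in b))
    return dot / (norm_a * norm_b)


def compare_blocks(images, rows, cols, out, base=64):
    """Compare image i in [rows) with image j in [cols) for every pair i < j.

    The index grid is split recursively along its longer side, so blocks
    eventually become small enough to fit in any cache level without the code
    knowing the cache size. `base` only limits recursion overhead.
    """
    r0, r1 = rows
    c0, c1 = cols
    if c1 <= r0 + 1:
        return  # every j in this block is <= every i: no pairs with i < j
    if (r1 - r0) * (c1 - c0) <= base * base:
        for i in range(r0, r1):
            for j in range(max(c0, i + 1), c1):
                out.append((i, j, sparse_cosine(images[i], images[j])))
        return
    if r1 - r0 >= c1 - c0:
        mid = (r0 + r1) // 2
        compare_blocks(images, (r0, mid), cols, out, base)
        compare_blocks(images, (mid, r1), cols, out, base)
    else:
        mid = (c0 + c1) // 2
        compare_blocks(images, rows, (c0, mid), out, base)
        compare_blocks(images, rows, (mid, c1), out, base)


# Hypothetical keyword IDs: 1 = goat, 2 = white, 3 = outside, 4 = day, 5 = lemon.
images = [
    [(1, 0.96), (2, 0.94), (3, 0.5)],
    [(1, 0.47), (2, 0.81), (4, 0.19)],
    [(5, 0.8)],
]
results = []
compare_blocks(images, (0, len(images)), (0, len(images)), results)
```

Because no block depends on another and the data is only read, disjoint top-level blocks could be handed to separate workers or machines, which matches the "no job dependencies, no data modifications" points on the slide.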
  • 15. Ranking of results • How to rank search results? • Want the “best” results first • First solution: Use number of downloads as parameter • Problems • Old good images rank over new excellent images • Wrong keywords distort the results
  • 16. Ranking of results • Harvest information from the users • A clicked/downloaded image • Matched the search string well • Is a “good” image • A shown-but-not-clicked image either • Does not match the search string well, or • Is a “bad” image
  • 17. Ranking of results • The keyword-to-image association is weighted • Keyword weights are updated when • a keyworder assigns a keyword (high weight) • a supplier assigns a keyword (high weight) • a user clicks on a photo presented by a search • a user does NOT click on a photo presented by a search
  • 18. Ranking of results • Search “Summer Lemon” • User clicks first result • Before: first image Lemon (0.9), Summer (0.8), Apple (0.1); second image Lemon (0.7), Summer (0.9), Apple (0.0) • After: first image Lemon (0.95), Summer (0.86), Apple (0.1); second image Lemon (0.65), Summer (0.8), Apple (0.0) • Pros • Second image ranked lower for “Lemon” • Cons • “Summer” ranked lower on second image • Fixed by subsequent searches
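The slides do not give the formula behind the weight changes on slide 18. The sketch below assumes a simple hypothetical rule: each query keyword's weight moves a fixed fraction toward 1 when the image is clicked and toward 0 when it is shown but not clicked, and images are ranked by the sum of their query-keyword weights. `update_weights`, `score`, and `step` are illustrative names; the rule reproduces the direction of the changes on the slide, not the exact numbers.

```python
def update_weights(weights, query, clicked, step=0.1):
    """Hypothetical rule: for each query keyword the image already has, move
    its weight a fraction `step` toward 1 if the image was clicked, or toward
    0 if it was shown but not clicked. Other keywords are left unchanged."""
    for keyword in query:
        if keyword in weights:
            w = weights[keyword]
            weights[keyword] = w + step * (1 - w) if clicked else w - step * w


def score(weights, query):
    """Rank score: sum of the image's weights for the query keywords."""
    return sum(weights.get(keyword, 0.0) for keyword in query)


# The two images from the slide, before the search.
first = {"lemon": 0.9, "summer": 0.8, "apple": 0.1}
second = {"lemon": 0.7, "summer": 0.9, "apple": 0.0}
query = ["summer", "lemon"]

# The user clicks the first result and skips the second.
update_weights(first, query, clicked=True)
update_weights(second, query, clicked=False)

ranking = sorted([("first", first), ("second", second)],
                 key=lambda item: score(item[1], query), reverse=True)
print([name for name, _ in ranking])  # ['first', 'second']
```

With `step=0.1` the first image's Lemon weight goes from 0.9 to 0.91 rather than the slide's 0.95; the step size is an assumption. As on the slide, Apple is untouched because it was not part of the query, and the second image's legitimate Summer weight drops too, which later searches would have to correct.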
  • 19. Ranking of results • Images with • Wrong keywords are ranked very low over time • Good keywords are ranked higher • Great images are ranked higher overall • New excellent images can rank over old mediocre images
  • 20. Recommendations • “You are currently looking at image X, and you might be interested in image Y, Z, and W”
  • 21. Recommendations • What images are connected? • Let’s track our users to find out
  • 22. Recommendations • [Figure: media #2364906, #2964241, #2684393]
  • 23. Recommendations
  • 24. Recommendations
  • 25. Recommendations • Enter Markov chains • Using a Markov chain of order 1, the probability of going from media X to media Y is • the number of times the path X → Y was followed, divided by • the total number of times any path out of media X was followed • $P(Y \mid X) = N(X \to Y) \,/\, \sum_{Z} N(X \to Z)$
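The transition probabilities on slide 25 can be estimated from logged browsing sessions. A minimal sketch, assuming each session is the ordered list of media IDs a user viewed; `transition_probabilities` and `recommend` are hypothetical names, and the sessions in the example are made up.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """First-order Markov chain estimated from browsing sessions.

    P(X -> Y) = (times the path X -> Y was followed)
                / (times any path out of X was followed)
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Consecutive views in a session are treated as one followed path.
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, outgoing in counts.items():
        total = sum(outgoing.values())
        probabilities[current] = {m: n / total for m, n in outgoing.items()}
    return probabilities


def recommend(probabilities, media_id, k=3):
    """The k most likely next media after `media_id`."""
    outgoing = probabilities.get(media_id, {})
    return sorted(outgoing, key=outgoing.get, reverse=True)[:k]


# Made-up sessions using the media names from slide 20.
sessions = [["X", "Y", "Z"], ["X", "Y"], ["X", "W"], ["Y", "Z"]]
probabilities = transition_probabilities(sessions)
print(probabilities["X"])  # {'Y': 0.6666666666666666, 'W': 0.3333333333333333}
print(recommend(probabilities, "X"))  # ['Y', 'W']
```

Order 1 means only the current media matters; the path that led to it is ignored, which keeps the counts table to one row per media.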
  • 26. Why Colourbox? • We are • small - 15 people no more than 15 steps apart • flat - no long chains of command • flexible - we can act on a good idea immediately • a 2011 Gazelle - we are still hiring while others are firing • We have • Relaxed atmosphere • Flexible work hours • Candy cabinet, world-class coffee machine, and stunning view :-) • etc...
  • 27. Why Colourbox? • You get • to work on fun problems • great colleagues • an international outlook • to serve customers who are excited about us • to be part of a company which aims to be #1 • New projects • SkyFish - Company Colourbox • Zulubox - to articles what Colourbox is to images
  • 28. We are hiring! • Software Developer – front-end systems • Focus on HTML5, JS, PHP, SQL, etc. • Can implement a pixel-perfect design from a PSD • Can implement scalable code that also performs well when it is executed 50 times per second • You know your way around Linux • Start August 1st • We are constructing a new office building • Unsolicited applications are always welcome
  • 29. Thank you! Questions?