Did you know that eHarmony is responsible for 5% of all new US marriages, and that more than 600,000 people have already married partners they met on eHarmony?
eHarmony was founded to give people a better chance at finding happy, passionate, and fulfilling relationships. In this talk I describe how we go about creating compatible matches, and how we leverage Big Data technologies to accomplish that goal.
Specifically, I discuss how we take the billion+ potential matches that we find through MongoDB, store them in a Voldemort NoSQL datastore, and then run multiple Hadoop jobs to produce a filtered list based on machine-learned models.
Our Hadoop clusters are in-house, high-density, low-power SeaMicro installations, and we use Spring Batch and Spring Data Hadoop to orchestrate the Hadoop jobs.
31. DATA NEEDS FOR AFFINITY
50M+ REGISTERED USERS
103 ATTRIBUTES
250M+ PHOTOS
107 DAILY MATCHES
4B+ QUESTIONNAIRES ANSWERED
32. COMMUNICATION AGGREGATES
EVENT LISTENER SERVICE
USER ACTIVITY SERVICE
10K EVENTS PER SECOND
HOURLY, DAILY
~5MS RESPONSE TIMES
USER SERVICE
TOTAL
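
The slide above names hourly, daily, and total aggregates built from a stream of user events. A minimal sketch of that kind of bucketed counting follows, in plain Python. The class name, method names, and bucket key format are all assumptions made for illustration; the slide does not say how the User Activity Service stores or computes its aggregates.

```python
from collections import defaultdict
from datetime import datetime, timezone


class ActivityAggregates:
    """Per-user event counts bucketed by hour, by day, and in total.

    Hypothetical illustration only; not the actual service design.
    """

    def __init__(self):
        self.hourly = defaultdict(int)  # (user_id, "YYYY-MM-DDTHH") -> count
        self.daily = defaultdict(int)   # (user_id, "YYYY-MM-DD") -> count
        self.total = defaultdict(int)   # user_id -> count

    def record(self, user_id, event_time):
        """Fold one event into all three aggregates (event_time must be timezone-aware)."""
        utc = event_time.astimezone(timezone.utc)
        self.hourly[(user_id, utc.strftime("%Y-%m-%dT%H"))] += 1
        self.daily[(user_id, utc.strftime("%Y-%m-%d"))] += 1
        self.total[user_id] += 1

    def summary(self, user_id, at):
        """Return the counts for the hour and day containing `at`, plus the running total."""
        utc = at.astimezone(timezone.utc)
        return {
            "hourly": self.hourly.get((user_id, utc.strftime("%Y-%m-%dT%H")), 0),
            "daily": self.daily.get((user_id, utc.strftime("%Y-%m-%d")), 0),
            "total": self.total.get(user_id, 0),
        }
```

Because each event only increments three counters, reads stay cheap lookups, which is one plausible way to keep response times low at high event rates.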
33. OFFLINE BATCH JOBS
USER SERVICE
1+GB Compressed Protocol Buffers
PAIRINGS SERVICE
750M Compressed Protocol Buffers
MAP-SIDE JOINS
(TB) SCORING
BILLION+ POTENTIAL MATCHES
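
The slide above mentions map-side joins followed by scoring. As a rough sketch of the idea, the smaller dataset (here, user records) is loaded into memory on each mapper, and every pairing is joined against it as it streams past, so no shuffle or reduce step is needed for the join. The code below is plain Python rather than Hadoop code; the field names, the `score` function, and the threshold are hypothetical placeholders, not eHarmony's models or data layout.

```python
def load_users(user_records):
    """Build the in-memory lookup table (the small side of the join)."""
    return {u["user_id"]: u for u in user_records}


def score(user_a, user_b):
    """Placeholder score: fraction of shared attributes with equal values (hypothetical)."""
    keys = set(user_a["attrs"]) & set(user_b["attrs"])
    if not keys:
        return 0.0
    same = sum(1 for k in keys if user_a["attrs"][k] == user_b["attrs"][k])
    return same / len(keys)


def map_pairings(pairings, users, threshold):
    """Mapper: join each pairing with both user records, score it, keep it if it passes."""
    for a_id, b_id in pairings:
        a, b = users.get(a_id), users.get(b_id)
        if a is None or b is None:
            continue  # inner-join semantics: drop pairings with a missing user
        s = score(a, b)
        if s >= threshold:
            yield a_id, b_id, s
```

For example, `list(map_pairings([(1, 2)], load_users(records), 0.75))` would stream one pairing through the join and return it only if its score reaches the threshold.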
34. AMAZON EMR
AWS DIRECT CONNECT
IN-HOUSE SEAMICRO
256 NODES
50TB STORAGE
LOW OPERATIONAL COST
LOW POWER CONSUMPTION
PREDICTABLE COMPLETION TIMES
DATA RETRIEVAL LATENCY