Wednesday, June 12, 13
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/eharmony-hadoop
Presented at QCon New York
www.qconnewyork.com
Data Science of Love
Vaclav Petricek @petricek
The eHarmony Difference › Who we are
~15% Customer Care
~45% Tech
~10% Marketing
The eHarmony Difference › Compatibility Matching System®
1. Compatibility Matching
2. Affinity Matching
3. Match Distribution
150 questions
Personality
Values
Attributes
Beliefs
Compatibility Matching › Obstreperousness
Compatibility Matching › Romantic
CMP (CMP Makes Pairings)
Compatibility Models
The eHarmony Difference › Compatibility Matching System®
Layers on Top of
Compatibility Matching
Affinity Matching ›
61 21
3000
Affinity Matching › Distance
Prob( )
Affinity Matching › Height difference
Prob( )
4-8 in
cm
Affinity Matching › “Attractiveness”
Prob( )
Affinity Matching › Zoom level
Affinity Matching › Food preference
 25%   -1%  -24%   20%   13%
  9%   -5%  -27%    7%    0%
-12%  -21%  -42%  -19%  -23%
 19%    0%  -28%   28%   10%
  9%  -11%  -35%   11%   44%
Affinity Matching ›
~40M registered users
~10^7 matches per day
~10^3 attributes
...
...
Prob( | data)
?
~10^8 daily
Prob( | features)
Unsupervised features (LDA, classifiers)
Constructed features
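The slide above reduces affinity matching to estimating Prob( | features) for a pairing. A minimal sketch of that idea follows, assuming a logistic model over a few constructed features. The feature names, the weights and the choice of a logistic model are illustrative assumptions and are not taken from the talk.

```python
import math

# Hypothetical weights, for illustration only; the talk gives no values.
WEIGHTS = {"bias": -2.0, "same_religion": 0.8, "distance_km": -0.01}

def pair_features(user, cand, distance_km):
    """Construct features for one (user, candidate) pairing."""
    return {
        "bias": 1.0,
        "same_religion": 1.0 if user["religion"] == cand["religion"] else 0.0,
        "distance_km": float(distance_km),
    }

def predict_proba(features, weights):
    """Logistic model: sigmoid of the weighted feature sum."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

With these made-up weights a nearby pairing scores higher than a distant one, which is the qualitative shape the Distance slide suggests.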
1TB RAM
Maestro: Data
Protocol Buffers
distcp
Modeling: Maestro
UserMatchCommunication
feature expansion
Sparse ML format
models
Modeling: Model parametrizations
Model parameters
features
weights
tree splits
Calibration Spline
DISTANCE:534
DSL
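The "Calibration Spline" item suggests that raw model scores are mapped to probabilities by a fitted curve. A minimal sketch of that step, assuming a piecewise-linear (degree 1) spline through hand-picked knots; the slide does not say which spline family is used, and the knot values below are hypothetical.

```python
from bisect import bisect_right

def calibrate(score, knots):
    """Map a raw score to a probability by linear interpolation.

    `knots` is a list of (raw_score, probability) pairs sorted by raw_score.
    Scores outside the knot range are clamped to the end values.
    """
    xs = [x for x, _ in knots]
    ys = [y for _, y in knots]
    if score <= xs[0]:
        return ys[0]
    if score >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, score)          # first knot strictly right of score
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)

# Hypothetical knots, for illustration only.
KNOTS = [(-2.0, 0.01), (0.0, 0.05), (2.0, 0.30)]
```

Keeping the knots as plain data fits the slide's framing of calibration as one more model parameter next to weights and tree splits.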
Modeling: Scala DSL
"same_religion":"${user.profile.religion}=={cand.profile.religion}"
"cmp_drinking":"cmp(${user.profile.drinking},{cand.profile.drinking})"
<
"strict_distance_u":"${user.profile.accepted_distance}<={pairing.distance}"
60miles
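The feature definitions above pair a feature name with an expression over user, candidate and pairing fields. A rough sketch of how such expressions could be evaluated follows, written in Python rather than the Scala the talk used. It handles only the three expression shapes shown on the slide; the meaning of `cmp` as a three-way comparison and the sample record layout are assumptions.

```python
import re

# Matches both ${a.b.c} and {a.b.c}, as both forms appear on the slide.
PLACEHOLDER = re.compile(r"\$?\{([a-z_.]+)\}")

def lookup(path, context):
    """Resolve a dotted path such as 'user.profile.religion' in nested dicts."""
    value = context
    for key in path.split("."):
        value = value[key]
    return value

def evaluate(expression, context):
    """Evaluate one feature expression against a pairing context."""
    values = [lookup(path, context) for path in PLACEHOLDER.findall(expression)]
    skeleton = PLACEHOLDER.sub("_", expression)
    if len(values) != 2:
        raise ValueError(f"unsupported expression: {expression}")
    left, right = values
    if skeleton == "_==_":
        return left == right
    if skeleton == "_<=_":
        return left <= right
    if skeleton == "cmp(_,_)":
        # Assumed semantics: -1, 0 or 1, like a three-way comparison.
        return (left > right) - (left < right)
    raise ValueError(f"unsupported expression: {expression}")
```

Declaring features as data like this lets the same definitions drive both offline training and the online scorer, which is presumably the point of having a DSL.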
Production: Spring Conductor
750M Compressed Protocol Buffers
Map-side joins (TB)
Matching User Service
Pairings Browser Service
1+G Compressed Protocol Buffers
Scorer
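A map-side join avoids the shuffle phase by having each mapper look up the other side of the join locally instead of sending both datasets to reducers. The sketch below shows the simplest variant, where the smaller side fits in memory as a lookup table. The slide mentions terabyte-scale joins, so the production setup likely differs (for example, both inputs partitioned and sorted the same way); the record fields here are hypothetical.

```python
def map_side_join(large_records, small_table, key):
    """Join each streamed record against an in-memory lookup table.

    Records whose key is missing from the table are dropped (inner join).
    """
    for record in large_records:
        match = small_table.get(record[key])
        if match is not None:
            yield {**record, **match}

# Hypothetical data: pairings streamed from disk, profiles held in memory.
profiles = {1: {"religion": "a"}, 2: {"religion": "b"}}
pairings = [{"user_id": 1, "cand_id": 2}, {"user_id": 3, "cand_id": 1}]
joined = list(map_side_join(pairings, profiles, "user_id"))
```

Here `joined` holds one enriched record, because user 3 has no profile in the lookup table.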
Production: FeatureX (expensive features)
?
FeatureX
LSH
NLP
Voldemort backed Service
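The slide lists LSH among the expensive features but does not say which scheme is used. As one possible illustration, the sketch below uses MinHash signatures with banding, a common locality-sensitive hashing approach for set similarity. The token sets, the number of hash functions and the band count are all assumptions.

```python
import hashlib

def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(tokens, num_hashes=64):
    """MinHash signature of a non-empty token set."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def lsh_buckets(signature, bands=16):
    """Split a signature into bands; items sharing any bucket are candidates."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]
```

Precomputed signatures or bucket keys are small enough to keep in a key-value store, which would fit the "Voldemort backed Service" box, though the talk does not confirm that layout.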
Production: User Activity Service
User Activity Service
10K events/s
Matching User Service
~5ms response
?
Event Listener
eHarmony & OpenSource
github.com/petricek/datatools
github.com/eHarmony/seeking
github.com/eHarmony/hive
springsource.org/spring-data/hadoop
github.com/JohnLangford/vowpal_wabbit
The eHarmony Difference › Compatibility Matching System®
1. Compatibility Matching
2. Affinity Matching
3. Match Distribution
Delivering the right matches at the right time to as many people as possible across the entire network.
Match Distribution › Graph optimization
Prob( | data)
Resulting Customer Experience › Guided Communication
Resulting Customer Experience › Success!
eHarmony Results › The eHarmony Impact
eHarmony Members Married Every Day
2005: 90
2007: 236
2009: 542
Proceedings of National Academy of Sciences
Press coverage
eHarmony Results › The eHarmony Impact
Since 2005, about 1/3 of couples who have married in the US have met online (35%)
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
Rates of breakup or divorce
[Chart: y-axis from 0% to 8.0%; x-axis labels: All, Online, Offline]
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
eHarmony Results › The eHarmony Impact
The largest number of marriages surveyed who met via online dating had met on eHarmony (25%)
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
Rates of breakup or divorce
[Chart: y-axis from 0% to 8.0%; x-axis labels: eHarmony All Other Online Offline]
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
@petricek
linkedin.com/in/petricek
bit.ly/jobateharmony