The Evolution of Hadoop and AWS at eHarmony

About eHarmony
  • Launched in 2000
  • Goal to create compatible matches that lead to happy, long-term relationships
  • Compatibility models based on decades of research and clinical experience in psychology
  • Available in United States, Canada, Australia and United Kingdom

Over 20 million users + 320-item questionnaire answered by each user = BIG DATA
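
To put the "BIG DATA" claim in numbers, here is a rough back-of-the-envelope calculation from the two figures on the slide. The answer count follows directly from the slide; the pair count is only an illustrative upper bound and assumes (this is not stated in the deck) that every unordered pair of users is a candidate match.

```python
# Figures taken from the slide (treated as lower bounds).
USERS = 20_000_000          # "over 20 million users"
QUESTIONS_PER_USER = 320    # 320-item questionnaire

# Total questionnaire answers stored across all users.
total_answers = USERS * QUESTIONS_PER_USER

# Assumption, not from the slides: every unordered pair of users is a
# candidate match. Real matching would restrict this, so read it as an
# upper bound on the number of comparisons.
candidate_pairs = USERS * (USERS - 1) // 2

print(f"{total_answers:,} questionnaire answers")
print(f"{candidate_pairs:,} candidate pairs (upper bound)")
```

Even under generous filtering, the pair count grows quadratically with the user base, which is consistent with the deck's emphasis on infrastructure that scales with growth.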

Continuous Improvement on Match Quality
  Requires infrastructure that supports:
  • More user data
  • More complex models
  • Increased growth

Why Not a Traditional Solution?
  • Scaling vertically
  • Complex to build
  • Scaling a constant challenge
  • Long engineering effort
  • Expensive

How Hadoop Solved Our Problem
  • Cuts BIG DATA into small data
  • Horizontal scaling platform
  • Fault tolerance
  • Commodity boxes
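
The "cuts BIG DATA into small data" idea can be sketched in a few lines of plain Python. This is not the Hadoop API and not eHarmony's code; it only illustrates the map/reduce pattern on one machine. The record layout `(user_id, question_id, answer)` and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def split(records, chunk_size):
    """Cut the full dataset into small chunks that can be processed independently."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count (question, answer) occurrences within one chunk."""
    counts = Counter()
    for _user_id, question_id, answer in chunk:
        counts[(question_id, answer)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Hypothetical sample records: (user_id, question_id, answer).
records = [
    (1, "q1", "yes"), (1, "q2", "no"),
    (2, "q1", "yes"), (2, "q2", "yes"),
    (3, "q1", "no"), (3, "q2", "no"),
]

chunks = split(records, chunk_size=2)
# In a real cluster each chunk would be mapped on a different node.
partials = [map_chunk(chunk) for chunk in chunks]
totals = reduce(reduce_counts, partials, Counter())
print(totals)
```

Because each chunk is mapped without reference to the others, adding machines lets more chunks run in parallel, which is the horizontal scaling the slide refers to. A failed chunk can also be recomputed on its own, which is the basis for the fault tolerance bullet.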

How Amazon Solved Our Problem
  • Amazon EC2 & S3 provided an attractive approach
  • Hosted Hadoop framework
  • Cost effective
  • Ability to scale on demand
  • SOLD!

AWS Pricing Model
  • Pay-per-use elastic model
  • Choice of server type
  • Lets you get up and running quickly and cheaply
  • Highly cost-effective alternative to doing it in-house

Performance by Instance Type
  [Chart not reproduced; axis label: Minutes]

AWS Elastic MapReduce
  • EC2 cluster managed for you behind the scenes
  • Only have to worry about MapReduce
  • Read/write data directly from S3 or HDFS
  • Faster turn-around time to production

Elastic MapReduce for eHarmony
  • Vastly simplified our Hadoop processing
  • No need to explicitly allocate, start and shut down EC2 instances
  • No need to explicitly manipulate the master node
  • Cluster control and job management reduced to a single local command

Architecture
  [Diagram not reproduced. Labels: Data Warehouse; Amazon Cloud (S3, Elastic MapReduce); user data dump; upload; input; Hadoop jobs; output; download; update; key-value store]

Challenges
  • The overall process depends on the success of each stage
  • Assume every stage is unreliable
  • Need to build retry/abort logic to handle failures
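
A minimal sketch of what retry/abort logic for a staged process could look like follows. It is not eHarmony's implementation. The stage names are loosely borrowed from the labels on the architecture slide and their order is an assumption; `run_with_retry`, `run_pipeline`, and `PipelineAborted` are hypothetical names, and the stage bodies are stand-ins that do no real work.

```python
import time


class PipelineAborted(Exception):
    """Raised when a stage keeps failing and the whole run is abandoned."""


def run_with_retry(name, stage, max_attempts=3, delay_seconds=0.0):
    """Run one stage, retrying on any exception; abort after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return stage()
        except Exception as exc:
            print(f"{name}: attempt {attempt}/{max_attempts} failed: {exc}")
            if attempt == max_attempts:
                raise PipelineAborted(
                    f"{name} failed after {max_attempts} attempts"
                ) from exc
            time.sleep(delay_seconds)


def run_pipeline(stages, **retry_options):
    """Run stages in order; a stage starts only after the previous one succeeded."""
    completed = []
    for name, stage in stages:
        run_with_retry(name, stage, **retry_options)
        completed.append(name)
    return completed


# Stand-in stage that fails once, then succeeds (simulated transient error).
attempts = {"count": 0}


def flaky_upload():
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise IOError("simulated transient failure")


# Hypothetical stage list; the lambdas are placeholders for real work.
stages = [
    ("upload to S3", flaky_upload),
    ("run Hadoop jobs", lambda: None),
    ("download output", lambda: None),
    ("update key-value store", lambda: None),
]

completed = run_pipeline(stages, max_attempts=3)
print(completed)
```

Treating every stage as unreliable means each one gets the same wrapper, and the abort path stops later stages from running against missing or partial output.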

Total Execution Time
  [Chart not reproduced]

Lessons Learned
  • EC2/S3/EMR = cost effective
  • Hadoop community support is great
  • Hadoop combined with a real-time system = tricky
  • Dev tools really easy to work with right out of the box
  • Ensuring end-to-end reliability poses the biggest challenges

Looking Ahead
  • More tools to empower business intelligence beyond engineering
  • HIVE
  • Helps empower engineers & non-engineers to create analytic jobs on the fly
  • Tools for integrating to and from a traditional database/data warehouse to a Hadoop cluster

User Satisfaction

AWS Customer Presentation - eHarmony
