SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Subtitle




eHarmony in Cloud      Brian Ko
eHarmony
• Online subscription-based matchmaking
  service
• Available in United States, Canada,
  Australia and United Kingdom.
• On average, 236 members in US marry
  every day.
• More than 20 million registered users.


                     1
Why Cloud?
• Problem exceeds the limits of the data
  center and data warehouse environment.
• Leverage EC2 and Hadoop to scale data




                    2
Finding match
• Model Creation




                   3
Find matching
• Matching




                   4
Find Matching
• Predicative Model Scores




                     5
Requirement
• All the matches, scores, and user
  information should be archived daily
• Ready for 10X growth
• Possible O(n2) problem
• Need to support set of models becoming
  more complex



                     6
Challenge
• Current architecture is multi-tiered with a
  relational back-end
• Scoring is DB join intensive
• Data need constant archiving
  – Matches, match scores, user attributes at
    time of match creation
  – Model validation is done at a later time
    across many days
• Need a non-DB solution

                        7
Solution
• Open Source Java implementation of
  Google’s MapReduce framework

  – Distributes work across vast amounts of data
  – Hadoop Distributed File System (HDFS)
    provides reliability through replication
  – Automatic re-execution on failure/distribution
  – Scale horizontally on commodity hardware


                         8
Slide 9
• Simple Storage Service (S3) provides
  cheap unlimited storage.
• Elastic Cloud Computing (EC2) enables
  horizontal scaling by adding servers on
  demand.




                      9
MapReduce
• A large server farm can use MapReduce to
  process huge dataset.
• Map step
  – Master node takes the input
  – Chops it up into smaller sub-problems
  – Distributes those to worker nodes.
• Reduce step
  – Master node takes the answers to all the sub-
    problems
  – Combines them in a way to get the output


                            10
Why Hadoop
• Mapper and Reducer are written by you
• Hadoop provides
  – Parallelization
  – Shuffle and sort




                       11
Actual Process
• Upload to S3 and start EC2 Cluster




                     13
Actual Process
• Process and archive




                        14
Amazon Elastic MapReduce
• It is a web service
• EC2 cluster is managed for you behind the
  scenes
• Starts Hadoop implementation of the
  MapReduce framework on Amazon EC2
• Each step can read and write data directly
  from and to S3
• Based on Hadoop 0.18.3

                      15
Elastic MapReduce
• No need to explicitly allocate, start and
  shutdown EC2 instances
• Individual jobs were managed by a remote
  script running on master node (no longer
  required)
• Jobs are arranged into a job flow, created
  with a single command
• Status of a job flow and all its steps are
  accessible by a REST service

                      16
Before Elastic Map Reduce
•   Allocate/Verify cluster
•   Push application to cluster
•   Run a control script on the master
•   Kick off each job step on the master
•   Create and detect a job completion token
•   Shut the cluster down



                       17
After Elastic MapReduce
• With Elastic MapReduce we can do all this
  with a single local command
• Uses jar and conf files stored on S3
• Various monitoring tools for EC2 and S3
  are provided




                     18
Development Environment
• Cheap to set up on Amazon
• Quick setup - Number of servers is
  controlled by a config variable
• Identical to production
• Separate development account
  recommended



                     19
Cost comparison
• Average EC2 and S3 Cost
  – Each run is 2 to 3 hours
  – $1200/month for EC2
  – $100/month for S3
• Projected in-house cost
  – $5000/month for a local cluster of 50 nodes
    running 24/7
  – A new company needs to add data center and
    operation personnel expense


                         20
Summary
• Dev tools really easy to work with and just
  work right out of the box
• Standard Hadoop AMI worked great
• Easy to write unit tests for MapReduce
• Hadoop community support is great.
• EC2/S3/EMR are cost effective
The End

5 minutes of question time
       starts now!
Questions

4 minutes left!
Questions

3 minutes left!
Questions

2 minutes left!
Questions

1 minute left!
Questions

30 seconds left!
Questions

TIME IS UP!

Weitere ähnliche Inhalte

Was ist angesagt?

Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Brandon O'Brien
 
Spark volume requirements 2018
Spark volume requirements 2018Spark volume requirements 2018
Spark volume requirements 2018Rachit Arora
 
Netflix's Could Migration
Netflix's Could MigrationNetflix's Could Migration
Netflix's Could MigrationChef
 
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScyllaDB
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScyllaDB
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkDataWorks Summit
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluDataWorks Summit
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingJen Aman
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowDatabricks
 
Scylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScyllaDB
 
Apache Cassandra in the Cloud
Apache Cassandra in the CloudApache Cassandra in the Cloud
Apache Cassandra in the CloudInstaclustr
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at AirbnbBill Liu
 
Platform as a service standard for hadoop environment
Platform as a service standard for hadoop environmentPlatform as a service standard for hadoop environment
Platform as a service standard for hadoop environmentAbhay Pai
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...HostedbyConfluent
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...Databricks
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceChin Huang
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesDatabricks
 

Was ist angesagt? (20)

Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
 
Spark volume requirements 2018
Spark volume requirements 2018Spark volume requirements 2018
Spark volume requirements 2018
 
Netflix's Could Migration
Netflix's Could MigrationNetflix's Could Migration
Netflix's Could Migration
 
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at Hulu
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development Workflow
 
Scylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and Scylla
 
Apache Cassandra in the Cloud
Apache Cassandra in the CloudApache Cassandra in the Cloud
Apache Cassandra in the Cloud
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at Airbnb
 
Platform as a service standard for hadoop environment
Platform as a service standard for hadoop environmentPlatform as a service standard for hadoop environment
Platform as a service standard for hadoop environment
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph Performance
 
REDSHIFT - Amazon
REDSHIFT - AmazonREDSHIFT - Amazon
REDSHIFT - Amazon
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 

Ähnlich wie eHarmony in the Cloud

سکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرسکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرdatastack
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreducehansen3032
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Dockerharidasnss
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Alex Clark
 
Flood modelling on the Cloud
Flood modelling on the CloudFlood modelling on the Cloud
Flood modelling on the Cloudasm100
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Zohar Elkayam
 

Ähnlich wie eHarmony in the Cloud (20)

سکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرسکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابر
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?
 
Flood modelling on the Cloud
Flood modelling on the CloudFlood modelling on the Cloud
Flood modelling on the Cloud
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Big Data training
Big Data trainingBig Data training
Big Data training
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
 

Mehr von Craig Dickson

Amazon Webservices for Java Developers - UCI Webinar
Amazon Webservices for Java Developers - UCI WebinarAmazon Webservices for Java Developers - UCI Webinar
Amazon Webservices for Java Developers - UCI WebinarCraig Dickson
 
Dead-Simple Deployment: Headache-Free Java Web Applications in the Cloud
Dead-Simple Deployment: Headache-Free Java Web Applications in the CloudDead-Simple Deployment: Headache-Free Java Web Applications in the Cloud
Dead-Simple Deployment: Headache-Free Java Web Applications in the CloudCraig Dickson
 
Rapid RESTful Web Applications with Apache Sling and Jackrabbit
Rapid RESTful Web Applications with Apache Sling and JackrabbitRapid RESTful Web Applications with Apache Sling and Jackrabbit
Rapid RESTful Web Applications with Apache Sling and JackrabbitCraig Dickson
 
Java PaaS Vendor Survey - September 2011
Java PaaS Vendor Survey - September 2011Java PaaS Vendor Survey - September 2011
Java PaaS Vendor Survey - September 2011Craig Dickson
 
JDBC Basics (In 20 Minutes Flat)
JDBC Basics (In 20 Minutes Flat)JDBC Basics (In 20 Minutes Flat)
JDBC Basics (In 20 Minutes Flat)Craig Dickson
 
How to test drive development using Linux
How to test drive development using LinuxHow to test drive development using Linux
How to test drive development using LinuxCraig Dickson
 
Google Wave Introduction
Google Wave IntroductionGoogle Wave Introduction
Google Wave IntroductionCraig Dickson
 
Adobe Flex 4 Overview
Adobe Flex 4 OverviewAdobe Flex 4 Overview
Adobe Flex 4 OverviewCraig Dickson
 
Java Persistence API (JPA) - A Brief Overview
Java Persistence API (JPA) - A Brief OverviewJava Persistence API (JPA) - A Brief Overview
Java Persistence API (JPA) - A Brief OverviewCraig Dickson
 
Fast and Free SSO: A Survey of Open-Source Solutions to Single Sign-on
Fast and Free SSO: A Survey of Open-Source Solutions to Single Sign-onFast and Free SSO: A Survey of Open-Source Solutions to Single Sign-on
Fast and Free SSO: A Survey of Open-Source Solutions to Single Sign-onCraig Dickson
 
Building Social Applications using Zembly
Building Social Applications using ZemblyBuilding Social Applications using Zembly
Building Social Applications using ZemblyCraig Dickson
 
Best Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web SitesBest Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web SitesCraig Dickson
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing IntroductionCraig Dickson
 
Performance Analysis and Monitoring with Perf4j
Performance Analysis and Monitoring with Perf4jPerformance Analysis and Monitoring with Perf4j
Performance Analysis and Monitoring with Perf4jCraig Dickson
 
JavaFX vs AJAX vs Flex
JavaFX vs AJAX vs FlexJavaFX vs AJAX vs Flex
JavaFX vs AJAX vs FlexCraig Dickson
 

Mehr von Craig Dickson (16)

Amazon Webservices for Java Developers - UCI Webinar
Amazon Webservices for Java Developers - UCI WebinarAmazon Webservices for Java Developers - UCI Webinar
Amazon Webservices for Java Developers - UCI Webinar
 
Dead-Simple Deployment: Headache-Free Java Web Applications in the Cloud
Dead-Simple Deployment: Headache-Free Java Web Applications in the CloudDead-Simple Deployment: Headache-Free Java Web Applications in the Cloud
Dead-Simple Deployment: Headache-Free Java Web Applications in the Cloud
 
Rapid RESTful Web Applications with Apache Sling and Jackrabbit
Rapid RESTful Web Applications with Apache Sling and JackrabbitRapid RESTful Web Applications with Apache Sling and Jackrabbit
Rapid RESTful Web Applications with Apache Sling and Jackrabbit
 
Java PaaS Vendor Survey - September 2011
Java PaaS Vendor Survey - September 2011Java PaaS Vendor Survey - September 2011
Java PaaS Vendor Survey - September 2011
 
JDBC Basics (In 20 Minutes Flat)
JDBC Basics (In 20 Minutes Flat)JDBC Basics (In 20 Minutes Flat)
JDBC Basics (In 20 Minutes Flat)
 
How to test drive development using Linux
How to test drive development using LinuxHow to test drive development using Linux
How to test drive development using Linux
 
Google Wave Introduction
Google Wave IntroductionGoogle Wave Introduction
Google Wave Introduction
 
Adobe Flex 4 Overview
Adobe Flex 4 OverviewAdobe Flex 4 Overview
Adobe Flex 4 Overview
 
Palm WebOS Overview
Palm WebOS OverviewPalm WebOS Overview
Palm WebOS Overview
 
Java Persistence API (JPA) - A Brief Overview
Java Persistence API (JPA) - A Brief OverviewJava Persistence API (JPA) - A Brief Overview
Java Persistence API (JPA) - A Brief Overview
 
Fast and Free SSO: A Survey of Open-Source Solutions to Single Sign-on
Fast and Free SSO: A Survey of Open-Source Solutions to Single Sign-onFast and Free SSO: A Survey of Open-Source Solutions to Single Sign-on
Fast and Free SSO: A Survey of Open-Source Solutions to Single Sign-on
 
Building Social Applications using Zembly
Building Social Applications using ZemblyBuilding Social Applications using Zembly
Building Social Applications using Zembly
 
Best Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web SitesBest Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web Sites
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing Introduction
 
Performance Analysis and Monitoring with Perf4j
Performance Analysis and Monitoring with Perf4jPerformance Analysis and Monitoring with Perf4j
Performance Analysis and Monitoring with Perf4j
 
JavaFX vs AJAX vs Flex
JavaFX vs AJAX vs FlexJavaFX vs AJAX vs Flex
JavaFX vs AJAX vs Flex
 

Kürzlich hochgeladen

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

eHarmony in the Cloud

  • 2. eHarmony • Online subscription-based matchmaking service • Available in United States, Canada, Australia and United Kingdom. • On average, 236 members in US marry every day. • More than 20 million registered users. 1
  • 3. Why Cloud? • Problem exceeds the limits of the data center and data warehouse environment. • Leverage EC2 and Hadoop to scale data 2
  • 7. Requirement • All the matches, scores, and user information should be archived daily • Ready for 10X growth • Possible O(n2) problem • Need to support set of models becoming more complex 6
  • 8. Challenge • Current architecture is multi-tiered with a relational back-end • Scoring is DB join intensive • Data need constant archiving – Matches, match scores, user attributes at time of match creation – Model validation is done at a later time across many days • Need a non-DB solution 7
  • 9. Solution • Open Source Java implementation of Google’s MapReduce framework – Distributes work across vast amounts of data – Hadoop Distributed File System (HDFS) provides reliability through replication – Automatic re-execution on failure/distribution – Scale horizontally on commodity hardware 8
  • 10. Slide 9 • Simple Storage Service (S3) provides cheap unlimited storage. • Elastic Cloud Computing (EC2) enables horizontal scaling by adding servers on demand. 9
  • 11. MapReduce • A large server farm can use MapReduce to process huge dataset. • Map step – Master node takes the input – Chops it up into smaller sub-problems – Distributes those to worker nodes. • Reduce step – Master node takes the answers to all the sub- problems – Combines them in a way to get the output 10
  • 12. Why Hadoop • Mapper and Reducer are written by you • Hadoop provides – Parallelization – Shuffle and sort 11
  • 13. Actual Process • Upload to S3 and start EC2 Cluster 13
  • 14. Actual Process • Process and archive 14
  • 15. Amazon Elastic MapReduce • It is a web service • EC2 cluster is managed for you behind the scenes • Starts Hadoop implementation of the MapReduce framework on Amazon EC2 • Each step can read and write data directly from and to S3 • Based on Hadoop 0.18.3 15
  • 16. Elastic MapReduce • No need to explicitly allocate, start and shutdown EC2 instances • Individual jobs were managed by a remote script running on master node (no longer required) • Jobs are arranged into a job flow, created with a single command • Status of a job flow and all its steps are accessible by a REST service 16
  • 17. Before Elastic Map Reduce • Allocate/Verify cluster • Push application to cluster • Run a control script on the master • Kick off each job step on the master • Create and detect a job completion token • Shut the cluster down 17
  • 18. After Elastic MapReduce • With Elastic MapReduce we can do all this with a single local command • Uses jar and conf files stored on S3 • Various monitoring tools for EC2 and S3 are provided 18
  • 19. Development Environment • Cheap to set up on Amazon • Quick setup - Number of servers is controlled by a config variable • Identical to production • Separate development account recommended 19
  • 20. Cost comparison • Average EC2 and S3 Cost – Each run is 2 to 3 hours – $1200/month for EC2 – $100/month for S3 • Projected in-house cost – $5000/month for a local cluster of 50 nodes running 24/7 – A new company needs to add data center and operation personnel expense 20
  • 21. Summary • Dev tools really easy to work with and just work right out of the box • Standard Hadoop AMI worked great • Easy to write unit tests for MapReduce • Hadoop community support is great. • EC2/S3/EMR are cost effective
  • 22. The End 5 minutes of question time starts now!