2. About eHarmony
   - Launched in 2000
   - Goal: create compatible matches that lead to happy, long-term relationships
   - Compatibility models based on decades of research and clinical experience in psychology
   - Available in the United States, Canada, Australia and the United Kingdom
4. Over 20 million users + 320-item questionnaire answered by each user = BIG DATA
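The size follows from simple multiplication. Counting one stored answer per questionnaire item per user gives a lower bound, since the slide says "over" 20 million users:

$$
20 \times 10^{6}\ \text{users} \times 320\ \text{answers per user} = 6.4 \times 10^{9}\ \text{answers}
$$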
5. Continuous Improvement on Match Quality
   Requires infrastructure that supports:
   - More user data
   - More complex models
   - Increased growth
6. Why Not a Traditional Solution?
   - Scaling vertically:
     - Complex to build
     - Scaling is a constant challenge
     - Long engineering effort
     - Expensive
7. How Hadoop Solved Our Problem
   - Cuts BIG DATA into small data
   - Horizontal scaling platform
   - Fault tolerance
   - Commodity boxes
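The idea of cutting big data into small data can be illustrated with a single-process simulation of the MapReduce model: the input is cut into splits, each split is mapped independently, intermediate pairs are grouped by key, and each group is reduced. This is a minimal sketch, not Hadoop itself; the record format `(user_id, question_id, answer_score)` and the "average score per question" job are hypothetical examples. Distribution across commodity boxes and fault tolerance are exactly what Hadoop adds on top of this model.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical record format: (user_id, question_id, answer_score).
Record = tuple[int, int, int]


def split_input(records: list[Record], split_size: int) -> list[list[Record]]:
    """Cut the full dataset into fixed-size splits, one per map task."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def map_phase(split: Iterable[Record]) -> Iterator[tuple[int, tuple[int, int]]]:
    """Emit (question_id, (score, 1)) so partial sums and counts can be combined."""
    for _user_id, question_id, score in split:
        yield question_id, (score, 1)


def shuffle(pairs: Iterable[tuple[int, tuple[int, int]]]) -> dict[int, list[tuple[int, int]]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: int, values: list[tuple[int, int]]) -> tuple[int, float]:
    """Combine all (score, count) pairs for one question into an average."""
    total = sum(score for score, _ in values)
    count = sum(n for _, n in values)
    return key, total / count


def run_job(records: list[Record], split_size: int = 2) -> dict[int, float]:
    intermediate: list[tuple[int, tuple[int, int]]] = []
    # On a real cluster each split would be processed by a different node.
    for split in split_input(records, split_size):
        intermediate.extend(map_phase(split))
    groups = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))
```

For example, `run_job([(1, 101, 5), (1, 102, 3), (2, 101, 7), (2, 102, 1), (3, 101, 6)])` returns `{101: 6.0, 102: 2.0}`. Emitting `(score, 1)` pairs rather than raw scores keeps the reduce step associative, which is what lets work be split and recombined freely.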
8. How Amazon Solved Our Problem
   - Amazon EC2 & S3 provided an attractive approach
   - Hosted Hadoop framework
   - Cost effective
   - Ability to scale on demand
   - SOLD!
9. AWS Pricing Model
   - Pay-per-use elastic model
   - Choice of server type
   - Lets you get up and running quickly and cheaply
   - Highly cost-effective alternative to doing it in-house
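The pay-per-use point can be made concrete with back-of-the-envelope arithmetic: an elastic cluster that exists only while batch jobs run is billed for far fewer instance-hours than a fleet of the same size kept running around the clock. This is a minimal sketch; the hourly rate is a hypothetical placeholder, not an actual AWS price, and real billing granularity and EMR surcharges are ignored.

```python
def pay_per_use_cost(instances: int, hours_per_run: float, runs: int, hourly_rate: float) -> float:
    """Cost of renting a cluster only while jobs run (hypothetical flat hourly rate)."""
    return instances * hours_per_run * runs * hourly_rate


def always_on_cost(instances: int, days: int, hourly_rate: float) -> float:
    """Cost of keeping a fleet of the same size running 24 hours a day."""
    return instances * 24 * days * hourly_rate


# Hypothetical scenario: one 3-hour batch run per day for 30 days on 50 instances,
# at a placeholder rate of 0.50 per instance-hour.
elastic = pay_per_use_cost(50, 3, 30, 0.50)   # 2250.0
fixed = always_on_cost(50, 30, 0.50)          # 18000.0
print(f"elastic: {elastic:.2f}, always-on: {fixed:.2f}, ratio: {elastic / fixed:.3f}")
```

Under these assumptions the elastic cluster costs 3/24 = 12.5% of the always-on fleet; the saving comes entirely from not paying for idle hours.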
11. AWS Elastic MapReduce
   - EC2 cluster managed for you behind the scenes
   - Only have to worry about MapReduce
   - Read/write data directly from S3 or HDFS
   - Faster turnaround time to production
12. Elastic MapReduce for eHarmony
   - Vastly simplified our Hadoop processing
   - No need to explicitly allocate, start and shut down EC2 instances
   - No need to explicitly manipulate the master node
   - Cluster control and job management reduced to a single local command
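The slide does not say which command eHarmony used. As a modern equivalent, a single boto3 `run_job_flow` call can create a cluster, run a Hadoop streaming step that reads from and writes to S3, and terminate the cluster when the step finishes. This is a minimal sketch under stated assumptions: the bucket paths, instance type, instance count and release label are hypothetical, and the role names are the defaults created by `aws emr create-default-roles`. The request is built as a plain dict so it can be inspected without AWS credentials; boto3 is only imported when actually submitting.

```python
from typing import Any


def build_streaming_job_request(
    name: str,
    input_uri: str,
    output_uri: str,
    mapper_uri: str,
    reducer_uri: str,
    log_uri: str,
    instance_type: str = "m5.xlarge",    # hypothetical choice
    instance_count: int = 5,             # hypothetical choice
    release_label: str = "emr-6.15.0",   # hypothetical choice
) -> dict[str, Any]:
    """Build keyword arguments for EMR's run_job_flow: one streaming step, then shut down."""
    mapper_file = mapper_uri.rsplit("/", 1)[-1]
    reducer_file = reducer_uri.rsplit("/", 1)[-1]
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Terminate once all steps finish, so no explicit shutdown is needed.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": f"{name}-streaming",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop-streaming",
                    "-files", f"{mapper_uri},{reducer_uri}",  # ship scripts from S3
                    "-mapper", f"python3 {mapper_file}",
                    "-reducer", f"python3 {reducer_file}",
                    "-input", input_uri,    # read directly from S3
                    "-output", output_uri,  # write directly to S3
                ],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def submit(request: dict[str, Any]) -> str:
    """Submit the request and return the new cluster's ID (requires boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    response = boto3.client("emr").run_job_flow(**request)
    return response["JobFlowId"]
```

Usage would look like `submit(build_streaming_job_request("match-scores", "s3://example-bucket/input/", "s3://example-bucket/output/", "s3://example-bucket/code/mapper.py", "s3://example-bucket/code/reducer.py", "s3://example-bucket/logs/"))`, where every `example-bucket` path is a placeholder.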
13. Architecture (diagram)
   - Data Warehouse → user data dump → upload to S3 (Amazon Cloud)
   - S3 → input → Hadoop jobs on Elastic MapReduce → output
   - Output → download → update key-value store (Data Warehouse side)
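The last stage of this flow, updating the key-value store from downloaded job output, can be sketched as follows. It assumes the output directory has already been downloaded from S3 to local disk, that the job used Hadoop's default text output (one `key<TAB>value` line per record in `part-*` files), and that any mutable mapping can stand in for the real key-value store, whose API the slide does not name. The check for Hadoop's `_SUCCESS` marker, which the default output committer writes when a job completes, keeps partial output from ever reaching the store.

```python
from pathlib import Path
from typing import MutableMapping


def load_job_output(output_dir: Path, store: MutableMapping[str, str]) -> int:
    """Upsert key/value pairs from a finished job's part-* files into `store`.

    Returns the number of records loaded.
    """
    # Refuse to load output from a job that did not complete successfully.
    if not (output_dir / "_SUCCESS").exists():
        raise RuntimeError(f"no _SUCCESS marker in {output_dir}; job output incomplete")
    loaded = 0
    for part in sorted(output_dir.glob("part-*")):
        with part.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if not line:
                    continue
                # Streaming convention: the key is everything before the first tab;
                # a line without a tab is treated as a key with an empty value.
                key, _sep, value = line.partition("\t")
                store[key] = value
                loaded += 1
    return loaded
```

Because each key is simply overwritten, re-running the load after a failure gives the same final store, which matters once every stage is assumed to be unreliable.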
14. Challenges
   - The overall process depends on the success of each stage
   - Assume every stage is unreliable
   - Need to build retry/abort logic to handle failures
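Retry/abort logic for a staged pipeline can be sketched as a runner that executes stages in order, retries a failing stage a bounded number of times, and aborts the whole run if a stage never succeeds, so later stages never see bad or missing input. This is a minimal sketch: the stage names and callables are hypothetical, and the exponential backoff between attempts is an added design choice; the slide only calls for retry/abort.

```python
import time
from typing import Callable, Sequence


class PipelineAborted(Exception):
    """Raised when a stage still fails after all retry attempts."""


def run_pipeline(
    stages: Sequence[tuple[str, Callable[[], None]]],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Run stages in order, retrying each with exponential backoff; abort on exhaustion."""
    completed: list[str] = []
    for name, stage in stages:
        for attempt in range(1, max_attempts + 1):
            try:
                stage()
                completed.append(name)
                break
            except Exception as exc:  # assume any stage can fail in any way
                if attempt == max_attempts:
                    raise PipelineAborted(
                        f"stage {name!r} failed after {attempt} attempts; completed: {completed}"
                    ) from exc
                # Wait 1x, 2x, 4x, ... base_delay_s before the next attempt.
                sleep(base_delay_s * 2 ** (attempt - 1))
    return completed
```

A run might look like `run_pipeline([("dump", dump_users), ("upload", upload_to_s3), ("run_jobs", run_emr_jobs), ("download", download_output), ("update_store", update_kv_store)])`, where every function name is a hypothetical placeholder for one stage of the architecture. Retrying a stage is only safe if that stage is idempotent, so each stage should overwrite its output rather than append to it.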
16. Lessons Learned
   - EC2/S3/EMR = cost effective
   - Hadoop community support is great
   - Hadoop combined w/ real-time system = tricky
   - Dev tools are easy to use right out of the box
   - Ensuring end-to-end reliability poses the biggest challenge
17. Looking Ahead
   - More tools to empower business intelligence beyond engineering
   - Hive: helps engineers & non-engineers create analytic jobs on the fly
   - Tools for moving data to and from a traditional database/data warehouse and a Hadoop cluster