Patchwork Data at Etsy
Matt Walker
Etsy
[Timeline 2005 to 2013: June]
What happened?
We don’t like to talk about it
Okay, we do

•   http://codeascraft.etsy.com

•   https://www.etsy.com/codeascraft/talks



•   http://kongscreenprinting.com
Catch Phrases

•   Continuous deployment

•   Blameless postmortems

•   Measure everything

•   Continuous experimentation
Metrics-Driven Development


•   Ganglia

•   StatsD/Graphite

•   Splunk
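
As a rough illustration of "measure everything" with StatsD: the sketch below assumes the plain-text StatsD line format (`name:value|type`, sent over UDP). The metric names, host, and port are hypothetical, not taken from Etsy's setup.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build one StatsD line, e.g. 'search.queries:1|c'."""
    line = f"{name}:{value}|{metric_type}"
    # A sample rate below 1 tells the server to scale the value back up.
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP, so a lost packet never blocks a request."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical metric names: one counter, one timer in milliseconds.
    print(format_metric("search.queries", 1, "c"))
    print(format_metric("search.latency", 42, "ms"))
```

Because emitting a metric is a single cheap call, instrumenting a new code path costs almost nothing, which is what makes the habit stick.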
Scaling a Traditional RDBMS


•   Sharded MySQL

•   memcached

•   Object-relational mapping in PHP
[Timeline 2005 to 2013: December]
Adtuitive


•   Online advertising network

•   Match forum posts with rich product advertisements

•   Unafraid of scaling across Etsy sellers
Adtuitive


•   Amazon Web Services

•   JRuby

•   Rails
LAMP Stack for Big Data
•   HDFS                               •   Pig

•   MapReduce                          •   Oozie

•   HBase                              •   Avro

•   Hive                               •   Zookeeper

•   Flume

•   JDBC/ODBC

•   Hue

Source: http://gigaom.com/2010/08/01/meet-big-data-equivalent-of-the-lamp-stack/
LAMP Stack for Big Data
•   HDFS → S3                       •   Pig → Cascading

•   MapReduce (Elastic)             •   Oozie

•   HBase                           •   Avro → TupleSerialization

•   Hive                            •   Zookeeper

•   Flume

•   JDBC/ODBC

•   Hue
Powered by MapReduce

•   ETL

•   Analytics

•   A/B testing

•   Recommenders

•   Search
Applications
•   Log ETL                          •   A/B Analyzer

•   Database snapshotter             •   Catapult

•   TasteTest                        •   Distributed search indexing

•   Facebook Gift Recommender        •   Fast Game (search index)

•   Complementary/similar listings   •   Search autosuggest

•   Funnel Cake                      •   SearchAds

•   Feature Funnel                   •   SCRAM ETL (fraud detection)
Catapult


•   End-to-end success story

•   Extremely valuable for a web shop
Relevancy Thursdays
[Timeline 2005 to 2013: January]
Relevancy Thursdays


•   Switch default sort order to relevance

•   Each Thursday in January
Relevancy Thursdays


•   Default search order was recency

•   Relisting was our equivalent of advertising

•   $0.20 updated your listing’s timestamp
Relevancy Thursdays


•   Recency was meant to support “freshness” in search results

•   Search originated as PostgreSQL query

•   Converted to Solr to scale
What happens if we switch to relevance?
Relevancy Thursdays


•   No A/B testing framework

•   No event logs

•   Limping along with Google Analytics
First Log Analysis
[Timeline 2005 to 2013: February]
First Log Analysis


•   Raw web access logs

•   URL- and ref tag-based

•   Regex parser
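
A minimal sketch of what a regex-based pass over raw access logs can look like. It assumes Apache-style common/combined log lines and a `ref` query parameter carrying the ref tag; the sample line, field names, and URL are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

# host ident user [time] "METHOD url PROTOCOL" status size ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of the fields we care about, or None if the line is malformed."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    parts = urlsplit(match.group("url"))
    query = parse_qs(parts.query)
    return {
        "host": match.group("host"),
        "time": match.group("time"),
        "path": parts.path,
        # Assumption: the ref tag travels as a 'ref' query parameter.
        "ref": query.get("ref", [None])[0],
        "status": int(match.group("status")),
    }


if __name__ == "__main__":
    sample = ('10.0.0.1 - - [03/Feb/2011:10:00:00 +0000] '
              '"GET /listing/123?ref=search_results HTTP/1.1" 200 5120')
    print(parse_line(sample))
```

Returning `None` for malformed lines keeps one bad record from failing a whole batch; counting those rejects is a cheap data-quality check.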
Heyday of Tooling
•   A/B framework

•   Front end event logger

•   Database snapshotter

•   Barnum and Bailey

•   Custom operator library

•   Loaders
LAMP Stack for Big Data
•   HDFS → S3                       •   Pig → Cascading

•   MapReduce (Elastic)             •   Oozie → Barnum

•   HBase                           •   Avro → TupleSerialization

•   Hive                            •   Zookeeper

•   Flume → Akamai

•   JDBC/ODBC → snapshotter/loaders

•   Hue
A/B Framework


•   Ramp-ups + A/B testing

•   Feature flag development
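
One way ramp-ups and A/B assignment can share a single mechanism is deterministic hashing. The sketch below is an assumption about how such a framework might work, not Etsy's implementation; the test name, salts, and bucket resolution are hypothetical.

```python
import hashlib

BUCKETS = 10000  # resolution of 0.01%


def _bucket(salt, test_name, user_id):
    """Map (salt, test, user) to a stable integer in [0, BUCKETS)."""
    key = f"{salt}:{test_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def assign(test_name, user_id, enabled_pct):
    """Return 'off', 'control', or 'treatment' for a ramp-up of enabled_pct (0-100)."""
    # Enrollment and variant use different salts, so raising enabled_pct
    # adds users without moving anyone between control and treatment.
    if _bucket("ramp", test_name, user_id) >= round(enabled_pct * 100):
        return "off"
    if _bucket("variant", test_name, user_id) < BUCKETS // 2:
        return "treatment"
    return "control"


if __name__ == "__main__":
    for pct in (0, 10, 50, 100):
        print(pct, assign("new_search_ranking", 42, pct))
```

A feature flag is then the degenerate case: 0% while the code is dark, 100% once it has launched.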
Self-service analytics for any A/B test on the site
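
For a sense of what a self-service tool has to compute for each test, here is a sketch of a two-proportion z-test on conversion counts. The choice of statistic is an assumption made for illustration; the slides do not say which tests the analyzer runs, and the counts below are made up.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variance: all visits converted or none did")
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control converts 100/1000, treatment 150/1000.
    print(two_proportion_z(100, 1000, 150, 1000))
```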
A/B Framework
[Timeline 2005 to 2013: June]
A/B Analyzer
[Timeline 2005 to 2013: November]
Why did it take so long?


•   Non-web developers learning the PHP stack

•   Failed experiments with “easier to use” MapReduce tools

•   Realizing self-service analytics was what Etsy needed
Catapult
[Timeline 2005 to 2013: February]
Catapult


•   A/B Analyzer + Launch Calendar

•   Full product lifecycle
LAMP Stack for Big Data
•   HDFS                            •   Pig → Cascading

•   MapReduce                       •   Oozie

•   HBase                           •   Avro → TupleSerialization

•   Hive → Vertica                  •   Zookeeper

•   Flume → logrotate

•   JDBC/ODBC → snapshotter/loaders

•   Hue
Computation Models


•   Batch

•   Interactive

•   Streaming
Batch
Cascading
RDBMS / Cascading
SQL                        cascading.jruby
Query Planner/Optimizer    Cascading
Execution Engine           MapReduce
Storage                    HDFS
cascading.jruby
cascading.jruby

•   Productivity: no compile

•   Reuse: factor out structure

•   Efficiency: no JRuby runtime

•   Optimization: move aggregations map-side
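
The idea behind "move aggregations map-side" can be shown without any Hadoop machinery. This is a plain-Python sketch of the technique (partial aggregation inside each mapper before the shuffle), not the cascading.jruby API; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def map_naive(records):
    """Emit one (key, 1) pair per record; every pair crosses the shuffle."""
    return [(key, 1) for key in records]


def map_with_partial_aggregation(records):
    """Pre-sum inside the mapper so only one pair per distinct key is shuffled."""
    return list(Counter(records).items())


def reduce_counts(mapper_outputs):
    """Sum the (key, count) pairs from all mappers into final totals."""
    totals = defaultdict(int)
    for output in mapper_outputs:
        for key, count in output:
            totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    splits = [["a", "b", "a"], ["b", "b"]]  # two input splits, one per mapper
    naive = [map_naive(s) for s in splits]
    partial = [map_with_partial_aggregation(s) for s in splits]
    print(reduce_counts(naive), sum(len(o) for o in naive))
    print(reduce_counts(partial), sum(len(o) for o in partial))
```

Both paths produce the same totals; the second simply sends fewer pairs over the network, and the gap widens as keys repeat more often within a split.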
A nice constructor
cascading.jruby
Productivity

•   Job templates

•   Reloader

•   Cascading local mode

•   Sampled data
Reuse
Reuse
Field Names
Efficiency


•   Just a constructor

•   Calls into Cascading API

•   No JRuby runtime on cluster
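
"Just a constructor" means the scripting layer only builds a job description, and the cluster runs that description without the scripting runtime. The toy below illustrates the separation in Python; the `Flow` class, its methods, and the JSON plan format are hypothetical and not how cascading.jruby or Cascading represent flows.

```python
import json


class Flow:
    """Toy builder: records steps instead of executing them."""

    def __init__(self, source):
        self.steps = [{"op": "source", "path": source}]

    def filter(self, field, equals):
        self.steps.append({"op": "filter", "field": field, "equals": equals})
        return self

    def count_by(self, field):
        self.steps.append({"op": "count_by", "field": field})
        return self

    def to_plan(self):
        """Serialize the plan; this is all that leaves the submitting machine."""
        return json.dumps(self.steps)


def run_plan(plan_json, rows):
    """'Cluster side': interprets the plan and never touches the Flow class."""
    result = rows
    for step in json.loads(plan_json):
        if step["op"] == "filter":
            result = [r for r in result if r[step["field"]] == step["equals"]]
        elif step["op"] == "count_by":
            counts = {}
            for r in result:
                counts[r[step["field"]]] = counts.get(r[step["field"]], 0) + 1
            result = counts
    return result


if __name__ == "__main__":
    plan = Flow("events").filter("type", "view").count_by("page").to_plan()
    rows = [
        {"type": "view", "page": "/a"},
        {"type": "click", "page": "/a"},
        {"type": "view", "page": "/a"},
        {"type": "view", "page": "/b"},
    ]
    print(run_plan(plan, rows))
```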
Optimization
Tuple Data Model
UDFs
Scalding


•   Distributed collections

•   Function literals replace UDFs
Interactive
Vertica
Sharded MySQL


•   Borrowed from Flickr

•   Works
Thou Shalt Not Join
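
Without cross-shard joins, the application does the join itself: one query per shard, then a merge in code. The sketch below uses in-memory dictionaries as stand-ins for MySQL shards and assumes a lookup table that maps each shop to its shard; all table names, IDs, and shop names are hypothetical.

```python
from collections import defaultdict

# Hypothetical index: shop_id -> shard number.
SHARD_OF_SHOP = {10: 0, 11: 1}

# Hypothetical per-shard 'shops' tables: shard -> {shop_id: name}.
SHOPS = {0: {10: "WoolAndCo"}, 1: {11: "TinyPrints"}}

# Hypothetical result of an earlier listings query.
LISTINGS = [
    {"listing_id": 1, "shop_id": 10, "title": "Scarf"},
    {"listing_id": 2, "shop_id": 11, "title": "Poster"},
    {"listing_id": 3, "shop_id": 10, "title": "Hat"},
]


def fetch_shops(shard, shop_ids):
    """Stand-in for one 'SELECT ... WHERE shop_id IN (...)' against a single shard."""
    return {shop_id: SHOPS[shard][shop_id] for shop_id in shop_ids}


def listings_with_shop_names(listings):
    """Join listings to shop names in application code, one query per shard."""
    ids_by_shard = defaultdict(set)
    for row in listings:
        ids_by_shard[SHARD_OF_SHOP[row["shop_id"]]].add(row["shop_id"])
    names = {}
    for shard, shop_ids in ids_by_shard.items():
        names.update(fetch_shops(shard, shop_ids))
    return [dict(row, shop_name=names[row["shop_id"]]) for row in listings]


if __name__ == "__main__":
    for row in listings_with_shop_names(LISTINGS):
        print(row)
```

This works well for serving pages, but every ad hoc analytical question becomes a small program, which is the pressure behind the demand for an offline store where joins are easy.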
Hive
[Timeline 2005 to 2013: January]
Hive Turned Off
[Timeline 2005 to 2013: April]
Hive

•   Slow

•   Sensitive

•   Operational burden

•   Educational burden
Vertica


•   Offline copy of shards, master, auxiliary databases

•   Joins are easy

•   Reasonable latency
Vertica
[Timeline 2005 to 2013: November]
Vertica


•   Game changer at Etsy

•   High demand for joins

•   Rapid prototyping data pipelines
RDBMS / Cascading
SQL                        cascading.jruby
Query Planner/Optimizer    Cascading
Execution Engine           MapReduce
Storage                    HDFS
Back to MapReduce

•   Event logs

•   Schedule

•   Load data in prod

•   Scale
Vertica


•   Not Hive, Impala, Shark, etc.

•   May change our minds
Streaming
Not Powered by MapReduce


•   Activity Feed

•   Shop Stats
Etsyweb


•   memcached

•   Gearman

•   Sharded MySQL
Use cases


•   Trending

•   Fraud detection

•   ?
Turns out people don’t make product decisions in real time

http://mcfunley.com/whom-the-gods-would-destroy-they-first-give-real-time-analytics
Summing Up


•   Be glad you’re living in the future

•   Automated tools for the common case

•   Don’t be afraid to experiment
Image Credits
•   http://kongscreenprinting.com/what-we-do-showcase

•   http://animal.discovery.com

•   http://www.rallyrace.com/turning-over-the-stone-event-production-basics/

•   http://www.flickr.com/photos/bbalaji/2443820505/

•   http://www.madeyoulaugh.com/funny_photos/caveman_harley/caveman_harley.jpg

•   http://theundercoverrecruiter.com/6-ways-catapult-your-job-search-after-layoff/

•   http://www.globaltimes.cn/SPECIALCOVERAGE/Top10Peopleof2011.aspx

•   http://www.theculturemap.com/scream-time-edvard-munch-museum/

•   http://www.repentamerica.com/webelieve.html

•   https://soundcloud.com/tearland/tl-hive

•   http://pocketnow.com/2012/08/02/wifi-vs-data-speed-vs-battery-life/bush-scratching-head
Contact / Reference

•   Matt Walker

•   @data_daddy

•   http://codeascraft.etsy.com/

•   http://www.etsy.com/codeascraft/talks
