Cohort Analysis at Scale

Presented at Strata San Jose 2018. Shares how Netflix enables business teams to perform cohort analysis on very large, high-dimensional data using big data and web application technologies such as Spark, Druid, Node, React, and D3.

  1. Cohort Analysis at Scale | BLAKE IRVINE | STRATA SAN JOSE 2018.03.06
  2. World Markets
  3. World Markets
  4. World Markets
  5. Partners help us grow
  6. Partners are companies that make it easier for people to sign up and engage with our service and help us retain members. BLAKE IRVINE | STRATA SAN JOSE 2018
  8. Let’s take a (virtual) trip!
  9. Trip to Strata San Jose
  10. Trip to Strata San Jose
  11. Trip to Strata San Jose
  12. Trip to Strata San Jose: Multiple Partner Associations
  13. Evaluating Partners
      ● Are partners helping us acquire members?
      ● Through which channels?
      ● How does a partner impact the regional market?
      ● Do members use our service differently on partner devices?
      ● How do partners compare to each other?
  14. Cohorts are the collections of members associated with a partner that are relevant to the business question.
  15. Cohorts can be complex
      ● Our trip example showed how one member can be associated with many partners.
      ● Business teams want to explore the nuance of cohorts...
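
The point that one member can belong to several partner cohorts can be sketched in a few lines. This is an illustrative sketch only: the association records, the `build_cohorts` helper, and the association types are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical (member_id, partner, association_type) records, invented for illustration.
associations = [
    (1213, "Amazon", "device"),
    (1213, "Google", "device"),
    (7623, "Google", "device"),
    (7623, "Sony", "signup"),
    (4291, "Sony", "device"),
]


def build_cohorts(records):
    """Group member ids by partner; one member may land in several cohorts."""
    cohorts = defaultdict(set)
    for member_id, partner, _association_type in records:
        cohorts[partner].add(member_id)
    return dict(cohorts)


cohorts = build_cohorts(associations)

# Members that belong to both the Amazon and the Google cohort.
overlap = cohorts["Amazon"] & cohorts["Google"]
```

Because cohorts are sets that may overlap, summing cohort sizes across partners would count some members more than once.
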
  16. … for example
  17. Evaluating Cohorts
      ● 100+ million members
      ● Dozens of partners
      ● Many combinations of members and partners
      ● Leads to…
        ○ High dimensionality, high cardinality datasets
        ○ Very large datasets of member-level time-series activity
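
To make the growth in combinations concrete, the sketch below enumerates every non-empty set of partners a member could be associated with. The helper name `partner_combinations` and the figure of 24 partners (a stand-in for "dozens") are assumptions for illustration, not numbers from the talk.

```python
from itertools import combinations


def partner_combinations(partners):
    """Return every non-empty subset of partners a member could be associated with."""
    subsets = []
    for size in range(1, len(partners) + 1):
        subsets.extend(combinations(partners, size))
    return subsets


# Three partners already give seven possible combinations.
small = partner_combinations(["Amazon", "Google", "Sony"])

# For n partners there are 2**n - 1 non-empty subsets; 24 is a hypothetical n.
count_for_24 = 2**24 - 1
```

In practice only a fraction of these combinations would be populated, but the upper bound shows why dimensionality grows quickly as partners are added.
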
  18. Cohort Analysis at Scale
  19. Cohort Analysis at Scale
      ● Data platform
      ● Data construction
      ● Data product for cohort analysis
  20. Data Platform: Simplified Overview (Big Data Portal)
  21. Data construction: Data Model (member, device, partner, playback events, isp, billing events, billing processor)
  22. Data construction: Cohort Dataset (signup_events, cohort, playback_events, billing_events, x_events)
  23. Data construction: Flat Tables (playback_f, cohort_playback_s, device_d, isp_d, geo_d, partner_d, cohort_d)
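
One way to read the flat-table step is as a denormalizing join: attributes from the dimension tables are copied onto each fact row. The sketch below uses the table names from the slide (`playback_f`, `device_d`, `partner_d`), but the columns, the `partner_id` link between device and partner, and the `flatten` helper are assumptions for illustration. The talk names Spark among its technologies; plain Python is used here only to keep the example self-contained.

```python
# Hypothetical dimension rows keyed by id; columns and the partner_id link are assumed.
device_d = {
    674: {"device_name": "Amazon Fire TV", "device_category": "Set Top Box", "partner_id": 1},
    1172: {"device_name": "Chromecast", "device_category": "Streaming Stick", "partner_id": 2},
}
partner_d = {
    1: {"partner_name": "Amazon"},
    2: {"partner_name": "Google"},
}

# Hypothetical fact rows: one row per playback event.
playback_f = [
    {"member_id": 1213, "device_id": 674},
    {"member_id": 7623, "device_id": 1172},
]


def flatten(fact_rows, devices, partners):
    """Denormalize fact rows by copying dimension attributes onto each row."""
    flat = []
    for row in fact_rows:
        device = devices[row["device_id"]]
        partner = partners[device["partner_id"]]
        flat.append({
            **row,
            "device_name": device["device_name"],
            "device_category": device["device_category"],
            "partner_name": partner["partner_name"],
        })
    return flat


flat_rows = flatten(playback_f, device_d, partner_d)
```

The flat rows repeat dimension values on every event, trading storage for queries that need no joins at read time.
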
  24. Data for consumption: Flat Table

      | key | member_id | device_id | device_name | device_category | partner_name | country | region | data_payload |
      |---|---|---|---|---|---|---|---|---|
      | 1 | 1213 | 674 | Amazon Fire TV | Set Top Box | Amazon | US | Americas | [{"id":5025945823792539,"sequence":41,"time":1491962092955}, {"id":5025947899236389,"sequence":95,"time":1491962104824}] |
      | 2 | 7623 | 1172 | Chromecast | Streaming Stick | Google | DE | EMEA | … |
      | 3 | 4291 | 129 | PS3 | Game Console | Sony | ES | EMEA | … |
      | 4 | 9013 | 447 | iPad 4 | Tablet | Apple | CA | Americas | … |
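
The `data_payload` column in the first row holds a JSON array of events. A minimal sketch of reading such a payload follows. The slides do not say what `id`, `sequence`, or `time` mean, so ordering by `sequence` and subtracting `time` values are assumptions, and `summarize_payload` is a hypothetical helper.

```python
import json

# Payload copied from the first row of the flat table on the slide.
payload = (
    '[{"id":5025945823792539,"sequence":41,"time":1491962092955},'
    '{"id":5025947899236389,"sequence":95,"time":1491962104824}]'
)


def summarize_payload(raw):
    """Parse a payload string and summarize its events (field semantics are assumed)."""
    events = sorted(json.loads(raw), key=lambda event: event["sequence"])
    return {
        "event_count": len(events),
        "first_sequence": events[0]["sequence"],
        "last_sequence": events[-1]["sequence"],
        # Difference between the last and first "time" values, in the payload's own units.
        "time_span": events[-1]["time"] - events[0]["time"],
    }


summary = summarize_payload(payload)
```
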
  25. Data construction: Copy forward (cohort_playback_s, Big Data Portal)
  26. Data Product for Cohort Analysis
      ● Goals
        ○ Serve dozens of users
        ○ Provide interactive / low-latency tool
        ○ Provide many different perspectives
      ● Challenges
        ○ Manage high dimensionality
        ○ Very large time-series datasets
  27. Analytic Tool Choices

      | | Choice 1 | Choice 2 | Choice 3 |
      |---|---|---|---|
      | Analytic Tool | Tool 1 | Tool 2 | Tool 3 |
      | Data Engine | MPP | Cloud | In memory |
      | Data Size | 1B rows | 10B rows | 100M rows |
      | Performance (SWAG) | Up to many minutes | Many minutes | Several seconds |
  28. Choice 4...
      ● Data stored in Druid
      ● Custom app built with JavaScript
  29. Druid
      ● An open source data store for analytic applications
      ● Distributed, column-oriented, indexed architecture
      ● Well suited to serve our “flat” tables
      Druid white paper: http://static.druid.io/docs/druid.pdf
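
To illustrate why a flat table suits Druid, the sketch below builds the JSON body of a Druid native `groupBy` query that slices by two of the flat-table dimensions. The datasource name (assumed to match `cohort_playback_s`), the interval, the region filter, and the `build_groupby_query` helper are all assumptions. The talk does not show the queries the application actually issues, and the sketch only constructs the body without sending it.

```python
import json


def build_groupby_query(datasource, dimensions, interval, region=None):
    """Build a Druid native groupBy query body as a plain dict."""
    query = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [interval],
        "dimensions": list(dimensions),
        # Counts Druid rows per group; with rollup this may differ from raw event counts.
        "aggregations": [{"type": "count", "name": "row_count"}],
    }
    if region is not None:
        query["filter"] = {"type": "selector", "dimension": "region", "value": region}
    return query


# Hypothetical datasource name and interval, chosen for illustration.
query = build_groupby_query(
    "cohort_playback_s",
    ["partner_name", "device_category"],
    "2018-01-01/2018-02-01",
    region="EMEA",
)
body = json.dumps(query, sort_keys=True)
```

Every dimension in the query is already a column on the flat table, so no join is needed at query time.
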
  30. ● Built with Express, React, Redux, D3
      ● Custom UX / UI to manage views and dimensionality
      ● Enabled access to data served by Druid
      ● Enabled management of query execution and caching
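
The talk does not describe how the application caches queries. One common approach, sketched here as an assumption, is to key cached results by a canonical serialization of the query so that equivalent requests share one backend call. The actual app is built with JavaScript; Python is used only for consistency with the other sketches, and `QueryCache` and `fake_backend` are hypothetical names.

```python
import hashlib
import json


class QueryCache:
    """Cache query results keyed by a hash of the canonical query JSON."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._results = {}
        self.misses = 0

    @staticmethod
    def key_for(query):
        # Sorting keys makes the cache key independent of dict insertion order.
        canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def fetch(self, query):
        key = self.key_for(query)
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._run_query(query)
        return self._results[key]


def fake_backend(query):
    """Stand-in for a real data store call; returns one placeholder row per dimension."""
    return [{"dimension": name, "row_count": 0} for name in query["dimensions"]]


cache = QueryCache(fake_backend)
first = cache.fetch({"dataSource": "cohort_playback_s", "dimensions": ["region"]})
# Same query with keys in a different order: served from the cache.
second = cache.fetch({"dimensions": ["region"], "dataSource": "cohort_playback_s"})
```

A production cache would also need eviction and invalidation when the underlying data is restated, which this sketch omits.
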
  31. ● Video demo with simulated data
  32. ● PED DEMO
  33. Challenges
  34. Challenges
      ● Dimensions (aka slice-n-dice)
        ○ More is always better
        ○ Changes require restatement
      ● “Typical” use cases must be met also
        ○ Not a solution for every data question
        ○ Analysts and other tools are still needed
  35. Challenges
      ● Data volume always increases…
        ○ More members, more partners, more devices, more metrics
      ● Custom app development time is longer, and ongoing
        ○ But for the right use cases, worthwhile
  36. Partners help Netflix grow. We measure partner value through cohorts. Big data tools enable efficient analysis. Thank you!
  37. Blake Irvine - Growth Data Products | birvine@netflix.com | @blakeirvine | linkedin.com/in/blakeirvine/
