Data Analytics and Processing at Snap - Druid Meetup LA - September 2018

Charles Allen covers the data processing, analytics, and insights systems at Snap. The talk calls out where Druid is strongest and how it differs from some of the other processing systems in use.

This is the slide collection from the second talk from:

  1. Analytics at Snap: Big Data processing, slicing, and dicing (Snapchat, 2018). Charles Allen, charles.allen@snap.com, https://www.linkedin.com/in/charles-allen-255bab2a/
  2. Overview (09.20.18): Who we are; Snap growth; Wrangling data / data tool chest; Druid’s powerhouse
  3. Who we are
  4. Snap Inc. is a camera company
  5. Express yourself!
  6. Live in the moment
  7. Snap growth
  8. 57 million DAU in Q2 2014; 188 million DAU in Q2 2018. User base up; advertiser value up. (Source: 10-K; 10-Q; earnings call transcripts)
  9. Trillions of interactions per week.
  10. Wrangling data
  11. Getting insights into data: natural pipeline development
      - Need: lack of data causes pain
      - Source: find the data signal and a data processing SME
      - Develop: work with the development team on the pipeline
      - Deploy: to production!
      - Maintain: fire and forget, or keep it live?
  12. Common data consumption formats
      - Scripting: high level of expertise; extremely dynamic; usually either one-off for a specific human, or scripted for machine consumption
      - Reports: small number of KPIs; big tables or worksheets; “executive” summarization
      - Dashboards: multiple KPIs; curated by an expert; some flexibility; often operational in nature or usage
  13. Data tool chest
  14. Key architecture components for data flow control
      - Kafka: stream buffer
      - Pub/Sub: stream buffer
      - Airflow: batch processing orchestration
      - Storage: bundle storage
  15. Key architecture components for business logic
      - Dataflow: stream and batch processing
      - Beam: pipeline business logic
      - Python: popular language
      - Java: popular language
      - Spark: stream and batch processing
  16. Key architecture components for data consumption
      - BigQuery: bulk data warehousing
      - Druid: exploratory data storage
      - Superset: Druid-centric dashboarding
      - Looker: general dashboarding
  17. Core event log workflows (GDPR, SOX)
      - Bundle lands in GCS
      - Airflow churns data between BigQuery and GCS
      - Over 20k DAG runs a week
      - Lots of access control
  18. Internal use cases for Druid vs BigQuery
      - Druid: multi-cloud compatible. Higher-friction data load. Lower-friction data maintenance. Gets more affordable with more usage. You will track who has the most data. Very fast. Slice and dice.
      - BigQuery: fully managed and hosted, GCP-only. Low-friction data load. High-friction data maintenance. Price punishment for using too much. You will track who is causing cost spikes. Often slow, but faster than Hadoop. Joins.
  19. Druid’s powerhouse
  20. Key Druid stats
      - Large compute capacity: >10k cores
      - Flowing into Druid: >100B events per day
      - Answered: >100k queries per day
  21. Druid ingestion and consumption: reports / dashboards; SME dashboards; drill down
  22. Data storage & querying platform
      - GKE cluster (platform): ZooKeeper (coordination & configuration); Druid indexed datastore (Java, Druid)
      - Druid broker, Druid historicals*, Druid coordinator (Java, CoreOS, Druid, GCE)
      - Mesos: cluster management (GCE); Marathon: orchestration (GCE)
      - GCS: deep storage; CloudSQL: Druid metadata; ZooKeeper: coordination & configuration; MongoDB: query-time lookup cache
      - Deployment: GCP Deployment Manager; Helm
  23. Druid retention tunings
      - Hot tier: recent data (1 week), 2 replicas, fast NVMe SSD
      - Cold tier: recent data (1 week), 1 replica for HA; keeps older data available (HA)
  24. We Are Hiring! charles.allen@snap.com, https://www.snap.com/jobs/
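Slide 8 gives two daily-active-user (DAU) figures four years apart. The following worked example uses our own arithmetic, not a figure from the talk. It computes the overall growth factor and the implied compound annual growth rate (CAGR):

```python
# Figures from slide 8: daily active users, in millions.
dau_q2_2014 = 57
dau_q2_2018 = 188
years = 4  # Q2 2014 to Q2 2018

growth_factor = dau_q2_2018 / dau_q2_2014
# CAGR: the constant yearly rate that turns 57 into 188 over 4 years.
cagr = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}x")  # growth factor: 3.30x
print(f"CAGR: {cagr:.1%}")                     # CAGR: 34.8%
```

So the user base roughly tripled, which corresponds to about 35% growth per year.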
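Slide 14 lists Kafka and Pub/Sub as stream buffers. The following is an in-process analogy only and does not use any Kafka or Pub/Sub API. A bounded buffer sits between a producer and a consumer so that each side can run at its own pace. Real brokers add durability, partitioning, and replay on top of this decoupling.

```python
import queue
import threading

# Bounded buffer: put() blocks once 100 items are waiting (back-pressure).
stream_buffer = queue.Queue(maxsize=100)
END_OF_STREAM = object()  # hypothetical end-of-stream marker

def produce(events):
    for event in events:
        stream_buffer.put(event)  # blocks if the consumer falls too far behind
    stream_buffer.put(END_OF_STREAM)

def consume(out):
    while True:
        event = stream_buffer.get()
        if event is END_OF_STREAM:
            break
        out.append(event.upper())  # stand-in for downstream processing

results = []
consumer = threading.Thread(target=consume, args=(results,))
consumer.start()
produce(["snap", "story", "lens"])
consumer.join()
print(results)  # ['SNAP', 'STORY', 'LENS']
```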
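Slide 15 pairs Beam (pipeline business logic) with Dataflow and Spark (stream and batch processing). The core idea is that one piece of business logic serves both bounded (batch) and unbounded (stream) input. The stdlib sketch below illustrates that idea and is not Beam's API. It counts events per key in fixed (tumbling) windows over any iterable. Unlike Beam, it assumes events arrive in timestamp order and has no watermarks or triggers. The event shape `(timestamp, key)` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key): hypothetical shape

def count_per_window(events: Iterable[Event], window_s: int = 60) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, per-key counts) for fixed windows of window_s seconds.

    Assumes in-order input: a window is emitted as soon as an event
    from a later window arrives.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_s
        if current_start is not None and start != current_start:
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last window when input ends

# Batch: a bounded list.
batch = [(5, "snap"), (20, "chat"), (61, "snap"), (70, "snap")]
print(list(count_per_window(batch)))

# Stream: a generator consumed lazily by the same function.
def stream():
    yield from batch

live = count_per_window(stream())
print(next(live))  # first window, emitted once an event from the next window arrives
```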
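Slide 17 describes Airflow moving event-log data between GCS and BigQuery at over 20k DAG runs a week. The sketch below uses Python's `graphlib` (3.9+), not Airflow. It shows what a DAG encodes: tasks run only after their upstream tasks. All task names are hypothetical. It also converts the weekly run count into rough daily and hourly rates.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical tasks for an event-log workflow; each maps to its upstream tasks.
dag = {
    "wait_for_bundle_in_gcs": set(),
    "load_bundle_into_bigquery": {"wait_for_bundle_in_gcs"},
    "run_bigquery_transforms": {"load_bundle_into_bigquery"},
    "export_results_to_gcs": {"run_bigquery_transforms"},
}

# One valid execution order respecting every dependency.
order = list(TopologicalSorter(dag).static_order())
print(order)

# Rough scale of "over 20k DAG runs a week" (a lower bound).
runs_per_week = 20_000
print(f"{runs_per_week / 7:.0f} runs/day, {runs_per_week / (7 * 24):.0f} runs/hour")
# 2857 runs/day, 119 runs/hour
```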
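Slide 18 contrasts Druid, which "gets more affordable with more usage," with BigQuery's "price punishment for using too much." The following is a toy model of those two cost shapes, and every price in it is hypothetical. A fixed-size cluster has a flat monthly cost, so its cost per query falls as usage grows. BigQuery's on-demand model actually bills by bytes scanned; here that is simplified to a flat per-query price, which makes the total grow linearly with usage. The model ignores the need to grow a cluster once it saturates.

```python
CLUSTER_COST_PER_MONTH = 50_000.0  # hypothetical fixed monthly cost of a self-run cluster
PRICE_PER_QUERY = 0.05             # hypothetical flat per-query price (simplification)

def monthly_cost(queries, pay_per_use):
    """Total monthly cost under each pricing shape (no cluster resizing)."""
    return PRICE_PER_QUERY * queries if pay_per_use else CLUSTER_COST_PER_MONTH

def cost_per_query(queries, pay_per_use):
    return monthly_cost(queries, pay_per_use) / queries

# Usage above which the fixed cluster is the cheaper option.
break_even_queries = CLUSTER_COST_PER_MONTH / PRICE_PER_QUERY

for q in (100_000, 1_000_000, 10_000_000):
    print(f"{q:>10,} queries/month: fixed {cost_per_query(q, False):.4f}/query, "
          f"pay-per-use {cost_per_query(q, True):.4f}/query")
```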
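Slide 20's daily totals are easier to reason about as average per-second rates. This is our arithmetic. Peak rates will be higher than these averages, and the slide's figures are lower bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 100e9   # ">100B events per day"
queries_per_day = 100e3  # ">100k queries per day"

events_per_second = events_per_day / SECONDS_PER_DAY
queries_per_second = queries_per_day / SECONDS_PER_DAY
print(f"{events_per_second:,.0f} events/s")   # 1,157,407 events/s
print(f"{queries_per_second:.2f} queries/s")  # 1.16 queries/s
```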
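Slide 21 mentions drill-down dashboards on Druid. The sketch below assumes Druid's native `groupBy` query: each drill-down step regroups by a new dimension and adds a filter on the value just clicked. The datasource `events`, the dimensions `country` and `platform`, and the `count` metric column are hypothetical. Native queries are POSTed as JSON to a Broker's `/druid/v2/` endpoint. `run_query` is shown for completeness but needs a live Broker.

```python
import json
import urllib.request

def drill_down_query(datasource, interval, dimensions, filters):
    """Build a native Druid groupBy query; filters maps dimension -> selected value."""
    query = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [interval],
        "dimensions": dimensions,
        "aggregations": [{"type": "longSum", "name": "events", "fieldName": "count"}],
    }
    if filters:
        query["filter"] = {
            "type": "and",
            "fields": [{"type": "selector", "dimension": d, "value": v}
                       for d, v in filters.items()],
        }
    return query

def run_query(broker_url, query):
    """POST the query to a Broker and return the decoded JSON result."""
    req = urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Step 1: events per country. Step 2: drill into one country by platform.
step1 = drill_down_query("events", "2018-09-01/2018-09-08", ["country"], {})
step2 = drill_down_query("events", "2018-09-01/2018-09-08", ["platform"], {"country": "US"})
print(json.dumps(step2, indent=2))
```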
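Slide 22 lists MongoDB as a query-time lookup cache, but the talk does not say how it is wired into Druid. The following sketches only the general idea of a Druid query-time lookup. Segments store raw IDs, and those IDs are translated to display names when the query runs, so renaming an entry needs no re-ingestion. The lens-ID table and the lookup name `lens_names` are hypothetical, and a plain dict stands in for the cache. Keeping unmatched values as-is is one possible missing-value policy, chosen here for illustration.

```python
from collections import Counter

# Hypothetical lookup table: raw IDs stored in segments -> display names.
lookup = {"l1": "Dog lens", "l2": "Rainbow lens"}

def lookup_dimension_spec(dimension, output_name, lookup_name):
    """Druid "lookup" DimensionSpec referring to a registered lookup by name."""
    return {"type": "lookup", "dimension": dimension,
            "outputName": output_name, "name": lookup_name}

def group_with_lookup(rows, dimension, table):
    """Count rows per dimension value, translating values at query time."""
    counts = Counter()
    for row in rows:
        raw = row[dimension]
        counts[table.get(raw, raw)] += 1  # unmatched IDs are kept unchanged
    return counts

rows = [{"lens_id": "l1"}, {"lens_id": "l2"}, {"lens_id": "l1"}, {"lens_id": "l9"}]
print(lookup_dimension_spec("lens_id", "lens_name", "lens_names"))
print(group_with_lookup(rows, "lens_id", lookup))
```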
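Slide 23's retention tunings read like two Historical tiers. This interpretation assumes the "2" and "1" beside "1 Week" are replica counts and that older data lives only in the cold tier. Under those assumptions, the tiering could be expressed as Druid load rules sent to the Coordinator (`POST /druid/coordinator/v1/rules/<dataSource>`). The tier names `hot` and `cold` are also assumptions. The helper emulates the Coordinator's first-match rule evaluation for just these two rule types.

```python
from datetime import datetime, timedelta, timezone

# Load rules in the JSON shape the Coordinator accepts. Tier names must match
# each Historical's druid.server.tier setting (assumed "hot" and "cold").
RULES = [
    # Last week: 2 replicas on the NVMe hot tier plus 1 on the cold tier.
    {"type": "loadByPeriod", "period": "P1W", "includeFuture": True,
     "tieredReplicants": {"hot": 2, "cold": 1}},
    # Everything older: 1 replica on the cold tier.
    {"type": "loadForever", "tieredReplicants": {"cold": 1}},
]

# Only the ISO-8601 period used above; a full version would parse any period.
PERIODS = {"P1W": timedelta(weeks=1)}

def matching_rule(segment_start, segment_end, now, rules=RULES):
    """Return the first rule that applies to a segment (first match wins)."""
    for rule in rules:
        if rule["type"] == "loadForever":
            return rule
        if rule["type"] == "loadByPeriod":
            window_start = now - PERIODS[rule["period"]]
            # The segment must end after the window starts; without
            # includeFuture it must also start before "now".
            if segment_end > window_start and (
                rule.get("includeFuture", True) or segment_start < now
            ):
                return rule
    return None

now = datetime(2018, 9, 20, tzinfo=timezone.utc)
print(matching_rule(now - timedelta(days=2), now - timedelta(days=1), now)["tieredReplicants"])
print(matching_rule(now - timedelta(days=30), now - timedelta(days=29), now)["tieredReplicants"])
```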