SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
Rounds Analytics Pipeline
Aviv Laufer CTO, Rounds @avivl
Founded 2008
35 person team (⅔ in R&D)
Tel-Aviv based
Raised over $22 million in funding from industry leading
investors such as Sequoia Capital, Samsung Ventures, Rhodium,
Verizon Ventures and many more. Over 25 million users worldwide
Client - Objective-C, Java, C/C++
Server - Python, Go, C++, and bits of Erlang
DB - MySQL (RDS), CouchBase (a few clusters)
Multi Cloud - AWS, GCE, SL, DO
Deployment - Ansible
Monitoring - Sensu, NewRelic, VictorOps
And yes we use Docker…...
Tools of the Trade
We tried to make it work quite a few times, and failed
We kept on trying
We think we got it right this time
ANALYTICS @ROUNDS
DO OR DO NOT. THERE IS NO TRY. - Master Yoda
One monolith app
Data was written to MySql/RDS - row by row
Batch ETL to Vertica
And then came July 2014
GENESIS
Data collection killed our backend app
Slow, failing ETL process
No real time view into events
Preferred users over analytics, we killed the event collection
We were flying blind
CHAOS
Separate ETL process from main app
Clients reports (a request for each event) to a different microservice
First very naive version written in Go - it scales!
Data is written to an Elasticsearch cluster.
ETL from ES to Vertica
EXODUS
Frontends - Receiving user analytics and perform sanity checks
Google Pub/Sub - Store events for future processing
Workers - Pull the events from Pub/Sub and stream to Google BigQuery and ES
...And Then There Were Three...
Clients send gzipped, batched
Frontend does sanity checks - Validation, versioning, etc.
Frontend replies fast (202 Accepted) and closes the connection in order
to save on mobile socket life
Geo load-balanced
Pushes analytics into Pub/Sub for future processing, mutation
Fan-In model
ANALYTICS - FRONTEND
Pulls analytics from Pub/Sub
Mutate/Enrich data if necessary
Inserts to various DBs, according to usage - Monitoring, BI, Warehousing, etc.
Separation of concerns - Worker cluster per target DB
Fan-out model
Renee Finch, golang.org/doc/gopher/pencil/ANALYTICS - WORKER
Golang, abstraction package
Receives rows, streams to BigQuery (as opposed to load jobs)
Sync (foreground insert) or Async (background insert)
Pros: Instant data availability, no job delay, fast
Cons: Harder handling of bad analytics, Google’s HTTP 500s (requires retry)
Open source, PRs merrily encouraged!
Collecting User Data and Usage - Blog Post - http://bit.ly/CollectingDataRounds
github.com/rounds/go-bqstreamerSTREAMING TO BigQuery
Frontends deployed in several locations in GCE (We Geo load balance them)
Workers are in GCE (Europe West)
ES cluster is in GCE (Europe West)
ACROSS THE UNIVERSE
We started using Elasticsearch for monitoring about a year before elastic.co relaize that
Every new feature received a monitoring dashboard
Debugging
Monitoring (custom sensu checks)
Data is kept for 30 to 90 days
Ad-hoc reporting using Kibana
ELASTICSEARCH
Store data from the beginning till the end of time
Standard(ish) SQL
Very fast
No DBA is needed
Business reports (SiSense) - “Kibana” for BigQuery
BigQuery
NEW ISSUES
Permissive vs hard scheme for events - allow clients ease of use while keeping the scheme
strict for ease of BI
Clients make mistakes (Arabic locale dates) - elasticsearch allows while BQ doesn’t
Things we’re integrating as solutions
Every event is a class - compile time validation generated from RAML
Wrote a library for event reporting server side
ANY QUESTIONS
DO YOU HAVE?

Weitere ähnliche Inhalte

Was ist angesagt?

Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutionsClaudio Pontili
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaTreasure Data, Inc.
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityClaudio Pontili
 
Building an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutesBuilding an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutesClaudiu Barbura
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Jason Flittner
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 
IPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution OverviewIPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution Overviewpzybrick
 
Serverless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDAServerless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDANilesh Gule
 
Azure containers fundamentals
Azure containers fundamentalsAzure containers fundamentals
Azure containers fundamentalsNilesh Gule
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Jeff Magnusson
 
Big data for dot net Devs with Spark
Big data for dot net Devs with SparkBig data for dot net Devs with Spark
Big data for dot net Devs with SparkNilesh Gule
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixData Con LA
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Andreas Raible
 
Building Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowBuilding Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowDatabricks
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...Alluxio, Inc.
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElasticsearch
 
Migrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudMigrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudRobert Sanders
 
PowerStream Demo
PowerStream DemoPowerStream Demo
PowerStream DemoSingleStore
 

Was ist angesagt? (20)

Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud Facility
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
Building an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutesBuilding an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutes
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
IPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution OverviewIPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution Overview
 
Serverless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDAServerless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDA
 
Azure containers fundamentals
Azure containers fundamentalsAzure containers fundamentals
Azure containers fundamentals
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
 
Big data for dot net Devs with Spark
Big data for dot net Devs with SparkBig data for dot net Devs with Spark
Big data for dot net Devs with Spark
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ Netflix
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?
 
CSharp
CSharpCSharp
CSharp
 
Building Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowBuilding Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and Kubeflow
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
 
Migrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudMigrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the Cloud
 
PowerStream Demo
PowerStream DemoPowerStream Demo
PowerStream Demo
 

Ähnlich wie Rounds analytics pipeline

Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engineWalter Liu
 
Griffith Bi Migration & Source Control
Griffith Bi Migration & Source ControlGriffith Bi Migration & Source Control
Griffith Bi Migration & Source ControlDavid Waters
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsIntroduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsJongwook Woo
 
Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Imply
 
Business model driven cloud adoption - what NI is doing in the cloud
Business model driven cloud adoption -  what  NI is doing in the cloudBusiness model driven cloud adoption -  what  NI is doing in the cloud
Business model driven cloud adoption - what NI is doing in the cloudErnest Mueller
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlKhanderao Kand
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopRTTS
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...Big Data Week
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaMopuru Babu
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemYael Garten
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveHarnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveQubole
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minssparkflows
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleDatabricks
 
Docker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to DockerDocker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to DockerSmartWave
 

Ähnlich wie Rounds analytics pipeline (20)

Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Griffith Bi Migration & Source Control
Griffith Bi Migration & Source ControlGriffith Bi Migration & Source Control
Griffith Bi Migration & Source Control
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsIntroduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)
 
Business model driven cloud adoption - what NI is doing in the cloud
Business model driven cloud adoption -  what  NI is doing in the cloudBusiness model driven cloud adoption -  what  NI is doing in the cloud
Business model driven cloud adoption - what NI is doing in the cloud
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosql
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of Hadoop
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveHarnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Docker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to DockerDocker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to Docker
 

Kürzlich hochgeladen

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Kürzlich hochgeladen (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Rounds analytics pipeline

  • 1. Rounds Analytics Pipeline Aviv Laufer CTO, Rounds @avivl
  • 2. Founded 2008 35 person team (⅔ in R&D) Tel-Aviv based Raised over $22 million in funding from industry leading investors such as Sequoia Capital, Samsung Ventures, Rhodium, Verizon Ventures and many more. Over 25 million users worldwide
  • 3.
  • 4. Client - Objective-C, Java, C/C++ Server - Python, Go, C++, and bits of Erlang DB - MySQL (RDS), CouchBase (a few clusters) Multi Cloud - AWS, GCE, SL, DO Deployment - Ansible Monitoring - Sensu, NewRelic, VictorOps And yes we use Docker…... Tools of the Trade
  • 5. We tried to make it work quite a few times, and failed We kept on trying We think we got it right this time ANALYTICS @ROUNDS DO OR DO NOT. THERE IS NO TRY. - Master Yoda
  • 6. One monolith app Data was written to MySql/RDS - row by row Batch ETL to Vertica And then came July 2014 GENESIS
  • 7.
  • 8. Data collection killed our backend app Slow, failing ETL process No real time view into events Preferred users over analytics, we killed the event collection We were flying blind CHAOS
  • 9. Separate ETL process from main app Clients reports (a request for each event) to a different microservice First very naive version written in Go - it scales! Data is written to an Elasticsearch cluster. ETL from ES to Vertica EXODUS
  • 10. Frontends - Receiving user analytics and perform sanity checks Google Pub/Sub - Store events for future processing Workers - Pull the events from Pub/Sub and stream to Google BigQuery and ES ...And Then There Were Three...
  • 11. Clients send gzipped, batched Frontend does sanity checks - Validation, versioning, etc. Frontend replies fast (202 Accepted) and closes the connection in order to save on mobile socket life Geo load-balanced Pushes analytics into Pub/Sub for future processing, mutation Fan-In model ANALYTICS - FRONTEND
  • 12. Pulls analytics from Pub/Sub Mutate/Enrich data if necessary Inserts to various DBs, according to usage - Monitoring, BI, Warehousing, etc. Separation of concerns - Worker cluster per target DB Fan-out model Renee Finch, golang.org/doc/gopher/pencil/ANALYTICS - WORKER
  • 13. Golang, abstraction package Receives rows, streams to BigQuery (as opposed to load jobs) Sync (foreground insert) or Async (background insert) Pros: Instant data availability, no job delay, fast Cons: Harder handling of bad analytics, Google’s HTTP 500s (requires retry) Open source, PRs merrily encouraged! Collecting User Data and Usage - Blog Post - http://bit.ly/CollectingDataRounds github.com/rounds/go-bqstreamerSTREAMING TO BigQuery
  • 14. Frontends deployed in several locations in GCE (We Geo load balance them) Workers are in GCE (Europe West) ES cluster is in GCE (Europe West) ACROSS THE UNIVERSE
  • 15. We started using Elasticsearch for monitoring about a year before elastic.co relaize that Every new feature received a monitoring dashboard Debugging Monitoring (custom sensu checks) Data is kept for 30 to 90 days Ad-hoc reporting using Kibana ELASTICSEARCH
  • 16.
  • 17. Store data from the beginning till the end of time Standard(ish) SQL Very fast No DBA is needed Business reports (SiSense) - “Kibana” for BigQuery BigQuery
  • 18. NEW ISSUES Permissive vs hard scheme for events - allow clients ease of use while keeping the scheme strict for ease of BI Clients make mistakes (Arabic locale dates) - elasticsearch allows while BQ doesn’t Things we’re integrating as solutions Every event is a class - compile time validation generated from RAML Wrote a library for event reporting server side