SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Massive Scale
Data Processing
Pallavi Phadnis, Snehal Nagmote
Flink Forward SF 2019
● Consolidated Logging (CL) Overview
● High Level Architecture of CL platform
● Log Processing at Scale
● Event Extractor Use Case
● Monitoring and Alerting
● Impact of Flink based Platform
Agenda
Consolidated Logging (CL)
Build an integrated solution to provide insights into user
behavior and application performance metrics through
client-side logging.
Consolidated Logging
Use Cases Powered By CL
● Personalization
● Recommendations
● A/B Experimentation
● Application Performance
Consolidated Logging
X
Event Types
300+
Log
ProfileIdentify
Presented
NavigationLevel
Focus
...
Play
Device Platforms /
App Versions
10+
TVUI
Android
iOS
Web
...
Log Events
100s of billion
events / day
1+ petabyte of user
behavior data per
day
=
CL Platform
NETFLIX
APPS
OTHER
INTERNAL/EXTERNAL
APPS
Consolidated
Logging
DATA COLLECTION
DATA TRANSFORMATION
EVENT EXTRACTION
DATA ENRICHMENT
- USER INFO
- GEO/DEVICE INFO
SITE USAGE
APP NAVIGATIONS
SHOWS WATCHED
USER SESSIONIZATION
CL Schema
App Schema
iOS ANDROID TVUI WWW
PLATFORM FEATURES
REALTIME & BATCH DATA SINKS
Legacy Pipeline
Flink Based Platform
Landing
Service
Kafka
Event
Extractor
CL App
Kafka Streams
Elasticsearch
Hive tables
Landing
Service
SQS
Log
Processing
Server
Kafka
CL
Streaming
App
CL
ETL
CL DW
(Hive)
Kafka
CL Router
App
13 Keystone routes
S3
CL App
● Generic log processing application - supports different logging specifications
● Real-time processing
○ Data transformations
○ Data enrichment - Membership information, Geo, Device type
■ Joins
● Single source of truth with unified output schema
● Supports different data sinks: Kafka/Hive
● SLA
○ RPS: 3.5 million events per sec at peak, Latency: < 3ms
CL App Features
CL App Design
● Stateless Flink Application (Flink 1.4, Kafka 1.1)
○ At-least once processing
● Isolation of concerns through separate Flink jobs for different use cases/sink types
● Different job DAGs with common framework library: Fan In/ Fan Out
Common Log Processing Framework
Log
Consumer
Config
Reader
(FP)
Data
Enrichment
Data
Transformations
Spec
Parser
Data
Sink
Raw
events
Processed
events
Kafka
Kafka
Hive / Iceberg
CL Schema /
App Schema
Request
Type &
Version
Source
Segregated
sources
Multiple
sinks
Raw
events
Hive
Data
Partitioning
Events
Backup
● Embarrassingly parallel job (parallelism over 2000)
○ Uniform CPU utilization with high number of partitions on source kafka topic
● High memory pressure and GC pause on JM - Recovery failure/restart loop
○ Memory leak in archiving execution history (FLINK-10066)
○ Scaling bottleneck of kafka source’s union state (FLINK-10122)
● Overwhelmed coordinator due to thundering herd problem with high parallelism
(KIP-266)
Learnings & Best Practices
Data compression - a factor to consider
● Data compression ratio was worse for parquet and kafka (~ 4x)
○ Upstream kafka producer batching difference increased data entropy
● Backlog in kafka can lead to sudden load on external micro-services
● Kafka backpressure leads to task failures
○ Duplicate events
● Guice dependency injection conflicts with Flink
○ classloader.resolve-order=parent-first
Learnings & Best Practices
Event Extractor
Event Extractor Use Case
Personalization
Pipeline
CL Consumers
User clicks
User Searches
App perf metrics
Impressions
CL Stream
(Transformed
and enriched)
Personalization stream
Search stream
Impressions stream
Experimentation stream
Search Pipeline
Impressions
Pipeline
A/B
Experimentation
Pipeline
Consumer Insights
Pipeline
Exploratory
Analysis
Customer Service Tool
Keystone Routes For CL
● Growth/Scale
○ 3.5 million events/sec
○ Reading same data multiple times
■ Compute redundancy
■ Scale Kafka infrastructure for outgoing bytes
■ Operational Overhead
● High Compute and Operational cost
Problems with CL Legacy Pipeline
Keystone Routes For CL Event Extractor
● Stateless Single Flink Application
● Read data once, apply processing and route it to multiple streams
● Configuration driven Processing, without code change
● SQL Support on Stream
● Filter, Transformation and Projection support on stream
● Out of box metrics for users
What is Event Extractor ?
● User configuration in Yaml
● Confings are managed in version control and updated in s3
● Example config
Event Extractor User Interface
filterExpression: field1= 'Presented' and field2 like '%impressionToken%' and field3 not like
'%storyArt%'
projectionExpression: field_name1, field_name2, field_name3, field_name5
transformations: { OutputFieldName:inner_field, fieldName:top_level_field,
nestedFieldName:inner_field, type: type}
sinkDetails: {sinkType: kafka, name: topic_name}
ownerName: email-address
routeName: unique_name
Event Extractor Design
Config
Reader
SQL Parser
Config
Parser
Transformation
Projection
User Config Management Pipeline
Filter Function
Schema Builder
Elastic Search
Sink
Kafka Sink
Hive Sink
User
Configs via
S3
CL Enriched
Stream
Hive
Multiple Kafka Sinks
Event Extractor
● Scaling single Flink Application
● Lack of Isolation
○ Isolated by type of sink application writes to
○ Deployment per sink type (Kafka,Hive,Elasticsearch)
● Back pressure is shared between multiple consumers
○ Consumer Kafka topics are created in the same cluster
○ Canaries and testing before on boarding new config
Challenges with Event Extractor
● Buildup of Network Pressure caused S3 checkpoint failures due to socket timeouts
○ Job goes into restart loop due to high frequency of checkpoint failures
○ Better g1gc and increase s3 timeouts
● Tuning parallelism to avoid unbalanced CPU Utilization
○ Extensive CPU Flame Graphs and system metrics to identify bottlenecks
○ Setting parallelism in multiples of Kafka partitions and task slots to achieve better
cpu utilization
Learnings and Best Practices
● Flink Kafka Consumer needs continuous stream to progress high watermark (FLINK-5479)
○ StickyPartitioner Producer skips producing data to out of sync partitions
○ Setting stickyPartitioner.minQualifiedIsrRatio=1.0 helps to produce data to out of
sync partitions
● Outlier Container/Broker (due to bad hardware)
○ Consumer gets non-linear traffic pattern (stuck consumer alert)
○ Producer throws BatchExpiredTimeout Exception and increase in checkpoint
failures
Learnings and Best Practices
● Keystone (Self-Serve UI) for deployment of streaming apps
○ Out of box ELK stack support for application logs
○ Automated Alerts integration with Atlas
● Deployment Strategy
○ Minimize Duplicates, Checkpoints are stored in S3
● Restart Strategy
○ Fine-grained Recovery
Deployment
Monitoring and Alerting
Monitoring and Alerting
CL Platform Benefits
Improved
Data
Processing
Can Handle
Large Payloads
compared to
Legacy pipeline
Improved error
handling
Reduced
Data Loss
Reduced points
of failures
Ability to backfill
or reprocess
historic raw
events
Legacy Tables
Decommission
and Reduced
Storage
Redundancy
Read once and
route to
different sinks
through event
extractor
Single source of
truth (SSOT) for
CL Data in Data
warehouse
Schema
consistency
across CL
components
and Tools
Single
Source of
Truth
Reduced Cost
&
Operational
Overhead
Thank you.

Weitere ähnliche Inhalte

Was ist angesagt?

Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformVMware Tanzu
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)DataWorks Summit
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streamsconfluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveYingjun Wu
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 

Was ist angesagt? (20)

Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streams
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWave
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Envoy and Kafka
Envoy and KafkaEnvoy and Kafka
Envoy and Kafka
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 

Ähnlich wie Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis

Why Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable
Why Serverless Flink Matters - Blazing Fast Stream Processing Made ScalableWhy Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable
Why Serverless Flink Matters - Blazing Fast Stream Processing Made ScalableHostedbyConfluent
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasMonal Daxini
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharingconfluent
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Miguel Pérez Colino
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...Flink Forward
 
Story of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streamingStory of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streaminglohitvijayarenu
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of dataconfluent
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormRan Silberman
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEventsNeil Avery
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupMingmin Chen
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)KafkaZone
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiBowen Li
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performanceconfluent
 
Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain confluent
 
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...Kai Wähner
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 

Ähnlich wie Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis (20)

Why Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable
Why Serverless Flink Matters - Blazing Fast Stream Processing Made ScalableWhy Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable
Why Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
 
Story of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streamingStory of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streaming
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
 
Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain 
 
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 

Mehr von Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleFlink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 

Mehr von Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 

Kürzlich hochgeladen

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis

  • 1. Massive Scale Data Processing Pallavi Phadnis, Snehal Nagmote Flink Forward SF 2019
  • 2. ● Consolidated Logging (CL) Overview ● High Level Architecture of CL platform ● Log Processing at Scale ● Event Extractor Use Case ● Monitoring and Alerting ● Impact of Flink based Platform Agenda
  • 4. Build an integrated solution to provide insights into user behavior and application performance metrics through client-side logging. Consolidated Logging
  • 5. Use Cases Powered By CL ● Personalization ● Recommendations ● A/B Experimentation ● Application Performance
  • 6. Consolidated Logging X Event Types 300+ Log ProfileIdentify Presented NavigationLevel Focus ... Play Device Platforms / App Versions 10+ TVUI Android iOS Web ... Log Events 100s of billion events / day 1+ petabyte of user behavior data per day =
  • 7. CL Platform NETFLIX APPS OTHER INTERNAL/EXTERNAL APPS Consolidated Logging DATA COLLECTION DATA TRANSFORMATION EVENT EXTRACTION DATA ENRICHMENT - USER INFO - GEO/DEVICE INFO SITE USAGE APP NAVIGATIONS SHOWS WATCHED USER SESSIONIZATION CL Schema App Schema iOS ANDROID TVUI WWW PLATFORM FEATURES REALTIME & BATCH DATA SINKS
  • 8. Legacy Pipeline Flink Based Platform Landing Service Kafka Event Extractor CL App Kafka Streams Elasticsearch Hive tables Landing Service SQS Log Processing Server Kafka CL Streaming App CL ETL CL DW (Hive) Kafka CL Router App 13 Keystone routes S3
  • 10. ● Generic log processing application - supports different logging specifications ● Real-time processing ○ Data transformations ○ Data enrichment - Membership information, Geo, Device type ■ Joins ● Single source of truth with unified output schema ● Supports different data sinks: Kafka/Hive ● SLA ○ RPS: 3.5 million events per sec at peak, Latency: < 3ms CL App Features
  • 11. CL App Design ● Stateless Flink Application (Flink 1.4, Kafka 1.1) ○ At-least once processing ● Isolation of concerns through separate Flink jobs for different use cases/sink types ● Different job DAGs with common framework library: Fan In/ Fan Out
  • 12. Common Log Processing Framework Log Consumer Config Reader (FP) Data Enrichment Data Transformations Spec Parser Data Sink Raw events Processed events Kafka Kafka Hive / Iceberg CL Schema / App Schema Request Type & Version Source Segregated sources Multiple sinks Raw events Hive Data Partitioning Events Backup
  • 13. ● Embarrassingly parallel job (parallelism over 2000) ○ Uniform CPU utilization with high number of partitions on source kafka topic ● High memory pressure and GC pause on JM - Recovery failure/restart loop ○ Memory leak in archiving execution history (FLINK-10066) ○ Scaling bottleneck of kafka source’s union state (FLINK-10122) ● Overwhelmed coordinator due to thundering herd problem with high parallelism (KIP-266) Learnings & Best Practices
  • 14. Data compression - a factor to consider
  • 15. ● Data compression ratio was worse for parquet and kafka (~ 4x) ○ Upstream kafka producer batching difference increased data entropy ● Backlog in kafka can lead to sudden load on external micro-services ● Kafka backpressure leads to task failures ○ Duplicate events ● Guice dependency injection conflicts with Flink ○ classloader.resolve-order=parent-first Learnings & Best Practices
  • 17. Event Extractor Use Case Personalization Pipeline CL Consumers User clicks User Searches App perf metrics Impressions CL Stream (Transformed and enriched) Personalization stream Search stream Impressions stream Experimentation stream Search Pipeline Impressions Pipeline A/B Experimentation Pipeline Consumer Insights Pipeline Exploratory Analysis Customer Service Tool
  • 19. ● Growth/Scale ○ 3.5 million events/sec ○ Reading same data multiple times ■ Compute redundancy ■ Scale Kafka infrastructure for outgoing bytes ■ Operational Overhead ● High Compute and Operational cost Problems with CL Legacy Pipeline
  • 20. Keystone Routes For CL Event Extractor
  • 21. ● Stateless Single Flink Application ● Read data once, apply processing and route it to multiple streams ● Configuration driven Processing, without code change ● SQL Support on Stream ● Filter, Transformation and Projection support on stream ● Out of box metrics for users What is Event Extractor ?
  • 22. ● User configuration in Yaml ● Confings are managed in version control and updated in s3 ● Example config Event Extractor User Interface filterExpression: field1= 'Presented' and field2 like '%impressionToken%' and field3 not like '%storyArt%' projectionExpression: field_name1, field_name2, field_name3, field_name5 transformations: { OutputFieldName:inner_field, fieldName:top_level_field, nestedFieldName:inner_field, type: type} sinkDetails: {sinkType: kafka, name: topic_name} ownerName: email-address routeName: unique_name
  • 23. Event Extractor Design Config Reader SQL Parser Config Parser Transformation Projection User Config Management Pipeline Filter Function Schema Builder Elastic Search Sink Kafka Sink Hive Sink User Configs via S3 CL Enriched Stream Hive Multiple Kafka Sinks Event Extractor
  • 24. ● Scaling single Flink Application ● Lack of Isolation ○ Isolated by type of sink application writes to ○ Deployment per sink type (Kafka,Hive,Elasticsearch) ● Back pressure is shared between multiple consumers ○ Consumer Kafka topics are created in the same cluster ○ Canaries and testing before on boarding new config Challenges with Event Extractor
  • 25. ● Buildup of Network Pressure caused S3 checkpoint failures due to socket timeouts ○ Job goes into restart loop due to high frequency of checkpoint failures ○ Better g1gc and increase s3 timeouts ● Tuning parallelism to avoid unbalanced CPU Utilization ○ Extensive CPU Flame Graphs and system metrics to identify bottlenecks ○ Setting parallelism in multiples of Kafka partitions and task slots to achieve better cpu utilization Learnings and Best Practices
  • 26. ● Flink Kafka Consumer needs continuous stream to progress high watermark (FLINK-5479) ○ StickyPartitioner Producer skips producing data to out of sync partitions ○ Setting stickyPartitioner.minQualifiedIsrRatio=1.0 helps to produce data to out of sync partitions ● Outlier Container/Broker (due to bad hardware) ○ Consumer gets non-linear traffic pattern (stuck consumer alert) ○ Producer throws BatchExpiredTimeout Exception and increase in checkpoint failures Learnings and Best Practices
  • 27. ● Keystone (Self-Serve UI) for deployment of streaming apps ○ Out of box ELK stack support for application logs ○ Automated Alerts integration with Atlas ● Deployment Strategy ○ Minimize Duplicates, Checkpoints are stored in S3 ● Restart Strategy ○ Fine-grained Recovery Deployment
  • 30. CL Platform Benefits Improved Data Processing Can Handle Large Payloads compared to Legacy pipeline Improved error handling Reduced Data Loss Reduced points of failures Ability to backfill or reprocess historic raw events Legacy Tables Decommission and Reduced Storage Redundancy Read once and route to different sinks through event extractor Single source of truth (SSOT) for CL Data in Data warehouse Schema consistency across CL components and Tools Single Source of Truth Reduced Cost & Operational Overhead