SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
Uber Business Metrics
Generation and Management
Through Apache Flink
Uber
Engineer at Uber Marketplace
Wenrui Meng
Miao Yu
2018 Flink Forward Asia
Uber Mission
We ignite opportunity by setting the world in motion.
我们构建交通服务平台,为全球点燃机遇之火.
● Marketplace
● Project background
● Architecture
● User case
● Challenges
● Future work
Outline
Marketplace Team
Marketplace
Platform
Marketplace
Data
Marketplace
Dynamics
Marketplace
Fare
Marketplace Data
● Mission: Empower teams throughout Uber to understand and improve the efficiency of
the marketplace by providing real time data for our Marketplace levers, modeling and
analysis across 15 teams
● Data sources: trip lifecycle, matching, rider, driver, fare, mobile across 100+ Apache
Kafka topics
● Data categories: trip, rider, driver, session, product, geo, time
Marketplace Data
Trip Flow Patterns
Demand Patterns
Human Consumption - Incentives
Human Consumption - Regional Metrics
Marketplace Data
attribution note
Real-Time Analytics
● Usage pattern
● Issues
● Solution
Project Background
● Metrics are aggregated on the fly
from raw data.
● Metric definitions have a DAG
structure.
● Metrics are usually aggregated
upon a limited set of dimensions,
e.g., time, geo, and etc.
Usage Pattern
● Resource waste
● Bad performance
● Inconsistency
● ...
Issues
Solution
● An E2E platform
○ Provides a unified DSL to define metrics for multiple datasources like Kafka, Elastic Search,
and etc.
○ Based on metric definitions, optimizes execution plan based on objectives including
performance, isolation, and etc.
○ Directly generates metrics, instead of generating an intermediate raw dataset and then
aggregating on the fly.
Architecture
● Formula: a DSL describing an aggregation or arithmetic operation over aggregation
metric_A + metric_B * metric_C - avg((table_A where (exists(column_A) AND (column_B != 0))).column_C)
● Dimensions: over which this metric will be aggregated
Time, geo, product type, and ...
● Sinks: physical storage of this metric
Elastic search, Cassandra, MySQL, and ...
Metric Definition
1. Parse all metric definitions
2. Build a DAG of the most basic Flink operations needed for generating
these metrics.
3. Merging nodes and edges in operation DAG to form a new DAG of
logical metric groups, to reduce duplication, promote isolation and
improve performance.
4. Generate metric group config files and feed them into execution plan
generator.
Execution Optimizer
1. Parse metric group config files.
2. Given configs, build a DAG of stream operators. Within each stream
operator, there could be also a DAG of event operators.
3. Applied pre-defined operator templates and generate the final Flink
execution plan.
Execution Plan Generator
● Given specific set of metric, export SQL for it.
● Given SQL query, analyze percentage overlap with existing data in the
platform
● Convert the adhoc query to leverage existing data in platform
● Auto debugging to find root cause of data abnormality
● Metric discovery
● ….
Functionalities
● Avoid inconsistency (only 1 source of truth)
● Avoid inefficiency in development and duplicate effort
● Avoid cluster resource waste by optimization framework proposed
● Avoid duplicate definition by detecting the duplication when adding or
updating metric
● Improve the metric definition (ex if there is existing metrics A+B to
define C, it’s not good to define from scratch)
● Decouple metrics definition and generation techniques, easy to adopt
new techniques
● Decouple platform from storage
Benefits
Driver status summary of one minute:
1.Driver utilization per hexagon on 1 minute
2.Driver utilization per city on 1 minute
3.Driver total total worked minutes per hour
4.Open cars of each hexagon at 1PM in San Francisco
Back-end
service
Apache Kafka
message
Pipeline
sinks
every 4
seconds
Use Case Walkthrough
There are two solutions that support these three use cases:
1. Store aggregation for each driver and hexagon based on one minute
into Elasticsearch. Different service queries with additional on-the-fly
aggregation.
2. Streaming jobs to calculate the exact data needed to feed into service
or model. Streaming job maintained by different teams.
Dimension Value
Hexagon id Hex_1
Driver id Driver_1
Status Driving_client
Others.. ..
Use Case Walkthrough
Driver team
Location-update
event
Driver summary
per minute
Metric group 1:
driver per hour
Metric group 2:
driver utilization
hexagon
Metric group 3:
driver utilization
city
Driver_id, hex_id,
status, others
Metric group 4:
total open car
hexagon
Pricing team
Surge team
Heath team
Use Case Walkthrough
Our new framework supports all these use cases but without any duplicate
computations or inconsistency.
Complex topology
● With the scale and size of Uber’s current metrics, we will encounter very
complex topology once most of the metrics are onboarded onto this
framework.
● We may need to cluster the tasks and split them into a few separate
jobs, connected to Apache Kafka.
Challenges
Unstable sinks
● Each metrics group may have a list of different sinks including, Apache
Hive, Apache Kafka, ELK, Cassandra, etc. These storage systems are
owned by different teams at Uber, with different SLAs.
● Slow and unstable storage could bring down the whole pipeline SLA. It’s
desirable to separate the sink from processor. At Uber, we have a
dedicated publisher pipeline to fulfill the sink tasks with the ability to turn
on/off specific sink.
Challenges
Metrics evolution
● User will add new metric and update the existing metric. How do we
support this with minimum effort?
○ We need distinguish the golden sets from ad hoc cases. For the
golden set, a bach of metrics changes are reviewed and put into
pipeline.
○ For ad hoc use cases, a user can run the pipeline by themselves but
won’t receive much support.
Challenges
Onboarding
● At Uber, there are 15 teams and 20+ services that use Gairos. It’s a big
challenge to ask each client to change their way to consume data.
● We host our gateway to query data in Gairos. It’’s desirable to build
conversion in gateway to convert the old way consuming data to new
approach.
Challenges
● Improve execution optimizer
● Improve tooling
Future Work
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache Flink

Weitere ähnliche Inhalte

Was ist angesagt?

Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERShuyi Chen
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsDave Gardner
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Data Warehousing with Python
Data Warehousing with PythonData Warehousing with Python
Data Warehousing with PythonMartin Loetzsch
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyKishore Gopalakrishna
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageJulien Le Dem
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceDatabricks
 
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...Databricks
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidDatabricks
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPconfluent
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...HostedbyConfluent
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streamingdatamantra
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 

Was ist angesagt? (20)

Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Data Warehousing with Python
Data Warehousing with PythonData Warehousing with Python
Data Warehousing with Python
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case study
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineage
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 

Ähnlich wie Uber Business Metrics Generation and Management Through Apache Flink

Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger AnalyticsItzhak Kameli
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerSambit Banerjee
 
Slack in the Age of Prometheus
Slack in the Age of PrometheusSlack in the Age of Prometheus
Slack in the Age of PrometheusGeorge Luong
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringManageEngine, Zoho Corporation
 
Adaptive Scaling of Microgateways on Kubernetes
Adaptive Scaling of Microgateways on KubernetesAdaptive Scaling of Microgateways on Kubernetes
Adaptive Scaling of Microgateways on KubernetesWSO2
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareApache Apex
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupMingmin Chen
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsNECST Lab @ Politecnico di Milano
 
Autoscaling in kubernetes v1
Autoscaling in kubernetes v1Autoscaling in kubernetes v1
Autoscaling in kubernetes v1JurajHantk
 
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdfŁukasz Piątkowski
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerFederico Palladoro
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Etu Solution
 
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...Fatima Qayyum
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Kai Wähner
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 

Ähnlich wie Uber Business Metrics Generation and Management Through Apache Flink (20)

Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access Layer
 
Slack in the Age of Prometheus
Slack in the Age of PrometheusSlack in the Age of Prometheus
Slack in the Age of Prometheus
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoring
 
Adaptive Scaling of Microgateways on Kubernetes
Adaptive Scaling of Microgateways on KubernetesAdaptive Scaling of Microgateways on Kubernetes
Adaptive Scaling of Microgateways on Kubernetes
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
 
Autoscaling in Kubernetes
Autoscaling in KubernetesAutoscaling in Kubernetes
Autoscaling in Kubernetes
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
 
Autoscaling in kubernetes v1
Autoscaling in kubernetes v1Autoscaling in kubernetes v1
Autoscaling in kubernetes v1
 
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析
 
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 

Kürzlich hochgeladen

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 

Kürzlich hochgeladen (20)

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 

Uber Business Metrics Generation and Management Through Apache Flink

  • 1. Uber Business Metrics Generation and Management Through Apache Flink Uber Engineer at Uber Marketplace Wenrui Meng Miao Yu 2018 Flink Forward Asia
  • 2. Uber Mission We ignite opportunity by setting the world in motion. 我们构建交通服务平台,为全球点燃机遇之火.
  • 3. ● Marketplace ● Project background ● Architecture ● User case ● Challenges ● Future work Outline
  • 5. Marketplace Data ● Mission: Empower teams throughout Uber to understand and improve the efficiency of the marketplace by providing real time data for our Marketplace levers, modeling and analysis across 15 teams ● Data sources: trip lifecycle, matching, rider, driver, fare, mobile across 100+ Apache Kafka topics ● Data categories: trip, rider, driver, session, product, geo, time
  • 6. Marketplace Data Trip Flow Patterns Demand Patterns Human Consumption - Incentives Human Consumption - Regional Metrics
  • 8.
  • 10. ● Usage pattern ● Issues ● Solution Project Background
  • 11. ● Metrics are aggregated on the fly from raw data. ● Metric definitions have a DAG structure. ● Metrics are usually aggregated upon a limited set of dimensions, e.g., time, geo, and etc. Usage Pattern
  • 12. ● Resource waste ● Bad performance ● Inconsistency ● ... Issues
  • 13. Solution ● An E2E platform ○ Provides a unified DSL to define metrics for multiple datasources like Kafka, Elastic Search, and etc. ○ Based on metric definitions, optimizes execution plan based on objectives including performance, isolation, and etc. ○ Directly generates metrics, instead of generating an intermediate raw dataset and then aggregating on the fly.
  • 15. ● Formula: a DSL describing an aggregation or arithmetic operation over aggregation metric_A + metric_B * metric_C - avg((table_A where (exists(column_A) AND (column_B != 0))).column_C) ● Dimensions: over which this metric will be aggregated Time, geo, product type, and ... ● Sinks: physical storage of this metric Elastic search, Cassandra, MySQL, and ... Metric Definition
  • 16. 1. Parse all metric definitions 2. Build a DAG of the most basic Flink operations needed for generating these metrics. 3. Merging nodes and edges in operation DAG to form a new DAG of logical metric groups, to reduce duplication, promote isolation and improve performance. 4. Generate metric group config files and feed them into execution plan generator. Execution Optimizer
  • 17. 1. Parse metric group config files. 2. Given configs, build a DAG of stream operators. Within each stream operator, there could be also a DAG of event operators. 3. Applied pre-defined operator templates and generate the final Flink execution plan. Execution Plan Generator
  • 18. ● Given specific set of metric, export SQL for it. ● Given SQL query, analyze percentage overlap with existing data in the platform ● Convert the adhoc query to leverage existing data in platform ● Auto debugging to find root cause of data abnormality ● Metric discovery ● …. Functionalities
  • 19. ● Avoid inconsistency (only 1 source of truth) ● Avoid inefficiency in development and duplicate effort ● Avoid cluster resource waste by optimization framework proposed ● Avoid duplicate definition by detecting the duplication when adding or updating metric ● Improve the metric definition (ex if there is existing metrics A+B to define C, it’s not good to define from scratch) ● Decouple metrics definition and generation techniques, easy to adopt new techniques ● Decouple platform from storage Benefits
  • 20. Driver status summary of one minute: 1.Driver utilization per hexagon on 1 minute 2.Driver utilization per city on 1 minute 3.Driver total total worked minutes per hour 4.Open cars of each hexagon at 1PM in San Francisco Back-end service Apache Kafka message Pipeline sinks every 4 seconds Use Case Walkthrough
  • 21. There are two solutions that support these three use cases: 1. Store aggregation for each driver and hexagon based on one minute into Elasticsearch. Different service queries with additional on-the-fly aggregation. 2. Streaming jobs to calculate the exact data needed to feed into service or model. Streaming job maintained by different teams. Dimension Value Hexagon id Hex_1 Driver id Driver_1 Status Driving_client Others.. .. Use Case Walkthrough
  • 22. Driver team Location-update event Driver summary per minute Metric group 1: driver per hour Metric group 2: driver utilization hexagon Metric group 3: driver utilization city Driver_id, hex_id, status, others Metric group 4: total open car hexagon Pricing team Surge team Heath team Use Case Walkthrough Our new framework supports all these use cases but without any duplicate computations or inconsistency.
  • 23. Complex topology ● With the scale and size of Uber’s current metrics, we will encounter very complex topology once most of the metrics are onboarded onto this framework. ● We may need to cluster the tasks and split them into a few separate jobs, connected to Apache Kafka. Challenges
  • 24. Unstable sinks ● Each metrics group may have a list of different sinks including, Apache Hive, Apache Kafka, ELK, Cassandra, etc. These storage systems are owned by different teams at Uber, with different SLAs. ● Slow and unstable storage could bring down the whole pipeline SLA. It’s desirable to separate the sink from processor. At Uber, we have a dedicated publisher pipeline to fulfill the sink tasks with the ability to turn on/off specific sink. Challenges
  • 25. Metrics evolution ● User will add new metric and update the existing metric. How do we support this with minimum effort? ○ We need distinguish the golden sets from ad hoc cases. For the golden set, a bach of metrics changes are reviewed and put into pipeline. ○ For ad hoc use cases, a user can run the pipeline by themselves but won’t receive much support. Challenges
  • 26. Onboarding ● At Uber, there are 15 teams and 20+ services that use Gairos. It’s a big challenge to ask each client to change their way to consume data. ● We host our gateway to query data in Gairos. It’’s desirable to build conversion in gateway to convert the old way consuming data to new approach. Challenges
  • 27. ● Improve execution optimizer ● Improve tooling Future Work