SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Streaming your Lyft Ride Prices
Flink Forward, San Francisco, April 2nd 2019
Akshay Balwally | Engineer, Pricing
Thomas Weise | @thweise | Engineer, Streaming Platform
go.lyft.com/streaming-prices-ffsf-2019
Agenda
2
● Introduction to dynamic pricing
● Legacy pricing infrastructure
● Streaming based infrastructure
● Beam & multiple languages
● Beam Flink runner
● Lessons learned
3
Dynamic Pricing
Supply/Demand curve
ETA
Pricing
Notifications
Detect Delays
Coupons
User Delight
Fraud
Behaviour Fingerprinting
Monetary Impact
Imperative to act fast
Top Destinations
Core Experience
Introduction to
Dynamic Pricing
4
5
● Dynamic Pricing- price changes minutely at each
location bucket
● Why?
○ At face value, dynamic pricing is strange
○ But Lyft’s marketplace changes quickly
Dynamic Pricing- What and Why?
The Marketplace affects Prices
● An Imbalanced Market is Inefficient
○ Too many available drivers: bad
○ Too few available drivers: bad
○ Solution: Price lever controls passenger
request rate, which maintains healthy
supply levels
● Result: increase price if demand >> supply
6
What is PrimeTime?
● Belief: There exists some set of
optimal price multipliers per
location/time bucket
● PrimeTime- Lyft product that sets a
multiplier for each gh6 each
minute
● Example: In ‘9q8yyv’, at 5:01pm
PST, PrimeTime = 2.0
● Scale: Roughly 3 million
geohashes prices every minute
7
Why is PrimeTime Hard?
● 1. Need low-latency information about supply and demand
● 2. Pricing is an unsupervised problem- correct answer is never observed
● Solution: break the problem into multiple models that form a DAG, where
intermediate models are solving supervised problems
○ Example: f (available_supply) -> pickup_times
8
Legacy Pricing
Infrastructure
9
Legacy architecture: A series of cron jobs
● Ingest high volume of client app events
(Kinesis, KCL)
● Compute features (e.g. demand,
conversation rate, supply) from events
● Run ML models on features to compute
primetime for all regions (per min, per gh6)
SFO, calendar_min_1: {gh6: 1.0, gh6: 2.0, ...}
NYC: calendar_min_1: {gh6, 2.0, gh6: 1.0, ...}
10
Problems
1. Latency
2. Code complexity (LOC)
3. Hard to add new features involving windowing/join (i.e. arbitrary demand
windows, subregional computation)
4. No dynamic / smart triggers
11
Can we use Flink?
12
13
Streaming Stack
13
Streaming
Application
(SQL, Java)
Stream / Schema
Registry
Deployment
Tooling
Metrics &
Dashboards
Alerts Logging
Amazon
EC2
Amazon S3 Wavefront
Salt
(Config / Orca)
Docker
Source Sink
14
Streaming and Python
● Flink and many other big data ecosystem projects are Java / JVM based
○ Team wants to adopt streaming, but doesn’t have the Java skills
○ Jython != Python
● Use cases for different language environments
○ Python primary option for Machine Learning
● Cost of many API styles and runtime environments
15
Python via Beam
Streaming
Application
(Python/Beam)
Source Sink
Streaming based
Pricing Infrastructure
16
17
Pipeline (conceptual outline)
kinesis events
(source)
aggregate and
window
filter events
run models to
generate
features
(culminating in
PT)
internal services
redis
ride_requested,
app_open, ...
unique_users_per_min,
unique_requests_per_5_
min, ...
conversion learner,
eta learner, ...
Lyft apps
(phones)
valid sessions,
dedupe, ...
Details of implementation
1. Filtering (with internal service calls)
2. Aggregation with Beam windowing: 1min, 5min (by event time)
3. Triggers: watermark, data-driven (stateful processing)
4. Join multiple streams: CoGroup or stateful processing
5. Machine learning models invoked within Beam transforms
6. Final gh6:pt output from pipeline stored to Redis
18
Gains
• Latency: 3 minutes -> 30s
‒ Latency now dominated by model execution
• Reuse of model code
• 10K => 4K LOC
• 300 => 120 AWS instances
19
Beam and multiple
languages
20
21
The Beam Vision
1. End users: who want to write pipelines in a
language that’s familiar.
2. SDK writers: who want to make Beam
concepts available in new languages.
Includes IOs: connectors to data stores.
3. Runner writers: who have a distributed
processing environment and want to
support Beam pipelines
Beam Model: Fn Runners
Apache
Flink
Apache
Spark
Beam Model: Pipeline Construction
Other
LanguagesBeam Java
Beam
Python
Execution Execution
Cloud
Dataflow
Execution
https://s.apache.org/apache-beam-project-overview
22
Multi-Language Support
● Initially Java SDK and Java Runners
● 2016: Start of cross-language support effort
● 2017: Python SDK on Dataflow
● 2018: Go SDK (for portable runners)
● 2018: Python on Flink MVP
● Next: Cross-language pipelines, more portable runners
23
Python Example
p = beam.Pipeline(runner=runner, options=pipeline_options)
(p
| ReadFromText("/path/to/text*") | Map(lambda line: ...)
| WindowInto(FixedWindows(120)
trigger=AfterWatermark(
early=AfterProcessingTime(60),
late=AfterCount(1))
accumulation_mode=ACCUMULATING)
| CombinePerKey(sum))
| WriteToText("/path/to/outputs")
)
result = p.run()
( What, Where, When, How )
24
⋮
input | Sum.PerKey()
Python
input.apply(
Sum.integersPerKey())
Java
SELECT key, SUM(value)
FROM input GROUP BY key
SQL (via Java)
⋮
Cloud Dataflow
Apache Spark
Apache Flink
Apache Apex
Gearpump
Apache Samza
Apache Nemo
(incubating)
IBM Streams
Sum Per Key
Java objects
Sum Per Key
Dataflow JSON API
Portability (originally)
https://s.apache.org/state-of-beam-sfo-2018
25
⋮
input | Sum.PerKey()
Python
stats.Sum(s, input)
Go
SELECT key, SUM(value)
FROM input GROUP BY key
SQL (via Java)
⋮
input.apply(
Sum.integersPerKey())
Java Apache Spark
Apache Flink
Apache Apex
Gearpump
Cloud Dataflow
Apache Samza
Apache Nemo
(incubating)
IBM Streams
Sum Per Key
Java objects
Sum Per Key
Portable protos
Portability (current)
https://s.apache.org/state-of-beam-sfo-2018
Beam Flink Runner
26
27
Portable Flink Runner
SDK
(Python)
Job Service
Artifact
Staging
Job Manager
Fn Services
(Beam Flink Task)
Task Manager
Executor / Fn API
Provision Control Data
Artifact
Retrieval
State Logging
gRPC
Pipeline (protobuf)
ClusterRunner
Dependencies
(optional)
python -m
apache_beam.examples.wordcount 
--input=/etc/profile 
--output=/tmp/py-wordcount-direct 
--runner=PortableRunner 
--job_endpoint=localhost:8099 
--streaming
Staging Location
(DFS, S3, …)
SDK Worker
(UDFs)
SDK Worker
(UDFs)
SDK Worker
(Python)
Flink Job
28
Lyft Flink Runner Customizations
● Translator extension for streaming sources
○ Kinesis, Kafka consumers that we also use in Java Flink jobs
○ Message decoding, watermarks
● Python execution environment for SDK workers
○ Tailored to internal deployment tooling
○ Docker-free, frozen virtual envs
● https://github.com/lyft/beam/tree/release-2.11.0-lyft
Robert Bradshaw, Beam Summit London, 2018
Workers have Fn
Runner
Runner worker launches services.
Control
Service
Data
Service
State
Service
Logging
Service
30
Fn API
How slow is this ?
● Fn API Overhead 15% ?
● Fused stages
● Bundle size
● Parallel SDK workers
● TODO: Cython
● protobuf C++ bindings
decode, …, window count
(messages
| 'reshuffle' >> beam.Reshuffle()
| 'decode' >> beam.Map(lambda x: (__import__('random').randint(0, 511), 1))
| 'noop1' >> beam.Map(lambda x : x)
| 'noop2' >> beam.Map(lambda x : x)
| 'noop3' >> beam.Map(lambda x : x)
| 'window' >> beam.WindowInto(window.GlobalWindows(),
trigger=Repeatedly(AfterProcessingTime(5 * 1000)),
accumulation_mode= AccumulationMode.DISCARDING)
| 'group' >> beam.GroupByKey()
| 'count' >> beam.Map(count)
)
31
Fast enough for real Python work !
● c5.4xlarge machines (16 vCPU, 32 GB)
● 16 SDK workers / machine
● 1000 ms or 1000 records / bundle
● 280,000 transforms / second / machine (~ 17,500 per worker)
● Python user code will be gating factor
32
Beam Portability Recap
● Pipelines written in non-JVM languages on JVM runners
○ Python, Go on Flink (and others)
● Full isolation of user code
○ Native CPython execution w/o library restrictions
● Flexible SDK worker execution
○ Docker, Process, Embedded, ...
● Multiple languages in a single pipeline (WIP)
○ Use Java Beam IO with Python
○ Use TFX with Java
○ <your use case here>
33
Feature Support Matrix (Beam 2.11.0)
https://s.apache.org/apache-beam-portability-support-table
Lessons Learned
34
Lessons Learned
• Python Beam SDK and portable Flink runner evolving
• Keep pipeline simple - Flink tasks / shuffles are not free
• Stateful processing is essential for complex logic
• Model execution latency matters
• Instrument everything for monitoring
• Think about pipeline restart and upgrade
• Mind your dependencies - rate limit API calls
• Long running Python processes may expose memory leaks
35
We’re Hiring! Apply at www.lyft.com/careers
or email data-recruiting@lyft.com
Data Engineering
Engineering Manager
San Francisco
Software Engineer
San Francisco, Seattle, &
New York City
Data Infrastructure
Engineering Manager
San Francisco
Software Engineer
San Francisco & Seattle
Experimentation
Software Engineer
San Francisco
Streaming
Software Engineer
San Francisco
Observability
Software Engineer
San Francisco
Please ask questions!
This presentation:
http://go.lyft.com/streaming-prices-ffsf-2019

Weitere ähnliche Inhalte

Was ist angesagt?

Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...
Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...
Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...Flink Forward
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Thomas Weise
 
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Thomas Weise
 
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...Flink Forward
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 
Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...
Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...
Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...Flink Forward
 
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...Flink Forward
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward
 
Dynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streamingDynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streamingAmar Pai
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksEron Wright
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward
 
Python Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on FlinkPython Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on FlinkAljoscha Krettek
 
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward
 
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...Flink Forward
 
Kubernetes + Operator + PaaSTA = Flink @ Yelp - Antonio Verardi, Yelp
Kubernetes + Operator + PaaSTA = Flink @ Yelp -  Antonio Verardi, YelpKubernetes + Operator + PaaSTA = Flink @ Yelp -  Antonio Verardi, Yelp
Kubernetes + Operator + PaaSTA = Flink @ Yelp - Antonio Verardi, YelpFlink Forward
 
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward
 
Flink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
Flink Forward Berlin 2017: Patrick Lucas - Flink in ContainerlandFlink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
Flink Forward Berlin 2017: Patrick Lucas - Flink in ContainerlandFlink Forward
 
Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...
Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...
Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...Flink Forward
 

Was ist angesagt? (20)

Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...
Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...
Flink Forward San Francisco 2019: Elastic Data Processing with Apache Flink a...
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
 
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
 
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...
Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...
Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine...
 
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
 
Dynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streamingDynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streaming
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & Tricks
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
 
Python Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on FlinkPython Streaming Pipelines with Beam on Flink
Python Streaming Pipelines with Beam on Flink
 
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
 
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
 
Kubernetes + Operator + PaaSTA = Flink @ Yelp - Antonio Verardi, Yelp
Kubernetes + Operator + PaaSTA = Flink @ Yelp -  Antonio Verardi, YelpKubernetes + Operator + PaaSTA = Flink @ Yelp -  Antonio Verardi, Yelp
Kubernetes + Operator + PaaSTA = Flink @ Yelp - Antonio Verardi, Yelp
 
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
 
Flink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
Flink Forward Berlin 2017: Patrick Lucas - Flink in ContainerlandFlink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
Flink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
 
Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...
Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...
Flink Forward Berlin 2017: Dominik Bruhn - Deploying Flink Jobs as Docker Con...
 

Ähnlich wie Streaming Lyft Ride Prices with Beam and Flink

The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...Karthik Murugesan
 
Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...
Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...
Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...Flink Forward
 
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Aljoscha Krettek
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesPaolo Corti
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Bowen Li
 
Service Mesh with Envoy and Istio
Service Mesh with Envoy and IstioService Mesh with Envoy and Istio
Service Mesh with Envoy and IstioArvind Thangamani
 
P4_tutorial.pdf
P4_tutorial.pdfP4_tutorial.pdf
P4_tutorial.pdfPramodhN3
 
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...apidays
 
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021StreamNative
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computingTimothy Spann
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)Yuuki Takano
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonStephan Ewen
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)Apache Apex
 
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Taiwan User Group
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward
 
Powering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonPowering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonTatiana Al-Chueyr
 
CNCF Singapore - Introduction to Envoy
CNCF Singapore - Introduction to EnvoyCNCF Singapore - Introduction to Envoy
CNCF Singapore - Introduction to EnvoyHarish
 

Ähnlich wie Streaming Lyft Ride Prices with Beam and Flink (20)

The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...
 
Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...
Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...
Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin...
 
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queues
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
 
Service Mesh with Envoy and Istio
Service Mesh with Envoy and IstioService Mesh with Envoy and Istio
Service Mesh with Envoy and Istio
 
P4_tutorial.pdf
P4_tutorial.pdfP4_tutorial.pdf
P4_tutorial.pdf
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
 
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
Onnc intro
Onnc introOnnc intro
Onnc intro
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)
 
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
Powering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonPowering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and Python
 
CNCF Singapore - Introduction to Envoy
CNCF Singapore - Introduction to EnvoyCNCF Singapore - Introduction to Envoy
CNCF Singapore - Introduction to Envoy
 

Mehr von Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Mehr von Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Kürzlich hochgeladen

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Kürzlich hochgeladen (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Streaming Lyft Ride Prices with Beam and Flink

  • 1. Streaming your Lyft Ride Prices Flink Forward, San Francisco, April 2nd 2019 Akshay Balwally | Engineer, Pricing Thomas Weise | @thweise | Engineer, Streaming Platform go.lyft.com/streaming-prices-ffsf-2019
  • 2. Agenda 2 ● Introduction to dynamic pricing ● Legacy pricing infrastructure ● Streaming based infrastructure ● Beam & multiple languages ● Beam Flink runner ● Lessons learned
  • 3. 3 Dynamic Pricing Supply/Demand curve ETA Pricing Notifications Detect Delays Coupons User Delight Fraud Behaviour Fingerprinting Monetary Impact Imperative to act fast Top Destinations Core Experience
  • 5. 5 ● Dynamic Pricing- price changes minutely at each location bucket ● Why? ○ At face value, dynamic pricing is strange ○ But Lyft’s marketplace changes quickly Dynamic Pricing- What and Why?
  • 6. The Marketplace affects Prices ● An Imbalanced Market is Inefficient ○ Too many available drivers: bad ○ Too few available drivers: bad ○ Solution: Price lever controls passenger request rate, which maintains healthy supply levels ● Result: increase price if demand >> supply 6
  • 7. What is PrimeTime? ● Belief: There exists some set of optimal price multipliers per location/time bucket ● PrimeTime- Lyft product that sets a multiplier for each gh6 each minute ● Example: In ‘9q8yyv’, at 5:01pm PST, PrimeTime = 2.0 ● Scale: Roughly 3 million geohashes prices every minute 7
  • 8. Why is PrimeTime Hard? ● 1. Need low-latency information about supply and demand ● 2. Pricing is an unsupervised problem- correct answer is never observed ● Solution: break the problem into multiple models that form a DAG, where intermediate models are solving supervised problems ○ Example: f (available_supply) -> pickup_times 8
  • 10. Legacy architecture: A series of cron jobs ● Ingest high volume of client app events (Kinesis, KCL) ● Compute features (e.g. demand, conversation rate, supply) from events ● Run ML models on features to compute primetime for all regions (per min, per gh6) SFO, calendar_min_1: {gh6: 1.0, gh6: 2.0, ...} NYC: calendar_min_1: {gh6, 2.0, gh6: 1.0, ...} 10
  • 11. Problems 1. Latency 2. Code complexity (LOC) 3. Hard to add new features involving windowing/join (i.e. arbitrary demand windows, subregional computation) 4. No dynamic / smart triggers 11
  • 12. Can we use Flink? 12
  • 13. 13 Streaming Stack 13 Streaming Application (SQL, Java) Stream / Schema Registry Deployment Tooling Metrics & Dashboards Alerts Logging Amazon EC2 Amazon S3 Wavefront Salt (Config / Orca) Docker Source Sink
  • 14. 14 Streaming and Python ● Flink and many other big data ecosystem projects are Java / JVM based ○ Team wants to adopt streaming, but doesn’t have the Java skills ○ Jython != Python ● Use cases for different language environments ○ Python primary option for Machine Learning ● Cost of many API styles and runtime environments
  • 17. 17 Pipeline (conceptual outline) kinesis events (source) aggregate and window filter events run models to generate features (culminating in PT) internal services redis ride_requested, app_open, ... unique_users_per_min, unique_requests_per_5_ min, ... conversion learner, eta learner, ... Lyft apps (phones) valid sessions, dedupe, ...
  • 18. Details of implementation 1. Filtering (with internal service calls) 2. Aggregation with Beam windowing: 1min, 5min (by event time) 3. Triggers: watermark, data-driven (stateful processing) 4. Join multiple streams: CoGroup or stateful processing 5. Machine learning models invoked within Beam transforms 6. Final gh6:pt output from pipeline stored to Redis 18
  • 19. Gains • Latency: 3 minutes -> 30s ‒ Latency now dominated by model execution • Reuse of model code • 10K => 4K LOC • 300 => 120 AWS instances 19
  • 21. 21 The Beam Vision 1. End users: who want to write pipelines in a language that’s familiar. 2. SDK writers: who want to make Beam concepts available in new languages. Includes IOs: connectors to data stores. 3. Runner writers: who have a distributed processing environment and want to support Beam pipelines Beam Model: Fn Runners Apache Flink Apache Spark Beam Model: Pipeline Construction Other LanguagesBeam Java Beam Python Execution Execution Cloud Dataflow Execution https://s.apache.org/apache-beam-project-overview
  • 22. 22 Multi-Language Support ● Initially Java SDK and Java Runners ● 2016: Start of cross-language support effort ● 2017: Python SDK on Dataflow ● 2018: Go SDK (for portable runners) ● 2018: Python on Flink MVP ● Next: Cross-language pipelines, more portable runners
  • 23. 23 Python Example p = beam.Pipeline(runner=runner, options=pipeline_options) (p | ReadFromText("/path/to/text*") | Map(lambda line: ...) | WindowInto(FixedWindows(120) trigger=AfterWatermark( early=AfterProcessingTime(60), late=AfterCount(1)) accumulation_mode=ACCUMULATING) | CombinePerKey(sum)) | WriteToText("/path/to/outputs") ) result = p.run() ( What, Where, When, How )
  • 24. 24 ⋮ input | Sum.PerKey() Python input.apply( Sum.integersPerKey()) Java SELECT key, SUM(value) FROM input GROUP BY key SQL (via Java) ⋮ Cloud Dataflow Apache Spark Apache Flink Apache Apex Gearpump Apache Samza Apache Nemo (incubating) IBM Streams Sum Per Key Java objects Sum Per Key Dataflow JSON API Portability (originally) https://s.apache.org/state-of-beam-sfo-2018
  • 25. 25 ⋮ input | Sum.PerKey() Python stats.Sum(s, input) Go SELECT key, SUM(value) FROM input GROUP BY key SQL (via Java) ⋮ input.apply( Sum.integersPerKey()) Java Apache Spark Apache Flink Apache Apex Gearpump Cloud Dataflow Apache Samza Apache Nemo (incubating) IBM Streams Sum Per Key Java objects Sum Per Key Portable protos Portability (current) https://s.apache.org/state-of-beam-sfo-2018
  • 27. 27 Portable Flink Runner SDK (Python) Job Service Artifact Staging Job Manager Fn Services (Beam Flink Task) Task Manager Executor / Fn API Provision Control Data Artifact Retrieval State Logging gRPC Pipeline (protobuf) ClusterRunner Dependencies (optional) python -m apache_beam.examples.wordcount --input=/etc/profile --output=/tmp/py-wordcount-direct --runner=PortableRunner --job_endpoint=localhost:8099 --streaming Staging Location (DFS, S3, …) SDK Worker (UDFs) SDK Worker (UDFs) SDK Worker (Python) Flink Job
  • 28. 28 Lyft Flink Runner Customizations ● Translator extension for streaming sources ○ Kinesis, Kafka consumers that we also use in Java Flink jobs ○ Message decoding, watermarks ● Python execution environment for SDK workers ○ Tailored to internal deployment tooling ○ Docker-free, frozen virtual envs ● https://github.com/lyft/beam/tree/release-2.11.0-lyft
  • 29. Robert Bradshaw, Beam Summit London, 2018 Workers have Fn Runner Runner worker launches services. Control Service Data Service State Service Logging Service
  • 30. 30 Fn API How slow is this ? ● Fn API Overhead 15% ? ● Fused stages ● Bundle size ● Parallel SDK workers ● TODO: Cython ● protobuf C++ bindings decode, …, window count (messages | 'reshuffle' >> beam.Reshuffle() | 'decode' >> beam.Map(lambda x: (__import__('random').randint(0, 511), 1)) | 'noop1' >> beam.Map(lambda x : x) | 'noop2' >> beam.Map(lambda x : x) | 'noop3' >> beam.Map(lambda x : x) | 'window' >> beam.WindowInto(window.GlobalWindows(), trigger=Repeatedly(AfterProcessingTime(5 * 1000)), accumulation_mode= AccumulationMode.DISCARDING) | 'group' >> beam.GroupByKey() | 'count' >> beam.Map(count) )
  • 31. 31 Fast enough for real Python work ! ● c5.4xlarge machines (16 vCPU, 32 GB) ● 16 SDK workers / machine ● 1000 ms or 1000 records / bundle ● 280,000 transforms / second / machine (~ 17,500 per worker) ● Python user code will be gating factor
  • 32. 32 Beam Portability Recap ● Pipelines written in non-JVM languages on JVM runners ○ Python, Go on Flink (and others) ● Full isolation of user code ○ Native CPython execution w/o library restrictions ● Flexible SDK worker execution ○ Docker, Process, Embedded, ... ● Multiple languages in a single pipeline (WIP) ○ Use Java Beam IO with Python ○ Use TFX with Java ○ <your use case here>
  • 33. 33 Feature Support Matrix (Beam 2.11.0) https://s.apache.org/apache-beam-portability-support-table
  • 35. Lessons Learned • Python Beam SDK and portable Flink runner evolving • Keep pipeline simple - Flink tasks / shuffles are not free • Stateful processing is essential for complex logic • Model execution latency matters • Instrument everything for monitoring • Think about pipeline restart and upgrade • Mind your dependencies - rate limit API calls • Long running Python processes may expose memory leaks 35
  • 36. We’re Hiring! Apply at www.lyft.com/careers or email data-recruiting@lyft.com Data Engineering Engineering Manager San Francisco Software Engineer San Francisco, Seattle, & New York City Data Infrastructure Engineering Manager San Francisco Software Engineer San Francisco & Seattle Experimentation Software Engineer San Francisco Streaming Software Engineer San Francisco Observability Software Engineer San Francisco
  • 37. Please ask questions! This presentation: http://go.lyft.com/streaming-prices-ffsf-2019