Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
AthenaX: Streaming
Processing Platform
@Uber
Bill Liu, Haohui Mai
APRIL 11, 2017
Speakers
Bill Liu Haohui Mai, @wheat9
• Senior Software Engineer @ Uber
• PMC, Apache Hadoop & Storm
• Senior Software Eng...
• Uber: Transport A → B on demand
reliably
• Dynamic marketplace
• Example: UberEATS
Uber business is real-time
Challenges
Infra.: Reliability & scalability
• 99.99% SLA on latency
• At-least-once processing
• Billions of messages
• M...
Building streaming applications
Thrift
01001…
• Framework-specific
• Ad-hoc management over the life-
cycles
The AthenaX approach
SELECT AVG(…)
FROM eats_order
WHERE …
Write SQLs to build streaming applications
01001…
Generic table...
• Write SQLs to build streaming applications
• Insight: generic table
• Reliable, scalable processing based on Apache
Flin...
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Example
Real-time dashboard for restaurants
…
Time
AvgPrep.time
Time
…
SELECT meal_id, AVG(meal_prep_time)
FROM eats_order...
Example (cont.)
Building streaming processing applications with SQL
SELECT AVG(meal_prep_time) FROM
eats_order
GROUP BY me...
Example (cont.)
SELECT * FROM (
SELECT EXPECTED_TIME(meal_id)
AS e, meal_id,

AVG(meal_prep_time) AS t

FROM eats_order
GR...
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
The case of UberEATS
• Three-way marketplace
• Real-time metrics
• Estimated Time to Delivery (ETD)
• Transactions
• Deman...
The case of UberEATS
• Three-way marketplace
• Real-time metrics
• Estimated Time to Delivery (ETD)
• Transactions
• Deman...
Predicting the ETD
• Key metric: time to prepare a meal(tprep)
• Learn a function f: (order status) → tprep periodically
•...
Architecture of the ETD service
Prediction
service
Order status
(Kafka)
AthenaX
Data warehouse
Feature / Model
(Cassandra)...
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Architecture
SQL
Catalog
Query planner
Optimizer
Deployment
Monitoring
Flink job
AthenaX runtime
Flink on YARN
HDFS
Athena...
Executing AthenaX applications
• Compilation + Code generation
• Flink SQLAPIs: SQL → Logical plans → Flink
applications
•...
AthenaX as a self-serving platform
• Metadata / catalog management
• Job management
• Monitoring
• Resource management and...
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Current status
• Pilot jobs in production
• In the process of full-scale roll outs
• Based on Apache Flink 1.3-SNAPSHOT
• ...
Embrace the community
• Group window support for streaming SQL
• CALCITE-1603, CALCITE-1615
• FLINK-5624, FLINK-5710, FLIN...
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Conclusion
• AthenaX: write SQLs to build streaming applications
• Treat table as a generic concept
• Productivity: develo...
First & Last Name
Thank you
Compiling SQL
LogicalTableScan
LogicalProject
LogicalAggregate
LogicalProject
SELECT AVG(meal_prep_time)
FROM eats_order
G...
Lazy deserialization
Example of SQL optimization
SELECT
AVG(meal_prep_time)
FROM eats_order
Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX : Uber’s streaming processing platform on Flink
Nächste SlideShare
Wird geladen in …5
×

Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX : Uber’s streaming processing platform on Flink

753 Aufrufe

Veröffentlicht am

The mission of Uber is to make transportation as reliable as running water. The business is fundamentally driven by real-time data -- more than half of the employees in Uber, many of whom are non-technical, use SQL on a regular basis to analyze data and power their business decisions. We are building AthenaX, a stream processing platform built on top of Apache Flink to enable our users to write SQL to process real-time data efficiently and reliably at Uber's scale. Using Apache Calcite as query parser, AthenaX compiles the SQL down to Flink jobs. Leveraging Flink's unique streaming capabilities, AthenaX supports (1) consistent computations reliably thanks to at-least-once guarantees, (2) nontrivial analytics (e.g., windowing and joins) on multiple data sources, and (3) efficient and cost-effective executions in production through code generation and elastic scaling.

Veröffentlicht in: Daten & Analysen
  • Hello! Get Your Professional Job-Winning Resume Here - Check our website! https://vk.cc/818RFv
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier

Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX : Uber’s streaming processing platform on Flink

  1. 1. AthenaX: Streaming Processing Platform @Uber Bill Liu, Haohui Mai APRIL 11, 2017
  2. 2. Speakers Bill Liu Haohui Mai, @wheat9 • Senior Software Engineer @ Uber • PMC, Apache Hadoop & Storm • Senior Software Engineer @ Uber
  3. 3. • Uber: Transport A → B on demand reliably • Dynamic marketplace • Example: UberEATS Uber business is real-time
  4. 4. Challenges Infra.: Reliability & scalability • 99.99% SLA on latency • At-least-once processing • Billions of messages • Multiple PB / day Solutions: Productivity • Audiences: majority of employees use SQL actively • Abstractions: Flink / DSL? • Integrations: data management, monitoring, reporting, etc.
  5. 5. Building streaming applications Thrift 01001… • Framework-specific • Ad-hoc management over the life- cycles
  6. 6. The AthenaX approach SELECT AVG(…) FROM eats_order WHERE … Write SQLs to build streaming applications 01001… Generic tables Compilation Thrift• Decouple business logics with framework • Unified integration & management
  7. 7. • Write SQLs to build streaming applications • Insight: generic table • Reliable, scalable processing based on Apache Flink • Develop & deploy streaming applications in production in hours instead of weeks AthenaX: Streaming processing platform @ Uber
  8. 8. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  9. 9. Example Real-time dashboard for restaurants … Time AvgPrep.time Time … SELECT meal_id, AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL ‘1’ MINUTE, INTERVAL ‘15’ MINUTE)
  10. 10. Example (cont.) Building streaming processing applications with SQL SELECT AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL ‘1’ MINUTE, INTERVAL ‘15’ MINUTE)
  11. 11. Example (cont.) SELECT * FROM ( SELECT EXPECTED_TIME(meal_id) AS e, meal_id,
 AVG(meal_prep_time) AS t
 FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL ‘1’ MINUTE, INTERVAL ‘15’ MINUTE) Building streaming processing applications with SQL Tables are more generic than analytical stores RPC
  12. 12. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  13. 13. The case of UberEATS • Three-way marketplace • Real-time metrics • Estimated Time to Delivery (ETD) • Transactions • Demand forecasts
  14. 14. The case of UberEATS • Three-way marketplace • Real-time metrics • Estimated Time to Delivery (ETD) • Transactions • Demand forecasts
  15. 15. Predicting the ETD • Key metric: time to prepare a meal(tprep) • Learn a function f: (order status) → tprep periodically • Predict the ETD for current orders using f • AthenaX extracts features for both learnings and predictions
  16. 16. Architecture of the ETD service Prediction service Order status (Kafka) AthenaX Data warehouse Feature / Model (Cassandra) Online features Offline features Machinelearning SELECT AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL ‘1’ MINUTE,
  17. 17. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  18. 18. Architecture SQL Catalog Query planner Optimizer Deployment Monitoring Flink job AthenaX runtime Flink on YARN HDFS AthenaX Flink
  19. 19. Executing AthenaX applications • Compilation + Code generation • Flink SQLAPIs: SQL → Logical plans → Flink applications • Leverage the Volcano optimizer in Apache Calcite • Challenges: exposing streaming semantics Query planner Optimizer Deployment Monitoring Compile SQLs to Flink applications
  20. 20. AthenaX as a self-serving platform • Metadata / catalog management • Job management • Monitoring • Resource management and elastic scaling • Failure recovery Query planner Optimizer Deployment Monitoring Self-serving production support end-to-end
  21. 21. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  22. 22. Current status • Pilot jobs in production • In the process of full-scale roll outs • Based on Apache Flink 1.3-SNAPSHOT • Projection, filtering, group windows, UDF • Streaming joins not yet supported
  23. 23. Embrace the community • Group window support for streaming SQL • CALCITE-1603, CALCITE-1615 • FLINK-5624, FLINK-5710, FLINK-6011, FLINK-6012 • Stability fixes • FLINK-3679, FLINK-5631 • Table abstractions for Cassandra / JDBC (WIP) • Available in the upcoming 1.3 release Contributions to the upstream
  24. 24. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  25. 25. Conclusion • AthenaX: write SQLs to build streaming applications • Treat table as a generic concept • Productivity: development → production in hours • The AthenaX approach • SQL on streams as a platform • Self-serving production support end-to-end
  26. 26. First & Last Name Thank you
  27. 27. Compiling SQL LogicalTableScan LogicalProject LogicalAggregate LogicalProject SELECT AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL ‘1’ MINUTE, val eats = getEatsOrder() eats.window(Slide .over(“15.minutes”) .every(“1.minute”))  .avg(“meal_prep_time”) Parsing DataStreamScan DataStreamCalc DataStreamAggregate DataStreamCalc Planning 01001…
  28. 28. Lazy deserialization Example of SQL optimization SELECT AVG(meal_prep_time) FROM eats_order

×