SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Fabian Hueske
Flink Forward
Sep 12, 2016
Taking a look under the hood of
Apache Flink’s® relational APIs
DataStream API is not for Everyone
 Writing DataStream programs is not easy
 Requires Knowledge & Skill
• Stream processing concepts (time, state, windows, triggers, ...)
• Programming experience (Java / Scala)
 Program logic goes into UDFs
• great for expressiveness
• bad for optimization - need for manual tuning
2https://www.flickr.com/photos/scottvanderchijs/3630946389, CC BY 2.0
What are relational APIs?
 Relational APIs are declarative
• User says what is needed.
• System decides how to compute it.
 Users do not specify implementation.
 Queries are efficiently executed!
3
Agenda
 Relational Queries for streaming and batch data
 Flink’s Relational APIs
 Query Translation Step-by-Step
 Current State & Outlook
4
Relational Queries for
Streaming and Batch Data
5
Flink = Streaming and Batch
 Flink is a platform for distributed stream and batch data
processing
 Relational APIs for streaming and batch tables
• Queries on batch tables terminate and produce a finite result
• Queries on streaming tables run continuously and produce result
stream
 Same syntax & semantics for streaming and batch queries
6
Streaming Queries
 Implementing streaming applications is challenging
• Only some people have the skills
 Stream processing technology spreads rapidly
• There is a talent gap
 Lack of OS systems that support SQL on parallel streams
 Relational APIs will make this technology more accessible
7
Streaming Queries
 Consistent results require event-time processing
• Results must only depend on input data
 Not all relational operators can be naively applied
on streams
• Aggregations, joins, and set operators require windows
• Sorting is restricted
 We can make it work with some extensions & restrictions!
8
Batch Queries
 Relational queries on batch tables?
• Are you kidding? Yet another SQL-on-Hadoop solution?
 Easing application development is primary goal
• Simple things should be simple
• Built-in (SQL) functions supersede UDFs
• Better integration of data sources
 Not intended to compete with dedicated SQL engines
9
Flink’s Relational APIs
10
Relational APIs in Flink
 Flink features two relational APIs
• Table API (since Flink 0.9.0)
• SQL (since Flink 1.1.0)
 Equivalent feature set (at the moment)
• Table API and SQL can be mixed
 Both are tightly integrated with Flink’s core APIs
• DataStream
• DataSet
Table API
 Language INtegrated Query (LINQ) API
• Queries are not embedded as String
 Centered around Table objects
• Operations are applied on Tables and return a Table
 Available in Java and Scala
Table API Example (streaming)
val sensorData: DataStream[(String, Long, Double)] = ???
// convert DataSet into Table
val sensorTable: Table = sensorData
.toTable(tableEnv, 'location, ’time, 'tempF)
// define query on Table
val avgTempCTable: Table = sensorTable
.groupBy('location)
.window(Tumble over 1.days on 'rowtime as 'w)
.select('w.start as 'day, 'location,
(('tempF.avg - 32) * 0.556) as 'avgTempC)
.where('location like "room%")
SQL
 Standard SQL
 Queries are embedded as Strings into programs
 Referenced tables must be registered
 Queries return a Table object
• Integration with Table API
SQL Example (batch)
// define & register external Table
val sensorTable: new CsvTableSource(
"/path/to/data",
Array("location", "day", "tempF"), // column names
Array(String, String, Double)) // column types
tableEnv.registerTableSource("sensorData", sensorTable)
// query registered Table
val avgTempCTable: Table = tableEnv
.sql("""
SELECT day, location, AVG((tempF - 32) * 0.556) AS avgTempC
FROM sensorData
WHERE location LIKE 'room%'
GROUP BY day, location""")
Query Translation Step-by-
Step
16
2 APIs [SQL, Table API]
*
2 backends [DataStream, DataSet]
=
4 different translation paths?
17
Nope!
18
What is Apache Calcite® ?
 Apache Calcite is a SQL parsing and query optimizer framework
 Used by many other projects to parse and optimize SQL queries
• Apache Drill, Apache Hive, Apache Kylin, Cascading, …
• … and so does Flink
 The Calcite community put Streaming SQL on their agenda
• Extension to standard SQL
• Committer Julian Hyde gave a talk about Streaming SQL this morning
19
Architecture Overview
20
 Table API and SQL queries are
translated into common logical
plan representation.
 Logical plans are translated and
optimized depending on
execution backend.
 Plans are transformed into
DataSet or DataStream
programs.
Catalog
 Table definitions required for parsing, validation,
and optimization of queries
• Tables, columns, and data types
 Tables are registered in Calcite’s catalog
 Tables can be created from
• DataSets
• DataStreams
• TableSources (without going through DataSet/DataStream API)
21
Table API to Logical Plan
 API calls are translated into logical operators
and immediately validated
 API operators compose a tree
 Before optimization, the API operator tree is translated
into a logical Calcite plan
22
Table API to Logical Plan
sensorTable
.groupBy('location)
.window(Tumble over 1.days on 'rowtime as 'w)
.select('w.start as 'day, 'location,
(('tempF.avg - 32) * 0.556) as 'avgTempC)
.where('location like "room%")
23
SQL Query to Logical Plan
 Calcite parses and validates SQL queries
• Table & attribute names
• Input and return types of expressions
• …
 Calcite translates parse tree into logical plan
• Same representation as for Table API queries
24
SQL Query to Logical Plan
SELECT day, location,
AVG((tempF - 32) * 0.556) AS avgTempC
FROM sensorData
WHERE location LIKE 'room%’
GROUP BY day, location
25
Query Optimization
 Calcite features a Volcano-style optimizer
• Rule-based plan transformations
• Cost-based plan choices
 Calcite provides many optimization rules
 Custom rules to transform logical nodes into Flink nodes
• DataSet rules to translate batch queries
• DataStream rules to translate streaming queries
26
Query Optimization
27
Flink Plan to Flink Program
 Flink nodes translate themselves into DataStream or
DataSet operators
 User functions are code generated
• Expressions, conditions, built-in functions, …
 Code is generated as String
• Shipped in user-function and compiled at worker
• Janino Compiler Framework
 Batch and streaming queries share code generation logic
28
Flink Plan to Flink Program
29
Execution
 Generated operators are
embedded in DataStream or
DataSet programs.
 DataSet programs are also
optimized by Flink’s DataSet
optimizer
 Holistic execution
30
Current State & Outlook
31
Current State
 Flink 1.1 features Table API & SQL on Calcite
 Streaming SQL & Table API support
• Selection, Projection, Union
 Batch SQL & Table API support
• Selection, Projection, Sort
• Inner & Outer Equi-Joins, Set operations
32
Outlook: Streaming Table API & SQL
 Streaming Aggregates
• Table API (aiming for Flink 1.2)
• Streaming SQL (Calcite community is working on this)
 Joins
• Windowed Stream - Stream Joins
• [Static Table, Slow Stream] – Stream Joins
 More TableSource and Sinks
33
General Improvements
 Extend Code Generation
• Optimized data types
• Specialized serializers and comparators
• Aggregation functions
 More SQL functions and support for UDFs
 Stand-alone SQL client
34
Contributions welcome
 There is still a lot to do
• New operators and features
• Performance improvements
• Tooling and integration
 Get in touch and start contributing!
35
Summary
 Relational APIs for streaming and batch data
• Language-integrated Table API
• Standard SQL (for batch and stream tables)
 Joint optimization (Calcite) and code generation
 Execution as DataStream or DataSet programs
 Stream analytics for everyone!
36

Weitere ähnliche Inhalte

Was ist angesagt?

Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingFlink Forward
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestDataGyula Fóra
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Stephan Ewen
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4Till Rohrmann
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon PresentationGyula Fóra
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward
 
January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016Robert Metzger
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Apache Flink Taiwan User Group
 
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by FlinkFlink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by FlinkFlink Forward
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkRobert Metzger
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkKostas Tzoumas
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkFlink Forward
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Flink Forward
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsStephan Ewen
 

Was ist angesagt? (20)

Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon Presentation
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4
 
January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by FlinkFlink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
Flink Forward Berlin 2017: Hao Wu - Large Scale User Behavior Analytics by Flink
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 

Andere mochten auch

Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Till Rohrmann
 
Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkStream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinTill Rohrmann
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords Stephan Ewen
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereFlink Forward
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Flink Forward
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkVerverica
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streamsRadu Tudoran
 
Juggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataJuggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataFabian Hueske
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxiniMonal Daxini
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkFlink Forward
 
Apache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupApache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupRobert Metzger
 
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
Timo Walther - Table & SQL API - unified APIs for batch and stream processingTimo Walther - Table & SQL API - unified APIs for batch and stream processing
Timo Walther - Table & SQL API - unified APIs for batch and stream processingVerverica
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextVerverica
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Till Rohrmann
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits allJulian Hyde
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Ververica
 
Microservices using relocatable Docker containers
Microservices using relocatable Docker containersMicroservices using relocatable Docker containers
Microservices using relocatable Docker containersMauricio Garavaglia
 

Andere mochten auch (20)

Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
 
Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkStream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
Juggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataJuggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary data
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
 
Apache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupApache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin Meetup
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
Timo Walther - Table & SQL API - unified APIs for batch and stream processingTimo Walther - Table & SQL API - unified APIs for batch and stream processing
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Microservices using relocatable Docker containers
Microservices using relocatable Docker containersMicroservices using relocatable Docker containers
Microservices using relocatable Docker containers
 

Ähnlich wie Taking a look under the hood of Apache Flink's relational APIs.

Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices Zalando Technology
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices ZalandoHayley
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online TrainingLearntek1
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-finalMaryann Xue
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureGyula Fóra
 
Apache flink 1.7 and Beyond
Apache flink 1.7 and BeyondApache flink 1.7 and Beyond
Apache flink 1.7 and BeyondTill Rohrmann
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseDataWorks Summit
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersLucidworks
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021ssuser8ccb5a
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxcamyla81
 

Ähnlich wie Taking a look under the hood of Apache Flink's relational APIs. (20)

Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Linq
LinqLinq
Linq
 
Apache Flink Online Training
Apache Flink Online TrainingApache Flink Online Training
Apache Flink Online Training
 
Apache flink
Apache flinkApache flink
Apache flink
 
Apache flink
Apache flinkApache flink
Apache flink
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Apache flink
Apache flinkApache flink
Apache flink
 
Apache flink 1.7 and Beyond
Apache flink 1.7 and BeyondApache flink 1.7 and Beyond
Apache flink 1.7 and Beyond
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
 

Mehr von Fabian Hueske

Flink's Journey from Academia to the ASF
Flink's Journey from Academia to the ASFFlink's Journey from Academia to the ASF
Flink's Journey from Academia to the ASFFabian Hueske
 
Why and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache FlinkWhy and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache FlinkFabian Hueske
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Fabian Hueske
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityFabian Hueske
 
Apache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated QueriesApache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated QueriesFabian Hueske
 
Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!Fabian Hueske
 
Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015Fabian Hueske
 

Mehr von Fabian Hueske (8)

Flink SQL in Action
Flink SQL in ActionFlink SQL in Action
Flink SQL in Action
 
Flink's Journey from Academia to the ASF
Flink's Journey from Academia to the ASFFlink's Journey from Academia to the ASF
Flink's Journey from Academia to the ASF
 
Why and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache FlinkWhy and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache Flink
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Apache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated QueriesApache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated Queries
 
Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!
 
Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015
 

Kürzlich hochgeladen

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Kürzlich hochgeladen (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Taking a look under the hood of Apache Flink's relational APIs.

  • 1. Fabian Hueske Flink Forward Sep 12, 2016 Taking a look under the hood of Apache Flink’s® relational APIs
  • 2. DataStream API is not for Everyone  Writing DataStream programs is not easy  Requires Knowledge & Skill • Stream processing concepts (time, state, windows, triggers, ...) • Programming experience (Java / Scala)  Program logic goes into UDFs • great for expressiveness • bad for optimization - need for manual tuning 2https://www.flickr.com/photos/scottvanderchijs/3630946389, CC BY 2.0
  • 3. What are relational APIs?  Relational APIs are declarative • User says what is needed. • System decides how to compute it.  Users do not specify implementation.  Queries are efficiently executed! 3
  • 4. Agenda  Relational Queries for streaming and batch data  Flink’s Relational APIs  Query Translation Step-by-Step  Current State & Outlook 4
  • 6. Flink = Streaming and Batch  Flink is a platform for distributed stream and batch data processing  Relational APIs for streaming and batch tables • Queries on batch tables terminate and produce a finite result • Queries on streaming tables run continuously and produce result stream  Same syntax & semantics for streaming and batch queries 6
  • 7. Streaming Queries  Implementing streaming applications is challenging • Only some people have the skills  Stream processing technology spreads rapidly • There is a talent gap  Lack of OS systems that support SQL on parallel streams  Relational APIs will make this technology more accessible 7
  • 8. Streaming Queries  Consistent results require event-time processing • Results must only depend on input data  Not all relational operators can be naively applied on streams • Aggregations, joins, and set operators require windows • Sorting is restricted  We can make it work with some extensions & restrictions! 8
  • 9. Batch Queries  Relational queries on batch tables? • Are you kidding? Yet another SQL-on-Hadoop solution?  Easing application development is primary goal • Simple things should be simple • Built-in (SQL) functions supersede UDFs • Better integration of data sources  Not intended to compete with dedicated SQL engines 9
  • 11. Relational APIs in Flink  Flink features two relational APIs • Table API (since Flink 0.9.0) • SQL (since Flink 1.1.0)  Equivalent feature set (at the moment) • Table API and SQL can be mixed  Both are tightly integrated with Flink’s core APIs • DataStream • DataSet
  • 12. Table API  Language INtegrated Query (LINQ) API • Queries are not embedded as String  Centered around Table objects • Operations are applied on Tables and return a Table  Available in Java and Scala
  • 13. Table API Example (streaming) val sensorData: DataStream[(String, Long, Double)] = ??? // convert DataSet into Table val sensorTable: Table = sensorData .toTable(tableEnv, 'location, ’time, 'tempF) // define query on Table val avgTempCTable: Table = sensorTable .groupBy('location) .window(Tumble over 1.days on 'rowtime as 'w) .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC) .where('location like "room%")
  • 14. SQL  Standard SQL  Queries are embedded as Strings into programs  Referenced tables must be registered  Queries return a Table object • Integration with Table API
  • 15. SQL Example (batch) // define & register external Table val sensorTable: new CsvTableSource( "/path/to/data", Array("location", "day", "tempF"), // column names Array(String, String, Double)) // column types tableEnv.registerTableSource("sensorData", sensorTable) // query registered Table val avgTempCTable: Table = tableEnv .sql(""" SELECT day, location, AVG((tempF - 32) * 0.556) AS avgTempC FROM sensorData WHERE location LIKE 'room%' GROUP BY day, location""")
  • 17. 2 APIs [SQL, Table API] * 2 backends [DataStream, DataSet] = 4 different translation paths? 17
  • 19. What is Apache Calcite® ?  Apache Calcite is a SQL parsing and query optimizer framework  Used by many other projects to parse and optimize SQL queries • Apache Drill, Apache Hive, Apache Kylin, Cascading, … • … and so does Flink  The Calcite community put Streaming SQL on their agenda • Extension to standard SQL • Committer Julian Hyde gave a talk about Streaming SQL this morning 19
  • 20. Architecture Overview 20  Table API and SQL queries are translated into common logical plan representation.  Logical plans are translated and optimized depending on execution backend.  Plans are transformed into DataSet or DataStream programs.
  • 21. Catalog  Table definitions required for parsing, validation, and optimization of queries • Tables, columns, and data types  Tables are registered in Calcite’s catalog  Tables can be created from • DataSets • DataStreams • TableSources (without going through DataSet/DataStream API) 21
  • 22. Table API to Logical Plan  API calls are translated into logical operators and immediately validated  API operators compose a tree  Before optimization, the API operator tree is translated into a logical Calcite plan 22
  • 23. Table API to Logical Plan sensorTable .groupBy('location) .window(Tumble over 1.days on 'rowtime as 'w) .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC) .where('location like "room%") 23
  • 24. SQL Query to Logical Plan  Calcite parses and validates SQL queries • Table & attribute names • Input and return types of expressions • …  Calcite translates parse tree into logical plan • Same representation as for Table API queries 24
  • 25. SQL Query to Logical Plan SELECT day, location, AVG((tempF - 32) * 0.556) AS avgTempC FROM sensorData WHERE location LIKE 'room%’ GROUP BY day, location 25
  • 26. Query Optimization  Calcite features a Volcano-style optimizer • Rule-based plan transformations • Cost-based plan choices  Calcite provides many optimization rules  Custom rules to transform logical nodes into Flink nodes • DataSet rules to translate batch queries • DataStream rules to translate streaming queries 26
  • 28. Flink Plan to Flink Program  Flink nodes translate themselves into DataStream or DataSet operators  User functions are code generated • Expressions, conditions, built-in functions, …  Code is generated as String • Shipped in user-function and compiled at worker • Janino Compiler Framework  Batch and streaming queries share code generation logic 28
  • 29. Flink Plan to Flink Program 29
  • 30. Execution  Generated operators are embedded in DataStream or DataSet programs.  DataSet programs are also optimized by Flink’s DataSet optimizer  Holistic execution 30
  • 31. Current State & Outlook 31
  • 32. Current State  Flink 1.1 features Table API & SQL on Calcite  Streaming SQL & Table API support • Selection, Projection, Union  Batch SQL & Table API support • Selection, Projection, Sort • Inner & Outer Equi-Joins, Set operations 32
  • 33. Outlook: Streaming Table API & SQL  Streaming Aggregates • Table API (aiming for Flink 1.2) • Streaming SQL (Calcite community is working on this)  Joins • Windowed Stream - Stream Joins • [Static Table, Slow Stream] – Stream Joins  More TableSource and Sinks 33
  • 34. General Improvements  Extend Code Generation • Optimized data types • Specialized serializers and comparators • Aggregation functions  More SQL functions and support for UDFs  Stand-alone SQL client 34
  • 35. Contributions welcome  There is still a lot to do • New operators and features • Performance improvements • Tooling and integration  Get in touch and start contributing! 35
  • 36. Summary  Relational APIs for streaming and batch data • Language-integrated Table API • Standard SQL (for batch and stream tables)  Joint optimization (Calcite) and code generation  Execution as DataStream or DataSet programs  Stream analytics for everyone! 36