STREAMING SQL
Jungtaek Lim
WHO AM I?
• Software Engineer @ Hortonworks
• remote worker
• Open source prosumer
• PMC member of Apache Storm
• Committer of Jedis
• Contributor to Apache projects (Spark, Zeppelin, Ambari, Calcite), Redis, and so on
• Contact: kabhwan@gmail.com
WHAT DO END USERS WANT?
• Performance < Ease of use
• Technological innovation keeps making things easier for end users
NOSQL ON HADOOP
MAPREDUCE SPARK SQL
NEXT FOR STREAMING?
SQL AGAIN!
STREAMING SQL
• Unbounded, real-time data
• Can't be fully covered by the SQL standard and requires new ideas
• No standard yet
• Apache Calcite proposes its own Streaming SQL
• https://calcite.apache.org/docs/stream.html
• Aggregations, stream-to-relation joins, and stream-to-stream joins are done within a window
• Most of the proposal is not implemented yet
COMPARISON
              Processing unit              SQL-like API  SQL             Streaming SQL                       Status
Apache Flink  Tuple                        O             O               Not yet                             Experimental
Apache Storm  Micro-batch (Trident based)  X             O (Pure style)  Not yet                             Experimental
Apache Spark  Micro-batch                  O             O               Only with Structured Streaming API  Alpha
SIMPLE USE CASE
1. Get JSON from Kafka
2. Filter error logs (status >= 400)
3. Project columns with a user-defined function and calculations
4. Store rows back to Kafka
STORM SQL STATEMENTS
CREATE FUNCTION GET_TIME AS 'org.apache.storm.sql.runtime.functions.scalar.datetime.GetTime2'

CREATE EXTERNAL TABLE APACHE_LOGS (id INT PRIMARY KEY, remote_ip VARCHAR, request_url VARCHAR,
  request_method VARCHAR, status VARCHAR, request_header_user_agent VARCHAR,
  time_received_utc_isoformat VARCHAR, time_us DOUBLE)
  LOCATION 'kafka://localhost:2181/brokers?topic=apachelogs'
  TBLPROPERTIES '{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.storm.kafka.ByteBufferSerializer"}}'

CREATE EXTERNAL TABLE APACHE_ERROR_LOGS (id INT PRIMARY KEY, remote_ip VARCHAR, request_url VARCHAR,
  request_method VARCHAR, status INT, request_header_user_agent VARCHAR,
  time_received_utc_isoformat VARCHAR, time_received_timestamp BIGINT, time_elapsed_ms INT)
  LOCATION 'kafka://localhost:2181/brokers?topic=apacheerrorlogs'
  TBLPROPERTIES '{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.storm.kafka.ByteBufferSerializer"}}'

INSERT INTO APACHE_ERROR_LOGS
  SELECT ID, REMOTE_IP, REQUEST_URL, REQUEST_METHOD, CAST(STATUS AS INT) AS STATUS_INT,
    REQUEST_HEADER_USER_AGENT, TIME_RECEIVED_UTC_ISOFORMAT,
    GET_TIME(TIME_RECEIVED_UTC_ISOFORMAT, 'yyyy-MM-dd''T''HH:mm:ssZZ') AS TIME_RECEIVED_TIMESTAMP,
    (TIME_US / 1000) AS TIME_ELAPSED_MS
  FROM APACHE_LOGS
  WHERE (CAST(STATUS AS INT) / 100) >= 4
(Figure: input topic and output topic)
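As a rough illustration of what the INSERT statement computes for each record, the Python sketch below applies the same filter and projection to one decoded JSON log entry. It does not use Storm or Kafka. The names `get_time` and `to_error_row` and the sample record are hypothetical, introduced only for this sketch, and `datetime.fromisoformat` merely stands in for the GET_TIME function.

```python
import json
from datetime import datetime


def get_time(iso_text):
    """Stand-in for GET_TIME: ISO-8601 text with a UTC offset -> epoch milliseconds."""
    return int(datetime.fromisoformat(iso_text).timestamp() * 1000)


def to_error_row(record):
    """Return the projected error row, or None when the WHERE clause rejects the record."""
    status = int(record["status"])  # CAST(STATUS AS INT)
    if status // 100 < 4:           # WHERE (CAST(STATUS AS INT) / 100) >= 4
        return None
    return {
        "id": record["id"],
        "remote_ip": record["remote_ip"],
        "request_url": record["request_url"],
        "request_method": record["request_method"],
        "status": status,
        "request_header_user_agent": record["request_header_user_agent"],
        "time_received_utc_isoformat": record["time_received_utc_isoformat"],
        "time_received_timestamp": get_time(record["time_received_utc_isoformat"]),
        "time_elapsed_ms": int(record["time_us"] / 1000),  # TIME_US / 1000 into an INT column
    }


# Hypothetical input message, shaped like the APACHE_LOGS columns.
sample = json.loads(
    '{"id": 1, "remote_ip": "10.0.0.1", "request_url": "/missing", '
    '"request_method": "GET", "status": "404", '
    '"request_header_user_agent": "curl", '
    '"time_received_utc_isoformat": "1970-01-01T00:00:01+00:00", '
    '"time_us": 2500.0}'
)
print(to_error_row(sample))
```

A record with status "200" yields None, which corresponds to the row being dropped by the WHERE clause.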
CALCITE PROPOSAL
https://calcite.apache.org/docs/stream.html
PROPOSAL
WINDOWING
SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime, productId
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId
HAVING COUNT(*) > 2 OR SUM(units) > 10;

rowtime productId
10:00:00 30
11:00:00 10
11:00:00 40
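
To make the grouping concrete, here is a minimal Python sketch of a one-hour tumbling window with the same HAVING condition. It runs over a finite list rather than a real stream, so it is a batch approximation, not how an engine would emit results. The Orders rows are taken from the join examples in this deck; `to_seconds`, `fmt`, and `tumble` are hypothetical helper names. Each result carries both the window start and the window end: TUMBLE_END in Calcite returns the end of the window, while the times in the result table above correspond to the window starts.

```python
# (rowtime, productId, orderId, units), copied from the Orders rows in the join examples.
ORDERS = [
    ("10:17:00", 30, 5, 4),
    ("10:17:05", 10, 6, 1),
    ("10:18:05", 20, 7, 2),
    ("10:18:07", 30, 8, 20),
    ("11:02:00", 10, 9, 6),
    ("11:04:00", 10, 10, 1),
    ("11:09:30", 40, 11, 12),
    ("11:24:11", 10, 12, 4),
]


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def fmt(seconds):
    """Convert seconds since midnight back to 'HH:MM:SS'."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def tumble(orders, size=3600):
    """Group by (window, productId); keep groups with COUNT(*) > 2 OR SUM(units) > 10."""
    groups = {}  # (window_start, productId) -> (count, total_units)
    for rowtime, product_id, _order_id, units in orders:
        start = to_seconds(rowtime) // size * size  # floor to the window start
        count, total = groups.get((start, product_id), (0, 0))
        groups[(start, product_id)] = (count + 1, total + units)
    return [
        (fmt(start), fmt(start + size), product_id)
        for (start, product_id), (count, total) in groups.items()
        if count > 2 or total > 10
    ]


for window_start, window_end, product_id in tumble(ORDERS):
    print(window_start, window_end, product_id)
```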
PROPOSAL
STREAM TO RELATION JOIN
SELECT STREAM o.rowtime, o.productId, o.orderId, o.units, p.name, p.unitPrice
FROM Orders AS o
JOIN Products AS p
  ON o.productId = p.productId;

rowtime productId orderId units name unitPrice
10:17:00 30 5 4 Cheese 17
10:17:05 10 6 1 Beer 0.25
10:18:05 20 7 2 Wine 6
10:18:07 30 8 20 Cheese 17
11:02:00 10 9 6 Beer 0.25
11:04:00 10 10 1 Beer 0.25
11:09:30 40 11 12 Bread 100
11:24:11 10 12 4 Beer 0.25
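
One way to picture this join: each arriving order is enriched by a lookup into the Products relation. The Python sketch below shows that idea under simplifying assumptions (Products fits in memory and does not change while the stream runs). The data comes from the table above; `PRODUCTS` and `enrich` are hypothetical names.

```python
# (rowtime, productId, orderId, units), as in the result table above.
ORDERS = [
    ("10:17:00", 30, 5, 4),
    ("10:17:05", 10, 6, 1),
    ("10:18:05", 20, 7, 2),
    ("10:18:07", 30, 8, 20),
    ("11:02:00", 10, 9, 6),
    ("11:04:00", 10, 10, 1),
    ("11:09:30", 40, 11, 12),
    ("11:24:11", 10, 12, 4),
]

# productId -> (name, unitPrice): the relation side of the join.
PRODUCTS = {
    30: ("Cheese", 17),
    10: ("Beer", 0.25),
    20: ("Wine", 6),
    40: ("Bread", 100),
}


def enrich(orders, products):
    """Yield one joined row per order, as each order arrives."""
    for rowtime, product_id, order_id, units in orders:
        product = products.get(product_id)
        if product is None:  # inner join: orders without a matching product are dropped
            continue
        name, unit_price = product
        yield (rowtime, product_id, order_id, units, name, unit_price)


for row in enrich(ORDERS, PRODUCTS):
    print(row)
```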
PROPOSAL
STREAM TO RELATION JOIN (CONT.)
SELECT STREAM *
FROM Orders AS o
JOIN ProductVersions AS p
  ON o.productId = p.productId
  AND o.rowtime BETWEEN p.startDate AND p.endDate;

- ProductVersions is a temporal (versioned) table
- The unit price of product 10 increases to 0.35 at 11:00

rowtime productId orderId units productId1 name unitPrice
10:17:00 30 5 4 30 Cheese 17
10:17:05 10 6 1 10 Beer 0.25
10:18:05 20 7 2 20 Wine 6
10:18:07 30 8 20 30 Cheese 17
11:02:00 10 9 6 10 Beer 0.35
11:04:00 10 10 1 10 Beer 0.35
11:09:30 40 11 12 40 Bread 100
11:24:11 10 12 4 10 Beer 0.35
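
The versioned lookup can be sketched in Python as follows. Each order is matched against the product version whose validity interval contains the order's rowtime. The startDate and endDate values below are assumptions made up for this sketch (the slide only says the Beer price changes at 11:00), and `PRODUCT_VERSIONS` and `temporal_join` are hypothetical names. Because BETWEEN is inclusive on both ends, the first Beer version ends at 10:59:59 so that no rowtime matches two versions.

```python
# (rowtime, productId, orderId, units), as in the result table above.
ORDERS = [
    ("10:17:00", 30, 5, 4),
    ("10:17:05", 10, 6, 1),
    ("10:18:05", 20, 7, 2),
    ("10:18:07", 30, 8, 20),
    ("11:02:00", 10, 9, 6),
    ("11:04:00", 10, 10, 1),
    ("11:09:30", 40, 11, 12),
    ("11:24:11", 10, 12, 4),
]

# (productId, name, unitPrice, startDate, endDate); the interval bounds are assumed.
PRODUCT_VERSIONS = [
    (30, "Cheese", 17, "00:00:00", "23:59:59"),
    (10, "Beer", 0.25, "00:00:00", "10:59:59"),
    (10, "Beer", 0.35, "11:00:00", "23:59:59"),
    (20, "Wine", 6, "00:00:00", "23:59:59"),
    (40, "Bread", 100, "00:00:00", "23:59:59"),
]


def temporal_join(orders, versions):
    """Join each order to the product version valid at the order's rowtime."""
    for rowtime, product_id, order_id, units in orders:
        for v_id, name, price, start, end in versions:
            # Zero-padded 'HH:MM:SS' strings compare correctly as text.
            if v_id == product_id and start <= rowtime <= end:
                yield (rowtime, product_id, order_id, units, v_id, name, price)


for row in temporal_join(ORDERS, PRODUCT_VERSIONS):
    print(row)
```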
PROPOSAL
STREAM TO STREAM JOIN
SELECT STREAM o.rowtime, o.productId, o.orderId, s.rowtime AS shipTime
FROM Orders AS o
JOIN Shipments AS s
  ON o.orderId = s.orderId
  AND s.rowtime BETWEEN o.rowtime AND o.rowtime + INTERVAL '1' HOUR;

rowtime productId orderId shipTime
10:17:00 30 5 10:55:00
10:17:05 10 6 10:20:00
11:02:00 10 9 11:58:00
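
The time bound in the join condition is what keeps the state finite: an order only has to be remembered for one hour. A minimal Python sketch of that idea follows, assuming both inputs can be merged in time order. The three matching shipments come from the result table above; the late shipment for order 7 is a hypothetical row added to show the bound at work, and `to_seconds` and `join_orders_shipments` are hypothetical names.

```python
# (rowtime, productId, orderId, units), as in the earlier Orders examples.
ORDERS = [
    ("10:17:00", 30, 5, 4),
    ("10:17:05", 10, 6, 1),
    ("10:18:05", 20, 7, 2),
    ("10:18:07", 30, 8, 20),
    ("11:02:00", 10, 9, 6),
    ("11:04:00", 10, 10, 1),
    ("11:09:30", 40, 11, 12),
    ("11:24:11", 10, 12, 4),
]

# (rowtime, orderId); the last row is hypothetical and arrives more than an hour late.
SHIPMENTS = [
    ("10:20:00", 6),
    ("10:55:00", 5),
    ("11:58:00", 9),
    ("11:30:00", 7),
]


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def join_orders_shipments(orders, shipments, window=3600):
    """Match each shipment to its order if it ships within `window` seconds."""
    # Merge both inputs by time; on ties, orders (kind 0) come before shipments (kind 1).
    events = [(to_seconds(o[0]), 0, o) for o in orders]
    events += [(to_seconds(s[0]), 1, s) for s in shipments]
    events.sort(key=lambda event: (event[0], event[1]))

    pending = {}  # orderId -> (order time in seconds, order row)
    matches = []
    for ts, kind, row in events:
        # Evict orders that can no longer match any future shipment.
        expired = [oid for oid, (ots, _) in pending.items() if ts - ots > window]
        for oid in expired:
            del pending[oid]
        if kind == 0:
            pending[row[2]] = (ts, row)
        else:
            found = pending.get(row[1])
            if found is not None:
                order = found[1]
                matches.append((order[0], order[1], order[2], row[0]))
    return matches


# Results are produced in shipment order; sort by order rowtime to compare with the table.
for match in sorted(join_orders_shipments(ORDERS, SHIPMENTS)):
    print(match)
```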
STILL NOT ENOUGH?
GUI
Drag and drop, configure, done!
img source: https://community.hortonworks.com/articles/8422/visualize-near-real-time-stock-price-changes-using.html
THANKS!
