SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Windowing in Apache Apex
Yogi Devendra
yogidevendra@apache.org
(with comparison to micro-batch)
Agenda
● Windowing : Why? What?
● Example
● Window sizes : Apex terminologies
● Windowing : Internals
● Windowing : Operator callbacks
● Rolling statistics using sliding windows
● Comparison:
○ Apex windowing with micro-batches
2
Image ref [4]3
Calculate Amount of water
Image ref [5]4
Streams?
Windowing: Why?
● Data in motion ⇒ Unbounded datasets[1]
○ No beginning, No end
● Compute expects finite data
● Failure recovery requires book keeping
● We need some frame of reference for tracking
5
Windowing: What?
● Data is flowing w.r.t time
● Computers understands time
● Use time axis as a reference
● Break the stream into finite time slices
⇒ Streaming Windows
6
Example 1
7 Image ref [6]
Example 1a : %change
8
● Input :
○ Stream A = Stock price
○ Stream B = Index price
● Output : Stream C = %Change difference
○ %change (Stock) - %change(Index)
○ 1 data point per sec (Max over 1 sec)
● Window size for this operation is 1 sec
Example 1b: %change, avg
9
● Input : A = Stock price, B = Index price
● Output :
○ Stream C = % Change difference
■ Max(%change (Stock) - %change(Index)) 1 point per sec
○ Stream D = Avg stock price over 1 min
■ 1 data point per min
● Window size for Avg operation is 1 min
Apex Computation model (recap)[9]
● Directed Acyclic Graph ⇒ Application [DAG]
● Nodes ⇒ Computation units [Operators]
● Edges ⇒ Sequence of data tuples [Streams]
10
Filtered
Stream
Output StreamTuple Tuple
FilteredStream
Enriched
Stream
Enriched
Stream
er
Operator
er
Operator
er
Operator
er
Operator
Application
11
Operator Operation Output stream Window Size
Percent
change
%change (Stock) -
%change(Index)
Stream C 1 sec
Avg price Avg over 1 min Stream D 1 min
Input
Adapter
Percent
change
Avg.
Price
Index price
Stock price
(1 per sec)
(1 per min)
12
Apex terminologies
● Streaming window size
○ What is smallest time slice
to be considered for this
application?
● Application window count
○ How many streaming
windows does this operator
take to complete one unit of
work?
35mm
20mm
least count
= 1mm
Terms explained: Example 1b %change, avg
13
● Streaming window size
○ Smallest time slice ⇒ 1 sec
● Application window count
○ Percent change ⇒ 1 sec = 1 streaming window
○ Avg. price ⇒ 1 min = 60 streaming window
Input
Adapter
Percent
change
Avg.
Price
Index price
Stock price
(1 per sec)
(1 per min)
Streaming window size
● Application level configuration
● Platform default = 500 ms
● Platform default is good enough for most
applications
14
● Operator level configuration
● Platform default = 1
● If the operator is not doing special operations
over multiple streaming window
⇒ use default
Application window count
15
Configuring windowing parameters
16
dag.setAttribute(
DAGContext.STREAMING_WINDOW_SIZE_MILLIS, 1000);
dag.setAttribute(operator,
DAGContext.APPLICATION_WINDOW_COUNT,60);
● Setting Streaming window size to 1 sec
● Setting application window count to 60
streaming windows
Windows at input adapters
17
Container (for input adapter)
Begin Window
(Streaming window)
control tuple
Data Tuple
End Window
(Streaming window)
control tuple
Window
N
Buffer Server
Window
N+1
Window
Generator
Control tuples
Input
Adapter Data tuples
Input Node
Typical window
Windows at Operators
18
Container
Control tuples
Operator
Data
tuples
Generic Node
Window
N
Buffer Server
Window
N+1
Begin Window
Incoming Data
Tuple
Outgoing Data
Tuple
Data
tuples
End Window
Tuples flowing in stream
19
Input
Operator
Operator 1 Operator 2 Operator 3
Begin Window Data Tuple End Window
WNWN+1WN+2
As
time
progress
Windowing : Operator callbacks
20
● If operator wish to do some processing at
window level:
○ Configure APPLICATION_WINDOW_COUNT
○ Implement :
■ beginWindow(long windowId)
■ endWindow()
Windowing : Operator callbacks (continued)
21
● Platform wraps operators inside Node
(InputNode, GenericNode)
○ looks at the control tuples for streaming windows
boundaries
○ Invokes operator beginWindow(), endWindow()
based on APPLICATION_WINDOW_COUNT
Examples: Per window operations
22
● Aggregate computations
○ Avg over last 1 min
○ Max over last 1 sec
● Writing to external store in batch
○ Data written file system e.g. HDFS
Example 2 : Rolling statistics
23
● Twitter trends
○ show top 10 URLs mentioned in the tweets
○ Results over last 5 mins
○ Update results every half second
Application
24
● Input : Stream of tweet samples
● Output : Top 10 trending URLs
○ over last 5 mins
○ emit results every half second
Twitter
Input
URL
extractor
Unique
URL
Counter
Top N
counter
Sliding windows
25
● Rolling statistics
○ Results over last X windows
○ Emit results after every M windows.
WN-2WN-1WNWN+1WN+2
Windowed statistics
26
Slide-by Window count [11]
27
● Operator level configuration
● App developer should specify:
○ After how many streaming windows should
operator emit rolling statistics?
○ How to merge results across windows (unifier)
● Value between : 1 to APPLICATION_WINDOW_COUNT
● Default
○ Turned off : Tumbling window
○ Non-overlapping stats for each window
Example 2: Configuration
28
● Application
○ Smallest time slice ⇒ half second
STREAMING_WINDOW_SIZE = 500 ms
● SMA Operator
○ Rolling stats over ⇒ 5 min
APPLICATION_WINDOW_COUNT = 600
○ Emit frequency ⇒ half second
SLIDE_BY_WINDOW_COUNT = 1
Slide-by Window count (continued)
29
<property>
<name>dt.application.ApplicationName.operator.OperatorName
.attr.SLIDE_BY_WINDOW_COUNT</name>
<value>1</value>
</property>
dag.setAttribute(operator,
DAGContext.SLIDE_BY_WINDOW_COUNT,1);
Comparison with micro-batch
30
Gol gappa ⇒ micro-batch
image ref [8]
Gol gappa ⇒ Streaming windows
image ref [7]
Apex windows : Highlights
31
● Apex streaming windows
○ Streams ⇒ divided into time slices
○ Window ⇒ markers added to stream
○ Records ⇒ do not wait for window end
● Uses
○ Engine ⇒ Book keeping
○ Operators ⇒ Custom aggregates on windows
Micro-batch engines
32
● Micro-batch
○ Streams ⇒ divided into small size batches
○ Micro-batches ⇒ processed separately
○ Each record ⇒ waits till micro-batch is ready for
further processing.
● Example : Spark streaming
Comparison
33
Micro batch engines Apex streaming windows
Waiting time Records waits till micro-batch
is ready for further processing
Records do not wait for
end of window
Additional latency Artificial latency introduced
because of records waiting for
micro-batch boundaries
No additional latency
involved. Records are
immediately forwarded to
next stage of processing.
Limits Sub-second latencies only for
simple applications.System
with multiple network shuffle
leads multi-seconds latencies.
[14]
Even latencies like 2ms
achievable [13]
34
Questions
Image ref [2]
35
Resources
36
● Apache Apex Page
○ http://apex.incubator.apache.org
● Mailing Lists
○ dev@apex.incubator.apache.org
○ users@apex.incubator.apache.org
● Repository
○ https://github.com/apache/incubator-apex-core
○ https://github.com/apache/incubator-apex-malhar
● Issue Tracking
○ https://issues.apache.org/jira/browse/APEXCORE
○ https://issues.apache.org/jira/browse/APEXMALHAR
● @ApacheApex
● /groups/7020520
References
1. Thank You | planwallpaper http://www.planwallpaper.com/thank-you
2. Question | clipartpanda http://www.clipartpanda.com/clipart_images/how-to-answer-the-question-46954146
3. Streaming 101 | oreilly http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html
4. Swimming Pool Design | homesthetics http://homesthetics.net/backyard-landscaping-ideas-swimming-pool-design/
5. Mountain Stream | freebigpictures http://freebigpictures.com/river-pictures/mountain-stream/
6. Yahoo Finance http://finance.yahoo.com/
7. Crispy Chaat | grabhouse http://grabhouse.com/urbancocktail/11-crispy-chaat-joints-food-lovers-hyderabad/
8. Paani puri stall | citiyshor http://www.cityshor.com/pune/food/street-food/camp/murali-paani-puri-stall/
9. Application Developement | DataTorrent http://docs.datatorrent.com/application_development/
10. Malhar demos | Apache apex malhar | https://github.com/apache/incubator-apex-
malhar/blob/master/demos/yahoofinance/src/main/java/com/datatorrent/demos/yahoofinance/YahooFinanceApplication.java
11. Malhar demos | Apache apex malhar https://github.com/apache/incubator-apex-
malhar/blob/master/demos/twitter/src/main/java/com/datatorrent/demos/twitter/TwitterTopCounterApplication.java
12. https://github.com/apache/incubator-apex-malhar/blob/master/demos/yahoofinance/src/main/resources/META-INF/properties.xml#L28
13. ilganeli | slideshare http://www.slideshare.net/ilganeli/nextgen-decision-making-in-under-2ms
14. teamblog | cakesolutions http://www.cakesolutions.net/teamblogs/spark-streaming-tricky-parts
37

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsThomas Weise
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop PlatformApache Apex
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingApache Apex
 
Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)Apache Apex
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupThomas Weise
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareApache Apex
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacApache Apex
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream APIApache Apex
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
Fault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexFault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexApache Apex Organizer
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexApache Apex
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Bhupesh Chawda
 
Apache Apex Introduction with PubMatic
Apache Apex Introduction with PubMaticApache Apex Introduction with PubMatic
Apache Apex Introduction with PubMaticApache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentApache Apex
 
Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processingYogi Devendra Vyavahare
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 

Was ist angesagt? (20)

Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data Processing
 
Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application Meetup
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Apex as yarn application
Apex as yarn applicationApex as yarn application
Apex as yarn application
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream API
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
Fault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexFault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache Apex
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016
 
Apache Apex Introduction with PubMatic
Apache Apex Introduction with PubMaticApache Apex Introduction with PubMatic
Apache Apex Introduction with PubMatic
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App Development
 
Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processing
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
 

Andere mochten auch

Real-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexReal-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexApache Apex
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingApache Apex
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)Apache Apex
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataApache Apex
 
Capital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msCapital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msApache Apex
 
Цветочные легенды
Цветочные легендыЦветочные легенды
Цветочные легендыNinel Kek
 
Римский корсаков снегурочка
Римский корсаков снегурочкаРимский корсаков снегурочка
Римский корсаков снегурочкаNinel Kek
 
High Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSHigh Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSJonathan Oliver
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationApache Apex
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Apache Apex
 
правописание приставок урок№4
правописание приставок урок№4правописание приставок урок№4
правописание приставок урок№4HomichAlla
 
бсп (обоб. урок)
бсп (обоб. урок)бсп (обоб. урок)
бсп (обоб. урок)HomichAlla
 
Troubleshooting mysql-tutorial
Troubleshooting mysql-tutorialTroubleshooting mysql-tutorial
Troubleshooting mysql-tutorialjames tong
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Spark Summit
 
The 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy CodeThe 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy CodeRoberto Cortez
 
Hadoop basic commands
Hadoop basic commandsHadoop basic commands
Hadoop basic commandsbispsolutions
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application Apache Apex
 
Build your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyBuild your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyPangoly
 

Andere mochten auch (20)

Real-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexReal-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache Apex
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
Capital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msCapital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 ms
 
Цветочные легенды
Цветочные легендыЦветочные легенды
Цветочные легенды
 
Римский корсаков снегурочка
Римский корсаков снегурочкаРимский корсаков снегурочка
Римский корсаков снегурочка
 
High Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSHigh Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRS
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
 
правописание приставок урок№4
правописание приставок урок№4правописание приставок урок№4
правописание приставок урок№4
 
бсп (обоб. урок)
бсп (обоб. урок)бсп (обоб. урок)
бсп (обоб. урок)
 
Troubleshooting mysql-tutorial
Troubleshooting mysql-tutorialTroubleshooting mysql-tutorial
Troubleshooting mysql-tutorial
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
 
The 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy CodeThe 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy Code
 
Hadoop File System Shell Commands,
Hadoop File System Shell Commands,Hadoop File System Shell Commands,
Hadoop File System Shell Commands,
 
Hadoop basic commands
Hadoop basic commandsHadoop basic commands
Hadoop basic commands
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
 
Build your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyBuild your shiny new pc, with Pangoly
Build your shiny new pc, with Pangoly
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 

Ähnlich wie Windowing in Apache Apex

Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streamingdatamantra
 
Peer sim (p2p network)
Peer sim (p2p network)Peer sim (p2p network)
Peer sim (p2p network)Hein Min Htike
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpBowen Li
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Golinuxlab_conf
 
Etienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runnerEtienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runnerEtienne Chauchot
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Training Webinar: Detect Performance Bottlenecks of Applications
Training Webinar: Detect Performance Bottlenecks of ApplicationsTraining Webinar: Detect Performance Bottlenecks of Applications
Training Webinar: Detect Performance Bottlenecks of ApplicationsOutSystems
 
Parallel programing in web applications - public.pptx
Parallel programing in web applications - public.pptxParallel programing in web applications - public.pptx
Parallel programing in web applications - public.pptxGuy Bary
 
FIWARE Developers Week_BootcampWeBUI_presentation1
FIWARE Developers Week_BootcampWeBUI_presentation1FIWARE Developers Week_BootcampWeBUI_presentation1
FIWARE Developers Week_BootcampWeBUI_presentation1FIWARE
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Timo Walther
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)KafkaZone
 
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Noam Elfanbaum
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...NETWAYS
 
Story of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streamingStory of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streaminglohitvijayarenu
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...NETWAYS
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...Amazon Web Services
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...Flink Forward
 

Ähnlich wie Windowing in Apache Apex (20)

Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
Peer sim (p2p network)
Peer sim (p2p network)Peer sim (p2p network)
Peer sim (p2p network)
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUp
 
Nexmark with beam
Nexmark with beamNexmark with beam
Nexmark with beam
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
 
Etienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runnerEtienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runner
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Training Webinar: Detect Performance Bottlenecks of Applications
Training Webinar: Detect Performance Bottlenecks of ApplicationsTraining Webinar: Detect Performance Bottlenecks of Applications
Training Webinar: Detect Performance Bottlenecks of Applications
 
Parallel programing in web applications - public.pptx
Parallel programing in web applications - public.pptxParallel programing in web applications - public.pptx
Parallel programing in web applications - public.pptx
 
FIWARE Developers Week_BootcampWeBUI_presentation1
FIWARE Developers Week_BootcampWeBUI_presentation1FIWARE Developers Week_BootcampWeBUI_presentation1
FIWARE Developers Week_BootcampWeBUI_presentation1
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)
 
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Story of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streamingStory of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streaming
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
 

Mehr von Apache Apex

Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Apache Apex
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnApache Apex
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data HadoopApache Apex
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsApache Apex
 
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentIngesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentApache Apex
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)Apache Apex
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexApache Apex
 
Apache Apex & Bigtop
Apache Apex & BigtopApache Apex & Bigtop
Apache Apex & BigtopApache Apex
 
Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex ApplicationApache Apex
 

Mehr von Apache Apex (13)

Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
 
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentIngesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
 
Apache Apex & Bigtop
Apache Apex & BigtopApache Apex & Bigtop
Apache Apex & Bigtop
 
Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex Application
 

Kürzlich hochgeladen

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 

Kürzlich hochgeladen (20)

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 

Windowing in Apache Apex

  • 1. Windowing in Apache Apex Yogi Devendra yogidevendra@apache.org (with comparison to micro-batch)
  • 2. Agenda ● Windowing : Why? What? ● Example ● Window sizes : Apex terminologies ● Windowing : Internals ● Windowing : Operator callbacks ● Rolling statistics using sliding windows ● Comparison: ○ Apex windowing with micro-batches 2
  • 3. Image ref [4]3 Calculate Amount of water
  • 5. Windowing: Why? ● Data in motion ⇒ Unbounded datasets[1] ○ No beginning, No end ● Compute expects finite data ● Failure recovery requires book keeping ● We need some frame of reference for tracking 5
  • 6. Windowing: What? ● Data is flowing w.r.t time ● Computers understands time ● Use time axis as a reference ● Break the stream into finite time slices ⇒ Streaming Windows 6
  • 8. Example 1a : %change 8 ● Input : ○ Stream A = Stock price ○ Stream B = Index price ● Output : Stream C = %Change difference ○ %change (Stock) - %change(Index) ○ 1 data point per sec (Max over 1 sec) ● Window size for this operation is 1 sec
  • 9. Example 1b: %change, avg 9 ● Input : A = Stock price, B = Index price ● Output : ○ Stream C = % Change difference ■ Max(%change (Stock) - %change(Index)) 1 point per sec ○ Stream D = Avg stock price over 1 min ■ 1 data point per min ● Window size for Avg operation is 1 min
  • 10. Apex Computation model (recap)[9] ● Directed Acyclic Graph ⇒ Application [DAG] ● Nodes ⇒ Computation units [Operators] ● Edges ⇒ Sequence of data tuples [Streams] 10 Filtered Stream Output StreamTuple Tuple FilteredStream Enriched Stream Enriched Stream er Operator er Operator er Operator er Operator
  • 11. Application 11 Operator Operation Output stream Window Size Percent change %change (Stock) - %change(Index) Stream C 1 sec Avg price Avg over 1 min Stream D 1 min Input Adapter Percent change Avg. Price Index price Stock price (1 per sec) (1 per min)
  • 12. 12 Apex terminologies ● Streaming window size ○ What is smallest time slice to be considered for this application? ● Application window count ○ How many streaming windows does this operator take to complete one unit of work? 35mm 20mm least count = 1mm
  • 13. Terms explained: Example 1b %change, avg 13 ● Streaming window size ○ Smallest time slice ⇒ 1 sec ● Application window count ○ Percent change ⇒ 1 sec = 1 streaming window ○ Avg. price ⇒ 1 min = 60 streaming window Input Adapter Percent change Avg. Price Index price Stock price (1 per sec) (1 per min)
  • 14. Streaming window size ● Application level configuration ● Platform default = 500 ms ● Platform default is good enough for most applications 14
  • 15. ● Operator level configuration ● Platform default = 1 ● If the operator is not doing special operations over multiple streaming window ⇒ use default Application window count 15
  • 16. Configuring windowing parameters 16 dag.setAttribute( DAGContext.STREAMING_WINDOW_SIZE_MILLIS, 1000); dag.setAttribute(operator, DAGContext.APPLICATION_WINDOW_COUNT,60); ● Setting Streaming window size to 1 sec ● Setting application window count to 60 streaming windows
  • 17. Windows at input adapters 17 Container (for input adapter) Begin Window (Streaming window) control tuple Data Tuple End Window (Streaming window) control tuple Window N Buffer Server Window N+1 Window Generator Control tuples Input Adapter Data tuples Input Node Typical window
  • 18. Windows at Operators 18 Container Control tuples Operator Data tuples Generic Node Window N Buffer Server Window N+1 Begin Window Incoming Data Tuple Outgoing Data Tuple Data tuples End Window
  • 19. Tuples flowing in stream 19 Input Operator Operator 1 Operator 2 Operator 3 Begin Window Data Tuple End Window WNWN+1WN+2 As time progress
  • 20. Windowing : Operator callbacks 20 ● If operator wish to do some processing at window level: ○ Configure APPLICATION_WINDOW_COUNT ○ Implement : ■ beginWindow(long windowId) ■ endWindow()
  • 21. Windowing : Operator callbacks (continued) 21 ● Platform wraps operators inside Node (InputNode, GenericNode) ○ looks at the control tuples for streaming windows boundaries ○ Invokes operator beginWindow(), endWindow() based on APPLICATION_WINDOW_COUNT
  • 22. Examples: Per window operations 22 ● Aggregate computations ○ Avg over last 1 min ○ Max over last 1 sec ● Writing to external store in batch ○ Data written file system e.g. HDFS
  • 23. Example 2 : Rolling statistics 23 ● Twitter trends ○ show top 10 URLs mentioned in the tweets ○ Results over last 5 mins ○ Update results every half second
  • 24. Application 24 ● Input : Stream of tweet samples ● Output : Top 10 trending URLs ○ over last 5 mins ○ emit results every half second Twitter Input URL extractor Unique URL Counter Top N counter
  • 25. Sliding windows 25 ● Rolling statistics ○ Results over last X windows ○ Emit results after every M windows. WN-2WN-1WNWN+1WN+2
  • 27. Slide-by Window count [11] 27 ● Operator level configuration ● App developer should specify: ○ After how many streaming windows should operator emit rolling statistics? ○ How to merge results across windows (unifier) ● Value between : 1 to APPLICATION_WINDOW_COUNT ● Default ○ Turned off : Tumbling window ○ Non-overlapping stats for each window
  • 28. Example 2: Configuration 28 ● Application ○ Smallest time slice ⇒ half second STREAMING_WINDOW_SIZE = 500 ms ● SMA Operator ○ Rolling stats over ⇒ 5 min APPLICATION_WINDOW_COUNT = 600 ○ Emit frequency ⇒ half second SLIDE_BY_WINDOW_COUNT = 1
  • 29. Slide-by Window count (continued) 29 <property> <name>dt.application.ApplicationName.operator.OperatorName .attr.SLIDE_BY_WINDOW_COUNT</name> <value>1</value> </property> dag.setAttribute(operator, DAGContext.SLIDE_BY_WINDOW_COUNT,1);
  • 30. Comparison with micro-batch 30 Gol gappa ⇒ micro-batch image ref [8] Gol gappa ⇒ Streaming windows image ref [7]
  • 31. Apex windows : Highlights 31 ● Apex streaming windows ○ Streams ⇒ divided into time slices ○ Window ⇒ markers added to stream ○ Records ⇒ do not wait for window end ● Uses ○ Engine ⇒ Book keeping ○ Operators ⇒ Custom aggregates on windows
  • 32. Micro-batch engines 32 ● Micro-batch ○ Streams ⇒ divided into small size batches ○ Micro-batches ⇒ processed separately ○ Each record ⇒ waits till micro-batch is ready for further processing. ● Example : Spark streaming
  • 33. Comparison 33 Micro batch engines Apex streaming windows Waiting time Records waits till micro-batch is ready for further processing Records do not wait for end of window Additional latency Artificial latency introduced because of records waiting for micro-batch boundaries No additional latency involved. Records are immediately forwarded to next stage of processing. Limits Sub-second latencies only for simple applications.System with multiple network shuffle leads multi-seconds latencies. [14] Even latencies like 2ms achievable [13]
  • 35. 35
  • 36. Resources 36 ● Apache Apex Page ○ http://apex.incubator.apache.org ● Mailing Lists ○ dev@apex.incubator.apache.org ○ users@apex.incubator.apache.org ● Repository ○ https://github.com/apache/incubator-apex-core ○ https://github.com/apache/incubator-apex-malhar ● Issue Tracking ○ https://issues.apache.org/jira/browse/APEXCORE ○ https://issues.apache.org/jira/browse/APEXMALHAR ● @ApacheApex ● /groups/7020520
  • 37. References 1. Thank You | planwallpaper http://www.planwallpaper.com/thank-you 2. Question | clipartpanda http://www.clipartpanda.com/clipart_images/how-to-answer-the-question-46954146 3. Streaming 101 | oreilly http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html 4. Swimming Pool Design | homesthetics http://homesthetics.net/backyard-landscaping-ideas-swimming-pool-design/ 5. Mountain Stream | freebigpictures http://freebigpictures.com/river-pictures/mountain-stream/ 6. Yahoo Finance http://finance.yahoo.com/ 7. Crispy Chaat | grabhouse http://grabhouse.com/urbancocktail/11-crispy-chaat-joints-food-lovers-hyderabad/ 8. Paani puri stall | citiyshor http://www.cityshor.com/pune/food/street-food/camp/murali-paani-puri-stall/ 9. Application Developement | DataTorrent http://docs.datatorrent.com/application_development/ 10. Malhar demos | Apache apex malhar | https://github.com/apache/incubator-apex- malhar/blob/master/demos/yahoofinance/src/main/java/com/datatorrent/demos/yahoofinance/YahooFinanceApplication.java 11. Malhar demos | Apache apex malhar https://github.com/apache/incubator-apex- malhar/blob/master/demos/twitter/src/main/java/com/datatorrent/demos/twitter/TwitterTopCounterApplication.java 12. https://github.com/apache/incubator-apex-malhar/blob/master/demos/yahoofinance/src/main/resources/META-INF/properties.xml#L28 13. ilganeli | slideshare http://www.slideshare.net/ilganeli/nextgen-decision-making-in-under-2ms 14. teamblog | cakesolutions http://www.cakesolutions.net/teamblogs/spark-streaming-tricky-parts 37