2. ● Key concepts: DAG, Operators, Ports
● APIs for defining Applications, Operators
● “Word Count” example DAG
● Building Apache Apex from source code
● Creating a sample application
● Demo
● Questions
Outline
3. ● An Application is defined as a Directed Acyclic Graph (DAG)
● Vertices of the DAG are computational units: Operators
● Edges of the DAG are data tuples in motion: Streams
● Operator end-points for input and output: Ports
● An Operator takes one or more input streams, performs computations, and emits one or more output streams
○ Each operator is the user's business logic or a built-in operator from our open source library
○ An operator may have multiple instances that run in parallel
Application as a DAG
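The vocabulary on this slide (operators as vertices, streams as edges, ports as end-points) can be sketched in plain Java. This is a toy model, not the Apache Apex API: `MiniDag`, `OutputPort`, `InputPort`, `addStream`, `LineSplitter`, and `WordCounter` are hypothetical names chosen for illustration, and everything runs in a single thread with no parallel operator instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only: illustrative names, NOT the Apache Apex API.
class MiniDag {
    /** Output port: an operator end-point that pushes tuples to connected input ports. */
    static final class OutputPort<T> {
        private final List<Consumer<T>> sinks = new ArrayList<>();

        void emit(T tuple) {
            for (Consumer<T> sink : sinks) {
                sink.accept(tuple);
            }
        }
    }

    /** Input port: an operator end-point that receives one tuple at a time. */
    interface InputPort<T> extends Consumer<T> {}

    /** A stream is an edge of the DAG: it connects an output port to an input port. */
    static <T> void addStream(OutputPort<T> from, InputPort<T> to) {
        from.sinks.add(to);
    }

    /** Operator (vertex): splits each incoming line into lower-case words. */
    static final class LineSplitter {
        final OutputPort<String> words = new OutputPort<>();
        final InputPort<String> lines = line -> {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    words.emit(w.toLowerCase());
                }
            }
        };
    }

    /** Operator (vertex): counts occurrences of each incoming word. */
    static final class WordCounter {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        final InputPort<String> words = w -> counts.merge(w, 1, Integer::sum);
    }

    /** Wires splitter -> counter with one stream, then feeds the lines through. */
    static Map<String, Integer> run(List<String> input) {
        LineSplitter splitter = new LineSplitter();
        WordCounter counter = new WordCounter();
        addStream(splitter.words, counter.words);
        for (String line : input) {
            splitter.lines.accept(line);
        }
        return counter.counts;
    }

    public static void main(String[] args) {
        // prints {to=2, be=2, or=1, not=1}
        System.out.println(run(List.of("to be or not to be")));
    }
}
```

In this sketch an operator never calls another operator directly; it only emits on its own output port, and the wiring lives in `run`. That separation is what lets the graph be described independently of the operator logic.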
7. ● Data at Rest: count occurrences of words in a file
● Data in Motion: emit counts at the end of each window
● Another variation: emit cumulative counts at the end of every window
Sample application
[Diagram: Apex Application DAG; labels: HDFS, LOGS, Lines, Counts]
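The difference between the two "data in motion" variations can be sketched in plain Java. This is a toy model under stated assumptions, not the Apache Apex API: `WindowedWordCount` and its methods are hypothetical names, and a window is modeled as a list of words that arrive between a begin call and an end call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: illustrative names, NOT the Apache Apex API.
class WindowedWordCount {
    private final boolean cumulative;
    private final Map<String, Integer> counts = new TreeMap<>();
    /** One snapshot of the counts per window, in emission order. */
    final List<Map<String, Integer>> emitted = new ArrayList<>();

    WindowedWordCount(boolean cumulative) {
        this.cumulative = cumulative;
    }

    /** Per-window mode resets state at each window start; cumulative mode keeps it. */
    void beginWindow() {
        if (!cumulative) {
            counts.clear();
        }
    }

    void process(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    /** Emits a copy of the current counts at the end of the window. */
    void endWindow() {
        emitted.add(new TreeMap<>(counts));
    }

    /** Runs the counter over a sequence of windows and returns what was emitted. */
    static List<Map<String, Integer>> run(boolean cumulative, List<List<String>> windows) {
        WindowedWordCount op = new WindowedWordCount(cumulative);
        for (List<String> window : windows) {
            op.beginWindow();
            for (String word : window) {
                op.process(word);
            }
            op.endWindow();
        }
        return op.emitted;
    }

    public static void main(String[] args) {
        List<List<String>> windows = List.of(List.of("a", "b", "a"), List.of("b", "c"));
        // prints [{a=2, b=1}, {b=1, c=1}]
        System.out.println(run(false, windows));
        // prints [{a=2, b=1}, {a=2, b=2, c=1}]
        System.out.println(run(true, windows));
    }
}
```

The only difference between the two variations here is whether state is cleared when a window begins; the counting and emitting logic is shared.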