2. Apache Oozie
• Apache Oozie is a Java web application used to
schedule Apache Hadoop jobs.
• Oozie combines multiple jobs sequentially into
one logical unit of work.
• It is integrated with the Hadoop stack.
• It is a server-based workflow scheduling system
used to manage Hadoop jobs. It supports:
4. Three types of Oozie jobs
• Oozie workflow jobs
• Oozie bundle jobs
• Oozie coordinator jobs
Oozie workflow jobs
– A sequence of actions to be executed.
Oozie bundle jobs
– A package of multiple coordinator and
workflow jobs.
Oozie coordinator jobs
– Workflow jobs triggered by time and data
availability.
5. • Users are permitted to create Directed Acyclic
Graphs (DAGs) of workflows, which can be run in
parallel and sequentially in Hadoop.
• Oozie consists of two parts:
– Workflow engine
– Coordinator engine
Workflow engine
– The responsibility of the workflow engine is to
store and run workflows composed of Hadoop jobs.
Coordinator engine
– It runs workflow jobs based on predefined
schedules and the availability of data.
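
The coordinator engine's time-and-data-based scheduling is expressed in a coordinator definition. A minimal sketch (the app name, dates, frequency, and paths below are hypothetical):

```xml
<!-- Sketch of a coordinator that runs a workflow once a day;
     names, dates, and paths are hypothetical -->
<coordinator-app name="demo-coord" frequency="${coord:days(1)}"
                 start="2021-01-01T00:00Z" end="2021-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${nameNode}/user/oozie/apps/demo-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```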
6. • Oozie is scalable and can manage the timely
execution of thousands of workflows in a Hadoop
cluster.
• Oozie is also very flexible: one can easily
start, stop, suspend, and rerun jobs.
• Oozie makes it very easy to rerun failed workflows.
7. How it works
• An Oozie workflow consists of Action Nodes and
Control Nodes.
• An Action Node represents a workflow task, such as
moving files into HDFS; running a MapReduce, Pig,
or Hive job; importing data using Sqoop; or
running a shell script or a program written in Java.
• A Control Node controls the workflow execution
between actions by allowing constructs like
conditional logic, where different branches may be
followed depending on the result of an earlier
action node.
8. Types of nodes
Start node
– Designates the start of the workflow job.
End node
– Signals the end of the job.
Error node
– Designates the occurrence of an error and the
corresponding error message to be printed.
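
These node types map onto elements of an Oozie workflow definition. A minimal sketch (the app name, action, and paths are hypothetical; in Oozie's XML vocabulary the error node is written as a `kill` element):

```xml
<!-- Sketch of a workflow.xml with start, action, error (kill),
     and end nodes; names and paths are hypothetical -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="first-action"/>
    <action name="first-action">
        <fs>
            <mkdir path="${nameNode}/user/${wf:user()}/output"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <!-- The error node: prints a message and terminates the job -->
    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```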
9. Features of Oozie
• Using its Web Service APIs, one can control jobs
from anywhere.
• Oozie can send email notifications upon
completion of jobs.
• Oozie has provision to execute jobs which are
scheduled to run periodically.
18. 9. The status of Oozie can be checked from the command line or
the web console.
19. 10. To set up the Oozie client, copy the client tar file to the
"oozie client" directory and add its path to the .bashrc file.
20. Oozie workflow for IoT data analysis
Assume that the data received from a machine has the following
structure.
21. • The goal of the analysis is to find the count of each
status/error code and produce an output with the corresponding
structure.
22. • The Oozie workflow comprises a Hadoop streaming MapReduce
job action and an email action that notifies of the
success or failure of the job.
• The map program parses the status/error code from each
line of the input and emits key-value pairs,
where the key is the status/error code and the value is 1.
• The reduce program receives the key-value pairs emitted by
the map program, grouped by key.
• For each key, the reduce program calculates the count and
emits key-value pairs where the key is the status/error code
and the value is the count.
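
The mapper and reducer described above can be sketched as a Hadoop streaming script in Python. The input record layout (status/error code as the last comma-separated field) is an assumption, since the slides do not show the data structure:

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit "code\t1" for every input record; the record layout
    # (status/error code as the last comma-separated field) is an
    # assumption, since the slides do not show the data structure.
    for line in lines:
        fields = line.strip().split(",")
        if fields and fields[-1]:
            yield f"{fields[-1]}\t1"

def reducer(lines):
    # Hadoop streaming sorts mapper output by key before the reduce
    # phase, so equal keys arrive on consecutive lines and can be
    # counted with groupby.
    pairs = (line.strip().split("\t") for line in lines)
    for code, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{code}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked by the streaming job as `script.py map` or
    # `script.py reduce`; both stages read stdin and write stdout.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

In the workflow, the same file would be passed as both the `-mapper` and `-reducer` of the streaming job, differing only in the stage argument.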