2. In Addition to
Boris Lublinsky, Kevin T. Smith and Alexey Yakubovich. “Professional Hadoop Solutions”
3. New cases
• Use cases
• Organizing all data processing steps: top-down
• Regular data ingestion
• Regular data transformation
• Regular report generation
• Extensions
• File movement on HDFS (sync Java action)
• Data transfer (sync FTP, sync SSH)
• Logging / monitoring (beyond Oozie console)
4. New & rediscovered Oozie features
• 1. JMS notifications (job life cycle, SLA)
http://oozie.apache.org/docs/4.0.0/DG_JMSNotifications.html
• 2. Overriding the launcher
https://github.com/yahoo/oozie/blob/master/examples/src/main/java/org/apache/oozie/example/DemoPigMain.java
• 3. Unit testing Oozie with MiniOozie
http://oozie.apache.org/docs/4.0.0/ENG_MiniOozie.html
5. JMS notifications
“Push” JMS notifications for action status, SLA met and SLA miss
Needs a JMS broker to deliver the notifications to subscribers, e.g. Apache ActiveMQ
Needs JMS notification configuration in oozie-site.xml:
  oozie.services.ext
  oozie.services.EventHandlerService…
  oozie.jms.producer.connection.properties (topic)
Notification types
  Job status: start, success, failure, suspended …
  SLA: (start | end | duration) combined with (met | miss)
Message format: javax.jms.TextMessage with Oozie job-specific headers
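A consumer usually has to decide what a notification means from its headers before it touches the message body. The sketch below illustrates that dispatch step only. It is a hypothetical helper, not part of Oozie: a plain Map stands in for the JMS message properties so the example runs without a broker, and the header names (msgType, eventStatus) and status values (for example DURATION_MISS) are assumptions that should be checked against the JMS notifications documentation linked above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical consumer-side helper; not part of Oozie.
class NotificationRouter {
    // Assumed header names; verify against the Oozie JMS notification docs.
    static final String MSG_TYPE = "msgType";
    static final String EVENT_STATUS = "eventStatus";

    /** Returns a short label describing the notification, or "unknown". */
    static String classify(Map<String, String> headers) {
        String type = headers.get(MSG_TYPE);
        String status = headers.get(EVENT_STATUS);
        if (type == null || status == null) {
            return "unknown";
        }
        if ("SLA".equals(type)) {
            // Assumed shape: <PHASE>_<OUTCOME>, e.g. START_MET or END_MISS.
            int sep = status.lastIndexOf('_');
            if (sep <= 0) {
                return "unknown";
            }
            String phase = status.substring(0, sep).toLowerCase(Locale.ROOT);
            String outcome = status.substring(sep + 1).toLowerCase(Locale.ROOT);
            return "sla " + phase + " " + outcome;
        }
        if ("JOB".equals(type)) {
            return "job " + status.toLowerCase(Locale.ROOT);
        }
        return "unknown";
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put(MSG_TYPE, "SLA");
        headers.put(EVENT_STATUS, "DURATION_MISS");
        System.out.println(classify(headers)); // prints "sla duration miss"
    }
}
```

In a real listener the same logic would read the values from the received message's string properties instead of a Map.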
6. Overriding the launcher (cross-cutting concerns)
• Regular Pig job launcher –
org.apache.oozie.action.hadoop.PigMain
Reminder: the action executor does all the preparation for submitting an action as Hadoop job(s). In particular, the PigMain launcher invokes the Pig runtime on an Edge (Gateway) node.
import org.apache.oozie.action.hadoop.PigMain;

public class SpecialPigExec extends PigMain {
    // cross-cutting concerns, e.g. logging, external services (security, transactions)
}
• Oozie workflow
<action name="pig-special">
  <pig>
    …
    <configuration>
      <property>
        <name>oozie.launcher.action.main.class</name>
        <value>… SpecialPigExec</value>
      </property>
    </configuration>
    …
  </pig>
</action>
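The idea behind a custom launcher is a subclass that wraps the regular launch with extra behaviour. The sketch below shows that wrapping pattern in isolation. StandInPigMain is a hypothetical stand-in for org.apache.oozie.action.hadoop.PigMain so that the example compiles without Oozie on the classpath, and the run(String[]) signature and the audit list are assumptions made for illustration; the real class should be checked before adapting this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PigMain; the real launcher would start Pig here.
class StandInPigMain {
    protected void run(String[] args) throws Exception {
        // Intentionally empty in this sketch.
    }
}

// Hypothetical subclass adding cross-cutting behaviour around the launch.
class SpecialPigExec extends StandInPigMain {
    // Collected audit entries; a real launcher might log or call a service.
    final List<String> audit = new ArrayList<>();

    @Override
    protected void run(String[] args) throws Exception {
        audit.add("before: " + args.length + " argument(s)");
        try {
            super.run(args); // delegate to the regular launcher logic
            audit.add("after: success");
        } catch (Exception e) {
            audit.add("after: failure " + e.getMessage());
            throw e; // keep the failure visible to the caller
        }
    }

    public static void main(String[] args) throws Exception {
        SpecialPigExec exec = new SpecialPigExec();
        exec.run(args);
        exec.audit.forEach(System.out::println);
    }
}
```

Keeping the extra behaviour in the subclass leaves the Pig script and the rest of the workflow definition unchanged; only the launcher class property in the action configuration points at the new class.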
7. Unit testing Oozie with MiniOozie
• MiniOozieTestCase is a JUnit test class
• Allows testing of workflow and coordinator applications
• Tests workflows directly from an IDE (verified with Eclipse)
• Does not require access to a cluster or a running Oozie server
• Runs against the local file system
• Tested on Linux and Mac OS X, configured with Maven (simple)
• Needs most (possibly all) Oozie libraries
Action choice is restricted:
Java actions are straightforward.
Other actions can be “simulated”.
It is unclear whether MiniOozie can be combined with PigUnit and Hive standalone mode.
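Because the test runs against the local file system, a typical first step is to stage the workflow application in a local directory and build job properties that point at it. The sketch below covers only that preparation step, using the JDK alone. LocalWorkflowApp is a hypothetical helper and the workflow content is a placeholder, not a valid workflow. The property keys oozie.wf.application.path and user.name are assumed to be the ones the Oozie client expects. Submitting the job through MiniOozie itself is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper for preparing a local workflow application in a test.
class LocalWorkflowApp {
    // Assumed key for the workflow application directory.
    static final String APP_PATH = "oozie.wf.application.path";

    /** Writes workflow.xml into appDir and returns job properties pointing at it. */
    static Properties stage(Path appDir, String workflowXml, String user) throws IOException {
        Files.createDirectories(appDir);
        Files.write(appDir.resolve("workflow.xml"),
                workflowXml.getBytes(StandardCharsets.UTF_8));
        Properties conf = new Properties();
        conf.setProperty(APP_PATH, appDir.toUri().toString());
        conf.setProperty("user.name", user);
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Path appDir = Files.createTempDirectory("wf-app");
        // Placeholder content; a real test would write a complete workflow definition.
        Properties conf = stage(appDir, "<workflow-app/>", "tester");
        System.out.println(conf.getProperty(APP_PATH));
    }
}
```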