High Availability
• A system that avoids unplanned downtime when
partial failures occur
• Typically achieved by adding redundancy and removing
single points of failure
• Our Goals
• Don’t change the API or usage patterns
• Users don’t even have to know it’s HA
The HA Solution: Database
• Oozie stores all state in a database
• (submitted jobs, workflow definitions, etc.)
• Instead of a failover model, we want to run many
Oozie servers against the same database
• Active-Active HA
• Also provides horizontal scalability
• ZooKeeper for coordination
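The active-active idea above can be sketched as follows. This is a minimal simulation, not Oozie's implementation: `LockService` is a hypothetical in-process stand-in for ZooKeeper-based locking, the `database` dict stands in for the shared database, and the `PREP`/`RUNNING` transition is only an illustrative unit of work. Several "servers" scan the same jobs, and a per-job lock ensures each job is advanced by exactly one of them.

```python
import threading


class LockService:
    """Hypothetical in-process stand-in for a distributed lock service."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire(self, job_id):
        # Non-blocking: return False if another server holds the job lock.
        with self._guard:
            if job_id in self._held:
                return False
            self._held.add(job_id)
            return True

    def release(self, job_id):
        with self._guard:
            self._held.discard(job_id)


def run_server(name, database, locks, transitions):
    """Every server scans all jobs but only advances a job it has locked."""
    for job_id in list(database):
        if not locks.try_acquire(job_id):
            continue  # another server is handling this job right now
        try:
            row = database[job_id]
            if row["status"] == "PREP":  # re-check state while holding the lock
                row["status"] = "RUNNING"
                row["server"] = name
                transitions.append(job_id)
        finally:
            locks.release(job_id)


def simulate(num_servers=3, num_jobs=20):
    """Run several servers concurrently against one shared job table."""
    database = {
        f"job-{i:02d}": {"status": "PREP", "server": None} for i in range(num_jobs)
    }
    locks = LockService()
    transitions = []
    threads = [
        threading.Thread(
            target=run_server, args=(f"host{n}", database, locks, transitions)
        )
        for n in range(num_servers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return database, transitions
```

Because all state lives in the shared table and the lock is taken per job, adding another server only adds another scanner, which is where the horizontal scalability comes from in this sketch.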
The HA Solution: Access
• Users and client programs need a single address to
connect to (Web UI, REST/Java API, JobTracker callbacks,
etc.)
• Load Balancer, Virtual IP, or DNS round-robin can be
used to provide a single entry point to the Oozie
servers
• Technically, the entry point itself also needs to be HA
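As an illustration of the single entry point, here is a minimal round-robin sketch. The class name `RoundRobinEntryPoint` and its methods are hypothetical, and a real deployment would use an actual load balancer, virtual IP, or DNS rather than application code. It only shows the behaviour clients rely on: one address in front, requests spread across the Oozie servers, and servers marked down being skipped.

```python
class RoundRobinEntryPoint:
    """Hypothetical single entry point that rotates across backend servers."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping any marked down."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy Oozie server available")
```

Note that this object is itself a single point of failure, which is why the entry point also needs to be made highly available.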
The HA Solution: Log Streaming
• Oozie’s log files are not in the database
• Each Oozie Server only has access to its own logs
• Jobs are not assigned to a specific Oozie server
• What if Oozie Server A wants to get logs for a job
processed by Oozie Server B?
• Oozie Server A can ask Oozie Server B for its logs
• Caveat: If an Oozie Server goes down, any logs from it will
be unavailable until it is brought back up
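The log streaming behaviour, including the caveat, can be sketched like this. All names here (`OozieServer`, `ServerDown`, `collect_job_logs`) are hypothetical and the in-memory lists stand in for log files; the actual service exchanges logs between servers over the network. The sketch assumes each log line starts with a 23-character timestamp such as `2013-09-29 16:46:20,182` and carries a `JOB[...]` field.

```python
class ServerDown(Exception):
    """Raised when a server cannot be reached for its logs."""


class OozieServer:
    """Hypothetical server that can only read its own local log lines."""

    def __init__(self, name, log_lines, up=True):
        self.name = name
        self.log_lines = list(log_lines)
        self.up = up

    def local_logs(self, job_id):
        if not self.up:
            raise ServerDown(self.name)
        return [line for line in self.log_lines if f"JOB[{job_id}]" in line]


def collect_job_logs(job_id, servers):
    """Ask every server for its lines about job_id and merge them by time.

    Returns (merged_lines, unreachable_server_names). Lines held by a server
    that is down are missing from the result until it comes back up.
    """
    merged, unreachable = [], []
    for server in servers:
        try:
            merged.extend(server.local_logs(job_id))
        except ServerDown:
            unreachable.append(server.name)
    # The fixed-width timestamp prefix sorts correctly as plain text.
    merged.sort(key=lambda line: line[:23])
    return merged, unreachable
```

Returning the list of unreachable servers alongside the merged lines lets the caller tell the user that the log may be incomplete rather than silently dropping entries.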
How to Enable HA
• Set up a load balancer, a ZooKeeper ensemble, an HA database,
and multiple identically configured Oozie servers
• Enable Oozie HA services:
<property>
<name>oozie.services.ext</name>
<value>
org.apache.oozie.service.ZKLocksService,
org.apache.oozie.service.ZKXLogStreamingService,
org.apache.oozie.service.ZKJobsConcurrencyService
</value>
</property>
• Point Oozie to ZooKeeper Ensemble:
<property>
<name>oozie.zookeeper.connection.string</name>
<value>ZK_HOST1:2181,ZK_HOST2:2181</value>
</property>
• Point the callback base URL environment variable at the
load balancer:
export OOZIE_BASE_URL="http://loadbalancer:11000/oozie"
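Since every Oozie server must be configured identically, a small check of the HA properties can help catch mistakes before startup. The sketch below is an assumption-laden illustration, not an Oozie tool: the function names are hypothetical, and it assumes the properties sit inside a `<configuration>` root element as in a typical oozie-site.xml. It only verifies that the three HA services are listed and that the ZooKeeper connection string is a comma-separated list of `host:port` entries.

```python
import xml.etree.ElementTree as ET

# Service class names taken from the configuration shown above.
HA_SERVICES = {
    "org.apache.oozie.service.ZKLocksService",
    "org.apache.oozie.service.ZKXLogStreamingService",
    "org.apache.oozie.service.ZKJobsConcurrencyService",
}

SAMPLE = """
<configuration>
  <property>
    <name>oozie.services.ext</name>
    <value>
      org.apache.oozie.service.ZKLocksService,
      org.apache.oozie.service.ZKXLogStreamingService,
      org.apache.oozie.service.ZKJobsConcurrencyService
    </value>
  </property>
  <property>
    <name>oozie.zookeeper.connection.string</name>
    <value>ZK_HOST1:2181,ZK_HOST2:2181</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_ha_config(props):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    listed = {
        s.strip() for s in props.get("oozie.services.ext", "").split(",") if s.strip()
    }
    missing = sorted(HA_SERVICES - listed)
    if missing:
        problems.append("missing services: " + ", ".join(missing))
    conn = props.get("oozie.zookeeper.connection.string", "")
    addresses = [a.strip() for a in conn.split(",") if a.strip()]
    if not addresses:
        problems.append("oozie.zookeeper.connection.string is not set")
    for address in addresses:
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            problems.append("bad ZooKeeper address: " + address)
    return problems
```

Calling `check_ha_config(read_properties(SAMPLE))` on the sample above returns an empty list; running the same check against each server's file is one way to confirm they agree.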
How to Enable HA: Security
• Extra step to configure Kerberos with Load Balancer:
<property>
<name>
oozie.authentication.kerberos.principal
</name>
<value>HTTP/loadbalancer@REALM</value>
</property>
• Note: this currently prevents clients from talking
directly to any Oozie server
• Enable Kerberos connection to ZooKeeper and ACLs:
<property>
<name>oozie.zookeeper.secure</name>
<value>true</value>
</property>
• ACLs prevent malicious users or programs from
interfering with Oozie’s znodes
Using Oozie with HA
• New Oozie CLI/REST API command to list all servers
$ oozie admin -oozie http://loadbalancer:11000/oozie -servers
hostA : http://hostA:11000/oozie
hostB : http://hostB:11000/oozie
hostC : http://hostC:11000/oozie
• Log messages now include which server wrote them
2013-09-29 16:46:20,182 WARN org.apache.oozie.command.wf.ActionStartXCommand: SERVER[hostA] USER[root] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-130925230553293-oozie-oozi-W] ACTION[0000000-130925230553293-oozie-oozi-W@streaming-node] [***0000000-130925230553293-oozie-oozi-W@streaming-node***]Action status=RUNNING
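With the server name in every log message, it becomes easy to see which server handled which part of a job. The following is a small sketch of pulling the bracketed fields out of a message like the one above; the function name `parse_prefix` and the regular expression are illustrative assumptions, not part of Oozie.

```python
import re

# Matches fields such as SERVER[hostA] or TOKEN[] in the log prefix.
FIELD = re.compile(r"(SERVER|USER|GROUP|TOKEN|APP|JOB|ACTION)\[([^\]]*)\]")

SAMPLE_LINE = (
    "2013-09-29 16:46:20,182 WARN "
    "org.apache.oozie.command.wf.ActionStartXCommand: "
    "SERVER[hostA] USER[root] GROUP[-] TOKEN[] APP[demo-wf] "
    "JOB[0000000-130925230553293-oozie-oozi-W] "
    "ACTION[0000000-130925230553293-oozie-oozi-W@streaming-node] "
    "[***0000000-130925230553293-oozie-oozi-W@streaming-node***]"
    "Action status=RUNNING"
)


def parse_prefix(line):
    """Return the bracketed prefix fields of one log line as a dict."""
    return {key: value for key, value in FIELD.findall(line)}


if __name__ == "__main__":
    fields = parse_prefix(SAMPLE_LINE)
    print(fields["SERVER"], fields["JOB"])
```

Grouping parsed lines by the `SERVER` field is one way to check that work on a job is actually being shared across servers.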
To Do
• HA support for SLAs and HCatalog integration
• Sharelib Purging with HA
• Log Streaming HA
• With Kerberos, Oozie servers can’t talk to each other
• This breaks log streaming and sharelibupdate
• Other misc improvements