Session Abstract</strong><div></div><div><p>Branch-and-bound is a widely used technique for efficiently searching for solutions to combinatorial optimization problems. In this session, we will introduce BranchReduce, an open-source Java library for performing distributed branch-and-bound on a Hadoop cluster under YARN. Application developers only need to write the code that is specific to their optimization problem (namely the branching rule, the lower bound computation, and the upper bound computation), and BranchReduce handles deploying the application to the cluster, managing the execution, and periodically rebalancing the search space across the machines. We will give an overview of how BranchReduce works and then walk through an example that solves a scheduling problem with a near-linear speedup over a single-machine implementation.
8. Building a New Processing Framework on YARN
Copyright 2012 Cloudera Inc. All rights reserved
9. A Terrifyingly Accurate Paraphrasing of JWZ
Some people, when confronted with a tedious
problem, say, “I know, I’ll write a framework.”
Now they have two tedious problems.
11. The Example YARN App: Distributed Shell
12. Do We Need a New Programming Language for Developing YARN Applications?
14. Leverage Existing Frameworks
• Popular RPC libraries with support for multiple languages
• C++, Java, Python
• We need to make it easy to deploy existing applications on YARN
16. Design Pattern: The Unified Application Master
• Contains business logic and YARN logic
• Primary reason: Communication
• Also: dynamic resource allocation
• Develop our master/worker applications locally and then deploy them on YARN
17. YARN Lifecycle Management as a Service
• Specifically, extensions of Guava’s Service interface
• YarnClientService
• AppMasterService
• Contains all of the logic for creating applications and keeping an eye on them
19. Lua as a Configuration Language
• Small and Simple
• Looks like a configuration file
• Functions are there when/if you need them
• Inheritance
• Don’t Repeat Yourself
• Forgiving of undefined values
• Java/C++ Integration
20. First Kitten Utility: The cat Function
21. Second Kitten Utility: The yarn Function
23. Branch-and-Bound
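The abstract names three problem-specific pieces: a branching rule, a lower bound, and an upper bound. The following is a minimal single-machine sketch of how they fit together. It is not BranchReduce code: the assignment problem, the class name AssignmentBranchAndBound, the helper names, and the cost matrix are all illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sequential branch-and-bound for a small assignment problem (minimize total cost). */
class AssignmentBranchAndBound {

    /** Partial solution: workers 0..depth-1 are assigned; usedJobs is a bitmask of taken jobs. */
    static final class Node {
        final int depth, usedJobs, costSoFar;
        Node(int depth, int usedJobs, int costSoFar) {
            this.depth = depth; this.usedJobs = usedJobs; this.costSoFar = costSoFar;
        }
    }

    static boolean isFree(int usedJobs, int job) {
        return (usedJobs & (1 << job)) == 0;
    }

    /** Lower bound: cost so far plus each unassigned worker's cheapest free job (jobs may repeat). */
    static int lowerBound(int[][] cost, Node node) {
        int bound = node.costSoFar;
        for (int worker = node.depth; worker < cost.length; worker++) {
            int cheapest = Integer.MAX_VALUE;
            for (int job = 0; job < cost.length; job++) {
                if (isFree(node.usedJobs, job)) cheapest = Math.min(cheapest, cost[worker][job]);
            }
            bound += cheapest;
        }
        return bound;
    }

    /** Upper bound: cost of a feasible solution built by greedily completing the node. */
    static int upperBound(int[][] cost, Node node) {
        int total = node.costSoFar;
        int used = node.usedJobs;
        for (int worker = node.depth; worker < cost.length; worker++) {
            int bestJob = -1;
            for (int job = 0; job < cost.length; job++) {
                if (isFree(used, job) && (bestJob < 0 || cost[worker][job] < cost[worker][bestJob])) {
                    bestJob = job;
                }
            }
            used |= 1 << bestJob;
            total += cost[worker][bestJob];
        }
        return total;
    }

    static int solve(int[][] cost) {
        int n = cost.length;
        Node root = new Node(0, 0, 0);
        int best = upperBound(cost, root);                 // incumbent: best known feasible cost
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        while (!open.isEmpty()) {
            Node node = open.pop();
            if (lowerBound(cost, node) >= best) continue;  // prune: cannot beat the incumbent
            if (node.depth == n) { best = node.costSoFar; continue; }
            best = Math.min(best, upperBound(cost, node));
            // Branching rule: try every free job for the next worker.
            for (int job = 0; job < n; job++) {
                if (isFree(node.usedJobs, job)) {
                    open.push(new Node(node.depth + 1, node.usedJobs | (1 << job),
                            node.costSoFar + cost[node.depth][job]));
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] cost = { {9, 2, 7, 8}, {6, 4, 3, 7}, {5, 8, 1, 8}, {7, 6, 9, 4} };
        System.out.println(solve(cost)); // prints 13
    }
}
```

A subtree is discarded as soon as its lower bound cannot beat the best feasible cost found so far, which is why tight bounds matter more than raw search speed.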
24. The Challenge of Parallel Branch and Bound: Unbalanced Search Space
• Some branches are pruned quickly
• Can be difficult to determine the best splits a priori
• Easy to revert to a de facto single-threaded search
25. The Solution: Work Stealing
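As a rough illustration of the general technique, the sketch below runs work stealing between threads in a single JVM. It does not show how BranchReduce rebalances work across machines, which the slides do not detail. The class name WorkStealingSketch, the synthetic unbalanced tree, and the victim-selection order are assumptions made for the example. Each worker takes the newest task from its own deque and, when that is empty, steals the oldest task from another worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Threads explore an unbalanced tree, stealing work when their own deque runs dry. */
class WorkStealingSketch {

    /** Reference answer: a node of size s > 1 has children s - 1 (deep) and s / 2 (shallow). */
    static long countSequential(int size) {
        return size <= 1 ? 1 : 1 + countSequential(size - 1) + countSequential(size / 2);
    }

    static long countParallel(int rootSize, int workers) {
        List<ConcurrentLinkedDeque<Integer>> deques = new ArrayList<>();
        for (int i = 0; i < workers; i++) deques.add(new ConcurrentLinkedDeque<>());
        AtomicInteger pending = new AtomicInteger(1);  // tasks queued or currently in progress
        AtomicLong visited = new AtomicLong();
        deques.get(0).addLast(rootSize);               // all work starts on worker 0

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int self = w;
            threads[w] = new Thread(() -> {
                while (pending.get() > 0) {
                    // Own work first, newest task (depth-first order).
                    Integer task = deques.get(self).pollLast();
                    // Otherwise steal the oldest task from the other workers in turn.
                    for (int k = 1; task == null && k < workers; k++) {
                        task = deques.get((self + k) % workers).pollFirst();
                    }
                    if (task == null) { Thread.yield(); continue; }
                    visited.incrementAndGet();
                    if (task > 1) {
                        pending.addAndGet(2);          // count children before finishing the parent
                        deques.get(self).addLast(task - 1);
                        deques.get(self).addLast(task / 2);
                    }
                    pending.decrementAndGet();
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return visited.get();
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + countSequential(60));
        System.out.println("parallel:   " + countParallel(60, 4));
    }
}
```

Stealing from the opposite end of the victim's deque tends to hand over older tasks, which usually sit nearer the root and represent larger subtrees, so a single steal moves a meaningful amount of work.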
26. You Write Three Classes
• A Task class that implements Writable
• A GlobalState class that implements Writable and has a mergeWith(GlobalState other) method
• A Processor class that defines:
• execute(T task, BranchReduceContext<T, GlobalState> ctxt);
• With optional initialize and cleanup methods
• Configuration is done via BranchReduceJob
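To make the shape of the three classes concrete, here is a self-contained sketch that compiles without Hadoop or BranchReduce on the classpath. The Writable interface below is a local stand-in with the same two methods as Hadoop's Writable. The BranchReduceContext interface is a hypothetical stand-in: the slide names the type but not its methods, so getGlobalState, updateGlobalState, and emit are assumed names, as are the assignment problem and the runLocally driver. The optional initialize and cleanup methods and the BranchReduceJob configuration are left out.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

class ThreeClassesSketch {

    /** Local stand-in with the same two methods as Hadoop's Writable. */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** HYPOTHETICAL stand-in: the slide names this type but not its methods. */
    interface BranchReduceContext<T, G> {
        G getGlobalState();               // assumed name
        void updateGlobalState(G state);  // assumed name
        void emit(T subTask);             // assumed name
    }

    /** Task: workers 0..depth-1 are assigned; usedJobs is a bitmask of taken jobs. */
    static class Task implements Writable {
        int depth, usedJobs, costSoFar;
        Task() {}
        Task(int depth, int usedJobs, int costSoFar) {
            this.depth = depth; this.usedJobs = usedJobs; this.costSoFar = costSoFar;
        }
        public void write(DataOutput out) throws IOException {
            out.writeInt(depth); out.writeInt(usedJobs); out.writeInt(costSoFar);
        }
        public void readFields(DataInput in) throws IOException {
            depth = in.readInt(); usedJobs = in.readInt(); costSoFar = in.readInt();
        }
    }

    /** GlobalState: the cheapest complete assignment seen so far. */
    static class GlobalState implements Writable {
        int bestCost = Integer.MAX_VALUE;
        public void write(DataOutput out) throws IOException { out.writeInt(bestCost); }
        public void readFields(DataInput in) throws IOException { bestCost = in.readInt(); }
        void mergeWith(GlobalState other) { bestCost = Math.min(bestCost, other.bestCost); }
    }

    /** Processor: prune the task, record a complete solution, or branch. */
    static class Processor {
        private final int[][] cost;
        Processor(int[][] cost) { this.cost = cost; }

        void execute(Task task, BranchReduceContext<Task, GlobalState> ctxt) {
            if (lowerBound(task) >= ctxt.getGlobalState().bestCost) return;  // prune
            if (task.depth == cost.length) {                                 // complete solution
                GlobalState found = new GlobalState();
                found.bestCost = task.costSoFar;
                ctxt.updateGlobalState(found);
                return;
            }
            for (int job = 0; job < cost.length; job++) {                    // branch
                if ((task.usedJobs & (1 << job)) == 0) {
                    ctxt.emit(new Task(task.depth + 1, task.usedJobs | (1 << job),
                            task.costSoFar + cost[task.depth][job]));
                }
            }
        }

        /** Cost so far plus each unassigned worker's cheapest free job. */
        private int lowerBound(Task task) {
            int bound = task.costSoFar;
            for (int worker = task.depth; worker < cost.length; worker++) {
                int cheapest = Integer.MAX_VALUE;
                for (int job = 0; job < cost.length; job++) {
                    if ((task.usedJobs & (1 << job)) == 0) {
                        cheapest = Math.min(cheapest, cost[worker][job]);
                    }
                }
                bound += cheapest;
            }
            return bound;
        }
    }

    /** Single-process driver standing in for the cluster. */
    static int runLocally(int[][] cost) {
        final GlobalState global = new GlobalState();
        final Deque<Task> pending = new ArrayDeque<>();
        BranchReduceContext<Task, GlobalState> ctxt = new BranchReduceContext<Task, GlobalState>() {
            public GlobalState getGlobalState() { return global; }
            public void updateGlobalState(GlobalState state) { global.mergeWith(state); }
            public void emit(Task subTask) { pending.push(subTask); }
        };
        Processor processor = new Processor(cost);
        pending.push(new Task(0, 0, 0));
        while (!pending.isEmpty()) processor.execute(pending.pop(), ctxt);
        return global.bestCost;
    }

    public static void main(String[] args) {
        int[][] cost = { {9, 2, 7, 8}, {6, 4, 3, 7}, {5, 8, 1, 8}, {7, 6, 9, 4} };
        System.out.println(runLocally(cost)); // prints 13
    }
}
```

Because mergeWith takes the minimum, it gives the same result regardless of the order in which states are combined, which is the kind of property a merge across machines would presumably rely on.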