Business rules are widely used by enterprises to apply logic to their constantly growing data sets. Many business rule management systems (BRMS) facilitate this process, but they take a long time to process large-scale datasets. Today, with information volumes measured in terabytes, standalone business rule engines simply cannot keep up. With the advent of distributed computing technologies such as Hadoop, running jobs in parallel has become a much simpler and less stressful task. Many business rules are "embarrassingly parallel," which makes them perfect candidates for a parallel computing environment: most rules need only a single record in order to execute and enrich that record. Even business rules that lack this property can be adapted to run in a parallel environment. In this presentation, I will use the Drools BRMS to show how to use Hadoop and the MapReduce paradigm to scale business rules to massive datasets.
What we do
- Analyze call center data
  - Next call prevention
  - Call volume reduction
- Big data platform (that's me)
  - Applications on top of it
What are business rules
- Decision logic
  - Automate processes
  - Enforce policies
  - Make decisions
- ETL
- Business Rule Management Systems (BRMS)
  - ILOG, Drools, etc.
  - Write, manage, deploy, execute, monitor
How business rules work
- Java beans to hold data
  - Create one object for every record
- Rules to describe logic
- Insert all beans into engine
- Execute rules against objects
  - Modify objects in place
- Return new objects
- Write out to file/database/etc.
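In code, this flow looks roughly like the minimal standalone sketch below, using the Drools KIE API (the RuleRunner wrapper and loading rules from a classpath kmodule are my assumptions, not code from the talk):

    import java.util.List;
    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieSession;

    public class RuleRunner {
        // Insert every record's bean, fire the rules, and let them
        // modify the beans in place.
        public static List<AgentSalesBean> run(List<AgentSalesBean> records) {
            KieSession session = KieServices.Factory.get()
                    .getKieClasspathContainer()
                    .newKieSession();
            for (AgentSalesBean bean : records) {
                session.insert(bean);   // one object per record goes into working memory
            }
            session.fireAllRules();     // rules execute against the inserted objects
            session.dispose();          // free working memory
            return records;             // enriched beans, ready to write out
        }
    }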
Why business rules
- Non-programmers who want to analyze data
  - Don't need to write code
  - Available GUIs for rule writing
- One-time infrastructure setup
- Rules are plug and play
Why business rules on Hadoop
- Memory intensive
  - Hard calculations, medium data
  - Complicated decision logic
  - Pseudo-joins
- Easy calculations, huge data
  - Row-by-row if-then
  - Aggregation
- Scaling existing solutions
  - Very relevant at NICE
Don't get too excited
- Calculations that require access to the full data set
  - Too much data for one key
- Serialization of objects
  - Only if you have a reducer
  - Custom objects only
Examples
- Agents make sales
- Fake generated data
- Different types of calculations
- Compare performance between standalone and clustered
Test Scenarios
1. How much bonus should the agent get based on the current sale?
   - Bonus = sale > 0 ? sale/100 : 0
   - All work in mapper, no reducer
2. How much did the agent sell in total?
   - Total = sum of all sales by this agent
   - Pass-through mapper, work in reducer
Details of Examples

    import org.apache.hadoop.io.Writable;

    public class AgentSalesBean implements Writable {
        private String name;
        private String office;
        private int salesTotal;
        private double bonus;

        // getters, setters, and serializer/deserializer (sketched below)
    }
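Fleshing out the bean above, a minimal sketch of the serializer/deserializer pair that the Writable contract requires; the field order here is my choice, and it just has to match between write() and readFields():

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class AgentSalesBean implements Writable {
        private String name;
        private String office;
        private int salesTotal;
        private double bonus;

        @Override
        public void write(DataOutput out) throws IOException {
            // Serialize the fields in a fixed order...
            out.writeUTF(name);
            out.writeUTF(office);
            out.writeInt(salesTotal);
            out.writeDouble(bonus);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // ...and deserialize them in exactly the same order.
            name = in.readUTF();
            office = in.readUTF();
            salesTotal = in.readInt();
            bonus = in.readDouble();
        }

        // getters and setters omitted
    }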
Details of Examples
- Read in a file with records
  - 1 row = 1 AgentSalesBean
- 5M–50M records in increments of 5M for the first test
- 1M–5M records in increments of 1M for the second test
- Extra runs on Hadoop with more data
- 1M records ≈ 23 MB
Details of Examples
- Standalone machine
  - 3.3 GHz (1 CPU × 4 cores × 1 thread/core)
  - 9 GB RAM allocated to the JVM
- Cluster
  - 3 data nodes
  - 2.4 GHz (2 CPUs × 4 cores × 2 threads/core)
  - 2 GB/mapper (32 GB available/node)
First Scenario
- How much bonus should the agent get based on the current sale?
- Bonus = sale > 0 ? sale/100 : 0
- All work in mapper, no reducer (see the mapper sketch below)
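A minimal sketch of a mapper for this scenario; the CSV layout, the setter names, and running a Drools session inside the mapper are my assumptions. Note the fact is deleted after firing so working memory doesn't grow with every record:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieSession;
    import org.kie.api.runtime.rule.FactHandle;

    public class BonusMapper extends Mapper<LongWritable, Text, NullWritable, AgentSalesBean> {
        private KieSession session;

        @Override
        protected void setup(Context context) {
            // One rule session per mapper JVM; rules load from the classpath kmodule.
            session = KieServices.Factory.get().getKieClasspathContainer().newKieSession();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");  // hypothetical layout: name,office,sale
            AgentSalesBean bean = new AgentSalesBean();
            bean.setName(fields[0]);
            bean.setOffice(fields[1]);
            bean.setSalesTotal(Integer.parseInt(fields[2]));

            FactHandle handle = session.insert(bean);
            session.fireAllRules();    // the bonus rule sets bean.bonus in place
            session.delete(handle);    // clear working memory between records
            context.write(NullWritable.get(), bean);
        }

        @Override
        protected void cleanup(Context context) {
            session.dispose();
        }
    }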
Results – First Scenario

[Chart: runtime in seconds vs. number of records in millions (0–50M). Trendlines: Hadoop y = 3.374x + 31.6 (R² = 0.967); standalone y = 6.517x + 7.466 (R² = 0.993).]
Results – First Scenario
- Neither implementation ran out of RAM
  - Have to clear out working memory
- Both implementations grow linearly
  - Standalone grows twice as fast
Second Scenario
- How much did the agent sell in total?
- Total = sum of all sales by this agent
- Pass-through mapper, work in reducer (see the reducer sketch below)
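A minimal sketch of the reducer side, assuming the pass-through mapper emits (agent name, sale amount) pairs; the key/value types are my choice. Because all sales for one agent arrive in a single reduce() call, per-agent aggregation logic, whether plain code or rules, sees the whole group at once:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TotalSalesReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text agent, Iterable<IntWritable> sales, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable sale : sales) {
                total += sale.get();    // Total = sum of all sales by this agent
            }
            context.write(agent, new IntWritable(total));
        }
    }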
Results – Second Scenario

[Chart: runtime in seconds vs. number of records in millions (0–5M). Trendlines: Hadoop y = 6x + 32.8 (R² = 0.992); standalone y = 15.6x - 3 (R² = 0.999).]
Results – Second Scenario
- Standalone ran out of RAM (9 GB) at 5M records
- Hadoop never did; ran up to 50M
  - Hadoop ran with 24 reducers at 600 MB each
  - Getting the data there took a while, though
- MapReduce scaled very well
  - Due to how Hadoop MapReduce is implemented
  - Can be run on just one reducer
Conclusions
- Lots of memory required for complicated rules
  - Read up on the implementation of your engine
- Hadoop only for huge datasets
  - JVM startup time
  - Cutoff size depends on rule complexity and object sizes
- Hadoop scales very well
  - Especially with subset calculations
How do I actually do this?
- You will have a custom solution
  - Need to know your data
  - Need to know what you want to find out
  - Different ways of writing rules for the same thing
- Write your own MapReduce jobs (see the driver sketch below)
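As a starting point, a hypothetical driver for the map-only bonus scenario from earlier; the class names and argument layout are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BonusJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "agent-bonus-rules");
            job.setJarByClass(BonusJob.class);
            job.setMapperClass(BonusMapper.class);
            job.setNumReduceTasks(0);               // map-only: all work happens in the mapper
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(AgentSalesBean.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }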
How do I actually do this?
- Figure out execution of rules
  - What does the work (mapper vs. reducer vs. both)
- Make sure your beans can be serialized
  - Recursive serialization
- Your mileage will vary
  - Every organization has different needs and capacities