SlideShare ist ein Scribd-Unternehmen logo
1 von 28
BUSINESS RULES
ON HADOOP
Egor Gryaznov
June 26th, 2013
 Analyze call center data
 Next call prevention
 Call volume reduction
 Big data platform (that’s me)
 Applications on top of it
2
What we do
 Decision logic
 Automate processes
 Enforce policies
 Make decisions
 ETL
 Business Rule Management Systems
 Ilog, Drools, etc.
 Write, manage, deploy, execute, monitor
3
What are business rules
4
How business rules work
1. Get data
2. Write rules
3. ?????
4. PROFIT!
5
How business rules work
Data
6
How business rules work
Data
Rules
7
How business rules work
Data
Rules
Rules
Engine
8
How business rules work
Data
Rules
Rules
Engine
Results
9
How business rules work
 Java beans to hold data
 Create one object for every record
 Rules to describe logic
 Insert all beans into engine
 Execute rules against objects
 Modify objects in place
 Return new objects
 Write out to file/database/etc
10
Why business rules
 Non-programmers who want to analyze data
 Don’t need to write code
 Available GUIs for rule writing
 One-time infrastructure set up
 Rules are plug and play
 Memory intensive
 Hard calculations, medium data
 Complicated decision logic
 Pseudo-joins
 Easy calculations, huge data
 Row by row if-then
 Aggregation
 Scaling existing solutions
 Very relevant at NICE
11
Why business rules on Hadoop
 Calculations that require access to full data
set
 Too much data for one key
 Serialization of objects
 Only if you have reducer
 Custom objects only
12
Don’t get too excited
 Agents make sales
 Fake generated data
 Different types of calculations
 Compare performance between standalone
and clustered
13
Examples
1. How much bonus should agent get based
on current sale?
 Bonus = sale > 0 ? sale/100 : 0
 All work in mapper, no reducer
2. How much did the agent sell total?
 Total = sum of all sales by this agent
 Pass-through mapper, work in reducer
14
Test Scenarios
public class AgentSalesBean implements
Writable{
private String name;
private String office;
private int salesTotal;
private double bonus;
}
//getters, setters, and serializer/deserializer
15
Details of Examples
 Read in file with records
 1 row = 1 AgentSalesBean
 5M – 50M in increments of 5M for first test
 1M – 5M in increments of 1M for second test
 Extra runs on Hadoop with more data
 1M records ≈ 23MB
16
Details of Examples
 Standalone machine
 3.3 GHz (1 CPU x 4 cores x 1 thread/core)
 9 GB RAM allocated to JVM
 Cluster
 3 data nodes
 2.4 GHz (2 CPUs x 4 cores x 2 threads/core)
 2 GB/mapper (32 GB available/node)
17
Details of Examples
How much bonus should agent get based on
current sale?
 Bonus = sale > 0 ? sale/100 : 0
 All work in mapper, no reducer
18
First Scenario
19
Results – First Scenario
y = 3.374x + 31.6
R² = 0.967
y = 6.517x + 7.466
R² = 0.993
0
50
100
150
200
250
300
350
400
0 10 20 30 40 50 60
Runtimeinseconds
Number of records (in millions)
Hadoop
Standalone
 Neither implementation ran out of RAM
 Have to clear out working memory
 Both implementations grow linearly
 Standalone grows twice as fast
20
Results – First Scenario
How much did the agent sell total?
 Total = sum of all sales by this agent
 Pass-through mapper, work in reducer
21
Second Scenario
22
Results – Second Scenario
y = 6x + 32.8
R² = 0.992
y = 15.6x - 3
R² = 0.999
0
10
20
30
40
50
60
70
0 1 2 3 4 5 6
Runtime(inseconds)
Number of records (in millions)
Hadoop
Standalone
 Standalone ran out of RAM (9 GB) at 5M
 Hadoop never did; ran up to 50M
 Hadoop ran at 24x600MB reducers
 Getting the data there took a while though
 MapReduce scaled very well
 Due to how Hadoop MapReduce is implemented
 Can be run on just one reducer
23
Results – Second Scenario
 Lots of memory required
 For complicated rules
 Read up on the implementation of your engine
 Hadoop only for huge datasets
 JVM startup time
 Cutoff size depends on rule complexity and
object sizes
 Hadoop scales very well
 Especially with subset calculations
24
Conclusions
 You will have a custom solution
 Need to know your data
 Need to know what you want to find out
 Different ways of writing rules for the same thing
 Write your own MapReduce jobs
25
How do I actually do this?
 Figure out execution of rules
 What does the work (mapper vs reducer vs both)
 Make sure your beans can be serialized
 Recursive serialization
 Your mileage will vary
 Every organization has different needs and
capacities
26
How do I actually do this?
27
QUESTIONS?
28
THANK YOU!

Weitere ähnliche Inhalte

Was ist angesagt?

Garbage collection in JVM
Garbage collection in JVMGarbage collection in JVM
Garbage collection in JVMaragozin
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogVadim Semenov
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.EDB
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...In-Memory Computing Summit
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%ScyllaDB
 
Assessing Graph Solutions for Apache Spark
Assessing Graph Solutions for Apache SparkAssessing Graph Solutions for Apache Spark
Assessing Graph Solutions for Apache SparkDatabricks
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
[Pgday.Seoul 2018] 이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
[Pgday.Seoul 2018]  이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG[Pgday.Seoul 2018]  이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
[Pgday.Seoul 2018] 이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PGPgDay.Seoul
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Petr Zapletal
 
Relational databases for BigData
Relational databases for BigDataRelational databases for BigData
Relational databases for BigDataAlexander Tokarev
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScyllaDB
 
Tool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQLTool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQLtoolitup
 
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Spark Summit
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of Viewaragozin
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueDatabricks
 
PostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability ImprovementsPostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability ImprovementsPGConf APAC
 
How Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintHow Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintScyllaDB
 
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...Spark Summit
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Spark Summit
 

Was ist angesagt? (20)

Garbage collection in JVM
Garbage collection in JVMGarbage collection in JVM
Garbage collection in JVM
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%
 
Assessing Graph Solutions for Apache Spark
Assessing Graph Solutions for Apache SparkAssessing Graph Solutions for Apache Spark
Assessing Graph Solutions for Apache Spark
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
[Pgday.Seoul 2018] 이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
[Pgday.Seoul 2018]  이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG[Pgday.Seoul 2018]  이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
[Pgday.Seoul 2018] 이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Relational databases for BigData
Relational databases for BigDataRelational databases for BigData
Relational databases for BigData
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
 
Tool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQLTool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQL
 
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of View
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and Fugue
 
Cloud DWH deep dive
Cloud DWH deep diveCloud DWH deep dive
Cloud DWH deep dive
 
PostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability ImprovementsPostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability Improvements
 
How Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintHow Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter Footprint
 
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
 

Ähnlich wie Business Rules on Hadoop

Tales from the Field
Tales from the FieldTales from the Field
Tales from the FieldMongoDB
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018VMware Tanzu
 
NoSQL meetup July 2011
NoSQL meetup July 2011NoSQL meetup July 2011
NoSQL meetup July 2011Shay Hassidim
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsRobert Grossman
 
Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World ThotWave
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014Dylan Tong
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About ShardingMongoDB
 
BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up Craig Schumann
 
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudJaipaul Agonus
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...
MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...
MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...MongoDB
 
Slices Of Performance in Java - Oleksandr Bodnar
Slices Of Performance in Java - Oleksandr BodnarSlices Of Performance in Java - Oleksandr Bodnar
Slices Of Performance in Java - Oleksandr BodnarGlobalLogic Ukraine
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Brian Brazil
 
Introducing MongoDB in a multi-site HA environment
Introducing MongoDB in a multi-site HA environmentIntroducing MongoDB in a multi-site HA environment
Introducing MongoDB in a multi-site HA environmentSebastian Geib
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Omid Vahdaty
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Giridhar Addepalli
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuningYosuke Mizutani
 
Dataswft Intel benchmark 2013
Dataswft Intel benchmark 2013Dataswft Intel benchmark 2013
Dataswft Intel benchmark 2013dhulis
 

Ähnlich wie Business Rules on Hadoop (20)

Tales from the Field
Tales from the FieldTales from the Field
Tales from the Field
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
 
NoSQL meetup July 2011
NoSQL meetup July 2011NoSQL meetup July 2011
NoSQL meetup July 2011
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014
 
Siem tools
Siem toolsSiem tools
Siem tools
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up
 
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...
MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...
MongoDB Evenings Houston: Implementing EDW Using MongoDB by Purvesh Patel, Ch...
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Slices Of Performance in Java - Oleksandr Bodnar
Slices Of Performance in Java - Oleksandr BodnarSlices Of Performance in Java - Oleksandr Bodnar
Slices Of Performance in Java - Oleksandr Bodnar
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
 
Introducing MongoDB in a multi-site HA environment
Introducing MongoDB in a multi-site HA environmentIntroducing MongoDB in a multi-site HA environment
Introducing MongoDB in a multi-site HA environment
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuning
 
Dataswft Intel benchmark 2013
Dataswft Intel benchmark 2013Dataswft Intel benchmark 2013
Dataswft Intel benchmark 2013
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Kürzlich hochgeladen (20)

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Business Rules on Hadoop

  • 1. BUSINESS RULES ON HADOOP Egor Gryaznov June 26th, 2013
  • 2.  Analyze call center data  Next call prevention  Call volume reduction  Big data platform (that’s me)  Applications on top of it 2 What we do
  • 3.  Decision logic  Automate processes  Enforce policies  Make decisions  ETL  Business Rule Management Systems  Ilog, Drools, etc.  Write, manage, deploy, execute, monitor 3 What are business rules
  • 4. 4 How business rules work 1. Get data 2. Write rules 3. ????? 4. PROFIT!
  • 6. 6 How business rules work Data Rules
  • 7. 7 How business rules work Data Rules Rules Engine
  • 8. 8 How business rules work Data Rules Rules Engine Results
  • 9. 9 How business rules work  Java beans to hold data  Create one object for every record  Rules to describe logic  Insert all beans into engine  Execute rules against objects  Modify objects in place  Return new objects  Write out to file/database/etc
  • 10. 10 Why business rules  Non-programmers who want to analyze data  Don’t need to write code  Available GUIs for rule writing  One-time infrastructure set up  Rules are plug and play
  • 11.  Memory intensive  Hard calculations, medium data  Complicated decision logic  Pseudo-joins  Easy calculations, huge data  Row by row if-then  Aggregation  Scaling existing solutions  Very relevant at NICE 11 Why business rules on Hadoop
  • 12.  Calculations that require access to full data set  Too much data for one key  Serialization of objects  Only if you have reducer  Custom objects only 12 Don’t get too excited
  • 13.  Agents make sales  Fake generated data  Different types of calculations  Compare performance between standalone and clustered 13 Examples
  • 14. 1. How much bonus should agent get based on current sale?  Bonus = sale > 0 ? sale/100 : 0  All work in mapper, no reducer 2. How much did the agent sell total?  Total = sum of all sales by this agent  Pass-through mapper, work in reducer 14 Test Scenarios
  • 15. public class AgentSalesBean implements Writable{ private String name; private String office; private int salesTotal; private double bonus; } //getters, setters, and serializer/deserializer 15 Details of Examples
  • 16.  Read in file with records  1 row = 1 AgentSalesBean  5M – 50M in increments of 5M for first test  1M – 5M in increments of 1M for second test  Extra runs on Hadoop with more data  1M records ≈ 23MB 16 Details of Examples
  • 17.  Standalone machine  3.3 GHz (1 CPU x 4 cores x 1 thread/core)  9 GB RAM allocated to JVM  Cluster  3 data nodes  2.4 GHz (2 CPUs x 4 cores x 2 threads/core)  2 GB/mapper (32 GB available/node) 17 Details of Examples
  • 18. How much bonus should agent get based on current sale?  Bonus = sale > 0 ? sale/100 : 0  All work in mapper, no reducer 18 First Scenario
  • 19. 19 Results – First Scenario y = 3.374x + 31.6 R² = 0.967 y = 6.517x + 7.466 R² = 0.993 0 50 100 150 200 250 300 350 400 0 10 20 30 40 50 60 Runtimeinseconds Number of records (in millions) Hadoop Standalone
  • 20.  Neither implementation ran out of RAM  Have to clear out working memory  Both implementations grow linearly  Standalone grows twice as fast 20 Results – First Scenario
  • 21. How much did the agent sell total?  Total = sum of all sales by this agent  Pass-through mapper, work in reducer 21 Second Scenario
  • 22. 22 Results – Second Scenario y = 6x + 32.8 R² = 0.992 y = 15.6x - 3 R² = 0.999 0 10 20 30 40 50 60 70 0 1 2 3 4 5 6 Runtime(inseconds) Number of records (in millions) Hadoop Standalone
  • 23.  Standalone ran out of RAM (9 GB) at 5M  Hadoop never did; ran up to 50M  Hadoop ran at 24x600MB reducers  Getting the data there took a while though  MapReduce scaled very well  Due to how Hadoop MapReduce is implemented  Can be run on just one reducer 23 Results – Second Scenario
  • 24.  Lots of memory required  For complicated rules  Read up on the implementation of your engine  Hadoop only for huge datasets  JVM startup time  Cutoff size depends on rule complexity and object sizes  Hadoop scales very well  Especially with subset calculations 24 Conclusions
  • 25.  You will have a custom solution  Need to know your data  Need to know what you want to find out  Different ways of writing rules for the same thing  Write your own MapReduce jobs 25 How do I actually do this?
  • 26.  Figure out execution of rules  What does the work (mapper vs reducer vs both)  Make sure your beans can be serialized  Recursive serialization  Your mileage will vary  Every organization has different needs and capacities 26 How do I actually do this?

Hinweis der Redaktion

  1. Thanks for comingI’m Egor GryaznovHow many know business rules?How many know Hadoop? (haha)
  2. Make decisions based on data If age > 21, then can drink = true If customer bought 2 items, get cheapest for free