Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.
Abstract
Applications of machine learning and ensemble methods to risk rule optimization:
At Remitly, risk management involves a combination of manually created and curated risk rules as well as black-box inputs from machine learning models. Currently, domain experts manage risk rules in production using logical conjunctions of statements about input features. In order to scale this process, we’ve developed a tool and framework for risk rule optimization that generates risk rules from data and optimizes rule sets by ensembling rules from multiple models according to a particular objective function. In this talk, I will describe how we currently manage risk rules, how we learn rules from data, how we determine optimal rule sets, and the importance of smart input features extracted from complex machine learning models.
3.
Introduction
• Risk management and risk rules
• Generating rules from machine learning models
• Incremental rule ranking
• Model ensembling
• Rule inclusion/exclusion criteria
• Why this matters to Remitly
Agenda
4. A spectre is haunting risk management — the spectre of…
8.
Risk rules, how do they work?
• Rules are typically managed via a GUI. Dropdown menus, etc.
• Rules are logical conjunctions of expressions of input data, e.g.:
(x < 10) AND (y > 20) AND (z < 100)
• Rule conditions are based on transaction and customer attributes.
• Collectively, all rules form a logical disjunction, e.g.:
rule1 OR rule2 OR rule3
• When one rule triggers, we queue a transaction for review.
• Easy to integrate rules we’ve learned from data into this framework.
Risk management and risk rules
9.
FOILed again
• FOIL (first order inductive learner)
• Accepts binary features only
• A rule is a simple conjunction of binary features
• Learns rules via separate-and-conquer
• Decision tree
• Accepts continuous and categorical features
• A single rule is a root-to-leaf path
• Learns via divide-and-conquer
Generating rules from machine learning models
10.
Separate-and-conquer
• FOIL takes as its input sequences of features and a ground truth. We map all of our input features to a boolean space.
• Different strategies for continuous features, e.g., binning.
• FOIL learns Horn Clause programs from examples
Implication form: (p ∧ q ∧ ... ∧ t) → u
Disjunction form: ¬p ∨ ¬q ∨ ... ∨ ¬t ∨ u
• Learns Horn Clause programs from positive class examples.
• Examples are removed from training data at each step.
• FOIL rules are simply lists of features.
• We map rules we learn from FOIL into human-readable rules that we can implement in our risk rule management system.
FOIL (First Order Inductive Learner)
11.
Divide-and-conquer
• Decision trees are interpretable
• A rule is a root-to-leaf path.
• Like a FOIL rule, a decision tree rule is a conjunction.
• Use DFS to extract all rules from a decision tree
• Easy to evaluate together with FOIL rules
• Easily implementable in our risk rule management system
Decision Trees
12.
SQL to the rescue
• We synthesize hand-crafted rule performance with SQL
• For each transaction, we know if a rule triggers or not.
• We can use this to synthesize new handcrafted rules that aren’t yet in production.
• We can derive precision/recall easily from this data.
• We can rank productionized rules alone to look at rules we can immediately eliminate from production (i.e., remove redundancy).
• We can rank productionized rules alone to establish a baseline level of performance for risk rule management.
Synthesizing Production Rules
13.
You are the weakest rule, goodbye!
• Today, there are hundreds of rules live in production.
• A single decision tree or FOIL model can represent thousands of rules.
• Can we find a strict subset of those rules that recalls the exact same amount of fraud?
• First we measure the performance of each rule individually on a test set.
• With each step, we get the (next) best rule and remove the fraud from our test set that our (next) best rule catches.
• We repeat this process until our rules no longer catch any uncaught fraud, whereupon the process terminates.
Incremental Rule Ranking
14.
Will it blend?
• Ensembling rules gives us a lot of lift
• We ensemble:
• Synthesized production rules
• FOIL rules
• Decision tree rules
• We rank a list of candidate rules from each model class.
• Our output is a classifier of ensembled rules
• We’re seeing an 8% jump in recall and a 1% increase in precision
Model ensembling
15.
To include or not to include, that is the question
• Risk rule optimization is a constraint optimization problem
• Optimal rule sets must satisfy business constraints
• We must balance catching fraud with insulting customers
• Constraints can be nonlinear, e.g., with tradeoffs between precision and recall.
• With each ranking step, we evaluate the whole classifier
• We include a rule when our classifier fits our criteria
• We discard rules when our classifier violates our criteria
Rule inclusion/exclusion criteria
16.
It’s a rule in a black-box!
• The most informative rule features are derived from black-box models.
• Rule sets with these features as conditions are a kind of model stacking.
• Risk rules are limited to conjunctions, but their inputs are not.
• Adding more black-box inputs improves the rules we learn.
• Better black-box inputs reduce the complexity of rules (i.e., they have fewer conditions).
Black box input features
17.
How did we do this?
• Redshift
• Python
• S3
• EC2 p2.xlarge with deep learning AMI
• GPU instance gives us a ~17x speedup in training/inference compared to a laptop
• TensorFlow/Keras
• Scalding
Technologies used
18.
Citing our sources
Bibliography
Fürnkranz, Johannes. "Separate-and-conquer rule learning." Artificial Intelligence Review 13, no. 1 (1999): 3-54.
Mooney, Raymond J., and Mary Elaine Califf. "Induction of first-order decision lists: Results on learning the past tense of English verbs." JAIR 3 (1995): 1-24.
Quinlan, J. Ross. "Induction of decision trees." Machine learning 1, no. 1 (1986): 81-106.
Quinlan, J. Ross. "Learning logical definitions from relations." Machine learning 5, no. 3 (1990): 239-266.
Quinlan, J. R. "Determinate literals in inductive logic programming." In Proceedings of the Eighth International Workshop on Machine Learning, pp. 442-446. 1991.
Quinlan, J., and R. Cameron-Jones. "FOIL: A midterm report." In Machine Learning: ECML-93, pp. 1-20. Springer Berlin/Heidelberg, 1993.
Quinlan, J. Ross, and R. Mike Cameron-Jones. "Induction of logic programs: FOIL and related systems." New Generation Computing 13, no. 3-4 (1995): 287-312.
Quinlan, J. Ross. C4.5: Programs for Machine Learning. Elsevier, 2014.
19.
What we talked about
• Risk management and risk rules
• Generating rules from machine learning models
• Incremental rule ranking
• Model ensembling
• Rule inclusion/exclusion criteria
• Why this matters to Remitly
Summary
20.
Remitly’s Data Science team uses ML for a variety of purposes.
ML applications are core to our business – therefore our business must be core to our ML applications.
Machine learning at Remitly
Hi everyone.
My name is Alex Korbonits, and I am a data scientist at Remitly.
This talk is broadly about applying machine learning to legacy risk rule systems.
Before we dive in, here’s a little bit about Remitly and me.
Remitly was founded in 2011 to forever change the way people send money to their loved ones.
Worldwide, remittances represent over 660 billion dollars annually, roughly 4x the amount of foreign aid.
We’re the largest independent digital remittance company in the U.S.
We’re sending over 2 billion dollars annually and growing quickly.
I'm Remitly's first data scientist, and our team is growing.
Right now my principal focus is FRAUD CLASSIFICATION
Previously, I was a data scientist at a startup called Nuiku, focusing on NLP.
First, a quick background on risk management systems and how risk rules are used in industry.
Almost always these rules are hand-crafted by domain experts. Why not generate rules from machine learning models?
Once we’ve generated rules, we’ll consider how to measure their effectiveness via ranking them, as single models in isolation or ensembled together with rules from other models.
Importantly, we’re able to evaluate rules we’ve generated from machine learning models together with existing hand-crafted risk rules.
Industrial settings require thinking beyond status quo model evaluation metrics: today we’ll consider tying model and rule selection to business costs and impact.
That makes sense, and dollars and cents.
Internally, we’ve developed a tool that can do all of this end-to-end. It’s being used by fraud domain experts to optimize our current risk rules in production.
A Spectre is haunting risk management... the spectre of...
COMMUNISM.
Wait, hold on a second, that’s another talk…
The spectre of… BIG DATA
Typically, risk rules are handcrafted by domain experts.
They're usually bucketed into different categories and their overall effect is orchestrated with different priorities and workflows.
Risk rules come in many flavors. In the case of fraud, the majority of risk rules are targeted toward common MOs of fraudsters.
We also use risk rules to comply with company policy and governmental policy. Not all risk rules have to do with fraud. Plenty are for KYC, or, know-your-customer purposes. Last, risk rule management systems are used to detect suspicious or illegal activity that isn't fraud, for example, for money laundering.
Policy rules make sense to implement by hand because they are a direct reflection of those policies. However, when it comes to fraud rules or rules for suspicious activity, all too often rules are created in a reactionary manner to cover slightly generalized patterns of examples of known fraud that were previously undetected.
The spectre of big data renders this process impractical, inefficient, and expensive.
It’s imperative that we begin to scale out our production of new rules so that we can keep up with managing existing and new risks we face every day.
We want to use machine learning to change risk rule management from a reactionary to a predictive practice.
Last, we don't just want to manage our risk rules. We want to optimize them, with some constraints.
Risk rules, how do they work?
Typically, risk rules are managed in a GUI. Dropdown menus, clicking boxes, etc.
The complexity of a single rule is usually that of a logical conjunction. Foo AND bar AND baz
We use customer and transaction inputs as features to our rules based on policy and domain knowledge.
They’re what you’d expect. Recency, frequency, and magnitude features are very common, as are count features.
Our risk classifier as a whole is a logical disjunction of all of our rules.
Rule1 OR rule2 OR rule3
At its simplest, when a rule fires we queue a transaction for manual risk review.
We can easily integrate rules generated from machine learning models into this system if they can be represented as conjunctions.
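To make this concrete, here’s a minimal sketch of that representation; the feature names and thresholds are hypothetical, not our production rules:

```python
import operator

# A condition is (feature_name, comparison, threshold); a rule is a conjunction of conditions.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def rule_fires(rule, txn):
    """A rule fires only if every condition holds (logical conjunction)."""
    return all(OPS[op](txn[feat], thresh) for feat, op, thresh in rule)

def classifier_fires(rules, txn):
    """The overall classifier is the disjunction of all rules: any one firing queues the transaction for review."""
    return any(rule_fires(rule, txn) for rule in rules)

# Hypothetical example mirroring (x < 10) AND (y > 20) AND (z < 100):
rule1 = [("x", "<", 10), ("y", ">", 20), ("z", "<", 100)]
rule2 = [("amount", ">", 500)]
txn = {"x": 5, "y": 25, "z": 50, "amount": 100}
print(classifier_fires([rule1, rule2], txn))  # True: rule1 fires
```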
We need to train machine learning models on our dataset and extract rules from them.
How do we do that?
For our MVP, we start with two simple model classes.
Single decision trees and FOIL models.
FOIL stands for First Order Inductive Learner.
It learns rules differently than decision trees: via separate-and-conquer vs. divide-and-conquer.
For example, a transaction can only follow a single path through a decision tree.
However, a single transaction can trigger multiple FOIL rules at inference time, even though during training each transaction is covered by only one FOIL rule before it is discarded.
Feature engineering is important with FOIL, where splits for continuous features need to be pre-specified.
Rules extracted from decision trees are nice since the splits are learned during training.
What is FOIL? Again, FOIL stands for First Order Inductive Learner.
First, to prep our data for FOIL, we take sequences of input data and our label and map them to a feature space of booleans.
For categorical or sparse features this is straightforward but for continuous features we have more flexibility.
Binning is a pretty simple option. What's good is that there is always room to improve this feature engineering process.
Deciding how to do this for continuous variables properly is extremely important, especially when certain continuous variables are skewed.
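As a rough illustration of one such strategy, here’s a quantile-binning sketch; the feature name, values, and bin count are made up. Quantile edges adapt to skewed distributions better than equal-width bins:

```python
import numpy as np

def quantile_bin_features(values, name, n_bins=4):
    """Map a continuous feature to boolean indicator features using quantile bin edges."""
    # Interior edges only: n_bins bins need n_bins - 1 cut points.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    features = {}
    for v in values:
        bin_idx = int(np.searchsorted(edges, v))
        for b in range(n_bins):
            # One boolean feature per bin; exactly one is True per value.
            features.setdefault(f"{name}_bin_{b}", []).append(bin_idx == b)
    return features

# Hypothetical skewed "amount" feature:
amounts = [1, 2, 3, 4, 5, 10, 50, 1000]
booleans = quantile_bin_features(amounts, "amount")
```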
A FOIL model learns Horn clause programs from examples.
What's a Horn clause program?
Effectively, a Horn clause program is a conjunction of boolean statements about your data which imply a particular class.
There are two big ways that FOIL is different from a decision tree.
One, they learn differently. FOIL models look for positive examples first, and learn a very precise boolean box around those examples.
Then, those positive examples are removed from the training data. Subsequent rules are learned from the remaining training data.
This process continues to produce highly targeted/precise rules for us.
With decision trees, the gain of a given split is evaluated globally.
Second, in a decision tree, nearby leaf nodes share a LOT of ancestors together. FOIL rules cover the space of examples differently.
This is one of the reasons we chose to use FOIL, as its different hypotheses about our data act as a nice form of regularization.
When training has completed, we map rules we’ve learned via FOIL into human-readable rules that we can implement in our risk rule management system.
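The separate-and-conquer loop described above can be sketched roughly as follows. This is a simplification, not Quinlan’s FOIL: literals are scored here by naive precision rather than FOIL’s information-gain criterion, and the data shapes are hypothetical:

```python
def precision_if_added(covered, rule, lit):
    """Precision of the rule on the covered examples if we add one more literal."""
    new = rule | {lit}
    hits = [(x, y) for x, y in covered if new <= x]
    return (sum(y for _, y in hits) / len(hits)) if hits else 0.0

def separate_and_conquer(examples, labels, candidate_literals, max_rules=10):
    """Greedy covering: learn one precise rule, remove the positives it covers, repeat.

    Each example is a set of true boolean literals; a rule is a frozenset of
    literals that must all hold (a conjunction).
    """
    data = list(zip(examples, labels))
    rules = []
    while any(y for _, y in data) and len(rules) < max_rules:
        rule, covered = set(), data
        # Grow the rule literal-by-literal until it covers only positives.
        while any(not y for _, y in covered):
            best = max(candidate_literals - rule,
                       key=lambda lit: precision_if_added(covered, rule, lit))
            rule.add(best)
            covered = [(x, y) for x, y in covered if rule <= x]
            if not covered:
                break
        rules.append(frozenset(rule))
        # "Separate": drop the positives this rule covers; keep conquering the rest.
        data = [(x, y) for x, y in data if not (y and rule <= x)]
    return rules
```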
So that’s FOIL.
We also chose to derive sets of rules from decision tree models since they’re also interpretable.
Like a FOIL rule, a decision tree rule is simply a conjunction of conditions at each branch of the tree. A single condition is a feature, a threshold, and an inequality.
Rule extraction is easy: just find all root-to-leaf paths via depth-first-search.
We use a common framework to evaluate FOIL rules and decision tree rules together with hand-crafted rules that are already in production
These rules are easy to implement in our risk rule management system.
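A sketch of that extraction using scikit-learn’s tree internals (the dataset here is synthetic, just to have a fitted tree to walk):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def extract_rules(tree, feature_names):
    """Depth-first search over a fitted sklearn tree; each root-to-leaf path is one rule.

    A condition is (feature, inequality, threshold); a rule is the conjunction
    of the conditions along the path.
    """
    t = tree.tree_
    rules = []

    def dfs(node, path):
        if t.children_left[node] == -1:  # leaf: the accumulated path is a complete rule
            rules.append(path)
            return
        feat, thresh = feature_names[t.feature[node]], t.threshold[node]
        dfs(t.children_left[node], path + [(feat, "<=", thresh)])
        dfs(t.children_right[node], path + [(feat, ">", thresh)])

    dfs(0, [])
    return rules

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
rules = extract_rules(clf, [f"f{i}" for i in range(4)])
# One rule per leaf; each is a conjunction like [("f2", "<=", 0.1), ("f0", ">", -1.3)]
```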
To evaluate rules learned from machine learning models together with hand-crafted rules, we need to synthesize our rules in the data.
We don’t just want historical performance of our rules. We want to consider synthesizing performance of new handcrafted rules, too, before they’re in production.
Here we write SQL to see, for every transaction, whether or not our production rules would have triggered.
We can derive all of the same metrics with these rules as we can when we evaluate decision tree or FOIL rules.
If we evaluate the performance of all of our hand-crafted rules, we can immediately see where we can eliminate some of them that aren’t a value-add, i.e., redundant rules that may be causing unnecessary manual reviews.
We can also look at our hand-crafted rules alone to establish a baseline level of performance. We have a minimum bar that we can augment with rules learned from data.
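Once we have per-transaction trigger data, precision and recall per rule fall out directly. A toy sketch with made-up trigger data:

```python
# Each row: (transaction_id, fraud_label, {rule_name: triggered}) -- toy synthesized data.
rows = [
    (1, True,  {"rule_a": True,  "rule_b": False}),
    (2, True,  {"rule_a": True,  "rule_b": True}),
    (3, False, {"rule_a": False, "rule_b": True}),
    (4, False, {"rule_a": False, "rule_b": False}),
    (5, True,  {"rule_a": False, "rule_b": False}),
]

def precision_recall(rows, rule):
    """Standard precision/recall from a rule's triggers against the fraud label."""
    tp = sum(1 for _, fraud, trig in rows if trig[rule] and fraud)
    fp = sum(1 for _, fraud, trig in rows if trig[rule] and not fraud)
    fn = sum(1 for _, fraud, trig in rows if not trig[rule] and fraud)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(precision_recall(rows, "rule_a"))  # (1.0, 0.666...): precise but misses txn 5
```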
How do we do this evaluation? Next, we turn to a process I’m calling incremental rule ranking.
Now we have Incremental rule ranking
This algorithm allows us to properly assess and compare rules, regardless of whether we manually create them or learn them from data.
We do so by ranking rules *incrementally*, according to each rule’s F-beta score.
We use an F-beta score that directly ties the overall performance of our rules to our internal goals.
Ranking incrementally is a multi-step process that begins with measuring the performance of each rule individually on a test set.
With each step, we get the (next) best rule and remove the fraud from our test set that our (next) best rule catches. We repeat this process until our rules no longer catch any uncaught fraud, whereupon the process terminates.
This is a slight variant on the separate and conquer strategy that FOIL employs during model training.
Here we’re not learning new rules, but we ARE finding a subset of rules that obtains the same recall as ALL of the rules combined. We’re getting a more precise overall classifier.
We can do this for a single model class or source of rules. We can ensemble rules together from different sources or model classes.
If we synthesize our production rules, we can measure their effectiveness as a baseline.
Holding everything else constant, increasing *beta* will result in fewer rules and lower overall precision, whereas decreasing *beta* will result in more rules and higher overall precision.
For a given beta and set of candidate rules, this algorithm does not increase the amount of fraud that the candidate rules catch. It gives us the most efficient subset of candidate rules that catch the maximum amount of fraud, drastically reducing the number of rules needed to do so and improving overall precision.
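The ranking loop just described can be sketched as follows, with each rule represented simply as the set of transaction ids it flags; this is a simplified stand-in for our real pipeline, not its actual code:

```python
def fbeta(tp, fp, fn, beta):
    """F-beta: beta > 1 weights recall more heavily; beta < 1 weights precision."""
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0

def rank_incrementally(rules, fraud_ids, beta=1.0):
    """Greedily pick the rule with the best F-beta against *remaining* fraud,
    remove the fraud it catches, and repeat until no rule catches uncaught fraud."""
    remaining = set(fraud_ids)
    available = dict(rules)  # name -> set of transaction ids the rule flags
    ranked = []
    while remaining and available:
        def score(name):
            flagged = available[name]
            tp = len(flagged & remaining)
            fp = len(flagged - set(fraud_ids))  # non-fraud reviews the rule causes
            return fbeta(tp, fp, len(remaining) - tp, beta)
        best = max(available, key=score)
        caught = available[best] & remaining
        if not caught:  # no rule catches any uncaught fraud: terminate
            break
        ranked.append(best)
        remaining -= caught  # "remove the fraud our best rule catches"
        del available[best]
    return ranked
```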
Ensembling rules gives us a lot of lift. We ensemble synthesized production rules, FOIL rules, and decision tree rules: we rank a list of candidate rules from each model class, and our output is a classifier of ensembled rules. We’re seeing an 8% jump in recall and a 1% increase in precision.
Risk rule optimization is a constraint optimization problem
We can’t just maximize the overall precision/recall of our classifier. That won’t do.
Fraud is very expensive for us. We want to catch as much of it as possible. However, we don’t want to review every transaction.
Our economic constraints weight the cost of false negatives so heavily that extremely high recall is required for us to keep the lights on.
On the other hand, if we were to review every single transaction, we’d have great recall, but we’d insult customers, increase friction, and also have to take the time to do all of those reviews!
So, we need to evaluate the classifier during the ranking process to ensure that we’re not putting ourselves out of business.
Said another way, we want to make sure our classifier is doing its job and represents a viable set of rules to put into production.
When we are considering adding a rule to our classifier, we evaluate the classifier as a whole to make sure our constraints are satisfied.
If so, we add the rule and continue. If not, we discard the rule and consider other rule candidates.
The rule inclusion/exclusion process is directly tied to our ranking process.
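That inclusion/exclusion step can be sketched as a check on the whole classifier after tentatively adding each candidate; the precision floor here is a made-up number, not our real constraint:

```python
def ensemble_metrics(selected, rules, fraud_ids):
    """Precision/recall of the disjunction of all selected rules."""
    flagged = set().union(*(rules[name] for name in selected)) if selected else set()
    tp = len(flagged & fraud_ids)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(fraud_ids) if fraud_ids else 0.0
    return precision, recall

def select_rules(ranked, rules, fraud_ids, min_precision=0.30):
    """Walk the ranked candidates; include a rule only if the classifier as a whole
    still satisfies the precision floor, otherwise discard it and move on."""
    selected = []
    for name in ranked:
        precision, _ = ensemble_metrics(selected + [name], rules, fraud_ids)
        if precision >= min_precision:
            selected.append(name)
    return selected
```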
On the right we have a precision recall plot and 3 curves. Each curve here represents a source of rules that have been ranked. Green and blue represent rules sourced from single model classes. Red is from ensembling these rules together prior to ranking.
Each point, going from left to right, represents the cumulative precision and recall of a classifier after N rules have been ranked.
In this example, we represent our constraints by these black lines. We want a classifier that is in the upper-right-hand quadrant defined by these two constraints.
The horizontal line is one we look at during each step of ranking. We don't want our classifier to be this imprecise. The vertical line is more of a goal rather than a constraint. We want our classifier to eventually get past this line. It means we're really kicking fraud's butt.
The most informative rule features are derived from black-box models. Rule sets with these features as conditions are a kind of model stacking. Risk rules are limited to conjunctions, but their inputs are not. Adding more black-box inputs improves the rules we learn, and better black-box inputs reduce the complexity of rules, i.e., they have fewer conditions.
I’d like to add that risk rule optimization as a tool can be used for multiple things. On the one hand, if we don’t engineer new features before learning new rules, we can immediately put them into production.
On the flipside, we can engineer and test new features, and then see how useful they are to justify putting them into production.
So how did we do this?
We used a bunch of technologies to do risk rule optimization.
Redshift – we use Redshift extensively for our data warehousing and for synthesizing our hand-crafted production rules
Python – we used python extensively for building our risk-rule-optimization machine learning pipeline
S3 – we used S3 a lot for pushing/pulling input data, outputs, and all sorts of stuff.
EC2 – I used EC2 instances for this process. The deep learning AMI was great for speeding up the training and inference of models used for building black-box input features.
A GPU instance gives us a ~17x speedup in training/inference compared to a laptop.
I generally used r3 and r4 instances for the rest of the pipeline as well as for training and testing FOIL and decision tree models.
TensorFlow/Keras – I used tensorflow/keras for building black-box models as inputs.
Scalding – we used scalding for ETL to turn raw sources of production data into data ready to use by our machine learning models and for synthesizing our production rule performance
And here is a bibliography that I certainly won’t begin to go through but which I’ll leave here for those who want to dive deeper.
Hand-tailored risk management systems and risk rules are the status quo.
Machine learning models help us learn new risk rules from data and improve upon rules we already have in production.
Since we’re learning rules from data it’s imperative that we carefully assess our rules one by one and as part of a whole classifier.
We need to evaluate rules we’ve generated in concert with existing hand-crafted rules currently in production. Some of them are enormously valuable. We can’t simply cast them aside because they’re hand-crafted.
Ranking rules helps us quantify their effectiveness and gauge how well our rules can generalize to unseen data.
Tying in business constraints and objectives helps us choose what to implement and what to avoid – even if a rule recalls a lot of fraud for us, it could be extremely crude and subject an unreasonable and unnecessary number of customers to our risk investigation process.
We are just getting started. We’re thinking of ways to incorporate black-box models further into risk rule optimization, both in terms of building smarter input features and for extracting rules. There are all sorts of other avenues: we can learn rules on top of rules to simulate rules with more complicated first-order logic, for example.
What does machine learning at Remitly look like?
Understanding:
Fraud classification
Risk rule optimization
Anomaly detection
Customer segmentation and customer lifetime value
Pricing optimization
We're hiring!
Email me at alex@remitly.com.
That’s all, folks!
THANKS