19 May 2017
Applications of machine learning and ensemble methods to
risk rule optimization
Alex Korbonits, Data Scientist
Introduction
About Remitly and Me
Introduction
• Risk management and risk rules
• Generating rules from machine learning models
• Incremental rule ranking
• Model ensembling
• Rule inclusion/exclusion criteria
• Why this matters to Remitly
Agenda
A spectre is haunting risk management — the spectre of…
Risk rules, how do they work?
• Rules are typically managed via a GUI. Dropdown menus, etc.
• Rules are logical conjunctions of expressions of input data, e.g.:
(x < 10) AND (y > 20) AND (z < 100)
• Rule conditions are based on transaction and customer
attributes.
• Collectively, all rules form a logical disjunction, e.g.:
rule1 OR rule2 OR rule3
• When one rule triggers, we queue a transaction for review.
• Easy to integrate rules we’ve learned from data into this
framework.
Risk management and risk rules
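The conjunction/disjunction structure above can be sketched in a few lines of Python. This is a toy illustration, not Remitly's actual rule engine; the attributes x, y, and z are the hypothetical ones from the example rule.

```python
# A rule is a conjunction (AND) of simple conditions on transaction
# attributes; the rule set as a whole fires as a disjunction (OR).
def make_rule(*conditions):
    """A rule triggers only when every condition holds (logical AND)."""
    return lambda txn: all(cond(txn) for cond in conditions)

def any_rule_triggers(rules, txn):
    """Queue the transaction for review if any single rule fires (logical OR)."""
    return any(rule(txn) for rule in rules)

# Mirrors the example rule (x < 10) AND (y > 20) AND (z < 100).
rule1 = make_rule(lambda t: t["x"] < 10,
                  lambda t: t["y"] > 20,
                  lambda t: t["z"] < 100)
rules = [rule1]

print(any_rule_triggers(rules, {"x": 5, "y": 25, "z": 50}))   # True: queued
print(any_rule_triggers(rules, {"x": 15, "y": 25, "z": 50}))  # False
```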
FOILed again
• FOIL (first order inductive learner)
• Accepts binary features only
• A rule is a simple conjunction of binary features
• Learns rules via separate-and-conquer
• Decision tree
• Accepts continuous and categorical features
• A single rule is a root-to-leaf path
• Learns via divide-and-conquer
Generating rules from machine learning models
Separate-and-conquer
• FOIL takes as its input sequences of features and a ground
truth. We map all of our input features to a boolean space.
• Different strategies for continuous features, e.g., binning.
• FOIL learns Horn Clause programs from examples
Implication form: (p ∧ q ∧ ... ∧ t) → u
Disjunction form: ¬p ∨ ¬q ∨ ... ∨ ¬t ∨ u
• Learns Horn Clause programs from positive class examples.
• Examples are removed from training data at each step.
• FOIL rules are simply lists of features.
• We map rules we learn from FOIL into human-readable
rules that we can implement in our risk rule management
system.
FOIL (First Order Inductive Learner)
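The separate-and-conquer loop can be sketched as below. This is a much-simplified stand-in for FOIL: real FOIL scores candidate first-order literals with an information-gain metric, whereas this toy greedily picks the boolean feature that excludes the most negatives, purely to illustrate the grow-rule / remove-covered-positives cycle.

```python
# Toy separate-and-conquer over boolean features (not real FOIL).
def learn_rules(examples):
    """examples: list of (features: dict[str, bool], label: bool)."""
    rules = []
    positives = [f for f, y in examples if y]
    negatives = [f for f, y in examples if not y]
    feature_names = sorted(examples[0][0])
    while positives:
        covered_neg = list(negatives)
        rule = []
        # Conquer: grow the conjunction until it covers no negatives.
        while covered_neg:
            best = max(feature_names,
                       key=lambda f: sum(1 for n in covered_neg if not n[f]))
            if all(n[best] for n in covered_neg):
                break  # no literal separates the rest; accept an impure rule
            rule.append(best)
            covered_neg = [n for n in covered_neg if n[best]]
        rules.append(rule)
        # Separate: remove positives the new rule covers, then repeat.
        remaining = [p for p in positives if not all(p[f] for f in rule)]
        if len(remaining) == len(positives):
            break  # rule covered nothing new; avoid looping forever
        positives = remaining
    return rules

examples = [
    ({"a": True,  "b": True},  True),   # positive class
    ({"a": False, "b": True},  False),
    ({"a": True,  "b": False}, False),
]
print(learn_rules(examples))  # [['a', 'b']] -- one rule: a AND b
```

Each learned rule is simply a list of features, which is what makes the mapping into human-readable production rules straightforward.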
Divide-and-conquer
• Decision trees are interpretable
• A rule is a root-to-leaf path.
• Like a FOIL rule, a decision tree rule is a conjunction.
• Use DFS to extract all rules from a decision tree
• Easy to evaluate together with FOIL rules
• Easily implementable in our risk rule management
system
Decision Trees
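Extracting every root-to-leaf rule via DFS can be sketched as follows. The nested-dict tree layout here is hypothetical; a real tree would come from a library such as scikit-learn, but the traversal idea is the same.

```python
# DFS rule extraction: every root-to-leaf path whose leaf predicts the
# positive class becomes one conjunctive rule.
def extract_rules(node, path=()):
    if "label" in node:  # leaf node
        return [list(path)] if node["label"] == 1 else []
    cond = node["condition"]  # e.g. "amount > 500" (illustrative condition)
    no_side = extract_rules(node["no"], path + (f"NOT ({cond})",))
    yes_side = extract_rules(node["yes"], path + (cond,))
    return no_side + yes_side

tree = {
    "condition": "amount > 500",
    "yes": {"condition": "new_customer",
            "yes": {"label": 1},
            "no": {"label": 0}},
    "no": {"label": 0},
}
print(extract_rules(tree))  # [['amount > 500', 'new_customer']]
```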
SQL to the rescue
• We synthesize hand-crafted rule performance with SQL
• For each transaction, we know if a rule triggers or not.
• We can use this to synthesize new hand-crafted rules
that aren’t yet in production.
• We can derive precision/recall easily from this data.
• We can rank productionized rules alone to look at rules
we can immediately eliminate from production (i.e.,
remove redundancy).
• We can rank productionized rules alone to establish a
baseline level of performance for risk rule management.
Synthesizing Production Rules
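The per-rule precision/recall derivation can be sketched in Python (the talk does this in SQL over Redshift; the column names below are hypothetical placeholders). Given, for each transaction, whether a rule fired and whether the transaction was fraud:

```python
# Derive a rule's precision and recall from per-transaction trigger data.
def rule_metrics(rows, rule):
    """rows: list of dicts with boolean rule columns and an 'is_fraud' flag."""
    fired = [r for r in rows if r[rule]]
    fraud = [r for r in rows if r["is_fraud"]]
    tp = sum(1 for r in fired if r["is_fraud"])  # true positives
    precision = tp / len(fired) if fired else 0.0
    recall = tp / len(fraud) if fraud else 0.0
    return precision, recall

rows = [
    {"rule_a": True,  "is_fraud": True},
    {"rule_a": True,  "is_fraud": False},
    {"rule_a": False, "is_fraud": True},
    {"rule_a": False, "is_fraud": False},
]
print(rule_metrics(rows, "rule_a"))  # (0.5, 0.5)
```

The same computation over rules not yet in production is what lets new hand-crafted candidates be evaluated before deployment.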
You are the weakest rule, goodbye!
• Today, there are hundreds of rules live in production.
• A single decision tree or FOIL model can represent
thousands of rules.
• Can we find a strict subset of those rules that recalls the
exact same amount of fraud?
• First we measure the performance of each rule
individually on a test set.
• With each step, we get the (next) best rule and remove
the fraud from our test set that our (next) best rule
catches.
• We repeat this process until our rules no longer catch
any uncaught fraud, whereupon the process terminates.
Incremental Rule Ranking
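The greedy step above can be sketched as follows; the mapping from rule names to sets of caught fraud IDs is illustrative data, not real results.

```python
# Greedy incremental ranking: repeatedly pick the rule catching the most
# still-uncaught fraud, remove that fraud from the pool, and repeat until
# no rule catches anything new.
def rank_rules(rule_catches):
    """rule_catches: dict mapping rule name -> set of fraud transaction ids."""
    uncaught = set().union(*rule_catches.values())
    ranking = []
    while uncaught:
        best = max(rule_catches, key=lambda r: len(rule_catches[r] & uncaught))
        gained = rule_catches[best] & uncaught
        if not gained:
            break  # remaining rules catch no new fraud; terminate
        ranking.append(best)
        uncaught -= gained
    return ranking

catches = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {2}}
print(rank_rules(catches))  # ['r1', 'r2'] -- r3 adds no uncaught fraud
```

The result is a strict subset ("r3" is dropped) that recalls exactly the same fraud as the full rule set.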
Will it blend?
• Ensembling rules gives us a lot of lift
• We ensemble:
• Synthesized production rules
• FOIL rules
• Decision tree rules
• We rank a list of candidate rules from each model class.
• Our output is a classifier of ensembled rules
• We’re seeing an 8% jump in recall and a 1% increase in
precision
Model ensembling
To include or not to include, that is the question
• Risk rule optimization is a constraint optimization
problem
• Optimal rule sets must satisfy business constraints
• We must balance catching fraud with insulting
customers
• Constraints can be nonlinear, e.g., with tradeoffs
between precision and recall.
• With each ranking step, we evaluate the whole classifier
• We include a rule when our classifier fits our criteria
• We discard rules when our classifier violates our criteria
Rule inclusion/exclusion criteria
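The include/discard loop can be sketched as below. The precision threshold and the toy per-rule statistics are made-up placeholders, not Remitly's actual business constraints; the point is only that the whole classifier is re-evaluated at each ranking step.

```python
# Accept a candidate rule only if the ensembled classifier, with that rule
# added, still satisfies the business constraint.
def select_rules(candidates, evaluate, min_precision=0.9):
    """candidates: ranked rules; evaluate(rules) -> (precision, recall)."""
    accepted = []
    for rule in candidates:
        precision, recall = evaluate(accepted + [rule])
        if precision >= min_precision:
            accepted.append(rule)  # classifier still fits our criteria
        # else: discard the rule; the classifier would violate our criteria
    return accepted

# Toy evaluator: each rule contributes fixed (tp, fp) counts; the pooled
# classifier's precision/recall come from the pooled counts.
stats = {"r1": (9, 1), "r2": (1, 9), "r3": (8, 0)}
def evaluate(rules):
    tp = sum(stats[r][0] for r in rules)
    fp = sum(stats[r][1] for r in rules)
    total_fraud = 20
    return (tp / (tp + fp) if tp + fp else 0.0, tp / total_fraud)

print(select_rules(["r1", "r2", "r3"], evaluate))  # ['r1', 'r3']
```

Here "r2" is discarded because pooling it in would drag precision below the constraint, while "r1" and "r3" keep the classifier within bounds.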
It’s a rule in a black-box!
• The most informative rule features are derived from
black box models.
• Rules/lists of rules with these features as conditions
are a kind of model stacking
• Risk rules limited to conjunctions, but inputs unlimited
• Add more black box inputs to improve rules learned
• Better black-box inputs reduce complexity of rules (i.e.,
they have fewer conditions)
Black box input features
How did we do this?
• Redshift
• Python
• S3
• EC2 p2.xlarge with deep learning AMI
• GPU instance gives us a ~17x speedup in
training/inference compared to a laptop
• TensorFlow/Keras
• Scalding
Technologies used
Citing our sources
Bibliography
Fürnkranz, Johannes. "Separate-and-conquer rule learning." Artificial Intelligence Review 13, no. 1 (1999): 3-54.
Mooney, Raymond J., and Mary Elaine Califf. "Induction of first-order decision lists: Results on learning the past tense of English verbs." JAIR 3 (1995): 1-24.
Quinlan, J. Ross. "Induction of decision trees." Machine learning 1, no. 1 (1986): 81-106.
Quinlan, J. Ross. "Learning logical definitions from relations." Machine learning 5, no. 3 (1990): 239-266.
Quinlan, J. R. "Determinate literals in inductive logic programming." In Proceedings of the Eighth International Workshop on Machine Learning, pp. 442-446. 1991.
Quinlan, J., and R. Cameron-Jones. "FOIL: A midterm report." In Machine Learning: ECML-93, pp. 1-20. Springer Berlin/Heidelberg, 1993.
Quinlan, J. Ross, and R. Mike Cameron-Jones. "Induction of logic programs: FOIL and related systems." New Generation Computing 13, no. 3-4 (1995): 287-312.
Quinlan, J. Ross. C4.5: Programs for Machine Learning. Elsevier, 2014.
What we talked about
• Risk management and risk rules
• Generating rules from machine learning models
• Incremental rule ranking
• Model ensembling
• Rule inclusion/exclusion criteria
• Why this matters to Remitly
Summary
Remitly’s Data Science team uses ML for a variety of purposes.
ML applications are core to our business – therefore our business must be core to our ML applications.
Machine learning at Remitly
www.remitly.com/careers
We’re hiring!
alex@remitly.com

Alex Korbonits, Data Scientist, Remitly, at MLconf Seattle 2017

11
Divide-and-conquer
• Decision trees are interpretable
• A rule is a root-to-leaf path.
• Like a FOIL rule, a decision tree rule is a conjunction.
• Use DFS to extract all rules from a decision tree
• Easy to evaluate together with FOIL rules
• Easily implementable in our risk rule management system
Decision Trees
12
SQL to the rescue
• We synthesize hand-crafted rule performance with SQL.
• For each transaction, we know whether a rule triggers or not.
• We can use this to synthesize new hand-crafted rules that aren't yet in production.
• We can derive precision/recall easily from this data.
• We can rank productionized rules alone to find rules we can immediately eliminate from production (i.e., remove redundancy).
• We can rank productionized rules alone to establish a baseline level of performance for risk rule management.
Synthesizing Production Rules
13
You are the weakest rule, goodbye!
• Today, there are hundreds of rules live in production.
• A single decision tree or FOIL model can represent thousands of rules.
• Can we find a strict subset of those rules that recalls the exact same amount of fraud?
• First we measure the performance of each rule individually on a test set.
• With each step, we take the (next) best rule and remove the fraud that it catches from our test set.
• We repeat this process until our rules no longer catch any uncaught fraud, whereupon the process terminates.
Incremental Rule Ranking
14
Will it blend?
• Ensembling rules gives us a lot of lift.
• We ensemble:
• Synthesized production rules
• FOIL rules
• Decision tree rules
• We rank a list of candidate rules from each model class.
• Our output is a classifier of ensembled rules.
• We're seeing an 8% jump in recall and a 1% increase in precision.
Model ensembling
15
To include or not to include, that is the question
• Risk rule optimization is a constrained optimization problem.
• Optimal rule sets must satisfy business constraints.
• We must balance catching fraud with insulting customers.
• Constraints can be nonlinear, e.g., with tradeoffs between precision and recall.
• With each ranking step, we evaluate the whole classifier.
• We include a rule when our classifier fits our criteria.
• We discard rules when our classifier violates our criteria.
Rule inclusion/exclusion criteria
16
It's a rule in a black-box!
• The most informative rule features are derived from black-box models.
• Rules (and lists of rules) with these features as conditions are a kind of model stacking.
• Risk rules are limited to conjunctions, but their inputs are unlimited.
• Add more black-box inputs to improve the rules learned.
• Better black-box inputs reduce the complexity of rules (i.e., they have fewer conditions).
Black box input features
17
How did we do this?
• Redshift
• Python
• S3
• EC2 p2.xlarge with deep learning AMI
• GPU instance gives us a ~17x boost in training/inference time compared to a laptop
• TensorFlow/Keras
• Scalding
Technologies used
18
Citing our sources
Bibliography
Fürnkranz, Johannes. "Separate-and-conquer rule learning." Artificial Intelligence Review 13, no. 1 (1999): 3-54.
Mooney, Raymond J., and Mary Elaine Califf. "Induction of first-order decision lists: Results on learning the past tense of English verbs." JAIR 3 (1995): 1-24.
Quinlan, J. Ross. "Induction of decision trees." Machine Learning 1, no. 1 (1986): 81-106.
Quinlan, J. Ross. "Learning logical definitions from relations." Machine Learning 5, no. 3 (1990): 239-266.
Quinlan, J. Ross. "Determinate literals in inductive logic programming." In Proceedings of the Eighth International Workshop on Machine Learning, pp. 442-446. 1991.
Quinlan, J. Ross, and R. Mike Cameron-Jones. "FOIL: A midterm report." In Machine Learning: ECML-93, pp. 1-20. Springer Berlin/Heidelberg, 1993.
Quinlan, J. Ross, and R. Mike Cameron-Jones. "Induction of logic programs: FOIL and related systems." New Generation Computing 13, no. 3-4 (1995): 287-312.
Quinlan, J. Ross. C4.5: Programs for Machine Learning. Elsevier, 2014.
19
What we talked about
• Risk management and risk rules
• Generating rules from machine learning models
• Incremental rule ranking
• Model ensembling
• Rule inclusion/exclusion criteria
• Why this matters to Remitly
Summary
20
Remitly's Data Science team uses ML for a variety of purposes.
ML applications are core to our business – therefore our business must be core to our ML applications.
Machine learning at Remitly

Editor's Notes

  1. Hi everyone. My name is Alex Korbonits, and I am a data scientist at Remitly. This talk is broadly about applying machine learning to legacy risk rule systems.
  2. Before we dive in, here's a little bit about Remitly and me. Remitly was founded in 2011 to forever change the way people send money to their loved ones. Worldwide, remittances represent over 660 billion dollars annually, roughly 4x the amount of foreign aid. We're the largest independent digital remittance company in the U.S. We're sending over 2 billion dollars annually and growing quickly. I'm Remitly's first data scientist, and our team is growing. Right now my principal focus is FRAUD CLASSIFICATION. Previously, I was a data scientist at a startup called Nuiku, focusing on NLP.
  3. First, a quick background on risk management systems and how risk rules are used in industry. Almost always, these rules are hand-crafted by domain experts. Why not generate rules from machine learning models? Once we've generated rules, we'll consider how to measure their effectiveness by ranking them, as single models in isolation or ensembled together with rules from other models. Importantly, we're able to evaluate rules we've generated from machine learning models together with existing hand-crafted risk rules. Industrial settings require thinking beyond status-quo model evaluation metrics: today we'll consider tying model and rule selection to business costs and impact. That makes sense, and dollars and cents. Internally, we've developed a tool that can do all of this end-to-end. It's being used by fraud domain experts to optimize our current risk rules in production.
  4. A Spectre is haunting risk management... the spectre of...
  5. COMMUNISM. Wait, hold on a second, that’s another talk…
  6. The spectre of… BIG DATA
  7. Typically, risk rules are handcrafted by domain experts. They're usually bucketed into different categories and their overall effect is orchestrated with different priorities and workflows. Risk rules come in many flavors. In the case of fraud, the majority of risk rules are targeted toward common MO's of fraudsters. We also use risk rules to comply with company policy and governmental policy. Not all risk rules have to do with fraud. Plenty are for KYC, or, know-your-customer purposes. Last, risk rule management systems are used to detect suspicious or illegal activity that isn't fraud, for example, for money laundering. Policy rules make sense to implement by hand because they are a direct reflection of those policies. However, when it comes to fraud rules or rules for suspicious activity, all too often rules are created in a reactionary manner to cover slightly generalized patterns of examples of known fraud that were previously undetected. The spectre of big data renders this process impractical, inefficient, and expensive. It’s imperative that we begin to scale out our production of new rules so that we can keep up with managing existing and new risks we face every day. We want to use machine learning to change risk rule management from a reactionary to a predictive practice. Last, we don't just want to manage our risk rules. We want to optimize them, with some constraints.
  8. Risk rules, how do they work? Typically, risk rules are managed in a GUI: dropdown menus, clicking boxes, etc. The complexity of a single rule is usually that of a logical conjunction: foo AND bar AND baz. We use customer and transaction inputs as features to our rules, based on policy and domain knowledge. They're what you'd expect: recency, frequency, and magnitude features are very common, as are count features. Our risk classifier as a whole is a logical disjunction of all of our rules: rule1 OR rule2 OR rule3. At its simplest, when a rule fires we queue a transaction for manual risk review. We can easily integrate rules generated from machine learning models into this system if they can be represented as conjunctions.
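The conjunction/disjunction structure described in this note is easy to sketch in code. The rule bodies and attribute names below are invented for illustration; only the shape (each rule a conjunction of conditions, the classifier a disjunction of rules) comes from the talk.

```python
# Illustrative sketch: a rule is a conjunction of conditions on a
# transaction's attributes; the overall classifier is a disjunction
# of rules. Attribute names and thresholds here are hypothetical.

def rule1(txn):
    # (x < 10) AND (y > 20) AND (z < 100)
    return txn["x"] < 10 and txn["y"] > 20 and txn["z"] < 100

def rule2(txn):
    # hypothetical hand-crafted rule: large amount on a brand-new account
    return txn["amount"] > 5000 and txn["account_age_days"] < 7

RULES = [rule1, rule2]

def needs_review(txn):
    # rule1 OR rule2 OR ...: queue for manual review if any rule fires
    return any(rule(txn) for rule in RULES)
```

Because every rule shares this shape, rules learned from data can be dropped into the same list alongside hand-crafted ones.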
  9. We need to train machine learning models on our dataset and extract rules from them. How do we do that? For our MVP, we start with two simple model classes: single decision trees and FOIL models. FOIL stands for First Order Inductive Learner. It learns rules differently than decision trees: separate-and-conquer vs. divide-and-conquer. For example, a transaction can only follow a single path through a decision tree. However, a single transaction can trigger multiple FOIL rules, even though during training a positive example is only picked up by one FOIL rule before it is discarded. Feature engineering is important with FOIL, where splits for continuous features need to be pre-specified. Rules extracted from decision trees are nice since the splits are learned during training.
  10. What is FOIL? Again, FOIL stands for First Order Inductive Learner. First, to prep our data for FOIL, we take sequences of input data and our label and map it to a feature space of booleans. For categorical or sparse features this is straightforward but for continuous features we have more flexibility. Binning is a pretty simple option. What's good is that there is always room to improve this feature engineering process. Deciding how to do this for continuous variables properly is extremely important, especially when certain continuous variables are skewed. A FOIL model learns Horn clause programs from examples. What's a Horn clause program? Effectively, a Horn clause program is a conjunction of boolean statements about your data which imply a particular class. There are two big ways that FOIL is different than a decision tree. One, they learn differently. FOIL models look for positive examples first, and learn a very precise boolean box around those examples. Then, those positive examples are removed from the training data. Subsequent rules are learned from the remaining training data. This process continues to produce highly targeted/precise rules for us. With decision trees, the gain of a given split is evaluated globally. Second, in a decision tree, nearby leaf nodes share a LOT of ancestors together. FOIL rules cover the space of examples differently. This is one of the reasons we chose to use FOIL, as its different hypotheses about our data act as a nice form of regularization. When training has completed, we map rules we’ve learned via FOIL into human-readable rules that we can implement in our risk rule management system.
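The separate-and-conquer loop this note describes can be sketched as a simplified propositional learner. This is not FOIL itself (real FOIL uses an information-gain criterion over first-order literals); it only mimics the control flow: greedily grow one precise conjunction over boolean features, remove the positives it covers, and repeat.

```python
# Simplified separate-and-conquer rule learner in the spirit of FOIL
# (illustrative sketch, not the FOIL algorithm). Examples are dicts of
# boolean features; a learned rule is a frozenset of features that must
# all be True.

def covers(rule, example):
    return all(example[f] for f in rule)

def learn_one_rule(positives, negatives, features):
    # Greedily add the literal that excludes the most negatives while
    # keeping as many positives as possible, until no negatives remain.
    rule = set()
    pos, neg = list(positives), list(negatives)
    while neg:
        candidates = [f for f in features if f not in rule]
        if not candidates:
            break  # no literal left to separate the remaining negatives
        best = max(
            candidates,
            key=lambda f: (sum(not e[f] for e in neg), sum(e[f] for e in pos)),
        )
        rule.add(best)
        pos = [e for e in pos if e[best]]
        neg = [e for e in neg if e[best]]
        if not pos:
            break  # rule became over-specific
    return frozenset(rule), pos

def separate_and_conquer(positives, negatives, features):
    rules, remaining = [], list(positives)
    while remaining:
        rule, covered = learn_one_rule(remaining, negatives, features)
        if not covered:
            break
        rules.append(rule)
        # "Separate": drop the positives this rule covers; "conquer" the rest.
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules
```

Each returned frozenset is one learned conjunction, which maps directly onto the human-readable rule format described above.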
  11. So that’s FOIL. We also chose to derive sets of rules from decision tree models since they’re also interpretable. Like a FOIL rule, a decision tree rule is simply a conjunction of conditions at each branch of the tree. A single condition is a feature, a threshold, and an inequality. Rule extraction is easy: just find all root-to-leaf paths via depth-first-search. We use a common framework to evaluate FOIL rules and decision tree rules together with hand-crafted rules that are already in production These rules are easy to implement in our risk rule management system.
  12. To evaluate rules learned from machine learning models together with hand-crafted rules, we need to synthesize our rules in the data. We don’t just want historical performance of our rules. We want to consider synthesizing performance of new handcrafted rules, too, before they’re in production. Here we write SQL to see, for every transaction, whether or not our production rules would have triggered. We can derive all of the same metrics with these rules as we can when we evaluate decision tree or foil rules. If we evaluate the performance of all of our hand-crafted rules, we can immediately see where we can eliminate some of them that aren’t a value-add. I.e. there are redundant rules that may be causing unnecessary manual reviews. We can also look at our hand-crafted rules alone to establish a baseline level of performance. We have a minimum bar that we can augment with rules learned from data. How do we do this evaluation? Next, we turn to a process I’m calling incremental rule ranking.
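The per-rule metrics described here come from a SQL query in the talk; the same derivation can be sketched in Python over synthesized trigger data. The row layout below is hypothetical: one entry per transaction with its fraud label and a per-rule trigger flag.

```python
# Sketch of deriving precision/recall per rule from synthesized trigger
# data (the talk does this in SQL over a warehouse table; this layout is
# an assumption for illustration).
# Each row: (transaction_id, is_fraud, {rule_name: triggered}).

def rule_metrics(rows, rule_name):
    tp = sum(1 for _, fraud, hits in rows if hits[rule_name] and fraud)
    fp = sum(1 for _, fraud, hits in rows if hits[rule_name] and not fraud)
    fn = sum(1 for _, fraud, hits in rows if not hits[rule_name] and fraud)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because the trigger flags can be synthesized for rules not yet in production, the same function evaluates candidate hand-crafted rules and learned rules on equal footing.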
  13. Now we have Incremental rule ranking This algorithm allows us to properly assess and compare rules, regardless of whether we manually create them or learn them from data. We do so by ranking rules *incrementally*. We rank incrementally according to the F-Beta-Score of a specific rule. We use an F-Beta-Score that directly ties the overall performance of our rules to our internal goals Ranking incrementally is a multi-step process that begins with measuring the performance of each rule individually on a test set. With each step, we get the (next) best rule and remove the fraud from our test set that our (next) best rule catches. We repeat this process until our rules no longer catch any uncaught fraud, whereupon the process terminates. This is a slight variant on the separate and conquer strategy that FOIL employs during model training. Here we’re not learning new rules, but we ARE finding a subset of rules that obtains the same recall as ALL of the rules combined. We’re getting a more precise overall classifier. We can do this for a single model class or source of rules. We can ensemble rules together from different sources or model classes. If we synthesize our production rules, we can measure their effectiveness as a baseline. Holding everything else constant, increasing *beta* will result in fewer rules and lower overall precision, whereas decreasing *beta* will result in more rules and higher overall precision. For a given beta and set of candidate rules, this algorithm does not increase the amount of fraud that the candidate rules catch. It gives us the most efficient subset of candidate rules that catch the maximum amount of fraud, drastically reducing the number of rules needed to do so and improving overall precision.
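The incremental ranking loop described in this note can be sketched directly: score every candidate rule by F-beta on the *remaining* fraud, take the best, remove the fraud it catches, and repeat until no candidate catches any uncaught fraud. This is a minimal reading of the algorithm as stated, not Remitly's internal implementation.

```python
# Sketch of incremental rule ranking (illustrative). Rules are predicates
# over transactions; beta ties the F-beta score to business priorities.

def fbeta(tp, fp, fn, beta):
    num = (1 + beta**2) * tp
    den = num + beta**2 * fn + fp
    return num / den if den else 0.0

def score(rule, frauds, legits, beta):
    tp = sum(1 for t in frauds if rule(t))
    fp = sum(1 for t in legits if rule(t))
    return fbeta(tp, fp, len(frauds) - tp, beta)

def rank_incrementally(rules, frauds, legits, beta=1.0):
    ranked, remaining = [], list(frauds)
    candidates = list(rules)
    while remaining and candidates:
        # Score each candidate against only the fraud still uncaught.
        best = max(candidates, key=lambda r: score(r, remaining, legits, beta))
        if not any(best(t) for t in remaining):
            break  # no candidate catches any uncaught fraud; terminate
        ranked.append(best)
        candidates.remove(best)
        remaining = [t for t in remaining if not best(t)]
    return ranked
```

As the note says, this mirrors FOIL's separate-and-conquer strategy: it does not learn new rules, but finds a smaller subset with the same total recall and better overall precision.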
  14. Ensembling rules gives us a lot of lift. We ensemble synthesized production rules, FOIL rules, and decision tree rules. We rank a list of candidate rules from each model class. Our output is a classifier of ensembled rules. We're seeing an 8% jump in recall and a 1% increase in precision.
  15. Risk rule optimization is a constrained optimization problem. We can't just maximize the overall precision/recall of our classifier; that won't do. Fraud is very expensive for us, so we want to catch as much of it as possible. However, we don't want to review every transaction. Our economic constraints weight the cost of false negatives so heavily that extremely high recall is required for us to keep the lights on. On the other hand, if we were to review every single transaction, we'd have great recall, but we'd insult customers, increase friction, and also have to take the time to do all of those reviews! So, we need to evaluate the classifier during the ranking process to ensure that we're not putting ourselves out of business. Said another way, we want to make sure our classifier is doing its job and represents a viable set of rules to put into production. When we are considering adding a rule to our classifier, we evaluate the classifier as a whole to make sure our constraints are satisfied. If so, we add the rule and continue. If not, we discard the rule and consider other rule candidates. The rule inclusion/exclusion process is directly tied to our ranking process. On the right we have a precision-recall plot with 3 curves. Each curve represents a source of rules that have been ranked. Green and blue represent rules sourced from single model classes. Red is from ensembling these rules together prior to ranking. Each point, going from left to right, represents the cumulative precision and recall of a classifier after N rules have been ranked. In this example, we represent our constraints by these black lines. We want a classifier that is in the upper-right-hand quadrant defined by these two constraints. The horizontal line is one we look at during each step of ranking: we don't want our classifier to be this imprecise. The vertical line is more of a goal than a constraint: we want our classifier to eventually get past this line. It means we're really kicking fraud's butt.
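The inclusion/exclusion check described in this note can be sketched as a whole-classifier evaluation against constraints. The threshold values below are entirely hypothetical; only the logic (evaluate the full classifier with the candidate added, include only if constraints hold) comes from the talk.

```python
# Sketch of constraint-aware rule inclusion (illustrative thresholds).
MIN_PRECISION = 0.05   # hypothetical hard floor: below this, too many reviews
TARGET_RECALL = 0.90   # hypothetical goal line on the precision-recall plot

def classifier_metrics(rules, frauds, legits):
    # The classifier is the disjunction of its rules.
    tp = sum(1 for t in frauds if any(r(t) for r in rules))
    fp = sum(1 for t in legits if any(r(t) for r in rules))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(frauds) if frauds else 0.0
    return precision, recall

def include_rule(current_rules, candidate, frauds, legits):
    # Evaluate the classifier as a whole with the candidate added;
    # include it only if the hard precision constraint still holds.
    precision, _ = classifier_metrics(current_rules + [candidate], frauds, legits)
    return precision >= MIN_PRECISION
```

A crude catch-everything rule would maximize recall but drag whole-classifier precision below the floor, so this check would exclude it even if its individual ranking score looked attractive.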
  16. The most informative rule features are derived from black-box models. Rules (and lists of rules) with these features as conditions are a kind of model stacking. Risk rules are limited to conjunctions, but their inputs are unlimited. Adding more black-box inputs improves the rules we learn, and better black-box inputs reduce the complexity of rules (i.e., they have fewer conditions). I'd like to add that risk rule optimization as a tool can be used for multiple things. On the one hand, if we don't engineer new features before learning new rules, we can immediately put them into production. On the flip side, we can engineer and test new features, and then see how useful they are to justify putting them into production.
  17. So how did we do this? We used a bunch of technologies for risk rule optimization. Redshift – we use Redshift extensively for our data warehousing and for synthesizing our hand-crafted production rules. Python – we used Python extensively for building our risk-rule-optimization machine learning pipeline. S3 – we used S3 a lot for pushing/pulling input data, outputs, and all sorts of stuff. EC2 – the deep learning AMI was great for speeding up the training and inference of models used for building black-box input features; a GPU instance gives us a ~17x boost in training/inference time compared to a laptop. I used r3's and r4's generally for the rest of the pipeline as well as for training and testing FOIL and decision tree models. TensorFlow/Keras – I used TensorFlow/Keras for building black-box models as inputs. Scalding – we used Scalding for ETL to turn raw sources of production data into data ready for use by our machine learning models and for synthesizing our production rule performance.
  18. And here is a bibliography that I certainly won’t begin to go through but which I’ll leave here for those who want to dive deeper.
  19. Hand-tailored risk management systems and risk rules are the status quo. Machine learning models help us learn new risk rules from data and improve upon rules we already have in production Since we’re learning rules from data it’s imperative that we carefully assess our rules one by one and as part of a whole classifier. We need to evaluate rules we’ve generated in concert with existing hand-crafted rules currently in production. Some of them are enormously valuable. We can’t simply cast them aside because they’re hand-crafted. Ranking rules helps us quantify their effectiveness and gauge how well our rules can generalize to unseen data. Tying in business constraints and objectives helps us choose what to implement and what to avoid – even if a rule recalls a lot of fraud for us, it could be extremely crude and subject an unreasonable and unnecessary number of customers to our risk investigation process. We are just getting started. We’re thinking of ways to incorporate black-box models further into risk rule optimization, both in terms of building smarter input features as well as for extracting rules. There are all sorts of other avenues. We can learn rules on top of rules to simulate rules with more complicated first-order logic for example.
  20. What does machine learning at Remitly look like? Fraud classification, risk rule optimization, anomaly detection, customer segmentation and customer lifetime value, and pricing optimization.
  21. We're hiring! Email me at alex@remitly.com. That’s all, folks! THANKS