
Fraud detection by Denisa Banulescu-Radu


Denisa Banulescu-Radu presentation at the joint Bucharest - Paris WiMLDS meetup, 13 April 2021

  1. Banulescu-Radu (LEO) WiMLDS 13/04/2021 1 / 39
  2. Data Science for Financial Fraud Detection. Denisa BANULESCU-RADU, University of Orléans, LEO. WiMLDS, 13th of April 2021
  3. Background
      • Since 2015: Associate Professor – University of Orléans, LEO
      • 2016: Young Researcher Award in Economics – Autorité des Marchés Financiers
      • 2015: Thesis Prize – Fondation Banque de France
      • 2014-2015: Max Weber Postdoctoral Fellow – European University Institute
      • 2011-2014: PhD in Economics – Maastricht University and University of Orléans. Dissertation title: "Four essays in financial econometrics"
  4. Main research interests
  5. Outline
      1 Econometrics vs Machine Learning
      2 General aspects of fraud
      3 Main challenges and solutions
      4 Case studies
        4.1 Case 1: Insurance fraud detection
        4.2 Case 2: Social fraud detection
      5 Conclusion
  6. Econometrics vs Machine Learning
  7. Econometrics vs Machine Learning
  8. Econometrics vs Machine Learning
  9. Econometrics vs Machine Learning | “there are a number of areas where there would be opportunities for fruitful collaboration between econometrics and machine learning” Hal Varian (2014), Professor of Economics (University of Michigan) and Chief Economist (Google)
  10. General aspects of fraud
  11. General aspects of fraud | Fraud detection: why is it important?
  12. General aspects of fraud | Definition of fraud
      Definition (Baesens et al., 2015): Fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving, and often carefully organized crime which appears in many types of forms.
  13. General aspects of fraud | Typologies of fraud
  14. Main challenges and solutions
  15. Main challenges and solutions | Main CHALLENGES and solutions
  16. Main challenges and solutions | Main CHALLENGES and solutions
  17. Main challenges and solutions | Main CHALLENGES and solutions
  18. Main challenges and solutions | Main CHALLENGES and solutions
  19. Main challenges and solutions | Main challenges and SOLUTIONS | 1. Main tools used to fight fraud
  20. Main challenges and solutions | Main challenges and SOLUTIONS | 2. Deal with imbalanced datasets
  21. Main challenges and solutions | Main challenges and SOLUTIONS | 2. Deal with imbalanced datasets
  22. Main challenges and solutions | Main challenges and SOLUTIONS
  23. Main challenges and solutions | Main challenges and SOLUTIONS | 3. Evaluation of fraud detection models
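
To illustrate why the choice of evaluation metric matters when fraud is rare, here is a minimal sketch in plain Python that computes accuracy, the F-measure, the Brier score and the log-loss. The labels, the predicted probabilities, the 0.5 threshold and every function name are hypothetical and are not taken from the talk.

```python
import math

def accuracy(y_true, y_pred):
    """Share of observations whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positive: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def brier_score(y_true, proba):
    """Mean squared gap between predicted probability and observed outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, proba)) / len(y_true)

def log_loss(y_true, proba, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, proba):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical toy sample: 2 fraudulent claims out of 100.
y_true = [1, 1] + [0] * 98

# A naive rule that flags nothing looks excellent on accuracy alone.
always_legit = [0] * 100
acc_naive = accuracy(y_true, always_legit)
f1_naive = f1_score(y_true, always_legit)

# Hypothetical model scores: one fraud case is caught, the other is missed.
proba = [0.9, 0.2] + [0.05] * 98
y_pred = [1 if p >= 0.5 else 0 for p in proba]
f1_model = f1_score(y_true, y_pred)
brier = brier_score(y_true, proba)
ll = log_loss(y_true, proba)

print(acc_naive, f1_naive, f1_model, brier, ll)
```

In this toy sample the naive rule is almost always right yet never finds a fraudulent claim, which is why metrics focused on the rare class or on the quality of the predicted probabilities are usually preferred over accuracy.
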
  24. Main challenges and solutions | Main challenges and SOLUTIONS | 4. Improving the interpretability of fraud detection models
      “if the users do not trust a model or a prediction, they will not use it” (Ribeiro et al., 2016)
      • LIME method (Ribeiro et al., 2016)
      • SHAP (SHapley Additive exPlanations) values (Lundberg and Lee, 2017)
      But to what extent do we need fraud detection models to be interpretable?
  25. Case studies
  26. Case studies | Case 1: Insurance fraud detection
  27. Case studies | Case 1: Insurance fraud detection | General framework
      • Fraudulent claims represented 10% of the total number of claims in 2019 (Insurance Europe)
      • Negative record for France: €2.5 billion in 2014, of which only €219 million was recovered (ALFA)
  28. Case studies | Case 1: Insurance fraud detection | Methodology
      Data
      • 45,954 house claims for the period 2013 to 2017
      • French insurance company
      • 0.76% of claims are fraudulent
      Technical tools
      • Logistic LASSO (Cox, 1958; Tibshirani, 1996)
      • Random forest (Breiman, 2001)
      • Extreme Gradient Boosting, or XGBoost (Chen and Guestrin, 2016)
      Resampling techniques to deal with imbalanced data
      • Random oversampling
      • Synthetic Minority Oversampling TEchnique, or SMOTE (Chawla et al., 2002)
      • ADAptive SYNthetic sampling, or ADASYN (He et al., 2008)
      Performance metrics
      • AUC-ROC, AUC-PR, Brier score, log-loss, F-measure
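
The simplest of the resampling techniques listed above, random oversampling, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data set (1,000 claims, 8 of them fraudulent), the function name `random_oversample` and the choice to balance the classes exactly are all hypothetical and do not reproduce the study's actual pipeline.

```python
import random

def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority:
        raise ValueError("no minority observations to resample")
    # Number of extra minority rows needed to match the majority class.
    n_extra = max(len(majority) - len(minority), 0)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    # Keep every original row, append the duplicates, then shuffle the order.
    idx = list(range(len(y))) + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]

# Hypothetical toy data: 1,000 claims with one feature each, 8 fraudulent.
X = [[float(i)] for i in range(1000)]
y = [1] * 8 + [0] * 992

X_bal, y_bal = random_oversample(X, y)
print(len(y_bal), sum(y_bal))
```

Resampling of this kind would normally be applied to the training split only, so that the evaluation data keep the original class ratio. SMOTE and ADASYN differ in that they create synthetic minority points by interpolating between neighbours instead of duplicating existing rows.
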
  29. Case studies | Case 1: Insurance fraud detection | Methodology
  30. Case studies | Case 1: Insurance fraud detection
      • Interpretation of results: SHAP value method (global/individual level)
      Figure 1: Fraudulent case
      Figure 2: Non-fraudulent case
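
To give a feel for what an individual-level SHAP explanation contains, the sketch below computes exact Shapley values for a tiny, made-up scoring function by averaging marginal contributions over all feature orderings. It is a simplification under stated assumptions: the function `toy_score`, its three features and the all-zero baseline are hypothetical, and absent features are simply replaced by baseline values. The SHAP library relies on faster, model-specific approximations that are not reproduced here.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        z = list(baseline)        # start with every feature at its baseline value
        previous = predict(z)
        for j in order:
            z[j] = x[j]           # switch feature j to its observed value
            current = predict(z)
            phi[j] += current - previous  # marginal contribution of feature j
            previous = current
    return [total / len(orders) for total in phi]

def toy_score(z):
    """Hypothetical fraud score with an interaction between two features."""
    claim_amount, late_report, prior_claims = z
    return 0.1 * claim_amount + 0.3 * late_report + 0.2 * claim_amount * prior_claims

x = [2.0, 1.0, 3.0]          # hypothetical claim to explain
baseline = [0.0, 0.0, 0.0]   # hypothetical reference claim

phi = shapley_values(toy_score, x, baseline)
print(phi, sum(phi), toy_score(x) - toy_score(baseline))
```

The contributions add up to the gap between the score of the explained claim and the score of the baseline, which is the additivity property that lets a fraudulent and a non-fraudulent case be compared feature by feature.
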
  31. Case studies | Case 2: Social fraud detection
  32. Case studies | Case 2: Social fraud detection | General framework
      • Controlling the risks of social and fiscal fraud and combating illegal work are also important problems for social justice and economic efficiency
      • French mutual organization
        • collects data systematically from its beneficiaries
        • organizes regular controls on a subsample of its taxpayers
        • manages a fraud detection system to identify those who do not pay their contributions
  33. Case studies | Case 2: Social fraud detection | General framework
      Objective: Estimate the tax shortfall.
      Definition: The tax shortfall is the potential sum of the tax adjustments that could have been imposed on companies that committed fraud or made erroneous social declarations, had they actually been audited, when in reality they were not.
  34. Case studies | Case 2: Social fraud detection | Remarks
      • the two decisions are neither sequential nor conditional
      • the decisions are linked
  35. Case studies | Case 2: Social fraud detection
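  36. Case studies | Case 2: Social fraud detection | Methodology: Estimation by Maximum Likelihood
      Control decision:
      \[ C_i = \begin{cases} 1 & \text{if } C_i^* = X_{c,i}\beta_c + \varepsilon_{c,i} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{1} \]
      Fraud decision:
      \[ \tilde{D}_i = \begin{cases} 1 & \text{if } D_i^* = X_{d,i}\beta_d + \varepsilon_{d,i} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{2} \]
      Potential tax shortfall:
      \[ M_i^* = \begin{cases} X_{m,i}\beta_m + \varepsilon_{m,i} & \text{if } \tilde{D}_i = 1 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{3} \]
      Joint distribution of the error terms:
      \[ \begin{pmatrix} \varepsilon_{c,i} \\ \varepsilon_{d,i} \\ \varepsilon_{m,i} \end{pmatrix} \sim \mathcal{N}(0, \Sigma) \quad \text{with } \Sigma = DRD \tag{4} \]
      \[ D = \begin{pmatrix} \sigma_c & 0 & 0 \\ 0 & \sigma_d & 0 \\ 0 & 0 & \sigma_m \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & \rho_{cd} & \rho_{cm} \\ \rho_{cd} & 1 & \rho_{dm} \\ \rho_{cm} & \rho_{dm} & 1 \end{pmatrix} \tag{5} \]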
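
The structure of equations (1) to (5) can be made concrete with a small simulation. The sketch below builds the covariance matrix as DRD, draws correlated errors through a hand-written Cholesky factor, and applies the three decision rules. It is not the maximum likelihood estimator from the talk: all parameter values, the linear indices `xb_c`, `xb_d`, `xb_m` and the function names are hypothetical, and the toy values set the standard deviations of the two binary equations to 1, a common normalization for binary-outcome equations.

```python
import math
import random

def covariance_from_drd(sigmas, rho_cd, rho_cm, rho_dm):
    """Return Sigma = D R D, with D = diag(sigmas) and R the correlation matrix."""
    R = [[1.0, rho_cd, rho_cm],
         [rho_cd, 1.0, rho_dm],
         [rho_cm, rho_dm, 1.0]]
    # D is diagonal, so (D R D)[i][j] = sigma_i * R[i][j] * sigma_j.
    return [[sigmas[i] * R[i][j] * sigmas[j] for j in range(3)] for i in range(3)]

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return L

def simulate_unit(xb_c, xb_d, xb_m, L, rng):
    """Draw (C_i, D_i, M_i*) for one unit, following equations (1) to (3)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    # Correlated errors (eps_c, eps_d, eps_m) = L z, so their covariance is Sigma.
    eps = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
    control = 1 if xb_c + eps[0] > 0 else 0
    fraud = 1 if xb_d + eps[1] > 0 else 0
    shortfall = xb_m + eps[2] if fraud == 1 else 0.0
    return control, fraud, shortfall

# Hypothetical parameter values, chosen only for illustration.
sigmas = (1.0, 1.0, 2.0)
Sigma = covariance_from_drd(sigmas, rho_cd=0.3, rho_cm=0.2, rho_dm=0.5)
L = cholesky(Sigma)

rng = random.Random(42)
draws = [simulate_unit(-0.5, -1.0, 3.0, L, rng) for _ in range(1000)]
n_controlled = sum(c for c, _, _ in draws)
n_fraud = sum(d for _, d, _ in draws)
print(n_controlled, n_fraud)
```

Because the control and fraud errors are correlated, the audited units in such a simulation are not a random sample of all units, which is the reason the equations are specified jointly instead of being estimated one at a time.
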
  37. Conclusion
  38. Conclusion | Thank you for your attention!
  39. Conclusion
