22. Application: Australian Taxation Office (1)
Australian Taxation Office: Case Study
22,000 staff across Australia
Revenue Collection and Refund Management
Compliance and Risk Modelling
12M individuals: $450B income, $100B tax
2M companies...: $1,800B income, $40B tax
Switched to R for data analysis in 2005
23. Application: Australian Taxation Office (2)
Main tasks:
High Risk Refunds
Required to Lodge ($110M)
Assessing Levels of Debt
Propensity to Pay
Capacity to Pay
Determining Optimal Treatment Strategies
Identity Theft — eTax and International
Project Wickenby Text Mining
24. Applications of R: Australian Taxation Office (3)
The major task is all about the data:
data understanding/preparation, feature generation/selection
100,000 cases by 1,000 variables
Stock-in-trade:
glm, rpart, ada, randomForest, kernlab
Simple binary classification and regression on $ amounts
Identify new characteristics to target high-risk cases (5%);
Focus resources on productive cases: $ and taxpayer benefit;
Decision trees and ensembles (random forests) are often effective
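The targeting step above can be illustrated with a minimal Python sketch (the slides use R; this is only an analogue). It assumes risk scores have already been produced by some fitted classifier, such as one of the models listed above, and simply flags the top 5% of cases for review. `flag_high_risk` is a hypothetical helper, not part of any ATO system.

```python
import math


def flag_high_risk(scores, fraction=0.05):
    """Return the (sorted) indices of the highest-scoring cases.

    `scores` are assumed to be risk scores from some fitted classifier
    (hypothetical here); the top `fraction` of cases are flagged.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round before ceil so float noise (e.g. 5.000000000001) does not add a case.
    n_flag = max(1, math.ceil(round(fraction * len(scores), 9)))
    # Rank case indices by score, highest first, and keep the top n_flag.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])
```

With 100,000 cases and the default fraction, roughly 5,000 cases would be passed on for review; the ranking step is independent of which classifier produced the scores.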
26. Germany: Fraunhofer Financial Consulting (1)
60 branches, 80 research units
18,000 employees, annual budget of €165 million
http://www.fraunhofer.org
A case study on using generalized additive models to fit credit rating scores (customer credit scorecard system)
by Marlene Müller,
marlene.mueller@itwm.fraunhofer.de
28. Germany: Fraunhofer Consulting (3)
R packages used:
Two main approaches for GAM in R:
- gam::gam: backfitting with local scoring (Hastie and Tibshirani, 1990)
- mgcv::gam: penalized regression splines (Wood, 2006)
These procedures are compared under the default settings of gam::gam and mgcv::gam.
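To make "backfitting" concrete, here is a minimal Python sketch of the backfitting loop for a Gaussian additive model y = α + Σ f_j(x_j) + ε. It is a simplification, not gam::gam's implementation: it uses a crude running-mean smoother (an assumption chosen for brevity) where gam::gam uses smoothing splines or loess, and it omits the local-scoring (IRLS) outer loop that gam::gam wraps around backfitting for a binary response such as credit default. The function names are hypothetical.

```python
def running_mean(x, r, window=5):
    """Smooth r against x: average r over the `window` nearest ranks in x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # case indices sorted by x
    half = window // 2
    fitted = [0.0] * n
    for rank, i in enumerate(order):
        # Neighbourhood shrinks near the ends of the x range.
        nbrs = order[max(0, rank - half):min(n, rank + half + 1)]
        fitted[i] = sum(r[k] for k in nbrs) / len(nbrs)
    return fitted


def backfit(X, y, n_iter=20, window=5):
    """Fit y ~ alpha + sum_j f_j(X[j]) by backfitting (Gaussian case only).

    X is a list of p columns, each a list of n floats. Returns (alpha, f)
    where f[j][i] is the fitted value of component j at case i.
    """
    n, p = len(y), len(X)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residuals: remove the intercept and every other component.
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            fj = running_mean(X[j], partial, window)
            centre = sum(fj) / n
            f[j] = [v - centre for v in fj]  # centre each f_j for identifiability
    return alpha, f
```

Each pass re-smooths one component against the residuals left by all the others, cycling until the components stabilise; local scoring would repeat this on a reweighted working response at every IRLS step.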
Competing estimators:
- logit: binary GLM with G(u) = 1/{1 + exp(−u)} (logistic CDF as link)
- logit2, logit3: binary GLM with 2nd/3rd-order polynomial terms for the continuous regressors
- logitc: binary GLM with the continuous regressors categorized (4–5 levels)
- gam: binary GAM using gam::gam with s() terms for the continuous regressors
- mgcv: binary GAM using mgcv::gam
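The regressor transformations behind the competing GLMs can be sketched in Python (the study itself used R). `logistic_cdf` is G(u) above; `poly_terms` builds the 2nd/3rd-order terms used by logit2/logit3; `categorize` bins a continuous regressor into levels as in logitc. The slides do not say how the 4–5 categories were formed, so quantile-based cut points are an assumption here, and all function names are hypothetical.

```python
import math


def logistic_cdf(u):
    """G(u) = 1 / (1 + exp(-u)), written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)  # small for very negative u, so no overflow
    return e / (1.0 + e)


def poly_terms(x, order):
    """Expand each value v into [v, v**2, ..., v**order] (logit2: 2, logit3: 3)."""
    return [[v ** k for k in range(1, order + 1)] for v in x]


def categorize(x, n_levels=5):
    """Map values to levels 0..n_levels-1 using empirical-quantile cut points.

    Quantile binning is an assumed rule, not taken from the case study.
    """
    ranked = sorted(x)
    n = len(x)
    cuts = [ranked[(k * n) // n_levels] for k in range(1, n_levels)]
    # A value's level is the number of cut points it meets or exceeds.
    return [sum(v >= c for c in cuts) for v in x]
```

In a fitted logit model the default probability would then be G(β₀ + β'x), with x built from the raw, polynomial, or categorized regressors depending on the estimator.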