Variables Creation using SAS on Longitudinal Data for Fraud Models
2. Summary
The objective of this presentation is to show how to use SAS to create
variables on longitudinal data for third-party transaction fraud modeling, and
to describe the types of variables used in fraud detection and prevention
models.
The fraud model development sample is usually structured as longitudinal
data, or panel data. The panel data structure allows us to examine and compare
responses over time. Panel data usually consist of a large number of short
series of time points (here, one series per account).
Generally speaking, a fraud model can use historical data, which describe
past tendencies, to capture abnormal behavior and thereby detect and prevent
fraud.
3. Data Structure on Historical Transactions
Obs Account_Number DateTime IP_Address Fraud_Status Transaction_Amount
1 20001 11JUL14:11:28:22 216.82.XXX.33 0 500
2 20001 15JUL14:09:15:02 216.82.XXX.33 0 800
3 20001 19JUL14:12:10:03 216.82.XXX.33 0 1000
4 20001 25JUL14:16:20:23 216.82.XXX.33 0 600
5 20001 12AUG14:08:07:05 257.72.XXX.45 1 5000
6 20001 15AUG14:16:12:28 216.82.XXX.33 0 700
7 20002 13JUL14:18:02:25 325.82.XXX.44 0 300
8 20002 15JUL14:20:16:01 325.82.XXX.44 0 200
9 20002 28JUL14:14:06:02 159.21.XXX.25 1 2000
10 20002 18AUG14:15:05:45 325.82.XXX.44 0 400
Table Name: Fraud
In this example, the purpose of the model is to predict future online credit card transaction fraud.
For simplicity, the online transaction panel data in the table above contain only 2 customers, with
account numbers 20001 and 20002. The transactions took place in July and August 2014. There are 10
transactions in total, of which only 2 are confirmed fraud: on 08/12/2014 for account 20001 and on
07/28/2014 for account 20002. The table also records the IP address and the amount of each transaction.
In practice, because fraud model development data contain every transaction, the development sample
is usually an extremely large dataset with many different kinds of variables.
To capture past behavior, we can create variables computed over a historical time window, for example the X days up to and including each transaction.
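To make the idea of a historical time window concrete, the following Python sketch (a hypothetical illustration, not part of the original SAS workflow; the helper names `parse_sas_datetime` and `trailing_window` are made up here) lists, for each transaction in the Fraud table, the transactions of the same account whose DateTime falls within the preceding X days, including the transaction itself. It assumes the DateTime strings have the `11JUL14:11:28:22` form shown in the table.

```python
from datetime import datetime, timedelta

# (Obs, Account_Number, DateTime) taken from the Fraud table above
ROWS = [
    (1, 20001, "11JUL14:11:28:22"),
    (2, 20001, "15JUL14:09:15:02"),
    (3, 20001, "19JUL14:12:10:03"),
    (4, 20001, "25JUL14:16:20:23"),
    (5, 20001, "12AUG14:08:07:05"),
    (6, 20001, "15AUG14:16:12:28"),
    (7, 20002, "13JUL14:18:02:25"),
    (8, 20002, "15JUL14:20:16:01"),
    (9, 20002, "28JUL14:14:06:02"),
    (10, 20002, "18AUG14:15:05:45"),
]


def parse_sas_datetime(text):
    """Parse a string like '11JUL14:11:28:22' (month matching is case-insensitive)."""
    return datetime.strptime(text, "%d%b%y:%H:%M:%S")


def trailing_window(rows, days):
    """Map each Obs to the Obs of the same account whose DateTime lies in
    [t - days, t], where t is that transaction's own DateTime."""
    parsed = [(obs, acct, parse_sas_datetime(ts)) for obs, acct, ts in rows]
    width = timedelta(days=days)
    return {
        obs_a: [obs_b for obs_b, acct_b, t_b in parsed
                if acct_b == acct_a and t_a - width <= t_b <= t_a]
        for obs_a, acct_a, t_a in parsed
    }


if __name__ == "__main__":
    for obs, window in trailing_window(ROWS, 5).items():
        print(obs, window)
```

With a 5-day window, Obs 6 picks up Obs 5 (about three days earlier), while Obs 4 sees only itself because Obs 3 is more than six days earlier. Every windowed variable below is some summary over such a list.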
4. Type of Variables
Based on the historical information provided by the online credit card transaction
panel data, we can create variables that capture fraud behavior patterns, for
example:
1. Whether the same customer used more than one IP address within X days.
2. Whether the IP address matches the latest IP address used by the same customer
within X days, where the latest transaction had a non-fraud status.
3. Whether the sum of the last transaction amounts from the same customer within
Y days is less than X dollars.
When I first tried to create this type of time-varying variable, I used complex
SAS macros. Later I found that a SAS PROC SQL self-join (joining the table to
itself on account number and a time window) solves most time-varying variable problems.
5. SAS PROC SQL code example to generate variables
1. The first variable: whether the same customer used more than one IP address
within X days.
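%macro rule_IP (out=, datetime=, var=);
/* out=      name of the output table                                  */
/* datetime= look-back window in seconds, given as a negative number   */
/* var=      name of the 0/1 flag variable to create                   */
proc sql;
create table &out as
select distinct a.*,
       count(distinct b.ip_address) as count,
       case
         when (calculated count = 1) then 0  /* one IP address in the window  */
         else 1                              /* more than one IP address      */
       end as &var
/* self-join: pair each transaction a with every transaction b of the  */
/* same account that falls in the window; a.datetime&datetime resolves */
/* to, for example, a.datetime-172800                                   */
from fraud as a, fraud as b
where a.account_number=b.account_number and
      b.datetime between a.datetime&datetime and a.datetime
group by a.obs
;
quit;
%mend;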
%rule_IP (out=test_1, datetime=-172800, var= rule_IP_1);/*172800 is the number of seconds for 2 days*/
%rule_IP (out=test_2, datetime=-432000, var= rule_IP_2);/*432000 is the number of seconds for 5 days*/
%rule_IP (out=test_3, datetime=-604800, var= rule_IP_3);/*604800 is the number of seconds for 7 days*/
This variable captures the customer's past behavior regarding the IP addresses used when making online purchases.
The three macro calls above create three dummy-coded (0/1) variables over different time windows (2, 5 and 7 days).
Any number of days can be used in this SAS PROC SQL code, and users may express the window in other ways rather
than in seconds as in this example.
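As a cross-check of the 0/1 values %rule_IP should produce on the 10-row sample, the same self-join can be run in another SQL engine. The following Python sketch is a hypothetical illustration, not part of the SAS workflow: it uses the standard-library sqlite3 module, stores DateTime as seconds since 01JAN1960 (the SAS datetime convention), renames the DateTime column to `dt`, and repeats the COUNT(DISTINCT ...) aggregate because CALCULATED is a SAS PROC SQL extension. The names `to_sas_seconds` and `rule_ip` are illustrative.

```python
import sqlite3
from datetime import datetime

# Fraud table: (obs, account_number, DateTime, ip_address, fraud_status, amount)
ROWS = [
    (1, 20001, "11JUL14:11:28:22", "216.82.XXX.33", 0, 500),
    (2, 20001, "15JUL14:09:15:02", "216.82.XXX.33", 0, 800),
    (3, 20001, "19JUL14:12:10:03", "216.82.XXX.33", 0, 1000),
    (4, 20001, "25JUL14:16:20:23", "216.82.XXX.33", 0, 600),
    (5, 20001, "12AUG14:08:07:05", "257.72.XXX.45", 1, 5000),
    (6, 20001, "15AUG14:16:12:28", "216.82.XXX.33", 0, 700),
    (7, 20002, "13JUL14:18:02:25", "325.82.XXX.44", 0, 300),
    (8, 20002, "15JUL14:20:16:01", "325.82.XXX.44", 0, 200),
    (9, 20002, "28JUL14:14:06:02", "159.21.XXX.25", 1, 2000),
    (10, 20002, "18AUG14:15:05:45", "325.82.XXX.44", 0, 400),
]

SAS_EPOCH = datetime(1960, 1, 1)


def to_sas_seconds(text):
    """Convert '11JUL14:11:28:22' to seconds since 01JAN1960:00:00:00."""
    parsed = datetime.strptime(text, "%d%b%y:%H:%M:%S")
    return int((parsed - SAS_EPOCH).total_seconds())


def rule_ip(days):
    """Return {obs: flag}; flag is 1 when the account used more than one
    IP address in the window [t - days, t], mirroring %rule_IP."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fraud (obs INTEGER, account_number INTEGER, "
                "dt INTEGER, ip_address TEXT, fraud_status INTEGER, amount REAL)")
    con.executemany("INSERT INTO fraud VALUES (?, ?, ?, ?, ?, ?)",
                    [(o, acct, to_sas_seconds(ts), ip, fs, amt)
                     for o, acct, ts, ip, fs, amt in ROWS])
    query = """
        SELECT a.obs,
               CASE WHEN COUNT(DISTINCT b.ip_address) = 1 THEN 0 ELSE 1 END
        FROM fraud AS a, fraud AS b
        WHERE a.account_number = b.account_number
          AND b.dt BETWEEN a.dt - ? AND a.dt
        GROUP BY a.obs
        ORDER BY a.obs
    """
    # days * 86400 gives the same seconds as the macro calls (e.g. 2 -> 172800)
    flags = dict(con.execute(query, (days * 86400,)).fetchall())
    con.close()
    return flags
```

On this sample, the 2-day window flags nothing, while the 5-day and 7-day windows flag only Obs 6, whose window also contains the fraudulent Obs 5 from IP 257.72.XXX.45.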
6. SAS PROC SQL code example to generate the variable
2. The second variable: whether the IP address matches the latest IP address used by
the same customer within X days, where the latest transaction had a non-fraud status.
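/* previous transaction's fraud status within each account (FRAUD must be sorted by account_number datetime) */
data fraud; set fraud; by account_number datetime; Lag1_Fraud_Status=lag(Fraud_Status); if first.account_number then Lag1_Fraud_Status=.; run;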
%macro rule_good_IP (out=, datetime=, var=);
proc sql ;
create table &out as
select distinct a.*,
       /* in-window transactions whose previous transaction was fraud */
       sum(case when b.Lag1_Fraud_Status=1 then 1 else 0 end) as lag_fraud,
       /* transactions in the window, including a itself */
       count(b.obs) as count_obs,
       /* distinct IP addresses in the window */
       count(distinct b.ip_address) as count_ip ,
       case when
         (calculated count_obs gt 1 and calculated lag_fraud >=1 ) or
         (calculated count_obs gt 1 and calculated count_ip >1 )
       then 1 else 0 end as &var
from fraud as a, fraud as b
where a.account_number=b.account_number and
      b.datetime between a.datetime&datetime and a.datetime
group by a.obs
;
quit;
%mend;
%rule_good_IP(out=test_1, datetime=-172800, var=rule_good_ip_1);/*172800 is the number of seconds for 2 days*/
%rule_good_IP(out=test_2, datetime=-432000, var=rule_good_ip_2);/*432000 is the number of seconds for 5 days*/
%rule_good_IP(out=test_3, datetime=-604800, var=rule_good_ip_3);/*604800 is the number of seconds for 7 days*/
This variable captures the customer's past behavior regarding the IP addresses used when purchasing online,
including whether the IP address may belong to a fraudster. Any number of days can be used in this SAS PROC SQL code.
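The sketch below reproduces the LAG step and the %rule_good_IP conditions in plain Python so that the expected flags on the sample can be checked by hand. It is a hypothetical illustration: the function names `add_lag1_fraud_status` and `rule_good_ip` are made up, and `None` stands in for the SAS missing value produced for the first transaction of each account.

```python
from datetime import datetime, timedelta

# (Obs, Account_Number, DateTime, IP_Address, Fraud_Status) from the Fraud table
ROWS = [
    (1, 20001, "11JUL14:11:28:22", "216.82.XXX.33", 0),
    (2, 20001, "15JUL14:09:15:02", "216.82.XXX.33", 0),
    (3, 20001, "19JUL14:12:10:03", "216.82.XXX.33", 0),
    (4, 20001, "25JUL14:16:20:23", "216.82.XXX.33", 0),
    (5, 20001, "12AUG14:08:07:05", "257.72.XXX.45", 1),
    (6, 20001, "15AUG14:16:12:28", "216.82.XXX.33", 0),
    (7, 20002, "13JUL14:18:02:25", "325.82.XXX.44", 0),
    (8, 20002, "15JUL14:20:16:01", "325.82.XXX.44", 0),
    (9, 20002, "28JUL14:14:06:02", "159.21.XXX.25", 1),
    (10, 20002, "18AUG14:15:05:45", "325.82.XXX.44", 0),
]


def parse_sas_datetime(text):
    return datetime.strptime(text, "%d%b%y:%H:%M:%S")


def add_lag1_fraud_status(rows):
    """Mimic the DATA step: attach the previous transaction's Fraud_Status
    within each account (None for the first transaction of an account)."""
    records, prev_acct, prev_status = [], None, None
    ordered = sorted(rows, key=lambda r: (r[1], parse_sas_datetime(r[2])))
    for obs, acct, ts, ip, status in ordered:
        records.append({"obs": obs, "acct": acct, "t": parse_sas_datetime(ts),
                        "ip": ip, "lag1": prev_status if acct == prev_acct else None})
        prev_acct, prev_status = acct, status
    return records


def rule_good_ip(rows, days):
    """Return {obs: flag} using the same conditions as %rule_good_IP."""
    records = add_lag1_fraud_status(rows)
    width = timedelta(days=days)
    flags = {}
    for a in records:
        window = [b for b in records
                  if b["acct"] == a["acct"] and a["t"] - width <= b["t"] <= a["t"]]
        lag_fraud = sum(1 for b in window if b["lag1"] == 1)  # previous txn was fraud
        count_obs = len(window)                                # txns in the window
        count_ip = len({b["ip"] for b in window})              # distinct IP addresses
        flags[a["obs"]] = int(count_obs > 1 and (lag_fraud >= 1 or count_ip > 1))
    return flags
```

On the sample, only Obs 6 is flagged for the 5-day and 7-day windows: its window contains Obs 5 from a different IP address, and its own previous transaction (Obs 5) was fraud. Obs 10 also has a fraudulent previous transaction, but no other transaction falls in its window, so the count_obs condition keeps its flag at 0.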
7. SAS PROC SQL code example to generate the variable
3. The third variable: whether the sum of the last transaction amounts from the same
customer within Y days falls within a given dollar range (here, above 0 and at most X dollars).
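/* previous transaction amount within each account (FRAUD must be sorted by account_number datetime) */
data fraud; set fraud; by account_number datetime; lag_amt=lag(Transaction_Amount); if first.account_number then lag_amt=.; run;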
%macro rule_transaction (out=, datetime=, var=, amt=, amt1=);
/* amt= exclusive lower bound, amt1= inclusive upper bound, in dollars */
proc sql ;
create table &out as
select distinct a.*,
       count(b.obs) as count_obs,          /* transactions in the window  */
       sum(b.lag_amt) as sum_lag_amount,   /* SUM ignores missing lag_amt */
       case when
         calculated sum_lag_amount > &amt and
         calculated sum_lag_amount <= &amt1 and
         calculated count_obs gt 1
       then 1 else 0 end as &var
from fraud as a, fraud as b
where a.account_number=b.account_number and
      b.datetime between a.datetime&datetime and a.datetime
group by a.obs
;
quit;
%mend;
%rule_transaction(out=test_1, datetime=-432000, var=rule_19_11, amt=0, amt1=2000); /*within 5 days, sum of last transaction amounts from
the same customer is above 0 and at most 2000 dollars*/
%rule_transaction(out=test_2, datetime=-864000, var=rule_19_12, amt=0, amt1=2000); /*within 10 days, sum of last transaction amounts from
the same customer is above 0 and at most 2000 dollars*/
%rule_transaction(out=test_3, datetime=-1728000, var=rule_19_13,amt=0, amt1=2000); /*within 20 days, sum of last transaction amounts from
the same customer is above 0 and at most 2000 dollars*/
This variable captures the customer's past spending behavior when purchasing online. Any number of days and any
transaction dollar amount can be used in this SAS PROC SQL code.
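To see which sample transactions %rule_transaction should flag for the 5-, 10- and 20-day windows, here is a plain-Python sketch of the same logic. It is a hypothetical illustration (the names `add_lag_amount` and `rule_transaction` are made up); `None` stands in for the SAS missing value, and, like SAS SUM, the total skips missing values and stays missing when every value is missing. Because lag_amt is summed over the window, the total covers the transaction before each in-window transaction, so it can include one amount from just outside the window: for example, Obs 4's 10-day total of 1800 includes Obs 2's 800.

```python
from datetime import datetime, timedelta

# (Obs, Account_Number, DateTime, Transaction_Amount) from the Fraud table
ROWS = [
    (1, 20001, "11JUL14:11:28:22", 500),
    (2, 20001, "15JUL14:09:15:02", 800),
    (3, 20001, "19JUL14:12:10:03", 1000),
    (4, 20001, "25JUL14:16:20:23", 600),
    (5, 20001, "12AUG14:08:07:05", 5000),
    (6, 20001, "15AUG14:16:12:28", 700),
    (7, 20002, "13JUL14:18:02:25", 300),
    (8, 20002, "15JUL14:20:16:01", 200),
    (9, 20002, "28JUL14:14:06:02", 2000),
    (10, 20002, "18AUG14:15:05:45", 400),
]


def parse_sas_datetime(text):
    return datetime.strptime(text, "%d%b%y:%H:%M:%S")


def add_lag_amount(rows):
    """Mimic the DATA step: previous Transaction_Amount within each account."""
    records, prev_acct, prev_amt = [], None, None
    ordered = sorted(rows, key=lambda r: (r[1], parse_sas_datetime(r[2])))
    for obs, acct, ts, amount in ordered:
        records.append({"obs": obs, "acct": acct, "t": parse_sas_datetime(ts),
                        "lag_amt": prev_amt if acct == prev_acct else None})
        prev_acct, prev_amt = acct, amount
    return records


def rule_transaction(rows, days, amt=0, amt1=2000):
    """Return {obs: flag} using the same conditions as %rule_transaction."""
    records = add_lag_amount(rows)
    width = timedelta(days=days)
    flags = {}
    for a in records:
        window = [b for b in records
                  if b["acct"] == a["acct"] and a["t"] - width <= b["t"] <= a["t"]]
        known = [b["lag_amt"] for b in window if b["lag_amt"] is not None]
        total = sum(known) if known else None  # all missing -> missing
        flags[a["obs"]] = int(total is not None and amt < total <= amt1
                              and len(window) > 1)
    return flags


if __name__ == "__main__":
    for days in (5, 10, 20):
        print(days, sorted(o for o, f in rule_transaction(ROWS, days).items() if f))
```

On the sample, the flagged observations are 2, 3 and 8 for 5 days; 2, 3, 4 and 8 for 10 days; and 2, 3, 5, 8 and 9 for 20 days (Obs 4 drops out at 20 days because its total reaches 2300).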
8. Conclusion
By adopting a similar PROC SQL method for creating time-varying variables on
longitudinal data, analysts can create multi-dimensional time-varying variables
that use historical data to capture abnormal behavior and thereby detect and
prevent fraud.
For any comments or questions, please contact Kaitlyn Hu.
Kaitlyn.S.Hu@gmail.com