Summary
The objective of this presentation is to show how to use SAS to create
variables on longitudinal data for third-party transaction fraud modeling,
and to describe the types of variables used in fraud detection and prevention
models.
The fraud model development sample is usually structured as longitudinal
data, or panel data. The panel data structure allows us to examine and compare
responses over time. Panel data usually consists of a large number of short
series of time points.
Generally speaking, a fraud model can use historical data, which provides
information on past tendencies, to capture abnormal behavior and so detect
and prevent fraud.
Data Structure on Historical Transactions
Obs Account_Number DateTime IP_Address Fraud_Status Transaction_Amount
1 20001 11JUL14:11:28:22 216.82.XXX.33 0 500
2 20001 15JUL14:09:15:02 216.82.XXX.33 0 800
3 20001 19JUL14:12:10:03 216.82.XXX.33 0 1000
4 20001 25JUL14:16:20:23 216.82.XXX.33 0 600
5 20001 12AUG14:08:07:05 257.72.XXX.45 1 5000
6 20001 15AUG14:16:12:28 216.82.XXX.33 0 700
7 20002 13JUL14:18:02:25 325.82.XXX.44 0 300
8 20002 15JUL14:20:16:01 325.82.XXX.44 0 200
9 20002 28JUL14:14:06:02 159.21.XXX.25 1 2000
10 20002 18AUG14:15:05:45 325.82.XXX.44 0 400
Table Name: Fraud
In this example, the purpose of the model is to predict future online credit card transaction fraud.
For the sake of simplicity, the online transaction panel data shown in the table above contains only 2 customers with
2 distinct account numbers, 20001 and 20002. The online transactions happened during July and August 2014. There
are 10 transactions in total, and only 2 confirmed fraud transactions, which happened on 08/12/2014 for account number 20001
and on 07/28/2014 for account number 20002. The table also contains the IP address and transaction
amount for each transaction. In reality, since fraud model development data contains every transaction,
the development sample would most likely be an extremely large dataset covering many different kinds of
variables.
In order to capture past behavior, we can create variables that are defined over a historical time period.
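To try the code on the following slides, the sample table can be rebuilt with a DATA step. This is a minimal sketch, not the author's original load step: it assumes the masked IP addresses are stored as character values exactly as printed, and that DateTime is read with the DATETIME16. informat so that it is stored as a number of seconds.

```sas
/* Sketch: rebuild the sample Fraud table shown on the slide. */
data fraud;
    infile datalines truncover;
    /* The colon modifier reads each token up to the next blank with the given informat. */
    input Obs Account_Number DateTime :datetime16. IP_Address :$15.
          Fraud_Status Transaction_Amount;
    format DateTime datetime16.;
    datalines;
1 20001 11JUL14:11:28:22 216.82.XXX.33 0 500
2 20001 15JUL14:09:15:02 216.82.XXX.33 0 800
3 20001 19JUL14:12:10:03 216.82.XXX.33 0 1000
4 20001 25JUL14:16:20:23 216.82.XXX.33 0 600
5 20001 12AUG14:08:07:05 257.72.XXX.45 1 5000
6 20001 15AUG14:16:12:28 216.82.XXX.33 0 700
7 20002 13JUL14:18:02:25 325.82.XXX.44 0 300
8 20002 15JUL14:20:16:01 325.82.XXX.44 0 200
9 20002 28JUL14:14:06:02 159.21.XXX.25 1 2000
10 20002 18AUG14:15:05:45 325.82.XXX.44 0 400
;
run;

/* Quick check of the loaded rows. */
proc print data=fraud noobs;
run;
```

Because SAS datetime values are counts of seconds, a look-back window can be expressed by subtracting a number of seconds from DateTime, which is what the macros on the later slides do.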
Type of Variables
Based on the historical information provided by the online credit card transaction
panel data, we can create the following variables to capture the fraud
behavior pattern. For example:
1. Whether more than 1 IP address was used within X days by the same
customer.
2. Whether the IP address matches the latest IP address used within X days
by the same customer, with non-fraud status in the latest transaction.
3. Whether the sum of the last transaction amounts is less than X dollars
within Y days for the same customer.
When I first tried to create this type of time-varying variable, I tried to use
complex SAS macros to solve the problem. Later I found that the SAS PROC SQL
statement solves most time-varying variable problems.
SAS PROC SQL code example to generate variables
1. The first variable: whether more than 1 IP address was used within X days
by the same customer.
%macro rule_IP (out=, datetime=, var=);
proc sql;
create table &out as
/* count the distinct IP addresses seen in the look-back window */
select distinct a.*, count(distinct b.ip_address) as count,
case
when (calculated count = 1) then 0
else 1
end as &var
/* self-join: pair each transaction (a) with the same account's transactions (b) in the window */
from fraud as a, fraud as b
where a.account_number=b.account_number and
b.datetime between a.datetime&datetime and a.datetime
group by a.obs
;
quit;
%mend;
%rule_IP (out=test_1, datetime=-172800, var= rule_IP_1);/*172800 is the number of seconds for 2 days*/
%rule_IP (out=test_2, datetime=-432000, var= rule_IP_2);/*432000 is the number of seconds for 5 days*/
%rule_IP (out=test_3, datetime=-604800, var= rule_IP_3);/*604800 is the number of seconds for 7 days*/
This variable captures the past behavior pattern of which IP addresses a customer connects from when purchasing
online. This exercise creates three dummy-coded variables with different time periods. Users can specify any number of days in this
SAS PROC SQL code, and can choose other ways to identify the days rather than using seconds as in this example.
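One way to avoid hand-computing seconds is to pass the window in days and let the query multiply by 86400 (the number of seconds in a day). The sketch below is a variation on the macro above, not the author's code; the macro name rule_IP_days, the days= parameter, and the output table names are assumptions made for illustration.

```sas
/* Sketch: same rule as rule_IP, but the window is given in days. */
%macro rule_IP_days (out=, days=, var=);
proc sql;
    create table &out as
    select distinct a.*,
           count(distinct b.ip_address) as count,
           case when calculated count = 1 then 0 else 1 end as &var
    from fraud as a, fraud as b
    where a.account_number = b.account_number
      /* SAS datetimes are in seconds, so days * 86400 gives the window length */
      and b.datetime between a.datetime - &days * 86400 and a.datetime
    group by a.obs;
quit;
%mend rule_IP_days;

%rule_IP_days (out=ip_2d, days=2, var=rule_IP_1);
%rule_IP_days (out=ip_5d, days=5, var=rule_IP_2);
%rule_IP_days (out=ip_7d, days=7, var=rule_IP_3);
```

On the sample table, observation 6 (15AUG14) is flagged in the 5-day and 7-day windows because observation 5 used a different IP address about three days earlier, and it is not flagged in the 2-day window, where only observation 6 itself falls inside the window.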
SAS PROC SQL code example to generate the variable
2. The second variable: whether the IP address matches the latest IP address used
within X days by the same customer, with non-fraud status in the latest transaction.
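/* Previous transaction's fraud status within each account; the name must match b.Lag1_Fraud_Status used in the macro */
data fraud;
set fraud;
by account_number datetime;
Lag1_Fraud_Status=lag(Fraud_Status);
if first.account_number then Lag1_Fraud_Status=.;
run;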
%macro rule_good_IP (out=, datetime=, var=);
proc sql ;
create table &out as
select distinct a.*,
sum(case when b.Lag1_Fraud_Status=1 then 1 else 0 end) as lag_fraud,
count(b.obs) as count_obs,
count(distinct b.ip_address) as count_ip ,
case when
(calculated count_obs gt 1 and calculated lag_fraud >=1 )or
(calculated count_obs gt 1 and calculated count_ip >1 )
then 1 else 0 end as &var
from fraud as a, fraud as b
where a.account_number=b.account_number and
b.datetime between a.datetime&datetime and a.datetime
group by a.obs
;
quit;
%mend;
%rule_good_IP(out=test_1, datetime=-172800, var=rule_good_ip_1);/*172800 is the number of seconds for 2 days*/
%rule_good_IP(out=test_2, datetime=-432000, var=rule_good_ip_2);/*432000 is the number of seconds for 5 days*/
%rule_good_IP(out=test_3, datetime=-604800, var=rule_good_ip_3);/*604800 is the number of seconds for 7 days*/
This variable captures the past behavior pattern of which IP addresses a customer connects from when purchasing online,
and whether the IP address may belong to a fraudster. Users can specify any number of days in this SAS PROC SQL code.
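The DATA steps in this deck use a BY statement, which requires the input to be sorted by the BY variables; otherwise SAS stops with a "BY variables are not properly sorted" error. The sample table is already in that order, but a real transaction extract may not be. As a sketch under that assumption, the sort and both lagged columns used by the second and third variables can be prepared in one pass:

```sas
/* Sketch: sort first, then derive both lagged columns in a single DATA step. */
proc sort data=fraud;
    by account_number datetime;
run;

data fraud;
    set fraud;
    by account_number datetime;
    /* LAG must run on every row, so call it unconditionally and blank it afterwards. */
    Lag1_Fraud_Status = lag(Fraud_Status);
    lag_amt           = lag(Transaction_Amount);
    /* The first row of each account has no previous transaction. */
    if first.account_number then do;
        Lag1_Fraud_Status = .;
        lag_amt           = .;
    end;
run;
```

Calling LAG inside an IF condition would return the value from the last row on which the function ran rather than from the previous row, which is why the reset is done after the assignment.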
SAS PROC SQL code example to generate the variable
3. The third variable: whether the sum of the last transaction amounts is less than X dollars
within Y days for the same customer.
data fraud;
set fraud;
by account_number datetime;
lag_amt=lag(Transaction_Amount);
if first.account_number then lag_amt=.;
run;
%macro rule_transaction (out=, datetime=, var=, amt=, amt1=);
proc sql ;
create table &out as
select distinct a.*,
count(b.obs) as count_obs,
sum(b.lag_amt) as sum_lag_amount,
case when
(calculated sum_lag_amount > &amt and calculated sum_lag_amount <= &amt1) and
calculated count_obs gt 1
then 1 else 0 end as &var
from fraud as a, fraud as b
where a.account_number=b.account_number and
b.datetime between a.datetime&datetime and a.datetime
group by a.obs
;
quit;
%mend;
%rule_transaction(out=test_1, datetime=-432000, var=rule_19_11, amt=0, amt1=2000); /*within 5 days, last transaction amount from same customer less than or equal to 2000 dollars*/
%rule_transaction(out=test_2, datetime=-864000, var=rule_19_12, amt=0, amt1=2000); /*within 10 days, last transaction amount from same customer less than or equal to 2000 dollars*/
%rule_transaction(out=test_3, datetime=-1728000, var=rule_19_13, amt=0, amt1=2000); /*within 20 days, last transaction amount from same customer less than or equal to 2000 dollars*/
This variable captures the past behavior pattern of how much a customer spends when purchasing online. Users can specify any number of days or
any transaction dollar amount in this SAS PROC SQL code.
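Each macro call writes one output table with one row per transaction, so the flags still have to be brought together before modeling. The following is a minimal sketch of that step, assuming the three macros were run with distinct output names (the hypothetical ip_5d, good_ip_5d and amt_5d) instead of reusing test_1 to test_3, and that Obs uniquely identifies a transaction.

```sas
/* Sketch: collect one flag from each rule into a single modeling table. */
proc sql;
    create table model_vars as
    select f.*,
           i.rule_IP_2,        /* more than 1 IP address within 5 days */
           g.rule_good_ip_2,   /* IP / prior-fraud rule within 5 days  */
           t.rule_19_11        /* spending rule within 5 days          */
    from fraud as f
         left join ip_5d      as i on f.obs = i.obs
         left join good_ip_5d as g on f.obs = g.obs
         left join amt_5d     as t on f.obs = t.obs
    order by f.obs;
quit;
```

Left joins keep every transaction from fraud even if one of the rule tables is missing a row, in which case the corresponding flag is left missing.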
Conclusion
By adopting a similar PROC SQL method for creating time-varying variables in
longitudinal data, analysts can create multi-dimensional time-varying variables
that use historical data to capture abnormal behavior and so detect and
prevent fraud.
For any comments or questions, please contact Kaitlyn Hu.
Kaitlyn.S.Hu@gmail.com

Variables Creation using SAS on Longitudinal Data for Fraud Models
