The main goal of this project is to enable the telemarketing team to prioritize its targeting for the term deposit marketing program by adopting a data-driven, machine learning approach.
3. Data Source and Objective
▪ This dataset is based on the "Bank Marketing" UCI dataset
▪ The full description, along with the dataset, is available here: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
▪ This dataset is enriched with a few social and economic attributes
▪ Due to confidentiality clauses, not all attributes are mentioned
▪ The binary classification goal is to predict whether the client will subscribe to a bank term deposit
Visit: Learnbay.co
4. Data Description
Variable         Description
age              Age of customer
job              Type of job (categorical: "admin", "blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed", "services", "student", "technician", "unemployed", "unknown")
marital          Marital status (categorical: "divorced", "married", "single", "unknown")
education        Education level (categorical: "basic.4y", "basic.6y", "basic.9y", "high.school", "illiterate", "professional.course", "university.degree", "unknown")
default          Has credit in default? (categorical: "no", "yes", "unknown")
housing          Has housing loan? (categorical: "no", "yes", "unknown")
loan             Has personal loan? (categorical: "no", "yes", "unknown")
contact          Contact communication type (categorical: "cellular", "telephone")
month            Last contact month of year (categorical: "jan", "feb", "mar", …, "nov", "dec")
day_of_week      Last contact day of the week (categorical: "mon", "tue", "wed", "thu", "fri")
duration         Last contact duration, in seconds (numeric)
campaign         Number of contacts performed during this campaign and for this client (numeric, includes last contact)
pdays            Number of days that passed after the client was last contacted from a previous campaign (numeric; 999 means the client was not previously contacted)
previous         Number of contacts performed before this campaign and for this client (numeric)
poutcome         Outcome of the previous marketing campaign (categorical: "failure", "nonexistent", "success")
emp.var.rate     Employment variation rate (numeric)
cons.price.idx   Consumer price index (numeric)
cons.conf.idx    Consumer confidence index (numeric)
euribor3m        Euribor 3 month rate (numeric)
nr.employed      Number of employees (numeric)
y                Target variable: has the client subscribed to a term deposit? (1/0)
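The pdays description uses 999 as a sentinel for "not previously contacted", which can distort averages if it is treated as a real number of days. The sketch below shows one possible way to handle it with pandas. The sample rows and the new column names (previously_contacted, pdays_clean) are assumptions made for illustration, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample rows; only the 'pdays' and 'previous' names come from the data description.
df = pd.DataFrame({
    "pdays": [999, 3, 999, 6],
    "previous": [0, 1, 0, 2],
})

# Flag clients who were contacted in a previous campaign (999 means "never contacted").
df["previously_contacted"] = (df["pdays"] != 999).astype(int)

# Mask the 999 sentinel as missing so it does not inflate the average number of days.
df["pdays_clean"] = df["pdays"].mask(df["pdays"] == 999)

print(df)
print(df["pdays_clean"].mean())
```

Keeping the flag as a separate feature preserves the "never contacted" information even after the sentinel is masked.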
5. Exploratory Data Analysis
Data Understanding – Univariate Analysis
▪ How well populated is the data?
▪ How much variation is there in the variables given to you?
▪ What are the unique levels of the categorical variables?
▪ What is the proportion of missing data for the given raw variables? Discard variables with more than 25% missing values
▪ Missing value imputation methods: mean for numeric and mode for categorical
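The univariate checks above can be sketched with pandas as follows. This is a minimal illustration on a made-up sample: the rows and the mostly_missing column are assumptions, and the 25% threshold is the one stated in the list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; a real run would load the bank marketing data instead.
df = pd.DataFrame({
    "age": [30, 45, np.nan, 52, 38],
    "job": ["admin", "technician", None, "admin", "services"],
    "mostly_missing": [np.nan, np.nan, 1.0, np.nan, np.nan],
})

# How well populated is the data? Share of missing values per variable.
missing_share = df.isna().mean()
print(missing_share)

# Variation in numeric variables and unique levels of a categorical variable.
print(df.describe())
print(df["job"].value_counts(dropna=False))

# Discard variables with more than 25% missing values.
keep = missing_share[missing_share <= 0.25].index
df = df[keep].copy()

# Impute: mean for numeric variables, mode for categorical variables.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].mean())
    else:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```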
6. Bi-Variate Plots
Visualizations to reveal Bi-Variate data patterns and relationships
▪ Can you spot a higher-than-average concentration of term deposit clients within a feature?
▪ What would be an appropriate grouping logic for different features, based on the linear trend of term deposit rates?
▪ Are there extreme values in the predictor variables? How do we decide the capping and flooring points for features?
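One possible way to explore these questions is to band a numeric feature, compare each band's term deposit rate with the overall rate, and cap extreme values at percentiles. The sketch below uses made-up rows, and the age bands and the 1st/99th percentile cut-offs are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample: 'y' is the 1/0 target, 'age' and 'campaign' are predictors.
df = pd.DataFrame({
    "age":      [22, 25, 31, 35, 41, 44, 52, 58, 63, 70],
    "campaign": [1, 2, 1, 3, 2, 1, 4, 2, 1, 40],
    "y":        [1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
})

overall_rate = df["y"].mean()

# Group age into bands and compute the term deposit rate per band.
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 45, 60, 120],
                        labels=["<=30", "31-45", "46-60", "60+"])
rate_by_band = df.groupby("age_band", observed=True)["y"].mean()

# Bands with a higher-than-average proportion of term deposit clients.
above_average = rate_by_band[rate_by_band > overall_rate]
print(above_average)

# Floor and cap extreme values at the chosen percentiles (assumed: 1st and 99th).
low, high = df["campaign"].quantile([0.01, 0.99])
df["campaign_capped"] = df["campaign"].clip(lower=low, upper=high)
print(df[["campaign", "campaign_capped"]])
```

Plotting rate_by_band as a bar chart gives the Bi-Variate view; adjacent bands with similar rates are candidates for merging.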
7. Insights from Bi-Variate Plots
Based on the nature of the Bi-Variate Plots, we determine optimal predictors and identify variable interactions.
Interaction Variables
▪ Cross-tabulate categorical variables with respect to term deposit rates
▪ Within the cross-tabulation, are there segments where the term deposit rates are higher than average?
▪ Can they be combined to identify interactions that can be strong discriminators?
Optimal Features
▪ Based on the discrimination of term deposit rates, we can identify which features are good predictors
▪ An example is captured below
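As an illustration of the cross-tabulation idea, the sketch below computes term deposit rates for each pair of levels of two categorical variables and combines them into a single interaction feature. The sample rows are made up, and the choice of contact and poutcome as the pair, as well as the contact_x_poutcome name, are assumptions.

```python
import pandas as pd

# Hypothetical sample; column names follow the data description.
df = pd.DataFrame({
    "contact":  ["cellular", "cellular", "telephone", "telephone",
                 "cellular", "telephone", "cellular", "cellular"],
    "poutcome": ["success", "nonexistent", "nonexistent", "failure",
                 "success", "nonexistent", "failure", "success"],
    "y":        [1, 0, 0, 0, 1, 0, 0, 1],
})

overall_rate = df["y"].mean()

# Cross-tabulation of term deposit rates: rows are contact types, columns are previous outcomes.
rate_table = pd.crosstab(df["contact"], df["poutcome"],
                         values=df["y"], aggfunc="mean")
print(rate_table)

# Segments (pairs of levels) whose term deposit rate is higher than average.
segment_rates = df.groupby(["contact", "poutcome"])["y"].mean()
strong_segments = segment_rates[segment_rates > overall_rate]
print(strong_segments)

# Combine the two variables into one interaction feature.
df["contact_x_poutcome"] = df["contact"] + "_" + df["poutcome"]
```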
8. Classification Methodologies to Consider
Bi-directional approach: strong classifier algorithms (statistical and ML based) can be tested and their results compared for final deployment
Binomial LR Algorithm
▪ Leverage linearized features to satisfy the assumptions of Logistic Regression
▪ LR will estimate the likelihood of the event and utilize the link function (Logit) for computation
▪ Flexibility to create custom target segments based on the predicted probability for each client
Tree Based ML Algorithm
▪ Develops a tree-like structure across parent and child nodes
▪ Generates a set of rules that can be visually interpreted and readily deployed for decision making
▪ Based on classification capabilities, strategies can be designed on the ML tree rules to derive optimal business benefit
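A minimal sketch of the comparison, assuming scikit-learn is available: both classifiers are fitted on the same training data and compared on a hold-out set. The data here is synthetic (not the bank dataset), the feature names x1 and x2 are placeholders, and AUC as the comparison metric and the tree depth of 3 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: two numeric features and a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Statistical approach: binomial logistic regression, giving a probability per client.
lr = LogisticRegression().fit(X_train, y_train)
lr_prob = lr.predict_proba(X_test)[:, 1]

# ML approach: a shallow decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
tree_prob = tree.predict_proba(X_test)[:, 1]

# Compare both models on the same hold-out data.
lr_auc = roc_auc_score(y_test, lr_prob)
tree_auc = roc_auc_score(y_test, tree_prob)
print("LR AUC:  ", lr_auc)
print("Tree AUC:", tree_auc)

# The fitted tree can be printed as a readable set of rules.
print(export_text(tree, feature_names=["x1", "x2"]))
```

The predicted probabilities from either model can then be thresholded or ranked to build custom target segments.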
9. A bit about both Methods
Binomial LR Algorithm
▪ Dependent Variable – Dichotomous, from a Binomial distribution
▪ Relates the log of odds to a linear combination of predictors
▪ The final model has statistically significant predictors
[Figure: Predicted probabilities describe a sigmoidal curve]
Tree Based ML Algorithm
▪ Dependent Variable – Dichotomous, from a Binomial distribution
▪ The goal is to form child nodes so that the node impurity is reduced
▪ The larger the reduction in impurity from parent to child nodes, the better the split
▪ Example: a node with p(event)=0.5 is the most impure, while a node with p(event)=1 (100%) is the most pure
[Figure: Impurity measures for C classes in the target variable]
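Both ideas can be made concrete with a few lines of plain Python. The sketch below converts log-odds into a probability with the sigmoid function and computes two common impurity measures, Gini and entropy, for a list of class proportions. The slide's own impurity formulas are not recoverable from the text, so the choice of these two measures is an assumption, and the function names are illustrative.

```python
import math

def sigmoid(log_odds):
    """Convert log-odds (the logit) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy impurity in bits; classes with p = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def impurity_reduction(parent, left, right, left_share, measure=gini):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    children = left_share * measure(left) + (1 - left_share) * measure(right)
    return measure(parent) - children

print(sigmoid(0.0))          # log-odds of 0 correspond to p = 0.5
print(gini([0.5, 0.5]))      # most impure binary node: 0.5
print(gini([1.0, 0.0]))      # pure node: 0.0
print(entropy([0.5, 0.5]))   # most impure binary node: 1.0

# A split that separates the classes well yields a large impurity reduction.
print(impurity_reduction([0.5, 0.5], [0.9, 0.1], [0.1, 0.9], left_share=0.5))
```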
11. How to enable strategic prioritization
Achieve Campaign Optimization through Profiling and Priority Ordering
Characterization
▪ Develop target profiles
▪ Utilize features that help build custom audiences
Prioritization
▪ Develop target segments within every group or profile
▪ Attach a priority order for the telemarketing team to leverage
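A possible way to turn model scores into a call list is sketched below: clients are grouped by profile, segmented by predicted probability, and ranked within each profile. The client rows, the profile labels, the prob column, and the segment cut-offs (0.3 and 0.6) are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical scored clients: 'profile' is a characterization group,
# 'prob' is a model's predicted probability of subscribing.
scored = pd.DataFrame({
    "client_id": [101, 102, 103, 104, 105, 106],
    "profile":   ["retired", "student", "retired", "student", "retired", "student"],
    "prob":      [0.82, 0.15, 0.40, 0.67, 0.05, 0.33],
})

# Target segments by predicted probability (cut-offs are assumptions).
scored["segment"] = pd.cut(scored["prob"], bins=[0, 0.3, 0.6, 1.0],
                           labels=["low", "medium", "high"])

# Priority order within each profile: the highest probability is called first.
scored["priority"] = (scored.groupby("profile")["prob"]
                            .rank(ascending=False, method="first")
                            .astype(int))

# Ordered call list for the telemarketing team.
call_list = scored.sort_values(["profile", "priority"])
print(call_list)
```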