Pds assignment 2 presentation
1. Identifying Default of Credit Card Payments
- Vikas Virani(s3715555)
- Salina Bharthu(s3736867)
Banks must manage the risk that customers will not repay their credit card bills.
The study aims to provide a model that predicts the likelihood of default payment.
Classifying the probability of default payment using KNN and decision tree algorithms.
Data Description:
Data source: UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/machine-learning-databases/00350/)
Dataset contains 30000 observations and 24 variables.
Holds customers’ personal details (age, education, gender, marital status)
Holds financial details (balance limit, previous payment status, previous bill amounts, previous amounts paid, default payment status)
Data Preparation:
Checking data against missing values, impossible values and outliers.
- No missing values present
- Outliers are removed using the Z-score approach
- Impossible values are replaced by the nearest possible value
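The two cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names `remove_outliers_zscore` and `snap_to_nearest`, the threshold of 3 standard deviations, the column names, and the allowed education codes 1 to 4 are all assumptions, and the records are made-up toy values.

```python
from statistics import mean, pstdev


def remove_outliers_zscore(rows, column, threshold=3.0):
    """Keep rows whose value in `column` is within `threshold` std devs of the mean."""
    values = [row[column] for row in rows]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all values identical, so nothing can be an outlier
        return list(rows)
    return [row for row in rows if abs((row[column] - mu) / sigma) <= threshold]


def snap_to_nearest(value, allowed):
    """Return `value` if it is allowed, otherwise the closest allowed value."""
    return min(allowed, key=lambda a: abs(a - value))


# Hypothetical toy records (not the UCI data): 20 ordinary rows plus one
# row with an extreme balance limit and an undocumented education code.
rows = [{"limit_bal": 50_000 + 1_000 * i, "education": 1 + i % 4} for i in range(20)]
rows.append({"limit_bal": 5_000_000, "education": 0})

cleaned = remove_outliers_zscore(rows, "limit_bal")

# Assumed set of valid education codes; impossible codes are snapped to it.
EDUCATION_CODES = [1, 2, 3, 4]
for row in cleaned:
    row["education"] = snap_to_nearest(row["education"], EDUCATION_CODES)
```

The threshold is a judgment call: a smaller value removes more rows, so it is worth checking how many observations are dropped before committing to it.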
Assigning appropriate datatype to categorical variables.
Displaying a summary of nominal and numerical variables.
2. Data Exploration:
Visualizing individual variables to find trends
Visualizing pairs of variables to explore hypotheses
- Do personal details (such as age, sex, education, marital status), balance limit or past payment status impact the chances of default?
- Age: Relatively similar chances of default in all age groups
- Sex: Female customers are more likely to default than male customers.
- Education: University graduates have higher chances of default than school and high school graduates.
- Marital status: Similar chances of default for married, single and others.
- Balance limit: The bill amount is slightly higher for non-default cards as compared to default cards.
- Previous Payment Status: The chances of default increase when a previous payment is delayed by even 1 month.
- Do spending habits vary for different age groups, sex or marital status?
- Age: People below the age of 24 tend to spend less money overall. Spending habits are evenly distributed among the other age groups, although people aged 55-65 tend to spend more.
- Sex: Spending habits do not vary by gender.
- Marital status: Married people tend to spend more than others
- Do payment delays depend on the age, gender or marital status of the person?
- Age: Payment status follows a similar trend in all age groups.
- Gender: Women have a higher proportion of payment delays than men.
- Marital status: A higher number of single card holders have payment delays as compared to married and others.
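Questions like the ones above come down to comparing the share of defaults (or delays) across groups. A minimal sketch of that comparison follows; the function name `default_rate_by_group`, the field names, and the six toy records are hypothetical and do not reproduce the study's figures.

```python
from collections import defaultdict


def default_rate_by_group(records, group_key, target_key="default"):
    """Return {group value: fraction of records in that group with target == 1}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number of defaults, total]
    for rec in records:
        counts[rec[group_key]][0] += rec[target_key]
        counts[rec[group_key]][1] += 1
    return {group: defaults / total for group, (defaults, total) in counts.items()}


# Made-up toy records for illustration only.
records = [
    {"sex": "female", "default": 1},
    {"sex": "female", "default": 0},
    {"sex": "male", "default": 0},
    {"sex": "male", "default": 0},
    {"sex": "male", "default": 1},
    {"sex": "male", "default": 0},
]

rates = default_rate_by_group(records, "sex")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

With the data loaded in a pandas DataFrame, a `groupby` on the grouping column followed by the mean of the target column gives the same rates.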
3. Feature Selection:
Using MinMaxScaler() to bring all features onto the same scale
Applying F1 Score technique to select the best features of dataset
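A sketch of the scaling and selection steps with scikit-learn. The slides do not say which selector was used, so this assumes the scoring step refers to univariate F-score ranking via `SelectKBest` with `f_classif`. The synthetic features, the seed and `k=2` are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)  # synthetic binary target

# Synthetic stand-in features (not the UCI data): column 0 tracks the
# target closely, columns 1 and 2 are unrelated noise on other scales.
X = np.column_stack([
    y * 1000.0 + rng.normal(0, 50, size=n),
    rng.normal(40, 10, size=n),
    rng.uniform(0, 1e5, size=n),
])

# Rescale every column to the [0, 1] range.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Rank features by ANOVA F-score against the target and keep the top k.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X_scaled, y)
selected_columns = selector.get_support(indices=True)
print("kept columns:", selected_columns)
print("F-scores:", selector.scores_)
```

If the scaler and selector are fitted after the train-test split, fitting them on the training portion only keeps information from the test set out of the model.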
Data Modelling:
Splitting up data into training and test dataset.
Applying KNN and Decision tree classifiers.
Applying a resampling technique to balance the dataset with respect to the target classes.
Using GridSearchCV to find the best combination of hyperparameter values for each classifier.
Validating model using confusion matrix, Classification report, accuracy score and error rate.
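The modelling steps above could look roughly like the following scikit-learn sketch. It runs on synthetic data from `make_classification`, not the credit card dataset. The kind of resampling (random upsampling of the minority class, applied to the training split only), the parameter grids and the 80%-20% split are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Synthetic, imbalanced stand-in data; label 1 is the minority class.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.78, 0.22], random_state=0)

# 80% training, 20% test, preserving the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Assumed resampling: upsample the minority class with replacement until
# both classes have equal counts in the training data.
minority = y_train == 1
n_majority = int((~minority).sum())
X_up, y_up = resample(X_train[minority], y_train[minority],
                      replace=True, n_samples=n_majority, random_state=0)
X_bal = np.vstack([X_train[~minority], X_up])
y_bal = np.concatenate([y_train[~minority], y_up])

# Illustrative hyperparameter grids; the study's grids are not given.
searches = {
    "knn": GridSearchCV(KNeighborsClassifier(),
                        {"n_neighbors": [3, 5, 7], "p": [1, 2]}, cv=3),
    "tree": GridSearchCV(DecisionTreeClassifier(random_state=0),
                         {"max_depth": [3, 5, 7], "min_samples_split": [2, 10]}, cv=3),
}

results = {}
for name, search in searches.items():
    search.fit(X_bal, y_bal)          # refits the best model on all of X_bal
    y_pred = search.predict(X_test)   # evaluate on the untouched test split
    acc = accuracy_score(y_test, y_pred)
    results[name] = {
        "best_params": search.best_params_,
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "accuracy": acc,
        "error_rate": 1.0 - acc,      # error rate is the complement of accuracy
    }
    print(name, search.best_params_, f"accuracy={acc:.3f}", f"error={1.0 - acc:.3f}")
    print(classification_report(y_test, y_pred))
```

Changing `test_size` to 0.4 or 0.5 gives the other two splits compared in the results. Because the upsampled rows are duplicates, cross-validation scores inside the grid search can look optimistic, so the held-out test scores are the more trustworthy figures.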
Conclusion:
The decision tree classifier works better for classifying unseen data by default payment status, with accuracy of 0.789 to 0.803 versus 0.726 to 0.742 for KNN across the tested splits.
Classification algorithm   Training-test data proportion   Accuracy score   Error rate
K nearest neighbour        80% - 20%                       0.742            0.258
K nearest neighbour        60% - 40%                       0.726            0.274
K nearest neighbour        50% - 50%                       0.732            0.268
Decision Tree              80% - 20%                       0.800            0.200
Decision Tree              60% - 40%                       0.803            0.197
Decision Tree              50% - 50%                       0.789            0.211