1. Churn Prediction in Telecommunication
Presenters: Nabi Shaikh & Shridhi Pandya
2. Agenda
• Problem Statement
• Given Data Set with Description
• Pre-processing Stage
• Feature Extraction
• Machine Learning Stage
• Result & Analysis
3. Problem Statement 1/6
• Churn is the term used for the loss of a customer.
• Customers become “churners” when they discontinue their subscription and move their
business to a competitor.
• Churning is the process of customer turnover. It is a major concern for companies with
many customers who can easily switch to competitors.
• Examples include credit card issuers, insurance companies and telecommunication companies.
• Using customer data, companies these days protect their business by identifying the customers
who are likely to churn in the near future, which gives a clear view of the steps to take to
retain existing customers.
• Customer churn is often considered a setback for any leading organization in any
industry; here we look at customer churn in the telecom industry.
4. Data Descriptions 2/6
• Three .csv files were provided for every month of 2014 and 2015
• Monthly Pay Data
• Monthly Usage Data
• Monthly Voice Data
• Disconnection date
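Monthly files like these can be stacked into one table with a month tag. The sketch below is a minimal illustration only: the file naming scheme (usage_YYYY_MM.csv), the temporary directory and the toy values are assumptions, not the layout of the original data set.

```r
# Hypothetical layout: one usage file per month, named usage_YYYY_MM.csv.
# Two tiny files are written to a temporary directory so the sketch runs as-is.
data_dir <- file.path(tempdir(), "churn_monthly_sketch")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ID = c(1, 2), VOL_2G = c(10, 0), VOL_3G = c(250, 40)),
          file.path(data_dir, "usage_2014_07.csv"), row.names = FALSE)
write.csv(data.frame(ID = c(1, 2), VOL_2G = c(5, 0), VOL_3G = c(300, NA)),
          file.path(data_dir, "usage_2014_08.csv"), row.names = FALSE)

# Find every monthly usage file (list.files returns them in alphabetical order).
usage_files <- list.files(data_dir,
                          pattern = "^usage_[0-9]{4}_[0-9]{2}[.]csv$",
                          full.names = TRUE)

# Read one file and tag its rows with the month taken from the file name.
read_month <- function(path) {
  monthly <- read.csv(path)
  monthly$month <- sub("^usage_([0-9]{4}_[0-9]{2})[.]csv$", "\\1", basename(path))
  monthly
}

# Stack all months into a single long data frame.
usage_all <- do.call(rbind, lapply(usage_files, read_month))
```

Note that read.csv() renames columns that are not valid R names by default, so a header such as 2G_IND would be read as X2G_IND unless check.names = FALSE is passed.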
5. Attributes Description – Usage Csv File
Attribute Names     Description
ID                  Customer Unique Identity
2G_IND              2G Indicator [1 or 0]: 1 = Subscribed, 0 = Not Subscribed
3G_IND              3G Indicator [1 or 0]: 1 = Subscribed, 0 = Not Subscribed
VOL_2G              Volume of 2G Data Consumed by Subscriber
VOL_3G              Volume of 3G Data Consumed by Subscriber
6. Attributes Description – Payment Csv File
Attribute Names     Description
ID                  Customer Unique Identity
Acct_actv_date      Card Application Date
Mob_actv_date       Customer Subscription Date
Total_Bill          Total Monthly Bill
RentalCharge        Rental Charges Paid by Customer on a Monthly Basis
Non-RentalCharge    Non-Rental Charges Paid by Customer on a Monthly Basis
Attributes Description – Disconnection Csv File
Attribute Names     Description
ID                  Customer Unique Identity
Disconnection Date  Date on which the Customer disconnected
7. Attributes Description – Voice Csv File
Attribute Names     Description
ID                  Customer Unique Identity
LOC_OG_MOU          Local Outgoing Call Minutes of Usage
LOC_IC_MOU          Local Incoming Call Minutes of Usage
STD_OG_MOU          STD Outgoing Call Minutes of Usage
STD_IC_MOU          STD Incoming Call Minutes of Usage
ISD_OG_MOU          International Outgoing Call Minutes of Usage
ISD_IC_MOU          International Incoming Call Minutes of Usage
ROAM_OG_MOU         Roaming Outgoing Call Minutes of Usage
ROAM_IC_MOU         Roaming Incoming Call Minutes of Usage
TOT_IC_MOU          Total Incoming Call Minutes of Usage
TOT_OG_MOU          Total Outgoing Call Minutes of Usage
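Since every file carries the customer ID, the tables can be joined on it. The following is a sketch under assumptions: the toy values are invented, only a few columns are shown, and labelling a customer as churned when the ID appears in the disconnection file is our assumption rather than a rule stated in the slides.

```r
# Toy stand-ins for one month of each file; values are invented for illustration.
usage <- data.frame(ID = c(1, 2, 3), VOL_2G = c(10, 0, 4), VOL_3G = c(250, 40, 0))
pay   <- data.frame(ID = c(1, 2, 3), Total_Bill = c(499, 199, 149))
voice <- data.frame(ID = c(1, 2, 3),
                    TOT_IC_MOU = c(120, 30, 5),
                    TOT_OG_MOU = c(200, 15, 2))
disconnection <- data.frame(ID = 3,
                            Disconnection_Date = as.Date("2014-12-15"))

# Join the three monthly tables on the shared customer ID.
customers <- Reduce(function(left, right) merge(left, right, by = "ID"),
                    list(usage, pay, voice))

# Assumption: a customer counts as churned if the ID appears in the
# disconnection file (1 = churned, 0 = still active).
customers$churn <- as.integer(customers$ID %in% disconnection$ID)
```

merge() keeps only IDs present in both tables by default; all.x = TRUE would keep customers that are missing from one of the files.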
8. Pre-processing 3/6
• The pre-processing stage is an exploratory data analysis (EDA): an approach to analyzing a
dataset to summarize its main characteristics, often with visual methods.
• Identifying NA and NaN values in the dataset.
• Here, EDA focuses on handling missing values and transforming variables as needed.
• Identifying outliers.
• To identify missing and NA values we use packages that give a good visualization.
• Package used to identify missing values: Amelia, for visualization (next slide).
9. Amelia Package for Visualizing Missing Values
R Code :
> library(Amelia)
> missmap(dataframe, main = "Missing NA's")
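Alongside the missingness map, plain counts of missing cells are a useful cross-check. This is a minimal base R sketch; the toy data frame and its values are assumptions made for illustration.

```r
# Toy data frame with a few missing cells (values are invented).
dataframe <- data.frame(
  ID = 1:5,
  VOL_2G = c(10, NA, 4, 0, NaN),
  Total_Bill = c(499, 199, NA, 149, 299)
)

# is.na() is TRUE for both NA and NaN, so this counts every missing cell per column.
missing_per_column <- colSums(is.na(dataframe))

# is.nan() singles out NaN; on a data frame it is applied column by column.
nan_per_column <- sapply(dataframe, function(column) sum(is.nan(column)))

# Row numbers that contain at least one missing value.
incomplete_rows <- which(!complete.cases(dataframe))
```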
10. Outlier Detection
• Summarize the data frame with summary(dataframe_name).
• Box-plot.
• Histogram.
• Outliers were replaced by the 0.95 and 0.05 quantiles (95th and 5th percentiles) of the particular feature.
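The outlier steps can be sketched in base R as follows. This assumes the replacement means capping: values above the 0.95 quantile are set to that quantile and values below the 0.05 quantile are set to that one. The helper name cap_outliers and the toy bill values are hypothetical.

```r
# Cap a numeric vector at its lower and upper quantiles (hypothetical helper).
cap_outliers <- function(values, lower = 0.05, upper = 0.95) {
  bounds <- quantile(values, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # pmax() raises small values to the lower bound, pmin() lowers large ones
  # to the upper bound; NA entries stay NA.
  pmin(pmax(values, bounds[1]), bounds[2])
}

# Toy monthly bills with one extreme value (invented for illustration).
total_bill <- c(120, 135, 150, 160, 175, 180, 190, 210, 230, 5000)

# Points a box-plot would draw as outliers (beyond 1.5 times the IQR).
flagged <- boxplot.stats(total_bill)$out

# Replace the extremes by the quantile bounds.
capped_bill <- cap_outliers(total_bill)
```

summary(), boxplot() and hist() on the same vector before and after capping show the effect visually.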
11. Feature Extraction 4/6
• Feature extraction plays an important role in determining the performance of
predictive models.
• The new significant variables we considered were derived using the
mutate() function from the dplyr package.
• Some of the derived variables are:
• Total Bill of Customer for July-2014 to November-2014
• Total 2G Volume Consumption for July-2014 to November-2014
• Total 3G Volume Consumption for July-2014 to November-2014
• In all, 25 variables were derived; the output of the GLM is shown on the next slide.
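The three totals listed above could be derived with mutate() roughly as follows. This is a sketch, not the presenters' code: the wide layout (one column per month and measure), the column names and the values are assumptions, and treating missing months as zero via na.rm = TRUE is also our choice.

```r
library(dplyr)

# Hypothetical wide table: one row per customer, one column per month
# (07 = July 2014 ... 11 = November 2014) and measure. Values are invented.
monthly <- data.frame(
  ID = c(1, 2),
  Bill_07 = c(499, 199), Bill_08 = c(450, 199), Bill_09 = c(520, 199),
  Bill_10 = c(480, 0),   Bill_11 = c(510, 0),
  Vol2G_07 = c(10, 0), Vol2G_08 = c(5, 0), Vol2G_09 = c(0, 0),
  Vol2G_10 = c(0, 0),  Vol2G_11 = c(2, 0),
  Vol3G_07 = c(250, 40), Vol3G_08 = c(300, 35), Vol3G_09 = c(280, 20),
  Vol3G_10 = c(310, 0),  Vol3G_11 = c(290, 0)
)

# Add one July-to-November total per measure by summing the matching columns.
derived <- monthly %>%
  mutate(
    Total_Bill_Jul_Nov  = rowSums(across(starts_with("Bill_")), na.rm = TRUE),
    Total_Vol2G_Jul_Nov = rowSums(across(starts_with("Vol2G_")), na.rm = TRUE),
    Total_Vol3G_Jul_Nov = rowSums(across(starts_with("Vol3G_")), na.rm = TRUE)
  )
```

across() requires dplyr 1.0.0 or later; with older versions the monthly columns can be added explicitly inside mutate().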