Churn Prediction in Telecommunication
Presenters: Nabi Shaikh & Shridhi Pandya
Agenda
• Problem Statement
• Given Data Set with Description
• Pre-processing Stage
• Feature Extraction
• Machine Learning Stage
• Result & Analysis
Problem Statement 1/6
• Churn is the term used for the loss of a customer.
• Customers become “churners” when they discontinue their subscription and move their
business to a competitor.
• Churning is the process of customer turnover. It is a major concern for companies with
many customers who can easily switch to competitors.
• Examples include credit card issuers, insurance companies and telecommunication companies.
• Using customer data, companies now protect their business by identifying the customers
who are likely to churn in the near future, which gives a clear view of the steps to take
to retain existing customers.
• Customer churn is often considered downtime for any leading organization in any
industry sector; here we discuss customer churn in the telecom industry.
Data Descriptions 2/6
• Three .csv files were given for every month of 2014 and 2015:
• Monthly Pay Data
• Monthly Usage Data
• Monthly Voice Data
• Disconnection date
Attributes Description – Usage CSV File
Attribute Name    Description
ID                Unique customer identity
2G_IND            2G indicator (1 = subscribed, 0 = not subscribed)
3G_IND            3G indicator (1 = subscribed, 0 = not subscribed)
VOL_2G            Volume of 2G data consumed by the subscriber
VOL_3G            Volume of 3G data consumed by the subscriber
Attributes Description – Payment CSV File
Attribute Name      Description
ID                  Unique customer identity
Acct_actv_date      Card application date
Mob_actv_date       Customer subscription date
Total_Bill          Total monthly bill
RentalCharge        Rental charges paid by the customer on a monthly basis
Non-RentalCharge    Non-rental charges paid by the customer on a monthly basis
Attributes Description – Disconnection CSV File
Attribute Name        Description
ID                    Unique customer identity
Disconnection Date    Date at which the customer disconnected the
Attributes Description – Voice CSV File
Attribute Name    Description
ID                Unique customer identity
LOC_OG_MOU        Local outgoing call minutes of usage
LOC_IC_MOU        Local incoming call minutes of usage
STD_OG_MOU        STD outgoing call minutes of usage
STD_IC_MOU        STD incoming call minutes of usage
ISD_OG_MOU        International outgoing call minutes of usage
ISD_IC_MOU        International incoming call minutes of usage
ROAM_OG_MOU       Roaming outgoing call minutes of usage
ROAM_IC_MOU       Roaming incoming call minutes of usage
TOT_IC_MOU        Total incoming call minutes of usage
TOT_OG_MOU        Total outgoing call minutes of usage
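Since every file carries the ID column, the tables can be combined per customer. The following is a minimal sketch in base R, not the presenters' actual code: the toy values, the column name Disconnection_Date (written with an underscore) and the rule "a customer with a disconnection date counts as a churner" are all assumptions made for illustration.

```r
# Toy stand-ins for one month of each file; real data would come from read.csv().
usage <- data.frame(ID = c(1, 2, 3), VOL_2G = c(120, 0, 45), VOL_3G = c(900, 300, 0))
payment <- data.frame(ID = c(1, 2, 3), Total_Bill = c(499, 299, 150))
voice <- data.frame(ID = c(1, 2, 3), TOT_IC_MOU = c(210, 80, 15), TOT_OG_MOU = c(340, 60, 5))
disconnection <- data.frame(ID = 3, Disconnection_Date = as.Date("2014-12-10"))

# Join the three monthly tables on the shared customer ID.
customers <- Reduce(function(x, y) merge(x, y, by = "ID"), list(usage, payment, voice))

# Left join: customers without a disconnection record keep an NA date.
customers <- merge(customers, disconnection, by = "ID", all.x = TRUE)

# Assumed label: 1 if a disconnection date exists, 0 otherwise.
customers$churn <- as.integer(!is.na(customers$Disconnection_Date))
print(customers)
```

With real monthly files, each month would be read and joined the same way before the months are stacked or aggregated.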
Pre-processing 3/6
• The pre-processing stage is an exploratory data analysis (EDA), an approach to analyzing
datasets to summarize their main characteristics, often with visual methods.
• Identifying NA and NaN values in the dataset.
• Here, EDA focuses more narrowly on handling missing values and transforming
variables as needed.
• Identifying outliers.
• To identify missing and NA values, we use several packages that provide good visualizations.
• Package used for identifying missing values: Amelia, for visualization.
Amelia Package for Visualizing Missing Values
R Code:
> library(Amelia)
> missmap(dataframe, main = "Missing NA's", legend = TRUE)
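A numeric count per column is a useful complement to the missingness map. The sketch below uses only base R; the toy data frame and its values are made up for illustration, with column names borrowed from the usage file.

```r
# Toy data frame with missing values (illustrative values only).
dataframe <- data.frame(
  ID = 1:5,
  VOL_2G = c(10, NA, 30, NaN, 50),
  VOL_3G = c(NA, 200, 300, 400, 500)
)

# is.na() is TRUE for both NA and NaN, so one pass counts both.
na_counts <- colSums(is.na(dataframe))
na_share <- round(100 * na_counts / nrow(dataframe), 1)
print(data.frame(missing = na_counts, percent = na_share))

# Number of rows with no missing value in any column.
complete_rows <- sum(complete.cases(dataframe))
print(complete_rows)
```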
Outlier Detection
• Summarize the data frame with summary(dataframe_name).
• Box plot.
• Histogram.
• Outliers were replaced by the 95th and 5th percentile values (the 0.95 and 0.05 quantiles) of the particular feature.
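The replacement step can be read as capping each feature at its 5th and 95th percentiles. The sketch below shows one way to do that in base R; the helper name cap_outliers and the sample bill values are hypothetical, and the slides do not say which quantile definition was used (quantile() defaults to type 7).

```r
# Cap values below the lower quantile or above the upper quantile.
# NA values are ignored when computing the limits and stay NA in the result.
cap_outliers <- function(x, lower = 0.05, upper = 0.95) {
  limits <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  pmin(pmax(x, limits[[1]]), limits[[2]])
}

# Illustrative monthly bills with one very large and one very small value.
bill <- c(100, 120, 130, 110, 125, 115, 105, 135, 5000, 1)
capped <- cap_outliers(bill)

print(summary(bill))
print(summary(capped))
```

Values between the two limits are left unchanged, so only the extremes of each feature are affected.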
Feature Extraction 4/6
• Feature extraction plays an important role in determining the performance of
predictive models.
• The new significant variables we considered were derived using the
mutate function from the dplyr package.
• Some of the derived variables are:
• Total bill of the customer for July 2014 to November 2014
• Total 2G volume consumption for July 2014 to November 2014
• Total 3G volume consumption for July 2014 to November 2014
• In all, 25 variables were derived. The output of the GLM is shown on the next slide.
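A derived variable such as the July to November total bill could be built with mutate roughly as follows. This is a sketch under assumptions: the wide layout with one column per month and the column names (bill_2014_07 and so on) are hypothetical, since the slides do not show how the monthly files were arranged.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical wide table: one row per customer, one bill column per month.
monthly <- data.frame(
  ID = c(1, 2),
  bill_2014_07 = c(400, 250),
  bill_2014_08 = c(420, 260),
  bill_2014_09 = c(410, 0),
  bill_2014_10 = c(390, 240),
  bill_2014_11 = c(380, 250)
)

# Derive the five-month total as a new column.
derived <- monthly %>%
  mutate(
    total_bill_jul_nov = bill_2014_07 + bill_2014_08 + bill_2014_09 +
      bill_2014_10 + bill_2014_11
  )
print(derived[, c("ID", "total_bill_jul_nov")])
```

The 2G and 3G volume totals would follow the same pattern. Note that a plain sum returns NA when any month is missing, so missing values need to be handled first.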
Machine Learning 5/6
Logistic Regression
70 % / 30 % split
Significance of Feature Variables
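The sketch below fits a logistic regression with glm and prints the coefficient table, whose p-values indicate feature significance. Reading the 70 % and 30 % figures as a train/test split is an assumption, and the data is synthetic: the feature names, the label rule and the 0.5 cut-off are made up for illustration and are not the presenters' model.

```r
set.seed(42)
n <- 500

# Synthetic customers; feature names are hypothetical derived variables.
customers <- data.frame(
  total_bill = runif(n, 100, 1000),
  total_vol_3g = runif(n, 0, 5000)
)

# Synthetic label: lower usage and lower bills make churn more likely.
p_churn <- plogis(1 - 0.001 * customers$total_vol_3g - 0.001 * customers$total_bill)
customers$churn <- rbinom(n, 1, p_churn)

# Assumed 70 % training / 30 % test split.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- customers[train_idx, ]
test <- customers[-train_idx, ]

# Logistic regression is a GLM with a binomial family.
model <- glm(churn ~ total_bill + total_vol_3g, data = train, family = binomial)
print(summary(model)$coefficients)

# Predicted churn probabilities on the held-out 30 %.
test_prob <- predict(model, newdata = test, type = "response")
test_pred <- as.integer(test_prob > 0.5)
accuracy <- mean(test_pred == test$churn)
print(accuracy)
```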
Result & Analysis 6/6
Receiver Operating Characteristic(ROC)
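An ROC curve plots the true positive rate against the false positive rate as the classification threshold varies, and the area under it (AUC) summarizes ranking quality. The sketch below computes both in base R on a tiny hand-made example; the helper names roc_points and auc_trapezoid, and the labels and scores, are hypothetical and are not results from the presentation.

```r
# True and false positive rates at every distinct score threshold.
roc_points <- function(labels, scores) {
  thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / sum(labels == 1))
  fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / sum(labels == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the curve by the trapezoid rule.
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Illustrative data: 1 = churned, 0 = stayed; scores are predicted probabilities.
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

roc <- roc_points(labels, scores)
auc <- auc_trapezoid(roc)
print(roc)
print(auc)
```

Calling plot(roc$fpr, roc$tpr, type = "l") draws the curve. In this example 8 of the 9 churner/non-churner pairs are ranked correctly, which matches the computed AUC of 8/9.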
Thank you

Churn Presentation22May2016
