Consumer Expenditure
Setu Chokshi
14th July 2017
Objective
• Propose one way of using the data employing one of the following
methods: regression, classification or clustering. Execute your
proposal and discuss your methodology, justify your algorithm/
feature selection and share insights from the model.
• Dataset: Consumer Expenditure Survey for 1996-2000 (12k rows, 220
columns)
A typical American family
This infographic summarizes the consumer demographics in the expenditure data. It gives a good macro-level overview of the
dataset and of what to expect from it.
About Chart
2.3 vehicles per family
77% own a home
2.8 members per family
1.5 earning members per family
How much do they earn?
Description
For every dollar earned by the family members, about 78 cents are
used to pay various expenses to support and maintain the family. 20
cents are used to pay various taxes including social security.
Maintenance
About 60% of expenses go toward
non-discretionary items such as
rent and food.
Expenses
$40,679
Income
$53,147
Entertainment
The remaining 40% goes toward
discretionary items such as alcohol,
entertainment and travel.
Taxes
$9,962
Where does the money go?
[Chart: spending by category]
Categories: Rent, Alcohol / Tobacco, Entertainment, Clothes, Utilities, Transport, Food
Values: $765, $1,489, $1,956, $2,806, $3,921, $5,821, $10,687
Analytics
Potential questions data can answer?
Who are these people?
What are their demographics? Should we customize the product
for this diversity?
Targeting specific groups
Why should we target certain demographics? Why would they
buy the product from you?
Potential reach
Where should they grow the business next?
Is this necessary for them to get your product? If so how
frequently?
What motivates them to buy?
How much elasticity do they have in purchasing the product?
Would they accept price increases, or would this product be
a battle over prices?
1. Other macroeconomic indicators can also be calculated from this data,
but since our focus is on a CE goods company, we exclude them.
Steps for the analysis
Step 04
Step 03
Step 02
Step 01
Initial Analysis
After eliminating lag variables, a pairwise correlation
analysis was performed to identify key variables.
Calculations
Calculated savings using residual & net worth methods
to identify elasticity of each demographic.
Understanding the data
K-Means to identify clusters within the groups. Decision
trees & ridge regression to understand the expenses.
Validation
Tried to understand the clusters and the data
patterns to get additional insights.
Presentation
Prepared the results in a simple form to be
presented to the consumer goods executive team.
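The first three steps can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code behind the slides: the column names (`income`, `expenses`, `taxes`, `income_lag1`), the convention that lag variables contain "lag" in their name, and the residual-savings formula (income minus expenses minus taxes) are all assumptions, since the slides do not define them.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def top_correlated(df: pd.DataFrame, target: str, n: int = 5) -> pd.Series:
    """Rank numeric columns by absolute Pearson correlation with `target`."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order).head(n)


def add_residual_savings(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed definition: what is left of income after expenses and taxes."""
    out = df.copy()
    out["residual_savings"] = out["income"] - out["expenses"] - out["taxes"]
    return out


def cluster_households(df: pd.DataFrame, features: list, k: int = 5, seed: int = 0) -> np.ndarray:
    """Standardize the chosen features, then assign each row to one of k clusters."""
    scaled = StandardScaler().fit_transform(df[features])
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)


# Synthetic stand-in for the survey data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
income = rng.normal(53_000, 15_000, n).clip(5_000)
demo = pd.DataFrame({
    "income": income,
    "expenses": 0.75 * income + rng.normal(0, 2_000, n),
    "taxes": 0.19 * income + rng.normal(0, 1_500, n),
    "noise": rng.normal(0, 1, n),
    "income_lag1": 0.98 * income,
})

# Step 1: drop lag variables, then screen variables by correlation.
demo = demo.drop(columns=[c for c in demo.columns if "lag" in c])
# Step 2: derive the savings feature.
demo = add_residual_savings(demo)
ranked = top_correlated(demo, "expenses", n=3)
# Step 3: cluster on the derived and raw features.
labels = cluster_households(demo, ["income", "expenses", "residual_savings"], k=5)
```

Standardizing before K-Means is a design choice of this sketch: without it, dollar-valued columns with the largest spread would dominate the distance computation.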
Demographics (using clustering)
Cluster shares:
• Rich / Super Rich: 3.7%
• Single earner: 25.7%
• Singles: 25.0%
• Working spouse: 33.1%
• Widows: 12.5%
Cluster profiles:
• 59 years old; mostly female; 1 member; high school
• 46 years old; mostly female; 1 to 2 members; some degree
• 45 years old; mostly male; 3 to 6 members; college, no degree
• 55 years old; mostly male; 2 to 4 members; college educated
• 47 years old; Fe(Male); 3 to 5 members; bachelor's degree
These clusters were derived with the K-Means clustering algorithm. Each cluster is named after the features that most clearly separate it from the others. The calculated
residual savings and net worth savings were included as clustering features. Outliers were kept in a separate cluster, named super rich
or the 0.01 percenters. Additional cluster-level information can be found in the slide notes for this page.
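The slide notes describe each cluster by comparing how often a value occurs inside the cluster with how often it occurs globally. A minimal sketch of that comparison is shown here; the function name, the `min_lift` threshold and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd


def overrepresented_values(df: pd.DataFrame, labels: pd.Series, column: str,
                           cluster: int, min_lift: float = 2.0) -> pd.DataFrame:
    """List values of `column` that are much more common in `cluster` than overall."""
    cluster_share = df.loc[labels == cluster, column].value_counts(normalize=True)
    global_share = df[column].value_counts(normalize=True)
    table = pd.DataFrame({
        "cluster_pct": cluster_share * 100,
        "global_pct": global_share * 100,
    }).dropna()  # keep only values that actually occur in the cluster
    table["lift"] = table["cluster_pct"] / table["global_pct"]
    return table[table["lift"] >= min_lift].sort_values("lift", ascending=False)


# Toy example: the code 2 for `marital` is concentrated in cluster 0.
demo = pd.DataFrame({"marital": [2, 2, 2, 1, 1, 1, 1, 1, 1, 1]})
labels = pd.Series([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
result = overrepresented_values(demo, labels, "marital", cluster=0)
```

In the toy data, 75% of cluster 0 has `marital == 2` against 30% overall, so that value is the one reported for the cluster.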
Elasticity (expense / income) [4]

                       Widows   Singles   Working Spouse   Single Earner
Income [5]             $19      $42       $70              $42
Clothes [2]            38%      22%       17%              25%
Alcohol / Tobacco      10%      7%        7%               9%
Entertainment          2%       1%        2%               1%
Residual Savings [1]   $18      $35       $49              $35
Net worth savings      $0       $5        $42              $13
1. The residual savings are somewhat inflated by outlier data points that fall on the cluster boundaries. There was no time to clean these up.
2. For food, I should have included food away from home and working expenses. A potential link to elasticity could have helped further.
3. The (super) rich spend about 7 to 11% on clothes, 2 to 4% on alcohol/tobacco and 1% on entertainment.
4. I would also carry out the elasticity analysis over the lag variables to determine the sensitivity to price (data not used).
5. All income values in 10,000s
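The table above expresses each spending category as a share of income per cluster. A minimal sketch of that calculation is given here, assuming the share is computed as total category spend divided by total income within each cluster (the slides do not say whether totals or per-household averages were used). The column names and figures are hypothetical.

```python
import pandas as pd


def spend_share_by_cluster(df: pd.DataFrame, cluster_col: str, income_col: str,
                           expense_cols: list) -> pd.DataFrame:
    """Percentage of cluster income spent on each expense category."""
    totals = df.groupby(cluster_col)[[income_col, *expense_cols]].sum()
    shares = totals[expense_cols].div(totals[income_col], axis=0) * 100
    return shares.round(1)


# Hypothetical households; values are not taken from the survey.
households = pd.DataFrame({
    "cluster": ["A", "A", "B"],
    "income": [100.0, 300.0, 200.0],
    "clothes": [10.0, 30.0, 50.0],
    "entertainment": [4.0, 4.0, 2.0],
})
shares = spend_share_by_cluster(households, "cluster", "income",
                                ["clothes", "entertainment"])
```

For cluster A this gives 40 / 400 = 10% on clothes and 8 / 400 = 2% on entertainment; for cluster B, 25% and 1%.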
Appendix
Pairwise Correlation Analysis (sklearn)
[Two panels: Unsorted, Sorted]
t-SNE for cluster analysis (sklearn)
Clothing spend (decision trees)
Gradient Boosted
Tried this approach to see whether
building multiple decision trees
changes the variable importance for
clothing spend.
Simple decision tree
A quick look at the variable
importance from a single
decision tree. These line up with
the variables found via the correlation
analysis.
Income 17%, Residual savings 14%, Education 5%, Vehicles 5%, Hours worked 4%
Income 68%, Renter 9%, Residual Savings 6%, West US 5%, Education 4%
1. Explained variance is 0.35 for the decision tree vs 0.48 for the gradient boosted trees
2. RMSE is 5323 for the decision tree vs 4765 for the gradient boosted trees
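A comparison of this kind can be sketched with scikit-learn as below. This is an illustrative setup on synthetic data, not the model behind the figures above: the feature names, the target formula, the tree depth and the train/test split are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import explained_variance_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate(model, X_train, X_test, y_train, y_test) -> dict:
    """Fit a tree-based regressor and report hold-out metrics plus variable importance."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    importance = pd.Series(model.feature_importances_, index=X_train.columns)
    return {
        "explained_variance": explained_variance_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "importance": importance.sort_values(ascending=False),
    }


# Synthetic stand-in for clothing spend (hypothetical features and coefficients).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "income": rng.normal(53_000, 15_000, n),
    "education": rng.integers(1, 6, n),
    "vehicles": rng.integers(0, 4, n),
})
y = 0.04 * X["income"] + 150 * X["education"] + rng.normal(0, 500, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

results = {
    "decision_tree": evaluate(
        DecisionTreeRegressor(max_depth=4, random_state=0),
        X_train, X_test, y_train, y_test),
    "gradient_boosted": evaluate(
        GradientBoostingRegressor(random_state=0),
        X_train, X_test, y_train, y_test),
}
```

Scoring both models on the same held-out rows keeps the explained variance and RMSE directly comparable between them.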
Food_Away Analysis using Ridge
Regression
See the reference Excel sheet.

Editor's notes

1. Cluster-level observations for the demographic clusters:

Non-working widows
• 40.88% of the cluster has 2 for marital (against 7.66% globally)
• 83.82% of the cluster has \N for emptype (against 24.41% globally)
• 83.82% of the cluster has \N for empstat (against 24.50% globally)

Rich
• wages_calc is on average 245% greater: mean of 190k against 55002 globally
• expenses is on average 246% greater: mean of 180k against 53148 globally
• residual_savings is on average 172% greater: mean of 110k against 40679 globally

Singles
• 33.53% of the cluster has 5 for marital (against 10.52% globally)
• 46.29% of the cluster has 3 for marital (against 15.34% globally)
• 97.24% of the cluster has 0 for married (against 36.23% globally)

Working spouses
• 36.28% of the cluster has 1 for working_part_spouse (against 15.40% globally)
• 45.41% of the cluster has 40 for hrswkd_spouse (against 19.98% globally)
• 98.90% of the cluster has 1 for working_spouse (against 44.15% globally)

Single earner
• 67.08% of the cluster has 0 for wkswkd_spouse (against 17.98% globally)
• 67.46% of the cluster has 0 for hrswkd_spouse (against 18.09% globally)
• 52.79% of the cluster has \N for empstat (against 24.50% globally)

Super rich
• net_worth_savings is on average 1531% greater: mean of 400k against 24731 globally
• expenses is on average 564% greater: mean of 350k against 53148 globally
• wages_calc is on average 559% greater: mean of 360k against 55002 globally