SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
Customer segmentation
an excuse to use Machine Learning ;-)
● Julio Martinez
● Web developer since 2001
● 2 years working at Ulabox
● Machine Learning hobbyist
● Find me: @liopic
Who am I?
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
Preparing the workshop
My 2017 objective: M.L.
● Motivation
○ It’s the new hot thing
○ AlphaGo beat Lee Sedol, March 2016
● Some background, but need to learn more
1. Choose the way
○ Coursera’s vs. books vs. workshops vs. posts
2. Find an excuse to apply it
○ @work is better than @home
Learning about Machine Learning
Customer clusters @work, aka “the excuse”
● There is a non-programmer Business Analysis Department
● Groups of customers based on periodicity + amount spent
○ Example: people that buy once per month, 100€ ticket
○ Useful for business reports
○ Not so useful for UX, CRM
● Groups by behavior? Clustering orders!
Boring!
1. With past data -> make a ML model
○ clean data
○ choose a ML algorithm/s
○ tune the algorithm, with testing
2. With new data -> use model to predict (or give new info)
○ deploy pipeline
○ update model
101 Machine Learning: the method
● Supervised
○ data + labels(result)
● Unsupervised
○ just data
● Reinforcement
○ function to optimize
101 Machine Learning: type of problems
Supervised learning
TRAINING SET
cat cat person
TEST SET
???
Unsupervised learning
TRAINING SET
TEST SET
There is NO test
● Try to extract features (information, shapes): similar and different
● Uses:
○ Clustering
○ Anomaly detection (it doesn’t look “normal”)
○ Dimensional reduction
○ Transfer features, projections ...
Unsupervised learning
● Use:
○ grouping
○ quantization
● Algorithms:
○ k-means
○ DBSCAN
Clustering
● need: how many clusters
k-means
● need: how many samples at minimum, tune other params
DBSCAN: Density-based spatial clustering of applications with noise
So, ready to hack?
But wait a moment!
● Data preparation
○ Keep same order of magnitude, usually [0,1]
○ Remove noise
○ Other processes
■ Binarize data, categorical features
● weekday, ex. 4 -> 0, 0, 0, 1, 0, 0, 0
■ Process missing data
Before algorithms: data!
● Explore the data
○ Images are richer than numbers
■ “We get more orders at 22h” vs.
● Ask domain experts
○ Understand normal & border cases
■ The step at 14h is the web cutoff time
Before algorithms: data!
● Explore and optimize the data
○ Features that count, feature engineering
○ Avoid the “curse of dimensionality”
● Start small, understandable, useful
● Find excuses to try it, and sell it!
Lessons learned
Now, let’s hack!
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser and open the Workshop.ipynb file
Let’s hack
Thank you!

Weitere ähnliche Inhalte

Ähnlich wie Customer segmentation scbcn17

Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)Nikhil Dandekar
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning SystemsXavier Amatriain
 
Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments NASIG
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemMarsan Ma
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMProduct School
 
Introduction to Scrum
Introduction to ScrumIntroduction to Scrum
Introduction to ScrumBixlabs
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Aaron Saray
 
Bimbo Final Project Presentation
Bimbo Final Project PresentationBimbo Final Project Presentation
Bimbo Final Project PresentationCan Köklü
 
Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Raul Eulogio
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15MLconf
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConfXavier Amatriain
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systemsXavier Amatriain
 
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang)  - 2014 Boston Data FestivalWinning Data Science Competitions (Owen Zhang)  - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festivalfreshdatabos
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitionsOwen Zhang
 
Software Engineering Primer
Software Engineering PrimerSoftware Engineering Primer
Software Engineering PrimerGeorg Buske
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)Mihnea Giurgea
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 

Ähnlich wie Customer segmentation scbcn17 (20)

Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Clean Code
Clean CodeClean Code
Clean Code
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking system
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PM
 
Introduction to Scrum
Introduction to ScrumIntroduction to Scrum
Introduction to Scrum
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
 
Bimbo Final Project Presentation
Bimbo Final Project PresentationBimbo Final Project Presentation
Bimbo Final Project Presentation
 
Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
Search@flipkart
Search@flipkartSearch@flipkart
Search@flipkart
 
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang)  - 2014 Boston Data FestivalWinning Data Science Competitions (Owen Zhang)  - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
 
Software Engineering Primer
Software Engineering PrimerSoftware Engineering Primer
Software Engineering Primer
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 

Mehr von Julio Martinez

Buscando un trabajo en un pajar
Buscando un trabajo en un pajarBuscando un trabajo en un pajar
Buscando un trabajo en un pajarJulio Martinez
 
Remote working effectively
Remote working effectivelyRemote working effectively
Remote working effectivelyJulio Martinez
 
Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017Julio Martinez
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to DockerJulio Martinez
 
Some OOP paradigms & SOLID
Some OOP paradigms & SOLIDSome OOP paradigms & SOLID
Some OOP paradigms & SOLIDJulio Martinez
 
Introduction to Clean Code
Introduction to Clean CodeIntroduction to Clean Code
Introduction to Clean CodeJulio Martinez
 
Professional development
Professional developmentProfessional development
Professional developmentJulio Martinez
 

Mehr von Julio Martinez (8)

Buscando un trabajo en un pajar
Buscando un trabajo en un pajarBuscando un trabajo en un pajar
Buscando un trabajo en un pajar
 
Remote working effectively
Remote working effectivelyRemote working effectively
Remote working effectively
 
Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
Some OOP paradigms & SOLID
Some OOP paradigms & SOLIDSome OOP paradigms & SOLID
Some OOP paradigms & SOLID
 
Introduction to Clean Code
Introduction to Clean CodeIntroduction to Clean Code
Introduction to Clean Code
 
Professional development
Professional developmentProfessional development
Professional development
 
Code metrics in PHP
Code metrics in PHPCode metrics in PHP
Code metrics in PHP
 

Kürzlich hochgeladen

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Kürzlich hochgeladen (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Customer segmentation scbcn17

  • 1. Customer segmentation an excuse to use Machine Learning ;-)
  • 2.
  • 3. ● Julio Martinez ● Web developer since 2001 ● 2 years working at Ulabox ● Machine Learning hobbyist ● Find me: @liopic Who am I?
  • 4. 1. docker pull jupyter/scipy-notebook 2. git clone git@github.com:ulabox/datasets 3. git clone git@github.com:liopic/scbcn17-customer-segmentation 4. cp datasets/data/*.csv scbcn17-customer-segmentation/ Preparing the workshop
  • 5. My 2017 objective: M.L. ● Motivation ○ It’s the new hot thing ○ AlphaGo beat Lee Sedol, March 2016 ● Some background, but need to learn more
  • 6. 1. Choose the way ○ Coursera’s vs. books vs. workshops vs. posts 2. Find an excuse to apply it ○ @work is better than @home Learning about Machine Learning
  • 7. Customer clusters @work, aka “the excuse” ● There is a non-programmer Business Analysis Department ● Groups of customers based on periodicity + amount spent ○ Example: people that buy once per month, 100€ ticket ○ Useful for business reports ○ Not so useful for UX, CRM ● Groups by behavior? Clustering orders! Boring!
  • 8. 1. With past data -> make a ML model ○ clean data ○ choose a ML algorithm/s ○ tune the algorithm, with testing 2. With new data -> use model to predict (or give new info) ○ deploy pipeline ○ update model 101 Machine Learning: the method
  • 9. ● Supervised ○ data + labels(result) ● Unsupervised ○ just data ● Reinforcement ○ function to optimize 101 Machine Learning: type of problems
  • 10. Supervised learning TRAINING SET cat cat person TEST SET ???
  • 12. ● Try to extract features (information, shapes): similar and different ● Uses: ○ Clustering ○ Anomaly detection (it doesn’t look “normal”) ○ Dimensional reduction ○ Transfer features, projections ... Unsupervised learning
  • 13. ● Use: ○ grouping ○ quantization ● Algorithms: ○ k-means ○ DBSCAN Clustering
  • 14. ● need: how many clusters k-means
  • 15. ● need: how many samples at minimum, tune other params DBSCAN: Density-based spatial clustering of applications with noise
  • 16. So, ready to hack? But wait a moment!
  • 17. ● Data preparation ○ Keep same order of magnitude, usually [0,1] ○ Remove noise ○ Other processes ■ Binarize data, categorical features ● weekday, ex. 4 -> 0, 0, 0, 1, 0, 0, 0 ■ Process missing data Before algorithms: data!
  • 18. ● Explore the data ○ Images are richer than numbers ■ “We get more orders at 22h” vs. ● Ask domain experts ○ Understand normal & border cases ■ The step at 14h is the web cutoff time Before algorithms: data!
  • 19. ● Explore and optimize the data ○ Features that count, feature engineering ○ Avoid the “curse of dimensionality” ● Start small, understandable, useful ● Find excuses to try it, and sell it! Lessons learned
  • 21. 1. docker pull jupyter/scipy-notebook 2. git clone git@github.com:ulabox/datasets 3. git clone git@github.com:liopic/scbcn17-customer-segmentation 4. cp datasets/data/*.csv scbcn17-customer-segmentation/ 5. cd scbcn17-customer-segmentation 6. ./jupyter.sh 7. Open the link in your browser and open the Workshop.ipynb file Let’s hack