Customer segmentation scbcn17

Julio Martinez

Workshop introduction. Software Craftsmanship Conference in Barcelona, October 2017.

Technology

Customer segmentation
an excuse to use Machine Learning ;-)

Who am I?
● Julio Martinez
● Web developer since 2001
● 2 years working at Ulabox
● Machine Learning hobbyist
● Find me: @liopic

Preparing the workshop
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/

My 2017 objective: M.L.
● Motivation
○ It’s the new hot thing
○ AlphaGo beat Lee Sedol, March 2016
● Some background, but need to learn more

Learning about Machine Learning
1. Choose how to learn
○ Coursera courses vs. books vs. workshops vs. blog posts
2. Find an excuse to apply it
○ @work is better than @home

Customer clusters @work, aka “the excuse”
● There is a non-programmer Business Analysis Department
● Groups of customers based on periodicity + amount spent
○ Example: people who buy once a month, with a €100 ticket
○ Useful for business reports
○ Not so useful for UX, CRM
● Groups by behavior? Clustering orders!
Boring!

Machine Learning 101: the method
1. With past data -> build an ML model
○ clean the data
○ choose one or more ML algorithms
○ tune the algorithm, with testing
2. With new data -> use the model to predict (or to give new information)
○ deploy the pipeline
○ update the model

Machine Learning 101: types of problems
● Supervised
○ data + labels (the expected result)
● Unsupervised
○ just data
● Reinforcement
○ a function to optimize

Supervised learning
[Figure: a TRAINING SET of examples labeled “cat”, “cat”, “person”, and a TEST SET with an example whose label is unknown (“???”)]
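
The training set / test set idea can be sketched in a few lines of Python. This is only an illustration, not the workshop's code: the `(weight, height)` features, the data values and the `nearest_neighbor_predict` helper are all assumptions made up for the example. The "model" is a 1-nearest-neighbor rule, chosen because it is about the simplest supervised method there is.

```python
import math

def nearest_neighbor_predict(train_points, train_labels, point):
    """Return the label of the training point closest to `point`."""
    distances = [math.dist(p, point) for p in train_points]
    return train_labels[distances.index(min(distances))]

# Training set: hypothetical (weight_kg, height_cm) measurements WITH labels.
train_points = [(4.0, 25.0), (5.0, 28.0), (70.0, 175.0), (82.0, 180.0)]
train_labels = ["cat", "cat", "person", "person"]

# Test set: examples kept aside; their labels are only used to score the model.
test_points = [(4.5, 26.0), (75.0, 170.0)]
test_labels = ["cat", "person"]

predictions = [
    nearest_neighbor_predict(train_points, train_labels, p) for p in test_points
]
# Fraction of test examples whose predicted label matches the true label.
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

The test labels play no part in prediction; they exist only to measure how well the labels learned from the training set generalize.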

Unsupervised learning
[Figure: a TRAINING SET only. There is NO test set.]

Unsupervised learning
● Try to extract features (information, shapes): what is similar and what is different
● Uses:
○ Clustering
○ Anomaly detection (it doesn’t look “normal”)
○ Dimensionality reduction
○ Transfer features, projections ...

Clustering
● Uses:
○ grouping
○ quantization
● Algorithms:
○ k-means
○ DBSCAN

DBSCAN: Density-based spatial clustering of applications with noise
● Needs: a minimum number of samples, plus other parameters to tune
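
To make the parameters concrete, here is a minimal pure-Python sketch of DBSCAN. It is an illustration under assumptions, not the workshop's implementation: the sample points are invented, and the parameter names `eps` (neighborhood radius) and `min_samples` are borrowed from scikit-learn's `DBSCAN` class, which is what one would normally use in practice.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: one cluster id per point, NOISE (-1) for outliers."""
    def neighbors(i):
        # Indices of all points within `eps` of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # core point: keep growing
                queue.extend(j_neighbors)
    return labels

# Hypothetical 2-D points: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
    (9.0, 1.0),
]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)
```

Note that the number of clusters is never passed in: it falls out of the density parameters, and the isolated point is reported as noise instead of being forced into a group.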

Before algorithms: data!
● Data preparation
○ Keep features on the same order of magnitude, usually scaled to [0, 1]
○ Remove noise
○ Other processes
■ Binarize categorical features
● e.g. weekday 4 -> 0, 0, 0, 1, 0, 0, 0
■ Handle missing data
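
The preparation steps above can be sketched with plain Python. Everything here is an assumption for illustration: the helper names, the sample order totals, the choice to number weekdays 1 to 7 (which matches the slide's example for weekday 4), and filling missing values with the mean, which is only one of several reasonable strategies.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - low) / (high - low) for v in values]

def one_hot_weekday(weekday):
    """Encode a weekday numbered 1..7 as seven binary features."""
    return [1 if day == weekday else 0 for day in range(1, 8)]

def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

# Hypothetical order totals in euros, with one missing value.
totals = [20.0, None, 100.0, 60.0]
filled = fill_missing(totals)    # [20.0, 60.0, 100.0, 60.0]
scaled = min_max_scale(filled)   # [0.0, 0.5, 1.0, 0.5]
print(scaled)
print(one_hot_weekday(4))        # [0, 0, 0, 1, 0, 0, 0]
```

Scaling matters for distance-based clustering: without it, a feature measured in euros would dominate one measured in the range 0 to 1.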

Before algorithms: data!
● Explore the data
○ Images are richer than numbers
■ “We get more orders at 22h” vs. [figure]
● Ask domain experts
○ Understand normal & edge cases
■ The step at 14h is the web cutoff time
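
A quick way to look at the data instead of just quoting a number is a histogram of orders per hour. The sketch below uses invented order hours and a text-only bar chart so that it runs without any plotting library; in a notebook one would more likely draw a real chart.

```python
from collections import Counter

# Hypothetical order timestamps, reduced to the hour of the day.
order_hours = [9, 13, 13, 13, 14, 21, 22, 22, 22, 22, 23]

orders_per_hour = Counter(order_hours)

# One row per hour: the bar length is the number of orders in that hour.
for hour in range(24):
    count = orders_per_hour.get(hour, 0)
    print(f"{hour:02d}h {'#' * count}")

# The single-number summary that the picture replaces.
busiest_hour, busiest_count = orders_per_hour.most_common(1)[0]
print(busiest_hour, busiest_count)
```

The bars show the whole shape of the day, including any sudden steps, which is exactly the kind of detail a domain expert can explain.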

Lessons learned
● Explore and optimize the data
○ Features that count, feature engineering
○ Avoid the “curse of dimensionality”
● Start small, understandable, useful
● Find excuses to try it, and sell it!
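
One way to see the “curse of dimensionality” is that distances lose contrast as features are added: the nearest and the farthest point end up almost equally far away, which hurts distance-based clustering. The sketch below is an illustration on uniformly random points; the `distance_contrast` helper, the number of points and the dimensions compared are all arbitrary choices for the example.

```python
import math
import random

def distance_contrast(dimensions, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random point to the rest."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dimensions)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    distances = [math.dist(query, p) for p in others]
    return (max(distances) - min(distances)) / min(distances)

low_dim = distance_contrast(2)     # few features: neighbors are clearly closer
high_dim = distance_contrast(500)  # many features: everything is about equally far
print(low_dim, high_dim)
```

A large value means "near" and "far" are clearly different; a value close to zero means the notion of a nearest neighbor is barely meaningful, which is one reason to keep only the features that count.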

k-means
● Needs: the number of clusters
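
To show why the number of clusters has to be chosen up front, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is an illustration under assumptions: the `k_means` helper, the iteration count and the sample points are invented for the example, and in practice one would typically reach for a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random

def k_means(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical, already-scaled 2-D feature vectors forming two obvious groups.
points = [(1.0, 1.2), (1.5, 0.8), (0.8, 1.0), (8.0, 9.0), (9.0, 8.5), (8.5, 9.5)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The algorithm always returns exactly `k` groups, whether or not the data really has that many, so picking `k` is part of the modeling work.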

So, ready to hack? But wait a moment!

Now, let’s hack!

Let’s hack
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser and open the Workshop.ipynb file

Thank you!