Customer segmentation scbcn17
Workshop introduction. Software Craftsmanship Conference in Barcelona, October 2017.


  1. Customer segmentation: an excuse to use Machine Learning ;-)
  2. Who am I?
     ● Julio Martinez
     ● Web developer since 2001
     ● 2 years working at Ulabox
     ● Machine Learning hobbyist
     ● Find me: @liopic
  3. Preparing the workshop
     1. docker pull jupyter/scipy-notebook
     2. git clone git@github.com:ulabox/datasets
     3. git clone git@github.com:liopic/scbcn17-customer-segmentation
     4. cp datasets/data/*.csv scbcn17-customer-segmentation/
  4. My 2017 objective: M.L.
     ● Motivation
        ○ It’s the new hot thing
        ○ AlphaGo beat Lee Sedol, March 2016
     ● Some background, but need to learn more
  5. Learning about Machine Learning
     1. Choose the way
        ○ Coursera courses vs. books vs. workshops vs. posts
     2. Find an excuse to apply it
        ○ @work is better than @home
  6. Customer clusters @work, aka “the excuse”
     ● There is a non-programmer Business Analysis Department
     ● Groups of customers based on periodicity + amount spent
        ○ Example: people who buy once per month, with a 100€ ticket
        ○ Useful for business reports
        ○ Not so useful for UX, CRM
     ● Groups by behavior? Clustering orders! Boring!
  7. Machine Learning 101: the method
     1. With past data -> make an ML model
        ○ clean data
        ○ choose one or more ML algorithms
        ○ tune the algorithm, with testing
     2. With new data -> use the model to predict (or give new info)
        ○ deploy pipeline
        ○ update model
  8. Machine Learning 101: types of problems
     ● Supervised
        ○ data + labels (result)
     ● Unsupervised
        ○ just data
     ● Reinforcement
        ○ function to optimize
  9. Supervised learning
     [Figure: a TRAINING SET of labeled examples (cat, cat, person) and a TEST SET whose labels are unknown (???)]
  10. Unsupervised learning
     [Figure: an unlabeled TRAINING SET and TEST SET]
     There is NO test
  11. Unsupervised learning
     ● Try to extract features (information, shapes): what is similar and what is different
     ● Uses:
        ○ Clustering
        ○ Anomaly detection (it doesn’t look “normal”)
        ○ Dimensionality reduction
        ○ Transfer features, projections ...
  12. Clustering
     ● Uses:
        ○ grouping
        ○ quantization
     ● Algorithms:
        ○ k-means
        ○ DBSCAN
  13. k-means
     ● needs: the number of clusters (k)
  14. DBSCAN: Density-Based Spatial Clustering of Applications with Noise
     ● needs: the minimum number of samples to form a dense region; other parameters (such as the neighborhood radius) need tuning
  15. So, ready to hack? But wait a moment!
  16. Before algorithms: data!
     ● Data preparation
        ○ Scale features to the same order of magnitude, usually [0, 1]
        ○ Remove noise
        ○ Other processes
           ■ Binarize categorical features (one-hot encoding)
              ● weekday, e.g. 4 -> 0, 0, 0, 1, 0, 0, 0
           ■ Handle missing data
  17. Before algorithms: data!
     ● Explore the data
        ○ Images are richer than numbers
           ■ “We get more orders at 22h” vs. [figure]
     ● Ask domain experts
        ○ Understand normal & edge cases
           ■ The step at 14h is the web cutoff time
  18. Lessons learned
     ● Explore and optimize the data
        ○ Features that count, feature engineering
        ○ Avoid the “curse of dimensionality”
     ● Start small, understandable, useful
     ● Find excuses to try it, and sell it!
  19. Now, let’s hack!
  20. Let’s hack
     1. docker pull jupyter/scipy-notebook
     2. git clone git@github.com:ulabox/datasets
     3. git clone git@github.com:liopic/scbcn17-customer-segmentation
     4. cp datasets/data/*.csv scbcn17-customer-segmentation/
     5. cd scbcn17-customer-segmentation
     6. ./jupyter.sh
     7. Open the link in your browser and open the Workshop.ipynb file
  21. Thank you!
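Slide 6 describes the existing customer groups, which are built from periodicity and amount spent. The sketch below shows one way those two features could be computed per customer. It assumes a hypothetical list of (customer_id, order_date, total) tuples, not the real columns of the ulabox/datasets CSVs. "Orders per active month" is likewise only one possible measure of periodicity.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: (customer_id, order_date, order_total_in_euros).
EXAMPLE_ORDERS = [
    ("a", date(2017, 1, 5), 90.0),
    ("a", date(2017, 2, 3), 100.0),
    ("a", date(2017, 3, 10), 110.0),
    ("b", date(2017, 1, 2), 40.0),
    ("b", date(2017, 1, 20), 60.0),
]


def customer_features(orders):
    """Map each customer to (orders per active month, average ticket)."""
    by_customer = defaultdict(list)
    for customer_id, order_date, total in orders:
        by_customer[customer_id].append((order_date, total))

    features = {}
    for customer_id, rows in by_customer.items():
        dates = [d for d, _ in rows]
        totals = [t for _, t in rows]
        first, last = min(dates), max(dates)
        # Calendar months from first to last order, counting both ends.
        months = (last.year - first.year) * 12 + (last.month - first.month) + 1
        features[customer_id] = (len(rows) / months, sum(totals) / len(totals))
    return features
```

In this sample, customer "a" comes out as roughly one order per month with a 100€ ticket, which is the kind of group the slide describes.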
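Slide 7 builds a model from past data and tunes it "with testing". A common way to test is to hold back part of the past data and score predictions on it. The helpers below are a stdlib-only sketch of that idea. `holdout_split` and `accuracy` are hypothetical names, not the workshop's code.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

In practice a separate validation split is often used for tuning, so that the final test score stays an unbiased estimate.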
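Slide 9 illustrates supervised learning: the training examples are labeled and the test examples are not. To make that concrete, here is a nearest-centroid classifier. This deliberately simple technique is not mentioned in the slides, and it runs on made-up 2-D feature vectors rather than images.

```python
from math import dist


def fit_centroids(points, labels):
    """Average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_label(centroids, point):
    """Return the label whose centroid is closest (Euclidean) to `point`."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Usage with hypothetical features: the training set is labeled, the test point is not.
centroids = fit_centroids([(0, 0), (0, 1), (10, 10)], ["cat", "cat", "person"])
```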
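Slide 11 lists anomaly detection ("it doesn’t look normal") among the uses of unsupervised learning. A very simple stand-in flags values that lie far from the mean, measured in standard deviations. This z-score rule is not taken from the slides, and the sketch assumes a single numeric feature per sample.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

No labels are needed here: the rule only looks at how the data is distributed, which is what makes it unsupervised.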
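Slide 13 notes that k-means needs the number of clusters up front. Below is a minimal, unoptimized sketch of the standard Lloyd's algorithm, with `k` as that required input. A library implementation would normally be used in practice. The `iterations` and `seed` parameters are illustrative choices.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for `k` clusters."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, labels
```

Choosing a different `k` for the same data yields a different grouping, which is why the slide lists it as the algorithm's key input.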
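Slide 14 says DBSCAN needs a minimum number of samples, plus other parameters that must be tuned. The sketch below names them `eps` (the neighborhood radius) and `min_samples`, mirroring scikit-learn's `DBSCAN` parameters. As in scikit-learn, `min_samples` counts the point itself, and points that belong to no dense region get the label -1 (noise). It is a plain O(n²) illustration, not an efficient implementation.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within `eps` of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep growing the cluster
        cluster += 1
    return labels
```

Unlike k-means, the number of clusters is not an input here. It emerges from `eps` and `min_samples`, and outliers are labeled as noise instead of being forced into a cluster.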
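Slide 16 lists the data-preparation steps. Here is a stdlib-only sketch of three of them:

- min-max scaling into [0, 1];
- one-hot encoding a weekday, assuming weekdays are numbered 1 to 7, which matches the slide's example 4 -> 0, 0, 0, 1, 0, 0, 0;
- filling missing values with the mean, which is only one of several possible strategies.

```python
from statistics import mean


def min_max_scale(values):
    """Rescale a list of numbers linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_weekday(weekday):
    """Encode a weekday numbered 1..7 as seven 0/1 flags."""
    if not 1 <= weekday <= 7:
        raise ValueError("weekday must be between 1 and 7")
    return [1 if day == weekday else 0 for day in range(1, 8)]


def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]
```

For instance, `min_max_scale([10, 20, 30])` returns `[0.0, 0.5, 1.0]`. Scaling matters for distance-based algorithms such as k-means and DBSCAN, where a feature with a large range would otherwise dominate.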
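Slide 17 argues that a picture of the data beats a single sentence about it. As a text-only stand-in for a plot, a per-hour bar chart makes peaks and steps easy to spot. The sketch assumes the order hours are integers from 0 to 23. The function name and the `width` parameter are hypothetical.

```python
from collections import Counter


def hourly_histogram(order_hours, width=40):
    """Render a text bar chart of how many orders were placed at each hour (0-23)."""
    if not order_hours:
        raise ValueError("need at least one order")
    counts = Counter(order_hours)  # hours outside 0-23 are counted but not drawn
    peak = max(counts.values())
    lines = []
    for hour in range(24):
        n = counts.get(hour, 0)
        bar = "#" * round(width * n / peak)  # longest bar is `width` characters
        lines.append(f"{hour:02d}h {bar} {n}")
    return "\n".join(lines)
```

In a notebook, the same counts would usually feed a real plot. The point is the same either way: the shape of the whole day shows more than "we get more orders at 22h".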
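Slide 18 warns about the “curse of dimensionality”. One symptom is that distances stop discriminating: in many dimensions, the nearest and farthest neighbors of a point end up almost equally far away. This illustrative experiment, not taken from the slides, measures that ratio for points drawn uniformly from the unit hypercube.

```python
import random
from math import dist


def distance_contrast(dimensions, n_points=200, seed=0):
    """Ratio of farthest to nearest distance from a random query point."""
    rng = random.Random(seed)

    def random_point():
        # Uniform sample from the unit hypercube [0, 1]^dimensions.
        return [rng.random() for _ in range(dimensions)]

    query = random_point()
    distances = [dist(query, random_point()) for _ in range(n_points)]
    return max(distances) / min(distances)
```

With 2 dimensions the ratio is large, while with 1000 dimensions it stays close to 1. Adding many weak features can therefore hurt distance-based clustering, which is why feature selection and engineering matter.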
