Machine Learning in Big Data
- Look forward or be left behind
V. William Porto
Hadoop Summit June 2015

  1. Machine Learning in Big Data - Look forward or be left behind
     V. William Porto
     Hadoop Summit June 2015
  2. Machine Learning – keeping ahead of the curve
     Three basic tenets for success in today’s world:
       Prediction - you need to learn and use what you’ve learned
       Optimization - the world is a dynamic place
       Automation - because people don’t scale well
     RedPoint Global Inc. 2015 Confidential
  3. Machine Learning – why bother?
     “If you have always done it that way, it is probably wrong” - Charles Kettering
  4. Machine Learning – what really is it all about?
     Learning vs. instruction
       Humans learn instinctively – computers not so much
     Intelligent Systems
       Memory
       Prediction (modeling)
       Assessment
       Feedback
       Adaptation
  5. Data Modeling – what, why, how
     Regression – what happened in the past
     Prediction – what will happen in the future
     “Prediction is very difficult – especially if it’s about the future” - Niels Bohr
  6. Data Modeling – what, why, how
     Choices, choices - the wide world of data modeling
     Supervised models
       you have historical data and known correlated outputs (truth)
     Unsupervised models
       historical data, but may not have (or trust) associated outputs
  7. Supervised vs. Unsupervised Models
  8. Linear Models
     Major Assumption: the world is linear
     Pros:
       the math is easy!
       fast execution
     Cons:
       the real world isn’t really linear
       not all errors are equal
       easy to generate misleading results
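The "easy to generate misleading results" point can be illustrated with a minimal sketch. It assumes ordinary least squares as the linear model and uses made-up data in which y depends entirely on x, but quadratically; `fit_line`, `xs`, and `ys` are illustrative names, not from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: y is fully determined by x, but not linearly (y = x^2).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
# Residuals show how far the fitted line is from each observation.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print("slope:", slope, "intercept:", intercept)
print("residuals:", residuals)
```

On this data the fitted slope is zero, so the linear model reports no relationship between x and y even though one fully determines the other. The structured residuals are the only hint that the linearity assumption failed.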
  9. Decision Trees
     Major Assumption: the world is discrete
     Pros:
       easy to understand
       fast execution
       no linearity assumptions
     Cons:
       lots of ‘human time’ to create
       bias in unbalanced trees
       some concepts need very large trees
  10. Non-Linear Models
      Major Assumption: data is representative
      Pros:
        ‘universal’ modeling tools
        fast execution
        no linearity assumptions
      Cons:
        lots of parameters, many techniques
        training can be slow
        difficult to explain and understand
      [Figures: Artificial Neural Network; Bayesian Network]
  11. Clustering/Segmentation
      Basic Question – which one describes the data the best?
      [Figure: Raw data]
  12. Clustering/Segmentation – group think
      Collaborative Filtering
      [Figure: Relationship Matrix]
  13. Clustering/Segmentation with Statistics
      Statistical Techniques:
        K-Means
        Vector Quantization
      Pros:
        relatively simple
        statistically-backed results
      Cons:
        assumptions: data distribution
        how many clusters really are there?
      [Figures: K-Means Clustering; Vector Quantization]
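K-Means, as named on the slide above, can be sketched with Lloyd's algorithm in plain Python. The sample points and starting centroids are invented for illustration. Note that the number of clusters is fixed by the caller through the initial centroids, which is the "how many clusters really are there?" concern.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print([len(c) for c in clusters])
```

Passing three starting centroids instead of two would still return an answer, which is why the choice of k needs a separate check against the data.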
  14. Clustering/Segmentation – data driven
      Feature Maps:
      Pros:
        lets data speak for itself
        useful boundary relationships
      Cons:
        slow to train
      [Figure: Customer Demographics]
  15. Model Selection – how to choose?
      Basic Model Type (prediction or segmentation)
        inputs + correlated outputs
        inputs only?
      Basic Questions:
        which one to use for my problem?
        parameters?
        is this the best choice?
        could I do better, and how?
  16. Optimization – making the best choices
      Standard (old-school) Techniques:
        PCA, Partial Least Squares, etc.
      Pros:
        because the math is easy!
      Cons:
        lots of (usually incorrect) assumptions
        new data = start from scratch
  17. Optimization – is that the only way?
  18. Optimization – Evolving better solutions
      Simulated Evolution
      Pros:
        fast, efficient search
        always have a solution
        arbitrary ‘evaluation’ functions
        can start with existing solution(s)
      Cons:
        CPU time + memory – but that’s why we have distributed processing!
  19. Optimization – Evolving Models
      What does a ‘solution’ look like?
        model type
        parameters
        data (training + testing)
      Variation – alter model type, parameters
      Assessment – how well does the model work?
      Selection – survival of the fittest
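The variation, assessment, and selection steps on the slide above can be sketched as a small evolutionary loop. This is a simplified illustration, not the speaker's implementation. It assumes a "solution" is just a list of numeric parameters (no model type), Gaussian mutation for variation, and a keep-the-best-of-parents-and-offspring selection rule. `evolve`, `sphere`, and every parameter name are hypothetical.

```python
import random


def evolve(evaluate, seed_population, generations=100,
           offspring_per_parent=2, sigma=0.3, rng=None):
    """Minimize `evaluate` by repeated variation, assessment, and selection."""
    rng = rng or random.Random(0)
    population = [list(p) for p in seed_population]
    mu = len(population)
    for _ in range(generations):
        # Variation: each parent yields offspring with perturbed parameters.
        offspring = [[gene + rng.gauss(0.0, sigma) for gene in parent]
                     for parent in population
                     for _ in range(offspring_per_parent)]
        # Assessment and selection: score everyone, keep the mu fittest.
        population = sorted(population + offspring, key=evaluate)[:mu]
    return population[0]


def sphere(params):
    """Stand-in evaluation function; any callable returning a score works."""
    return sum(p * p for p in params)


rng = random.Random(42)
# Seed population: could equally be existing solutions to improve on.
seeds = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(10)]
initial_best = min(sphere(s) for s in seeds)
best = evolve(sphere, seeds, rng=rng)
print(initial_best, sphere(best))
```

Because parents compete with their offspring, the best score never gets worse between generations, which matches the "always have a solution" point. The evaluation function is passed in as an argument, so it can be arbitrary.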
  20. Evolutionary Optimization in a Hadoop Environment
      Challenges:
        data partitioning
        distributed computation
        communication
      MapReduce
  21. Optimization in a Hadoop Environment – what really works
      MapReduce:
        algorithmic task partitioning
        iterative tasks vs. fully compartmented tasks
        aggregation – distribution tasks
        communication / synchronization costs
  22. ML in a Hadoop Environment – Single Algorithm Architecture
      Multi-Core Machine (per Chu and Kim, et al. 2006, Stanford NLPG)
      [Diagram labels: ML Algorithm Engine, Master, Mapper (x4), Data, Reducer; flows: input, map (split data), intermediate data, reduce, query info, result]
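The mapper and reducer roles in the architecture above can be sketched on a single machine. As an assumption for illustration, the algorithm is one-variable least-squares regression: each mapper computes partial sums over its data split, the reducer adds them up, and the master solves for the coefficients. Python's built-in `map` stands in for real distributed mappers, and all function names are hypothetical.

```python
def mapper(split):
    """Compute partial sums (sufficient statistics) for one data split."""
    n = len(split)
    sx = sum(x for x, _ in split)
    sy = sum(y for _, y in split)
    sxx = sum(x * x for x, _ in split)
    sxy = sum(x * y for x, y in split)
    return (n, sx, sy, sxx, sxy)


def reducer(partials):
    """Aggregate the mappers' partial sums element by element."""
    return tuple(sum(column) for column in zip(*partials))


def solve(stats):
    """Master step: closed-form least squares from the aggregated sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data on the line y = 2x + 1, split four ways.
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
splits = [data[i::4] for i in range(4)]

slope, intercept = solve(reducer(map(mapper, splits)))
print(slope, intercept)
```

Only five numbers per split cross the mapper/reducer boundary, regardless of how many rows each split holds, so communication cost stays small for algorithms that reduce to sums like this.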
  23. Machine Learning in a Hadoop Environment
      ML Algorithms:
        Locally Weighted Linear Regression
        K-Means
        Nearest Neighbor (KNN)
        Feed-forward Multi-layer Neural Network (MLP)
        Principal Component Analysis (PCA)
        Support Vector Machine (SVM)
  24. Machine Learning in a Hadoop Environment – example
      Hadoop Multi-Core Tests (per Chu and Kim, et al. 2006, Stanford NLPG)
      [Plot: speed increase vs. # processors]
  25. ML in a Hadoop Environment – Evolutionary Optimization Architecture
      [Diagram labels: Initial (seed) Population, Coordinator, Master (Variation), Offspring Partition (repeated), Map (repeated), Reducer (repeated), Nth generation solutions; stages: map stage (evaluation), 1st reduction stage (local selection), 2nd reduction stage (global selection)]
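The three stages named in the diagram above (evaluation in the map stage, local selection, global selection) can be sketched for one generation as follows. This is an illustration under assumptions, not the architecture's actual code: candidates are plain numbers, lower fitness is better, and the partitions are processed sequentially. All names are hypothetical.

```python
import heapq


def evaluate_partition(partition, fitness):
    """Map stage: score every candidate in one offspring partition."""
    return [(fitness(candidate), candidate) for candidate in partition]


def local_selection(scored, k):
    """1st reduction stage: keep the k best candidates of one partition."""
    return heapq.nsmallest(k, scored)


def global_selection(local_winners, k):
    """2nd reduction stage: keep the k best across all partitions."""
    merged = [item for winners in local_winners for item in winners]
    return [candidate for _, candidate in heapq.nsmallest(k, merged)]


def fitness(x):
    """Stand-in evaluation: distance from a hypothetical target of 3.0."""
    return abs(x - 3.0)


# Hypothetical offspring, already split into three partitions.
partitions = [[0.0, 2.5, 9.0], [3.2, 7.0, -1.0], [4.0, 2.9, 6.0]]

local = [local_selection(evaluate_partition(p, fitness), k=2)
         for p in partitions]
survivors = global_selection(local, k=2)
print(survivors)
```

When each partition keeps at least as many candidates as the global stage will keep, the two-stage result equals a single global selection, while only the local winners have to be sent to the final reducer.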
  26. Machine Learning – Hadoop, MPI, GPU?
      Analyze the algorithmic bottlenecks
      Use Hadoop / MapReduce if:
        large number of features
        relatively few inter-process communication steps
        e.g., on-line training
      Use MPI, GPUs if:
        large number of training samples
        e.g., batch training
  27. Optimization – Don’t Stop Now
      Adaptation
        update models regularly
        drop old data, retrain
      Model with different time scales
        daily, weekly, seasonal, yearly, multi-year
      Automate the process!
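The "drop old data, retrain" and "different time scales" points above can be sketched with fixed-size sliding windows. As a deliberate simplification, the "model" here is only the mean of the retained observations. The window lengths (7 and 90 days), the sales figures, and the class name are assumptions for illustration.

```python
from collections import deque


class WindowedMean:
    """Toy model: predicts the mean of the most recent `window` values."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest value when full.
        self.history = deque(maxlen=window)
        self.prediction = None

    def update(self, value):
        """Add one observation and retrain on the retained window."""
        self.history.append(value)
        self.prediction = sum(self.history) / len(self.history)
        return self.prediction


# One model per time scale, all fed from the same stream.
models = {"weekly": WindowedMean(7), "seasonal": WindowedMean(90)}

# Hypothetical daily values: a stable level, then a jump in the last week.
daily_values = [10.0] * 90 + [20.0] * 7
for value in daily_values:
    for model in models.values():
        model.update(value)

print(models["weekly"].prediction, models["seasonal"].prediction)
```

The short window tracks the recent jump almost immediately, while the long window still mostly reflects the older level. Running the update loop on a schedule is one way to read "automate the process".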
  28. A Word about RedPoint Global
      Launched 2006
      Founded and staffed by industry veterans
      Headquarters: Wellesley, Massachusetts
      Offices in US, UK, Australia, Philippines
      Global customer base
      Serves most major industries
      [Figures: MAGIC QUADRANT Data Quality; MAGIC QUADRANT Multichannel Campaign Management; MAGIC QUADRANT Integrated Marketing Management]
  29. Time for Q&A
      For more information contact:
      Bill Porto
      RedPoint Global Inc.
      36 Washington St., Suite 120
      Wellesley Hills, MA 02481
      vwporto@redpoint.net