Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyzing - StampedeCon 2016

The collection and use of Big Data has become an important part of modern business practice. The Internet of Things (IoT) movement promises to provide new opportunities for businesses interested in the intersection of people and technology. It is also fraught with pitfalls for practitioners and researchers who struggle to make sense of an increasing cacophony of signals. How should they poll and collect data from millions of signals in a way that is manageable, scalable, and statistically valid? How should they analyze and predict using these data? This presentation will discuss these challenges with applied examples from monitoring and managing one of the world’s largest computers.

Published in: Technology

  1. Things we will cover (presented by Ryan Kirk at StampedeCon 2016, 26-Jul-16)
     GOAL: Explain Cloud IoT, its challenges, and a principled, agile approach to prediction amidst uncertainty in such a way that people from a broad audience can (hopefully) relate.
     WILL
     ►  IoT, Cloud landscape, and CTL
     ►  Prediction Lifecycle
     ►  Challenges by business domain
     ►  Data Science Lessons Learned
     WILL NOT
     ►  Big Data
     ►  Architecture
     ►  Algorithms
     ►  Technology
  2. WHO WE ARE
  3. Who I am
     I am interested in creating intelligent systems through incorporating humans and machines in an active learning loop.
     ►  Decision Scientist with PhD in HCI from Iowa State
     ►  Principal Data Scientist for CenturyLink Cloud
     ►  Curricular Design, Educational Technology, Online Advertising, Online Retail, Big Data UX, Cloud, IoT, Physics
     ►  Hiking, Data journalism, Stocks, Horse Racing
     ryankirk.info
  4. Who we are: CenturyLink Cloud
     CLOUD + COLOCATION + NETWORK + MANAGED SERVICES
  5. What is IoT
     Human desire to connect ourselves to each other via technology
     ►  Modern plumbing…
     ►  Telegraph → Telephone
     ►  Telephone → Dial-up
     ►  Dial-up → HSN
     ►  HSN → WAN
     ►  WAN → IoT
     Human desire to connect ourselves to each other via technology to empower each other
  6. Internet growth > Hardware growth
     motherboard.vice.com, newscientist.com
  7. CenturyLink Cloud IoT Advantage
     ►  37 states
     ►  550,000 miles of network
     ►  Innovative Gigabit fiber network
     ►  25MM+ consumer endpoints
     ►  60+ DCS
  8. PROBLEM STATEMENT
  9. Problem statement:
     ►  Prevent incidents through early detection
     ►  Reduce MTTR by facilitating root-cause analytics
     ►  Facilitate domain experts and harvest their knowledge
  10. GOAL
      Build a real-time artificial intelligence capable of analyzing all incoming streams of data in order to know which actions our machines need to automatically take.
      It’s simple, really… build Skynet
  11. PREDICTION LANDSCAPE
  12. Prediction Adoption Model
      [Chart: SOPHISTICATION over TIME, with phases INTRO, GROWTH, MATURITY, DECLINE]
      Stage I: INTRODUCTION
        1. Design
        2. Measure
      Stage II: GROWTH
        3. Describe
        4. Detect
      Stage III: MATURITY
        5. Predict
        6. Act
      Stage IV: DECLINE
        7. Feedback
        8. Obsolescence
  13. Prediction Adoption Model (actual)
      [Chart: SOPHISTICATION over TIME, with phases CHECK THIS OUT; OH NO, OH NO, OH NO!; HAHA, IT WORKED!; I NEVER SAID IT WOULD …]
      Stage I: CHECK THIS OUT
        1. It runs
        2. Results are promising
      Stage II: OH NO, OH NO, OH NO!
        3. It works but it’s terrible
        4. It will never scale
      Stage III: HAHA, IT WORKED!
        5. I surprise myself sometimes
        6. I found a shortcut to scale it
      Stage IV: I NEVER SAID IT WOULD…
        7. How do I prove it is still working?
        8. There is no way to apply it to this scenario
  14. Stage I: INTRODUCTION
      1. Design
      ►  What should we measure?
      ►  What are the core business processes?
      ►  What is the unit of analysis?
      ►  What are our research questions/hypotheses?
      2. Measure
      ►  Do we push or pull?
      ►  How often should we measure?
      ►  How long do we need the data?
      ►  How do we represent the data schema?
  15. Stage II: GROWTH
      3. Describe
      ►  Which metrics relate to our outcomes of interest?
      ►  What is the typical value of each metric?
      ►  How do you visualize each metric?
      4. Detect
      ►  What do we expect to happen?
      ►  Which values/events are unexpected?
      ►  When should we alert?
      ►  How will we scale our analysis?
  16. Stage III: MATURITY
      5. Predict
      ►  Are there patterns?
      ►  Are there more complex relationships?
      ►  What is going to happen?
      ►  How do we get training data?
      6. Act
      ►  What actions should we take?
      ►  How can we incorporate new outcomes into the current model?
  17. Stage IV: DECLINE
      7. Feedback
      ►  Is my model primarily basing its decisions upon its previous decisions?
      ►  Can I separate the model from its parameters?
      ►  Can I still evaluate accuracy?
      8. Obsolescence
      ►  Are my business scenarios still grounded?
      ►  Do my model assumptions still hold?
      ►  Does it still scale?
      ►  Is the intervention still needed?
  18. Domain process involvement
      BUSINESS
      ►  Is involved early in defining requirements
      RESEARCH
      ►  Builds prototype and suggests solution
      ENGINEERING
      ►  Builds MVP
      ►  Solidifies solution
  19. SOLUTION
  20. Working backwards
      ITEM
       1  Skynet
       2  Action mapping
       3  Action landscape
       4  Prediction
       5  Categorical learning
       6  Training Data
       7  Feedback loop
       8  High SNR
       9  Unsupervised learning
      10  Anomaly Detection
      11  Normalization
      12  Retention
      13  Sampling
      14  Collection
      15  Approach
      16  Domain model
      “In life, unless you’re more gifted than Einstein, inversion [i.e. working backwards] will help you solve problems.” Charlie Munger
  21. Working backwards (cont.)
      ITEM                        STAGE
       1  Skynet                  ACT
       2  Action mapping          ACT
       3  Action landscape        ACT
       4  Prediction              PREDICT
       5  Categorical learning    PREDICT
       6  Training Data           PREDICT
       7  Feedback loop           PREDICT
       8  High SNR                DETECT
       9  Unsupervised learning   DETECT
      10  Anomaly Detection       DETECT
      11  Normalization           DESCRIBE
      12  Retention               DESCRIBE
      13  Sampling                MEASURE
      14  Collection              MEASURE
      15  Approach                DESIGN
      16  Domain model            DESIGN
      [Chart: SOPHISTICATION over TIME, with phases INTRO, GROWTH, MATURITY, DECLINE]
  22. Working backwards (cont.)
      ITEM                        STAGE      PRIMARY DOMAIN
       1  Skynet                  ACT        ENGINEERING
       2  Action mapping          ACT        BUSINESS
       3  Action landscape        ACT        RESEARCH
       4  Prediction              PREDICT    RESEARCH
       5  Categorical learning    PREDICT    RESEARCH
       6  Training Data           PREDICT    ENGINEERING
       7  Feedback loop           PREDICT    BUSINESS
       8  High SNR                DETECT     RESEARCH
       9  Unsupervised learning   DETECT     RESEARCH
      10  Anomaly Detection       DETECT     RESEARCH
      11  Normalization           DESCRIBE   RESEARCH
      12  Retention               DESCRIBE   ENGINEERING
      13  Sampling                MEASURE    RESEARCH
      14  Collection              MEASURE    ENGINEERING
      15  Approach                DESIGN     RESEARCH
      16  Domain model            DESIGN     BUSINESS
  23. This is a WIP
      ITEM                        STAGE      PRIMARY DOMAIN
       1  Skynet                  ACT        ENGINEERING
       2  Action mapping          ACT        BUSINESS
       3  Action landscape        ACT        RESEARCH
       4  Prediction              PREDICT    RESEARCH
       5  Categorical learning    PREDICT    RESEARCH
       6  Training Data           PREDICT    ENGINEERING
       7  Feedback loop           PREDICT    BUSINESS
       8  High SNR                DETECT     RESEARCH
       9  Unsupervised learning   DETECT     RESEARCH
      10  Anomaly Detection       DETECT     RESEARCH
      11  Normalization           DESCRIBE   RESEARCH
      12  Sampling                MEASURE    RESEARCH
      13  Collection              MEASURE    ENGINEERING
      14  Domain model            DESIGN     BUSINESS
      Legend: QUEUED (StampedeCon 2017?), WORKING, PRODUCTION
  24. LESSONS LEARNED
  25. Item 16: DOMAIN MODEL (stage: DESIGN)
      ►  938,076 metrics
      ►  Verify the unique stream of data across systems
      ►  Key-based
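One way to read "key-based" and "verify the unique stream of data across systems" is to give every stream a composite key and check that no key is reported twice. The sketch below is only an illustration of that idea: the key fields (`datacenter`, `host`, `metric`) and the function names are hypothetical and do not come from the talk.

```python
from collections import Counter


def stream_key(datacenter: str, host: str, metric: str) -> str:
    """Build a composite key that identifies one stream of data (hypothetical fields)."""
    return f"{datacenter}/{host}/{metric}".lower()


def duplicate_keys(keys):
    """Return the keys that appear more than once, i.e. streams that are not unique."""
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


streams = [
    stream_key("dc1", "web-01", "cpu.load"),
    stream_key("dc1", "web-01", "mem.used"),
    stream_key("DC1", "web-01", "cpu.load"),  # same stream reported by a second system
]

print(duplicate_keys(streams))
```

Normalizing case inside the key is a design choice for this sketch; a real system would need its own rules for deciding when two names refer to the same stream.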
  26. Item 15: APPROACH (stage: DESIGN, cont.)
      VARIABILITY
      ►  Changes in observed state
      ►  Plan for variability
      UNCERTAINTY
      ►  Unobserved state(s)
      ►  Design for uncertainty
  27. Item 14: COLLECTION (stage: MEASURE)
      ►  Agreement of signals
      ►  Cacophony of signals
      ►  How often should we measure?
      ►  We have no labeled training data
      ►  An approach we can build upon in the future
  28. Item 13: SAMPLING (stage: MEASURE, cont.)
      Shannon-Nyquist Paradox
      ►  The more you measure something the more it varies
      ►  Bias related to time and variability
      ►  E.g., temperature yesterday was 68 degrees
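The sampling point can be illustrated with a toy signal. The sketch below assumes a hypothetical daily temperature cycle (68 degrees on average with a 10 degree swing, numbers chosen only to echo the slide's example) and compares one reading per day with one reading per hour.

```python
import math
from statistics import mean, pstdev


def temperature(hour: float) -> float:
    """Hypothetical daily cycle: 68 degrees on average, swinging by 10 degrees."""
    return 68.0 + 10.0 * math.sin(2 * math.pi * hour / 24.0)


def sample(step_hours: float, days: int = 7):
    """Sample the signal every `step_hours` hours over the given number of days."""
    count = int(days * 24 / step_hours)
    return [temperature(i * step_hours) for i in range(count)]


daily = sample(24)   # one reading per day, always at the same time of day
hourly = sample(1)   # one reading per hour

print("daily  mean / spread:", mean(daily), pstdev(daily))
print("hourly mean / spread:", mean(hourly), pstdev(hourly))
```

Both series agree on the mean, but only the hourly series shows any spread. The once-a-day reading would also be biased if it were taken at a different time of day, which is one way to read "bias related to time and variability".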
  29. Item 12: RETENTION (stage: DESCRIBE)
      ►  Recall that precision relates to sampling consistency
      ►  Not all metrics are created equal
      ►  Coverage remains problematic
  30. Item 11: NORMALIZATION (stage: DESCRIBE, cont.)
      Simpson’s Paradox
      ►  aggregate trend != sum of individual trends
      ►  Applies to all aggregates: sums, averages, correlations, etc.
      ►  What is the unit of analysis?
      Kievit, R.A., Frankenhuis, et al. (2013). Simpson’s paradox in psychological science. Frontiers in Psychology
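A small numeric example makes Simpson's paradox concrete. The data below is invented for illustration: within each of two hypothetical groups the trend is downward, yet the pooled data trends upward.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return covariance / variance


# Two hypothetical groups (for example, two classes of server); made-up numbers.
group_a = [(1, 5), (2, 4), (3, 3)]
group_b = [(6, 10), (7, 9), (8, 8)]
pooled = group_a + group_b

print("group A slope:", slope(group_a))
print("group B slope:", slope(group_b))
print("pooled slope: ", slope(pooled))
```

The sign of the pooled slope is opposite to the sign of both group slopes, which is why the choice of unit of analysis matters before aggregating.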
  31. Item 10: ANOMALY DETECTION (stage: DETECT)
      [Chart: Predicted and Actual values with Boundary]
      ►  Capture the time series data for each piece of connected platform technology
      ►  Find implicit anomalies within a time series vector
      ►  Values that are surprising
      ►  Highly scalable
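The slide does not say which detection method is used. As a stand-in, the sketch below flags "surprising" values with a simple rolling rule: the prediction is the mean of the previous few points, and the boundary is a multiple of their standard deviation. The window size, the multiplier `k`, and the sample series are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(values, window=5, k=3.0):
    """Return indexes whose value lies outside predicted +/- k standard deviations.

    The prediction is the mean of the preceding `window` values (an assumed,
    deliberately simple model; not the method described in the talk).
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        predicted = mean(history)
        boundary = k * pstdev(history)
        if abs(values[i] - predicted) > boundary:
            anomalies.append(i)
    return anomalies


series = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10, 12, 11]
print(find_anomalies(series))
```

Each point needs only a fixed-size window of history, so a rule of this kind can be applied independently to every metric, which is one way an approach like this could scale.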
  32. Item 9: UNSUPERVISED LEARNING and Item 8: HIGH SNR (stage: DETECT, cont.)
      ►  Time series data shows the context behind anomalies that co-occur
      ►  Group anomalous vectors based upon structural properties and co-occurrence
      ►  Up-level anomalies into higher-order alerts using contextual information
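The grouping step could look roughly like the sketch below, which handles only co-occurrence in time and ignores the structural properties mentioned on the slide. The 60-second window, the two-metric threshold, and the metric names are hypothetical.

```python
def group_into_alerts(anomalies, window_seconds=60, min_metrics=2):
    """Group (timestamp, metric) anomalies that occur close together in time.

    A group is promoted to a higher-order alert when it spans at least
    `min_metrics` distinct metrics. Both thresholds are illustrative assumptions.
    """
    alerts = []
    current = []
    for timestamp, metric in sorted(anomalies):
        # Start a new group once we move past the window opened by the first member.
        if current and timestamp - current[0][0] > window_seconds:
            if len({m for _, m in current}) >= min_metrics:
                alerts.append(current)
            current = []
        current.append((timestamp, metric))
    if len({m for _, m in current}) >= min_metrics:
        alerts.append(current)
    return alerts


anomalies = [
    (100, "cpu"), (110, "disk.io"), (130, "net.errors"),  # co-occurring burst
    (900, "cpu"),                                         # isolated anomaly
    (2000, "mem"), (2030, "swap"),                        # second burst
]

for alert in group_into_alerts(anomalies):
    print(alert)
```

Isolated anomalies are dropped rather than alerted on, which is one simple way to raise the signal-to-noise ratio of what reaches a human.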
  33. Item 7: FEEDBACK LOOP (stage: PREDICT)
      ►  We have also built a search engine for time series data that allows us to build cool looking graphs in real-time
      ►  We basically do all of this to empower Slack alerts
      ►  Allows tags to propagate forwards
  34. Item 6: TRAINING DATA (stage: PREDICT, cont.)
      ►  Evaluate ALL assumptions in regards to training data
      ►  Ideally use active learning approach or risk becoming tautological
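Active learning can take many forms. One common variant, uncertainty sampling, asks a human to label the cases the model is least sure about, so that the training data is not just an echo of the model's own earlier decisions. The sketch below assumes the model emits a probability per item; the item names and scores are made up.

```python
def most_uncertain(scores, n=2):
    """Pick the n items whose predicted probability is closest to 0.5."""
    return sorted(scores, key=lambda item: abs(scores[item] - 0.5))[:n]


# Hypothetical model outputs: probability that each alert is a real incident.
scores = {"alert-a": 0.97, "alert-b": 0.52, "alert-c": 0.08, "alert-d": 0.41}

# These are the items a domain expert would be asked to label next.
print(most_uncertain(scores))
```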
  35. RESULTS
  36. Prediction Results
      ►  38,392,438 predictions every 24hr.
      ►  Anomaly rate < 0.01% (0.0001), ~3K anomalies/day
      ►  Accuracy is ~90%
      ►  Prediction latency ~3.0 seconds
      ►  ~30 Higher order alerts/day
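The figures on the results slide can be sanity-checked with a little arithmetic. The constants below are taken from the slide; the derived quantities are simple back-of-the-envelope calculations, not numbers reported in the talk.

```python
PREDICTIONS_PER_DAY = 38_392_438
ANOMALY_RATE_UPPER_BOUND = 0.0001   # "< 0.01%"
ANOMALIES_PER_DAY = 3_000           # "~3K"
ALERTS_PER_DAY = 30                 # "~30"
SECONDS_PER_DAY = 24 * 60 * 60

predictions_per_second = PREDICTIONS_PER_DAY / SECONDS_PER_DAY
max_anomalies_per_day = PREDICTIONS_PER_DAY * ANOMALY_RATE_UPPER_BOUND
anomalies_per_alert = ANOMALIES_PER_DAY / ALERTS_PER_DAY

print(f"predictions per second: {predictions_per_second:.0f}")
print(f"anomalies per day at the 0.01% bound: {max_anomalies_per_day:.0f}")
print(f"anomalies rolled up into each alert: {anomalies_per_alert:.0f}")
```

The 0.01% bound allows for somewhat more than 3K anomalies per day, so the "~3K" figure is consistent with a rate below that bound, and roughly a hundred anomalies fold into each higher-order alert.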
  37. Want to join me?
      Let’s connect:
      ►  @ryan_kirk
      Try CenturyLink Cloud free:
      ►  ctl.io
      We are hiring
      ►  ctl.io/careers/jobs
      Thanks to:
      ►  StampedeCon2016
      ►  pixabay.com
