1. Data Science-Powered Apps for the Internet of Things
Chris Rawles, Sr. Data Scientist, New York, New York
Jarrod Vawdrey, Sr. Data Scientist, Atlanta, Georgia
© Copyright 2016 Pivotal. All rights reserved.
2. Today’s talk
1. A real-time data science app
A. The app: a live demonstration
B. How can a data scientist build a data science application?
C. Revisiting the app
2. Generalizing the framework: solving new data science challenges
A. Internet of Things: creating a smart app to prevent oil spill disasters
B. Financial data: how can retail banks influence their cardholders’ behavior?
6. Data science workflow: Movement classification
[Diagram: sensor app, training app (model training as a service), scoring app (model scoring as a service), dashboard app]
1. Sensor + Dashboard
2. Redis
3. Training app
4. Scoring app
7. here is my source code
run it on the cloud for me
i do not care how
- Onsi Fakhouri, @onsijoe
8. cf push
CF determines app type (Java, Python, Ruby, …)
Installs necessary environment
Provisions and binds data services
Creates domain, routing, and load balancing
Continual app health checks and restarts
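From the app's side, the result of provisioning and binding is visible through environment variables: Cloud Foundry passes bound service details as JSON in `VCAP_SERVICES` and the port to listen on in `PORT`. The sketch below shows one way an app might read them. The service label `p-redis`, the instance name, and the credential keys (`host`, `port`, `password`) are illustrative assumptions; the actual names depend on the service broker.

```python
import json
import os


def find_service_credentials(label_fragment, environ=None):
    """Return the credentials dict of the first bound service whose label
    contains `label_fragment`, or None if no such service is bound."""
    environ = os.environ if environ is None else environ
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for label, instances in services.items():
        if label_fragment in label and instances:
            return instances[0].get("credentials", {})
    return None


def listening_port(environ=None, default=8080):
    """Read the port the platform assigned to this app instance."""
    environ = os.environ if environ is None else environ
    return int(environ.get("PORT", default))


# Example environment shaped like a Redis binding.
# The label, name, and credential keys are assumptions for illustration.
example_env = {
    "PORT": "61001",
    "VCAP_SERVICES": json.dumps({
        "p-redis": [{
            "name": "sensor-store",
            "credentials": {"host": "10.0.0.5", "port": 6379, "password": "secret"},
        }]
    }),
}

creds = find_service_credentials("redis", example_env)
print(creds["host"], creds["port"], listening_port(example_env))
```

Passing the environment in as a parameter keeps the lookup testable outside the platform; inside a pushed app the default `os.environ` would be used.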
9. Data ingestion: Accelerometric data
Accelerometric data is streamed from a mobile phone at 15 Hz (15 samples per second)
Other sensor data: gyroscopic data, magnetometer data, lon/lat, etc.
[Figure: accelerometer axes]
10. Data ingestion
For real-time applications, low-latency data ingestion into the data store is essential
WebSocket protocol (socket.io)
– Mobile phone to web server
– Web server to dashboard
[Diagram: sensor app, socket.io, Redis, training app]
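The ingestion step can be sketched as a handler that takes each incoming sensor message and appends it to a per-channel list, keeping only the most recent window. This is a minimal sketch, not the talk's code: the message fields (`channel`, `x`, `y`, `z`) and the key prefix `scoring:` are assumptions, and `InMemoryListStore` is a stand-in that mimics the Redis list commands RPUSH, LTRIM, and LRANGE so the example runs without a server.

```python
import json

WINDOW_SAMPLES = 30  # 2 seconds at 15 Hz, as stated in the talk


class InMemoryListStore:
    """Stand-in for a Redis client, covering only rpush, ltrim, lrange."""

    def __init__(self):
        self._lists = {}

    def _slice(self, key, start, end):
        # Redis list indices are inclusive and may be negative (from the tail).
        items = self._lists.get(key, [])
        n = len(items)
        start = max(n + start, 0) if start < 0 else start
        end = n + end if end < 0 else end
        return items[start:end + 1] if end >= 0 else []

    def rpush(self, key, *values):
        self._lists.setdefault(key, []).extend(values)
        return len(self._lists[key])

    def ltrim(self, key, start, end):
        self._lists[key] = self._slice(key, start, end)
        return True

    def lrange(self, key, start, end):
        return self._slice(key, start, end)


def ingest(store, raw_message):
    """Append one reading to its channel's micro-batch and keep only the
    most recent WINDOW_SAMPLES readings."""
    reading = json.loads(raw_message)
    key = "scoring:" + reading["channel"]
    store.rpush(key, json.dumps([reading["x"], reading["y"], reading["z"]]))
    store.ltrim(key, -WINDOW_SAMPLES, -1)
    return key


store = InMemoryListStore()
for i in range(45):  # three seconds of readings at 15 Hz
    message = json.dumps({"channel": "1234", "x": i, "y": 0.0, "z": 9.8})
    key = ingest(store, message)

batch = [json.loads(item) for item in store.lrange(key, 0, -1)]
print(len(batch))
```

A real Redis client object exposing the same three method names could be passed to `ingest` in place of the stand-in.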
11. Data storage
We are using a Redis store for:
– Storing training data
– Model persistence
– Storing a micro-batch of scoring data
Other storage systems include GemFire, HAWQ/Hadoop, Greenplum Database, PostgreSQL, …
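Model persistence in a key-value store can be sketched as serializing the trained model to bytes and saving it under a per-channel key. The sketch below makes several assumptions: the key prefix `model:` is hypothetical, a plain dict stands in for the trained classifier, and `InMemoryKeyValueStore` mimics only the `set`/`get` calls of a Redis client. Note that `pickle` should only be used to load data from a trusted source.

```python
import pickle


class InMemoryKeyValueStore:
    """Stand-in with the same set/get method names as a Redis client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def save_model(store, channel, model):
    """Serialize the model to bytes and store it under a per-channel key."""
    store.set("model:" + channel, pickle.dumps(model))


def load_model(store, channel):
    """Return the stored model, or None if none has been trained yet."""
    blob = store.get("model:" + channel)
    return None if blob is None else pickle.loads(blob)


# A plain dict stands in for a trained classifier object.
trained = {"labels": ["walking", "standing"], "version": 1}
store = InMemoryKeyValueStore()
save_model(store, "1234", trained)
restored = load_model(store, "1234")
print(restored == trained)
```

Keeping the model in the shared store is what lets the training app and the scoring app run as separate applications.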
12. Modeling
Scalable machine learning applications in Pivotal Cloud Foundry
1. Training app
2. Scoring app
13. Modeling – Training app
Goal: build a data-driven model that learns the accelerometric motions associated with each activity
Input: training data
Feature Engineering
• Time-domain transformations
• Fast Fourier Transform analysis
Machine Learning Classification Model
• Random Forest model using 2-second time windows (30 samples)
…
Output: trained model
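The feature engineering step for a single window can be sketched with the standard library alone. The choice of summary statistics (mean, standard deviation, min, max) and the number of Fourier coefficients kept are assumptions, since the talk does not list them. The transform here is a naive DFT; an FFT routine would return the same magnitudes faster.

```python
import cmath
import math
import statistics


def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs discrete Fourier coefficients
    (naive O(n^2) transform)."""
    n = len(signal)
    magnitudes = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


def window_features(window, n_coeffs=5):
    """Feature vector for one axis of one window: time-domain summary
    statistics followed by low-frequency Fourier magnitudes."""
    time_domain = [
        statistics.fmean(window),
        statistics.pstdev(window),
        min(window),
        max(window),
    ]
    return time_domain + dft_magnitudes(window, n_coeffs)


# 30 samples = 2 seconds at 15 Hz; a pure 2 Hz sine is used as synthetic input.
SAMPLE_RATE_HZ = 15
window = [math.sin(2 * math.pi * 2 * t / SAMPLE_RATE_HZ) for t in range(30)]
features = window_features(window)
print(len(features))
```

For the synthetic 2 Hz input, the energy concentrates in the fifth Fourier magnitude (index 4, four cycles per window), which is the kind of periodic signature that distinguishes rhythmic activities.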
14. Model building
20 seconds per training activity
Two-second moving window on training data
Features: time-domain summary statistics and Fourier transform coefficients
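The window arithmetic works out as follows: 20 seconds at 15 Hz gives 300 samples per activity, and a two-second window holds 30 samples. A minimal sketch of the moving window is below; the step size is an assumption, since the talk does not state how far the window advances each time.

```python
SAMPLE_RATE_HZ = 15
WINDOW_SECONDS = 2
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 30 samples


def moving_windows(samples, size=WINDOW_SAMPLES, step=1):
    """Yield every full window of `size` consecutive samples,
    advancing `step` samples each time."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


# 20 seconds of one training activity; integers stand in for readings.
recording = list(range(20 * SAMPLE_RATE_HZ))

overlapping = list(moving_windows(recording))  # step of one sample
disjoint = list(moving_windows(recording, step=WINDOW_SAMPLES))
print(len(recording), len(overlapping), len(disjoint))
```

With a step of one sample, a 20-second recording yields 271 overlapping training windows; with a step of 30 it yields 10 disjoint ones. Overlap gives more training examples from a short recording, at the cost of correlated examples.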
15. Model training approaches
1. Near-real-time model training
– Use small batches to train the model
2. Real-time model training
– Online machine learning algorithm: continually update the model using each new data point
3. Offline model training
– Build a model offline using batches
– Useful for models requiring finer tuning and calibration
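To make the real-time approach concrete, here is a minimal sketch of an online learner that updates itself with each new data point and keeps no batch of past data. It uses a nearest-centroid classifier, a deliberately simpler technique swapped in for illustration; it is not the Random Forest model used in the talk. The class name, method names, and the two-feature inputs are all hypothetical.

```python
import math


class OnlineCentroidClassifier:
    """Nearest-centroid classifier whose per-label centroids are updated
    one example at a time."""

    def __init__(self):
        self.centroids = {}  # label -> running mean feature vector
        self.counts = {}     # label -> number of examples seen

    def learn_one(self, features, label):
        n = self.counts.get(label, 0) + 1
        self.counts[label] = n
        mean = self.centroids.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            mean[i] += (value - mean[i]) / n  # incremental mean update

    def predict_one(self, features):
        # Pick the label whose centroid is closest in Euclidean distance.
        return min(
            self.centroids,
            key=lambda label: math.dist(features, self.centroids[label]),
        )


model = OnlineCentroidClassifier()
stream = [
    ([0.1, 0.0], "standing"), ([1.0, 0.9], "walking"),
    ([0.0, 0.1], "standing"), ([1.2, 1.1], "walking"),
]
for features, label in stream:
    model.learn_one(features, label)

print(model.predict_one([0.9, 1.0]))
```

The near-real-time and offline approaches differ only in when training runs: they would collect a batch of labeled windows first and fit the model on the whole batch.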
16. PCF App: Scoring app
• Real-time model scoring
• The dashboard initiates a request via an API call and receives a model prediction
[Diagram: streaming input window, feature engineering (time-domain transformations, Fast Fourier Transform analysis), trained Random Forest classification model using 2-second time windows (30 samples), model prediction returned via API call]
{ "channel": "1234",
"label": "walking",
"label_value": 0.746 }
17. Operationalizing scalable data science applications
Model scoring as a service. Why?
1. Application auto-scaling
– As the data grows, the model scales
2. Application autonomy
– The model application is independent of other applications, which means faster development iterations
– Faster development means a rapid feedback loop
3. Multiple applications can access the model scoring app
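The core of such a scoring service can be sketched as one function that any calling application shares. It returns the response shape shown on the scoring app slide (`channel`, `label`, `label_value`). Reading `label_value` as the probability of the predicted label is an assumption, and `predict_proba` and `fake_model` are hypothetical stand-ins for the trained model.

```python
import json


def score_window(channel, window, predict_proba):
    """Build the scoring response for one window.

    `predict_proba` is any callable mapping a window to a dict of
    {label: probability}; it stands in for the trained model.
    """
    probabilities = predict_proba(window)
    label = max(probabilities, key=probabilities.get)
    return {
        "channel": channel,
        "label": label,
        "label_value": round(probabilities[label], 3),
    }


def fake_model(window):
    # Hypothetical fixed output, used only to exercise the function.
    return {"walking": 0.746, "standing": 0.154, "running": 0.1}


# One 30-sample window of [x, y, z] readings.
response = score_window("1234", [[0.0, 0.0, 9.8]] * 30, fake_model)
print(json.dumps(response))
```

Because the function depends only on the window and the model, the dashboard or any other application can request a prediction without knowing how the model was trained.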
21. In all industries billions of data points represent opportunities for the Internet of Things
Gene Sequencing: cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
Social Media: Facebook uploads 250 million photos each day
Oil Exploration: oil rigs generate 25,000 data points per second
Also shown: Stock Market, Video Surveillance, Medical Imaging, Mobile Sensors
22. How can we use data to help prevent accidents like the Macondo disaster?
23. …by creating a Smart Application
24. Data science workflow: Movement classification
[Diagram: sensor app, training app (model training as a service), scoring app (model scoring as a service), dashboard app]
25. Data science workflow: Creating a smart app to prevent oil spill disasters
[Diagram: sensor app, training app (model training as a service), scoring app (model scoring as a service), dashboard app]
• Alert operator
• Send signal to control system to change operating parameters
• Replace old machinery
• Shut down plant
26. Data science workflow: How can retail banks influence their cardholders’ behavior?
[Diagram: sensor app, training app (model training as a service), scoring app (model scoring as a service), dashboard app]
• Provide customized services and promotions
• Next best offer
• Characterize and improve customer satisfaction
27. Thank you
Questions and comments
crawles@pivotal.io