4. AI history à Perceptron
1958 F. Rosenblatt,
“Perceptron” model,
neuronal networks
1943 W. McCulloch,
W. Pitts, “Neuron” as
logical element
OR function XOR function
1969 M. Minsky,
S. Papert, triggers
first AI winter
feed forward
5. AI history à AI winter
1958 F. Rosenblatt,
Perzeptron model,
neuronal networks
1987-1993 the second
AI winter, desktop
computer, LISP
machines expensive
1943 W. McCulloch,
W. Pitts, neuron as
logical element
1980 Boom expert
systems, Q&A using
logical rules, Prolog
1969 M. Minsky,
S. Papert, trigger
first AI winter
1993-2001
Moore’s law, Deep
blue chess-
playing, Standford
DARPA challenge
6. 6
AI beats human in games - 2016
Komodo beasts H. Nakamura in 2016AlphaGo beats L. Sedols in 2016
Go 4:1 Chess 2:1
7. Image Classification- 2016
Human Performance AI Performance
https://arxiv.org/pdf/1602.07261.pdf
95% 97%
The ability to understand the content of an image by using machine learning
8. Breast Cancer Diagnoses - 2017
Pathologist Performance AI Performance
https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
73% 92%
Doctors often use additional tests to find or diagnose breast cancer
The pathologist ended up
spending 30 hours on this
task on 130 slides
A closeup of a lymph node biopsy.
11. Fishing in the sea versus fishing in the lake
Data Warehouse Data Lake
Business Intellingence helps find
answers to questions you know.
Data Science helps you find the
question itself.
Any kind of data & schema-on-readStructured data & schema-on-write
Parallel processing on big dataSQL-ish queries on database tables
Extract, Transform, Load Extract, Load, Transform-on-the-fly
Low cost on commodity hardwareExpensive for large data
12. More Data + Bigger Models
Accuracy
Scale (data size, model size)
other approaches
neural networks
1990s
https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
13. More Data + Bigger Models + More Computation
Accuracy
Scale (data size, model size)
other approaches
neural networks
Now
https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
more compute
14. More Data + Bigger Models + More Computation
= Better Results in Machine Learning
15. Millions of “trip”
events each day globally
400+ billion viewing-
related events per day
Five billion data points
for Price Tip feature
Movie
recommendation
Price
optimization
Routing and price
optimization
21. Train and evaluate machine learning models at scale
Single machine Data center
How to run more experiments faster and in parallel?
How to share and reproduce research?
How to go from research to real products?
22. Distributed Machine Learning
Data Size
Model Size
Model parallelism
Single machine
Data center
Data
parallelism
training very large models exploring several model
architectures, hyper-
parameter optimization,
training several
independent models
speeds up the training
23. Compute Workload for Training and Evaluation
I/O intensive
Compute
intensive
Single machine
Data center
24. I/O Workload for Simulation and Testing
I/O intensive
Compute
intensive
Single machine
Data center
36. Who are you?
Who do you know?
What can you afford?
Where are you?
What have you purchased?
What do you like?
What content do you prefer?
Why have you contacted us?
38. Breaking Down Data Silos
Connect all your data tools,
other sources, and gain a 360
degree view on your data
Get actionable insights and
serve them personal, relevant
content along their journey
Real-time processing and
decision making
One
Data Platform
Marketing
Tools
Touchpoints
Historical
Aftersales
Data Analytics
Machine Learning
Data Apps
39. Actionable data insights that
businesses can use to...
ü Better understand and better
engage your customers
ü Respond to the convergence
of customer expectations
ü Driver of brand perception
360 degree view of your customers
Social
Apps
CRM
Billing
Channels
Service Call
Location
Devices
Network
Ordering
Customer 360
40. Where AI can help…
+ Predicting lifetime value
+ Churn estimation
+ Customer segmentation
+ Cross/Upselling
+ Recommendations
+ Demand forecasting
+ Market Basket Analysis
+ Sentiment analysis
+ Loyalty programs
+ Reactivation likelihood
+ Discount targeting
+ Call to action
+ Risk analysis
+ In store traffic patterns
42. High-level Development Process for Autonomous Vehicles
1 Collect
sensors data
3 Autonomous
Driving
2 Model
Engineering
Data Logger Control Unit
Big Data Trained Model
Data Center
Agenda
43. Sensors Udacity Lincoln MKZ
Camera 3x Blackfly GigE Camera, 20 Hz
Lidar Velodyne HDL-32E, 9.5 Hz
IMU Xsens, 400 Hz
GPS 2x fixed, 1 Hz
CAN bus, 1,1 kHz
Robot Operating System
Data 3 GB per minute
https://github.com/udacity/self-driving-car
44. Sensors Spec
Sensor blinding,
sunlight,
darkness
rain, fog,
snow
non-metal
objects
wind/ high
velocity
resolution range data
Ultrasonic yes yes yes no + + +
Lidar yes no yes yes +++ ++ +
Radar yes yes no yes ++ +++ +
Camera no no yes yes +++ +++ +++
46. Machine Learning for Autonomous Driving
+ Sensor Fusion clustering, segmentation, pattern recognition
+ Road ego-motion, image processing and pattern recognition
+ Localization simultaneous localization and mapping
+ Situation Understanding detection and classification
+ Trajectory Planning motion planning and control
+ Control Strategy reinforcement and supervised learning
+ Driver Model image processing and pattern recognition
47. Machine Learning Cycle
Data collection
for training/test
Feature
engineering
I/O workload
Model development
and architecture
Compute workload I/O workload
Training and
evaluation
Re- Simulation
and Testing
Scaling and
monitoring
Model deployment
versioning
1 2 3
Model tuning
48. Flux – Open Machine Learning Stack
Training & Test data
Compute + Network + Storage
Deploy model
ML Development & Catalog & REST API
ML-Specialists
Feature
Engineering
Training
Evaluation
Re-Simulation
Testing
CaffeOnSpark
Sample Model Prediction Batch Regression Cluster
Dataset Correlation Centroid Anomaly Test Scores
ü Mainly open source
ü No vendor lock in
ü Scale-out architecture
ü Multi user support
ü Resource management
ü Job scheduling
ü Speed-up training
ü Speed-up simulation
49. Feature Engineering
+ Hadoop InputFormat and
Record Reader for Rosbag
+ Process Rosbag with Spark,
Yarn, MapReduce, Hadoop
Streaming API, …
+ Spark RDD are cached and
optimized for analysis
Ros
bag
Processing
Engine
Computer
Network
Storage
Advanced
Analytics
RDD
Record
Reader
RDD
DataFrame, DataSet
SQL, Spark APIs
NumPy
Ros
Msg
50. Training & Evaluation
+ Tensorflow ROSRecordDataset
+ Protocol Buffers to serialize
records
+ Save time because data
conversion not needed
+ Save storage because data
duplication not needed
Training
Engine
Machine
Learning
Ros
bag
Computer
Network
Storage
ROS
Dataset
Ros
msg
51. Re-Simulation & Testing
+ Use Spark for preprocessing,
transformation, cleansing,
aggregation, time window
selection before publish to ROS
topics
+ Use Re-Simulation framework
of choice to subscribe to the
ROS topics
Engine
Re-Simulation
with framework
of choice
Computer
Network
Storage
Ros
bag
Ros
topic
core
subscribe
publish
53. + Classification, Regression, Clustering,
Collaborative Filtering, Anomaly Detection
+ Supervised/Unsupervised Reinforcement
Learning, Deep Learning, CNN
+ Model Training, Evaluation, Testing,
Simulation, Inference
+ Big Data Strategy, Consulting, Data
Lab, Data Science as a Service
+ Data Collection, Cleaning, Analyzing,
Modeling, Validation, Visualization
+ Business Case Validation,
Prototyping, MVPs, Dashboards
Data Science Machine Learning
54. + Architecture, DevOps, Cloud Building
+ App. Management Hadoop Ecosystem
+ Managed Infrastructure Services
+ Compute, Network, Storage, Firewall,
Loadbalancer, DDoS, Protection
+ Continuous Integration and Deployment
+ Data Pipelines (Acquisition,
Ingestion, Analytics, Visualization)
+ Distributed Data Architectures
+ Data Processing Backend
+ Hadoop Ecosystem
+ Test Automation and Testing
Data Engineering Data Operations
55. Think Big Business Strategy
Data Strategy
Technology Strategy
Agile Delivery Model
Business Case Validation
Prototypes, MVPs
Data Exploration
Data AcquisitionStart Small
Value
Proposition