Designed for busy Architects and Managers, this Lightbend Webinar features Dr. Emre Velipasaoglu, Principal Data Scientist at Lightbend, who explains what ML is really all about, the ideal use cases for ML, and how getting it right can benefit your streaming and Fast Data application architectures.
What's The Role Of Machine Learning In Fast Data And Streaming Applications?
1. What's The Role of Machine Learning In
Fast Data and Streaming Applications?
WEBINAR
Emre Velipasaoglu, Ph.D
Principal Data Scientist
2. What is Machine Learning?
A computer program is a set of explicit
instructions that produce an output for a
given input.
Machine Learning (ML) is about how to
program computers to improve
automatically with experience rather
than explicit instructions.
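val x = 0L to 1000000000000L // Long literals: the bound does not fit in an Int
val y = x.view.map(x => x / 2.0 + 1) // lazy view: avoids materializing 10^12 values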
1 -> 1.5
2 -> 2.0
=>
10^12 -> ?
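The contrast on this slide can be made concrete with a minimal sketch: instead of hard-coding `x / 2.0 + 1`, the program below estimates a line from the two example pairs shown above and then predicts the value for 10^12. Ordinary least squares is used here only as an illustrative stand-in for "learning from experience"; the object and function names are hypothetical and not from the webinar.

```scala
object LearnFromExamples {
  // Ordinary least squares fit of y = slope * x + intercept.
  def fitLine(examples: Seq[(Double, Double)]): (Double, Double) = {
    require(examples.size >= 2, "need at least two examples")
    val n = examples.size
    val meanX = examples.map(_._1).sum / n
    val meanY = examples.map(_._2).sum / n
    val cov = examples.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = examples.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val slope = cov / varX
    (slope, meanY - slope * meanX)
  }

  // The two input/output examples from the slide: 1 -> 1.5 and 2 -> 2.0.
  val (slope, intercept) = fitLine(Seq(1.0 -> 1.5, 2.0 -> 2.0))

  // Prediction for an input that was never seen during fitting.
  val prediction: Double = slope * 1e12 + intercept
}
```

With these two examples the fit recovers a slope of 0.5 and an intercept of 1.0, so the learned model agrees with the explicit program without the rule ever being written down.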
3. Why does everyone want to use it?
A lot of the recent transformative technologies are based on ML:
• optical character recognition
• speech recognition
• fraud detection
• web search
• personalized marketing and advertising
• computer-aided medical diagnosis (CADx)
4. Why does everyone want to use it?
Emerging trends that will leverage ML:
• Internet of Things (IoT)
• Augmented Reality (AR) and Virtual Reality (VR)
• Autonomous vehicles
• Customer service chat bots
• Security: Face/voice/biometrics recognition
• Healthcare: drug discovery, outcome prediction, personalized care
• Democratization of ML and the long tail of ML applications
5. What are some of the use cases?
Machines do certain tasks better than humans.
• IBM Deep Blue in chess
• IBM Watson in Jeopardy!
• Google AlphaGo in Go
• Lip reading: LipNet 93% vs. humans 65%
6. What are some of the use cases?
Machines are more cost efficient in certain tasks:
• Transcribing: Microsoft 89% vs. humans 89%
• Computer-aided diagnosis: e.g. "Dermatologist-level classification of skin cancer with deep neural networks", Esteva et al., published in Nature, June 2017.
7. What are some of the use cases?
Machines are the only way to scale up processing in certain tasks.
• Previewing video: Clarifai can analyze 3.5 minutes of video in 10 seconds for detecting objects.
• Commercial loan agreement review: AI in seconds vs. humans in 360,000 hours.
8. Why should you care?
Information gives competitive advantage.
ML unlocks information from your data.
It is your product, your data, your operations.
ML is being democratized.
It is no longer reserved for a handful of giant tech companies.
9. How does it work?
E.g. Augmented Reality (AR) Shopping Personalization:
• Shopping is one area where AR is expected to have an impact.
• IBM CeBIT 2013 app:
• scans a shelf
• recognizes products
• overlays nutritional info
• Add a recommender system,
tailored to
• your customers,
• your product catalog.
10. Learning a Recommender
User Rating Matrix
A B C D
Alan 5 1 1
Emre 4 2 3
Vishal 5 1
Matrix Factorization ->
User Latent Factor Model
        f1    f2
Alan    1.63  0.89
Emre    0.89  2.10
Vishal  2.03  1.01
Item Latent Factor Model
     A     B     C      D
f1   2.21  1.88  -0.24  0.33
f2   1.68  1.22  1.51   0.74
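As a rough illustration of how two latent factor tables like these can be learned from a sparse rating matrix, here is a minimal sketch in plain Scala. It uses stochastic gradient descent with L2 regularization, which is an assumption: the webinar does not say which factorization algorithm produced the numbers above. The sample ratings are also hypothetical, since the flattened slide does not show which cells of the rating matrix are empty, and all names and default parameters are illustrative.

```scala
import scala.util.Random

object MatrixFactorizationSketch {
  // One observed cell of the user rating matrix (indices are 0-based).
  final case class Rating(user: Int, item: Int, value: Double)

  // Hypothetical observed ratings for 3 users (Alan, Emre, Vishal) and 4 items (A-D).
  val sampleRatings: Seq[Rating] = Seq(
    Rating(0, 0, 5.0), Rating(0, 2, 1.0), Rating(0, 3, 1.0),
    Rating(1, 1, 4.0), Rating(1, 2, 3.0), Rating(1, 3, 2.0),
    Rating(2, 1, 5.0), Rating(2, 2, 1.0)
  )

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  // Root mean squared error over the observed ratings only.
  def rmse(ratings: Seq[Rating], u: Array[Array[Double]], v: Array[Array[Double]]): Double =
    math.sqrt(ratings.map { r =>
      val err = r.value - dot(u(r.user), v(r.item))
      err * err
    }.sum / ratings.size)

  // Learns user factors u and item factors v so that dot(u(i), v(j)) approximates rating(i, j).
  def factorize(ratings: Seq[Rating], numUsers: Int, numItems: Int,
                numFactors: Int = 2, stepSize: Double = 0.01, reg: Double = 0.02,
                epochs: Int = 2000, seed: Long = 42L): (Array[Array[Double]], Array[Array[Double]]) = {
    val rnd = new Random(seed)
    val u = Array.fill(numUsers, numFactors)(rnd.nextDouble())
    val v = Array.fill(numItems, numFactors)(rnd.nextDouble())
    for (_ <- 1 to epochs; r <- ratings) {
      val err = r.value - dot(u(r.user), v(r.item))
      for (f <- 0 until numFactors) {
        val uf = u(r.user)(f)
        val vf = v(r.item)(f)
        // Gradient step on the squared error with an L2 penalty.
        u(r.user)(f) += stepSize * (err * vf - reg * uf)
        v(r.item)(f) += stepSize * (err * uf - reg * vf)
      }
    }
    (u, v)
  }
}
```

Calling `factorize(sampleRatings, numUsers = 3, numItems = 4)` returns one factor row per user and one per item. The number of latent factors, step size, and number of epochs are the kind of training parameters that would be tuned during development.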
11. Scoring Items
Application: User, Items
Query User -> User Latent Factors
        f1    f2
Vishal  2.03  1.01
Query Item -> Item Latent Factors
     A     C      D
f1   2.21  -0.24  0.33
f2   1.68  1.51   0.74
Estimate Ratings -> Ranked Items
   score
A  6.00
D  1.41
C  1.05
User Latent Factor Model
        f1    f2
Alan    1.63  0.89
Emre    0.89  2.10
Vishal  2.03  1.01
Item Latent Factor Model
     A     B     C      D
f1   2.21  1.88  -0.24  0.33
f2   1.68  1.22  1.51   0.74
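The scoring step on this slide can be sketched as a dot product between the queried user's latent factors and each candidate item's latent factors, followed by a sort. This is a minimal sketch using the numbers from the slide; the object and function names are hypothetical, and the dot-product scoring rule is an assumption that is consistent with a matrix factorization model but is not spelled out in the deck.

```scala
object ScoringSketch {
  type Factors = Vector[Double]

  // Estimated rating: dot product of user and item latent factors.
  def dot(a: Factors, b: Factors): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Scores every candidate item for one user, highest score first.
  def rank(user: Factors, items: Map[String, Factors]): Seq[(String, Double)] =
    items.toSeq
      .map { case (id, factors) => id -> dot(user, factors) }
      .sortBy { case (_, score) => -score }

  // Latent factors as printed on the slide (rounded to two decimals).
  val vishal: Factors = Vector(2.03, 1.01)
  val candidates: Map[String, Factors] = Map(
    "A" -> Vector(2.21, 1.68),
    "C" -> Vector(-0.24, 1.51),
    "D" -> Vector(0.33, 0.74)
  )

  val ranked: Seq[(String, Double)] = rank(vishal, candidates)
}
```

With the rounded factors, the ranking comes out as A, D, C, matching the slide. The scores are roughly 6.18, 1.42, and 1.04, slightly different from the slide's 6.00, 1.41, and 1.05, presumably because the slide's scores were computed from unrounded factors.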
12. Modern ML
• Size of data
• Does not fit in one node, must distribute
• E.g. billions of user x item ratings, aggregated from several orders of magnitude more historical events
• Size of model
• Does not fit in one node, must distribute
• E.g. millions of users, thousands of products
• Learning speed
• Anywhere from real-time model updates to batch updates in minutes
• Operational latency and throughput
• Low milliseconds response time for millions of transactions
13. ML Lifecycle - Development
Early research
• Exploration of modeling techniques
Iterations
• Feature selection
• Training parameter tuning
Productization
• Feature computations, model updating, scoring, caching, optimizations, etc.
(e.g. update and query of latent factors)
Example: recommender system
• Candidate techniques: collaborative filtering, content-based filtering, hybrid models, Bayesian networks, clustering, latent semantic models, Markov decision process …
• Factorization algorithms: singular value decomposition, alternating least squares, non-negative matrix factorization
• Training parameters: number of latent factors, step size, convergence criteria
14. ML Lifecycle - Management
Monitoring
• Model performance
• Latency
• Throughput
• Model quality
• Drift (e.g. have the user’s tastes changed recently?)
• Security and robustness
Controlling
• Model optimization (for performance)
• Model update (for quality)
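One simple way to monitor model quality for drift is to compare the prediction error on recent traffic against the error measured on a baseline period, and trigger a model update when the gap grows too large. The sketch below illustrates that idea only. The mean absolute error metric, the `tolerance` threshold, and all names are assumptions, not something prescribed in the webinar.

```scala
object DriftCheckSketch {
  // Mean absolute error over (predicted, observed) rating pairs.
  def meanAbsError(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (predicted, observed) pair")
    pairs.map { case (predicted, observed) => math.abs(predicted - observed) }.sum / pairs.size
  }

  // Flags drift when the recent error exceeds the baseline error by more than
  // the given factor. The default tolerance of 1.5 is a hypothetical choice.
  def hasDrifted(baseline: Seq[(Double, Double)],
                 recent: Seq[(Double, Double)],
                 tolerance: Double = 1.5): Boolean =
    meanAbsError(recent) > tolerance * meanAbsError(baseline)
}
```

For example, `hasDrifted(baselinePairs, lastHourPairs)` could be evaluated on a schedule, with a `true` result feeding the "Model update (for quality)" control step.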
15. Which tools are available and what do they do?
Machine Learning
Spark MLlib: ML library for Spark
Flink ML: ML library for Flink
Mahout: Distributed or scalable ML algorithms
TensorFlow: Google's open source deep learning library
Theano: Numerical library for Python, especially for deep learning
Deeplearning4j: Deep learning library in Java
BigDL: Intel’s distributed deep learning library on Spark
scikit-learn: Main ML library for Python
OpenNLP: ML-based toolkit for the processing of natural language text
16. Which tools are available and what do they do?
Streaming
Flink: Stream processing framework with sophisticated handling of late-arriving data
Spark Streaming: Dataset-based computing framework with mini-batch streaming support
Beam: API for data processing pipelines
Data Ingestion
Kafka: Distributed stream processing for high-throughput, low-latency, real-time data feeds
Flume: Log processing
17. Which tools are available and what do they do?
Persistence and Storage
HDFS: Hadoop-based distributed file system
Cassandra: Distributed NoSQL database management system
Elasticsearch: Distributed, RESTful search and analytics engine
AWS S3: Cloud-based object store
18. How does the Fast Data Platform tie it all together?
[Architecture diagram]
Batch: HDFS (User Rating Matrix) -> Spark (Matrix Factorization) -> Cassandra (Latent Item Model, Latent User Model)
Streaming: Application -> Kafka (User, Items) -> Flink / Akka Streams (Query Item, Query User, Item Factors, User Factors, Score) -> Kafka (Ranked Items) -> Application
19. What else does FDP provide?
[Platform diagram labels]
• data persistence & storage
• stream processing
• machine learning
• cluster analysis infrastructure
• durable messaging backplane
• microservices
• intelligent management
20. In Summary
• Machine Learning is the way to build transformative, data-driven products that are otherwise impossible to build.
• It is not difficult to build Machine Learning based solutions, thanks to new open source tools.
• Lightbend’s Fast Data Platform provides an easy on-ramp for building, deploying, and running Fast Data clusters and services using best-of-breed tools.
21. Upgrade your grey matter!
Get the free O’Reilly book by Dr. Dean Wampler,
VP of Fast Data Engineering at Lightbend
bit.ly/lightbend-fast-data