Image Caption Generation: Intro to Distributed TensorFlow and Distributed Scoring with Apache Spark
Luca Grazioli, Data Scientist @ ICTEAM
Data Science Milan, 15th May 2017
ICTeam S.p.A. – Design Division presentation
Agenda
• Who am I
• ICTeam Big Data Lab
• What’s Deep Learning?
• Deep Learning Challenges
• TensorFlow
• Distributed TensorFlow
• Image caption generation
• Distributed scoring with APACHE SPARK
Who am I
MSc in Computer Science
• University of Milan-Bicocca
• Definition of a Knowledge Engineer ML model
Academic research
• Modeling and understanding time-evolving scenarios
(http://www.iiisci.org/journal/CV$/sci/pdfs/SA268SN15.pdf)
Data Scientist @ ICTeam
• Big Data Science
• Data Engineering (a bit!)
• Deep Learning
More at: http://luca-grazioli.it or on LinkedIn
ICTeam Big Data Lab
[Figure: Big data cluster with Cluster Nodes 1 to 4, an edge node, web client tools, and two GPU nodes (GPU Node 1, GPU Node 2).]
Deep Learning
Credit: Lukas Masuch
Deep Learning
• Computer vision
• Natural language processing
• Speech recognition
Deep Learning Challenges
• Data volume
• CPU usage
• Graph complexity
• Parameter space
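The parameter-space challenge can be made concrete with simple arithmetic: a fully connected layer with n inputs and m outputs has n·m weights plus m biases, so the count grows with the product of adjacent layer widths. A minimal sketch follows; the layer widths are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def dense_params(n_in: int, n_out: int) -> int:
    """Trainable parameters of one fully connected layer: n_in * n_out weights + n_out biases."""
    return n_in * n_out + n_out


def mlp_params(widths: Sequence[int]) -> int:
    """Total parameters of a stack of dense layers with the given layer widths."""
    return sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))


if __name__ == "__main__":
    # Hypothetical network: 784 inputs, two hidden layers of 1000 units, 10 outputs.
    total = mlp_params([784, 1000, 1000, 10])
    # Each float32 parameter takes 4 bytes.
    print(f"{total:,} parameters ({total * 4 / 1e6:.1f} MB as float32)")
```

For this small network the script prints 1,796,010 parameters (7.2 MB as float32); every training step must read and update all of them, which is what makes the parameter space a scaling concern.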
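TensorFlow
[Figure: computation graph in which x and W feed MatMul, MatMul and b feed Add, Add feeds ReLU, followed by C.]

    import tensorflow as tf  # TensorFlow 1.x graph API (tf.compat.v1 in TF 2.x)

    # Describe the computation as a graph: relu(x·W + b) for a 100-unit layer
    b = tf.Variable(tf.zeros([100]))
    x = tf.placeholder(tf.float32, name='x')
    W = tf.Variable(...)              # initial value elided on the slide
    regr = tf.matmul(x, W) + b        # x: [batch, n_in], W: [n_in, 100]
    relu = tf.nn.relu(regr)
    C = [...]                         # tensors/ops to fetch, elided on the slide

    # Session: execute the graph
    s = tf.Session()
    s.run(tf.global_variables_initializer())
    for step in range(0, 10):
        batch = ...                   # input construction elided on the slide
        result = s.run(C, feed_dict={x: batch})
    # ...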
“[...] TensorFlow takes computation described [...] and maps it onto a wide variety of different hardware platforms, ranging from [...] mobile device platforms such as Android and iOS to [...] large-scale computing systems [...]”
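The slide's graph (x and W into MatMul, then Add with b, then ReLU) can be mimicked in plain Python to show the idea behind the quote: the program first describes a computation as a graph of nodes, and a separate run step evaluates only the nodes a fetch depends on, filling placeholders from a feed dictionary. This is a toy illustration, not how TensorFlow is implemented; the `Node` class and its helper functions are hypothetical.

```python
class Node:
    """One operation in a toy dataflow graph, with references to its input nodes."""

    def __init__(self, op, *inputs, name=None):
        self.op, self.inputs, self.name = op, inputs, name


def placeholder(name):
    return Node(None, name=name)  # no op: the value must be supplied in feed_dict


def constant(value):
    return Node(lambda: value)


def matmul(a, b):
    def op(x, y):  # x: [rows][k], y: [k][cols], both as nested lists
        return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
                for i in range(len(x))]
    return Node(op, a, b)


def add(a, bias):
    # Broadcast a bias vector over every row, like x·W + b in the slide's graph.
    return Node(lambda x, b: [[v + bj for v, bj in zip(row, b)] for row in x], a, bias)


def relu(a):
    return Node(lambda x: [[max(0.0, v) for v in row] for row in x], a)


def run(fetch, feed_dict):
    """Evaluate `fetch`, computing only the nodes it depends on (each at most once)."""
    cache = {}

    def evaluate(node):
        if node in feed_dict:
            return feed_dict[node]
        if node not in cache:
            if node.op is None:
                raise ValueError(f"placeholder {node.name!r} was not fed")
            cache[node] = node.op(*(evaluate(i) for i in node.inputs))
        return cache[node]

    return evaluate(fetch)


if __name__ == "__main__":
    x = placeholder("x")
    W = constant([[1.0, -1.0], [2.0, 0.5]])  # hypothetical 2-input, 2-unit weights
    b = constant([0.0, -1.0])
    out = relu(add(matmul(x, W), b))          # describing the graph computes nothing yet
    print(run(out, feed_dict={x: [[1.0, 1.0]]}))  # [[3.0, 0.0]]
```

Separating graph construction from execution is what lets a runtime decide where each node runs, which is the starting point for the distributed concepts that follow.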
Distributed TF: concepts
• Multi-device execution
• Data parallel training
  - In-graph processing
  - Between-graph processing
• Model parallel training
• Model computation pipelining
Distributed TF: concepts
Multi-device execution
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
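The whitepaper credited on this slide describes the core of multi-device execution: once every node is placed on a device, the graph is partitioned into one subgraph per device, and each cross-device edge is replaced by a Send node on the source device and a Receive node on the destination device, with all consumers of a tensor on one device sharing a single Receive. The plain-Python sketch below mimics that rewrite on the graph from the TensorFlow slide; the function name, the Send/Recv naming scheme and the placement are hypothetical.

```python
from collections import defaultdict


def partition(edges, placement):
    """Split a placed graph into per-device subgraphs.

    edges: (src, dst) pairs of node names; placement: node name -> device name.
    Same-device edges are kept. A cross-device edge src -> dst becomes
    src -> Send (on src's device) and Recv -> dst (on dst's device); all consumers
    of src on one destination device share a single Send/Recv pair.
    """
    subgraphs = defaultdict(list)
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            subgraphs[src_dev].append((src, dst))
            continue
        send, recv = f"send_{src}_to_{dst_dev}", f"recv_{src}_on_{dst_dev}"
        if (src, send) not in subgraphs[src_dev]:  # one Send per (tensor, device)
            subgraphs[src_dev].append((src, send))
        subgraphs[dst_dev].append((recv, dst))
    return dict(subgraphs)


if __name__ == "__main__":
    # The graph from the TensorFlow slide, with a hypothetical placement.
    edges = [("x", "MatMul"), ("W", "MatMul"), ("MatMul", "Add"), ("b", "Add"), ("Add", "ReLU")]
    placement = {"x": "CPU:0", "W": "CPU:0", "b": "CPU:0",
                 "MatMul": "GPU:0", "Add": "GPU:0", "ReLU": "GPU:0"}
    for device, sub in partition(edges, placement).items():
        print(device, sub)
```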
Distributed TF: concepts
Model parallel training
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
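Model parallel training assigns different parts of the model to different devices for the same batch of examples, so intermediate activations have to travel between devices. A minimal sketch, assuming each stage is a plain Python function standing in for a group of layers; the device names and layer functions are hypothetical.

```python
def model_parallel_forward(stages, x):
    """Forward pass through stages that live on different devices.

    stages: list of (device, fn) pairs, where fn maps an activation list to a new one
    and stands in for a group of layers. Returns the output and every activation
    transfer between devices as (from_device, to_device, number_of_values).
    """
    transfers = []
    prev_device = None
    for device, fn in stages:
        if prev_device is not None and device != prev_device:
            transfers.append((prev_device, device, len(x)))  # activations cross devices
        x = fn(x)
        prev_device = device
    return x, transfers


if __name__ == "__main__":
    # Hypothetical three-stage model split over three GPUs.
    stages = [
        ("GPU:0", lambda v: [2.0 * a for a in v]),            # "layer 1": scale
        ("GPU:1", lambda v: [max(0.0, a - 1.0) for a in v]),  # "layer 2": shifted ReLU
        ("GPU:2", lambda v: [sum(v)]),                        # "layer 3": sum readout
    ]
    out, transfers = model_parallel_forward(stages, [1.0, 0.25, 3.0])
    print(out)        # [6.0]
    print(transfers)  # [('GPU:0', 'GPU:1', 3), ('GPU:1', 'GPU:2', 3)]
```

The transfer list makes the trade-off visible: splitting the model lets it exceed one device's memory, at the cost of moving activations (and, during training, gradients) across device boundaries.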
Distributed TF: concepts
Model computation pipelining
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
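Model computation pipelining runs several training steps concurrently so that devices do not sit idle while a single step is busy elsewhere. The idealized sketch below assumes S stages, each on its own device and each taking one time unit; under that assumption M overlapping steps finish in M + S - 1 time slots instead of M · S. The function names are hypothetical, and real step times are neither uniform nor free of communication cost.

```python
def pipeline_schedule(num_stages, num_steps):
    """Idealized pipeline: step m occupies stage s during time slot m + s.

    Returns {time_slot: [(step, stage), ...]} listing the work running concurrently.
    """
    schedule = {}
    for m in range(num_steps):
        for s in range(num_stages):
            schedule.setdefault(m + s, []).append((m, s))
    return schedule


def makespan(num_stages, num_steps, pipelined=True):
    """Time slots needed to finish all steps, with or without overlapping steps."""
    return num_steps + num_stages - 1 if pipelined else num_steps * num_stages


if __name__ == "__main__":
    for t, work in sorted(pipeline_schedule(num_stages=3, num_steps=4).items()):
        print(f"t={t}: " + ", ".join(f"step {m} on stage {s}" for m, s in work))
    print(makespan(3, 4), "slots pipelined vs", makespan(3, 4, pipelined=False), "sequential")
```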
Distributed TF: concepts
In-graph processing
• Client: Worker 1 (CPU:0, GPU:0), Worker 2 (CPU:0)
• Parameter servers: PS1, PS2, PS3
Between-graph processing
• Client 1: Worker 1.1 (CPU:0, GPU:0), Worker 1.2 (CPU:0)
• Client 2: Worker 2.1 (CPU:0, GPU:0), Worker 2.2 (CPU:0)
• Parameter servers: PS1, PS2, PS3
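In between-graph processing each client builds its own graph, yet all workers must update the same parameters on PS1, PS2 and PS3. One way this works is deterministic variable placement: if every client creates the same variables in the same order and assigns them round-robin over the parameter-server tasks, every graph maps a given variable to the same PS. TensorFlow 1.x's `tf.train.replica_device_setter` uses a round-robin strategy by default; the plain-Python sketch below only imitates that idea, and its function names and variable names are hypothetical.

```python
from itertools import count


def round_robin_placer(num_ps):
    """Return a function that pins each new variable to the next PS task in turn."""
    counter = count()

    def place(var_name):
        return f"/job:ps/task:{next(counter) % num_ps}"

    return place


def build_client_graph(worker_task, var_names, num_ps):
    """Between-graph style: each client builds its own copy of the model graph.

    Variables are pinned to parameter-server tasks; compute stays on the client's worker.
    """
    place = round_robin_placer(num_ps)
    variables = {name: place(name) for name in var_names}  # creation order matters
    compute_device = f"/job:worker/task:{worker_task}"
    return variables, compute_device


if __name__ == "__main__":
    names = ["layer1/W", "layer1/b", "layer2/W", "layer2/b"]  # hypothetical variables
    for task in range(2):  # two independent clients/workers
        print(build_client_graph(task, names, num_ps=3))
```

Both clients print the same variable-to-PS mapping and differ only in their worker device, which is why separately built graphs still share one set of parameters.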
Distributed TF: concepts
In-graph processing
• Client: Worker 1 and Worker 2, each with Layer 1 and Layer 2
• Data shards 1 to 4
• Parameter servers: PS1, PS2, PS3
Between-graph processing
• Client 1: Worker 1.1, Worker 1.2
• Client 2: Worker 2.1, Worker 2.2
• Layers: Layer 1, Layer 2 and Layer 1’, Layer 2’
• Data shards 1 to 4
• Parameter servers: PS1, PS2, PS3
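Both layouts on this slide implement data-parallel training: each worker computes gradients on its own data shard, and the parameter servers hold the shared parameters. The sketch below simulates one synchronous variant in plain Python for a linear model y ≈ w·x + b with four equal-sized shards; the classes, learning rate and data are hypothetical, and real deployments may instead apply updates asynchronously.

```python
class ParameterServer:
    """Holds the shared parameters and applies synchronous, averaged SGD updates."""

    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def pull(self):
        return list(self.params)

    def push(self, worker_grads):
        # Average the gradients from all workers, then take a single SGD step.
        n = len(worker_grads)
        for i in range(len(self.params)):
            self.params[i] -= self.lr * sum(g[i] for g in worker_grads) / n


def shard_gradient(params, shard):
    """Gradient of the mean squared error of y ≈ w*x + b on one data shard."""
    w, b = params
    gw = gb = 0.0
    for x, y in shard:
        err = w * x + b - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return [gw / len(shard), gb / len(shard)]


def make_shards(num_shards=4):
    """Hypothetical data y = 2x + 1, split round-robin into equal-sized shards."""
    data = [(i / 4, 2.0 * (i / 4) + 1.0) for i in range(16)]
    return [data[k::num_shards] for k in range(num_shards)]


def train(shards, steps=500, lr=0.05):
    ps = ParameterServer([0.0, 0.0], lr)
    for _ in range(steps):
        params = ps.pull()                                   # all workers read the same values
        grads = [shard_gradient(params, s) for s in shards]  # one worker per data shard
        ps.push(grads)                                       # synchronous update
    return ps.pull()


if __name__ == "__main__":
    w, b = train(make_shards())
    print(f"w = {w:.3f}, b = {b:.3f}")  # very close to w = 2, b = 1
```

Because the shards have equal size, averaging the per-shard mean gradients gives exactly the full-batch gradient, so this synchronous scheme behaves like single-machine gradient descent with a larger batch.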
Image caption generation
Credits: https://github.com/tensorflow/models/tree/master/im2txt http://press.liacs.nl/mirflickr/
• A couple of dogs standing next to each other.
• A couple of dogs are standing in a field.
• A couple of dogs standing next to each other on a field.

• A scenic view of a lake with mountains in the background.
• A scenic view of a lake with mountains in the distance.
• A scenic view of a lake with a mountain in the background.

• A city street at night with traffic lights.
• A city street at night with a red light.
• A city street at night with a red light.

Try it yourself! http://bit.ly/2r8jU1q
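The three alternative captions per image read like the top candidates of a beam search, which is how the im2txt model decodes: at each step every kept partial caption is extended by each possible next word, and only the highest-scoring candidates survive. The sketch below is a simplified beam search in plain Python; the bigram table is a hypothetical stand-in for the trained image-conditioned LSTM, and the bookkeeping (finished captions leave the beam) may differ in detail from im2txt's implementation.

```python
import math

# Hypothetical next-word probabilities, standing in for the trained LSTM decoder.
BIGRAMS = {
    "<s>": {"a": 0.9, "two": 0.1},
    "a": {"couple": 0.6, "dog": 0.4},
    "two": {"dogs": 1.0},
    "couple": {"of": 1.0},
    "of": {"dogs": 1.0},
    "dog": {"</s>": 1.0},
    "dogs": {"</s>": 0.7, "standing": 0.3},
    "standing": {"</s>": 1.0},
}


def beam_search(next_probs, beam_size=3, max_len=20, start="<s>", end="</s>"):
    """Return up to `beam_size` (caption, log_prob) pairs, most probable first.

    next_probs(words) -> {next_word: probability} given the words generated so far.
    """
    beams, finished = [([start], 0.0)], []
    for _ in range(max_len):
        candidates = [(words + [w], score + math.log(p))
                      for words, score in beams
                      for w, p in next_probs(words).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for words, score in candidates[:beam_size]:
            # Captions that emitted the end token are complete; the rest keep growing.
            (finished if words[-1] == end else beams).append((words, score))
        if not beams:
            break
    finished = finished or beams  # fall back to partial captions if none completed
    finished.sort(key=lambda c: c[1], reverse=True)
    return [(" ".join(w for w in words if w not in (start, end)), score)
            for words, score in finished[:beam_size]]


if __name__ == "__main__":
    for caption, score in beam_search(lambda words: BIGRAMS[words[-1]]):
        print(f"{score:.3f}  {caption}")
```

With these made-up probabilities the three results are "a couple of dogs", "a dog" and "a couple of dogs standing": near-duplicate phrasings like the ones above are typical, because beam candidates share long prefixes.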
Image caption generation
Distributed scoring with APACHE SPARK
[Figure: two-phase architecture.
• Phase 1 – Ingestion (file system)
• Phase 2 – Distributed Scoring (Spark driver on the cluster edge node; data nodes)]
Distributed scoring with APACHE SPARK
Read images
• images_df = spark.read.parquet('/user/lgrazioli/flickrTestImageBin/')
Define the scoring function
• Restore the last training checkpoint
• Define an iterator function that yields scored records from a partition
Let’s score!
• scored_sample_rdd = images_df.rdd.mapPartitions(score_partition).flatMap(lambda x: x)
• scored_df = spark.createDataFrame(scored_sample_rdd, schema)
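The scoring step relies on `mapPartitions` so that the expensive checkpoint restore runs once per partition rather than once per image, and on the following `flatMap(lambda x: x)` to flatten the lists yielded for each image into individual records. The sketch below emulates that pattern in plain Python so it runs without a cluster; `load_model`, the three-captions-per-image output and the `(image_id, image_bytes)` row layout are hypothetical. In PySpark the same `score_partition` would be passed to `images_df.rdd.mapPartitions(...)` as on the slide; Spark `Row` objects are tuples, so the unpacking works if the columns come in that order.

```python
from itertools import chain


def load_model():
    """Hypothetical stand-in for restoring the last training checkpoint (expensive)."""
    load_model.calls += 1
    # Pretend model: returns three candidate captions per image.
    return lambda image_bytes: [f"caption {k} for {len(image_bytes)} bytes" for k in range(3)]


load_model.calls = 0


def score_partition(rows):
    """Score all rows of one partition, restoring the model only once per partition."""
    model = load_model()
    for image_id, image_bytes in rows:
        # Yield one list per image; flatMap(lambda x: x) later turns it into records.
        yield [(image_id, rank, text) for rank, text in enumerate(model(image_bytes))]


def map_partitions_then_flatten(partitions, fn):
    """Plain-Python emulation of rdd.mapPartitions(fn).flatMap(lambda x: x)."""
    per_image_lists = chain.from_iterable(fn(iter(p)) for p in partitions)
    return list(chain.from_iterable(per_image_lists))


if __name__ == "__main__":
    # Two hypothetical partitions of (image_id, image_bytes) rows.
    partitions = [[("img1", b"\x00" * 10), ("img2", b"\x00" * 20)], [("img3", b"\x00" * 5)]]
    for record in map_partitions_then_flatten(partitions, score_partition):
        print(record)
    print("model restored", load_model.calls, "times")  # once per partition
```

Because `score_partition` is a generator, each partition streams its rows instead of materializing them, and the model cost is paid per partition; choosing the number of partitions therefore trades restore overhead against parallelism.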
Conclusion
Today’s goals:
• Understand the technological challenges of deep learning
• Learn how to distribute a deep learning training algorithm
• Learn how to score in a distributed fashion
• See how a big data ecosystem can help
Future work:
• TensorFrames (https://github.com/databricks/tensorframes)
• New technologies (e.g. TPUs)
• TensorFlow improvements
• High-level APIs
Bibliography
1. Deep Learning - The Past, Present and Future of Artificial Intelligence (Lukas Masuch)
2. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (Martín Abadi, Ashish Agarwal, et al.)
3. im2txt: https://github.com/tensorflow/models/tree/master/im2txt
4. MIRFLICKR: http://press.liacs.nl/mirflickr/
Thanks!
