Distributed implementation of a LSTM on Spark and Tensorflow
Emanuel Di Nardo
Source code: https://github.com/EmanuelOverflow/LSTM-TensorSpark
Overview
● Introduction
● Apache Spark
● Tensorflow
● RNN-LSTM
● Implementation
● Results
● Conclusions
Introduction
Distributed environment:
● Many computation units;
● Each unit is called ‘node’;
● Node collaboration/competition;
● Message passing;
● Synchronization and global
state management;
Apache Spark
● Large-scale data processing framework;
● In-memory processing;
● General purpose:
○ MapReduce;
○ Batch and streaming processing;
○ Machine learning;
○ Graph processing;
○ Etc…
● Scalable;
● Open source;
Apache Spark
● Resilient Distributed Dataset (RDD):
○ Fault-tolerant collection of elements;
○ Transformations and actions;
○ Lazy computation;
● Spark core:
○ Task dispatching;
○ Scheduling;
○ I/O;
● Essentially:
○ A master driver organizes the nodes and dispatches tasks to the workers, passing an RDD;
○ Worker executors run the tasks and return the results in a new RDD;
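As a concrete illustration of the transformation/action and lazy-computation model above, a minimal PySpark sketch (illustrative only, not the project code):

from pyspark import SparkContext

sc = SparkContext(appName="rdd-demo")
rdd = sc.parallelize(range(10), numSlices=2)    # fault-tolerant, partitioned collection
squares = rdd.map(lambda x: x * x)              # transformation: lazy, nothing runs yet
total = squares.reduce(lambda a, b: a + b)      # action: triggers the distributed computation
print(total)                                    # 285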
Apache Spark Streaming
● Streaming computation;
● Mini-batch strategy;
● Latency depends on the mini-batch processing time/size;
● Easy to combine with batch strategy;
● Fault tolerance;
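A minimal sketch of the mini-batch streaming API (illustrative only; the project itself trains in batch mode):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-demo")
ssc = StreamingContext(sc, batchDuration=5)          # 5-second mini-batches
lines = ssc.socketTextStream("localhost", 9999)      # ingest a text stream
counts = lines.flatMap(lambda l: l.split()) \
              .map(lambda w: (w, 1)) \
              .reduceByKey(lambda a, b: a + b)       # same API style as batch RDDs
counts.pprint()
ssc.start()
ssc.awaitTermination()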
Apache Spark
● API for many languages:
○ Java;
○ Python;
○ Scala;
○ R;
● Runs on:
○ Hadoop;
○ Mesos;
○ Standalone;
○ Cloud;
● It can access diverse data sources including:
○ HDFS;
○ Cassandra;
○ HBase;
Tensorflow
● Numerical computation library;
● Computation is graph-based:
○ Nodes are mathematical operations;
○ Edges are I/O multidimensional arrays (tensors);
● Distributed on multiple CPU/GPU;
● API:
○ Python;
○ C++;
● Open source;
● A Google product;
Tensorflow
● Data Flow Graph:
○ Directed graph;
○ Nodes are mathematical operations or data I/O;
○ Edges are I/O tensors;
○ Operations are asynchronous and parallel:
■ Performed once all input tensors are available;
● Flexible and easily extensible;
● Auto-differentiation;
● Lazy computation;
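A minimal sketch of the data-flow-graph and lazy-execution model (TensorFlow 1.x-style API, current when this work was done; not the project code):

import tensorflow as tf

a = tf.placeholder(tf.float32, shape=[2, 2])    # graph node: input tensor
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])       # graph node: constant tensor
c = tf.matmul(a, b)                             # graph node: operation, nothing computed yet

with tf.Session() as sess:                      # the graph only runs inside a session
    result = sess.run(c, feed_dict={a: [[1.0, 2.0], [3.0, 4.0]]})
    print(result)                               # [[1. 2.] [3. 4.]]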
RNN-LSTM
● Recurrent Neural Network;
● Cyclic networks:
○ At each training step the output of the previous step is used to feed the same layer together with new input data;
● The input Xt is transformed by the hidden layer A, whose output is also fed back into itself;
*Image from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
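In formula form, the simple-RNN recurrence behind this picture is h_t = tanh(W_xh · x_t + W_hh · h_(t-1) + b_h): the same weights are applied to the new input and to the previous output at every step.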
RNN-LSTM
● Recurrent Neural Network;
● Cyclic networks:
○ At each training step the output of the previous step is used to feed the same layer together with new input data;
● Unrolled network:
○ Each input feeds the network;
○ The output is passed to the next step as supplementary input data;
*Image from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN-LSTM
● This kind of network has a major problem:
○ It is unable to learn long data sequences;
○ It only works over short time spans;
● A ‘long memory’ model is needed:
○ Long short-term memory (LSTM);
● The hidden layer is able to memorize long data sequences using:
○ Current input;
○ Previous output;
○ Network memory state;
*Image from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN-LSTM
● The hidden layer is able to memorize long data sequences using:
○ Current input;
○ Previous output;
○ Network memory state;
● Four ‘gate layers’ to preserve information:
○ Forget gate layer;
○ Input gate layer;
○ ‘Candidate’ gate layer;
○ Output gate layer;
● Multiple activation functions:
○ Sigmoid for the forget, input and output gate layers;
○ Tanh for the candidate layer (and to squash the cell state in the output);
*Image from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
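For reference, the standard LSTM update equations behind these gates (following the formulation in the cited post; σ is the sigmoid, ⊙ the element-wise product):

f_t = σ(W_f · [h_(t-1), x_t] + b_f)         (forget gate)
i_t = σ(W_i · [h_(t-1), x_t] + b_i)         (input gate)
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)      (candidate)
C_t = f_t ⊙ C_(t-1) + i_t ⊙ C̃_t             (new memory state)
o_t = σ(W_o · [h_(t-1), x_t] + b_o)         (output gate)
h_t = o_t ⊙ tanh(C_t)                        (new output)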
Implementation
● RNN-LSTM:
○ Distributed on Spark;
○ Mathematical operations with Tensorflow;
● Distribution of mini-batch computation:
○ Each partition takes care of a subset of the whole dataset;
○ Each subset has the same size; this is not strictly required by the mini-batch strategy (given proper techniques), but we want to test the performance of all partitions under a balanced load;
● Tensorflow provides several LSTM implementations, but the network was implemented from scratch for learning purposes;
Implementation
● A master driver splits the input data into partitions organized by key:
○ Input data is shuffled and normalized;
○ Each partition will have its own RDD;
● Each Spark worker runs an entire LSTM training cycle:
○ We will have as many LSTMs as partitions;
○ It is possible to choose the number of epochs, the number of hidden layers, the number of partitions, the memory to assign to each worker, and many other parameters;
● At the end of the training step, the returned RDD is mapped into a key-value data structure with the weight and bias values;
● Finally, all elements in the RDDs are averaged to obtain the final result (sketched below);
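A minimal sketch of the driver flow just described, assuming hypothetical helpers (load_shuffled_data, run_lstm_training) that stand in for the repository code:

from pyspark import SparkContext
import numpy as np

def train_partition(rows):
    # each Spark worker trains a full LSTM on its own subset of the data
    rows = list(rows)
    weights, biases = run_lstm_training(rows, epochs=100, hidden_units=32)  # hypothetical routine
    yield {"weights": weights, "biases": biases}

sc = SparkContext(appName="LSTM-TensorSpark-sketch")
data = load_shuffled_data()                              # hypothetical: shuffled, normalized input
num_partitions = 5
rdd = sc.parallelize(data, numSlices=num_partitions)     # one equally sized subset per partition
models = rdd.mapPartitions(train_partition).collect()    # one trained LSTM per partition

# the final model is the average of the parameters returned by all partitions
final_weights = np.mean([m["weights"] for m in models], axis=0)
final_biases = np.mean([m["biases"] for m in models], axis=0)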
Implementation
● A new LSTM is created with Tensorflow mathematical operations:
○ Operations are executed in a lazy manner;
○ Initialization builds and organizes the data graph;
● Weights and biases are initialized randomly;
● An optimizer is chosen and an OutputLayer is instantiated;
● Because of the lazy strategy, all operations must be placed inside a ‘session’:
○ The session handles initialization ops and graph execution;
○ All variables must be initialized before any run;
● Taking advantage of Python function passing, all computation layers are performed with a single method:
○ Each call uses a different activation function and the appropriate variables (see the sketch below);
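A minimal sketch of that single-method idea (TensorFlow 1.x-style graph construction; names are illustrative, not the repository's):

import tensorflow as tf

n_input, n_hidden = 4, 8
x_t = tf.placeholder(tf.float32, [None, n_input])        # current input
h_prev = tf.placeholder(tf.float32, [None, n_hidden])    # previous output

def gate_vars():
    # weights and biases are initialized randomly
    W = tf.Variable(tf.random_normal([n_input, n_hidden]))
    U = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
    b = tf.Variable(tf.zeros([n_hidden]))
    return W, U, b

def layer(x, h, variables, activation):
    # one method for every gate layer: only the variables and the activation change
    W, U, b = variables
    return activation(tf.matmul(x, W) + tf.matmul(h, U) + b)

forget_gate = layer(x_t, h_prev, gate_vars(), tf.sigmoid)
input_gate = layer(x_t, h_prev, gate_vars(), tf.sigmoid)
candidate = layer(x_t, h_prev, gate_vars(), tf.tanh)
output_gate = layer(x_t, h_prev, gate_vars(), tf.sigmoid)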
Implementation
● At the end, minimization is performed:
○ The loss function is computed in the output layer;
○ Minimization uses Tensorflow auto-differentiation;
● The resulting data are organized in a key-value structure with the weights and biases;
● It is also possible to perform data evaluation, but since it is not a time-consuming task it is not reported here.
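A self-contained toy sketch of this last step, with a tiny linear model standing in for the LSTM's output layer, just to show loss → minimize → key-value export (TensorFlow 1.x-style API):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
pred = tf.matmul(x, W) + b

loss = tf.reduce_mean(tf.square(pred - y))                          # loss computed in the output layer
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)    # gradients via auto-differentiation

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    data_x = np.array([[0.0], [1.0], [2.0]])
    data_y = 2.0 * data_x + 1.0
    for _ in range(200):
        sess.run(train_op, feed_dict={x: data_x, y: data_y})
    # key-value structure with the trained parameters, as returned by each partition
    params = {"weights": sess.run(W), "biases": sess.run(b)}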
Results
● Tested locally in a multicore environment:
○ A distributed environment was not available;
○ Each partition is assigned to a core;
● No GPU usage;
● Iris dataset*;
● Overloaded CPUs vs Idle CPUs;
● 12 Core - 64GB RAM;
* http://archive.ics.uci.edu/ml/datasets/Iris
Results
● 3 partitions:
Partition Exec. time (s) Exec. time (min)
1 1385.62 ~23
2 1675.76 ~28
3 1692.48 ~28
Tot+weight average 1704.81 ~28
Tot+repartition 1704.81 ~28
Results
● 5 partitions:
Partition Exec. time (s) Exec. time (min)
1 867.18 ~14
2 834.31 ~14
3 995.37 ~16
4 970.46 ~16
5 1015.47 ~17
Tot+weight average 1023.43 ~17
Tot+repartition 1023.43 ~17
Results
● 15 partitions:
Part. Exec. time (s) Exec. time (min) Part. Exec. time (s) Exec. time (min) Part. Exec. time (s) Exec. time (min)
1 476.76 ~8 6 482.82 ~8 11 458.05 ~8
2 448.91 ~7 7 499.66 ~8 12 504.85 ~8
3 472.05 ~8 8 454.78 ~8 13 470.93 ~8
4 493.39 ~8 9 479.61 ~8 14 450.84 ~8
5 485.66 ~8 10 493.21 ~8 15 454.29 ~8
Tot+weight average 510.89 ~9
Tot+repartition 510.89 ~9
Results
● Comparison with the non-distributed implementations:
System Exec. time (s) Exec. time (min) Speed-up vs local-mb-10 Speed-up vs local
dist-3 1704.81 ~28 96% 61%
dist-5 1023.91 ~17 97% 76%
dist-15 510.89 ~9 98% 88%
local-opt 4080.94 ~68 89% 6%
local 4335.66 ~72 88% -
local-mb-10 34699.58 ~578 - -
local: non-distributed implementation
local-opt: non-distributed, optimized implementation
local-mb-10: non-distributed implementation with mini-batches of 10 elements (like the dist-15 organization)
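The speed-up columns appear to be computed as 1 - T_system / T_reference; e.g. for dist-15: 1 - 510.89/34699.58 ≈ 98% against local-mb-10, and 1 - 510.89/4335.66 ≈ 88% against local.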
Results
● 3 partitions [overloaded vs idle]:
Part. Exec. time busy (s) Exec. time busy (min) Exec. time idle (s) Exec. time idle (min)
1 2679.76 ~44 1385.62 ~23
2 2910.69 ~48 1675.76 ~28
3 3063.88 ~51 1692.48 ~28
Tot 3078.15 ~51 1704.81 ~28
Results
● 5 partitions [overloaded vs idle]:
Part. Exec. time busy (s) Exec. time busy (min) Exec. time idle (s) Exec. time idle (min)
1 1356.44 ~22 867.18 ~14
2 1358.28 ~22 834.31 ~14
3 1373.25 ~22 995.37 ~16
4 1370.11 ~23 970.46 ~16
5 1372.25 ~23 1015.47 ~17
Tot 1393.91 ~23 1023.43 ~17
