Strata Beijing 2017: Jumpy, a Python interface for Nd4j

Adam Gibson

This talk covers Jumpy (https://github.com/deeplearning4j/jumpy), our new Python interface.

Who are we?
This slide shows that GPUs should complement the big data stack in the Hadoop ecosystem rather than try to replace Hadoop and related tools outright. Wholesale replacement of the big data stack would be cost-prohibitive for many clients. We believe the right approach is to sell GPUs for accelerated computation and a few other use cases. That is our beachhead. (Obviously, the widening functionality of Volta will change the GPU ecosystem.)
● Founded 2014
● Distributed worldwide
● Lots of activity in China

Most JVM Python interfaces
● Network based: they require a gateway and py4j
● Tons of overhead, often a bottleneck in real Spark jobs
● Place a focus on “pushing logic down to Scala”
● Don’t interop well with the existing Python ecosystem
● Often have API compatibility issues
● “Good enough” for basic use cases despite the overhead

Basic facts about overhead
● In-depth paper: https://arxiv.org/pdf/1612.01437.pdf
● Python vs. Scala: Python is 15x slower
● Much of this is due to network traffic
● Serialization is another big problem
● Imagine saving objects every time you run compute.
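
To make the serialization point concrete, here is a minimal sketch using only the Python standard library. It is not taken from the talk or from py4j: `pickle` stands in for any serialize-and-send bridge, `memoryview` stands in for shared memory, and the payload size and variable names are assumptions chosen for illustration.

```python
import pickle
from array import array

# Hypothetical payload: 100,000 doubles standing in for a tensor.
data = array("d", range(100_000))

# Serialization path: each hand-off encodes and decodes the whole payload,
# which produces a second, independent copy of the data.
encoded = pickle.dumps(data)
copied = pickle.loads(encoded)
copied[0] = 42.0
serialized_copy_is_independent = data[0] == 0.0

# Shared-memory path: a memoryview exposes the same buffer without copying.
view = memoryview(data)
view[1] = 7.0
view_shares_memory = data[1] == 7.0

print(len(encoded), serialized_copy_is_independent, view_shares_memory)
```

The encoded payload is at least as large as the raw buffer, and that cost is paid on every call that crosses the bridge, whereas the shared view costs nothing per call.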

Distributed Deep Learning bottlenecks
● Network overhead from parameter servers
● Data movement between CPU and GPU
● Buffer allocation for compute
● Data loading and input creation (creating tensors from data)

Linear Algebra in Python
● C based internally
● Python is just an interface
● Libraries tend to interop with numpy pointers directly
● Supports CPU and GPU
● For DL, often varied engines (MPI, gRPC, ...)
● Often extended in C

Linear Algebra in Spark
● Based on Breeze and netlib-java (not maintained anymore, limited to CPU)
● Most routines are Scala based
● On-heap memory (bad for latency)
● CUDA support is sparse at best
● Doesn’t conform with industry standards (Python)
● Not meant for heavy compute (hardware acceleration)
● Relies on Spark for most ops (you can’t do this with deep learning)

Minor conclusions
● One of these is not like the other
● Hard to interop with the Python ecosystem
● Spark tries to be something it’s not with regard to linear algebra
● Spark should do data loading, not linear algebra, which is better handled by C++ (SIMD, GPUs, ...)
● Alternatives are needed: more specialization, with a focus on C++ with Pythonic conventions

Nd4j
● Java-based API, C++ core
● Own off-heap memory management (even for GPU)
● Soon: autodiff, graph execution (graph of operations), and sparse support
● Similar architecture to numpy (easy interop) (http://nd4j.org/userguide)
● Works with BLAS/LAPACK
● Generally faster than numpy even from Python (as we’ll see soon)
● It’s not Python though!
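
One way to read "similar architecture to numpy" is that both libraries describe an n-dimensional array as a flat buffer plus a shape and strides. The sketch below illustrates that layout with the standard library only. `StridedView` is a hypothetical class written for this example, not an Nd4j or numpy type, and strides are counted in elements rather than bytes.

```python
from array import array


class StridedView:
    """Hypothetical 2-D view over a flat buffer (not an Nd4j or numpy class)."""

    def __init__(self, buffer, shape, strides, offset=0):
        self.buffer = buffer
        self.shape = shape
        self.strides = strides  # counted in elements, not bytes
        self.offset = offset

    def __getitem__(self, index):
        row, col = index
        position = self.offset + row * self.strides[0] + col * self.strides[1]
        return self.buffer[position]

    def transpose(self):
        # Swapping shape and strides reinterprets the same buffer; nothing is copied.
        return StridedView(self.buffer, self.shape[::-1], self.strides[::-1], self.offset)


flat = array("d", [0, 1, 2, 3, 4, 5])
m = StridedView(flat, shape=(2, 3), strides=(3, 1))  # row-major 2x3 layout
t = m.transpose()
print(m[1, 2], t[2, 1])
```

Because the two sides agree on this description, handing an array across the language boundary only requires passing the buffer address, shape, and strides rather than converting the data.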

Nd4j Parameter Server
Aeron: more stable latency than gRPC and way faster (25x!) than TF

Jumpy: A better Python interface
● Low latency using C internally
● Interfaces nd4j <-> numpy via direct pointers
● Syntax sugar similar to numpy
● Uses pyjnius underneath (https://github.com/kivy/pyjnius)
● pyjnius starts and manages a JVM for you and interops via JNI and Cython
● Easy to extend
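
The "direct pointers" idea can be illustrated without a JVM. The following sketch is not Jumpy's API: it uses the standard `array` and `ctypes` modules to show two handles sharing one block of memory through its raw address, which is the general mechanism the slide refers to. All names are assumptions for illustration.

```python
import ctypes
from array import array

# Stand-in for a native buffer owned by one side of the bridge.
values = array("d", [1.0, 2.0, 3.0, 4.0])

# buffer_info() returns (address, item_count) for the underlying memory.
address, count = values.buffer_info()

# Wrap the same address as a C double array; no data is copied.
shared = (ctypes.c_double * count).from_address(address)

# A write through the pointer-backed view is visible in the original array.
shared[0] = 10.0
doubled = [x * 2 for x in shared]
print(values[0], doubled)
```

The caveat with this approach is lifetime: the pointer-backed view is only valid while the original buffer is alive and has not been resized, so a real bridge has to manage ownership carefully.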

Conclusions and future work
● No networks! An actual path to improvement
● Reflection can be a bottleneck
● Like most useful things in Python, most of it is C!
● Plans to optimize pyjnius itself
● Can enable us to interop with other parts of the Python ecosystem

3. Skymind in China

13. Jumpy examples

14. Thanks! Join our QQ group:
