SlideShare ist ein Scribd-Unternehmen logo
1 von 46
Downloaden Sie, um offline zu lesen
Dr. Christoph Angerer, Manager AI Developer Technologies, NVIDIA
RAPIDS, FOSDEM’19
2
Healthcare Industrial Consumer Internet Automotive Ad Tech /
MarTech
Retail Financial / Insurance
HPC & AI TRANSFORMS INDUSTRIES
Computational & Data Scientists Are Driving Change
3
DATA SCIENCE IS NOT A LINEAR PROCESS
It Requires Exploration and Iterations
All
Data
ETL
Manage Data
Structured
Data Store
Data
Preparation
Training
Model
Training
Visualization
Evaluate
Inference
Deploy
Accelerating`Model Training` only does have benefit but doesn’t address the whole problem
Iterate …
Cross Validate …
Grid Search …
Iterate some more.
4
DAY IN THE LIFE
Or: Why did I want to become a Data Scientist?
Data Scientist are valued resources.
Why not give them the environment to
be more productiveconfigure ETL workflow
5
PERFORMANCE AND DATA GROWTH
Post-Moore's law
Moore's law is no longer a predictor of capacity in CPU
market growth
Distributing CPUs exacerbates the problem
Data sizes continue to grow
6
TRADITIONAL
DATA SCIENCE
CLUSTER
Workload Profile:
Fannie Mae Mortgage Data:
• 192GB data set
• 16 years, 68 quarters
• 34.7 Million single family mortgage loans
• 1.85 Billion performance records
• XGBoost training set: 50 features
300 Servers | $3M | 180 kW
7
GPU-ACCELERATED
MACHINE
LEARNING
CLUSTER
1 DGX-2 | 10 kW
1/8 the Cost | 1/15 the Space
1/18 the Power
NVIDIA Data Science Platform
with DGX-2
0 2,000 4,000 6,000 8,000 10,000
20 CPU Nodes
30 CPU Nodes
50 CPU Nodes
100 CPU Nodes
DGX-2
5x DGX-1
End-to-End
8
DELIVERING DATA SCIENCE VALUE
Maximized Productivity Top Model Accuracy Lowest TCO
215xSpeedup Using RAPIDS
with XGBoost
Oak Ridge
National Labs
$1BPotential Saving with
4% Error Rate Reduction
Global
Retail Giant
$1.5MInfrastructure
Cost Saving
Streaming Media
Company
9
DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
DATA
DATA PREPARATION
GPUs accelerated compute for in-memory data preparation
Simplified implementation using familiar data science tools
Python drop-in Pandas replacement built on CUDA C++. GPU-accelerated Spark (in development)
PREDICTIONS
10
DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
MODEL TRAINING
GPU-acceleration of today’s most popular ML algorithms
XGBoost, PCA, K-means, k-NN, DBScan, tSVD …
DATA PREDICTIONS
11
DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
VISUALIZATION
Effortless exploration of datasets, billions of records in milliseconds
Dynamic interaction with data = faster ML model development
Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS
DATA PREDICTIONS
12
Faster Data Access Less Data Movement
THE EFFECTS OF END-TO-END ACCELERATION
25-100x Improvement
Less code
Language flexible
Primarily In-Memory
HDFS
Read
HDFS
Write
HDFS
Read
HDFS
Write
HDFS
ReadQuery ETL ML Train
HDFS
Read Query ETL ML Train
HDFS
Read
GPU
Read
Query
CPU
Write
GPU
Read
ETL
CPU
Write
GPU
Read
ML
Train
Arrow
Read
Query ETL
ML
Train
5-10x Improvement
More code
Language rigid
Substantially on GPU
50-100x Improvement
Same code
Language flexible
Primarily on GPU
RAPIDS
GPU/Spark In-Memory Processing
Hadoop Processing, Reading from disk
Spark In-Memory Processing
13
ADDRESSING
CHALLENGES
IN GPU
ACCELERATED
DATA SCIENCE
• Too much data movement
• Too many makeshift data
formats
• Writing CUDA C/C++ is
involved
• No Python API for data
manipulation
Yes GPUs are fast but …
14
APP A
DATA MOVEMENT AND TRANSFORMATION
The bane of productivity and performance
CPU GPU
APP B
Read Data
Copy & Convert
Copy & Convert
Copy & Convert
Load Data
APP A GPU
Data
APP B
GPU
Data
APP A
APP B
15
APP A
DATA MOVEMENT AND TRANSFORMATION
What if we could keep data on the GPU?
APP B
Copy & Convert
Copy & Convert
Copy & Convert
APP A GPU
Data
APP B
GPU
Data
Read Data
Load Data
APP B
CPU GPU
APP A
16
LEARNING FROM APACHE ARROW
From Apache Arrow Home Page - https://arrow.apache.org/
17
CUDA DATA FRAMES IN PYTHON
GPUs at your Fingertips
Illustrations from https://changhsinlee.com/pyspark-dataframe-basics/
18
RAPIDS
OPEN GPU DATA SCIENCE
19
RAPIDS
Open GPU Data Science
APPLICATIONS
SYSTEMS
ALGORITHMS
CUDA
ARCHITECTURE
• Learn what the data science community needs
• Use best practices and standards
• Build scalable systems and algorithms
• Test Applications and workflows
• Iterate
20
cuDF
Analytics
GPU Memory
Data Preparation VisualizationModel Training
cuML
Machine Learning
cuGraph
Graph Analytics
PyTorch & Chainer
Deep Learning
Kepler.GL
Visualization
RAPIDS COMPONENTS
DASK
21
cuDF
Analytics
GPU Memory
Data Preparation VisualizationModel Training
cuML
Machine Learning
cuGraph
Graph Analytics
PyTorch & Chainer
Deep Learning
Kepler.GL
Visualization
CUML & CUGRAPH
DASK
22
AI LIBRARIES
Accelerating more of the AI ecosystem
Graph Analytics is fundamental to network analysis
Machine Learning is fundamental to prediction,
classification, clustering, anomaly detection and
recommendations.
Both can be accelerated with NVIDIA GPU
8x V100 20-90x faster than dual socket CPU
Decisions Trees
Random Forests
Linear Regressions
Logistics Regressions
K-Means
K-Nearest Neighbor
DBSCAN
Kalman Filtering
Principal Components
Single Value Decomposition
Bayesian Inferencing
PageRank
BFS
Jaccard Similarity
Single Source Shortest Path
Triangle Counting
Louvain Modularity
ARIMA
Holt-Winters
Machine Learning Graph Analytics
Time Series
XGBoost, Mortgage Dataset, 90x
3 Hours to 2 mins on 1 DGX-1
cuML & cuGraph
23
CUDF + XGBOOST
DGX-2 vs Scale Out CPU Cluster
• Full end to end pipeline
• Leveraging Dask + PyGDF
• Store each GPU results in sys mem then read back in
• Arrow to Dmatrix (CSR) for XGBoost
24
CUDF + XGBOOST
Scale Out GPU Cluster vs DGX-2
0 50 100 150 200 250 300 350
5xDGX-1
DGX-2
Chart Title
ETL+CSV (s) ML Prep (s) ML (s)
• Full end to end pipeline
• Leveraging Dask for multi-node + PyGDF
• Store each GPU results in sys mem then read back in
• Arrow to Dmatrix (CSR) for XGBoost
25
CUML
Benchmarks of initial algorithms
26
NEAR FUTURE WORK ON CUML
Additional algorithms in development right now
K-means - Released
K-NN - Released
Kalman filter – v0.5
GLM – v0.5
Random Forests - v0.6
ARIMA – v0.6
UMAP – v0.6
Collaborative filtering – Q2 2019
27
CUGRAPH
GPU-Accelerated Graph Analytics Library
Coming Soon:
Full NVGraph Integration Q1 2019
28
cuDF
Analytics
GPU Memory
Data Preparation VisualizationModel Training
cuML
Machine Learning
cuGraph
Graph Analytics
PyTorch & Chainer
Deep Learning
Kepler.GL
Visualization
CUDF
DASK
29
CUDF
GPU DataFrame library
• Apache Arrow data format
• Pandas-like API
• Unary and Binary Operations
• Joins / Merges
• GroupBys
• Filters
• User-Defined Functions (UDFs)
• Accelerated file readers
• Etc.
30
CUDF
Today
CUDA With Python Bindings
• Low level library containing function
implementations and C/C++ API
• Importing/exporting Apache Arrow using the
CUDA IPC mechanism
• CUDA kernels to perform element-wise math
operations on GPU DataFrame columns
• CUDA sort, join, groupby, and reduction
operations on GPU DataFrames
• A Python library for manipulating GPU
DataFrames
• Python interface to CUDA C++ with additional
functionality
• Creating Apache Arrow from Numpy arrays,
Pandas DataFrames, and PyArrow Tables
• JIT compilation of User-Defined Functions
(UDFs) using Numba
31
GPU-Accelerated string functions with a Pandas-like API
CUSTRING & NVSTRING
• API and functionality is following Pandas:
https://pandas.pydata.org/pandas-
docs/stable/api.html#string-handling
• lower()
• ~22x speedup
• find()
• ~40x speedup
• slice()
• ~100x speedup
0.00
100.00
200.00
300.00
400.00
500.00
600.00
700.00
800.00
lower() find(#) slice(1,15)
milliseconds
Pandas cudastrings
32
cuDF
Analytics
GPU Memory
Data Preparation VisualizationModel Training
cuML
Machine Learning
cuGraph
Graph Analytics
PyTorch & Chainer
Deep Learning
Kepler.GL
Visualization
DASK
DASK
33
DASK
What is Dask and why does RAPIDS use it for scaling out?
• Dask is a distributed computation scheduler
built to scale Python workloads from laptops to
supercomputer clusters.
• Extremely modular with scheduling, compute,
data transfer, and out-of-core handling all being
disjointed allowing us to plug in our own
implementations.
• Can easily run multiple Dask workers per node
to allow for an easier development model of
one worker per GPU regardless of single node
or multi node environment.
34
DASK
Scale up and out with cuDF
• Use cuDF primitives underneath in map-reduce style
operations with the same high level API
• Instead of using typical Dask data movement of
pickling objects and sending via TCP sockets, take
advantage of hardware advancements using a
communications framework called OpenUCX:
• For intranode data movement, utilize NVLink
and PCIe peer-to-peer communications
• For internode data movement, utilize GPU
RDMA over Infiniband and RoCE
https://github.com/rapidsai/dask_gdf
35
DASK
Scale up and out with cuML
• Native integration with Dask + cuDF
• Can easily use Dask workers to initialize NCCL for
optimized gather / scatter operations
• Example: this is how the dask-xgboost included
in the container works for multi-GPU and multi-
node, multi-GPU
• Provides easy to use, high level primitives for
synchronization of workers which is needed for many
ML algorithms
36
LOOKING TO THE
FUTURE
37
Next few months
GPU DATAFRAME
• Continue improving performance and functionality
• Single GPU
• Single node, multi GPU
• Multi node, multi GPU
• String Support
• Support for specific “string” dtype with GPU-accelerated functionality similar to Pandas
• Accelerated Data Loading
• File formats: CSV, Parquet, ORC – to start
38
CPUs bottleneck data loading in high throughput systems
ACCELERATED DATA LOADING
• CSV Reader
• Follows API of pandas.read_csv
• Current implementation is >10x speed
improvement over pandas
• Parquet Reader
• Work in progress:
https://github.com/gpuopenanalytics/li
bgdf/pull/85
• Will follow API of pandas.read_parquet
• ORC Reader
• Additionally looking towards GPU-accelerating
decompression for common compression
schemes
Source: Apache Crail blog: SQL Performance: Part 1 - Input File Formats
39
PYTHON CUDA ARRAY INTERFACE
Interoperability for Python GPU Array Libraries
• The CUDA array interface is a standard format
that describes a GPU array to allow sharing
GPU arrays between different libraries without
needing to copy or convert data
• Numba, CuPy, and PyTorch are the first
libraries to adopt the interface:
• https://numba.pydata.org/numba-
doc/dev/cuda/cuda_array_interface.html
• https://github.com/cupy/cupy/releases/tag/
v5.0.0b4
• https://github.com/pytorch/pytorch/pull/119
84
40
CONCLUDING
REMARKS
41
A DAY IN THE LIFE
Or: Why did I want to become a Data Scientist?
42
A DAY IN THE LIFE
Or: Why did I want to become a Data Scientist?
A: For the Data Science. And coffee.
43
ONE ARCHITECTURE FOR HPC AND DATA
SCIENCE
Simulation Data Analytics Visualization
44
• https://ngc.nvidia.com/registry/nvidia-
rapidsai-rapidsai
• https://hub.docker.com/r/rapidsai/rapidsai/
• https://github.com/rapidsai
• https://anaconda.org/rapidsai/
• WIP:
• https://pypi.org/project/cudf
• https://pypi.org/project/cuml
RAPIDS
How do I get the software?
45
JOIN THE MOVEMENT
Everyone Can Help!
Integrations, feedback, documentation support, pull requests, new issues, or code donations welcomed!
APACHE ARROW GPU Open Analytics
Initiative
https://arrow.apache.org/
@ApacheArrow
http://gpuopenanalytics.com/
@GPUOAI
RAPIDS
https://rapids.ai
@RAPIDSAI
THANK YOU

Weitere ähnliche Inhalte

Was ist angesagt?

[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita
[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita
[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita
Insight Technology, Inc.
 
Glusterfs 파일시스템 구성_및 운영가이드_v2.0
Glusterfs 파일시스템 구성_및 운영가이드_v2.0Glusterfs 파일시스템 구성_및 운영가이드_v2.0
Glusterfs 파일시스템 구성_및 운영가이드_v2.0
sprdd
 
[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...
[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...
[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...
Insight Technology, Inc.
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ
Takashi Hoshino
 

Was ist angesagt? (20)

Hadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイントHadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイント
 
Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
[db tech showcase Tokyo 2014] B26: PostgreSQLを拡張してみよう by SRA OSS, Inc. 日本支社 高塚遥
[db tech showcase Tokyo 2014] B26: PostgreSQLを拡張してみよう  by SRA OSS, Inc. 日本支社 高塚遥[db tech showcase Tokyo 2014] B26: PostgreSQLを拡張してみよう  by SRA OSS, Inc. 日本支社 高塚遥
[db tech showcase Tokyo 2014] B26: PostgreSQLを拡張してみよう by SRA OSS, Inc. 日本支社 高塚遥
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita
[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita
[B15] HiRDBのSQL実行プランはどのように決定しているのか?by Masaaki Narita
 
Scalar DB: Universal Transaction Manager
Scalar DB: Universal Transaction ManagerScalar DB: Universal Transaction Manager
Scalar DB: Universal Transaction Manager
 
コンテナ仮想、その裏側 〜user namespaceとrootlessコンテナ〜
コンテナ仮想、その裏側 〜user namespaceとrootlessコンテナ〜コンテナ仮想、その裏側 〜user namespaceとrootlessコンテナ〜
コンテナ仮想、その裏側 〜user namespaceとrootlessコンテナ〜
 
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEUnderstanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
Glusterfs 파일시스템 구성_및 운영가이드_v2.0
Glusterfs 파일시스템 구성_및 운영가이드_v2.0Glusterfs 파일시스템 구성_및 운영가이드_v2.0
Glusterfs 파일시스템 구성_및 운영가이드_v2.0
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
 
[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...
[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...
[D34] Shared Nothingなのに、Active-Activeクラスタ? ~ 高いスケーラビリティを誇る日立国産DBMS「HiRDB」のクラス...
 
CephFS Update
CephFS UpdateCephFS Update
CephFS Update
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road AheadAmazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
ストリーム処理プラットフォームにおけるKafka導入事例 #kafkajp
ストリーム処理プラットフォームにおけるKafka導入事例 #kafkajpストリーム処理プラットフォームにおけるKafka導入事例 #kafkajp
ストリーム処理プラットフォームにおけるKafka導入事例 #kafkajp
 

Ähnlich wie NVIDIA Rapids presentation

RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
Connected Data World
 

Ähnlich wie NVIDIA Rapids presentation (20)

GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
 
AI + E-commerce
AI + E-commerceAI + E-commerce
AI + E-commerce
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptxOpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
Phi Week 2019
Phi Week 2019Phi Week 2019
Phi Week 2019
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 

Kürzlich hochgeladen

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 

Kürzlich hochgeladen (20)

Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 

NVIDIA Rapids presentation

  • 1. Dr. Christoph Angerer, Manager AI Developer Technologies, NVIDIA RAPIDS, FOSDEM’19
  • 2. 2 Healthcare Industrial Consumer Internet Automotive Ad Tech / MarTech Retail Financial / Insurance HPC & AI TRANSFORMS INDUSTRIES Computational & Data Scientists Are Driving Change
  • 3. 3 DATA SCIENCE IS NOT A LINEAR PROCESS It Requires Exploration and Iterations All Data ETL Manage Data Structured Data Store Data Preparation Training Model Training Visualization Evaluate Inference Deploy Accelerating`Model Training` only does have benefit but doesn’t address the whole problem Iterate … Cross Validate … Grid Search … Iterate some more.
  • 4. 4 DAY IN THE LIFE Or: Why did I want to become a Data Scientist? Data Scientist are valued resources. Why not give them the environment to be more productiveconfigure ETL workflow
  • 5. 5 PERFORMANCE AND DATA GROWTH Post-Moore's law Moore's law is no longer a predictor of capacity in CPU market growth Distributing CPUs exacerbates the problem Data sizes continue to grow
  • 6. 6 TRADITIONAL DATA SCIENCE CLUSTER Workload Profile: Fannie Mae Mortgage Data: • 192GB data set • 16 years, 68 quarters • 34.7 Million single family mortgage loans • 1.85 Billion performance records • XGBoost training set: 50 features 300 Servers | $3M | 180 kW
  • 7. 7 GPU-ACCELERATED MACHINE LEARNING CLUSTER 1 DGX-2 | 10 kW 1/8 the Cost | 1/15 the Space 1/18 the Power NVIDIA Data Science Platform with DGX-2 0 2,000 4,000 6,000 8,000 10,000 20 CPU Nodes 30 CPU Nodes 50 CPU Nodes 100 CPU Nodes DGX-2 5x DGX-1 End-to-End
  • 8. 8 DELIVERING DATA SCIENCE VALUE Maximized Productivity Top Model Accuracy Lowest TCO 215xSpeedup Using RAPIDS with XGBoost Oak Ridge National Labs $1BPotential Saving with 4% Error Rate Reduction Global Retail Giant $1.5MInfrastructure Cost Saving Streaming Media Company
  • 9. 9 DATA SCIENCE WORKFLOW WITH RAPIDS Open Source, End-to-end GPU-accelerated Workflow Built On CUDA DATA DATA PREPARATION GPUs accelerated compute for in-memory data preparation Simplified implementation using familiar data science tools Python drop-in Pandas replacement built on CUDA C++. GPU-accelerated Spark (in development) PREDICTIONS
  • 10. 10 DATA SCIENCE WORKFLOW WITH RAPIDS Open Source, End-to-end GPU-accelerated Workflow Built On CUDA MODEL TRAINING GPU-acceleration of today’s most popular ML algorithms XGBoost, PCA, K-means, k-NN, DBScan, tSVD … DATA PREDICTIONS
  • 11. 11 DATA SCIENCE WORKFLOW WITH RAPIDS Open Source, End-to-end GPU-accelerated Workflow Built On CUDA VISUALIZATION Effortless exploration of datasets, billions of records in milliseconds Dynamic interaction with data = faster ML model development Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS DATA PREDICTIONS
  • 12. 12 Faster Data Access Less Data Movement THE EFFECTS OF END-TO-END ACCELERATION 25-100x Improvement Less code Language flexible Primarily In-Memory HDFS Read HDFS Write HDFS Read HDFS Write HDFS ReadQuery ETL ML Train HDFS Read Query ETL ML Train HDFS Read GPU Read Query CPU Write GPU Read ETL CPU Write GPU Read ML Train Arrow Read Query ETL ML Train 5-10x Improvement More code Language rigid Substantially on GPU 50-100x Improvement Same code Language flexible Primarily on GPU RAPIDS GPU/Spark In-Memory Processing Hadoop Processing, Reading from disk Spark In-Memory Processing
  • 13. 13 ADDRESSING CHALLENGES IN GPU ACCELERATED DATA SCIENCE • Too much data movement • Too many makeshift data formats • Writing CUDA C/C++ is involved • No Python API for data manipulation Yes GPUs are fast but …
  • 14. 14 APP A DATA MOVEMENT AND TRANSFORMATION The bane of productivity and performance CPU GPU APP B Read Data Copy & Convert Copy & Convert Copy & Convert Load Data APP A GPU Data APP B GPU Data APP A APP B
  • 15. 15 APP A DATA MOVEMENT AND TRANSFORMATION What if we could keep data on the GPU? APP B Copy & Convert Copy & Convert Copy & Convert APP A GPU Data APP B GPU Data Read Data Load Data APP B CPU GPU APP A
  • 16. 16 LEARNING FROM APACHE ARROW From Apache Arrow Home Page - https://arrow.apache.org/
  • 17. 17 CUDA DATA FRAMES IN PYTHON GPUs at your Fingertips Illustrations from https://changhsinlee.com/pyspark-dataframe-basics/
  • 19. 19 RAPIDS Open GPU Data Science APPLICATIONS SYSTEMS ALGORITHMS CUDA ARCHITECTURE • Learn what the data science community needs • Use best practices and standards • Build scalable systems and algorithms • Test Applications and workflows • Iterate
  • 20. 20 cuDF Analytics GPU Memory Data Preparation VisualizationModel Training cuML Machine Learning cuGraph Graph Analytics PyTorch & Chainer Deep Learning Kepler.GL Visualization RAPIDS COMPONENTS DASK
  • 21. 21 cuDF Analytics GPU Memory Data Preparation VisualizationModel Training cuML Machine Learning cuGraph Graph Analytics PyTorch & Chainer Deep Learning Kepler.GL Visualization CUML & CUGRAPH DASK
  • 22. 22 AI LIBRARIES Accelerating more of the AI ecosystem Graph Analytics is fundamental to network analysis Machine Learning is fundamental to prediction, classification, clustering, anomaly detection and recommendations. Both can be accelerated with NVIDIA GPU 8x V100 20-90x faster than dual socket CPU Decisions Trees Random Forests Linear Regressions Logistics Regressions K-Means K-Nearest Neighbor DBSCAN Kalman Filtering Principal Components Single Value Decomposition Bayesian Inferencing PageRank BFS Jaccard Similarity Single Source Shortest Path Triangle Counting Louvain Modularity ARIMA Holt-Winters Machine Learning Graph Analytics Time Series XGBoost, Mortgage Dataset, 90x 3 Hours to 2 mins on 1 DGX-1 cuML & cuGraph
  • 23. 23 CUDF + XGBOOST DGX-2 vs Scale Out CPU Cluster • Full end to end pipeline • Leveraging Dask + PyGDF • Store each GPU results in sys mem then read back in • Arrow to Dmatrix (CSR) for XGBoost
  • 24. 24 CUDF + XGBOOST Scale Out GPU Cluster vs DGX-2 0 50 100 150 200 250 300 350 5xDGX-1 DGX-2 Chart Title ETL+CSV (s) ML Prep (s) ML (s) • Full end to end pipeline • Leveraging Dask for multi-node + PyGDF • Store each GPU results in sys mem then read back in • Arrow to Dmatrix (CSR) for XGBoost
  • 26. 26 NEAR FUTURE WORK ON CUML Additional algorithms in development right now K-means - Released K-NN - Released Kalman filter – v0.5 GLM – v0.5 Random Forests - v0.6 ARIMA – v0.6 UMAP – v0.6 Collaborative filtering – Q2 2019
  • 27. 27 CUGRAPH GPU-Accelerated Graph Analytics Library Coming Soon: Full NVGraph Integration Q1 2019
  • 28. 28 cuDF Analytics GPU Memory Data Preparation VisualizationModel Training cuML Machine Learning cuGraph Graph Analytics PyTorch & Chainer Deep Learning Kepler.GL Visualization CUDF DASK
  • 29. 29 CUDF GPU DataFrame library • Apache Arrow data format • Pandas-like API • Unary and Binary Operations • Joins / Merges • GroupBys • Filters • User-Defined Functions (UDFs) • Accelerated file readers • Etc.
  • 30. 30 CUDF Today CUDA With Python Bindings • Low level library containing function implementations and C/C++ API • Importing/exporting Apache Arrow using the CUDA IPC mechanism • CUDA kernels to perform element-wise math operations on GPU DataFrame columns • CUDA sort, join, groupby, and reduction operations on GPU DataFrames • A Python library for manipulating GPU DataFrames • Python interface to CUDA C++ with additional functionality • Creating Apache Arrow from Numpy arrays, Pandas DataFrames, and PyArrow Tables • JIT compilation of User-Defined Functions (UDFs) using Numba
  • 31. 31 GPU-Accelerated string functions with a Pandas-like API CUSTRING & NVSTRING • API and functionality is following Pandas: https://pandas.pydata.org/pandas- docs/stable/api.html#string-handling • lower() • ~22x speedup • find() • ~40x speedup • slice() • ~100x speedup 0.00 100.00 200.00 300.00 400.00 500.00 600.00 700.00 800.00 lower() find(#) slice(1,15) milliseconds Pandas cudastrings
  • 32. 32 cuDF Analytics GPU Memory Data Preparation VisualizationModel Training cuML Machine Learning cuGraph Graph Analytics PyTorch & Chainer Deep Learning Kepler.GL Visualization DASK DASK
  • 33. 33 DASK What is Dask and why does RAPIDS use it for scaling out? • Dask is a distributed computation scheduler built to scale Python workloads from laptops to supercomputer clusters. • Extremely modular with scheduling, compute, data transfer, and out-of-core handling all being disjointed allowing us to plug in our own implementations. • Can easily run multiple Dask workers per node to allow for an easier development model of one worker per GPU regardless of single node or multi node environment.
  • 34. 34 DASK Scale up and out with cuDF • Use cuDF primitives underneath in map-reduce style operations with the same high level API • Instead of using typical Dask data movement of pickling objects and sending via TCP sockets, take advantage of hardware advancements using a communications framework called OpenUCX: • For intranode data movement, utilize NVLink and PCIe peer-to-peer communications • For internode data movement, utilize GPU RDMA over Infiniband and RoCE https://github.com/rapidsai/dask_gdf
  • 35. 35 DASK Scale up and out with cuML • Native integration with Dask + cuDF • Can easily use Dask workers to initialize NCCL for optimized gather / scatter operations • Example: this is how the dask-xgboost included in the container works for multi-GPU and multi- node, multi-GPU • Provides easy to use, high level primitives for synchronization of workers which is needed for many ML algorithms
  • 37. 37 Next few months GPU DATAFRAME • Continue improving performance and functionality • Single GPU • Single node, multi GPU • Multi node, multi GPU • String Support • Support for specific “string” dtype with GPU-accelerated functionality similar to Pandas • Accelerated Data Loading • File formats: CSV, Parquet, ORC – to start
  • 38. 38 CPUs bottleneck data loading in high throughput systems ACCELERATED DATA LOADING • CSV Reader • Follows API of pandas.read_csv • Current implementation is >10x speed improvement over pandas • Parquet Reader • Work in progress: https://github.com/gpuopenanalytics/li bgdf/pull/85 • Will follow API of pandas.read_parquet • ORC Reader • Additionally looking towards GPU-accelerating decompression for common compression schemes Source: Apache Crail blog: SQL Performance: Part 1 - Input File Formats
  • 39. 39 PYTHON CUDA ARRAY INTERFACE Interoperability for Python GPU Array Libraries • The CUDA array interface is a standard format that describes a GPU array to allow sharing GPU arrays between different libraries without needing to copy or convert data • Numba, CuPy, and PyTorch are the first libraries to adopt the interface: • https://numba.pydata.org/numba- doc/dev/cuda/cuda_array_interface.html • https://github.com/cupy/cupy/releases/tag/ v5.0.0b4 • https://github.com/pytorch/pytorch/pull/119 84
  • 41. 41 A DAY IN THE LIFE Or: Why did I want to become a Data Scientist?
  • 42. 42 A DAY IN THE LIFE Or: Why did I want to become a Data Scientist? A: For the Data Science. And coffee.
  • 43. 43 ONE ARCHITECTURE FOR HPC AND DATA SCIENCE Simulation Data Analytics Visualization
  • 44. 44 • https://ngc.nvidia.com/registry/nvidia- rapidsai-rapidsai • https://hub.docker.com/r/rapidsai/rapidsai/ • https://github.com/rapidsai • https://anaconda.org/rapidsai/ • WIP: • https://pypi.org/project/cudf • https://pypi.org/project/cuml RAPIDS How do I get the software?
  • 45. 45 JOIN THE MOVEMENT Everyone Can Help! Integrations, feedback, documentation support, pull requests, new issues, or code donations welcomed! APACHE ARROW GPU Open Analytics Initiative https://arrow.apache.org/ @ApacheArrow http://gpuopenanalytics.com/ @GPUOAI RAPIDS https://rapids.ai @RAPIDSAI