This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/rKoBJcnsFpM
Speaker's Bio:
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo, where he is still based, to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects. In his spare time he tries to be part of the IT community by organizing, attending and speaking at conferences and meetups.
3. Moore’s Law
[Plot: 40 Years of Microprocessor Trend Data, 1980-2020, log scale — transistor counts (thousands) vs. single-threaded performance, which grew 1.5X per year before slowing to 1.1X per year. Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; new plot and data for 2010-2015 collected by K. Rupp.]
4. GPU
[Plot: the same microprocessor trend data with GPU-computing performance overlaid — growing 1.5X per year, on track for 1000X by 2025, while single-threaded performance grows only 1.1X per year. Alongside: the GPU software stack — APPLICATIONS / SYSTEMS / ALGORITHMS / CUDA / ARCHITECTURE.]
5. GPU architecture
Low latency vs. high throughput

GPU
• Optimized for data-parallel, throughput computation
• Architecture tolerant of memory latency
• More transistors dedicated to computation

CPU
• Optimized for low-latency access to cached data sets
• Control logic for out-of-order and speculative execution
13. H2O4GPU
• Open-Source: https://github.com/h2oai/h2o4gpu
• Collection of important ML algorithms ported to the GPU (with CPU fallback option):
• Gradient Boosted Machines
• GLM
• Truncated SVD
• PCA
• KMeans
• (soon) Field Aware Factorization Machines
• Performance optimized, multi-GPU support (certain algorithms)
• Used within our own Driverless AI Product to boost performance 30X
• Scikit-Learn compatible Python API (and now R API)
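"Scikit-learn compatible" means h2o4gpu estimators follow the familiar construct-with-hyperparameters, then fit/predict contract, so they can be dropped into existing sklearn-style code by swapping an import. A minimal sketch of that contract (the `MeanRegressor` below is a toy stand-in written for illustration, not part of h2o4gpu):

```python
# Illustrative sketch of the scikit-learn estimator contract that
# h2o4gpu mirrors. Code written against this contract works unchanged
# whether the estimator runs on CPU (scikit-learn) or GPU (h2o4gpu).

class MeanRegressor:
    """Toy estimator following the sklearn fit/predict convention."""

    def fit(self, X, y):
        # Learn a single parameter: the mean of the targets.
        self.mean_ = sum(y) / len(y)
        return self  # returning self enables fit(...).predict(...)

    def predict(self, X):
        return [self.mean_ for _ in X]

model = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
preds = model.predict([[5]])  # [2.0]
```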
14. Gradient Boosting Machines
• Based upon XGBoost
• Raw floating-point data is binned into quantiles
• Quantiles are stored in compressed form instead of as floats
• Compressed quantiles are transferred to the GPU efficiently
• Sparsity is handled directly, with high GPU efficiency
• Multi-GPU support by sharding rows using NVIDIA NCCL AllReduce
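The binning step above can be sketched in plain Python: each raw floating-point value is replaced by a small integer quantile index, which compresses well and is cheap to transfer to the GPU (a simplified sketch; XGBoost's actual histogram construction is more involved):

```python
import bisect

def quantile_bins(values, n_bins):
    """Compute quantile cut points for a column of raw floats."""
    ordered = sorted(values)
    # Pick n_bins - 1 evenly spaced order statistics as cut points.
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

def bin_value(x, cuts):
    """Map a raw float to its small integer bin index."""
    return bisect.bisect_right(cuts, x)

values = [0.1, 3.7, 2.2, 9.9, 5.0, 4.1, 7.3, 1.8]
cuts = quantile_bins(values, n_bins=4)
binned = [bin_value(x, cuts) for x in values]
# Each value is now an integer in [0, 3] instead of a float, so the
# column can be stored compactly and copied to the GPU efficiently.
```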
16. KMeans
• Significantly faster than Scikit-learn implementation (up to 50x)
• Significantly faster than other GPU implementations (5x-10x)
• Supports kmeans|| initialization
• Supports multiple GPUs by sharding the dataset
• Supports batching data if it exceeds GPU memory
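The multi-GPU sharding above can be sketched as follows: each device holds a shard of the rows, computes partial centroid sums for its shard, and an AllReduce-style step combines the partial results into new centroids (a pure-Python sketch; h2o4gpu does the reduction across real GPUs with NCCL):

```python
def assign(point, centroids):
    """Index of the nearest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2
                                 for p, c in zip(point, centroids[k])))

def local_stats(shard, centroids):
    """Per-shard partial sums and counts, as each GPU would compute."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for point in shard:
        j = assign(point, centroids)
        counts[j] += 1
        for i, p in enumerate(point):
            sums[j][i] += p
    return sums, counts

def all_reduce_update(shards, centroids):
    """Combine partial results (the AllReduce step) into new centroids."""
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for shard in shards:  # one entry per "device"
        sums, counts = local_stats(shard, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(d):
                total_sums[j][i] += sums[j][i]
    return [[s / total_counts[j] for s in total_sums[j]]
            if total_counts[j] else centroids[j] for j in range(k)]

# Rows sharded across two hypothetical devices:
shards = [[[0.0, 0.0], [0.2, 0.0]], [[1.0, 1.0], [0.8, 1.0]]]
centroids = all_reduce_update(shards, [[0.0, 0.0], [1.0, 1.0]])
# -> [[0.1, 0.0], [0.9, 1.0]]
```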
18. Truncated SVD & PCA
• Matrix decomposition
• Popular for text processing and dimensionality reduction
• GPU-optimized linear algebra operations
19. Truncated SVD & PCA
• The intrinsic dimensionality of certain datasets is much lower than the original (e.g. here 4096 vs. an actual ~200)
• PCA can reduce the dimensionality while preserving most of the explained variance
• Better input for further modeling: takes less time
21. Field Aware Factorization Machines
* under development
• Click-Through Rate (CTR):
• One of the most important tasks in computational advertising
• The percentage of users who actually click on ads
• Until recently solved with logistic regression, which is bad at finding feature conjunctions (it learns the effect of each variable or feature individually)
Clicked | Publisher (P) | Advertiser (A) | Gender (G)
--------|---------------|----------------|-----------
Yes     | ESPN          | Nike           | Male
No      | NBC           | Adidas         | Male
22. Field Aware Factorization Machines
* under development
• Separates the data into fields (Publisher, Advertiser, Gender) and features (ESPN, NBC, Adidas, Nike, Male, Female)
• Uses a latent space for each pair to generate the model
• Used to win first prize in three CTR competitions hosted by Criteo, Avazu, and Outbrain, as well as third prize in the RecSys Challenge 2015.
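The "latent space for each pair" works like this: for every pair of active features, FFM takes the dot product of the vector that feature j1 keeps for the *field* of j2 with the vector j2 keeps for the field of j1. A sketch of the interaction term with made-up latent vectors (all numeric values below are hypothetical, chosen only for illustration; the linear term and the sigmoid are omitted):

```python
def ffm_score(active, latent):
    """Sum of FFM latent interactions over all pairs of active features.

    active: list of (field, feature) for the features set to 1.
    latent: dict mapping (feature, other_field) -> latent vector.
    """
    score = 0.0
    for i in range(len(active)):
        for j in range(i + 1, len(active)):
            f1, j1 = active[i]
            f2, j2 = active[j]
            v1 = latent[(j1, f2)]  # j1's vector for j2's field
            v2 = latent[(j2, f1)]  # j2's vector for j1's field
            score += sum(a * b for a, b in zip(v1, v2))
    return score

# Hypothetical 2-dimensional latent vectors for the ESPN/Nike/Male row:
latent = {
    ("ESPN", "Advertiser"): [0.3, 0.1], ("ESPN", "Gender"): [0.2, 0.4],
    ("Nike", "Publisher"): [0.5, 0.2], ("Nike", "Gender"): [0.1, 0.3],
    ("Male", "Publisher"): [0.4, 0.1], ("Male", "Advertiser"): [0.2, 0.2],
}
row = [("Publisher", "ESPN"), ("Advertiser", "Nike"), ("Gender", "Male")]
score = ffm_score(row, latent)
```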
23. More info
• Code: http://github.com/h2oai/h2o4gpu
• Questions:
• https://stackoverflow.com/questions/tagged/h2o4gpu
• https://gitter.im/h2oai/h2o4gpu