[2C5] Map-D: A GPU Database for Interactive Big Data Analytics
1. map-D
Todd Mostak | todd@map-d.com
@datarefined | #mapd
www.map-d.com
1830 Sansome St., San Francisco, CA 94104
2. map-D? A super-fast database built into GPU memory
Do? The world's fastest real-time big data analytics and interactive visualization
Demo? A Twitter analytics platform: 1 billion+ tweets, queried in milliseconds
3. The importance of interactivity
People have struggled for a long time to build interactive
visualizations of big data that can deliver insight
Interactivity means:
• Hypothesis testing can occur at “speed of thought”
How interactive is interactive enough?
• According to a study by Jeffrey Heer and Zhicheng Liu, “an injected
delay of half a second per operation adversely affects user
performance in exploratory data analysis.”
• Some types of latency are more detrimental than others:
• For example, linking and brushing are more sensitive to delay than zooming
4. Strategies for interactivity
• Sampling:
• Ex. BlinkDB
• Issues:
• Need a statistically robust sampling method
• Sampling can miss “long-tail” phenomena
• Pre-computation:
• Ex. imMens (data cubing)
• Issues:
• Can only show what the curator thought was relevant
• Can only store a certain number of binned attributes
• Must be curated!
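The trade-off between the two strategies can be sketched in a few lines of plain Python (illustrative only; this is not BlinkDB or imMens code): a uniform sample tends to drop rare, long-tail values entirely, while a pre-computed cube keeps exact counts but only for the attributes someone chose to bin ahead of time.

```python
import random

random.seed(42)
# Synthetic event log: 1,000,000 common events plus 50 rare "long-tail" events.
events = ["common"] * 1_000_000 + ["rare"] * 50
random.shuffle(events)

# Strategy 1: uniform 0.1% sample (BlinkDB-style approximation, simplified).
sample = [e for e in events if random.random() < 0.001]
rare_in_sample = sample.count("rare")
# Expected rare count in the sample is ~0.05, so the long tail
# usually vanishes from the sample entirely.

# Strategy 2: pre-computation (imMens-style data cube, simplified).
# Exact counts survive, but only for the attribute that was binned.
cube = {}
for e in events:
    cube[e] = cube.get(e, 0) + 1

print(rare_in_sample, cube["rare"])  # the cube always reports rare = 50
```

The cube keeps the long tail, but at the cost named on the slide: it can only answer questions about attributes the curator binned in advance.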
5. The Arrival of In-Memory Systems
• Traditional RDBMS used to be too slow to serve as a back-end
for interactive visualizations.
• Queries of over a billion records could take minutes if not
hours
• But in-memory systems can execute such queries in a fraction
of the time.
• Both full DBMS and “pseudo”-DBMS solutions
• But still often too slow
8. Core Innovation
SQL-enabled column store database built into the memory
architecture on GPUs and CPUs
Code developed from scratch to take advantage of:
• Memory and computational bandwidth of multiple GPUs
• Heterogeneous architectures (CPUs and GPUs)
• Fast RDMA between GPUs on different nodes
• GPU Graphics pipeline
Double-level buffer pool across GPU and CPU memory
Shared scans – multiple queries of the same data can share
memory bandwidth
System can scan data at 2TB/sec per node, with 10TB/sec per
node logical throughput with shared scans
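The shared-scan idea above can be sketched as follows (a toy single-threaded Python model, not Map-D's implementation): several queued filter queries are evaluated during one pass over a column, so the data is read from memory once instead of once per query, which is how logical throughput can exceed physical scan bandwidth.

```python
def shared_scan(column, predicates):
    """One pass over `column`; each pending query gets its own result count."""
    counts = [0] * len(predicates)
    for value in column:              # the column is read from memory once...
        for i, pred in enumerate(predicates):
            if pred(value):           # ...but serves every queued query
                counts[i] += 1
    return counts

prices = [3, 18, 7, 42, 11, 5, 29]
queries = [
    lambda v: v > 10,      # e.g. SELECT COUNT(*) WHERE price > 10
    lambda v: v < 6,       # e.g. SELECT COUNT(*) WHERE price < 6
    lambda v: v % 2 == 0,  # e.g. SELECT COUNT(*) WHERE price is even
]
print(shared_scan(prices, queries))  # → [4, 2, 2]
```

With five queries sharing one 2 TB/sec physical scan, each query effectively sees the full scan rate, giving the ~10 TB/sec logical figure on the slide.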
11. Shared Nothing Processing
Multiple GPUs, with data partitioned between them; each node applies the same filter to its own partition:
• Node 1: Filter text ILIKE ‘rain’
• Node 2: Filter text ILIKE ‘rain’
• Node 3: Filter text ILIKE ‘rain’
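A minimal sketch of the shared-nothing pattern (plain Python; the round-robin partitioning and the '%rain%' substring semantics are illustrative assumptions, not Map-D internals): each node filters only its own partition with no shared state, and a coordinator merges the partial results.

```python
def ilike_rain(text):
    """Case-insensitive substring match, roughly SQL: text ILIKE '%rain%'."""
    return "rain" in text.lower()

tweets = [
    "Rain again in SF",
    "sunny all day",
    "TRAINING for the marathon",  # '%rain%' also matches inside 'tRAINing'
    "light rain tonight",
    "clear skies",
    "Brain teaser of the day",
]

# Partition the table round-robin across 3 "nodes".
nodes = [tweets[i::3] for i in range(3)]

# Each node filters its own partition independently (no shared state)...
per_node = [[t for t in part if ilike_rain(t)] for part in nodes]

# ...and the coordinator merges the partial results.
merged = [t for hits in per_node for t in hits]
print(len(merged))  # 4 matching tweets
```

Because the predicate touches each row exactly once and no node needs another node's data, throughput scales with the number of GPUs.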
13. Product: a GPU-powered end-to-end big data analytics and visualization platform
• Complex analytics: image processing, visualization, machine learning, graph analytics
• GPU in-memory SQL database: SQL compiler, shared scans, user-defined functions
• GPU graphics pipeline: OpenGL, H.264/VP8 streaming
• Hybrid GPU/CPU execution: OpenCL and CUDA; scales to a cluster of GPU nodes
• Licensing: simple, by # of GPUs; mobile/server versions
14. Map-D hardware architecture
• Small Data: NVIDIA Tegra mobile chip, 4GB memory; Map-D code integrated into chip memory; mobile Map-D running small datasets as a native app
• Large Data: single GPU (12GB memory, 8 cards = 4U box) or single CPU (768GB memory, 4 sockets = 4U box); Map-D code integrated into GPU or CPU memory
• Big Data: 36U rack with ~400GB GPU memory and ~12TB CPU memory, plus next-gen flash (40TB, 100GB/s); Map-D code runs on GPU + CPU memory; delivered as a web-based service