SlideShare ist ein Scribd-Unternehmen logo
1 von 76
Downloaden Sie, um offline zu lesen
Machine Learning at the Limit
John Canny*^
* Computer Science Division
University of California, Berkeley
^ Yahoo Research Labs
@Alpine Labs, March, 2015
Outline
Scaling Inward:
• What are the performance limits for machine learning on
single machines, and how close can we get to them?
(Single-node Roofline Design)
Scaling Out:
• What happens when we try to scale up from optimized
single nodes to clusters?
(Roofline Design for Clusters)
My Other Job(s)
Yahoo [Chen, Pavlov, Canny, KDD 2009]*
Ebay [Chen, Canny, SIGIR 2011]**
Quantcast 2011-2013
Microsoft 2014
Yahoo 2015
* Best application paper prize
** Best paper honorable mention
Data Scientist’s Workflow
Digging Around
in Data
Hypothesize
Model
Customize
Large Scale
Exploitation
Evaluate
Interpret
Sandbox
Production
Data Scientist’s Workflow
Digging Around
in Data
Hypothesize
Model
Customize
Large Scale
Exploitation
Evaluate
Interpret
Sandbox
Production
Why Build a New ML Toolkit?
• Performance: GPU performance pulling away from other
platforms for *sparse* and dense data.
Minibatch + SGD methods dominant on Big Data,…
• Customizability: Great value in customizing models (loss
functions, constraints,…)
• Explore/Deploy: Explore fast, run the same code in
prototype and production. Be able to run on clusters.
Desiderata
• Performance:
• Roofline Design (single machine and cluster)
• General Matrix Library with full CPU/GPU acceleration
• Customizability:
• Modular Learner Architecture (reusable components)
• Likelihood “Mixins”
• Explore/Deploy:
• Interactive, Scriptable, Graphical
• JVM based (Scala) w/ optimal cluster primitives
Outline
Scaling Inward:
• What are the performance limits for machine learning on
single machines, and how close can we get to them?
(Single-node Roofline Design)
Scaling Out:
• What happens when we try to scale up from optimized
single nodes to clusters?
(Roofline Design for Clusters)
Roofline Design (Williams, Waterman, Patterson, 2009)
• Roofline design establishes fundamental performance
limits for a computational kernel.
Operational Intensity (flops/byte)
Throughput(gflops)
1
10
100
1000
0.01 0.1 1 10 100
GPU ALU throughput
CPU ALU throughput
1000
A Tale of Two Architectures
Intel CPU NVIDIA GPU
Memory Controller
L3 Cache
Core
ALU
Core
ALU
Core
ALU
Core
ALU
L2 Cache
ALU ALU ALU ALU ALU ALU
CPU vs GPU Memory Hierarchy
Intel 8 core Sandy Bridge CPU NVIDIA GK110 GPU
8 MB L3 Cache
1.5 MB L2 Cache
4 MB register file (!)4kB registers:
1 MB Shared Mem
2 MB L2 Cache
512K L1 Cache
1 MB Constant Mem
1 TB/s1 TB/s
40 TB/s
13 TB/s
5 TB/s
10s GB Main Memory 4 GB Main Memory
20 GB/s
500 GB/s
500 GB/s
200 GB/s
Roofline Design – Matrix kernels
• Dense matrix multiply
• Sparse matrix multiply
Operational Intensity (flops/byte)
Throughput(gflops)
1
10
100
1000
0.01 0.1 1 10 100
GPU ALU throughput
CPU ALU throughput
1000
DataSource
(JBOD disks)
Learner
Model
Optimizer
Mixins
Model
Optimizer
Mixins
GPU 1 thread 1
GPU 2 thread 2
:
:
CPU host code
data
blocks
DataSource
(Memory)
DataSource
HDFS over
network
Zhao+Canny
SIAM DM 13, KDD 13, BIGLearn 13
A Rooflined Machine Learning Toolkit
Compressed disk streaming at
~ 0.1-2 GB/s  100 HDFS nodes
30 Gflops to
2 Teraflops per GPU
Matrix + Machine Learning Layers
Written in the beautiful Scala language:
• Interpreter with JIT, scriptable.
• Open syntax +,-,*, ,, etc, math looks like math.
• Java VM + Java codebase – runs on Hadoop, Yarn, Spark.
• Hardware acceleration in C/C++ native code (CPU/GPU).
• Easy parallelism: Actors, parallel collections.
• Memory management (sort of ).
• Pre-built for multiple Platforms (Windows, MacOS, Linux).
Experience similar to Matlab, R, SciPy
Benchmarks
Recent benchmarks on some representative tasks:
• Text Classification on Reuters news data (0.5 GB)
• Click prediction on the Kaggle Criteo dataset (12 GB)
• Clustering of handwritten digit images (MNIST) (25 GB)
• Collaborative filtering on the Netflix prize dataset (4 GB)
• Topic modeling (LDA) on a NY times collection (0.5 GB)
• Random Forests on a UCI Year Prediction dataset (0.2 GB)
• Pagerank on two social network graphs at 12GB and 48GB
Benchmarks
Systems (single node)
• BIDMach
• VW (Vowpal Wabbit) from Yahoo/Microsoft
• Scikit-Learn
• LibLinear
Cluster Systems
• Spark v1.1 and v1.2
• Graphlab (academic version)
• Yahoo’s LDA cluster
Benchmarks: Single-Machine Systems
System Algorithm Dataset Dim Time
(s)
Cost
($)
Energy
(KJ)
BIDMach Logistic Reg. RCV1 103 14 0.002 3
Vowpal
Wabbit
Logistic Reg. RCV1 103 130 0.02 30
LibLinear Logistic Reg. RCV1 103 250 0.04 60
Scikit-Learn Logistic Reg. RCV1 103 576 0.08 120
RCV1: Text Classification, 103 topics (0.5GB).
Algorithms were tuned to achieve similar accuracy.
Benchmarks: Cluster Systems
System A/B Algorithm Dataset Dim Time
(s)
Cost
($)
Energy
(KJ)
Spark-72
BIDMach
Logistic Reg. RCV1 1
103
30
14
0.07
0.002
120
3
Spark-64
BIDMach
RandomForest YearPred 1 280
320
0.48
0.05
480
60
Spark-128
BIDMach
Logistic Reg. Criteo 1 400
81
1.40
0.01
2500
16
Spark-XX = System with XX cores
BIDMach ran on one node with GTX-680 GPU
Benchmarks: Cluster Systems
System A/B Algorithm Dataset Dim Time
(s)
Cost
($)
Energy
(KJ)
Spark-384
BIDMach
K-Means MNIST 4096 1100
735
9.00
0.12
22k
140
GraphLab-576
BIDMach
Matrix
Factorization
Netflix 100 376
90
16
0.015
10k
20
Yahoo-1000
BIDMach
LDA (Gibbs) NYtimes 1024 220k
300k
40k
60
4E10
6E7
Spark-XX or GraphLab-XX = System with XX cores
Yahoo-1000 had 1000 nodes
BIDMach at Scale
Latent Dirichlet Allocation
BIDMach outperforms cluster systems on this problem, and
has run up to 10 TB on one node.
Convergence on 1TB data
Benchmark Summary
• BIDMach is at least 10x faster than other single-machine
systems for comparable accuracy.
• For Random Forests or single-class regression, BIDMach on
a GPU node is comparable with 8-16 worker clusters.
• For multi-class regression, factor models, clustering etc.,
GPU-assisted BIDMach is comparable to 100-1000-worker
clusters. Larger problems correlate with larger values in this
range.
Cost and Energy Use
• BIDMach had a 10x-1000x cost advantage over the other
systems. The ratio was higher for larger-scale problems.
• Energy savings were similar to the cost savings, at 10x-
1000x.
In the Wild (Examples from Industry)
• Multilabel regression problem (summer intern project):
• Existing tool (single-machine) took ~ 1 week to build a model.
• BIDMach on a GPU node takes 1 hour (120x speedup)
• Iteration and feature engineering gave +15% accuracy.
• Auction simulation problem (cluster job):
• Existing tool simulates auction variations on log data.
• On NVIDIA 3.0 devices (64 registers/thread) we achieve a 70x
speedup over a reference implementation in Scala
• On NVIDIA 3.5 devices (256 registers/thread) we can move
auction state entirely into register storage and gain a 400x
speedup.
In the Wild (Examples from Industry)
• Classification (cluster job):
• Cluster job (logistic regression) took 8 hours.
• BIDMach version takes < 1 hour on a single node.
• SVMs for image classification (single machine)
• Large multi-label classification took 1 week with LibSVM.
• BIDMach version (SGD-based SVM) took 90 seconds.
Performance Revisited
• BIDMach had a 10x-1000x cost advantage over the other
systems. The ratio was higher for larger-scale problems.
• Energy savings were similar to the cost savings, at 10x-
1000x.
But why??
• We only expect about 10x
from GPU acceleration?
The Exponential Growth of O(1)
Once upon a Time…
a = b + c // single instruction
y = A[i] // single instruction
Now five orders
of magnitude
The Exponential Growth of O(1)
By slight abuse of notation, let O(1) denote the ratio of times
for a main memory access vs an ALU operation.
Then O(1) is not only not “small,” but its growing exponentially
with time.
A lot of non-rooflined code is too “memory-hungry” – you can
often do (much) better with custom, memory-aware kernels.
Log (Memaccess/ALUop)
Time (Years)
Performance Creep
• Code that isnt explicitly rooflined seems to be creeping
further and further from its roofline limit.
• With Intel’s profiling tools we can check the throughput of a variety of
running programs – often 10s–100s of Mflops – memory bound.
• Need well-designed kernels (probably hardware-aware
code) to get to the limits.
• Coding for performance:
Avoid:
Large hash tables
Large binary trees
Linked data structures
Use:
Sorting
Memory B-trees
Packed data structures
BIDMach MLAlgorithms
1. Regression (logistic, linear)
2. Support Vector Machines
3. k-Means Clustering
4. Topic Modeling - Latent Dirichlet Allocation
5. Collaborative Filtering
6. NMF – Non-Negative Matrix Factorization
7. Factorization Machines
8. Random Forests
9. Multi-layer neural networks
10. IPTW (Causal Estimation)
11. ICA
= Likely the fastest implementation available
CPU vs GPU Memory Hierarchy
Intel 8 core Sandy Bridge CPU NVIDIA GK110 GPU
8 MB L3 Cache
1.5 MB L2 Cache
4 MB register file (!)4kB registers:
1 MB Shared Mem
2 MB L2 Cache
512K L1 Cache
1 MB Constant Mem
1 TB/s1 TB/s
40 TB/s
13 TB/s
5 TB/s
10s GB Main Memory 4 GB Main Memory
20 GB/s
500 GB/s
500 GB/s
200 GB/s
CoDesign: Gibbs Sampling
• Gibbs sampling is a very general method for inference in
machine learning, but typically slow.
• But it’s a particularly good match
for SIMT hardware (GPUs), not
difficult to get 10-100x speedups.
• But many popular models have slow convergence:
• Latent Dirichlet Allocation
• 10 passes of online Variational Bayes ≈
• 1000 passes for collapsed Gibbs sampling
Consequences
LDA graphical model:
• Topic samples Z are discrete and sparse: very few samples
for a specific topic in a particular document.
• This gives a high-variance random walk through parameter
space before  converges.
Documents
Gibbs Parameter Estimation
Gibbs sampling in ML often applied to distributions of the
form
P(D,Z,)
Where D is the data,  are the parameters of the model, and
X are other hidden states we would like to integrate over.
e.g for LDA,  are the ,  parameters and Z are
assignments of words to topics.
• Gibbs sampling most often simply samples over both, which
is slow.
• But we really want to optimize over , and marginalize
(integrate) over X.
State-Augmented Marginal Estimation (SAME)
– Doucet et al. 2002
The marginal parameter distribution is 𝑃 Θ = 𝑃 𝑋, Θ 𝑑𝑋
We can sharpen it to 𝑃 𝑘
Θ as follows
𝑃 𝑘 Θ = … 𝑃 𝑋, Θ
𝑍𝑌𝑋
𝑃 𝑌, Θ … 𝑃 𝑍, Θ 𝑑𝑋𝑑𝑌 … 𝑑𝑍
𝑃 𝑘
Θ has the same peaks as 𝑃 Θ and corresponds to a
Gibbs distribution cooled to T = 1/k.
We modify a standard, blocked Gibbs sampler to take k
samples instead of 1 for X, and then recompute Θ from
these samples.
SAME Sampling
In the language of graphical models:
Run independent simulations with tied parameters Θ
Θ
SAME sampling as cooling
What cooling does:
Likelihood P() in model parameter space (peaks are good
models)
SAME sampling as cooling
What cooling does:
Likelihood P() in model parameter space (peaks are good
models)
SAME sampling as cooling
What cooling does:
Likelihood P() in model parameter space (peaks are good
models)
Making SAME sampling fast
The approach in general:
• Wherever you would take 1 sample before, take k
independent samples and average for parameter estimation.
Instead of taking distinct samples take counts of
anonymous samples – complexity independent of k
• Exact for LDA, HMM, Factorization Machines,…
• Corresponds to factored joint approximation for other
models.
Varying k allows for annealing optimization over .
Results on LDA
• We use k=100, make  10 passes over the dataset
(effectively taking 1000 total samples).
• Our SAME Gibbs sampler is within a factor of 3 of online
Variational Bayes. More than 100x faster than standard
Gibbs samplers.
• The model was more accurate than any other LDA method
we tested:
SAME sampling in general
• SAME sampling is an alternative to other message-passing
methods (VMP etc.)
• But rather than using bounds on posterior distributions it:
• Samples from the distribution of “uninteresting” variables
• Sharpens the posteriors on parameter variables.
Current Work
Inference on general graphical
models with Gibbs sampling.
Larger graphical models arise
in many applications but are
discarded for performance
reasons.
Existing tools (JAGS, Stan, Infer.net) are far from the roofline
for this problem.
Inference for Bayesian Networks
• (Main memory) Rooflined Approach: Sampling distribution
can be computed with an SpMM – 1-2GF on CPU and 5-
10GF on GPU.
• Experiment on MOOC learning concept graph with ~300
nodes and ~350 edges, 3000 samples
Getting to Warp Speed
The model for this problem
(CPTs for all nodes) fits
comfortably in GPU SHMEM.
The state of thousands of
particles fits comfortably in
GPU register memory.
It should be possible to gain another 10x to 100x in
performance with this design, and approach ALU limits.
4 MB registers
1 MB Shared Mem
Outline
Scaling Inward:
• What are the performance limits for machine learning on
single machines, and how close can we get to them?
(Single-node Roofline Design)
Scaling Out:
• What happens when we try to scale up from optimized
single nodes to clusters?
(Roofline Design for Clusters)
Challenges with Scaling MB algorithms
Can we combine node acceleration with clustering?
Its harder than it seems:
Dataset
Minibatch
updates
Single-node Minibatch Learner
Dataset partitions across nodes
Cluster-based Learner
Reduce (Allreduce)
Allreduce
Every node j has data Uj (a local copy of a global model).
We want to compute an aggregate (e.g. sum) of all data
vectors and distribute it to all nodes, i.e. to synchronize the
model across nodes
Why Allreduce?
• Speed: Minibatch SGD and related algorithms are the
fastest way to solve many of these problems (deep
learning, CF, topic modeling,…). But dataset sizes (several
TB) make single machine training slow (> 1 day).
• The more updates/sec the faster the method converges,
but:
• In practice minibatch dof ~ model dof is near-optimal.
• That means the ideal B/W for model updates is at least the
rate of data consumption (several Gb/s).
• In other words, we want an Allreduce that consumes data
at roughly the full NIC bandwidth.
Why Allreduce?
• Scale: Some models wont fit on one machine (e.g.
Pagerank).
• Other models will fit but minibatch updates only touch a
small fraction of features (Click Prediction). Updating only
those is much faster than a full Allreduce.
• Sparse Allreduce: Each node j pushes updates to a
subset of features Uj and pulls a subset Vj from a
distributed model.
Why Allreduce?
• Worker driven (emergency room model) vs. aggregator-
driven (doctor’s surgery)?
• “Excuse me, worker task, the aggregator will consume
your data now”
Lower Bounds for total message B/W.
• Intuitively, every node needs to send D values
somewhere, and receive D values for the final aggregate.
• The data from each node needs to be caught somewhere
if allreduce happens in the same network, and the final
aggregate message need to originate from somewhere.
• This suggests a naïve lower bound of 2D values in or out
of each node, or 2DN for an N-node network.
• In fact more careful analysis shows the lower bound is
2D(N-1) for total bandwidth, and ~ 2D/B for latency:
Patarasuk et al. Bandwidth optimal all-reduce algorithms for clusters of
workstations
Why not Parameter servers?
• Potentially B/W optimal, but latency limited by net B/W of
servers:
Clients
servers
This network has at least 2D(N-1)/PB latency with P
servers, suboptimal by (N-1)/P.
A Roofline For networks
• Optimal operating point
Packet size (kBytes)
Throughput(MB/s)
1
10
100
1000
1 10 100 1000 10000
Network capacity
100000
10000
Allreduce (example: Click Prediction)
• To run the most efficient solver (distributed SGD), we need to
synchronize models across the network.
• An Allreduce (reduce+broadcast) operation both
synchronizes and combines the updates from each node.
• For Logistic Regression, SVM and many other problems, we
need a Sparse Allreduce.
What about MapReduce?
The all-to-all Reduce implementations in Hadoop, Spark,
Powergraph etc. having scaling problems because of message
overhead. Smaller messages  lower throughput.
During reduce, N nodes exchange
O(N2) messages of diminishing size.
Solutions (Dense data)
• For dense data, there are optimal (and fast in practice)
methods for Allreduce on clusters.
• They are implemented in MPI, but not always used by
default.
• In a recent Baidu paper (Wu et al.*), the authors achieved
near-linear speedup on an Infiniband cluster using MPI all-to-
all primitives.
* “Deep Image: Scaling up Image Recognition” Arxiv 1501.02876v2
Kylix: A Sparse AllReduce
for Commodity Clusters
Huasha Zhao
John Canny
Department of Electrical Engineering
and Computer Sciences
University of California, Berkeley
Power-Law Data
Term-Document Graph
Power-Law Data
𝑭 = 𝒓−𝜶
F: frequency
r: rank
α: power law exponent
Sparse AllReduce
• Allreduce is a general primitive for distributed graph
mining and machine learning.
• Reduce, e.g. 𝑣 = 𝑣𝑖
𝑚
𝑖=1 or other commutative and associative aggregate
operator such as mean, max etc.
• And redistributed back across nodes.
• Fast Allreduce makes the cluster behave like a single giant node.
• 𝑣𝑖 is sparse (power law) for big data problems, and each
cluster node may not require all of the sum v but only a
sparse subset.
• Kylix is an efficient Sparse Allreduce primitive for
computing on commodity clusters.
• Loss function
• The derivative of loss, which defines the SGD update, is a
scaled copy of Xi – same sparse non-zeros as Xi
Mini-batch Machine Learning
…Features
Examples
Xi Xi+1
Mini-batch Machine Learning
…Features
Examples
Xi Xi+1
Pull requests:
non-zeros of
next iteration
Push requests:
non-zeros of
current iteration
Kylix – Nested Heterogeneous Butterfly
• Data split to 6 machines
Kylix – Nested Heterogeneous Butterfly
• Each machine is responsible for reducing 1/6 of the indices.
Kylix – Nested Heterogeneous Butterfly
• Scatter-Reduce along dimension 1
• data split by 3
Kylix – Nested Heterogeneous Butterfly
• Scatter-Reduce along dimension 2
• Reduce for shorter range, but denser data
grey scale indicates density
Kylix – Nested Heterogeneous Butterfly
• Allgather in a reverse order
• Redistribute reduced values
grey scale indicates density
Kylix – Nested Heterogeneous Butterfly
• Heterogeneous degree allows tailoring message size for each
layer in order to achieve optimal message size (larger than
the efficient message floor).
Kylix – Nested Heterogeneous Butterfly
• Heterogeneous degree allows tailoring message size for each
layer – achieve optimal message size.
• Nesting allows indices to be collapsed down the network, so
that communication is greatly reduced in the lower layers –
works particularly well for power data.
Kylix - Summary
• Each network node specifies a set of input indices to pull
and a set output indices to push.
• Configuration step
• Build the routing table, once for static graphs, multiple times for
dynamic graphs
• Partitioning
• Union/Collapsing indices
• Mapping
• Reduction step
• ConfigReduce
• Can be performed in a single down-pass
Tuning Degrees of the Network
• To compute the optimal degree for that layer, we need to know the
amount of data per node. That will be the length of the partition
range at that node times the density.
• Given this data size, we find the
largest degree d such that P/d is
larger then the message floor.
• Degree / diameter trade-off
• degree^diameter = const.
Data Volumes at Each Layer
• Total communication across all layers a small constant larger than
the top layer, which is close to optimal.
• Communication volume across layers has a characteristic Kylix
shape.
Experiments (PageRank)
• Twitter Followers’ Graph
• 1.4 billion edges and 40 million vertices
• Yahoo Web Graph
• 6.7 billion edges and 1.4 billion vertices
• EC2 cluster compute node (cc2.8xlarge)
90-node Yahoo M45 64-node EC2 64-node EC2
Summary
• Roofline design exposes fundamental limits for ML tasks,
and provides guidance to reaching them.
• Roofline design gives BIDMach one to several orders of
magnitude advantage over other tools – most of which ware
well below roofline limits.
• Rooflining is a powerful tool for network primitives and
allows cluster computing to approach linear speedups
(strong scaling).
Software (version 1.0 just released)
Code: github.com/BIDData/BIDMach
Wiki: http://bid2.berkeley.edu/bid-data-project/overview/
BSD open source libs and dependencies.
In this release:
• Random Forests
• Double-precision GPU matrices
• Ipython/IScala Notebook
• Simple DNNs
Wrapper for Berkeley’s Caffe coming soon)
Thanks
Sponsors:
Collaborators:

Weitere ähnliche Inhalte

Was ist angesagt?

Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Databricks
 
Koalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkKoalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkDatabricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...Edge AI and Vision Alliance
 
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Intel® Software
 
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, SparkDistributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, SparkJan Wiegelmann
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySparkSpark Summit
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...Herman Wu
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!DataWorks Summit
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Jen Aman
 
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Saliya Ekanayake
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Clusterairbots
 
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd Mostak
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd MostakLeveraging GPU-Accelerated Analytics on top of Apache Spark with Todd Mostak
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd MostakDatabricks
 
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen..."Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...Edge AI and Vision Alliance
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureKyong-Ha Lee
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)Ryousei Takano
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Intel® Software
 

Was ist angesagt? (20)

Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Koalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkKoalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache Spark
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
 
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
 
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, SparkDistributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Cluster
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd Mostak
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd MostakLeveraging GPU-Accelerated Analytics on top of Apache Spark with Todd Mostak
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd Mostak
 
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen..."Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
 

Andere mochten auch

Probability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning PerspectiveProbability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning Perspectivebutest
 
GPU Accelerated Backtesting and Machine Learning for Quant Trading Strategies
GPU Accelerated Backtesting and Machine Learning for Quant Trading StrategiesGPU Accelerated Backtesting and Machine Learning for Quant Trading Strategies
GPU Accelerated Backtesting and Machine Learning for Quant Trading StrategiesDaniel Egloff
 
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...Tomonari Masada
 
Hacking The Trading Floor
Hacking The Trading FloorHacking The Trading Floor
Hacking The Trading Flooriffybird_099
 
That's like, so random! Monte Carlo for Data Science
That's like, so random! Monte Carlo for Data ScienceThat's like, so random! Monte Carlo for Data Science
That's like, so random! Monte Carlo for Data ScienceCorey Chivers
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Ha Phuong
 
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...PyData
 
Intro to Machine Learning
Intro to Machine LearningIntro to Machine Learning
Intro to Machine LearningCorey Chivers
 
Optimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceOptimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceVital.AI
 
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkReal-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkSingleStore
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine Learning(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine LearningAmazon Web Services
 
H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLSri Ambati
 
Transform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine LearningTransform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine LearningSri Ambati
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer BuildPetteriTeikariPhD
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through ExamplesSri Ambati
 

Andere mochten auch (20)

Probability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning PerspectiveProbability Forecasting - a Machine Learning Perspective
Probability Forecasting - a Machine Learning Perspective
 
GPU Accelerated Backtesting and Machine Learning for Quant Trading Strategies
GPU Accelerated Backtesting and Machine Learning for Quant Trading StrategiesGPU Accelerated Backtesting and Machine Learning for Quant Trading Strategies
GPU Accelerated Backtesting and Machine Learning for Quant Trading Strategies
 
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
 
Hacking The Trading Floor
Hacking The Trading FloorHacking The Trading Floor
Hacking The Trading Floor
 
That's like, so random! Monte Carlo for Data Science
That's like, so random! Monte Carlo for Data ScienceThat's like, so random! Monte Carlo for Data Science
That's like, so random! Monte Carlo for Data Science
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
 
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
 
Intro to Machine Learning
Intro to Machine LearningIntro to Machine Learning
Intro to Machine Learning
 
Optimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceOptimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data Science
 
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkReal-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
 
Topic Models
Topic ModelsTopic Models
Topic Models
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine Learning(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine Learning
 
H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.ML
 
Transform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine LearningTransform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine Learning
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer Build
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 

Ähnlich wie SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit By Prof. John Canny

Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)Nicolas Poggi
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedTuri, Inc.
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationBigstep
 
MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL Bernd Ocklin
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureCeph Community
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitecturePatrick McGarry
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMartin Zapletal
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Amazon Web Services
 
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Databricks
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataHitoshi Sato
 

Ähnlich wie SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit By Prof. John Canny (20)

Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
 

Mehr von Chester Chen

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfChester Chen
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdfChester Chen
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...Chester Chen
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...Chester Chen
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?Chester Chen
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataChester Chen
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...Chester Chen
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleChester Chen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftChester Chen
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapChester Chen
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bigheadChester Chen
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformChester Chen
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Chester Chen
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in sparkChester Chen
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 Chester Chen
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathChester Chen
 

Mehr von Chester Chen (20)

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdf
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bighead
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
 

Kürzlich hochgeladen

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 

Kürzlich hochgeladen (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 

SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit By Prof. John Canny

  • 1. Machine Learning at the Limit John Canny*^ * Computer Science Division University of California, Berkeley ^ Yahoo Research Labs @Alpine Labs, March, 2015
  • 2. Outline Scaling Inward: • What are the performance limits for machine learning on single machines, and how close can we get to them? (Single-node Roofline Design) Scaling Out: • What happens when we try to scale up from optimized single nodes to clusters? (Roofline Design for Clusters)
  • 3. My Other Job(s) Yahoo [Chen, Pavlov, Canny, KDD 2009]* Ebay [Chen, Canny, SIGIR 2011]** Quantcast 2011-2013 Microsoft 2014 Yahoo 2015 * Best application paper prize ** Best paper honorable mention
  • 4. Data Scientist’s Workflow Digging Around in Data Hypothesize Model Customize Large Scale Exploitation Evaluate Interpret Sandbox Production
  • 5. Data Scientist’s Workflow Digging Around in Data Hypothesize Model Customize Large Scale Exploitation Evaluate Interpret Sandbox Production
  • 6. Why Build a New ML Toolkit? • Performance: GPU performance pulling away from other platforms for *sparse* and dense data. Minibatch + SGD methods dominant on Big Data,… • Customizability: Great value in customizing models (loss functions, constraints,…) • Explore/Deploy: Explore fast, run the same code in prototype and production. Be able to run on clusters.
  • 7. Desiderata • Performance: • Roofline Design (single machine and cluster) • General Matrix Library with full CPU/GPU acceleration • Customizability: • Modular Learner Architecture (reusable components) • Likelihood “Mixins” • Explore/Deploy: • Interactive, Scriptable, Graphical • JVM based (Scala) w/ optimal cluster primitives
  • 8. Outline Scaling Inward: • What are the performance limits for machine learning on single machines, and how close can we get to them? (Single-node Roofline Design) Scaling Out: • What happens when we try to scale up from optimized single nodes to clusters? (Roofline Design for Clusters)
  • 9. Roofline Design (Williams, Waterman, Patterson, 2009) • Roofline design establishes fundamental performance limits for a computational kernel. Operational Intensity (flops/byte) Throughput(gflops) 1 10 100 1000 0.01 0.1 1 10 100 GPU ALU throughput CPU ALU throughput 1000
  • 10. A Tale of Two Architectures Intel CPU NVIDIA GPU Memory Controller L3 Cache Core ALU Core ALU Core ALU Core ALU L2 Cache ALU ALU ALU ALU ALU ALU
  • 11. CPU vs GPU Memory Hierarchy Intel 8 core Sandy Bridge CPU NVIDIA GK110 GPU 8 MB L3 Cache 1.5 MB L2 Cache 4 MB register file (!)4kB registers: 1 MB Shared Mem 2 MB L2 Cache 512K L1 Cache 1 MB Constant Mem 1 TB/s1 TB/s 40 TB/s 13 TB/s 5 TB/s 10s GB Main Memory 4 GB Main Memory 20 GB/s 500 GB/s 500 GB/s 200 GB/s
  • 12. Roofline Design – Matrix kernels • Dense matrix multiply • Sparse matrix multiply Operational Intensity (flops/byte) Throughput(gflops) 1 10 100 1000 0.01 0.1 1 10 100 GPU ALU throughput CPU ALU throughput 1000
  • 13. DataSource (JBOD disks) Learner Model Optimizer Mixins Model Optimizer Mixins GPU 1 thread 1 GPU 2 thread 2 : : CPU host code data blocks DataSource (Memory) DataSource HDFS over network Zhao+Canny SIAM DM 13, KDD 13, BIGLearn 13 A Rooflined Machine Learning Toolkit Compressed disk streaming at ~ 0.1-2 GB/s  100 HDFS nodes 30 Gflops to 2 Teraflops per GPU
  • 14. Matrix + Machine Learning Layers Written in the beautiful Scala language: • Interpreter with JIT, scriptable. • Open syntax +,-,*, ,, etc, math looks like math. • Java VM + Java codebase – runs on Hadoop, Yarn, Spark. • Hardware acceleration in C/C++ native code (CPU/GPU). • Easy parallelism: Actors, parallel collections. • Memory management (sort of ). • Pre-built for multiple Platforms (Windows, MacOS, Linux). Experience similar to Matlab, R, SciPy
  • 15. Benchmarks Recent benchmarks on some representative tasks: • Text Classification on Reuters news data (0.5 GB) • Click prediction on the Kaggle Criteo dataset (12 GB) • Clustering of handwritten digit images (MNIST) (25 GB) • Collaborative filtering on the Netflix prize dataset (4 GB) • Topic modeling (LDA) on a NY times collection (0.5 GB) • Random Forests on a UCI Year Prediction dataset (0.2 GB) • Pagerank on two social network graphs at 12GB and 48GB
  • 16. Benchmarks Systems (single node) • BIDMach • VW (Vowpal Wabbit) from Yahoo/Microsoft • Scikit-Learn • LibLinear Cluster Systems • Spark v1.1 and v1.2 • Graphlab (academic version) • Yahoo’s LDA cluster
  • 17. Benchmarks: Single-Machine Systems System Algorithm Dataset Dim Time (s) Cost ($) Energy (KJ) BIDMach Logistic Reg. RCV1 103 14 0.002 3 Vowpal Wabbit Logistic Reg. RCV1 103 130 0.02 30 LibLinear Logistic Reg. RCV1 103 250 0.04 60 Scikit-Learn Logistic Reg. RCV1 103 576 0.08 120 RCV1: Text Classification, 103 topics (0.5GB). Algorithms were tuned to achieve similar accuracy.
  • 18. Benchmarks: Cluster Systems System A/B Algorithm Dataset Dim Time (s) Cost ($) Energy (KJ) Spark-72 BIDMach Logistic Reg. RCV1 1 103 30 14 0.07 0.002 120 3 Spark-64 BIDMach RandomForest YearPred 1 280 320 0.48 0.05 480 60 Spark-128 BIDMach Logistic Reg. Criteo 1 400 81 1.40 0.01 2500 16 Spark-XX = System with XX cores BIDMach ran on one node with GTX-680 GPU
  • 19. Benchmarks: Cluster Systems System A/B Algorithm Dataset Dim Time (s) Cost ($) Energy (KJ) Spark-384 BIDMach K-Means MNIST 4096 1100 735 9.00 0.12 22k 140 GraphLab-576 BIDMach Matrix Factorization Netflix 100 376 90 16 0.015 10k 20 Yahoo-1000 BIDMach LDA (Gibbs) NYtimes 1024 220k 300k 40k 60 4E10 6E7 Spark-XX or GraphLab-XX = System with XX cores Yahoo-1000 had 1000 nodes
  • 20. BIDMach at Scale Latent Dirichlet Allocation BIDMach outperforms cluster systems on this problem, and has run up to 10 TB on one node. Convergence on 1TB data
  • 21. Benchmark Summary • BIDMach is at least 10x faster than other single-machine systems for comparable accuracy. • For Random Forests or single-class regression, BIDMach on a GPU node is comparable with 8-16 worker clusters. • For multi-class regression, factor models, clustering etc., GPU-assisted BIDMach is comparable to 100-1000-worker clusters. Larger problems correlate with larger values in this range.
  • 22. Cost and Energy Use • BIDMach had a 10x-1000x cost advantage over the other systems. The ratio was higher for larger-scale problems. • Energy savings were similar to the cost savings, at 10x- 1000x.
  • 23. In the Wild (Examples from Industry) • Multilabel regression problem (summer intern project): • Existing tool (single-machine) took ~ 1 week to build a model. • BIDMach on a GPU node takes 1 hour (120x speedup) • Iteration and feature engineering gave +15% accuracy. • Auction simulation problem (cluster job): • Existing tool simulates auction variations on log data. • On NVIDIA 3.0 devices (64 registers/thread) we achieve a 70x speedup over a reference implementation in Scala • On NVIDIA 3.5 devices (256 registers/thread) we can move auction state entirely into register storage and gain a 400x speedup.
  • 24. In the Wild (Examples from Industry) • Classification (cluster job): • Cluster job (logistic regression) took 8 hours. • BIDMach version takes < 1 hour on a single node. • SVMs for image classification (single machine) • Large multi-label classification took 1 week with LibSVM. • BIDMach version (SGD-based SVM) took 90 seconds.
  • 25. Performance Revisited • BIDMach had a 10x-1000x cost advantage over the other systems. The ratio was higher for larger-scale problems. • Energy savings were similar to the cost savings, at 10x- 1000x. But why?? • We only expect about 10x from GPU acceleration?
  • 26. The Exponential Growth of O(1) Once upon a Time… a = b + c // single instruction y = A[i] // single instruction Now five orders of magnitude
  • 27. The Exponential Growth of O(1) By slight abuse of notation, let O(1) denote the ratio of times for a main memory access vs an ALU operation. Then O(1) is not only not “small,” but its growing exponentially with time. A lot of non-rooflined code is too “memory-hungry” – you can often do (much) better with custom, memory-aware kernels. Log (Memaccess/ALUop) Time (Years)
  • 28. Performance Creep • Code that isnt explicitly rooflined seems to be creeping further and further from its roofline limit. • With Intel’s profiling tools we can check the throughput of a variety of running programs – often 10s–100s of Mflops – memory bound. • Need well-designed kernels (probably hardware-aware code) to get to the limits. • Coding for performance: Avoid: Large hash tables Large binary trees Linked data structures Use: Sorting Memory B-trees Packed data structures
  • 29. BIDMach MLAlgorithms 1. Regression (logistic, linear) 2. Support Vector Machines 3. k-Means Clustering 4. Topic Modeling - Latent Dirichlet Allocation 5. Collaborative Filtering 6. NMF – Non-Negative Matrix Factorization 7. Factorization Machines 8. Random Forests 9. Multi-layer neural networks 10. IPTW (Causal Estimation) 11. ICA = Likely the fastest implementation available
  • 30. CPU vs GPU Memory Hierarchy Intel 8 core Sandy Bridge CPU NVIDIA GK110 GPU 8 MB L3 Cache 1.5 MB L2 Cache 4 MB register file (!)4kB registers: 1 MB Shared Mem 2 MB L2 Cache 512K L1 Cache 1 MB Constant Mem 1 TB/s1 TB/s 40 TB/s 13 TB/s 5 TB/s 10s GB Main Memory 4 GB Main Memory 20 GB/s 500 GB/s 500 GB/s 200 GB/s
  • 31. CoDesign: Gibbs Sampling • Gibbs sampling is a very general method for inference in machine learning, but typically slow. • But it’s a particularly good match for SIMT hardware (GPUs), not difficult to get 10-100x speedups. • But many popular models have slow convergence: • Latent Dirichlet Allocation • 10 passes of online Variational Bayes ≈ • 1000 passes for collapsed Gibbs sampling
  • 32. Consequences LDA graphical model: • Topic samples Z are discrete and sparse: very few samples for a specific topic in a particular document. • This gives a high-variance random walk through parameter space before  converges. Documents
  • 33. Gibbs Parameter Estimation Gibbs sampling in ML often applied to distributions of the form P(D,Z,) Where D is the data,  are the parameters of the model, and X are other hidden states we would like to integrate over. e.g for LDA,  are the ,  parameters and Z are assignments of words to topics. • Gibbs sampling most often simply samples over both, which is slow. • But we really want to optimize over , and marginalize (integrate) over X.
  • 34. State-Augmented Marginal Estimation (SAME) – Doucet et al. 2002 The marginal parameter distribution is 𝑃 Θ = 𝑃 𝑋, Θ 𝑑𝑋 We can sharpen it to 𝑃 𝑘 Θ as follows 𝑃 𝑘 Θ = … 𝑃 𝑋, Θ 𝑍𝑌𝑋 𝑃 𝑌, Θ … 𝑃 𝑍, Θ 𝑑𝑋𝑑𝑌 … 𝑑𝑍 𝑃 𝑘 Θ has the same peaks as 𝑃 Θ and corresponds to a Gibbs distribution cooled to T = 1/k. We modify a standard, blocked Gibbs sampler to take k samples instead of 1 for X, and then recompute Θ from these samples.
  • 35. SAME Sampling In the language of graphical models: Run independent simulations with tied parameters Θ Θ
  • 36. SAME sampling as cooling What cooling does: Likelihood P() in model parameter space (peaks are good models)
  • 37. SAME sampling as cooling What cooling does: Likelihood P() in model parameter space (peaks are good models)
  • 38. SAME sampling as cooling What cooling does: Likelihood P() in model parameter space (peaks are good models)
  • 39. Making SAME sampling fast The approach in general: • Wherever you would take 1 sample before, take k independent samples and average for parameter estimation. Instead of taking distinct samples take counts of anonymous samples – complexity independent of k • Exact for LDA, HMM, Factorization Machines,… • Corresponds to factored joint approximation for other models. Varying k allows for annealing optimization over .
  • 40. Results on LDA • We use k=100, make  10 passes over the dataset (effectively taking 1000 total samples). • Our SAME Gibbs sampler is within a factor of 3 of online Variational Bayes. More than 100x faster than standard Gibbs samplers. • The model was more accurate than any other LDA method we tested:
  • 41. SAME sampling in general • SAME sampling is an alternative to other message-passing methods (VMP etc.) • But rather than using bounds on posterior distributions it: • Samples from the distribution of “uninteresting” variables • Sharpens the posteriors on parameter variables.
  • 42. Current Work Inference on general graphical models with Gibbs sampling. Larger graphical models arise in many applications but are discarded for performance reasons. Existing tools (JAGS, Stan, Infer.net) are far from the roofline for this problem.
  • 43. Inference for Bayesian Networks • (Main memory) Rooflined Approach: Sampling distribution can be computed with an SpMM – 1-2GF on CPU and 5- 10GF on GPU. • Experiment on MOOC learning concept graph with ~300 nodes and ~350 edges, 3000 samples
  • 44. Getting to Warp Speed The model for this problem (CPTs for all nodes) fits comfortably in GPU SHMEM. The state of thousands of particles fits comfortably in GPU register memory. It should be possible to gain another 10x to 100x in performance with this design, and approach ALU limits. 4 MB registers 1 MB Shared Mem
  • 45. Outline Scaling Inward: • What are the performance limits for machine learning on single machines, and how close can we get to them? (Single-node Roofline Design) Scaling Out: • What happens when we try to scale up from optimized single nodes to clusters? (Roofline Design for Clusters)
  • 46. Challenges with Scaling MB algorithms Can we combine node acceleration with clustering? Its harder than it seems: Dataset Minibatch updates Single-node Minibatch Learner Dataset partitions across nodes Cluster-based Learner Reduce (Allreduce)
  • 47. Allreduce Every node j has data Uj (a local copy of a global model). We want to compute an aggregate (e.g. sum) of all data vectors and distribute it to all nodes, i.e. to synchronize the model across nodes
  • 48. Why Allreduce? • Speed: Minibatch SGD and related algorithms are the fastest way to solve many of these problems (deep learning, CF, topic modeling,…). But dataset sizes (several TB) make single machine training slow (> 1 day). • The more updates/sec the faster the method converges, but: • In practice minibatch dof ~ model dof is near-optimal. • That means the ideal B/W for model updates is at least the rate of data consumption (several Gb/s). • In other words, we want an Allreduce that consumes data at roughly the full NIC bandwidth.
  • 49. Why Allreduce? • Scale: Some models wont fit on one machine (e.g. Pagerank). • Other models will fit but minibatch updates only touch a small fraction of features (Click Prediction). Updating only those is much faster than a full Allreduce. • Sparse Allreduce: Each node j pushes updates to a subset of features Uj and pulls a subset Vj from a distributed model.
  • 50. Why Allreduce? • Worker driven (emergency room model) vs. aggregator- driven (doctor’s surgery)? • “Excuse me, worker task, the aggregator will consume your data now”
  • 51. Lower Bounds for total message B/W. • Intuitively, every node needs to send D values somewhere, and receive D values for the final aggregate. • The data from each node needs to be caught somewhere if allreduce happens in the same network, and the final aggregate message need to originate from somewhere. • This suggests a naïve lower bound of 2D values in or out of each node, or 2DN for an N-node network. • In fact more careful analysis shows the lower bound is 2D(N-1) for total bandwidth, and ~ 2D/B for latency: Patarasuk et al. Bandwidth optimal all-reduce algorithms for clusters of workstations
  • 52. Why not Parameter servers? • Potentially B/W optimal, but latency limited by net B/W of servers: Clients servers This network has at least 2D(N-1)/PB latency with P servers, suboptimal by (N-1)/P.
  • 53. A Roofline For networks • Optimal operating point Packet size (kBytes) Throughput(MB/s) 1 10 100 1000 1 10 100 1000 10000 Network capacity 100000 10000
  • 54. Allreduce (example: Click Prediction) • To run the most efficient solver (distributed SGD), we need to synchronize models across the network. • An Allreduce (reduce+broadcast) operation both synchronizes and combines the updates from each node. • For Logistic Regression, SVM and many other problems, we need a Sparse Allreduce.
  • 55. What about MapReduce? The all-to-all Reduce implementations in Hadoop, Spark, Powergraph etc. having scaling problems because of message overhead. Smaller messages  lower throughput. During reduce, N nodes exchange O(N2) messages of diminishing size.
  • 56. Solutions (Dense data) • For dense data, there are optimal (and fast in practice) methods for Allreduce on clusters. • They are implemented in MPI, but not always used by default. • In a recent Baidu paper (Wu et al.*), the authors achieved near-linear speedup on an Infiniband cluster using MPI all-to- all primitives. * “Deep Image: Scaling up Image Recognition” Arxiv 1501.02876v2
  • 57. Kylix: A Sparse AllReduce for Commodity Clusters Huasha Zhao John Canny Department of Electrical Engineering and Computer Sciences University of California, Berkeley
  • 59. Power-Law Data 𝑭 = 𝒓−𝜶 F: frequency r: rank α: power law exponent
  • 60. Sparse AllReduce • Allreduce is a general primitive for distributed graph mining and machine learning. • Reduce, e.g. 𝑣 = 𝑣𝑖 𝑚 𝑖=1 or other commutative and associative aggregate operator such as mean, max etc. • And redistributed back across nodes. • Fast Allreduce makes the cluster behave like a single giant node. • 𝑣𝑖 is sparse (power law) for big data problems, and each cluster node may not require all of the sum v but only a sparse subset. • Kylix is an efficient Sparse Allreduce primitive for computing on commodity clusters.
  • 61. • Loss function • The derivative of loss, which defines the SGD update, is a scaled copy of Xi – same sparse non-zeros as Xi Mini-batch Machine Learning …Features Examples Xi Xi+1
  • 62. Mini-batch Machine Learning …Features Examples Xi Xi+1 Pull requests: non-zeros of next iteration Push requests: non-zeros of current iteration
  • 63. Kylix – Nested Heterogeneous Butterfly • Data split to 6 machines
  • 64. Kylix – Nested Heterogeneous Butterfly • Each machine is responsible for reducing 1/6 of the indices.
  • 65. Kylix – Nested Heterogeneous Butterfly • Scatter-Reduce along dimension 1 • data split by 3
  • 66. Kylix – Nested Heterogeneous Butterfly • Scatter-Reduce along dimension 2 • Reduce for shorter range, but denser data grey scale indicates density
  • 67. Kylix – Nested Heterogeneous Butterfly • Allgather in a reverse order • Redistribute reduced values grey scale indicates density
  • 68. Kylix – Nested Heterogeneous Butterfly • Heterogeneous degree allows tailoring message size for each layer in order to achieve optimal message size (larger than the efficient message floor).
  • 69. Kylix – Nested Heterogeneous Butterfly • Heterogeneous degree allows tailoring message size for each layer – achieve optimal message size. • Nesting allows indices to be collapsed down the network, so that communication is greatly reduced in the lower layers – works particularly well for power data.
  • 70. Kylix - Summary • Each network node specifies a set of input indices to pull and a set output indices to push. • Configuration step • Build the routing table, once for static graphs, multiple times for dynamic graphs • Partitioning • Union/Collapsing indices • Mapping • Reduction step • ConfigReduce • Can be performed in a single down-pass
  • 71. Tuning Degrees of the Network • To compute the optimal degree for that layer, we need to know the amount of data per node. That will be the length of the partition range at that node times the density. • Given this data size, we find the largest degree d such that P/d is larger then the message floor. • Degree / diameter trade-off • degree^diameter = const.
  • 72. Data Volumes at Each Layer • Total communication across all layers a small constant larger than the top layer, which is close to optimal. • Communication volume across layers has a characteristic Kylix shape.
  • 73. Experiments (PageRank) • Twitter Followers’ Graph • 1.4 billion edges and 40 million vertices • Yahoo Web Graph • 6.7 billion edges and 1.4 billion vertices • EC2 cluster compute node (cc2.8xlarge) 90-node Yahoo M45 64-node EC2 64-node EC2
  • 74. Summary • Roofline design exposes fundamental limits for ML tasks, and provides guidance to reaching them. • Roofline design gives BIDMach one to several orders of magnitude advantage over other tools – most of which ware well below roofline limits. • Rooflining is a powerful tool for network primitives and allows cluster computing to approach linear speedups (strong scaling).
  • 75. Software (version 1.0 just released) Code: github.com/BIDData/BIDMach Wiki: http://bid2.berkeley.edu/bid-data-project/overview/ BSD open source libs and dependencies. In this release: • Random Forests • Double-precision GPU matrices • Ipython/IScala Notebook • Simple DNNs Wrapper for Berkeley’s Caffe coming soon)