SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
Large Scale Kernel Learning using Block
Coordinate Descent
Shaleen Kumar Gupta, Research Assistant3
Authors:
Stephen Tu1 Rebecca Reolofs1 Shivaram Venkatraman1
Benjamin Recht1,2
1Department of Electrical Engineering and Computer Science
UC Berkeley, Berkeley, CA
2Department of Statistics
UC Berkeley, Berkeley, CA
3Nanyang Technological University, 2016
Outline
1 Overview
Introduction
Background
2 Datasets
TIMIT
Yelp Reviews
CIFAR-10
3 Experimental Results
4 Performance and Scalability
5 Conclusion
Outline
1 Overview
Introduction
Background
2 Datasets
TIMIT
Yelp Reviews
CIFAR-10
3 Experimental Results
4 Performance and Scalability
5 Conclusion
Overview
Kernel methods are a powerful tool in machine learning,
allowing one to discover non-linear structure by mapping data
into a higher dimensional, possibly inïŹnite, feature space.
Problem: They do not scale well.
This paper attempts to exploit distributed computation in
Block CD and present results.
Moreover, the paper attempts to study the performance of
Random Features and Nystrom approximations on three large
datasets from speech (TIMIT), text (Yelp) and image
classiïŹcation (CIFAR-10) domains.
Outline
1 Overview
Introduction
Background
2 Datasets
TIMIT
Yelp Reviews
CIFAR-10
3 Experimental Results
4 Performance and Scalability
5 Conclusion
Kernel Methods
https://www.reddit.com/r/MachineLearning/comments/15zrpp/please_explain_support_vector_machines_
svm_like_i/c7rkwce
If our data can’t be separated by a straight line we might need
to use a curvy line.
Kernel Methods
https://www.reddit.com/r/MachineLearning/comments/15zrpp/please_explain_support_vector_machines_
svm_like_i/c7rkwce
If our data can’t be separated by a straight line we might need
to use a curvy line.
A straight line in a higher dimensional space can be a curvy
line when projected onto a lower dimensional space.
So what we are really doing is using the kernel to put our data
into a high dimensional space, then ïŹnding a hyperplane to
separate the data in that high dimensional space.
This straight line looks like a curvy line when we bring it down
to the lower dimensional space.
Kernel Approximation Techniques (1/2)
Kernel Trick: The essence of the kernel-trick is that if you
can describe an algorithm in a certain way – which is using
only inner products – then you never need to actually use the
feature mapping, as long as you can compute the inner
product in the feature space.
While there are many kernel approximation techniques to do
the Kernel Trick, one prominent one is using the RBF Kernel.
We will also analyze two other Kernel approximation
techniques, namely Nystrom Method and Random Features
Technique, in this paper.
Kernel Approximation Techniques (2/2)
If we would use all data points, we would map to an RN
dimensional space and have the scaling problems.
Also, we would need to store all kernel values.
Nystrom method says that we don’t need go to the full space
spanned by all N training points, but we can just use a subset.
This will only yield an approximate embedding but if we keep
the number of samples we use the same, the resulting
embedding will be independent of dataset size and we can
basically choose the complexity to suit our problem.
Random feature based methods use an element-wise
approximation of the kernel.
Outline
1 Overview
Introduction
Background
2 Datasets
TIMIT
Yelp Reviews
CIFAR-10
3 Experimental Results
4 Performance and Scalability
5 Conclusion
TIMIT
Phone classiïŹcation task was performed on the TIMIT
dataset, which consisted of spoken audio from 462 speakers
The authors applied a Gaussian (RBF) kernel for the Nystrom
and exact methods and used random cosines for the random
feature method.
Outline
1 Overview
Introduction
Background
2 Datasets
TIMIT
Yelp Reviews
CIFAR-10
3 Experimental Results
4 Performance and Scalability
5 Conclusion
Yelp Reviews
The goal was to predict a rating from one to ïŹve stars from
the text of a review.
A usual 80:20 Training:Test split was applied
nltk was used for tokenization and stemming and n-gram
modeling was done with n=3.
For the exact and Nystrom experiments, they apply a linear
kernel.
For random features, they apply a hash kernel using
MurmurHash3 as their hash function.
Since they were predicting ratings for a review, they measured
accuracy by using the root mean square error (RMSE) of the
predicted rating as compared to the actual rating.
Outline
1 Overview
Introduction
Background
2 Datasets
TIMIT
Yelp Reviews
CIFAR-10
3 Experimental Results
4 Performance and Scalability
5 Conclusion
CIFAR-10
The task was to do image classiïŹcation of the CIFAR-10
dataset.
The dataset contained 500,000 training images and 4096
features per image.
The authors started with these 4096 features in the dataset as
input and used the RBF kernel for the exact and Nystrom
method and random cosines for the random features method.
Experimental Results (1/3)
Figure: ClassiïŹcation Error against Time using diïŹ€erent methods on the
TIMIT, Yelp and CIFAR-10 datasets. The little black stars denote the
end of an epoch
Experimental Results (2/3)
Figure: ClassiïŹcation Error against number of features for Nystrom and
Random Features on the TIMIT, Yelp and CIFAR-10 datasets
Experimental Results (3/3)
Performance
Figure: Breakdown of time to compute a single block of coordinate
descent in the ïŹrst epoch on the TIMIT, Yelp and CIFAR-10 datasets
From the ïŹgure, we see that the choice of the kernel
approximation can signiïŹcantly impact performance since
diïŹ€erent kernels take diïŹ€erent amounts of time to generate.
For example, the hash random feature used for the Yelp
dataset is much cheaper to compute than the string kernel.
However, computing a block of the RBF kernel is similar in
cost to computing a block of random cosine features.
Scalability of RBF Kernel Generation
Figure: Time taken to compute on eblock of RBF kernel as they scale the
number of examples and the number of machines used
Here, ideal scaling implies that the time to generate a block of the kernel
matrix remains constant as they increase both the data and the number
of machines.
However, computing a block of the RBF kernel involves broadcasting a b
x d matrix to all the machines in the cluster. This causes a slight
decrease in performance as they go from 8 to 128 machines. However,
they believe that the kernel block generation methods will continue to
scale well for larger datasets since broadcast routines scale as O(logM).
Conclusion
This paper shows that scalable kernel machines are feasible
with distributed computation.
Results suggest that the Nystrom method generally achieves
better statistical accuracy than random features
However, it can require signiïŹcantly more iterations of
optimization.
On the theoretical side, a limitation of this analysis is that
achieving rates better than gradient descent cannot be hoped.
References and Further Reading I
Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman,
Benjamin Recht
Large Scale Kernel Learning using Block Coordinate Descent
February 18, 2016
Tianbao Yang, Yu-feng Li, Mehrdad Mahdavi, Rong Jin,
Zhi-Hua Zhou
Nystrom Method vs Random Fourier Features: A Theoretical
and Empirical Comparison
Advances in Neural Information Processing Systems 25 (NIPS
2012)

Weitere Àhnliche Inhalte

Was ist angesagt?

Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
Sangamesh Ragate
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
Richard Ashworth
 
[ppt]
[ppt][ppt]
[ppt]
butest
 

Was ist angesagt? (20)

PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
 
virtualization
virtualizationvirtualization
virtualization
 
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor NetworkEnergy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
 
[딄녌읜] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딄녌읜] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review[딄녌읜] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딄녌읜] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
 
post119s1-file3
post119s1-file3post119s1-file3
post119s1-file3
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
 
Image Processing IEEE 2015 Projects
Image Processing IEEE 2015 ProjectsImage Processing IEEE 2015 Projects
Image Processing IEEE 2015 Projects
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
 
[ppt]
[ppt][ppt]
[ppt]
 
Beyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networksBeyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networks
 
Optimal buffer allocation in
Optimal buffer allocation inOptimal buffer allocation in
Optimal buffer allocation in
 
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
 

Ähnlich wie Large Scale Kernel Learning using Block Coordinate Descent

Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
Pedro Lopes
 
Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...
journalBEEI
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
IAEME Publication
 

Ähnlich wie Large Scale Kernel Learning using Block Coordinate Descent (20)

Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensembles
 
Super Resolution with OCR Optimization
Super Resolution with OCR OptimizationSuper Resolution with OCR Optimization
Super Resolution with OCR Optimization
 
IRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep LearningIRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep Learning
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
 
Recursive
RecursiveRecursive
Recursive
 
Model checking
Model checkingModel checking
Model checking
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
 
Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...
 
Hand Written Digit Classification
Hand Written Digit ClassificationHand Written Digit Classification
Hand Written Digit Classification
 
Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback  Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
 
Fulltext
FulltextFulltext
Fulltext
 
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 

KĂŒrzlich hochgeladen

âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night StandCall Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night Stand
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night StandCall Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night Stand
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

KĂŒrzlich hochgeladen (20)

Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
âž„đŸ” 7737669865 đŸ”â–» mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night StandCall Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 đŸ„” Book Your One night Stand
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
âž„đŸ” 7737669865 đŸ”â–» Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night StandCall Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 đŸ„” Book Your One night Stand
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

Large Scale Kernel Learning using Block Coordinate Descent

  • 1. Large Scale Kernel Learning using Block Coordinate Descent Shaleen Kumar Gupta, Research Assistant3 Authors: Stephen Tu1 Rebecca Reolofs1 Shivaram Venkatraman1 Benjamin Recht1,2 1Department of Electrical Engineering and Computer Science UC Berkeley, Berkeley, CA 2Department of Statistics UC Berkeley, Berkeley, CA 3Nanyang Technological University, 2016
  • 2. Outline 1 Overview Introduction Background 2 Datasets TIMIT Yelp Reviews CIFAR-10 3 Experimental Results 4 Performance and Scalability 5 Conclusion
  • 3. Outline 1 Overview Introduction Background 2 Datasets TIMIT Yelp Reviews CIFAR-10 3 Experimental Results 4 Performance and Scalability 5 Conclusion
  • 4. Overview Kernel methods are a powerful tool in machine learning, allowing one to discover non-linear structure by mapping data into a higher dimensional, possibly inïŹnite, feature space. Problem: They do not scale well. This paper attempts to exploit distributed computation in Block CD and present results. Moreover, the paper attempts to study the performance of Random Features and Nystrom approximations on three large datasets from speech (TIMIT), text (Yelp) and image classiïŹcation (CIFAR-10) domains.
  • 5. Outline 1 Overview Introduction Background 2 Datasets TIMIT Yelp Reviews CIFAR-10 3 Experimental Results 4 Performance and Scalability 5 Conclusion
  • 7. Kernel Methods https://www.reddit.com/r/MachineLearning/comments/15zrpp/please_explain_support_vector_machines_ svm_like_i/c7rkwce If our data can’t be separated by a straight line we might need to use a curvy line. A straight line in a higher dimensional space can be a curvy line when projected onto a lower dimensional space. So what we are really doing is using the kernel to put our data into a high dimensional space, then ïŹnding a hyperplane to separate the data in that high dimensional space. This straight line looks like a curvy line when we bring it down to the lower dimensional space.
  • 8. Kernel Approximation Techniques (1/2) Kernel Trick: The essence of the kernel-trick is that if you can describe an algorithm in a certain way – which is using only inner products – then you never need to actually use the feature mapping, as long as you can compute the inner product in the feature space. While there are many kernel approximation techniques to do the Kernel Trick, one prominent one is using the RBF Kernel. We will also analyze two other Kernel approximation techniques, namely Nystrom Method and Random Features Technique, in this paper.
  • 9. Kernel Approximation Techniques (2/2) If we would use all data points, we would map to an RN dimensional space and have the scaling problems. Also, we would need to store all kernel values. Nystrom method says that we don’t need go to the full space spanned by all N training points, but we can just use a subset. This will only yield an approximate embedding but if we keep the number of samples we use the same, the resulting embedding will be independent of dataset size and we can basically choose the complexity to suit our problem. Random feature based methods use an element-wise approximation of the kernel.
  • 10. Outline 1 Overview Introduction Background 2 Datasets TIMIT Yelp Reviews CIFAR-10 3 Experimental Results 4 Performance and Scalability 5 Conclusion
  • 11. TIMIT Phone classiïŹcation task was performed on the TIMIT dataset, which consisted of spoken audio from 462 speakers The authors applied a Gaussian (RBF) kernel for the Nystrom and exact methods and used random cosines for the random feature method.
  • 12. Outline 1 Overview Introduction Background 2 Datasets TIMIT Yelp Reviews CIFAR-10 3 Experimental Results 4 Performance and Scalability 5 Conclusion
  • 13. Yelp Reviews The goal was to predict a rating from one to ïŹve stars from the text of a review. A usual 80:20 Training:Test split was applied nltk was used for tokenization and stemming and n-gram modeling was done with n=3. For the exact and Nystrom experiments, they apply a linear kernel. For random features, they apply a hash kernel using MurmurHash3 as their hash function. Since they were predicting ratings for a review, they measured accuracy by using the root mean square error (RMSE) of the predicted rating as compared to the actual rating.
  • 14. Outline 1 Overview Introduction Background 2 Datasets TIMIT Yelp Reviews CIFAR-10 3 Experimental Results 4 Performance and Scalability 5 Conclusion
  • 15. CIFAR-10 The task was to do image classiïŹcation of the CIFAR-10 dataset. The dataset contained 500,000 training images and 4096 features per image. The authors started with these 4096 features in the dataset as input and used the RBF kernel for the exact and Nystrom method and random cosines for the random features method.
  • 16. Experimental Results (1/3) Figure: ClassiïŹcation Error against Time using diïŹ€erent methods on the TIMIT, Yelp and CIFAR-10 datasets. The little black stars denote the end of an epoch
  • 17. Experimental Results (2/3) Figure: ClassiïŹcation Error against number of features for Nystrom and Random Features on the TIMIT, Yelp and CIFAR-10 datasets
  • 19. Performance Figure: Breakdown of time to compute a single block of coordinate descent in the ïŹrst epoch on the TIMIT, Yelp and CIFAR-10 datasets From the ïŹgure, we see that the choice of the kernel approximation can signiïŹcantly impact performance since diïŹ€erent kernels take diïŹ€erent amounts of time to generate. For example, the hash random feature used for the Yelp dataset is much cheaper to compute than the string kernel. However, computing a block of the RBF kernel is similar in cost to computing a block of random cosine features.
  • 20. Scalability of RBF Kernel Generation Figure: Time taken to compute on eblock of RBF kernel as they scale the number of examples and the number of machines used Here, ideal scaling implies that the time to generate a block of the kernel matrix remains constant as they increase both the data and the number of machines. However, computing a block of the RBF kernel involves broadcasting a b x d matrix to all the machines in the cluster. This causes a slight decrease in performance as they go from 8 to 128 machines. However, they believe that the kernel block generation methods will continue to scale well for larger datasets since broadcast routines scale as O(logM).
  • 21. Conclusion This paper shows that scalable kernel machines are feasible with distributed computation. Results suggest that the Nystrom method generally achieves better statistical accuracy than random features However, it can require signiïŹcantly more iterations of optimization. On the theoretical side, a limitation of this analysis is that achieving rates better than gradient descent cannot be hoped.
  • 22. References and Further Reading I Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman, Benjamin Recht Large Scale Kernel Learning using Block Coordinate Descent February 18, 2016 Tianbao Yang, Yu-feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou Nystrom Method vs Random Fourier Features: A Theoretical and Empirical Comparison Advances in Neural Information Processing Systems 25 (NIPS 2012)