SlideShare a Scribd company logo
1 of 12
Download to read offline
Distributed GLM Implementation
on H2O platform
Tomas Nykodym, 0xDATA
Linear Regression
Data:
x, y + noise
Goal:
predict y using x
i.e. find a,b s.t.
y = a*x + b
Linear Regression
Least Squares Fit
Real Relation:
y=3x+10+N(0,20)
Best Fit:
y = 3.08*x + 6
Prostate Cancer Example
Data:
x = PSA
(prostate-specific antigen)
y = CAPSULE
0 = no tumour
1 = tumour
Goal:
predict y using x
Prostate Cancer Example
Linear Regression Fit
Data:
x = PSA
(prostate-specific antigen)
y = CAPSULE
0 = no tumour
1 = tumour
Fit:
Least squares fit
Generalized Linear Model
Generalizes linear regression by:
– adding a link function g to transform the output
z = g(y) – new response variable
– noise (i.e.variance) does not have to be constant
– fit is maximal likelihood instead of least squares
Prostate Cancer
Logistic Regression Fit
Data:
x = PSA
(prostate-specific antigen)
y = CAPSULE
0 = no tumour
1 = tumour
GLM Fit:
– Binomial family
– Logit link
– Predict probability
of CAPSULE=1.
Implementation - Solve GLM by IRLSM
Input:
– X: data matrix N*P
– Y: response vector (N rows)
– family, link function, α,β
INNER LOOP:
Solve elastic net:
ADMM(Boyd 2010, page 43):
OUTER LOOP:
While β changes, compute:
zk +1=β k +( y−μ k )
d η
d μ
W k +1
−1
=(
d η
d μ
)
2
Var(μ k )
γ
l+1
=( X
T
WX +ρ I )
−1
X
T
Wz+ρ (β
l
−u
l
)
β l+1
=Sλ /ρ (γ l+1
+ul
)
ul+1
=uk
+γ l+1
−βl+1
Output:
– β vector of coefficients, solution
to max-likellihood
XX = X
T
W k+1 X
Xz= X
T
Wzk +1
H2O Implementation
Outer Loop:
(Map Reduce Task)
public class SimpleGLM extends MRTask {
@Override public void map(Chunk c) {
res = new double [p][p];
for(double [] x:c.rows()){
double eta,mu,var;
eta = computeEta(x);
mu = _link.linkInv(eta);
var = Math.max(1e-5,_family.variance(mu));
double dp = _link.linkInvDeriv(eta);
double w = dp*dp/var;
for(int i = 0; i < x.length; ++i)
for(int j = 0; j < x.length; ++j)
res[i][j] += x[i]*x[j]*w;
}
}
@Override public void reduce(SimpleGLM g) {
for(int i = 0; i < res.length; ++i)
for(int j = 0; i < res.length; ++i)
res[i][j] += g.res[i][j];
}
}
Inner Loop:
(ADMM solver)
public double [] solve(Matrix xx, Matrix xy) {
// ADMM LSM Solve
CholeskyDecomposition lu; // cache decomp!
lu = new CholeskyDecomposition(xx);
for( int i = 0; i < 1000; ++i ) {
// Solve using cached Cholesky decomposition!
xm = lu.solve(xyPrime);
// compute u and z update
for( int j = 0; j < N-1; ++j ) {
double x_hat = xm.get(j, 0);
x_norm += x_hat * x_hat;
double zold = z[j];
z[j] = shrinkage(x_hat + u[j], kappa);
u[j] += x_hat - z[j];
u_norm += u[j] * u[j];
}
}
}
double shrinkage(double x, double k) {
return Math.max(0,x-k)-Math.max(0,-x-k);
}
Regularization
Elastic Net (Zhou, Hastie, 2005):
● Added L1 and L2 penalty to β to:
– avoid overfitting, reduce variance
– obtain sparse solution (L1 penalty)
– avoid problems with correlated covariates
No longer analytical solution.
Options: LARS, ADMM, Generalized Gradient, ...
β =argmin( X β − y)
T
( X β −y)+α ∥β∥1+(1−α )∥β∥2
2
Linear Regression
Least Squares Method
Find β by minimizing the sum of squared errors:
Analytical solution:
Easily parallelized if XT
X is reasonably small.
β =(X T
X )−1
X T
y=(
1
n
∑ xi xi
T
)
−1
1
n
∑ xi y
β =argmin( X β − y)
T
( X β −y)
Generalized Linear Model
● Generalizes linear regression by:
– adding a link function g to transform the response
z = g(y) – new response variable
η = Xβ – linear predictor
μ = g-1
(η)
– y has a distribution in the exponential family
– variance depends on μ
e.g var(μ) = μ*(1-μ) for Binomial family.
– fit by maximizing the likelihood of the model

More Related Content

What's hot

Pytorch and Machine Learning for the Math Impaired
Pytorch and Machine Learning for the Math ImpairedPytorch and Machine Learning for the Math Impaired
Pytorch and Machine Learning for the Math ImpairedTyrel Denison
 
Numerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_RuleNumerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_RuleAlex_5991
 
Deep Learning with Julia1.0 and Flux
Deep Learning with Julia1.0 and FluxDeep Learning with Julia1.0 and Flux
Deep Learning with Julia1.0 and FluxSatoshi Terasaki
 
Fast parallelizable scenario-based stochastic optimization
Fast parallelizable scenario-based stochastic optimizationFast parallelizable scenario-based stochastic optimization
Fast parallelizable scenario-based stochastic optimizationPantelis Sopasakis
 
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...Lviv Startup Club
 
Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12
Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12
Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12OdessaFrontend
 
Statistics for Economics Midterm 2 Cheat Sheet
Statistics for Economics Midterm 2 Cheat SheetStatistics for Economics Midterm 2 Cheat Sheet
Statistics for Economics Midterm 2 Cheat SheetLaurel Ayuyao
 
The Uncertain Enterprise
The Uncertain EnterpriseThe Uncertain Enterprise
The Uncertain EnterpriseClarkTony
 
HMPC for Upper Stage Attitude Control
HMPC for Upper Stage Attitude ControlHMPC for Upper Stage Attitude Control
HMPC for Upper Stage Attitude ControlPantelis Sopasakis
 
CrystalBall - Compute Relative Frequency in Hadoop
CrystalBall - Compute Relative Frequency in Hadoop CrystalBall - Compute Relative Frequency in Hadoop
CrystalBall - Compute Relative Frequency in Hadoop Suvash Shah
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeTasuku Soma
 

What's hot (20)

Pytorch and Machine Learning for the Math Impaired
Pytorch and Machine Learning for the Math ImpairedPytorch and Machine Learning for the Math Impaired
Pytorch and Machine Learning for the Math Impaired
 
Semi vae memo (1)
Semi vae memo (1)Semi vae memo (1)
Semi vae memo (1)
 
Irs gan doc
Irs gan docIrs gan doc
Irs gan doc
 
Semi vae memo (2)
Semi vae memo (2)Semi vae memo (2)
Semi vae memo (2)
 
Numerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_RuleNumerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_Rule
 
Deep Learning with Julia1.0 and Flux
Deep Learning with Julia1.0 and FluxDeep Learning with Julia1.0 and Flux
Deep Learning with Julia1.0 and Flux
 
C++ ammar .s.q
C++  ammar .s.qC++  ammar .s.q
C++ ammar .s.q
 
Lec 3-mcgregor
Lec 3-mcgregorLec 3-mcgregor
Lec 3-mcgregor
 
Fast parallelizable scenario-based stochastic optimization
Fast parallelizable scenario-based stochastic optimizationFast parallelizable scenario-based stochastic optimization
Fast parallelizable scenario-based stochastic optimization
 
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...
 
Interpolation
InterpolationInterpolation
Interpolation
 
Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12
Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12
Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12
 
Cristian tovar 10 03 jt
Cristian tovar 10 03 jtCristian tovar 10 03 jt
Cristian tovar 10 03 jt
 
Cristian tovar 10 03 jt
Cristian tovar 10 03 jtCristian tovar 10 03 jt
Cristian tovar 10 03 jt
 
Statistics for Economics Midterm 2 Cheat Sheet
Statistics for Economics Midterm 2 Cheat SheetStatistics for Economics Midterm 2 Cheat Sheet
Statistics for Economics Midterm 2 Cheat Sheet
 
The Uncertain Enterprise
The Uncertain EnterpriseThe Uncertain Enterprise
The Uncertain Enterprise
 
HMPC for Upper Stage Attitude Control
HMPC for Upper Stage Attitude ControlHMPC for Upper Stage Attitude Control
HMPC for Upper Stage Attitude Control
 
CrystalBall - Compute Relative Frequency in Hadoop
CrystalBall - Compute Relative Frequency in Hadoop CrystalBall - Compute Relative Frequency in Hadoop
CrystalBall - Compute Relative Frequency in Hadoop
 
0004
00040004
0004
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer Lattice
 

Viewers also liked

H2O World - Building Data Products for Data Natives - Monica Rogati
H2O World - Building Data Products for Data Natives - Monica RogatiH2O World - Building Data Products for Data Natives - Monica Rogati
H2O World - Building Data Products for Data Natives - Monica RogatiSri Ambati
 
Running GLM in R
Running GLM in RRunning GLM in R
Running GLM in RSri Ambati
 
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14Sri Ambati
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreSri Ambati
 
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016Sri Ambati
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoSri Ambati
 

Viewers also liked (6)

H2O World - Building Data Products for Data Natives - Monica Rogati
H2O World - Building Data Products for Data Natives - Monica RogatiH2O World - Building Data Products for Data Natives - Monica Rogati
H2O World - Building Data Products for Data Natives - Monica Rogati
 
Running GLM in R
Running GLM in RRunning GLM in R
Running GLM in R
 
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
 
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
 

Similar to Glm talk Tomas

Hybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networksHybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networksMKosmykov
 
Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Sangwoo Mo
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydSri Ambati
 
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Michael Lie
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsSIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsJagadeeswaran Rathinavel
 
Logisticregression
LogisticregressionLogisticregression
Logisticregressionsujimaa
 
Logisticregression
LogisticregressionLogisticregression
Logisticregressionrnunoo
 
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...CDSL_at_SNU
 
logisticregression.ppt
logisticregression.pptlogisticregression.ppt
logisticregression.pptShrutiPanda12
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsMark Chang
 
Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...Université de Liège (ULg)
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsSean Meyn
 
PMHMathematicaSample
PMHMathematicaSamplePMHMathematicaSample
PMHMathematicaSamplePeter Hammel
 
Hands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingHands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingArthur Charpentier
 
ADVANCED ALGORITHMS-UNIT-3-Final.ppt
ADVANCED   ALGORITHMS-UNIT-3-Final.pptADVANCED   ALGORITHMS-UNIT-3-Final.ppt
ADVANCED ALGORITHMS-UNIT-3-Final.pptssuser702532
 
Error control coding bch, reed-solomon etc..
Error control coding   bch, reed-solomon etc..Error control coding   bch, reed-solomon etc..
Error control coding bch, reed-solomon etc..Madhumita Tamhane
 

Similar to Glm talk Tomas (20)

Hybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networksHybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networks
 
Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
 
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsSIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithms
 
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
 
Logisticregression
LogisticregressionLogisticregression
Logisticregression
 
Logisticregression
LogisticregressionLogisticregression
Logisticregression
 
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
 
logisticregression
logisticregressionlogisticregression
logisticregression
 
logisticregression.ppt
logisticregression.pptlogisticregression.ppt
logisticregression.ppt
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANs
 
Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...
 
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
 
PMHMathematicaSample
PMHMathematicaSamplePMHMathematicaSample
PMHMathematicaSample
 
Hands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingHands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive Modeling
 
ADVANCED ALGORITHMS-UNIT-3-Final.ppt
ADVANCED   ALGORITHMS-UNIT-3-Final.pptADVANCED   ALGORITHMS-UNIT-3-Final.ppt
ADVANCED ALGORITHMS-UNIT-3-Final.ppt
 
Error control coding bch, reed-solomon etc..
Error control coding   bch, reed-solomon etc..Error control coding   bch, reed-solomon etc..
Error control coding bch, reed-solomon etc..
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Glm talk Tomas

  • 1. Distributed GLM Implementation on H2O platform Tomas Nykodym, 0xDATA
  • 2. Linear Regression Data: x, y + noise Goal: predict y using x i.e. find a,b s.t. y = a*x + b
  • 3. Linear Regression Least Squares Fit Real Relation: y=3x+10+N(0,20) Best Fit: y = 3.08*x + 6
  • 4. Prostate Cancer Example Data: x = PSA (prostate-specific antigen) y = CAPSULE 0 = no tumour 1 = tumour Goal: predict y using x
  • 5. Prostate Cancer Example Linear Regression Fit Data: x = PSA (prostate-specific antigen) y = CAPSULE 0 = no tumour 1 = tumour Fit: Least squares fit
  • 6. Generalized Linear Model Generalizes linear regression by: – adding a link function g to transform the output z = g(y) – new response variable – noise (i.e.variance) does not have to be constant – fit is maximal likelihood instead of least squares
  • 7. Prostate Cancer Logistic Regression Fit Data: x = PSA (prostate-specific antigen) y = CAPSULE 0 = no tumour 1 = tumour GLM Fit: – Binomial family – Logit link – Predict probability of CAPSULE=1.
  • 8. Implementation - Solve GLM by IRLSM Input: – X: data matrix N*P – Y: response vector (N rows) – family, link function, α,β INNER LOOP: Solve elastic net: ADMM(Boyd 2010, page 43): OUTER LOOP: While β changes, compute: zk +1=β k +( y−μ k ) d η d μ W k +1 −1 =( d η d μ ) 2 Var(μ k ) γ l+1 =( X T WX +ρ I ) −1 X T Wz+ρ (β l −u l ) β l+1 =Sλ /ρ (γ l+1 +ul ) ul+1 =uk +γ l+1 −βl+1 Output: – β vector of coefficients, solution to max-likellihood XX = X T W k+1 X Xz= X T Wzk +1
  • 9. H2O Implementation Outer Loop: (Map Reduce Task) public class SimpleGLM extends MRTask { @Override public void map(Chunk c) { res = new double [p][p]; for(double [] x:c.rows()){ double eta,mu,var; eta = computeEta(x); mu = _link.linkInv(eta); var = Math.max(1e-5,_family.variance(mu)); double dp = _link.linkInvDeriv(eta); double w = dp*dp/var; for(int i = 0; i < x.length; ++i) for(int j = 0; j < x.length; ++j) res[i][j] += x[i]*x[j]*w; } } @Override public void reduce(SimpleGLM g) { for(int i = 0; i < res.length; ++i) for(int j = 0; i < res.length; ++i) res[i][j] += g.res[i][j]; } } Inner Loop: (ADMM solver) public double [] solve(Matrix xx, Matrix xy) { // ADMM LSM Solve CholeskyDecomposition lu; // cache decomp! lu = new CholeskyDecomposition(xx); for( int i = 0; i < 1000; ++i ) { // Solve using cached Cholesky decomposition! xm = lu.solve(xyPrime); // compute u and z update for( int j = 0; j < N-1; ++j ) { double x_hat = xm.get(j, 0); x_norm += x_hat * x_hat; double zold = z[j]; z[j] = shrinkage(x_hat + u[j], kappa); u[j] += x_hat - z[j]; u_norm += u[j] * u[j]; } } } double shrinkage(double x, double k) { return Math.max(0,x-k)-Math.max(0,-x-k); }
  • 10. Regularization Elastic Net (Zhou, Hastie, 2005): ● Added L1 and L2 penalty to β to: – avoid overfitting, reduce variance – obtain sparse solution (L1 penalty) – avoid problems with correlated covariates No longer analytical solution. Options: LARS, ADMM, Generalized Gradient, ... β =argmin( X β − y) T ( X β −y)+α ∥β∥1+(1−α )∥β∥2 2
  • 11. Linear Regression Least Squares Method Find β by minimizing the sum of squared errors: Analytical solution: Easily parallelized if XT X is reasonably small. β =(X T X )−1 X T y=( 1 n ∑ xi xi T ) −1 1 n ∑ xi y β =argmin( X β − y) T ( X β −y)
  • 12. Generalized Linear Model ● Generalizes linear regression by: – adding a link function g to transform the response z = g(y) – new response variable η = Xβ – linear predictor μ = g-1 (η) – y has a distribution in the exponential family – variance depends on μ e.g var(μ) = μ*(1-μ) for Binomial family. – fit by maximizing the likelihood of the model