Glm talk Tomas

•

0 likes•2,046 views

Sri Ambati

The math and code behind H20's big data Generalized Linear Modeling (GLM) presented by Tomas Nykodym

Technology Health & Medicine

Distributed GLM Implementation
on H2O platform
Tomas Nykodym, 0xDATA

Linear Regression
Data:
x, y + noise
Goal:
predict y using x
i.e. find a,b s.t.
y = a*x + b

Linear Regression
Least Squares Fit
Real Relation:
y=3x+10+N(0,20)
Best Fit:
y = 3.08*x + 6

Prostate Cancer Example
Data:
x = PSA
(prostate-specific antigen)
y = CAPSULE
0 = no tumour
1 = tumour
Goal:
predict y using x

Prostate Cancer Example
Linear Regression Fit
Data:
x = PSA
(prostate-specific antigen)
y = CAPSULE
0 = no tumour
1 = tumour
Fit:
Least squares fit

Generalized Linear Model
Generalizes linear regression by:
– adding a link function g to transform the output
z = g(y) – new response variable
– noise (i.e.variance) does not have to be constant
– fit is maximal likelihood instead of least squares

Prostate Cancer
Logistic Regression Fit
Data:
x = PSA
(prostate-specific antigen)
y = CAPSULE
0 = no tumour
1 = tumour
GLM Fit:
– Binomial family
– Logit link
– Predict probability
of CAPSULE=1.

Implementation - Solve GLM by IRLSM
Input:
– X: data matrix N*P
– Y: response vector (N rows)
– family, link function, α,β
INNER LOOP:
Solve elastic net:
ADMM(Boyd 2010, page 43):
OUTER LOOP:
While β changes, compute:
zk +1=β k +( y−μ k )
d η
d μ
W k +1
−1
=(
d η
d μ
)
2
Var(μ k )
γ
l+1
=( X
T
WX +ρ I )
−1
X
T
Wz+ρ (β
l
−u
l
)
β l+1
=Sλ /ρ (γ l+1
+ul
)
ul+1
=uk
+γ l+1
−βl+1
Output:
– β vector of coefficients, solution
to max-likellihood
XX = X
T
W k+1 X
Xz= X
T
Wzk +1

$H2O Implementation Outer Loop: (Map Reduce Task) public class SimpleGLM extends MRTask { @Override public void map(Chunk c) { res = new double [p][p]; for(double [] x:c.rows()){ double eta,mu,var; eta = computeEta(x); mu = _link.linkInv(eta); var = Math.max(1e-5,_family.variance(mu)); double dp = _link.linkInvDeriv(eta); double w = dp*dp/var; for(int i = 0; i < x.length; ++i) for(int j = 0; j < x.length; ++j) res[i][j] += x[i]*x[j]*w; } } @Override public void reduce(SimpleGLM g) { for(int i = 0; i < res.length; ++i) for(int j = 0; i < res.length; ++i) res[i][j] += g.res[i][j]; } } Inner Loop: (ADMM solver) public double [] solve(Matrix xx, Matrix xy) { // ADMM LSM Solve CholeskyDecomposition lu; // cache decomp! lu = new CholeskyDecomposition(xx); for( int i = 0; i < 1000; ++i ) { // Solve using cached Cholesky decomposition! xm = lu.solve(xyPrime); // compute u and z update for( int j = 0; j < N-1; ++j ) { double x_hat = xm.get(j, 0); x_norm += x_hat * x_hat; double zold = z[j]; z[j] = shrinkage(x_hat + u[j], kappa); u[j] += x_hat - z[j]; u_norm += u[j] * u[j]; } } } double shrinkage(double x, double k) { return Math.max(0,x-k)-Math.max(0,-x-k); }$

Regularization
Elastic Net (Zhou, Hastie, 2005):
● Added L1 and L2 penalty to β to:
– avoid overfitting, reduce variance
– obtain sparse solution (L1 penalty)
– avoid problems with correlated covariates
No longer analytical solution.
Options: LARS, ADMM, Generalized Gradient, ...
β =argmin( X β − y)
T
( X β −y)+α ∥β∥1+(1−α )∥β∥2
2

Linear Regression
Least Squares Method
Find β by minimizing the sum of squared errors:
Analytical solution:
Easily parallelized if XT
X is reasonably small.
β =(X T
X )−1
X T
y=(
1
n
∑ xi xi
T
)
−1
1
n
∑ xi y
β =argmin( X β − y)
T
( X β −y)

Generalized Linear Model
● Generalizes linear regression by:
– adding a link function g to transform the response
z = g(y) – new response variable
η = Xβ – linear predictor
μ = g-1
(η)
– y has a distribution in the exponential family
– variance depends on μ
e.g var(μ) = μ*(1-μ) for Binomial family.
– fit by maximizing the likelihood of the model

What's hot

Pytorch and Machine Learning for the Math Impaired

Semi vae memo (1)

Irs gan doc

Semi vae memo (2)

Numerical_Methods_Simpson_Rule

Alex_5991

This slide shows an introduction to Deep Learning in Julia with Flux including how to load data, train, save model and predict. It also contains advanced topics e.g. how to use pretrained model, define own dataset iterator, ResNet-like layer and so on. Due to the critical issue that BatchNorm layer, Advanced layer e,g. MobileNetV2 defined on this slide does not work on GPU, we can't use them right now, but in the future I believe this documentation will helpful for someone who want to learn or use Flux. This slide is used for JuliaTokyo#8 conference on October 20, 2018

Deep Learning with Julia1.0 and Flux

Satoshi Terasaki

C++ ammar .s.q

ammarsalem5

Lec 3-mcgregor

Atner Yegorov

Fast parallelizable scenario-based stochastic optimization

Pantelis Sopasakis

Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...

Lviv Startup Club

Interpolation

Dmytro Mitin

В последние годы машинное обучаение получило широчайшее распространение во всех областях деятельности человека. каждая кофеварка и пылесос, не говоря уже о web приложениях, стараются сделать нашу жизнь чуточку лучше прибегая к использованию искусственного интеллекта. нужно ли получать научную степень для того чтобы попробовать себя в этом нелегком деле и может ли простой front-end разработчик применить у себя в родном фреймворке нейронку? Влад Борш рассказывает об этом и пытается разобраться откуда стартовать.

Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12

OdessaFrontend

Cristian tovar 10 03 jt

Isaac Geovaniz Tellez Chacon

Cristian tovar 10 03 jt

Isaac Geovaniz Tellez Chacon

Statistics for Economics Midterm 2 Cheat Sheet

Laurel Ayuyao

The Uncertain Enterprise

ClarkTony

HMPC for Upper Stage Attitude Control

Pantelis Sopasakis

CrystalBall - Compute Relative Frequency in Hadoop

Suvash Shah

0004

sts9ratm0624

Maximizing Submodular Function over the Integer Lattice

Tasuku Soma

What's hot (20)

Pytorch and Machine Learning for the Math Impaired

Semi vae memo (1)

Irs gan doc

Semi vae memo (2)

Numerical_Methods_Simpson_Rule

Deep Learning with Julia1.0 and Flux

C++ ammar .s.q

Lec 3-mcgregor

Fast parallelizable scenario-based stochastic optimization

Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...

Interpolation

Машинное обучение на JS. С чего начать и куда идти | Odessa Frontend Meetup #12

Cristian tovar 10 03 jt

Statistics for Economics Midterm 2 Cheat Sheet

The Uncertain Enterprise

HMPC for Upper Stage Attitude Control

CrystalBall - Compute Relative Frequency in Hadoop

0004

Maximizing Submodular Function over the Integer Lattice

Viewers also liked

H2O World - Building Data Products for Data Natives - Monica Rogati

Sri Ambati

Running GLM in R

Sri Ambati

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. http://docs.0xdata.com/datascience/deeplearning.html - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14

Sri Ambati

H2O World - Clustering & Feature Extraction on Text - Seth Redmore

Sri Ambati

H2O: A Platform for Big Math From just your laptop to 100's of nodes, H2O gives you a Single System Image - easy aggregation of all the memory and all the cores, and a simple coding style that scales wide at in-memory speeds. H2O is easily 1000x faster than disk based clustering solutions, and often 10x faster than best-of-breed alternative in-memory solutions - and will work directly on your existing Hadoop cluster. H2O ingests a wide variety of formats, parallel and distributed across the cluster, and stores the data highly compressed and then lets you do scale-out math at memory-bandwidth speeds (on compressed data!), making terabyte-scale munging an interactive experience. This is a technical talk on the insides of H2O, specifically focusing on the Single-System-Image aspect: how we write single-threaded code, and have H2O auto-parallelize and auto-scale-out to 100's of nodes and 1000's of cores. Arno is the Chief Architect of H2O, a distributed and scalable open-source machine learning platform. He is also the main author of H2O’s Deep Learning. Before joining H2O.ai, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives and collaborated with CERN on next-generation particle accelerators. Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He has authored dozens of scientific papers and is a sought-after conference speaker. Arno was named "2014 Big Data All-Star" by Fortune Magazine. Follow him on Twitter: @ArnoCandel. - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016

Sri Ambati

H2O World - Advanced Analytics at Macys.com - Daqing Zhao

Sri Ambati

Viewers also liked (6)

H2O World - Building Data Products for Data Natives - Monica Rogati

Running GLM in R

H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14

H2O World - Clustering & Feature Extraction on Text - Seth Redmore

Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016

H2O World - Advanced Analytics at Macys.com - Daqing Zhao

Similar to Glm talk Tomas

Hybrid dynamics in large-scale logistics networks

MKosmykov

Improved Trainings of Wasserstein GANs (WGAN-GP)

Sangwoo Mo

H2O World - Consensus Optimization and Machine Learning - Stephen Boyd

Sri Ambati

Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...

Michael Lie

SIAM - Minisymposium on Guaranteed numerical algorithms

Jagadeeswaran Rathinavel

Simulators play a major role in analyzing multi-modal transportation networks. As their complexity increases, optimization becomes an increasingly challenging task. Current calibration procedures often rely on heuristics, rules of thumb and sometimes on brute-force search. Alternatively, we provide a statistical method which combines a distributed, Gaussian Process Bayesian optimization method with dimensionality reduction techniques and structural improvement. We then demonstrate our framework on the problem of calibrating a multi-modal transportation network of city of Bloomington, Illinois. Our framework is sample efficient and supported by theoretical analysis and an empirical study. We demonstrate on the problem of calibrating a multi-modal transportation network of city of Bloomington, Illinois. Finally, we discuss directions for further research.

MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...

The Statistical and Applied Mathematical Sciences Institute

Logisticregression

sujimaa

Logisticregression

rnunoo

Need for Controllers having Integer Coeﬃcients in Homomorphically Encrypted D...

CDSL_at_SNU

logisticregression

SahilShahPhD2020

logisticregression.ppt

ShrutiPanda12

NTHU AI Reading Group: Improved Training of Wasserstein GANs

Mark Chang

Computing near-optimal policies from trajectories by solving a sequence of st...

Université de Liège (ULg)

We study an elliptic eigenvalue problem, with a random coefficient that can be parametrised by infinitely-many stochastic parameters. The physical motivation is the criticality problem for a nuclear reactor: in steady state the fission reaction can be modeled by an elliptic eigenvalue problem, and the smallest eigenvalue provides a measure of how close the reaction is to equilibrium -- in terms of production/absorption of neutrons. The coefficients are allowed to be random to model the uncertainty of the composition of materials inside the reactor, e.g., the control rods, reactor structure, fuel rods etc. The randomness in the coefficient also results in randomness in the eigenvalues and corresponding eigenfunctions. As such, our quantity of interest is the expected value, with respect to the stochastic parameters, of the smallest eigenvalue, which we formulate as an integral over the infinite-dimensional parameter domain. Our approximation involves three steps: truncating the stochastic dimension, discretizing the spatial domain using finite elements and approximating the now finite but still high-dimensional integral. To approximate the high-dimensional integral we use quasi-Monte Carlo (QMC) methods. These are deterministic or quasi-random quadrature rules that can be proven to be very efficient for the numerical integration of certain classes of high-dimensional functions. QMC methods have previously been applied to linear functionals of the solution of a similar elliptic source problem; however, because of the nonlinearity of eigenvalues the existing analysis of the integration error does not hold in our case. We show that the minimal eigenvalue belongs to the spaces required for QMC theory, outline the approximation algorithm and provide numerical results.

QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...

The Statistical and Applied Mathematical Sciences Institute

ML unit-1.pptx

SwarnaKumariChinni

A tutorial, and very new algorithms -- more details on arXiv and at NIPS 2017 https://arxiv.org/abs/1707.03770 Part of the Data Science Summer School at École Polytechnique: http://www.ds3-datascience-polytechnique.fr/program/ --------- 2018 Updates: See Zap slides from ISMP 2018 for new inverse-free optimal algorithms Simons tutorial, March 2018 [one month before most discoveries announced at ISMP] Part I (Basics, with focus on variance of algorithms) https://www.youtube.com/watch?v=dhEF5pfYmvc Part II (Zap Q-learning) https://www.youtube.com/watch?v=Y3w8f1xIb6s Big 2017 survey on variance in SA: Fastest convergence for Q-learning https://arxiv.org/abs/1707.03770 You will find the infinite-variance Q result there. Our NIPS 2017 paper is distilled from this.

Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms

Sean Meyn

PMHMathematicaSample

Peter Hammel

Hands-On Algorithms for Predictive Modeling

Arthur Charpentier

ADVANCED ALGORITHMS-UNIT-3-Final.ppt

ssuser702532

Error control codes are necessary for transmission and storage of large volumes of date sensitive to errors. BCH codes and Reed Solomon codes are the most important class of multiple error correcting codes for binary and non-binary channels respectively. Peterson and later Berlekamp and Massey discovered powerful algorithms which became viable with the help of new digital technology. Use of Galois fields gave a structured approach to designing of these codes. This presentation deals with above in a very structured and systematic manner.

Error control coding bch, reed-solomon etc..

Madhumita Tamhane

Similar to Glm talk Tomas (20)

Hybrid dynamics in large-scale logistics networks

Improved Trainings of Wasserstein GANs (WGAN-GP)

H2O World - Consensus Optimization and Machine Learning - Stephen Boyd

Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...

SIAM - Minisymposium on Guaranteed numerical algorithms

MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...

Logisticregression

Need for Controllers having Integer Coeﬃcients in Homomorphically Encrypted D...

logisticregression

logisticregression.ppt

NTHU AI Reading Group: Improved Training of Wasserstein GANs

Computing near-optimal policies from trajectories by solving a sequence of st...

QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...

ML unit-1.pptx

Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms

PMHMathematicaSample

Hands-On Algorithms for Predictive Modeling

ADVANCED ALGORITHMS-UNIT-3-Final.ppt

Error control coding bch, reed-solomon etc..

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day

Sri Ambati

Generative AI Masterclass - Model Risk Management.pptx

Sri Ambati

AI and the Future of Software Development: A Sneak Peek

Sri Ambati

LLMOps: Match report from the top of the 5th

Sri Ambati

Building, Evaluating, and Optimizing your RAG App for Production

Sri Ambati

Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai H2O Open Source GenAI World SF 2023 In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.

Building LLM Solutions using Open Source and Closed Source Solutions in Coher...

Sri Ambati

Patrick Hall, Professor, AI Risk Management, The George Washington University H2O Open Source GenAI World SF 2023 Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!

Risk Management for LLMs

Sri Ambati

Dr. Alexy Khrabrov, Open Source Science Community Director, IBM H2O Open Source GenAI World SF 2023 In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.

Open-Source AI: Community is the Way

Sri Ambati

Building Custom GenAI Apps at H2O

Sri Ambati

Megan Kurka, Vice President, Customer Data Scientist, H2O.ai H2O Open Source GenAI World SF 2023 Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.

Applied Gen AI for the Finance Vertical

Sri Ambati

Cutting Edge Tricks from LLM Papers

Sri Ambati

Pascal Pfeiffer, Principal Data Scientist, H2O.ai H2O Open Source GenAI World SF 2023 This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.

Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...

Sri Ambati

Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...

Sri Ambati

KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...

Sri Ambati

LLM Interpretability

Sri Ambati

Never Reply to an Email Again

Sri Ambati

Introducción al Aprendizaje Automatico con H2O-3 (1)

Sri Ambati

Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto. In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.

From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...

Sri Ambati

In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization. To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g Speaker: Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)

AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...

Sri Ambati

The chances of successfully implementing AI strategies within an organization significantly improve when you can recognize where your organization is on the maturity scale. Over this course, you will learn the keys to unlocking value with AI which include asking the right questions about the problems you are solving and ensuring you have the right cross-section of talent, tools, and resources. By the end of this module, you should be able to recognize where your organization is on the AI transformation spectrum and identify some strategies that can get you to the next stage in your journey. To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course To find the Youtube video about this presentation: https://youtu.be/PJgr2epM6qs Speakers: Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist) Ingrid Burton (H2O.ai - CMO)

AI Foundations Course Module 1 - An AI Transformation Journey

Sri Ambati

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day

Generative AI Masterclass - Model Risk Management.pptx

AI and the Future of Software Development: A Sneak Peek

LLMOps: Match report from the top of the 5th

Building, Evaluating, and Optimizing your RAG App for Production

Building LLM Solutions using Open Source and Closed Source Solutions in Coher...

Risk Management for LLMs

Open-Source AI: Community is the Way

Building Custom GenAI Apps at H2O

Applied Gen AI for the Finance Vertical

Cutting Edge Tricks from LLM Papers

Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...

Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...

KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...

LLM Interpretability

Never Reply to an Email Again

Introducción al Aprendizaje Automatico con H2O-3 (1)

From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...

AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...

AI Foundations Course Module 1 - An AI Transformation Journey

Recently uploaded

The value of a flexible API Management solution for Open Banking Steve Melan, Manager for IT Innovation and Architecture - State's and Saving's Bank of Luxembourg Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - The value of a flexible API Management solution for O...

apidays

In this session, we will delve into strategic approaches for optimizing knowledge management within Microsoft 365, amidst the evolving landscape of Copilot. From leveraging automatic metadata classification and permission governance with SharePoint Premium, to unlocking Viva Engage for the cultivation of knowledge and communities, you will gain actionable insights to bolster your organization's knowledge-sharing initiatives. In this session, we will also explore how to facilitate solutions to enable your employees to find answers and expertise within Microsoft 365. You will leave equipped with practical techniques and a deeper understanding of how there is more to effective knowledge management than just enabling Copilot, but building actual solutions to prepare the knowledge that Copilot and your employees can use.

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

Drew Madelung

As privacy and data protection regulations evolve rapidly, organizations operating in multiple jurisdictions face mounting challenges to ensure compliance and safeguard customer data. With state-specific privacy laws coming up in multiple states this year, it is essential to understand what their unique data protection regulations will require clearly. How will data privacy evolve in the US in 2024? How to stay compliant? Our panellists will guide you through the intricacies of these states' specific data privacy laws, clarifying complex legal frameworks and compliance requirements. This webinar will review: - The essential aspects of each state's privacy landscape and the latest updates - Common compliance challenges faced by organizations operating in multiple states and best practices to achieve regulatory adherence - Valuable insights into potential changes to existing regulations and prepare your organization for the evolving landscape

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

TrustArc

Strategies for Landing an Oracle DBA Job as a Fresher

Remote DBA Services

Enterprise Knowledge’s Urmi Majumder, Principal Data Architecture Consultant, and Fernando Aguilar Islas, Senior Data Science Consultant, presented "Driving Behavioral Change for Information Management through Data-Driven Green Strategy" on March 27, 2024 at Enterprise Data World (EDW) in Orlando, Florida. In this presentation, Urmi and Fernando discussed a case study describing how the information management division in a large supply chain organization drove user behavior change through awareness of the carbon footprint of their duplicated and near-duplicated content, identified via advanced data analytics. Check out their presentation to gain valuable perspectives on utilizing data-driven strategies to influence positive behavioral shifts and support sustainability initiatives within your organization. In this session, participants gained answers to the following questions: - What is a Green Information Management (IM) Strategy, and why should you have one? - How can Artificial Intelligence (AI) and Machine Learning (ML) support your Green IM Strategy through content deduplication? - How can an organization use insights into their data to influence employee behavior for IM? - How can you reap additional benefits from content reduction that go beyond Green IM?

Driving Behavioral Change for Information Management through Data-Driven Gree...

Enterprise Knowledge

MySQL Webinar, presented on the 25th of April, 2024. Summary: MySQL solutions enable the deployment of diverse Database Architectures tailored to specific needs, including High Availability, Disaster Recovery, and Read Scale-Out. With MySQL Shell's AdminAPI, administrators can seamlessly set up, manage, and monitor these solutions, ensuring efficiency and ease of use in their administration. MySQL Router, on the other hand, provides transparent routing from the application traffic to the backend servers in the architectures, requiring minimal configuration. Completely built in-house and supported by Oracle, these solutions have been adopted by enterprises of all sizes for their business-critical applications. In this presentation, we'll delve into various database architecture solutions to help you choose the right one based on your business requirements. Focusing on technical details and the latest features to maximize the potential of these solutions.

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Miguel Araújo

Advantages of Hiring UIUX Design Service Providers for Your Business

Pixlogix Infotech

Partners Life - Insurer Innovation Award 2024

The Digital Insurer

With more memory available, system performance of three Dell devices increased, which can translate to a better user experience Conclusion When your system has plenty of RAM to meet your needs, you can efficiently access the applications and data you need to finish projects and to-do lists without sacrificing time and focus. Our test results show that with more memory available, three Dell PCs delivered better performance and took less time to complete the Procyon Office Productivity benchmark. These advantages translate to users being able to complete workflows more quickly and multitask more easily. Whether you need the mobility of the Latitude 5440, the creative capabilities of the Precision 3470, or the high performance of the OptiPlex Tower Plus 7010, configuring your system with more RAM can help keep processes running smoothly, enabling you to do more without compromising performance.

Boost PC performance: How more available memory can improve productivity

Principled Technologies

The presentation explores the development and application of artificial intelligence (AI) from its inception to its current status in the modern world. The term "artificial intelligence" was first coined by John McCarthy in 1956 to describe efforts to develop computer programs capable of performing tasks that typically require human intelligence. This concept was first introduced at a conference held at Dartmouth College, where programs demonstrated capabilities such as playing chess, proving theorems, and interpreting texts. In the early stages, Alan Turing contributed to the field by defining intelligence as the ability of a being to respond to certain questions intelligently, proposing what is now known as the Turing Test to evaluate the presence of intelligent behavior in machines. As the decades progressed, AI evolved significantly. The 1980s focused on machine learning, teaching computers to learn from data, leading to the development of models that could improve their performance based on their experiences. The 1990s and 2000s saw further advances in algorithms and computational power, which allowed for more sophisticated data analysis techniques, including data mining. By the 2010s, the proliferation of big data and the refinement of deep learning techniques enabled AI to become mainstream. Notable milestones included the success of Google's AlphaGo and advancements in autonomous vehicles by companies like Tesla and Waymo. A major theme of the presentation is the application of generative AI, which has been used for tasks such as natural language text generation, translation, and question answering. Generative AI uses large datasets to train models that can then produce new, coherent pieces of text or other media. The presentation also discusses the ethical implications and the need for regulation in AI, highlighting issues such as privacy, bias, and the potential for misuse. These concerns have prompted calls for comprehensive regulations to ensure the safe and equitable use of AI technologies. Artificial intelligence has also played a significant role in healthcare, particularly highlighted during the COVID-19 pandemic, where it was used in drug discovery, vaccine development, and analyzing the spread of the virus. The capabilities of AI in healthcare are vast, ranging from medical diagnostics to personalized medicine, demonstrating the technology's potential to revolutionize fields beyond just technical or consumer applications. In conclusion, AI continues to be a rapidly evolving field with significant implications for various aspects of society. The development from theoretical concepts to real-world applications illustrates both the potential benefits and the challenges that come with integrating advanced technologies into everyday life. The ongoing discussion about AI ethics and regulation underscores the importance of managing these technologies responsibly to maximize their their benefits while minimizing potential harms.

Artificial Intelligence: Facts and Myths

Joaquim Jorge

Automating Google Workspace (GWS) & more with Apps Script

wesley chun

Developing An App To Navigate The Roads of Brazil

V3cube

Three things you will take away from the session: • How to run an effective tenant-to-tenant migration • Best practices for before, during, and after migration • Tips for using migration as a springboard to prepare for Copilot in Microsoft 365 Main ideas: Migration Overview: The presentation covers the current reality of cross-tenant migrations, the triggers, phases, best practices, and benefits of a successful tenant migration Considerations: When considering a migration, it is important to consider the migration scope, performance, customization, flexibility, user-friendly interface, automation, monitoring, support, training, scalability, data integrity, data security, cost, and licensing structure Next Wave: The next wave of change includes the launch of Copilot, which requires businesses to be prepared for upcoming changes related to Copilot and the cloud, and to consolidate data and tighten governance ShareGate: ShareGate can help with pre-migration analysis, configurable migration tool, and automated, end-user driven collaborative governance

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

sammart93

This presentation explores the impact of HTML injection attacks on web applications, detailing how attackers exploit vulnerabilities to inject malicious code into web pages. Learn about the potential consequences of such attacks and discover effective mitigation strategies to protect your web applications from HTML injection vulnerabilities. for more information visit https://bostoninstituteofanalytics.org/category/cyber-security-ethical-hacking/

HTML Injection Attacks: Impact and Mitigation Strategies

Boston Institute of Analytics

This presentations targets students or working professionals. You may know Google for search, YouTube, Android, Chrome, and Gmail, but did you know Google has many developer tools, platforms & APIs? This comprehensive yet still high-level overview outlines the most impactful tools for where to run your code, store & analyze your data. It will also inspire you as to what's possible. This talk is 50 minutes in length.

Powerful Google developer tools for immediate impact! (2023-24 C)

wesley chun

What are drone anti-jamming systems? The drone anti-jamming systems and anti-spoof technology protect against interference, jamming, and spoofing of the UAVs. To protect their security, countries are beginning to research drone anti-jamming systems, also known as drone strike weapons. The anti-jam and anti-spoof technology protects against interference, jamming and spoofing. A drone strike weapon is a drone attack weapon that can attack and destroy enemy drones. So what is so unique about this amazing system?

What Are The Drone Anti-jamming Systems Technology?

Antenna Manufacturer Coco

Effective data discovery is crucial for maintaining compliance and mitigating risks in today's rapidly evolving privacy landscape. However, traditional manual approaches often struggle to keep pace with the growing volume and complexity of data. Join us for an insightful webinar where industry leaders from TrustArc and Privya will share their expertise on leveraging AI-powered solutions to revolutionize data discovery. You'll learn how to: - Effortlessly maintain a comprehensive, up-to-date data inventory - Harness code scanning insights to gain complete visibility into data flows leveraging the advantages of code scanning over DB scanning - Simplify compliance by leveraging Privya's integration with TrustArc - Implement proven strategies to mitigate third-party risks Our panel of experts will discuss real-world case studies and share practical strategies for overcoming common data discovery challenges. They'll also explore the latest trends and innovations in AI-driven data management, and how these technologies can help organizations stay ahead of the curve in an ever-changing privacy landscape.

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

TrustArc

Boost Fertility New Invention Ups Success Rates.pdf

sudhanshuwaghmare1

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Rafal Los

Tata AIG General Insurance Company - Insurer Innovation Award 2024

The Digital Insurer

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

Strategies for Landing an Oracle DBA Job as a Fresher

Driving Behavioral Change for Information Management through Data-Driven Gree...

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Advantages of Hiring UIUX Design Service Providers for Your Business

Partners Life - Insurer Innovation Award 2024

Boost PC performance: How more available memory can improve productivity

Artificial Intelligence: Facts and Myths

Automating Google Workspace (GWS) & more with Apps Script

Developing An App To Navigate The Roads of Brazil

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

HTML Injection Attacks: Impact and Mitigation Strategies

Powerful Google developer tools for immediate impact! (2023-24 C)

What Are The Drone Anti-jamming Systems Technology?

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

Boost Fertility New Invention Ups Success Rates.pdf

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Tata AIG General Insurance Company - Insurer Innovation Award 2024

Glm talk Tomas

1. Distributed GLM Implementation on H2O platform Tomas Nykodym, 0xDATA

2. Linear Regression Data: x, y + noise Goal: predict y using x i.e. find a,b s.t. y = a*x + b

3. Linear Regression Least Squares Fit Real Relation: y=3x+10+N(0,20) Best Fit: y = 3.08*x + 6

4. Prostate Cancer Example Data: x = PSA (prostate-specific antigen) y = CAPSULE 0 = no tumour 1 = tumour Goal: predict y using x

5. Prostate Cancer Example Linear Regression Fit Data: x = PSA (prostate-specific antigen) y = CAPSULE 0 = no tumour 1 = tumour Fit: Least squares fit

6. Generalized Linear Model Generalizes linear regression by: – adding a link function g to transform the output z = g(y) – new response variable – noise (i.e.variance) does not have to be constant – fit is maximal likelihood instead of least squares

7. Prostate Cancer Logistic Regression Fit Data: x = PSA (prostate-specific antigen) y = CAPSULE 0 = no tumour 1 = tumour GLM Fit: – Binomial family – Logit link – Predict probability of CAPSULE=1.

8. Implementation - Solve GLM by IRLSM Input: – X: data matrix N*P – Y: response vector (N rows) – family, link function, α,β INNER LOOP: Solve elastic net: ADMM(Boyd 2010, page 43): OUTER LOOP: While β changes, compute: zk +1=β k +( y−μ k ) d η d μ W k +1 −1 =( d η d μ ) 2 Var(μ k ) γ l+1 =( X T WX +ρ I ) −1 X T Wz+ρ (β l −u l ) β l+1 =Sλ /ρ (γ l+1 +ul ) ul+1 =uk +γ l+1 −βl+1 Output: – β vector of coefficients, solution to max-likellihood XX = X T W k+1 X Xz= X T Wzk +1

9. H2O Implementation Outer Loop: (Map Reduce Task) public class SimpleGLM extends MRTask { @Override public void map(Chunk c) { res = new double [p][p]; for(double [] x:c.rows()){ double eta,mu,var; eta = computeEta(x); mu = _link.linkInv(eta); var = Math.max(1e-5,_family.variance(mu)); double dp = _link.linkInvDeriv(eta); double w = dp*dp/var; for(int i = 0; i < x.length; ++i) for(int j = 0; j < x.length; ++j) res[i][j] += x[i]*x[j]*w; } } @Override public void reduce(SimpleGLM g) { for(int i = 0; i < res.length; ++i) for(int j = 0; i < res.length; ++i) res[i][j] += g.res[i][j]; } } Inner Loop: (ADMM solver) public double [] solve(Matrix xx, Matrix xy) { // ADMM LSM Solve CholeskyDecomposition lu; // cache decomp! lu = new CholeskyDecomposition(xx); for( int i = 0; i < 1000; ++i ) { // Solve using cached Cholesky decomposition! xm = lu.solve(xyPrime); // compute u and z update for( int j = 0; j < N-1; ++j ) { double x_hat = xm.get(j, 0); x_norm += x_hat * x_hat; double zold = z[j]; z[j] = shrinkage(x_hat + u[j], kappa); u[j] += x_hat - z[j]; u_norm += u[j] * u[j]; } } } double shrinkage(double x, double k) { return Math.max(0,x-k)-Math.max(0,-x-k); }

10. Regularization Elastic Net (Zhou, Hastie, 2005): ● Added L1 and L2 penalty to β to: – avoid overfitting, reduce variance – obtain sparse solution (L1 penalty) – avoid problems with correlated covariates No longer analytical solution. Options: LARS, ADMM, Generalized Gradient, ... β =argmin( X β − y) T ( X β −y)+α ∥β∥1+(1−α )∥β∥2 2

11. Linear Regression Least Squares Method Find β by minimizing the sum of squared errors: Analytical solution: Easily parallelized if XT X is reasonably small. β =(X T X )−1 X T y=( 1 n ∑ xi xi T ) −1 1 n ∑ xi y β =argmin( X β − y) T ( X β −y)

12. Generalized Linear Model ● Generalizes linear regression by: – adding a link function g to transform the response z = g(y) – new response variable η = Xβ – linear predictor μ = g-1 (η) – y has a distribution in the exponential family – variance depends on μ e.g var(μ) = μ*(1-μ) for Binomial family. – fit by maximizing the likelihood of the model

Glm talk Tomas

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (6)

Similar to Glm talk Tomas

Similar to Glm talk Tomas (20)

More from Sri Ambati

More from Sri Ambati (20)

Recently uploaded

Recently uploaded (20)

Glm talk Tomas