Alexandra Johnson, Software Engineer, SigOpt
Alexandra works on everything from infrastructure to product features to blog posts. Previously, she worked on growth, APIs, and recommender systems at Polyvore (acquired by Yahoo). She majored in computer science at Carnegie Mellon University with a minor in discrete mathematics and logic, and during the summers she A/B tested recommendations at internships with Facebook and Rent the Runway.
Abstract Summary:
Common Problems In Hyperparameter Optimization: All large machine learning pipelines have tunable parameters, commonly referred to as hyperparameters. Hyperparameter optimization is the process of finding the values of these parameters that make the system perform best. SigOpt provides a Bayesian optimization platform that is commonly used for hyperparameter optimization, and I'm going to share some of the common problems we've seen when integrating it into machine learning pipelines.
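To ground the terminology before the list of problems, here is a minimal sketch of a hyperparameter search loop, assuming scikit-learn and a hand-picked candidate list; the list stands in for whatever proposes the settings, whether grid search, random search, or a Bayesian service like SigOpt. None of this code is from the talk.

# Minimal hyperparameter search sketch (illustrative, not from the talk).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

best_score, best_params = float("-inf"), None
# Each dict is one hyperparameter setting to evaluate.
for params in [
    {"n_estimators": 50, "max_depth": 4},
    {"n_estimators": 100, "max_depth": 8},
    {"n_estimators": 200, "max_depth": None},
]:
    model = RandomForestClassifier(random_state=0, **params)
    # The objective being optimized: mean cross-validated accuracy.
    score = cross_val_score(model, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)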
Default Values
● Default values are an implicit choice
● Defaults are not always appropriate for your model
● You may build a classifier that looks like this:
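The slide's code is not in the transcript; a plausible stand-in, assuming a Keras recurrent classifier (matching the Keras reference under #1 below), is:

# Hypothetical stand-in for the slide's classifier; every argument
# left unspecified is an implicit choice of a default hyperparameter.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# LSTM silently picks defaults such as activation="tanh" and dropout=0.0.
model.add(LSTM(128, input_shape=(None, 32)))
model.add(Dense(1, activation="sigmoid"))
# "rmsprop" carries its own default learning rate, another hidden choice.
model.compile(optimizer="rmsprop", loss="binary_crossentropy")

Each omitted argument, from the LSTM's dropout rate to RMSprop's learning rate, is a hyperparameter the library's defaults have chosen for you.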
References - by Section

Intro
Ian Dewancker. SigOpt for ML: TensorFlow ConvNets on a Budget with Bayesian Optimization.
Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.
Ian Dewancker. SigOpt for ML: Bayesian Optimization for Collaborative Filtering with MLlib.
#1 Trusting the Defaults
Keras recurrent layers documentation
#2 Using the Wrong Metric
Ron Kohavi et al. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained.
Xavier Amatriain. 10 Lessons Learned from Building ML Systems [Video at 19:03].
Image from PhD Comics.
See also: SigOpt in Depth: Intro to Multicriteria Optimization.
#4 Too Few Hyperparameters
Image from TensorFlow Playground.
Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.
#5 Hand Tuning
On algorithms beating experts: Scott Clark, Ian Dewancker, and Sathish Nagappan. Deep Neural Network Optimization with SigOpt and Nervana Cloud.
#6 Grid Search
NoGridSearch.com
#7 Random Search
James Bergstra and Yoshua Bengio. Random Search for Hyper-Parameter Optimization.
Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke. A Stratified Analysis of Bayesian Optimization Methods.
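As a companion to #6 and #7, here is a brief sketch, with an invented objective and illustrative ranges, of the difference between grid search and the random search of Bergstra and Bengio: random search spends the same trial budget on distinct values of every hyperparameter instead of a few repeated grid values.

import random

def objective(params):
    # Stand-in for a real train-and-validate run on a model.
    return -(params["lr"] - 0.01) ** 2 - 1e-6 * (params["units"] - 96) ** 2

# Grid search: every combination of a few hand-picked values (9 trials).
grid = [{"lr": lr, "units": u}
        for lr in (0.001, 0.01, 0.1)
        for u in (32, 64, 128)]

# Random search: the same 9-trial budget, drawn from continuous ranges.
rng = random.Random(0)
randoms = [{"lr": 10 ** rng.uniform(-3, -1), "units": rng.randint(32, 128)}
           for _ in range(9)]

for name, trials in (("grid", grid), ("random", randoms)):
    best = max(trials, key=objective)
    print(name, best)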
Learn More
blog.sigopt.com
sigopt.com/research