Evan Estola – Data Scientist, Meetup.com at MLconf ATL


Beyond Collaborative Filtering: Using Machine Learning to Power Recommendations at Meetup

Collaborative filtering and other common recommendation algorithms are powerful techniques for some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach with few non-academic, non-toy examples to follow. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.


Beyond Collaborative Filtering: ML & Recommendations at Meetup
Evan Estola
Machine Learning Engineer
Meetup.com
evan@meetup.com
@estola

Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh and >125 million rsvps by 18 million members
● 3 million rsvps in the last 30 days
○ 1/second

Tools at Meetup
● Hive - SQL on Hadoop
● Spark - Distributed Scala on Hadoop cluster
● Scala - Recommendations service
● R - Data analysis, Model building
● Python - Scripting, Data organizing
● Java - Backend of our web stack

Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this

Weaknesses of CF
● Sparsity
● Cold Start
● Coverage
● Diversity

Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Low rsvp/person
○ Membership: 0.001%
○ Compare to Netflix Prize Dataset: 1%
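
The sparsity figures above can be read as the fill rate of the member-by-group matrix. A minimal sketch, assuming that reading; the toy memberships and the `density` helper are hypothetical and not from the talk:

```python
def density(memberships, n_members, n_groups):
    """Fraction of the member x group matrix that holds a known membership."""
    return len(memberships) / (n_members * n_groups)


# Hypothetical toy data: 4 members, 5 groups, 4 known memberships.
toy = {(0, 0), (0, 2), (1, 1), (3, 2)}
print(f"{density(toy, n_members=4, n_groups=5):.1%}")
```

At a fill rate near 0.001%, almost every member-group pair is unobserved, which is the regime where plain collaborative filtering has very little to work with.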

Supervised Learning/Classification
● “Inferring a function from labeled training data”
● Joined Meetup group/Didn’t join Meetup group

Preprocessing
● Schenectady
● Fake RSVP boosts (+100 guests!)
● Outliers
● Bucketing
● Etc etc

Problem definition and assumptions
● Assumption: if you’re not in a given group, you don’t want to be
○ Negative samples: groups you’re not in
○ Also a good classifier...
● Membership << expected error rate
○ Solution: sample to 50/50 join/no-join
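
One way to read the 50/50 step: keep every observed join as a positive row and draw the same number of non-member pairs as negatives. The sketch below assumes that reading; `balanced_sample` and the toy names are hypothetical, and the talk does not say how negatives were actually drawn.

```python
import random


def balanced_sample(memberships, members, groups, seed=0):
    """Return (member, group, label) rows with equal joins and non-joins."""
    rng = random.Random(seed)
    positives = [(m, g, 1) for m, g in sorted(memberships)]
    available = len(members) * len(groups) - len(memberships)
    if available < len(positives):
        raise ValueError("not enough non-member pairs to balance the sample")
    negatives = set()
    # Rejection sampling: with a very sparse matrix almost every draw is a negative.
    while len(negatives) < len(positives):
        pair = (rng.choice(members), rng.choice(groups))
        if pair not in memberships:
            negatives.add(pair)
    return positives + [(m, g, 0) for m, g in sorted(negatives)]


# Hypothetical toy data.
memberships = {("ann", "hiking"), ("bob", "chess"), ("cat", "hiking")}
rows = balanced_sample(
    memberships, ["ann", "bob", "cat", "dan"], ["hiking", "chess", "go"]
)
```

Without rebalancing, a model that always predicts "no join" would be almost perfectly accurate and useless, which appears to be the point of the "also a good classifier" bullet.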

Ranking
● Model output label no longer explicitly true
○ Luckily, we’d rather rank all of the results anyway
● Use a classifier that gives you a useful output
○ Fancy black box
○ Logistic Regression
■ Easier to explain
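
A minimal sketch of ranking with a classifier's continuous output rather than its hard label. The scores are made-up log-odds for three hypothetical groups, and `rank_groups` is an illustrative helper, not Meetup's service code.

```python
import math


def join_probability(log_odds):
    """Logistic function: maps a raw model score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_groups(scores):
    """Order candidate groups from most to least likely to be joined."""
    return sorted(scores, key=lambda g: join_probability(scores[g]), reverse=True)


# Hypothetical raw scores for one member.
scores = {"Hiking Club": 0.5, "Chess Night": 2.0, "Go Players": -1.0}
ranking = rank_groups(scores)
print(ranking)
```

Because the model was trained on a rebalanced sample, the probabilities themselves should not be read as real join rates; only their order is used here.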

Ensemble Learning
“... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”

Ensemble Learning
● Topic match (original algorithm)
● Collaborative Filtering on Topics
● Social algorithm
● Other simple features (Popularity, Gender…)
● Add output of algorithms as features into Logistic Regression model

Logistic Regression Output
● TopicScore 4.14
● ExtendedTS 0.47
● RelatedTS 0.66
● FbFriends 2.02
● 2ndFbFriends 0.09
● AgeUnmatch -2.40
● GenUnmatch -3.37
● Distance -0.04
● StateMatch 0.54
● CountyMatch 0.41
● ZipScore 0.06
● RsvpScore 0.02
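
To illustrate how these coefficients combine the sub-algorithm outputs, here is a sketch that scores one candidate group as a weighted sum. The weights are the ones listed above; the intercept (set to 0), the feature values, and the feature scaling are assumptions, since the slides do not give them.

```python
# Coefficients as listed on the slide.
WEIGHTS = {
    "TopicScore": 4.14, "ExtendedTS": 0.47, "RelatedTS": 0.66,
    "FbFriends": 2.02, "2ndFbFriends": 0.09, "AgeUnmatch": -2.40,
    "GenUnmatch": -3.37, "Distance": -0.04, "StateMatch": 0.54,
    "CountyMatch": 0.41, "ZipScore": 0.06, "RsvpScore": 0.02,
}


def log_odds(features, intercept=0.0):
    """Weighted sum of feature values; features left out count as 0."""
    return intercept + sum(WEIGHTS[name] * value for name, value in features.items())


# Hypothetical feature values for one member-group pair.
near_match = {"TopicScore": 0.5, "Distance": 10.0, "StateMatch": 1.0}
gender_mismatch = dict(near_match, GenUnmatch=1.0)

print(round(log_odds(near_match), 2))
print(round(log_odds(gender_mismatch), 2))
```

Reading the score this way is what makes the model easy to explain: each feature's contribution is its weight times its value, so a single flag such as GenUnmatch can outweigh a decent topic match.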

Facebook Likes
● Lots of information, but how to use?
● Map to topics, let training the model take care of the rest!

Mapping FB Likes to Meetup Topics
● Text based?
○ Go(game) vs Go(lang)?
○ Burton?
● Data approach!
○ Grab most popular topics across all members with the same like
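
The data approach can be sketched as a simple count: for a given like, tally the topics of every member who has that like and keep the most frequent. The data structures and the name `top_topics_for_like` are hypothetical.

```python
from collections import Counter


def top_topics_for_like(like, member_likes, member_topics, k=3):
    """Most common Meetup topics among members who share a Facebook like."""
    counts = Counter()
    for member, likes in member_likes.items():
        if like in likes:
            counts.update(member_topics.get(member, ()))
    return [topic for topic, _ in counts.most_common(k)]


# Hypothetical toy data.
member_likes = {"ann": {"Burton"}, "bob": {"Burton"}, "cat": {"Go"}}
member_topics = {
    "ann": ["Meeting New People", "Snowboarding"],
    "bob": ["Meeting New People"],
    "cat": ["Golang"],
}
print(top_topics_for_like("Burton", member_likes, member_topics, k=2))
```

Raw counts like these favor whatever topics are popular overall, regardless of the like.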

Normalization
● Top topics for Burton-Likers
○ Meeting New People, Coffee, bla bla
○ Most popular still dominates
● Solution: Normalize based on expected topic occurrence in sample

Normalization
● For members with a given Like, compare percent with each topic to expected among total population
● Total population
○ 20% “Meeting New People”
○ 2% “Snowboarding”
● Burton:
○ 20% “Meeting New People”
○ 9% “Snowboarding”
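
Worked with the numbers above, one plausible form of the comparison is a ratio of the topic's share among likers to its share in the total population. The slides only say "compare", so the ratio is an assumption, and `lift` is a hypothetical helper name.

```python
def lift(share_among_likers, share_overall):
    """How over-represented a topic is among members with a given like."""
    return share_among_likers / share_overall


# Shares taken from the slide.
overall = {"Meeting New People": 0.20, "Snowboarding": 0.02}
burton = {"Meeting New People": 0.20, "Snowboarding": 0.09}

scores = {topic: lift(burton[topic], overall[topic]) for topic in burton}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under this ratio "Meeting New People" scores about 1 (no more common than expected), while "Snowboarding" scores about 4.5 and moves to the top for Burton.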

Results
● Generate top topics for all likes
○ Path from member to like to topic to group
● Add Facebook Like based topic match feature to model
● Positive weight
○ Very good sign!
● Deploy/Split test
○ TBD

Summary
● Supervised Learning for Ranking as Recommendations is cool
● Simple, interpretable models are cool
● Feature engineering is cool

Thanks!
Smart people come work with me.
http://www.meetup.com/jobs/


MLconf

Building an Incrementally Trained, Local Taste Aware, Global Deep Learned Recommender System Model At Netflix, our main goal is to maximize our members’ enjoyment of the selected show by minimizing the amount of time it takes for them to find it. We try to achieve this goal by personalizing almost all the aspects of our product -- from what shows to recommend, to how to present these shows and construct their home-pages to what images to select per show, among many other things. Everything is recommendations for us and as an applied Machine Learning group, we spend our time building models for personalization that will eventually increase the joy and satisfaction of our members. In this talk we will primarily focus our attention on a) making a global deep learned recommender model that is regional tastes and popularity aware and b) adapting this model to changing taste preferences as well as dynamic catalog availability. We will first go through some standard recommender system models that use Matrix Factorization and Topic Models and then compare and contrast them with more powerful and higher capacity deep learning based models such as sequence models that use recurrent neural networks. We will show what it entails to build a global model that is aware of regional taste preferences and catalog availability. We will show how models that are built on simple Maximum Likelihood principle fail to do that. We will then describe one solution that we have employed in order to enable the global deep learned models to focus their attention on capturing regional taste preferences and changing catalog.In the latter half of the talk, we will discuss how we do incremental learning of deep learned recommender system models. Why do we need to do that ? Everything changes with time. Users’ tastes change with time. What’s available on Netflix and what’s popular also change over time. 
Therefore, updating or improving recommendation systems over time is necessary to bring more joy to users. In addition to how we apply incremental learning, we will discuss some of the challenges we face involving large-scale data preparation, infrastructure setup for incremental model training as well as pipeline scheduling. The incremental training enables us to serve fresher models trained on fresher and larger amounts of data. This helps our recommender system to nicely and quickly adapt to catalog and users’ taste changes, and improve overall performance.

Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...

MLconf

Vito Ostuni - The Voice: New Challenges in a Zero UI World The adoption of voice-enabled devices has seen an explosive growth in the last few years and music consumption is among the most popular use cases. Music personalization and recommendation plays a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot filling techniques and query classification to interpret the user intent and identify the main concepts in the query. We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.

Vito Ostuni - The Voice: New Challenges in a Zero UI World

MLconf

Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...

MLconf

Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...

MLconf

Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...

MLconf

Neel Sundaresan - Teaching a machine to code

MLconf

Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...

MLconf

Soumith Chintala - Increasing the Impact of AI Through Better Software

MLconf

Roy Lowrance - Predicting Bond Prices: Regime Changes

MLconf

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...

Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding

Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...

Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush

Josh Wills - Data Labeling as Religious Experience

Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...

Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...

Meghana Ravikumar - Optimized Image Classification on the Cheap

Noam Finkelstein - The Importance of Modeling Data Collection

June Andrews - The Uncanny Valley of ML

Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks

Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...

Vito Ostuni - The Voice: New Challenges in a Zero UI World

Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...

Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...

Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...

Neel Sundaresan - Teaching a machine to code

Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...

Soumith Chintala - Increasing the Impact of AI Through Better Software

Roy Lowrance - Predicting Bond Prices: Regime Changes

Evan Estola – Data Scientist, Meetup.com at MLconf ATL

1. Beyond Collaborative Filtering: ML & Recommendations at Meetup Evan Estola Machine Learning Engineer Meetup.com evan@meetup.com @estola

2. Meetup what are you

3. Why Meetup data is cool ● Real people meeting up ● Every meetup could change someone's life ● No ads, just do the best thing ● Oh and >125 million rsvps by 18 million members ● 3 million rsvps in the last 30 days ○ 1/second

5. Tools at Meetup ● Hive - SQL on Hadoop ● Spark - Distributed Scala on Hadoop cluster ● Scala - Recommendations service ● R - Data analysis, Model building ● Python - Scripting, Data organizing ● Java - Backend of our web stack

6. Collaborative Filtering ● Classic recommendations approach ● Users who like this also like this

7. Weaknesses of CF ● Sparsity ● Cold Start ● Coverage ● Diversity

9. Why Recs at Meetup are hard ● Incomplete Data (topics) ● Cold start ● Asking user for data is hard ● Going to meetups is scary ● Sparsity ○ Location ○ Low rsvp/person ○ Membership: 0.001% ○ Compare to Netflix Prize Dataset: 1%

10. Supervised Learning/Classification ● “Inferring a function from labeled training data” ● Joined Meetup group/Didn’t join Meetup group

11. Preprocessing ● Schenectady ● Fake RSVP boosts (+100 guests!) ● Outliers ● Bucketing ● Etc etc

12. Problem definition and assumptions ● Assumption: if you’re not in a given group, you don’t want to be ○ Negative samples: groups you’re not in ○ Also a good classifier... ● Membership << expected error rate ○ Solution: sample to 50/50 join/no-join
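The 50/50 sampling step on this slide can be sketched as follows. This is a minimal illustration, not Meetup's pipeline: the function name `balanced_sample`, the toy data, and the choice to downsample negatives (rather than oversample positives) are all assumptions.

```python
import random

def balanced_sample(pairs, memberships, seed=0):
    """Return (pair, label) rows with equal numbers of joins and non-joins.

    pairs: list of (member_id, group_id) candidate pairs
    memberships: set of (member_id, group_id) pairs where the member joined
    """
    rng = random.Random(seed)
    # Slide assumption: any group a member is not in counts as a negative.
    positives = [p for p in pairs if p in memberships]
    negatives = [p for p in pairs if p not in memberships]
    # Downsample the much larger negative class to match the positives.
    k = min(len(positives), len(negatives))
    sampled = [(p, 1) for p in rng.sample(positives, k)]
    sampled += [(p, 0) for p in rng.sample(negatives, k)]
    rng.shuffle(sampled)
    return sampled

# Toy data (hypothetical): 100 members, 20 groups, one join per member.
members = range(100)
groups = range(20)
pairs = [(m, g) for m in members for g in groups]
memberships = {(m, m % 20) for m in members}
rows = balanced_sample(pairs, memberships)
```

Without the rebalancing, a classifier that always answers "no join" would be right on 95% of these toy pairs, which is the "also a good classifier" problem the slide alludes to.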

13. Ranking ● Model output label no longer explicitly true ○ Luckily, we’d rather rank all of the results anyway ● Use a classifier that gives you a useful output ○ Fancy black box ○ Logistic Regression ■ Easier to explain
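One way to see why ranking still works after 50/50 sampling: if only a fraction r of the negatives is kept, the odds seen in the sample are inflated by 1/r, so the sample log-odds differ from the population log-odds by a constant. A constant shift changes the probabilities but not their order. The sketch below assumes a logistic model; the group names, scores, and keep rate are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrected_probability(sample_logit, negative_keep_rate):
    # Population log-odds = sample log-odds + log(r), where r is the
    # fraction of negatives kept when building the training sample.
    return sigmoid(sample_logit + math.log(negative_keep_rate))

# Hypothetical model scores (log-odds) for three candidate groups.
sample_logits = {"hiking": 1.2, "scala": 0.3, "knitting": -0.8}
keep_rate = 0.001  # hypothetical: 1 in 1000 negatives kept

raw = {g: sigmoid(z) for g, z in sample_logits.items()}
fixed = {g: corrected_probability(z, keep_rate) for g, z in sample_logits.items()}

# The probabilities differ, but the ordering of the groups does not.
rank_raw = sorted(raw, key=raw.get, reverse=True)
rank_fixed = sorted(fixed, key=fixed.get, reverse=True)
```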

14. Meetup what are you

15. Ensemble Learning “... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”

16. Ensemble Learning ● Topic match (original algorithm) ● Collaborative Filtering on Topics ● Social algorithm ● Other simple features (Popularity, Gender…) ● Add output of algorithms as features into Logistic Regression model

17. Logistic Regression Output ● TopicScore 4.14 ● ExtendedTS 0.47 ● RelatedTS 0.66 ● FbFriends 2.02 ● 2ndFbFriends 0.09 ● AgeUnmatch -2.40 ● GenUnmatch -3.37 ● Distance -0.04 ● StateMatch 0.54 ● CountyMatch 0.41 ● ZipScore 0.06 ● RsvpScore 0.02
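To make the weights above concrete, here is a minimal sketch of how a logistic regression score could be computed from them and used to rank groups. The weights are copied from the slide; the intercept (assumed 0, since the slide does not give one), the feature scaling, and the three candidate groups are hypothetical.

```python
# Weights as listed on the slide.
WEIGHTS = {
    "TopicScore": 4.14, "ExtendedTS": 0.47, "RelatedTS": 0.66,
    "FbFriends": 2.02, "2ndFbFriends": 0.09,
    "AgeUnmatch": -2.40, "GenUnmatch": -3.37,
    "Distance": -0.04, "StateMatch": 0.54, "CountyMatch": 0.41,
    "ZipScore": 0.06, "RsvpScore": 0.02,
}

def log_odds(features, intercept=0.0):
    """Weighted sum of the features present; absent features count as 0."""
    return intercept + sum(WEIGHTS[name] * value for name, value in features.items())

# Hypothetical candidates with made-up feature values.
candidates = {
    "close_topic_match": {"TopicScore": 0.8, "Distance": 5.0, "StateMatch": 1.0},
    "friend_but_far": {"FbFriends": 1.0, "Distance": 40.0},
    "gender_mismatch": {"TopicScore": 0.8, "GenUnmatch": 1.0, "Distance": 5.0},
}

# Rank groups from highest to lowest score.
ranked = sorted(candidates, key=lambda g: log_odds(candidates[g]), reverse=True)
```

Because each feature contributes a single signed term, it is easy to explain why a group ranks where it does, which is the interpretability argument made for logistic regression earlier in the talk.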

18. Facebook Likes ● Lots of information, but how to use? ● Map to topics, let training the model take care of the rest!

19. Mapping FB Likes to Meetup Topics ● Text based? ○ Go(game) vs Go(lang)? ○ Burton? ● Data approach! ○ Grab most popular topics across all members with the same like
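The "data approach" can be sketched as a simple count: for a given Like, tally the topics of every member who has that Like and keep the most frequent. The function name, data structures, and toy members below are hypothetical.

```python
from collections import Counter

def top_topics_for_like(like, member_likes, member_topics, n=2):
    """Count topics across all members who share a Like; return the n most common."""
    counts = Counter()
    for member, likes in member_likes.items():
        if like in likes:
            counts.update(member_topics.get(member, ()))
    return counts.most_common(n)

# Toy data (hypothetical).
member_likes = {
    "ann": {"Burton"},
    "bob": {"Burton"},
    "cat": {"Burton", "Go"},
    "dan": {"Go"},
}
member_topics = {
    "ann": {"Meeting New People", "Snowboarding"},
    "bob": {"Meeting New People", "Coffee"},
    "cat": {"Meeting New People", "Snowboarding"},
    "dan": {"Golang"},
}

top = top_topics_for_like("Burton", member_likes, member_topics)
```

Even in this toy data the generic topic comes out on top, which is the problem the normalization slides address.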

20. Normalization ● Top topics for Burton-Likers ○ Meeting New People, Coffee, bla bla ○ Most popular still dominates ● Solution: Normalize based on expected topic occurrence in sample

21. Normalization ● For members with a given Like, compare percent with each topic to expected among total population ● Total population ○ 20% “Meeting New People” ○ 2% “Snowboarding” ● Burton: ○ 20% “Meeting New People” ○ 9% “Snowboarding”
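The comparison on this slide can be worked through with the slide's own numbers. The sketch below assumes the comparison is a ratio (share among likers divided by share in the total population); the slide only says "compare", so the ratio and the name `topic_lift` are assumptions.

```python
def topic_lift(liker_share, population_share):
    """Ratio of a topic's share among likers to its share in the population."""
    return {
        topic: liker_share[topic] / population_share[topic]
        for topic in liker_share
        if population_share.get(topic, 0) > 0
    }

# Shares taken from the slide.
population = {"Meeting New People": 0.20, "Snowboarding": 0.02}
burton = {"Meeting New People": 0.20, "Snowboarding": 0.09}

lift = topic_lift(burton, population)
top = max(lift, key=lift.get)
```

"Meeting New People" is exactly as common among Burton likers as in the population (ratio of about 1.0), while "Snowboarding" is about 4.5 times more common, so it becomes the top topic for the Like once popularity is normalized away.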

22. Results ● Generate top topics for all likes ○ Path from member to like to topic to group ● Add Facebook Like based topic match feature to model ● Positive weight ○ Very good sign! ● Deploy/Split test ○ TBD

23. Summary ● Supervised Learning for Ranking as Recommendations is cool ● Simple, interpretable models are cool ● Feature engineering is cool

24. Thanks! Smart people come work with me. http://www.meetup.com/jobs/
