Automated Machine Learning (AutoML) systems are emerging that automatically search for solutions in a large space of possible kinds of models. Although fully automated machine learning is appropriate for many applications, users often have knowledge that supplements and constrains the available data and solutions. This paper proposes human-guided machine learning (HGML) as a hybrid approach in which a user interacts with an AutoML system and tasks it to explore different problem settings that reflect the user’s knowledge about the data available. We present: 1) a task analysis of HGML that shows the tasks that a user would want to carry out, 2) a characterization of two scientific publications, one in neuroscience and one in political science, in terms of how the authors would search for solutions using an AutoML system, 3) requirements for HGML based on those characterizations, and 4) an assessment of existing AutoML systems in terms of those requirements.
1. Towards Human-Guided Machine Learning
Yolanda Gil1, James Honaker2, Shikhar Gupta1, Yibo Ma1, Vito D’Orazio3,
Daniel Garijo1, Shruti Gadewar1, Qifan Yang1 and Neda Jahanshad1
1University of Southern California
2University of Texas at Dallas
3Harvard University
https://w3id.org/people/dgarijo
@dgarijov
dgarijo@isi.edu
Intelligent User Interfaces (IUI19), March 18th, 2019
Information Sciences Institute
2. Rising Popularity of AutoML Systems
auto-sklearn Auto-WEKA
AlphaZero
3. Anatomy of an AutoML System
[Diagram: Training data, AutoML, Trained Model, Test data, Predictions]
AutoML features: train an ML algorithm and do one or more of the following:
• Extract features from data
• Data preparation (imputation, encoding, etc.)
• Feature selection
• Hyperparameter optimization
• Ensembling of solutions
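The steps listed above can be read as dimensions of a search space that an AutoML system explores. The following is a minimal sketch of that idea, not the method of any particular system: the search space, the step names, and the scoring function are all hypothetical, and `toy_score` merely stands in for training and evaluating a real pipeline.

```python
from itertools import product

# Hypothetical search space: each key is one pipeline step an AutoML system may vary.
SEARCH_SPACE = {
    "imputation": ["mean", "median"],
    "feature_selection": ["none", "top_k"],
    "model": ["logistic_regression", "decision_tree"],
    "regularization": [0.1, 1.0],
}


def enumerate_pipelines(space):
    """Yield every combination of step choices as a dict."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def search(space, score_fn):
    """Score every candidate pipeline and return (best_config, best_score)."""
    best_config, best_score = None, float("-inf")
    for config in enumerate_pipelines(space):
        score = score_fn(config)
        if score > best_score:  # ties keep the first candidate found
            best_config, best_score = config, score
    return best_config, best_score


def toy_score(config):
    """Stand-in for cross-validated performance (assumption, not a real metric)."""
    score = 0.80
    score += 0.05 if config["imputation"] == "median" else 0.0
    score += 0.03 if config["model"] == "logistic_regression" else 0.0
    score += 0.02 if config["regularization"] == 1.0 else 0.0
    return score


best, best_score = search(SEARCH_SPACE, toy_score)
```

Exhaustive enumeration is only workable for tiny spaces like this one; it is used here because it makes the structure of the search easy to see.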
4. Limitations of AutoML systems
Training process is not transparent
Trained models are difficult to customize
[Diagram: Training data, AutoML, Trained Model, Test data, Predictions]
5. Human-Guided Machine Learning (HGML)
[Diagram: Training data, AutoML, Trained Model, Test data, Predictions, Domain expert, Interface]
• Domain users don’t like black boxes
• They need to understand and modify the process to train a model with their expertise
  • Modify features (remove known biases)
  • Guide hyperparameter search
  • ….
6. Contributions of our work
• AutoML system and user interface that support basic HGML interactions
• A task analysis of HGML that enumerates discrete user tasks to guide AutoML systems
• Characterizations of two significant studies in neuroscience and political science
• Requirements for HGML from the AutoML system and user interface
• An assessment of how those requirements could be accommodated by AutoML systems
7. AutoML System: P4ML
P4ML: Phased Performance-Based Pipeline Planner
• Extracts features of interest from data (text, video, audio…)
• Builds a solution with the types of model and other steps to include (e.g. imputation, encoding, etc.)
• Performs a hyperparameter search to improve the results
• Generates ensembles with the top-ranked models
[Diagram: Phased Performance-Based Pipeline Planner with Training data, Test data, Problem description, Evaluation metric, Top Ranked Solutions, Predictions]
Top Ranked Solutions:
HashingVectorizer -> LabelEncoder -> LogisticRegressionCV (0.9489)
CountVectorizer -> LabelEncoder -> BernoulliNB (0.9486)
TfidfVectorizer -> LabelEncoder -> AdaBoostClassifier (0.9460)
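The ranking and ensembling steps can be illustrated with a small sketch. The pipeline names and scores are copied from the slide (the metric is not named there); the helper functions, the sample predictions, and the choice of majority voting are assumptions made for illustration, since the slides do not say how P4ML combines models.

```python
from collections import Counter

# Pipelines and scores as shown on the slide; the evaluation metric is not named there.
solutions = [
    ("TfidfVectorizer -> LabelEncoder -> AdaBoostClassifier", 0.9460),
    ("HashingVectorizer -> LabelEncoder -> LogisticRegressionCV", 0.9489),
    ("CountVectorizer -> LabelEncoder -> BernoulliNB", 0.9486),
]


def top_ranked(scored, k):
    """Return the k highest-scoring (pipeline, score) pairs, best first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote per instance."""
    combined = []
    for labels in zip(*predictions_per_model):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


ranked = top_ranked(solutions, k=2)

# Hypothetical predictions from three models on four test instances.
votes = majority_vote([
    ["a", "b", "a", "b"],
    ["a", "a", "a", "b"],
    ["b", "b", "a", "a"],
])
```

Using an odd number of models with two labels avoids ties in the vote; a real ensemble would need an explicit tie-breaking rule.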
8. UI for AutoML System Interaction: TwoRavens
• Statistical summaries of variables and variable exploration
• Integration with AutoML system (P4ML)
• Specify ML problem of interest
• Explore solution results returned by AutoML system
9. HGML Task Analysis
• Top-down analysis
  • Data Use
    • Selection of variables (features) and instances
  • Model Development
    • Model selection and tuning
  • Model Interpretation
    • Result comparison
• Bottom-up analysis
  • Neuroscience: ENIGMA neuroscience consortium
  • Political science: Seminal paper on civil war onset
10. Overview of task analysis (top down)
11. Overview of task analysis (bottom up)
Neuroscience
Political Sciences
Main task results:
• Feature selection and generation
• Model type selection
• Model configuration
• Quantities of interest and metrics
12. UI and AutoML Requirements
Combined the top-down and bottom-up analyses to identify requirements for both the AutoML system and the user interface
13. Accommodating HGML requirements - AutoML system
[Diagram: Phased Performance-Based Pipeline Planner with Training data, Test data, Problem description, Evaluation metric, Requirements, Top Ranked Solutions, Predictions]
Requirements:
{
  "include_model": ["LinearSVC", "LogisticRegression", "DecisionTreeClassifier"],
  "exclude_model": [],
  "include_feature_generation": ["tfidfVectorizer"],
  "use_imputation_method": "median",
  "include_variables": [],
  "exclude_variables": [],
  "include_instances": [],
  "exclude_instances": [],
  "define_variable_weight": [{"variable": "", "weight": }, {}],
  "select_training_and_test_data": {"training_data": [], "testing_data": [], "cross_validation": "k-fold"},
  …
}
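One way to picture how such a requirements object could constrain the search is sketched below. This is an illustration under stated assumptions, not P4ML's implementation: the candidate pipelines, the `respondent_id` variable, the helper functions, and the convention that an empty include list means "no restriction" are all hypothetical. Only a filled-in subset of the keys from the template is used.

```python
import json

# Hypothetical, filled-in subset of the requirement keys shown on the slide.
REQUIREMENTS = json.loads("""
{
  "include_model": ["LinearSVC", "LogisticRegression", "DecisionTreeClassifier"],
  "exclude_model": [],
  "use_imputation_method": "median",
  "include_variables": [],
  "exclude_variables": ["respondent_id"]
}
""")

# Hypothetical candidate pipelines proposed by an AutoML planner.
CANDIDATES = [
    {"model": "LogisticRegression", "imputation": "median"},
    {"model": "AdaBoostClassifier", "imputation": "median"},
    {"model": "LinearSVC", "imputation": "mean"},
    {"model": "DecisionTreeClassifier", "imputation": "median"},
]


def satisfies(candidate, req):
    """Check one candidate against the model and imputation requirements."""
    include = req.get("include_model") or []
    exclude = req.get("exclude_model") or []
    # Assumption: an empty include list places no restriction on the model.
    if include and candidate["model"] not in include:
        return False
    if candidate["model"] in exclude:
        return False
    method = req.get("use_imputation_method")
    if method and candidate["imputation"] != method:
        return False
    return True


def usable_variables(variables, req):
    """Apply exclude_variables, then include_variables if it is non-empty."""
    excluded = set(req.get("exclude_variables") or [])
    included = req.get("include_variables") or []
    kept = [name for name in variables if name not in excluded]
    return [name for name in kept if name in included] if included else kept


allowed = [c for c in CANDIDATES if satisfies(c, REQUIREMENTS)]
variables = usable_variables(["respondent_id", "age", "income"], REQUIREMENTS)
```

Filtering candidates after they are proposed is the simplest design; a planner could instead use the same requirements to prune the search space before generating any pipeline.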
14. Accommodating HGML requirements - UI
• Extensions are needed for:
• Filtering variables and instances (subpopulations)
• Comparison and exploration of solutions
• Creation of variables from existing ones
Compare, filter, explore, transform
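The filtering and variable-creation operations listed above can be sketched on plain records. This is a minimal illustration and not TwoRavens code: the records, the variable names, and the helper functions are hypothetical.

```python
# Hypothetical records; variable names are illustrative only.
records = [
    {"country": "A", "year": 1990, "gdp": 200.0, "population": 10.0},
    {"country": "B", "year": 1995, "gdp": 90.0, "population": 30.0},
    {"country": "C", "year": 2001, "gdp": 500.0, "population": 25.0},
]


def filter_instances(rows, predicate):
    """Keep only the instances (a subpopulation) for which predicate is true."""
    return [row for row in rows if predicate(row)]


def select_variables(rows, names):
    """Keep only the named variables in each instance."""
    return [{name: row[name] for name in names} for row in rows]


def add_variable(rows, name, fn):
    """Create a new variable from existing ones without mutating the input rows."""
    return [{**row, name: fn(row)} for row in rows]


recent = filter_instances(records, lambda r: r["year"] >= 1995)
with_ratio = add_variable(recent, "gdp_per_capita", lambda r: r["gdp"] / r["population"])
view = select_variables(with_ratio, ["country", "gdp_per_capita"])
```

Each helper returns new rows, so the original data stays available for comparing solutions trained on different subsets.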
15. Conclusions and Future Work
• Proliferation of AutoML systems
• AutoML solutions may not take into consideration domain expertise
• Interaction is needed: Human-Guided Machine Learning
• Our contributions:
  • Baseline HGML UI and AutoML system integration
  • A task analysis of HGML
  • Characterizations of two significant studies in neuroscience and political science
  • Requirements for HGML based on task analysis
  • An assessment of how those requirements could be accommodated by AutoML systems
• Future work:
  • Extend our baseline system with the requirements identified in this paper
Editor's notes
We view human-guided machine learning (HGML) as a new area of research focused on helping users apply domain knowledge to guide an AutoML system in selecting machine learning algorithms and finding multi-step solutions.