
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019

A comparison of methods, algorithms, and libraries, with tool recommendations and a ton of useful links.


  1. 1. Hyperparameter Optimization (landscape) in Python kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon
  2. 2. ● Intro ● Methods ● Libraries + Evaluation Criteria ○ Scikit-Optimize ○ Optuna + Hyperopt ○ HpBandster ● Results and Recommendations Agenda
  3. 3. Intro [diagram]: data -> Model -> score, with hyperparameters (learning rate, depth, feature fraction) feeding into the Model
  4. 4. Intro [diagram]: hyperparameters appear across the whole pipeline, from Data Cleaning (imputation method, scaling method) through Feature Engineering (bin_nr, groupby columns, lagging) and the Model (learning rate, depth, feature fraction) to Post-processing (thresholds); everything is wrapped as objective(params, data=data) -> score
  5. 5. ● Grid search ● Random search ● Guided search ● Grad student search (still best) Methods
  6. 6. ● Better configuration proposal ○ Objective function is estimated with surrogate models ○ Evolutionary methods ○ ... ● Faster objective function calculation ○ Bandit methods ○ Pruning ○ Estimating the score from the learning curve of a NN ○ ... Methods
  7. 7. Methods: surrogate models. The real objective(params) -> score is expensive; the surrogate(params) -> est_score is cheap. Explore cheaply with the surrogate (surrogate(params_1) = est_score_1, ..., surrogate(params_1000) = est_score_1000), then try the most promising configuration with the expensive objective (objective(params_2) = score_2)
  8. 8. Methods: surrogate models. Common surrogates: TPE, GP (Gaussian process), RF (random forest); the next configuration to try on the expensive objective is picked with an acquisition function: EI (expected improvement), PI (probability of improvement) or LCB (lower confidence bound)
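To make the surrogate loop concrete, here is a minimal sketch of surrogate-based search (not code from the slides): it assumes a hypothetical train_evaluate(X, y, params) helper and uses scikit-learn's Gaussian process as the surrogate with an LCB-style acquisition.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(params):
    # expensive: trains and scores a model; train_evaluate, X, y are hypothetical
    return train_evaluate(X, y, params)

rng = np.random.RandomState(42)
tried = list(rng.uniform(0.01, 0.5, size=(5, 1)))       # random warm-up configurations
scores = [objective(p) for p in tried]

surrogate = GaussianProcessRegressor()
for _ in range(20):
    surrogate.fit(np.array(tried), np.array(scores))     # cheap to fit
    candidates = rng.uniform(0.01, 0.5, size=(1000, 1))  # explore cheaply with the surrogate
    mean, std = surrogate.predict(candidates, return_std=True)
    best = candidates[np.argmin(mean - 1.96 * std)]      # LCB acquisition (lower is better)
    tried.append(best)
    scores.append(objective(best))                       # expensive call only on the winner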
  9. 9. Methods: bandit methods. The full run objective(params, data=data) -> score is expensive, so estimate the score with lower-fidelity runs: objective(params, budget=low) -> score approximates objective(params, budget=full) -> score
  10. 10. Methods: bandit methods ● Budget options (a sketch follows below): ○ Dataset size ○ Number of epochs ○ Time ○ Number of features ○ Number of CV folds
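A minimal sketch of a budget-aware objective, using dataset size as the budget; train_evaluate, X and y are hypothetical helpers:

def objective(params, budget=1.0):
    # budget in (0, 1]: fraction of the dataset used for this lower-fidelity run
    n_rows = int(len(X) * budget)
    return -1.0 * train_evaluate(X[:n_rows], y[:n_rows], params)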
  11. 11. Methods: bandit methods ● Successive halving: set resource, set budget, set number of runs link ● Hyperband: random resource, grid search over the number of runs, within a set budget link
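A minimal successive-halving sketch under the same assumptions (illustrative only, not the HpBandSter implementation): start many configurations on a small budget, keep the best 1/eta of them, multiply the budget by eta, and repeat. Hyperband essentially runs several such brackets with different n_configs/min_budget trade-offs within a fixed total budget.

def successive_halving(sample_params, objective, n_configs=27, min_budget=0.04, eta=3):
    # sample_params() draws a random configuration; objective(params, budget) is the
    # budget-aware objective sketched above (both hypothetical)
    configs = [sample_params() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        scores = [objective(c, budget=budget) for c in configs]          # cheap, low fidelity
        ranked = sorted(zip(scores, configs), key=lambda pair: pair[0])  # lower score is better
        configs = [c for _, c in ranked[:max(1, len(configs) // eta)]]   # keep the top 1/eta
        budget = min(1.0, budget * eta)                                  # survivors get more budget
    return configs[0]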
  12. 12. Methods: pruning ● Prune/abort runs that show little hope before they finish ● Isn’t it called early stopping?
  13. 13. ● Scikit-Optimize ● Optuna ● Hyperopt (sort of) ● HpBandSter ● Ray.tune (future work) ● … and many more Libraries
  14. 14. ● Algorithm ● API / ease of use ● Documentation ● Speed / Parallelization ● Visualization suite ● Experimental results Evaluation Criteria
  15. 15. Scikit-Optimize
  16. 16. Algorithm ● Objective function estimated with surrogate models ○ Random Forests ○ Gradient Boosted Trees ○ Gaussian process ● Next run params selected via acquisition function ○ Expected Improvement ○ Probability of Improvement ○ Lower Confidence Bound ● No objective func calculation speedup mechanism
  17. 17. API search space + objective + {fun}_minimize
  18. 18. API: search space ● Basic options: ○ skopt.space.Real ○ skopt.space.Integer ○ skopt.space.Categorical ● No support for nested search spaces
  19. 19. API: search space SPACE = [skopt.space.Real(0.01, 0.5, name='learning_rate', prior='log-uniform'), skopt.space.Integer(1, 30, name='max_depth'), skopt.space.Integer(2, 100, name='num_leaves'), skopt.space.Integer(10, 1000, name='min_data_in_leaf'), skopt.space.Real(0.1, 1.0, name='feature_fraction', prior='uniform'), skopt.space.Real(0.1, 1.0, name='subsample', prior='uniform'), ]
  20. 20. API: objective ● Define a function to minimize! ● Decorate if you want to keep parameter names
  21. 21. API: objective
def objective(**params):
    return -1.0 * train_evaluate(X, y, **params)

# decorated version: keeps the parameter names defined in SPACE
@skopt.utils.use_named_args(SPACE)
def objective(**params):
    return -1.0 * train_evaluate(X, y, **params)
  22. 22. API: {fun}_minimize ● A few optimizers to choose from ○ skopt.forest_minimize ○ skopt.gbrt_minimize ○ skopt.gp_minimize ● Accepts callbacks
  23. 23. API: {fun}_minimize results = skopt.forest_minimize(objective, SPACE, n_calls=100, n_random_starts=10, base_estimator='ET', acq_func='LCB', xi=0.02, kappa=1.96)
  24. 24. API: {fun}_minimize def monitor(res): neptune.send_metric('run_score', res.func_vals[-1]) results = skopt.forest_minimize(..., callback=[monitor])
  25. 25. API: {fun}_minimize ● There are (hyper)hyperparameters ● Acquisition function: ○ 'EI' (expected improvement), 'PI' (probability of improvement) ○ 'LCB' (lower confidence bound): predicted mean of the objective minus kappa times the predicted standard deviation of the GP ● Exploration vs exploitation ○ xi for 'EI' and 'PI': low xi exploitation, high xi exploration ○ kappa for 'LCB': low kappa exploitation, high kappa exploration
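For example, with 'LCB' the same search can be nudged toward exploration or exploitation just by changing kappa (the values below are illustrative):

explorative = skopt.forest_minimize(objective, SPACE, n_calls=100,
                                    acq_func='LCB', kappa=10.0)   # weigh uncertainty more
exploitative = skopt.forest_minimize(objective, SPACE, n_calls=100,
                                     acq_func='LCB', kappa=0.1)   # trust the mean estimate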
  26. 26. Documentation ● Amazing! ● Functions have docstrings. ● A lot of examples. link
  27. 27. Visualizations ● Options: ○ skopt.plots.plot_convergence - score improvement ○ skopt.plots.plot_evaluations - space search evolution ○ skopt.plots.plot_objective - sensitivity ● Beautiful and very useful.
  28. 28. Visualizations: plot_convergence skopt.plots.plot_convergence(results)
  29. 29. Visualizations: plot_convergence skopt.plots.plot_convergence(results_list)
  30. 30. Visualizations: plot_evaluations skopt.plots.plot_evaluations(results)
  31. 31. Visualizations: plot_objective skopt.plots.plot_objective(results)
  32. 32. Speed & Parallelization ● Runs sequentially and you cannot distribute it across many machines ● You can parallelize base estimator at every run with n_jobs ● If you have just 1 machine it is fast
  33. 33. Experimental Results
  34. 34. Conclusions: good ● Easy to use API and great documentation ● A lot of optimizers and tweaking options ● Awesome visualizations ● Solid gains over the random search ● Fast if you are running sequentially on 1 machine ● Active project support
  35. 35. Conclusions: bad ● Search space doesn’t support nesting ● No support for distributed computing
  36. 36. Optuna
  37. 37. Algorithm ● Objective function estimated with Tree of Parzen Estimators ● Next run params selected via Expected Improvement ● Objective func calculation speedup via run pruning and successive halving (optionally)
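A minimal sketch of turning on the optional successive halving via a pruner (the objective still has to report intermediate values, e.g. with the LightGBM callback shown on slide 44):

import optuna

study = optuna.create_study(pruner=optuna.pruners.SuccessiveHalvingPruner())
study.optimize(objective, n_trials=100)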
  38. 38. API search space & objective + {fun}_minimize
  39. 39. API: search space & objective
def objective(trial):
    params = OrderedDict([
        ('learning_rate', trial.suggest_loguniform('learning_rate', 0.01, 0.5)),
        ('max_depth', trial.suggest_int('max_depth', 1, 30)),
        ('num_leaves', trial.suggest_int('num_leaves', 2, 100)),
        ('min_data_in_leaf', trial.suggest_int('min_data_in_leaf', 10, 1000)),
        ('feature_fraction', trial.suggest_uniform('feature_fraction', 0.1, 1.0)),
        ('subsample', trial.suggest_uniform('subsample', 0.1, 1.0))])
    score = -1.0 * train_evaluate(X, y, params)
    return score
  40. 40. API: search space & objective ● Basic options: ○ suggest_categorical ○ suggest_int , suggest_discrete_uniform ○ suggest_uniform , suggest_loguniform ● Nested search spaces ● Defined in-run (pytorch-like)
  41. 41. API: search space & objective
def objective(trial):
    classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
    if classifier_name == 'SVC':
        svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
        classifier_obj = sklearn.svm.SVC(C=svc_c)
    else:
        rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
        classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth)
    …
  42. 42. API: {fun}_minimize ● Allows pruning ● Handles exceptions in objective ● Handles callbacks
  43. 43. API: {fun}_minimize
study = optuna.create_study()
study.optimize(objective, n_trials=100)
results = study.trials_dataframe()
  44. 44. API: {fun}_minimize: pruning
from optuna.integration import LightGBMPruningCallback

def objective(trial):
    params = OrderedDict([
        ('max_depth', trial.suggest_int('max_depth', 1, 30)),
        ('num_leaves', trial.suggest_int('num_leaves', 2, 100))])
    pruning_callback = LightGBMPruningCallback(trial, 'auc')
    score = -1.0 * train_evaluate_with_pruning(X, y, params, pruning_callback)
    return score

def train_evaluate_with_pruning(X, y, params, callback):
    ...
    model = lgb.train(params, train_data, ..., callbacks=[callback])
    return model.best_score['valid']['auc']
  45. 45. API: {fun}_minimize: callbacks
def report_neptune(study, trial):
    neptune.send_metric('value', trial.value)
    neptune.send_metric('best_value', study.best_value)

study = optuna.create_study()
study.optimize(objective, n_trials=100, callbacks=[report_neptune])
Available in bleeding edge version from source*
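The exception handling mentioned on slide 42 can be sketched with the catch argument of study.optimize: trials whose objective raises one of the listed exceptions are marked as failed and the search continues (the exception type here is illustrative).

study = optuna.create_study()
study.optimize(objective, n_trials=100, catch=(ValueError,))  # a failing trial does not stop the study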
  46. 46. Documentation ● Solid read-the-docs project, ● Docstrings, docstrings everywhere, ● A lot of examples. link
  47. 47. Visualizations ● Options: ○ optuna.visualization.plot_intermediate_values ○ optuna.visualization.plot_optimization_history ● Basic monitoring ● Available in bleeding edge version from source*
  48. 48. Speed & Parallelization ● Can be easily distributed across one or many machines ● Has pruning to speed up unpromising runs
  49. 49. Speed & Parallelization: one study.optimize(objective, n_trials=100, n_jobs=5)
  50. 50. Speed & Parallelization: many
optuna_search.py:
…
study = optuna.Study(study_name='distributed-search', storage='sqlite:///example.db')
study.optimize(objective, n_trials=100)
...

terminal 1:
$ optuna create-study --study-name "distributed-search" --storage "sqlite:///example.db"
terminal 2:
$ python optuna_search.py
terminal 3:
$ python optuna_search.py
  51. 51. Experimental Results
  52. 52. Conclusions: good ● Easy to use API ● Great documentation ● Can be easily distributed over a cluster of machines ● Has pruning ● Has callbacks ● Search space supports nesting ● Active project support
  53. 53. Conclusions: bad ● Only TPE optimizer available ● Only some visualizations ● *No gains over the random search (with 100 iterations budget)
  54. 54. Optuna is hyperopt with: ● better api ● waaaay better documentation ● pruning (and halving available) ● exception handling ● simpler parallelization ● active project support
  55. 55. Should I swap hyperopt with optuna?
  56. 56. HpBandSter https://www.automl.org/
  57. 57. ● HyperBand on Steroids ● It has state-of-the-art algorithms ○ Hyperband link ○ BOHB (Bayesian Optimization + Hyperband) link ● Distributed-computing-first API HpBandSter
  58. 58. HpBandSter
  59. 59. Algorithm ● Objective function estimated with TPE ● Next run params selected via Expected Improvement ● Objective func calculation speedup via bandit methods with random budgets (Hyperband)
  60. 60. API server + worker + optimizer
  61. 61. API server + worker + optimizer
  62. 62. API: server ● Workers communicate with server to: ○ get next parameter configuration ○ send results ● You have to define it even for the most basic setups/problems (weird)
  63. 63. API: server import hpbandster.core.nameserver as hpns NS = hpns.NameServer(run_id=RUN_ID, host=HOST, port=PORT, working_directory=WORKING_DIRECTORY) ns_host, ns_port = NS.start()
  64. 64. API: worker: objective
from hpbandster.core.worker import Worker

class TrainEvalWorker(Worker):
    ...
    def compute(self, config, budget, working_directory, *args, **kwargs):
        loss = -1.0 * train_evaluate(self.X, self.y, budget, config)
        return ({'loss': loss,
                 'info': {'auxiliary_stuff': 'worked'}})
  65. 65. API: worker: search space ● Basic options: ○ CSH.{Categorical/Ordinal}Hyperparameter ○ CSH.{Uniform/Normal}IntegerHyperparameter ○ CSH.{Uniform/Normal}FloatHyperparameter ● Nested search spaces with ifs
  66. 66. API: worker: search space
class TrainEvalWorker(Worker):
    ...
    @staticmethod
    def get_configspace():
        cs = CS.ConfigurationSpace()
        learning_rate = CSH.UniformFloatHyperparameter('learning_rate',
                                                       lower=0.01, upper=0.5,
                                                       default_value=0.01, log=True)
        subsample = CSH.UniformFloatHyperparameter('subsample',
                                                   lower=0.1, upper=1.0,
                                                   default_value=0.5, log=False)
        cs.add_hyperparameters([learning_rate, subsample])
        return cs
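A conditional ("nested") space can also be expressed directly in ConfigSpace, mirroring the Optuna example from slide 41; a minimal sketch with illustrative parameter names:

import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

cs = CS.ConfigurationSpace()
classifier = CSH.CategoricalHyperparameter('classifier', ['SVC', 'RandomForest'])
svc_c = CSH.UniformFloatHyperparameter('svc_c', lower=1e-10, upper=1e10, log=True)
cs.add_hyperparameters([classifier, svc_c])
cs.add_condition(CS.EqualsCondition(svc_c, classifier, 'SVC'))  # svc_c is only active when classifier == 'SVC'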
  67. 67. API: worker: connecting to server worker = TrainEvalWorker(run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port) worker.run(background=True)
  68. 68. API: optimizer from hpbandster.optimizers import BOHB optim = BOHB(configspace = worker.get_configspace(), run_id = RUN_ID, nameserver=ns_host, nameserver_port=ns_port, eta=3, min_budget=0.1, max_budget=1, num_samples=64, top_n_percent=15, min_bandwidth=1e-3, bandwidth_factor=3) study = optim.run(n_iterations=100)
  69. 69. API: optimizer: callbacks
class NeptuneLogger:
    def new_config(self, *args, **kwargs):
        pass

    def __call__(self, job):
        neptune.send_metric('run_score', job.result['loss'])
        neptune.send_text('run_parameters', str(job.kwargs['config']))

optim = BOHB(configspace=worker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port,
             result_logger=NeptuneLogger())
  70. 70. Documentation ● Decent Read-the-docs project, ● Missing docstrings in a lot of places, ● A bunch of examples. link
  71. 71. Visualizations ● Options: ○ hpvis.losses_over_time - score improvement ○ hpvis.concurrent_runs_over_time - speed/parallelization ○ hpvis.finished_runs_over_time - budget adjustment ○ hpvis.correlation_across_budgets - budget adjustment ○ hpvis.performance_histogram_model_vs_random - sanity check ● Very lib/debug-specific but can be useful for tweaking
  72. 72. Visualizations: losses_over_time
  73. 73. Visualizations: losses_over_time all_runs = results.get_all_runs() hpvis.losses_over_time(all_runs);
  74. 74. Visualizations: correlation_across_budgets
  75. 75. Visualizations: correlation_across_budgets hpvis.correlation_across_budgets(results);
  76. 76. Visualizations: performance_histogram_model_vs_random
  77. 77. Visualizations: performance_histogram_model_vs_random all_runs = results.get_all_runs() id2conf = results.get_id2config_mapping() hpvis.performance_histogram_model_vs_random(all_runs, id2conf);
  78. 78. Speed & Parallelization ● Can be easily distributed across threads/processes/machines
  79. 79. Speed & Parallelization: threads
workers = []
for i in range(N_WORKERS):
    w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5,
                        nameserver=ns_host, nameserver_port=ns_port)
    w.run(background=True)
    workers.append(w)

optim = BOHB(configspace=TrainEvalWorker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host, nameserver_port=ns_port)
study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
  80. 80. Speed & Parallelization: processes
workers = []
for i in range(N_WORKERS):
    w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5,
                        nameserver=ns_host, nameserver_port=ns_port)
    w.run(background=False)
    exit(0)

optim = BOHB(configspace=TrainEvalWorker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host, nameserver_port=ns_port)
study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
  81. 81. Speed & Parallelization: machines Follow the example from the docs … but it is not obvious
  82. 82. Experimental Results
  83. 83. Conclusions: good ● State-of-the-art algorithm ● Can be distributed over a cluster of machines ● Useful visualizations ● Search space supports nesting
  84. 84. Conclusions: bad ● Project is not very active ● Complicated API ● Missing docstrings
  85. 85. Which one should I choose?
  86. 86. Results (mostly subjective)
                        | Scikit-Optimize           | Optuna | HpBandSter        | Hyperopt
API/ease of use         | Great                     | Great  | Difficult         | Good
Documentation           | Great                     | Great  | Ok(ish)           | Bad
Speed/Parallelization   | Fast if sequential / None | Great  | Good              | Ok
Visualizations          | Amazing                   | Basic  | Very lib specific | Some
*Experimental results: 0.8566 (100), 0.8419 (100), 0.8597 (10000), 0.8629 (100), 0.8420 (100)
  87. 87. Dream library = Scikit-Optimize visualizations + Optuna API, docs, pruning, callbacks and parallelization + HpBandSter optimizers
  88. 88. Conversions between results objects are in neptune-contrib import neptunecontrib.hpo.utils as hpo_utils results = hpo_utils.optuna2skopt(study) Dream library
  89. 89. ● If you don’t have a lot of resources - use Scikit-Optimize ● If you want to get SOTA and don’t care about API/Docs - use HpBandSter ● If you want good docs/api/parallelization - use Optuna Recommendations
  90. 90. ● Slides link on Twitter @NeptuneML or Linkedin @neptune.ml ● Blog posts on Medium @jakub.czakon ● Experiments in Neptune tags skopt/optuna/hpbandster ○ Code ○ Best hyperparams and Hyper hyper params ○ learning curves ○ diagnostic charts ○ resource consumption charts ○ pickled results objects Materials
  91. 91. Data science work sharing hub. Track | Organize | Collaborate kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon
