Bayesian Optimization is an efficient way to optimize model parameters, especially when evaluating different parameters is time-consuming or expensive. Trading pipelines often have many tunable configuration parameters that have a large impact on model efficacy, and they are notoriously expensive to train and backtest.
In traditional optimization, a single metric like the Sharpe ratio is optimized over a potentially large set of configurations, with the goal of producing a single best configuration. In this talk we’ll explore real-world extensions: settings where multiple competing objectives must be optimized, where a portfolio of solutions may be required, where constraints on the underlying system make certain configurations unviable, and more. We’ll present work from recent ICML and NIPS workshop papers along with detailed examples.
We’ll compare the results of applying Bayesian Optimization to these problems against standard techniques like grid search, random search, and expert tuning across several datasets.
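As a point of reference for those baselines, grid search and random search can be sketched in a few lines against a toy stand-in for a backtest (`toy_sharpe` here is purely illustrative; a real pipeline would run a full backtest per configuration):

```python
import itertools
import random

# Toy stand-in for an expensive backtest objective: higher is better.
def toy_sharpe(lookback, lower_bound):
    return -((lookback - 30) ** 2 + (lower_bound - 25) ** 2) / 100.0

# Grid search: exhaustive evaluation over a coarse grid.
grid = list(itertools.product(range(5, 121, 10), range(0, 91, 10)))
best_grid = max(grid, key=lambda cfg: toy_sharpe(*cfg))

# Random search: same evaluation budget, configurations sampled uniformly.
random.seed(0)
samples = [(random.randint(5, 120), random.randint(0, 90)) for _ in range(len(grid))]
best_random = max(samples, key=lambda cfg: toy_sharpe(*cfg))
```

Bayesian optimization replaces the blind enumeration or sampling above with a surrogate model that proposes each next configuration based on all results observed so far.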
13. OPTIMIZATION FEEDBACK LOOP
[Diagram: the optimizer sends new configurations over a REST API to the trading models, which are run against data in backtest/simulation; the resulting objective metric is reported back for better results, with domain expertise informing the loop.]
14. ● Create a strategy to trade Select Sector SPDR ETFs
○ XLV, XLF, XLP, XLE, XLK, XLB, XLU, XLI
● Trade on common signals
○ Relative Strength Index (RSI)
○ Rate of Change (ROC)
● Maximize Sharpe Ratio
PROBLEM
https://blog.quantopian.com/bayesian-optimization-of-a-technical-trading-algorithm-with-ziplinesigopt-2/
15. TUNABLE PARAMETERS IN ALGO TRADING
● Relative Strength Index (RSI)
○ Lookback window: # of prices used in the RSI calculation
○ Lower_bound: value defining the trade entry condition
○ Range_width, which is added to Lower_bound
■ Lower_bound to Lower_bound + Range_width is the range of values over which our RSI signal is considered True
● Rate of Change (ROC)
○ Lookback window: # of prices used in the ROC calculation
○ Lower_bound: value defining the trade entry condition
○ Range_width, which is added to Lower_bound
■ Lower_bound to Lower_bound + Range_width is the range of values over which our ROC signal is considered True
● Signal evaluation frequency
○ Number of days between evaluations of our signals
■ Do we evaluate them every day, every week, every month, etc.?
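As a sketch of how the three per-indicator parameters combine into an entry condition, here is a simplified RSI signal in Python (`rsi()` is an illustrative implementation, not necessarily the one used in the backtest):

```python
# Simplified RSI: average gain vs average loss over the lookback window,
# mapped to a 0-100 scale.
def rsi(prices, lookback):
    gains = losses = 0.0
    for prev, cur in zip(prices[-lookback - 1:-1], prices[-lookback:]):
        change = cur - prev
        if change > 0:
            gains += change
        else:
            losses -= change
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)

def rsi_signal(prices, lookback, lower_bound, range_width):
    # The signal is True when the RSI value falls inside
    # [lower_bound, lower_bound + range_width].
    value = rsi(prices, lookback)
    return lower_bound <= value <= lower_bound + range_width
```

The ROC signal follows the same pattern with the ROC calculation swapped in.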
16. COMBINATORIAL EXPLOSION
● RSI lookback window: 115 values (5 to 120)
● RSI lower bound: 90 values (0 to 90)
● RSI range width: 20 values (10 to 30)
● ROC lookback window: 61 values (2 to 63)
● ROC lower bound: 30 values (0 to 30)
● ROC range width: 195 values (5 to 200)
● Evaluation frequency: 18 values (3 to 21)
= 1,329,623,100,000 possible configurations
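The total follows directly from multiplying the per-parameter counts listed above:

```python
import math

# Number of candidate values per tunable parameter, as on the slide.
param_counts = {
    "rsi_lookback": 115,
    "rsi_lower_bound": 90,
    "rsi_range_width": 20,
    "roc_lookback": 61,
    "roc_lower_bound": 30,
    "roc_range_width": 195,
    "eval_frequency": 18,
}

total = math.prod(param_counts.values())  # 1,329,623,100,000 configurations
```

Exhaustively backtesting a space this size is clearly infeasible, which is what motivates a sample-efficient optimizer.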
17. COMPARATIVE PERFORMANCE
● Better: 200% higher model returns than manual search
● Faster/Cheaper: 10x fewer evaluations vs standard methods
[Figure: backtest portfolio value over time (2004-2012), comparing Bayesian Optimization against Grid Search and Expert tuning; details in the linked blog post.]
19. TUNING MULTIPLE METRICS
What if we want to optimize multiple competing metrics?
● Trading Tradeoffs
○ Sharpe Ratio vs Drawdown
○ Backtest Alpha vs Uncertainty
○ Quality vs Robustness
● Complexity Tradeoffs
○ Accuracy vs Training Time
○ Accuracy vs Inference Time
20. PARETO OPTIMAL
What does it mean to optimize two metrics simultaneously?
Pareto efficiency or Pareto optimality is a state of allocation of resources from which it is impossible to reallocate so as to make any one individual or preference criterion better off without making at least one individual or preference criterion worse off.
21. PARETO OPTIMAL
What does it mean to optimize two metrics simultaneously?
The red points lie on the Pareto efficient frontier; they strictly dominate all of the grey points. You can do no better in one metric without sacrificing performance in the other. Point N is Pareto optimal compared to Point K.
22. PARETO EFFICIENT FRONTIER
The goal is to have the best set of feasible solutions to select from. After optimization, the expert picks one or more of the red points from the Pareto efficient frontier to study further or put into production.
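A minimal sketch of extracting that frontier from a set of observed (metric1, metric2) results, assuming both metrics are to be maximized:

```python
# A point is dominated if some other point is at least as good in both
# metrics; the non-dominated points form the Pareto efficient frontier.
def pareto_frontier(points):
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Example: (Sharpe ratio, -drawdown) pairs for five configurations.
points = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (2.5, 3.9), (0.5, 0.5)]
frontier = pareto_frontier(points)
```

A metric to be minimized (e.g. drawdown or training time) can be negated so that the same maximize-both convention applies.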
26. MULTI-METRIC OPT IN DEEP LEARNING
https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
27. DEEP LEARNING TRADEOFFS
● Deep learning pipelines are time-consuming and expensive to run
● Application and deployment conditions may make certain configurations less desirable
● Tuning for both accuracy and complexity metrics, like training or inference time, allows the expert to make the best decision for production
28. STOCHASTIC GRADIENT DESCENT
● Comparison of several RMSProp SGD parametrizations
● Different configurations converge differently
29. TEXT CLASSIFICATION PIPELINE
[Diagram: hyperparameter configurations and feature transformations are sent over a REST API to an ML/AI model (MXNet), trained on training text and evaluated on testing text; validation accuracy and training time are reported back for better results.]
31. SEQUENCE CLASSIFICATION PIPELINE
[Diagram: hyperparameter configurations and feature transformations are sent over a REST API to an ML/AI model (Tensorflow), trained on training sequences and evaluated on testing sequences; validation accuracy and inference time are reported back for better results.]
35. LOAN CLASSIFICATION PIPELINE
[Diagram: hyperparameter configurations and feature transformations are sent over a REST API to an ML/AI model (LightGBM), trained and evaluated on loan data; validation AUCPR and average $ lost are reported back for better results.]
36. GRID SEARCH CAN MISLEAD
● The best grid search point (w.r.t. accuracy) loses >$35 / transaction
● The best grid search point (w.r.t. loss) has only 70% accuracy
● Points on the Pareto frontier give the user more information about what is possible and more control over trade-offs
37. TAKEAWAYS
One metric may not paint the whole picture
- Think about metric trade-offs in your model pipelines
- Optimizing for the wrong thing can be very expensive
Not all optimization strategies are equal
- Pick an optimization strategy that gives the most flexibility
- Different tools enable you to tackle new problems