Bayesian Optimization is an efficient way to optimize model parameters, especially when evaluating different parameters is time-consuming or expensive. Trading pipelines often have many tunable configuration parameters that have a large impact on model efficacy, and they are notoriously expensive to train and backtest.
In traditional optimization, a single metric like the Sharpe ratio is optimized over a potentially large set of configurations, with the goal of producing a single best configuration. In this talk we’ll explore real-world extensions: settings where multiple competing objectives must be optimized, where a portfolio of solutions may be required, where constraints on the underlying system make certain configurations unviable, and more. We’ll present work from recent ICML and NIPS workshop papers along with detailed examples.
We’ll compare the results of applying Bayesian Optimization to these problems against standard techniques like grid search, random search, and expert tuning across several datasets.
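As a point of reference for those baselines, grid search and random search can be sketched in a few lines against a toy stand-in for a backtest (`toy_sharpe` here is purely illustrative; a real pipeline would run a full backtest per configuration):

```python
import itertools
import random

# Toy stand-in for an expensive backtest objective: higher is better.
def toy_sharpe(lookback, lower_bound):
    return -((lookback - 30) ** 2 + (lower_bound - 25) ** 2) / 100.0

# Grid search: exhaustive evaluation over a coarse grid.
grid = list(itertools.product(range(5, 121, 10), range(0, 91, 10)))
best_grid = max(grid, key=lambda cfg: toy_sharpe(*cfg))

# Random search: same evaluation budget, configurations sampled uniformly.
random.seed(0)
samples = [(random.randint(5, 120), random.randint(0, 90)) for _ in range(len(grid))]
best_random = max(samples, key=lambda cfg: toy_sharpe(*cfg))
```

Bayesian optimization replaces the blind enumeration or sampling above with a surrogate model that proposes each next configuration based on all results observed so far.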
13. OPTIMIZATION FEEDBACK LOOP
[Diagram: the optimizer sends new configurations over a REST API to the trading models, which are run against data in backtest/simulation; the resulting objective metric is reported back for better results, with domain expertise informing the loop.]
14. ● Create a strategy to trade Select Sector SPDR ETFs
○ XLV, XLF, XLP, XLE, XLK, XLB, XLU, XLI
● Trade on common signals
○ Relative Strength Index (RSI)
○ Rate of Change (ROC)
● Maximize Sharpe Ratio
PROBLEM
https://blog.quantopian.com/bayesian-optimization-of-a-technical-trading-algorithm-with-ziplinesigopt-2/
15. TUNABLE PARAMETERS IN ALGO TRADING
● Relative Strength Index (RSI)
○ Lookback window: # of prices used in the RSI calculation
○ Lower_bound: value defining the trade entry condition
○ Range_width, which is added to Lower_bound
■ Lower_bound to Lower_bound + Range_width is the range of values over which our RSI signal is considered True
● Rate of Change (ROC)
○ Lookback window: # of prices used in the ROC calculation
○ Lower_bound: value defining the trade entry condition
○ Range_width, which is added to Lower_bound
■ Lower_bound to Lower_bound + Range_width is the range of values over which our ROC signal is considered True
● Signal evaluation frequency
○ Number of days between evaluations of our signals
■ Do we evaluate them every day, every week, every month, etc.?
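As a sketch of how the three per-indicator parameters combine into an entry condition, here is a simplified RSI signal in Python (`rsi()` is an illustrative implementation, not necessarily the one used in the backtest):

```python
# Simplified RSI: average gain vs average loss over the lookback window,
# mapped to a 0-100 scale.
def rsi(prices, lookback):
    gains = losses = 0.0
    for prev, cur in zip(prices[-lookback - 1:-1], prices[-lookback:]):
        change = cur - prev
        if change > 0:
            gains += change
        else:
            losses -= change
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)

def rsi_signal(prices, lookback, lower_bound, range_width):
    # The signal is True when the RSI value falls inside
    # [lower_bound, lower_bound + range_width].
    value = rsi(prices, lookback)
    return lower_bound <= value <= lower_bound + range_width
```

The ROC signal follows the same pattern with the ROC calculation swapped in.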
16. COMBINATORIAL EXPLOSION
● RSI lookback window: 115 values (5 to 120)
● RSI lower bound: 90 values (0 to 90)
● RSI range width: 20 values (10 to 30)
● ROC lookback window: 61 values (2 to 63)
● ROC lower bound: 30 values (0 to 30)
● ROC range width: 195 values (5 to 200)
● Evaluation frequency: 18 values (3 to 21)
= 1,329,623,100,000 possible configurations
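The total follows directly from multiplying the per-parameter counts listed above:

```python
import math

# Number of candidate values per tunable parameter, as on the slide.
param_counts = {
    "rsi_lookback": 115,
    "rsi_lower_bound": 90,
    "rsi_range_width": 20,
    "roc_lookback": 61,
    "roc_lower_bound": 30,
    "roc_range_width": 195,
    "eval_frequency": 18,
}

total = math.prod(param_counts.values())  # 1,329,623,100,000 configurations
```

Exhaustively backtesting a space this size is clearly infeasible, which is what motivates a sample-efficient optimizer.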
17. COMPARATIVE PERFORMANCE
● Better: 200% higher model returns than manual search
● Faster/Cheaper: 10x fewer evaluations vs standard methods
[Figure: backtest portfolio value over time (2004-2012), comparing Bayesian Optimization against Grid Search and Expert tuning; details in the linked blog post.]
19. TUNING MULTIPLE METRICS
What if we want to optimize multiple competing metrics?
● Trading Tradeoffs
○ Sharpe Ratio vs Drawdown
○ Backtest Alpha vs Uncertainty
○ Quality vs Robustness
● Complexity Tradeoffs
○ Accuracy vs Training Time
○ Accuracy vs Inference Time
20. PARETO OPTIMAL
What does it mean to optimize two metrics simultaneously?
Pareto efficiency or Pareto optimality is a state of allocation of resources from which it is impossible to reallocate so as to make any one individual or preference criterion better off without making at least one individual or preference criterion worse off.
21. PARETO OPTIMAL
What does it mean to optimize two metrics simultaneously?
The red points lie on the Pareto efficient frontier; they strictly dominate all of the grey points. You can do no better in one metric without sacrificing performance in the other. Point N is Pareto optimal compared to Point K.
22. PARETO EFFICIENT FRONTIER
The goal is to have the best set of feasible solutions to select from. After optimization, the expert picks one or more of the red points from the Pareto efficient frontier to study further or put into production.
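A minimal sketch of extracting that frontier from a set of observed (metric1, metric2) results, assuming both metrics are to be maximized:

```python
# A point is dominated if some other point is at least as good in both
# metrics; the non-dominated points form the Pareto efficient frontier.
def pareto_frontier(points):
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Example: (Sharpe ratio, -drawdown) pairs for five configurations.
points = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (2.5, 3.9), (0.5, 0.5)]
frontier = pareto_frontier(points)
```

A metric to be minimized (e.g. drawdown or training time) can be negated so that the same maximize-both convention applies.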
26. MULTI-METRIC OPT IN DEEP LEARNING
https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
27. DEEP LEARNING TRADEOFFS
● Deep learning pipelines are time-consuming and expensive to run
● Application and deployment conditions may make certain configurations less desirable
● Tuning for both accuracy and complexity metrics, like training or inference time, allows the expert to make the best decision for production
28. STOCHASTIC GRADIENT DESCENT
● Comparison of several RMSProp SGD parametrizations
● Different configurations converge differently
29. TEXT CLASSIFICATION PIPELINE
[Diagram: hyperparameter configurations and feature transformations are sent over a REST API to an ML/AI model (MXNet), trained on training text and evaluated on testing text; validation accuracy and training time are reported back for better results.]
31. SEQUENCE CLASSIFICATION PIPELINE
[Diagram: hyperparameter configurations and feature transformations are sent over a REST API to an ML/AI model (Tensorflow), trained on training sequences and evaluated on testing sequences; validation accuracy and inference time are reported back for better results.]
35. LOAN CLASSIFICATION PIPELINE
[Diagram: hyperparameter configurations and feature transformations are sent over a REST API to an ML/AI model (LightGBM), trained and evaluated on loan data; validation AUCPR and average $ lost are reported back for better results.]
36. GRID SEARCH CAN MISLEAD
● The best grid search point (w.r.t. accuracy) loses >$35 / transaction
● The best grid search point (w.r.t. loss) has only 70% accuracy
● Points on the Pareto frontier give the user more information about what is possible and more control over trade-offs
37. TAKEAWAYS
One metric may not paint the whole picture
- Think about metric trade-offs in your model pipelines
- Optimizing for the wrong thing can be very expensive
Not all optimization strategies are equal
- Pick an optimization strategy that gives the most flexibility
- Different tools enable you to tackle new problems