Data science allows us to turn a dark forest into a world of perpetual twilight by giving us the tools to better understand the data that surrounds us. Unfortunately, in this world of twilight we still need a flashlight to get a clean, crisp image of our immediate surroundings. We will talk about how to use deep domain expertise as that flashlight, shedding light on our understanding of data. Our focus will be on using text analysis as a means to examine qualitative information in a structured, quantitative way. We will draw heavily from examples in complex central bank policy and financial regulation.
Domain Expertise and Unstructured Data
1. Domain Expertise and Unstructured Data
William D. MacMillan and Evan A. Schnidman
Open Data Science Conference
BOSTON 2015
@opendatasci
2. ▶ Everyone seems to love collecting and mining unstructured data.
▶ How to make decisions based on it?
Big Data -> Consequential Decisions?
3. Tools to Find Paths
Machine Learning, Structural Methods
5. ▶ Expertise allows us to impose structure on otherwise messy results.
Imposing Structure
6. ▶ Data is not limited to numbers.
▶ Information, not data
▶ How to analyze:
-Corporate Communications?
-Central Bank Communications?
▶ Need to know things that are not easily vectorized.
▶ Dimension reduction by applying information.
Data is Everywhere
7. ▶ Good Buzzword minus Bad Buzzword == Sentiment
Traditional Sentiment Analysis
▶ Domain expertise allows for much more refined analysis
▶ Not a pure data science solution
▶ Time for experts to embrace tech and data science to utilize experts!
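The "good buzzword minus bad buzzword" idea behind traditional sentiment analysis can be sketched in a few lines. This is only an illustration: the word lists and the function name `buzzword_sentiment` are hypothetical and are not taken from the talk.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"strong", "growth", "improve", "gain"}
NEGATIVE = {"weak", "decline", "risk", "loss"}


def buzzword_sentiment(text: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    good = sum(token in POSITIVE for token in tokens)
    bad = sum(token in NEGATIVE for token in tokens)
    return good - bad


# Two positive hits ("growth", "strong") and one negative hit ("risk").
score = buzzword_sentiment("Growth was strong, but downside risk remains.")
print(score)
```

A score like this ignores context and word order entirely, which is the gap that domain expertise is meant to fill.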
8. ▶ Central Bank communications are complex and important
▶ Focus today is the Federal Reserve
Example: Central Banks
10. Failed Attempts
▶ Experts are biased and fail to be comprehensive
▶ Simple text analysis dictionaries don’t work for Fed Speak and other complex language
▶ Ex. “modest” v. “moderate”
Necessary Components
▶ Must use expertise to train the system based on whole communications
▶ Market response matters (Hawkish v. Dovish)
Experts in “Fed Speak”
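One way to picture the “modest” v. “moderate” problem: a generic dictionary may give both words the same weight, while an expert-built scale can separate them. The sketch below is illustrative only. Both lexicons and all weights are hypothetical, and the ordering used here (treating "moderate" as a stronger pace than "modest") is an assumption made for the example, not something stated in the slides.

```python
# Hypothetical generic lexicon: both words get the same mild weight.
GENERIC = {"modest": 0.5, "moderate": 0.5, "strong": 1.0}

# Hypothetical expert scale. Assumption for illustration:
# "moderate" signals a stronger pace than "modest".
EXPERT = {"modest": 0.25, "moderate": 0.5, "strong": 1.0}


def lexicon_score(text, lexicon):
    """Sum the lexicon weight of every word in the text (unknown words count 0)."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(lexicon.get(word, 0.0) for word in words)


a = "Economic activity expanded at a modest pace."
b = "Economic activity expanded at a moderate pace."

# The generic lexicon cannot tell the two sentences apart; the expert scale can.
print(lexicon_score(a, GENERIC) == lexicon_score(b, GENERIC))
print(lexicon_score(a, EXPERT) < lexicon_score(b, EXPERT))
```

Even the expert scale here is still word-level; the slides argue for training on whole communications rather than on isolated terms.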
11. Scaling Data
Enough documents can eliminate bias
+ Expertise allows scaling based on whole documents
= End result is whole communications scored in orderly fashion
12. Resulting Data:
▶ Comprehensive
▶ Unbiased
▶ Quantitative
▶ Fast
Many Possible Uses
▶ Eliminate post-hoc hedging on CB policy
▶ Forecast based on established correlations
▶ Add as a signal in multifactor model
Qual Turned Quant
Trend matters more than value!
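To make "trend matters more than value" concrete, the sketch below reads the trend as the change between consecutive scores rather than the score level itself. That reading is an assumption, and the score series, the window size, and both function names are hypothetical.

```python
def score_changes(scores):
    """Differences between consecutive scores (the trend), not the levels."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def trend_direction(scores, window=3):
    """Sign of the net change over the last `window` scores: 1, -1, or 0."""
    recent = scores[-window:]
    if len(recent) < 2:
        return 0
    change = recent[-1] - recent[0]
    return (change > 0) - (change < 0)


# Hypothetical communication scores, oldest first.
scores = [2, 3, 5, 4, 1]

print(score_changes(scores))
print(trend_direction(scores))
```

In this made-up series the latest score is still positive, yet the recent direction is downward, which is the kind of distinction the slide points at.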
13. ▶ Alpha across asset classes, not just Fixed Income
▶ Mitigates downside risk, especially with Equities.
▶ Beats Buy and Hold and Trend Following
▶ Low correlation to commonly used strategies
▶ Better performance with FOREX because both sides of the currency pair trade can be analyzed.
Backtested Data
Graph Courtesy of Mavenomics
14. ▶ Method translates across a wide variety of financially important texts
▶ Regulatory and shareholder documents for individual equities
▶ Other regulatory information (Dodd-Frank, FDA, EPA etc.)
Other Applications
18. List of Central Banks
U.S. Federal Reserve
European Central Bank
Bank of England
Bank of Canada
Bank of Japan
Reserve Bank of Australia
Bank of Korea
Reserve Bank of India
Swedish Riksbank
Reserve Bank of New Zealand
Central Bank of Mexico
Central Bank of Brazil
Central Bank of Russia
South African Reserve Bank
Bank of Israel
Central Bank of Turkey
Central Bank of Taiwan
Swiss National Bank
21. Backtesting
Independent Backtesting Results
The following results are from a fund that independently tested the Fed Playbook data in January of 2015. This fund primarily utilized a standard return-to-volatility futures trading strategy based on a common risk parity model to test the FPSI data from January 2000 to December 2014. All transaction costs are built into the testing. Their findings indicated the following:
• The FPSI is a superior trade signal to both of the most common trading strategies, “Trend Following” and “Buy and Hold.”
EQUITIES
• Using a simple portfolio of the S&P 500, both Trend Following and Buy and Hold generate returns of roughly 27% over the testing period.
• The FPSI generates risk adjusted returns of 58%, more than double the most commonly used trading strategies.
• FPSI returns were generated with almost perfect long/short balance.
• The FPSI only has a 0.3 correlation to Trend Following and just a 0.1 correlation to Buy and Hold, so the FPSI can be used in concert with these established strategies to generate even higher returns.
• The FPSI also proved to be a superb indicator of downside risk, even beating Trend Following.
• The optimal holding period for an equity portfolio traded on FPSI data is 2-3 months.
FOREX
• Examining only the U.S. Dollar and Euro based on just U.S. data indicates that the FPSI outperforms existing currency trading models.
• Trend Following tends to dominate the currency trading space because over the sample period it generated a 55% return.
• Over the same period the FPSI generates over 70% returns.
• The FPSI only has a 0.17 correlation to Trend Following, so these two strategies could be used in concert to generate even higher returns.
• The optimal holding period for a currency trade based on the FPSI data is 10-15 days.
• These returns only take into account Prattle Analytics’ data on the U.S. Federal Reserve. Since Prattle also has data on the European Central Bank (along with more than a dozen other central banks), this information could be used to better understand the other side of the currency pair trade and generate even greater returns.
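The low correlations reported above (0.3, 0.1, and 0.17) matter because of a standard portfolio result: when two strategies are weakly correlated, combining them lowers overall volatility, which improves return per unit of risk. The sketch below applies the two-asset volatility formula to those correlation values. The 10% volatilities and the equal weights are hypothetical and are not taken from the backtest.

```python
import math


def combined_volatility(vol_a, vol_b, correlation, weight_a=0.5):
    """Volatility of a two-strategy portfolio (standard two-asset formula)."""
    weight_b = 1.0 - weight_a
    variance = (
        (weight_a * vol_a) ** 2
        + (weight_b * vol_b) ** 2
        + 2 * weight_a * weight_b * correlation * vol_a * vol_b
    )
    return math.sqrt(variance)


# Hypothetical: both strategies have 10% volatility and equal weights.
# A correlation of 1.0 is included as the no-diversification baseline.
for rho in (0.1, 0.17, 0.3, 1.0):
    print(rho, round(combined_volatility(0.10, 0.10, rho), 4))
```

With a correlation of 1.0 the combined volatility stays at the single-strategy level; every lower value reduces it, so the same average return is earned with less risk.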
22. Prattle Analytics: Tradable Data From Market Chatter
Using Domain Expertise
To Improve Text Analysis
--Evan A. Schnidman
eas@prattle-analytics.com