Machine learning in finance using python
1. MACHINE LEARNING IN FINANCE USING PYTHON
ERIC THAM
Director, Quant Strategies
Presentation Slides on
http://www.slideshare.net/erictham/machine-learning-in-finance-using-python
3. MACHINE LEARNING IN FINANCE
Questions:
How do you recognise patterns in finance …?
What data? What do you use it for?
These differ from typical uses of machine learning such as facial recognition or NLP.
4. MACHINE LEARNING IN FINANCE
i. Sentiment analysis (behavioural finance)
ii. Credit analytics
iii. Financial forecasting
iv. Portfolio allocation
5. MACHINE LEARNING PYTHON LIBRARIES
Libraries:
i. scikit-learn
ii. Theano
iii. statsmodels
Sentiment analysis generally uses machine learning.
6. GENERAL FORECASTING (MACHINE LEARNING)
Three steps to any forecasting (or machine learning) task:
1. Preprocess and transform data:
- on both output and input: this is key; it is both an art and a science;
- in finance, these could be economic variables, sentiment data or price data
2. Model :
- CART, neural network, logistic regression etc.
- time period
3. Assess and backtest
- statistical output;
- in sample and out of sample
Go back to 1 if necessary.
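The three steps can be sketched in plain Python on a short list of invented closing prices. The "previous sign" rule is a deliberately naive stand-in for a real model such as CART or logistic regression. The point of the sketch is the chronological split: the out-of-sample period lies strictly after the in-sample one, as it would in a backtest.

```python
# Sketch of preprocess -> model -> assess on invented prices (illustration only).

def simple_returns(prices):
    """Step 1 (preprocess): convert prices to one-period simple returns."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def chronological_split(series, train_fraction=0.7):
    """Split without shuffling: the out-of-sample part comes after the in-sample part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def previous_sign_model(history):
    """Step 2 (model): naive rule predicting each return from the previous one."""
    return history[:-1]


def hit_rate(predictions, actuals):
    """Step 3 (assess): share of periods where the predicted direction was right."""
    hits = sum(1 for p, a in zip(predictions, actuals) if (p > 0) == (a > 0))
    return hits / len(actuals)


prices = [100, 101, 102, 101, 103, 104, 103, 105, 106, 105, 107]
rets = simple_returns(prices)
train, test = chronological_split(rets)

in_sample = hit_rate(previous_sign_model(train), train[1:])
out_of_sample = hit_rate(previous_sign_model(test), test[1:])
print(f"in-sample hit rate {in_sample:.2f}, out-of-sample {out_of_sample:.2f}")
```

A large gap between the in-sample and out-of-sample figures is the usual cue to go back to step 1.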
7. BUILDING A FINANCIAL FORECASTING MODEL IN PYTHON
1. Sourcing data: retrieves data from sources, e.g. Quandl, pandas.io, Yahoo Finance and proprietary databases (see the datasource.py file)
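One way to load exported price data into Python can be sketched with the standard library alone. The CSV layout, the "Date"/"Close" column names and the values are hypothetical. This is not the contents of datasource.py. With pandas, `pandas.read_csv(..., parse_dates=["Date"])` would do a similar job in one call.

```python
import csv
import io
from datetime import date

# Hypothetical CSV export; column names and values are illustrative assumptions.
RAW = """Date,Open,High,Low,Close
2016-01-05,10.2,10.6,10.1,10.5
2016-01-04,10.0,10.3,9.9,10.2
2016-01-06,10.5,10.7,10.3,10.4
"""


def load_closes(text, date_col="Date", close_col="Close"):
    """Parse a CSV of daily bars into a date-sorted list of (date, close) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [(date.fromisoformat(r[date_col]), float(r[close_col])) for r in reader]
    rows.sort(key=lambda pair: pair[0])  # sources may return rows newest-first
    return rows


closes = load_closes(RAW)
for d, c in closes:
    print(d, c)
```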
8. BUILDING A FINANCIAL FORECASTING MODEL IN PYTHON
1. Technical transformation of the data (dataTechnical.py)
- technical indicators such as RSI, MACD and KDJ
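Two of these indicators can be sketched in plain Python using common textbook conventions. The EMA uses a smoothing factor of 2/(span+1), RSI uses Wilder smoothing, and the default periods are 12/26/9 and 14. These conventions are assumptions: this is not the code in dataTechnical.py, and KDJ is not shown.

```python
def ema(values, span):
    """Exponential moving average, smoothing factor 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """MACD line = EMA(fast) - EMA(slow); signal = EMA of the MACD line."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


def rsi(closes, period=14):
    """Wilder's RSI: one value per close from index `period` onward."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    if len(changes) < period:
        raise ValueError("need at least period + 1 closes")
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    values = []
    for i in range(period, len(changes) + 1):
        if i > period:
            # Wilder smoothing of the running average gain and loss.
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            values.append(100.0)  # convention: no losses gives RSI 100 (also for a flat series)
        else:
            values.append(100.0 - 100.0 / (1.0 + avg_gain / avg_loss))
    return values


closes = [float(x) for x in range(1, 41)]
print(rsi(closes)[-1])  # 100.0 for a series that only rises
```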
10. BUILDING A FINANCIAL FORECASTING MODEL IN PYTHON
Training: applies different model parameters (possibly thousands of combinations) to find the best results.
See dataTrain.py.
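Trying many parameter combinations can be sketched as an exhaustive grid search. The moving-average crossover rule, its total-return score and the window grids are hypothetical stand-ins chosen for illustration. They do not describe what dataTrain.py does.

```python
from itertools import product


def moving_average(values, window):
    """Trailing simple moving average; None until `window` values are available."""
    return [
        sum(values[i + 1 - window:i + 1]) / window if i + 1 >= window else None
        for i in range(len(values))
    ]


def crossover_return(closes, fast, slow):
    """Total return of a hypothetical long-only rule: hold the asset over the
    next period whenever the fast moving average is above the slow one."""
    fast_ma = moving_average(closes, fast)
    slow_ma = moving_average(closes, slow)
    growth = 1.0
    for t in range(len(closes) - 1):
        if fast_ma[t] is not None and slow_ma[t] is not None and fast_ma[t] > slow_ma[t]:
            growth *= closes[t + 1] / closes[t]
    return growth - 1.0


def grid_search(closes, fast_grid, slow_grid):
    """Score every (fast, slow) pair with fast < slow and keep the best one."""
    best_params, best_score = None, float("-inf")
    for fast, slow in product(fast_grid, slow_grid):
        if fast >= slow:
            continue
        score = crossover_return(closes, fast, slow)
        if score > best_score:
            best_params, best_score = (fast, slow), score
    return best_params, best_score


closes = [100 + 0.5 * i + (i % 7) for i in range(60)]  # invented prices
params, score = grid_search(closes, fast_grid=[2, 3, 5], slow_grid=[10, 20])
print(params, round(score, 4))
```

The winning combination is scored on the same data it was selected on, so its score is an in-sample result. With thousands of combinations, it needs an out-of-sample check before it is trusted.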
11. PORTFOLIO SELECTION & ALLOCATION
1. clusterPortfolio.py (K-means)
- aggregates stock features, e.g. sentiment, technical indicators, momentum indicators, historical returns, betas etc.
- X (n × m): n stocks, each with m features
- these are clustered into K clusters, and the best cluster is selected
- criteria to use: mean scores, risk levels, portfolio themes, backtest results etc.
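The clustering-and-selection step can be sketched as K-means written without libraries; in practice scikit-learn's `KMeans` would be the usual choice. The tickers, the three features (sentiment, momentum, beta) and the rule "pick the cluster with the highest mean sentiment" are hypothetical. Centroids are seeded with farthest-first initialisation so the result is deterministic. Features on very different scales should be standardised first.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def best_cluster(points, labels, k, score_col=0):
    """Index of the cluster with the highest mean value in column `score_col`."""
    def mean_score(j):
        vals = [p[score_col] for p, lab in zip(points, labels) if lab == j]
        return sum(vals) / len(vals) if vals else float("-inf")
    return max(range(k), key=mean_score)


# Hypothetical n x m feature matrix: [sentiment, 3-month momentum, beta].
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
X = [
    [0.80, 0.12, 1.1],
    [0.70, 0.10, 1.0],
    [0.75, 0.11, 1.2],
    [-0.50, -0.08, 0.9],
    [-0.60, -0.05, 0.8],
    [-0.40, -0.07, 1.0],
]
labels, centroids = kmeans(X, k=2)
chosen = best_cluster(X, labels, k=2, score_col=0)
selected = [t for t, lab in zip(tickers, labels) if lab == chosen]
print(selected)  # ['AAA', 'BBB', 'CCC']
```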
13. CONCLUSION:
Thank you!
Remember, it is an art, not a science: machine learning in finance gives you a framework to understand the system,
but you still need intuition and trial and error (luck).
My email: erictham115@yahoo.com
Speaker notes
A short self-introduction:
Studied for a PhD in finance at the University of Lausanne, Switzerland
Master's in Financial Engineering at Columbia University
Master's in Business Analytics (Big Data) at the National University of Singapore
Currently a partner in a data analytics start-up doing web and consumer analytics
Now interested in Big Data, especially NLP in finance. Paper: real-time analysis of Twitter sentiment on the NASDAQ markets; hoping to get it published after some more work!
It is the first real-time (20-minute) study, which sets it apart from other papers
Some interesting findings (to be elaborated later)
Definitions: see Wikipedia.
Key words: in layman's terms, supervised learning uses a reference (learning from labelled past experience), whilst unsupervised learning learns from unlabelled data, e.g. clustering and PCA
These questions need to be answered in context; a few areas I think of are as follows:
The answers: I will not talk much about sentiment analysis, as there was a previous talk on NLTK+;
there are a number of other open-source libraries as well, such as jieba, for NLP (sentiment analysis as a whole uses SVMs and recurrent neural networks)
- Unstructured data analysis
See my link on Twitter mood driving markets;
Writing a paper on how sentiment drives markets and markets drive sentiment; hope to complete it in the next couple of months
Credit analytics: uses classification for credit scoring, e.g. logistic regression and tree-based models:
Assesses a person's creditworthiness based on their credit scores
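Credit scoring with logistic regression can be sketched as batch gradient descent on the log-loss in plain Python. The two features (a standardised credit score and a standardised debt-to-income ratio) and the eight applicants are invented for illustration. `sklearn.linear_model.LogisticRegression` with its `predict_proba` method would be the usual library route.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of default for one applicant."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented applicants: [standardised credit score, standardised debt-to-income].
X = [[1.2, -0.5], [0.8, -0.2], [0.5, 0.1], [0.1, 0.6],
     [-0.3, 0.4], [-0.9, 0.9], [-1.4, 1.1], [-0.6, -0.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defaulted (invented labels)
w, b = fit_logistic(X, y)
p_good = predict_proba(w, b, [1.5, -0.5])
p_bad = predict_proba(w, b, [-1.5, 1.0])
print(f"P(default): strong applicant {p_good:.2f}, weak applicant {p_bad:.2f}")
```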
The first two (sentiment analysis and credit analytics) are not the main point of my presentation; the next two are.
It is not my aim to go through all the excellent ML libraries, but I will share those that I use and apply, especially scikit-learn and statsmodels
I understand there is a separate presentation on NLTK, and another by a Theano (deep learning) expert, so I will not touch on those!
scikit-learn and statsmodels are both good; scikit-learn generally has more functions, but for ordinary regressions statsmodels is good enough
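A typical ordinary regression is estimating a stock's beta against the market. With one regressor, the OLS estimates have a closed form, sketched below with made-up returns. With statsmodels, the equivalent would be `sm.OLS(y, sm.add_constant(x)).fit()` after `import statsmodels.api as sm`. Its `summary()` also reports standard errors and R-squared.

```python
def ols_fit(x, y):
    """Closed-form OLS for y = alpha + beta * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


market = [0.010, -0.020, 0.015, 0.005, -0.010]
stock = [0.001 + 1.5 * m for m in market]  # constructed so that beta is 1.5
alpha, beta = ols_fit(market, stock)
print(f"alpha={alpha:.4f}, beta={beta:.2f}")
```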
Step 1 actually tests your understanding of the subject matter;
Transformation could be normalisation or thresholding
-> normally involves categorisation, a mixture model, or changing the frequency of the data;
almost anything can be used
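A few of these transformations can be sketched with the standard library: z-score normalisation, thresholding returns into up/flat/down categories, and compounding returns to a lower frequency. The 0.5% threshold and the sample returns are arbitrary choices for illustration.

```python
import statistics


def zscore(values, mean=None, stdev=None):
    """Standardise values; pass in-sample mean/stdev to avoid look-ahead."""
    mean = statistics.fmean(values) if mean is None else mean
    stdev = statistics.stdev(values) if stdev is None else stdev
    return [(v - mean) / stdev for v in values]


def categorise(returns, threshold=0.005):
    """Label returns: 1 = up, -1 = down, 0 = flat (|r| <= threshold)."""
    labels = []
    for r in returns:
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def compound(returns):
    """Aggregate consecutive simple returns into one lower-frequency return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


daily = [0.012, -0.003, 0.007, -0.011, 0.001, 0.004, -0.006, 0.009, 0.002, -0.001]
print(categorise(daily))  # [1, 0, 1, -1, 0, 0, -1, 1, 0, 0]
weekly = [compound(daily[i:i + 5]) for i in range(0, len(daily), 5)]
standardised = zscore(daily)
```

For a backtest, the mean and standard deviation would be computed on the in-sample period only and reused out of sample.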
Step 2: the most complex models are not necessarily the best. Model complexity tends to be defined by parameterisation, non-linearity, time variation (stochasticity), meta-models and
the number of dimensions (of the data).
In forecasting, the model basically says: given this scenario or set of data, you should get this output with a certain probability.
The same holds for other machine learning in computer science, whether NLP, speech or other fields.
Step 3: did the model achieve what you want?
Why is financial forecasting so difficult? Because it is a social science! It is hard to model human emotions, reactions and actions deterministically;
structural changes are also hard to model
See the code on GitHub;
Criteria can be risk, different returns, drawdown, Sharpe ratio etc.
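Two of these criteria can be computed as follows. The sketch assumes periodic simple returns, 252 trading periods per year and a per-period risk-free rate. It is not the code on GitHub.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualised Sharpe ratio from periodic simple returns;
    `risk_free` is the per-period risk-free rate."""
    excess = [r - risk_free for r in returns]
    return statistics.fmean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve,
    as a positive fraction (0.2 means a 20% drawdown)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


daily = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]  # invented returns
print(round(sharpe_ratio(daily), 2), round(max_drawdown(daily), 4))
```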
Code: see the Python slide
See the code on GitHub:
In portfolio allocation, Imaibo has the advantage that it has the sentiment data.