Vinod Gupta School of Management, IIT Kharagpur
Text Analytics- An Application in
Indian Stock Market
Applied Management Research Project, 2014
By Sinjana Ghosh
Done under the able guidance of
Prof. A. K. Misra
Background
Motivation behind this project
Algorithmic Trading in India
• Uses algorithms on pre-built platforms to place electronic
trades in stocks, futures, options, currencies and commodities
on exchanges, without any human intervention
• In 2008, India allowed the first Direct Market Access (DMA)
and algorithmic trades to go through
• The most commonly used algorithmic trading strategies in
India are arbitrage, market making and trend following
Big Data
• Data is available in various forms: not just structured, but
also semi-structured (XML and EDI documents) and unstructured
(text, multimedia, etc.)
• Big Data analytics is the strategy of using the huge amount
of data now accessible through the internet, mobile messages
and various other platforms to extract useful information
that can be analysed further to support decision making
Text Data analytics
• A subset of Big Data analytics that extracts entities such as
persons, locations and organizations from text, together with
the relationships between the extracted entities, and
analyses them for business needs
Predictive analytics
• Involves searching for meaningful relationships among
variables and representing those relationships in models
• Models relate response variables to explanatory variables
• Two common types of model: regression and classification
Sentiment Analysis
• Uses natural language processing, text analysis and
computational linguistics to identify and extract subjective
information in source materials
• Aims to determine the attitude of a speaker or writer
towards some topic, or the overall contextual polarity of a
document
Machine Learning
• A branch of artificial intelligence concerned with the
construction and study of systems that can learn from data
The Problem
Using text mining of news articles available in the public
domain to analyse market sentiment and correlate it with the
actual movement in the Nifty 50
Objective
• Mine textual news from a range of online sources to check
each article for occurrences of a basic set of keywords.
• Train a machine learning algorithm to accurately predict the
impact of the most viewed news articles on market sentiment,
and hence the movement of the market, represented in this
study by the Nifty 50.
• Validate the results obtained from the training set using a
set of recent news articles (the test set) to check for
errors and the level of accuracy.
Methodology
• Textual Representation
   ◦ Bag of words
   ◦ Noun Phrasing
   ◦ Named Entities
   ◦ Named Entities with context-capturing feature
• Predictive Modelling Approach
Source: Modeling Techniques in Predictive Analytics: Business Problems and
Solutions with R (Mill)
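The first and third representations listed above can be illustrated with a minimal Python sketch. The study's own script was written in R and is not shown in the deck, so the function names, the tokenizer and the sample headline here are assumptions made purely for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs, ignoring word order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def named_entity_counts(text, lexicon):
    """Keep only the counts of words that appear in a fixed lexicon."""
    counts = bag_of_words(text)
    return {word: n for word, n in counts.items() if word in lexicon}


# Invented headline and lexicon, not taken from the study's data.
headline = "RBI holds rates as inflation eases and rupee gains"
print(bag_of_words(headline))
print(named_entity_counts(headline, {"rbi", "inflation", "rupee", "gdp"}))
```

A bag of words keeps every token, whereas the named-entity view discards everything outside the lexicon; neither keeps the surrounding words, which is what the context-capturing variant adds.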
Methodology
• Sources of textual data
Methodology
• Partitioning data in machine learning
Source: Modeling Techniques in Predictive Analytics: Business Problems and
Solutions with R (Mill)
Text Analysis Algorithm
1. Convert all characters to lowercase
2. Remove stop-words that do not help in sentiment analysis,
such as “is”, “are”, “if”, “when”, “where”, “then”, “their”,
“there”, “why”, “which”, “how”
After this, the following is done:
1. Create an array of named entities that are of significance,
such as “inflation”, “gdp”, “sensex” etc.
2. Run the script, which extracts each named entity that occurs
in the article along with the 2 words immediately preceding
and the 3 words immediately succeeding it. This captures not
only the keywords but also their context.
3. Train the algorithm by assigning a weight to each keyword so
that the sentiment score most closely reflects the
actual returns of the day.
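The pre-processing and extraction steps above can be sketched as follows. This is a Python illustration rather than the study's R script; the tokenizer, the three-word lexicon and the choice to take the context window after stop-word removal are all assumptions.

```python
import re

# Stop-words named on the slide; a practical list would be longer.
STOP_WORDS = {"is", "are", "if", "when", "where", "then",
              "their", "there", "why", "which", "how"}

# Hypothetical mini-lexicon built from the three examples on the slide.
LEXICON = {"inflation", "gdp", "sensex"}


def preprocess(text):
    """Lowercase the text, split it into words and drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def extract_contexts(tokens, lexicon=LEXICON, before=2, after=3):
    """Return (keyword, preceding words, succeeding words) for each hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token in lexicon:
            preceding = tokens[max(0, i - before):i]
            succeeding = tokens[i + 1:i + 1 + after]
            hits.append((token, preceding, succeeding))
    return hits


# Invented sentence, used only to show the shape of the output.
sample = "Markets rallied when inflation fell sharply in March"
contexts = extract_contexts(preprocess(sample))
print(contexts)
```

The slicing with `max(0, i - before)` keeps the window valid when a keyword appears at the very start of an article.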
Text Analysis Algorithm
4. A set of qualifiers is defined, and the preceding and succeeding words
are captured as the “context” of the extracted keyword. The algorithm
then assigns a weight (-1 for negative, 0 for neutral and +1 for
positive) to each extracted qualifier.
5. The sum-product of the qualifier weights and keyword weights gives the
sentiment score of the article, from which the returns of the day
due to that news can be predicted.
6. The importance score is simply the sum of the weights of the individual
occurrences of keywords in the article. However, whether the effect
will be positive or negative, and how much the market will react to it,
is determined only by the sentiment score.
7. Regression is performed on the scores versus actual returns for the
training set, and a formula is obtained for converting the scores into
forecasted returns.
8. This is tested on the validation set and errors are calculated.
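A minimal Python sketch of the two scores in steps 5 and 6 follows. The keyword weights, the qualifier list and the rule that an occurrence's qualifier weight is the sum of the qualifiers found in its context are assumptions; the deck does not give the trained values or say how several qualifiers near one keyword are combined.

```python
# Hypothetical weights; the trained values are not given in the slides.
KEYWORD_WEIGHTS = {"rbi": 10.0, "inflation": 8.0, "gdp": 6.0}
QUALIFIER_WEIGHTS = {"strong": 1, "surges": 1, "weak": -1, "slumps": -1}


def importance_score(occurrences, keyword_weights=KEYWORD_WEIGHTS):
    """Sum the keyword weight of every keyword occurrence."""
    return sum(keyword_weights.get(keyword, 0.0) for keyword, _ in occurrences)


def sentiment_score(occurrences,
                    keyword_weights=KEYWORD_WEIGHTS,
                    qualifier_weights=QUALIFIER_WEIGHTS):
    """Sum-product of keyword weights and context qualifier weights."""
    total = 0.0
    for keyword, context in occurrences:
        # Words that are not known qualifiers count as neutral (0).
        qualifier = sum(qualifier_weights.get(word, 0) for word in context)
        total += keyword_weights.get(keyword, 0.0) * qualifier
    return total


# Invented occurrences: (keyword, words captured around it).
occurrences = [
    ("rbi", ["governor", "said", "policy"]),   # neutral context
    ("gdp", ["strong", "quarter", "ahead"]),   # one positive qualifier
]
print(importance_score(occurrences))  # 16.0
print(sentiment_score(occurrences))   # 6.0
```

In this toy case the importance score is 16 while the sentiment score is only 6, because the neutral context around "rbi" contributes nothing to the sum-product.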
Training of algorithm
• Training set: trading days in 2013-14 with daily returns
above +1% or below -1%
• Several iterations were run, with regression performed at
each one, to finalize the set of keywords in the lexicon, the
weight of each keyword, the set of qualifiers and their
scores, and the set of exceptional items in the lexicon
• The iterations started with 50 articles and ended with 125
articles
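The training-set filter and the regression step can be sketched in Python as below. This assumes a single-variable least-squares fit of returns on sentiment score, which the deck does not confirm; the day labels, returns and scores are invented.

```python
def select_training_days(daily_returns, threshold=1.0):
    """Keep days whose absolute return (in percent) exceeds the threshold."""
    return {day: r for day, r in daily_returns.items() if abs(r) > threshold}


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented daily returns in percent; only large moves are kept.
returns = {"day1": 1.5, "day2": 0.4, "day3": -1.2}
print(select_training_days(returns))

# Invented sentiment scores and the returns observed on the same days.
scores = [-10.0, -5.0, 5.0, 10.0]
observed = [-2.0, -1.0, 1.0, 2.0]
slope, intercept = fit_line(scores, observed)
forecast = slope * 7.5 + intercept  # forecast return for a new score
```

Re-running the fit after each change to the lexicon or weights mirrors the iterative tuning described above, although the criterion the author used to accept a change is not stated.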
Analysis and Results
The 125 news articles in the training set were analyzed using
the R script, which extracts the following:
• All named entities occurring in the news article that
match the lexicon
• The context in which they appear, captured as the words
preceding and succeeding each named entity
Interesting observations
• The number of keywords in a news article has much less
bearing on the article's effect on the market than the
context in which the keywords appear. Based simply on keyword
occurrence, 35 news articles received an importance score
greater than 80, but once the sentiment score was calculated
most of the context scored neutral (0), giving a low
sentiment score and suggesting low returns (on both the
positive and the negative side)
• The keywords assigned the highest weights during training
were:
   ◦ RBI
   ◦ Rupee
   ◦ Inflation
   ◦ GDP
Interesting observations
• Names of specific indices or industries, and results of
specific companies (articles containing terms like
“quarterly”, “results”, “annual”, “profit”, “revenue”),
are the least useful in evaluating the sentiment of the
overall market represented by the Nifty
• When gold prices fell drastically, markets in most nations
fell as gold mutual funds incurred huge losses. In India,
however, the broad indices outperformed on the same event,
which suggests that precious-metal prices have an inverse
effect on the Indian stock market as a whole. Gold has
therefore been included in the list of exceptional items in
the lexicon.
Prediction Accuracy
• Summary of training set results:
Prediction Accuracy
• Line fit plot for training set:
• Line fit plot for test set:
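The deck does not say which error measures were reported. As an illustration only, the Python sketch below computes three common ones (mean absolute error, root mean squared error and R²) for forecast versus actual returns; the numbers are invented and are not the study's results.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared gap."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r_squared(actual, predicted):
    """Share of the variance in actual values explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented test-set returns (percent) and forecasts, for illustration only.
actual = [1.0, -1.0, 2.0]
predicted = [1.5, -0.5, 1.5]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
```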
An Example from test set
• March 24, 2014
An Example from test set
Dataset and analysis
Workspace showing the list of keywords
Conclusion and scope of further
work
Conclusion
• The algorithm used in the study, together with the weights
given to the lexicon terms and qualifiers, predicts daily
market returns effectively when the daily return is at least
1% in magnitude (positive or negative)
• The Indian stock market does react to systemically important
news articles
• Textual analysis of publicly available news articles has
significant predictive value
• As the efficiency of the Indian market increases, arbitrage
opportunities will shrink, so algorithmic traders will have a
significant advantage over manual traders if text analytics
is implemented in algorithmic trading
Scope of further work
• News articles can be clustered or classified into “economic
news”, “political news” and “other news” based on the
frequency of specific named entities, to find out which type
of news has the greatest impact on the Indian market
• If minute-wise market returns are available, news articles
can be collected every hour and the returns observed over a
period to find how long a news article of a given importance
score takes to affect the market
• This text mining algorithm is not fully automated: news
articles must be fed manually into the program for it to run
and predict the returns. However, the process can be
automated to pull a live news feed from websites and compute
each article's importance and sentiment scores. If the score
is higher or lower than a particular range, then BUY or SELL
(or short sell) calls can be taken automatically by the
machine.
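The automated call described in the last point could look like the Python sketch below. The thresholds are hypothetical placeholders, and the "HOLD" outcome for scores inside the range is an assumption added here; the slide names only BUY and SELL.

```python
def trading_signal(sentiment, lower=-20.0, upper=20.0):
    """Map a sentiment score to a call using a neutral band [lower, upper]."""
    if sentiment > upper:
        return "BUY"
    if sentiment < lower:
        return "SELL"  # could equally trigger a short sell
    return "HOLD"      # assumption: take no action inside the band


# Invented scores showing the three outcomes.
for score in (35.0, -42.5, 3.0):
    print(score, trading_signal(score))
```

In practice the band would have to be calibrated against the regression results, since a score only matters if it maps to a forecast return large enough to trade on.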
Thank you!
Questions?

Text Analytics- An application in Indian Stock Markets

  • 1. Vinod Gupta School of Management, IIT Kharagpur Text Analytics- An Application in Indian Stock Market Applied Management Research Project, 2014 By Sinjana Ghosh Done under the able guidance of Prof. A. K. Misra
  • 3. Algorithmic Trading in India  Involves the use of algorithms in pre-built platforms to place electronic trades on stocks, futures, options, currencies and commodities on exchanges, without any human intervention  In 2008, India allowed the first Direct-Market-Access (DMA) and algorithmic trades to go through  The most commonly used strategies of algorithmic trading in India include arbitrage, market making and trend following algorithms
  • 4. Big Data  Data available in various forms – not just structured but also semi-structured like XML and EDI Documents and unstructured like Text, multimedia etc.  Big Data analytics is the strategy of using this huge amount of data which is now accessible through internet, mobile messages and various other platforms, to extract useful information , that can be further analyzed to help in the decision making process
  • 5. Text Data analytics  Subset of Big data analytics which involves extraction of entities like person, location, organization etc. from text messages and relationship between the extracted entities and analysing them for business needs Predictive analytics  Involves searching for meaningful relationships among variables and representing those relationships in models  Response variables and explanatory variables  Two common types of model: Regression and Classification
  • 6. Sentiment Analysis  Use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials  Aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document Machine Learning  A branch of artificial intelligence, concerns the construction and study of systems that can learn from data
  • 7. The Problem Using text mining of news articles available in the public domain to analyse the market sentiment and correlate it with the actual movement in Nifty 50
  • 8.  Use textual news from a plethora of online resources to perform data mining to check for occurrence of a basic set of keywords in the article.  Training a machine learning algorithm for accurately predicting the impact of the most viewed news articles on the market sentiment and predict the movement of market represented in the study by Nifty50.  Validate the results obtained through training set using a set of recent news articles (Test set) to check for errors and level of accuracy. Objective
  • 9. Methodology  Textual Representation  Bag of words  Noun Phrasing  Named Entities  Named Entities with context-capturing feature  Predictive Modelling Approach Source: Modeling Techniques in Predictive Analytics: Business Problems and Solutions with R (Mill)
  • 11. Methodology  Partitioning data in machine learning Source: Modeling Techniques in Predictive Analytics: Business Problems and Solutions with R (Mill)
  • 12. Text Analysis Algorithm 1. Convert all the characters to lowercase 2. Remove stop-words which does not help in sentiment analysis like “is”, “are”, “if”, “when”, “where”, “then”, “their”, “there”, “where”, “why”, “when”, “which”, “how” After this the following is done: 1. Create an array of named entities which are of significance like “inflation”, “gdp”, “sensex” etc. 2. The script is run which extracts the named entities which occur in the article along with the 2 words immediately preceding and 3 words immediately succeeding it. This is done to not only capture the keywords but also the context. 3. The algorithm is trained by assigning weights to each of the keyword so that the sentiment score most closely reflects the actual returns of the day.
  • 13. Text Analysis Algorithm
      4. A set of qualifiers is defined and looked up among the preceding and succeeding words captured as the “context” of each extracted keyword. The algorithm assigns a weight (-1 for negative, 0 for neutral and +1 for positive) to each extracted qualifier.
      5. The sum product of the qualifier weights and keyword weights gives the sentiment score of the article, from which the returns of the day due to that news can be predicted.
      6. The importance score is simply the sum of the weights of the individual keyword occurrences in the article. Whether the effect will be positive or negative, and how strongly the market will react, is determined only by the sentiment score.
      7. Regression is performed on the scores versus the actual returns for the training set, giving a formula for converting scores into forecasted returns.
      8. The formula is tested on the validation set and the errors are calculated.
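The scoring steps of the algorithm above (lowercasing, stop-word removal, context extraction, importance score and sentiment score) can be sketched in Python as follows. This is a minimal illustration, not the study's R script: the lexicon entries, keyword weights and qualifier words are hypothetical placeholders, and summing the qualifier weights inside each context window before multiplying by the keyword weight is one possible reading of the "sum product" in step 5.

```python
import re

# Hypothetical lexicon (keyword -> weight). The study learned its weights
# through regression; these values are placeholders for illustration.
KEYWORD_WEIGHTS = {"gdp": 2.0, "rupee": 3.0, "inflation": 3.0}

# Hypothetical qualifiers: -1 negative, 0 neutral, +1 positive.
QUALIFIER_WEIGHTS = {"rises": 1, "strong": 1, "falls": -1, "weakens": -1}

STOP_WORDS = {"is", "are", "if", "when", "where", "then", "their",
              "there", "why", "which", "how"}


def tokenize(text):
    """Lowercase the text, split it into words and drop stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def extract_contexts(tokens, before=2, after=3):
    """Return (keyword, context) pairs: each lexicon hit together with the
    2 preceding and 3 succeeding words."""
    hits = []
    for i, token in enumerate(tokens):
        if token in KEYWORD_WEIGHTS:
            context = tokens[max(0, i - before):i] + tokens[i + 1:i + 1 + after]
            hits.append((token, context))
    return hits


def score_article(text):
    """Return (importance_score, sentiment_score) for one article."""
    importance = 0.0
    sentiment = 0.0
    for keyword, context in extract_contexts(tokenize(text)):
        weight = KEYWORD_WEIGHTS[keyword]
        # Assumption: qualifier weights found in the window are summed.
        qualifier = sum(QUALIFIER_WEIGHTS.get(word, 0) for word in context)
        importance += weight
        sentiment += weight * qualifier
    return importance, sentiment


article = ("GDP rises in the latest quarter but analysts say "
           "the rupee weakens further")
print(score_article(article))
```

In this toy article both keywords contribute to the importance score, while their qualifiers pull the sentiment score in opposite directions, which is the distinction the two scores are meant to capture.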
  • 14. Training of algorithm  Training set: daily returns of 2013-14 with returns > 1% or returns < -1%  Several iterations were run, with regression performed at each level, to finalize the set of keywords in the lexicon, the weight of each keyword, the set of qualifiers and their scores, and the set of exceptional items in the lexicon  Iteration started with 50 articles and ended with 125 articles
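The regression step used during training can be sketched as a simple least-squares line that maps sentiment scores to daily returns. This is a hedged illustration in Python rather than the study's R code: the function names are hypothetical, the filter keeps days whose absolute return exceeds 1% (an interpretation of the training-set rule), and the numbers in `records` are made up purely to exercise the code.

```python
import math


def select_large_moves(records, threshold=1.0):
    """Keep (score, return_pct) pairs whose absolute return exceeds threshold."""
    return [(score, ret) for score, ret in records if abs(ret) > threshold]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))


# Made-up (sentiment score, daily return in %) pairs, for illustration only.
records = [(-4.0, -2.1), (3.0, 1.4), (0.0, 0.2), (5.0, 2.6), (-2.0, -1.2)]
training = select_large_moves(records)
scores = [score for score, _ in training]
returns = [ret for _, ret in training]

slope, intercept = fit_line(scores, returns)
print("forecast return = %.3f * score + %.3f" % (slope, intercept))
print("training RMSE: %.3f" % rmse(scores, returns, slope, intercept))
```

The same `rmse` helper can be applied to a held-out set of articles to estimate the error on unseen news, which corresponds to the validation step of the algorithm.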
  • 15. Analysis and Results The 125 news articles in the training set were analyzed using the script in R, and the following were extracted: • All the named entities occurring in the news article that match the lexicon • The context in which they appear, captured by extracting the words preceding and succeeding each named entity
  • 16. Interesting observations  The number of keywords that a news article contains has much less bearing on the article's effect on the market than the context in which the keywords appear. Based simply on the occurrence of keywords, 35 news articles got an importance score greater than 80, but when the sentiment score was calculated most of the contexts led to neutral scoring (0), giving a low sentiment score and suggesting low returns (on both the positive and the negative side)  The keywords assigned the highest weights during training of the algorithm are:  RBI  Rupee  Inflation  GDP
  • 17. Interesting observations  Names of specific indices or industries, and results of specific companies containing terms like “quarterly”, “results”, “annual”, “profit”, “revenue” etc., are least useful in evaluating the sentiment of the overall market represented by the Nifty  When gold prices fell drastically, markets in most nations fell as gold mutual funds incurred huge losses. In India, however, the broad indices outperformed on the same event, which suggests that the prices of precious metals have an inverse effect on the Indian stock market as a whole. Gold has therefore been included in the list of exceptional items in the lexicon.
  • 18. Prediction Accuracy  Summary of Training set results:
  • 19. Prediction Accuracy  Line Fit plot for training set:  Line Fit plot for test set:
  • 20. An Example from test set  March 24, 2014
  • 21. An Example from test set
  • 23. Workspace showing the list of keywords
  • 24. Conclusion and scope of further work
  • 25. Conclusion  The algorithm used in the study, along with the weights given to the terms in the lexicon and to the qualifiers, is able to predict daily market returns effectively when the daily return is at least 1% in magnitude (positive or negative)  The Indian stock market does react to systemically important news articles  Textual analysis of publicly available news articles has significant predictive quality  As the efficiency of the Indian market increases, arbitrage opportunities will diminish, so algorithmic traders will have a significant advantage over manual traders if text analytics is implemented in algorithmic trading
  • 26. Scope of further work  News articles can be clustered or classified into “economic news”, “political news” and “other news” based on the frequency of specific named entities, to find out which type of news has the greatest impact on the Indian market  If minute-wise market returns are available, news articles can be collected every hour and the returns observed over a period to find how long a news article of a given importance score takes to affect the market  This text mining algorithm is not fully automated. The news articles need to be fed manually into the program for it to run and predict the returns. However, this process can be automated to obtain a live news feed from websites and automatically compute each article's importance and sentiment scores. If the score is higher or lower than a particular range, then BUY or SELL (or short sell) calls can be taken automatically by the machine.
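The automated decision rule described in the last point could look roughly like the Python sketch below. The function name, the threshold values and the "HOLD" outcome for scores inside the range are assumptions made for illustration; the study does not specify them.

```python
def trade_signal(sentiment_score, lower=-5.0, upper=5.0):
    """Map a sentiment score to a call.

    lower and upper are hypothetical bounds of the 'particular range';
    scores inside the range produce no trade ("HOLD", an assumed label).
    """
    if sentiment_score > upper:
        return "BUY"
    if sentiment_score < lower:
        return "SELL"
    return "HOLD"


for score in (8.0, -6.5, 1.0):
    print(score, trade_signal(score))
```

In a live setting the bounds would have to be calibrated against the regression of returns on scores, so that only articles forecast to move the market by a meaningful amount trigger a call.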