SPARK MACHINE LEARNING
Certification Course Academic Year (2017-2018)
Done by:
K Teja Sreenivas
INTRODUCTION:
– Machine learning is a type of artificial intelligence (AI) that
allows software applications to become more accurate in
predicting outcomes without being explicitly programmed.
The basic premise of machine learning is to build
algorithms that can receive input data and use statistical
learning to predict an output value within an acceptable
range.
– Machine learning algorithms are often categorized as
being supervised or unsupervised.
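To make the premise concrete, the following is a minimal sketch of supervised learning in plain Python. It fits a straight line to a few labelled points by ordinary least squares and then predicts an output for an unseen input. The data values and the function name `fit_line` are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled training data that follows y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(train_x, train_y)

# Predict the output for an input the model has not seen.
prediction = slope * 4.0 + intercept
print(slope, intercept, prediction)
```

The labels in `train_y` are what make this supervised: the algorithm learns the mapping from inputs to known outputs. An unsupervised method would receive only `train_x`.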
MACHINE LEARNING TYPES:
LIFE CYCLE IN DESIGNING A
MACHINE LEARNING MODEL
1. Data collection
2. Data processing
3. Feature Engineering
4. Model Building
5. Model Evaluation
6. Model Deployment
SPARK FOR MACHINE LEARNING:
• Spark is a distributed data processing engine, often used in place of Hadoop MapReduce. It is not
a file system and typically reads its data from storage such as HDFS. Big Data workloads run across
clusters of networked machines and have become essential in several industries. The broad
use of Hadoop and MapReduce shows how this technology is constantly evolving, and the
growing use of Apache Spark is testament to this fact.
• Apache Spark offers superior capabilities for Big Data applications compared with
other Big Data technologies such as Hadoop MapReduce. The Apache
Spark features are as follows:
1. Holistic framework
2. Speed
3. Easy to use
4. Enhanced support
PROBLEM STATEMENT:
Prediction of the annual return of
stock portfolios from sets of
stock-picking concept weights, whose
performance was simulated on
historical US stock market data.
DATA SET ATTRIBUTE INFORMATION:
• The inputs are the weights of the stock-picking concepts as follows
X1=the weight of the Large B/P concept
X2=the weight of the Large ROE concept
X3=the weight of the Large S/P concept
X4=the weight of the Large Return Rate in the last quarter concept
X5=the weight of the Large Market Value concept
X6=the weight of the Small systematic Risk concept
• The outputs are the investment performance indicators (normalized) as follows
Y1=Annual Return
Y2=Excess Return
Y3=Systematic Risk
Y4=Total Risk
Y5=Abs. Win Rate
Y6=Rel. Win Rate
TERMINOLOGY:
• P/B ratio: The price-to-book (P/B) ratio is a financial ratio used to compare a company's current market price to its
book value. It is also sometimes known as the market-to-book ratio. The B/P input in the data set is its inverse, book-to-price.
• ROE: Return on equity (ROE) is the amount of net income returned as a percentage of shareholder equity. Return on
equity measures a corporation's profitability by revealing how much profit a company generates with the money
shareholders have invested.
• S&P 500: The S&P 500 is an index that tracks the stocks of about 500 of the largest US corporations by market capitalization listed on the New York
Stock Exchange or Nasdaq. Standard & Poor's intention is to provide a quick look at the stock
market and economy. Note that the S/P input in the data set most likely denotes the sales-to-price ratio rather than this index.
• Return Rate: A rate of return is the gain or loss on an investment over a specified time period, expressed as a percentage
of the investment's cost. Gains on investments are defined as income received plus any capital gains realized on the sale of
the investment.
• Market value: The amount for which something can be sold on a given market.
• Systematic Risk: Systematic risk is the risk inherent to the entire market or market segment. Systematic risk, also known
as “undiversifiable risk,” “volatility,” or “market risk,” affects the overall market, not just a particular stock or industry. This
type of risk is both unpredictable and impossible to completely avoid.
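The ratios defined above reduce to simple arithmetic. The sketch below works through them in plain Python. All figures and function names are hypothetical and chosen only to keep the numbers easy to follow.

```python
def price_to_book(market_price_per_share, book_value_per_share):
    """P/B ratio: market price relative to book value."""
    return market_price_per_share / book_value_per_share


def return_on_equity(net_income, shareholder_equity):
    """ROE as a percentage of shareholder equity."""
    return 100.0 * net_income / shareholder_equity


def rate_of_return(cost, income_received, sale_value):
    """Gain or loss over the holding period as a percentage of cost."""
    gain = income_received + (sale_value - cost)
    return 100.0 * gain / cost


# Hypothetical company: share price 50, book value per share 20.
pb = price_to_book(50.0, 20.0)
# Book-to-price (B/P) is the reciprocal of P/B.
bp = 1.0 / pb

# Hypothetical company: net income 25 on shareholder equity of 200.
roe = return_on_equity(25.0, 200.0)

# Hypothetical investment: bought for 1000, paid 50 in income, sold for 1100.
ror = rate_of_return(1000.0, 50.0, 1100.0)

print(pb, bp, roe, ror)
```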
SOFTWARE TOOLS USED:
• SPARK
• SPYDER
• ANACONDA
• JUPYTER
• PYTHON
• VIRTUAL MACHINE
• HDFS
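from pyspark import SparkContext, SQLContext

# Reuse the running SparkContext (e.g. in the pyspark shell) or create one.
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Data collection: load the training CSV; header=True takes column names from the first row.
data = sqlContext.read.csv('/home/tej/Documents/ML with spark/train.csv', header=True, sep=',')
data.show(n=5)

# Keep the seven input columns and the label column.
X_train = data.select('Large_BnP', 'Large_ROE', 'Large_SnP', 'Large_Return_Rate_last_quarter',
                      'Large_Market_Value', 'Small_systematic_Risk', 'systematic_risks', 'Annual_Return')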
# Data processing: read.csv loads every column as a string, so cast each one to float.
X_train = X_train.select([X_train[c].cast('float').alias(c) for c in X_train.columns])

# Feature engineering: pack the input columns into a single vector column.
from pyspark.ml.feature import VectorAssembler, VectorIndexer
assembler = VectorAssembler(inputCols=['Large_BnP', 'Large_ROE', 'Large_SnP', 'Large_Return_Rate_last_quarter',
                                       'Large_Market_Value', 'Small_systematic_Risk', 'systematic_risks'],
                            outputCol='features')
X_train = assembler.transform(X_train)

# Features with at most 7 distinct values are treated as categorical and indexed.
featureIndexer = VectorIndexer(inputCol="features", outputCol="indexedFeatures",
                               maxCategories=7).fit(X_train)
X_train = featureIndexer.transform(X_train)
# Model building: fit a linear regression that predicts Annual_Return from the indexed features.
from pyspark.ml.regression import LinearRegression
linear_reg = LinearRegression(labelCol='Annual_Return', featuresCol='indexedFeatures')
linear_reg_model = linear_reg.fit(X_train)
# Load the test set and prepare it exactly as the training set was prepared.
test_data = sqlContext.read.csv('/home/tej/Documents/ML with spark/test.csv', header=True, sep=',')
X_test = test_data.select('Large_BnP', 'Large_ROE', 'Large_SnP', 'Large_Return_Rate_last_quarter',
                          'Large_Market_Value', 'Small_systematic_Risk', 'systematic_risks', 'Annual_Return')
X_test = X_test.select([X_test[c].cast('float').alias(c) for c in X_test.columns])

# Reuse the assembler and the indexer fitted on the training data. Refitting the
# indexer on the test set could map the same feature value to a different index.
X_test = assembler.transform(X_test)
X_test = featureIndexer.transform(X_test)
# Model evaluation: score the test set and compare actual with predicted returns.
linear_predictions = linear_reg_model.transform(X_test)
linear_predictions.show()
linear_predictions.select('Annual_Return', 'prediction').show()
CONCLUSION:
• From the final output it is clear that, by training a linear model on the data set, we
obtained predictions of annual returns with an average error of less than 0.1
units.
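One common way to quantify an "average error" like the one quoted here is the mean absolute error (MAE). The sketch below computes it in plain Python on made-up values. The numbers are hypothetical and are not the results from the slides. In PySpark, `RegressionEvaluator` from `pyspark.ml.evaluation` with `metricName='mae'` computes the same quantity on a predictions DataFrame.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical normalized annual returns and model predictions.
actual = [0.50, 0.25, 0.75, 1.00]
predicted = [0.50, 0.50, 0.50, 0.75]

mae = mean_absolute_error(actual, predicted)
print(mae)
```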
KEY LEARNING:
• We have learnt the basic uses of machine learning and of Spark
in implementing a machine learning model.
• The various phases involved in designing a machine learning model were
understood and implemented using a linear regression model.
THANK YOU!
