Hadoop Project

Stock Analyzer
(MapReduce and Hive Implementation)
Presented by
Punit Kishore (A13011)
Debayan Datta (A13006)
Sunil Kumar P (A13020)
Maruthi Nataraj K (A13009)
Ashish Ranjan (A13004)
Praxis Business School
AGENDA
• Understanding of the problem
• Technical Architecture
• Basic Structure
• Pseudo Code
• Final Result
• Business Implications

Electronics Template
UNDERSTANDING OF THE PROBLEM
• Objective: To find the adjusted closing price for each day on which a stock did not report a dividend.
• Data sources:
  – NYSE daily prices dataset with the schema below:
    exchange | stock_symbol | date | stock_price_open | stock_price_high | stock_price_low | stock_price_close | stock_volume | stock_price_adj_close
  – NYSE dividends dataset with the schema below:
    exchange | stock_symbol | date | dividends
• Isolating the dividend data from the full dataset gives a better picture of the company, because firms sometimes avoid cutting dividends even when earnings drop.
• Framework: MapReduce/Hive
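
The objective amounts to an anti-join: keep every daily-price row whose (stock_symbol, date) pair has no row in the dividends dataset. The sketch below shows that idea in plain Java on in-memory data. It is an illustration only, not the MapReduce job from the deck. The class name `NoDividendDays`, the method `filter`, the `|` key separator, and the second price row are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: an in-memory version of the objective, not the MapReduce job.
final class NoDividendDays {
    // Keeps the price rows whose (stock_symbol, date) pair has no dividend row.
    // Each row is {stock_symbol, date, stock_price_adj_close}.
    static List<String[]> filter(List<String[]> priceRows, Set<String> dividendKeys) {
        List<String[]> result = new ArrayList<>();
        for (String[] row : priceRows) {
            String key = row[0] + "|" + row[1]; // hypothetical key format
            if (!dividendKeys.contains(key)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; only AIT on 12-11-2009 appears in the slides.
        List<String[]> prices = Arrays.asList(
                new String[] {"AIT", "12-11-2009", "20.69"},
                new String[] {"AIT", "13-11-2009", "20.50"});
        Set<String> dividendKeys = new HashSet<>(Arrays.asList("AIT|13-11-2009"));
        for (String[] row : filter(prices, dividendKeys)) {
            System.out.println(String.join(" ", row)); // prints "AIT 12-11-2009 20.69"
        }
    }
}
```

On a cluster the dividend keys would not be held in one in-memory set, so the same join has to be expressed through the shuffle by giving both datasets a common key.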
TECHNICAL ARCHITECTURE
• Eclipse Indigo 3.7.2
• Hadoop 1.2.1 plugin
• WinSCP
• PuTTY
• Unix environment / Amazon AWS EC2 Praxis Hadoop cluster
• Sample data (testing was done on sample data only, because of load and time constraints):
  – NYSE_daily_prices_AT.csv
  – NSE_daily_prices_BT.csv
  – dividendstest.csv

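BASIC STRUCTURE
• Input key-value pair: <Memory Pointer (the byte offset of the line in the file), "NYSE,AIT,12-11-2009,X,X,X,X,X,20.69">
• Intermediate key-value pairs: <AIT12-11-2009, 1~20.69~0> and <AIT12-11-2009, 1~Null~1>
  The key is the stock symbol followed by the date. The value has the form count~adjusted closing price~check flag; a Null price with flag 1 appears to mark a dividend record.
• Output/result key-value pair: AIT 12-11-2009 20.69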
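
The step from an input line to an intermediate pair can be sketched in plain Java, without any Hadoop classes. This is a hypothetical reconstruction from the formats shown above, not the authors' mapper. It assumes that the two inputs are told apart by column count (9 for daily prices, 4 for dividends, per the two schemas), that a header row starts with the column name `exchange`, and that dividend rows are encoded as `1~Null~1`. The class name `IntermediatePair` and the dividend amount in the usage line are made up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper: turns one CSV line into an intermediate key-value pair
// in the format shown on the slide. Not the authors' implementation.
final class IntermediatePair {
    // Assumed column counts, taken from the two schemas.
    static final int PRICE_COLUMNS = 9;
    static final int DIVIDEND_COLUMNS = 4;

    // Returns null for header lines and for lines that cannot be parsed.
    static Map.Entry<String, String> fromLine(String line) {
        String[] f = line.split(",", -1);
        boolean isPrice = f.length == PRICE_COLUMNS;
        boolean isDividend = f.length == DIVIDEND_COLUMNS;
        if (!isPrice && !isDividend) {
            return null;
        }
        if (f[0].trim().equalsIgnoreCase("exchange")) {
            return null; // header row (assumed to repeat the column names)
        }
        String symbol = f[1].trim();
        String date = f[2].trim();
        if (symbol.isEmpty() || date.isEmpty()) {
            return null; // null check on the key parts
        }
        String key = symbol + date; // e.g. AIT12-11-2009
        String value = isPrice ? "1~" + f[8].trim() + "~0" : "1~Null~1";
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        // Price line from the slide, then a dividend line with a made-up amount.
        System.out.println(fromLine("NYSE,AIT,12-11-2009,X,X,X,X,X,20.69")); // AIT12-11-2009=1~20.69~0
        System.out.println(fromLine("NYSE,AIT,12-11-2009,0.15"));            // AIT12-11-2009=1~Null~1
    }
}
```

Because both datasets produce the same key for the same stock and day, the framework delivers a price record and any matching dividend record to the same reduce call.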

PSEUDO CODE
Mapper

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public static class StockAnalysisMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text>
{
    // Declaration of the map key and map value

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // Declaration of local variables
        // Switch-case to parse the input line and store the fields
        // Check for null values in the key
        // Skip header lines; send every other key-value pair to the output collector
    }
}

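Reducer

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public static class StockAnalysisReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text>
{
    // Declaration of required private variables

    @Override
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // Declaration of sum and flag variables
        while (values.hasNext())
        {
            // Advance the iterator; without this call the loop never ends
            String record = values.next().toString();
            // Parse the record into count, adjusted closing price and check flag
            // Store them as required after parsing
            // Check for null values of the adjusted closing price
            // Increment the sum
        }
        // Write to output if sum is 1
    }
}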
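
The reduce logic outlined above can be tried out in plain Java before wiring it into Hadoop. This is a minimal sketch under stated assumptions, not the authors' reducer: values are assumed to have the form `count~adj_close~flag`, flag `1` is assumed to mark a dividend record, and a key is emitted only when the counts sum to 1 and a non-null price was seen. The class name `ReduceSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;

// Hypothetical stand-in for the reduce step; no Hadoop classes are used.
final class ReduceSketch {
    // Returns the adjusted close if the key has exactly one record and that
    // record is a price record (flag 0, non-null price); otherwise empty.
    static Optional<String> reduce(Iterator<String> values) {
        int sum = 0;
        String adjClose = null;
        while (values.hasNext()) {
            String[] parts = values.next().split("~"); // count~adj_close~flag
            sum += Integer.parseInt(parts[0]);
            boolean isDividend = parts[2].equals("1");
            if (!isDividend && !parts[1].equals("Null")) {
                adjClose = parts[1];
            }
        }
        // sum == 1 means no second record (no dividend) shared this key.
        return (sum == 1 && adjClose != null) ? Optional.of(adjClose) : Optional.empty();
    }

    public static void main(String[] args) {
        // Price record only: emitted.
        System.out.println(reduce(Arrays.asList("1~20.69~0").iterator())); // Optional[20.69]
        // Price record plus dividend record for the same key: suppressed.
        System.out.println(reduce(Arrays.asList("1~20.69~0", "1~Null~1").iterator())); // Optional.empty
    }
}
```

The extra null check also suppresses a key that has a dividend record but no price record, a case the slides do not discuss.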

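Driver

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public static void main(String[] arguments) throws Exception
{
    JobConf conf = new JobConf(StockAnalyzer.class);
    conf.setJobName("Stock Analysis");
    // Both the map output and the final output are Text/Text pairs
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(StockAnalysisMapper.class);
    conf.setReducerClass(StockAnalysisReducer.class);
    // arguments[0]: input path; arguments[1]: output path (must not already exist)
    Path mapperInputPath = new Path(arguments[0]);
    Path outputPath = new Path(arguments[1]);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, mapperInputPath);
    FileOutputFormat.setOutputPath(conf, outputPath);
    JobClient.runJob(conf);
}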

FINAL RESULT
• NYSE Daily A – 14 lines, inclusive of 1 header
• NYSE Daily B – 39 lines, inclusive of 1 header
• Dividends file – 22 lines, inclusive of 1 header
• Total – 75 lines

• Total – 75
• Matching records – 7
• Headers – 3
• Dividend records – 21
• Final output – 44 records (75 − 3 headers − 21 dividend records − 7 matching records)

FINAL RESULT (HIVE)

BUSINESS IMPLICATIONS
• Daily closing prices are adjusted for dividend distributions and stock splits because these are part of total return and affect historical volatility estimates.
• The primary use of the adjusted closing price is to build an accurate track record of a stock's performance. Comparing a stock's historical adjusted closing price with its current price shows the true rate of return.
• Graphing the volatility history of the target firm alongside that of its competitors and the market index can provide insights into risk and comparative advantages (a frequency distribution of returns can also be used).
• Historical stock price volatility may have implications for business valuators.
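
The rate of return and volatility mentioned above can be computed directly from a series of adjusted closing prices. The following is a minimal sketch, assuming a simple (not compounded) rate of return and volatility measured as the sample standard deviation of daily log returns. That is a common convention, not something the slides specify. The class name `AdjustedCloseStats` and its method names are hypothetical.

```java
// Hypothetical helper for return and volatility from adjusted closing prices.
final class AdjustedCloseStats {
    // Simple rate of return between a historical adjusted close and the current price.
    static double rateOfReturn(double historicalAdjClose, double currentPrice) {
        return (currentPrice - historicalAdjClose) / historicalAdjClose;
    }

    // Sample standard deviation of daily log returns (not annualized).
    // Prices must be in chronological order and strictly positive.
    static double dailyVolatility(double[] adjCloses) {
        int n = adjCloses.length - 1; // number of returns
        if (n < 2) {
            throw new IllegalArgumentException("need at least 3 prices");
        }
        double[] returns = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            returns[i] = Math.log(adjCloses[i + 1] / adjCloses[i]);
            mean += returns[i];
        }
        mean /= n;
        double sumSquares = 0.0;
        for (double r : returns) {
            sumSquares += (r - mean) * (r - mean);
        }
        return Math.sqrt(sumSquares / (n - 1));
    }

    public static void main(String[] args) {
        // Made-up prices for illustration.
        System.out.println(rateOfReturn(20.0, 25.0)); // 0.25
        System.out.println(dailyVolatility(new double[] {100.0, 200.0, 100.0}));
    }
}
```

Using adjusted rather than raw closing prices keeps dividend and split dates from showing up as artificial price jumps in the return series.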
