SlideShare ist ein Scribd-Unternehmen logo
1 von 5
Downloaden Sie, um offline zu lesen
International Journal of Engineering Science Invention
ISSN (Online): 2319 – 6734, ISSN (Print): 2319 – 6726
www.ijesi.org ||Volume 5 Issue 9|| September 2016 || PP. 82-86
www.ijesi.org 82 | Page
Introduction to Big Data and Hadoop using
Local Standalone Mode
K.Srikanth 1
, P.Venkateswarlu 2
, R.Roje Spandana3
1,2,3
(Information Technology, Jawaharlal Nehru Technological University Kakinada, India)
Abstract: Big Data is a term defined for data sets that are extreme and complex where traditional data
processing applications are inadequate to deal with them. The term Big Data often refers simply to the use of
predictive investigation on analytic methods that extract value from data. Big data is generalized as a large
data which is a collection of big datasets that cannot be processed using traditional computing techniques. Big
data is not purely a data, rather than it is a complete subject involves various tools, techniques and frameworks.
Big data can be any structured collection which results incapability of conventional data management methods.
Hadoop is a distributed example used to change the large amount of data. This manipulation contains not only
storage as well as processing on the data. Hadoop is an open- source software framework for dispersed storage
and processing of big data sets on computer clusters built from commodity hardware. HDFS was built to
support high throughput, streaming reads and writes of extremely large files. Hadoop Map Reduce is a software
framework for easily writing applications which process vast amounts of data. Wordcount example reads text
files and counts how often words occur. The input is text files and the result is wordcount file, each line of which
contains a word and the count of how often it occurred separated by a tab.
Keywords: Big Data, Hadoop, HDFS, Map Reduce, WordCount
I. Introduction
Big data is generalized as a large data which is a collection of it is a collection of big datasets [6] that cannot be
processed using traditional computing techniques. Large data is not purely a data, rather than it is a complete
subject involves various techniques, tools and frameworks. Due to the onset of new advanced technologies,
creation, and communication means like media, social networking sites, the amount of data produced by
mankind is growing speedily every year [2]. The large amount of data beginning of time till 2003 was 6 billion
gigabyte. Hadoop is a distributed exemplar used to change the large amount of data. This direction contains not
only storage as well as processing on the data. In addition to simple storage, Google File System was built to
data-intensive, support large-scale, distributed processing applications. The coming year, new paper, related
Map Reduce Simplified Data Processing on Large Clusters, was presented, defining a programming model
framework that provided automatic parallelization and the scale to process thousands of terabytes of big data in
a one job no limitation thousands of machines. When paired, these two systems could be used to build large data
processing [5] clusters on relatively inexpensive, commodity machines. This paper directly inspired the
development of HDFS and Hadoop Map Reduce, respectively. Within the Apache Software Foundation, real
time projects that explicitly make use of integrate with, Hadoop are springing up regularly. Some of these real
time projects make authoring Map Reduce work easier and more jobs accessible, while others focus on getting
data in and out of HDFS, simplify operations, enable deployment in cloud environments, and so on.
II. Big Data
Big Data is a term defined for data sets that are extreme and complex where traditional data processing
applications are inadequate to deal with them. The data is large data, moves too fast, or does not fit the
structures of our current big data architectures [7]. Big Data is typically large volume of unstructured and
structured data that gets created from various organized and disordered applications, activities and channels such
as emails etc. The main adversity with Big Data includes search, capture, storage, sharing, analysis, and
visualization [2]. The core of Big Data is Hadoop which is a platform for administer computing problems across
a number of servers. It is first developed and released as open source by web application. It implements the Map
Reduce approach pioneered by Google in compiling its search indexes [8]. To store data, Hadoop utilizes its
own distributed file system, HDFS, which makes data available to multiple computing nodes. Big data outbreak
[4], a result not only of increasing web usage by people around the world the connection of billions of devices to
the web.
Introduction to Big Data and Hadoop using Local StandaloneMode
www.ijesi.org 83 | Page
III. Hadoop
Hadoop is a platform that provides both shared storage and computational capabilities. At the time Google had
published papers that described its different distributed file system, the Google File System and Map Reduce, a
computational framework for parallel processing. The successful implementation of these paper concepts in
Nutch resulted in its split into more projects, the second of which became Hadoop, in this section will look at
Hadoop architectural perspective, examine how much company’s uses it, and consider some of its weaknesses.
Once we have covered Hadoop background, will look at how to install Hadoop and run a Map Reduce job.
Hadoop is a distributed master and slave architecture that consists of the Hadoop Distributed File System for
storage and Map Reduce [10] for computational capabilities. Traits intrinsic to Hadoop are data partitioning and
parallel computing of large datasets. Its storage and computational capabilities scale with the more number of
hosts to a Hadoop cluster, and can reach volume sizes in the pet bytes on clusters with thousands of hosts. In the
first step in this section will examine the Hadoop Distributed File System and Map Reduce architectures.
Figure 1. Big Data and Hadoop Architecture
IV. Hdfs
The Apache Hadoop is a file system called the Hadoop Distributed File System or simply HDFS. HDFS was
built to support big throughput, processing read and write of extremely large files. Traditional large Network
Attached Storage (NAS) and Storage Area Networks (SANs) offer centralized [1], low-latency access to either a
block device or a file system on the order of terabytes in size. These systems are unusually as the backing store
for large databases, content delivery systems and same types of data storage needs because they can support
full-featured POSIX semantics, scale to meet the size requirements of these systems, and offer lesser-latency
access to data. Imagine for a second [2], though, hundreds or thousands of machines all awakening at the same
time and pulling hundreds of terabytes of data from a central control storage system at once. This is where
conventional storage does not necessarily scale. By creating a system composed of independent machines, each
with its own input output subsystem, network interfaces, hard disks, RAM, and CPU, for the specific type of
workload we are interested in.
Figure 2. HDFS Architecture
Introduction to Big Data and Hadoop using Local StandaloneMode
www.ijesi.org 84 | Page
4.1 There are a number of specific goals for HDFS:
 Store millions of huge files, each greater than tens of gigabytes, and file system sizes embracing tens of
petabytes.
 Use a scale out model based on inexpensive commodity servers with internal JBOD rather than RAID to
achieve large scale storage. Accomplish possibility and high throughput through application level
reproduction of data.
 Optimize for large streaming reads and writes rather than low suspension access to many small files. Batch
performance is more important than reciprocal response times.
 Gracefully deal with component failures of machines and disks.
 Support the performance and scale requirements of Map Reduce processing.
V. Mapreduce
Hadoop Map Reduce is a software framework for freely writing applications which process large amounts of
data in parallel on large clusters of product hardware very trustful, fault-tolerant manner. A Map
Reduce job usually splits the input dataset into self directed collections which are processed by the map tasks in
a completely parallel manner. The framework distinguished the outputs of the maps, which are then input to
the reduce tasks. Typically both the input and the output stored in a file system. The framework takes care of
scheduling tasks, monitoring them and reproduces the failed tasks. Typically the storage nodes and the compute
nodes are the same [3], that is, the Map Reduce framework and the widespread file system are running on the
same set of nodes. This configuration allows the framework to effectively schedule tasks on the nodes where
data is presented before, resulting in very high aggregate bandwidth across the cluster.
The Map Reduce framework consists of a single master [9] Job detector and one slave Taskdetector per cluster-
node. The master is responsible for scheduling the jobs’ element tasks on the slaves, monitoring them and
reproduces the failed tasks. The slaves execute the tasks as assisted by the master. Minimally, applications
specify the input and output locations and provide Map Reduce functions via implementations of perfect
interfaces and or abstract-classes, and other job parameters, composed of the job configuration. The Hadoop job
client then submits the job and configuration to the Jobdetector which then assumes and answerable of
distributing the software configuration to the slaves, scheduling tasks and monitoring them, providing status and
characteristic information to the job client.
Figure 3.MapReduce Architecture in Big Data
5.1 Key Features of Map Reduce Framework
 Implement Framework for Map Reduce Execution
 Intellectual Developer from the complexity of Distributed programming
 Partial failure of the refine cluster is expected and tolerable
 Redundancy and fault-tolerance is built in. so that programmer does not have to worry
 Map Reduce programming model is language self-sufficient
 Self starting parallelization and distribution
 Failing tolerance
 Enable data local refine
 Shared nothing architecture
Introduction to Big Data and Hadoop using Local StandaloneMode
www.ijesi.org 85 | Page
VI. Wordcount Example
Wordcount example reads text files and counts how frequently words occur. The input is text files and the
output is word count, each line contains a word and the count of how oftentimes it occurred, divided by a tab.
Each mapper takes a line as input and breaks it into words. It then spit out a key, value pair of the word and 1.
Each reducer sums the counts for each word and spit out a single key, value with the word and sum. As for
improvement of the reducer is also used as a incorporate on the map output. This reduces the amount of data
sent across the network by combining each word into a single record.
Step 1: Running Worcount
$mkdirinputdir
$cd inputdir
$nano input.txt
Figure 4.Create a file
Step 2:Create the jar file and enter some text and save the file, it will be there in local file system only.
Figure 5. Input to a text file
$hadoop jar /home/srikanth/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduceexamples-2.6.0
wordcount inputdir outputdir to
Figure 6.Create a Jar file
Step 3: wordcount inputdit ouputdir
$ cd outputdir
$ cat part-r-00000
Figure 7. Output for a wordcount
VII. Conclusion
Using big data large volumes of data get created from various applications and activates where these common
technical challenges are very common in various application domains. The data is extremely large where it does
not fit to the structure of current database architectures. The paper describes Local StandaloneMode with Hadoop
which is an open source software used for processing of large volumes of data. The paper directly is based on the
development of HDFS and Hadoop Map Reduce. The Interest and investment in Hadoop has led to an entire
ecosystem of related software both open source and commercial. Within the Apache Software Foundation,
projects are integrated with Hadoop are springing up recurrently. Some of these projects make Map Reduce jobs
easier and more accessible while others focus on getting data in and out of HDFS, simplify operations, and so on.
Based on this paper we can extend the paper with the help of YARN and DFS to execute the word count as a web
based application.
Introduction to Big Data and Hadoop using Local StandaloneMode
www.ijesi.org 86 | Page
References
[1]. S.VikramPhaneendra&E.Madhusudhan Reddy “Big Data- solutions for RDBMS problems- A survey” In 12thIEEE/IFIP Network
Operations & Management Symposium (NOMS 2010)(Osaka, Japan, Apr 19{23 2013).
[2]. Kiran kumara Reddi&Dnvsl Indira “Different Technique to Transfer Big Data: survey” IEEE Transactions on 52(8) (Aug.2013)
2348 {2355}.
[3]. Jimmy Lin “MapReduce Is Good Enough?” The control project.IEEE Computer 32 (2013).
[4]. Umasri.M.L, Shyamalagowri.D,Suresh Mining Big Data: - Current status and forecast to the future” Volume 4, Issue 1, January
2014 ISSN: 2277 128X.
[5]. B. J. Doorn, M. V. Duivestein, S. Manen and Ommeren, ―Creating clarity with Big Data‖, Sogeti, (2012).
[6]. Bernice Purcell “The emergence of “big data” technology and analytics” Journal of Technology Research 2013.
[7]. Definting Big Data Architecture Framework: Outcome of the Brainstorming Session at the University of Amsterdam, 17 July
2013.Presented at NBD-WG, 24 July 2013 [Online].
[8]. AhmedEldawy, Mohamed F. Mokbel “A Demonstration of Spatial Hadoop: An Efficient Map Reduce Framework for Spatial Data”
Proceedings of the VLDB Endowment, Vol. 6, No. 12Copyright 2013 VLDB Endowment 21508097/13/10.
[9]. A Workload Model for Map Reduce by Thomas A. deRuiter, Master’s Thesis in Computer Science, Parallel and Distributed
Systems Group Faculty of Electrical Engineering, Mathematics, and Computer Science. Delft University of Technology, 2nd June
2012.
[10]. HDFS, http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html.

Weitere ähnliche Inhalte

Was ist angesagt?

Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopIRJET Journal
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoopAditi Yadav
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoopManoj Jangalva
 
Introducing the hadoop ecosystem
Introducing the hadoop ecosystemIntroducing the hadoop ecosystem
Introducing the hadoop ecosystemGeert Van Landeghem
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Cognizant
 
assignment3
assignment3assignment3
assignment3Kirti J
 
Implementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFSImplementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFSIJEACS
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?IJCSIS Research Publications
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 

Was ist angesagt? (17)

Big data
Big dataBig data
Big data
 
Hadoop
HadoopHadoop
Hadoop
 
hadoop
hadoophadoop
hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Big Data
Big DataBig Data
Big Data
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoop
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Introducing the hadoop ecosystem
Introducing the hadoop ecosystemIntroducing the hadoop ecosystem
Introducing the hadoop ecosystem
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
assignment3
assignment3assignment3
assignment3
 
Implementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFSImplementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFS
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 

Andere mochten auch

Presentación1 (1)
Presentación1 (1)Presentación1 (1)
Presentación1 (1)Angie Samper
 
Examining the Usability of Message Reading Features on Smartwatches
Examining the Usability of Message Reading Features on SmartwatchesExamining the Usability of Message Reading Features on Smartwatches
Examining the Usability of Message Reading Features on Smartwatchesinventionjournals
 
Maria cabanas y_paula_calatrava
Maria cabanas y_paula_calatravaMaria cabanas y_paula_calatrava
Maria cabanas y_paula_calatravapaulacg15
 
Redes sociales-tarea
Redes sociales-tareaRedes sociales-tarea
Redes sociales-tareaElvisvq1197
 
Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...
Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...
Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...ErgoCV
 
A New Frontier of Criminal Justice System
A New Frontier of Criminal Justice System A New Frontier of Criminal Justice System
A New Frontier of Criminal Justice System inventionjournals
 
In Causality Principle as the Framework to Contextualize Time in Modern Physics
In Causality Principle as the Framework to Contextualize Time in Modern PhysicsIn Causality Principle as the Framework to Contextualize Time in Modern Physics
In Causality Principle as the Framework to Contextualize Time in Modern Physicsinventionjournals
 
килаш расул+бизнес генератор+идея
килаш расул+бизнес генератор+идеякилаш расул+бизнес генератор+идея
килаш расул+бизнес генератор+идеяРасул Килаш
 
AULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBA
AULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBAAULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBA
AULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBADavicho B
 
noti-tena-septiembre-2016
noti-tena-septiembre-2016noti-tena-septiembre-2016
noti-tena-septiembre-2016TENA
 
Golf Variant 1.4 tsi total flex
Golf Variant 1.4 tsi total flexGolf Variant 1.4 tsi total flex
Golf Variant 1.4 tsi total flexEduardo Abbas
 
Environment In Context : A Perspective From Environment Behavior Relation
Environment In Context : A Perspective From Environment Behavior RelationEnvironment In Context : A Perspective From Environment Behavior Relation
Environment In Context : A Perspective From Environment Behavior Relationinventionjournals
 
“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”
“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”
“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”inventionjournals
 
Luxo Italia - Spoil Yourself
Luxo Italia - Spoil YourselfLuxo Italia - Spoil Yourself
Luxo Italia - Spoil YourselfLuxo Italia
 
Shaoxia Yu_selected_publications_2016
Shaoxia Yu_selected_publications_2016Shaoxia Yu_selected_publications_2016
Shaoxia Yu_selected_publications_2016Shaoxia Yu
 
Entrepreneurship in Tunisia: Obstacles
Entrepreneurship in Tunisia: ObstaclesEntrepreneurship in Tunisia: Obstacles
Entrepreneurship in Tunisia: Obstaclesinventionjournals
 

Andere mochten auch (20)

Presentación1 (1)
Presentación1 (1)Presentación1 (1)
Presentación1 (1)
 
Examining the Usability of Message Reading Features on Smartwatches
Examining the Usability of Message Reading Features on SmartwatchesExamining the Usability of Message Reading Features on Smartwatches
Examining the Usability of Message Reading Features on Smartwatches
 
Maria cabanas y_paula_calatrava
Maria cabanas y_paula_calatravaMaria cabanas y_paula_calatrava
Maria cabanas y_paula_calatrava
 
Redes sociales-tarea
Redes sociales-tareaRedes sociales-tarea
Redes sociales-tarea
 
La naturaleza
La naturalezaLa naturaleza
La naturaleza
 
United Wagon Company (Rusia)
United Wagon Company (Rusia)United Wagon Company (Rusia)
United Wagon Company (Rusia)
 
Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...
Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...
Taller ErgoCV en el XII Congreso Internacional de Prevención de Riesgos Labor...
 
A New Frontier of Criminal Justice System
A New Frontier of Criminal Justice System A New Frontier of Criminal Justice System
A New Frontier of Criminal Justice System
 
In Causality Principle as the Framework to Contextualize Time in Modern Physics
In Causality Principle as the Framework to Contextualize Time in Modern PhysicsIn Causality Principle as the Framework to Contextualize Time in Modern Physics
In Causality Principle as the Framework to Contextualize Time in Modern Physics
 
килаш расул+бизнес генератор+идея
килаш расул+бизнес генератор+идеякилаш расул+бизнес генератор+идея
килаш расул+бизнес генератор+идея
 
AULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBA
AULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBAAULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBA
AULA 6 AUDITORIA DE GESTIÓN LILIANA BUENO CAMBA
 
Islm 5 g p1
Islm 5 g p1Islm 5 g p1
Islm 5 g p1
 
noti-tena-septiembre-2016
noti-tena-septiembre-2016noti-tena-septiembre-2016
noti-tena-septiembre-2016
 
Golf Variant 1.4 tsi total flex
Golf Variant 1.4 tsi total flexGolf Variant 1.4 tsi total flex
Golf Variant 1.4 tsi total flex
 
RitaGibson-TIS
RitaGibson-TISRitaGibson-TIS
RitaGibson-TIS
 
Environment In Context : A Perspective From Environment Behavior Relation
Environment In Context : A Perspective From Environment Behavior RelationEnvironment In Context : A Perspective From Environment Behavior Relation
Environment In Context : A Perspective From Environment Behavior Relation
 
“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”
“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”
“Desquamative Gingivitis Treated By An Antioxidant Therapy- A Case Report”
 
Luxo Italia - Spoil Yourself
Luxo Italia - Spoil YourselfLuxo Italia - Spoil Yourself
Luxo Italia - Spoil Yourself
 
Shaoxia Yu_selected_publications_2016
Shaoxia Yu_selected_publications_2016Shaoxia Yu_selected_publications_2016
Shaoxia Yu_selected_publications_2016
 
Entrepreneurship in Tunisia: Obstacles
Entrepreneurship in Tunisia: ObstaclesEntrepreneurship in Tunisia: Obstacles
Entrepreneurship in Tunisia: Obstacles
 

Ähnlich wie Introduction to Big Data and Hadoop using Local Standalone Mode

Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System cscpconf
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training reportSarvesh Meena
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar ReportAtul Kushwaha
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniquesijsrd.com
 
BDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfBDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfKUMARRISHAV37
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeSysfore Technologies
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics iosrjce
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelEditor IJCATR
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 

Ähnlich wie Introduction to Big Data and Hadoop using Local Standalone Mode (20)

G017143640
G017143640G017143640
G017143640
 
Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System
 
Big data
Big dataBig data
Big data
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
BDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfBDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdf
 
hadoop
hadoophadoop
hadoop
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
paper
paperpaper
paper
 
B017320612
B017320612B017320612
B017320612
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 

Kürzlich hochgeladen

main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 

Kürzlich hochgeladen (20)

main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction management
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 

Introduction to Big Data and Hadoop using Local Standalone Mode

  • 1. International Journal of Engineering Science Invention ISSN (Online): 2319 – 6734, ISSN (Print): 2319 – 6726 www.ijesi.org ||Volume 5 Issue 9|| September 2016 || PP. 82-86 www.ijesi.org 82 | Page Introduction to Big Data and Hadoop using Local Standalone Mode K.Srikanth 1 , P.Venkateswarlu 2 , R.Roje Spandana3 1,2,3 (Information Technology, Jawaharlal Nehru Technological University Kakinada, India) Abstract: Big Data is a term defined for data sets that are extreme and complex where traditional data processing applications are inadequate to deal with them. The term Big Data often refers simply to the use of predictive investigation on analytic methods that extract value from data. Big data is generalized as a large data which is a collection of big datasets that cannot be processed using traditional computing techniques. Big data is not purely a data, rather than it is a complete subject involves various tools, techniques and frameworks. Big data can be any structured collection which results incapability of conventional data management methods. Hadoop is a distributed example used to change the large amount of data. This manipulation contains not only storage as well as processing on the data. Hadoop is an open- source software framework for dispersed storage and processing of big data sets on computer clusters built from commodity hardware. HDFS was built to support high throughput, streaming reads and writes of extremely large files. Hadoop Map Reduce is a software framework for easily writing applications which process vast amounts of data. Wordcount example reads text files and counts how often words occur. The input is text files and the result is wordcount file, each line of which contains a word and the count of how often it occurred separated by a tab. Keywords: Big Data, Hadoop, HDFS, Map Reduce, WordCount I. Introduction Big data is generalized as a large data which is a collection of it is a collection of big datasets [6] that cannot be processed using traditional computing techniques. Large data is not purely a data, rather than it is a complete subject involves various techniques, tools and frameworks. Due to the onset of new advanced technologies, creation, and communication means like media, social networking sites, the amount of data produced by mankind is growing speedily every year [2]. The large amount of data beginning of time till 2003 was 6 billion gigabyte. Hadoop is a distributed exemplar used to change the large amount of data. This direction contains not only storage as well as processing on the data. In addition to simple storage, Google File System was built to data-intensive, support large-scale, distributed processing applications. The coming year, new paper, related Map Reduce Simplified Data Processing on Large Clusters, was presented, defining a programming model framework that provided automatic parallelization and the scale to process thousands of terabytes of big data in a one job no limitation thousands of machines. When paired, these two systems could be used to build large data processing [5] clusters on relatively inexpensive, commodity machines. This paper directly inspired the development of HDFS and Hadoop Map Reduce, respectively. Within the Apache Software Foundation, real time projects that explicitly make use of integrate with, Hadoop are springing up regularly. Some of these real time projects make authoring Map Reduce work easier and more jobs accessible, while others focus on getting data in and out of HDFS, simplify operations, enable deployment in cloud environments, and so on. II. Big Data Big Data is a term defined for data sets that are extreme and complex where traditional data processing applications are inadequate to deal with them. The data is large data, moves too fast, or does not fit the structures of our current big data architectures [7]. Big Data is typically large volume of unstructured and structured data that gets created from various organized and disordered applications, activities and channels such as emails etc. The main adversity with Big Data includes search, capture, storage, sharing, analysis, and visualization [2]. The core of Big Data is Hadoop which is a platform for administer computing problems across a number of servers. It is first developed and released as open source by web application. It implements the Map Reduce approach pioneered by Google in compiling its search indexes [8]. To store data, Hadoop utilizes its own distributed file system, HDFS, which makes data available to multiple computing nodes. Big data outbreak [4], a result not only of increasing web usage by people around the world the connection of billions of devices to the web.
  • 2. Introduction to Big Data and Hadoop using Local StandaloneMode www.ijesi.org 83 | Page III. Hadoop Hadoop is a platform that provides both shared storage and computational capabilities. At the time Google had published papers that described its different distributed file system, the Google File System and Map Reduce, a computational framework for parallel processing. The successful implementation of these paper concepts in Nutch resulted in its split into more projects, the second of which became Hadoop, in this section will look at Hadoop architectural perspective, examine how much company’s uses it, and consider some of its weaknesses. Once we have covered Hadoop background, will look at how to install Hadoop and run a Map Reduce job. Hadoop is a distributed master and slave architecture that consists of the Hadoop Distributed File System for storage and Map Reduce [10] for computational capabilities. Traits intrinsic to Hadoop are data partitioning and parallel computing of large datasets. Its storage and computational capabilities scale with the more number of hosts to a Hadoop cluster, and can reach volume sizes in the pet bytes on clusters with thousands of hosts. In the first step in this section will examine the Hadoop Distributed File System and Map Reduce architectures. Figure 1. Big Data and Hadoop Architecture IV. Hdfs The Apache Hadoop is a file system called the Hadoop Distributed File System or simply HDFS. HDFS was built to support big throughput, processing read and write of extremely large files. Traditional large Network Attached Storage (NAS) and Storage Area Networks (SANs) offer centralized [1], low-latency access to either a block device or a file system on the order of terabytes in size. These systems are unusually as the backing store for large databases, content delivery systems and same types of data storage needs because they can support full-featured POSIX semantics, scale to meet the size requirements of these systems, and offer lesser-latency access to data. Imagine for a second [2], though, hundreds or thousands of machines all awakening at the same time and pulling hundreds of terabytes of data from a central control storage system at once. This is where conventional storage does not necessarily scale. By creating a system composed of independent machines, each with its own input output subsystem, network interfaces, hard disks, RAM, and CPU, for the specific type of workload we are interested in. Figure 2. HDFS Architecture
  • 3. Introduction to Big Data and Hadoop using Local StandaloneMode www.ijesi.org 84 | Page 4.1 There are a number of specific goals for HDFS:  Store millions of huge files, each greater than tens of gigabytes, and file system sizes embracing tens of petabytes.  Use a scale out model based on inexpensive commodity servers with internal JBOD rather than RAID to achieve large scale storage. Accomplish possibility and high throughput through application level reproduction of data.  Optimize for large streaming reads and writes rather than low suspension access to many small files. Batch performance is more important than reciprocal response times.  Gracefully deal with component failures of machines and disks.  Support the performance and scale requirements of Map Reduce processing. V. Mapreduce Hadoop Map Reduce is a software framework for freely writing applications which process large amounts of data in parallel on large clusters of product hardware very trustful, fault-tolerant manner. A Map Reduce job usually splits the input dataset into self directed collections which are processed by the map tasks in a completely parallel manner. The framework distinguished the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output stored in a file system. The framework takes care of scheduling tasks, monitoring them and reproduces the failed tasks. Typically the storage nodes and the compute nodes are the same [3], that is, the Map Reduce framework and the widespread file system are running on the same set of nodes. This configuration allows the framework to effectively schedule tasks on the nodes where data is presented before, resulting in very high aggregate bandwidth across the cluster. The Map Reduce framework consists of a single master [9] Job detector and one slave Taskdetector per cluster- node. The master is responsible for scheduling the jobs’ element tasks on the slaves, monitoring them and reproduces the failed tasks. The slaves execute the tasks as assisted by the master. Minimally, applications specify the input and output locations and provide Map Reduce functions via implementations of perfect interfaces and or abstract-classes, and other job parameters, composed of the job configuration. The Hadoop job client then submits the job and configuration to the Jobdetector which then assumes and answerable of distributing the software configuration to the slaves, scheduling tasks and monitoring them, providing status and characteristic information to the job client. Figure 3.MapReduce Architecture in Big Data 5.1 Key Features of Map Reduce Framework  Implement Framework for Map Reduce Execution  Intellectual Developer from the complexity of Distributed programming  Partial failure of the refine cluster is expected and tolerable  Redundancy and fault-tolerance is built in. so that programmer does not have to worry  Map Reduce programming model is language self-sufficient  Self starting parallelization and distribution  Failing tolerance  Enable data local refine  Shared nothing architecture
  • 4. Introduction to Big Data and Hadoop using Local StandaloneMode www.ijesi.org 85 | Page VI. Wordcount Example Wordcount example reads text files and counts how frequently words occur. The input is text files and the output is word count, each line contains a word and the count of how oftentimes it occurred, divided by a tab. Each mapper takes a line as input and breaks it into words. It then spit out a key, value pair of the word and 1. Each reducer sums the counts for each word and spit out a single key, value with the word and sum. As for improvement of the reducer is also used as a incorporate on the map output. This reduces the amount of data sent across the network by combining each word into a single record. Step 1: Running Worcount $mkdirinputdir $cd inputdir $nano input.txt Figure 4.Create a file Step 2:Create the jar file and enter some text and save the file, it will be there in local file system only. Figure 5. Input to a text file $hadoop jar /home/srikanth/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduceexamples-2.6.0 wordcount inputdir outputdir to Figure 6.Create a Jar file Step 3: wordcount inputdit ouputdir $ cd outputdir $ cat part-r-00000 Figure 7. Output for a wordcount VII. Conclusion Using big data large volumes of data get created from various applications and activates where these common technical challenges are very common in various application domains. The data is extremely large where it does not fit to the structure of current database architectures. The paper describes Local StandaloneMode with Hadoop which is an open source software used for processing of large volumes of data. The paper directly is based on the development of HDFS and Hadoop Map Reduce. The Interest and investment in Hadoop has led to an entire ecosystem of related software both open source and commercial. Within the Apache Software Foundation, projects are integrated with Hadoop are springing up recurrently. Some of these projects make Map Reduce jobs easier and more accessible while others focus on getting data in and out of HDFS, simplify operations, and so on. Based on this paper we can extend the paper with the help of YARN and DFS to execute the word count as a web based application.
  • 5. Introduction to Big Data and Hadoop using Local StandaloneMode www.ijesi.org 86 | Page References [1]. S.VikramPhaneendra&E.Madhusudhan Reddy “Big Data- solutions for RDBMS problems- A survey” In 12thIEEE/IFIP Network Operations & Management Symposium (NOMS 2010)(Osaka, Japan, Apr 19{23 2013). [2]. Kiran kumara Reddi&Dnvsl Indira “Different Technique to Transfer Big Data: survey” IEEE Transactions on 52(8) (Aug.2013) 2348 {2355}. [3]. Jimmy Lin “MapReduce Is Good Enough?” The control project.IEEE Computer 32 (2013). [4]. Umasri.M.L, Shyamalagowri.D,Suresh Mining Big Data: - Current status and forecast to the future” Volume 4, Issue 1, January 2014 ISSN: 2277 128X. [5]. B. J. Doorn, M. V. Duivestein, S. Manen and Ommeren, ―Creating clarity with Big Data‖, Sogeti, (2012). [6]. Bernice Purcell “The emergence of “big data” technology and analytics” Journal of Technology Research 2013. [7]. Definting Big Data Architecture Framework: Outcome of the Brainstorming Session at the University of Amsterdam, 17 July 2013.Presented at NBD-WG, 24 July 2013 [Online]. [8]. AhmedEldawy, Mohamed F. Mokbel “A Demonstration of Spatial Hadoop: An Efficient Map Reduce Framework for Spatial Data” Proceedings of the VLDB Endowment, Vol. 6, No. 12Copyright 2013 VLDB Endowment 21508097/13/10. [9]. A Workload Model for Map Reduce by Thomas A. deRuiter, Master’s Thesis in Computer Science, Parallel and Distributed Systems Group Faculty of Electrical Engineering, Mathematics, and Computer Science. Delft University of Technology, 2nd June 2012. [10]. HDFS, http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html.