SlideShare a Scribd company logo
1 of 4
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 448
Big Data Processing with Hadoop : A Review
Gayathri Ravichandran
Student, Department of Computer Science , M.S Ramaiah Institute of Technology, Bangalore, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – We live in an era where data is being generated
by everything around us. The rate of data generation is so
alarming, that it has engendered apressing needtoimplement
easy and cost-effectivedatastorageandretrievalmechanisms.
Furthermore, big data needs to be analyzed for insights and
attribute relationships, which can lead to better decision-
making and efficient business strategies. In this paper, we will
describe a formal definition of Big Data and look into its
industrial applications. Further, we will understand how
traditional mechanisms proveinadequatefor dataprocessing
due to the sheer volume, velocity and variety of big data. We
will then look into the Hadoop Architecture and its underlying
functionalities. This will include delineations on the HDFSand
MapReduce Framework. We will then review the Hadoop
Ecosystem, and explain each component in detail.
Key Words: Big Data, Hadoop, MapReduce, Hadoop
Components, HDFS
1. INTRODUCTION
1.1 Big Data: Definition
Big data is a collection of large datasets- structured,
unstructured or semi-structured that is being generated
from multiple sources at an alarming rate. Key enablers for
the growth of big data are – increasing storage capacities,
increasing processing power and availability of data. It is
thus important to develop mechanisms for easy storage and
retrieval. Some of the fields that come under the umbrella of
big data are - stock exchange data ( includes buying and
selling decisions), social media data (Facebook andTwitter),
power grid data ( contains information about the power
consumed by each node in a power station) and search
engine data ( Google). Structured data may include
relational databases like MySQL. Unstructured data may
include text files in .doc, .pdf formats as well as media files.
1.2 Benefits of Big Data
Analysis of big data helps in improving business trends,
finding innovative solutions, customer profiling and in
sentimental analysis. It also helps in identifying the root
causes for failures and re-evaluating risk portfolios. In
addition, it also personalizes customer and interaction.
1. Valuable Insights
Valuable insights can be derived from big datasets by
employing proper tools and methodologies. This data
includes those stored in the company database, or those
obtained from social media and other third party sources.
When data is processed and analyzed,onecandrawvaluable
relationships between various attributes that can improve
the quality of decision making. Statistics and industrial
knowledge can be combined to obtain useful insights
2. New Products and Services
Analyzing big data helps theorganizationtounderstandhow
customers perceive their products and services. This aids in
developing new products that are concurrentwithcustomer
needs and demands. In addition, it also facilitates re-
developing of currently existing products to suit customer
requirements.
3. Smart cities
Population increase begets demand. To help cities deal with
the consequences of rapid expansion, big data is being used
for the benefit of the citizens and the environment. For
example, the city of Portland, Oregon adopted a mechanism
for optimizing traffic signals in response to high congestion.
This not only reduced traffic jams in the city, but was also
significant in eliminating 157,000 metric tons of carbon
dioxide emissions.
4. Risk Analysis
Risk is defined as the probability of injury or loss. Risk
management is a very crucial process which is often over-
looked. Frequent analysis of the data will help mitigate
potential risks. Predictive analysis aids the organization to
keep up to date with recent technologies, services and
products. It also identifies the risks involved, and how they
can be mitigated.
5. Miscellaneous
Big data also aids Media, Government, Technology,Scientific
Research and Healthcare in making crucial decisions and
predictions. For example, Google Flu Trends (GFT)provided
estimates of influenza activity for more than 25 countries. It
made accurate predictions about flu activity.
1.3 Challenges of Big Data
1. Volume
Data is being generated at an alarming rate. The sheer
volume of data being generated makes the issue of data
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 449
processing a complicated task. Organizations collect and
generate data from a variety of sources and with the help of
technologies such as Hadoop, storage and retrieval of data
has become easier.
2. Velocity
Velocity refers to the rate at which data is being processed.
Sometimes, data may arrive at unprecedented speeds, and
thus it must be dealt with in a timely manner. It should be
processed in such a speed that is compatible for real time
applications.
3. Variety
Data is being generated from various sources , including
social media data, stock exchange data and black box data.
Furthermore, the data canassumevariousforms – numerals,
text, media files, etc. Thus, big data processing mechanisms
must know how to deal with eclectic data.
4. Variability
Data flow can be inconsistent which can be challenging to
manage.
5. Complexity
The relationships between various attributes in a dataset,
hierarchies and data linkages add to the complexity of data
2. LIMITATIONS OF TRADITIONAL APPROACH
The traditional approach consists of a computertostoreand
process big data. Data is stored in a Relational Database like
MySQL and Oracle. This approach works well when the
volume of data is less. However, when dealing with larger
volumes of data, it becomes tedious to process it through a
database server. Hence, this calls for a more sophisticated
approach. We will now look into Hadoop – its modules,
framework and ecosystem.
3. HADOOP
Apache Hadoop is an open source software framework for
storing and processing large clusters of data. It has extensive
processing power and it consists of large networks of
computer clusters. Hadoop makes it possible to handle
thousands of terabytes of data. Hardware failures are
automatically handled by the framework.
Apache Hadoop consists of 4 modules:
a. Hadoop Distributed File System(HDFS)
b. Hadoop MapReduce
c. Hadoop YARN
d. Hadoop Common
This paper will primarily concentrate on the former two
modules.
3.1 Hadoop Distributed File System (HDFS)
Apache Hadoop uses the HadoopDistributedFileSystem.Itis
highly fault tolerant and uses minimal cost hardware. It
consists of a cluster of machines, and files are stored across
them. It also provides file permissions and authentication,
and streaming access to system data.
The following figure depictsthegeneralarchitectureofHDFS
Figure 1: HDFS Architecture
HDFS follows the Master- Slave Architecture. It has the
following components.
1. Name node
The HDFS consists of a single name node, which acts as the
master node. It controls and manages the file system
namespace. A file system namespace consists of a hierarchy
of files and directories, where users can create, remove or
move files based on their privilege. A file is split into one or
more blocks and each block is stored in a Data node. HDFS
consists of more than one Data Nodes.
The roles of the name node are as follows:
a. Mapping blocks to their data nodes.
b. Managing of file system namespace
c. Executing file system operations- opening, closing
and renaming of files.
2. Data node
The HDFS consists of more than one data node. The data
nodes store the file blocks that are mapped onto it by the
Name node. The data nodes are responsible for performing
read and write operations from file systems as per client
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 450
request. They also perform block creation and replication.
The minimum amount of data that the system can write or
read is called a block. This value however is not fixed, and it
can be increased.
3.2 Hadoop MapReduce Framework
Hadoop uses the MapReduce framework for
distributed computing applications to process large
amounts of data.Itisadistributedprogrammingmodel
based on the Java Programming language. The data
processing frameworks are called mappers and
reducers. The MapReduce framework is attractive due
to its scalability.
It consists of two important tasks : Map and Reduce
Figure 2: MapReduce Framework
1. Map stage
The map function takes in a set of data as the input, and
returns a key-value pair as the output. The input may be in
the form of a file or directory. The output of the map stage
serves as input to the reduce stage.
2. Reduce stage
The reduce function will combine the data tuples into a
smaller set. The map task always precedes the reduce task.
The output of reduce stage is stored in the HDFS.
3.3 Hadoop Ecosystem
Figure 3: Hadoop Ecosystem
1. HBase
HBASE or Hadoop Database is a NoSQL Database, i.e., it is
non-relational. It is built on top of the HDFS System written
in Java. It is the underlying technology of social media
websites like Facebook.
2. Hive
Hive is a structured Query Language. It uses the Hive Query
Language (HQL), and it deals with structured data. It runs
MapReduce Algorithm as its backend, and it is a data
warehousing framework.
3. Pig
Pig also deals with structured data, and it uses the Pig Latin
Language. It consists of a series of operations applied to
input data, and it uses MapReduce in the back-end. It adds a
level of abstraction to data processing.
4. Mahout
It is an open source Apache MachineLearninglibraryinJava.
It has modules for clustering, categorization, collective
filtering and mining of frequent patterns.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 451
4. CONCLUSION
This paper starts off by giving a formaldefinitiontoBig
Data. Then, the challenges of handling big data are
examined, followed by the limitations of using the
traditional big data processing approach. We then
delve into the details of Hadoop and its components,
and its MapReduce framework.
REFERENCES
[1] Dean, J. and Ghemawat, S., “MapReduce: a flexible data
processing tool” Young, The Technical Writer’s
Handbook. Mill Valley, CA: University Science, 1989.
[2] Varsha B.Bobade, “Survey Paper on Big Data and
Hadoop”, IRJET, Volume 3, Issue 1, January 2016
[3] Bijesh Dhyani,Anurag Barthwal, “Big Data Analytics
using Hadoop”, International Journal of Computing
Applications, Volume 108, No.12, December 2014
[4] Ms. Gurpreet Kaur,Ms.ManpreetKaur, “ReviewPaperon
Big Data using Hadoop”, International Journal of
Computing Engineering and Technology, Volume 6,
Issue 12, Dec 2015, pp. 65-71
[5] Harshwardhan S. Bhosale et al, “Review paper on Big
Data using Hadoop”, International Journal of Scientific
and Research Publications, Volume 4, Issue 10, October
2014
[6] Poonam S. Patil et al. “Survey Paper on Big Data
Processing and Hadoop Components”, International
Journal of Science and Research, Volume 3, Issue 10,
October 2014.
[7] Apache HBase. Available at http://hbase.apache.org
[8] Apache Hive. Available at http://hive.apache.org
[9] Abhishek S, “Big Data and Hadoop”, White Paper
[10] Konstantin Shvachko et.al, “The HadoopDistributedFile
System”

More Related Content

What's hot

What's hot (18)

A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
The Big Data Importance – Tools and their Usage
The Big Data Importance – Tools and their UsageThe Big Data Importance – Tools and their Usage
The Big Data Importance – Tools and their Usage
 
IRJET- Survey of Big Data with Hadoop
IRJET-  	  Survey of Big Data with HadoopIRJET-  	  Survey of Big Data with Hadoop
IRJET- Survey of Big Data with Hadoop
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security Alliance
 
IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I...
IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I...IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I...
IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I...
 
Hadoop
HadoopHadoop
Hadoop
 
big data
big databig data
big data
 
Bigdata
Bigdata Bigdata
Bigdata
 
DOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEDOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCE
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
 

Similar to Big Data Processing with Hadoop : A Review

Similar to Big Data Processing with Hadoop : A Review (20)

Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond HillDOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONSBIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 
TSE_Pres12.pptx
TSE_Pres12.pptxTSE_Pres12.pptx
TSE_Pres12.pptx
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop Environment
 
Big Data A Review
Big Data A ReviewBig Data A Review
Big Data A Review
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
 
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptxUnit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
 
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 

More from IRJET Journal

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 

Recently uploaded (20)

22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

Big Data Processing with Hadoop : A Review

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 448 Big Data Processing with Hadoop : A Review Gayathri Ravichandran Student, Department of Computer Science , M.S Ramaiah Institute of Technology, Bangalore, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract – We live in an era where data is being generated by everything around us. The rate of data generation is so alarming, that it has engendered apressing needtoimplement easy and cost-effectivedatastorageandretrievalmechanisms. Furthermore, big data needs to be analyzed for insights and attribute relationships, which can lead to better decision- making and efficient business strategies. In this paper, we will describe a formal definition of Big Data and look into its industrial applications. Further, we will understand how traditional mechanisms proveinadequatefor dataprocessing due to the sheer volume, velocity and variety of big data. We will then look into the Hadoop Architecture and its underlying functionalities. This will include delineations on the HDFSand MapReduce Framework. We will then review the Hadoop Ecosystem, and explain each component in detail. Key Words: Big Data, Hadoop, MapReduce, Hadoop Components, HDFS 1. INTRODUCTION 1.1 Big Data: Definition Big data is a collection of large datasets- structured, unstructured or semi-structured that is being generated from multiple sources at an alarming rate. Key enablers for the growth of big data are – increasing storage capacities, increasing processing power and availability of data. It is thus important to develop mechanisms for easy storage and retrieval. Some of the fields that come under the umbrella of big data are - stock exchange data ( includes buying and selling decisions), social media data (Facebook andTwitter), power grid data ( contains information about the power consumed by each node in a power station) and search engine data ( Google). Structured data may include relational databases like MySQL. Unstructured data may include text files in .doc, .pdf formats as well as media files. 1.2 Benefits of Big Data Analysis of big data helps in improving business trends, finding innovative solutions, customer profiling and in sentimental analysis. It also helps in identifying the root causes for failures and re-evaluating risk portfolios. In addition, it also personalizes customer and interaction. 1. Valuable Insights Valuable insights can be derived from big datasets by employing proper tools and methodologies. This data includes those stored in the company database, or those obtained from social media and other third party sources. When data is processed and analyzed,onecandrawvaluable relationships between various attributes that can improve the quality of decision making. Statistics and industrial knowledge can be combined to obtain useful insights 2. New Products and Services Analyzing big data helps theorganizationtounderstandhow customers perceive their products and services. This aids in developing new products that are concurrentwithcustomer needs and demands. In addition, it also facilitates re- developing of currently existing products to suit customer requirements. 3. Smart cities Population increase begets demand. To help cities deal with the consequences of rapid expansion, big data is being used for the benefit of the citizens and the environment. For example, the city of Portland, Oregon adopted a mechanism for optimizing traffic signals in response to high congestion. This not only reduced traffic jams in the city, but was also significant in eliminating 157,000 metric tons of carbon dioxide emissions. 4. Risk Analysis Risk is defined as the probability of injury or loss. Risk management is a very crucial process which is often over- looked. Frequent analysis of the data will help mitigate potential risks. Predictive analysis aids the organization to keep up to date with recent technologies, services and products. It also identifies the risks involved, and how they can be mitigated. 5. Miscellaneous Big data also aids Media, Government, Technology,Scientific Research and Healthcare in making crucial decisions and predictions. For example, Google Flu Trends (GFT)provided estimates of influenza activity for more than 25 countries. It made accurate predictions about flu activity. 1.3 Challenges of Big Data 1. Volume Data is being generated at an alarming rate. The sheer volume of data being generated makes the issue of data
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 449 processing a complicated task. Organizations collect and generate data from a variety of sources and with the help of technologies such as Hadoop, storage and retrieval of data has become easier. 2. Velocity Velocity refers to the rate at which data is being processed. Sometimes, data may arrive at unprecedented speeds, and thus it must be dealt with in a timely manner. It should be processed in such a speed that is compatible for real time applications. 3. Variety Data is being generated from various sources , including social media data, stock exchange data and black box data. Furthermore, the data canassumevariousforms – numerals, text, media files, etc. Thus, big data processing mechanisms must know how to deal with eclectic data. 4. Variability Data flow can be inconsistent which can be challenging to manage. 5. Complexity The relationships between various attributes in a dataset, hierarchies and data linkages add to the complexity of data 2. LIMITATIONS OF TRADITIONAL APPROACH The traditional approach consists of a computertostoreand process big data. Data is stored in a Relational Database like MySQL and Oracle. This approach works well when the volume of data is less. However, when dealing with larger volumes of data, it becomes tedious to process it through a database server. Hence, this calls for a more sophisticated approach. We will now look into Hadoop – its modules, framework and ecosystem. 3. HADOOP Apache Hadoop is an open source software framework for storing and processing large clusters of data. It has extensive processing power and it consists of large networks of computer clusters. Hadoop makes it possible to handle thousands of terabytes of data. Hardware failures are automatically handled by the framework. Apache Hadoop consists of 4 modules: a. Hadoop Distributed File System(HDFS) b. Hadoop MapReduce c. Hadoop YARN d. Hadoop Common This paper will primarily concentrate on the former two modules. 3.1 Hadoop Distributed File System (HDFS) Apache Hadoop uses the HadoopDistributedFileSystem.Itis highly fault tolerant and uses minimal cost hardware. It consists of a cluster of machines, and files are stored across them. It also provides file permissions and authentication, and streaming access to system data. The following figure depictsthegeneralarchitectureofHDFS Figure 1: HDFS Architecture HDFS follows the Master- Slave Architecture. It has the following components. 1. Name node The HDFS consists of a single name node, which acts as the master node. It controls and manages the file system namespace. A file system namespace consists of a hierarchy of files and directories, where users can create, remove or move files based on their privilege. A file is split into one or more blocks and each block is stored in a Data node. HDFS consists of more than one Data Nodes. The roles of the name node are as follows: a. Mapping blocks to their data nodes. b. Managing of file system namespace c. Executing file system operations- opening, closing and renaming of files. 2. Data node The HDFS consists of more than one data node. The data nodes store the file blocks that are mapped onto it by the Name node. The data nodes are responsible for performing read and write operations from file systems as per client
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 450 request. They also perform block creation and replication. The minimum amount of data that the system can write or read is called a block. This value however is not fixed, and it can be increased. 3.2 Hadoop MapReduce Framework Hadoop uses the MapReduce framework for distributed computing applications to process large amounts of data.Itisadistributedprogrammingmodel based on the Java Programming language. The data processing frameworks are called mappers and reducers. The MapReduce framework is attractive due to its scalability. It consists of two important tasks : Map and Reduce Figure 2: MapReduce Framework 1. Map stage The map function takes in a set of data as the input, and returns a key-value pair as the output. The input may be in the form of a file or directory. The output of the map stage serves as input to the reduce stage. 2. Reduce stage The reduce function will combine the data tuples into a smaller set. The map task always precedes the reduce task. The output of reduce stage is stored in the HDFS. 3.3 Hadoop Ecosystem Figure 3: Hadoop Ecosystem 1. HBase HBASE or Hadoop Database is a NoSQL Database, i.e., it is non-relational. It is built on top of the HDFS System written in Java. It is the underlying technology of social media websites like Facebook. 2. Hive Hive is a structured Query Language. It uses the Hive Query Language (HQL), and it deals with structured data. It runs MapReduce Algorithm as its backend, and it is a data warehousing framework. 3. Pig Pig also deals with structured data, and it uses the Pig Latin Language. It consists of a series of operations applied to input data, and it uses MapReduce in the back-end. It adds a level of abstraction to data processing. 4. Mahout It is an open source Apache MachineLearninglibraryinJava. It has modules for clustering, categorization, collective filtering and mining of frequent patterns.
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 451 4. CONCLUSION This paper starts off by giving a formaldefinitiontoBig Data. Then, the challenges of handling big data are examined, followed by the limitations of using the traditional big data processing approach. We then delve into the details of Hadoop and its components, and its MapReduce framework. REFERENCES [1] Dean, J. and Ghemawat, S., “MapReduce: a flexible data processing tool” Young, The Technical Writer’s Handbook. Mill Valley, CA: University Science, 1989. [2] Varsha B.Bobade, “Survey Paper on Big Data and Hadoop”, IRJET, Volume 3, Issue 1, January 2016 [3] Bijesh Dhyani,Anurag Barthwal, “Big Data Analytics using Hadoop”, International Journal of Computing Applications, Volume 108, No.12, December 2014 [4] Ms. Gurpreet Kaur,Ms.ManpreetKaur, “ReviewPaperon Big Data using Hadoop”, International Journal of Computing Engineering and Technology, Volume 6, Issue 12, Dec 2015, pp. 65-71 [5] Harshwardhan S. Bhosale et al, “Review paper on Big Data using Hadoop”, International Journal of Scientific and Research Publications, Volume 4, Issue 10, October 2014 [6] Poonam S. Patil et al. “Survey Paper on Big Data Processing and Hadoop Components”, International Journal of Science and Research, Volume 3, Issue 10, October 2014. [7] Apache HBase. Available at http://hbase.apache.org [8] Apache Hive. Available at http://hive.apache.org [9] Abhishek S, “Big Data and Hadoop”, White Paper [10] Konstantin Shvachko et.al, “The HadoopDistributedFile System”