SlideShare a Scribd company logo
1 of 3
Download to read offline
IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 9, 2013 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 1806
Abstract—There is a growing trend of applications that
ought to handle huge information. However, analysing huge
information may be a terribly difficult drawback nowadays.
For such data many techniques can be considered. The
technologies like Grid Computing, Volunteering
Computing, and RDBMS can be considered as potential
techniques to handle such data. We have a still in growing
phase Hadoop Tool to handle such data also. We will do a
survey on all this techniques to find a potential technique to
manage and work with Big Data.
Key words: big data, big data analysis, hadoop;
I. INTRODUCTION
Large-scale data and its analysis are at the centre of modern
research and enterprise. These data are known as Big Data.
The facts and figures of such data are developed from online
transactions, internet messages, videos, audios, images, click
streams, logs, posts, search queries, wellbeing notes, social
networking interactions, science facts and figures, sensors
and wireless phones and their submissions . They are
retained in databases grow hugely and become tough to
capture, pattern, store, manage, share, investigate and
visualize by usual database software tools. [1]
5 Exabyte (1018 bytes) of facts and figures were
conceived by human until 2003. Today this amount of data
is created in two days. In 2012, digital world of data was
expanded to 2.72 zettabytes (1021 bytes). It is forecast to
double every two years, reaching about 8 zettabytes of facts
and figures by 2015. IBM indicates that every day 2.5
exabytes of facts and figures created furthermore 90% of the
facts and figures made in last two years. An individual
computer retains about 500 gigabytes (109 bytes), so it
would need about 20 billion PCs to shop all of the world’s
facts and figures. In the past, human genome decryption
method takes roughly 10 years, now not more than a week.
Multimedia facts and figures have large-scale weight on
internet backbone traffic and are anticipated to boost 70%
by 2013. Only Google has got more than one million servers
around the worlds. There have been 6 billion mobile
subscriptions in the world and every day 10 billion text
messages are sent. By the year 2020, 50 billion devices will
be attached to systems and the internet. [1]
II. RELATED WORK
There are many techniques to handle Big data. Here we will
discuss some Big Data storage techniques as well as some of
the Big data analysis techniques. Their brief study is as
mentioned below.
A. Hadoop
Hadoop is an open source project hosted by Apache
Software Foundation. It consists of many small sub projects
Which belong to the category of infrastructure for
distributed computing. Hadoop mainly consists of:
1) File System (The Hadoop File System)
2) Programming Paradigm (Map Reduce)
The other subprojects provide complementary services or
they are building on the core to add higher-level
abstractions. There exist many problems in dealing with
storage of large amount of data.
Though the storage capacities of the drives have
increased massively but the rate of reading data from them
hasn’t shown that considerable improvement. The reading
process takes large amount of time and the process of
writing is also slower. This time can be reduced by reading
from multiple disks at once. Only using one hundredth of a
disk may seem wasteful. But if there are one hundred
datasets, each of which is one terabyte and providing shared
access to them is also a solution. [2]
There occur many problems also with using many
pieces of hardware as it increases the chances of failure.
This can be avoided by Replication i.e. creating redundant
copies of the same data at different devices so that in case of
failure the copy of the data is available. [2]
The main problem is of combining the data being
read from different devices. Many a methods are available
in distributed computing to handle this problem but still it is
quite challenging. All the problems discussed are easily
handled by Hadoop. The problem of failure is handled by
the Hadoop Distributed File System and problem of
combining data is handled by Map reduce programming
Paradigm. Map Reduce basically reduces the problem of
disk reads and writes by providing a programming model
dealing in computation with keys and values. [2]
Hadoop thus provides: a reliable shared storage and
analysis system. The storage is provided by HDFS and
analysis by Map Reduce. [2]
B. Hadoop Components in detail
1) Hadoop Distributed File System:
Fig. 1: HDFS Architecture [4]
A Survey on Big Data Analysis Techniques
Himanshu Rathod1
Tarulata Chauhan2
1, 2
Department of Computer Engineering
1, 2
L.J.I.T, Ahmedabad, Gujarat
A Survey on Big Data Analysis Techniques
(IJSRD/Vol. 1/Issue 9/2013/0030)
All rights reserved by www.ijsrd.com 1807
Hadoop comes with a distributed File System called HDFS,
which stands for Hadoop Distributed File System. HDFS is
a File System designed for storing very large files with
streaming data access patterns, running on clusters on
commodity hardware. HDFS block size is much larger than
that of normal file system i.e. 64 MB by default. The reason
for this large size of blocks is to reduce the number of disk
seeks. [4]
A HDFS cluster has two types of nodes i.e.
namenode (the master) and number of datanodes (workers).
The name node manages the file system namespace,
maintains the file system tree and the metadata for all the
files and directories in the tree. The datanode stores and
retrieve blocks as per the instructions of clients or the
namenode. The data retrieved is reported back to the
namenode with lists of blocks that they are storing. Without
the namenode it is not possible to access the file. So it
becomes very important to make name node resilient to
failure. [4]
These are areas where HDFS is not a good fit:
Low-latency data access, Lots of small file, multiple writers
and arbitrary file modifications. [4]
2) MapReduce:
MapReduce is the programming paradigm allowing massive
scalability. The MapReduce basically performs two different
tasks i.e. Map Task and Reduce Task. A map-reduce
computation executes as follows:
Fig. 2: MapReduce Architecture
Map tasks are given input from distributed file system. The
map tasks produce a sequence of key-value pairs from the
input and this is done according to the code written for map
function. These value generated are collected by master
controller and are sorted by key and divided among reduce
tasks. The sorting basically assures that the same key values
ends with the same reduce tasks. The Reduce tasks combine
all the values associated with a key working with one key at
a time. Again the combination process depends on the code
written for reduce job. [5]
The Master controller process and some number of
worker processes at different compute nodes are forked by
the user. Worker handles map tasks (MAP WORKER) and
reduce tasks (REDUCE WORKER) but not both. [5]
The Master controller creates some number of
maps and reduces tasks which are usually decided by the
user program. The tasks are assigned to the worker nodes by
the master controller. Track of the status of each Map and
Reduce task (idle, executing at a particular Worker or
completed) is kept by the Master Process. On the
completion of the work assigned the worker process reports
to the master and master reassigns it with some task.
The failure of a compute node is detected by the
master as it periodically pings the worker nodes. All the
Map tasks assigned to that node are restarted even if it had
completed and this is due to the fact that the results of that
computation would be available on that node only for the
reduce tasks. The status of each of these Map tasks is set to
idle by Master. These get scheduled by Master on a Worker
only when one becomes available. The Master must also
inform each Reduce task that the location of its input from
that Map task has changed. [5]
III. COMPARISON OF HADOOP TECHNIQUE WITH
OTHER SYSTEM TECHNIQUES
A. Grid Computing Tools:
The approach in Grid computing includes the distribution of
work across a cluster and they are having a common shared
File system hosted by SAN. The jobs here are mainly
compute intensive and thus it suits well to them unlike as in
case of Big data where access to larger volume of data as
network bandwidth is the main bottleneck and the compute
nodes start becoming idle. Map Reduce component of
Hadoop here plays an important role by making use of the
Data Locality property where it collocates the data with the
compute node itself so that the data access is fast.[2,6]
Grid computing basically makes a use of the API’s
such as message passing Interface (MPI). Though it
provides great control to the user, the user needs to control
the mechanism for handling the data flow. On the other hand
Map Reduce operates only at the higher level where the data
flow is implicit and the programmer just thinks in terms of
key and value pairs. Coordination of the jobs on large
distributed systems is always challenging. Map Reduce
handles this problem easily as it is based on shared-nothing
architecture i.e. the tasks are independent of each other. The
implementation of Map Reduce itself detects the failed tasks
and reschedules them on healthy machines. Thus the order
in which the tasks run hardly matters from programmer’s
point of view. But in case of MPI, an explicit management
of check pointing and recovery system needs to be done by
the program. This gives more control to the programmer but
makes them more difficult to write. [2, 6]
B. Comparison with Volunteer Computing Technique:
In Volunteer computing work is broken down into chunks
called work units which are sent on computers across the
world to be analysed. After the completion of the analysis
the results are sent back to the server and the client is
assigned with another work unit. In order to assure accuracy,
each work unit is sent to three different machines and the
result is accepted if at least two of them match. This concept
of Volunteer Computing makes it look like MapReduce. But
there exists a big difference between the two the tasks in
case of Volunteer. Computing is basically CPU intensive.
This tasks makes these tasks suited to be distributed across
computers as transfer of work unit time is less than the time
required for the computation whereas in case of MapReduce
A Survey on Big Data Analysis Techniques
(IJSRD/Vol. 1/Issue 9/2013/0030)
All rights reserved by www.ijsrd.com 1808
is designed to run jobs that last minutes or hours on trusted,
dedicated hardware running in a single data centre with very
high aggregate bandwidth interconnects.[2,6]
C. Comparison with RDBMS:
The traditional database deals with data size in range of
Gigabytes as compared to MapReduce dealing in petabytes.
The Scaling in case of MapReduce is linear as compared to
that of traditional database. In fact the RDBMS differs
structurally, in updating, and access techniques from
MapReduce. Comparison of RDBMS over some general
properties of big data is as shown below. [2, 6]
Traditional RDBMS MapReduce
Data
Size
Gigabytes Petabytes
Access Interactive and batch Batch
Updates
Read and write many
times
Write once, read many
times
Structure Static schema Dynamic Schema
Integrity High Low
Scaling Nonlinear Linear
Table. 1: RDBMS compared to MapReduce [6]
IV. CONCLUSION AND FUTURE WORK
From the above study and survey we can conclude that
Hadoop is possibly one of the best solutions to maintain the
Big Data. We studied other techniques such as Grid
Computing tools, Volunteering Computing and RDBMS
techniques. We learnt that Hadoop is capable enough to
handle such amount of data and analyze such data.
In future we would like to carry out unstructured data of
logs with the help of Hadoop. We would like to carry out
analysis more efficiently and quickly.
REFRENCES
[1] Sagiroglu, S.; Sinanc, D., "Big data: A review,"
Collaboration Technologies and Systems (CTS), 2013
International Conference on , vol., no., pp.42,47, 20-24
May 2013
[2] Katal, A.; Wazid, M.; Goudar, R.H., "Big data: Issues,
challenges, tools and Good practices," Contemporary
Computing (IC3), 2013 Sixth International Conference
on , vol., no., pp.404,409, 8-10 Aug. 2013
[3] Nandimath, J.; Banerjee, E.; Patil, A.; Kakade, P.;
Vaidya, S., "Big data analysis using Apache Hadoop,"
Information Reuse and Integration (IRI), 2013 IEEE
14th International Conference on , vol., no., pp.700,703,
14-16 Aug. 2013
[4] Apache Hadoop (2013). HDFS Architecture Guide
[Online]. Available:
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.ht
[5] Apache Hadoop (2013). Hadoop Map/Reduce Tutorial
[Online]. Available:
http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.
html
[6] Tom White (2013). Inkling for Web [Online].
Available: ttps://www.inkling.com/read/hadoop-
definitive-guide-tom-white-3rd/chapter-1/comparison-
with-other-systems

More Related Content

What's hot

Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computinghuda2018
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesDavid Tjahjono,MD,MBA(UK)
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Robert Grossman
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportAhmad El Tawil
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoopdbpublications
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET Journal
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaNithin Kakkireni
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics iosrjce
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET Journal
 
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...IRJET Journal
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...IJSRD
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415SANTOSH WAYAL
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce PerformanceAM Publications
 

What's hot (18)

Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution Times
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
 
D04501036040
D04501036040D04501036040
D04501036040
 
Hadoop technology doc
Hadoop technology docHadoop technology doc
Hadoop technology doc
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_Sharmila
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
 
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce Performance
 

Similar to A Survey on Big Data Analysis Techniques

Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System cscpconf
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Modeinventionjournals
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar ReportAtul Kushwaha
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combinersijcsit
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabadsreehari orienit
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopIOSR Journals
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce ijujournal
 
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCEHMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCEijujournal
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoopAnusha sweety
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 

Similar to A Survey on Big Data Analysis Techniques (20)

Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
 
G017143640
G017143640G017143640
G017143640
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
B017320612
B017320612B017320612
B017320612
 
Big data
Big dataBig data
Big data
 
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
 
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCEHMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
HMR LOG ANALYZER: ANALYZE WEB APPLICATION LOGS OVER HADOOP MAPREDUCE
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Hadoop
HadoopHadoop
Hadoop
 

More from ijsrd.com

IoT Enabled Smart Grid
IoT Enabled Smart GridIoT Enabled Smart Grid
IoT Enabled Smart Gridijsrd.com
 
A Survey Report on : Security & Challenges in Internet of Things
A Survey Report on : Security & Challenges in Internet of ThingsA Survey Report on : Security & Challenges in Internet of Things
A Survey Report on : Security & Challenges in Internet of Thingsijsrd.com
 
IoT for Everyday Life
IoT for Everyday LifeIoT for Everyday Life
IoT for Everyday Lifeijsrd.com
 
Study on Issues in Managing and Protecting Data of IOT
Study on Issues in Managing and Protecting Data of IOTStudy on Issues in Managing and Protecting Data of IOT
Study on Issues in Managing and Protecting Data of IOTijsrd.com
 
Interactive Technologies for Improving Quality of Education to Build Collabor...
Interactive Technologies for Improving Quality of Education to Build Collabor...Interactive Technologies for Improving Quality of Education to Build Collabor...
Interactive Technologies for Improving Quality of Education to Build Collabor...ijsrd.com
 
Internet of Things - Paradigm Shift of Future Internet Application for Specia...
Internet of Things - Paradigm Shift of Future Internet Application for Specia...Internet of Things - Paradigm Shift of Future Internet Application for Specia...
Internet of Things - Paradigm Shift of Future Internet Application for Specia...ijsrd.com
 
A Study of the Adverse Effects of IoT on Student's Life
A Study of the Adverse Effects of IoT on Student's LifeA Study of the Adverse Effects of IoT on Student's Life
A Study of the Adverse Effects of IoT on Student's Lifeijsrd.com
 
Pedagogy for Effective use of ICT in English Language Learning
Pedagogy for Effective use of ICT in English Language LearningPedagogy for Effective use of ICT in English Language Learning
Pedagogy for Effective use of ICT in English Language Learningijsrd.com
 
Virtual Eye - Smart Traffic Navigation System
Virtual Eye - Smart Traffic Navigation SystemVirtual Eye - Smart Traffic Navigation System
Virtual Eye - Smart Traffic Navigation Systemijsrd.com
 
Ontological Model of Educational Programs in Computer Science (Bachelor and M...
Ontological Model of Educational Programs in Computer Science (Bachelor and M...Ontological Model of Educational Programs in Computer Science (Bachelor and M...
Ontological Model of Educational Programs in Computer Science (Bachelor and M...ijsrd.com
 
Understanding IoT Management for Smart Refrigerator
Understanding IoT Management for Smart RefrigeratorUnderstanding IoT Management for Smart Refrigerator
Understanding IoT Management for Smart Refrigeratorijsrd.com
 
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...ijsrd.com
 
A Review: Microwave Energy for materials processing
A Review: Microwave Energy for materials processingA Review: Microwave Energy for materials processing
A Review: Microwave Energy for materials processingijsrd.com
 
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
Web Usage Mining: A Survey on User's Navigation Pattern from Web LogsWeb Usage Mining: A Survey on User's Navigation Pattern from Web Logs
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logsijsrd.com
 
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEMAPPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEMijsrd.com
 
Making model of dual axis solar tracking with Maximum Power Point Tracking
Making model of dual axis solar tracking with Maximum Power Point TrackingMaking model of dual axis solar tracking with Maximum Power Point Tracking
Making model of dual axis solar tracking with Maximum Power Point Trackingijsrd.com
 
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...ijsrd.com
 
Study and Review on Various Current Comparators
Study and Review on Various Current ComparatorsStudy and Review on Various Current Comparators
Study and Review on Various Current Comparatorsijsrd.com
 
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...ijsrd.com
 
Defending Reactive Jammers in WSN using a Trigger Identification Service.
Defending Reactive Jammers in WSN using a Trigger Identification Service.Defending Reactive Jammers in WSN using a Trigger Identification Service.
Defending Reactive Jammers in WSN using a Trigger Identification Service.ijsrd.com
 

More from ijsrd.com (20)

IoT Enabled Smart Grid
IoT Enabled Smart GridIoT Enabled Smart Grid
IoT Enabled Smart Grid
 
A Survey Report on : Security & Challenges in Internet of Things
A Survey Report on : Security & Challenges in Internet of ThingsA Survey Report on : Security & Challenges in Internet of Things
A Survey Report on : Security & Challenges in Internet of Things
 
IoT for Everyday Life
IoT for Everyday LifeIoT for Everyday Life
IoT for Everyday Life
 
Study on Issues in Managing and Protecting Data of IOT
Study on Issues in Managing and Protecting Data of IOTStudy on Issues in Managing and Protecting Data of IOT
Study on Issues in Managing and Protecting Data of IOT
 
Interactive Technologies for Improving Quality of Education to Build Collabor...
Interactive Technologies for Improving Quality of Education to Build Collabor...Interactive Technologies for Improving Quality of Education to Build Collabor...
Interactive Technologies for Improving Quality of Education to Build Collabor...
 
Internet of Things - Paradigm Shift of Future Internet Application for Specia...
Internet of Things - Paradigm Shift of Future Internet Application for Specia...Internet of Things - Paradigm Shift of Future Internet Application for Specia...
Internet of Things - Paradigm Shift of Future Internet Application for Specia...
 
A Study of the Adverse Effects of IoT on Student's Life
A Study of the Adverse Effects of IoT on Student's LifeA Study of the Adverse Effects of IoT on Student's Life
A Study of the Adverse Effects of IoT on Student's Life
 
Pedagogy for Effective use of ICT in English Language Learning
Pedagogy for Effective use of ICT in English Language LearningPedagogy for Effective use of ICT in English Language Learning
Pedagogy for Effective use of ICT in English Language Learning
 
Virtual Eye - Smart Traffic Navigation System
Virtual Eye - Smart Traffic Navigation SystemVirtual Eye - Smart Traffic Navigation System
Virtual Eye - Smart Traffic Navigation System
 
Ontological Model of Educational Programs in Computer Science (Bachelor and M...
Ontological Model of Educational Programs in Computer Science (Bachelor and M...Ontological Model of Educational Programs in Computer Science (Bachelor and M...
Ontological Model of Educational Programs in Computer Science (Bachelor and M...
 
Understanding IoT Management for Smart Refrigerator
Understanding IoT Management for Smart RefrigeratorUnderstanding IoT Management for Smart Refrigerator
Understanding IoT Management for Smart Refrigerator
 
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
 
A Review: Microwave Energy for materials processing
A Review: Microwave Energy for materials processingA Review: Microwave Energy for materials processing
A Review: Microwave Energy for materials processing
 
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
Web Usage Mining: A Survey on User's Navigation Pattern from Web LogsWeb Usage Mining: A Survey on User's Navigation Pattern from Web Logs
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
 
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEMAPPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
 
Making model of dual axis solar tracking with Maximum Power Point Tracking
Making model of dual axis solar tracking with Maximum Power Point TrackingMaking model of dual axis solar tracking with Maximum Power Point Tracking
Making model of dual axis solar tracking with Maximum Power Point Tracking
 
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
 
Study and Review on Various Current Comparators
Study and Review on Various Current ComparatorsStudy and Review on Various Current Comparators
Study and Review on Various Current Comparators
 
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
 
Defending Reactive Jammers in WSN using a Trigger Identification Service.
Defending Reactive Jammers in WSN using a Trigger Identification Service.Defending Reactive Jammers in WSN using a Trigger Identification Service.
Defending Reactive Jammers in WSN using a Trigger Identification Service.
 

Recently uploaded

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 

A Survey on Big Data Analysis Techniques

  • 1. IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 9, 2013 | ISSN (online): 2321-0613 All rights reserved by www.ijsrd.com 1806 Abstract—There is a growing trend of applications that ought to handle huge information. However, analysing huge information may be a terribly difficult drawback nowadays. For such data many techniques can be considered. The technologies like Grid Computing, Volunteering Computing, and RDBMS can be considered as potential techniques to handle such data. We have a still in growing phase Hadoop Tool to handle such data also. We will do a survey on all this techniques to find a potential technique to manage and work with Big Data. Key words: big data, big data analysis, hadoop; I. INTRODUCTION Large-scale data and its analysis are at the centre of modern research and enterprise. These data are known as Big Data. The facts and figures of such data are developed from online transactions, internet messages, videos, audios, images, click streams, logs, posts, search queries, wellbeing notes, social networking interactions, science facts and figures, sensors and wireless phones and their submissions . They are retained in databases grow hugely and become tough to capture, pattern, store, manage, share, investigate and visualize by usual database software tools. [1] 5 Exabyte (1018 bytes) of facts and figures were conceived by human until 2003. Today this amount of data is created in two days. In 2012, digital world of data was expanded to 2.72 zettabytes (1021 bytes). It is forecast to double every two years, reaching about 8 zettabytes of facts and figures by 2015. IBM indicates that every day 2.5 exabytes of facts and figures created furthermore 90% of the facts and figures made in last two years. An individual computer retains about 500 gigabytes (109 bytes), so it would need about 20 billion PCs to shop all of the world’s facts and figures. In the past, human genome decryption method takes roughly 10 years, now not more than a week. Multimedia facts and figures have large-scale weight on internet backbone traffic and are anticipated to boost 70% by 2013. Only Google has got more than one million servers around the worlds. There have been 6 billion mobile subscriptions in the world and every day 10 billion text messages are sent. By the year 2020, 50 billion devices will be attached to systems and the internet. [1] II. RELATED WORK There are many techniques to handle Big data. Here we will discuss some Big Data storage techniques as well as some of the Big data analysis techniques. Their brief study is as mentioned below. A. Hadoop Hadoop is an open source project hosted by Apache Software Foundation. It consists of many small sub projects Which belong to the category of infrastructure for distributed computing. Hadoop mainly consists of: 1) File System (The Hadoop File System) 2) Programming Paradigm (Map Reduce) The other subprojects provide complementary services or they are building on the core to add higher-level abstractions. There exist many problems in dealing with storage of large amount of data. Though the storage capacities of the drives have increased massively but the rate of reading data from them hasn’t shown that considerable improvement. The reading process takes large amount of time and the process of writing is also slower. This time can be reduced by reading from multiple disks at once. Only using one hundredth of a disk may seem wasteful. But if there are one hundred datasets, each of which is one terabyte and providing shared access to them is also a solution. [2] There occur many problems also with using many pieces of hardware as it increases the chances of failure. This can be avoided by Replication i.e. creating redundant copies of the same data at different devices so that in case of failure the copy of the data is available. [2] The main problem is of combining the data being read from different devices. Many a methods are available in distributed computing to handle this problem but still it is quite challenging. All the problems discussed are easily handled by Hadoop. The problem of failure is handled by the Hadoop Distributed File System and problem of combining data is handled by Map reduce programming Paradigm. Map Reduce basically reduces the problem of disk reads and writes by providing a programming model dealing in computation with keys and values. [2] Hadoop thus provides: a reliable shared storage and analysis system. The storage is provided by HDFS and analysis by Map Reduce. [2] B. Hadoop Components in detail 1) Hadoop Distributed File System: Fig. 1: HDFS Architecture [4] A Survey on Big Data Analysis Techniques Himanshu Rathod1 Tarulata Chauhan2 1, 2 Department of Computer Engineering 1, 2 L.J.I.T, Ahmedabad, Gujarat
  • 2. A Survey on Big Data Analysis Techniques (IJSRD/Vol. 1/Issue 9/2013/0030) All rights reserved by www.ijsrd.com 1807 Hadoop comes with a distributed File System called HDFS, which stands for Hadoop Distributed File System. HDFS is a File System designed for storing very large files with streaming data access patterns, running on clusters on commodity hardware. HDFS block size is much larger than that of normal file system i.e. 64 MB by default. The reason for this large size of blocks is to reduce the number of disk seeks. [4] A HDFS cluster has two types of nodes i.e. namenode (the master) and number of datanodes (workers). The name node manages the file system namespace, maintains the file system tree and the metadata for all the files and directories in the tree. The datanode stores and retrieve blocks as per the instructions of clients or the namenode. The data retrieved is reported back to the namenode with lists of blocks that they are storing. Without the namenode it is not possible to access the file. So it becomes very important to make name node resilient to failure. [4] These are areas where HDFS is not a good fit: Low-latency data access, Lots of small file, multiple writers and arbitrary file modifications. [4] 2) MapReduce: MapReduce is the programming paradigm allowing massive scalability. The MapReduce basically performs two different tasks i.e. Map Task and Reduce Task. A map-reduce computation executes as follows: Fig. 2: MapReduce Architecture Map tasks are given input from distributed file system. The map tasks produce a sequence of key-value pairs from the input and this is done according to the code written for map function. These value generated are collected by master controller and are sorted by key and divided among reduce tasks. The sorting basically assures that the same key values ends with the same reduce tasks. The Reduce tasks combine all the values associated with a key working with one key at a time. Again the combination process depends on the code written for reduce job. [5] The Master controller process and some number of worker processes at different compute nodes are forked by the user. Worker handles map tasks (MAP WORKER) and reduce tasks (REDUCE WORKER) but not both. [5] The Master controller creates some number of maps and reduces tasks which are usually decided by the user program. The tasks are assigned to the worker nodes by the master controller. Track of the status of each Map and Reduce task (idle, executing at a particular Worker or completed) is kept by the Master Process. On the completion of the work assigned the worker process reports to the master and master reassigns it with some task. The failure of a compute node is detected by the master as it periodically pings the worker nodes. All the Map tasks assigned to that node are restarted even if it had completed and this is due to the fact that the results of that computation would be available on that node only for the reduce tasks. The status of each of these Map tasks is set to idle by Master. These get scheduled by Master on a Worker only when one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. [5] III. COMPARISON OF HADOOP TECHNIQUE WITH OTHER SYSTEM TECHNIQUES A. Grid Computing Tools: The approach in Grid computing includes the distribution of work across a cluster and they are having a common shared File system hosted by SAN. The jobs here are mainly compute intensive and thus it suits well to them unlike as in case of Big data where access to larger volume of data as network bandwidth is the main bottleneck and the compute nodes start becoming idle. Map Reduce component of Hadoop here plays an important role by making use of the Data Locality property where it collocates the data with the compute node itself so that the data access is fast.[2,6] Grid computing basically makes a use of the API’s such as message passing Interface (MPI). Though it provides great control to the user, the user needs to control the mechanism for handling the data flow. On the other hand Map Reduce operates only at the higher level where the data flow is implicit and the programmer just thinks in terms of key and value pairs. Coordination of the jobs on large distributed systems is always challenging. Map Reduce handles this problem easily as it is based on shared-nothing architecture i.e. the tasks are independent of each other. The implementation of Map Reduce itself detects the failed tasks and reschedules them on healthy machines. Thus the order in which the tasks run hardly matters from programmer’s point of view. But in case of MPI, an explicit management of check pointing and recovery system needs to be done by the program. This gives more control to the programmer but makes them more difficult to write. [2, 6] B. Comparison with Volunteer Computing Technique: In Volunteer computing work is broken down into chunks called work units which are sent on computers across the world to be analysed. After the completion of the analysis the results are sent back to the server and the client is assigned with another work unit. In order to assure accuracy, each work unit is sent to three different machines and the result is accepted if at least two of them match. This concept of Volunteer Computing makes it look like MapReduce. But there exists a big difference between the two the tasks in case of Volunteer. Computing is basically CPU intensive. This tasks makes these tasks suited to be distributed across computers as transfer of work unit time is less than the time required for the computation whereas in case of MapReduce
  • 3. A Survey on Big Data Analysis Techniques (IJSRD/Vol. 1/Issue 9/2013/0030) All rights reserved by www.ijsrd.com 1808 is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data centre with very high aggregate bandwidth interconnects.[2,6] C. Comparison with RDBMS: The traditional database deals with data size in range of Gigabytes as compared to MapReduce dealing in petabytes. The Scaling in case of MapReduce is linear as compared to that of traditional database. In fact the RDBMS differs structurally, in updating, and access techniques from MapReduce. Comparison of RDBMS over some general properties of big data is as shown below. [2, 6] Traditional RDBMS MapReduce Data Size Gigabytes Petabytes Access Interactive and batch Batch Updates Read and write many times Write once, read many times Structure Static schema Dynamic Schema Integrity High Low Scaling Nonlinear Linear Table. 1: RDBMS compared to MapReduce [6] IV. CONCLUSION AND FUTURE WORK From the above study and survey we can conclude that Hadoop is possibly one of the best solutions to maintain the Big Data. We studied other techniques such as Grid Computing tools, Volunteering Computing and RDBMS techniques. We learnt that Hadoop is capable enough to handle such amount of data and analyze such data. In future we would like to carry out unstructured data of logs with the help of Hadoop. We would like to carry out analysis more efficiently and quickly. REFRENCES [1] Sagiroglu, S.; Sinanc, D., "Big data: A review," Collaboration Technologies and Systems (CTS), 2013 International Conference on , vol., no., pp.42,47, 20-24 May 2013 [2] Katal, A.; Wazid, M.; Goudar, R.H., "Big data: Issues, challenges, tools and Good practices," Contemporary Computing (IC3), 2013 Sixth International Conference on , vol., no., pp.404,409, 8-10 Aug. 2013 [3] Nandimath, J.; Banerjee, E.; Patil, A.; Kakade, P.; Vaidya, S., "Big data analysis using Apache Hadoop," Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on , vol., no., pp.700,703, 14-16 Aug. 2013 [4] Apache Hadoop (2013). HDFS Architecture Guide [Online]. Available: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.ht [5] Apache Hadoop (2013). Hadoop Map/Reduce Tutorial [Online]. Available: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial. html [6] Tom White (2013). Inkling for Web [Online]. Available: ttps://www.inkling.com/read/hadoop- definitive-guide-tom-white-3rd/chapter-1/comparison- with-other-systems