SlideShare ist ein Scribd-Unternehmen logo
1 von 7
Downloaden Sie, um offline zu lesen
Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
Vol. 4, No. 1, March 2016, pp. 74~80
ISSN: 2089-3272, DOI: 10.11591/ijeei.v4i1.195  74
Received October 9, 2015; Revised January 6, 2016; Accepted January 20, 2016
Big Data-Survey
PSG Aruna Sri*
1
, Anusha M
2
1
Department of Electronics and Computer Engineering, K L University,
2
Research Scholar, Dept of CSE, K L UNIVERSITY
Greenfields, Vaddeswaram, Guntur District, Andhra Pradesh 522502
*Corresponding author, e-mail: arunasri_2012@kluniversity.in
1
, anushaaa9@kluniversity.in
2
Abstract
Big data is the term for any gathering of information sets, so expensive and complex, that it gets
to be hard to process for utilizing customary information handling applications. The difficulties incorporate
investigation, catch, duration, inquiry, sharing, stockpiling, Exchange, perception, and protection
infringement. To reduce spot business patterns, anticipate diseases, conflict etc., we require bigger data
sets when compared with the smaller data sets. Enormous information is hard to work with utilizing most
social database administration frameworks and desktop measurements and perception bundles, needing
rather enormously parallel programming running on tens, hundreds, or even a large number of servers. In
this paper there was an observation on Hadoop architecture, different tools used for big data and its
security issues.
Keywords: Big data, Hadoop, Software tools, Map Reduce
1. Introduction
We make 2.5 quintillion bytes of information [1] - so much that 90% of the data on the
planet today has been made in the last two years alone. This data originates from all around:
sensors used to accumulate atmosphere data, posts to social media sites, digital pictures and
videos, and cell phone GPS signal. This enormous amount of the data is known as "Big data".
Big data is a catchphrase, or motto, utilizes to describe a massive volume of both structured and
unstructured data that is so huge that it's complicated to process using conventional database
and software procedures. In most project circumstances the information is too big or it shifts too
quickly or it surpasses existing processing ability.
Big data has the potential to help organizations to improve operations and make faster
by taking more intelligent decisions. Now-a-days, Big Data is the term which finds to be normal
in IT businesses. As there was enormous information in the industry although there is nothing
before big data which comes into imagine. Big data is really an advancing term that illustrates
any huge amount of organized, semi organized information that can possibly be extracting for
data. Although big data doesn't refer to any particular quantity, so this term is often utilized
when talking about petabytes and exabytes of data Big data is a comprehensive term for
expansive accumulation of the data sets so this large and complex that it gets to be
troublesome to work with conventional data processing applications. At the point when
managing bigger datasets, organizations face challenges in creating and managing big data. No
standard tools and procedures for searching and analyzing large data sets in business analytics
A case of Big data may be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data
comprising of billions to trillions of records of a huge number of individuals all from distinctive
sources for example agreements, web, moveable data. The data is commonly approximately
organized data that is frequently fragmented also, inaccessible. The problems faced by big data
are analyzing, capturing, searching, sharing, storing, transferring, visualization and privacy
abuse. Larger data sets is needed in order to prevent diseases, combat crime, spot business
trends and so on. Because of large information sets in these area researchers identify
limitations frequently like meteorology, genomics, connectomics, complex physics simulations,
biological, ecological research, and funding and trade information. Data sets increased their size
due to collecting data from sensing portable devices, aerial sensory technology, software logs,
cameras, microphones, radio-frequency identification readers, and wireless sensor networks.
In March 2012 [2], the Obama administration announced the big data research and
development initiative. The leading IT companies, such as SAG, Oracle, IBM, Microsoft, SAP
 ISSN: 2089-3272
IJEEI Vol. 4, No. 1, March 2016 : 74 – 80
75
and HP, have spent more than $15 billion on buying data management and analytics software.
Big data defined as far back as 2001, industry expert Doug Laney (right now with Gartner)
expressed the standard meaning of big data as the 5 Vs of big data: volume, velocity, variety,
veracity and value, as shown Figure 1.
Figure 1. Five (5) V’s factor of big data
Volume: At present big scale of systems are flooded with constantly increasing information,
simply growing terabytes or even petabytes of data.
Velocity: Information is flowing at unique speed and should be deal with a sensible way. In real
time for many organizations it is difficult to deal with RFID tag, sensors and smart metering data.
Variety: Organized and unorganized information are producing a variety of data types by
making it feasible to search novel approaches, while analyzing these information collectively,
prediction might be attained as data flow into the organization.
Veracity: Identifying and verifying inconsistent information is significant, to accomplish faithful
study. Creating faith in big data is a big challenge to manage even more variety of data is
available.
Value: It is to be derived from big data. There is no reason for building the capacity to store and
manage, if unable to get the value from data.
One of the capable remarks on the advances of the arrangement with the Big Data is Hadoop.
2. Hadoop
Hadoop was made by Doug Cutting and Mike Cafarella in 2005. Doug Cutting, who was working
at Yahoo! at the time, named it after his kid's toy elephant.
Hadoop system is shown Figure 2 and consists [3]:
Reliable: this software can handle both hardware and software failures.
Scalable: Designed for gigantic size of processors, memory, and local appended capacity
Distributed: Map reduce recommends parallel programming model and provides concept of
replication
IJEEI ISSN: 2089-3272 
Big Data-Survey (PSG Aruna Sri)
76
Figure 2. Hadoop system [1]
Hadoop is open-source programming that allows loyal, flexible, expressed figuring on
groups of reserved servers. That utilization the Map-Reduce framework presented by Google by
utilizing the idea of map and reduce functions that remarkable utilized in Functional
Programming. In spite of the fact that the Hadoop structure is composed in Java, it permits
designers to send custom-composed projects coded in Java or some other dialect to process
information in a parallel manner over hundreds or a great many thing servers. It is streamlined
for touching read requests (streaming reads), where transforming incorporates of checking all
the information. Contingent upon the intricacy of the procedure and the volume of information,
reaction time can change from minutes to hours. While Hadoop can forms information quick, so
its key focal point is its monstrous adaptability. Hadoop is right now being utilized for file web
seeks, email spam location, proposal motors, expectation in budgetary administrations, genome
control in life sciences, furthermore, for investigation of unstructured information, for example,
log, content, and click stream. While a considerable lot of these applications could indeed be
actualized in a social database (RDBMS) figure 2, the primary center of the Hadoop system is
practically not the same as a RDBMS. The accompanying examines some of these distinctions
Hadoop is especially helpful when:
 For handling of Complex data.
 Need of conversion of unstructured information to structured information
 Usage of SQL queries
 Recursive calculations are very large
 Complex geo-spatial investigation or genome sequencing
 Machine learning
 Data sets are so substantial it couldn't be possible fit into database RAM, plates, or require
an excess of centers (10's of TB up to PB)
 Data worth does not defend cost of steady ongoing accessibility, for example, files or
exceptional interest information, which can be moved to Hadoop and stay accessible at
lower expense
 Results are not required continuously
 Fault resistance is basic
 Significant custom coding would be obliged to handle employment booking
Hadoop was motivated by Google's Map Reduce, a product structure in which an
application is separated into various little parts. Any of these parts (additionally called sections
or squares) can be run on any hub in the group. Doug Cutting, Hadoop's maker, named the
system after his youngster's full toy elephant. The current Apache Hadoop biological system
comprises of the Hadoop bit, Map Reduce, the Hadoop circulated record framework (HDFS)
what's more, various related activities, for example, Apache Hive, HBase and Zookeeper. The
Hadoop structure is utilized by significant players including Google, Yahoo and IBM, generally
for applications including web search tools and promoting. The favored working frameworks are
Windows and Linux be that as it may Hadoop can likewise work with BSD and OS X.
 ISSN: 2089-3272
IJEEI Vol. 4, No. 1, March 2016 : 74 – 80
77
A distributed file system framework is a customer/server-based application that permits
customers to get to and process information put away on the server as though it were all alone
PC. At the point when a client gets to a document on the server, the server sends the client a
duplicate of the document, which is stored on the client's PC while the information is being
prepared and is then come back to the server. Preferably, a convey record framework arranges
document and registry administrations of individual servers into a worldwide catalog in such a
way that remote information access is not area particular however is indistinguishable from any
customer. All records are available to all clients of the worldwide record framework and
association is progressive what more, registry based is. Since more than one customer may get
to the same information all the while, the server must have a system in spot, (for example,
keeping up data about the times of access) to arrange overhauls so that the customer
dependably gets the most current adaptation of information and that information clashes don't
emerge. Dispersed record frameworks ordinarily utilize record or database replication
(conveying duplicates of information on different servers) to ensure against information access
failures [4]. Sun Microsystems' Network File System (NFS), Novell NetWare, Microsoft's
Distributed File Framework, and IBM/Transarc's DFS are a few samples of distributed file
systems framework.
3. HDFS
Hadoop frame work consists the Hadoop Distributed File System (HDFS) [4]. HDFS is
planned and improved to store information more than a lot of ease equipment in an
appropriated manner, as shown in Figure 3.
Figure 3. HDFS architecture
The structure of HDFS is master/slave. HDFS cluster consists of one Name Node, a
master server that manages the classification system namespace and regulates access to files
by clients. In addition, there square measure variety of information nodes, sometimes one per
node within the cluster, that manage storage hooked up to the nodes that they run on. HDFS
exposes a classification system namespace and permits user knowledge to be hold on in files.
Internally, a file is split into one or a lot of blocks and these blocks square measure hold on
during a set of information Nodes. The Name Node executes classification system namespace
operations like opening, closing, and renaming files and directories. It conjointly determines the
mapping of blocks to knowledge Nodes. The info Nodes square measure to responsibility for
serving scans and write requests from the file system’s clients. The info Nodes conjointly
performs block formation, deletion, and replication upon instruction from the Name Node.
The Name Node and knowledge Node square measure items of package designed to
run on goods machines. These machines generally run a GNU/Linux software system (OS).
HDFS is constructed exploitation the Java language; any machine that supports Java will run
the Name Node or the Data Node package. Usage of the extremely moveable Java language
IJEEI ISSN: 2089-3272 
Big Data-Survey (PSG Aruna Sri)
78
implies that HDFS will be deployed on a large variety of machines. A typical readying includes a
dedicated machine that runs solely the Name Node package. Every of the opposite machines
within the cluster runs one instance of the Data Node package. The design doesn't preclude
running multiple knowledge Nodes on an equivalent machine however during a real readying
that's seldom the case. The existence of Name Node during a cluster greatly simplifies the
design of the system. The Name Node is that the intermediary and repository for all HDFS data.
The system is intended to flow the user knowledge through the Name Node.
The base Apache Hadoop structure is made out of the taking after modules: Hadoop
Common-contains libraries and utilities required by other Hadoop modules. Hadoop Distributed
File System (HDFS) - a appropriated document framework that stores information on product
machines, giving high total data transfer capacity over the group. Hadoop Map Reduce – a
programming model for huge scale information preparing. All the modules in Hadoop are
planned with a basic presumption that equipment disappointments (of individual machines, or
racks of machines) are normal also, consequently ought to be naturally taken care of in
programming by the structure. Apache Hadoop’s Map Reduce and HDFS parts initially got
individually from Google's Map Reduce and Google File System (GFS) papers."Hadoop"
frequently alludes not to simply the base Hadoop bundle yet rather to the Hadoop Ecosystem
fig.4 which incorporates the greater part of the extra programming bundles that can be
introduced on top of or nearby Hadoop, for example, Apache Hive, Apache Pig and A HBase.
4. Map Reduce Framework
Map Reduce as shown in Figure 4 is a product system for appropriated transforming of
Big data sets on PC groups [5]. It is first grown by Google. Map Reduce is planned to
encourage also, improve the preparing of incomprehensible measures of information in parallel
on extensive bunches of merchandise equipment in a solid, issue tolerant way. Map Reduce is
the key calculation that the Hadoop Map Reduce motor uses to circulate work around a bunch.
Commonplace Hadoop bunch coordinates Map Reduce and HFDS layer. In Map Reduce layer
job tracker assigns tasks to the task tracker. Master node job tracker also allots tasks to the
slave node task tracker figure.
Figure 4. Map reduce is based on the Master-Slave architecture
Master node contains -
 Job tracker node (Map Reduce layer)
 Task tracker node (Map Reduce layer)
 Name node (HFDS layer)
 Data node (HFDS layer)
Multiple slave nodes contain -
 Task tracker node (Map Reduce layer)
 Data node (HFDS layer)
 Map Reduce layer has job and task tracker nodes
 HFDS layer has name and data nodes
A. Map Reduce core functionality (I):
Map & Reduce stage plays an significant role in map reduce core functionality.
 ISSN: 2089-3272
IJEEI Vol. 4, No. 1, March 2016 : 74 – 80
79
Map stage: In Map step, master node takes consideration of large problem input and divided
into smaller problems and allotted to worker nodes. These nodes process the smaller problems
and return to the master node.
• Map (key1, value)  list<key2, value2>
Reduce stage: In this Reduce stage, Master node takes the response from the sub problems
and joins them in a predefined manner to get the output to original problem, as shown in
Figure 5.
• Reduce (key2, list < value2 >)  list
Figure 5. Reduce stage
5. PIG
Pigs [6] was at first shaped at Yahoo! to permit individuals utilizing Hadoop to
concentrate all the more on examining substantial information sets and invest less moment of
time is needed to compose mapper and reducer programs. Pig is comprised of two segments:
the first is the is called Pig Latin and the second is a runtime situation where Pig Latin programs
are executed.
6. HIVE
Apache Hive [7] was initially developed by face book. It has data warehouse structure
built on top of hadoop for analysis and inquiry of data. By default, Hive stores metadata in an
installed Apache Derby Database and other customer/server databases like MySQL can
alternatively be used. Right now, there are four document configurations upheld in Hive, which
are TEXTFILE SEQUENCEFILE, ORC and RCFILE.
7. HBase
HBase [8] is a section arranged database administration framework that keeps running
on top of HDFS. HBase applications are composed in Java much like an average Map Reduce
application. A HBase framework consists an arrangement of tables. Table Consists of rows and
columns like a conventional database.
8. Issues
Although a considerable measure of research is going on big data yet at the same time.
Several ideas are still to be investigated. Scientists would attempt to upgrade security stage to
enhance capacity of programming to discover propelled dangers, respond in like manner and
would create preventive measures for future. Specialists would attempt to enhance quality and
dependability of security framework. A few analysts are wanting to taken up information
accumulation, pretreatment, incorporation, Map Reduce and investigation utilizing machine
learning strategies. They would utilize the outcomes for securing and actualizing preventive
measures from dangers to big business information. Specialists would attempt to outline the
meet the creation needs of endeavors for growing high caliber item by applying efforts to
IJEEI ISSN: 2089-3272 
Big Data-Survey (PSG Aruna Sri)
80
establish safety with the assistance of big data Analytics with Hadoop. A few specialists are
utilizing systems administration checking tools like Packet pig, Mahout and so on to Improve the
security levels. Targeted threats will be analyzed by using hadoop cluster.
9. Conclusion
Big data is going to keep developing amid the following years, and every information
researcher will need to oversee considerably more measure of information will be more
different, bigger, and speedier. We talked about a few experiences about the theme, and what
we consider are the fundamental concerns and primary issues for what’s to come. Big data is
turning into new final boundary for experimental information research and for business
applications. Everyone is warmly welcomed to take part in this fearless trip.
References
[1] Ms Vibhavari Chavan et al. “Survey Paper on Big Data”. International Journal of Computer Science
and Information Technologies. 2014; 5 (6), ISSN: 0975-9646.
[2] Michael R Lyu. “Service-generated Big Data and Big Data-as-a-Service”. Internetware. 2013.
[3] Suman Arora et al. “Survey Paper on Scheduling in Hadoop”. International Journal of Advanced
Research in Computer Science and Software Engineering. 2014; 4.
[4] Dhruba Borthakur. “The Hadoop Distributed File System: Architecture and Design”. The Apache
Software Foundation. 2007.
[5] Yaxiong Zhao et al. “Dache: A Data AwareCaching for Big-Data Applications Using the MapReduce
Framework”. Tsinghua science and technology. 2014; 19(1), ISSN: l1007-0214l.
[6] Apache Pig. Available at http://pig.apache.org
[7] Apache Hive. Available at http://hive.apache.org
[8] Apache HBase. Available at http://hbase.apache.org

Weitere ähnliche Inhalte

Was ist angesagt?

Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesIJRESJOURNAL
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceInformation Security Awareness Group
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social mediaSupriya Radhakrishna
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...IJSRD
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data Shallote Dsouza
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big DataIndu Khemchandani
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxPankajkumar496281
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and howbobosenthil
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
IRJET- Big Data: A Study
IRJET-  	  Big Data: A StudyIRJET-  	  Big Data: A Study
IRJET- Big Data: A StudyIRJET Journal
 

Was ist angesagt? (20)

1
11
1
 
ANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEWANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEW
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security Alliance
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social media
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 
Big data mining
Big data miningBig data mining
Big data mining
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
IRJET- Big Data: A Study
IRJET-  	  Big Data: A StudyIRJET-  	  Big Data: A Study
IRJET- Big Data: A Study
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 

Ähnlich wie Big Data-Survey

DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond HillDOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond HillClaraZara1
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformIRJET Journal
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewIJERA Editor
 
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAREAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAijp2p
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challengesijcisjournal
 
How to tackle big data from a security
How to tackle big data from a securityHow to tackle big data from a security
How to tackle big data from a securityTyrone Systems
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...IJSRD
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAudrey Britton
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigDataValarmathi V
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfrajsharma159890
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond Rajesh Kumar
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeSysfore Technologies
 

Ähnlich wie Big Data-Survey (20)

DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond HillDOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
Big Data Hadoop
Big Data HadoopBig Data Hadoop
Big Data Hadoop
 
Big data
Big dataBig data
Big data
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAREAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
How to tackle big data from a security
How to tackle big data from a securityHow to tackle big data from a security
How to tackle big data from a security
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data Analytics
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
Big Data
Big DataBig Data
Big Data
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Big data
Big dataBig data
Big data
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 

Mehr von ijeei-iaes

An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clusteringijeei-iaes
 
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and ControlDevelopment of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and Controlijeei-iaes
 
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best UpsurgeAnalysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurgeijeei-iaes
 
Design for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-DistanceDesign for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-Distanceijeei-iaes
 
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...ijeei-iaes
 
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...ijeei-iaes
 
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A ReviewMitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A Reviewijeei-iaes
 
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...ijeei-iaes
 
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...ijeei-iaes
 
Intelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy ConsumptionIntelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy Consumptionijeei-iaes
 
Analysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic ToolsAnalysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic Toolsijeei-iaes
 
A Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur ClassificationA Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur Classificationijeei-iaes
 
Computing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of GrapheneComputing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of Grapheneijeei-iaes
 
A Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine StabilityA Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine Stabilityijeei-iaes
 
Fuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane StructureFuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane Structureijeei-iaes
 
Site Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for LagosSite Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for Lagosijeei-iaes
 
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...ijeei-iaes
 
Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...ijeei-iaes
 
A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...ijeei-iaes
 
Wireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation DetectionWireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation Detectionijeei-iaes
 

Mehr von ijeei-iaes (20)

An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
 
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and ControlDevelopment of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
 
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best UpsurgeAnalysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
 
Design for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-DistanceDesign for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-Distance
 
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
 
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
 
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A ReviewMitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
 
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
 
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
 
Intelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy ConsumptionIntelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy Consumption
 
Analysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic ToolsAnalysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic Tools
 
A Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur ClassificationA Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur Classification
 
Computing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of GrapheneComputing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of Graphene
 
A Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine StabilityA Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine Stability
 
Fuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane StructureFuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane Structure
 
Site Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for LagosSite Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for Lagos
 
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
 
Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...
 
A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...
 
Wireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation DetectionWireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation Detection
 

Kürzlich hochgeladen

AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxMuhammadAsimMuhammad6
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 

Kürzlich hochgeladen (20)

AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 

Big Data-Survey

  • 1. Indonesian Journal of Electrical Engineering and Informatics (IJEEI) Vol. 4, No. 1, March 2016, pp. 74~80 ISSN: 2089-3272, DOI: 10.11591/ijeei.v4i1.195  74 Received October 9, 2015; Revised January 6, 2016; Accepted January 20, 2016 Big Data-Survey PSG Aruna Sri* 1 , Anusha M 2 1 Department of Electronics and Computer Engineering, K L University, 2 Research Scholar, Dept of CSE, K L UNIVERSITY Greenfields, Vaddeswaram, Guntur District, Andhra Pradesh 522502 *Corresponding author, e-mail: arunasri_2012@kluniversity.in 1 , anushaaa9@kluniversity.in 2 Abstract Big data is the term for any gathering of information sets, so expensive and complex, that it gets to be hard to process for utilizing customary information handling applications. The difficulties incorporate investigation, catch, duration, inquiry, sharing, stockpiling, Exchange, perception, and protection infringement. To reduce spot business patterns, anticipate diseases, conflict etc., we require bigger data sets when compared with the smaller data sets. Enormous information is hard to work with utilizing most social database administration frameworks and desktop measurements and perception bundles, needing rather enormously parallel programming running on tens, hundreds, or even a large number of servers. In this paper there was an observation on Hadoop architecture, different tools used for big data and its security issues. Keywords: Big data, Hadoop, Software tools, Map Reduce 1. Introduction We make 2.5 quintillion bytes of information [1] - so much that 90% of the data on the planet today has been made in the last two years alone. This data originates from all around: sensors used to accumulate atmosphere data, posts to social media sites, digital pictures and videos, and cell phone GPS signal. This enormous amount of the data is known as "Big data". Big data is a catchphrase, or motto, utilizes to describe a massive volume of both structured and unstructured data that is so huge that it's complicated to process using conventional database and software procedures. In most project circumstances the information is too big or it shifts too quickly or it surpasses existing processing ability. Big data has the potential to help organizations to improve operations and make faster by taking more intelligent decisions. Now-a-days, Big Data is the term which finds to be normal in IT businesses. As there was enormous information in the industry although there is nothing before big data which comes into imagine. Big data is really an advancing term that illustrates any huge amount of organized, semi organized information that can possibly be extracting for data. Although big data doesn't refer to any particular quantity, so this term is often utilized when talking about petabytes and exabytes of data Big data is a comprehensive term for expansive accumulation of the data sets so this large and complex that it gets to be troublesome to work with conventional data processing applications. At the point when managing bigger datasets, organizations face challenges in creating and managing big data. No standard tools and procedures for searching and analyzing large data sets in business analytics A case of Big data may be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data comprising of billions to trillions of records of a huge number of individuals all from distinctive sources for example agreements, web, moveable data. The data is commonly approximately organized data that is frequently fragmented also, inaccessible. The problems faced by big data are analyzing, capturing, searching, sharing, storing, transferring, visualization and privacy abuse. Larger data sets is needed in order to prevent diseases, combat crime, spot business trends and so on. Because of large information sets in these area researchers identify limitations frequently like meteorology, genomics, connectomics, complex physics simulations, biological, ecological research, and funding and trade information. Data sets increased their size due to collecting data from sensing portable devices, aerial sensory technology, software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks. In March 2012 [2], the Obama administration announced the big data research and development initiative. The leading IT companies, such as SAG, Oracle, IBM, Microsoft, SAP
  • 2.  ISSN: 2089-3272 IJEEI Vol. 4, No. 1, March 2016 : 74 – 80 75 and HP, have spent more than $15 billion on buying data management and analytics software. Big data defined as far back as 2001, industry expert Doug Laney (right now with Gartner) expressed the standard meaning of big data as the 5 Vs of big data: volume, velocity, variety, veracity and value, as shown Figure 1. Figure 1. Five (5) V’s factor of big data Volume: At present big scale of systems are flooded with constantly increasing information, simply growing terabytes or even petabytes of data. Velocity: Information is flowing at unique speed and should be deal with a sensible way. In real time for many organizations it is difficult to deal with RFID tag, sensors and smart metering data. Variety: Organized and unorganized information are producing a variety of data types by making it feasible to search novel approaches, while analyzing these information collectively, prediction might be attained as data flow into the organization. Veracity: Identifying and verifying inconsistent information is significant, to accomplish faithful study. Creating faith in big data is a big challenge to manage even more variety of data is available. Value: It is to be derived from big data. There is no reason for building the capacity to store and manage, if unable to get the value from data. One of the capable remarks on the advances of the arrangement with the Big Data is Hadoop. 2. Hadoop Hadoop was made by Doug Cutting and Mike Cafarella in 2005. Doug Cutting, who was working at Yahoo! at the time, named it after his kid's toy elephant. Hadoop system is shown Figure 2 and consists [3]: Reliable: this software can handle both hardware and software failures. Scalable: Designed for gigantic size of processors, memory, and local appended capacity Distributed: Map reduce recommends parallel programming model and provides concept of replication
  • 3. IJEEI ISSN: 2089-3272  Big Data-Survey (PSG Aruna Sri) 76 Figure 2. Hadoop system [1] Hadoop is open-source programming that allows loyal, flexible, expressed figuring on groups of reserved servers. That utilization the Map-Reduce framework presented by Google by utilizing the idea of map and reduce functions that remarkable utilized in Functional Programming. In spite of the fact that the Hadoop structure is composed in Java, it permits designers to send custom-composed projects coded in Java or some other dialect to process information in a parallel manner over hundreds or a great many thing servers. It is streamlined for touching read requests (streaming reads), where transforming incorporates of checking all the information. Contingent upon the intricacy of the procedure and the volume of information, reaction time can change from minutes to hours. While Hadoop can forms information quick, so its key focal point is its monstrous adaptability. Hadoop is right now being utilized for file web seeks, email spam location, proposal motors, expectation in budgetary administrations, genome control in life sciences, furthermore, for investigation of unstructured information, for example, log, content, and click stream. While a considerable lot of these applications could indeed be actualized in a social database (RDBMS) figure 2, the primary center of the Hadoop system is practically not the same as a RDBMS. The accompanying examines some of these distinctions Hadoop is especially helpful when:  For handling of Complex data.  Need of conversion of unstructured information to structured information  Usage of SQL queries  Recursive calculations are very large  Complex geo-spatial investigation or genome sequencing  Machine learning  Data sets are so substantial it couldn't be possible fit into database RAM, plates, or require an excess of centers (10's of TB up to PB)  Data worth does not defend cost of steady ongoing accessibility, for example, files or exceptional interest information, which can be moved to Hadoop and stay accessible at lower expense  Results are not required continuously  Fault resistance is basic  Significant custom coding would be obliged to handle employment booking Hadoop was motivated by Google's Map Reduce, a product structure in which an application is separated into various little parts. Any of these parts (additionally called sections or squares) can be run on any hub in the group. Doug Cutting, Hadoop's maker, named the system after his youngster's full toy elephant. The current Apache Hadoop biological system comprises of the Hadoop bit, Map Reduce, the Hadoop circulated record framework (HDFS) what's more, various related activities, for example, Apache Hive, HBase and Zookeeper. The Hadoop structure is utilized by significant players including Google, Yahoo and IBM, generally for applications including web search tools and promoting. The favored working frameworks are Windows and Linux be that as it may Hadoop can likewise work with BSD and OS X.
  • 4.  ISSN: 2089-3272 IJEEI Vol. 4, No. 1, March 2016 : 74 – 80 77 A distributed file system framework is a customer/server-based application that permits customers to get to and process information put away on the server as though it were all alone PC. At the point when a client gets to a document on the server, the server sends the client a duplicate of the document, which is stored on the client's PC while the information is being prepared and is then come back to the server. Preferably, a convey record framework arranges document and registry administrations of individual servers into a worldwide catalog in such a way that remote information access is not area particular however is indistinguishable from any customer. All records are available to all clients of the worldwide record framework and association is progressive what more, registry based is. Since more than one customer may get to the same information all the while, the server must have a system in spot, (for example, keeping up data about the times of access) to arrange overhauls so that the customer dependably gets the most current adaptation of information and that information clashes don't emerge. Dispersed record frameworks ordinarily utilize record or database replication (conveying duplicates of information on different servers) to ensure against information access failures [4]. Sun Microsystems' Network File System (NFS), Novell NetWare, Microsoft's Distributed File Framework, and IBM/Transarc's DFS are a few samples of distributed file systems framework. 3. HDFS Hadoop frame work consists the Hadoop Distributed File System (HDFS) [4]. HDFS is planned and improved to store information more than a lot of ease equipment in an appropriated manner, as shown in Figure 3. Figure 3. HDFS architecture The structure of HDFS is master/slave. HDFS cluster consists of one Name Node, a master server that manages the classification system namespace and regulates access to files by clients. In addition, there square measure variety of information nodes, sometimes one per node within the cluster, that manage storage hooked up to the nodes that they run on. HDFS exposes a classification system namespace and permits user knowledge to be hold on in files. Internally, a file is split into one or a lot of blocks and these blocks square measure hold on during a set of information Nodes. The Name Node executes classification system namespace operations like opening, closing, and renaming files and directories. It conjointly determines the mapping of blocks to knowledge Nodes. The info Nodes square measure to responsibility for serving scans and write requests from the file system’s clients. The info Nodes conjointly performs block formation, deletion, and replication upon instruction from the Name Node. The Name Node and knowledge Node square measure items of package designed to run on goods machines. These machines generally run a GNU/Linux software system (OS). HDFS is constructed exploitation the Java language; any machine that supports Java will run the Name Node or the Data Node package. Usage of the extremely moveable Java language
  • 5. IJEEI ISSN: 2089-3272  Big Data-Survey (PSG Aruna Sri) 78 implies that HDFS will be deployed on a large variety of machines. A typical readying includes a dedicated machine that runs solely the Name Node package. Every of the opposite machines within the cluster runs one instance of the Data Node package. The design doesn't preclude running multiple knowledge Nodes on an equivalent machine however during a real readying that's seldom the case. The existence of Name Node during a cluster greatly simplifies the design of the system. The Name Node is that the intermediary and repository for all HDFS data. The system is intended to flow the user knowledge through the Name Node. The base Apache Hadoop structure is made out of the taking after modules: Hadoop Common-contains libraries and utilities required by other Hadoop modules. Hadoop Distributed File System (HDFS) - a appropriated document framework that stores information on product machines, giving high total data transfer capacity over the group. Hadoop Map Reduce – a programming model for huge scale information preparing. All the modules in Hadoop are planned with a basic presumption that equipment disappointments (of individual machines, or racks of machines) are normal also, consequently ought to be naturally taken care of in programming by the structure. Apache Hadoop’s Map Reduce and HDFS parts initially got individually from Google's Map Reduce and Google File System (GFS) papers."Hadoop" frequently alludes not to simply the base Hadoop bundle yet rather to the Hadoop Ecosystem fig.4 which incorporates the greater part of the extra programming bundles that can be introduced on top of or nearby Hadoop, for example, Apache Hive, Apache Pig and A HBase. 4. Map Reduce Framework Map Reduce as shown in Figure 4 is a product system for appropriated transforming of Big data sets on PC groups [5]. It is first grown by Google. Map Reduce is planned to encourage also, improve the preparing of incomprehensible measures of information in parallel on extensive bunches of merchandise equipment in a solid, issue tolerant way. Map Reduce is the key calculation that the Hadoop Map Reduce motor uses to circulate work around a bunch. Commonplace Hadoop bunch coordinates Map Reduce and HFDS layer. In Map Reduce layer job tracker assigns tasks to the task tracker. Master node job tracker also allots tasks to the slave node task tracker figure. Figure 4. Map reduce is based on the Master-Slave architecture Master node contains -  Job tracker node (Map Reduce layer)  Task tracker node (Map Reduce layer)  Name node (HFDS layer)  Data node (HFDS layer) Multiple slave nodes contain -  Task tracker node (Map Reduce layer)  Data node (HFDS layer)  Map Reduce layer has job and task tracker nodes  HFDS layer has name and data nodes A. Map Reduce core functionality (I): Map & Reduce stage plays an significant role in map reduce core functionality.
  • 6.  ISSN: 2089-3272 IJEEI Vol. 4, No. 1, March 2016 : 74 – 80 79 Map stage: In Map step, master node takes consideration of large problem input and divided into smaller problems and allotted to worker nodes. These nodes process the smaller problems and return to the master node. • Map (key1, value)  list<key2, value2> Reduce stage: In this Reduce stage, Master node takes the response from the sub problems and joins them in a predefined manner to get the output to original problem, as shown in Figure 5. • Reduce (key2, list < value2 >)  list Figure 5. Reduce stage 5. PIG Pigs [6] was at first shaped at Yahoo! to permit individuals utilizing Hadoop to concentrate all the more on examining substantial information sets and invest less moment of time is needed to compose mapper and reducer programs. Pig is comprised of two segments: the first is the is called Pig Latin and the second is a runtime situation where Pig Latin programs are executed. 6. HIVE Apache Hive [7] was initially developed by face book. It has data warehouse structure built on top of hadoop for analysis and inquiry of data. By default, Hive stores metadata in an installed Apache Derby Database and other customer/server databases like MySQL can alternatively be used. Right now, there are four document configurations upheld in Hive, which are TEXTFILE SEQUENCEFILE, ORC and RCFILE. 7. HBase HBase [8] is a section arranged database administration framework that keeps running on top of HDFS. HBase applications are composed in Java much like an average Map Reduce application. A HBase framework consists an arrangement of tables. Table Consists of rows and columns like a conventional database. 8. Issues Although a considerable measure of research is going on big data yet at the same time. Several ideas are still to be investigated. Scientists would attempt to upgrade security stage to enhance capacity of programming to discover propelled dangers, respond in like manner and would create preventive measures for future. Specialists would attempt to enhance quality and dependability of security framework. A few analysts are wanting to taken up information accumulation, pretreatment, incorporation, Map Reduce and investigation utilizing machine learning strategies. They would utilize the outcomes for securing and actualizing preventive measures from dangers to big business information. Specialists would attempt to outline the meet the creation needs of endeavors for growing high caliber item by applying efforts to
  • 7. IJEEI ISSN: 2089-3272  Big Data-Survey (PSG Aruna Sri) 80 establish safety with the assistance of big data Analytics with Hadoop. A few specialists are utilizing systems administration checking tools like Packet pig, Mahout and so on to Improve the security levels. Targeted threats will be analyzed by using hadoop cluster. 9. Conclusion Big data is going to keep developing amid the following years, and every information researcher will need to oversee considerably more measure of information will be more different, bigger, and speedier. We talked about a few experiences about the theme, and what we consider are the fundamental concerns and primary issues for what’s to come. Big data is turning into new final boundary for experimental information research and for business applications. Everyone is warmly welcomed to take part in this fearless trip. References [1] Ms Vibhavari Chavan et al. “Survey Paper on Big Data”. International Journal of Computer Science and Information Technologies. 2014; 5 (6), ISSN: 0975-9646. [2] Michael R Lyu. “Service-generated Big Data and Big Data-as-a-Service”. Internetware. 2013. [3] Suman Arora et al. “Survey Paper on Scheduling in Hadoop”. International Journal of Advanced Research in Computer Science and Software Engineering. 2014; 4. [4] Dhruba Borthakur. “The Hadoop Distributed File System: Architecture and Design”. The Apache Software Foundation. 2007. [5] Yaxiong Zhao et al. “Dache: A Data AwareCaching for Big-Data Applications Using the MapReduce Framework”. Tsinghua science and technology. 2014; 19(1), ISSN: l1007-0214l. [6] Apache Pig. Available at http://pig.apache.org [7] Apache Hive. Available at http://hive.apache.org [8] Apache HBase. Available at http://hbase.apache.org