SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Presented by,
Atul Kushwaha
B.Tech, 3rd
Year(IT)
1109113033
Introduction
Big Data:
•Big data is a term used to describe the voluminous
amount of unstructured and semi-structured data a
company creates.
•Data that would take too much time and cost too much
money to load into a relational database for analysis.
• Big data doesn't refer to any specific quantity, the term
is often used when speaking about petabytes and
exabytes of data.
• The New York Stock Exchange generates about one terabyte of new
trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one
petabyte of storage.
• The Internet Archive stores around 2 petabytes of data, and is
growing at a rate of 20 terabytes per month.
• The Large Hadron Collider near Geneva, Switzerland, produces about
15 petabytes of data per year.
Facts!!
So What Is The Problem?
 The transfer speed is around 100 MB/s
 A standard disk is 1 Terabyte
 Time to read entire disk= 10000 seconds or 3 Hours!
 Increase in processing time may not be as helpful because
• Network bandwidth is now more of a limiting factor
• Physical limits of processor chips have been reached
So What do We Do?
•The obvious solution is that
we use multiple processors to
solve the same problem by
fragmenting it into pieces.
•Imagine if we had 100 drives,
each holding one hundredth
of the data. Working in
parallel, we could read the
data in under two minutes.
Distributed Computing Vs
Parallelization
 Parallelization- Multiple processors or CPU’s
in a single machine
 Distributed Computing- Multiple computers
connected via a network
Problems In Distributed Computing
• Hardware Failure:
As soon as we start using many pieces of
hardware, the chance that one will fail is fairly
high.
• Combine the data after analysis:
Most analysis tasks need to be able to combine
the data in some way; data read from one
disk may need to be combined with the data
from any of the other 99 disks.
To The Rescue!
Apache Hadoop is a framework for running applications on large
cluster built of commodity hardware.
A common way of avoiding data loss is through replication:
redundant copies of the data are kept by the system so that in the
event of failure, there is another copy available. The Hadoop
Distributed Filesystem (HDFS), takes care of this problem.
The second problem is solved by a simple programming model-
Mapreduce. Hadoop is the popular open source implementation
of MapReduce, a powerful tool designed for deep analysis and
transformation of very large data sets.
What is ? ?
• It is an open source project by the Apache Foundation to handle
large data processing
• It was inspired by Google’s MapReduce and Google File System
(GFS) papers in 2003 and 2004
• It was originally conceived by Doug Cutting in 2005 and first used by
Yahoo! in 2006
• It is named after his son’s pet elephant incidentally
• It is basically a distributed file system which is written in Java.
Hadoop Approach to Distributed
Computing
 The theoretical 1000-CPU machine would cost a very
large amount of money, far more than 1,000 single-CPU.
 Hadoop will tie these smaller and more reasonably priced
machines together into a single cost-effective computer
cluster.
 Hadoop provides a simplified programming model which
allows the user to quickly write and test distributed
systems, and its’ efficient, automatic distribution of data
and work across machines and in turn utilizing the
underlying parallelism of the CPU cores.
Hadoop Components
HDFSHDFS
Storage
Self-healing
high-bandwidth
clustered storage
MapReduceMapReduce
Processing
Fault-tolerant
distributed
processing
Hadoop MapReduce
 MapReduce is a programming model
 Programs written in this functional style are automatically parallelized and
executed on a large cluster of commodity machines
 MapReduce is an associated implementation for processing and generating
large data sets.
The Programming Model Of MapReduce
 Map, written by the user, takes an input pair and produces a set of intermediate
key/value pairs. The MapReduce library groups together all intermediate values
associated with the same intermediate key I and passes them to the Reduce
function.
 The Reduce function, also written by the user, accepts an intermediate key I and a set of values
for that key. It merges together these values to form a possibly smaller set of values
 Filesystems that manage the storage across a network of machines
are called distributed filesystems.
 Hadoop comes with a distributed filesystem called HDFS, which
stands for Hadoop Distributed Filesystem.
 HDFS, the Hadoop Distributed File System, is a distributed file
system designed to hold very large amounts of data (terabytes or even
petabytes), and provide high-throughput access to this information.
 The Hadoop distributed file system (HDFS) is a distributed,
scalable, and portable file-system written in Java for the Hadoop
framework.
HADOOP DISTRIBUTED
FILESYSTEM (HDFS)
HDFS
 It manages storage on the cluster by breaking
incoming files into pieces, called blocks
 Stores each of the blocks redundantly across the
pool of servers
 It stores three complete copies of each file by
copying each piece to three different servers
Namenodes and Datanodes
 A HDFS cluster has two types of node operating in a master-slave
pattern: a namenode (the master) and a number of datanodes (slave).
 The namenode manages the filesystem namespace. It maintains the
filesystem tree and the metadata for all the files and directories in the
tree.
 Datanodes are the work horses of the filesystem.It manages storage
attached to the nodes that they run on.
 HDFS exposes a file system namespace and allows user data to be
stored in files.
 Internally, a file is split into one or more blocks and these blocks are
stored in a set of DataNodes.
Who Uses Hadoop?
Advantages Over RDBMS
• Scalable: It can reliably store and process petabytes.
• Economical: It distributes the data and processing across
clusters of commonly available computers (in
thousands).
• Efficient: By distributing the data, it can process it in
parallel on the nodes where the data is located.
• Reliable: It automatically maintains multiple copies of
data and automatically redeploys computing tasks based
on failures.
Conclusion
 So major companies like facebook
amazon,yahoo,etc. are adapting Hadoop and in
future there can be many names in the list.
 This technology has bright future scope because day
by day need of data would increase and security
issues also the major point.
 Hence Hadoop Technology is the best appropriate
approach for handling the large data in smart way
and its future is bright…
Questions???
‘Thank You’

Weitere ähnliche Inhalte

Was ist angesagt?

The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
J Singh
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Simplilearn
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
Simplilearn
 

Was ist angesagt? (20)

What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Hadoop
HadoopHadoop
Hadoop
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 

Andere mochten auch (6)

Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Computer network ppt
Computer network pptComputer network ppt
Computer network ppt
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Basic concepts of computer Networking
Basic concepts of computer NetworkingBasic concepts of computer Networking
Basic concepts of computer Networking
 

Ähnlich wie Hadoop Technology

Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
saili mane
 

Ähnlich wie Hadoop Technology (20)

Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
 
Hadoop
HadoopHadoop
Hadoop
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptx
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Cppt
CpptCppt
Cppt
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptx
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 

Kürzlich hochgeladen

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Kürzlich hochgeladen (20)

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 

Hadoop Technology

  • 1. Presented by, Atul Kushwaha B.Tech, 3rd Year(IT) 1109113033
  • 2. Introduction Big Data: •Big data is a term used to describe the voluminous amount of unstructured and semi-structured data a company creates. •Data that would take too much time and cost too much money to load into a relational database for analysis. • Big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data.
  • 3. • The New York Stock Exchange generates about one terabyte of new trade data per day. • Facebook hosts approximately 10 billion photos, taking up one petabyte of storage. • The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month. • The Large Hadron Collider near Geneva, Switzerland, produces about 15 petabytes of data per year. Facts!!
  • 4. So What Is The Problem?  The transfer speed is around 100 MB/s  A standard disk is 1 Terabyte  Time to read entire disk= 10000 seconds or 3 Hours!  Increase in processing time may not be as helpful because • Network bandwidth is now more of a limiting factor • Physical limits of processor chips have been reached
  • 5. So What do We Do? •The obvious solution is that we use multiple processors to solve the same problem by fragmenting it into pieces. •Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.
  • 6. Distributed Computing Vs Parallelization  Parallelization- Multiple processors or CPU’s in a single machine  Distributed Computing- Multiple computers connected via a network
  • 7. Problems In Distributed Computing • Hardware Failure: As soon as we start using many pieces of hardware, the chance that one will fail is fairly high. • Combine the data after analysis: Most analysis tasks need to be able to combine the data in some way; data read from one disk may need to be combined with the data from any of the other 99 disks.
  • 8. To The Rescue! Apache Hadoop is a framework for running applications on large cluster built of commodity hardware. A common way of avoiding data loss is through replication: redundant copies of the data are kept by the system so that in the event of failure, there is another copy available. The Hadoop Distributed Filesystem (HDFS), takes care of this problem. The second problem is solved by a simple programming model- Mapreduce. Hadoop is the popular open source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets.
  • 9. What is ? ? • It is an open source project by the Apache Foundation to handle large data processing • It was inspired by Google’s MapReduce and Google File System (GFS) papers in 2003 and 2004 • It was originally conceived by Doug Cutting in 2005 and first used by Yahoo! in 2006 • It is named after his son’s pet elephant incidentally • It is basically a distributed file system which is written in Java.
  • 10. Hadoop Approach to Distributed Computing  The theoretical 1000-CPU machine would cost a very large amount of money, far more than 1,000 single-CPU.  Hadoop will tie these smaller and more reasonably priced machines together into a single cost-effective computer cluster.  Hadoop provides a simplified programming model which allows the user to quickly write and test distributed systems, and its’ efficient, automatic distribution of data and work across machines and in turn utilizing the underlying parallelism of the CPU cores.
  • 12. Hadoop MapReduce  MapReduce is a programming model  Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines  MapReduce is an associated implementation for processing and generating large data sets.
  • 13. The Programming Model Of MapReduce  Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
  • 14.  The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values
  • 15.  Filesystems that manage the storage across a network of machines are called distributed filesystems.  Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem.  HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes), and provide high-throughput access to this information.  The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file-system written in Java for the Hadoop framework. HADOOP DISTRIBUTED FILESYSTEM (HDFS)
  • 16. HDFS  It manages storage on the cluster by breaking incoming files into pieces, called blocks  Stores each of the blocks redundantly across the pool of servers  It stores three complete copies of each file by copying each piece to three different servers
  • 17. Namenodes and Datanodes  A HDFS cluster has two types of node operating in a master-slave pattern: a namenode (the master) and a number of datanodes (slave).  The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree.  Datanodes are the work horses of the filesystem.It manages storage attached to the nodes that they run on.  HDFS exposes a file system namespace and allows user data to be stored in files.  Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes.
  • 18.
  • 20. Advantages Over RDBMS • Scalable: It can reliably store and process petabytes. • Economical: It distributes the data and processing across clusters of commonly available computers (in thousands). • Efficient: By distributing the data, it can process it in parallel on the nodes where the data is located. • Reliable: It automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
  • 21. Conclusion  So major companies like facebook amazon,yahoo,etc. are adapting Hadoop and in future there can be many names in the list.  This technology has bright future scope because day by day need of data would increase and security issues also the major point.  Hence Hadoop Technology is the best appropriate approach for handling the large data in smart way and its future is bright…

Hinweis der Redaktion

  1. Gm…I think everybody heard abt hadoop bcz before fews days an institutional lecture was held in our clg but nobody attended it…so I m give u a brief introduction abt hadoop what it is,what is the need of it,its feature and applications.basically hadoop deals with big data.
  2. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users.[2] It is licensed under the Apache License 2.0.
  3. Hadoop is designed to efficiently process large volumes of information by connecting many commodity computers together to work in parallel. A 1000 CPU single machine (i.e a supercomputer with a vast memory storage) would cost a lot. Thus Hadoop parallelizes the computation by tying smaller and more reasonably priced machines together into a single costeffective compute cluster.
  4. Hadoop is designed to efficiently process large volumes of information by connecting many commodity computers together to work in parallel. A 1000 CPU single machine (i.e a supercomputer with a vast memory storage) would cost a lot. Thus Hadoop parallelizes the computation by tying smaller and more reasonably priced machines together into a single costeffective compute cluster.
  5. HDFS cluster/healing. MapReduce
  6. MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.[1]
  7. First explain distributed file system?-managing storage across n/w of machines.
  8. Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
  9. Rackspace for log processing. Netflix for recommendations. LinkedIn for social graph. SU for page recommendations.