Hadoop

Nishant Gandhi, Co-Founder at Patistic Innovations
Hadoop
        Framework for Distributed Applications




                                        Nishant M Gandhi
                                BE 4th Year, Comp. Eng.
C K Pithawalla College of Engineering & Technology, Surat.
Hadoop

• Introduction
• History
• Key Technologies
  – MapReduce
  – HDFS
• Other Projects on Hadoop
• Conclusion
Introduction:

What is Hadoop?
 Hadoop is a framework for running applications on large clusters
 built of commodity hardware.
                          ----HADOOP WIKI


  Hadoop is a free, Java-based programming framework that
  supports the processing of large data sets in a distributed
  computing environment.
Introduction (contd.)


 #1 Open source
 #2 Part of the Apache Software Foundation
 #3 Written in Java
 #4 Supported by large web companies




#1 Implements Google's MapReduce computation model
#2 Hadoop Distributed File System (HDFS), inspired by the Google File
System (GFS)
#3 Used for cluster and distributed computing
#4 Support from…
History:
Inventor: Doug Cutting, creator of Apache Lucene

The Origin of the Name “Hadoop”:

The name my kid gave a stuffed yellow elephant. Short, relatively easy to
spell and pronounce, meaningless, and not used elsewhere: those are my
naming criteria. ---Doug Cutting

It started with building a web search engine
   •Nutch, in 2002
   •The aim was to index billions of pages
   •Its architecture could not scale to billions of pages
Google's GFS paper (2003) showed how to solve the storage problem
   •Nutch Distributed Filesystem (NDFS), in 2004
Google's MapReduce paper, in 2004
   •MapReduce implemented in Nutch, in 2005
In February 2006, NDFS and MapReduce moved out of Nutch to form an
independent subproject of Lucene called Hadoop.
History (contd.)
At around the same time, Doug Cutting joined Yahoo!
In February 2008, Yahoo! announced that its production search index
was being generated by a 10,000-core Hadoop cluster.




 In January 2008, Hadoop was made its own top-level project at
 Apache, confirming its success and its diverse, active community.
 By this time Hadoop was being used by many other companies
 besides Yahoo! such as
     • Last.fm
     • Facebook
     • The New York Times
     • Twitter
     • Microsoft
     • IBM
Key Technologies:
   •MapReduce
     -Parallel programming model for computation
     -Developed at Google

   •Hadoop Distributed File System
     -Distributed file system for large data sets
     -Inspired by the Google File System
Key Technologies: MapReduce

 •   Programming model developed at Google

 •   Sort/merge based distributed computing

 •   Initially intended for Google's internal search/indexing
     applications, it is now used extensively by many other organizations
     (e.g., Yahoo!, Amazon.com, IBM).

 •   It follows a functional programming style (the map and reduce
     primitives of Lisp), which makes it naturally parallelizable across
     a large cluster of workstations or PCs.

 •   The underlying system takes care of partitioning the
     input data, scheduling the program's execution across several
     machines, handling machine failures, and managing the required
     inter-machine communication. (This is the key to Hadoop's
     success.)
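To make the map, sort/merge and reduce phases concrete, here is a minimal single-process sketch of the classic word-count job. It only simulates the data flow in Python; it does not use Hadoop's Java API, and the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one record.

    doc_id is the input key; word count does not need it.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group intermediate values by key and order the groups by key,
    mimicking the sort/merge step the framework runs between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine every value emitted for a single key."""
    return word, sum(counts)


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    # On a cluster, map calls for different records could run on different
    # machines; here they simply run one after another in a single process.
    intermediate = [
        pair for doc_id, text in documents.items() for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(word, counts) for word, counts in shuffle_and_sort(intermediate))


if __name__ == "__main__":
    docs = {"d1": "the quick brown fox", "d2": "the lazy dog", "d3": "The fox"}
    print(word_count(docs))
    # → {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because the map and reduce functions depend only on their inputs, a framework can rerun a failed task elsewhere and get the same result, which is one reason the automatic failure handling described above is practical.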
Key Technologies: HDFS

 At Google, MapReduce operations run on a special file system
  called the Google File System (GFS), which is highly optimized for this
  purpose.

 GFS is not open source.

 Doug Cutting and others (later at Yahoo!) built an open-source file
  system modeled on the design Google published for GFS and called it
  the Hadoop Distributed File System (HDFS).
Key Technologies: HDFS

  •   Very Large Distributed File System
      – 10K nodes, 100 million files, 10 PB
  •   Assumes Commodity Hardware
      – Files are replicated to handle hardware failure
      – Detects failures and recovers from them
  •   Optimized for Batch Processing
      – Data locations exposed so that computations can move to
      where data resides
      – Provides very high aggregate bandwidth
  •   Runs in user space on heterogeneous operating systems
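The replication and data-locality points above can be illustrated with a toy model. This Python sketch splits a file into fixed-size blocks, places each block on three distinct nodes, re-replicates blocks after a node failure, and prefers a node that already holds a block when scheduling a task. The block size, node names, random placement and all function names are assumptions for illustration; real HDFS uses a NameNode with a rack-aware placement policy and is far more involved.

```python
import random
from typing import Dict, List, Set

REPLICATION = 3  # assumed replication factor (HDFS's default is also 3)
BLOCK_SIZE = 128  # toy size in bytes; real HDFS blocks default to 64 MB or 128 MB, depending on version


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks: int, nodes: List[str], rng: random.Random) -> Dict[int, Set[str]]:
    """Put each block on REPLICATION distinct nodes chosen at random.

    Real HDFS is rack-aware; random choice keeps the sketch short.
    """
    return {block: set(rng.sample(nodes, REPLICATION)) for block in range(num_blocks)}


def handle_node_failure(placement: Dict[int, Set[str]], dead: str,
                        live_nodes: List[str], rng: random.Random) -> None:
    """Forget replicas on the dead node, then copy under-replicated blocks to other live nodes."""
    for replicas in placement.values():
        replicas.discard(dead)
        candidates = [n for n in live_nodes if n not in replicas]
        while len(replicas) < REPLICATION and candidates:
            target = rng.choice(candidates)
            replicas.add(target)
            candidates.remove(target)


def schedule_task(block: int, placement: Dict[int, Set[str]], idle_nodes: List[str]) -> str:
    """Prefer an idle node that already stores the block, so computation moves to the data."""
    local = [n for n in idle_nodes if n in placement[block]]
    return local[0] if local else idle_nodes[0]


if __name__ == "__main__":
    rng = random.Random(42)
    nodes = [f"node{i}" for i in range(5)]
    blocks = split_into_blocks(300)  # [128, 128, 44]
    placement = place_blocks(len(blocks), nodes, rng)
    live = [n for n in nodes if n != "node0"]
    handle_node_failure(placement, "node0", live, rng)
    for block, replicas in placement.items():
        print(block, sorted(replicas), "->", schedule_task(block, placement, live))
```

Scheduling on a node that already holds a replica avoids shipping the block over the network, which is how "moving computation to the data" yields high aggregate bandwidth.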
Other Projects on Hadoop:

        ZooKeeper: A coordination service for distributed applications.



        Pig: A high-level data-flow language and execution
        framework for parallel computation.



        Hive: A data warehouse infrastructure that provides
        data summarization and ad hoc querying.



         Chukwa: A data collection system for managing
         large distributed systems.
Other Projects on Hadoop:

               Avro: Apache Avro is a data serialization system.
               Avro provides:
               •Rich data structures.
               •A compact, fast, binary data format.
               •A container file, to store persistent data.
               •Simple integration with dynamic languages.
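To show what "compact, fast, binary" means in practice, this sketch hand-encodes values following the binary encoding in the Avro specification: `int` and `long` use zig-zag, variable-length integers, a string is its byte length (as a long) followed by UTF-8 bytes, and a record is its fields written back to back in schema order with no names or tags. It is an illustration in plain Python for a hypothetical two-field schema, not the API of any Avro library.

```python
def zigzag(n: int) -> int:
    """Zig-zag map a signed 64-bit value to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value does not fit in an Avro long")
    return (n << 1) ^ (n >> 63)


def encode_varint(u: int) -> bytes:
    """Write 7 bits per byte, low-order group first; the high bit marks 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_long(n: int) -> bytes:
    """Avro int and long both use zig-zag followed by varint encoding."""
    return encode_varint(zigzag(n))


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(name: str, age: int) -> bytes:
    """Hypothetical record schema {name: string, age: int}: fields are written
    back to back in schema order, with no field names or type tags."""
    return encode_string(name) + encode_long(age)


if __name__ == "__main__":
    print([encode_long(v).hex() for v in (0, -1, 1, -2, 2, -64, 64)])
    # → ['00', '01', '02', '03', '04', '7f', '8001']
    print(encode_user("Ann", 30).hex())  # → 06416e6e3c (5 bytes)
```

Small magnitudes of either sign fit in a single byte, and because the schema travels separately (an Avro container file stores it in its header), each record carries no field names.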


               HBase: Just as Google's Bigtable leverages the
               distributed data storage provided by the
               Google File System, HBase provides
               Bigtable-like capabilities on top of
               Hadoop Core.
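The Bigtable data model that HBase provides can be pictured as a sparse, sorted map from (row key, column, timestamp) to value. The toy class below models only that idea in plain Python: row keys kept in sorted order so range scans are cheap, and several timestamped versions per cell with the newest returned first. The class `ToyBigtable` and its methods are hypothetical and are not the HBase client API.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToyBigtable:
    """Sparse sorted map: row key -> column ("family:qualifier") -> versions.

    Each cell keeps a list of (timestamp, value) pairs, newest first.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = {}
        self._sorted_keys: List[str] = []  # row keys kept in sorted order for range scans

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest value of a cell, or None if the cell is absent."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield rows with start <= key < stop in key order, newest value per column."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        for key in self._sorted_keys[lo:hi]:
            yield key, {col: versions[0][1] for col, versions in self._rows[key].items()}


if __name__ == "__main__":
    table = ToyBigtable()
    table.put("user2", "info:name", "Bo", ts=1)
    table.put("user1", "info:name", "Al", ts=1)
    table.put("user1", "info:name", "Alice", ts=2)  # newer version of the same cell
    print(table.get("user1", "info:name"))  # → Alice
    print(list(table.scan("user0", "user9")))
    # → [('user1', {'info:name': 'Alice'}), ('user2', {'info:name': 'Bo'})]
```

Only cells that were written take up space, which is what "sparse" means here; in HBase itself the sorted rows are split into regions and stored as files in HDFS.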
Hadoop Architecture on Dell C Series Servers: [diagram]
Conclusion:



Hadoop has been a very effective solution for companies dealing
 with data at petabyte scale.

It has solved many industry problems in large-scale data
 management and distributed systems.

Because it is open source, it has been widely adopted by companies.
Thank you.