Hadoop is an open source software framework for distributed storage and processing of large datasets across clusters of commodity servers. It has several related projects including Pig, Hive, Mahout, Avro, ZooKeeper, and Chukwa. Large companies like Yahoo, Facebook, and Amazon use Hadoop to process petabytes of data daily on clusters of thousands of servers.
2. Hadoop – What is it?
● An open-source framework developed in Java
● Supports very large data sets
● Supports large clusters of servers
● Designed to run on pre-existing, low-cost hardware
● Distributes processing work across the cluster
● Distributes data storage across the cluster
● Provides resilience via automatic failure handling
3. Hadoop – Architecture
Hadoop consists of
● Hadoop Common
Common utilities for Hadoop module support
● Hadoop MapReduce
Parallel processing of Hadoop data
● Hadoop YARN
Job scheduling and cluster resource management
● Hadoop Distributed File System (HDFS)
A master/slave file system that spreads Hadoop data over a very
large cluster of slave data nodes, coordinated by a single master name node.
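To make the MapReduce model above concrete, here is a minimal sketch of the map/shuffle/reduce flow in plain Java. This is a conceptual illustration, not the real Hadoop API: the class and method names are illustrative, and the in-memory grouping stands in for Hadoop's distributed shuffle.

```java
import java.util.*;
import java.util.stream.*;

// Conceptual word-count sketch of the MapReduce model (not the Hadoop API).
// Map emits (word, 1) pairs, the shuffle groups pairs by key,
// and reduce sums the counts for each word.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Reduce phase: sum all counts emitted for one word.
    static int reduce(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data on big clusters", "big data");

        // Shuffle: group intermediate pairs by key, as the framework
        // does between the map and reduce phases.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input)
            for (Map.Entry<String, Integer> kv : map(line))
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());

        grouped.forEach((word, counts) ->
                System.out.println(word + "\t" + reduce(counts)));
    }
}
```

In a real Hadoop job, many map tasks run in parallel on the data nodes holding each block of input, and the shuffle moves intermediate pairs across the network to the reduce tasks.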
5. Hadoop – Related Projects
● Pig – for analysing large data sets
● Hive – data warehouse system for Hadoop
● Mahout – machine learning and data mining
● Avro – a data serialization system
● ZooKeeper – helps build distributed applications
● Chukwa – data collection and analysis
6. Hadoop – Related Projects
● Hue – Hadoop user interface
● Oozie – workflow scheduler
● Hama – bulk synchronous parallel framework
– For massive scientific computations
● Nutch – web crawler
● HBase – non-relational database
7. Hadoop – Large Users
● Yahoo
– 10,000 core Linux cluster
● Facebook
– 100 petabytes, growing at 0.5 petabytes a day
● Amazon
– It's possible to run Hadoop on Amazon's EC2 and S3
8. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems