“Data is a precious thing and will
last longer than the systems
themselves”
– Tim Berners-Lee
Sandeep Kumar
What is Data?
• What is data?
• And why should we care about it?
What is Big Data?
• Big data is a collection of data sets so large and
complex that it becomes difficult to process using
traditional data processing applications.
A Few Examples
• Web logs
• RFID
• Social data: Facebook, LinkedIn, Twitter
• Call detail records
• Large-scale e-commerce
• Medical records
• Video archives
• Atmospheric science
• Astronomy
• Feeds
• Media and advertising
What is Big Data?
• Ancestry.com stores around 2.5 petabytes of data.
• The New York Stock Exchange generates about one
terabyte of new trade data per day.
• The Internet Archive stores around 2 petabytes of
data and is growing at a rate of 20 terabytes per
month. (http://archive.org/web/web.php)
How to Process Big Data?
• We need to process large datasets (>100 TB)
• Just reading 100 TB of data can be overwhelming
• Takes ~11 days to read on a standard computer
• Takes about a day across a 10 Gbit/s link (a very high-end
storage solution)
• On a single node (@ 50 MB/s): 23 days
• On a 1000-node cluster: 33 min
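The single-node and cluster figures above follow from dividing the data volume by the aggregate read throughput. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes), a 10 Gbit/s link, and perfectly even parallelism across nodes; the function name `read_time_seconds` is illustrative only:

```python
TB = 10**12  # decimal terabyte in bytes (assumption)
MB = 10**6   # decimal megabyte in bytes (assumption)


def read_time_seconds(data_bytes, bytes_per_second, nodes=1):
    """Time to read data_bytes when `nodes` readers each sustain bytes_per_second."""
    return data_bytes / (bytes_per_second * nodes)


data = 100 * TB

single_node = read_time_seconds(data, 50 * MB)          # one node at 50 MB/s
cluster = read_time_seconds(data, 50 * MB, nodes=1000)  # 1000 nodes in parallel
link_10gbit = read_time_seconds(data, 10 * 10**9 / 8)   # 10 Gbit/s = 1.25 GB/s

print(f"single node:    {single_node / 86400:.1f} days")
print(f"1000 nodes:     {cluster / 60:.1f} minutes")
print(f"10 Gbit/s link: {link_10gbit / 86400:.2f} days")
```

Real clusters will not scale this cleanly, since scheduling, network, and skew all add overhead, but the order of magnitude is the point.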
Not so easy………..
• The challenges are in search, sharing, transfer,
visualization, etc.
• Moving data from a storage cluster to a computation cluster is not
feasible.
• In a large cluster, failure is expected: computers fail every day.
• It is very expensive to build reliability into each application.
• The software must be massively parallel, running on tens, hundreds,
or even thousands of servers.
• The programmer has to worry about errors, data motion, and
communication.
What we are looking for
• A common infrastructure and a standard set of tools to
handle this complexity.
• An efficient, reliable, fault-tolerant, and usable
framework.
What is Hadoop?
• It's a framework that allows distributed processing of
large data sets across clusters of computers.
• It is designed to scale up from single servers to
thousands of machines.
• It's also designed to run on commodity hardware.
Hadoop is….
• Scalable: stores and processes petabytes; capacity grows by
adding hardware, without needing to change data
formats.
• Economical: runs on thousands of commodity machines.
• Efficient: runs tasks where the data is located.
• Flexible: Hadoop is schema-less and can absorb any
type of data, structured or not, from any number of
sources.
• Fault tolerant: when you lose a node, the system
redirects work to another location of the data and
continues processing without missing a beat.
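The "efficient" and "fault tolerant" properties can be pictured together: a task is placed on a node that already holds a copy of its data, and if that node is lost, the work is redirected to another copy. The following is a toy sketch of that idea only, not Hadoop's actual scheduler; the block IDs, node names, and `pick_node` function are all hypothetical.

```python
# Hypothetical block-to-replica map: block id -> nodes holding a copy.
replicas = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-c", "node-d"],
}


def pick_node(block_id, live_nodes):
    """Return the first live node that already stores the block, or None."""
    for node in replicas[block_id]:
        if node in live_nodes:
            return node
    return None


live = {"node-a", "node-b", "node-c", "node-d"}
first_choice = pick_node("block-1", live)  # task runs where the data is

live.discard("node-a")                     # simulate losing a node
fallback = pick_node("block-1", live)      # work moves to another replica

print(first_choice, fallback)
```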
Hadoop is useful for…….
• Batch Data Processing.
• Log Processing.
• Document Analysis & Indexing.
• Text Mining.
• Crawl Data Processing.
• Highly parallel, data-intensive distributed applications.
Use The Right Tool For The Right Job
Hadoop: when to use?
• Write once, read many times
• Structured or not (agility)
• Batch processing
RDBMS: when to use?
• Interactive reporting (<1 sec)
• Multistep transactions
• Lots of inserts/updates/deletes
Hadoop Terminology…….
[Diagram: a Hadoop cluster made up of three racks, each holding several nodes]
• Rack 1: Node 1, Node 2, ..., Node 3
• Rack 2: Node 1, Node 2, ..., Node 3
• Rack 3: Node 1, Node 2, ..., Node 3
Hadoop Framework…….
Hadoop Nodes…….
• HDFS nodes
   ◦ NameNode (master)
   ◦ DataNode (slaves)
   ◦ Checkpoint Node
   ◦ Secondary NameNode (deprecated)
   ◦ Backup Node
Hadoop Nodes…….
• MapReduce nodes
   ◦ JobTracker (master)
   ◦ TaskTracker (slaves)
Hadoop Nodes-Overview
Hadoop Nodes-NameNode
• Manages the filesystem namespace and metadata
• Schedules re-replication of missing blocks
• No file data goes through the NameNode
• The NameNode's persistent state mainly consists of:
   ◦ fsimage: a checkpoint copy of the metadata on disk
   ◦ edit logs: record all write operations and are kept in sync with
the metadata in RAM after each write
   ◦ After a failure such as a power loss, the NameNode can recover
using fsimage + edit logs
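The recovery idea on this slide (start from the last checkpoint, then replay every logged write) can be sketched in a few lines. This is a simplified model under stated assumptions, not HDFS's on-disk format: the namespace is a plain dict, and the operation names `create`, `add_block`, and `delete` are invented for illustration.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged write operation to an in-memory namespace dict."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def recover(fsimage, edit_log):
    """Rebuild the namespace: start from the checkpoint, then replay the edits."""
    namespace = copy.deepcopy(fsimage)  # leave the checkpoint itself untouched
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace


# Hypothetical checkpoint and the writes logged since it was taken.
fsimage = {"/logs/a.log": {"blocks": ["blk_1"]}}
edit_log = [
    ("create", "/logs/b.log"),
    ("add_block", "/logs/b.log", "blk_2"),
    ("delete", "/logs/a.log"),
]

restored = recover(fsimage, edit_log)
print(restored)
```

The longer the edit log grows, the longer this replay takes, which is why periodic checkpoints are useful.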
Hadoop Nodes-Checkpoint Node
• Periodically creates checkpoints of the NameNode's filesystem metadata
• The Checkpoint node should run on a different machine
than the NameNode
• Should have the same storage requirements as the NameNode
• There can be many Checkpoint nodes per cluster
Hadoop Nodes-Backup Node
• Differs from the Checkpoint node in that it keeps an
up-to-date copy of the metadata in RAM
• Same RAM requirements as the NameNode
• There can be only one Backup node per cluster
Hadoop Nodes-Data Node
• Can be many per Hadoop cluster
• Manages blocks with data and serves them to
clients
• Periodically reports to the NameNode the list of
blocks it stores
• Use inexpensive commodity hardware for this
node
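The periodic block reports are what let the NameNode notice missing copies. A minimal sketch of that bookkeeping, assuming a replication target of 3 (HDFS's default, though configurable) and hypothetical DataNode and block names; the real protocol is considerably more involved:

```python
from collections import defaultdict

REPLICATION_TARGET = 3  # assumed target; configurable in a real cluster


def under_replicated(block_reports, target=REPLICATION_TARGET):
    """Given {datanode: [block ids]}, return blocks with fewer than `target` copies."""
    holders = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block in blocks:
            holders[block].add(datanode)
    return {b: sorted(nodes) for b, nodes in holders.items() if len(nodes) < target}


# Hypothetical reports: blk_2 is stored on only two DataNodes.
reports = {
    "datanode-1": ["blk_1", "blk_2"],
    "datanode-2": ["blk_1", "blk_2"],
    "datanode-3": ["blk_1"],
}

missing = under_replicated(reports)
print(missing)
```

Note that this sketch only sees blocks that at least one DataNode reports; a block with no surviving copy would have to be detected from the NameNode's own metadata.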
Hadoop Nodes-Job Tracker
• One per Hadoop cluster (multiple NameNodes can be configured in Hadoop 2.2 or later versions)
• Receives job requests submitted by clients
• Schedules and monitors MapReduce jobs on the task
trackers
Hadoop Nodes-Task Tracker
• Can be many per Hadoop cluster
• Executes MapReduce operations
• Reads blocks from DataNodes
Map Reduce
It offers:
• Operates on key and value pairs.
• Two major functions: Map() and Reduce()
• Input formats and splits
• Number of tasks.
• Provides status about jobs to users
• Monitors task progress
Map Reduce Diagram
Map Reduce Architecture.
Map Reduce Job.
[Diagram: a client, the JobTracker, the NameNode, and the TaskTrackers & DataNodes. Surviving step labels: "3. Namespace info" and "← 4. tasks"]
Input Output
The MapReduce framework operates on <key, value> pairs.
It views the input to the job as a set of <key, value> pairs and
produces a set of <key, value> pairs as the output of the job.
Input Output (continued)
Input and output types of a MapReduce job:
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
Reference:
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html
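The map -> combine -> reduce flow of key/value pairs can be illustrated with a word count run in plain Python. This is a single-process sketch of the data flow only, not the Hadoop API: the function names are illustrative, the input "splits" are hard-coded lists of lines, and the line index stands in for <k1>.

```python
from collections import defaultdict


def map_fn(_line_index, line):
    """<k1, v1> = (line index, line text) -> list of <k2, v2> = (word, 1)."""
    return [(word, 1) for word in line.split()]


def combine_fn(pairs):
    """Local pre-aggregation of one split's map output: <k2, v2> -> <k2, v2>."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def reduce_fn(word, counts):
    """<k2, [v2, ...]> -> <k3, v3> = (word, total count)."""
    return word, sum(counts)


def run_job(splits):
    """Map and combine each split, group values by key, then reduce each key."""
    grouped = defaultdict(list)
    for split in splits:
        mapped = []
        for line_index, line in enumerate(split):
            mapped.extend(map_fn(line_index, line))
        for word, count in combine_fn(mapped):
            grouped[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in sorted(grouped.items()))


splits = [["big data big", "hadoop"], ["data hadoop hadoop"]]
result = run_job(splits)
print(result)
```

The combine step is optional: it only shrinks the data handed to the reducers, which is why its input and output types are both <k2, v2>.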
HDFS Architecture
Hadoop Tools…….
• Hive
   ◦ A data warehouse system for Hadoop.
   ◦ Provides data summarization, query, and analysis.
Hadoop Tools…….
• Pig
   ◦ A high-level platform for creating MapReduce
programs used with Hadoop.
   ◦ Developed by Yahoo.
Hadoop Tools…….
• HBase
   ◦ Used when you need random, real-time read/write access to
your Big Data.
   ◦ Also used for storing historical data.
Hadoop Tools…….
• Hue
   ◦ A web application for interacting with Apache Hadoop.
It supports a file browser, a job tracker interface, Hive, Pig,
and more.
Hadoop Tools…….
• Sqoop
   ◦ A command-line application for transferring
data between relational databases and Hadoop.
   ◦ Microsoft uses a Sqoop-based connector to help transfer
data from Microsoft SQL Server databases to Hadoop.
Hadoop Tools…….
• Flume
   ◦ Used for efficiently collecting, aggregating, and
moving large amounts of distributed data or log data.
Hadoop Tools…….
• Flume Model
Hadoop in the Enterprise…….
Many tools have been built on top of Hadoop; they are available on the
market and widely used in industry.
More information is available from Cloudera and Hortonworks, or through
a Google search.
Thanks for your time today.

Hadoop-Quick introduction
