Introducing:

The Modern Data Operating System
Hadoop is ...
A scalable, fault-tolerant distributed system for data storage and
processing (open source under the Apache license)
- Core Hadoop has two main systems:
● Hadoop Distributed FileSystem (HDFS):
self-healing, high-bandwidth clustered storage

● MapReduce: distributed fault-tolerant
resource management and scheduling
coupled with a scalable data programming
abstraction
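
To make the "data programming abstraction" concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps, using word count as the job. It runs in a single process and only illustrates the programming model; the function names are made up for this example and are not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map over every line, then shuffle, then reduce each key."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "hadoop stores big data"])
print(counts)
```

In a real cluster the map calls run on the machines that hold the input blocks and the shuffle moves data across the network; the shape of the program stays the same.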
Hadoop Origins

GFS >>> HDFS
Map/Reduce >>> MapReduce
BigTable >>> [image not captured]
Hadoop Chronicles

[Figure: GFS, Map/Reduce, BigTable, Doug Cutting]
Etymology
● Hadoop was created in 2004
by Douglass (Doug) Cutting
● Implements the ideas in Google's
File System and MapReduce papers
● Built to index the internet,
Google-style, for the startup
search engine 'Nutch'
● Named after his son's favourite
toy, an elephant-shaped toy
called Hadoop
What is Big Data?
"In Information Technology, big data is a loosely
defined term used to describe data sets so large and
complex that they become awkward to work with
using on-hand database management tools."
Wikipedia
How big is big?
● 2008: Google processes 20PB a day
● 2012: Facebook ingests 500TB of data a day
● 2009: eBay has 6.5 PB user data + 50 TB a day
● 2011: Yahoo! has 180-200 PB of data
Limitations of Existing Analytics Architecture

Layers, top to bottom:
- BI Reports + Online Apps
- RDBMS (aggregated data)
- ETL (Extract, Transform & Load)
- Storage Grid
- Data Collection (mostly append)
- Instrumentation (Raw Data Sources)

Problems:
- Can't explore original raw data
- Moving data from storage to compute doesn't scale!
- Archiving = premature death
Why Hadoop?
Challenge: Read 1 TB of data

1 Machine
- 4 IO channels
- Each channel: 100 MB/s
- Time: about 45 minutes

10 Machines
- 4 IO channels each
- Each channel: 100 MB/s
- Time: about 4.5 minutes
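
The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a perfectly even split of the data, and channels that all read in parallel at full speed. Under those assumptions the ideal times come out slightly below the slide's round numbers.

```python
def read_time_minutes(total_bytes, machines, channels_per_machine, bytes_per_sec_per_channel):
    """Minutes needed to scan the data when every channel reads in parallel."""
    aggregate_bytes_per_sec = machines * channels_per_machine * bytes_per_sec_per_channel
    return total_bytes / aggregate_bytes_per_sec / 60

TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

one = read_time_minutes(1 * TB, machines=1, channels_per_machine=4,
                        bytes_per_sec_per_channel=100 * MB)
ten = read_time_minutes(1 * TB, machines=10, channels_per_machine=4,
                        bytes_per_sec_per_channel=100 * MB)
print(f"1 machine: {one:.1f} min, 10 machines: {ten:.1f} min")
```

The point of the slide is the ratio: ten times the machines gives ten times the aggregate bandwidth, so the scan takes a tenth of the time.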
Hadoop and Friends
The Key Benefit: Agility/Flexibility

Schema-On-Write (RDBMS)
- Schema must be created before any data can be loaded
- An explicit load operation has to take place which transforms data to the DB's internal structure
- New columns must be added explicitly before new data for such columns can be loaded into the database
- Reads are fast
- Standards / Governance

Schema-On-Read (Hadoop)
- Data is simply copied to the file store, no transformations are needed
- A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns (late binding)
- New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it
- Load is fast
- Flexibility / Agility
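
The schema-on-read idea can be sketched in a few lines. In this toy example the "file store" is a Python list of raw tab-separated lines, and the two `serde_*` functions are hypothetical stand-ins for a SerDe; none of the names come from Hadoop or Hive.

```python
# Raw events are stored exactly as they arrive: no schema, no transformation.
raw_store = []

def load(line):
    """'Load' is just an append, which is why it is fast."""
    raw_store.append(line)

def serde_v1(line):
    """Hypothetical SerDe that knows only two columns."""
    fields = line.split("\t")
    return {"user": fields[0], "url": fields[1]}

def serde_v2(line):
    """Updated SerDe that also extracts an optional third column."""
    fields = line.split("\t")
    row = {"user": fields[0], "url": fields[1]}
    row["referrer"] = fields[2] if len(fields) > 2 else None
    return row

def query(serde, column):
    """The schema is bound at read time: parse every raw line, then project."""
    return [serde(line).get(column) for line in raw_store]

load("alice\t/home")
load("bob\t/cart\tgoogle.com")  # a new field starts flowing before any schema change

before = query(serde_v1, "referrer")  # the old SerDe cannot see the new field
after = query(serde_v2, "referrer")   # the new SerDe sees it retroactively
print(before, after)
```

Nothing was reloaded between the two queries; only the parser changed. The cost is that every read pays for parsing, which is the trade-off against the fast reads of schema-on-write.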
Hadoop Components
Master/Slave Architecture
- HDFS: Name Node (master), Data Nodes (slaves)
- MapReduce: Job Tracker (master), Task Trackers (slaves)
r=3

NameNode
File metadata:
/kenshoo/data1.txt ---> 1,2,3
/kenshoo/data2.txt ---> 4,5

hdfs-site.xml: dfs.replication = 3

Data Nodes
[Figure: blocks 1 to 5 spread across the Data Nodes, three replicas of each]
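
As a rough illustration of what the diagram shows, the sketch below keeps NameNode-style metadata (file path to block IDs) and assigns each block to three distinct Data Nodes. The node names, the count of five nodes, and the random placement are assumptions made for the example; real HDFS uses a rack-aware policy rather than a random draw.

```python
import random

REPLICATION = 3  # mirrors dfs.replication = 3 (r=3 on the slide)
DATA_NODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical node names

# NameNode-style metadata from the slide: file path -> block IDs
file_blocks = {
    "/kenshoo/data1.txt": [1, 2, 3],
    "/kenshoo/data2.txt": [4, 5],
}

def place_blocks(file_blocks, nodes, replication, seed=0):
    """Assign every block to `replication` distinct nodes (random, not rack-aware)."""
    rng = random.Random(seed)
    placement = {}
    for blocks in file_blocks.values():
        for block in blocks:
            placement[block] = rng.sample(nodes, replication)
    return placement

def surviving_replicas(placement, failed_nodes):
    """Count the replicas of each block that remain after some nodes fail."""
    return {block: len([n for n in nodes if n not in failed_nodes])
            for block, nodes in placement.items()}

placement = place_blocks(file_blocks, DATA_NODES, REPLICATION)
after_failure = surviving_replicas(placement, failed_nodes={"dn1"})
print(placement)
print(after_failure)
```

With three replicas on distinct nodes, losing any single node leaves at least two copies of every block, which is what lets the cluster re-replicate and "self-heal".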
Underlying FS options

ext3
- released in 2001
- Used by Yahoo!
- bootstrap + format slow
- set:
  - noatime
  - tune2fs (to turn off reserved blocks)

ext4
- released in 2008
- Used by Google
- As fast as XFS
- set:
  - delayed allocation off
  - noatime
  - tune2fs (to turn off reserved blocks)

XFS
- released in 1993
- Fast
- Drawbacks:
  - deleting a large number of files
Sample HDFS shell Commands
bin/hadoop fs -ls
bin/hadoop fs -mkdir
bin/hadoop fs -copyFromLocal
bin/hadoop fs -copyToLocal
bin/hadoop fs -moveToLocal
bin/hadoop fs -rm
bin/hadoop fs -tail
bin/hadoop fs -chmod
bin/hadoop fs -setrep -w 4 -R /dir1/s-dir

Mounting using FUSE:
hadoop-fuse-dfs dfs://10.73.9.50 /hdfs
Network Topology

[Figure: Name Node, Job Tracker and HBase Master, with replicas of blocks 2 to 5 spread across Data Nodes in Rack 1, Rack 2 and Rack 3]

Yahoo! Installation
- 8 core switches
- 100 racks
- 40 servers/rack
- 1 GBit in rack
- 10 GBit among racks
- Total 11PB
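
A quick back-of-the-envelope check of these numbers, assuming 1 PB = 1000 TB and that the 11 PB total is spread evenly over all servers (the slide does not say either way):

```python
racks = 100
servers_per_rack = 40
total_storage_pb = 11

servers = racks * servers_per_rack
# Assumption: decimal units and an even spread of storage across servers.
tb_per_server = total_storage_pb * 1000 / servers

print(f"{servers} servers, about {tb_per_server} TB each")
```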
Rack Awareness

NameNode metadata
file.txt =
Blk A: DN: 2,7,8
Blk B: DN: 9,12,14

[Figure: Name Node, Job Tracker and HBase Master; Data Nodes 2 to 5 in Rack 1, 7 to 10 in Rack 2, 12 to 15 in Rack 3. Block A sits on Data Nodes 2, 7 and 8; block B sits on Data Nodes 9, 12 and 14.]
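
Both placements on this slide put one replica in one rack and the other two in a second rack. The sketch below reproduces that pattern, which matches HDFS's default placement policy as commonly described: first replica on the writing node, second on a node in a different rack, third on another node in that same remote rack. The rack membership is read off the diagram and is an assumption, as are the function names.

```python
import random

# Assumption: rack membership as read off the slide's diagram.
RACKS = {
    "rack1": [2, 3, 4, 5],
    "rack2": [7, 8, 9, 10],
    "rack3": [12, 13, 14, 15],
}

def rack_of(node):
    """Return the rack that holds a given Data Node."""
    for rack, nodes in RACKS.items():
        if node in nodes:
            return rack
    raise ValueError(f"unknown data node {node}")

def place_replicas(first_node, rng):
    """First replica on first_node, second and third on two different
    nodes of one randomly chosen remote rack."""
    local_rack = rack_of(first_node)
    remote_rack = rng.choice([r for r in RACKS if r != local_rack])
    second, third = rng.sample(RACKS[remote_rack], 2)
    return [first_node, second, third]

def survives_rack_failure(replicas):
    """True if the replicas span more than one rack."""
    return len({rack_of(n) for n in replicas}) > 1

blk_a = [2, 7, 8]    # from the slide
blk_b = [9, 12, 14]  # from the slide
new_block = place_replicas(first_node=3, rng=random.Random(42))
print(new_block, survives_rack_failure(new_block))
```

Keeping two of the three replicas in one rack limits cross-rack write traffic to a single transfer, while the replica in the other rack protects against losing a whole rack.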
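HDFS Writes

Client: file.txt = blocks A, B, C

NameNode metadata
file.txt =
Blk A: DN: 2,7,9

[Figure: Client, NameNode and Core; Data Nodes 2 to 5 in Rack 1 and 7 to 10 in Rack 2. Block A is written to Data Node 2 in Rack 1 and to Data Nodes 7 and 9 in Rack 2.]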
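Reading Files

Client: "I want to read file1.txt"

NameNode: "File1.txt parts:
Blk A: 2,7,8
Blk B: 9,12,14"

NameNode metadata
file.txt =
Blk A: DN: 2,7,8
Blk B: DN: 9,12,14

[Figure: Client, NameNode and Core; Data Nodes 2 to 5 in Rack 1, 7 to 10 in Rack 2, 12 to 15 in Rack 3. Block A sits on Data Nodes 2, 7 and 8; block B sits on Data Nodes 9, 12 and 14.]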
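
The read exchange on this slide can be sketched as two roles: the NameNode answers with block locations only, and the client then picks a Data Node for each block. The function names are hypothetical, and choosing "the first live replica in the list" is a simplification of how a real client chooses among replicas.

```python
# NameNode metadata from the slide: block -> Data Nodes holding a replica
BLOCK_LOCATIONS = {
    "file1.txt": [("A", [2, 7, 8]), ("B", [9, 12, 14])],
}

def get_block_locations(path):
    """NameNode role: return metadata only, never file contents."""
    return BLOCK_LOCATIONS[path]

def plan_read(path, dead_nodes=frozenset()):
    """Client role: for each block, in order, pick the first replica that is alive."""
    plan = []
    for block, nodes in get_block_locations(path):
        live = [n for n in nodes if n not in dead_nodes]
        if not live:
            raise IOError(f"no live replica for block {block}")
        plan.append((block, live[0]))
    return plan

normal = plan_read("file1.txt")
degraded = plan_read("file1.txt", dead_nodes={2, 9})
print(normal, degraded)
```

Because the data itself never passes through the NameNode, reads scale with the number of Data Nodes, and a failed node only redirects the client to another replica.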

Hadoop 101
