4. Reduce: Ingress vs. Egress Data Set
The ratio of data entering the reduce phase to data written out depends on the workload type:
• Analyze – 1:0.3
• Extract Transform Load (ETL) – 1:1
• Explode – 1:2
The time the reducers start is dependent on mapred.reduce.slowstart.completed.maps. It doesn't change the amount of data sent to the reducers, but it may change the timing of when that data is sent, as illustrated below.
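As a hedged illustration (the jar path and the terasort input/output paths below are placeholders, not from the deck), the slow-start threshold can be raised per job so reducers are not scheduled until most maps have finished:

# Hold reducer start until 80% of maps complete; the default in this
# Hadoop era was 0.05, so reducers could otherwise be scheduled almost
# immediately. This shifts when shuffle data moves, not how much moves.
hadoop jar hadoop-examples.jar terasort \
  -D mapred.reduce.slowstart.completed.maps=0.80 \
  /terasort/input /terasort/output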
5. Small Flows/Messaging (admin related, heartbeats, keep-alives, delay-sensitive application messaging)
• Small–Medium Incast (Hadoop shuffle)
• Large Flows (HDFS ingest)
• Large Incast (Hadoop replication)
10. Generally, 1G is used largely due to cost/performance trade-offs, though 10GE can provide benefits depending on the workload. Link utilization:
• Single 1GE – 100% utilized
• Dual 1GE – 75% utilized
• 10GE – 40% utilized
11. • No single point of failure from the network viewpoint; no impact on job completion time
• NIC bonding configured in Linux, with LACP mode of bonding
• Effective load-sharing of traffic flows across the two NICs
• Recommended to change the hashing to src-dst-ip-port (on both the network and the NIC bonding in Linux) for optimal load-sharing; a configuration sketch follows below
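A minimal sketch of that bonding setup, assuming a RHEL-style system with ifcfg files (the interface names and addressing are placeholders):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.0.0.10
NETMASK=255.255.255.0
# mode=4 is 802.3ad (LACP); xmit_hash_policy=layer3+4 hashes on
# source/destination IP and port, matching the src-dst-ip-port
# recommendation above
BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=layer3+4"

Each physical NIC then points at the bond with MASTER=bond0 and SLAVE=yes in its own ifcfg file, and the switch side pairs this with an LACP port-channel whose load-balance hash is likewise set to source-destination IP and port.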
12. 1GE vs. 10GE Buffer Usage
[Chart: switch buffer cell usage over the life of a job, overlaid with job completion (map and reduce percentages), for 1G vs. 10G attached nodes; series: 1G Buffer Used, 10G Buffer Used, 1G Map %, 1G Reduce %, 10G Map %, 10G Reduce %.]
Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer.
By moving to 10GE, the data node has a wider pipe to receive data, lessening the need for buffering in the network, since the total aggregate transfer rate and the amount of data do not increase substantially. This is due, in part, to the limits of the nodes' I/O and compute capabilities.
13. Findings
Goals:
• Extensive validation of the Hadoop workload
• A reference architecture that makes it easy for the enterprise and demystifies the network for Hadoop deployment
• Integration with the enterprise, with efficient choices of network topology/devices
Findings:
• 10G and/or dual-attached servers provide consistent job completion time and better buffer utilization
• 10G reduces bursts at the access layer
• A dual-attached server is the recommended design, 1G or 10G; 10G for future-proofing
• Rack failure has the biggest impact on job completion time
• A non-blocking network is not required
• Latency does not matter much in Hadoop workloads
More details from Hadoop Summit 2012 at:
http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design
http://youtu.be/YJODsK0T67A
28. Various Multitenant Environments
Hadoop + HBase: you need to understand the traffic patterns.
• Job based – scheduling dependent
• Department based – permissions and scheduling dependent (see the queue sketch below)
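Department-based scheduling of this kind is usually expressed as scheduler queues. A hedged sketch, assuming a YARN cluster running the CapacityScheduler (the queue names and capacity split are invented for illustration; in practice, merge these entries into your real capacity-scheduler.xml rather than overwriting it):

# Illustrative per-department queues; this writes a minimal file and
# relies on scheduler defaults for everything else.
cat > "$HADOOP_CONF_DIR/capacity-scheduler.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>etl,analytics</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.etl.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>40</value>
  </property>
</configuration>
EOF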
29. [Diagram: clients issue reads and updates to HBase region servers while a MapReduce job runs alongside: Map 1..N tasks read input, shuffle to Reducer 1..N, and the job output is replicated into HDFS; major compactions run on the region servers at the same time.]
30. HBase During Major Compaction
[Chart: read/update average latency (us) over time, comparing a non-QoS run against a run with a network QoS policy; series: UPDATE and READ average latency, with and without QoS. Roughly a 45% read latency improvement with QoS.]
[Chart: switch buffer usage with the network QoS policy in place to prioritize HBase update/read operations.]
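The deck does not show the switch-side QoS configuration itself, but as a hedged host-side companion sketch, HBase traffic can be DSCP-marked so a network policy can classify and prioritize it (the region-server port and the DSCP class below are assumptions, not values from the deck):

# Mark traffic sourced from the HBase region-server port (60020 on HBase
# releases of this era; 16020 on later ones) with DSCP class CS4 so the
# switch QoS policy can match and prioritize these flows.
iptables -t mangle -A OUTPUT -p tcp --sport 60020 \
  -j DSCP --set-dscp-class CS4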
32. THANK YOU FOR LISTENING
Cisco.com Big Data: www.cisco.com/go/bigdata
Data center script examples from the presentation: github.com/datacenter
Cisco Unified Data Center:
• UNIFIED FABRIC – Highly scalable, secure network fabric (www.cisco.com/go/nexus)
• UNIFIED COMPUTING – Modular stateless computing elements (www.cisco.com/go/ucs)
• UNIFIED MANAGEMENT – Automated management; manages enterprise workloads (http://www.cisco.com/go/workloadautomation)
Editor's notes
Generally, 1G is used largely due to cost/performance trade-offs, though 10GE can provide benefits depending on the workload. 10G shows a reduced spike and a smoother job completion time. Multiple 1G or 10G links can be bonded together to not only increase bandwidth but also increase resiliency.
Talk about the intensity of a failure with a smaller job vs. a bigger job. Map tasks execute in parallel, so the unit time for each map task per node remains the same, and the nodes more or less complete the job at roughly the same time. During a failure, however, a set of map tasks remains pending until all the nodes finish their assigned tasks, since the other nodes in the cluster are still completing their own work. Once all the nodes finish their map tasks, the leftover map tasks are reassigned by the JobTracker; the unit time to finish those map tasks remains the same (linear) as the time it took to finish the other maps. They just happen not to run in parallel, so the failure can double job completion time. This is the worst-case scenario with Terasort; other workloads may have variable completion times.