The Data Center and Hadoop

Jacob Rapp, Cisco

jarapp@cisco.com
Hadoop Considerations
• Traffic Types, Job Patterns, Network Considerations, Compute

Network Integration
• Co-exist with current Data Center infrastructure
• Open, Programmable and Application-Aware Networks

Multi-tenancy
• Remove the “Silo clusters”
Ingress vs. Egress Data Set (Reduce), by job pattern:
• Analyze: 1:0.3
• Extract Transform Load (ETL): 1:1
• Explode: 1:2

The time the reducers start is dependent on: mapred.reduce.slowstart.completed.maps
It doesn’t change the amount of data sent to Reducers, but may change the timing to send that data
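As a rough illustration of the two ideas on this slide, the sketch below estimates egress volume from the ingress-to-egress ratios above, and the number of map tasks that must finish before reducers are scheduled for a given mapred.reduce.slowstart.completed.maps value. The function names and the 100 GB / 400-map inputs are hypothetical, and the ceiling rule is an assumption about how the fraction is applied, not a statement of Hadoop's exact scheduling logic.

```python
import math
from fractions import Fraction

# Ingress-to-egress ratios from the slide (ingress normalized to 1).
# Stored as strings so Fraction keeps the arithmetic exact.
EGRESS_PER_INGRESS = {"analyze": "0.3", "etl": "1", "explode": "2"}


def egress_gb(pattern, ingress_gb):
    """Estimate output volume (GB) for a job pattern and input volume."""
    return float(Fraction(EGRESS_PER_INGRESS[pattern]) * ingress_gb)


def maps_before_reducers(total_maps, slowstart):
    """Maps that must complete before reducers start (assumed ceiling rule).

    `slowstart` is the mapred.reduce.slowstart.completed.maps value,
    passed as a string such as "0.05" so the arithmetic stays exact.
    """
    return math.ceil(Fraction(slowstart) * total_maps)


if __name__ == "__main__":
    # Hypothetical job: 100 GB of input split into 400 map tasks.
    for pattern in EGRESS_PER_INGRESS:
        print(pattern, egress_gb(pattern, 100), "GB out for 100 GB in")
    for slowstart in ("0.05", "0.5", "1.0"):
        print(slowstart, maps_before_reducers(400, slowstart), "of 400 maps")
```

Raising the slowstart value only moves the point at which the shuffle begins; the egress estimate is unaffected, which matches the note above about timing versus amount of data.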
• Small Flows/Messaging (admin-related, heartbeats, keep-alive, delay-sensitive application messaging)
• Small to Medium Incast (Hadoop Shuffle)
• Large Flows (HDFS Ingest)
• Large Incast (Hadoop Replication)
Many-to-Many Traffic Pattern
[Diagram: NameNode, JobTracker and ZooKeeper; Map 1 … Map N feed Reducer 1 … Reducer N through the Shuffle; Output Replication goes to HDFS.]
Job Patterns have varying impact on network utilization
• Analyze: simulated with Shakespeare WordCount
• Extract Transform Load (ETL): simulated with Yahoo TeraSort
• Extract Transform Load (ETL): simulated with Yahoo TeraSort with output replication
Integration Considerations
• Network Attributes
• Architecture
• Availability
• Capacity, Scale & Oversubscription
• Flexibility
• Management & Visibility
1GE is generally used, largely because of cost/performance trade-offs, though 10GE can provide benefits depending on the workload.

• Single 1GE: 100% utilized
• Dual 1GE: 75% utilized
• 10GE: 40% utilized
• No single point of failure from the network viewpoint; no impact on job completion time
• NIC bonding configured in Linux, using the LACP bonding mode
• Effective load-sharing of traffic flows across the two NICs
• Recommended: change the hashing to src-dst-ip-port (on both the network side and the Linux NIC bonding) for optimal load-sharing
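To see why the hashing recommendation above matters, the sketch below compares a hash on IP addresses only with a hash that also includes the TCP/UDP ports. It uses a simplified XOR-style hash for illustration; real switch and bonding-driver hashes differ, and the addresses, ports, and function names are hypothetical.

```python
import ipaddress


def _ip_bits(src_ip, dst_ip):
    """XOR the two IPv4 addresses and keep the low 16 bits."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) & 0xFFFF


def hash_ip_only(src_ip, dst_ip, n_links):
    """Pick a link from the source and destination IP addresses only."""
    return _ip_bits(src_ip, dst_ip) % n_links


def hash_ip_port(src_ip, dst_ip, src_port, dst_port, n_links):
    """Pick a link from IP addresses and ports (src-dst-ip-port style)."""
    return ((src_port ^ dst_port) ^ _ip_bits(src_ip, dst_ip)) % n_links


if __name__ == "__main__":
    # Hypothetical flows: eight connections between the same two nodes.
    flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 50010) for i in range(8)]
    ip_only = {hash_ip_only(s, d, 2) for s, d, _, _ in flows}
    ip_port = {hash_ip_port(s, d, sp, dp, 2) for s, d, sp, dp in flows}
    print("links used, IP-only hash:", sorted(ip_only))
    print("links used, IP+port hash:", sorted(ip_port))
```

With the IP-only hash, every flow between the same pair of nodes lands on one link, so the second NIC sits idle for that pair; including the ports lets parallel connections spread across both links.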
1GE vs. 10GE Buffer Usage

[Figure: Cell Usage and Job Completion over time for 1GE and 10GE. Series: 1G Buffer Used, 10G Buffer Used, 1G Map %, 1G Reduce %, 10G Map %, 10G Reduce %.]

Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer.

By moving to 10GE, the data node has a wider pipe to receive data, which lessens the
need for buffers in the network, because the total aggregate transfer rate and the amount
of data do not increase substantially. This is due, in part, to the limits of I/O and
compute capabilities.
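The reasoning above can be sketched with a toy fluid model: during an incast burst, the switch has to buffer whatever arrives faster than the receiver's port can drain. The sender rate is assumed to be capped by disk I/O and compute, so it stays the same when the receiver's link is upgraded. All numbers and names below are hypothetical, and the model ignores TCP dynamics and shared-buffer behavior.

```python
def buffer_needed_bytes(n_senders, sender_gbps, port_gbps, burst_ms):
    """Bytes queued at an egress port during one incast burst (fluid model)."""
    excess_gbps = max(0.0, n_senders * sender_gbps - port_gbps)
    # Gbit/s * ms = Mbit; divide by 8 for MB, then scale to bytes.
    return excess_gbps * burst_ms / 8 * 1_000_000


if __name__ == "__main__":
    # Hypothetical burst: 6 senders at 0.5 Gbit/s each for 10 ms.
    for port_gbps in (1, 10):
        needed = buffer_needed_bytes(6, 0.5, port_gbps, 10)
        print(f"{port_gbps}GE receiver port: {needed / 1e6:.1f} MB buffered")
```

In this example the 1GE port must absorb 2.5 MB while the 10GE port drains the same burst without queuing, which is the direction of the effect described above.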
Goals
• Extensive Validation of Hadoop Workload
• Reference Architecture
  • Make it easy for Enterprise
  • Demystify Network for Hadoop Deployment
  • Integration with Enterprise with efficient choices of network topology/devices

Findings
• 10G and/or dual-attached servers provide consistent job completion time and better buffer utilization
• 10G reduces bursts at the access layer
• Dual-attached server is the recommended design, at 1G or 10G; 10G for future-proofing
• Rack failure has the biggest impact on job completion time
• Does not require a non-blocking network
• Latency does not matter much in Hadoop workloads

More details from Hadoop Summit 2012 at:
http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design
http://youtu.be/YJODsK0T67A
n3548-001# show interface brief
--------------------------------------------------------------------------------
Ethernet      VLAN   Type Mode   Status  Reason                   Speed     Port
Interface                                                                   Ch #
--------------------------------------------------------------------------------
Eth1/1        1      eth  access up      none                     10G(D)    --
Eth1/2        1      eth  access up      none                     10G(D)    --
Eth1/3        1      eth  access up      none                     10G(D)    --
Eth1/4        1      eth  access up      none                     10G(D)    --
Eth1/5        1      eth  access up      none                     10G(D)    --
.
.
Eth1/33       1      eth  access up      none                     10G(D)    --
Eth1/34       1      eth  access up      none                     10G(D)    --
Eth1/35       1      eth  access down    SFP not inserted         10G(D)    --
Eth1/36       1      eth  access down    SFP not inserted         10G(D)    --
Eth1/37       1      eth  access down    Administratively down    10G(D)    --
.

© 2011 Cisco and/or its affiliates. All rights reserved.

Cisco Confidential

MAC addresses of the connected devices … and the port they are on…

n3548-001# show mac address-table dynamic
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since first seen,+ - primary entry using vPC PeerLink
   VLAN     MAC Address      Type      age     Secure NTFY    Ports
---------+-----------------+--------+---------+------+----+------------------
* 1        e8b7.484d.a208    dynamic   60570      F    F  Eth1/31
* 1        e8b7.484d.a20a    dynamic   60560      F    F  Eth1/31
* 1        e8b7.484d.a73e    dynamic   60560      F    F  Eth1/34
* 1        e8b7.484d.a740    dynamic   60560      F    F  Eth1/34
* 1        e8b7.484d.ad15    dynamic   60560      F    F  Eth1/28
* 1        e8b7.484d.ad17    dynamic   60560      F    F  Eth1/28
* 1        e8b7.484d.b3e9    dynamic   60570      F    F  Eth1/25
* 1        e8b7.484d.b3eb    dynamic   60560      F    F  Eth1/25
.
.

n3548-001# portServerMap
=======================================
Port      Server FQDN
---------------------------------------
Eth1/1    c200-m2-10g2-001.cluster10g.com
Eth1/2    c200-m2-10g2-002.cluster10g.com
Eth1/3    c200-m2-10g2-003.cluster10g.com
Eth1/4    c200-m2-10g2-004.cluster10g.com
Eth1/5    c200-m2-10g2-005.cluster10g.com
Eth1/6    c200-m2-10g2-006.cluster10g.com
Eth1/7    c200-m2-10g2-031.cluster10g.com
Eth1/8    c200-m2-10g2-008.cluster10g.com
Eth1/9    c200-m2-10g2-009.cluster10g.com
Eth1/11   c200-m2-10g2-011.cluster10g.com
.
.
.
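One plausible way to build a port-to-server view like the one above is to join the switch's MAC table with a MAC-to-hostname lookup. The sketch below shows only that join; where the lookup comes from (for example ARP plus DNS) is an assumption, the server names are hypothetical, and the actual portServerMap script may work differently.

```python
def port_server_map(mac_table, mac_to_fqdn):
    """Return {port: list of server names} from (port, mac) MAC-table rows."""
    result = {}
    for port, mac in mac_table:
        name = mac_to_fqdn.get(mac, "unknown")
        names = result.setdefault(port, [])
        if name not in names:  # two NIC MACs may map to the same server
            names.append(name)
    return result


if __name__ == "__main__":
    # Ports and MACs as in the MAC-table slide; names are hypothetical.
    mac_table = [
        ("Eth1/31", "e8b7.484d.a208"),
        ("Eth1/31", "e8b7.484d.a20a"),
        ("Eth1/34", "e8b7.484d.a73e"),
    ]
    mac_to_fqdn = {
        "e8b7.484d.a208": "server-a.example.com",
        "e8b7.484d.a20a": "server-a.example.com",
    }
    for port, names in port_server_map(mac_table, mac_to_fqdn).items():
        print(port, ", ".join(names))
```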

n3548-001# trackerList
===========================================
Port      Server               Server Port
-------------------------------------------
Eth1/2    c200-m2-10g2-002     50544
Eth1/3    c200-m2-10g2-003     41909
Eth1/4    c200-m2-10g2-004     36480
Eth1/5    c200-m2-10g2-005     38179
Eth1/6    c200-m2-10g2-006     51375
Eth1/7    c200-m2-10g2-031     41915
Eth1/8    c200-m2-10g2-008     50983
Eth1/9    c200-m2-10g2-009     37056
Eth1/11   c200-m2-10g2-011     35882
Eth1/12   c200-m2-10g2-012     44551
.
.
.

n3548-001# bufferServerMap
===================================================================
Port      Server               1sec     5sec     60sec    5min     1hr
-------------------------------------------------------------------
Eth1/1    c200-m2-10g2-001     0KB      0KB      0KB      0KB      0KB
Eth1/2    c200-m2-10g2-002     384KB    384KB    1536KB   2304KB   2304KB
Eth1/3    c200-m2-10g2-003     384KB    384KB    1152KB   1536KB   1536KB
Eth1/4    c200-m2-10g2-004     384KB    384KB    2304KB   2304KB   2304KB
Eth1/5    c200-m2-10g2-005     384KB    384KB    768KB    1536KB   1536KB
Eth1/6    c200-m2-10g2-006     384KB    2304KB   2304KB   2304KB   2304KB
Eth1/7    c200-m2-10g2-031     384KB    384KB    3456KB   3840KB   3840KB
Eth1/8    c200-m2-10g2-008     768KB    768KB    2688KB   2688KB   2688KB
Eth1/9    c200-m2-10g2-009     384KB    384KB    2304KB   2304KB   2304KB
Eth1/11   c200-m2-10g2-011     384KB    384KB    1920KB   1920KB   1920KB
.
.
.

Eth1/1 (c200-m2-10g2-001) has 0 buffer usage because it’s the name node
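The 1sec to 1hr columns above read like peak buffer usage over trailing time windows. Assuming that interpretation, and assuming one sample per second, a minimal sketch of the bookkeeping could look like this; the class name and sample values are hypothetical and this is not taken from the actual script.

```python
from collections import deque

# Trailing window lengths in seconds, named after the output columns.
WINDOWS = {"1sec": 1, "5sec": 5, "60sec": 60, "5min": 300, "1hr": 3600}


class PeakTracker:
    """Keep per-second buffer samples and report peaks per trailing window."""

    def __init__(self, max_age=3600):
        # Oldest samples fall off the left once max_age is reached.
        self.samples = deque(maxlen=max_age)

    def add(self, kb):
        """Record the buffer usage (KB) observed in the latest second."""
        self.samples.append(kb)

    def peaks(self):
        """Return {column name: peak KB over that trailing window}."""
        recent = list(self.samples)
        return {
            name: max(recent[-secs:], default=0)
            for name, secs in WINDOWS.items()
        }


if __name__ == "__main__":
    tracker = PeakTracker()
    for kb in [2304, 0, 0, 0, 768, 384]:  # hypothetical samples, oldest first
        tracker.add(kb)
    print(tracker.peaks())
```

Under this reading, a port that was busy a few minutes ago but is idle now shows zeros in the short windows and a non-zero peak in the 5min and 1hr columns.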

What jobs were running during peak buffer usage … and for how long were they running

n3548-001# jobsBuffer
Hadoop Job Info ...
===================================================================
1 jobs currently running
JobId                    RunTime(secs)   User     Priority
job_201306131423_0009    120             hadoop   NORMAL
===================================================================
Buffer Info - Per Port
Port      Server               1sec     5sec     60sec    5min     1hr
-------------------------------------------------------------------
Eth1/1    c200-m2-10g2-001     0KB      0KB      0KB      0KB      0KB
Eth1/2    c200-m2-10g2-002     384KB    384KB    768KB    768KB    768KB
Eth1/3    c200-m2-10g2-003     384KB    384KB    1152KB   1152KB   1152KB
Eth1/4    c200-m2-10g2-004     384KB    1536KB   1536KB   1536KB   1536KB
Eth1/5    c200-m2-10g2-005     384KB    768KB    1152KB   1152KB   1152KB
.
.
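Answering "what jobs were running during peak buffer usage, and for how long" comes down to an interval check against job start and end times. The sketch below shows that check; the timestamps and the second job ID are hypothetical, and how the real jobsBuffer script obtains job times is not shown in the slides.

```python
def jobs_running_at(jobs, t):
    """Return (job_id, seconds running) for jobs active at time t.

    `jobs` holds (job_id, start, end) tuples in seconds; end is None
    for a job that has not finished yet.
    """
    running = []
    for job_id, start, end in jobs:
        still_running = end is None
        if start <= t and (still_running or t < end):
            running.append((job_id, t - start))
    return running


if __name__ == "__main__":
    # Hypothetical timeline; only the second job ID appears in the slide.
    jobs = [
        ("job_201306131423_0008", 0, 300),
        ("job_201306131423_0009", 400, None),
    ]
    peak_time = 520  # hypothetical time of the buffer peak
    print(jobs_running_at(jobs, peak_time))
```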

Historic look at the buffer usage …

n3548-001(config)# jobsBuffer
Hadoop Job Info ...
===================================================================
0 jobs currently running
JobId                    RunTime(secs)   User     Priority
===================================================================
Buffer Info - Per Port
Port      Server               1sec     5sec     60sec    5min     1hr
-------------------------------------------------------------------
Eth1/1    c200-m2-10g2-001     0KB      0KB      0KB      0KB      0KB
Eth1/2    c200-m2-10g2-002     0KB      0KB      0KB      1920KB   1920KB
Eth1/3    c200-m2-10g2-003     0KB      0KB      0KB      2304KB   2304KB
Eth1/4    c200-m2-10g2-004     0KB      0KB      0KB      2688KB   2688KB
Eth1/5    c200-m2-10g2-005     0KB      0KB      0KB      2304KB   2304KB
Eth1/6    c200-m2-10g2-006     0KB      0KB      0KB      2304KB   2304KB
Eth1/7    c200-m2-10g2-031     0KB      0KB      0KB      1920KB   2688KB
.

Buffer Usage
[Figure: buffer usage over time (axis ticks 0 to 780 in steps of 60), annotated with the Map, Shuffle, Reduce and Replication phases.]

github.com/datacenter

[Diagram: data is pushed ("Push Data") to an Analyze stage built on a Python socket; a PTP Grandmaster is optional.]
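A push-and-analyze flow over a Python socket could look roughly like the sketch below. The newline-delimited JSON message format and the function names are assumptions for illustration, and `socket.socketpair()` stands in for a real connection between a switch and the analysis host; the scripts at github.com/datacenter may be structured differently.

```python
import json
import socket


def push_sample(sock, port, buffer_kb):
    """Sender side: push one buffer sample as a JSON line."""
    line = json.dumps({"port": port, "buffer_kb": buffer_kb}) + "\n"
    sock.sendall(line.encode("utf-8"))


def collect_samples(sock):
    """Collector side: read JSON lines until the sender closes the socket."""
    samples = []
    with sock.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            samples.append(json.loads(line))
    return samples


if __name__ == "__main__":
    # A connected socket pair replaces the network link in this demo.
    sender, receiver = socket.socketpair()
    push_sample(sender, "Eth1/2", 384)
    push_sample(sender, "Eth1/3", 1152)
    sender.close()  # signals end of stream to the collector
    for sample in collect_samples(receiver):
        print(sample["port"], sample["buffer_kb"], "KB")
    receiver.close()
```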

Various Multitenant Environments
• Hadoop + HBase: need to understand traffic patterns
• Job Based: scheduling dependent
• Department Based: permissions and scheduling dependent
[Diagram: clients send Read and Update requests to HBase Region Servers, which also perform Major Compaction, while a MapReduce job runs on the same cluster (Map 1 … Map N, Shuffle, Reducer 1 … Reducer N, Output Replication to HDFS).]
HBase During Major Compaction

Read/Update Latency: Comparison of Non-QoS vs. QoS Policy
~45% improvement for Read
[Figure: Latency (us), 0 to 9000, over Time. Series: UPDATE - Average Latency (us), READ - Average Latency (us), QoS - UPDATE - Average Latency (us), QoS - READ - Average Latency (us).]

Switch Buffer Usage with Network QoS Policy to prioritize HBase Update/Read Operations
HBase + Hadoop MapReduce

Read/Update Latency: Comparison of Non-QoS vs. QoS Policy
~60% improvement for Read
[Figure: Latency (us), 0 to 40000, over Time. Series: UPDATE - Average Latency (us), READ - Average Latency (us), QoS - UPDATE - Average Latency (us), QoS - READ - Average Latency (us).]

Switch Buffer Usage with Network QoS Policy to prioritize HBase Update/Read Operations
[Figure: Buffer Used over the timeline (axis ticks 1 to 5935), with the Hadoop TeraSort and HBase workloads marked.]

THANK YOU FOR LISTENING
Cisco.com Big Data
www.cisco.com/go/bigdata
Data Center Script Examples from Presentation: github.com/datacenter

Cisco Unified Data Center
• UNIFIED FABRIC: Highly Scalable, Secure Network Fabric (www.cisco.com/go/nexus)
• UNIFIED COMPUTING: Modular Stateless Computing Elements (www.cisco.com/go/ucs)
• UNIFIED MANAGEMENT: Automated Management; Manages Enterprise Workloads (http://www.cisco.com/go/workloadautomation)

Weitere ähnliche Inhalte

Was ist angesagt?

Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
 
Performance Aware SDN, LSPE talk
Performance Aware SDN, LSPE talkPerformance Aware SDN, LSPE talk
Performance Aware SDN, LSPE talknetvis
 
Data center network architectures v1.3
Data center network architectures v1.3Data center network architectures v1.3
Data center network architectures v1.3Jeong, Wookjae
 
Ipv6 deployment at the university of warwick - networkshop44
Ipv6 deployment at the university of warwick - networkshop44Ipv6 deployment at the university of warwick - networkshop44
Ipv6 deployment at the university of warwick - networkshop44Jisc
 
Keep your Hadoop cluster at its best!
Keep your Hadoop cluster at its best!Keep your Hadoop cluster at its best!
Keep your Hadoop cluster at its best!Sheetal Dolas
 
Benchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark StreamingBenchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark StreamingAKUDA Labs
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksJason TC HOU (侯宗成)
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobileDataWorks Summit
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteWhy is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteDatabricks
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANDataWorks Summit/Hadoop Summit
 
L3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load Balancing
L3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load BalancingL3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load Balancing
L3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load BalancingJan Schaumann
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and futureCodemotion
 
Generic Resource Manager - László Vadkerti, András Kovács
Generic Resource Manager - László Vadkerti, András KovácsGeneric Resource Manager - László Vadkerti, András Kovács
Generic Resource Manager - László Vadkerti, András Kovácsharryvanhaaren
 
SDN Traffic Engineering, A Natural Evolution
SDN Traffic Engineering, A Natural EvolutionSDN Traffic Engineering, A Natural Evolution
SDN Traffic Engineering, A Natural EvolutionAPNIC
 
Network Application Performance
Network Application PerformanceNetwork Application Performance
Network Application PerformanceShumon Huque
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Zbigniew Jerzak
 
Capital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msCapital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msApache Apex
 
DPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. MeltonDPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. Meltonharryvanhaaren
 

Was ist angesagt? (20)

Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
Performance Aware SDN, LSPE talk
Performance Aware SDN, LSPE talkPerformance Aware SDN, LSPE talk
Performance Aware SDN, LSPE talk
 
Data center network architectures v1.3
Data center network architectures v1.3Data center network architectures v1.3
Data center network architectures v1.3
 
Mcserviceguard2
Mcserviceguard2Mcserviceguard2
Mcserviceguard2
 
Ipv6 deployment at the university of warwick - networkshop44
Ipv6 deployment at the university of warwick - networkshop44Ipv6 deployment at the university of warwick - networkshop44
Ipv6 deployment at the university of warwick - networkshop44
 
Keep your Hadoop cluster at its best!
Keep your Hadoop cluster at its best!Keep your Hadoop cluster at its best!
Keep your Hadoop cluster at its best!
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Benchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark StreamingBenchmark: Bananas vs Spark Streaming
Benchmark: Bananas vs Spark Streaming
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance Networks
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteWhy is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier Leaute
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
 
L3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load Balancing
L3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load BalancingL3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load Balancing
L3DSR - Overcoming Layer 2 Limitations of Direct Server Return Load Balancing
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and future
 
Generic Resource Manager - László Vadkerti, András Kovács
Generic Resource Manager - László Vadkerti, András KovácsGeneric Resource Manager - László Vadkerti, András Kovács
Generic Resource Manager - László Vadkerti, András Kovács
 
SDN Traffic Engineering, A Natural Evolution
SDN Traffic Engineering, A Natural EvolutionSDN Traffic Engineering, A Natural Evolution
SDN Traffic Engineering, A Natural Evolution
 
Network Application Performance
Network Application PerformanceNetwork Application Performance
Network Application Performance
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
 
Capital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msCapital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 ms
 
DPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. MeltonDPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. Melton
 

Andere mochten auch

HA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkHA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkSteve Loughran
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Sumeet Singh
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Data Center Network Topologies
Data Center Network TopologiesData Center Network Topologies
Data Center Network Topologiesrjain51
 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyDataWorks Summit
 

Andere mochten auch (11)

HA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkHA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talk
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 
Hadoop Internals
Hadoop InternalsHadoop Internals
Hadoop Internals
 
Aioug big data and hadoop
Aioug  big data and hadoopAioug  big data and hadoop
Aioug big data and hadoop
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Data Center Network Topologies
Data Center Network TopologiesData Center Network Topologies
Data Center Network Topologies
 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
 

Ähnlich wie The Data Center and Hadoop

BigData Clusters Redefined
BigData Clusters RedefinedBigData Clusters Redefined
BigData Clusters RedefinedDataWorks Summit
 
"Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N...
"Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N..."Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N...
"Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N...Yandex
 
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Toronto-Oracle-Users-Group
 
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...Nelson Calero
 
Oracle Drivers configuration for High Availability, is it a developer's job?
Oracle Drivers configuration for High Availability, is it a developer's job?Oracle Drivers configuration for High Availability, is it a developer's job?
Oracle Drivers configuration for High Availability, is it a developer's job?Ludovico Caldara
 
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Markus Michalewicz
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL ServerStephen Rose
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsEMC
 
Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemDeepak Shankar
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkDatabricks
 
New Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceNew Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceAnil Nair
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodLudovico Caldara
 
Bruno Decraene - Improving network availability through the graceful shutdown...
Bruno Decraene - Improving network availability through the graceful shutdown...Bruno Decraene - Improving network availability through the graceful shutdown...
Bruno Decraene - Improving network availability through the graceful shutdown...PROIDEA
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreDataStax Academy
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesAlexander Penev
 
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...Zahid Anwar (OCM)
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Bobby Curtis
 

Ähnlich wie The Data Center and Hadoop (20)

BigData Clusters Redefined
BigData Clusters RedefinedBigData Clusters Redefined
BigData Clusters Redefined
 
"Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N...
"Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N..."Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N...
"Morphology of Modern Data Center Networks: Overview". Dinesh Dutt, Cumulus N...
 
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
 
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
 
Oracle Drivers configuration for High Availability, is it a developer's job?
Oracle Drivers configuration for High Availability, is it a developer's job?Oracle Drivers configuration for High Availability, is it a developer's job?
Oracle Drivers configuration for High Availability, is it a developer's job?
 
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data Analytics
 
Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed system
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache Spark
 
New Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceNew Generation Oracle RAC Performance
New Generation Oracle RAC Performance
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Bruno Decraene - Improving network availability through the graceful shutdown...
Bruno Decraene - Improving network availability through the graceful shutdown...Bruno Decraene - Improving network availability through the graceful shutdown...
Bruno Decraene - Improving network availability through the graceful shutdown...
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc...
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15
 
Performance vision Version 2.15 news
Performance vision Version 2.15 newsPerformance vision Version 2.15 news
Performance vision Version 2.15 news
 

The Data Center and Hadoop

  • 1. The Data Center and Hadoop Jacob Rapp, Cisco jarapp@cisco.com
  • 2. Hadoop Considerations: Traffic Types, Job Patterns, Network Considerations, Compute. Network Integration: Co-exist with current Data Center infrastructure; Open, Programmable and Application-Aware Networks. Multi-tenancy: Remove the “Silo clusters”.
  • 4. Reduce Ingress vs. Egress Data Set, by job pattern: Analyze 1:0.3; Extract Transform Load (ETL) 1:1; Explode 1:2. The time the reducers start depends on mapred.reduce.slowstart.completed.maps. It doesn’t change the amount of data sent to the reducers, but may change the timing of sending that data.
  • 5. Small Flows/Messaging (Admin Related, Heart-beats, Keep-alive, delay sensitive application messaging); Small – Medium Incast (Hadoop Shuffle); Large Flows (HDFS Ingest); Large Incast (Hadoop Replication)
  • 6. Many-to-Many Traffic Pattern [diagram]: NameNode, JobTracker, ZooKeeper; Map 1 ... Map N, Shuffle, Reducer 1 ... Reducer N, Output Replication, HDFS
  • 7. Job Patterns have varying impact on network utilization. Analyze: simulated with Shakespeare Wordcount. Extract Transform Load (ETL): simulated with Yahoo TeraSort. Extract Transform Load (ETL): simulated with Yahoo TeraSort with output replication.
  • 9. Integration Considerations: Network Attributes; Architecture; Availability; Capacity, Scale & Oversubscription; Flexibility; Management & Visibility
  • 10. 1GE is generally used, largely because of the cost/performance trade-offs, though 10GE can provide benefits depending on the workload. Link utilization: Single 1GE 100% utilized; Dual 1GE 75% utilized; 10GE 40% utilized.
  • 11. No single point of failure from the network viewpoint; no impact on job completion time. NIC bonding is configured in Linux, with the LACP mode of bonding. Effective load-sharing of traffic flows on two NICs. It is recommended to change the hashing to src-dst-ip-port (in both the network and the NIC bonding in Linux) for optimal load-sharing.
  • 12. 1GE vs. 10GE Buffer Usage. [Chart: Cell Usage and Job Completion over time; series: 1G Buffer Used, 10G Buffer Used, 1G Map %, 1G Reduce %, 10G Map %, 10G Reduce %.] Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer. By moving to 10GE, the data node has a wider pipe to receive data, lessening the need for buffers on the network, as the total aggregate transfer rate and amount of data do not increase substantially. This is due, in part, to limits of I/O and compute capabilities.
  • 13. Goals: Extensive validation of Hadoop workload. Reference architecture: make it easy for the enterprise; demystify the network for Hadoop deployment; integration with the enterprise with efficient choices of network topology/devices. Findings: 10G and/or dual-attached servers provide consistent job completion time and better buffer utilization. 10G reduces bursts at the access layer. A dual-attached server is the recommended design, 1G or 10G; 10G for future proofing. Rack failure has the biggest impact on job completion time. Does not require a non-blocking network. Latency does not matter much in Hadoop workloads. More details from Hadoop Summit 2012 at: http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design http://youtu.be/YJODsK0T67A
  • 15. n3548-001# show interface brief
      --------------------------------------------------------------------------------
      Ethernet     VLAN  Type  Mode    Status  Reason                 Speed    Port
      Interface                                                                Ch #
      --------------------------------------------------------------------------------
      Eth1/1       1     eth   access  up      none                   10G(D)   --
      Eth1/2       1     eth   access  up      none                   10G(D)   --
      Eth1/3       1     eth   access  up      none                   10G(D)   --
      Eth1/4       1     eth   access  up      none                   10G(D)   --
      Eth1/5       1     eth   access  up      none                   10G(D)   --
      . .
      Eth1/33      1     eth   access  up      none                   10G(D)   --
      Eth1/34      1     eth   access  up      none                   10G(D)   --
      Eth1/35      1     eth   access  down    SFP not inserted       10G(D)   --
      Eth1/36      1     eth   access  down    SFP not inserted       10G(D)   --
      Eth1/37      1     eth   access  down    Administratively down  10G(D)   --
      .
      © 2011 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
  • 16. n3548-001# show mac address-table dynamic
      Legend: * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
              age - seconds since first seen, + - primary entry using vPC PeerLink
         VLAN  MAC Address      Type     age    Secure  NTFY  Ports
      ---------+-----------------+--------+---------+------+----+-----------------
      *  1     e8b7.484d.a208   dynamic  60570  F       F     Eth1/31
      *  1     e8b7.484d.a20a   dynamic  60560  F       F     Eth1/31
      *  1     e8b7.484d.a73e   dynamic  60560  F       F     Eth1/34
      *  1     e8b7.484d.a740   dynamic  60560  F       F     Eth1/34
      *  1     e8b7.484d.ad15   dynamic  60560  F       F     Eth1/28
      *  1     e8b7.484d.ad17   dynamic  60560  F       F     Eth1/28
      *  1     e8b7.484d.b3e9   dynamic  60570  F       F     Eth1/25
      *  1     e8b7.484d.b3eb   dynamic  60560  F       F     Eth1/25
      . .
      Callout: MAC addresses of the connected devices … and the port they are on…
  • 17. n3548-001# portServerMap
      =======================================
      Port      Server FQDN
      ---------------------------------------
      Eth1/1    c200-m2-10g2-001.cluster10g.com
      Eth1/2    c200-m2-10g2-002.cluster10g.com
      Eth1/3    c200-m2-10g2-003.cluster10g.com
      Eth1/4    c200-m2-10g2-004.cluster10g.com
      Eth1/5    c200-m2-10g2-005.cluster10g.com
      Eth1/6    c200-m2-10g2-006.cluster10g.com
      Eth1/7    c200-m2-10g2-031.cluster10g.com
      Eth1/8    c200-m2-10g2-008.cluster10g.com
      Eth1/9    c200-m2-10g2-009.cluster10g.com
      Eth1/11   c200-m2-10g2-011.cluster10g.com
      . . .
      © 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
  • 18. n3548-001# trackerList
      ===========================================
      Port      Server             Server Port
      -------------------------------------------
      Eth1/2    c200-m2-10g2-002   50544
      Eth1/3    c200-m2-10g2-003   41909
      Eth1/4    c200-m2-10g2-004   36480
      Eth1/5    c200-m2-10g2-005   38179
      Eth1/6    c200-m2-10g2-006   51375
      Eth1/7    c200-m2-10g2-031   41915
      Eth1/8    c200-m2-10g2-008   50983
      Eth1/9    c200-m2-10g2-009   37056
      Eth1/11   c200-m2-10g2-011   35882
      Eth1/12   c200-m2-10g2-012   44551
      . . .
  • 19. n3548-001# bufferServerMap
      ===================================================================
      Port      Server             1sec    5sec     60sec    5min     1hr
      -------------------------------------------------------------------
      Eth1/1    c200-m2-10g2-001   0KB     0KB      0KB      0KB      0KB
      Eth1/2    c200-m2-10g2-002   384KB   384KB    1536KB   2304KB   2304KB
      Eth1/3    c200-m2-10g2-003   384KB   384KB    1152KB   1536KB   1536KB
      Eth1/4    c200-m2-10g2-004   384KB   384KB    2304KB   2304KB   2304KB
      Eth1/5    c200-m2-10g2-005   384KB   384KB    768KB    1536KB   1536KB
      Eth1/6    c200-m2-10g2-006   384KB   2304KB   2304KB   2304KB   2304KB
      Eth1/7    c200-m2-10g2-031   384KB   384KB    3456KB   3840KB   3840KB
      Eth1/8    c200-m2-10g2-008   768KB   768KB    2688KB   2688KB   2688KB
      Eth1/9    c200-m2-10g2-009   384KB   384KB    2304KB   2304KB   2304KB
      Eth1/11   c200-m2-10g2-011   384KB   384KB    1920KB   1920KB   1920KB
      . . .
      Callout: Eth1/1 (c200-m2-10g2-001) has 0 buffer usage because it’s the name node.
  • 20. n3548-001# jobsBuffer
      Hadoop Job Info ...
      ===================================================================
      1 jobs currently running
      JobId                   RunTime(secs)   User     Priority
      job_201306131423_0009   120             hadoop   NORMAL
      ===================================================================
      Buffer Info - Per Port
      Port      Server             1sec    5sec     60sec    5min     1hr
      -------------------------------------------------------------------
      Eth1/1    c200-m2-10g2-001   0KB     0KB      0KB      0KB      0KB
      Eth1/2    c200-m2-10g2-002   384KB   384KB    768KB    768KB    768KB
      Eth1/3    c200-m2-10g2-003   384KB   384KB    1152KB   1152KB   1152KB
      Eth1/4    c200-m2-10g2-004   384KB   1536KB   1536KB   1536KB   1536KB
      Eth1/5    c200-m2-10g2-005   384KB   768KB    1152KB   1152KB   1152KB
      . .
      Callout: What jobs were running during peak buffer usage … and for how long were they running.
  • 21. n3548-001(config)# jobsBuffer
      Hadoop Job Info ...
      ===================================================================
      0 jobs currently running
      JobId   RunTime(secs)   User   Priority
      ===================================================================
      Buffer Info - Per Port
      Port      Server             1sec   5sec   60sec   5min     1hr
      -------------------------------------------------------------------
      Eth1/1    c200-m2-10g2-001   0KB    0KB    0KB     0KB      0KB
      Eth1/2    c200-m2-10g2-002   0KB    0KB    0KB     1920KB   1920KB
      Eth1/3    c200-m2-10g2-003   0KB    0KB    0KB     2304KB   2304KB
      Eth1/4    c200-m2-10g2-004   0KB    0KB    0KB     2688KB   2688KB
      Eth1/5    c200-m2-10g2-005   0KB    0KB    0KB     2304KB   2304KB
      Eth1/6    c200-m2-10g2-006   0KB    0KB    0KB     2304KB   2304KB
      Eth1/7    c200-m2-10g2-031   0KB    0KB    0KB     1920KB   2688KB
      .
      Callout: Historic look at the buffer usage …
  • 25. [Chart: Buffer Usage over time (axis 0 to 780), shown against the Map, Reduce, Shuffle and Replication phases.]
  • 26. [Diagram: Push Data (three sources) to Analyze (Python Socket); PTP Grandmaster (OPTIONAL); github.com/datacenter]
  • 28. Various Multitenant Environments: Hadoop + HBase (need to understand traffic patterns); Job Based (scheduling dependent); Department Based (permissions and scheduling dependent)
  • 29. [Diagram: Clients send Read and Update requests to Region Servers, which run Major Compaction, alongside the MapReduce flow: Map 1 ... Map N, Shuffle, Reducer 1 ... Reducer N, Output Replication, HDFS.]
  • 30. HBase During Major Compaction. Read/Update Latency Comparison of Non-QoS vs. QoS Policy: ~45% improvement for Read. [Chart: Latency (us), 0 to 9000, over time; series: UPDATE - Average Latency (us), READ - Average Latency (us), QoS - UPDATE - Average Latency (us), QoS - READ - Average Latency (us).] Switch Buffer Usage With Network QoS Policy to prioritize HBase Update/Read Operations.
  • 31. HBase + Hadoop MapReduce. Read/Update Latency Comparison of Non-QoS vs. QoS Policy: ~60% improvement for Read. [Chart: Latency (us), 0 to 40000, over time; series: UPDATE - Average Latency (us), READ - Average Latency (us), QoS - UPDATE - Average Latency (us), QoS - READ - Average Latency (us).] [Chart: Buffer Used over the timeline for Hadoop TeraSort and HBase.] Switch Buffer Usage With Network QoS Policy to prioritize HBase Update/Read Operations.
  • 32. THANK YOU FOR LISTENING. Cisco.com Big Data: www.cisco.com/go/bigdata. Data Center Script Examples from Presentation: github.com/datacenter. Cisco Unified Data Center: UNIFIED FABRIC (Highly Scalable, Secure Network Fabric) www.cisco.com/go/nexus; UNIFIED COMPUTING (Modular Stateless Computing Elements) www.cisco.com/go/ucs; UNIFIED MANAGEMENT (Automated Management, Manages Enterprise Workloads) http://www.cisco.com/go/workloadautomation
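
Slide 11 recommends src-dst-ip-port hashing on both the switch and the Linux bond. The sketch below illustrates why. A hash over the IP addresses alone sends every flow between two hosts down the same member link, while including the TCP ports lets separate connections land on different NICs. The XOR-and-modulo hash is a simplified stand-in chosen for illustration, not necessarily what a given kernel or switch computes. The function names, addresses and ports are hypothetical.

```python
import ipaddress

def ip_xor(src_ip, dst_ip):
    """Low 16 bits of the XOR of the two IPv4 addresses."""
    return (int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))) & 0xFFFF

def link_by_ip(src_ip, dst_ip, n_links=2):
    """Member link chosen when only the IP pair is hashed (illustrative)."""
    return ip_xor(src_ip, dst_ip) % n_links

def link_by_ip_port(src_ip, dst_ip, src_port, dst_port, n_links=2):
    """Member link chosen when IPs and TCP ports are hashed (illustrative)."""
    return ((src_port ^ dst_port) ^ ip_xor(src_ip, dst_ip)) % n_links

# Hypothetical shuffle traffic: one mapper host, one reducer host, 64 TCP connections.
flows = [("10.0.0.2", "10.0.0.3", 40000 + i, 50060) for i in range(64)]

ip_only = {link_by_ip(s, d) for s, d, _, _ in flows}
ip_port = {link_by_ip_port(s, d, sp, dp) for s, d, sp, dp in flows}

print("links used with src-dst-ip hashing:     ", sorted(ip_only))
print("links used with src-dst-ip-port hashing:", sorted(ip_port))
```

On Linux, the bonding driver's `xmit_hash_policy` option (for example `layer3+4`) selects which header fields feed the hash; the switch side has its own port-channel load-balancing setting.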
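
The argument on slide 12 can be sketched with a simple fluid queue model. During an incast, data arrives at a switch egress port at the senders' aggregate rate and drains at the receiver's link rate, so the buffer fills at the difference between the two. The aggregate rate and burst length below are made-up assumptions for illustration, not measurements from the deck.

```python
def peak_buffer_bytes(aggregate_in_gbps, link_gbps, burst_seconds):
    """Peak queue depth for a constant-rate burst into one egress port (fluid model)."""
    excess_gbps = max(0.0, aggregate_in_gbps - link_gbps)
    return excess_gbps * 1e9 / 8 * burst_seconds

# Assumption: disk I/O and CPU cap the aggregate rate toward one data node at
# about 3 Gbps regardless of NIC speed, and a shuffle burst lasts 10 ms.
aggregate_gbps = 3.0
burst = 0.010

buf_1g = peak_buffer_bytes(aggregate_gbps, 1.0, burst)
buf_10g = peak_buffer_bytes(aggregate_gbps, 10.0, burst)

print(f"1GE receiver:  {buf_1g / 1024:.0f} KB queued at peak")
print(f"10GE receiver: {buf_10g / 1024:.0f} KB queued at peak")
```

In this model the 10GE port never queues because the offered load stays below the link rate, which is the effect the slide attributes to the wider pipe combined with I/O and compute limits.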
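
The `portServerMap` output on slide 17 pairs each switch port with a server name. One way such a map could be built is to parse the `show mac address-table dynamic` output from slide 16 and join it with a MAC-to-hostname inventory. This is a guess at the approach, not the presenter's script (the deck points to github.com/datacenter for those). `MAC_TO_HOST` and its host names are hypothetical placeholders.

```python
import re

# Sample rows in the format shown on slide 16.
MAC_TABLE = """\
* 1 e8b7.484d.a208 dynamic 60570 F F Eth1/31
* 1 e8b7.484d.a20a dynamic 60560 F F Eth1/31
* 1 e8b7.484d.a73e dynamic 60560 F F Eth1/34
"""

# Hypothetical inventory; in practice it might come from DHCP leases or a CMDB.
MAC_TO_HOST = {
    "e8b7.484d.a208": "server-031.example.com",
    "e8b7.484d.a73e": "server-034.example.com",
}

# Columns: marker, VLAN, MAC, type, age, Secure, NTFY, port.
ROW = re.compile(
    r"^\*\s+(\d+)\s+([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})"
    r"\s+dynamic\s+\d+\s+\S+\s+\S+\s+(\S+)$"
)

def port_server_map(mac_table, mac_to_host):
    """Return {port: hostname} for every MAC found in the inventory."""
    ports = {}
    for line in mac_table.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, legends and separators
        _vlan, mac, port = match.groups()
        host = mac_to_host.get(mac)
        if host is not None:
            ports.setdefault(port, host)
    return ports

for port, host in sorted(port_server_map(MAC_TABLE, MAC_TO_HOST).items()):
    print(f"{port:<10}{host}")
```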

Editor's Notes

  1. 1GE is generally used, largely because of the cost/performance trade-offs, though 10GE can provide benefits depending on the workload. 10G gives reduced spikes and a smoother job completion time. Multiple 1G or 10G links can be bonded together to increase not only bandwidth but also resiliency.
  2. Talk about the intensity of failure with a smaller job vs. a bigger job. The map tasks are executed in parallel, so the unit time for each map task per node remains the same and the nodes finish at roughly the same time. During a failure, however, a set of map tasks remains pending (since the other nodes in the cluster are still completing their tasks) until all the nodes finish their assigned tasks. Once all the nodes finish their map tasks, the leftover map tasks are reassigned by the JobTracker. The unit time it takes to finish those map tasks remains the same (linear) as the time it took to finish the other maps; they just happen not to be done in parallel, so the failure could double the job completion time. This is the worst-case scenario with TeraSort; other workloads may have variable completion times.
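
The worst case in note 2 can be put in numbers with a simple wave model. Every map task takes the same unit time, each node works through its share in parallel with the others, and tasks lost on a failed node are rerun only after the surviving nodes finish. The task counts, the unit time and the choice of how many nodes pick up the rerun are illustrative assumptions, not figures from the talk.

```python
def map_phase_time(tasks_per_node, unit_time, failed_nodes=0, rerun_nodes=1):
    """Map-phase duration under the wave model described above."""
    first_wave = tasks_per_node * unit_time
    if failed_nodes == 0:
        return first_wave
    lost_tasks = failed_nodes * tasks_per_node
    # Lost tasks are spread over rerun_nodes and start after the first wave ends.
    tasks_each = -(-lost_tasks // rerun_nodes)  # ceiling division
    return first_wave + tasks_each * unit_time

healthy = map_phase_time(tasks_per_node=10, unit_time=30)
one_failure = map_phase_time(tasks_per_node=10, unit_time=30, failed_nodes=1)
spread = map_phase_time(tasks_per_node=10, unit_time=30, failed_nodes=1, rerun_nodes=5)

print("no failure:              ", healthy, "s")
print("rerun on a single node:  ", one_failure, "s")
print("rerun spread over 5 nodes:", spread, "s")
```

When the rerun is not parallelized, the map phase takes twice as long, which matches the doubling described in the note. Spreading the leftover tasks over more nodes shrinks the penalty.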