Cisco's Unified Fabric provides an integrated networking solution optimized for big data infrastructures using Hadoop. The document describes Cisco's testing of the Unified Fabric using Hadoop clusters of 16 and 128 nodes running Yahoo's TeraSort benchmark on 1TB of data. It found that the Unified Fabric can support the network traffic patterns of Hadoop workloads while using buffering efficiently to absorb traffic bursts during the shuffle and replication phases.
2. Unified Fabric optimized for Big Data infrastructures with seamless integration with current data models. [Diagram: Cisco Unified Fabric connecting a traditional RDBMS with SAN/NAS storage, a “Big Data” store-and-analyze tier, a real-time NoSQL capture/read/update tier, and applications (virtualized, bare-metal, cloud), fed by sensor data, logs, social media, click streams, mobility trends, and event data.]
7. A general characteristic of an optimally configured cluster is that job completion times decrease as nodes are scaled out. Test results from an ETL-like workload (Yahoo TeraSort) using a 1TB data set.
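The scaling behavior above can be sketched with a simple Amdahl-style model. This is an illustrative assumption, not data from the tests: the baseline time, serial fraction, and node counts below are hypothetical.

```python
# Illustrative model (assumed numbers, not Cisco's measurements):
# a job has a fixed serial portion plus a parallel portion that
# divides across the nodes as the cluster scales out.

def estimated_completion(base_minutes, serial_fraction, nodes):
    """Hypothetical job time for `nodes` workers, normalized to a
    single-node baseline of `base_minutes`."""
    serial = base_minutes * serial_fraction
    parallel = base_minutes * (1 - serial_fraction)
    return serial + parallel / nodes

# Doubling the cluster roughly halves the remaining parallel work:
for n in (16, 32, 64, 128):
    print(n, estimated_completion(480, 0.05, n))
```

The serial fraction puts a floor under completion time, which is why scaling out eventually stops paying off.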
9. Network graph of all traffic received on a single node (80-node run). Timeline: Maps Start → Reducers Start → Maps Finish → Job Complete. The red line is the total amount of traffic received by hpc064; the other symbols represent individual nodes sending traffic to hpc064. Note: Shortly after the reducers start, map tasks begin finishing and data is shuffled to the reducers. Once the maps finish completely, the network is no longer used, as the reducers have all the data they need to finish the job.
11. Network graph of all traffic received on a single node (80-node run). Timeline: Maps Start → Reducers Start → Maps Finish → Job Complete. The red line is the total amount of traffic received by hpc064; the other symbols represent individual nodes sending traffic to hpc064. Note: Due to the combination of the length of the map phase and the reduced data set being shuffled, the network is utilized throughout the job, but only to a limited degree.
12. Given the same MapReduce job, the larger the input dataset, the longer the job will take. Note: As dataset sizes increase, completion times may not scale linearly, since many jobs hit the ceiling of I/O and/or compute power. Test results from an ETL-like workload (Yahoo TeraSort) using varying data set sizes.
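One way the non-linear growth arises is spilling: once intermediate data exceeds the cluster's sort buffers, extra merge passes re-read and re-write the data. The sketch below is a hypothetical model with assumed throughput and buffer sizes, not figures from the tests.

```python
import math

# Hypothetical model: below the sort-buffer ceiling a job streams once
# through the data; beyond it, each extra spill/merge pass re-reads and
# re-writes the dataset, so time grows faster than the input size.

def merge_passes(dataset_tb, sort_buffer_tb):
    """Assumed count of extra merge passes once data exceeds buffers."""
    if dataset_tb <= sort_buffer_tb:
        return 0
    return math.ceil(math.log2(dataset_tb / sort_buffer_tb))

def completion_hours(dataset_tb, io_tb_per_hr=2.0, sort_buffer_tb=1.0):
    passes = merge_passes(dataset_tb, sort_buffer_tb)
    return dataset_tb * (1 + passes) / io_tb_per_hr
```

Under these assumptions a 4TB run takes more than four times as long as a 1TB run, matching the "may not scale linearly" caveat above.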
13. The I/O capacity, CPU, and memory of the data nodes have a direct impact on the performance of a cluster. Note: A 2RU server with 16 disks gives the node more storage but trades off CPU per RU; a 1RU server, on the other hand, gives more CPU per rack.
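The form-factor trade-off is easy to see with rack-level arithmetic. The disk and core counts below are assumed example SKUs, not specific Cisco configurations.

```python
# Illustrative rack math (assumed server specs): a 2RU/16-disk server
# maximizes storage per node, while 1RU servers pack more CPU per rack.

def per_rack(ru_per_server, disks_per_server, cores_per_server, rack_ru=42):
    """Total (disks, cores) a 42RU rack holds for a given form factor."""
    servers = rack_ru // ru_per_server
    return servers * disks_per_server, servers * cores_per_server

print("2RU x 16 disks:", per_rack(2, 16, 16))
print("1RU x 8 disks: ", per_rack(1, 8, 16))
```

With these assumed specs, the two layouts deliver the same disks per rack but the 1RU layout doubles the cores, which is the CPU-per-rack point the slide is making.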
14. Data locality – the ability to process data where it is locally stored. Note: During the map phase, the JobTracker attempts to use data locality to schedule map tasks on the data nodes where the data is locally stored. This is not perfect; it depends on which data nodes hold the data. This is a consideration when choosing the replication factor: more replicas tend to create a higher probability of data locality. Map tasks: the initial spike reflects non-local data, since a task may sometimes be scheduled on a node that does not have the data available locally.
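The replication-factor effect can be sketched with a toy probability model. This is an assumed simplification (replica placements treated as independent, each replica's node free with probability p), not how the JobTracker actually computes anything.

```python
# Toy model (assumed independence): if each replica's node has a free
# map slot with probability p, more replicas raise the chance that the
# scheduler can place the task data-locally.

def p_data_local(replicas, p_slot_free):
    """Probability at least one replica sits on a node with a free slot."""
    return 1 - (1 - p_slot_free) ** replicas

for r in (1, 2, 3):
    print(r, p_data_local(r, 0.5))
```

Going from one replica to the default three roughly takes a 50% local-scheduling chance to 87.5% in this model, illustrating why more replicas favor locality.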
15. Hadoop clusters are generally multi-use, and the effect of background use can affect any single job's completion time. Note: A given cluster is generally running many different types of jobs, imports into HDFS, etc. Example view of 24-hour cluster use: a large ETL job overlaps with medium and small ETL jobs and many small BI jobs (blue lines are ETL jobs, purple lines are BI jobs), along with data being imported into HDFS.
21. In the multi-use cluster described previously, multiple job types (ETL, BI, etc.) and imports into HDFS can be happening at the same time. Note: Usage may vary depending on job scheduling options.
22. In the largest workloads, multiple terabytes can be transmitted across the network. Note: Data taken from a multi-use workload (multi-ETL + multi-BI + HDFS import).
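For a shuffle-heavy job like TeraSort, the network volume can be estimated back-of-the-envelope. The assumption here (roughly the whole input crosses the shuffle, minus the share that lands on a reducer co-located with the map) is a common rule of thumb, not a figure from the deck.

```python
# Back-of-the-envelope (assumption: a TeraSort-style job shuffles
# roughly its whole input): estimate total and per-node shuffle traffic.

def shuffle_traffic_tb(dataset_tb, nodes):
    """Returns (total_tb_on_network, tb_received_per_node); about a
    1/nodes share of each map's output stays local to its own node."""
    total = dataset_tb * (nodes - 1) / nodes
    return total, total / nodes

total, per_node = shuffle_traffic_tb(1.0, 128)
print(total, per_node)
```

Scaling the dataset into the tens of terabytes makes it clear how a multi-use workload reaches the multi-terabyte network volumes described above.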
23. 1GE is generally used, largely due to cost/performance trade-offs, though 10GE can provide benefits depending on the workload. Note: Multiple 1GE links can be bonded together to increase available bandwidth.
24. Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer. Note: With 10GE, the data node has a larger pipe on which to receive data, lessening the need for buffering in the network, since the total aggregate amount of data does not increase substantially. This is due, in part, to the limits of I/O and compute capabilities.
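The buffering effect can be sketched as a simple incast calculation. The sender counts and burst sizes below are assumed example numbers; the point is only that a faster receiver link drains a burst faster, so less of it queues in the switch.

```python
# Simplified incast sketch (assumed numbers): `senders` nodes each burst
# `burst_mb` at one receiver simultaneously; the switch must buffer
# whatever arrives faster than the receiver's link can drain it.

def buffer_needed_mb(senders, burst_mb, sender_gbps, receiver_gbps):
    burst_total = senders * burst_mb                      # MB offered
    burst_seconds = burst_mb * 8 / (sender_gbps * 1000)   # burst duration
    drained = receiver_gbps * 1000 / 8 * burst_seconds    # MB drained
    return max(0.0, burst_total - drained)

# Ten nodes bursting 10 MB each into a 1GE vs a 10GE receiver:
print(buffer_needed_mb(10, 10, 1, 1))
print(buffer_needed_mb(10, 10, 1, 10))
```

Under these assumptions, the 1GE receiver forces the switch to hold most of the burst, while the 10GE receiver drains it as it arrives, matching the slide's claim that 10GE lowers the switching-layer buffer requirement.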
25. Network latency is generally not a significant factor for Hadoop clusters, although consistent latency is important. Note: There is a difference between network latency and application latency; optimizations in the application stack can decrease application latency, which can potentially have a significant benefit.