7. Job patterns have varying impact on network utilization
Job pattern: network graph of data coming into one node.
Analyze, simulated with Shakespeare Wordcount [10s to 20s of Mbps]
Extract Transform Load (ETL), simulated with Yahoo TeraSort [larger than 1 Gbps]
Extract Transform Load (ETL), simulated with Yahoo TeraSort with output replication [2 to 4 Gbps]
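One way to see why a wordcount-style job puts far less data on the network than a sort-style job is to count the records each one hands to the shuffle. The sketch below is a toy, in-memory model and not Hadoop code: the function names and the two input splits are made up for illustration, and it assumes the wordcount mappers run a combiner.

```python
from collections import Counter


def shuffle_records_wordcount(splits):
    """Records sent to the shuffle by a wordcount job with a combiner.

    Assumption: each mapper pre-aggregates its own split, so it emits
    one (word, count) record per distinct word it saw.
    """
    shuffled = []
    for split in splits:
        combined = Counter(split.split())
        shuffled.extend(combined.items())
    return shuffled


def shuffle_records_sort(splits):
    """Records sent to the shuffle by a sort job.

    A sort cannot aggregate anything away, so every input record
    crosses the network to some reducer.
    """
    return [word for split in splits for word in split.split()]


def reduce_wordcount(shuffled):
    """Sum the partial counts that arrive at the reducers."""
    totals = Counter()
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)


# Hypothetical input: two mapper splits.
splits = ["to be or not to be", "to be to be to be"]

wordcount_shuffle = shuffle_records_wordcount(splits)
sort_shuffle = shuffle_records_sort(splits)

print(len(wordcount_shuffle), "records shuffled by wordcount")
print(len(sort_shuffle), "records shuffled by sort")
print(reduce_wordcount(wordcount_shuffle))
```

On real text the gap grows with input size, because the number of distinct words per split levels off while the number of records to sort keeps increasing.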
8. [Diagram: MapReduce and HBase traffic into HDFS.
MapReduce: Map 1, Map 2, Map 3 ... Map N; Shuffle; Reducer 1, Reducer 2, Reducer 3 ... Reducer N; Output Replication to HDFS.
HBase: Clients send Read and Update operations to Region Servers; Region Servers run Major Compaction.]
9. HBase During Major Compaction
[Chart: Read/Update latency comparison of non-QoS vs. QoS policy. Y axis: Latency (us), 0 to 9000. X axis: Time. Series: UPDATE average latency (us), READ average latency (us), QoS UPDATE average latency (us), QoS READ average latency (us).]
~45% improvement for Read.
[Chart: Switch buffer usage with a network QoS policy that prioritizes HBase Update/Read operations.]
Every 24 hours HBase wakes up and has this stampede of elephants that does this massive push into HDFS.
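A relative improvement like the ~45% read figure is normally computed as the latency reduction divided by the non-QoS latency. The sketch below shows that arithmetic. The two latency values are hypothetical placeholders chosen to give 45%; they are not measurements from the chart.

```python
def latency_improvement_pct(baseline_us, qos_us):
    """Percentage reduction in average latency relative to the baseline."""
    if baseline_us <= 0:
        raise ValueError("baseline latency must be positive")
    return 100.0 * (baseline_us - qos_us) / baseline_us


# Hypothetical values, for illustration only (not read off the slide).
non_qos_read_us = 4000.0
qos_read_us = 2200.0

improvement = latency_improvement_pct(non_qos_read_us, qos_read_us)
print(f"READ latency improvement: {improvement:.0f}%")
```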
25. workload automation facilitates the flow of data
costs
[Diagram: data flow across five stages: Gather Data, Data Integration, Load Data, Data Analysis, Report Generation and Distribution.
Data sources: Twitter Feeds, Call logs, Web Clicks.
Processing shown: Map Reduce, Sqoop, Hive, SQL, BI Analytics.
Tools and interfaces listed under the stages: Web Services, SSH, DB/JDBC, ERP/CRM, Data Mover, Sqoop, MapReduce, Informatica, Hive, Business Objects, Cognos.]
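As a rough illustration of what workload automation does with these stages, the sketch below runs them in dependency order. It assumes each stage depends only on the one before it, which the slide does not state, and the `actions` mapping is a hypothetical stand-in for real jobs such as a Sqoop import or a Hive query.

```python
from graphlib import TopologicalSorter

# Stage names from the slide; the strictly linear dependencies are an assumption.
STAGES = {
    "Gather Data": set(),
    "Data Integration": {"Gather Data"},
    "Load Data": {"Data Integration"},
    "Data Analysis": {"Load Data"},
    "Report Generation and Distribution": {"Data Analysis"},
}


def run_pipeline(stages, actions):
    """Run each stage after all of its prerequisites and return the run order.

    `actions` maps a stage name to a zero-argument callable (hypothetical
    placeholder for a real job); stages without an action are no-ops.
    """
    completed = []
    for stage in TopologicalSorter(stages).static_order():
        actions.get(stage, lambda: None)()
        completed.append(stage)
    return completed


order = run_pipeline(STAGES, {"Gather Data": lambda: print("collecting feeds")})
print(" -> ".join(order))
```

A real scheduler would also handle retries, failures, and time or event triggers; this only shows the ordering.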