14. What is Hadoop?
Input from HDFS.
Processed by MapReduce.
Output to HDFS.
Source : http://horicky.blogspot.com/2008_11_01_archive.html
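The three steps above can be sketched as a toy, single-process Python model of one MapReduce job. This is only a sketch: plain lists stand in for the HDFS input and output files, and the shop/amount records and the names `run_job`, `sales_mapper`, and `sum_reducer` are hypothetical, not Hadoop APIs.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def run_job(
    hdfs_input: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, list[int]], tuple[str, int]],
) -> list[tuple[str, int]]:
    """Toy MapReduce job (hypothetical helper): lists stand in for HDFS files."""
    groups: dict[str, list[int]] = defaultdict(list)
    # Map: turn each input record into (key, value) pairs,
    # and shuffle: group the values by key.
    for record in hdfs_input:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one call per key, in sorted key order (Hadoop also sorts by key).
    return [reducer(key, groups[key]) for key in sorted(groups)]


def sales_mapper(line: str) -> Iterator[tuple[str, int]]:
    shop, amount = line.split(",")
    yield shop, int(amount)


def sum_reducer(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


# "Input from HDFS" -> MapReduce -> "Output to HDFS"
hdfs_output = run_job(["shopA,100", "shopB,250", "shopA,50"], sales_mapper, sum_reducer)
print(hdfs_output)  # [('shopA', 150), ('shopB', 250)]
```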
15. What is Hadoop?
Simple Example.
Source : http://techblog.yahoo.co.jp/cat207/cat209/hadoop/
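The slide's original figure is not reproduced here. As an assumed stand-in, the classic simple example is word count; the sketch below spells out the map, shuffle/sort, and reduce phases in plain Python. The function names and input lines are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    # Mapper: emit (word, 1) for every word in every line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    # Shuffle & sort: sort by key so equal words are adjacent, then group them.
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield word, [count for _, count in group]


def reduce_phase(grouped):
    # Reducer: sum the counts collected for each word.
    for word, counts in grouped:
        yield word, sum(counts)


lines = ["Hadoop runs MapReduce", "MapReduce reads HDFS", "hadoop writes HDFS"]
result = dict(reduce_phase(shuffle_phase(map_phase(lines))))
print(result)
```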
16. What is Hadoop?
In The Common Case, Several Simple Jobs Are Combined.
Source : http://horicky.blogspot.com/2008_11_01_archive.html
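As a sketch of combining simple jobs, assume a two-step product ranking: job 1 counts orders per item, and job 2 takes job 1's output as its input and ranks the items. In a real cluster the intermediate result would be written to HDFS between the jobs. The item IDs, function names, and the compact `Counter`-based job 1 are hypothetical simplifications.

```python
from collections import Counter


def job1_count_orders(order_lines: list[str]) -> list[tuple[str, int]]:
    """Job 1: count orders per item ID (map: emit item -> 1, reduce: sum)."""
    counts = Counter(item for line in order_lines for item in line.split())
    # Sorted by key, as a reducer's output file would be.
    return sorted(counts.items())


def job2_rank(pairs: list[tuple[str, int]], top_n: int) -> list[tuple[str, int]]:
    """Job 2: read job 1's output and rank items by order count."""
    # Map: swap to (count, item) so the count becomes the sort key.
    swapped = [(count, item) for item, count in pairs]
    # Sort by count descending, breaking ties by item ID.
    swapped.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(item, count) for count, item in swapped[:top_n]]


orders = ["item1 item2", "item1", "item3 item1 item2"]
intermediate = job1_count_orders(orders)  # would be stored on HDFS between the jobs
ranking = job2_rank(intermediate, top_n=2)
print(ranking)  # [('item1', 3), ('item2', 2)]
```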
17. What is Hadoop?
NameNode & DataNode Constitute HDFS.
Source : http://horicky.blogspot.com/2008_11_01_archive.html
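To illustrate the division of labor, here is a minimal Python model that assumes a tiny block size and simple round-robin replica placement (real HDFS uses a rack-aware policy). The NameNode holds only metadata, meaning which blocks make up a file and where each replica lives, while the DataNodes hold the block bytes. Class and field names are hypothetical; the replication factor of 3 matches the HDFS default.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4   # bytes, tiny for illustration (HDFS of this era defaulted to 64 MB)
REPLICATION = 3  # HDFS default replication factor


@dataclass
class DataNode:
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)  # block ID -> block bytes


@dataclass
class NameNode:
    datanodes: list[DataNode]
    namespace: dict[str, list[int]] = field(default_factory=dict)  # path -> block IDs
    locations: dict[int, list[str]] = field(default_factory=dict)  # block ID -> DataNode names
    next_block_id: int = 0

    def allocate(self, path: str, size: int) -> list[tuple[int, list[DataNode]]]:
        """Record a new file's metadata and pick DataNodes for each of its blocks."""
        plan = []
        for _ in range((size + BLOCK_SIZE - 1) // BLOCK_SIZE):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Simplified round-robin placement, not HDFS's rack-aware policy.
            targets = [self.datanodes[(block_id + k) % len(self.datanodes)]
                       for k in range(REPLICATION)]
            self.namespace.setdefault(path, []).append(block_id)
            self.locations[block_id] = [dn.name for dn in targets]
            plan.append((block_id, targets))
        return plan


nodes = [DataNode(f"dn{i}") for i in range(4)]
namenode = NameNode(nodes)
data = b"hello hdfs!"  # 11 bytes -> 3 blocks
for index, (block_id, targets) in enumerate(namenode.allocate("/logs/a.txt", len(data))):
    chunk = data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
    for dn in targets:
        dn.blocks[block_id] = chunk  # only DataNodes store the actual bytes

print(namenode.namespace)  # {'/logs/a.txt': [0, 1, 2]}
print(namenode.locations)
```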
18. What is Hadoop?
Read & Write on HDFS.
Source : http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes
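A simplified Python model of the two paths, loosely following the HDFS design document: the client obtains block locations from the NameNode (hard-coded below), writes each block to the first DataNode, which forwards it along a replication pipeline, and reads each block from the first replica that responds. Class and function names are hypothetical; checksums, acknowledgements, and rack awareness are omitted.

```python
class DataNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.blocks: dict[int, bytes] = {}

    def write_pipeline(self, block_id: int, data: bytes, downstream: list["DataNode"]) -> None:
        # Store the block locally, then forward it to the next DataNode in the pipeline.
        self.blocks[block_id] = data
        if downstream:
            downstream[0].write_pipeline(block_id, data, downstream[1:])

    def read(self, block_id: int) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


def client_write(block_locations: list[tuple[int, list[DataNode]]], chunks: list[bytes]) -> None:
    # The client sends each block only to the first DataNode of its pipeline.
    for (block_id, pipeline), chunk in zip(block_locations, chunks):
        pipeline[0].write_pipeline(block_id, chunk, pipeline[1:])


def client_read(block_locations: list[tuple[int, list[DataNode]]]) -> bytes:
    # For each block, try the replicas in order, skipping DataNodes that fail.
    out = bytearray()
    for block_id, replicas in block_locations:
        for dn in replicas:
            try:
                out += dn.read(block_id)
                break
            except ConnectionError:
                continue
        else:
            raise OSError(f"no live replica for block {block_id}")
    return bytes(out)


datanodes = [DataNode(f"dn{i}") for i in range(3)]
# Block locations as the NameNode might return them (hard-coded here).
locations = [(0, [datanodes[0], datanodes[1], datanodes[2]]),
             (1, [datanodes[1], datanodes[2], datanodes[0]])]
client_write(locations, [b"hello ", b"hdfs"])
datanodes[0].alive = False  # simulate a failed DataNode
content = client_read(locations)
print(content)  # b'hello hdfs'
```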
19. What is Hadoop?
JobTracker & TaskTracker Constitute MapReduce.
Source : http://horicky.blogspot.com/2008_11_01_archive.html
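A minimal Python sketch of the JobTracker/TaskTracker division of labor, under simplifying assumptions: each TaskTracker reports its free task slots in a heartbeat, and the JobTracker answers with map tasks, preferring ones whose input block is stored on that host (data locality). Names are hypothetical; rack locality, reduce tasks, and failure handling are not modeled.

```python
from dataclasses import dataclass


@dataclass
class MapTask:
    task_id: str
    input_hosts: set[str]  # hosts holding a replica of this task's input block


class JobTracker:
    def __init__(self, tasks: list[MapTask]) -> None:
        self.pending = list(tasks)

    def heartbeat(self, host: str, free_slots: int) -> list[str]:
        """Handle a TaskTracker heartbeat and return the task IDs assigned to it."""
        assigned = []
        for _ in range(free_slots):
            if not self.pending:
                break
            # Prefer a data-local task; otherwise fall back to the oldest pending task.
            task = next((t for t in self.pending if host in t.input_hosts), self.pending[0])
            self.pending.remove(task)
            assigned.append(task.task_id)
        return assigned


jobtracker = JobTracker([
    MapTask("m0", {"node1", "node2"}),
    MapTask("m1", {"node3"}),
    MapTask("m2", {"node4"}),
])
first = jobtracker.heartbeat("node3", free_slots=1)   # ['m1'] (data-local)
second = jobtracker.heartbeat("node1", free_slots=2)  # ['m0', 'm2'] (m2 runs non-locally)
print(first, second)
```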
20. What is Hadoop?
Good & Bad Points of Hadoop.
Good!
Easy to Scale Out The System.
Easy to Implement Distributed Processing.
Bad…
The NameNode Is a Single Point of Failure (SPoF).
22. Our Current Hadoop System Overview.
The Cluster Infrastructure. #1
For Instance.
Source : http://www.ibm.com/developerworks/linux/library/l-hadoop/
24. Our Current Hadoop System Overview.
The Monitoring System.
Using Ganglia (& MRTG).
We Can Easily Check The Resource Usage at Any Time,
Not Only for Each Machine But Also for The Whole Cluster.
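As a small illustration of the per-machine versus cluster-wide view (the kind of roll-up Ganglia's gmetad performs across hosts), the sketch below aggregates per-host samples into cluster totals. The host names, metric names, and values are hypothetical and are not Ganglia's actual metric set.

```python
from statistics import mean

# Hypothetical per-host samples, as a monitoring agent might report them.
hosts = {
    "slave01": {"cpu_user_pct": 70.0, "mem_used_gb": 20.0, "disk_free_tb": 3.0},
    "slave02": {"cpu_user_pct": 50.0, "mem_used_gb": 30.0, "disk_free_tb": 1.0},
    "slave03": {"cpu_user_pct": 30.0, "mem_used_gb": 10.0, "disk_free_tb": 2.0},
}


def cluster_summary(per_host: dict[str, dict[str, float]]) -> dict[str, float]:
    # Percentages are averaged across hosts; absolute quantities are summed.
    return {
        "cpu_user_pct": mean(m["cpu_user_pct"] for m in per_host.values()),
        "mem_used_gb": sum(m["mem_used_gb"] for m in per_host.values()),
        "disk_free_tb": sum(m["disk_free_tb"] for m in per_host.values()),
    }


summary = cluster_summary(hosts)
print(summary)  # {'cpu_user_pct': 50.0, 'mem_used_gb': 60.0, 'disk_free_tb': 6.0}
```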
25. Our Current Hadoop System Overview.
High Availability.
Using DRBD & Heartbeat.
Figure: A Client reaches the cluster through the virtual host v-host.rakuten.co.jp. An Active node and a Standby node each host NN (NameNode) and JT (JobTracker) with /foo/drbd0 and /foo/drbd1, and each has network interfaces eth0 and eth1. DRBD Syncs The Changes.
Source : Gen
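The takeover sequence on the Standby node can be sketched as below, assuming a fixed dead time (akin to Heartbeat's `deadtime` setting) and a reading of the diagram in which the NN and JT data live on the DRBD devices. The `Node` class and `check_failover` function are hypothetical, and real Heartbeat/DRBD setups also need fencing (STONITH) and split-brain handling, which are not modeled.

```python
from dataclasses import dataclass, field

DEADTIME = 10.0  # assumed seconds of heartbeat silence before the peer is declared dead


@dataclass
class Node:
    name: str
    role: str       # "active" or "standby"
    drbd_role: str  # "primary" or "secondary"
    services: list[str] = field(default_factory=list)


def check_failover(standby: Node, last_heartbeat: float, now: float) -> list[str]:
    """Run on the Standby node; returns the takeover steps performed (empty if none)."""
    if standby.role != "standby" or now - last_heartbeat < DEADTIME:
        return []
    steps = []
    standby.drbd_role = "primary"  # the replicated volumes become writable here
    steps.append("promote DRBD to primary and mount /foo/drbd0, /foo/drbd1")
    standby.services = ["NameNode", "JobTracker"]
    steps.append("start NameNode and JobTracker")
    steps.append("take over the virtual host v-host.rakuten.co.jp")
    standby.role = "active"
    return steps


standby = Node("standby", role="standby", drbd_role="secondary")
no_action = check_failover(standby, last_heartbeat=100.0, now=105.0)  # peer still alive
takeover = check_failover(standby, last_heartbeat=100.0, now=112.0)   # peer silent for 12 s
print(no_action)
for step in takeover:
    print(step)
print(standby.role, standby.drbd_role)  # active primary
```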
27. Our Hadoop Usage.
Who Is Using Our Hadoop.
1. Generating The Recommendation Engine Index.
2. Analyzing Redirect Log.
3. Calculating The Ad-Targeting Index.
4. Measuring Ad Effectiveness.
5. Analyzing Ichiba Merchandise & Order Info.
6. Calculating Ichiba Product Ranking.
7. Analyzing Search Log.
8. Analyzing Rakuten Travel’s Access Log. (Coming Soon...)
9. Analyzing Search Word N-gram. (Coming Soon...)
28. Our Hadoop Usage.
The Issues With The Previous System.
1. High Cost to Maintain The RDBMS.
2. Ever-Increasing Need for Storage Space.
3. The System Cannot Handle So Many Job Requests Due to Low Performance.
Figure (Previous System): Batch Server; Purchase, Shop, Category, and ITEM data; Unload, Manipulate, and Load steps; Intermediate Files; NFS; Marketing, Utility, and Mail.
29. Our Hadoop Usage.
The Effect of The New System.
1. A Scalable System at Very Low Cost. (80% Off on Storage.)
2. Transaction Time Is Dramatically Improved. (50-75% Reduction.)
Figure (New System! 1st Step.): Batch Server; Purchase, Shop, Category, and ITEM data; Unload, Manipulate, and Load steps; Intermediate Files; NFS; Marketing, Utility, and Mail.
30. Our Hadoop Usage.
The Remaining Issues With The New System.
1. Still Only Halfway to Our Goal of a DWH.
2. Negative Effects of Migrating from a Dedicated Environment to a Shared Environment:
   1. Security.
   2. Sharing Cluster Resources.
32. Our Challenge.
The Issues with Our Hadoop.
1. The HDFS Space Is Likely to Run Out.
2. Need a Lot of Electric Power.
3. Sharing The Cluster Resources Efficiently.
4. Need More Network Bandwidth.
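For sharing the cluster efficiently, one common approach is max-min fair sharing, similar in spirit to Hadoop's Fair Scheduler: each pool gets an equal share of the task slots, capped at its demand, and slots a pool cannot use are redistributed to the others. The sketch below is a plain Python illustration of that idea, not the Fair Scheduler's code; pool names and numbers are hypothetical.

```python
def max_min_fair_share(total_slots: int, demands: dict[str, int]) -> dict[str, int]:
    """Split slots so no pool gets more than it asks for; leftovers are redistributed."""
    share = {pool: 0 for pool in demands}
    remaining = total_slots
    unsatisfied = {pool for pool, demand in demands.items() if demand > 0}
    while remaining > 0 and unsatisfied:
        # Give every unsatisfied pool an equal slice of what is left (at least 1 slot).
        per_pool = max(1, remaining // len(unsatisfied))
        for pool in sorted(unsatisfied):
            grant = min(per_pool, demands[pool] - share[pool], remaining)
            share[pool] += grant
            remaining -= grant
            if share[pool] == demands[pool]:
                unsatisfied.discard(pool)
            if remaining == 0:
                break
    return share


shares = max_min_fair_share(100, {"recommend": 20, "ad": 60, "search": 70})
print(shares)  # {'recommend': 20, 'ad': 40, 'search': 40}
```

The small "recommend" pool is fully served, and the slots it leaves unused are split evenly between the two larger pools.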
34. Our Future Plan.
Considering a New Slave Machine.
Now Looking for a Machine That Has…
Low Electric Power Consumption,
Two CPUs With About 6 Cores Each,
About 10TB of HDD,
About 96GB of Memory,
& Is Naturally Compatible With Our Data Center.
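A back-of-the-envelope sketch of how much data such slaves could hold in HDFS, assuming the default replication factor of 3 (`dfs.replication`) and an assumed 25% of each disk kept for the OS, logs, and MapReduce intermediate output. The 20-node cluster size is hypothetical.

```python
REPLICATION = 3  # HDFS default replication factor (dfs.replication)


def usable_hdfs_tb(nodes: int, raw_tb_per_node: float, non_hdfs_reserve: float = 0.25) -> float:
    """Estimate the logical data (TB) a cluster can store in HDFS.

    non_hdfs_reserve is an assumed fraction of each disk kept for the OS,
    logs, and MapReduce intermediate output rather than for HDFS blocks.
    """
    hdfs_raw_tb = nodes * raw_tb_per_node * (1 - non_hdfs_reserve)
    return hdfs_raw_tb / REPLICATION


usable = usable_hdfs_tb(nodes=20, raw_tb_per_node=10)
print(usable)  # 50.0
```

Under these assumptions each 10TB slave adds only about 2.5TB of logical HDFS capacity, which is why disk space per machine matters so much when HDFS space is running out.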
35. Our Future Plan.
Upgrading from Apache Hadoop to CDH3.
Mr. Eric Sammer (Solution Architect at Cloudera)
Described The Advantages of Getting Hadoop from Cloudera on Quora:
1. A version of Hadoop that has frequent releases (quarterly) that include bug fixes
and back ported features (append for HBase, Kerberos security from Y!, etc.).
2. Related projects (Hive, Pig, Oozie, HBase, Flume, Sqoop, etc.) tested together and
work as a cohesive system.
3. Simplified installation via Yum / Apt repositories.
4. Tighter integration with the OS (init scripts for daemons, installation of things in
common paths, logs in their proper location).
5. A fixed release schedule.
6. Support available from Cloudera with SLAs.
Source : http://www.quora.com/What-are-the-advantages-of-getting-Apache-Hadoop-from-Cloudera-rather-than-the-Apache-Software-Foundation
36. Our Future Plan.
Evaluating HBase Using AWS.
Constructing an HBase Cluster on Amazon EC2.
Doing Evaluation & Verification This Summer!