1. Introduction to Data Analysis Technology (Hadoop Edition) ( &Hadoop )
April 12, 2013
Takumi Asai
2. (26 )
– H21–H23 (2009–2011): NTT Communications IP
– H23 (2011): NTT
– twitter:@p_i_o4545
– blog:http://pioneerinocean.hatenablog.com/
• R Hadoop ( )
3. ( :4/12)
Hadoop
( : )
R
Ruby R
7. 21 ( )
⇒
Google, Facebook
14. Hadoop
Hadoop
– Apache
Java
– Google MapReduce, Google File System (GFS)
• Google
15. Hadoop
Hadoop
– HDFS MapReduce
– HBase
HDFS
– Google GFS
MapReduce
– Google MapReduce
– Key-Value
Java
17. HDFS
• HDFS splits files into blocks (64MB)
[Figure: a 150MB file (abcdefg / hijklmn / opqrstu / vwxyz) is split into #Block1 (64MB), #Block2 (64MB) and #Block3 (22MB)]
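The split in the figure is simple arithmetic: as many full 64MB blocks as fit, then one final block holding the remainder (150 = 64 + 64 + 22). A minimal Ruby sketch of that calculation, assuming sizes in whole megabytes; `split_into_blocks` is a hypothetical helper for illustration, not part of any HDFS API:

```ruby
# Hypothetical helper (not an HDFS API): split a file of `file_mb` megabytes
# into HDFS-style blocks of at most `block_mb` megabytes each.
def split_into_blocks(file_mb, block_mb = 64)
  raise ArgumentError, "block size must be positive" unless block_mb.positive?

  full, rest = file_mb.divmod(block_mb)  # number of full blocks, leftover megabytes
  blocks = Array.new(full, block_mb)
  blocks << rest if rest.positive?       # last block holds only the leftover data
  blocks
end

p split_into_blocks(150)  # => [64, 64, 22]
```

The last block only takes up the 22MB it actually contains; HDFS does not pad it out to a full block.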
18. HDFS
[Figure: the three blocks (#Block1 64MB, #Block2 64MB, #Block3 22MB) are stored on five DataNodes, each block on three of them: Data Node A has blocks 1 and 2, B has 2 and 3, C has 1 and 3, D has 1, E has 2 and 3]
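Inverting the figure's table (node to blocks) into block to nodes shows that every block is held by three different DataNodes, which matches HDFS's default replication factor of 3 (the `dfs.replication` setting); losing any one node therefore still leaves copies of all three blocks. A small Ruby sketch, with the placement transcribed from the figure and hypothetical method names:

```ruby
# Node -> blocks it stores, transcribed from the figure.
PLACEMENT = {
  "A" => [1, 2],
  "B" => [2, 3],
  "C" => [1, 3],
  "D" => [1],
  "E" => [2, 3]
}.freeze

# Invert the table: block -> DataNodes holding a copy of it.
def replicas_by_block(placement)
  placement.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(node, blocks), acc|
    blocks.each { |block| acc[block] << node }
  end
end

# Blocks that still have at least one copy after `failed_node` goes down.
def surviving_blocks(placement, failed_node)
  placement.reject { |node, _| node == failed_node }.values.flatten.uniq.sort
end

replicas_by_block(PLACEMENT).sort.each do |block, nodes|
  puts "Block#{block}: #{nodes.join(',')}"
end
# Block1: A,C,D
# Block2: A,B,E
# Block3: B,C,E
```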
21. Namenode !
– Namenode HDFS
– NN 2NN
– HDFS
22. HDFS
HDFS
[Figure: an HDFS cluster of Data Nodes with an Active Name Node, a Standby Name Node and Secondary Name Nodes (2NN)]
23. HDFS
HDFS
– Datanode
– Datanode
Namenode
– Namenode
– Namenode ⇔Datanode
Datanode⇔Datanode
Linux
– ls, cat
– rwx
• x HDFS
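HDFS permissions use the same nine rwx bits as Linux, and listing commands such as `hadoop fs -ls` print them in the familiar `ls -l` form. (According to the HDFS Permissions Guide, x has no effect on files because HDFS has no executable files; on directories it is required to access their children.) A small Ruby sketch that renders an octal mode the way a listing shows it; `rwx_string` is a hypothetical helper:

```ruby
# Hypothetical helper: render a permission mode such as 0o755 as the
# nine-character rwx string printed by `ls -l` (owner, group, others).
def rwx_string(mode)
  [6, 3, 0].map do |shift|
    bits = (mode >> shift) & 0b111  # the three bits for one class of user
    [[4, "r"], [2, "w"], [1, "x"]].map { |bit, ch| (bits & bit).zero? ? "-" : ch }.join
  end.join
end

puts rwx_string(0o755)  # rwxr-xr-x
puts rwx_string(0o644)  # rw-r--r--
```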
24. MapReduce
MapReduce
– Map/Reduce 2
– Map/Reduce, Mapper/Reducer
– Map,Reduce Shuffle
25. MapReduce
[Figure: a Job Tracker coordinating several Task Trackers, which process data stored in HDFS]
26. [Figure: cluster layout. A Name Node, a Job Tracker and a Secondary Name Node run alongside worker nodes, each of which runs a Data Node (※ HDFS) and a Task Tracker (※ MapReduce)]
27. MapReduce
YARN
– HDFS MapReduce
– YARN (MapReduce Ver. 2)
– MapReduce
– YARN
– YARN
28. MapReduce
WordCount
– MapReduce (Hello World )
Hello Hadoop Goodbye World Hello Goodbye World World Hadoop
Map
<Hello,1> <Hadoop,1> <Goodbye,1> <World,1> <Hello,1>
<Goodbye,1> <World,1> <World,1> <Hadoop,1>
Shuffle
<Goodbye,[1,1]> <Hadoop,[1,1]> <Hello,[1,1]> <World,[1,1,1]>
Reduce
<Goodbye,2> <Hadoop,2> <Hello,2> <World,3>
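The three phases can be mimicked in a single Ruby process to see exactly what each one produces. This is only an in-memory illustration (no Hadoop, no distribution across nodes), and the method names are hypothetical:

```ruby
# In-memory imitation of the WordCount phases (no Hadoop involved).

# Map: emit a [word, 1] pair for every word in the input.
def map_phase(text)
  text.split.map { |word| [word, 1] }
end

# Shuffle: gather the values for each key and sort by key,
# which the framework does between Map and Reduce.
def shuffle_phase(pairs)
  pairs.group_by(&:first)
       .transform_values { |kvs| kvs.map(&:last) }
       .sort.to_h
end

# Reduce: add up the list of 1s collected for each word.
def reduce_phase(grouped)
  grouped.transform_values(&:sum)
end

input   = "Hello Hadoop Goodbye World Hello Goodbye World World Hadoop"
grouped = shuffle_phase(map_phase(input))
counts  = reduce_phase(grouped)

grouped.each { |word, ones| puts "<#{word},[#{ones.join(',')}]>" }
counts.each  { |word, n| puts "<#{word},#{n}>" }
# Prints one pair per line:
# <Goodbye,[1,1]> <Hadoop,[1,1]> <Hello,[1,1]> <World,[1,1,1]>
# <Goodbye,2> <Hadoop,2> <Hello,2> <World,3>
```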
29. MapReduce
Mapper Reducer
– HDFS ” ”
[Figure: several Map tasks and Reduce tasks]
30. MapReduce
– WordCount
– Map Reduce
• fizz buzz fizzbuzz fizz
– Ruby Ruby
– Map #{ }\t1
OK
– Reduce
31. MapReduce
hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred
Hadoop 3
hdfs 1
Mapred 4
– OK
• #{ }\t#{ }
– cat test.txt | ruby map.rb | sort | ruby reduce.rb
• Hadoop
32. MapReduce
:Map
hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred
hdfs 1
Hadoop 1
Hadoop 1
Mapred 1
Mapred 1
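The reduce.rb on slide 34 consumes lines like these. A minimal sketch of a mapper that produces them, assuming it simply emits one `word\t1` line per whitespace-separated word as described on slide 30 (the method name `emit_pairs` is hypothetical, and this is not necessarily the author's map.rb):

```ruby
require "stringio"

# map.rb (sketch): write "<word>\t1" for every whitespace-separated word.
def emit_pairs(input, output)
  input.each_line do |line|
    line.split.each { |word| output.puts "#{word}\t1" }
  end
end

# As a standalone map.rb, the script body would just be:
#   emit_pairs(STDIN, STDOUT)
# Here the slide's sample line is fed in through a StringIO instead.
out = StringIO.new
emit_pairs(StringIO.new("hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred\n"), out)
puts out.string
```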
34. Reduce
# reduce.rb: add up the "word\tcount" lines produced by map.rb
wordhash = Hash.new(0)            # words not seen yet start at 0
STDIN.each_line do |line|
  word, count = line.strip.split
  next if word.nil?               # skip blank lines
  wordhash[word] += count.to_i
end
wordhash.each { |record, count| puts "#{record}\t#{count}" }
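The reduce.rb above keeps every word in a hash until its input ends. Because the lines reaching a reducer are sorted by key (by `sort` in the local pipeline, by the Shuffle phase on Hadoop), a reducer can instead emit a total each time the key changes and keep only one running count in memory. A sketch of that variant, under the same `word\tcount` line format; the method name `reduce_sorted` is hypothetical:

```ruby
require "stringio"

# Reducer for key-sorted "word\tcount" lines: only the current word and
# its running total are held in memory.
def reduce_sorted(input, output)
  current = nil
  total = 0
  input.each_line do |line|
    word, count = line.chomp.split("\t", 2)
    next if word.nil? || word.empty?  # skip blank lines
    if word != current
      output.puts "#{current}\t#{total}" unless current.nil?  # key changed: flush previous word
      current = word
      total = 0
    end
    total += count.to_i
  end
  output.puts "#{current}\t#{total}" unless current.nil?      # flush the last word
end

# Map output for the slide 31 example, sorted by key in byte order.
sorted = "Hadoop\t1\nHadoop\t1\nHadoop\t1\nMapred\t1\nMapred\t1\nMapred\t1\nMapred\t1\nhdfs\t1\n"
out = StringIO.new
reduce_sorted(StringIO.new(sorted), out)
puts out.string
# Hadoop	3
# Mapred	4
# hdfs	1
```

This only works because equal keys arrive next to each other; fed unsorted input, it would print the same word more than once.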
35. Hadoop
Hadoop
–
– Java OK
37. Hadoop
–
• Pig
• Hive
–
• Sqoop
–
• Mahout
– Hadoop
• Whirr
etc…
38. Hadoop
– HDFS
• RAID
– HDFS MapReduce
• Amazon S3