33. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
Member ID, gender code, age, region code, registration date
100000000, male, 32, osaka, 2014-04-21 19:48:18
100000001, male, 50, tokyo, 2014-06-01 09:17:40
100000002, female, 37, tokyo, 2014-07-31 07:34:48
100000003, male, 41, osaka, 2014-06-06 08:25:55
100000004, female, 63, osaka, 2014-04-18 05:01:21
34. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
35. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
36. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
Member ID, gender code, age, region code, registration date
100000000, male, 32, osaka, 2014-04-21 19:48:18
100000001, male, 50, tokyo, 2014-06-01 09:17:40
100000002, female, 37, tokyo, 2014-07-31 07:34:48
100000003, male, 41, osaka, 2014-06-06 08:25:55
100000004, female, 63, osaka, 2014-04-18 05:01:21
37. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
val cols = lines.map(_.split(","))
{Member ID, gender code, age, region code, registration date}
{100000000, male, 32, osaka, 2014-04-21 19:48:18}
{100000001, male, 50, tokyo, 2014-06-01 09:17:40}
{100000002, female, 37, tokyo, 2014-07-31 07:34:48}
{100000003, male, 41, osaka, 2014-06-06 08:25:55}
{100000004, female, 63, osaka, 2014-04-18 05:01:21}
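To make the split step concrete, here is a minimal plain-Scala sketch (no Spark required) applied to one sample row. It assumes the file really contains a space after each comma, as the rows appear on the slide; in that case `split(",")` keeps the leading space in each field, so trimming may be worthwhile before using a field as a key. `SplitSketch` and `parseRow` are hypothetical names used only for illustration.

```scala
object SplitSketch {
  // Hypothetical helper: split one CSV line on commas and strip
  // surrounding whitespace from every field.
  def parseRow(line: String): Array[String] =
    line.split(",").map(_.trim)

  def main(args: Array[String]): Unit = {
    val line = "100000000, male, 32, osaka, 2014-04-21 19:48:18"

    // Plain split, as on the slide: the space after each comma is kept.
    val raw = line.split(",")
    println("[" + raw(1) + "]")

    // Split plus trim: the gender field is now a clean key.
    println("[" + parseRow(line)(1) + "]")
  }
}
```

If the real file has no spaces after the commas, the plain `split(",")` from the slide is already enough.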
38. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
val cols = lines.map(_.split(","))
val genders = cols.map(row => (row(1), 1))
{row(0), row(1), row(2), row(3), row(4)}
{100000000, male, 32, osaka, 2014-04-21 19:48:18}
{100000001, male, 50, tokyo, 2014-06-01 09:17:40}
{100000002, female, 37, tokyo, 2014-07-31 07:34:48}
{100000003, male, 41, osaka, 2014-06-06 08:25:55}
40. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
val cols = lines.map(_.split(","))
val genders = cols.map(row => (row(1), 1))
(row(1), 1)
(male, 1)
(male, 1)
(female, 1)
(male, 1)
41. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
val cols = lines.map(_.split(","))
val genders = cols.map(row => (row(1), 1))
val result = genders.reduceByKey((x, y) => x + y)
(male, 1)
(female, 1)
(male, 1)
(male, 1)
(male, 2)
(male, 3)
(female, 1)
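The reduceByKey step on this slide can be mimicked on an ordinary local collection, which may help when reasoning about what the function `(x, y) => x + y` receives. The sketch below is plain Scala, not Spark; `ReduceByKeySketch` and `reduceByKeyLocal` are hypothetical names. Unlike this sequential fold, Spark combines values within and across partitions in no guaranteed order, so the function passed to `reduceByKey` should be associative and commutative.

```scala
object ReduceByKeySketch {
  // Local stand-in for reduceByKey: values that share a key are
  // combined pairwise with f; a key seen for the first time keeps its value.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(prev => f(prev, v)).getOrElse(v))
    }

  def main(args: Array[String]): Unit = {
    // The same four pairs shown on the slide.
    val genders = Seq(("male", 1), ("female", 1), ("male", 1), ("male", 1))
    val result = reduceByKeyLocal(genders)((x, y) => x + y)
    println(result("male"))   // 3
    println(result("female")) // 1
  }
}
```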
42. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
val cols = lines.map(_.split(","))
val genders = cols.map(row => (row(1), 1))
val result = genders.reduceByKey((x, y) => x + y)
43. Count genders from the member attributes file
val memberInfoFile = "/tmp/member_info.csv"
val sc = new SparkContext()
val lines = sc.textFile(memberInfoFile)
val cols = lines.map(_.split(","))
val genders = cols.map(row => (row(1), 1))
val result = genders.reduceByKey((x, y) => x + y)
result.collect().foreach(println)
sc.stop()  // release resources
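For trying the finished pipeline without a cluster or an input file, the following is a minimal sketch that runs the same steps in Spark's local mode on the five sample rows. It assumes Spark core is on the classpath; the object name `GenderCountLocal`, the helper `countGenders`, the `local[*]` master, the in-memory sample data, and the added `trim` are all choices made for this illustration and are not part of the original slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Pair-RDD implicits; needed for reduceByKey on Spark releases before 1.3.
import org.apache.spark.SparkContext._

object GenderCountLocal {
  // Hypothetical helper: same split / map / reduceByKey steps as the slides,
  // with a trim added in case fields carry a space after each comma.
  def countGenders(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)
      .map(_.split(",").map(_.trim))
      .map(row => (row(1), 1))
      .reduceByKey((x, y) => x + y)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; no YARN cluster is involved.
    val conf = new SparkConf().setAppName("GenderCountLocal").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sample = Seq(
        "100000000, male, 32, osaka, 2014-04-21 19:48:18",
        "100000001, male, 50, tokyo, 2014-06-01 09:17:40",
        "100000002, female, 37, tokyo, 2014-07-31 07:34:48",
        "100000003, male, 41, osaka, 2014-06-06 08:25:55",
        "100000004, female, 63, osaka, 2014-04-18 05:01:21"
      )
      countGenders(sc, sample).foreach(println)
    } finally {
      sc.stop() // release resources even if the job fails
    }
  }
}
```

Wrapping the job in `try`/`finally` is one way to make sure `sc.stop()` always runs; the slide version calls it at the end of the straight-line script instead.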
52. Tuning is required on the YARN side
14/12/23 05:00:10 ERROR yarn.Client: Required executor memory (16384 MB), is above the max threshold (8192 MB) of this cluster.
Exception in thread "main" java.lang.IllegalArgumentException: Required executor memory (16384 MB), is above the max threshold (8192 MB) of this cluster.
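The log says the job asked for 16384 MB per executor while the largest container this cluster grants is 8192 MB. As a rough illustration of that comparison (a sketch only, not Spark's actual implementation), the check can be written as below; `ExecutorMemoryCheck` and `validate` are hypothetical names. In practice the options are to request less executor memory or to raise the YARN-side limit, which on many clusters is governed by `yarn.scheduler.maximum-allocation-mb` (an assumption about how this particular cluster is configured).

```scala
object ExecutorMemoryCheck {
  // Hypothetical helper: compare the requested executor memory with the
  // cluster's maximum container size and report the mismatch.
  def validate(requestedMb: Int, maxThresholdMb: Int): Either[String, Int] =
    if (requestedMb > maxThresholdMb)
      Left(s"Required executor memory ($requestedMb MB), is above the max " +
        s"threshold ($maxThresholdMb MB) of this cluster.")
    else
      Right(requestedMb)

  def main(args: Array[String]): Unit = {
    println(validate(16384, 8192)) // the situation from the log: rejected
    println(validate(4096, 8192))  // a request that fits under the limit
  }
}
```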
53. Calculating the recommended number of containers
# of containers = min (
2 * CORES,
1.8 * DISKS,
(Total available RAM) / MIN_CONTAINER_SIZE
)
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
* 8-core CPU x 32 GB memory x 640 GB disk x 10 nodes
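The formula above can be evaluated with a few lines of Scala. This is a sketch under stated assumptions: the 8 cores and 32 GB of memory per node come from the slide, while the disk count (one 640 GB disk per node), the 2048 MB minimum container size, and treating all 32 GB as available (no memory reserved for the OS) are illustrative guesses. `ContainerEstimate` and `recommendedContainers` are hypothetical names, and the result is left as a fraction because the rounding rule is not stated here.

```scala
object ContainerEstimate {
  // # of containers = min(2 * CORES, 1.8 * DISKS, available RAM / MIN_CONTAINER_SIZE)
  def recommendedContainers(cores: Int,
                            disks: Int,
                            availableRamMb: Int,
                            minContainerSizeMb: Int): Double =
    Seq(
      2.0 * cores,
      1.8 * disks,
      availableRamMb.toDouble / minContainerSizeMb
    ).min

  def main(args: Array[String]): Unit = {
    // 8 cores and 32 GB per node are from the slide; disks = 1 and
    // minContainerSizeMb = 2048 are assumptions made for this example.
    val perNode = recommendedContainers(
      cores = 8, disks = 1, availableRamMb = 32 * 1024, minContainerSizeMb = 2048)
    println(perNode)
  }
}
```

With these inputs the disk term is the smallest of the three, so the number of disks per node is the value most worth confirming before relying on the estimate.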