Following Hadoop to Build a Large-Scale Platform
skplanet
김경진
Data Service Development Team
- Recopick
- SyrupAd
- DMP... or rather, not DMP but MRSS (Marketing & Recommendation Support System)
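2 months
It all feels so vague...
Q. In what year was the Daedong Law (대동법) enacted?
-> I don't know
Q. How do you get a girlfriend?
-> ???????
Problems that split cleanly into what you know and what you don't
VS
Problems whose many aspects and contexts have to be analyzed
Early in a task
: mostly problems of not knowing
Middle and late in a task
: mostly multifaceted problems
Before I explain, one moment...
Why is big data usually so hard to explain?
Isn't development really just CRUD?
Why doesn't it make sense?
-> Maybe we simply don't know 'what' needs to be done, or 'why'?
So I wrote down what I actually wanted to do...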
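Read a large dataset that has been split into many files
-> Spark
Run map-reduce over the large dataset to produce the desired output
-> Spark
Run and manage periodic batch jobs over the large dataset
-> Oozie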
Spark
A distributed platform for big data processing
- RDD (Resilient Distributed Dataset)
: Transformation, Action
- DataFrame
: a distributed collection for processing tabular data
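
The transformation/action split can be illustrated without a cluster. The sketch below is an analogy built on plain Java streams, not the Spark API: intermediate operations such as `filter` and `map` stand in for transformations (they only describe the work), and the terminal `collect` stands in for an action (it triggers the work). The class name `LazyPipelineSketch`, the `touched` list, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipelineSketch {
    // Records every element the pipeline actually reads, to make laziness visible.
    static final List<String> touched = new ArrayList<>();

    // "Transformations": describe the pipeline; no element is processed here.
    static Stream<Integer> buildPipeline(List<String> lines) {
        return lines.stream()
                .peek(touched::add)               // side effect only for demonstration
                .filter(line -> !line.isEmpty())  // analogous to an RDD filter
                .map(String::length);             // analogous to an RDD map
    }

    // "Action": the terminal operation forces the pipeline to run.
    static List<Integer> runAction(Stream<Integer> pipeline) {
        return pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark", "", "oozie");
        Stream<Integer> pipeline = buildPipeline(lines);
        System.out.println("elements read before the action: " + touched.size());
        System.out.println("result: " + runAction(pipeline));
        System.out.println("elements read after the action: " + touched.size());
    }
}
```

Nothing is read until `runAction` is called. That is the behavior to keep in mind when a Spark job appears to do no work until its first action.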
What I had to go through just to get it running...
Getting started with Spark
Run Spark in Standalone Mode
Run Spark in Cluster Mode
Run Spark in a Real Distributed Environment
Run Spark by Scheduler
Oozie
Server-based workflow engine specialized in
running workflow jobs with actions that run
Hadoop Map/Reduce and Pig jobs
Oozie is Hadoop's workflow scheduler
Oozie workflow
<workflow-app xmlns='uri:oozie:workflow:0.1' name='processDir'>
    <start to='getDirInfo' />
    <!-- STEP ONE -->
    <action name='getDirInfo'>
        <!--writes 2 properties:
            dir.num-files: returns -1 if dir doesn't exist, otherwise returns # of files in dir
            dir.age: returns -1 if dir doesn't exist, otherwise returns age of dir in days -->
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.navteq.oozie.GetDirInfo</main-class>
            <arg>${inputDir}</arg>
            <capture-output />
        </java>
        <ok to="makeIngestDecision" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
Oozie command
$ oozie job -oozie http://localhost:8080/oozie -config examples/apps/map-reduce/job.properties -run
job: 14-20090525161321-oozie-tucu

Check the workflow job status:
$ oozie job -oozie http://localhost:8080/oozie -info 14-20090525161321-oozie-tucu
.--------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path : hdfs://localhost:9000/user/tucu/examples/apps/map-reduce
Status : SUCCEEDED
Run : 0
User : tucu
Group : users
Created : 2009-05-26 05:01 +0000
Started : 2009-05-26 05:01 +0000
Ended : 2009-05-26 05:01 +0000
Actions
.--------------------------------------------------------------------------------
Action Name Type Status Transition External Id External Status Error Code Start Time End Time
.--------------------------------------------------------------------------------
mr-node map-reduce OK end job_200904281535_0254 SUCCEEDED - 2009-05-26 05:01 +0000 2009-05-26 05:01 +0000
.--------------------------------------------------------------------------------
Java Client
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;
import org.apache.oozie.local.LocalOozie;
...
// start local Oozie
LocalOozie.start();

// get an OozieClient for local Oozie
OozieClient wc = LocalOozie.getClient();

// create a workflow job configuration and set the workflow application path
Properties conf = wc.createConfiguration();
conf.setProperty(OozieClient.APP_PATH, "hdfs://foo:9000/usr/tucu/my-wf-app");

// set the workflow parameters
conf.setProperty("jobTracker", "foo:9001");
conf.setProperty("inputDir", "/usr/tucu/inputdir");
conf.setProperty("outputDir", "/usr/tucu/outputdir");
...

// submit and start the workflow job
String jobId = wc.run(conf);
System.out.println("Workflow job submitted");

// wait until the workflow job finishes, printing the status every 10 seconds
while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
    System.out.println("Workflow job running ...");
    Thread.sleep(10 * 1000);
}

// print the final status of the workflow job
System.out.println("Workflow job completed ...");
System.out.println(wc.getJobInfo(jobId));

// stop local Oozie
LocalOozie.stop();
...
Run Spark with Oozie
Oozie : Curse or Blessing?
Unmanaged XML hell
Unmanaged dependencies across different big data applications
${oozie.wf.application.path}/lib
oozie.libpath=${oozie.wf.application.path}/lib
ShareLib : /user/${user.name}/share/lib
<property>
    <name>oozie.hive.defaults</name>
    <value>${jobDir}/hive-conf.xml</value>
</property>
HiveMain
public class HiveMain extends LauncherMain {
    public static final String HIVE_SITE_CONF = "hive-site.xml";

    public static Configuration setUpHiveSite() throws Exception {
        Configuration hiveConf = initActionConf();
        // Write the action configuration out to hive-site.xml
        OutputStream os = new FileOutputStream(HIVE_SITE_CONF);
        hiveConf.writeXml(os);
        os.close();
        System.out.println();
        System.out.println("Hive Configuration Properties:");
        System.out.println("------------------------");
        for (Entry<String, String> entry : hiveConf) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
        System.out.flush();
        System.out.println("------------------------");
        System.out.println();
Do not trust ShareLib
[Diagram: HDFS, Processing, Hive, Hadoop, Spark, Segmentation]
Segmentation
1. prdname contains('나이키') AND adtitle contains('Auction')
prdname         adtitle
나이키 (Nike)     Auction
리복 (Reebok)    Gmarket
id    count
1     2300
2     156000
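
The segment rule above is a keyword filter followed by a count. As a minimal single-machine sketch of that logic (plain Java collections rather than Spark, so it shows only the predicate, not the distributed execution): the class `SegmentCountSketch`, the `Row` type, and the English sample values "Nike" and "Reebok" are hypothetical stand-ins for the slide's Korean product names.

```java
import java.util.Arrays;
import java.util.List;

public class SegmentCountSketch {
    // One ad-log row; the field names follow the slide (prdname, adtitle).
    static final class Row {
        final String prdname;
        final String adtitle;

        Row(String prdname, String adtitle) {
            this.prdname = prdname;
            this.adtitle = adtitle;
        }
    }

    // Counts rows whose prdname contains one keyword AND whose adtitle contains another.
    static long countSegment(List<Row> rows, String prdKeyword, String adKeyword) {
        return rows.stream()
                .filter(r -> r.prdname.contains(prdKeyword) && r.adtitle.contains(adKeyword))
                .count();
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("Nike", "Auction"),
                new Row("Reebok", "Gmarket"));
        System.out.println("segment 1 count: " + countSegment(rows, "Nike", "Auction"));
    }
}
```

Each of the 100 segments mentioned later in the deck would be one such (keyword, keyword) pair evaluated over the full dataset, which is why the per-segment cost matters.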
Spark with DataFrame
- DataFrame Features
: passes data between nodes much more efficiently than Java serialization
(because Spark understands the schema)
: applies transformations directly to data in off-heap memory,
avoiding garbage-collection costs
: provides an API for building a relational query plan that
Spark's Catalyst optimizer can then execute
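Spark Too Slow...
After somehow getting it implemented with DataFrame...
Processing 100 segments over 30 days of data (80 GB x 30)
-> processing time: 8 hours
-> could not meet the spec
-> left hoping that slow but working would be good enough...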
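
To put the numbers above in perspective, the arithmetic can be worked through directly. The sketch below uses only the figures from the slide (30 days, 80 GB per day, 100 segments, 8 hours). The per-segment figure is a simple average share of wall-clock time, an assumption for illustration, since the deck does not say whether segments ran sequentially or in parallel. The class and method names are hypothetical.

```java
public class ThroughputSketch {
    // Total input volume in GB for the given number of days.
    static double totalGb(int days, double gbPerDay) {
        return days * gbPerDay;
    }

    // Overall throughput in GB per hour.
    static double gbPerHour(double totalGb, double hours) {
        return totalGb / hours;
    }

    // Average share of wall-clock minutes attributable to one segment.
    static double minutesPerSegment(double hours, int segments) {
        return hours * 60.0 / segments;
    }

    public static void main(String[] args) {
        double total = totalGb(30, 80.0);
        System.out.println("total input (GB): " + total);
        System.out.println("throughput (GB/hour): " + gbPerHour(total, 8.0));
        System.out.println("minutes per segment: " + minutesPerSegment(8.0, 100));
    }
}
```

With the slide's figures this comes to 2,400 GB of input, 300 GB per hour, and an average of 4.8 minutes per segment.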
Spark Failed
- DataFrame Performance
Heavy overhead
Not optimized for a distributed environment : memory risk
SqlContext Problems
cores: 2, executors: 256
Exhausted the Hive metastore DB connections (MySQL
connection limit: 2,000)
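
A rough budget shows how easily 256 executors can hit a 2,000-connection limit. The slide does not say how many metastore connections each executor opened, so the per-executor counts below are purely hypothetical inputs; only the executor count and the limit come from the slide. The class and method names are also hypothetical.

```java
public class ConnectionBudgetSketch {
    // Largest number of connections each executor may hold without exceeding the limit.
    static int maxConnectionsPerExecutor(int connectionLimit, int executors) {
        return connectionLimit / executors; // integer division rounds down
    }

    // True when the combined demand of all executors exceeds the database limit.
    static boolean exceedsLimit(int executors, int connectionsPerExecutor, int connectionLimit) {
        return (long) executors * connectionsPerExecutor > connectionLimit;
    }

    public static void main(String[] args) {
        int executors = 256;  // from the slide
        int limit = 2000;     // MySQL connection limit from the slide
        System.out.println("budget per executor: " + maxConnectionsPerExecutor(limit, executors));
        // Hypothetical per-executor connection counts, for illustration only.
        System.out.println("7 each exceeds limit: " + exceedsLimit(executors, 7, limit));
        System.out.println("8 each exceeds limit: " + exceedsLimit(executors, 8, limit));
    }
}
```

Under these assumptions the budget is 7 connections per executor. A pool of 8 per executor (2,048 in total) would already exceed the limit, before counting any other clients of the same database.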
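Reasons for failure
Things I neglected...
Was lax about checking library dependencies
Had never been in a situation where local testing was impossible
Lacked knowledge of programming for distributed environments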


Editor's Notes

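  1. How do I set up the project? How do I build the cluster? How do I run the application? Break the to-do list into small pieces.
  2. Why won't it run at all? Why does it suddenly die mid-run? Why is it so slow? How do I make it faster? Eliminate the suspected causes one at a time.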
  3. etl cluster 242:DataNode core:4