Stream Analysis the Kafka-Native Way and Considerations About Monitoring as a Service

A talk about Kafka-native stream analysis tools and considerations in building monitoring as a service.

1. Stream Analysis the Kafka-Native Way and Considerations in Monitoring as a Service. Andrew Yongjoon Kong, sstrato.open@gmail.com
2. Who am I: Andrew Yongjoon Kong
    • Cloud technical advisor for the Government Broadcast Agency
    • Adjunct professor, Ajou University
    • Korea Database Agency, acting professor for big data
    • Member of the National Information Agency big data advisory committee
    • Kakao Corp, Cloud Part Lead
    • Talks:
      • Scalable Loadbalancer with VM Orchestrator (2017, netdev, Korea)
      • Embrace Clouds (2017, OpenStack Days, Korea)
      • Full Route-Based Network with Linux (2016, netdev, Tokyo)
      • SDN without SDN (2015, OpenStack, Vancouver)
    • Supervised Korean editions (book covers shown on the slide)
3. Some Terms: Batch vs. Stream
4. Processing
    (Diagram: in the batch-processing domain, applications, sensors (IoT), and other sources emit events into a database or distributed file system, where queries are executed and analyses are updated; in the stream-processing domain, events flow through a stream and a new stream is produced for consuming applications.)
5. What is Real Time?
    • The term "real-time analytics" implies practically instant access to and use of analytical data
6. Relative, time is. To be continued.
7. Processing
    (Diagram: stream processing in detail; an application consumes events from an existing stream, keeps its state in a state store, and emits events into a new stream.)
8. Popular stream processors
    • Apache Flume (too old school)
    • Apache Storm
    • Apache Spark
    • Apache Samza
    • Apache NiFi …
9. Popular stream processors: e.g. Apache Flume
    • Flume comprises sources, sinks, and channels
    • Sources include: Avro, Thrift, Exec, JMS, Spooling Directory, NetCat, Sequence Generator, Syslog, HTTP, Twitter
    • Sinks include: HDFS, Logger, Avro, Thrift, IRC, File Roll, HBase, Elasticsearch
    (Diagram: a source feeds a channel, which feeds a sink; partitions shown alongside.)
10. Kafka Streams
    • Simple (works only with Kafka)
    • Guarantees exactly-once processing
    • Provides a local state store
    • DSL support
    • Kafka Streams comprises source processors, sink processors, and a topology (see the sketch after this slide)
      • Source processor: reads data from a Kafka topic
      • Sink processor: receives data from other processors and writes it out
      • Topology: the automatically created data pipeline that wires the processors together
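A minimal sketch of those three terms using the Kafka Streams Topology (Processor) API; the topic names are illustrative assumptions, not from the slides:

    import org.apache.kafka.streams.Topology;

    public class ProcessorTermsSketch {
        public static Topology build() {
            // The topology is the pipeline that wires processors together.
            Topology topology = new Topology();
            // Source processor: reads records from a Kafka topic.
            topology.addSource("Source", "input-topic");
            // Sink processor: takes records from its parent node ("Source") and writes them to a topic.
            topology.addSink("Sink", "output-topic", "Source");
            // Run it with: new KafkaStreams(topology, props).start();
            return topology;
        }
    }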
11. Kafka Streams Sample 1: pipe
    • Kafka Streams code (a fuller, runnable sketch follows this slide):

        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("streams-plaintext-input").to("streams-pipe-output");
        final Topology topology = builder.build();
        final KafkaStreams streams = new KafkaStreams(topology, props);

    • Running:

        mvn exec:java -Dexec.mainClass=myapps.Pipe
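For reference, a minimal self-contained version of the pipe example; the application id and the localhost:9092 broker address are assumptions for a local test setup:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.Topology;

    public class Pipe {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed local broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Topology: copy every record from the input topic to the output topic.
            final StreamsBuilder builder = new StreamsBuilder();
            builder.stream("streams-plaintext-input").to("streams-pipe-output");
            final Topology topology = builder.build();

            final KafkaStreams streams = new KafkaStreams(topology, props);
            streams.start();
            // Close cleanly on Ctrl-C.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }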
12. Kafka Streams Sample 1: pipe vs. Apache Samza
    • Code: (Samza code shown on the slide)
    • Running:
      • Copy the jar or path to the Hadoop cluster
      • Run the program
      • What happens if something goes bad?
13. Kafka Streams Sample 2: wordcount
    • Topology (diagram on the slide)
14. Kafka Streams Sample 2: wordcount
    • Code (an equivalent lambda version follows this slide):

        builder.<String, String>stream("streams-plaintext-input")
            .flatMapValues(new ValueMapper<String, Iterable<String>>() {
                @Override
                public Iterable<String> apply(String value) {
                    return Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+"));
                }
            })
            .groupBy(new KeyValueMapper<String, String, String>() {
                @Override
                public String apply(String key, String value) {
                    return value;
                }
            })
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"))
            .toStream()
            .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
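The same topology written with Java 8 lambdas, a sketch that is functionally equivalent to the anonymous-class version above:

    builder.<String, String>stream("streams-plaintext-input")
        // Split each line into lowercase words.
        .flatMapValues(value -> Arrays.asList(
                value.toLowerCase(Locale.getDefault()).split("\\W+")))
        // Re-key each record by the word itself so counting groups by word.
        .groupBy((key, value) -> value)
        // Maintain a running count per word in a local state store named "counts-store".
        .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"))
        .toStream()
        .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));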
15. Kafka Streams
    • Demo
16. Kafka Streams Q&A
    • Let's talk
17. KSQL
18. Before going into KSQL
    • Why do you need SQL?
19. Productivity Perspective
    • Task: find the five sites most visited by users aged 18 to 25
    • Steps: load the user data, load the site-visit data, filter by age, join on user ID, group by site, count visits, sort by visit count, take the top 5 sites
    • Sample user data (name, ID, age, gender):

        Kildong   kildong  20  M
        Cheolsu   cheol    25  M
        Younghee  young    15  F
        Younggu   ygu      34  M

    • Sample site-visit data (site, visitor, time):

        chosum.com  kildong  08:00
        ddanji.com  tiffany  12:00
        flickr.com  yuna     11:00
        espn.com    ygu      21:34
20. Productivity Perspective: MapReduce sample code
    • If you code the MapReduce programs by hand:

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.Iterator;
        import java.util.List;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.Writable;
        import org.apache.hadoop.io.WritableComparable;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.KeyValueTextInputFormat;
        import org.apache.hadoop.mapred.Mapper;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.RecordReader;
        import org.apache.hadoop.mapred.Reducer;
        import org.apache.hadoop.mapred.Reporter;
        import org.apache.hadoop.mapred.SequenceFileInputFormat;
        import org.apache.hadoop.mapred.SequenceFileOutputFormat;
        import org.apache.hadoop.mapred.TextInputFormat;
        import org.apache.hadoop.mapred.jobcontrol.Job;
        import org.apache.hadoop.mapred.jobcontrol.JobControl;
        import org.apache.hadoop.mapred.lib.IdentityMapper;

        public class MRExample {
            public static class LoadPages extends MapReduceBase
                    implements Mapper<LongWritable, Text, Text, Text> {
                public void map(LongWritable k, Text val,
                        OutputCollector<Text, Text> oc, Reporter reporter) throws IOException {
                    // Pull the key out
                    String line = val.toString();
                    int firstComma = line.indexOf(',');
                    String key = line.substring(0, firstComma);
                    String value = line.substring(firstComma + 1);
                    Text outKey = new Text(key);
                    // Prepend an index to the value so we know which file it came from.
                    Text outVal = new Text("1" + value);
                    oc.collect(outKey, outVal);
                }
            }
            public static class LoadAndFilterUsers extends MapReduceBase
                    implements Mapper<LongWritable, Text, Text, Text> {
                public void map(LongWritable k, Text val,
                        OutputCollector<Text, Text> oc, Reporter reporter) throws IOException {
                    // Pull the key out
                    String line = val.toString();
                    int firstComma = line.indexOf(',');
                    String value = line.substring(firstComma + 1);
                    int age = Integer.parseInt(value);
                    if (age < 18 || age > 25) return;
                    String key = line.substring(0, firstComma);
                    Text outKey = new Text(key);
                    // Prepend an index to the value so we know which file it came from.
                    Text outVal = new Text("2" + value);
                    oc.collect(outKey, outVal);
                }
            }
            public static class Join extends MapReduceBase
                    implements Reducer<Text, Text, Text, Text> {
                public void reduce(Text key, Iterator<Text> iter,
                        OutputCollector<Text, Text> oc, Reporter reporter) throws IOException {
                    // For each value, figure out which file it's from and store it accordingly.
                    List<String> first = new ArrayList<String>();
                    List<String> second = new ArrayList<String>();
                    while (iter.hasNext()) {
                        Text t = iter.next();
                        String value = t.toString();
                        if (value.charAt(0) == '1') first.add(value.substring(1));
                        else second.add(value.substring(1));
                        reporter.setStatus("OK");
                    }
                    // Do the cross product and collect the values
                    for (String s1 : first) {
                        for (String s2 : second) {
                            String outval = key + "," + s1 + "," + s2;
                            oc.collect(null, new Text(outval));
                            reporter.setStatus("OK");
                        }
                    }
                }
            }
            public static class LoadJoined extends MapReduceBase
                    implements Mapper<Text, Text, Text, LongWritable> {
                public void map(Text k, Text val,
                        OutputCollector<Text, LongWritable> oc, Reporter reporter) throws IOException {
                    // Find the url
                    String line = val.toString();
                    int firstComma = line.indexOf(',');
                    int secondComma = line.indexOf(',', firstComma);
                    String key = line.substring(firstComma, secondComma);
                    // drop the rest of the record, I don't need it anymore,
                    // just pass a 1 for the combiner/reducer to sum instead.
                    Text outKey = new Text(key);
                    oc.collect(outKey, new LongWritable(1L));
                }
            }
            public static class ReduceUrls extends MapReduceBase
                    implements Reducer<Text, LongWritable, WritableComparable, Writable> {
                public void reduce(Text key, Iterator<LongWritable> iter,
                        OutputCollector<WritableComparable, Writable> oc, Reporter reporter) throws IOException {
                    // Add up all the values we see
                    long sum = 0;
                    while (iter.hasNext()) {
                        sum += iter.next().get();
                        reporter.setStatus("OK");
                    }
                    oc.collect(key, new LongWritable(sum));
                }
            }
            public static class LoadClicks extends MapReduceBase
                    implements Mapper<WritableComparable, Writable, LongWritable, Text> {
                public void map(WritableComparable key, Writable val,
                        OutputCollector<LongWritable, Text> oc, Reporter reporter) throws IOException {
                    oc.collect((LongWritable) val, (Text) key);
                }
            }
            public static class LimitClicks extends MapReduceBase
                    implements Reducer<LongWritable, Text, LongWritable, Text> {
                int count = 0;
                public void reduce(LongWritable key, Iterator<Text> iter,
                        OutputCollector<LongWritable, Text> oc, Reporter reporter) throws IOException {
                    // Only output the first 100 records
                    while (count < 100 && iter.hasNext()) {
                        oc.collect(key, iter.next());
                        count++;
                    }
                }
            }
            public static void main(String[] args) throws IOException {
                JobConf lp = new JobConf(MRExample.class);
                lp.setJobName("Load Pages");
                lp.setInputFormat(TextInputFormat.class);
                lp.setOutputKeyClass(Text.class);
                lp.setOutputValueClass(Text.class);
                lp.setMapperClass(LoadPages.class);
                FileInputFormat.addInputPath(lp, new Path("/user/gates/pages"));
                FileOutputFormat.setOutputPath(lp, new Path("/user/gates/tmp/indexed_pages"));
                lp.setNumReduceTasks(0);
                Job loadPages = new Job(lp);

                JobConf lfu = new JobConf(MRExample.class);
                lfu.setJobName("Load and Filter Users");
                lfu.setInputFormat(TextInputFormat.class);
                lfu.setOutputKeyClass(Text.class);
                lfu.setOutputValueClass(Text.class);
                lfu.setMapperClass(LoadAndFilterUsers.class);
                FileInputFormat.addInputPath(lfu, new Path("/user/gates/users"));
                FileOutputFormat.setOutputPath(lfu, new Path("/user/gates/tmp/filtered_users"));
                lfu.setNumReduceTasks(0);
                Job loadUsers = new Job(lfu);

                JobConf join = new JobConf(MRExample.class);
                join.setJobName("Join Users and Pages");
                join.setInputFormat(KeyValueTextInputFormat.class);
                join.setOutputKeyClass(Text.class);
                join.setOutputValueClass(Text.class);
                join.setMapperClass(IdentityMapper.class);
                join.setReducerClass(Join.class);
                FileInputFormat.addInputPath(join, new Path("/user/gates/tmp/indexed_pages"));
                FileInputFormat.addInputPath(join, new Path("/user/gates/tmp/filtered_users"));
                FileOutputFormat.setOutputPath(join, new Path("/user/gates/tmp/joined"));
                join.setNumReduceTasks(50);
                Job joinJob = new Job(join);
                joinJob.addDependingJob(loadPages);
                joinJob.addDependingJob(loadUsers);

                JobConf group = new JobConf(MRExample.class);
                group.setJobName("Group URLs");
                group.setInputFormat(KeyValueTextInputFormat.class);
                group.setOutputKeyClass(Text.class);
                group.setOutputValueClass(LongWritable.class);
                group.setOutputFormat(SequenceFileOutputFormat.class);
                group.setMapperClass(LoadJoined.class);
                group.setCombinerClass(ReduceUrls.class);
                group.setReducerClass(ReduceUrls.class);
                FileInputFormat.addInputPath(group, new Path("/user/gates/tmp/joined"));
                FileOutputFormat.setOutputPath(group, new Path("/user/gates/tmp/grouped"));
                group.setNumReduceTasks(50);
                Job groupJob = new Job(group);
                groupJob.addDependingJob(joinJob);

                JobConf top100 = new JobConf(MRExample.class);
                top100.setJobName("Top 100 sites");
                top100.setInputFormat(SequenceFileInputFormat.class);
                top100.setOutputKeyClass(LongWritable.class);
                top100.setOutputValueClass(Text.class);
                top100.setOutputFormat(SequenceFileOutputFormat.class);
                top100.setMapperClass(LoadClicks.class);
                top100.setCombinerClass(LimitClicks.class);
                top100.setReducerClass(LimitClicks.class);
                FileInputFormat.addInputPath(top100, new Path("/user/gates/tmp/grouped"));
                FileOutputFormat.setOutputPath(top100, new Path("/user/gates/top100sitesforusers18to25"));
                top100.setNumReduceTasks(1);
                Job limit = new Job(top100);
                limit.addDependingJob(groupJob);

                JobControl jc = new JobControl("Find top 100 sites for users 18 to 25");
                jc.addJob(loadPages);
                jc.addJob(loadUsers);
                jc.addJob(joinJob);
                jc.addJob(groupJob);
                jc.addJob(limit);
                jc.run();
            }
        }
21. Productivity Perspective
    • It's for coders, not for users
    • Duplicated code and effort
    • Complexity in managing code
22. High-Level Parallel Processing Languages
    • Parallel-processing languages that make MapReduce easy
    • Pig, by Yahoo
    • Hive, by Facebook
23. Pig example
    • The same task in Pig:

        Users = load 'users' as (name, age);
        Fltrd = filter Users by age >= 18 and age <= 25;
        Pages = load 'pages' as (user, url);
        Jnd = join Fltrd by name, Pages by user;
        Grpd = group Jnd by url;
        Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
        Srtd = order Smmd by clicks desc;
        Top5 = limit Srtd 5;
        store Top5 into 'top5sites';

    • About 1/20 of the lines of code, about 1/16 of the coding time, and much easier to read
24. Apache Pig
    • A high-level language for data processing
    • An Apache top-level project
    • 30% of Hadoop jobs at Yahoo
    • 2x to 10x performance improvements since the 2007 release
    • 70 to 80% of native MapReduce performance
25. Why Hive was developed
    • To replace a vendor data-warehouse system
      • Data scalability problems (initially 10 GB, growing to tens of TB)
      • To cut license and other operating costs
    • Decision to move from the vendor DBMS to Hadoop
    • Features found to be needed during the migration:
      • A CLI for users
      • Ad-hoc queries without writing code
      • Management of schema information
26. Data warehousing on Hive
    • Hive on a Hadoop cluster
    • Bulk-loading Scribe and MySQL data into HDFS
    • Replacing manual Python scripts with Hive
    (Diagram: data collection server, Scribe server tier, MySQL server tier, Oracle database.)
27. What are the key components in Hive?
    • Metastore
    • SerDe
    • Execution worker
28. Lambda Architecture
    (Diagram: a raw-data topic and a processed-data topic; a long-term data store; a batch computation producing batch tables and a streaming computation (SUM, ADD, filtering) producing speed tables; a data-serving layer answering data queries from both.)
29. Lambda Architecture
    • BTW (not BTS): why is it called the Lambda Architecture? Because of the Greek letter lambda (λ).
30. Lambda Architecture example: kakao's KEMI-stat
    • http://tech.kakao.com/2016/08/25/kemi/
31. Kappa Architecture
    • Key takeaway: "calculate instantly, not retrieve instantly" (a Kafka Streams sketch of this idea follows this slide)
    (Diagram: a raw-data topic and a processed-data topic, both with long-term retention; a streaming computation (SUM, ADD, filtering) and a long-range computation over the retained data both feed the data-serving layer.)
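A minimal Kafka Streams sketch of "calculate instantly": maintain a continuously updated count per key straight from the raw topic instead of re-querying stored data later. The topic names and the count-per-key aggregation are illustrative assumptions:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class KappaSketch {
        public static void build(StreamsBuilder builder) {
            // Count records per key as they arrive; the result is always up to date.
            KTable<String, Long> countsByKey = builder
                .stream("raw-data-topic", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-by-key"));
            // Emit every update to the processed-data topic for the serving layer to read.
            countsByKey.toStream()
                .to("processed-data-topic", Produced.with(Serdes.String(), Serdes.Long()));
        }
    }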
32. Kappa Architecture with KSQL
    (Diagram: a short-term data topic and a long-term data topic; KSQL runs a computation over each (select * from short_topic, select * from long_topic) to serve data queries.)
33. KSQL
    (Diagram: a Kafka cluster; several KSQL servers, each running a KSQL engine behind a REST API; and a KSQL shell client.)
34. KSQL metastore
35. KSQL execution worker: Kafka Streams
36. KSQL metastore
37. KSQL DDL (Data Definition Language)
    • Stream vs. table
    • Supports only CREATE/DELETE for streams and tables
    • Sample:

        CREATE TABLE users
          (usertimestamp BIGINT, user_id VARCHAR, gender VARCHAR, region_id VARCHAR)
          WITH (VALUE_FORMAT = 'JSON', KAFKA_TOPIC = 'my-users-topic');
38. KSQL DML (Data Manipulation Language)
    • SELECT, LEFT JOIN
    • Aggregate functions like ADD and SUM, and UDFs like ABS/CONCAT, are supported
39. Example
    (Diagram: data generators feed the PageViews and Users Kafka topics; from them KSQL derives the pageviews_female stream, the pageviews_female_like_89 stream, and the pageviews_region table, materialized in the new Kafka topics PAGEVIEWS_FEMALE, pageviews_enriched_r8_r9, and PAGEVIEWS_REGIONS.)
40. Example: create the user table and pageview stream
    (Same diagram as the previous slide.)

        ksql> CREATE STREAM pageviews_original ❶
                (viewtime bigint, userid varchar, pageid varchar) ❷
                WITH (kafka_topic='pageviews', value_format='DELIMITED'); ❸

        ksql> CREATE TABLE users_original ❶
                (registertime bigint, gender varchar, regionid varchar, userid varchar) ❷
                WITH (kafka_topic='users', value_format='JSON'); ❸
41. Example: create a table/stream from a query
    (Same diagram as the previous slides.)

        ksql> CREATE STREAM pageviews_female AS ❶
                SELECT users_original.userid AS userid, pageid, regionid, gender
                FROM pageviews_original ❷
                LEFT JOIN users_original ❸
                  ON pageviews_original.userid = users_original.userid
                WHERE gender = 'FEMALE'; ❹

        ksql> CREATE STREAM pageviews_female_like_89 ❶
                WITH (kafka_topic='pageviews_enriched_r8_r9', value_format='DELIMITED') AS ❷
                SELECT * FROM pageviews_female
                WHERE regionid LIKE '%_8' OR regionid LIKE '%_9'; ❸
42. Considerations About Monitoring as a Service
43. What is most important in a data pipeline?
    (Diagram: Kafka as an ESB connecting a Python app, plug-ins, a Java app, an ERP bridge, and web apps; Kafka as an ESB next to an existing messaging system, with EIP components in between.)
44. What is the most important thing in a data pipeline?
    • Performance
      • Has to be real-time (or near real-time)
    • Data integrity (a producer-side sketch follows this slide)
      • No data loss
      • Every piece of data can be consumed
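A sketch of producer settings commonly used when "no data loss" matters more than raw throughput, using the standard Java Kafka producer client; the broker address, topic name, and record contents are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class IntegrityFirstProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed local broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Durability-oriented settings:
            props.put(ProducerConfig.ACKS_CONFIG, "all");                // wait for all in-sync replicas
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // avoid duplicates on retry
            props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("metrics", "host-1", "cpu=0.42"),
                        (metadata, exception) -> {
                            if (exception != null) {
                                // Surface delivery failures instead of silently dropping data.
                                exception.printStackTrace();
                            }
                        });
                producer.flush();
            }
        }
    }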
45. The most important thing in a data pipeline?
    • Provider perspective
      • Service Level Agreement
        • Rate
        • Format
        • ACL
    • It's all about managed service
46. About data structure
    • The data structure defines the data computing architecture
      • It defines the API
      • It defines the data storage
      • It defines the computing method
    • What would you do if the data structure looked like the one below?

        data["resource_id"] = "Some Server ID"
        data["svc_id"] = "Some Service ID"
        data["timestamp"] = str(int(time.time()))
        data["statistics"] = stats
        response = requests.put(
            url,
            data=json.dumps(data),
        )
47. Q&A
