SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
Re-architecting
Data Platform with Kubernetes
SeungYong Oh, Devsisters Corp.
오 승 용, 데브시스터즈
● Mobile Game Publisher: including Cookie Run Series
● 10+ Million App Downloads, 2M+ Monthly Active Users
"Creating the Best Player Experiences through Excellent Content, Service, and Technology!"
"탁월한 기술, 서비스, 콘텐츠로 전 세계 고객에게 최고의 경험을 선사합니다."
3
Who am I?
SeungYong Oh (오 승 용)
● DevOps & Data Engineer at Devsisters
○ Also worked as a game server developer for Cookie Run series
● Kubernetes user from 2016
○ Adopted Kubernetes for development/testing environment for Cookie Run OvenBreak
■ NDC 2017: Kubernetes로 개발서버 간단히 찍어내기
https://www.slideshare.net/seungyongoh3/ndc17-kubernetes
4
Kubernetes @ Devsisters
● 2014~2015: Proof-of-Concept level evaluation
● 2016: Dev/testing environment game infra for Cookie Run: OvenBreak
● 2017~: Dev/testing environment game infra platform
● 2019~: Production-level game infra platform
○ Cookie Run for Kakao, Hello Brave Cookies
● 2019~: Data Platform on Kubernetes
○ Still ongoing
5
Data Platform @ Devsisters
● Central logging infra with various data sources
Using those logs,
● Search
○ Search server/client logs for debug or operational purpose
○ Search specific user's action logs for customer service purpose
● Applications
○ A/B Testing, Machine Learning, ..
● And of course, Analytics!
6
Data Platform @ Devsisters
● Analytics: Analytics for Everyone!
○ Everyone?
■ Not only limited to data scientist, analyst
■ CEO, Project Manager, Marketer, Game Designer, ...
○ KPI(Key Performance Indicator) Service: Active users, Revenue, Retention,..
○ Ad-hoc query/analytics environment
■ Fully programmable environment for Data Scientist / Engineer
■ SQL query / clickable interface for many users
7
Analytics Infra / Service
Data Platform @ Devsisters
Server Logs
k8s Pod / Docker /
Host syslog
Game Client / Web /
3rd Party Logs
(via custom SDK/API)
Apache Kafka /
Kafka Streams
Log collection
& pre-processing
Elastic Stack (ELK)
(Near) Real-time
Log Search & Analytics
Batch
Amazon S3
Data Lake
AWS Glue Amazon
Athena
Amazon
QuickSight
8
$SOME_PROJECT_NAME supports K8S!
Many Big data & analytics related projects support Kubernetes in 2019!
● Hadoop, Spark, Airflow, Kubeflow, JupyterHub, Hue, Zeppelin, Superset, Kafka, Elastic Stack, etc.
Support?
● Case 1: (Official/Popular) Helm Chart
● Case 2: K8S Operator for the project
● Case 3: Cloud Native / Kubernetes-specific Integrations
9
But, does it mean we should move to K8S?
Many projects are stateful apps, especially in Big data / analytics
● Considerable risks / maintenance costs
○ Node Upgrade? Retirement? Reallocation is NOT easy
● Benefits?
○ Bin-packing? 🤔 (especially in public cloud)
○ Autoscaling? 🤔
○ Scheduling? 🤔
■ Already there are some advanced scheduler/resource manager (ex. Hadoop YARN)
10
Actually, there are some (great!) benefits!
11
Benefit #1: Deployment / Ops Made Easy
12
Example: Apache Airflow
Apache Airflow: Workflow Management tool
Source:
https://github.com/GoogleCloudPlatform/airflow-operator/blob/
master/docs/design.md
Quick Deployment with
● helm install stable/airflow
or
● Airflow operator
https://github.com/GoogleCloudPlatform/airflow-operator/
13
Benefit #2: Analytics - Easy and Efficient
14
Apache Spark @ Devsisters
A unified analytics engine for large-scale data processing
Use-cases of Spark @ Devsisters:
Amazon S3
Data Lake: Raw
Amazon S3
Data Lake: Analytics
(Snapshot, DW, DM..)
15
Case #1: Store Logs
(Batch)
Case #2: Transform
(Batch)
Case #3: Ad-hoc
Analytics
Data Scientist,
Engineer, Marketer,
PM, ..
Apache Spark @ Devsisters
Self-service web interface for Spark Cluster Provision
Launches Spark Cluster (standalone mode)
● 1 master node with Jupyter Notebook
● multiple worker nodes
16https://www.slideshare.net/JPark0426/ndc-2018-spark-flintrock-airflow
Apache Spark @ Devsisters
So, anyone can have their own Spark Cluster with one-click! Looks Good?
● Many users (ex. Marketing Manager) only want to focus on data analysis!
○ r5.xlarge?
○ spot? bidding price?
○ Waiting few minutes for spark provisioning, with some failures
○ Should backup their files before terminating cluster
● Cost ineffective
○ idle resources
○ forget to terminate cluster after use
Then sharing a centralized cluster? NO! (access control issues etc.)
17
Spark on Kubernetes
Apache Spark - Native K8S Scheduler
(experimental)
https://spark.apache.org/docs/latest/running-on-kubernetes.html
K8S Operator for Apache Spark
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
18
Jupyter + Spark on K8S
# Execute below in Jupyter Pod
PYSPARK_DRIVER_PYTHON=jupyter 
PYSPARK_DRIVER_PYTHON_OPTS='notebook --allow-root' 
./bin/pyspark 
--master k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/ 
--name spark-$APP_NAME 
--conf spark.executor.instances=$EXECUTOR_COUNT 
--conf spark.kubernetes.container.image=<<custom built spark container>> 
--conf spark.kubernetes.namespace=spark-test 
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-test-server 
--conf spark.driver.host=spark-driver 
--conf spark.driver.port=5555 
--conf spark.driver.bindAddress=0.0.0.0 
--conf spark.driver.blockManager.port=5556 
--conf spark.kubernetes.executor.annotation.iam.amazonaws.com/role=<<AWS IAM role (KIAM)>>
● But users need much easier interface!
19
JupyterHub + Spark on K8S
● Multi-user version of Jupyter Notebook, support k8s natively
● Launch a notebook pod per user
○ Data is persisted (stored in PersistentVolume)
● Let's Integrate Spark on K8S with JupyterHub !
20
JupyterHub + Spark on K8S
● Demo
21
22
23
But many users only need SQL interface
● Spark has distributed SQL Engine
https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
○ Thrift JDBC/ODBC server (corresponds to the HiveServer2)
● Integration with Apache Superset (Business Intelligence web app)
24
25
For applications?
● Example #1:
○ Send push messages to users that played game more than a hour yesterday
■ JDBC!
● Example #2:
○ Give special items to users that has some possibility not to play the game anymore
■ spark-submit?
26
Benefit #3: Fine-tuning Access Controls
27
Still Stressful but Easier
● Web Apps: Use Auth Proxy Sidecar
● For apps that don't support RBAC:
○ Pod-level data access control (S3 in Devsisters' case)
■ In AWS, with KIAM or AWS IAM Authenticator
○ Or Node-level (with nodeSelector)
○ Pods/Deployment per each group/roles + Benefit of Bin-packing
28
Then, Let's move to Kubernetes Now..?!
29
Still in Early Stage
● Spark on Kubernetes
○ Doesn't support dynamic allocation / external shuffle service yet
■ You can't change count of executors
○ Pod Templating - available on 3.0 Preview
■ Even You can't specify service account of executor pod now
● Other Many Projects also in alpha stage,
which requires code-level modifications / forking
to meet the team's requirements
● If storage & computing resource is coupled, maybe little more hard to move
30
But it's worth to move!
● A Data Engineer who doesn't have prior knowledge of Kubernetes
could implement Spark on K8s/JupyterHub/Superset customizations
less than 1.5 months (expected more than 4 months for non-k8s env)
● Kubernetes' abstraction allows users to focus on what they can do best
31

Weitere ähnliche Inhalte

Was ist angesagt?

MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks EDB
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaYoungHeon (Roy) Kim
 
CQRS and Event Sourcing for Java Developers
CQRS and Event Sourcing for Java DevelopersCQRS and Event Sourcing for Java Developers
CQRS and Event Sourcing for Java DevelopersMarkus Eisele
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리Junyi Song
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]MongoDB
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaAmazee Labs
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기NAVER D2
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법Open Source Consulting
 
Amazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB DayAmazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB DayAmazon Web Services Korea
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackMichel Tricot
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 

Was ist angesagt? (20)

MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & Grafana
 
CQRS and Event Sourcing for Java Developers
CQRS and Event Sourcing for Java DevelopersCQRS and Event Sourcing for Java Developers
CQRS and Event Sourcing for Java Developers
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
kafka
kafkakafka
kafka
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & Kibana
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
 
Amazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB DayAmazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB Day
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 

Ähnlich wie Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes

Scalable Clusters On Demand
Scalable Clusters On DemandScalable Clusters On Demand
Scalable Clusters On DemandBogdan Kyryliuk
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesDatabricks
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftChester Chen
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Pythonwesley chun
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Elasticsearch
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May MeetupDeepak Garg
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016aspyker
 
Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Sharma Podila
 
OpenEBS hangout #4
OpenEBS hangout #4OpenEBS hangout #4
OpenEBS hangout #4OpenEBS
 
Serverless computing with Google Cloud
Serverless computing with Google CloudServerless computing with Google Cloud
Serverless computing with Google Cloudwesley chun
 
Scaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at LyftScaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at LyftDatabricks
 
PySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March MeetupPySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March MeetupHolden Karau
 
SpringOne Platform 2018 Recap in 5 minutes
SpringOne Platform 2018 Recap in 5 minutesSpringOne Platform 2018 Recap in 5 minutes
SpringOne Platform 2018 Recap in 5 minutesRohit Kelapure
 
Automating using Ansible
Automating using AnsibleAutomating using Ansible
Automating using AnsibleAlok Patra
 
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan GoksuSpring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan GoksuVMware Tanzu
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Pierre GRANDIN
 

Ähnlich wie Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes (20)

Scalable Clusters On Demand
Scalable Clusters On DemandScalable Clusters On Demand
Scalable Clusters On Demand
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Python
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016
 
Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016
 
OpenEBS hangout #4
OpenEBS hangout #4OpenEBS hangout #4
OpenEBS hangout #4
 
Serverless computing with Google Cloud
Serverless computing with Google CloudServerless computing with Google Cloud
Serverless computing with Google Cloud
 
Where should I run my code? Serverless, Containers, Virtual Machines and more
Where should I run my code? Serverless, Containers, Virtual Machines and moreWhere should I run my code? Serverless, Containers, Virtual Machines and more
Where should I run my code? Serverless, Containers, Virtual Machines and more
 
Scaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at LyftScaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at Lyft
 
PySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March MeetupPySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March Meetup
 
SpringOne Platform 2018 Recap in 5 minutes
SpringOne Platform 2018 Recap in 5 minutesSpringOne Platform 2018 Recap in 5 minutes
SpringOne Platform 2018 Recap in 5 minutes
 
Automating using Ansible
Automating using AnsibleAutomating using Ansible
Automating using Ansible
 
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan GoksuSpring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
 

Kürzlich hochgeladen

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 

Kürzlich hochgeladen (20)

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 

Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes

  • 1. Re-architecting Data Platform with Kubernetes SeungYong Oh, Devsisters Corp. 오 승 용, 데브시스터즈
  • 2. ● Mobile Game Publisher: including Cookie Run Series ● 10+ Million App Downloads, 2M+ Monthly Active Users "Creating the Best Player Experiences through Excellent Content, Service, and Technology!" "탁월한 기술, 서비스, 콘텐츠로 전 세계 고객에게 최고의 경험을 선사합니다." 3
  • 3. Who am I? SeungYong Oh (오 승 용) ● DevOps & Data Engineer at Devsisters ○ Also worked as a game server developer for Cookie Run series ● Kubernetes user from 2016 ○ Adopted Kubernetes for development/testing environment for Cookie Run OvenBreak ■ NDC 2017: Kubernetes로 개발서버 간단히 찍어내기 https://www.slideshare.net/seungyongoh3/ndc17-kubernetes 4
  • 4. Kubernetes @ Devsisters ● 2014~2015: Proof-of-Concept level evaluation ● 2016: Dev/testing environment game infra for Cookie Run: OvenBreak ● 2017~: Dev/testing environment game infra platform ● 2019~: Production-level game infra platform ○ Cookie Run for Kakao, Hello Brave Cookies ● 2019~: Data Platform on Kubernetes ○ Still ongoing 5
  • 5. Data Platform @ Devsisters ● Central logging infra with various data sources Using those logs, ● Search ○ Search server/client logs for debug or operational purpose ○ Search specific user's action logs for customer service purpose ● Applications ○ A/B Testing, Machine Learning, .. ● And of course, Analytics! 6
  • 6. Data Platform @ Devsisters ● Analytics: Analytics for Everyone! ○ Everyone? ■ Not only limited to data scientist, analyst ■ CEO, Project Manager, Marketer, Game Designer, ... ○ KPI(Key Performance Indicator) Service: Active users, Revenue, Retention,.. ○ Ad-hoc query/analytics environment ■ Fully programmable environment for Data Scientist / Engineer ■ SQL query / clickable interface for many users 7
  • 7. Analytics Infra / Service Data Platform @ Devsisters Server Logs k8s Pod / Docker / Host syslog Game Client / Web / 3rd Party Logs (via custom SDK/API) Apache Kafka / Kafka Streams Log collection & pre-processing Elastic Stack (ELK) (Near) Real-time Log Search & Analytics Batch Amazon S3 Data Lake AWS Glue Amazon Athena Amazon QuickSight 8
  • 8. $SOME_PROJECT_NAME supports K8S! Many Big data & analytics related projects support Kubernetes in 2019! ● Hadoop, Spark, Airflow, Kubeflow, JupyterHub, Hue, Zeppelin, Superset, Kafka, Elastic Stack, etc. Support? ● Case 1: (Official/Popular) Helm Chart ● Case 2: K8S Operator for the project ● Case 3: Cloud Native / Kubernetes-specific Integrations 9
  • 9. But, does it mean we should move to K8S? Many projects are stateful apps, especially in Big data / analytics ● Considerable risks / maintenance costs ○ Node Upgrade? Retirement? Reallocation is NOT easy ● Benefits? ○ Bin-packing? 🤔 (especially in public cloud) ○ Autoscaling? 🤔 ○ Scheduling? 🤔 ■ Already there are some advanced scheduler/resource manager (ex. Hadoop YARN) 10
  • 10. Actually, there are some (great!) benefits! 11
  • 11. Benefit #1: Deployment / Ops Made Easy 12
  • 12. Example: Apache Airflow Apache Airflow: Workflow Management tool Source: https://github.com/GoogleCloudPlatform/airflow-operator/blob/ master/docs/design.md Quick Deployment with ● helm install stable/airflow or ● Airflow operator https://github.com/GoogleCloudPlatform/airflow-operator/ 13
  • 13. Benefit #2: Analytics - Easy and Efficient 14
  • 14. Apache Spark @ Devsisters A unified analytics engine for large-scale data processing Use-cases of Spark @ Devsisters: Amazon S3 Data Lake: Raw Amazon S3 Data Lake: Analytics (Snapshot, DW, DM..) 15 Case #1: Store Logs (Batch) Case #2: Transform (Batch) Case #3: Ad-hoc Analytics Data Scientist, Engineer, Marketer, PM, ..
  • 15. Apache Spark @ Devsisters Self-service web interface for Spark Cluster Provision Launches Spark Cluster (standalone mode) ● 1 master node with Jupyter Notebook ● multiple worker nodes 16https://www.slideshare.net/JPark0426/ndc-2018-spark-flintrock-airflow
  • 16. Apache Spark @ Devsisters So, anyone can have their own Spark Cluster with one-click! Looks Good? ● Many users (ex. Marketing Manager) only want to focus on data analysis! ○ r5.xlarge? ○ spot? bidding price? ○ Waiting few minutes for spark provisioning, with some failures ○ Should backup their files before terminating cluster ● Cost ineffective ○ idle resources ○ forget to terminate cluster after use Then sharing a centralized cluster? NO! (access control issues etc.) 17
  • 17. Spark on Kubernetes Apache Spark - Native K8S Scheduler (experimental) https://spark.apache.org/docs/latest/running-on-kubernetes.html K8S Operator for Apache Spark https://github.com/GoogleCloudPlatform/spark-on-k8s-operator 18
  • 18. Jupyter + Spark on K8S # Execute below in Jupyter Pod PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook --allow-root' ./bin/pyspark --master k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/ --name spark-$APP_NAME --conf spark.executor.instances=$EXECUTOR_COUNT --conf spark.kubernetes.container.image=<<custom built spark container>> --conf spark.kubernetes.namespace=spark-test --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-test-server --conf spark.driver.host=spark-driver --conf spark.driver.port=5555 --conf spark.driver.bindAddress=0.0.0.0 --conf spark.driver.blockManager.port=5556 --conf spark.kubernetes.executor.annotation.iam.amazonaws.com/role=<<AWS IAM role (KIAM)>> ● But users need much easier interface! 19
  • 19. JupyterHub + Spark on K8S ● Multi-user version of Jupyter Notebook, support k8s natively ● Launch a notebook pod per user ○ Data is persisted (stored in PersistentVolume) ● Let's Integrate Spark on K8S with JupyterHub ! 20
  • 20. JupyterHub + Spark on K8S ● Demo 21
  • 21. 22
  • 22. 23
  • 23. But many users only need SQL interface ● Spark has distributed SQL Engine https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html ○ Thrift JDBC/ODBC server (corresponds to the HiveServer2) ● Integration with Apache Superset (Business Intelligence web app) 24
  • 24. 25
  • 25. For applications? ● Example #1: ○ Send push messages to users that played game more than a hour yesterday ■ JDBC! ● Example #2: ○ Give special items to users that has some possibility not to play the game anymore ■ spark-submit? 26
  • 26. Benefit #3: Fine-tuning Access Controls 27
  • 27. Still Stressful but Easier ● Web Apps: Use Auth Proxy Sidecar ● For apps that don't support RBAC: ○ Pod-level data access control (S3 in Devsisters' case) ■ In AWS, with KIAM or AWS IAM Authenticator ○ Or Node-level (with nodeSelector) ○ Pods/Deployment per each group/roles + Benefit of Bin-packing 28
  • 28. Then, Let's move to Kubernetes Now..?! 29
  • 29. Still in Early Stage ● Spark on Kubernetes ○ Doesn't support dynamic allocation / external shuffle service yet ■ You can't change count of executors ○ Pod Templating - available on 3.0 Preview ■ Even You can't specify service account of executor pod now ● Other Many Projects also in alpha stage, which requires code-level modifications / forking to meet the team's requirements ● If storage & computing resource is coupled, maybe little more hard to move 30
  • 30. But it's worth to move! ● A Data Engineer who doesn't have prior knowledge of Kubernetes could implement Spark on K8s/JupyterHub/Superset customizations less than 1.5 months (expected more than 4 months for non-k8s env) ● Kubernetes' abstraction allows users to focus on what they can do best 31