SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Best Practices for
Running Kafka on
Docker Containers
Nanda Vijaydev, BlueData
Kafka Summit San Francisco
August 28, 2017
Agenda
• Services	and	Messaging	Systems
• Producers	and	Consumers	on	Docker
• Messaging	system	(Kafka)	on	Docker:	Challenges
• How	We	Did	it:	Lessons	Learned
• Key	Takeaways	for	Running	Kafka	on	Docker
• Q	&	A
Kafka – A Real-Time Messaging System
• Open	source	streaming	platform
• Handles	firehose-style	events	
with	low	latency
• Storage	is	a	scalable	pub/sub	
queue
• Supports	diversified	consumer	
groups	for	the	same	message
• Consumers	pull	messages	at	
their	pace
Role of Producers and Consumers
• Independent	services	that	
send/receive	messages
• Can	be	written	in	many	
languages	
• Purpose-built	for	specific	actions
• Mostly	operate	on	high	
frequency	events	and	data
• Availability and	scalability	are	
important
What is a Docker Container?
• Lightweight,	stand-alone,	executable	
package	of	software	to	run	specific	
services
• Runs	on	all	major	Linux	distributions
• On	any	infrastructure	including	VMs,	
bare-metal,	and	in	the	cloud
• Package	includes	code,	runtime,	
system	libraries,	configurations,	etc.	
• Run	as	an	isolated	process	in	user	
space
Docker Containers vs. Virtual Machines
• Unlike	VMs,	containers	virtualize	OS	
and	not	hardware
• More	portable	and	efficient
• Abstraction	at	the	app	layer	that	
packages	app	and	dependencies
• Multiple	containers	share	the	base	
kernel
• Take	up	less	space	and	start	almost		
immediately
Containers = the Future of Apps
Infrastructure
• Agility	and	elasticity
• Standardized	environments	
(dev,	test,	prod)
• Portability	
(on-premises	and	cloud)
• Higher	resource	utilization
Applications
• Fool-proof	packaging	
(configs,	libraries,	driver	
versions,	etc.)
• Repeatable	builds	and	
orchestration
• Faster	app	dev cycles
Considerations for Kafka Deployment
• Multiple	processes:	Kafka,	Zookeeper,	
Schema	Registry,	and	more
• Different	infrastructure	needs	for	
different	roles
• Auto-configuration	for	multi-node	
deployment
• Dev,	Test,	and	Production	in	
isolated/multi-tenant	environment
• Sharing	resources	with	other	processes	
on	same	host	machines
Prod/Cons
Prod/Cons
Kafka Server Requirements
• Broker	– heavy	on	storage	
with	decent	cache
• More	memory	is	needed	
for	in-memory	processing
• Zookeeper	also	needs	
memory	and	may	not	need	
lots	of	disk	space
• Sequential	writes	are	light	
on	CPU	requirements
ZookeeperZookeeper
Zookeeper
Schema	
Registry
Prod/Cons
Broker
How We Did It: Design Decisions I
• Run	Kafka	(e.g.	Confluent	distribution)	and	related	
services and	tools	/	applications unmodified
– Deploy	all	services	that	run	on	a	single	bare-metal	
host	in	a	single	container
• Multi-tenancy	support	is	key
– Network	and	storage	security
• Clusters	of	containers	span	physical	hosts
How We Did It: Design Decisions II
• Images	built	to	“auto-configure”	themselves	at	time	
of	instantiation
– Not	all	instances	of	a	single	image	run	the	same	set	
of	services	when	instantiated
• Zookeeper	vs.	Broker	cluster	nodes
– Support	“reboot”	of	cluster
– Ability	to	scale	on	demand
How We Did It: Implementation
Resource	Utilization
• CPU	shares	translated	to	vCPU for	usability
• No	over-provisioning	of	memory
• Node	storage	using	device	mappers,	thin	pools	and	
logical	volumes
• Shared	external	HDFS	for	Hadoop storage
Noisy	neighbors
How We Did It: Resource Allocation
• Users	to	choose	
“flavors”	while	
launching	containers	
• Storage	heavy	
containers	can	have	
more	disk	space
• vCPUs *	n	=	cpu-shares
① Get	started	with	Kafka	(e.g.	
Confluent	community	edition)
② Evaluate	features/configurations	
simultaneously	on	smaller	
hardware	footprint
③ Prototype	multiple	data	pipelines	
quickly	with	dockerized	producers	
and	consumers
① Spin	up	dev/test	clusters	with	
replica	image	of	production
② QA/UAT	using	production	
configuration	without	re-
inventing	the	wheel
③ Offload	specific	users	and	
workloads	from	production
① LOB	multi-tenancy	with	strict	
resource	allocations
② Bare-metal	performance	for	
business	critical	workloads
③ Share	data	hub	/	data	lake	with	
strict	access	controls
Kafka On Docker Use Cases
Prototyping Departmental Enterprise
Exploring	
the	Value	of	Kafka
Initial	Departmental	
Deployments
Enterprise-Wide,	
Mission-Critical	Deployments
Multi-Tenant Deployment
5.10 3.3
2.1
Compute	Isolation
Compute	Isolation
Team	1 Team	2 Team	3
Build	Components End	to	End	Testing Prod	Environment
Team	1 Team	2 Team3
Multiple	teams	or	business	groups
Evaluate	different	Kafka	use	cases	
(e.g.	producers,	consumers,	
pipelines)
Use	different	services	&	tools					
(e.g.	Broker,	Zookeeper,	Schema	
Registry,	API	Gateway)
Use	different	distributions	of	
standalone	Kafka	and/or	Hadoop	
BlueData EPIC	software	platform
Shared	server	infrastructure	with	
node	labels
Shared	data	sets	for	HDFS	access
Multiple	distributions,	services,	tools	on	shared,	cost-effective	infrastructure
Shared	Servers
Dev/QA	Hardware
Shared,	Centrally	Managed	Server	Infrastructure
Confluent	3.2
Prod	Hardware
Apache	Kafka	0.9Apache	Kafka	0.8
How We Did It: Security Considerations
• Security	is	essential	since	containers	and	host	share	one	kernel
– Non-privileged	containers
• Achieved	through	layered	set	of	capabilities
• Different	capabilities	provide	different	levels	of	isolation	and	protection
• Add	“capabilities”	to	a	container	based	on	what	operations	are	permitted
How We Did It: Network Architecture
• Connect	containers	across	
hosts
• Persistence	of	IP	address	
across	container	restart
• DHCP/DNS	service	required	
for	IP	allocation	and	hostname	
resolution
• Deploy	VLANs	and	VxLAN
tunnels	for	tenant-level	traffic	
isolation
Data Volumes and Storage Drivers
Data	Volume
• A	directory	on	host	FS
• Data	not	deleted	when	
container	is	deleted
Device	Mapper	Storage	Driver	
• Default	– OverlayFS
• We	use	direct-lvm thinpool
with	devicemapper
• Data	is	deleted	with	container
How We Did It: Sample Dockerfile
#	Confluent	Kafka	3.2.1	docker	image
FROM	bluedata/centos7:latest
#Install	java	1.8
RUN	yum	-y	install	java-1.8.0-openjdk-devel
#Download	and	extract	Kafka	installation	tar	file	
RUN	mkdir /usr/lib/kafka;curl -s	http://packages.confluent.io/archive/3.2/confluent-3.2.1-2.11.tar.gz	|	tar	xz -C	
/usr/lib/kafka/
##Create	necessary	directories	for	Kafka	and	Zookeeper	to	run
RUN	mkdir /var/lib/zookeeper
BlueData Application Image (.bin file)
Application	bin	file
Docker
image
CentOS
Dockerfile
RHEL
Dockerfile
appconfig
conf Init.d startscript
<app>	
logo	file
<app>.wb
bdwb
command
clusterconfig,			image,		role,	
appconfig,	catalog,	service,	..	
Sources
Docker file,	
logo	.PNG,	
Init.d
RuntimeSoftware	Bits
OR
Development
(e.g.	extract	.bin	and	modify	to	
create	new	bin)
How We Did It: Deployment Configuration
#!/usr/bin/env bdwb
##############################################################
#
#		Sample	workbench	instructions	for	building	a	BlueData catalog	entry.								
#																																																																											
##############################################################
#
#	YOUR_ORGANIZATION_NAME	must	be	replaced	with	a	valid	organization	name.	Please
#	refer	to	'help	builder	organization'	for	details.
#	builder	organization	--name	YOUR_ORGANIZATION_NAME
builder	organization	--name	BlueData
##	Begin	a	new	catalog	entry
catalog	new	--distroid confluent-kafka --name	"Confluent	Kafka	3.2.1"																						
--desc "The	free,	open-source	streaming	platform	(Enterprise	edition)	based	on	Apache								
Kafka.	Confluent	Platform	is	the	best	way	to	get	started									
with	real-time	data	streams."																																									 
--categories	Kafka	--version	4.0
##	Define	all	node	roles	for	the	virtual	
cluster.
role add	broker	1+
role add	zookeeper	1+
role add	schemareg 1+
role add	gateway	0+
##	Define	all	services	that	are	available	in	the	virtual	cluster.
service	add	--srvcid kafka-broker	--name	"Kafka	Broker	service"	--port	9092
service	add	--srvcid zookeeper	--name	"Zookeeper	service"	--port	2181
service	add	--srvcid schema-registry	--name	"Schema-registry	service"	--port	8081
service	add	--srvcid control-center	--name	"Control	center	service"	--port	9021
##	Dev Configuration.	Multiple	services	are	placed	on	same	container
clusterconfig new	--configid default
clusterconfig assign	--configid default	--role	broker	--srvcids kafka-broker	zookeeper	schema-registry	
control-center
clusterconfig assign	--configid default	--role	zookeeper	--srvcids kafka-server	zookeeper
##	Prod	Configuration.	Services	run	on	dedicated	nodes	with	special	
attributes
clusterconfig new	--configid production
clusterconfig assign	--configid production	--role	broker	--srvcids kafka-broker
clusterconfig assign	--configid production	--role	zookeeper	--srvcids zookeeper
clusterconfig assign	--configid production	--role	schemareg --srvcids schemareg
clusterconfig assign	--configid production	--role	gateway	--srvcids control-center
How We Did It: Deployment Configuration
#Configure	your	docker	nodes	with	appropriate	values
appconfig autogen --replace	/tmp/zookeeper/myid –pattern	@@ID@@	--macro	UNIQUE_SELF_NODE_INT
appconfig autogen --replace	/usr/lib/kafka/etc/kafka/server.properties –pattern	@@HOST@@	--macro	GET_NODE_FQDN
appconfig autogen --replace	/usr/lib/kafka/etc/kafka/server.properties –pattern	@@zookeeper.connet@@	--macro	
ZOOKEEPER_SERVICE_STRING
#Start	services	in	the	order	specified
REGISTER_START_SERVICE_SYSV	zookeeper
REGISTER_START_SERVICE_SYSV	kafka-broker		–wait	zookeeper
REGISTER_START_SERVICE_SYSV	schema-registry		–wait	zookeeper
Multi-Container Cluster
Kafka	cluster	
consisting	of	4	
containers
Multi-Host Kafka Deployment
4	containers
On	3	different	hosts
using	1	VLAN	and	4	persistent	IPs
Different Services in Each Container
Broker	+	Zookeeper	+	Schema	Registry
Broker	+	Zookeeper
Broker
Container Storage On Host
Container	Storage
Host	Storage
Container	Hosts
Kafka	Cluster	Containers
Multi-Tenant Resource Quotas
Aggregate	Docker
container	storage,	
memory	and	cores	
(CPU	shares)	for	all	
containers	in	
tenant	“Team	1”
Aggregate	compute,	memory,	&	storage	quotas	for	Docker containers
Monitoring Containers
Resource	monitoring	
Several	open	source	and	
commercial	monitoring	
options	available
We	use	Elasticsearch with	
Metricbeat plugin
App Store for Kafka, Spark, & More
Pre-built	
images,	or	
author	your	
own	Docker
app	images	
with	our	App	
Workbench
Docker image	+	app	config scripts	+		
metadata	(e.g.	name,	logo)
Kafka on Docker: Key Takeaways
• All	apps	can	be	“Dockerized”,	including	stateful apps
– Docker containers	enable	a	more	flexible	and	agile	
deployment	model
– Faster	application	development	cycles	for	application	
developers,	data	scientists,	and	engineers
– Enables	DevOps for	data	engineering	teams
Kafka on Docker: Key Takeaways
• Enterprise	deployment	requirements:
– Docker	base	images	should	include	all	needed	services	
(Kafka,	Zookeeper,	Schema	registry, etc.),	libraries,	jar	files
– Container	orchestration,	including	networking	and	storage,	
depends	on	standards	enforced	by	enterprises
– Resource-aware	runtime	environment,	including	CPU	and	
RAM
– Sequence-aware	app	deployment	needs	more	thought
Kafka on Docker: Key Takeaways
• Ecosystem	considerations:
– Ability	to	build	producers	and	consumers	on	demand
– Quick	instantiation	and	access	to	Kafka	services
– Interoperability	between	services	is	easier	with	Docker
– Ability	to	run,	rerun,	and	scale	clusters
– Ability	to	deploy	&	integrate	enterprise-ready	solution
Kafka on Docker: Key Takeaways
• Enterprise	deployment	challenges:
– Access	to	container	secured	with	ssh keypair	or	PAM	
module	(LDAP/AD)
– Access	to	Kafka	from	Data	Science	applications
– Management	agents	in	Docker	images
– Runtime	injection	of	resource	and	configuration	
information
• Consider	a	turnkey	software	solution	(e.g.	BlueData)	
to	accelerate	time	to	value	and	avoid	DIY	pitfalls
Nanda Vijaydev
@NandaVijaydev
www.bluedata.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Untangling Cluster Management with Helix
Untangling Cluster Management with HelixUntangling Cluster Management with Helix
Untangling Cluster Management with Helix
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
kafka
kafkakafka
kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka on Pulsar
Kafka on Pulsar Kafka on Pulsar
Kafka on Pulsar
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimization
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
[2018] NHN 모니터링의 현재와 미래 for 인프라 엔지니어
[2018] NHN 모니터링의 현재와 미래 for 인프라 엔지니어[2018] NHN 모니터링의 현재와 미래 for 인프라 엔지니어
[2018] NHN 모니터링의 현재와 미래 for 인프라 엔지니어
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafka
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication
 
An Introduction to Kafka Cruise Control with Viktor Somogyi-Vass
An Introduction to Kafka Cruise Control with Viktor Somogyi-VassAn Introduction to Kafka Cruise Control with Viktor Somogyi-Vass
An Introduction to Kafka Cruise Control with Viktor Somogyi-Vass
 
Spark overview
Spark overviewSpark overview
Spark overview
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 

Ähnlich wie Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers

Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013
dotCloud
 
Intro to Docker November 2013
Intro to Docker November 2013Intro to Docker November 2013
Intro to Docker November 2013
Docker, Inc.
 
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
dotCloud
 
Docker - Portable Deployment
Docker - Portable DeploymentDocker - Portable Deployment
Docker - Portable Deployment
javaonfly
 

Ähnlich wie Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers (20)

Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
 
Docker-Intro
Docker-IntroDocker-Intro
Docker-Intro
 
Microservices deck
Microservices deckMicroservices deck
Microservices deck
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
OpenStack Summit
OpenStack SummitOpenStack Summit
OpenStack Summit
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
 
The ABC of Docker: The Absolute Best Compendium of Docker
The ABC of Docker: The Absolute Best Compendium of DockerThe ABC of Docker: The Absolute Best Compendium of Docker
The ABC of Docker: The Absolute Best Compendium of Docker
 
Intro to Docker at the 2016 Evans Developer relations conference
Intro to Docker at the 2016 Evans Developer relations conferenceIntro to Docker at the 2016 Evans Developer relations conference
Intro to Docker at the 2016 Evans Developer relations conference
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
Intro to Docker November 2013
Intro to Docker November 2013Intro to Docker November 2013
Intro to Docker November 2013
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
 
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
 
Docker - Portable Deployment
Docker - Portable DeploymentDocker - Portable Deployment
Docker - Portable Deployment
 
Docker open stack boston
Docker open stack bostonDocker open stack boston
Docker open stack boston
 
OpenStack Boston
OpenStack BostonOpenStack Boston
OpenStack Boston
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Docker for the enterprise
Docker for the enterpriseDocker for the enterprise
Docker for the enterprise
 
Webinar Docker Tri Series
Webinar Docker Tri SeriesWebinar Docker Tri Series
Webinar Docker Tri Series
 
Docker
DockerDocker
Docker
 

Mehr von confluent

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 

Kürzlich hochgeladen (20)

PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 

Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers

  • 1. Best Practices for Running Kafka on Docker Containers Nanda Vijaydev, BlueData Kafka Summit San Francisco August 28, 2017
  • 2. Agenda • Services and Messaging Systems • Producers and Consumers on Docker • Messaging system (Kafka) on Docker: Challenges • How We Did it: Lessons Learned • Key Takeaways for Running Kafka on Docker • Q & A
  • 3. Kafka – A Real-Time Messaging System • Open source streaming platform • Handles firehose-style events with low latency • Storage is a scalable pub/sub queue • Supports diversified consumer groups for the same message • Consumers pull messages at their pace
  • 4. Role of Producers and Consumers • Independent services that send/receive messages • Can be written in many languages • Purpose-built for specific actions • Mostly operate on high frequency events and data • Availability and scalability are important
  • 5. What is a Docker Container? • Lightweight, stand-alone, executable package of software to run specific services • Runs on all major Linux distributions • On any infrastructure including VMs, bare-metal, and in the cloud • Package includes code, runtime, system libraries, configurations, etc. • Run as an isolated process in user space
  • 6. Docker Containers vs. Virtual Machines • Unlike VMs, containers virtualize OS and not hardware • More portable and efficient • Abstraction at the app layer that packages app and dependencies • Multiple containers share the base kernel • Take up less space and start almost immediately
  • 7. Containers = the Future of Apps Infrastructure • Agility and elasticity • Standardized environments (dev, test, prod) • Portability (on-premises and cloud) • Higher resource utilization Applications • Fool-proof packaging (configs, libraries, driver versions, etc.) • Repeatable builds and orchestration • Faster app dev cycles
  • 8. Considerations for Kafka Deployment • Multiple processes: Kafka, Zookeeper, Schema Registry, and more • Different infrastructure needs for different roles • Auto-configuration for multi-node deployment • Dev, Test, and Production in isolated/multi-tenant environment • Sharing resources with other processes on same host machines
  • 9. Prod/Cons Prod/Cons Kafka Server Requirements • Broker – heavy on storage with decent cache • More memory is needed for in-memory processing • Zookeeper also needs memory and may not need lots of disk space • Sequential writes are light on CPU requirements ZookeeperZookeeper Zookeeper Schema Registry Prod/Cons Broker
  • 10. How We Did It: Design Decisions I • Run Kafka (e.g. Confluent distribution) and related services and tools / applications unmodified – Deploy all services that run on a single bare-metal host in a single container • Multi-tenancy support is key – Network and storage security • Clusters of containers span physical hosts
  • 11. How We Did It: Design Decisions II • Images built to “auto-configure” themselves at time of instantiation – Not all instances of a single image run the same set of services when instantiated • Zookeeper vs. Broker cluster nodes – Support “reboot” of cluster – Ability to scale on demand
  • 12. How We Did It: Implementation Resource Utilization • CPU shares translated to vCPU for usability • No over-provisioning of memory • Node storage using device mappers, thin pools and logical volumes • Shared external HDFS for Hadoop storage Noisy neighbors
  • 13. How We Did It: Resource Allocation • Users to choose “flavors” while launching containers • Storage heavy containers can have more disk space • vCPUs * n = cpu-shares
  • 14. ① Get started with Kafka (e.g. Confluent community edition) ② Evaluate features/configurations simultaneously on smaller hardware footprint ③ Prototype multiple data pipelines quickly with dockerized producers and consumers ① Spin up dev/test clusters with replica image of production ② QA/UAT using production configuration without re- inventing the wheel ③ Offload specific users and workloads from production ① LOB multi-tenancy with strict resource allocations ② Bare-metal performance for business critical workloads ③ Share data hub / data lake with strict access controls Kafka On Docker Use Cases Prototyping Departmental Enterprise Exploring the Value of Kafka Initial Departmental Deployments Enterprise-Wide, Mission-Critical Deployments
  • 15. Multi-Tenant Deployment 5.10 3.3 2.1 Compute Isolation Compute Isolation Team 1 Team 2 Team 3 Build Components End to End Testing Prod Environment Team 1 Team 2 Team3 Multiple teams or business groups Evaluate different Kafka use cases (e.g. producers, consumers, pipelines) Use different services & tools (e.g. Broker, Zookeeper, Schema Registry, API Gateway) Use different distributions of standalone Kafka and/or Hadoop BlueData EPIC software platform Shared server infrastructure with node labels Shared data sets for HDFS access Multiple distributions, services, tools on shared, cost-effective infrastructure Shared Servers Dev/QA Hardware Shared, Centrally Managed Server Infrastructure Confluent 3.2 Prod Hardware Apache Kafka 0.9Apache Kafka 0.8
  • 16. How We Did It: Security Considerations • Security is essential since containers and host share one kernel – Non-privileged containers • Achieved through layered set of capabilities • Different capabilities provide different levels of isolation and protection • Add “capabilities” to a container based on what operations are permitted
  • 17. How We Did It: Network Architecture • Connect containers across hosts • Persistence of IP address across container restart • DHCP/DNS service required for IP allocation and hostname resolution • Deploy VLANs and VxLAN tunnels for tenant-level traffic isolation
  • 18. Data Volumes and Storage Drivers Data Volume • A directory on host FS • Data not deleted when container is deleted Device Mapper Storage Driver • Default – OverlayFS • We use direct-lvm thinpool with devicemapper • Data is deleted with container
  • 19. How We Did It: Sample Dockerfile # Confluent Kafka 3.2.1 docker image FROM bluedata/centos7:latest #Install java 1.8 RUN yum -y install java-1.8.0-openjdk-devel #Download and extract Kafka installation tar file RUN mkdir /usr/lib/kafka;curl -s http://packages.confluent.io/archive/3.2/confluent-3.2.1-2.11.tar.gz | tar xz -C /usr/lib/kafka/ ##Create necessary directories for Kafka and Zookeeper to run RUN mkdir /var/lib/zookeeper
  • 20. BlueData Application Image (.bin file) Application bin file Docker image CentOS Dockerfile RHEL Dockerfile appconfig conf Init.d startscript <app> logo file <app>.wb bdwb command clusterconfig, image, role, appconfig, catalog, service, .. Sources Docker file, logo .PNG, Init.d RuntimeSoftware Bits OR Development (e.g. extract .bin and modify to create new bin)
  • 21. How We Did It: Deployment Configuration #!/usr/bin/env bdwb ############################################################## # # Sample workbench instructions for building a BlueData catalog entry. # ############################################################## # # YOUR_ORGANIZATION_NAME must be replaced with a valid organization name. Please # refer to 'help builder organization' for details. # builder organization --name YOUR_ORGANIZATION_NAME builder organization --name BlueData ## Begin a new catalog entry catalog new --distroid confluent-kafka --name "Confluent Kafka 3.2.1" --desc "The free, open-source streaming platform (Enterprise edition) based on Apache Kafka. Confluent Platform is the best way to get started with real-time data streams." --categories Kafka --version 4.0 ## Define all node roles for the virtual cluster. role add broker 1+ role add zookeeper 1+ role add schemareg 1+ role add gateway 0+ ## Define all services that are available in the virtual cluster. service add --srvcid kafka-broker --name "Kafka Broker service" --port 9092 service add --srvcid zookeeper --name "Zookeeper service" --port 2181 service add --srvcid schema-registry --name "Schema-registry service" --port 8081 service add --srvcid control-center --name "Control center service" --port 9021 ## Dev Configuration. Multiple services are placed on same container clusterconfig new --configid default clusterconfig assign --configid default --role broker --srvcids kafka-broker zookeeper schema-registry control-center clusterconfig assign --configid default --role zookeeper --srvcids kafka-server zookeeper ## Prod Configuration. Services run on dedicated nodes with special attributes clusterconfig new --configid production clusterconfig assign --configid production --role broker --srvcids kafka-broker clusterconfig assign --configid production --role zookeeper --srvcids zookeeper clusterconfig assign --configid production --role schemareg --srvcids schemareg clusterconfig assign --configid production --role gateway --srvcids control-center
  • 22. How We Did It: Deployment Configuration #Configure your docker nodes with appropriate values appconfig autogen --replace /tmp/zookeeper/myid –pattern @@ID@@ --macro UNIQUE_SELF_NODE_INT appconfig autogen --replace /usr/lib/kafka/etc/kafka/server.properties –pattern @@HOST@@ --macro GET_NODE_FQDN appconfig autogen --replace /usr/lib/kafka/etc/kafka/server.properties –pattern @@zookeeper.connet@@ --macro ZOOKEEPER_SERVICE_STRING #Start services in the order specified REGISTER_START_SERVICE_SYSV zookeeper REGISTER_START_SERVICE_SYSV kafka-broker –wait zookeeper REGISTER_START_SERVICE_SYSV schema-registry –wait zookeeper
  • 25. Different Services in Each Container Broker + Zookeeper + Schema Registry Broker + Zookeeper Broker
  • 26. Container Storage On Host Container Storage Host Storage Container Hosts Kafka Cluster Containers
  • 29. App Store for Kafka, Spark, & More Pre-built images, or author your own Docker app images with our App Workbench Docker image + app config scripts + metadata (e.g. name, logo)
  • 30. Kafka on Docker: Key Takeaways • All apps can be “Dockerized”, including stateful apps – Docker containers enable a more flexible and agile deployment model – Faster application development cycles for application developers, data scientists, and engineers – Enables DevOps for data engineering teams
  • 31. Kafka on Docker: Key Takeaways • Enterprise deployment requirements: – Docker base images should include all needed services (Kafka, Zookeeper, Schema registry, etc.), libraries, jar files – Container orchestration, including networking and storage, depends on standards enforced by enterprises – Resource-aware runtime environment, including CPU and RAM – Sequence-aware app deployment needs more thought
  • 32. Kafka on Docker: Key Takeaways • Ecosystem considerations: – Ability to build producers and consumers on demand – Quick instantiation and access to Kafka services – Interoperability between services is easier with Docker – Ability to run, rerun, and scale clusters – Ability to deploy & integrate enterprise-ready solution
  • 33. Kafka on Docker: Key Takeaways • Enterprise deployment challenges: – Access to container secured with ssh keypair or PAM module (LDAP/AD) – Access to Kafka from Data Science applications – Management agents in Docker images – Runtime injection of resource and configuration information • Consider a turnkey software solution (e.g. BlueData) to accelerate time to value and avoid DIY pitfalls