10. Infrastructure
• Publish-subscribe messaging
• Implemented as distributed commit log
• Fast: hundreds of MB/s of reads and writes from thousands of clients
• Scalable: elastically and transparently
expanded without downtime
• Durable: Messages persisted on disk
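To make the publish side concrete, here is a minimal sketch of a Kafka producer configuration; the broker address (localhost:9092) and the choice of string serializers are assumptions for illustration, not taken from the deck. Setting acks to "all" ties into the durability point above: the producer waits until the message is committed to the log.

```java
import java.util.Properties;

public class KafkaProducerConfig {

    // Minimal producer configuration sketch. The broker address and the
    // serializer classes are assumptions; adjust them to your cluster.
    public static Properties producerProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("acks", "all"); // wait for the commit log write (durability)
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProperties().getProperty("acks"));
    }
}
```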
18. Infrastructure
• Shard:
– Group of data records in a stream
– 1 MB write per second
– 2 MB read per second
– 1,000 PUT records per second
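The per-shard limits above translate directly into a sizing rule: a stream needs enough shards to cover the write rate, the read rate, and the record rate independently, so the required count is the maximum of the three. A minimal sketch, assuming the limits quoted on the slide:

```java
public class ShardSizing {

    // Per-shard limits from the slide: 1 MB/s write, 2 MB/s read,
    // 1,000 PUT records per second.
    public static int requiredShards(double writeMBps, double readMBps,
                                     double recordsPerSec) {
        int byWrite = (int) Math.ceil(writeMBps / 1.0);
        int byRead = (int) Math.ceil(readMBps / 2.0);
        int byRecords = (int) Math.ceil(recordsPerSec / 1000.0);
        return Math.max(1, Math.max(byWrite, Math.max(byRead, byRecords)));
    }

    public static void main(String[] args) {
        // 5 MB/s in, 8 MB/s out, 3,500 puts/s -> max(5, 4, 4) = 5 shards
        System.out.println(requiredShards(5, 8, 3500));
    }
}
```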
20. Infrastructure
• Package application with dependencies
• Standardized unit for software development
• Layered filesystem, share common files
• Isolate applications from each other
22. Infrastructure
• Docker container: a stripped-to-basics version of a Linux operating system
• Docker image: the software you load into a container
23. Infrastructure
• Docker image built with a Dockerfile
• Docker images are built using “inheritance”
• A custom image is based on a “base image”
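A hypothetical Dockerfile along these lines; the image and JAR names are invented for illustration, and a real image would also install a JVM first, as the deployment slides later do with Oracle Java 8:

```dockerfile
# "Inheritance": this custom image starts FROM a base image
# and adds its own layers on top of the shared filesystem layers.
FROM phusion/baseimage
# A real image would install a JVM here before adding the application.
# Application layer (JAR name is an assumption):
COPY producer-fat.jar /opt/producer.jar
CMD ["java", "-jar", "/opt/producer.jar"]
```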
26. Software & Frameworks
• Toolkit for reactive applications
• Based on the JVM
• Event driven and non-blocking
• Polyglot (Java, JS, Groovy, Ruby)
• Lightweight and modular
36. Software & Frameworks
• Language-neutral
• Platform-neutral
• Extensible mechanism for serializing structured data
• Support for Java, Python, and C++
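A hypothetical .proto sketch of such a language-neutral definition; the message and field names are invented for illustration. The generated Java, Python, and C++ classes all serialize to the same wire format.

```protobuf
// Illustrative message definition (names are assumptions).
syntax = "proto3";

message SensorRecord {
  // Field numbers, not names, identify fields on the wire,
  // which is what makes the format extensible.
  string sensor_id = 1;
  int64 timestamp = 2;
  double value = 3;
}
```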
56. Deployment
• Why Spotify Kafka Docker Image?
– Kafka depends on ZooKeeper
– The Spotify image runs Kafka and ZooKeeper together
– No dependency on an external ZooKeeper
– Runs out of the box
60. Deployment
• Vert.x Producer Docker Container
– Based on phusion/baseimage
– Installs Oracle Java 8
– Adds the producer fat JAR
– Starts the fat JAR
61. Deployment
• Requirements for AWS:
– VPC
– IAM role for Kinesis access from EC2
– IAM role for Kinesis access from Lambda
– EC2 instance
– Kinesis stream
– Lambda package
64. Deployment
• In AWS
– Create Lambda function
• Upload JAR to S3 bucket
• Specify function
• Add event source (SUMMIT_STREAM)
67. Deployment
• In AWS
– Start an EC2 instance
• t2.small is sufficient
• Install Docker and run the container via EC2 user data
• Important: select correct IAM role
68. Deployment
• EC2 User Data
#!/bin/bash -ex
yum -y update
yum install docker -y
service docker start
docker run autoscaling/ingestion-service
72. Putting it all together
• Integration of Kinesis and Kafka
– A Kinesis consumer processes the records
– Processed records are forwarded to Kafka
– AWS Lambda would be a perfect choice
– Problem: Lambda could not reach resources inside a VPC*
73. Putting it all together
• Integration testing Kinesis and Kafka
– AWS API:
• Create Kinesis stream in @BeforeClass
• Produce data and write into stream
• Delete stream in @AfterClass
76. Putting it all together
• Integration testing Kinesis and Kafka
– Spotify Docker Client
• Run Spotify Kafka container in @BeforeClass
• Produce data and write into stream
• Stop Spotify Kafka container in @AfterClass
78. Putting it all together
• Integration tests: Kinesis and Kafka
– Put messages into Kinesis
– Consume the messages in the application
– Put the messages into Kafka
– Consume the messages from Kafka
– Compare sent and received messages
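The five steps above can be sketched structurally. The two in-memory queues below are stand-ins (assumptions) for the real Kinesis stream and Kafka topic, which the actual tests reach through the AWS SDK and a Kafka consumer:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

public class PipelineCheck {

    // Structural sketch of the round trip: Kinesis -> application -> Kafka,
    // then compare what was sent with what arrived.
    public static boolean roundTrip(String message) {
        Deque<String> kinesis = new ArrayDeque<>(); // stand-in for the Kinesis stream
        Deque<String> kafka = new ArrayDeque<>();   // stand-in for the Kafka topic

        kinesis.add(message);                 // 1. put message into Kinesis
        String consumed = kinesis.poll();     // 2. application consumes it
        kafka.add(consumed);                  // 3. ...and forwards it to Kafka
        String received = kafka.poll();       // 4. test consumes from Kafka
        return Objects.equals(message, received); // 5. compare messages
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```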
79. Putting it all together
• Integration tests: Kinesis and Kafka
– After tests: clean up infrastructure
– Very cost effective
– Real world tests without mocking
– Quite fast
80. Recap
• What have we achieved today?
– We created a distributed, message-driven system
– Based on the JVM and Docker
– Running locally and on AWS
84. Resources
• EC2 User Data
– https://gist.github.com/SaschaMoellering/c6ee24ec999325c43e90
• EC2 User Role
– https://gist.github.com/SaschaMoellering/a971fb73626f41ad80f4
85. Resources
• Lambda User Role
– https://gist.github.com/SaschaMoellering/b14540b144263e5fea4b