14. page
“A single, unified database that supports
transactions and analytics in real-time without
sacrificing transactional integrity, performance,
and scale.” - Mike Gualtieri, Forrester
Transactional
Operational
Analytical
Translytical Database
Traditional Stack For streaming data
Single data layer
15. page 15
How can queuing systems like
Kafka simplify the architecture?
19. page 19
• Handles 1.4 trillion messages/day for various applications at LinkedIn
• Over 1400 brokers
• Can handle well over a few million messages/sec
• At-least-once delivery of messages
• Strong durability contract with replication
• Rich ecosystem
  • Espresso – offload MySQL replication
  • Venice – compute derived data
  • Nuage – a portal to manage topics and associated metadata
  • Gobblin – ingestion framework
  • MirrorMaker – replication
CAN KAFKA SCALE?
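To put the quoted figures in perspective, the short sketch below converts the daily volume into average rates. It assumes load is spread evenly over the day and over exactly 1400 brokers, which is a simplification: real traffic peaks well above the average, and the slide says "over" 1400.

```python
# Back-of-the-envelope averages from the figures quoted on the slide.
MESSAGES_PER_DAY = 1.4e12        # 1.4 trillion messages/day (from the slide)
BROKERS = 1400                   # "over 1400 brokers" (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60

def average_rates(messages_per_day, brokers):
    """Return (cluster msgs/sec, msgs/sec per broker), assuming uniform load."""
    per_second = messages_per_day / SECONDS_PER_DAY
    return per_second, per_second / brokers

cluster_rate, broker_rate = average_rates(MESSAGES_PER_DAY, BROKERS)
print(f"cluster average:    {cluster_rate:,.0f} msgs/sec")
print(f"per-broker average: {broker_rate:,.0f} msgs/sec")
```

Under these assumptions the cluster averages roughly 16 million messages/sec, or on the order of 11-12 thousand messages/sec per broker.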
21. page 21
• Make sure that the producer is set to acks = all
• Make sure "replica.lag.time.max.ms" is set to a minimum (match it with the VoltDB timeout)
• Make sure "replica.lag.max.messages" is set to a minimum (this parameter is deprecated from 0.9 onward)
• Disable unclean leader election: unclean.leader.election.enable = false
• Use default.replication.factor = 3
• Make sure that the consumer reads only committed messages: min.insync.replicas = 2 (applied at the topic level; needs to be done manually before 0.9)
• "auto.commit.enable" = false ("enable.auto.commit" in the new consumer)
• Disable automatic topic creation in Kafka
• "block.on.buffer.full" = true
• "max.in.flight.requests.per.connection" = 1
• Use a rebalance listener to limit duplicates
• Connection to ZooKeeper
• Monitor consumer lag via offsets
• Report consumer counts and errors to a separate topic
SO WHAT ARE THE CAVEATS WITH KAFKA?
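As a rough illustration, the settings on this slide can be kept as a checklist and compared against a running configuration. The sketch below is plain Python with no Kafka client involved. The property names follow Kafka's spelling, the values are the slide's recommendations, and `find_deviations` and `producer_config` are hypothetical names introduced here. It covers only the settings with a concrete recommended value and flags a missing property as a deviation, even though Kafka would apply its own default in that case.

```python
# Settings named on the slide, grouped by where they are applied.
RECOMMENDED = {
    "producer": {
        "acks": "all",
        "max.in.flight.requests.per.connection": "1",
    },
    "broker": {
        "unclean.leader.election.enable": "false",
        "default.replication.factor": "3",
        "min.insync.replicas": "2",
        "auto.create.topics.enable": "false",
    },
    "consumer": {
        # Called "auto.commit.enable" in the old (pre-0.9) consumer.
        "enable.auto.commit": "false",
    },
}

def find_deviations(role, actual):
    """Return {property: (expected, actual)} for settings that differ or are missing."""
    deviations = {}
    for key, expected in RECOMMENDED[role].items():
        value = actual.get(key)
        if value is None or str(value).lower() != expected:
            deviations[key] = (expected, value)
    return deviations

# Hypothetical producer configuration to check.
producer_config = {"acks": "1", "max.in.flight.requests.per.connection": 1}
print(find_deviations("producer", producer_config))
```

Here only `acks` is reported, because the in-flight limit already matches the recommendation.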
22. page 22
Sample Config file
<import>
  <configuration type="kafka" format="csv" enabled="true">
    <property name="brokers">kafkasvr:9092</property>
    <property name="topics">employees</property>
    <property name="procedure">EMPLOYEE.insert</property>
  </configuration>
  <configuration type="kafka" enabled="true">
    <property name="brokers">kafkasvr:9092</property>
    <property name="topics">employees</property>
    <property name="procedure">EMPLOYEE.insert</property>
  </configuration>
  <configuration type="kafka" enabled="true">
    <property name="brokers">kafkasvr:9092</property>
    <property name="topics">managers</property>
    <property name="procedure">MANAGER.insert</property>
  </configuration>
</import>
• Supports multiple data formats such as CSV (default), TSV, and JSON (refer to the documentation)
• Supports various types of data sources, such as Kafka and Kinesis
• Supply a list of brokers to pick up offsets
• Supply a topic name which contains the messages
• Supply the name of the stored procedure to invoke per event/message, which then inserts the result into the database
• "fetch.message.max.bytes": the maximum size of a message that is fetched from Kafka (default 64 KB)
• "groupid": the group the consumer belongs to
• "socket.timeout.ms": the maximum time (in milliseconds) the socket connection waits before timing out
HOW TO CONFIGURE KAFKA -> VOLTDB? (IMPORTER)
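To make the shape of the importer configuration concrete, the sketch below parses an `<import>` section with Python's standard library and prints which topic feeds which stored procedure. It is only an illustration of the file's structure, not how VoltDB itself reads the file. `list_importers` is a hypothetical helper, and the embedded XML is the sample from this slide reduced to two importers.

```python
import xml.etree.ElementTree as ET

# The <import> section from the sample config, reduced to two importers.
IMPORT_XML = """
<import>
  <configuration type="kafka" format="csv" enabled="true">
    <property name="brokers">kafkasvr:9092</property>
    <property name="topics">employees</property>
    <property name="procedure">EMPLOYEE.insert</property>
  </configuration>
  <configuration type="kafka" enabled="true">
    <property name="brokers">kafkasvr:9092</property>
    <property name="topics">managers</property>
    <property name="procedure">MANAGER.insert</property>
  </configuration>
</import>
"""

def list_importers(xml_text):
    """Return one dict per importer explicitly marked enabled="true"."""
    importers = []
    root = ET.fromstring(xml_text.strip())
    for cfg in root.findall("configuration"):
        if cfg.get("enabled") != "true":
            continue
        props = {p.get("name"): (p.text or "").strip() for p in cfg.findall("property")}
        importers.append({
            "type": cfg.get("type"),
            "format": cfg.get("format", "csv"),  # CSV is the default per the slide
            **props,
        })
    return importers

for imp in list_importers(IMPORT_XML):
    print(f'{imp["topics"]} -> {imp["procedure"]} ({imp["format"]}, brokers={imp["brokers"]})')
```

Each `<configuration>` element is one importer: the brokers to connect to, the topic to read, and the procedure invoked once per message.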
24. page 24
• According to our customer success team, as of today approximately 15-20% of our customers are using Kafka and VoltDB together.
Examples
• King Games (of Candy Crush fame) – 5 nodes, 384 GB RAM, 32 cores – 300+ topics with more than 400,000 Txns/sec @ 50% CPU utilization.
• MaxCDN (now StackPath – global CDN) – 11 nodes, 128 GB RAM, 16 cores – a couple of hundred topics with more than 500,000 Txns/sec @ 30% CPU utilization.
• Nimble Storage (InfoSight dashboard & support) – 9 nodes, 128 GB RAM, 64 cores – 50+ topics with more than 200,000 Txns/sec @ 20~30% CPU utilization.
• We highly recommend this architecture if it meets the SLA requirements
IS KAFKA & VOLTDB INTEGRATION IN PRODUCTION?
29. page 29
What interesting problems do we solve?
• Correlation – streaming Join (state management)
• Out of order delivery
• At-least-once delivery – how to dedup
• Precise Accounting
• Precise Statistics – Event time vs processing time
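Two of these problems, deduplication under at-least-once delivery and counting by event time rather than processing time, can be sketched in a few lines. The example below is a minimal in-memory illustration, not the deck's implementation. The `id` and `event_time` fields are hypothetical, and the `seen_ids` set stands in for state that a real system would keep durably, for example behind a unique key in the database.

```python
from collections import Counter
from datetime import datetime, timezone

def minute_of(epoch_seconds):
    """Bucket a timestamp into its UTC minute, e.g. '1970-01-01 00:01'."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")

def count_by_event_minute(events):
    """Count each event once, in the minute it happened (event time)."""
    seen_ids = set()        # stand-in for durable state
    counts = Counter()
    for event in events:
        if event["id"] in seen_ids:
            continue        # redelivered message: already counted
        seen_ids.add(event["id"])
        # Bucket by when the event happened, not by when it arrived.
        counts[minute_of(event["event_time"])] += 1
    return counts

events = [
    {"id": "a", "event_time": 60},
    {"id": "b", "event_time": 65},
    {"id": "a", "event_time": 60},   # redelivered duplicate
    {"id": "c", "event_time": 10},   # late arrival, out of order
]
counts = count_by_event_minute(events)
print(dict(counts))
```

The duplicate of event "a" is ignored, and the late event "c" is still credited to the minute in which it occurred, so the statistics stay precise regardless of arrival order.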