Netflix's Keystone pipeline processes 600 billion events a day. A detailed look at how Samza was modified and used for real-time routing of events, including running it in Docker.
25. Mind bender - Sink Isolation
● Multiple Samza jobs for one Kafka source topic
● Each job processes messages for one sink
● E.g. a separate job for each S3 & Elasticsearch cluster sink
● Tradeoff
● Sink isolation at the cost of extra read load on the Kafka source topic cluster
● Initial release
● Each job processes partitions from only one topic
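A toy model of that tradeoff (sink names and rates are illustrative assumptions, not Keystone's): because every sink-isolated job is its own consumer of the source topic, read load on the source Kafka cluster scales with the number of sinks.

```python
# Toy model of sink isolation: one Samza job (consumer) per sink, all
# re-reading the same source topic.
def kafka_read_load(events_per_sec, sinks):
    """Each sink-isolated job consumes the full source topic, so total
    read load on the source cluster is events/sec times the sink count."""
    return events_per_sec * len(sinks)

sinks = ["s3", "elasticsearch", "consumer-kafka"]
print(kafka_read_load(100_000, sinks))  # 3 sinks -> 3x read load
```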
26. Samza Job Details
● Use the window function to implement a health check
● task.window.ms=30000
● Batch requests to sinks
● Explicit offset commits only
● automatic commits disabled - task.commit.ms=-1
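A minimal sketch of the batching-plus-explicit-commit pattern (assumed logic in plain Python, not Samza's actual API): the source offset advances only after a batch has been flushed to the sink, so a failed sink write never moves the checkpoint.

```python
class BatchingRouter:
    """Hypothetical router task: batches sink requests and commits offsets
    explicitly only after a successful flush (with task.commit.ms=-1
    disabling automatic commits, as on this slide)."""

    def __init__(self, sink_write, batch_size=100):
        self.sink_write = sink_write     # callable that writes a batch to the sink
        self.batch_size = batch_size
        self.batch = []
        self.committed_offset = None     # last offset safely flushed to the sink

    def process(self, offset, message):
        self.batch.append((offset, message))
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.batch:
            return
        self.sink_write([m for _, m in self.batch])  # may raise: offset stays put
        self.committed_offset = self.batch[-1][0]    # explicit commit, post-success
        self.batch = []

out = []
router = BatchingRouter(out.extend, batch_size=10)
for i in range(25):
    router.process(i, f"event-{i}")
print(router.committed_offset)  # 19: only full, flushed batches are committed
```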
31. 1 checkpoint topic per Kafka cluster, sink, source topic
● Change the number of Samza jobs for a topic
● Easily redistribute the partitions across jobs
● Add new partitions seamlessly
● Our naming scheme facilitates migrating topics to other clusters
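The per-(cluster, sink, source topic) scheme could be sketched as below; the exact format string is a hypothetical stand-in, not Keystone's actual naming.

```python
def checkpoint_topic(kafka_cluster, sink, source_topic):
    # Hypothetical format: one checkpoint topic per (cluster, sink, source
    # topic). Encoding all three parts in the name is what lets jobs be
    # split, merged, or migrated to other clusters without colliding
    # checkpoints.
    return f"samza_checkpoint.{kafka_cluster}.{sink}.{source_topic}"

print(checkpoint_topic("us-east-1", "s3", "playback_events"))
```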
32. Job startup delayed by reading the checkpoint topic
Causing health check failures (timeout: 5 min)
What to do?
34. Checkpoint topic Samza Job Configuration
Replication factor is hard coded to 3
task.checkpoint.system=checkpoint
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.segment.bytes=3145728
35. Additional Checkpoint Information
● About 300 bytes per offset commit
● Changelog offsets are written into the same checkpoint topic
● Even if the changelog is not enabled, a one-time large message with every
system-stream-partition is inserted into the same checkpoint topic
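At ~300 bytes per commit, the 3 MB segment size from slide 34 works out to roughly 10,000 commits per log segment. A rough sketch of the arithmetic; the 30-second commit interval is an assumption (one explicit commit per task.window.ms window), not a figure from the deck:

```python
SEGMENT_BYTES = 3_145_728    # task.checkpoint.segment.bytes (slide 34)
BYTES_PER_COMMIT = 300       # ~300 bytes per offset commit (slide 35)
COMMIT_INTERVAL_S = 30       # assumption: one explicit commit per window

commits_per_segment = SEGMENT_BYTES // BYTES_PER_COMMIT
seconds_per_segment = commits_per_segment * COMMIT_INTERVAL_S
print(commits_per_segment)            # 10485 commits fill one segment
print(seconds_per_segment / 86_400)   # ~3.6 days per segment at this rate
```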
42. SAMZA-41 - static partition range assignment
job.systemstreampartition.matcher.class=
org.apache.samza.system.RegexSystemStreamPartitionMatcher
job.systemstreampartition.matcher.config.ranges=[8-10]
● To express the partition range 8-10 as a regex, you need: ^8$|^9$|^10$
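The anchors in that regex matter: without `^` and `$`, partition 8 would also match 80, 85, and so on. A quick check of the corrected pattern:

```python
import re

# Regex equivalent of the [8-10] range; anchors keep e.g. "80" from matching.
matcher = re.compile(r"^8$|^9$|^10$")

matched = [p for p in range(110) if matcher.match(str(p))]
print(matched)  # [8, 9, 10]
```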
44. Prefetch Buffer - When is it going to OOM?
● Default is count based, per Samza container
● (50,000 / # partitions) messages per topic
● systems.source.samza.fetch.threshold=50000
● Hard to get right and avoid OOM
● when message sizes change
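Why the count-based threshold is hard to size: the heap footprint is message count times message size, and the config can't see the latter. A toy model with illustrative numbers:

```python
def prefetch_heap_bytes(fetch_threshold, num_partitions, avg_message_bytes):
    """Count-based prefetch (systems.source.samza.fetch.threshold): the
    container buffers up to threshold/num_partitions messages per partition,
    so the byte footprint scales with message size, not with any config."""
    per_partition = fetch_threshold // num_partitions
    return per_partition * num_partitions * avg_message_bytes

# Same config, 10x bigger messages -> 10x the heap, and a possible OOM.
print(prefetch_heap_bytes(50_000, 50, 1_000))    # 50 MB
print(prefetch_heap_bytes(50_000, 50, 10_000))   # 500 MB
```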
45. SAMZA-775 - size based prefetch buffer
● How much of the heap should I use for prefetching?
● systems.source.samza.fetch.threshold.bytes=200000000 (200MB)
● per system / stream / partition
● if > 0, takes precedence over systems.source.samza.fetch.threshold
46. SAMZA-775 - size based prefetch buffer
● systems.source.samza.fetch.threshold.bytes is a soft limit
● actual limit = bytes threshold + size of the last fetched message per stream
● I don't get it - where is the example?
47. SAMZA-775 - size based prefetch buffer
● systems.source.samza.fetch.threshold.bytes=100000 (100K)
● 50 SystemStreamPartitions
● per system-stream-partition threshold is (100000 / 2) / 50 = 1000 bytes
● Enforced limit would be
● 1000 bytes + size of last message from the partition
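The slide's arithmetic, spelled out (the divide-by-2 matches the 2x in-memory data structures on slide 48):

```python
threshold_bytes = 100_000   # systems.source.samza.fetch.threshold.bytes
num_ssps = 50               # SystemStreamPartitions in the container

# Half the budget covers Samza's 2x in-memory copies; the rest is split
# evenly across SystemStreamPartitions.
per_ssp = (threshold_bytes // 2) // num_ssps
print(per_ssp)  # 1000 bytes; enforced limit = 1000 + size of last message
```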
48. SAMZA-775 - size based prefetch buffer
● Value of systems.source.samza.fetch.threshold.bytes based on
● Incoming traffic (bytes/sec) into the source Kafka cluster
● 60 seconds of buffer, including region failover traffic
● Samza in-memory data structures (2 x message size)
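The sizing rule above can be written down directly; the traffic number below is an illustrative assumption, not Keystone's:

```python
def fetch_threshold_bytes(incoming_bytes_per_sec, buffer_seconds=60, copies=2):
    """Sketch of the deck's sizing rule: buffer N seconds of traffic
    (sized for region-failover load), doubled for Samza's in-memory
    data structures (2x message size)."""
    return incoming_bytes_per_sec * buffer_seconds * copies

# e.g. ~1.6 MB/s of failover traffic into the source topic
print(fetch_threshold_bytes(1_600_000))  # 192000000, near the 200MB on slide 45
```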
49. SAMZA-775 - size based prefetch buffer
● How does it perform?
● Per-message overhead within 0.02% of the heuristic computed in the patch
● Actual footprint exceeds systems.source.samza.fetch.threshold.bytes by
10-15% at most in the worst case
● Example: if set to 200MB, worst case observed was 230MB
50. SAMZA-775 - size based prefetch buffer
● Con
● The implementation enforcing systems.source.samza.fetch.threshold.bytes is
very dependent on the Samza version
● Hence, higher maintenance as the code changes. However,
Well Worth It! Ergonomic Config! Adds Stability!
51. SAMZA-655 & SAMZA-540
● Backported from 0.10
● Environment variable configuration rewriter
● Pass config from RDS to executor to Docker to Samza job
● Expose latency-related metrics in OffsetManager
● Checkpointed offset gauge
65. End to End metrics
● Producer to Router latency
● Avg. about 2.5 seconds
● 90th percentile of topics under 2 sec
● Kafka to Router consumer lag (estimated time to catch up)
● 65th percentile under 500ms
● 90th percentile under 5 seconds
● Producer event timestamp to Samza router avg latency - 6 seconds
68. Wait, there's more in the pipeline...
● Self service tools
● Multi-tenant Stream Processing as a Service - SPaaS
● Probably add Spark Streaming to the mix
● Event traceability - on demand and sampled
● As the number of jobs increases, the checkpoint topic may give way to Cassandra
● Optimization & Automation
72. Fronting Kafka Instances
● 2700 d2.xl AWS instances across 3 regions for regular & failover traffic
● d2.xl
● Large disk (6TB) - 450-475MB/s of sequential I/O throughput
● 30GB memory, moderate (700 Mbps) network capability
● Replication lags above 18MB/second per broker with thousands of partitions
● Con: multiple instances on the same physical host - increases failure rate
73. Kafka Capacity Planning
1. Stay under 20k partitions per cluster (currently 14K)
2. Leave ~40% free disk space on each broker for growth & movement
3. Throughput per partition follows from 1, 2, the number of brokers, and the
retention period
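A rough sketch of how rule 3 follows from rules 1 and 2; the replication factor and all example numbers below are assumptions, not figures from the deck:

```python
def max_bytes_per_sec_per_partition(disk_bytes, free_fraction, partitions,
                                    brokers, retention_seconds, replication=2):
    """Usable disk per broker, divided across the partition replicas it
    hosts, spread over the retention window. replication=2 and the example
    inputs are illustrative assumptions."""
    usable_per_broker = disk_bytes * (1 - free_fraction)
    replicas_per_broker = partitions * replication / brokers
    return usable_per_broker / replicas_per_broker / retention_seconds

# d2.xl-sized disk, 50% kept free (the deck suggests ~40%), 1000 partitions
# on 100 brokers, ~28h retention
print(max_bytes_per_sec_per_partition(
    6_000_000_000_000, 0.5, 1000, 100, 100_000))  # 1.5 MB/s per partition
```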
74. Partition Assignment
● All assignments are zone / rack aware
● Strategy 1 - Multiple of brokers
● Strategy 2 - Stateful round robin
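A toy illustration of zone-aware assignment (not Kafka's or Keystone's actual algorithm): place each replica of a partition in a different zone, round-robining over the brokers within a zone.

```python
from itertools import cycle

def zone_aware_assign(num_partitions, brokers_by_zone, replication=2):
    """Toy zone-aware round robin: the replicas of each partition land in
    distinct zones, so losing one zone loses at most one replica."""
    zones = sorted(brokers_by_zone)
    broker_iter = {z: cycle(brokers_by_zone[z]) for z in zones}
    assignment = {}
    for p in range(num_partitions):
        # Rotate the starting zone per partition, then pick one broker
        # from each chosen zone in round-robin order.
        picked = [zones[(p + r) % len(zones)] for r in range(replication)]
        assignment[p] = [next(broker_iter[z]) for z in picked]
    return assignment

brokers = {"1a": ["a1", "a2"], "1b": ["b1", "b2"], "1c": ["c1"]}
print(zone_aware_assign(3, brokers))
# {0: ['a1', 'b1'], 1: ['b2', 'c1'], 2: ['c1', 'a2']}
```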
75. Kafka Auditor as a Service
● Broker monitoring
● Consumer monitoring
● Heart-beat & continuous message latency
● On-demand broker performance testing
● Built as a service, deployable on one or multiple instances