Building Pinterest Real-Time Ads Platform Using Kafka Streams (Liquan Pei + Boyang Chen, Pinterest) Kafka Summit SF 2018
In this talk, we share our experience building Pinterest's real-time Ads Platform using Kafka Streams. The real-time budgeting system is the most mission-critical component of the Ads Platform, as it controls how each ad is delivered to maximize user, advertiser, and Pinterest value. The system needs to handle over 50,000 queries per second (QPS) of impressions, requires less than five seconds of end-to-end latency, and must recover within five minutes during outages. It also needs to scale with the fast growth of Pinterest's ads business.
The real-time budgeting system is composed of a real-time stream-stream joiner, a real-time spend aggregator, and a spend predictor. At Pinterest's scale, we need to overcome quite a few challenges to make each component work. For example, the stream-stream joiner needs to maintain terabyte-size state while supporting fast recovery, and the real-time spend aggregator needs to publish to thousands of ads servers while supporting over one million read QPS. We chose Kafka Streams because it provides millisecond-level latency guarantees, scalable event-based processing, and easy-to-use APIs. In the process of building the system, we performed extensive tuning of RocksDB and the Kafka producer and consumer, and contributed several improvements back to Apache Kafka. We are also working on adding remote checkpoints for Kafka Streams state to reduce cold-start time when adding machines to the application. We believe our experience can benefit people who want to build real-time streaming solutions at large scale and deeply understand Kafka Streams.
18. ● Read/write performance
○ Use point query for fast lookup
■ fetch(key, timeFrom, timeTo);
■ fetch(key, windowStartTime); [ >=Kafka 2.0.0 ]
○ Increase block cache size
○ Reduce action state store size
How to achieve sub-second latency?
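The point-query signatures above come from Kafka Streams' `ReadOnlyWindowStore`: `fetch(key, windowStartTime)` (added in 2.0.0) hits a single window instead of scanning a time range. As a self-contained illustration of why that matters (a plain in-memory stand-in, not the actual RocksDB-backed store), a minimal sketch:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a window store: key -> (windowStart -> count).
// Real Kafka Streams window stores are RocksDB-backed; this only illustrates
// point lookup vs. range scan.
public class WindowStoreSketch {
    private final TreeMap<String, NavigableMap<Long, Long>> store = new TreeMap<>();

    public void put(String key, long windowStart, long count) {
        store.computeIfAbsent(key, k -> new TreeMap<>()).put(windowStart, count);
    }

    // Point query: a single lookup, analogous to fetch(key, windowStartTime).
    public Long fetch(String key, long windowStart) {
        NavigableMap<Long, Long> windows = store.get(key);
        return windows == null ? null : windows.get(windowStart);
    }

    // Range query: scans every window in [timeFrom, timeTo], analogous to
    // fetch(key, timeFrom, timeTo).
    public long fetchRangeSum(String key, long timeFrom, long timeTo) {
        NavigableMap<Long, Long> windows = store.get(key);
        if (windows == null) return 0L;
        return windows.subMap(timeFrom, true, timeTo, true)
                      .values().stream().mapToLong(Long::longValue).sum();
    }
}
```

When the caller already knows the window start, the point query does one lookup instead of iterating the whole range, which is what makes it the faster path for sub-second reads.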
20. Kafka Streams Commit
● Each commit triggers a RocksDB flush to ensure data is persisted on disk.
● Each RocksDB flush creates an SST file.
● An accumulated number of SST files triggers compaction.
● Tune commit.interval.ms.
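Since each commit forces a RocksDB flush, raising `commit.interval.ms` means fewer flushes, fewer SST files, and less compaction pressure, at the cost of more work to redo after a failure. A minimal sketch of setting it (the 60-second value is illustrative, not Pinterest's actual setting):

```java
import java.util.Properties;

public class CommitIntervalConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // "commit.interval.ms" is the Kafka Streams config key
        // (StreamsConfig.COMMIT_INTERVAL_MS_CONFIG). The default is 30000 ms
        // (100 ms with exactly-once). A larger value means fewer commits,
        // hence fewer RocksDB flushes and fewer SST files to compact.
        props.put("commit.interval.ms", "60000"); // illustrative value; tune per workload
        return props;
    }
}
```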
22. Fast recovery
● Rolling restart could trigger multiple rebalances.
● State shuffling is expensive.
Approaches:
● Recover faster:
○ increase max.poll.records for restore consumer (KIP-276)
○ RocksDB window store batch recovery (KAFKA-7023)
● Single rebalance:
○ Wait for all members to be ready: increase session.timeout.ms.
○ Avoid unnecessary rebalances: static membership (KIP-345)
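KIP-276 added per-client-type config prefixes in Kafka Streams, so the restore consumer can be tuned independently of the main consumer. A hedged sketch of both recovery knobs above (the numeric values are illustrative):

```java
import java.util.Properties;

public class RecoveryTuning {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // KIP-276: the "restore.consumer." prefix targets only the restore
        // consumer, letting it pull larger batches during state restoration.
        props.put("restore.consumer.max.poll.records", "10000"); // illustrative value
        // The "consumer." prefix applies to the consumers; a larger session
        // timeout gives all members time to rejoin during a rolling restart,
        // so a single rebalance can cover the whole restart.
        props.put("consumer.session.timeout.ms", "120000"); // illustrative value
        return props;
    }
}
```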
26. Aggregator
● Utilize Stream DSL API
● Requirements
○ End-to-end sub-second latency, from user action to ads serving.
○ Thousands of ads serving machines need to consume this data.
27. Output to a compacted topic
Cons:
○ High fanout, broker saturation.
○ Replay could be long.
Pros:
○ Fast correction.
○ Logic simplicity.
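A compacted topic retains only the latest record per key, which is what makes it usable as a partitioned key-value store, and why replaying it can still be long when the key space is large. A self-contained sketch of compaction semantics (not the broker's actual log cleaner):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {
    // Models log compaction: for each key, only the last value survives;
    // a null value (a tombstone) deletes the key entirely.
    public static Map<String, String> compact(List<Map.Entry<String, String>> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : log) {
            if (record.getValue() == null) {
                latest.remove(record.getKey()); // tombstone
            } else {
                latest.put(record.getKey(), record.getValue());
            }
        }
        return latest;
    }
}
```

Consumers that read the compacted topic from the beginning rebuild exactly this latest-value-per-key map, which is what gives fast correction: publishing a new record for a key immediately supersedes the old one.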
28. Output to a signal topic
Cons:
○ Event based: no way to reset.
○ Time based: expensive batch operation.
Pros:
○ Very small volume.
○ Logic consolidation.
29. Streaming in budget change
Pros:
○ Unblock signal reset without batch update.
Cons:
○ Consistency guarantee.
○ Strong ordering guarantee.
30. Budgeting Summary
● Low-level metrics are critical, especially at the storage layer.
● Shuffling large state is expensive.
● Compacted topic as partitioned key-value store.
● Unified solution for serving stream output.
31. New Ads Exploration
● When a new ad is created, the Ads Platform doesn't yet know how users engage with it on different surfaces.
● The faster the Ads Platform learns about the performance of a newly created ad, the better value we provide to the user.
● Balance between exploiting good ads and exploring new ads.
● Solution: Add a boosting factor to new ads to increase their probability of winning the auction.
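The boost can be thought of as a multiplier on a new ad's auction score that decays as impression data accumulates. The factor and decay schedule below are hypothetical, purely to illustrate the exploration/exploitation trade-off:

```java
public class NewAdBoost {
    // Hypothetical boost: new ads (few observed impressions) get their auction
    // score multiplied by a factor that decays linearly from maxBoost to 1.0
    // as impressions accumulate, handing control back to exploitation.
    public static double boostedScore(double baseScore, long observedImpressions,
                                      double maxBoost, long rampImpressions) {
        if (observedImpressions >= rampImpressions) {
            return baseScore; // enough data: no exploration boost
        }
        double progress = (double) observedImpressions / rampImpressions;
        double factor = maxBoost - (maxBoost - 1.0) * progress;
        return baseScore * factor;
    }
}
```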
33. New Ads Exploration
● Need to compute <ad id, past X day impressions>.
● The results are published to S3 for serving.
● Backfilling is needed.
○ Exactly the same logic as the normal processing.
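Computing &lt;ad id, past X day impressions&gt; is a count over daily windows keyed only on event time, which is exactly why a backfill can reuse the normal processing logic unchanged. A self-contained sketch (not the actual job, which runs on the streaming platform and writes to S3):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ImpressionWindowCount {
    private static final long DAY_MS = 24L * 60 * 60 * 1000;
    // adId -> (day-bucket start -> impression count)
    private final Map<String, TreeMap<Long, Long>> counts = new HashMap<>();

    public void record(String adId, long eventTimeMs) {
        long dayBucket = (eventTimeMs / DAY_MS) * DAY_MS;
        counts.computeIfAbsent(adId, k -> new TreeMap<>())
              .merge(dayBucket, 1L, Long::sum);
    }

    // Impressions for adId over the past `days` days ending at `nowMs`.
    // Only event time is used, so backfilling a historical range runs
    // exactly the same code path as normal processing.
    public long pastDays(String adId, int days, long nowMs) {
        TreeMap<Long, Long> buckets = counts.get(adId);
        if (buckets == null) return 0L;
        long from = ((nowMs / DAY_MS) * DAY_MS) - (days - 1L) * DAY_MS;
        return buckets.subMap(from, true, nowMs, true)
                      .values().stream().mapToLong(Long::longValue).sum();
    }
}
```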
40. Stream Platform
● Usability
○ Users should only focus on business logic.
○ Support for more state store backends.
○ Type system for easier code sharing.
● Scalability
○ Applications should be able to handle more QPS with more machines.
● Fault Tolerance
○ Applications should recover within X minutes.
○ Applications should support code and state rollback.
● Developer Velocity
○ The platform should provide standard ways of backfilling.
● Debuggability
○ The platform should provide standard ways of exposing debug information so that it is queryable.
41. Contributions
● KIP-91 Adding delivery.timeout.ms to Kafka producer.
● KIP-245 Replace StreamsConfig with Properties
● KIP-276 Add config prefix for different consumers
● KIP-300 (ongoing) Add windowed KTable API
● KIP-345 (ongoing) Reduce consumer rebalances through static membership
● KAFKA-6896 Export producer and consumer metrics in Kafka Streams
● KAFKA-7023 Move prepareForBulkLoad() call after customized RocksDBConfigSetter
● KAFKA-7103 Use bulkloading for RocksDBSegmentedBytesStore during init
● RocksDB Metrics Lib