
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | Kahn Chen and Thirteen Wang, Tencent

In this session we share our experience of building real-time data pipelines at Tencent PCG: pipelines that handle 20 trillion daily messages across 700 clusters, with bursts of 100 Gb/s of traffic from a single app. We discuss our roadmap for enhancing Kafka to break its limits in terms of scalability, robustness, and cost of operation.
We first built a proxy layer that aggregates physical clusters in a way that is transparent to clients. While this architecture solves many operational problems, it requires significant development to stay future-proof. After retrospectives with our customers and careful study of ongoing work in the community, we designed a region federation solution in the broker layer, which allows us to deploy clusters at a much larger scale than previously possible while providing better failure recovery and operability. We discuss how we keep this development compatible with KIP-500 and KIP-405, and the two KIPs (693 and 694) that we submitted for discussion.



  1. Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent. Kahn Chen, Liang Wang. 13/06/2021.
  2. Agenda: Self-introduction; Main application scenarios at Tencent; Business requirements in using Apache Kafka; Architecture evolution in using Apache Kafka; Operational data at Tencent; Incorporating new features of the Apache Kafka community.
  3. Self-Introduction. Name: Liang Wang. Title: Senior Engineer, Tencent.
  4. Main application scenarios. As one of the world’s biggest internet-based platform companies, Tencent uses technology to enrich the lives of users and assist the digital upgrade of enterprises. An example product is the popular WeChat application, which has over one billion active users worldwide. The Platform and Content Group (PCG) is responsible for integrating Tencent’s internet, social, and content platforms. PCG promotes the cross-platform and multi-modal development of IP, with the overall goal of creating more diversified premium digital content experiences. Since its inception, many major products—from the well-known QQ, QZone, video, App Store, news, and browser, to relatively new members of the ecosystem such as live broadcast, anime, and movies—have been evolving on top of consolidated infrastructure and a set of foundational technology components. At the center stands the real-time messaging system that connects data and computing. We have proudly built many essential data pipelines and messaging queues around Apache Kafka. Our application of Kafka is similar to other organizations: we build pipelines for cross-region log ingestion, machine learning platforms, and asynchronous communication among microservices. The unique challenges come from stretching our architecture along multiple dimensions, namely scalability, customization, and SLA.
  5. Business requirements in using Apache Kafka.
  - Low latency and a high SLA: in production, our businesses often require very low latency. We found the annual disk failure rate to be as high as 5%, and machine-room network failures also occur. However, community Kafka's fault tolerance only works at the broker level, which makes it difficult to deliver the low latency our businesses require.
  - Lossless partition scale-down: in real-world operations, for example after holiday traffic peaks, we need to reclaim the machines that were temporarily added to a cluster and shrink the partitions that were temporarily expanded. Kafka does not currently support this; developers must change the business logic of their code to achieve it.
  - A large number of topics on a single cluster: in Tencent's real-time data warehouse scenarios (table1 -> flink_sql -> table2 ...), table1 and table2 are usually individual topics in the real-time stream, so a large number of topics is needed. However, because of the ZooKeeper performance bottleneck, Apache Kafka before version 2.8.0 cannot support more than about one million partitions well.
  6. What have we done to meet these requirements?
  7. First federated Kafka design. Example: a logical topic with eight partitions (P0–P7) is distributed across two physical clusters, each holding four partitions (P0–P3).
  8. Key components and workflows. On the one hand, we add two proxy services, one for producers (pproxy) and one for consumers (cproxy), which implement the core protocols of the Kafka broker and are responsible for mapping logical topics to their physical incarnations (sketched below). Applications use the standard Kafka SDK and connect directly to the proxy, which acts as a broker. On the other hand, we extract the logic of the Kafka controller node (such as topic metadata) into a separate service, also called "the controller", although it differs from Kafka's own controller functionality. This service collects the metadata of the physical clusters, composes the partition information of the logical topics, and publishes it to the proxy brokers.
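To make the mapping concrete, here is a minimal sketch of the routing table a proxy might keep; all class and method names are hypothetical, not Tencent's actual code.

```java
// Hypothetical sketch: resolving a logical partition to its physical home.
import java.util.HashMap;
import java.util.Map;

public class LogicalTopicRouter {

    /** Where one logical partition physically lives (illustrative). */
    record PhysicalLocation(String clusterId, String physicalTopic, int physicalPartition) {}

    // logicalTopic -> (logicalPartition -> physical location)
    private final Map<String, Map<Integer, PhysicalLocation>> routes = new HashMap<>();

    /** Called when the controller pushes a new topic mapping to the proxy. */
    public void updateRoute(String logicalTopic, int logicalPartition, PhysicalLocation loc) {
        routes.computeIfAbsent(logicalTopic, t -> new HashMap<>()).put(logicalPartition, loc);
    }

    /** Resolve a produce/fetch request addressed to a logical partition. */
    public PhysicalLocation resolve(String logicalTopic, int logicalPartition) {
        Map<Integer, PhysicalLocation> parts = routes.get(logicalTopic);
        if (parts == null || !parts.containsKey(logicalPartition)) {
            throw new IllegalStateException("No route for " + logicalTopic + "-" + logicalPartition);
        }
        return parts.get(logicalPartition);
    }
}
```

In the first design on slide 7, a logical topic with partitions P0–P7 would map P0–P3 to one physical cluster's P0–P3 and P4–P7 to the other cluster's P0–P3.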
  9. Cluster provisioning workflow:
  1. The controller reads the mapping between logical and physical topics from the config database.
  2. The controller scans the metadata of all physical topics from each Kafka cluster.
  3. The controller finds the list of available pproxy instances from heartbeat messages.
  4. The controller composes the metadata for the logical topics (sketched below).
  5. The controller pushes both the topic mapping and the topic metadata to the proxy brokers.
  6. The metadata is sent to the Kafka SDK.
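Step 4 is the interesting one: the controller concatenates the partitions of the physical topics into one contiguous logical partition space. A minimal sketch under that assumption; all types are illustrative.

```java
// Hypothetical sketch of step 4: composing logical-topic metadata from
// the physical topics scanned in step 2.
import java.util.ArrayList;
import java.util.List;

public class MetadataComposer {

    record PhysicalPartition(String clusterId, String topic, int partition, String leaderHost) {}
    record LogicalPartition(int logicalPartition, PhysicalPartition backing) {}

    /**
     * Concatenates the partitions of each physical topic, in the order the
     * config database lists the clusters, into one logical partition space.
     */
    public List<LogicalPartition> compose(List<List<PhysicalPartition>> physicalTopics) {
        List<LogicalPartition> logical = new ArrayList<>();
        int next = 0;
        for (List<PhysicalPartition> topicPartitions : physicalTopics) {
            for (PhysicalPartition p : topicPartitions) {
                logical.add(new LogicalPartition(next++, p));
            }
        }
        return logical;
    }
}
```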
  10. Cluster decommission workflow:
  1. The controller marks the cluster to be decommissioned (cluster 1) as read-only in the config database.
  2. The controller picks up this state change when it fetches metadata from the database.
  3. The controller reconstructs the metadata for producers, removing cluster 1 from the physical clusters (sketched below).
  4. The controller pushes a new version of the producer metadata to the proxy. From this moment, no more data is written to cluster 1, while the consuming side is unaffected.
  5. Once all topic data in the cluster has expired, the controller reconstructs the consumer metadata and removes the physical topics that are no longer valid.
  6. The consumer proxy loads the new metadata, and newly arriving consumers no longer pick up cluster 1. This completes the decommission.
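Steps 3 and 5 amount to filtering the read-only cluster out of the producers' view while leaving the consumers' view intact until retention expires its data. A minimal sketch, with illustrative types:

```java
// Hypothetical sketch of steps 3-5: producers stop seeing topics on a
// cluster marked read-only; consumers keep the full view until expiry.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DecommissionFilter {

    record PhysicalTopic(String clusterId, String topic) {}

    /** Metadata rebuilt for producers: writable clusters only. */
    public List<PhysicalTopic> producerView(List<PhysicalTopic> all,
                                            Map<String, Boolean> readOnlyClusters) {
        return all.stream()
                  .filter(t -> !readOnlyClusters.getOrDefault(t.clusterId(), false))
                  .collect(Collectors.toList());
    }
}
```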
  11. Advantages and disadvantages of Cluster Federation.
  Advantages:
  - Topic partitions can be expanded and shrunk losslessly.
  - Disaster tolerance is achieved at the sub-cluster level.
  Disadvantages:
  - The proxy service has to implement all Kafka protocols, which makes it difficult to incorporate new features from the Apache Kafka community.
  - Deploying the proxy service adds roughly 30% to the cost.
  12. Now we have two new problems: 1. How do we reduce the additional 30% overhead caused by the proxy services? 2. How do we make it easy to incorporate new features from the Apache Kafka community?
  13. New design of federated Kafka (aims to remove the proxy service). The role of the proxy service is to route partition requests for a logical topic to the correct physical topic. If we want to remove the proxy service, the metadata obtained by the SDK must point directly at the real brokers. We can merge the metadata of the topics distributed across multiple sub-clusters so that the SDK sees one complete topic. Example: a logical topic with eight partitions (P0–P7) is distributed across two physical clusters, the first holding four partitions (P0–P3) and the other holding four partitions (P4–P7, i.e. partition ids starting from 4 rather than 0); the topic name must be identical in both clusters.
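Because the partition-id ranges are disjoint and the topic name is identical, the merge reduces to a union keyed by partition id. A minimal sketch of that idea, with illustrative types:

```java
// Hypothetical sketch: merging per-sub-cluster metadata for one topic
// (P0-P3 from cluster 1, P4-P7 from cluster 2) into one complete view.
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FederatedMetadataMerger {

    record PartitionMeta(int partitionId, String clusterId, String leaderHost) {}

    /** Union of per-cluster partition maps; ids must not overlap. */
    public Map<Integer, PartitionMeta> merge(List<Map<Integer, PartitionMeta>> perCluster) {
        Map<Integer, PartitionMeta> merged = new TreeMap<>();
        for (Map<Integer, PartitionMeta> cluster : perCluster) {
            for (Map.Entry<Integer, PartitionMeta> e : cluster.entrySet()) {
                if (merged.put(e.getKey(), e.getValue()) != null) {
                    throw new IllegalStateException("Partition id overlap: " + e.getKey());
                }
            }
        }
        return merged; // the SDK sees one contiguous topic, P0..P7
    }
}
```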
  14. The improved Cluster Federation (without the proxy layer). To let the SDK obtain merged metadata, we added a new module, the "Master Cluster": a complete Kafka cluster with ZooKeeper but without the broker role. Under this Master Cluster we mount multiple Kafka clusters that do have brokers, and we divide these sub-clusters into groups. Operationally, we ensure that a topic's partitions are allocated only to sub-clusters within a single group, never across groups (see the sketch below). When a topic runs short of partitions, the federated cluster can therefore expand them by expanding the sub-clusters in the topic's group; when the federated cluster runs short of capacity for topics, it can expand by adding groups. The Master Cluster obtains the metadata of all sub-clusters, sorts and merges the metadata of each topic into one complete view, and serves it to the SDK, at which point non-transactional message production already works. Since the sub-clusters know nothing of one another and a consumer group tends to subscribe to multiple topics, consumer-group and transaction management must be implemented on the Master Cluster.
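The one invariant to enforce is that a topic never spans groups. A minimal sketch of such a placement check, assuming the Master Cluster tracks a sub-cluster-to-group mapping (all names are hypothetical):

```java
// Hypothetical sketch: reject topic placements that would cross groups.
import java.util.Map;
import java.util.Set;

public class GroupPlacement {

    // subClusterId -> groupId, as tracked by the Master Cluster (illustrative)
    private final Map<String, String> clusterToGroup;

    public GroupPlacement(Map<String, String> clusterToGroup) {
        this.clusterToGroup = clusterToGroup;
    }

    /** A topic's partitions may spread within one group, never across groups. */
    public void validate(String topic, Set<String> targetClusters) {
        long groups = targetClusters.stream()
                                    .map(clusterToGroup::get)
                                    .distinct()
                                    .count();
        if (groups != 1) {
            throw new IllegalArgumentException(
                "Topic " + topic + " would span " + groups + " groups");
        }
    }
}
```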
  15. How does the new Cluster Federation work? After the Master Cluster starts, it aggregates and sorts the metadata of all sub-clusters by topic.
  1. The SDK obtains metadata from the Master Cluster.
  2. The Master Cluster returns the aggregated topic metadata.
  3. The SDK reads and writes messages normally based on that metadata, but metadata updates and FindCoordinator requests go only to the Master Cluster.
  For example, a Produce/Fetch request for partition 0 or partition 3 of U_TopicA goes to the sub-cluster brokers that host those partitions.
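From the client's perspective nothing changes except the bootstrap address. A minimal sketch with the standard Kafka Java producer; the host name is a placeholder, and we assume the Master Cluster answers metadata and FindCoordinator requests with the merged view:

```java
// Sketch: a stock Kafka producer bootstrapping against the Master Cluster.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FederatedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap against the Master Cluster, not the sub-clusters.
        props.put("bootstrap.servers", "master-cluster-host:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The metadata for U_TopicA is the merged P0-P7 view, so this
            // record is routed to whichever sub-cluster hosts its partition.
            producer.send(new ProducerRecord<>("U_TopicA", "key", "value"));
        }
    }
}
```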
  16. How do we realize expansion and disaster tolerance of the Master Cluster?
  1. We implemented the broker role from KIP-500, so the coordinator can easily be scaled out according to actual needs.
  2. When a new sub-cluster registers, or a sub-cluster unregisters, the change is written to Redis.
  3. The data carried by OffsetCommit requests is also written to Redis asynchronously (sketched below).
  4. When the Master Cluster fails, we immediately use the backup cluster and recover the data from Redis.
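Steps 3 and 4 describe a write-behind of coordinator state to Redis. A minimal sketch of the OffsetCommit mirroring, assuming a Jedis client; the key layout, class names, and Redis address are our illustration, not Tencent's implementation:

```java
// Hypothetical sketch: mirror committed offsets to Redis asynchronously
// so a backup Master Cluster can restore them on failover.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import redis.clients.jedis.JedisPooled;

public class OffsetBackup {

    private final JedisPooled redis = new JedisPooled("redis-host", 6379);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Called after the coordinator has applied an OffsetCommit locally. */
    public void backupAsync(String group, String topic, int partition, long offset) {
        executor.submit(() ->
            redis.hset("offsets:" + group, topic + "-" + partition, Long.toString(offset)));
    }

    /** On failover, the backup cluster replays the saved offsets. */
    public long restore(String group, String topic, int partition) {
        String value = redis.hget("offsets:" + group, topic + "-" + partition);
        return value == null ? -1L : Long.parseLong(value);
    }
}
```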
  17. How we achieve high performance and high availability:
  - Limit the number of partitions according to the cluster specification, to prevent performance problems caused by an out-of-control file count.
  - Mute the topics and partitions of a failed cluster; we submitted KIP-694 about this.
  - Mute reads and writes to allow lossless cluster upgrades.
  - Give the SDK the ability to mute abnormal partitions; we submitted KIP-693 about this (see the sketch after this list).
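As a hedged illustration of the muting idea only (not KIP-693's actual mechanism), here is a custom producer Partitioner that avoids partitions currently marked abnormal; it ignores record keys for simplicity:

```java
// Illustrative sketch: a producer Partitioner that skips "muted" partitions.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class MutingPartitioner implements Partitioner {

    // Partitions a circuit breaker currently considers unhealthy.
    private final Set<Integer> muted = ConcurrentHashMap.newKeySet();

    public void mute(int partition)   { muted.add(partition); }
    public void unmute(int partition) { muted.remove(partition); }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int total = cluster.partitionCountForTopic(topic);
        // A few random draws that avoid muted partitions; fall back if all are muted.
        for (int i = 0; i < total; i++) {
            int candidate = ThreadLocalRandom.current().nextInt(total);
            if (!muted.contains(candidate)) {
                return candidate;
            }
        }
        return ThreadLocalRandom.current().nextInt(total);
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}
```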
  18. Operational data at Tencent:

  | | Apache Kafka | Tencent's customized Kafka |
  |---|---|---|
  | Read/write recovery time after a cluster failure | Restarting a broker with 20 TB of data usually takes 30-60 minutes | The client stops writing to the abnormal sub-cluster; recovery time is under 1 minute |
  | Number of brokers in a single cluster | Before version 2.8.0, about 200 brokers per cluster | Up to 1,000 brokers per cluster |
  | Machine reclamation and partition shrinking after traffic peaks | Machines can be reclaimed through rebalancing, but partition scale-down is not supported | Business-insensitive partition scale-down and machine reclamation, without rebalancing |
  19. Incorporating new features of the Apache Kafka community. When we designed the federation mode, Kafka 2.8.0 had not yet been released, so our overall design aims to stay compatible with upcoming KIP-500 features, such as role splitting and unified metadata management by the controller. In the future we will incorporate these features from the community.
  20. Thanks!
