Apache Kafka® in the Enterprise: 10 Lessons (Dos and don'ts with Apache Kafka)

In the digital age, events are everywhere. Companies have realigned their business models and become more dynamic and more technology- and service-oriented, which results in complex, event-driven business processes in real time. 60% of Fortune 100 companies rely on event streaming platforms as a foundational technology, and Apache Kafka has established itself as the de facto standard here. In a concise webinar, Confluent partner SVA summarizes the pitfalls and challenges to watch out for when running Apache Kafka in production. Security, retention time and exactly-once semantics are just three of the topics we will cover. Speakers: Charalampos Papadopoulos, System Engineer Big Data / Analytics, SVA; Marie Fraune, System Engineer Big Data / Analytics, SVA

Online Talk: “Do's & Don'ts with Apache Kafka”

APACHE KAFKA
26.04.2019
“One of the first signs of the beginning of understanding
is the wish to die.”
Franz Kafka
Apache Kafka is a high-performance, highly scalable
streaming platform with a wide range of use cases.

10 “MISHAPS”
NO DISTINCTION BETWEEN INTERNAL AND
EXTERNAL DATA
• Without security, anyone can read all data from Kafka
=> add “security from the beginning”
• This also makes multi-tenancy possible
• Confluent offers additional security features for the
Confluent Platform components
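A minimal sketch of what “security from the beginning” can look like on the client side, plus a toy model of how prefixed ACLs keep tenants apart. It assumes the configuration keys of the librdkafka-based Python client (`confluent-kafka`); the host name, principals and topic prefixes are hypothetical, and the ACL dictionary only imitates what the `kafka-acls` tool would configure on the brokers.

```python
# Client side: encrypted (TLS) and authenticated (SASL/SCRAM) from day one.
# Keys follow confluent-kafka / librdkafka naming; values are placeholders.
client_conf = {
    "bootstrap.servers": "broker1.example.com:9093",  # hypothetical TLS listener
    "security.protocol": "SASL_SSL",                  # TLS encryption + SASL authentication
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.username": "orders-service",                # hypothetical principal
    "sasl.password": "change-me",                     # load from a secret store in practice
}

# Protocols that send data over the wire unencrypted.
INSECURE_PROTOCOLS = {"PLAINTEXT", "SASL_PLAINTEXT"}


def uses_insecure_transport(conf: dict) -> bool:
    """True if the client would talk to the brokers without TLS (PLAINTEXT is the default)."""
    return conf.get("security.protocol", "PLAINTEXT").upper() in INSECURE_PROTOCOLS


# Broker side, simplified: prefixed ACLs give each tenant its own topic namespace.
# Hypothetical mapping of principal -> topic prefixes it may read.
ACL_PREFIXES = {
    "User:orders-service": ("orders.",),
    "User:billing-service": ("billing.", "orders.public."),
}


def may_read(principal: str, topic: str) -> bool:
    """Deny by default; allow only topics under one of the principal's prefixes."""
    return any(topic.startswith(prefix) for prefix in ACL_PREFIXES.get(principal, ()))
```

Deny-by-default is the design choice that makes multi-tenancy work: a new service sees nothing until someone grants it a prefix.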

TOPICS AND CONFIGURATION
• Pay attention to the replication factor, including for internal topics
• Disable topic auto-creation (auto.create.topics.enable) in production
• Scale partitions only after consulting the teams involved
• Don't rely on the Kafka cluster's default settings alone;
configure topics according to the use case
=> Confluent provides a good overview on its website
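The checklist above can be turned into a small lint step for broker configurations. This is a sketch, assuming the built-in defaults of Apache Kafka 2.x for the four keys it checks; the minimum replication factor of 3 and the topic names in `TOPIC_OVERRIDES` are illustrative choices, not recommendations from the talk.

```python
# Built-in broker defaults (Apache Kafka 2.x) for the keys checked below.
BUILT_IN_DEFAULTS = {
    "auto.create.topics.enable": "true",
    "default.replication.factor": "1",
    "offsets.topic.replication.factor": "3",
    "transaction.state.log.replication.factor": "3",
}

REPLICATION_KEYS = (
    "default.replication.factor",                # topics created without an explicit factor
    "offsets.topic.replication.factor",          # internal __consumer_offsets topic
    "transaction.state.log.replication.factor",  # internal __transaction_state topic
)

# Per-topic overrides chosen per use case (hypothetical topics).
TOPIC_OVERRIDES = {
    "orders.events": {"retention.ms": str(7 * 24 * 60 * 60 * 1000)},  # keep 7 days
    "customers.latest": {"cleanup.policy": "compact"},                 # latest value per key
}


def lint_broker_conf(conf: dict, min_rf: int = 3) -> list:
    """Return warnings for settings that are risky in production."""
    effective = {**BUILT_IN_DEFAULTS, **conf}  # explicit settings override defaults
    warnings = []
    if effective["auto.create.topics.enable"].lower() == "true":
        warnings.append("auto.create.topics.enable is true")
    for key in REPLICATION_KEYS:
        if int(effective[key]) < min_rf:
            warnings.append(f"{key}={effective[key]} is below {min_rf}")
    return warnings


production_conf = {
    "auto.create.topics.enable": "false",
    "default.replication.factor": "3",
    "offsets.topic.replication.factor": "3",
    "transaction.state.log.replication.factor": "3",
}
```

A cluster set up from the sample `server.properties` that ships with Kafka, which sets both internal-topic replication factors to 1 for single-broker development, would produce four warnings here.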

DATA LOSS
• Pay attention to the retention time
• The default topic replication factor is 1 => change it
• Always keep an eye on the internal topics as well
Depending on the use case:
• Unclean leader election: true/false?
• Acknowledgements: acks=all?
• Configure the minimum in-sync replicas (min.insync.replicas) appropriately
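How replication factor, `min.insync.replicas` and `acks` interact is easiest to see as arithmetic. A simplified sketch, assuming `unclean.leader.election.enable=false` (only in-sync replicas may become leader) and ignoring the timing of follower catch-up:

```python
def failure_budget(replication_factor: int, min_insync: int, acks: str = "all") -> tuple:
    """Return (broker failures tolerated while writes are still accepted,
               broker failures an acknowledged write is guaranteed to survive).

    Simplified model: assumes unclean leader election is disabled.
    """
    if not 1 <= min_insync <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    if acks in ("all", "-1"):
        # Each write must reach at least min_insync in-sync replicas,
        # so it is stored on min_insync brokers before it is acknowledged.
        return replication_factor - min_insync, min_insync - 1
    if acks in ("0", "1"):
        # min.insync.replicas is not enforced for these producers; an ack only
        # means the leader (acks=1) or nobody (acks=0) has the record.
        return replication_factor - 1, 0
    raise ValueError(f"unsupported acks value: {acks!r}")
```

For the common combination of replication factor 3, `min.insync.replicas=2` and `acks=all`, `failure_budget(3, 2)` gives one broker failure without blocking producers and no loss of acknowledged writes from a single failure; with `acks=1` the second number drops to 0.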

EXACTLY-ONCE SEMANTICS (EOS)
• Hard to set up in distributed systems
• Important for transactions
• Extremely hard to implement in application logic
outside of Kafka
• Since Kafka 0.11, EOS is built in and therefore easy
to use with Kafka Streams
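Kafka's exactly-once support builds on the idempotent producer: the broker remembers which sequence numbers it has already written for each producer and acknowledges a retry without writing it a second time. The toy model below illustrates only that de-duplication step. It is not the broker's implementation and leaves out producer epochs and transactions. In Kafka Streams, the feature is switched on with `processing.guarantee=exactly_once` (Kafka 3.0 added `exactly_once_v2` and deprecated the older value).

```python
class PartitionLog:
    """Toy model of one partition with idempotent-producer de-duplication.

    Simplified: the real broker tracks sequence numbers per producer id and
    epoch over a small window of recent batches; here only the last
    sequence number per producer id is kept.
    """

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer id -> last appended sequence number

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        """Append a record; return False if it is a retry of one already written."""
        expected = self.last_seq.get(producer_id, -1) + 1
        if seq < expected:
            return False  # duplicate: already written, only the ack was lost
        if seq > expected:
            raise ValueError(f"out-of-order sequence {seq}, expected {expected}")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True
```

If producer 7 sends sequence numbers 0 and 1, loses the acknowledgement for 1 and retries it, the second `append(7, 1, ...)` returns `False` and the log still holds each record once.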

DUPLICATE DATA?
• Depends on the use case; duplicates may even be desired
• EOS or at-most-once semantics guarantee that data is not
duplicated (at-most-once at the price of possible data loss)
• Configure the consumer “correctly” => idempotent
application
• Topic configuration (compacted etc.) => unique key
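Two ways of living with at-least-once delivery, sketched under simplifying assumptions: an idempotent consumer that skips events it has already applied (keyed by a hypothetical unique `event_id`), and a model of which keys a compacted topic keeps. Both are illustrations, not Kafka APIs.

```python
class IdempotentHandler:
    """Consumer-side de-duplication: apply each event at most once, keyed by a unique id.

    Simplification: seen ids live in memory. In production they would be stored
    durably and updated atomically together with the side effect.
    """

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0  # hypothetical side effect: a running account balance

    def handle(self, event_id: str, amount: int) -> bool:
        """Apply the event unless it was already processed; return True if applied."""
        if event_id in self.seen_ids:
            return False  # redelivered, e.g. after a crash or rebalance: skip
        self.balance += amount
        self.seen_ids.add(event_id)
        return True


def compact(records):
    """Keys that survive log compaction, with their latest value (ordering ignored).

    A value of None is a tombstone: the key is eventually removed entirely.
    """
    latest = {}
    for key, value in records:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest
```

With a unique key per entity, a compacted topic tolerates duplicates naturally: writing the same key twice leaves only the latest value.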

NUMBER OF PARTITIONS
• Don't use too many partitions per topic as the default
• Otherwise ZooKeeper incurs too much overhead
• Better to start with few and raise the count for
specific topics
=> Confluent provides recommendations on its website
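One rule of thumb Confluent has published: if one partition sustains p MB/s on the producer side and c MB/s on the consumer side, a target throughput of t MB/s needs at least max(t/p, t/c) partitions. A minimal sketch; the per-partition rates have to be measured on your own setup, and the numbers in the example are made up.

```python
import math


def min_partitions(target_mb_s: float, producer_mb_s_per_partition: float,
                   consumer_mb_s_per_partition: float) -> int:
    """Lower bound on the partition count: max(t/p, t/c), rounded up."""
    for rate in (target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition):
        if rate <= 0:
            raise ValueError("throughput values must be positive")
    return math.ceil(max(target_mb_s / producer_mb_s_per_partition,
                         target_mb_s / consumer_mb_s_per_partition))
```

For a target of 100 MB/s with 20 MB/s per partition when producing and 25 MB/s when consuming, `min_partitions(100, 20, 25)` gives 5. Starting near this bound and raising it for specific topics matches the advice on the slide.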

A CENTRAL PLACE FOR SCHEMA DEFINITIONS
• Without a schema, it is hard for new applications
to work with the data
• Schemas should be stored centrally, “outside of the
applications”
=> Use Confluent Schema Registry
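With Schema Registry, the schema itself lives in the registry and each message only carries a reference to it. To our understanding, Confluent's serializers prefix the serialized payload (e.g. Avro binary) with a magic byte `0` and the schema ID as a 4-byte big-endian integer. The sketch below only adds and strips that prefix; looking up the schema (e.g. via the registry's `GET /schemas/ids/{id}` endpoint) and decoding the payload are left out.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # 1 signed byte + 4-byte unsigned big-endian int = 5 bytes


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prepend the magic byte and schema ID to an already serialized payload."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message too short for the schema-ID prefix")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, message[HEADER.size:]
```

A new application that receives such a message learns the schema ID from the first five bytes and can fetch the schema centrally instead of having it compiled in.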

ZOOKEEPER
• Critical component
• Don't put additional load on the same host
• Quorum of 3 or 5 nodes; don't spread it across data centers
if the network between them is too slow
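Why 3 or 5: a ZooKeeper ensemble keeps working only while a strict majority of its servers is reachable. A small sketch of that arithmetic, plus a hypothetical helper that checks which data-center outage an ensemble would survive:

```python
def quorum(ensemble_size: int) -> tuple:
    """Return (votes needed for a majority, servers that may fail)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    majority = ensemble_size // 2 + 1
    return majority, ensemble_size - majority


def survives_dc_loss(servers_per_dc: dict) -> dict:
    """For each data center: does the rest of the ensemble keep a majority if it fails?"""
    total = sum(servers_per_dc.values())
    majority, _ = quorum(total)
    return {dc: total - count >= majority for dc, count in servers_per_dc.items()}
```

`quorum(3)` is `(2, 1)`, `quorum(4)` is `(3, 1)` and `quorum(5)` is `(3, 2)`, so an even-sized ensemble adds load without adding fault tolerance. Split 2+1 over two data centers, losing the larger site stops the whole ensemble.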

PRODUCER AND CONSUMER: CONFIGURATION AND
DEPLOYMENT
• Configuring producers and consumers is just as
important as configuring the Kafka cluster
• Deployment and scaling of consumers should be
coordinated
=> This only works with good coordination between Dev and
Ops -> DevOps? Yes
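One reason consumer scaling needs coordination: within a consumer group, each partition goes to exactly one member, so consumers beyond the partition count sit idle. The sketch below follows the scheme of Kafka's range assignor for a single topic (members sorted, each gets a contiguous block, the first ones get one extra); it is a model for reasoning, not the client's code.

```python
def range_assign(num_partitions: int, consumers: list) -> dict:
    """Single-topic range assignment: consumer id -> list of partition numbers."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    per_member, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment
```

With 4 partitions and consumers `c1`, `c2`, `c3`, `c1` gets partitions 0 and 1 while the others get one each; with only 2 partitions, `c3` receives nothing, so deploying a third instance buys no throughput.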

MONITORING AND LOGGING
• Set up monitoring and logging for all Kafka
components (e.g. Confluent Control Center)
• Be sure to tap into the JMX metrics
• Alerts (thresholds etc.) can point to problems
early on
=> Confluent provides a good overview of this
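A sketch of threshold alerts on three well-known broker MBeans. It assumes the JMX values have already been scraped into a dictionary per broker (by whatever collector is in use); the alert texts and the data layout are hypothetical.

```python
UNDER_REPLICATED = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
OFFLINE = "kafka.controller:type=KafkaController,name=OfflinePartitionsCount"
ACTIVE_CONTROLLER = "kafka.controller:type=KafkaController,name=ActiveControllerCount"


def evaluate(readings: dict) -> list:
    """readings: broker id -> {MBean name: current value}. Returns alert strings."""
    alerts = []
    for broker, metrics in sorted(readings.items()):
        for name in (UNDER_REPLICATED, OFFLINE):
            value = metrics.get(name)
            if value is None:
                alerts.append(f"broker {broker}: no value for {name}")  # a silent metric is a problem too
            elif value > 0:
                alerts.append(f"broker {broker}: {name} = {value}")
    # Exactly one broker in the cluster should report itself as active controller.
    controllers = sum(m.get(ACTIVE_CONTROLLER, 0) for m in readings.values())
    if controllers != 1:
        alerts.append(f"cluster: {controllers} active controllers (expected 1)")
    return alerts
```

Under-replicated partitions greater than zero are often the first visible sign of a slow or failing broker, well before producers see errors.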

FINALE
HOW TO MAKE KAFKA “EXPLODE”:
• ZooKeeper becomes unavailable
• Broker updates not performed one at a time
• Increasing partitions without producers and consumers
“knowing” about it, even though they depend on it
• Too many consumer rebalances
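Why adding partitions hurts applications that depend on them: keyed records are placed with `hash(key) mod number_of_partitions`, so changing the count moves most keys to a different partition and breaks per-key ordering across the change. The sketch uses CRC32 as a stand-in hash; Kafka's Java default partitioner uses murmur2, so the concrete partition numbers differ, but the modulo effect is the same.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Stand-in for the default partitioner (CRC32 here, murmur2 in Kafka's Java client)."""
    return zlib.crc32(key) % num_partitions


def moved_keys(keys, old_count: int, new_count: int) -> list:
    """Keys whose partition changes when the partition count changes."""
    return [k for k in keys if partition_for(k, old_count) != partition_for(k, new_count)]


keys = [f"customer-{i}".encode() for i in range(1000)]  # hypothetical keys
moved = moved_keys(keys, 6, 8)
```

Going from 6 to 8 partitions, the bulk of the keys land elsewhere; any consumer that keeps per-key state local to a partition silently sees its keys split across two partitions.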

Editor's Notes

https://www.confluent.io/blog/okay-store-data-apache-kafka/
processing.guarantee=exactly_once
Other than using the subscribe() method, there is another way for a consumer to read from topic partitions: the assign() method. In this case, the consumer specifies the topic partitions it wants to read from. This approach can be useful when you know exactly which partition certain messages will be written to and you want to read directly from there. Of course, you lose the rebalancing feature in this case, which is the first big difference from the subscribe method. Another difference is that with “manual” assignment, you can omit the consumer group (i.e. the group.id property); it will simply be empty. In any case, it is better to specify it. Most people use the subscribe method, leveraging “automatic” assignment and rebalancing. Using both methods together can break things, as we're about to see.

Imagine a single “test” topic with only two partitions (P0 and P1) and a consumer C1 that subscribes to the topic as part of the consumer group G1. This consumer is assigned both partitions and receives messages from them. Now let's start a new consumer C2 that is configured to be part of the same consumer group G1 but uses the assign method to ask for partitions P0 and P1 explicitly. Now we have broken something! ...but what is it? Both C1 and C2 will receive messages from both partitions P0 and P1, even though they are part of the same consumer group G1! This breaks the “competing consumers” behavior expected within one consumer group: you experience a “publish/subscribe” pattern, but with consumers within the same consumer group. https://dzone.com/articles/dont-use-apache-kafka-consumer-groups-the-wrong-wa
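The C1/C2 scenario can be replayed as a tiny simulation. It assumes a simplified coordinator that spreads partitions round-robin over the members that called subscribe(); consumers using assign() are invisible to it, which is exactly why their reads overlap. Function and consumer names are hypothetical.

```python
def deliveries(records, subscribed, assigned, num_partitions):
    """Who receives which records when subscribe() and assign() are mixed in one group.

    records:    list of (partition, value)
    subscribed: consumers that called subscribe(); only they take part in balancing
    assigned:   consumer -> partitions it picked itself with assign()
    Simplification: partitions are spread round-robin over the subscribed members.
    """
    members = sorted(subscribed)
    owner = {p: members[p % len(members)] for p in range(num_partitions)} if members else {}
    received = {name: [] for name in members + sorted(assigned)}
    for partition, value in records:
        if partition in owner:
            received[owner[partition]].append(value)  # normal group delivery
        for name, parts in assigned.items():
            if partition in parts:
                received[name].append(value)  # assign() reads regardless of the group
    return received
```

Calling `deliveries([(0, "m1"), (1, "m2"), (0, "m3")], ["C1"], {"C2": [0, 1]}, 2)` hands every message to both C1 and C2, the publish/subscribe behavior described above, while two subscribed consumers split the same records between them.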