
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernetes | Murali Kaundinya, Wells Fargo


At Wells Fargo, we move 150 TB of log data from our syslogs to Splunk forwarders, where it is indexed and organized for analytic queries. As we modernize and migrate our applications to our hybrid cloud, the performance expectations for this infrastructure will increase proportionately, and they include the resilience of the end-to-end infrastructure. First, we decoupled the applications from their logging interface through a log library that splits the streams of logs from their sources and sends them to Kafka, which routes them to two separate destinations, Splunk and ELK. We used Prometheus and Grafana to monitor the metrics, and we deployed Kafka, Splunk, ELK, Prometheus and Grafana on Kubernetes clusters. Confluent had released a version of Kafka that runs without ZooKeeper, replacing its functionality with the Quorum Controller. The Quorum Controller version exhibited better disposability, one of the twelve factors and an important one for cloud-native applications. We packaged this version with a Kubernetes operator called KEDA and deployed it for auto-scaling. We tested it by simulating the volume of log data that we typically generate in production. Building on this, we also implemented distributed tracing and helped make it just as resilient. We will share our lessons learnt, along with the patterns and practices for modernizing both our underlying runtime platforms and our applications with high-performing, resilient event-driven architectures.
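
One common way to deliver the same log stream to two destinations, as the abstract describes for Splunk and ELK, is to give each destination its own consumer group, so that each group reads the full topic and tracks its own offsets. The sketch below is an in-memory toy that only illustrates this idea. It is not the Kafka client API and not the authors' log library, and every name in it (`InMemoryTopic`, `drain`, the group names) is hypothetical.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy stand-in for a single-partition topic (illustration only)."""

    def __init__(self):
        self._log = []                     # append-only record log
        self._offsets = defaultdict(int)   # next offset to read, per consumer group

    def produce(self, record):
        self._log.append(record)

    def poll(self, group, max_records=100):
        """Return unread records for `group` and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


def drain(topic, group, sink):
    """Copy everything the group has not yet seen into its sink."""
    while True:
        batch = topic.poll(group)
        if not batch:
            break
        sink.extend(batch)


topic = InMemoryTopic()
for i in range(5):
    topic.produce({"host": f"app-{i % 2}", "msg": f"event {i}"})

# Hypothetical group names; each group sees every record independently.
splunk_sink, elk_sink = [], []
drain(topic, "splunk-forwarder", splunk_sink)
drain(topic, "elk-forwarder", elk_sink)
```

Because offsets are kept per group, a slow or restarted forwarder for one destination does not affect what the other destination receives.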


  1. Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernetes
     Murali Kaundinya, Troy Stow
  2. Outline
     • Our current installed base of Kafka...
     • Operational metrics and growth trajectory…
     • Kafka in a Hybrid/Multi Cloud environment
     • Replicator vs. Cluster Link
     • ZooKeeper vs. Quorum Controller
     • Operators: Confluent for Kubernetes, KEDA and Strimzi
     • Summary
  3. Jithendra Manne, Keith Kroculick, Suresh Veda, Patrick White, Troy Stow, Shyam Gadde, Venky Ramakrishnan, Murali Kaundinya, Prasanth Kuppa, Girish Tavag
  4. KEES Platform Overview
     Vision Statement: Kafka Enterprise Event Streaming (KEES) will provide our Line of Business partners with an enterprise event streaming capability that allows them to implement event-driven architecture and integration patterns for business and operational transactions, to provision dynamically with minimal hand-holding, and to scale with minimal friction.
     Goals
     • Enablement of Confluent Platform to the Core Data Centers
     • Design and implementation of functional clusters based upon business requirements
     • Provide engineering guidance and support for the infrastructure being deployed in Wells Fargo
     • Document and operationalize management and support of the ecosystem
     • Develop self-service capabilities, in partnership with Architecture
  5. KEES Platform Ecosystem
  6. Confluent Cluster Detail
  7. Confluent features offered to LOB partners
  8. Event Streaming Requirements/Hybrid Cloud
     • Consistency across on-prem and one or more Cloud Service Providers
     • Should Event Streaming be centralized, federated or localized?
     • Ease of use, development and operations.
     • Run Event Streaming on Kubernetes as the core substrate.
     • Secure, Auto-Scaling, Portable, Managed Self-Service.
     • Abstraction for several Functional & Non-Functional Requirements.
     • Manage down middleware sprawl and be interoperable.
     • Backing Services, Config., Processes, Disposability, Concurrency
  9. Kafka in a Hybrid/Multi Cloud environment
  10. ZooKeeper vs. Quorum Controller
     • Metadata Management
     • Duplication Functionality
     • Additional Resources
  11. Confluent for Kubernetes Quorum Controller
  12. Confluent for Kubernetes
     • Cloud-Native Declarative API:
        • Declarative Kubernetes-native API
        • Configure, deploy, and manage through IaC: GitOps
     • Cloud-Native Security Best Practices
        • RBAC, AuthN, TLS, Certs/Secrets
     • Upgrades
        • Automated rolling updates to configuration.
        • Automated rolling upgrades without downtime.
     • Auto-Scaling
     • API
     • Reliability Checks.
  13. Confluent for Kubernetes
     • Resiliency
        • Restores Pod on failures.
        • Automated rack awareness to spread replicas of a partition; improves availability.
     • Scheduling
        • Supports Kubernetes labels & annotations
        • Supports Kubernetes tolerations and pod/node affinity for pod placement.
     • Monitoring
        • Supports aggregated metrics export to Prometheus
  14. Summary
     Features
     • Cluster Link
        • Support “Fail Forward”
     • Confluent for Kubernetes
        • Auto Deployment & Auto Scaling
        • Configure once and run everywhere.
     • Kafka Streams
        • Custom Utility for masking and filtering
     • ksqlDB
     Lessons Learnt
     • Quorum Controller has potential.
     • Good for pilots.
     • Test it extensively for enterprise use.
     • Needs better documentation.
     • Needs community engagement.
     • Recommend Kafka 3.0 or 6.1.2
     • Can use better developer guides on new features.
  15. Thank you!
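
Slide 13 mentions automated rack awareness that spreads the replicas of a partition to improve availability. As a rough illustration of that idea, the sketch below orders brokers so that racks alternate and then assigns each partition's replicas to consecutive brokers in that order. It is a simplified assumption for illustration, not the placement logic of Kafka or Confluent for Kubernetes, and the function names and broker-to-rack mapping are hypothetical. It only guarantees distinct racks per partition when racks hold equal numbers of brokers and the replication factor does not exceed the number of racks.

```python
from itertools import zip_longest


def interleave_by_rack(broker_racks):
    """Order brokers so that neighbours in the list sit on different racks where possible."""
    by_rack = {}
    for broker, rack in sorted(broker_racks.items()):
        by_rack.setdefault(rack, []).append(broker)
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    ordered = []
    # Take one broker from each rack in turn until every rack is exhausted.
    for row in zip_longest(*columns):
        ordered.extend(b for b in row if b is not None)
    return ordered


def assign_replicas(broker_racks, num_partitions, replication_factor):
    """Assign each partition to `replication_factor` consecutive brokers in rack-alternated order."""
    ordered = interleave_by_rack(broker_racks)
    if replication_factor > len(ordered):
        raise ValueError("replication factor exceeds broker count")
    return {
        p: [ordered[(p + r) % len(ordered)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


# Hypothetical cluster: six brokers spread evenly over three racks.
brokers = {0: "rack-a", 1: "rack-a", 2: "rack-b", 3: "rack-b", 4: "rack-c", 5: "rack-c"}
assignment = assign_replicas(brokers, num_partitions=6, replication_factor=3)
```

With this layout, losing any single rack removes at most one replica of each partition, which is the availability benefit the slide points to.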
