
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google


The data that organizations must analyze to make informed decisions is growing at an unprecedented rate. Companies have to capture the window of opportunity and become not just data driven, but event driven. This talk discusses how to address these challenges and looks at ways to bridge on-premises Kafka deployments with the GCP stack for different use cases and personas. Architecture examples follow, showing how to deploy Kafka and integrate it with the rest of the GCP stack.


  1. Hybrid Streaming Analytics for Apache Kafka Users. Firat Tekiner (ftekiner@google.com), EMEA Data Analytics Practice Lead. © 2021 Google LLC. All rights reserved.
  2. Hybrid Kafka Reference Architecture [diagram]. Labels as extracted: On-premises or Other Cloud; Dataflow; BigQuery; Cloud Storage; Data Studio; Cloud Functions; AI Platform; Bigtable; Confluent Replicator; KSQL; App; App; DataStore; MySQL; HDFS; Teradata, Netezza; Mainframe; App; App
  3. Business is transforming. Businesses have to anticipate and act on risks and opportunities faster than ever before. The data and events needed for analysis are increasing in velocity, volume, and type. Companies that are able to quickly identify and capitalize on insights within this changing landscape have a strategic advantage.
  4. Why Enterprises choose Google Cloud for Streaming Analytics: Serverless architecture; Robust ingestion services; Unified batch and stream processing; Comprehensive set of analysis tools; Flexibility for users
  5. Serverless data analytics: From infrastructure to platform for insights [diagram]. Labels as extracted: Performance tuning; Monitoring; Reliability; Deployment & configuration; Utilization improvements; The traditional data analytics platform; Analysis and insights; Resource provisioning; Handling growing scale; Analysis and insights; The serverless data analytics model
  6. Right-time Action (Looker Blocks)
      1. Dashboard: Visualize and share anomalous events in your data.
      2. Alerts: Manage by exception through condition-based notifications.
      3. Actions: Automatically trigger workflows in other systems using conditions.
  7. Comprehensive set of analysis tools
      BigQuery (Cloud Data Warehouse)
      ● Easy setup: Directly integrated with streaming Dataflow and Confluent Cloud
      ● Real time: Fast insights and action powered by BigQuery’s Streaming API
      ● Intelligent: Built-in ML for out-of-the-box predictive insights
      Cloud AI Platform (AI & ML Tools)
      ● Plug-and-play: Easily experiment and collaborate with Google’s AI Hub
      ● Building blocks: Tools for sight, language, conversation, and structured data
      ● Fast deployment: Code-based AI platform quickly moves ML ideas to deployment
      TensorFlow Extended (TFX)
  8. Improve the customer experience with Real-time AI. TFX uses Dataflow and Apache Beam as the distributed data processing engine to enable several aspects of the ML life cycle, all supported with CI/CD for ML through Kubeflow Pipelines. Use cases: Predictive Analytics, Fraud Detection, Real-time Personalization, and more.
  9. Google Cloud Smart Analytics & AI [product map diagram]. Labels as extracted: Data Analytics & Management Prebuilt ML APIs Foundation AI Platform AutoML AI Solutions Language Conversation Horizontal solutions Structured Data Language Frameworks Compute Contact Center AI Ingestion and Processing Storage and Analytics Orchestration Notebooks Industry solutions Data Labeling Training Prediction Continuous evaluation Explainability Pipelines Compute Engine Cloud TPU Cloud GPU Cloud Scheduler Cloud Composer Instrumentation Cloud Build Container Registry Cloud Pub/Sub Cloud Dataflow Cloud Dataproc Data Fusion Cloud Storage BigQuery Cloud Bigtable Cloud SQL Data Catalog Data Studio Data Science and Machine Learning Sight Sight Vision Video Translate Natural Language Tables Video Intelligence Vision Natural Language Translate Speech-to-Text Text-to-Speech Document AI Dialogflow Talent Solution Recommendation AI
  10. Flexibility for users: Unified stream and batch processing
      Apache Beam: Open-source, unified model and set of SDKs for defining and executing data processing
      ● Open source programming model: Serves as the SDK for creating Cloud Dataflow jobs; community development increases flexibility
      ● Choose your language: Java, Python, Scala, and Go are available; join DA Spotlight for news on languages
      ● Portability: Program in Beam, and gain the ability to move between Spark, Flink, Dataflow, and more
      Dataflow: Simplified stream and batch data processing
      ● Batch and Stream: Reduce complexity and reuse code by driving batch and stream workloads from the same tool
      ● Reliable and consistent processing: Exactly-once processing with built-in support for fault-tolerant execution
      ● Simplified operations & management: Performance, scaling, availability, security, and compliance handled automatically
      ● Integrated: Integration with Kafka/Confluent Cloud, the Google Data Analytics suite, and GCP broadly
  11. Flexible stream analytics with OSS [diagram]
      ● Ingest: Ingest and distribute data reliably
      ● Transform: Fast, correct computations quickly and simply
      ● Analyze: Machine learning & data warehouse
      Products shown: Cloud Dataflow, Cloud ML, Pub/Sub, BigQuery, Dataflow, KSQL
  12. Google Cloud has an end-to-end, fully-managed Stream Analytics offering
      ● Collect: Pub/Sub (Messaging), Confluent Kafka (Messaging)*, BigQuery Streaming API, IoT Core
      ● Process: Dataflow (Beam Streaming), Dataproc (Spark Streaming and Flink), Dataform, Kubernetes
      ● Store and Analyze: BigQuery, Bigtable, AI Platform + TFX Integration, Databases (e.g. Cloud SQL, Spanner)
      ● Activate: Looker, Apigee, Firebase, Cloud Functions
      Across stages: Data Catalog (Metadata Management) & Composer (Workflow Orchestration)
      * Partner Solution
  13. A platform for all users and intents throughout the data lifecycle [diagram]. Labels as extracted: Fine-grained access control Cloud IAM Metadata management Data Catalog Always encrypted Data at rest and in transit Redact sensitive data Cloud DLP Security Admin Protecting data Messaging PubSub Data Processing Dataflow Data Apps Looker (LookML) OSS Engines Dataproc (Spark, Flink) Developer Intelligent apps DW & DB BigQuery, Bigtable Data processing (OSS) pipelines Dataproc (Spark, Presto, Flink) Data Processing (Native) pipelines Dataflow Orchestration Composer Data engineer Get clean, useful data Messaging PubSub or Confluent Kafka CDW BigQuery CDW & Orchestration BigQuery Visual data Integration Data Fusion ML in SQL BigQuery ML Data models, catalog Looker, Data Catalog Data analyst Query and analyze Ingestion BigQuery Streaming & DTS Governed BI Looker CDW in a Spreadsheet Connected Sheets Natural Language Query Data QnA Business User Insights Everywhere Data models, catalog Looker, Data Catalog CDW BigQuery Portable notebooks AI Platform Notebooks Simplified ML BigQuery ML & Auto ML Collaboration Feature Store, AI Platform Pipelines Spark Dataproc Data scientist Models that work CDW BigQuery Secure data sharing BigQuery
  14. Real-time Analytics: GCP Approach [diagram]. Stages: Event, Collect, Process, Store and Analyze, Activate. Other labels as extracted: BigQuery Looker Event stream / Integration Pub/Sub Dataflow IoT Core Analytics Low Latency, Time Series Bigtable Apigee Firebase Apigee Firebase Monetization Cloud Logging ... Templates AI Platform Continuous Intelligence Edge Manager for ML ML at the Edge App Activation
  15. Real-time Analytics: GCP Simplified Approach [diagram]. Stages: Event, Collect, Process, Store and Analyze, Activate. Other labels as extracted: BigQuery Looker Streaming API ELT (Dataform) Materialized Views BQML BI Engine Data Studio Apigee Connected Sheets Event stream / Integration
  16. Real-time Analytics: Open and Partner Approach [diagram]. Stages: Event, Collect, Process, Store and Analyze, Activate. Other labels as extracted: Dataproc Streaming BigQuery 3rd Party BI and activation tools ... ...
  17. Options: How do we deploy Kafka or integrate it with the rest of the GCP stack?
      Hybrid
      ● Accessing Kafka on-prem directly from GCP
      ● Kafka replication (on-prem to GCE or Confluent Cloud’s GCP marketplace offering)
      Lift and Shift
      ● Confluent Cloud’s fully managed Kafka (Marketplace offering)
        – Connectors available to BigQuery, Cloud Storage, Pub/Sub, MongoDB Atlas, etc.
        – Clustering, SLAs, etc.
      ● Self-managing Kafka on GCE
      GCP Integration
      ● Pre-Built Dataflow Flex
        – Kafka to BigQuery template
      ● Using Kafka Connect
        – To push to Google BigQuery. Supported by Confluent and WePay
        – To push to Google Cloud Pub/Sub. Supported by Google
      ● Fivetran, Confluent ...
  18. Hybrid: Access Kafka on-prem from GCP [diagram]. Labels as extracted: On-prem Gateway Google Cloud Interconnect & VPN Gateway Kafka Cluster Analysis Cloud Dataflow Analysis Compute Engine Analysis Cloud Dataproc
  19. Hybrid: Replicate Kafka on-prem to GCP [diagram]. Labels as extracted: On-prem Gateway Google Cloud Interconnect & VPN Gateway Kafka Cluster Kafka Self Managed Cluster Compute Engine Analysis Cloud Dataflow Analysis Compute Engine Kafka Connect Kafka Connect Replicator Analysis Cloud Dataproc
  20. Lift and Shift: Confluent Cloud’s Kafka on GCP [diagram]. Labels as extracted: On-prem Analysis Cloud Dataflow Analysis Compute Engine Analysis Cloud Dataproc Confluent Cloud Managed by Confluent Kafka Cluster Customer Project Internet Private network
  21. Lift and Shift: Self-managing Kafka on GCP [diagram]. Labels as extracted: On-prem Gateway Google Cloud Interconnect & VPN Gateway Kafka Self Managed Cluster Compute Engine Analysis Cloud Dataflow Analysis Compute Engine Analysis Cloud Dataproc
  22. GCP Integration: Using Dataflow Template [diagram]. Labels as extracted: Kafka to BQ Dataflow Template; Table BigQuery; Kafka Compute Engine
  23. GCP Integration: Using Kafka Connect [diagram]. Labels as extracted: On-prem Gateway Google Cloud Interconnect & VPN Gateway Analysis Cloud Dataflow Kafka Connect Cloud Pub/Sub Connector Kafka Topic Cloud Pub/Sub Kafka Topic Dest. BigQuery Kafka Connect BigQuery Connector Internet Private Network Supported by Google Supported by Confluent and WePay Analysis Cloud BigQuery
  24. Comparing it to Google Cloud Pub/Sub
      Self-managed Kafka
      ● Open source
      ● Set up your own auth to protect your Kafka
      ● You must provision and plan for load isolation
      ● You must support it
      ● You must infer costs based on a variety of capacity and availability patterns, and buy components (rather than pay for usage): CPU, disk, network
      ● You must design and maintain your own replication and backup setup
      ● Can be used as a system of record: messages can be re-read from the beginning, so new subscribers can read from the start (depending on retention policy)
      ● Order guarantees within a partition
      ● Large platform of streaming tools: KSQL, Schema Registry, Connectors to/from data sources
      Cloud Pub/Sub
      ● GCP only; however, the API can be emulated on a Kafka server on-prem
      ● GCP IAM integration
      ● 24-hour on-call support, SLAs from Google, and integrated monitoring with Stackdriver
      ● Transparent replication and backups for high availability and durability
      ● Predictable bandwidth-based billing
      ● Global presence: Pub/Sub is already deployed in all GCP data centers for consistent latency and high availability. Today, only global is possible.
      ● Single service: You only worry about managing topics and subscribers, rather than clusters
      ● At least once delivery
  25. Thank you
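
Slides 17 and 23 mention two Kafka Connect sink connectors: one pushing to Cloud Pub/Sub (supported by Google) and one pushing to BigQuery (supported by Confluent and WePay). As a rough sketch of what registering them might look like, the Python below builds the JSON payloads that the Kafka Connect REST API accepts when creating a connector. The connector class names and property keys are written from memory of the connectors' public documentation and vary between connector versions, so treat them as assumptions to verify. The project, topic, dataset, and key-file values are hypothetical placeholders.

```python
import json


def pubsub_sink_config(name, kafka_topic, gcp_project, pubsub_topic):
    """Payload for a Kafka -> Cloud Pub/Sub sink connector (keys are assumptions)."""
    return {
        "name": name,
        "config": {
            # Class name as recalled from the Pub/Sub Kafka connector docs; verify.
            "connector.class": "com.google.pubsub.kafka.sink.CloudPubSubSinkConnector",
            "tasks.max": "1",
            "topics": kafka_topic,        # Kafka topic(s) to read from
            "cps.project": gcp_project,   # target GCP project
            "cps.topic": pubsub_topic,    # target Pub/Sub topic
        },
    }


def bigquery_sink_config(name, kafka_topic, gcp_project, dataset, keyfile_path):
    """Payload for a Kafka -> BigQuery sink connector (keys are assumptions)."""
    return {
        "name": name,
        "config": {
            # Class name as recalled from the WePay/Confluent connector docs; verify.
            "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
            "tasks.max": "1",
            "topics": kafka_topic,
            "project": gcp_project,
            # Older connector versions used a different dataset property; check yours.
            "defaultDataset": dataset,
            "keyfile": keyfile_path,      # path to a service-account key file
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    payloads = [
        pubsub_sink_config("orders-to-pubsub", "orders", "my-project", "orders-events"),
        bigquery_sink_config("orders-to-bq", "orders", "my-project", "analytics",
                             "/etc/kafka/sa-key.json"),
    ]
    for payload in payloads:
        print(json.dumps(payload, indent=2))
```

Each payload would typically be sent with an HTTP POST to the `/connectors` endpoint of a Kafka Connect worker. The sketch stops at building the JSON so that it runs without a cluster.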

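Two properties from the comparison on slide 24 matter to consumer code: Kafka guarantees order only within a partition, and Cloud Pub/Sub is listed with at-least-once delivery, so the same message may arrive more than once. The sketch below simulates both in plain Python without a broker. The CRC32 hash is a stand-in for a real partitioner, and `IdempotentConsumer` is a hypothetical helper, not part of any client library.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a deterministic stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_to_partitions(events, num_partitions):
    """Append (key, value) events to per-partition logs in arrival order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


class IdempotentConsumer:
    """Drops redelivered messages by remembering the IDs already processed."""

    def __init__(self):
        self._seen = set()
        self.processed = []

    def handle(self, message_id, payload):
        if message_id in self._seen:
            return False  # duplicate delivery, skip side effects
        self._seen.add(message_id)
        self.processed.append(payload)
        return True


if __name__ == "__main__":
    events = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-a", 3), ("user-b", 2)]
    # The same key always lands in the same partition, so per-key order survives.
    for pid, log in sorted(assign_to_partitions(events, 3).items()):
        print(pid, log)

    consumer = IdempotentConsumer()
    # "m1" is delivered twice, as at-least-once delivery allows.
    for message_id, payload in [("m1", "a"), ("m2", "b"), ("m1", "a")]:
        consumer.handle(message_id, payload)
    print(consumer.processed)
```

A production consumer would keep the set of seen IDs in durable storage with an expiry. The in-memory set here only illustrates the idea.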