20. Problem statement
When a customer uses a credit card to make a transaction, the vendor needs a fast answer to the question, “Is this payment fraudulent?”
Real-time stream processing
21. Fraud Detection (Is the payment fraudulent? YES/NO)
[Diagram: payment events (Payment 1, Payment 2, …, Payment 1001, Payment 10002) flow through a Kafka cluster (Broker 1, Broker 2), with Kafka Connect bringing in data from an external system; each payment receives a YES/NO fraud verdict.]
22. Is a payment a fraudulent one?
● Analysis and forensics on historical data to build the machine learning models.
● Use the machine learning models to predict fraud on live streams, for example:
○ Card velocity
○ Average spending in the last 60 minutes > 10 × the card’s historical average spending per 60 minutes (e.g., if a card has averaged $20 per hour historically, more than $200 spent in the last hour is flagged).
23. Problem statement: Fraud detection
● POS Transaction Data (Live Stream)
● User Information
● User Transaction History
● Fraud Location Estimator
24. Let’s build a real-time fraud detection system for credit cards
[Diagram: three-step pipeline — Step 1 ingests POS transactions, Step 2 brings in the Customer Profile, Step 3 runs the Fraud Detector.]
25. Kafka Core APIs
1. Producer API
2. Consumer API
3. Connect API
4. Streams API
26. Step 1: Produce messages using Producer API
[Diagram: the web application publishes each transaction to POS_TRANSACTION_TOPIC.]
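A minimal sketch of Step 1 with the Java Producer API (the topic name comes from the slide; the broker address, the card key, and the JSON payload are illustrative assumptions):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PosTransactionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by card number so all transactions of one card land in the same
            // partition and keep their ordering.
            String cardNumber = "4321-xxxx"; // hypothetical key
            String txnJson = "{\"card\":\"4321-xxxx\",\"amount\":42.50,\"merchant\":\"store-17\"}";
            producer.send(new ProducerRecord<>("POS_TRANSACTION_TOPIC", cardNumber, txnJson));
        }
    }
}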
28. Step 2: Capture data from an external data source
[Diagram: Kafka Connect copies the Customer Profile from the external data source into CUSTOMER_RECORD_TOPIC, alongside POS_TRANSACTION_TOPIC.]
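A minimal sketch of what the Connect configuration for Step 2 might look like, assuming the Confluent JDBC source connector and a hypothetical customer_profile table; the connection details, column name, and topic prefix are illustrative assumptions:

name=customer-profile-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
# assumed database connection (illustrative only)
connection.url=jdbc:postgresql://localhost:5432/crm
connection.user=connect
connection.password=secret
# hypothetical source table holding card-holder profiles
table.whitelist=customer_profile
# copy new rows as they are added, tracked by an incrementing id column
mode=incrementing
incrementing.column.name=id
# rows land in the topic CUSTOMER_RECORD_customer_profile
topic.prefix=CUSTOMER_RECORD_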
32. How to use the Kafka Streams API?
Just three steps:
1. Create one or more streams from Kafka topic(s).
2. Compose transformations on these streams.
3. Write transformed streams back to Kafka.
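The three steps in code, as a minimal sketch (topic names come from the slides; the broker address, the FRAUD_ALERT_TOPIC output topic, and the looksFraudulent placeholder are assumptions):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FraudDetectorApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fraud-detector"); // names the app's consumer group and state
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // 1. Create a stream from a Kafka topic.
        KStream<String, String> transactions = builder.stream("POS_TRANSACTION_TOPIC");
        // 2. Compose transformations (placeholder predicate).
        KStream<String, String> suspicious = transactions.filter((card, txn) -> looksFraudulent(txn));
        // 3. Write the transformed stream back to Kafka.
        suspicious.to("FRAUD_ALERT_TOPIC"); // hypothetical output topic

        new KafkaStreams(builder.build(), props).start();
    }

    private static boolean looksFraudulent(String txnJson) {
        return txnJson.contains("\"amount\":9999"); // stand-in for a real model
    }
}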
33. Creating source streams from Kafka
● Input topics to KStream
○ Each app instance gets a subset of the partitions of the input streams.
○ Specify the serializer and deserializer (serdes).
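Building on the sketch above, the serdes can also be passed explicitly when the stream is created (Consumed comes from org.apache.kafka.streams.kstream; the String value serde for the JSON payload is an assumption):

KStream<String, String> transactions = builder.stream(
    "POS_TRANSACTION_TOPIC",
    Consumed.with(Serdes.String(), Serdes.String())); // key serde, value serde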
34. Transform a Stream
● Stateless transformation
○ Don’t require state for processing.
○ Don’t require a state store with the stream processor.
○ E.g. Branch, Filter, Inverse Filter, FlatMap, Peek, Map, etc. (see the sketch below)
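A few of these stateless operators applied to the transaction stream from the sketch above (the amountOf helper that parses the payload is a hypothetical stand-in):

// Filter: keep only high-value transactions.
KStream<String, String> large = transactions.filter((card, txn) -> amountOf(txn) > 1000.0);
// Inverse filter (filterNot): drop everything the predicate matches.
KStream<String, String> small = transactions.filterNot((card, txn) -> amountOf(txn) > 1000.0);
// Peek: side effect only (e.g. logging); the stream passes through unchanged.
large.peek((card, txn) -> System.out.println("large txn for " + card));
// MapValues: reshape the payload without touching the key (avoids repartitioning).
KStream<String, Double> amounts = large.mapValues(txn -> amountOf(txn));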
35. Transform a Stream
● Stateful transformation
○ Depends on state for processing inputs and producing outputs.
○ Requires a state store with the stream processor.
○ State stores are fault-tolerant.
■ Aggregating
■ Joining
■ Windowing
■ Applying custom processors and transformers
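For example, a stateful stream-table join that enriches each transaction with the customer profile captured in Step 2 (a sketch; keying both topics by card number is an assumption about the data model):

// Customer profiles as a changelog-backed table (latest value per key),
// kept in a fault-tolerant state store.
KTable<String, String> customers = builder.table("CUSTOMER_RECORD_TOPIC");
// Stream-table join: each transaction is enriched with its matching profile.
KStream<String, String> enriched = transactions.join(
    customers,
    (txn, profile) -> txn + "|" + profile); // hypothetical merge of the two payloads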
36. Aggregating
○ Group the records by either groupByKey or groupBy.
○ KGroupedStream or KGroupedTable can be aggregated via operations like reduce.
○ Aggregation can be performed on windowed or non-windowed data.
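Applied to the card-velocity rule from slide 22, a windowed aggregation of per-card spending might look like this sketch (amountOf is the same hypothetical helper; TimeWindows, Materialized, and Windowed come from the Streams DSL, Duration from java.time):

// Sum each card's spending over tumbling one-hour windows.
// The result is a windowed KTable backed by a fault-tolerant state store.
KTable<Windowed<String>, Double> hourlySpend = transactions
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(60)))
    .aggregate(
        () -> 0.0,                                    // initializer: empty window total
        (card, txn, total) -> total + amountOf(txn),  // adder: accumulate spending
        Materialized.with(Serdes.String(), Serdes.Double()));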
In a way, every business generates a stream of events. Retail has streams of orders and shipments; finance has streams of stock tickers; Bitcoin exchanges have streams of exchange rates; websites have streams of impressions and clicks.
In today’s world, business is becoming more digital, and unbounded, unordered, large-scale data sets are increasingly common in day-to-day business.
Every application generates data in the form of user clicks, logs, or transactions.
Every byte has a story to tell. A single click on Amazon, behind the scenes, determines which item you will see next.
So the data that applications generate can be thought of as streams of events.
- Request-response
- Batch processing
- Real-time processing
To keep up with the need to process data as it arrives, companies have implemented data pipelines like this one, and it is very messy: applications talk to each other through various messaging queues, and custom ETL scripts are written to move data between sources and destinations. This ad hoc fashion of connecting sources and destinations to build real-time processing applications is pretty chaotic.
In this talk we will see how Apache Kafka cleans up the mess by providing a distributed streaming platform. The idea is to have Kafka as the central nervous system of your architecture: it collects data from a variety of sources and makes it available, in real time and at large scale, to any number of destinations as they come up.
Here is how you go about building a streaming platform.
Kafka as a Messaging System
It acts as a publish-subscribe system where publishers publish messages and consumers read messages from the server.
It is not limited to a pub-sub system, though. It is also a storage system that stores streams of data, with persistence and strict ordering: data written to Kafka is written to disk and replicated for fault tolerance.
You can think of Kafka as a kind of special-purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation.
Distributed by design:
- Replication
- Fault tolerance
- Partitioning
- Elastic scaling
- Scalability of file systems
It isn't enough to just read, write, and store streams of data; the purpose is to enable real-time processing of streams.
In Kafka, a stream processor is anything that takes continuous streams of data from input topics, performs some processing on this input, and produces continual streams of data to output topics.
- The unit of data is the message, which has a key and a value.
- It is like a record in a database.
- To Kafka it is just a byte array; the content carries no meaning for Kafka.
- A message can have an optional bit of metadata, referred to as the key.
- For efficiency, messages are written in batches.
- A batch is just a collection of messages.
- Batching is a trade-off between latency and throughput.
- Messages are categorized into topics.
- The closest analogy is a database table or a folder.
- Topics are broken down into partitions.
- Partitions are how Kafka provides redundancy and scalability.
- Each partition can be hosted on a different server, meaning a single topic can be scaled horizontally across multiple servers for performance.
- Ordering of messages is not guaranteed across multiple partitions, but it is maintained within a single partition.
Offset
- Another bit of metadata: an integer that increases continuously.
- Kafka adds the offset to each message as it is produced; the offset is unique within a single partition.
Broker:
A single Kafka server is called a broker.
The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk.
It also serves consumers, responding to fetch requests for partitions with the messages that have been committed to disk.
Cluster:
Kafka brokers are designed to operate as part of a cluster.
Producer:
- A producer creates new messages and publishes them to a Kafka topic.
Consumer
- Subscribes to one or more topics and reads messages in the order in which they were produced.
- Keeps track of which messages have already been consumed by storing the offset of the last consumed message.
- With this, a consumer can stop and restart without losing its place.
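A minimal sketch of such a consumer loop with the Java Consumer API (the broker address and group id are assumptions; offsets are committed automatically here):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PosTransactionConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fraud-audit"); // hypothetical consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("POS_TRANSACTION_TOPIC"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    // The offset is the consumer's place in the partition; Kafka
                    // commits it periodically, so a restart resumes from here.
                    System.out.printf("partition=%d offset=%d key=%s%n",
                            rec.partition(), rec.offset(), rec.key());
                }
            }
        }
    }
}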
Apache Kafka uses ZooKeeper to store metadata about the Kafka cluster, as well as consumer client details.
I think we have covered enough theory, so let’s build our own simple credit card fraud detection system.
This is a very basic example. The idea is that whenever a cardholder uses a card, a transaction event is generated.
1. Ingest the transaction stream into Kafka from the web application using the Kafka Producer API.
2. Capture cardholder information from the external data source using Kafka Connect.
3. Process the stream for fraud detection using the Kafka Streams API.
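Putting the three steps together, one possible Streams topology for the fraud detector (a sketch under the same assumptions as the snippets above; isFraudulent is a hypothetical stand-in for the machine learning model, and FRAUD_DECISION_TOPIC is an illustrative output topic):

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> txns = builder.stream("POS_TRANSACTION_TOPIC");   // Step 1: ingested transactions
KTable<String, String> profiles = builder.table("CUSTOMER_RECORD_TOPIC"); // Step 2: profiles via Kafka Connect

// Step 3: enrich each transaction with the profile, score it, and publish the verdict.
txns.join(profiles, (txn, profile) -> txn + "|" + profile)
    .mapValues(enriched -> isFraudulent(enriched) ? "YES" : "NO")
    .to("FRAUD_DECISION_TOPIC");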