SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Real-time stream
processing with Apache
Kafka
Juhi
Praveen Singh Bor
Business is driven by streams of events
Business is driven by streams of events
Old World of data
Evolution of processing data
New World
DB/DWH + distributes systems
Monolithic Application -> Microservices
Batch -> Real-time
How Organizations Handle Data Flow
Real-time Distributed Streaming Platform
Publish + Subscribe
Store
Process
Terminologies
● Message or Event
Terminologies
● Message
● Topic and Partition
Topic and Partitions
Terminologies
● Message/ Batches
● Topic and Partition
● Kafka Brokers & Cluster
Terminologies
● Message/ Batches
● Topic and Partition
● Kafka Brokers & Cluster
Terminologies
● Message/ Batches
● Topic and Partition
● Kafka Brokers & Cluster
● Producer and Consumer
Terminologies
● Message/ Batches
● Topic and Partition
● Kafka Brokers & Cluster
● Producer and Consumer
● Zookeeper
Problem statement
When a customer uses a credit card to do a transaction, the vendor needs a fast
response to the question, “Is it a fraudulent payment?”
Real time stream processing
Fraud Detection ( Is payment fraud? YES/NO )
Broker 1 Broker 2
Kafka Cluster
Kafka
Connect
External
System
Payment 1
Payment 2
Payment 1001
Payment 10002
NO
YES
NO
NO
Is a payment fraudulent one?
● Analysis and forensics on historical data to build the machine learning models.
● Use machine learning models to prediction fraud on live streams.
○ Card Velocity
○ Average spending in last 60 mins > 10 * average spending in 60 mins ever
Problem statement : Fraud detection
● POS Transaction Data (Live Stream)
● User Information
● User Transaction History
● Fraud Location Estimator
Let’s build real-time fraud detection system for Credit Card
Fraud
Detector
Customer
Profile
Step 1
Step 2
Step 3
Kafka Core API
1. Producer API
2. Consumer API
3. Connect API
4. Stream API
Step 1: Produce messages using Producer API
POS_TRANSACTION_TOPIC
Step 2: Capture Data from External Data source
Customer
Profile
POS_TRANSACTION_TOPIC
CUSTOMER_RECORD_TOPIC
KafkaConnect
Streaming Platform Overview
Broker 1 Broker 2
Kafka Cluster
Kafka
Connect
Kafka
Connect
External
System
External
System
Key concepts
Key ValueX 25
Key ValueY 50
Key ValueZ 9
Table
Key ValueX 20
Key ValueX 25
Key ValueZ 9
Key ValueY 5
Key ValueY 50
Stream
Duality of Stream and Table
Key concepts
Processor Topology
Stream Processor
Stream
How to use Kafka Streams API?
Just three steps
1. Create one or more streams from Kafka topic(s).
2. Compose transformations on these streams.
3. Write transformed streams back to Kafka.
Creating source streams from Kafka
● Input topics to KStream
○ Each app instance gets a subsect of partitions of input streams.
○ Specify the serializer and deserializer .
Transform a Stream
● Stateless transformation
○ Don’t require state for processing.
○ Don’t require state store with stream processor.
○ E.g. Branch, Filter, Inverse Filter, FlatMap, Peek, Map etc
Transform a Stream
● Stateful transformation
○ Depends on state for processing inputs and producing outputs.
○ Require a state store with stream processor.
○ State stores are fault tolerant.
■ Aggregating
■ Joining
■ Windowing
■ Applying custom processors and transformers
Aggregating
○ Group the record by either groupByKey or groupBy.
○ KGroupedStream or KGroupedTable can be aggregated via operations like reduce.
○ Aggregation can be performed on windowed or non-windowed data.
Aggregating
Joins
Stream Partitions and Tasks
State Store

Weitere ähnliche Inhalte

Was ist angesagt?

Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming app
Neil Avery
 

Was ist angesagt? (20)

APAC ksqlDB Workshop
APAC ksqlDB WorkshopAPAC ksqlDB Workshop
APAC ksqlDB Workshop
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
 
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSBridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
 
Building Event-Driven Applications with Apache Kafka & Confluent Platform
Building Event-Driven Applications with Apache Kafka & Confluent PlatformBuilding Event-Driven Applications with Apache Kafka & Confluent Platform
Building Event-Driven Applications with Apache Kafka & Confluent Platform
 
Bridge Your Kafka Streams to Azure Webinar
Bridge Your Kafka Streams to Azure WebinarBridge Your Kafka Streams to Azure Webinar
Bridge Your Kafka Streams to Azure Webinar
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

 
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
 
Amsterdam meetup at ING June 18, 2019
Amsterdam meetup at ING June 18, 2019Amsterdam meetup at ING June 18, 2019
Amsterdam meetup at ING June 18, 2019
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
 
Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafka
 
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming app
 
All Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZAll Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZ
 
Deep Dive Series #3: Schema Validation + Structured Audit Logs
Deep Dive Series #3: Schema Validation + Structured Audit LogsDeep Dive Series #3: Schema Validation + Structured Audit Logs
Deep Dive Series #3: Schema Validation + Structured Audit Logs
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 

Ähnlich wie Realtime stream processing with kafka

Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Kai Wähner
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
confluent
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
Ben Stopford
 

Ähnlich wie Realtime stream processing with kafka (20)

Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
 
Event-Based API Patterns and Practices
Event-Based API Patterns and PracticesEvent-Based API Patterns and Practices
Event-Based API Patterns and Practices
 
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
 
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AI
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, ConfluentJay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
 
apidays New York 2023 - Why Finance needs Asychronous APIs, Nicholas Goodman,...
apidays New York 2023 - Why Finance needs Asychronous APIs, Nicholas Goodman,...apidays New York 2023 - Why Finance needs Asychronous APIs, Nicholas Goodman,...
apidays New York 2023 - Why Finance needs Asychronous APIs, Nicholas Goodman,...
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
JHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka EcosystemJHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka Ecosystem
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Realtime stream processing with kafka

Hinweis der Redaktion

  1. In a way every business generates stream of events. Retail has stream on orders and shipments, finance has stream of stock tickers. Bitcoins exchanges has stream of exchange rates, website has stream of impression and clicks. Every byte of data has story to tell and it In today’s world Business is becoming more digital. And Ubounded, unorded large scale data sets are increasingly common in day to day business. Every application generates data in form of use clicks, logs or some transaction. Every byte has story to tell. Like your single click on Amazon, behind the scene determines which item would you like to see nex. So data that application generated can be thought as streams of events.
  2. Business is becoming more digital. Every application generates data in form of use clicks, logs or some transaction. Every byte has story to tell. Like your single click on Amazon, behind the scene determines which item would you like to see nex. So data that application generated can be thought as streams of events.
  3. Request response Batch processing Real time processing
  4. To caught up with the need to process data in as it arrives companies has implemented data pipelines like this. It is ver messy. There are applications which talks to each other using some kind of messenging queue Custom etl scripts written to move data between sources and destinations. This adhoc fashion of connecting source and destination to build real time processing application is pretty chaotic.
  5. In this talk we will see how Apache kafka cleans up the mess providing distributed streaming platform. The idea is to have kafka as central neve of your system. Which has ability to collect data from variety of sources and make it available at real-time and at large scale to any number of destination as it comes up.
  6. Here is how you go about building streaming platform.
  7. Kafka as a Messaging System It acts as a publish subscribe system where publishers publish messages and consumers reads messages from server.
  8. It is not just limited to pub sub system. It is storage system which stores the stream of data. Persistance and strict ordering Data written to Kafka is written to disk and replicated for fault-tolerance. you can think of Kafka as a kind of special purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation. Distributed by design Replication Fault Tolerance Partitioning Elastic Scaling Scalability of file systems
  9. It isn't enough to just read, write, and store streams of data, the purpose is to enable real-time processing of streams. In Kafka a stream processor is anything that takes continuous streams of data from input topics, performs some processing on this input, and produces continual streams of data to output topics.
  10. - Unit of data is Message which has key and data. - It is like record in database - Just byte array. No meaning to Kafka - Message has Optional bit of metadata, referred as key. - for efficiency message is written into batches - Batch is just a collection of messages. - Trade off between Latency and throughput
  11. - Messages are categorized into Topics. - Closest analogy is database table or folder - Topics are broken down into partitions - Partition provides redundancy and scalability - Each partition can be hosted on to different server ( Single partition can be scaled horizontally across different server to provide performance - Multiple partition does not guaranty ordering of messages across multiple partitions but ordering is maintain in single partition Offset - Another bit of metadata, an Integer that continuously increases - Kafka adds offset to message as it is produced and which is unique in single partition.
  12. Broker: A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk. Cluster: Kafka brokers are designed to operate as part of a cluster
  13. Producer: - Producer creates new messages and publish to Topic Kafka. Consumer - Subscribes to one or more topic and read messages in the order in which they were produced. - keep track of which messages are already consumed by keeping offset of last consumed message. - With this Consumer can restart and stop without losing its place.
  14. Apache Kafka uses Zookeeper to store metadata about the Kafka cluster, as well as consumer client detail
  15. I think we have covered enough of theory so let’s build our own simple Credit card fraud detection system. This is very basic example o, So the idea is whenever card-holder use card, transaction events gets generated.
  16. Ingest Transaction Stream into Kafka from Web Application using Kafka Producer API Capture Card Holder information from external data source using Kafka Connect Process Stream for Fraud Detection using Kafka stream API