1
Processing IoT Data with
Apache Kafka
Matt Howlett
Confluent Inc.
2
Pub Sub Messaging Protocol
Pub Sub Messaging System (rethought as a distributed commit log)
Distributed Streaming Platform
● Pub Sub Messaging
● Event Storage
● Processing Framework
3
OBD-II Adapters
4
Problem Statement
Let’s build a system to:
• Transport OBD-II data over unreliable links from cars to the data center
• Handle millions of devices*
• Extract information from, and respond to, this data in (near) real time (at scale)
• Handle surges in usage
• Allow ad-hoc historical processing
* or fewer
Architecture / technology / methods applicable to many scenarios.
5
MQTT Introduction
Publish / subscribe messaging protocol:
• Built on top of TCP/IP
• Features that make it well suited to poor connectivity / high latency scenarios
• Lightweight
  • Efficient client implementations, low network overhead
• MQTT-SN for non IP networks (’virtual connections’)
• Many (open source) broker implementations
  • Mosquitto, RabbitMQ, HiveMQ, VerneMQ
• Many Client Libraries
  • C, C++, Java, C#, Python, Javascript, websockets, Arduino …
• Widely used (incl. phone apps!)
  • Oil pipeline sensor via satellite link
  • Facebook Messenger
  • AWS IoT
6
MQTT Features
• Simple API
• Hierarchical topics
  • myhome/kitchen/door/front/battery/level
  • wildcard subscription: myhome/+/door/+/battery/level (+ matches one level, # matches all remaining levels)
• 3 qualities of service (on both produce and consume)
  • At most once (QoS 0)
  • At least once (QoS 1)
  • Exactly once (QoS 2) [not universally supported]
• Persistent consumer sessions
  • Important for QoS 1, QoS 2
• Last will and testament
• Last known good value (retained messages)
• Authorization, SSL/TLS
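To illustrate hierarchical topics and wildcard subscriptions, here is a minimal sketch of MQTT topic-filter matching, where `+` matches exactly one topic level and `#` matches all remaining levels. The function name `topic_matches` is illustrative only, and the sketch ignores edge cases such as `$`-prefixed system topics.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last filter level,
            # and it matches the parent level plus everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # No '#' seen: the filter must cover every level of the topic.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    print(topic_matches("myhome/+/door/+/battery/level",
                        "myhome/kitchen/door/front/battery/level"))
```

A broker performs this kind of match for every published message against every active subscription, which is one reason a simple topic scheme such as [deviceid]/obd keeps routing cheap.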
7
OBD-II Data
• Device Id
• GPS Location [lon, lat]
• Ignition on / off
• Speedometer reading
• Timestamp
• …plus a lot more
Assume: data sent via 3G wireless connection at ~30 second interval
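As a rough sketch of what one such message could look like, the record below carries the listed fields and is serialized as JSON. The field names, the JSON encoding, and the timestamp unit are assumptions for illustration; the slides do not specify a wire format. The helper `messages_per_second` works out the aggregate ingest rate implied by the ~30 second interval: one million devices gives roughly 33,000 messages per second.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ObdRecord:
    device_id: str
    lon: float
    lat: float
    ignition_on: bool
    speedometer: float  # unit is not specified in the slides
    timestamp: int      # assumed: Unix epoch seconds


def to_payload(record: ObdRecord) -> bytes:
    """Serialize a record to a JSON payload (assumed encoding)."""
    return json.dumps(asdict(record)).encode("utf-8")


def messages_per_second(num_devices: int, interval_s: float = 30.0) -> float:
    """Aggregate message rate if every device reports once per interval."""
    return num_devices / interval_s


if __name__ == "__main__":
    rec = ObdRecord("car1", -122.4, 37.8, True, 62.0, 1490000000)
    print(to_payload(rec))
    print(round(messages_per_second(1_000_000)))
```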
8
Ingest Architecture V1
[Diagram labels: MQTT Server 1; Processor 1, Processor 2, ...; topic: [deviceid]/obd]
Deficiencies:
• Single MQTT server can handle maybe ~100K connections
• Can’t handle usage surges (no buffering)
• No storage of events or reprocess capability
9
Ingest Architecture V2
[Diagram labels: MQTT Server Coordinator (http / REST); MQTT Server 1, MQTT Server 2, MQTT Server 3, MQTT Server 4, ...; topic: [deviceid]/obd]
• Easily shardable
• Treat MQTT server as commodity service
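The slides do not say how the coordinator chooses a server for a device. One simple possibility, sketched here purely as an assumption, is to hash the device id and take it modulo the number of servers, so that each device is always directed to the same MQTT server. The function name `assign_server` and the host names are hypothetical.

```python
import hashlib


def assign_server(device_id: str, servers: list[str]) -> str:
    """Pick an MQTT server for a device with a stable hash of its id."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Hypothetical server addresses.
    servers = ["mqtt-1:1883", "mqtt-2:1883", "mqtt-3:1883", "mqtt-4:1883"]
    for device in ("car-0001", "car-0002", "car-0003"):
        print(device, "->", assign_server(device, servers))
```

With plain modulo hashing, adding or removing a server remaps most devices; a coordinator that needs stable assignments under resizing would more likely use consistent hashing or an explicit lookup table.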
10
OBD -> MQTT -> Kafka
[Diagram labels: MQTT Server Coordinator; MQTT Server 1, MQTT Server 2, MQTT Server 3, MQTT Server 4; topic: [deviceid]/obd; Kafka Connect; kafka topic OBD_Data; Stream processing]
11
Apache Kafka
Distributed Streaming Platform:
• Pub Sub Messaging
• (typically clients are within data-center)
• Data Store
• Messages not deleted after delivery
• Stream Processing
• Low or high level libraries
• Data re-processing
12
Apache Kafka adoption spans companies across industries.
13
● Persisted
● Append only
● Immutable
● Delete earliest data based on time / size / never
14
• Partitions allow topics to scale past the constraints of a single server
• Message → partition_id mapping is deterministic (typically a hash of the message key). Choose a partitioning relevant to the application.
• Ordering is guaranteed per partition but not across partitions
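A deterministic message-to-partition mapping can be sketched as a hash of the key modulo the partition count. CRC32 is used here only as a stand-in; real client libraries choose their own hash (the Java client's default partitioner uses murmur2 for keyed messages), so the partition numbers below will not match a real cluster. The name `partition_for` is illustrative.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition id, deterministically."""
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # All messages keyed by the same device id land in the same partition,
    # which is what gives per-device ordering.
    for device_id in (b"car-0001", b"car-0002", b"car-0001"):
        print(device_id, "->", partition_for(device_id, 8))
```

Keying OBD messages by device id is what makes the later examples work: every reading from one car is ordered within one partition.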
15
Apache Kafka Replication
• Cheap durability!
• Choose the number of acks required before a produced message is confirmed
16
Apache Kafka Consumer Groups
partitions possibly across different brokers
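Within a consumer group, each partition is read by exactly one consumer. The sketch below shows a simple round-robin split to make that concrete; it is an illustration under that assumption, not Kafka's actual group protocol, which selects among several assignment strategies. The name `assign_partitions` is hypothetical.

```python
def assign_partitions(partitions: list[int],
                      consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions over the consumers of one group, round-robin."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


if __name__ == "__main__":
    print(assign_partitions(list(range(7)), ["consumer-1", "consumer-2"]))
```

Adding a consumer to the group triggers a new assignment, so the number of partitions caps how far a group can scale out.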
17
Kafka Connect
• Use client library producers / consumers in custom applications.
• Often want to bulk transfer data between standard systems:
• Don’t re-invent the wheel – configure Kafka Connect
• Narrow scope: move data into & out of Kafka
• Off-the-shelf connectors
• Fault Tolerant
• Auto-balances load
• Pluggable Serialization
• Standalone and distributed modes of operation
• Configuration / management via REST API
18
19
MQTT Connector
https://github.com/evokly/kafka-connect-mqtt
• Single Task
• Single MQTT Broker
• Source only
Either:
• Start a bunch of these connectors (in one Connect cluster), one per MQTT server, or:
• Implement a new multi-task connector, one task per MQTT broker.
  • Communicate with MQTT Server Coordinator
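The first option (one single-task connector per MQTT server) could be scripted roughly as follows. Kafka Connect's REST API creates a connector from a JSON body with `name` and `config` fields sent to `POST /connectors`, and `connector.class` and `tasks.max` are standard settings. Everything else here is an assumption for illustration: the connector class name, the `mqtt.*` and `kafka.topic` keys, and the broker addresses are hypothetical and would need to be replaced with the real connector's documented settings. The sketch only builds the request bodies and does not send them.

```python
import json


def connector_request(index: int, broker_uri: str) -> dict:
    """Build the JSON body for creating one source connector."""
    return {
        "name": f"mqtt-source-{index}",
        "config": {
            # Hypothetical class name; use the real connector's class.
            "connector.class": "com.example.MqttSourceConnector",
            "tasks.max": "1",
            # Hypothetical connector-specific keys.
            "mqtt.server.uri": broker_uri,
            "mqtt.topic": "+/obd",
            "kafka.topic": "OBD_Data",
        },
    }


# Hypothetical MQTT broker addresses, one connector each.
mqtt_brokers = ["tcp://mqtt-1:1883", "tcp://mqtt-2:1883", "tcp://mqtt-3:1883"]
bodies = [connector_request(i, uri)
          for i, uri in enumerate(mqtt_brokers, start=1)]

if __name__ == "__main__":
    for body in bodies:
        print(json.dumps(body, indent=2))
```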
20
User Data
SQL Db table: User_Info
• user_id
• device_id
• name
• address
• phone_number
• speed_alert_level
• ...
21
Example: Car Towed Alert
Detect movement of car when ignition off, send SMS alert
[Diagram labels: kafka Broker 1 (OBD_Data P1, OBD_Data P5, ...); Broker 2 (OBD_Data P3, OBD_Data P7, ...); Consumer 1 and Consumer 2, each with "Last loc. in mem KV store"; User Info; SMS Gateway]
22
Consumer Implementation
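on_message(message m)
{
    var device_id = m.key;
    var obd_data = m.value;

    // Car is being driven: forget the parked position so the next
    // ignition-off reading starts a fresh baseline.
    if (obd_data.ignition_on) {
        kv_store.remove(device_id);
        return;
    }

    // First ignition-off reading: record where the car is parked.
    if (!kv_store.contains(device_id)) {
        kv_store.set(device_id, obd_data.lon_lat);
        return;
    }

    var prev_lon_lat = kv_store.get(device_id);
    var dist = calc_dist(obd_data.lon_lat, prev_lon_lat);
    kv_store.set(device_id, obd_data.lon_lat);

    // Moved while the ignition is off: probably being towed.
    if (dist > alert_max_dist) {
        // infrequent, so a direct SQL lookup is acceptable here
        send_alert(SQL.get_phone_number(device_id));
    }
}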
• Message can be from any partition assigned to this consumer
• Ordering guaranteed per partition, but not predictable across partitions
• All messages from a particular device guaranteed to arrive at the same consumer instance
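A runnable sketch of the tow-alert consumer logic, assuming messages arrive as a device id plus a dict with `ignition_on` and `lon_lat` fields. The 100 m threshold, the haversine distance, and the `send_alert` callback are illustrative choices. One design decision is worth naming: this version drops the stored position whenever the ignition is on, so that a normal drive followed by parking somewhere else does not look like a tow.

```python
import math

ALERT_MAX_DIST_M = 100.0  # assumed threshold, in metres
last_location: dict[str, tuple[float, float]] = {}  # in-memory KV store


def haversine_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1 = a
    lon2, lat2 = b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def on_message(device_id: str, obd: dict, send_alert) -> None:
    if obd["ignition_on"]:
        # Being driven: reset the parked-position baseline.
        last_location.pop(device_id, None)
        return
    position = obd["lon_lat"]
    previous = last_location.get(device_id)
    last_location[device_id] = position
    if previous is not None and haversine_m(previous, position) > ALERT_MAX_DIST_M:
        send_alert(device_id)


if __name__ == "__main__":
    on_message("car1", {"ignition_on": False, "lon_lat": (0.0, 0.0)}, print)
    on_message("car1", {"ignition_on": False, "lon_lat": (0.0, 0.01)}, print)
```

Because the store is keyed by device id and all messages for a device arrive at the same consumer, no cross-consumer coordination is needed for this state.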
23
Example: Speed Alert
• Scenario: Parent wants to monitor son/daughter driving and be alerted if they exceed a specified speed.
• In the Tow Alert example User_Info only needs to be queried in the event of an alert.
• In this example, the table needs to be queried for every OBD data record in every partition.
[Diagram labels: OBD_data P1 ... (high frequency); User Info table; [can update at any time]; Not scalable! Cache?]
24
Table can be represented as stream of updates

Time   Update record                      Table state (device_id → speed_limit)
0      {device_id=1, speed_limit=60}      1 → 60
1      {device_id=2, speed_limit=80}      1 → 60, 2 → 80
2      {device_id=3, speed_limit=70}      1 → 60, 2 → 80, 3 → 70
3      {device_id=1, speed_limit=80}      1 → 80, 2 → 80, 3 → 70
4      {device_id=1, speed_limit=65}      1 → 65, 2 → 80, 3 → 70

Log compaction!
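The table/stream duality and the effect of log compaction can be sketched in a few lines. `materialize` replays the update stream into a table, and `compact` keeps only the latest record per key. This is a simplified illustration; in Kafka itself compaction runs in the background on older log segments and also handles deletions, which are not modeled here.

```python
# Update stream from the example: (device_id, speed_limit) in arrival order.
changelog = [(1, 60), (2, 80), (3, 70), (1, 80), (1, 65)]


def materialize(records):
    """Replay a changelog into a table; later updates overwrite earlier ones."""
    table = {}
    for key, value in records:
        table[key] = value
    return table


def compact(records):
    """Keep only the most recent record for each key, preserving log order."""
    last_index = {key: i for i, (key, _) in enumerate(records)}
    return [rec for i, rec in enumerate(records) if last_index[rec[0]] == i]


if __name__ == "__main__":
    print(materialize(changelog))
    print(compact(changelog))
```

The compacted log is shorter but materializes to the same table, which is why a compacted topic can serve as a durable, replayable copy of a database table.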
25
Debezium
Kafka Connector that turns database tables into streams of update records.
[Diagram: MySQL User Info table [key: userId] -> debezium -> User_Info [changelog topic] with Partition 1 to Partition 6, ...]
Partition by device_id
26
Stream / Table Join
[Diagram labels: OBD_Data [Record Stream], key: device_id, Partition 1 to Partition 7, ...; User_Info [ChangeLog, compacted] via debezium, key: device_id, Partition 1 to Partition 5, ...; Consumer 1 holds the relevant subset of User_Info]

Relevant subset of User_Info on Consumer 1:
device_id   speed_limit
1           80
3           70
27
Speed Alert: Message handler
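on_message(message m)
{
    var device_id = m.key;
    var obd_data = m.value;

    // Local lookup in the materialized User_Info table; no remote query.
    var user_info = user_info_local.get(device_id);
    if (user_info == null)
        return;  // no User_Info record seen yet for this device

    if (obd_data.speedometer > user_info.max_speed) {
        alert_user(device_id, user_info);
    }
}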
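A minimal runnable sketch of the stream/table join behind this handler, assuming two callbacks: one fed by the User_Info changelog partitions and one fed by the matching OBD_Data partitions. The function names, the dict-based records, and the treatment of a `None` value as a delete are assumptions for illustration.

```python
user_info_local: dict[str, dict] = {}  # device_id -> latest User_Info record


def on_user_info(device_id: str, user_info) -> None:
    """Apply one changelog record to the local table."""
    if user_info is None:
        # Assumed convention: a None value deletes the key.
        user_info_local.pop(device_id, None)
    else:
        user_info_local[device_id] = user_info


def on_obd(device_id: str, obd: dict, alert_user) -> None:
    """Join one OBD record against the local table and alert on speeding."""
    user_info = user_info_local.get(device_id)
    if user_info is None:
        return  # no User_Info seen yet for this device
    if obd["speedometer"] > user_info["max_speed"]:
        alert_user(device_id, user_info)


if __name__ == "__main__":
    on_user_info("car1", {"max_speed": 80})
    on_obd("car1", {"speedometer": 95}, lambda d, u: print("alert", d, u))
```

The lookup stays local only because both topics are keyed by device_id, so a consumer that owns a device's OBD partition also holds that device's User_Info record.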
28
MQTT Phone Client Connectivity
[Diagram labels: MQTT Server Coordinator; MQTT Server 1, MQTT Server 2, MQTT Server 3, ...; Consumer 1, ...; topics [deviceid]/obd and [deviceid]/alert]
29
Speed Limit Alert: Rate limiting
[Diagram: app_state kafka topic with Partition 1 to Partition 7, ...]
• Prefer to rate limit on server to minimize network overhead.
• Create new Kafka topic app_state, partitioned on device_id.
• When alert triggered, store alert time in this topic.
  • [can use this topic as general store for other per device state info too]
• Materialize this change-log stream on consumers as necessary.
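The rate-limiting check could look roughly like this, assuming the consumer holds a materialized view of app_state as a dict and that alerts for one device should be at least ten minutes apart. The interval, the `last_alert_time` field name, and the function name are hypothetical. In a real deployment the update would be produced to the app_state topic and flow back into the local view, rather than being written to the dict directly.

```python
MIN_ALERT_INTERVAL_S = 600  # assumed: at most one alert per device per 10 minutes


def should_alert(device_id: str, now_s: float, app_state: dict) -> bool:
    """Return True if an alert may be sent now, and record the alert time."""
    last = app_state.get(device_id, {}).get("last_alert_time")
    if last is not None and now_s - last < MIN_ALERT_INTERVAL_S:
        return False
    # Stand-in for producing {device_id: last_alert_time} to app_state.
    app_state.setdefault(device_id, {})["last_alert_time"] = now_s
    return True


if __name__ == "__main__":
    state: dict = {}
    print(should_alert("car1", 0, state))    # first alert
    print(should_alert("car1", 30, state))   # 30 s later
    print(should_alert("car1", 700, state))  # after the interval
```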
30
[Diagram labels: OBD_Data [Record Stream], Partition 1 to Partition 7, ...; User_Info [ChangeLog, compacted], Partition 1 to Partition 4, ...; App_State [compacted], Partition 1 to Partition 4, ...; Consumer 1 holds the relevant subset of User_Info and the relevant subset of App_State]
31
Example: Location Based Special Offers
When a car enters a specific region, send available special offers to the user’s phone.
Require:
• User_Info
  • Address – so we know whether they are local to their current location or not
• App_state
  • Used to persist already sent offers
• Special_Offer_Info
  • Table that stores the list of all special offers.
32
Regions
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35
36 37 38 39 40 41 42
• Regions may be simple (as depicted here) or complex
• F(lon, lat) -> location_id
• Note: could also implement ride-share surge pricing using similar partitioning.
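For the simple grid case, F(lon, lat) can be sketched as plain arithmetic over a bounding box divided into 7 columns and 6 rows, numbered row by row from the north-west corner as in the grid shown. The bounding-box coordinates below are made up to keep the arithmetic easy to follow, and the numbering direction is an assumption.

```python
COLS, ROWS = 7, 6
# Hypothetical bounding box, chosen so that each cell is 1 x 1 degree.
WEST, EAST = 0.0, 7.0
SOUTH, NORTH = 0.0, 6.0


def location_id(lon: float, lat: float):
    """Map a position to a region number 1..42, or None if outside the box."""
    if not (WEST <= lon < EAST and SOUTH < lat <= NORTH):
        return None
    col = int((lon - WEST) / (EAST - WEST) * COLS)
    row = int((NORTH - lat) / (NORTH - SOUTH) * ROWS)
    # Guard against floating-point rounding at the far edges.
    col = min(col, COLS - 1)
    row = min(row, ROWS - 1)
    return row * COLS + col + 1


if __name__ == "__main__":
    print(location_id(0.5, 5.5))
    print(location_id(6.5, 0.5))
```

Complex regions (for example polygons for neighbourhoods) would replace this arithmetic with a spatial lookup, but the rest of the pipeline only needs the resulting id.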
33
Special Offer Change-log Stream
[Diagram: MySQL Special Offer Info table -> debezium -> Special_Offers [changelog, compacted] with Partition 1 to Partition 6, ...]
Partition by location_id
34
Multi-stage Data Pipeline
[Diagram: consume OBD_Data (K: device_id, V: OBD record) -> enrich with User_Info [address] (K: device_id, V: OBD record, address) -> enrich with App_State [offers already sent] (K: device_id, V: OBD record, address, offers_sent)]
35
Multi-stage Data Pipeline (continued)
[Diagram: repartition by location_id: (K: device_id, V: OBD record, address, offers_sent) -> (K: location_id, V: OBD record, address, offers_sent), written to topic OBD_Data_By_Location (P1, P2, P3, ...)]
Data from given device will still all be on the same partition (except when region changes)
36
Multi-stage Data Pipeline (continued)
[Diagram: re-partitioned stream (K: location_id, V: OBD record, address, offers_sent) -> enrich with Special_Offers -> (K: location_id, V: OBD record, address, offers_sent, available_offers)]
37
Multi-stage Data Pipeline (continued)
[Diagram: three filters (special offer available in location; special offer not already sent; user address near location?) -> MQTT Server, topic [deviceId]/alert]
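The pipeline stages can be sketched as plain functions over in-memory dicts standing in for the materialized tables: enrich by device_id, re-key by location_id, then look up offers and apply the three filters. All names, the record layout, and the `region_of` and `is_near` callbacks are assumptions for illustration; in the real system each stage would read from and write to Kafka topics.

```python
def enrich(record: dict, user_info: dict, app_state: dict) -> dict:
    """Stage 1: add address and already-sent offers, looked up by device_id."""
    device_id = record["device_id"]
    return {
        **record,
        "address": user_info[device_id]["address"],
        "offers_sent": app_state.get(device_id, set()),
    }


def rekey_by_location(record: dict, region_of):
    """Stage 2: compute the new key so the record can be repartitioned."""
    return region_of(record["lon"], record["lat"]), record


def offers_to_send(location_id, record: dict, special_offers: dict, is_near) -> list:
    """Stage 3: join with Special_Offers, then apply the three filters."""
    available = special_offers.get(location_id, [])        # available in location
    return [
        offer for offer in available
        if offer not in record["offers_sent"]              # not already sent
        and is_near(record["address"], location_id)        # user address near location
    ]


if __name__ == "__main__":
    user_info = {"car1": {"address": "12 Example St"}}
    app_state = {"car1": {"offer-1"}}
    special_offers = {10: ["offer-1", "offer-2"]}
    rec = enrich({"device_id": "car1", "lon": 2.5, "lat": 4.5}, user_info, app_state)
    loc, rec = rekey_by_location(rec, lambda lon, lat: 10)
    print(offers_to_send(loc, rec, special_offers, lambda addr, loc_id: True))
```

Splitting the work this way means each stage only needs tables partitioned on its own key: device_id for the first enrichment, location_id for the offers lookup.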
38
39
40
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
41
Thank You
@matt_howlett
@confluentinc
