Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs is a hyperscale PaaS event stream broker with protocol support for HTTP, AMQP, and Apache Kafka RPC that accepts and forwards several trillion (!) events per day and is available in all global Azure regions. This session is a look behind the curtain where we dive deep into the architecture of Event Hubs and look at the Event Hubs cluster model, resource isolation, and storage strategies and also review some performance figures.
1. Azure Event Hubs – Behind the scenes
Stream data using open standards and Apache Kafka®
Kasun Indrasiri
Sr. Product Manager
Azure Messaging @Microsoft
2. Business cases for Event Streaming
• High volume of events produced continuously from a
wide array of sources at a rapid rate.
• Web clickstream
• Anomaly and fraud detection
• Application logs
• IoT sensor data
• Real-time ETL
• Change data capture
• Respond faster to customer needs.
3. Event Streaming Platform
• Ingest, store, enrich and analyze millions of events in such event
streams.
• Trigger
• Event Data source
• Capture and publish data to stream ingestion layer.
• Ingest and store
• Ingests and stores event streams
• Wide array of APIs and input sources
• Delivery semantics – at-least once, consumer rewind and replay.
• Scale and distribute events into storage.
• Process/Analyze
• Consumes from event ingestion and storage layer.
• Stream Processing: Ability to react in real-time, filtering, aggregating and
prepping for analytics.
• Model and serve
• Serving queries
Stages: Trigger → Ingest and store → Process/Analyze → Model and serve
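The "at-least once, consumer rewind and replay" delivery semantics listed on this slide can be illustrated with a toy in-memory log. This is a minimal sketch and not Event Hubs code: `EventLog`, `read_from`, and the `checkpoint` variable are hypothetical names chosen for the illustration.

```python
class EventLog:
    """Toy append-only log for one partition (hypothetical, in-memory)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset):
        """Return (offset, event) pairs starting at the given offset."""
        return list(enumerate(self._events))[offset:]


log = EventLog()
for e in ["click-1", "click-2", "click-3"]:
    log.append(e)

# The consumer processes two events but only checkpoints after the first.
checkpoint = 0
processed = []
for offset, event in log.read_from(checkpoint):
    processed.append(event)
    if offset == 0:
        checkpoint = offset + 1  # progress marker that survives a restart
    if offset == 1:
        break  # simulated crash before offset 1 is checkpointed

# After a restart the consumer resumes from the checkpoint, so "click-2"
# is delivered a second time: at-least-once, not exactly-once.
replayed = [event for _, event in log.read_from(checkpoint)]

# Rewind: a consumer can re-read the retained stream from the beginning.
rewound = [event for _, event in log.read_from(0)]
```

Because redelivery is possible, consumers built on these semantics generally need idempotent processing.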
4. What is Azure Event Hubs?
• Platform-as-a-Service real-time event service
• Multi-protocol (AMQP, Kafka, HTTPS), low-latency
event streaming.
• Seamlessly run Apache Kafka® workloads with far
lower cost and better performance.
• Fully managed: You use the features, Azure deals
with everything else
• Polyglot Azure SDK and cross-platform client
support
• Industry-leading reliability and availability
• Best-in-class performance.
[Diagram callouts]
• Coordinate ownership of partitions across multiple receivers
• Clients can use any native protocol.
• Partitions are like lanes on a freeway. More lanes, more throughput.
• Entity/Topic
• AMQP 1.0
5. What is Event Hubs for Kafka
• Event Hubs does not run/host Kafka.
• Implements Kafka protocol head.
• Single broker supports AMQP and Kafka.
• Provides versioning and compatibility
• Support from version 1.0 and above
• No code changes to existing applications.
• Single stable endpoint.
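To make "no code changes" and "single stable endpoint" concrete, here is a minimal sketch of the client settings a Kafka application would typically switch to when pointing at an Event Hubs namespace. The helper name `event_hubs_kafka_config` is hypothetical, the namespace `contoso` and the connection string are placeholders, and the keys follow the common Kafka client property names; check the configuration reference of your client library before relying on them.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings that target an Event Hubs namespace."""
    return {
        # One endpoint for the whole namespace; no per-broker addresses.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal string "$ConnectionString" is used as the user name.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder value: substitute the connection string of your own namespace.
config = event_hubs_kafka_config(
    "contoso",
    "Endpoint=sb://contoso.servicebus.windows.net/;"
    "SharedAccessKeyName=<policy-name>;SharedAccessKey=<key>",
)
```

Only configuration changes in this sketch; the producer and consumer code of an existing application stays as it is.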
6. Why choose Event Hubs for Kafka?
Cost Efficient
• Far lower cost compared to running Kafka on-prem or using managed Kafka services
• No initial/recurring licensing fees
• Fully managed, no hardware.
• Availability Zones at no additional cost.
Better Performance and Reliability
• End-to-end latency of < 10 ms
• Minimal latency jitter
• Ability to choose the SKU based on performance needs.
• Triple replicated, AZs and 99.99% availability SLA (Premium/Dedicated)
Simplify Kafka
• Just a quick network hop away from your existing workloads
• Zero code setup, seamless migration
• Single stable endpoint (no broker endpoints)
• Fully managed
• Easy to scale (1 MBps to >5 GBps)
Azure Essentials by default
• Security, Compliance and Availability
• AAD/OAuth
• VNet/BYOK/IP filtering
• Zone Redundancy/Geo-DR
7. Multi-protocol Event Streaming
• Event Hubs supports multiple native event streaming protocols such as AMQP and
Kafka
• Kafka is an RPC protocol, but AMQP can be more suitable for data streaming
• AMQP can offer
• Better performance for certain streaming workloads.
• Better idle connection handling with heartbeats.
• Less resource utilization (avoiding redundant fetch calls)
• Ability to mix and match different protocols.
• Downstream Azure services use AMQP to stream data to and from Event Hubs.
[Diagram: Kafka, AMQP, and HTTPS clients connected to Azure Event Hubs]
8. Azure Event Hubs Hosting Models
Standard
• Ingress: 1 MB/s – 40 MB/s
• Multi-tenant
• Reserved bandwidth + pay as you go
• Throttled beyond reserved capacity
• Charged for capture and ingress events
• Throughput Units (TU)
• SLA: 99.95%
Premium
• Ingress: 10 MB/s (1 PU) – 160 MB/s (16 PU)
• Multi-tenant, minimal cross-tenant interference
• Reserved compute and memory capacity
• Low latency with predictability
• No throttling limits on data ingress/egress, extended limits and quotas
• Capture and ingress events are included, Premium features
• Processing Units (PU)
• SLA: 99.99%
Dedicated
• Ingress: 50 MB/s to GB/s
• Single tenant (you own it)
• Reserved compute and memory capacity
• Low latency with predictability
• No throttling limits on data ingress/egress, extended limits and quotas
• Capture and ingress events are included, Premium features
• Capacity Units (CU)
• SLA: 99.99%
Azure Stack Hub
• Owner operated
• Same limits as Dedicated
• Connected and disconnected hosting
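As a rough illustration of how the ingress ranges above translate into a tier choice, the sketch below picks the smallest tier whose quoted ingress range covers a target rate. It is a simplification under stated assumptions: `suggest_tier` is a hypothetical helper, throughput is assumed to scale linearly (1 MB/s per TU, 10 MB/s per PU, as the slide's end points suggest), PU sizes are assumed to be 1, 2, 4, 8, and 16, and factors such as isolation, latency predictability, and quotas are ignored.

```python
import math

# Assumed purchasable Premium sizes (PU counts).
PREMIUM_PU_SIZES = [1, 2, 4, 8, 16]


def suggest_tier(ingress_mb_per_s: float) -> str:
    """Pick the smallest tier whose quoted ingress range covers the target."""
    if ingress_mb_per_s <= 40:
        # Standard: assumed 1 MB/s of ingress per Throughput Unit.
        tus = max(1, math.ceil(ingress_mb_per_s))
        return f"Standard, {tus} TU"
    if ingress_mb_per_s <= 160:
        # Premium: assumed 10 MB/s of ingress per Processing Unit.
        needed = math.ceil(ingress_mb_per_s / 10)
        pus = next(size for size in PREMIUM_PU_SIZES if size >= needed)
        return f"Premium, {pus} PU"
    # Beyond the Premium range quoted on the slide.
    return "Dedicated"


small = suggest_tier(25)
medium = suggest_tier(95)
large = suggest_tier(500)
```

Real sizing should also weigh egress, partition count, and whether the workload needs the isolation that Premium and Dedicated provide.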
9. Event Hubs is fast!
• End-to-end latency:
• Time for a message to traverse the event streaming engine from the
producer through the system to the consumer.
• Event Hubs Premium – end-to-end latency is < 10ms for both Kafka
and AMQP workloads.
• Predictable low latency for high volume workloads.
• Faster than native Kafka brokers and managed Kafka offerings.
• More details at: Benchmarking Azure Event Hubs Premium for Kafka
and AMQP workloads - Microsoft Tech Community
[Chart: Event Hubs E2E Latency (4 PU, 1 MB/s - 10 MB/s). X-axis: Number of Partitions (0 to 120). Y-axis: E2E Latency (ms, 0 to 12). Series: Event Hubs - Native, Event Hubs - Kafka]
10. High Availability
• Replicas/Fault Domain Placement
• Events are replicated across the cluster while maintaining low end-to-end
latency.
• Every topic partition is replicated three times
• One replica is designated as the primary/leader.
• Cluster VMs are spread across at least 3 fault domains such that the loss of a
rack or network poses no availability risk. Recovery from a fault domain
failure is fully automated and the system maintains SLA.
• Availability Zones(AZs)
• Each cluster spans three availability zones and maintains its SLA, with no
data loss tolerated, when one or two zones fail.
• Data is replicated to more than one of the AZ instances before the producer is
acknowledged.
• Azure AZ support in Event Hubs is offered at no additional cost.
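The rule that data reaches more than one availability zone before the producer is acknowledged can be expressed as a small predicate. This is a minimal sketch of that rule only; `can_acknowledge` and the zone names are hypothetical, and the actual replication protocol is not described in the slides.

```python
def can_acknowledge(persisted_by_zone: dict, min_zones: int = 2) -> bool:
    """Return True once the write is durable in at least `min_zones` zones.

    `persisted_by_zone` maps a zone name to whether its replica has
    persisted the write. "More than one" zone means at least two.
    """
    durable = sum(1 for ok in persisted_by_zone.values() if ok)
    return durable >= min_zones


only_leader = can_acknowledge({"az1": True, "az2": False, "az3": False})
two_zones = can_acknowledge({"az1": True, "az2": True, "az3": False})
```

With three replicas and a two-zone requirement, a write that has been acknowledged still exists in at least one zone after any single zone is lost.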
12. Logical Architecture
3-tier architecture: Networking, Messaging, Storage
Gateway
• Connection Management
• IP Filtering, VNET/PEP
• Transport Level Security (TLS)
• Authorization Handling
• Entity Management
• HTTPS / WebSocket Protocol
• AMQP 1.0 Protocol
• Apache Kafka Protocol
Backend (Broker)
• Partition Placement
• Volatile State Replication
• Sequencing & Timestamping
• Journal Cache
• Indexing
• Pending request tracking
• Checkpoint handling
• At-Rest Encryption (CMK)
• EH Capture
Storage
• Binary Log Data Store
• Index Store
• At-Rest Data Replication
[Diagram labels: Azure Portal, Azure Resource Management API, Azure Active Directory; client protocols HTTPS / WebSockets, AMQP 1.0, Apache Kafka RPC, HTTPS (RPC); internal AMQP links. Callout: Premium Units (PU) buy isolated capacity at this layer]
13. Backend and Gateway Clusters
Logical Architecture meets Placement
[Diagram: Availability Zones 1, 2, and 3, each with Fault Domains 1, 2, and 3, hosting VM1 through VM18; the VMs run process containers]
• Azure Virtual Machine Scale Sets (VMSS) – VM Placement/Deployment & Networking (SLB)
• Azure Service Fabric (SF) – Leader election (who owns what?), leader lookup, process placement and activation
14. Gateway and Backend
Workload placement
[Diagram: Gateway and Backend tiers with Azure Portal, Azure Resource Management API, Azure Active Directory, and client protocols HTTPS / WebSockets, AMQP 1.0, and Apache Kafka; AMQP between tiers]
• Gateway: Service Fabric Placement – Stateless Gateway Processes (process containers on VM1, VM3, VM5, VM7, VM9, VM11, VM13, VM15, VM17)
• Backend: Service Fabric Placement – Stateful Backend Processes (process containers on VM2, VM4, VM6, VM8, VM10, VM12, VM14, VM16, VM18)
• The gateway looks up the backend owner of an entity from SF for routing.
• Entities are managed by the gateway layer.
• AuthZ is delegated to AAD.
• Apache Kafka clients only see one broker that owns all partitions. Partition ownership is abstracted.
15. Backend: Event Hubs Premium.
Workload placement
• Backend: Service Fabric Placement – Stateful Backend Processes (process containers on VM2, VM4, VM6, VM8, VM10, VM12, VM14, VM16, VM18)
• Designated secondaries in different fault domains and AZs are reserved for failover.
• Host 1 (VM6, L-Series): Namespace Broker Process 1 (4PU), CPU cores 1 and 2 plus one extra core, 8GB memory allocation, 8 partitions (P)
• Host 2 (VM6, L-Series): Namespace Broker Process 2 (4PU), CPU cores 1 and 2, 8GB memory allocation, 8 partitions (P)
• Isolated VMs / L88is
Namespace PU are split across processes:
1 PU = 2 Proc (8GB Mem), 1 Core/Proc (2C)
2 PU = 2 Proc, 1 Core/Proc + 1 Core (3C)
4 PU = 2 Proc, 2 Core/Proc + 1 Core (5C)
8 PU = 4 Proc, 2 Core/Proc + 1 Core (9C)
16 PU = 4 Proc, 4 Core/Proc + 1 Core (17C)
• Cores are exclusively mapped to a broker process.
• >=2PU: 1 Core extra for utility tasks.
• Partition ownership is dynamically mapped to the process(es) associated with a namespace via SF.
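The core totals in the PU breakdown on this slide follow from processes × cores per process + utility cores. The short check below reproduces that arithmetic; the `PU_LAYOUT` table is transcribed from the slide, while the names `PU_LAYOUT` and `total_cores` are introduced here only for illustration.

```python
# (processes, cores per process, extra utility cores) for each namespace
# size, transcribed from the slide.
PU_LAYOUT = {
    1: (2, 1, 0),
    2: (2, 1, 1),
    4: (2, 2, 1),
    8: (4, 2, 1),
    16: (4, 4, 1),
}


def total_cores(pu: int) -> int:
    """Total cores = processes * cores per process + utility cores."""
    processes, cores_per_process, utility_cores = PU_LAYOUT[pu]
    return processes * cores_per_process + utility_cores


totals = {pu: total_cores(pu) for pu in PU_LAYOUT}
# totals == {1: 2, 2: 3, 4: 5, 8: 9, 16: 17}, matching 2C, 3C, 5C, 9C, 17C
```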
16. Storage Layer – Event Hubs Premium
[Diagram: stateful backend processes (Service Fabric placement on VM2, VM4, VM6, VM8, VM10, VM12, VM14, VM16, VM18) with local NVMe drives, replicating blocks to each other over RPC and writing to Azure Storage accounts: Storage Account 1 (ZRS), Stg Acc 2, ... Stg Acc N]
Write path stages: Normalizing / Indexing, Batching, Block Store Provider, Index, At-Rest Encryption (CMK), Local Block Store
Partition X Log: Extent 1, Extent 2, Extent 3 (two sealed, one taking active writes)
• As with EH Std, partitions are mapped to accounts. But only complete extents are written to storage at once.
• Block store is a fast NVMe-based append log store. Native code, Service Fabric Replication.
• Synchronous 3x availability zone replication with flush to disk for each write: consistently under 3 ms.
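The extent idea on this slide (one extent takes active writes, full extents are sealed, and only complete extents are written to Azure Storage) can be sketched as a toy data structure. This is an illustration under loud assumptions: `ExtentLog`, the event-count extent size, and the `upload` callback are all hypothetical, and the real block store is native code with replication that this sketch does not model.

```python
class ExtentLog:
    """Toy partition log split into fixed-size extents (hypothetical)."""

    def __init__(self, extent_size, upload):
        self.extent_size = extent_size  # events per extent (assumed unit)
        self.upload = upload            # stands in for the write to storage
        self.sealed = []                # completed, immutable extents
        self.active = []                # extent currently taking writes

    def append(self, event):
        """Append to the active extent; seal it once it is full."""
        self.active.append(event)
        if len(self.active) == self.extent_size:
            self._seal()

    def _seal(self):
        extent = tuple(self.active)     # freeze the extent's contents
        self.sealed.append(extent)
        self.upload(extent)             # only whole extents are written out
        self.active = []


uploaded = []
log = ExtentLog(extent_size=3, upload=uploaded.append)
for i in range(7):
    log.append(f"event-{i}")
```

After seven appends with an extent size of three, two extents are sealed and uploaded while the seventh event waits in the active extent.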
17. Networking Features
Firewall, Virtual Network Integration with Private Endpoints
[Diagram: Gateway VMs (VM1, VM3, VM5, VM7, VM9, VM11, VM13, VM15, VM17) accepting HTTPS / WebSockets, AMQP 1.0 w/ TLS, and Apache Kafka RPC]
• Each cluster has a single public load-balancer IPv4 address. The address is generally stable and will very rarely change. Even so, use DNS-based firewall rules for your namespace.
• DNS example: contoso.servicebus.windows.net is a CNAME for ns-eh1-prod-am3-403.cloudapp.net (52.236.186.64).
• Namespace names alias the cluster DNS name. EH relies on that hostname to identify the namespace tenant, so it cannot be aliased further.
• TLS 1.2 is the default. All current, supported clients use TLS 1.2 and all traffic generally uses TLS. Legacy clients are still permitted to use TLS <1.2, customer controlled.
• TLS is terminated at the gateway VMs.
• Common, namespace-level IP filter and VNet/PEP firewall policy enforcement on each VM.
• Each cluster has an Azure-private "IPv6 Service Endpoint" address for private endpoints.
• Customer Virtual Network example: 10.0.0.0/8; Subnet 1: 10.1.1.0/24; VM: 10.1.1.28; Private Endpoint: 10.1.1.42; IPv6 SE
• Client-Side Firewalls & Proxies: WebSockets AMQP tunneling allows port 443 firewall traversal.
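The addressing in the virtual network example on this slide, and the general idea of an IP filter allow-list, can be checked with Python's standard `ipaddress` module. This is a sketch only: `is_allowed` is a hypothetical helper and does not reflect how Event Hubs evaluates its firewall rules; the addresses are the ones shown on the slide.

```python
import ipaddress

# Addresses taken from the slide's example.
vnet = ipaddress.ip_network("10.0.0.0/8")
subnet = ipaddress.ip_network("10.1.1.0/24")
client_vm = ipaddress.ip_address("10.1.1.28")
private_endpoint = ipaddress.ip_address("10.1.1.42")

# The subnet lies inside the virtual network, and both the client VM and
# the private endpoint sit in that subnet.
subnet_in_vnet = subnet.subnet_of(vnet)
vm_in_subnet = client_vm in subnet
endpoint_in_subnet = private_endpoint in subnet


def is_allowed(source_ip: str, allowed_ranges: list) -> bool:
    """Hypothetical allow-list check: permit traffic from any listed range."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(r) for r in allowed_ranges)


inside = is_allowed("10.1.1.28", ["10.1.1.0/24"])
outside = is_allowed("52.236.186.64", ["10.1.1.0/24"])
```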
19. Event Streaming with Azure
[Diagram labels]
• Stream Analytics Jobs
• Azure Data Explorer
• Real-time stream processing
• Data Lake Storage Gen2
• Storage blob
• Streaming ETL
• Big data analytics
• Azure Synapse Analytics
• Function Apps
• Kubernetes Services
• Event Streaming Apps
20. Azure Schema Registry
• Event streaming often requires structured data.
• New consumers need to understand the format of the messages.
• Validate event stream data and support the evolution of event data.
• Producers and consumers can interact without directly sharing schemas.
• Included with Azure Event Hubs at no additional cost.
21. Real-time event stream processing with Azure Stream Analytics
• Process large volumes of streaming data with sub-millisecond latencies with Azure Stream Analytics.
• Create streaming pipelines using an intuitive graphical drag-and-drop tool that is built into Event Hubs and runs on Azure Stream Analytics.
22. Capturing Event Streams
• You can capture event streams to data lakes and warehouses using the built-in Capture
feature or using Azure Stream Analytics jobs.
23. Data loading to Azure Data Explorer from Event Hubs
• Azure Data Explorer offers ingestion (data loading) from Event Hubs for near-real-time
analysis on large volumes of streaming data.
24. Event Hubs Resources
• Learn more at: https://aka.ms/eventhubs
• Check out our blogs for updates and more: https://blogs.msdn.microsoft.com/eventhubs/
• Contact us: askeventhubs@microsoft.com
• Find us on GitHub: https://github.com/Azure/azure-event-hubs
• Learn about Event Hubs on Azure Stack: https://aka.ms/eventhubsonstack
• Learn about Dedicated Event Hubs Clusters: https://aka.ms/eventhubsclusterquickstart
26. Event Hubs – High Level Architecture
[Diagram callouts]
• Coordinate ownership of partitions across multiple receivers
• Clients can use any native protocol.
• Partitions are like lanes on a freeway. More lanes, more throughput.
• Entity/Topic
• AMQP 1.0
27. Similar yet very different
Azure Event Hubs vs Apache Kafka®
User Model
• Azure Event Hubs: Partitioned event stream broker with high-availability replication
• Apache Kafka: Partitioned event stream broker with high-availability replication
Architecture
• Azure Event Hubs: Multi-tenant, 3-tier Gateway/Broker/Storage cluster model, with tenant isolation, all tiers independently scalable
• Apache Kafka: Single-tenant monolith. Need to increase broker instances in a cluster to scale any dimension.
Implementation Language
• Azure Event Hubs: C# and Native (C/C++)
• Apache Kafka: Java
Cluster Manager
• Azure Event Hubs: Azure Service Fabric (inline)
• Apache Kafka: Apache Zookeeper (external); KRaft (inline, experimental)
Partition Mapping
• Azure Event Hubs: Key hashing, client or server-side mapping of events
• Apache Kafka: Key hashing, client-side mapping of events
Consumer Partition Ownership Coordination
• Azure Event Hubs: Server-coordinated partition ownership (Kafka), client-coordinated ownership with external leader election. Parallel, direct partition reads.
• Apache Kafka: Server-coordinated partition ownership
Server Workload Balancing
• Azure Event Hubs: Dynamic and fully automated (100% hands-off). Broker resource allocation independent of partition count or ownership, flexible scaling.
• Apache Kafka: Static assignment of partitions to broker instances requiring operator intervention for rebalancing.
Storage Model
• Azure Event Hubs: Replicated log store, synchronous per-message flush-to-disk on all replicas
• Apache Kafka: Replicated log store, asynchronous flush-to-disk controlled by host file system write cache settings.
Networking
• Azure Event Hubs: Single endpoint access to all partitions, Public IP/DNS or Virtual Networking, Firewall.
• Apache Kafka: Endpoint per broker instance. Multiple IPs required. Complex network management required.
Access Control
• Azure Event Hubs: Token-based access policy model with unlimited publisher policies, Azure Active Directory role-based access control
• Apache Kafka: Local accounts, federation extensibility.
Protocols
• Azure Event Hubs: AMQP 1.0 (optional: WebSockets), HTTPS 1.1, Apache Kafka RPC
• Apache Kafka: Apache Kafka RPC
Batching / Archives
• Azure Event Hubs: Avro-packaged batch-packaging and archival to blob store
Schema Registry
• Azure Event Hubs: Schema Registry based on open CNCF Schema Registry API
• Apache Kafka: (Proprietary from commercial vendors)
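Both systems list key hashing for partition mapping. The sketch below shows the general technique of mapping an event key to a partition with a stable hash. It is an illustration only: SHA-256 is a stand-in chosen here, `partition_for` is a hypothetical helper, and neither Event Hubs nor Kafka is claimed to use this particular hash function.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index using a stable hash (illustrative)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it
    # to the range [0, partition_count).
    return int.from_bytes(digest[:8], "big") % partition_count


# Events with the same key always land in the same partition, which
# preserves their relative order within that partition.
first = partition_for("device-42", 32)
second = partition_for("device-42", 32)
```

Changing `partition_count` changes the mapping for most keys, which is one reason partition counts are usually chosen with future throughput in mind.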