Ingestion in data pipelines with Managed Kafka Clusters in Azure HDInsight
4. Azure HDInsight: Cloud Spark and Hadoop service for your enterprise (Spark, Hive, MR, LLAP, Kafka, HBase, Storm)
• Reliable open source, 99.9% availability SLA
• Monitoring (OMS)
• Visual Studio, IntelliJ and Eclipse support for developers and data scientists
• Enterprise-grade security: Kerberos, Apache Ranger
• Install and use big data applications from Azure Marketplace
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
7. • Managed Kafka clusters with a 99.9% SLA
• Native integration with Azure Managed Disks, which allows for lower costs and higher scale
• Scalable on-demand clusters: Kafka clusters with 16 TB/node and ZooKeeper up and running in 15 minutes
• Rack awareness for Kafka on the Azure cloud
• Alerting and predictive cluster maintenance through Azure Operations Management Suite
• Extensibility via one-click deployment of leading ISV applications such as StreamSets
• Disaster recovery support via MirrorMaker
• Deploy end-to-end streaming pipelines with Storm, Spark, and Storage in the same VNet via automated ARM templates
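The 16 TB/node figure in the list above can be turned into a rough broker-count estimate. The sketch below is only an illustration: the function name `brokers_needed`, the retention period, the replication factor, and the 20% disk headroom are all assumptions, not values from the slides.

```python
import math


def brokers_needed(ingress_tb_per_day, retention_days, replication_factor,
                   disk_tb_per_node=16.0, headroom=0.2):
    """Estimate how many brokers are needed to hold the retained data.

    disk_tb_per_node defaults to the 16 TB/node figure from the slide.
    headroom (an assumed value) is the fraction of each disk kept free.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Total bytes on disk: every retained day is stored once per replica.
    stored_tb = ingress_tb_per_day * retention_days * replication_factor
    usable_tb_per_node = disk_tb_per_node * (1 - headroom)
    return math.ceil(stored_tb / usable_tb_per_node)


if __name__ == "__main__":
    # Hypothetical workload: 10 TB/day, 7-day retention, 3 replicas.
    print(brokers_needed(10, 7, 3))  # prints 17
```

This only sizes for disk capacity. Network throughput, partition counts, and CPU may call for more brokers than the storage estimate suggests.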
8. Modern Data Warehouse: Real-time analytics
Stages: Ingest, Store, Prep & Train, Model & Serve, Intelligence
• Unstructured data
• Azure HDInsight (Kafka)
• Azure storage
• Azure Databricks (Spark)
• Azure HDInsight (Spark)
• SQL DW
• Azure HDInsight (LLAP)
• Analytic Dashboards
13. Siphon on HDInsight Kafka
• 8 million events per second peak ingress
• 800 TB ingress per day (10 GB per second)
• 1,800 production Kafka brokers; 450 topics
• 15-second 99th percentile latency
Key customer scenarios:
• Ads Monetization (Fast BI)
• O365 Customer Fabric NRT: tenant and user insights
• Bing NRT operational intelligence
• Presto (Fast SML) interactive analysis
• Delve Analytics
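As a quick sanity check on the Siphon figures above, the arithmetic can be sketched as follows. The helper names are hypothetical and decimal units (1 TB = 1000 GB) are assumed. The per-event size is only a rough illustration, because it mixes the slide's peak event rate with its rounded byte rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_gb_per_sec(tb_per_day):
    """Convert a daily volume in TB to an average rate in GB/s (decimal units)."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


def avg_event_bytes(gb_per_sec, events_per_sec):
    """Rough bytes per event implied by a byte rate and an event rate."""
    return gb_per_sec * 1e9 / events_per_sec


if __name__ == "__main__":
    # 800 TB/day averages out to a little over 9 GB/s,
    # in line with the slide's rounded "10 GB per sec".
    print(round(tb_per_day_to_gb_per_sec(800), 2))
    # 10 GB/s spread over 8 million events/s implies events of about 1.25 KB.
    print(avg_event_bytes(10, 8_000_000))
```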
[Chart: Siphon Data Volume (Ingress and Egress), Jan-15 to Dec-16. Y-axis: Throughput (in GBps), 0 to 45. Series: Volume published (GBps), Volume subscribed (GBps).]
[Chart: Siphon Events per second (Ingress and Egress), Jan-15 to Dec-16. Y-axis: Throughput (events per sec, millions), 0 to 25. Series: EPS In, EPS Out.]
17. Getting Started with Kafka for HDInsight
Structured Streaming with HDInsight Kafka and Spark
Deploy HDInsight Kafka + Storm
Stream data from on-premise to HDInsight Kafka in the cloud
https://academy.microsoft.com/en-us/professional-program/big-data/
https://www.pluralsight.com/courses/spark-kafka-cassandra-applying-lambda-architecture
https://azure.microsoft.com/en-us/blog/announcing-apache-kafka-for-azure-hdinsight-general-availability/
https://azure.microsoft.com/en-us/blog/announcing-public-preview-of-apache-kafka-on-hdinsight-with-azure-managed-disks/