Data Pipelines with Azure Synapse:
Real-life scenarios and solutions
Dustin Vannoy
dustin@dustinvannoy.com
Dustin Vannoy
Consultant – Data Engineer
/in/dustinvannoy
youtube.com/DustinVannoy
dustinvannoy.com
@dustinvannoy
Data Engineering SD meetup
Technologies
➢ Azure & AWS
➢ Apache Spark
➢ Apache Kafka
➢ Azure Synapse Analytics
➢ Python & Scala
Agenda
What is a Data Pipeline?
Technology Overview
Scenario 1: Ingest from Azure Storage
Scenario 2: Ingest from SQL Server
Scenario 3: Ingest streaming data
What is a Data Pipeline?
Defining Data Pipeline (General)
A set of jobs that move data from one place to another, processing it along the way.
Defining Data Pipeline (Typical Use)
The process of bringing data into a data lake or data warehouse, including cleaning, enriching, and transforming the data.
Data Lake Defined
• Big Data Capable: store first, evaluate and model later
• Data Zones: Raw, Enriched, Curated / Certified
• Ready for Analysts: a query layer, plus access for other analytic tools
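One common way to realize the Raw, Enriched, and Curated zones is as top-level folders in a single ADLS Gen2 container. The sketch below assumes that layout; the account `mydatalake`, container `lake`, and the helpers `zone_path` and `next_zone` are hypothetical. It builds `abfss://` URIs (the Hadoop-compatible scheme Spark uses for ADLS Gen2) and reports which zone a dataset is promoted to next.

```python
from typing import Optional

# Hypothetical names; substitute your own storage account and container.
ACCOUNT = "mydatalake"
CONTAINER = "lake"
ZONES = ("raw", "enriched", "curated")  # ordered from least to most refined


def zone_path(zone: str, dataset: str,
              account: str = ACCOUNT, container: str = CONTAINER) -> str:
    """Return the ADLS Gen2 (abfss://) path of a dataset within a lake zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone {zone!r}; expected one of {ZONES}")
    return (f"abfss://{container}@{account}.dfs.core.windows.net/"
            f"{zone}/{dataset.strip('/')}")


def next_zone(zone: str) -> Optional[str]:
    """Return the zone a dataset is promoted to next, or None after the last one."""
    i = ZONES.index(zone)
    return ZONES[i + 1] if i + 1 < len(ZONES) else None


print(zone_path("raw", "sales/orders"))
# abfss://lake@mydatalake.dfs.core.windows.net/raw/sales/orders
```

Keeping zone names in one tuple means a typo such as `"currated"` fails fast instead of silently creating a new folder.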
Data Warehouse Defined
• Structured Data: processed and modeled for analytics use
• Interactive Query: analysts can get answers to questions quickly
• BI Tool Support: reporting tools can query efficiently
[Diagram] Pipeline stages: Collect → Clean → Enrich → Curate → Make Available
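The stages in this diagram (Collect, Clean, Enrich, Curate, Make Available) map naturally onto a chain of small jobs, each taking the previous job's output. A minimal plain-Python sketch of that idea; the sample rows, the country lookup table, and all function names are hypothetical, and in Synapse each step would typically be a Spark transformation rather than a list comprehension.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]

# Hypothetical lookup table used by the enrich step.
COUNTRY_NAMES = {"us": "United States", "ca": "Canada"}


def collect() -> List[Row]:
    """Stand-in for reading from a source system (hard-coded sample rows)."""
    return [
        {"id": 3, "name": "Bob", "country": "us"},
        {"id": 1, "name": "  Alice ", "country": "ca"},
        {"id": 2, "name": None, "country": "us"},
    ]


def clean(rows: List[Row]) -> List[Row]:
    """Drop rows without a name and trim surrounding whitespace."""
    return [{**r, "name": str(r["name"]).strip()} for r in rows if r.get("name")]


def enrich(rows: List[Row]) -> List[Row]:
    """Add a readable country name from the lookup table."""
    return [{**r, "country_name": COUNTRY_NAMES.get(str(r["country"]), "Unknown")}
            for r in rows]


def curate(rows: List[Row]) -> List[Row]:
    """Keep only the columns analysts need, ordered by id."""
    keep = ("id", "name", "country_name")
    return sorted(({k: r[k] for k in keep} for r in rows), key=lambda r: r["id"])


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each job step in order; the output of one step feeds the next."""
    for step in steps:
        rows = step(rows)
    return rows


curated = run_pipeline(collect(), [clean, enrich, curate])
# "Make Available" would write `curated` to a lake zone or warehouse table.
print(curated)
```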
Data Ingestion Decisions
Do we use Azure Data Factory or Synapse Pipelines?
How do we schedule and orchestrate job steps?
How do we monitor job success?
Do we attempt to validate data quality?
Is any field-level encryption required?
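For the data-quality question, one lightweight option is to split each load into valid and rejected rows, so bad records can be routed to a quarantine location instead of failing the whole run. A minimal plain-Python sketch; the rules (required columns, expected types), the sample orders, and the `validate` helper are hypothetical, and in Synapse Spark the same checks would usually be written as DataFrame filters.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def validate(rows: List[Row], required: List[str],
             types: Dict[str, type]) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Split rows into valid rows and rejected rows paired with their error messages."""
    valid: List[Row] = []
    rejected: List[Tuple[Row, List[str]]] = []
    for row in rows:
        errors = [f"missing {col}" for col in required if row.get(col) is None]
        errors += [
            f"{col} is not {expected.__name__}"
            for col, expected in types.items()
            if row.get(col) is not None and not isinstance(row[col], expected)
        ]
        if errors:
            rejected.append((row, errors))
        else:
            valid.append(row)
    return valid, rejected


rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": "n/a"},
]
good, bad = validate(rows, required=["order_id", "amount"],
                     types={"order_id": int, "amount": float})
print(len(good), [errs for _, errs in bad])
```

Keeping the error messages alongside each rejected row makes the quarantine data self-explaining when someone investigates it later.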
Technology Overview
Azure Data Lake Storage Gen2
• Built on Azure Blob Storage
• Hadoop-compatible access
• Optimized for cloud analytics
• Low cost: $
Azure Synapse Analytics
• Managed Apache Spark
• Synapse Pipelines
• Serverless & Dedicated SQL
• Data Explorer
Synapse Capabilities
• Serverless Apache Spark for data processing and exploration
• Synapse Pipelines for no-code or low-code data ingestion
• Serverless SQL for easy querying
• Dedicated SQL for high-performance analytic queries using an MPP (massively parallel processing) database
Ingest from Azure Storage
Synapse Data Lake Ingest
[Diagram: Sources → Synapse Spark → Azure Data Lake Storage]
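When Synapse Spark ingests files that land in Azure Storage, each run has to decide which files are new. A minimal sketch, assuming the job keeps a high-water mark of the latest file modification time it has processed; the hard-coded listing stands in for a real storage listing, and `LandedFile`, `files_to_ingest`, and the paths are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional, Tuple


class LandedFile(NamedTuple):
    path: str
    modified: datetime


def files_to_ingest(listing: List[LandedFile],
                    watermark: Optional[datetime]) -> Tuple[List[str], Optional[datetime]]:
    """Return paths modified after the watermark (oldest first) and the new watermark."""
    new = sorted((f for f in listing if watermark is None or f.modified > watermark),
                 key=lambda f: f.modified)
    new_watermark = new[-1].modified if new else watermark
    return [f.path for f in new], new_watermark


def ts(day: int) -> datetime:
    return datetime(2022, 12, day, tzinfo=timezone.utc)


listing = [
    LandedFile("landing/orders_01.csv", ts(1)),
    LandedFile("landing/orders_03.csv", ts(3)),
    LandedFile("landing/orders_02.csv", ts(2)),
]
first, wm = files_to_ingest(listing, None)  # first run: every file is new
second, wm = files_to_ingest(listing + [LandedFile("landing/orders_04.csv", ts(4))], wm)
print(first, second)
```

A strict `>` comparison can skip a file that lands later with exactly the watermark timestamp; if that matters, track processed file names as well as the timestamp.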
Why Spark?
Big data and the cloud changed our mindset: we want tools that scale easily as data size grows.
⮚ Fast, general-purpose data processing
⮚ Simple code for distributed processing
⮚ Many options to develop and run
Simple code, parallel compute
[Diagram: one Controller distributing work to four Workers]
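The controller/worker picture can be mimicked in plain Python to show the split, process, combine pattern that Spark automates. This is an analogy only, not Spark: a thread pool stands in for the four workers (threads share one interpreter, so it shows the structure rather than a real CPU speedup), and the function names are hypothetical. In Spark the developer writes only the transformation; the engine handles partitioning, scheduling, and combining results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(data: List[int], n: int) -> List[List[int]]:
    """Split data into n roughly equal, contiguous partitions (the controller's job)."""
    size, extra = divmod(len(data), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def process_partition(part: List[int]) -> int:
    """Work a single worker does on its partition: a sum of squares."""
    return sum(x * x for x in part)


def run(data: List[int], workers: int = 4) -> int:
    """Controller: send one partition to each worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_partition, partition(data, workers)))
    return sum(partials)


print(run(list(range(1, 11))))  # 385, same as a single-threaded sum of squares
```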
Demo: Azure Storage Ingest
Ingest from SQL Server
Ingest from SQL Server
How can I keep the table schema?
How will I maintain this as new tables get added?
How will I deal with new or removed columns?
Can I do a full reload of every table for every run?
Is the SQL Server outside of our Azure virtual network?
Can a private endpoint be configured easily?
Do I need to add specific IPs to an allow list?
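Two of the questions above (new or removed columns, and full reload versus incremental load) can be answered with small pieces of metadata logic. A minimal sketch; `schema_changes`, `extract_query`, the table `dbo.Customer`, and its columns are hypothetical. Table and column names are assumed to come from trusted configuration, and the `?` placeholder assumes a DB-API driver with qmark parameters (such as pyodbc) binds the watermark value.

```python
from typing import Dict, List, Optional


def schema_changes(source_cols: List[str], target_cols: List[str]) -> Dict[str, List[str]]:
    """Report columns added to or removed from the source since the target was built."""
    src, tgt = set(source_cols), set(target_cols)
    return {"added": [c for c in source_cols if c not in tgt],
            "removed": [c for c in target_cols if c not in src]}


def extract_query(table: str, columns: List[str],
                  watermark_col: Optional[str] = None) -> str:
    """Build a SELECT for a full reload, or an incremental one when a watermark column is set."""
    cols = ", ".join(f"[{c}]" for c in columns)  # T-SQL bracket quoting
    query = f"SELECT {cols} FROM {table}"
    if watermark_col:
        query += f" WHERE [{watermark_col}] > ?"
    return query


print(schema_changes(["id", "name", "email", "updated_at"],
                     ["id", "name", "phone", "updated_at"]))
# {'added': ['email'], 'removed': ['phone']}
print(extract_query("dbo.Customer", ["id", "name"], watermark_col="updated_at"))
# SELECT [id], [name] FROM dbo.Customer WHERE [updated_at] > ?
```

Tables without a reliable modified-date column fall back to the full-reload form by leaving `watermark_col` unset.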
Demo: SQL Server Ingest
Ingest from Event Stream
Synapse Spark Streaming
[Diagram: Sources → Apache Kafka → Synapse Spark → Data Lake Storage]
Why Kafka?
Apache Kafka is a scalable message broker / distributed log.
Producers can publish quickly and move on, while the data is persisted for all consumers.
It is a reliable place to stream events, decoupled from the destination.
Apache Kafka
• Distributed log (message broker)
• Decouples producer and consumer
• Durable storage
• Low latency
• High scalability
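To illustrate why a log decouples producers from consumers, here is a toy single-partition log in plain Python. It is not Kafka's API: the `Log` class and its methods are hypothetical, and there is no persistence, replication, or partitioning. It does show the core idea: records are appended once and kept, and each consumer group tracks its own read offset, so a slow consumer never blocks the producer or other consumers.

```python
from typing import Dict, List


class Log:
    """Single-partition, in-memory append-only log (a toy stand-in for a Kafka partition)."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}  # next offset to read, per consumer group

    def publish(self, record: str) -> int:
        """Append a record and return its offset; the producer does not wait for consumers."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return unread records for a consumer group and advance that group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = Log()
for event in ["signup", "post", "like"]:
    log.publish(event)
print(log.poll("data-lake-writer"))  # ['signup', 'post', 'like']
print(log.poll("dashboard", 2))      # ['signup', 'post']
print(log.poll("dashboard"))         # ['like']
```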
Hub for streaming data
[Diagram: User data and Post data flow through Apache Kafka / Event Hubs to the Data Lake, a User Dashboard, and a Real-time report]
What is Spark Structured Streaming?
"The simplest way to perform streaming analytics is not having to reason about streaming at all"
- Tathagata Das “TD”
A table that is constantly appended with each micro-batch
Reference: https://youtu.be/rl8dIzTpxrI
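The "table that is constantly appended" idea can be modeled in a few lines: each micro-batch appends rows to an input table, and the query is written as if that table were static. A toy sketch in plain Python; `UnboundedTable` is hypothetical and is not how Spark is implemented.

```python
from collections import Counter
from typing import List


class UnboundedTable:
    """Toy model of Structured Streaming's input: a table that keeps growing."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def append_batch(self, batch: List[str]) -> None:
        """Each micro-batch appends its new rows to the input table."""
        self.rows.extend(batch)

    def query(self) -> Counter:
        """A query written as if over a static table: count rows per value."""
        return Counter(self.rows)


table = UnboundedTable()
for batch in (["click", "view"], ["click"], ["view", "view"]):
    table.append_batch(batch)
    print(dict(table.query()))
```

Spark does not rescan every row on each trigger; it keeps running state and updates the result incrementally, but the answer matches what the same query would return over the whole table.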
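Structured Streaming - Read
# spark: the notebook's SparkSession; consumer_config: a dict of Kafka
# source options (e.g. kafka.bootstrap.servers, subscribe) defined elsewhere
df = (spark.readStream
      .format("kafka")
      .options(**consumer_config)
      .load())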
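The `consumer_config` dict in the read example is not shown on the slide. A minimal sketch of one way to build it, assuming the standard Spark Kafka source options (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`) and, optionally, the Event Hubs Kafka endpoint, which uses SASL_SSL with the PLAIN mechanism, the literal username `$ConnectionString`, and the connection string as the password. The helper name, namespace, and topic are hypothetical; on runtimes that shade the Kafka client the login-module class name differs, so verify it for your Spark pool.

```python
from typing import Dict, Optional


def kafka_consumer_config(bootstrap_servers: str, topic: str,
                          starting_offsets: str = "latest",
                          event_hubs_connection_string: Optional[str] = None) -> Dict[str, str]:
    """Build options for spark.readStream.format("kafka").options(**config)."""
    config = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,  # "earliest" or "latest"
    }
    if event_hubs_connection_string:
        # Event Hubs Kafka endpoint: SASL_SSL + PLAIN, username is the literal "$ConnectionString".
        jaas = ('org.apache.kafka.common.security.plain.PlainLoginModule required '
                f'username="$ConnectionString" password="{event_hubs_connection_string}";')
        config.update({
            "kafka.security.protocol": "SASL_SSL",
            "kafka.sasl.mechanism": "PLAIN",
            "kafka.sasl.jaas.config": jaas,
        })
    return config


# Hypothetical namespace and topic; keep the real connection string in a secret store.
consumer_config = kafka_consumer_config(
    "my-namespace.servicebus.windows.net:9093", "user-events",
    event_hubs_connection_string="<connection-string>")
```

The Kafka source returns `key` and `value` as binary columns, so a typical next step is `df.selectExpr("CAST(value AS STRING) AS json")` before parsing the payload.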
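Structured Streaming - Write
# producer_config: Kafka sink options (e.g. kafka.bootstrap.servers, topic);
# df needs a string or binary "value" column to be written to Kafka
query = (df.writeStream
         .format("kafka")
         .options(**producer_config)
         .option("checkpointLocation", "/tmp/cp001")
         .start())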
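Writing to Kafka only works when the DataFrame has the columns the Kafka sink expects: `value` is required, and `key`, `headers`, `topic`, and `partition` are optional. A minimal sketch of a pre-flight check plus the `producer_config` dict the slide assumes; the helper names and the topic are hypothetical.

```python
from typing import Dict, Iterable, List

# Columns the Spark Kafka sink understands; "value" is the only required one.
KAFKA_SINK_COLUMNS = {"key", "value", "headers", "topic", "partition"}


def kafka_producer_config(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Options for df.writeStream.format("kafka").options(**config)."""
    return {"kafka.bootstrap.servers": bootstrap_servers, "topic": topic}


def check_sink_columns(columns: Iterable[str]) -> List[str]:
    """Return problems to fix before writing a DataFrame with these columns to Kafka."""
    cols = set(columns)
    problems = []
    if "value" not in cols:
        problems.append("missing required 'value' column")
    extra = sorted(cols - KAFKA_SINK_COLUMNS)
    if extra:
        problems.append(f"columns not used by the Kafka sink: {extra}; "
                        "pack them into 'value' (for example as JSON) if they should be sent")
    return problems


producer_config = kafka_producer_config("broker:9092", "enriched-events")
print(check_sink_columns(["key", "value"]))  # []
print(check_sink_columns(["id", "amount"]))
```

In PySpark, packing all columns into a JSON payload is commonly done with `to_json(struct("*")).alias("value")` from `pyspark.sql.functions`.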
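Structured Streaming – Checkpoint
# The checkpoint location stores the query's offsets and progress,
# so a restarted query resumes where it left off
query = (df.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/chkpnt/dq1")
         .start("/tmp/demo_out"))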
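What `checkpointLocation` buys you is resumability: progress is written to durable storage, and a restarted query picks up from there instead of reprocessing. A toy plain-Python sketch of that behavior; `run_micro_batches` and the JSON offset file are hypothetical and much simpler than Spark's actual checkpoint layout.

```python
import json
import os
import tempfile
from typing import List


def run_micro_batches(source: List[str], checkpoint_dir: str,
                      sink: List[str], max_per_batch: int = 2) -> int:
    """Process unread records in micro-batches, committing the next offset after each one.

    Returns the number of batches run. A restart reads the checkpoint first,
    so records that were already committed are not processed again.
    """
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, "offset.json")
    offset = 0
    if os.path.exists(path):
        with open(path) as f:
            offset = json.load(f)["next_offset"]
    batches = 0
    while offset < len(source):
        batch = source[offset:offset + max_per_batch]
        sink.extend(batch)              # write the batch to the sink
        offset += len(batch)
        with open(path, "w") as f:      # then commit progress
            json.dump({"next_offset": offset}, f)
        batches += 1
    return batches


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "chkpnt", "dq1")
    events, sink = ["e1", "e2", "e3"], []
    first = run_micro_batches(events, checkpoint, sink)   # processes e1..e3
    events.append("e4")                                   # new data arrives
    second = run_micro_batches(events, checkpoint, sink)  # "restart": resumes at e4
    print(first, second, sink)
```

In this toy, a crash between writing a batch and committing would replay that batch. Spark records planned offsets in the checkpoint before running each batch and relies on sinks such as Delta to make replays idempotent, which is how it offers end-to-end exactly-once delivery with those sinks.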
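Structured Streaming – Output Mode
# outputMode "append" writes only new rows each trigger; "complete"
# (whole result table) and "update" (changed rows only) are the other modes
query = (df.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/chkpnt/dq1")
         .start("/tmp/demo_out"))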
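The three output modes differ in what each trigger writes to the sink. A plain-Python simulation over two micro-batches; `emitted_by_mode` is hypothetical. Append is shown on the raw rows because, in Spark, append mode on a streaming aggregation requires a watermark.

```python
from collections import Counter
from typing import List, Tuple


def emitted_by_mode(batches: List[List[str]]) -> Tuple[list, list, list]:
    """Show what each output mode writes per trigger for a running count of values.

    append:   only the rows that arrived in this trigger (no aggregation)
    complete: the entire updated result table
    update:   only the result rows that changed in this trigger
    """
    counts: Counter = Counter()
    append_out, complete_out, update_out = [], [], []
    for batch in batches:
        append_out.append(list(batch))
        counts.update(batch)
        complete_out.append(dict(sorted(counts.items())))
        update_out.append({key: counts[key] for key in sorted(set(batch))})
    return append_out, complete_out, update_out


append_out, complete_out, update_out = emitted_by_mode([["a", "b", "a"], ["b", "c"]])
print(complete_out[-1])  # {'a': 2, 'b': 2, 'c': 1}
print(update_out[-1])    # {'b': 2, 'c': 1}
```

Complete mode rewrites everything on every trigger, so it suits small aggregated results; update mode writes less but needs a sink that can apply row updates.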
Spark Streaming Benefits
● Re-use Spark batch code
● Stateful streaming and joins
● Mature with many integrations
● Kafka or Event Hubs not required
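Reusing batch code means a transformation written once can run on both `spark.read` and `spark.readStream` DataFrames, as long as it only uses operations streaming supports. The same principle in plain Python, with lists standing in for DataFrames; the order fields, the 8% tax rate, and the function names are hypothetical. The equivalence shown holds for stateless, row-by-row logic; aggregations and joins across micro-batches need the stateful features listed above.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def transform(rows: Iterable[Row]) -> List[Row]:
    """Business logic written once: keep paid orders and add a tax-inclusive total."""
    return [{**r, "total": round(float(r["amount"]) * 1.08, 2)}
            for r in rows if r["status"] == "paid"]


def run_batch(rows: List[Row]) -> List[Row]:
    """Batch: apply the transformation to the full dataset at once."""
    return transform(rows)


def run_stream(micro_batches: Iterable[List[Row]]) -> List[Row]:
    """Streaming: apply the same function to each micro-batch and append the results."""
    out: List[Row] = []
    for batch in micro_batches:
        out.extend(transform(batch))
    return out


orders = [
    {"id": 1, "status": "paid", "amount": 10},
    {"id": 2, "status": "open", "amount": 5},
    {"id": 3, "status": "paid", "amount": 20},
]
print(run_batch(orders) == run_stream([orders[:2], orders[2:]]))
```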
Demo: Ingest Event Stream
Final Thoughts
Session Feedback Surveys
In the pursuit of making our conferences even better, we need to hear your
feedback about this session.
Here’s How -
▪ Simply go to the Whova App on your smartphone
▪ Go to the conference homepage
▪ Scroll down to ‘Additional Resources’ and click ‘Surveys’.
▪ Click ‘Session Feedback’.
▪ Scroll down to click on this session title.
▪ Complete the session feedback survey.
▪ Finally, click ‘Submit’
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
