Slide deck for the third part of the series on building a Modern Data Warehouse using Azure. This session covered Azure Synapse Analytics (formerly Azure SQL Data Warehouse). We look at the Azure Synapse architecture, external files, and integration with Azure Data Factory.
The recording of the session is available on YouTube:
https://www.youtube.com/watch?v=LZlu6_rFzm8&WT.mc_id=DP-MVP-5003170
6. Part 1 - Recap - ADLS & ADF
ADLS - Main features
• Petabyte-scale storage
• Hierarchical namespace
• Hadoop-compatible access with the ABFS driver
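The ABFS driver addresses ADLS Gen2 data through URIs of the form `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`, where the hierarchical namespace makes the path a real directory tree. A minimal sketch of building and splitting such URIs, assuming the public-cloud `dfs.core.windows.net` endpoint; the account and container names in the usage line are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: public Azure cloud endpoint; sovereign clouds use other suffixes.
DFS_SUFFIX = ".dfs.core.windows.net"


def abfs_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI; abfss:// is the TLS-secured variant."""
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{container}@{account}{DFS_SUFFIX}/{path.lstrip('/')}"


def parse_abfs_uri(uri: str) -> dict:
    """Split an ABFS URI into account, container (file system) and path."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    container, _, host = parsed.netloc.partition("@")
    if not container or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"expected <container>@<account>{DFS_SUFFIX}: {uri}")
    return {
        "account": host[: -len(DFS_SUFFIX)],
        "container": container,
        "path": parsed.path.lstrip("/"),
    }
```

For example, `abfs_uri("contosolake", "sales", "raw/2020/orders.csv")` yields `abfss://sales@contosolake.dfs.core.windows.net/raw/2020/orders.csv`, and `parse_abfs_uri` reverses it.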
ADF - Main features
• Cloud ETL service
• Scale-out serverless data integration & data transformation
• Code-free UI
• Monitoring & management
7. Part 2 - Recap
Azure Databricks - Main features
• Collaborative, Spark-based analytics service
• Different cluster types (automated / interactive / pool)
• Autoscaling based on workload
• Fine-grained access controls
9. Parallelism
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared nothing: each CPU has its own memory and disk (scale-out)
• Segments communicate over a high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• Traditional SQL Server implementations are SMP
• Typically housed on a shared SAN
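The MPP "shared nothing" idea can be sketched as scatter, local work, gather: rows are hashed on a distribution column into independent buckets, each bucket is aggregated on its own, and only the small partial results are combined. A minimal sketch, assuming 60 buckets (the number of distributions in a Synapse dedicated SQL pool) and using `crc32` as a stand-in for Synapse's internal hash function; threads stand in for compute nodes, and the row data in the usage line is hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from zlib import crc32

# 60 distributions, as in a Synapse dedicated SQL pool.
# crc32 is an illustrative stand-in for Synapse's own hash function.
NUM_DISTRIBUTIONS = 60


def distribution_for(key: str, n: int = NUM_DISTRIBUTIONS) -> int:
    """Deterministically map a distribution-column value to a bucket id."""
    return crc32(key.encode("utf-8")) % n


def distribute(rows, key_column, n=NUM_DISTRIBUTIONS):
    """Scatter rows into buckets; each bucket is owned by exactly one 'node'."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[distribution_for(row[key_column], n)].append(row)
    return buckets


def local_sum(bucket, group_column, value_column):
    """Partial SUM ... GROUP BY over one bucket, touching no other bucket."""
    partial_totals = defaultdict(float)
    for row in bucket:
        partial_totals[row[group_column]] += row[value_column]
    return partial_totals


def mpp_sum(rows, key_column, group_column, value_column, n=NUM_DISTRIBUTIONS):
    """Scatter, aggregate every bucket concurrently, then gather the partials."""
    buckets = distribute(rows, key_column, n)
    worker = partial(local_sum, group_column=group_column, value_column=value_column)
    totals = defaultdict(float)
    with ThreadPoolExecutor() as pool:
        for partial_totals in pool.map(worker, buckets.values()):
            for group, value in partial_totals.items():
                totals[group] += value
    return dict(totals)
```

With `rows = [{"customer": "c1", "region": "east", "amount": 10.0}, {"customer": "c2", "region": "west", "amount": 5.0}, {"customer": "c3", "region": "east", "amount": 2.5}]`, `mpp_sum(rows, "customer", "region", "amount")` returns `{"east": 12.5, "west": 5.0}`. The final combine step exists because the grouping column differs from the distribution column, so rows of one group can live on several nodes.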
16. External Data Sources
• External Data Source
  • Hadoop, ADLS
• External File Format
  • File types
    • Delimited Text, Hive RCFile, Hive ORC file, Parquet
  • Data compression
    • Gzip, Snappy
  • Field delimiters
  • Date format
• External Table
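The three objects build on each other: the external table points at a folder in an external data source and reads it through an external file format. A minimal sketch that renders the corresponding T-SQL DDL for a dedicated SQL pool, assuming PolyBase with `TYPE = HADOOP` over ADLS Gen2 and an existing database scoped credential. Only delimited text and Parquet are covered, and every object, credential, and column name in the usage line is hypothetical.

```python
from typing import List, Optional, Tuple

# Hadoop codec class names used in DATA_COMPRESSION.
CODECS = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}
# Codecs this sketch allows per file type (a subset of what PolyBase accepts).
ALLOWED = {"delimited": {"gzip"}, "parquet": {"gzip", "snappy"}}


def external_table_ddl(
    *,
    source_name: str,
    location: str,
    credential: str,
    format_name: str,
    file_type: str,
    table_name: str,
    columns: List[Tuple[str, str]],
    folder: str,
    compression: Optional[str] = None,
    field_terminator: str = ",",
    date_format: str = "yyyy-MM-dd",
) -> str:
    """Render data source, file format and external table DDL.

    Values are interpolated without escaping: do not pass untrusted input.
    """
    if file_type == "delimited":
        # Field delimiter and date format apply only to delimited text.
        fmt = (
            "FORMAT_TYPE = DELIMITEDTEXT,\n"
            f"    FORMAT_OPTIONS (FIELD_TERMINATOR = '{field_terminator}', "
            f"DATE_FORMAT = '{date_format}')"
        )
    elif file_type == "parquet":
        fmt = "FORMAT_TYPE = PARQUET"
    else:
        raise ValueError(f"unsupported file type in this sketch: {file_type}")
    if compression is not None:
        if compression not in ALLOWED[file_type]:
            raise ValueError(f"{compression} is not valid for {file_type} here")
        fmt += f",\n    DATA_COMPRESSION = '{CODECS[compression]}'"
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return "\n\n".join([
        f"CREATE EXTERNAL DATA SOURCE {source_name}\n"
        f"WITH (TYPE = HADOOP, LOCATION = '{location}', CREDENTIAL = {credential});",
        f"CREATE EXTERNAL FILE FORMAT {format_name}\nWITH (\n    {fmt}\n);",
        f"CREATE EXTERNAL TABLE {table_name} (\n    {cols}\n)\n"
        f"WITH (LOCATION = '{folder}', DATA_SOURCE = {source_name}, "
        f"FILE_FORMAT = {format_name});",
    ])
```

For example, `print(external_table_ddl(source_name="SalesLake", location="abfss://sales@contosolake.dfs.core.windows.net", credential="LakeCredential", format_name="GzipCsv", file_type="delimited", compression="gzip", table_name="ext.Orders", columns=[("OrderId", "INT"), ("OrderDate", "DATE")], folder="/orders/"))` prints the three statements in dependency order. Keeping them in one function makes the dependency explicit: the table cannot be created before its data source and file format exist.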
17. What workloads are NOT suitable?
Operational workloads (OLTP)
• High-frequency reads and writes.
• Large numbers of singleton selects.
• High volumes of single-row inserts.
Data preparations
• Row-by-row processing needs.
• Incompatible formats (XML).
18. What workloads are suitable?
Analytics
• Store large volumes of data.
• Consolidate disparate data into a single location.
• Shape, model, transform, and aggregate data.
• Batch/micro-batch loads.
• Perform query analysis across large datasets.
• Ad-hoc reporting across large data volumes.
• All using simple SQL constructs.
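The contrast between high volumes of single-row inserts and batch/micro-batch loads comes down to how many load operations the warehouse has to process. A minimal sketch of grouping incoming rows into micro-batches, with a deliberately simple, hypothetical cost model of one load operation per batch; real load costs in Synapse depend on the loading method and table design.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def load_operations(row_count: int, batch_size: int) -> int:
    """Hypothetical cost model: one load operation per batch (ceiling division)."""
    return -(-row_count // batch_size)
```

For example, `list(micro_batches(range(5), 2))` gives `[[0, 1], [2, 3], [4]]`, and under this model loading 1,000,000 rows one at a time takes 1,000,000 operations while batches of 100,000 take 10.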
19. Summary
Azure Synapse - Main features
• MPP architecture
• Compute can be paused
• Optimized for analytics workloads
• Supports multiple external file formats
• Works with PolyBase
20. SQL Server & SQL Data Warehouse Differences
Azure Synapse
Workload Management
External Data Source
External File Formats
External Table
SQL Data Warehouse Benchmark
21. References - MS Learn
https://docs.microsoft.com/en-us/learn/paths/implement-sql-data-warehouse
22. Thank you very much
Code with Passion and Strive for Excellence
https://www.slideshare.net/nileshgule/presentations
https://speakerdeck.com/nileshgule/
23. Nilesh Gule
ARCHITECT | MICROSOFT MVP
"Code with Passion and Strive for Excellence"
nileshgule @nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com