Building Merge on Read on Delta Lake
Justin Breese
Senior Solutions Architect
Nick Karpov
Resident Solutions Architect
Who are we?
Justin Breese
justin.breese@databricks.com | Los Angeles
Senior Strategic Solutions Architect
I pester Nick with a lot of questions and thoughts
Nick Karpov
nick.karpov@databricks.com | San Francisco
Senior Resident Solutions Architect
History & Music
Agenda
▪ Background: Copy on Write (COW) & Merge on Read (MOR)
▪ Use case, challenges, & MOR strategies
▪ Testing: choosing the right MOR strategy
▪ Rematerialization?
Problem statement(s)
▪ Dealing with highly random and update heavy CDC streams
▪ Wanting to be able to get fresh data at any given time
Summary
▪ Using MOR allows for faster writes while still getting reads that meet SLAs
Building Merge on Read on Delta Lake
▪ What is Merge on Read (MOR) and Copy on Write (COW)?
▪ What is the use case?
▪ Why did we build it?
▪ What is the architecture?
▪ How to test and verify it?
Copy on Write (COW) and Merge on Read (MOR)
Copy on Write (COW)
▪ TL;DR the merge is done during the write
▪ Default config for Delta Lake
▪ Data is “merged” into a Delta table by physically rewriting existing files with modifications before making them available to the reader
▪ In Delta Lake, merge is a three-step process
▪ Great for write-once, read-many scenarios
Delta Lake Merge - Under the hood
▪ source: new data, target: existing data (Delta table)
▪ Phase 1: Find the input files in target that are touched by the rows that
satisfy the condition and verify that no two source rows match with the
same target row [innerJoin]
▪ Phase 2: Read the touched files again and write new files with updated
and/or inserted rows
▪ Phase 3: Use the Delta protocol to atomically remove the touched files and
add the new files (write stuff to object/blob store)
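For orientation, here is what such a merge looks like from the user's side with the delta-spark Python API; the paths and the join key `id` are illustrative, and a Databricks/Spark session (`spark`) is assumed:

from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/delta/target")          # existing data
source = spark.read.format("delta").load("/delta/changes")   # new data

(target.alias("t")
    .merge(source.alias("s"), "t.id = s.id")  # Phase 1: find the touched files
    .whenMatchedUpdateAll()                   # Phase 2: rewrite those files
    .whenNotMatchedInsertAll()
    .execute())                               # Phase 3: atomic commit to the log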
COW: What is Delta Lake doing under the hood?
Phase 2: Read the touched files again and write new files
with updated and/or inserted rows.
The type of join can vary depending on the conditions of the merge:
▪ Insert only merge (e.g. no updates/deletes) → leftAntiJoin on the
source to find the inserts
▪ Matched only clauses (e.g. when matched) → rightOuterJoin
▪ Else (e.g. you have updates, deletes, and inserts) → fullOuterJoin
Phase 2 deserves a double click; the sketches below show each case.
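A hedged sketch of the three cases with the same delta-spark builder API, continuing the `target` and `source` from the previous sketch; the `_deleted` flag column is an assumption:

# Insert-only merge (no updates/deletes) -> leftAntiJoin
(target.alias("t").merge(source.alias("s"), "t.id = s.id")
    .whenNotMatchedInsertAll()
    .execute())

# Matched-only clauses -> rightOuterJoin
(target.alias("t").merge(source.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .execute())

# Updates, deletes, and inserts -> fullOuterJoin
(target.alias("t").merge(source.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s._deleted = true")
    .whenMatchedUpdateAll(condition="s._deleted = false")
    .whenNotMatchedInsertAll()
    .execute())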
Merge on Read (MOR)
▪ TL;DR the “merge” is done during the read
▪ Common strategy: don’t logically merge until you NEED the result
▪ Implementation? Two tables and a view
▪ Materialized table
▪ Changelog table (can be a diff, Avro, Parquet, etc.)
▪ View that acts as the referee between the two and is the source of truth
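A minimal sketch of the two-tables-and-a-view idea in PySpark, assuming tables named `snapshot` and `changeset` with primary key `id` and recency column `fragno` (naming matches the schema slides later); this is one simple rank-based referee, and the view methods compared later are refinements of it:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

snapshot = spark.table("snapshot")     # materialized table
changeset = spark.table("changeset")   # append-only changelog

# the referee: keep only the latest row per id across both tables
w = Window.partitionBy("id").orderBy(F.col("fragno").desc())
(snapshot.unionByName(changeset, allowMissingColumns=True)
    .withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")
    .drop("rn")
    .createOrReplaceTempView("source_of_truth"))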
Which one do you pick? Well it depends...
▪ MOR: write many, read less
▪ COW: write less, read many
Use case
Use case info
▪ 100-200 events/second (6k-12k/minute)
▪ CDC data coming from Kafka
▪ usually 1-3 columns are changing
▪ partial updates
▪ Each row has a unique ID
▪ 200GB active files; growing at a small rate
▪ SLA: read updates to point lookups in <5 min
▪ Currently doing daily batch overwrites; data can be up to 24 hours
stale
Initial observations and problems encountered
▪ Lots of updates: 96% of events
▪ Matching condition is uniformly distributed across the target
▪ No natural partitioning keys
▪ Sample of 50k events could have 2k different days of updates
▪ Default Delta Lake Merge configs were not performing well
▪ Ended up rewriting almost the entire table each merge
Architecture: what did we settle on? MOR
This is what we will talk about
Snapshot & Changeset
▪ Snapshot: base table
▪ Changeset: append only

Snapshot table:
▪ Primary key: id
▪ Most recent data: fragno
▪ Partitioning: optional (depends on use case)
▪ …many data columns

Changeset table:
▪ Primary key: id
▪ Most recent data: fragno
▪ Partitioning: Structured Streaming batchId (this is important!)
▪ …many data columns
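The batchId partition column comes straight from Structured Streaming's foreachBatch; a minimal sketch, where the stream source, checkpoint path, and table name are assumptions:

from pyspark.sql import functions as F

def append_changes(batch_df, batch_id):
    # stamp each micro-batch with its batchId and partition by it,
    # so that later cleanup of old batches is a metadata-only delete
    (batch_df.withColumn("batchId", F.lit(batch_id))
        .write.format("delta")
        .mode("append")
        .partitionBy("batchId")
        .saveAsTable("changeset"))

(cdc_stream.writeStream                # cdc_stream: parsed CDC events from Kafka
    .foreachBatch(append_changes)
    .option("checkpointLocation", "/chk/changeset")
    .start())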
Changeset
▪ Get the unique values in the changeset - primaryKey and latest
▪ Because updates are partial, we need to coalesce(changes, baseline)
▪ Check whether the dataframe can be broadcasted*
▪ If we can broadcast 1GB of data and each row is 364 bytes, then we can broadcast anything up to ~2.8M rows. If the changeset is >2.8M rows ⇒ do not broadcast -- because memory!
* broadcast only if your changeset is small enough
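A sketch of that preparation, assuming a DataFrame `changes` with the key `id`, the recency column `fragno`, and the broadcast cutoff from the arithmetic above:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# keep only the latest change per primary key
w = Window.partitionBy("id").orderBy(F.col("fragno").desc())
ranked_changeset = (changes
    .withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")
    .drop("rn"))

# broadcast only when it fits: ~1GB budget / ~364 bytes per row ≈ 2.8M rows
if ranked_changeset.count() <= 2_800_000:
    ranked_changeset = F.broadcast(ranked_changeset)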
View: Methods to join rankedChangeset into the baseline
▪ Now that we have our changeset… we still need to compare these values to the baseline table to get the latest by id
▪ There are several methods to do this:
▪ doubleRankOver
▪ fullOuterJoin
▪ leftJoinAntiJoin (broadcastable!)
▪ leftJoinUnionInserts (broadcastable! Great if you are guaranteed that your inserts are not upserts!)
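For illustration, a sketch of leftJoinUnionInserts, using the `baseline` table and the `ranked_changeset` from the previous sketch; coalesce(changes, baseline) resolves the partial updates, and the method is only safe when inserts are never upserts:

from pyspark.sql import functions as F

data_cols = [c for c in baseline.columns if c != "id"]

# updates: left join, preferring the change value when one exists
updated = (baseline.alias("b")
    .join(ranked_changeset.alias("c"), "id", "left")
    .select("id", *[F.coalesce(F.col(f"c.{x}"), F.col(f"b.{x}")).alias(x)
                    for x in data_cols]))

# inserts: changeset ids that do not exist in the baseline yet
inserts = (ranked_changeset.join(baseline, "id", "left_anti")
    .select("id", *data_cols))

mor_view = updated.unionByName(inserts)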
How to pick the right view - perfTesting!
Testing [normally] takes a long time… but it doesn’t have to!
▪ Things to consider:
▪ How many tests are sufficient?
▪ How can I make them as even as possible?
▪ What do you actually want to test?
▪ Why is this part so hard and manual?
▪ Databricks has a `/runs/submit` API - starts a fresh cluster for each run
▪ Databricks notebooks have widgets which act as params
▪ Let’s do 3 tests for each viewType (method) and each operation (read/write) ⇒ 3 * 4 * 2 = 24 tests!
Build the notebook harness (a sketch follows the list):
▪ Create the widgets in your Notebooks
▪ Create your results payload (note: we are calling the widgets as params)
▪ Create a timer function
▪ Save results to a Delta table (note: payload)
▪ Pick the operation to test
▪ Case statement to match the method and supply the correct view - send it to the stopwatch utility
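A condensed sketch of that harness; the widget names, the `perf_results` table, and the `build_view` helper are assumptions, while `dbutils` and `spark` come from the Databricks notebook context:

import time
from pyspark.sql import Row

dbutils.widgets.text("run", "0")
dbutils.widgets.text("operation", "read")
dbutils.widgets.text("method", "leftJoinUnionInserts")

# results payload built from the widgets-as-params
payload = {p: dbutils.widgets.get(p) for p in ("run", "operation", "method")}

def stopwatch(fn):
    # time the operation and persist the payload to a Delta table
    start = time.time()
    fn()
    payload["durationSec"] = time.time() - start
    (spark.createDataFrame([Row(**payload)])
        .write.format("delta").mode("append").saveAsTable("perf_results"))

# the "case statement": map the method widget to the view under test
view_df = build_view(payload["method"])   # hypothetical per-method builder
if payload["operation"] == "read":
    stopwatch(lambda: view_df.count())    # force a full read of the view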
Configuring the API
Check out my GitHub [https://github.com/justinbreese/databricks-gems#perftestautomationpy]
Made a simple script that leverages the Databricks runs/submit API; each run spec carries the run info and the cluster info.
Here is what we will create:

Run  Operation  Method
0    Read       leftJoinUnionInserts
1    Read       leftJoinUnionInserts
2    Read       leftJoinUnionInserts
0    Read       outerJoined
1    Read       outerJoined
2    Read       outerJoined
0    Read       antiJoinLeftJoinUnion
1    Read       antiJoinLeftJoinUnion
2    Read       antiJoinLeftJoinUnion
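As a sketch, one run spec for the runs/submit endpoint, posted from Python; the cluster settings and notebook path are placeholders, and the widget values travel as base_parameters:

import os
import requests

run_spec = {
    "run_name": "perfTest-read-leftJoinUnionInserts-0",
    "new_cluster": {                          # cluster info
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 8,
    },
    "notebook_task": {                        # run info: notebook + params
        "notebook_path": "/Users/me@example.com/perfTest",
        "base_parameters": {"run": "0", "operation": "read",
                            "method": "leftJoinUnionInserts"},
    },
}

resp = requests.post(
    f"{os.environ['DATABRICKS_HOST']}/api/2.0/jobs/runs/submit",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=run_spec,
)
print(resp.json())   # returns a run_id you can poll for status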
Calling the API
Check out GitHub [https://github.com/justinbreese/databricks-gems#perftestautomationpy]
Made a simple script that leverages the Databricks runs/submit API
python3 perfTestAutomation.py -t <userAccessToken> -s 0 -j artifacts/perfTest.json
View the results
leftJoinUnionInserts is the winner for the view
Recap thus far
Now we will talk about this part
Periodic Rematerialization
▪ If changes are getting appended consistently, then you’ll have more and more rows to compare against
▪ This makes your read performance degrade over time
▪ Therefore, you need a periodic job that resets your baseline table (for read perf)
▪ And yes, there are some choices that you have for this:

Method        Consideration(s)
Merge         Easy; very helpful if you have many larger partitions and only a smaller subset of partitions needs to change; built into Delta Lake
Overwrite     Easy; great if you do not have or cannot partition, or if all/most partitions need to be changed
replaceWhere  Moderate; can only be used if you have partitions; built into Delta Lake
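Hedged sketches of the Overwrite and replaceWhere variants; the source-of-truth view and the partition predicate are illustrative:

# Overwrite: rematerialize the whole snapshot from the merged view
(spark.table("source_of_truth")
    .write.format("delta")
    .mode("overwrite")
    .saveAsTable("snapshot"))

# replaceWhere: rewrite only the partitions that actually changed
(spark.table("source_of_truth")
    .where("region = 'US'")                    # illustrative predicate
    .write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "region = 'US'")   # must match the filter
    .saveAsTable("snapshot"))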
Periodic Rematerialization
▪ Now that we’ve materialized the new changes into the baseline, we want to delete the batches that we no longer need
▪ Since we partitioned by batchId, deleting those previous batches is a metadata-only operation and super fast/cheap - line 68
▪ We do this so we don’t duplicate changes and because we don’t need them anymore
▪ Remember: we have an initial bronze table that has all of our changes, so we always have them if we ever need them
Code! Remember that we said that the batchId is important?
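A minimal sketch of that cleanup; because `changeset` is partitioned by batchId, the DELETE below drops whole partitions without rewriting any data files. `last_batch` (the most recent batchId already folded into the baseline) is an assumption:

last_batch = 68   # illustrative watermark
spark.sql(f"DELETE FROM changeset WHERE batchId <= {last_batch}")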
Periodic Rematerialization
▪ Yes, you can even do some perfTesting on this to understand which
method fits your use case best
▪ Our use case ended up using overwrite as it was a better fit
▪ Changes happened very randomly; going back up to 2000+ days
▪ Dataset was ~200GB; partitioning could not be made effective
▪ 200GB is small and we can overwrite the complete table in <10 min with 80 cores
Final recap
▪ Talked about the use case
▪ Introduced the MOR architecture
▪ Talked about the two tables
▪ Different views and understanding their differences
▪ How to test the different view methods
▪ Periodic rematerialization
This wouldn’t have been possible without help from:
Chris Fish
Daniel Tomes
Tathagata Das (TD)
Burak Yavuz
Joe Widen
Denny Lee
Paul Roome
Feedback
Your feedback is important to us. Don’t forget to rate and review the sessions.