Schema Evolution Patterns
Alex Rasmussen
alex@bitsondisk.com
Image credit: John Gould (14.Sep.1804 - 3.Feb.1881) [Public domain]
Hi, I’m Alex!
Twitter, GitHub, etc: @alexras

alex@bitsondisk.com
https://www.bitsondisk.com/
LA-based Data Engineering Consultant
Data Migrations Are Hard
•How should I migrate? When?
•Will the old code still work?
•Can I migrate without downtime?
•Can I roll back? How? What if I can’t?
Should I just use MongoDB? How do I decide?
API Versioning Is Hard
•When should I bump versions?
•Will old callers break?
•Should I just switch to GraphQL?
THESE HARD PROBLEMS ARE THE SAME PROBLEM!
Schema Evolution
•The structure of your data will change
•What does that mean for the data?
•How should our systems respond?
•How does this impact how we build?
Goals of This Talk
•Mental toolkit for reasoning about schema evolution that applies across technologies and domains
•How this looks in practice
References: https://github.com/bitsondisk/velocity-sj-2019
1. WHAT ARE SCHEMAS?
2. SCHEMA COMPATIBILITY
3. CROSSING COMPATIBILITY GAPS
4. WRAPPING UP / TAKEAWAYS
Schemas
•From the Greek skhēma, meaning “form” or “figure”. The data’s shape.
Person {
  name: string,
  age: integer,
  favorite_color: enum(RED, BLUE, …),
  . . .
}
What/Where Is the Schema?
•Explicit: SQL tables, Thrift, Avro*
•Implicit: CSV, Excel, JSON
•Separate: JSON Schema, schema registries,
an ex-employee’s brain
When Are Schemas Enforced?
•On Write: e.g. in RDBMS tables
•On Read: enforce after reading
•On Need: dynamic, late-binding
•Can enforce one schema, or many
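The difference between enforcing on write and enforcing on read can be sketched in a few lines. This is a minimal illustration, not any particular database's behavior; `PERSON_SCHEMA`, `validate`, and both store classes are hypothetical names made up for the example.

```python
# Hypothetical schema for illustration: field name -> required Python type.
PERSON_SCHEMA = {"name": str, "age": int}


def validate(record, schema):
    """Raise ValueError if the record does not match the schema."""
    for field, expected_type in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"bad type for field: {field}")
    return record


class WriteEnforcedStore:
    """Schema-on-write: bad records are rejected before they are stored."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        self.rows.append(validate(record, self.schema))

    def read_all(self):
        return list(self.rows)


class ReadEnforcedStore:
    """Schema-on-read: anything can be stored; the reader picks a schema."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        self.rows.append(record)

    def read_all(self, schema):
        return [validate(row, schema) for row in self.rows]
```

With the read-enforced store, a malformed record is accepted at write time and only surfaces as an error when some reader applies a schema to it, which is also what lets different readers apply different schemas.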
1. WHAT ARE SCHEMAS?
2. SCHEMA COMPATIBILITY
3. CROSSING COMPATIBILITY GAPS
4. WRAPPING UP / TAKEAWAYS
Schema Compatibility
•If two schemas are compatible, evolving from one schema to another can be done automatically on read*
•Readers can be oblivious to schema change**
•Two directions: backwards and forwards
Backwards-Compatible
[Diagram: schema versions X and X+1, client C]
Data written with old schema is readable by clients with new schema
Forwards-Compatible
[Diagram: schema versions X and X+1, client C]
Data written with new schema is readable by clients with old schema
Add a Field With a Default
New schema (is_admin added):
name: string,
age: integer,
is_admin: boolean (default: false)
Older data:
name: “Bob Jones”,
age: 42
Newer data:
name: “Bob Jones”,
age: 42,
is_admin: false
Backwards: substitute default in older data
Forwards: ignore field in newer data
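Both directions of this rule fit in one small resolver. The sketch below assumes records are plain dicts and that a schema is a mapping from field name to default; the `REQUIRED` sentinel and `read_with` are hypothetical names, and real formats such as Avro apply more detailed resolution rules.

```python
# Sentinel marking a field that has no default (hypothetical convention).
REQUIRED = object()

# Field name -> default value, mirroring the Person schema above.
SCHEMA_OLD = {"name": REQUIRED, "age": REQUIRED}
SCHEMA_NEW = {"name": REQUIRED, "age": REQUIRED, "is_admin": False}


def read_with(reader_schema, record):
    """Resolve a stored record against the reader's schema."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is not REQUIRED:
            resolved[field] = default  # substitute the default
        else:
            raise ValueError(f"no value or default for field: {field}")
    return resolved  # fields unknown to the reader are ignored


older = {"name": "Bob Jones", "age": 42}
newer = {"name": "Bob Jones", "age": 42, "is_admin": True}

backwards = read_with(SCHEMA_NEW, older)  # new reader, old data
forwards = read_with(SCHEMA_OLD, newer)   # old reader, new data
```

Here `backwards` gains `is_admin: False` from the default, while `forwards` simply drops the field the old reader does not know about.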
Remove a Field With a Default
Old schema (pto_days_left is being removed):
name: string,
age: integer,
is_admin: boolean (default: false),
pto_days_left: int (default: 0)
Older data:
name: “Alice Smith”,
age: 29,
is_admin: true,
pto_days_left: 16
Newer data:
name: “Alice Smith”,
age: 29,
is_admin: true
Backwards: ignore field in older data
Forwards: substitute default in newer data
Other Types of Changes
•Without defaults:
- Adding a field breaks backwards-compatibility
- Removing a field breaks forwards-compatibility
•Field type changes: typecasting rules apply(?)
•Renaming and reordering: it depends
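The add/remove rules above can be turned into a small classifier. This is a sketch under simplifying assumptions: schemas are dicts of field name to default, writers always emit every field in their schema, and type changes are ignored. `REQUIRED`, `can_read`, and `classify` are hypothetical names.

```python
REQUIRED = object()  # hypothetical marker for "field has no default"


def can_read(reader_schema, writer_schema):
    """True if every reader field is either written or has a default."""
    return all(
        field in writer_schema or default is not REQUIRED
        for field, default in reader_schema.items()
    )


def classify(old_schema, new_schema):
    """Classify the change from old_schema to new_schema."""
    backwards = can_read(new_schema, old_schema)  # new reader, old data
    forwards = can_read(old_schema, new_schema)   # old reader, new data
    if backwards and forwards:
        return "both"
    if backwards:
        return "backwards"
    if forwards:
        return "forwards"
    return "neither"
```

Under this model, a rename without defaults looks like a removal plus an addition, so it classifies as "neither", which is one reason renames fall under "it depends".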
In Practice - Protocol Buffers
syntax = "proto2";

message Person {
  required string name = 1;
  required int32 age = 2;
  optional bool is_admin = 3 [default = false];
}
•Field numbers can’t change or be reused
•In proto3, no required or optional
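Why field numbers matter can be shown with a toy model in which encoded data is keyed by field number rather than by name. This is a deliberately simplified stand-in, not the actual Protocol Buffers wire format or API; `decode` and the schema dicts are hypothetical.

```python
def decode(wire, schema):
    """Map field numbers back to names; unknown numbers are skipped."""
    return {schema[num]: value for num, value in wire.items() if num in schema}


# Hypothetical schema versions: field number -> field name.
V1 = {1: "name", 2: "age", 3: "is_admin"}
V2_RENUMBERED = {1: "name", 2: "age", 4: "is_admin"}   # number changed
V2_REUSED = {1: "name", 2: "age", 3: "pto_days_left"}  # number reused

old_data = {1: "Bob Jones", 2: 42, 3: True}  # written with V1

lost = decode(old_data, V2_RENUMBERED)  # is_admin silently disappears
misread = decode(old_data, V2_REUSED)   # is_admin shows up as pto_days_left
```

Neither case raises an error, which is what makes changing or reusing a number dangerous: old data is dropped or reinterpreted without any signal to the reader.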
In Practice - Avro
record Person {
  string name;
  int age;
  boolean is_admin = false;
}
•Reorders are OK, but not renames
•Compatibility determined by rules
In Practice - GraphQL
•Every field is nullable by default
•Only return queried fields (bounded enforcement)
•Hide deprecated fields; deletes tricky
type Person {
  name: String!,
  age: Int!,
  is_admin: Boolean
}
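"Bounded enforcement" can be sketched as a resolver that checks only the fields a query selects. This is a rough model over plain dicts, not a GraphQL library, and real GraphQL reports non-null violations differently; `PERSON_FIELDS` and `execute` are hypothetical names.

```python
# Fields the hypothetical Person type exposes; True marks a nullable field.
PERSON_FIELDS = {"name": False, "age": False, "is_admin": True}


def execute(selection, record):
    """Return only the selected fields, enforcing the schema on just those."""
    result = {}
    for field in selection:
        if field not in PERSON_FIELDS:
            raise ValueError(f"unknown field: {field}")
        value = record.get(field)
        if value is None and not PERSON_FIELDS[field]:
            raise ValueError(f"non-null field has no value: {field}")
        result[field] = value
    return result
```

A client that selects only `name` is unaffected by what happens to `age` or `is_admin`, so a field can be hidden or deprecated without breaking callers that never asked for it.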
In Practice - Wix
•Adding fields
- Non-indexed fields: JSON blob column (schema-on-read!)
- Indexed fields: new table + join
•Removing fields
- Don’t. Just stop using them (similar to GraphQL)
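The JSON blob approach for non-indexed fields might look roughly like the following, using SQLite from the Python standard library as a stand-in. The table layout, column names, and helper functions are assumptions made for illustration, not Wix's actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, extra TEXT)"
)


def insert_person(name, **extra_fields):
    """Store non-indexed fields as a JSON blob in the 'extra' column."""
    conn.execute(
        "INSERT INTO person (name, extra) VALUES (?, ?)",
        (name, json.dumps(extra_fields)),
    )


def load_person(name):
    row = conn.execute(
        "SELECT name, extra FROM person WHERE name = ?", (name,)
    ).fetchone()
    extra = json.loads(row[1])
    # Schema-on-read: the default is applied here, not in the table.
    return {"name": row[0], "is_admin": extra.get("is_admin", False)}


insert_person("Bob Jones")                    # written before is_admin existed
insert_person("Alice Smith", is_admin=True)   # written after
```

Adding `is_admin` needed no table change; rows written before the field existed pick up the default when they are read.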
Recap
•Compatibility makes evolution easier
•Changes can be backwards-compatible, forwards-compatible, both, or neither
•Ease-of-compatibility drives many
decisions behind “rules” of format design
1. WHAT ARE SCHEMAS?
2. SCHEMA COMPATIBILITY
3. CROSSING COMPATIBILITY GAPS
4. WRAPPING UP / TAKEAWAYS
Crossing Compatibility Gaps
•Not all schema changes are compatible
•Not all incompatibilities are simple
•Not all compatible changes are practical
•How do we deal with that?
Complex Changes
name: string, (to be split into first_name and last_name)
first_name: string,
last_name: string,
age: integer,
is_admin: boolean (default: false)
•Not obvious how to split
•Client code changes (likely) required
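A naive split makes the "not obvious" part concrete. The rule below (everything before the last space is the first name) is an arbitrary assumption chosen for illustration, and `split_name` is a hypothetical helper.

```python
def split_name(name):
    """Naive split: the last space-separated token becomes last_name."""
    first, _, last = name.strip().rpartition(" ")
    return {"first_name": first, "last_name": last}


simple = split_name("Bob Jones")
mononym = split_name("Madonna")
particle = split_name("Ludwig van Beethoven")
```

`simple` comes out as expected, but `mononym` ends up with an empty `first_name`, and `particle` puts "van" in the first name. No default value can paper over this, which is why the change needs a real migration and, most likely, client code changes.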
Impractical Changes
•e.g. Adding a column in MySQL requires
locking/copying the table
- Days to weeks not unheard of
Crossing Compatibility Gaps
•We’ll look at:
- Multi-schema stores (Kafka, et al)
- Single-schema stores (RDBMS, et al)
Multi-Schema Stores
[Diagram: clients C1-C4 and stored records 1-5]
Step 1: Old clients write with new schema
Step 2: Migrate old data to new schema
Step 3: Migrate old clients to new schema
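The three steps can be walked through against an in-memory list standing in for the store. This is a sketch under assumptions: each record carries a version tag `v`, and the v1 to v2 change (renaming `name` to `full_name`) is hypothetical, as are all function names and sample values.

```python
store = [{"v": 1, "name": "Bob Jones"}, {"v": 1, "name": "Alice Smith"}]


def upgrade(record):
    """Hypothetical v1 -> v2 conversion: 'name' becomes 'full_name'."""
    if record["v"] == 2:
        return record
    return {"v": 2, "full_name": record["name"]}


def write(name, write_version):
    """A client write; write_version is what step 1 flips from 1 to 2."""
    record = {"v": 1, "name": name}
    store.append(upgrade(record) if write_version == 2 else record)


def migrate_old_data():
    """Step 2: rewrite every stored v1 record as v2, in place."""
    store[:] = [upgrade(record) for record in store]


def read_all(supported_versions):
    """A client read; fails if the store holds a version it cannot parse."""
    for record in store:
        if record["v"] not in supported_versions:
            raise ValueError(f"cannot read schema v{record['v']}")
    return list(store)


write("Carol White", write_version=2)         # step 1: writes use v2
mixed = read_all(supported_versions={1, 2})   # both versions coexist
migrate_old_data()                            # step 2: backfill old data
migrated = read_all(supported_versions={2})   # step 3: v2-only clients
```

The v2-only read in the last line succeeds only because step 2 ran first; reordering the steps would leave a reader facing records it cannot parse.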
In Practice: Kafka (Confluent)
•Stores Avro schemas in schema registry
with compatibility check built-in
•Backwards-compatible changes: update readers first
•Forwards-compatible changes: update writers first
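The upgrade order follows from asking which side can move first without any reader meeting data it cannot resolve. The sketch below is not the Confluent Schema Registry API; it reuses a hypothetical dict-of-defaults schema model in which `REQUIRED` marks a field with no default.

```python
REQUIRED = object()  # hypothetical marker for "field has no default"


def can_read(reader_schema, writer_schema):
    """True if every reader field is either written or has a default."""
    return all(
        field in writer_schema or default is not REQUIRED
        for field, default in reader_schema.items()
    )


def safe_first_steps(old_schema, new_schema):
    """Return which side(s) can be upgraded first without breaking reads."""
    options = []
    # Readers first: upgraded readers must still read old writers' data.
    if can_read(new_schema, old_schema):
        options.append("readers")
    # Writers first: old readers must still read upgraded writers' data.
    if can_read(old_schema, new_schema):
        options.append("writers")
    return options
```

An empty result means the change is incompatible in both directions and needs one of the gap-crossing strategies instead of a simple rollout order.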
In Practice: Stripe
•Goal: (backwards-incompatible) API
responses that work for old clients, forever
•Version change modules applied
backwards from present to client’s version
•They admit: this is hard, can’t last forever
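Applying version changes backwards from the present can be sketched as a chain of downgrade functions. This is a rough illustration of the idea and not Stripe's code; the version dates, the two changes, and all function names are invented for the example.

```python
def merge_name(response):
    """Undo a hypothetical change that split 'name' into two fields."""
    response = dict(response)
    first, last = response.pop("first_name"), response.pop("last_name")
    response["name"] = f"{first} {last}"
    return response


def drop_is_admin(response):
    """Undo a hypothetical change that added 'is_admin'."""
    response = dict(response)
    response.pop("is_admin", None)
    return response


# (version that introduced the change, how to undo it), oldest first.
VERSION_CHANGES = [
    ("2019-02-01", drop_is_admin),
    ("2019-05-01", merge_name),
]


def render_for(client_version, current_response):
    """Walk backwards from the present, undoing changes the client predates."""
    response = current_response
    for introduced_in, downgrade in reversed(VERSION_CHANGES):
        if introduced_in > client_version:  # ISO dates compare as strings
            response = downgrade(response)
    return response
```

The server only ever produces the current response shape; each old client version is served by replaying the downgrades it needs, so the cost grows with every change added to the list.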
Recap
•In multi-schema stores:
- Migrate client writes/reads
- Migrate data
- Migrate client reads/writes
Single-Schema Stores
[Diagram: clients C1-C4 share one store S; schema versions X-1 and X]
Single-Schema Migration
[Diagram: clients C1-C4, store S, schema versions X and X+1]
Move from X to (incompatible) X + 1 without downtime
[Diagrams for the steps below: clients C1-C4, stores S and S’, updater U, copier C, schema versions X and X+1]
Step 1: Create and migrate temporary store S’
Step 2: Create a copier (C) and an updater (U)
Step 3.1: Move clients over to new schema
Step 3.2: Copy data, record / apply updates
Step 4: Cutover - S’ becomes S (the old S becomes S_old)
Step 5: Drain updater, delete S_old
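The copier, updater, and cutover can be simulated in memory. This sketch makes several simplifying assumptions: stores are dicts keyed by id, the X to X+1 transform is a hypothetical name split, deletes are not handled, the client move in step 3.1 is not modeled, and nothing writes during cutover (real tools lock or block at that point).

```python
def to_new_schema(row):
    """Hypothetical X -> X+1 transform: split 'name' on the first space."""
    first, _, last = row["name"].partition(" ")
    return {"first_name": first, "last_name": last}


class Migration:
    def __init__(self, store):
        self.s = store                # S: live store, schema X
        self.s_new = {}               # S': temporary store, schema X+1
        self.changelog = []           # updater's record of writes to S
        self.pending = sorted(store)  # ids the copier has not copied yet

    def client_write(self, key, row):
        """Clients keep writing to S; the updater records each write."""
        self.s[key] = row
        self.changelog.append(key)

    def copy_chunk(self, size):
        """Copier: transform the next `size` rows from S into S'."""
        chunk, self.pending = self.pending[:size], self.pending[size:]
        for key in chunk:
            self.s_new[key] = to_new_schema(self.s[key])

    def apply_updates(self):
        """Updater: replay recorded writes against S'."""
        while self.changelog:
            key = self.changelog.pop(0)
            self.s_new[key] = to_new_schema(self.s[key])

    def cutover(self):
        """Drain the updater, then S' becomes S; returns S_old for deletion."""
        if self.pending:
            raise RuntimeError("copier has not finished")
        self.apply_updates()
        s_old, self.s = self.s, self.s_new
        return s_old


m = Migration({1: {"name": "Bob Jones"}, 2: {"name": "Alice Smith"}})
m.copy_chunk(1)
m.client_write(1, {"name": "Robert Jones"})  # update to an already-copied row
m.client_write(3, {"name": "Carol White"})   # insert during the migration
m.copy_chunk(1)
s_old = m.cutover()
```

The updater is what keeps the already-copied row 1 from going stale: without the changelog replay, S’ would still hold "Bob Jones" at cutover.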
In Practice - Percona
•pt-online-schema-change
- Copier: scan/copy in timed chunks
- Updater: synchronous table triggers
- Cutover: RENAME TABLE
In Practice - Facebook
•OSC (Online Schema Change)
- Copier: dump/load snapshot chunks
- Updater: triggers + changelog
- Cutover: lock, replay, rename, unlock
In Practice - GitHub
•gh-ost
- Copier: chunked reads/writes
- Updater: read binlog, interleave copies
- Cutover: 2-step blocking swap
In Practice - Wix
•Updater client-side, controlled with feature toggles
- Phase 1: write to S and S’
- Phase 2: read from S’, fallback to S
- Phase 3: write only to new
•Copy phase runs eagerly after phase 3
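The phased, client-side approach could be sketched as a wrapper whose behavior depends on a toggle. This is an illustration only, not Wix's implementation: the `phase` attribute stands in for feature toggles, the stores are dicts, and the schema transformation between S and S’ is left out for brevity.

```python
class ToggledStore:
    def __init__(self):
        self.s = {}      # old store S
        self.s_new = {}  # new store S'
        self.phase = 0   # hypothetical feature toggle; 0 = before migration

    def write(self, key, value):
        if self.phase < 3:
            self.s[key] = value      # phases 0-2 still write to S
        if self.phase >= 1:
            self.s_new[key] = value  # phases 1-3 write to S'

    def read(self, key):
        if self.phase >= 2 and key in self.s_new:
            return self.s_new[key]   # phase 2 onward: prefer S'
        return self.s.get(key)       # otherwise fall back to S

    def copy_remaining(self):
        """Eager copy after phase 3: move rows that S' has not seen yet."""
        for key, value in self.s.items():
            self.s_new.setdefault(key, value)
```

Because each phase is a toggle, rolling back is a matter of lowering the phase again, at least until phase 3 stops writes to S.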
Recap
•In single-schema stores:
- Migrate clients, retaining old schema
- Create data w/ new schema, over time
- When in sync, cut over
1. WHAT ARE SCHEMAS?
2. SCHEMA COMPATIBILITY
3. CROSSING COMPATIBILITY GAPS
4. WRAPPING UP / TAKEAWAYS
Wrapping Up
•Concepts covered here cut across RDBMSs, APIs, document stores, data lakes, etc.
•Reason about schema evolution up-front
to guide your architecture choices
•Your schemas will change. Be prepared.
A Few Takeaways
•Make your changes compatible if you can
•Compatibility informs migration strategy
•Schema-on-read/need won’t save you, but it helps sometimes
•One size does not fit all
Thank You! Questions?
https://www.bitsondisk.com/
Twitter, GitHub, etc: @alexras
alex@bitsondisk.com
References: https://github.com/bitsondisk/velocity-sj-2019
