The 30-Month Migration
Glenn Vanderburg

VP of Engineering, First.io

@glv
Changing Your Data Model

Is Hard!
Living With a Poor Data Model

Is Also Hard!
Four Stages, 2½ Years
[Timeline figure, 2016 to 2019: Stage 1, Stage 2, Stage 3, Stage 4. Marked dates: 15 Nov, Today, 9 Feb, 20 Jun, 18 Nov, 15 Jan, 26 Jan, 20 Mar, 7 Dec.]
30 months!
A Technical Talk,

with Mostly Non-Technical Lessons
Three Principles
•Validation

•Reversibility

•Transparency
Introduction:

Our Big Mistakes*
* So far.
[Diagram: two realtors and their contacts. Realtor Abby: Isaac, Jane, Kathy, Lee, Mike, Nancy. Realtor Bill: Oscar, Pat, Quentin, Robert, Sally, Tina.]
“Things are the way they are because they got that way.”
–Gerald Weinberg
[Diagram: the starting data model, split across two databases. Postgres: the realtors table (Abby, Bill), plus other tables: subscriptions, payments, notes, appointments, etc. Neo4j: Realtors′ (Abby, Bill), ContactRelationships (CR), and Contacts (Isaac, Jane, Kathy, Lee, Mike, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina), plus many other relationships between contacts, and between contacts and their attributes.]
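[Diagram: a data model entirely in Postgres, with only realtors (Abby, Bill) and contacts tables. One group of contacts: Isaac, Jane, Kathy, Lee, Mike, Nancy, Quentin, Sally, … The other group: Kathy, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina.]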
Stage 1:

From Neo4j to Postgres
What Drove the Change?
• Neo4j/Cypher not as familiar to developers as Postgres/SQL

• Neo4j ActiveModel gem less mature and feature-rich than
ActiveRecord

• Neo4j drivers less mature, less well optimized

• Some features required cross-database joins (slow, memory
intensive)
Making a Plan
• Realtor-by-realtor migration

• An importer job that would import a realtor’s Neo4j data
into Postgres

• The importer needed to avoid duplicating shared data that
had already been imported for another realtor

• We would use a feature flag to indicate whether a realtor
had been migrated or not
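The plan above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `RealtorImporter`, the `neo4j_contacts` hash, and the in-memory `postgres` hash are hypothetical stand-ins for the real importer job and databases, and the sample data is invented.

```ruby
require "set"

# Hypothetical importer: `neo4j_contacts` maps a realtor to contact hashes,
# and `postgres` is a Hash standing in for the contacts table, keyed by the
# contact's Neo4j id.
class RealtorImporter
  attr_reader :postgres

  def initialize(neo4j_contacts, postgres = {})
    @neo4j_contacts = neo4j_contacts
    @postgres = postgres
    @migrated = Set.new
  end

  def import(realtor)
    @neo4j_contacts.fetch(realtor).each do |contact|
      # Shared data may already have been imported for another realtor,
      # so only write the row when it is not there yet.
      @postgres[contact[:id]] ||= contact
    end
    @migrated << realtor # plays the role of the per-realtor feature flag
  end

  def migrated?(realtor)
    @migrated.include?(realtor)
  end
end

data = {
  "Abby" => [{ id: "k", name: "Kathy" }, { id: "n", name: "Nancy" }],
  "Bill" => [{ id: "n", name: "Nancy" }, { id: "o", name: "Oscar" }]
}
importer = RealtorImporter.new(data)
importer.import("Abby")
importer.import("Bill")
```

Nancy is shared by both realtors in this invented data, so she is imported once, and running `import` again for the same realtor changes nothing.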
Schema Definition
• We knew our data in Neo4j was messy.

• Neo4j’s referential integrity features weaker than Postgres’

• We weren’t skilled at using the features Neo4j did have

• We got very serious about data integrity in the schema:

• foreign keys, ON DELETE CASCADE, check constraints, exclusion
constraints

• This was enormously helpful!
Switching Models
• The feature flag needed to be readily available everywhere, so we set a
thread-local variable in middleware.

• A lot of queries start off by calling class methods on a model class

• We needed that model class to be the ActiveRecord model if the current
realtor’s feature flag was set, and the Neo4j model otherwise
Person.find(35)
# or
Property.where(zip5: "75238")
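A minimal sketch of how a thread-local flag could be set in middleware. The class name `MigrationFlagMiddleware`, the `env["realtor_id"]` key, and the `migrated_lookup` callable are assumptions for illustration; only the `:moved_to_postgres` thread variable comes from the talk.

```ruby
# Hypothetical Rack-style middleware. `migrated_lookup` is an assumed callable
# that answers "has this realtor been moved to Postgres?".
class MigrationFlagMiddleware
  def initialize(app, migrated_lookup)
    @app = app
    @migrated_lookup = migrated_lookup
  end

  def call(env)
    flag = @migrated_lookup.call(env["realtor_id"])
    Thread.current.thread_variable_set(:moved_to_postgres, flag)
    @app.call(env)
  ensure
    # Clear the flag so a reused thread never carries over another
    # realtor's state into the next request.
    Thread.current.thread_variable_set(:moved_to_postgres, nil)
  end
end

# A tiny downstream "app" that reports which mode it saw.
app = lambda do |_env|
  [200, {}, [Thread.current.thread_variable_get(:moved_to_postgres).to_s]]
end
stack = MigrationFlagMiddleware.new(app, ->(id) { id == 1 })
migrated_response = stack.call("realtor_id" => 1)
legacy_response = stack.call("realtor_id" => 2)
```

Resetting the variable in `ensure` is a design choice of this sketch, not something the slides state.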
Switching Models
• Exploiting Ruby’s dynamic nature, we were able to build models that
could be Neo4j or ActiveRecord models, depending on the feature flag.
class Contact
  extend SwitchingModel
  switch_between(::ContactV1, ::ContactV2)
end

class ContactV1
  include Neo4j::ActiveNode
  self.mapped_label_name = "Contact"
  # ... Neo4j::ActiveNode model code
end

class ContactV2 < ApplicationRecord
  self.table_name = :contacts
  # ... ActiveRecord model code
end
Switching Models
module SwitchingModel
  def switch_between(v1_model, v2_model)
    @_v1_model = v1_model
    @_v2_model = v2_model
  end

  private

  # True when the current realtor has been migrated, or when forced via ENV.
  def _v2_mode?
    Thread.current.thread_variable_get(:moved_to_postgres) ||
      ENV['FORCE_V2_FEATURE_FLAG'] == '1'
  end

  def _switch
    return @_v2_model if _v2_mode?
    @_v1_model
  end
end
Switching Models
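module SwitchingModel
  # Forward any unknown class-level call to whichever model is active.
  def method_missing(meth, *args, &blk)
    _switch.send(meth, *args, &blk)
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(meth, include_private = false)
    _switch.respond_to?(meth, include_private) || super
  end

  def const_missing(name)
    _switch.const_get(name)
  end

  def new(*args, &blk)
    _switch.new(*args, &blk)
  end

  private
  # ...
end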
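To see the delegation in action without Neo4j or ActiveRecord, here is a self-contained sketch. `WidgetV1`, `WidgetV2`, and `Widget` are hypothetical stand-ins, and the module is a trimmed copy of the one on the slides (no ENV override, no `const_missing`).

```ruby
module SwitchingModel
  def switch_between(v1_model, v2_model)
    @_v1_model = v1_model
    @_v2_model = v2_model
  end

  # Forward unknown class-level calls to the currently active model.
  def method_missing(meth, *args, &blk)
    _switch.send(meth, *args, &blk)
  end

  def respond_to_missing?(meth, include_private = false)
    _switch.respond_to?(meth, include_private) || super
  end

  private

  def _switch
    Thread.current.thread_variable_get(:moved_to_postgres) ? @_v2_model : @_v1_model
  end
end

# Hypothetical stand-ins for the Neo4j and ActiveRecord model classes.
class WidgetV1
  def self.backend
    "neo4j"
  end
end

class WidgetV2
  def self.backend
    "postgres"
  end
end

class Widget
  extend SwitchingModel
  switch_between(WidgetV1, WidgetV2)
end

Thread.current.thread_variable_set(:moved_to_postgres, false)
before_migration = Widget.backend
Thread.current.thread_variable_set(:moved_to_postgres, true)
after_migration = Widget.backend
```

The same call site, `Widget.backend`, reaches a different class depending only on the thread-local flag.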
Scopes and More Scopes
• A lot of queries contained Cypher fragments

• Converting those to scopes allowed controllers to use
the same queries, whether the feature flag was set or not

• Built a rich vocabulary of scopes that has served us well
ever since
Testing
• Environment variable override of feature flag

• Rake tasks for running two sets of specs

• Separate sets of factories

• CI running both sets

• Lots of comparison testing by developers

• Whole company QA swarm in staging
Tracking Progress
• Excellent advice from Jess Martin, our CTO

• Added an RSpec custom formatter to output total number of v2 specs vs.
number of passing v2 specs.

• Those numbers went into a spreadsheet with a chart.
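A formatter along these lines might look like the sketch below. It assumes v2 specs are tagged with `v2: true` metadata, which the slides do not say; the class name and output wording are also invented. It registers with RSpec only when RSpec is loaded, so the counting logic can be tried with stand-in objects.

```ruby
require "stringio"

# Hypothetical formatter: counts examples tagged `v2: true` and how many pass.
class V2ProgressFormatter
  if defined?(RSpec::Core::Formatters)
    RSpec::Core::Formatters.register self, :example_passed, :example_failed,
                                     :example_pending, :dump_summary
  end

  attr_reader :total, :passing

  def initialize(output)
    @output = output
    @total = 0
    @passing = 0
  end

  def example_passed(notification)
    return unless v2?(notification)
    @total += 1
    @passing += 1
  end

  def example_failed(notification)
    @total += 1 if v2?(notification)
  end
  alias example_pending example_failed

  def dump_summary(_summary)
    @output.puts "v2 specs: #{@passing} passing of #{@total}"
  end

  private

  # Assumption: v2 specs carry `v2: true` in their metadata.
  def v2?(notification)
    notification.example.metadata[:v2] == true
  end
end

# Stand-ins for RSpec's notification objects, for trying the class by hand.
FakeExample = Struct.new(:metadata)
FakeNotification = Struct.new(:example)

out = StringIO.new
formatter = V2ProgressFormatter.new(out)
formatter.example_passed(FakeNotification.new(FakeExample.new({ v2: true })))
formatter.example_failed(FakeNotification.new(FakeExample.new({ v2: true })))
formatter.example_passed(FakeNotification.new(FakeExample.new({})))
formatter.dump_summary(nil)
```

Under RSpec it would be selected with the usual `--format` option; the single summary line is easy to paste into a spreadsheet.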
Executing
• Select employees first (those not doing sales and demos)

• Rest of employees

• Friendly customers (who would inform us of issues)

• Rest of active customers

• The whole process took about three weeks
Finishing the Job
• After the initial round of employee and select customer migrations, we
kicked off the first full batch of customers.
• All of a sudden, I had nothing to do!
• “I may as well start on the PR to rip out all the V1 and transitional code …”
• 10 hours later:
[Diagram: the data model after stage 1, entirely in Postgres. realtors (Abby, Bill) connect through contact_relationships to shared contacts (Isaac, Jane, Kathy, Lee, Mike, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina).]
Stage 2:

Change Primary Keys to Integers
What Drove the Change?
• Postgres UUID primary keys work just fine.

• But they are harder to remember, diff, and type

• Didn’t become an issue until we needed to start tracking source
info for a different table that had an integer primary key.

• We track sources using a polymorphic join table
(sourcings).
A Spike
contact_names
⭐ id ⭐   first     …
abc       Joe
def       Susan
ghi       Rachel
jkl       Todd
mno       Melanie

A Spike
contact_names
⭐ id ⭐   first     …   integer_id
abc       Joe           1
def       Susan         2
ghi       Rachel        3
jkl       Todd          4
mno       Melanie       5

A Spike
contact_names
uuid      first     …   integer_id
abc       Joe           1
def       Susan         2
ghi       Rachel        3
jkl       Todd          4
mno       Melanie       5

A Spike
contact_names
uuid      first     …   ⭐ id ⭐
abc       Joe           1
def       Susan         2
ghi       Rachel        3
jkl       Todd          4
mno       Melanie       5
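The four table states above correspond roughly to the following DDL. This is a sketch, not the talk's actual helper: the function name is hypothetical, and it assumes Postgres, a `bigserial` replacement column, and the default `<table>_pkey` constraint name.

```ruby
# Returns the DDL statements for swapping a UUID primary key for an integer
# one while keeping the old values in a `uuid` column.
# Assumptions: Postgres, and the default "<table>_pkey" constraint name.
def integer_primary_key_steps(table)
  [
    # Existing rows receive sequence values when a serial column is added.
    "ALTER TABLE #{table} ADD COLUMN integer_id bigserial",
    "ALTER TABLE #{table} DROP CONSTRAINT #{table}_pkey",
    "ALTER TABLE #{table} RENAME COLUMN id TO uuid",
    "ALTER TABLE #{table} RENAME COLUMN integer_id TO id",
    "ALTER TABLE #{table} ADD PRIMARY KEY (id)"
  ]
end

steps = integer_primary_key_steps("contact_names")
```

Dropping the primary key this way only works while nothing references it, which is exactly the problem the next slides deal with.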
Problem: Foreign Key References
properties                    property_notes
⭐ id ⭐   …                   ⭐ property_id ⭐   …
abc                           jkl
def                           def
ghi                           abc
jkl                           mno
mno                           ghi

Problem: Foreign Key References
properties                    property_notes
⭐ id ⭐   …   integer_id      ⭐ property_id ⭐   …
abc           1               jkl
def           2               def
ghi           3               abc
jkl           4               mno
mno           5               ghi

Problem: Foreign Key References
properties                    property_notes
⭐ id ⭐   …   integer_id      ⭐ property_id ⭐   …   int_property_id
abc           1               jkl                   4
def           2               def                   2
ghi           3               abc                   1
jkl           4               mno                   5
mno           5               ghi                   3

Problem: Foreign Key References
properties                    property_notes
id        …   integer_id      property_id       …   int_property_id
abc           1               jkl                   4
def           2               def                   2
ghi           3               abc                   1
jkl           4               mno                   5
mno           5               ghi                   3

Problem: Foreign Key References
properties                    property_notes
uuid      …   integer_id      property_id       …   int_property_id
abc           1               jkl                   4
def           2               def                   2
ghi           3               abc                   1
jkl           4               mno                   5
mno           5               ghi                   3

Problem: Foreign Key References
properties                    property_notes
uuid      …   integer_id      …   int_property_id
abc           1                   4
def           2                   2
ghi           3                   1
jkl           4                   5
mno           5                   3

Problem: Foreign Key References
properties                    property_notes
uuid      …   id              …   int_property_id
abc           1                   4
def           2                   2
ghi           3                   1
jkl           4                   5
mno           5                   3

Problem: Foreign Key References
properties                    property_notes
uuid      …   id              …   property_id
abc           1                   4
def           2                   2
ghi           3                   1
jkl           4                   5
mno           5                   3

Problem: Foreign Key References
properties                    property_notes
uuid      …   ⭐ id ⭐         …   ⭐ property_id ⭐
abc           1                   4
def           2                   2
ghi           3                   1
jkl           4                   5
mno           5                   3
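The backfill at the heart of this sequence can be simulated in memory with the same sample values. This is only an illustration using arrays of hashes in place of tables; in the real migration the equivalent work would be done in SQL.

```ruby
# The two tables from the slides, after integer_id has been added to properties.
properties = [
  { uuid: "abc", integer_id: 1 },
  { uuid: "def", integer_id: 2 },
  { uuid: "ghi", integer_id: 3 },
  { uuid: "jkl", integer_id: 4 },
  { uuid: "mno", integer_id: 5 }
]
property_notes = %w[jkl def abc mno ghi].map { |uuid| { property_id: uuid } }

# Build a uuid -> integer lookup, then backfill the new foreign key column.
integer_id_for = properties.to_h { |p| [p[:uuid], p[:integer_id]] }
property_notes.each do |note|
  note[:int_property_id] = integer_id_for.fetch(note[:property_id])
end

# Once every row is backfilled, drop the UUID reference and let the integer
# column take over the original name.
property_notes.each do |note|
  note[:property_id] = note.delete(:int_property_id)
end

backfilled = property_notes.map { |note| note[:property_id] }
```

Using `fetch` rather than `[]` makes a dangling reference fail loudly instead of silently writing a NULL.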
Problem: Polymorphic Tables
• Remember, this started because of a polymorphic join table,
sourcings

• Required converting all tables referenced by the polymorphic
table at once

• Ended up with 5 separate clusters of tables.

• Wrote migration helpers to manage the details and make
things reversible.
Discovering Constraints
• You can query anything about the schema from a set of internal tables and
views

• Example: finding all foreign key references to the contacts table:

    SELECT *
      FROM information_schema.constraint_column_usage
     WHERE table_name = 'contacts'
       AND column_name = 'id'
       AND constraint_name <> 'contacts_pkey'

• You can do similar things for indexes and other kinds of constraints
Plan and Wait
• Output of the spike:

• 3 complex migration helpers

• 5 migrations

• Ended up waiting 5 months before the pain outweighed the risk
Five Big Migrations
• Simple case easy:

    fix_uuid_primary_key :contact_names

• Harder cases not so easy:

    fix_uuid_primary_key :avatars
    fix_uuid_primary_key :properties
    fix_uuid_foreign_key :properties, :property_notes, on_delete: :cascade
    fix_uuid_polymorphic_association :sourcings,
                                     :sourceable,
                                     targets: [:avatars, :properties]

• Worst case: 3 primary keys, 28 foreign keys, 4 polymorphic tables … all in
one migration.
From Spike to Solution
• Careful review of the migrations and helpers

• Ran the migrations many, many times on clone of production DB

• Run, fix error, repeat. (Very thankful for Postgres transactional DDL!)

• Fixing error usually meant figuring out how to reflect on some new kind of
dependency in Postgres and update the helper to deal with it.

• Sometimes meant just coding a workaround for an odd case.
Being Careful
• Ran the migrations in staging for timings

• We had the luxury of downtime!

• But we wanted to understand how long each maintenance window would be.

• Made them reversible!

• We planned for never having to reverse them, including careful testing and
random spot-checks in the migrations.

• But we also made sure they could be reversed (including round-trip testing of
both schema and table contents).
Being Careful
• Build correctness checking into the migration helpers

• Remember: we kept the uuid column

• At start of change: store random sample of records

• After change: find those records and ensure they still refer to same UUID

• Finally deployed on five consecutive weekends (simplest first)
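The sample-and-verify idea can be sketched as two small functions. Everything here is illustrative: the tables are arrays of hashes, and `note_id` is a hypothetical stable key used to find the same record before and after the change.

```ruby
# Before the change: remember which UUID a random sample of notes points at.
def take_sample(notes, size, random: Random.new)
  notes.sample(size, random: random).map { |n| [n[:note_id], n[:property_id]] }
end

# After the change: follow the new integer key and compare against the
# uuid column that was kept on the referenced table.
def sample_intact?(sample, notes, properties)
  notes_by_id = notes.to_h { |n| [n[:note_id], n] }
  props_by_id = properties.to_h { |p| [p[:id], p] }
  sample.all? do |note_id, uuid|
    props_by_id.fetch(notes_by_id.fetch(note_id)[:property_id])[:uuid] == uuid
  end
end

notes_before = [
  { note_id: 1, property_id: "jkl" },
  { note_id: 2, property_id: "def" },
  { note_id: 3, property_id: "abc" }
]
sample = take_sample(notes_before, 2, random: Random.new(42))

properties_after = [
  { id: 1, uuid: "abc" }, { id: 2, uuid: "def" }, { id: 4, uuid: "jkl" }
]
notes_after = [
  { note_id: 1, property_id: 4 },
  { note_id: 2, property_id: 2 },
  { note_id: 3, property_id: 1 }
]
check_passed = sample_intact?(sample, notes_after, properties_after)
```

Keeping the `uuid` column is what makes this check possible: it is the one value that survives the change untouched.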
Stage 3:

Private, Per-Customer Contacts
Swanand Pagnis
What Drove the Change?
• Nearly every query had to be filtered based on source

• Extra complexity

• Joining through polymorphic table was costly

• Sooner or later we would miss it and violate data privacy
A Spike
• Doing this in Ruby was fairly straightforward … but very slow (about a day
per realtor)

• Doing it in SQL required fairly advanced skills … but took about ten
minutes per realtor

• As with stage 1, decided on a user-by-user approach
The Strategy
• Simple example: one contact shared by three realtors.
[Diagram: realtors Alice, Bill, and Carl; three contact_relationships rows (id: 1, 2, 3); one contacts row (id: 1); attached attributes "Name: Nancy" and "Email: nancy@example.com", each with contact_id: 1.]
The Strategy
• First, add old_contact_id column to contact_relationships
• Populate it with current value of contact_id
[Diagram: each contact_relationships row (id: 1, 2, 3) now also has old_contact_id: 1.]
The Strategy
• Next, add contact_relationship_id column to contacts
• Populate it with NULL (represented as ∅)
• Add uniqueness constraint for that column
[Diagram: the contacts row (id: 1) now has contact_relationship_id: ∅, with a uniqueness constraint on contact_relationship_id.]
The Strategy
• Update contact_relationship_id IF it’s NULL

    UPDATE contacts
       SET contact_relationship_id = contact_relationships.id
      FROM contact_relationships
     WHERE contacts.id = contact_relationships.contact_id
       AND contact_relationships.realtor_id = 1
       AND contacts.contact_relationship_id IS NULL

[Diagram: the contacts row (id: 1) now has contact_relationship_id: 1.]
The Strategy
• Now INSERT into contacts for each of Alice’s contact_relationships
• ON CONFLICT just set updated_at on the existing one
• and then UPDATE contact_relationships to point to the new contact records
INSERT with ON CONFLICT
WITH new_contacts AS (
  INSERT INTO contacts (contact_relationship_id, created_at, updated_at)
  (
    SELECT cr.id AS cr_id, cr.created_at, cr.updated_at
      FROM contacts
     INNER JOIN contact_relationships cr
        ON contacts.id = cr.contact_id
     WHERE cr.realtor_id = 1
       AND contacts.contact_relationship_id IS NULL
     ORDER BY cr_id ASC
  )
  ON CONFLICT ON CONSTRAINT contact_relationship_id_uniqueness
  DO UPDATE SET updated_at = EXCLUDED.updated_at
  RETURNING *)
UPDATE contact_relationships
   SET contact_id = new_contacts.id
  FROM new_contacts
 WHERE contact_relationships.id = new_contacts.contact_relationship_id;
The Strategy
• INSERT into contacts for each of Alice’s contact_relationships
• ON CONFLICT just set updated_at on the existing one
• and then UPDATE contact_relationships to point to the new contact records
[Diagram: the attempted new contacts row (id: 1001, contact_relationship_id: 1) is crossed out (X); a row with contact_relationship_id: 1 already exists.]
The Strategy
• That time, nothing happened, because Alice was the first realtor for contact 1.

The Strategy
• Now let’s try Bill.
The Strategy
• We try to claim the contact for Bill by updating contacts.contact_relationship_id
• But it isn’t NULL, so we don’t update it to 2
[Diagram: the update of contact_relationship_id on contacts row id: 1 is crossed out (X); the value stays 1.]
The Strategy
• But the INSERT works because it doesn’t create a uniqueness violation
[Diagram: a new contacts row appears with id: 1001 and contact_relationship_id: 2.]
The Strategy
• And then the UPDATE fixes up the contact_relationships record
• But what about the attached attributes?
[Diagram: contacts rows id: 1 (contact_relationship_id: 1) and id: 1001 (contact_relationship_id: 2); the Name and Email attributes are still attached only to contact 1.]
The Strategy
• For each of Bill’s contacts where contact_relationships.old_contact_id != contacts.id, go copy all of the attached attributes from old_contact_id
• A lot of queries, but basically straightforward
• Then move on to Carl
[Diagram: copies of the "Name: Nancy" and "Email: nancy@example.com" attributes are now attached to Bill’s new contacts row.]
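The whole walkthrough (claim, insert on no conflict, repoint, copy attributes) can be replayed in memory. This is a simulation of the strategy as the bullets describe it, not the production SQL; `ContactSplitter` and its data layout are hypothetical, and new ids start at 1001 only to match the slides.

```ruby
# In-memory replay of the per-realtor strategy.
class ContactSplitter
  attr_reader :contacts, :relationships, :attributes

  def initialize(contacts, relationships, attributes, next_id: 1001)
    @contacts = contacts
    # Preparation step: remember the original target of every relationship.
    @relationships = relationships.map { |cr| cr.merge(old_contact_id: cr[:contact_id]) }
    @attributes = attributes
    @next_id = next_id
  end

  def migrate(realtor)
    @relationships.select { |cr| cr[:realtor] == realtor }.each do |cr|
      contact = @contacts.find { |c| c[:id] == cr[:contact_id] }
      # 1. Claim the contact only if nobody has claimed it yet.
      contact[:contact_relationship_id] ||= cr[:id]
      # 2. Already ours: the insert would conflict, so nothing new is created.
      next if contact[:contact_relationship_id] == cr[:id]
      # 3. Otherwise insert a private copy and repoint the relationship.
      copy = { id: @next_id, contact_relationship_id: cr[:id] }
      @next_id += 1
      @contacts << copy
      cr[:contact_id] = copy[:id]
      # 4. Copy the attached attributes from the old shared contact.
      @attributes.select { |a| a[:contact_id] == cr[:old_contact_id] }
                 .each { |a| @attributes << a.merge(contact_id: copy[:id]) }
    end
  end
end

splitter = ContactSplitter.new(
  [{ id: 1, contact_relationship_id: nil }],
  [
    { id: 1, realtor: "Alice", contact_id: 1 },
    { id: 2, realtor: "Bill",  contact_id: 1 },
    { id: 3, realtor: "Carl",  contact_id: 1 }
  ],
  [{ contact_id: 1, name: "Nancy" }, { contact_id: 1, email: "nancy@example.com" }]
)
%w[Alice Bill Carl].each { |realtor| splitter.migrate(realtor) }
```

Alice keeps the original contact; Bill and Carl each end up with a private copy and their own copies of the attributes. Re-running a realtor is a no-op.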
Being Careful
• Again: ran these transformations against a clone of production

• Run for a realtor, compare against that realtor’s production data

• Complete run-through of all realtors in staging before moving on to
production

• During run-through, I plotted changes to table counts as a sanity check
An OUTER JOIN should’ve been an INNER JOIN
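[Diagram: the data model in Postgres with realtors (Abby, Bill), contact_relationships, and contacts. One group of contacts: Isaac, Jane, Kathy, Lee, Mike, Nancy, Quentin, Sally, … The other group: Kathy, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina.]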
Stage 4:

From Join Table to belongs_to
What Drove the Change?
• Everything’s just a little more complex with the join table

• Requires constraints and integrity checks that wouldn’t be
necessary without it

• Another team member challenged me to get rid of it!

• It really wasn’t causing us enough trouble to justify a big push

• But I realized we could set this up to do opportunistically
The Idea
• Go ahead and add the direct contacts.realtor_id foreign key

• Populate it to match the existing contact_relationships.

• Then just make sure they stay consistent!
Triggers
• Rails developers are wary of stored procedures and triggers (for good
reason)

• But sometimes they’re exactly what you need. This is one of those times.

• I had a lot of ignorance to overcome.

• So I worked on a spike, curling up with the Postgres manual and
experimenting …
[Diagram sequence (four builds): tables Realtors, ContactRelationships, and Contacts, with arrows labeled "insert" and "set realtor_id". In the last two builds, a follow-up "insert" or "set" is crossed out (X).]
Triggers Are Difficult

(for me, anyway)
• For efficiency, control the conditions under which they are invoked

• For correctness, decide before/after

• Carefully write updates/inserts to only make changes if
things are inconsistent
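The "only make changes if things are inconsistent" rule is what stops two mutually triggering updates from looping forever. Here is an in-memory sketch of that idea; the class and method names are invented, and the log entries only loosely mirror the kind of audit record a trigger could write.

```ruby
# Two "tables" kept in sync by trigger-like hooks that act only when needed.
class SyncedTables
  attr_reader :contacts, :relationships, :log

  def initialize
    @contacts = {}      # contact_id => realtor_id
    @relationships = [] # [realtor_id, contact_id] pairs
    @log = []           # [action, performed_update]
  end

  # The application inserts a join row; the hook then syncs contacts.
  def insert_relationship(realtor_id, contact_id)
    @relationships << [realtor_id, contact_id]
    after_relationship_insert(realtor_id, contact_id)
  end

  # The application sets contacts.realtor_id; the hook then syncs the join table.
  def set_realtor(contact_id, realtor_id)
    @contacts[contact_id] = realtor_id
    after_realtor_set(contact_id, realtor_id)
  end

  private

  def after_relationship_insert(realtor_id, contact_id)
    changed = @contacts[contact_id] != realtor_id
    @log << [:cr_insert, changed]
    set_realtor(contact_id, realtor_id) if changed
  end

  def after_realtor_set(contact_id, realtor_id)
    changed = !@relationships.include?([realtor_id, contact_id])
    @log << [:c_setrealtor, changed]
    insert_relationship(realtor_id, contact_id) if changed
  end
end

tables = SyncedTables.new
tables.insert_relationship(7, 42) # old code path: writes the join table
tables.set_realtor(43, 7)         # new code path: writes contacts directly
```

Each write fires the opposite hook exactly once, and the second hook finds everything consistent and stops.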
The Plan: A 12-Step Epic
• Step 1: build a way to track progress

• Step 2: build a way to audit the activity of the triggers

• Step 3: add contacts.realtor_id and triggers

• Steps 4–6: move fields from contact_relationships to contacts

• Steps 7–8: retarget polymorphic associations

• Steps 9–11: retarget associations, scopes, and query fragments

• Step 12: DROP TABLE contact_relationships
Tracking Progress
rg --count --ignore-file .rg_crprogress_ignore '[Cc]ontact_?[Rr]elationship' \
  | cut -d : -f 2 \
  | sed '2,$s/$/+/; $s/$/p/' \
  | dc
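The pipeline above counts matches per file, keeps the count field, and has dc add the numbers up. The same sum can be written in a few lines of Ruby; the function name and the sample paths are made up for illustration.

```ruby
# Sums the per-file counts from `rg --count` output ("path:count" per line),
# taking the second colon-separated field just as `cut -d : -f 2` does.
def total_matches(rg_count_output)
  rg_count_output.each_line.sum { |line| line.split(":")[1].to_i }
end

sample_output = "app/models/contact.rb:3\nspec/models/contact_spec.rb:12\n"
remaining = total_matches(sample_output)
```

Tracked over time, that one number shows how many references to the join table are left to remove.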
Auditing Trigger Activity
• Updated the triggers to log behavior to new
contact_relationship_trigger_actions table.

• Utility script to audit this table for consistency occasionally.
id       action        contact_relationship_id  contact_id  performed_update  time
4995810  c_setrealtor                           5622768     FALSE             2019-02-18 13:31:49.395671
4995811  cr_insert     10607228                 5622768     TRUE              2019-02-18 13:31:49.395671
4995812  c_setrealtor                           5622769     FALSE             2019-02-18 13:31:50.181528
4995813  cr_insert     10607230                 5622769     TRUE              2019-02-18 13:31:50.181528
4995814  c_setrealtor                           5622770     FALSE             2019-02-18 13:31:50.474147
Executing
• One step a week

• Each took 8–10 hours, on average

• Most deployments on weekends, even when
no downtime required
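[Diagram: the data model in Postgres with only realtors (Abby, Bill) and contacts; contact_relationships is gone. One group of contacts: Isaac, Jane, Kathy, Lee, Mike, Nancy, Quentin, Sally, … The other group: Kathy, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina.]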
Lessons and
Recommendations
Slow and Steady
• Incremental, “worst pain first” strategy

• Contained risk

• Enabled feature development

• Produced enormous technical improvement over time
Keep Looking Ahead
• We were always looking for ways to improve the system

• An “inventory of pain” helps you to identify which pain is the worst right
now
Each Stage Was Different!
• Entirely different, creative solutions required at each step

• Ruby magic

• Migrations and database reflections

• Fancy Postgres UPSERT (i.e., INSERT … ON CONFLICT) queries and CTEs

• Triggers

• Entirely different testing strategies, too.

• There is no recipe. Find what works.
Leverage Your Database
• We Rails developers love ActiveRecord and Arel for queries.

• But for all its problems, SQL is powerful.

• Data and referential integrity protections can save you.

• Without Postgres’ transactional DDL, the risk and effort would have been
enormously greater. (I’d guess roughly tenfold.)

• Stored procedures and triggers have their place.
The Luxury of Downtime
• We have the luxury of being able to schedule maintenance time.

• If you can, do that.

• If not, you have to explore other techniques. (It’s worth bringing in an
experienced database consultant if you need to explore these.)
Focus: The Two-Edged Sword
• These kinds of tasks really benefit from intense focus.

• But that kind of focus can keep you from seeing danger.

• Make sure you come up for air and have someone looking over your
shoulder.
What Would We Do Differently?
• If we had clearly understood our end goal, we could have done all of this
in stage 1.

• But we still thought we were building a social graph.

• You can never be sure you understand the future of your business.
What Would We Do Differently?
• There is one mistake we could have avoided based on technical principles.

• We should never have used UUID primary keys.

• They are useful only if you need to distribute primary key creation.

• Probably where contention on the primary key sequence is a bottleneck.

• Maybe also when you need to provide a key with less latency than a DB
round trip.

• THAT’S IT.
Three Principles
•Validation

•Reversibility

•Transparency

 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 

Kürzlich hochgeladen (20)

SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 

The 30-Month Migration

  • 1. The 30-Month Migration Glenn Vanderburg VP of Engineering, First.io @glv
  • 2. Changing Your Data Model Is Hard!
  • 3. Living With a Poor Data Model Is Also Hard!
  • 4. Four Stages, 2½ Years [Timeline, 2016–2019, showing Stage 1 through Stage 4 with date markers 15 Nov, 9 Feb, 20 Jun, 18 Nov, 15 Jan, 26 Jan, 20 Mar, 7 Dec, and Today]
  • 5. 30 months! [Same timeline as slide 4]
  • 7. A Technical Talk, with Mostly Non-Technical Lessons
  • 16. “Things are the way they are because they got that way.” –Gerald Weinberg
  • 18. [Diagram: Postgres holds the realtors table (Abby, Bill) and other tables: subscriptions, payments, notes, appointments, etc. Neo4j holds Realtors′ (Abby, Bill), ContactRelationships, and Contacts (Isaac, Jane, Kathy, Lee, Mike, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina), plus many other relationships between contacts, and between contacts and their attributes.]
  • 20. Stage 1: From Neo4j to Postgres
  • 21. What Drove the Change? • Neo4j/Cypher not as familiar to developers as Postgres/SQL • Neo4j ActiveModel gem less mature and feature-rich than ActiveRecord • Neo4j drivers less mature, less well optimized • Some features required cross-database joins (slow, memory intensive)
  • 22. Making a Plan • Realtor-by-realtor migration • An importer job that would import a realtor’s Neo4j data into Postgres • The importer needed to avoid duplicating shared data that had already been imported for another realtor • We would use a feature flag to indicate whether a realtor had been migrated or not
  • 23. Schema Definition • We knew our data in Neo4j was messy. • Neo4j’s referential integrity features weaker than Postgres’ • We weren’t skilled at using the features Neo4j did have • We got very serious about data integrity in the schema: • foreign keys, ON DELETE CASCADE, check constraints, exclusion constraints • This was enormously helpful!
  • 24. Switching Models • The feature flag needed to be readily available everywhere, so we set a thread-local variable in middleware. • A lot of queries start off by calling class methods on a model class • We needed that model class to be the ActiveRecord model if the current realtor’s feature flag was set, and the Neo4j model otherwise
      Person.find(35)
      # or
      Property.where(zip5: "75238")
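The thread-local flag described on this slide could be set by a Rack-style middleware along these lines. This is a minimal sketch, not the talk's actual code: the class name `MigrationFlagMiddleware`, the `migrated:` lookup, and the `"app.realtor_id"` env key are assumptions made for illustration. Only the `:moved_to_postgres` thread variable name is taken from the `SwitchingModel` code shown in the deck.

```ruby
# Hypothetical Rack-style middleware: sets a thread-local flag per request.
class MigrationFlagMiddleware
  # app:      the next Rack app in the stack
  # migrated: a callable returning true/false for a realtor id (assumed lookup)
  def initialize(app, migrated:)
    @app = app
    @migrated = migrated
  end

  def call(env)
    realtor_id = env["app.realtor_id"] # assumed to be set by earlier auth code
    Thread.current.thread_variable_set(:moved_to_postgres, @migrated.call(realtor_id))
    @app.call(env)
  ensure
    # Clear the flag so a reused server thread never leaks it into the next request.
    Thread.current.thread_variable_set(:moved_to_postgres, nil)
  end
end

# Usage with a stand-in app that reports the flag it saw.
inner = ->(_env) { [200, {}, [Thread.current.thread_variable_get(:moved_to_postgres).to_s]] }
stack = MigrationFlagMiddleware.new(inner, migrated: ->(id) { [1, 2].include?(id) })
```

Clearing the flag in `ensure` matters because application servers reuse threads between requests.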
  • 25. Switching Models • Exploiting Ruby’s dynamic nature, we were able to build models that could be Neo4j or ActiveRecord models, depending on the feature flag.
      class Contact
        extend SwitchingModel
        switch_between(::ContactV1, ::ContactV2)
      end

      class ContactV1
        include Neo4j::ActiveNode
        self.mapped_label_name = "Contact"
        # ... Neo4j::ActiveNode model code
      end

      class ContactV2 < ApplicationRecord
        self.table_name = :contacts
        # ... ActiveRecord model code
      end
  • 26. Switching Models
      module SwitchingModel
        def switch_between(v1_model, v2_model)
          @_v1_model = v1_model
          @_v2_model = v2_model
        end

        private

        def _v2_mode?
          Thread.current.thread_variable_get(:moved_to_postgres) ||
            ENV['FORCE_V2_FEATURE_FLAG'] == '1'
        end

        def _switch
          return @_v2_model if _v2_mode?
          @_v1_model
        end
      end
  • 27. Switching Models
      module SwitchingModel
        def method_missing(meth, *args, &blk)
          _switch.send(meth, *args, &blk)
        end

        def const_missing(name)
          _switch.const_get(name)
        end

        def new(*args)
          _switch.new(*args)
        end

        private
        # ...
      end
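To see the delegation work end to end, here is a self-contained sketch that swaps the real Neo4j and ActiveRecord models for plain stand-in classes. The stand-ins and their `find` methods are hypothetical. The `respond_to_missing?` hook is an addition that the slides do not show; it is the usual companion to `method_missing` so that `respond_to?` stays accurate.

```ruby
# Sketch of the SwitchingModel idea using plain Ruby stand-ins.
module SwitchingModel
  def switch_between(v1_model, v2_model)
    @_v1_model = v1_model
    @_v2_model = v2_model
  end

  # Forward unknown class methods (find, where, ...) to the active model.
  def method_missing(meth, *args, &blk)
    return super unless _switch.respond_to?(meth)
    _switch.send(meth, *args, &blk)
  end

  # Not in the slides: keeps respond_to? honest for forwarded methods.
  def respond_to_missing?(meth, include_private = false)
    _switch.respond_to?(meth, include_private) || super
  end

  def new(*args)
    _switch.new(*args)
  end

  private

  def _v2_mode?
    Thread.current.thread_variable_get(:moved_to_postgres) ||
      ENV["FORCE_V2_FEATURE_FLAG"] == "1"
  end

  def _switch
    _v2_mode? ? @_v2_model : @_v1_model
  end
end

# Hypothetical stand-ins for the Neo4j and ActiveRecord models.
class ContactV1
  def self.find(id)
    "v1 contact #{id}"
  end
end

class ContactV2
  def self.find(id)
    "v2 contact #{id}"
  end
end

class Contact
  extend SwitchingModel
  switch_between(ContactV1, ContactV2)
end
```

With the flag unset, `Contact.find(35)` is answered by `ContactV1`; once `:moved_to_postgres` is set on the current thread, the same call is answered by `ContactV2`.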
  • 28. Scopes and More Scopes • A lot of queries contained Cypher fragments • Converting those to scopes allowed controllers to use the same queries, whether the feature flag was set or not • Built a rich vocabulary of scopes that has served us well ever since
  • 29. Testing • Environment variable override of feature flag • Rake tasks for running two sets of specs • Separate sets of factories • CI running both sets • Lots of comparison testing by developers • Whole company QA swarm in staging
  • 30. Tracking Progress • Excellent advice from Jess Martin, our CTO • Added an RSpec custom formatter to output total number of v2 specs vs. number of passing v2 specs. • Those went into a spreadsheet with a chart:
  • 31. Executing • Select employees first (those not doing sales and demos) • Rest of employees • Friendly customers (who would inform us of issues) • Rest of active customers • The whole process took about three weeks
  • 32. Finishing the Job • After the initial round of employee and select customer migrations, we kicked off the first full batch of customers. • All of a sudden, I had nothing to do! • “I may as well start on the PR to rip out all the V1 and transitional code …” • 10 hours later:
  • 34. Stage 2: Change Primary Keys to Integers
  • 35. What Drove the Change? • Postgres UUID primary keys work just fine. • Harder to remember, diff, type • Didn’t become an issue until we needed to start tracking source info for a different table that had an integer primary key. • We track sources using a polymorphic join table (sourcings).
  • 36. A Spike [Table contact_names, ⭐ marks the primary key: columns ⭐ id ⭐, first, …; rows (abc, Joe), (def, Susan), (ghi, Rachel), (jkl, Todd), (mno, Melanie)]
  • 37. A Spike [contact_names gains an integer_id column: (abc, Joe, 1), (def, Susan, 2), (ghi, Rachel, 3), (jkl, Todd, 4), (mno, Melanie, 5); ⭐ id ⭐ is still the primary key]
  • 38. A Spike [id is renamed to uuid: columns uuid, first, …, integer_id; no primary key is marked]
  • 39. A Spike [integer_id is renamed to id: columns uuid, first, …, ⭐ id ⭐; the integer column is now the primary key]
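The four spike slides amount to a short sequence of DDL statements. The sketch below generates them for a given table. It is an illustration only: `uuid_to_integer_pk_sql` is a hypothetical name (the talk's own helper is called `fix_uuid_primary_key` and its body is not shown), it assumes the Postgres default primary-key constraint name `<table>_pkey`, and it assumes the table name is trusted input.

```ruby
# Hypothetical helper: SQL for swapping a UUID primary key for an integer one
# while keeping the old values in a "uuid" column.
def uuid_to_integer_pk_sql(table)
  [
    # A serial column numbers the existing rows as it is added.
    "ALTER TABLE #{table} ADD COLUMN integer_id bigserial",
    # Assumes the default constraint name; fails while foreign keys still reference it.
    "ALTER TABLE #{table} DROP CONSTRAINT #{table}_pkey",
    "ALTER TABLE #{table} RENAME COLUMN id TO uuid",
    "ALTER TABLE #{table} RENAME COLUMN integer_id TO id",
    "ALTER TABLE #{table} ADD PRIMARY KEY (id)"
  ]
end

puts uuid_to_integer_pk_sql("contact_names")
```

This covers only the simple case of a table that nothing else references.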
  • 40. Problem: Foreign Key References [properties: ⭐ id ⭐ = abc, def, ghi, jkl, mno. property_notes: ⭐ property_id ⭐ = jkl, def, abc, mno, ghi. ⭐ marks a column with a key constraint.]
  • 41. Problem: Foreign Key References [properties gains integer_id: abc → 1, def → 2, ghi → 3, jkl → 4, mno → 5; property_notes is unchanged]
  • 42. Problem: Foreign Key References [property_notes gains int_property_id, filled in through the old keys: jkl → 4, def → 2, abc → 1, mno → 5, ghi → 3; the ⭐ id ⭐ and ⭐ property_id ⭐ constraints are still in place]
  • 43. Problem: Foreign Key References [Same columns and values; the key constraints are no longer marked]
  • 44. Problem: Foreign Key References [properties.id is renamed to uuid]
  • 45. Problem: Foreign Key References [The UUID column property_notes.property_id is gone, leaving int_property_id: 4, 2, 1, 5, 3]
  • 46. Problem: Foreign Key References [properties.integer_id is renamed to id]
  • 47. Problem: Foreign Key References [property_notes.int_property_id is renamed to property_id]
  • 48. Problem: Foreign Key References [The key constraints are marked again: ⭐ id ⭐ on properties and ⭐ property_id ⭐ on property_notes]
  • 49. Problem: Polymorphic Tables • Remember, this started because of a polymorphic join table, sourcings • Required converting all tables referenced by the polymorphic table at once • Ended up with 5 separate clusters of tables. • Wrote migration helpers to manage the details and make things reversible.
  • 50. Discovering Constraints • You can query anything about the schema from a set of internal tables and views • Example: finding all foreign key references to the contacts table:
      SELECT *
      FROM information_schema.constraint_column_usage
      WHERE table_name = 'contacts'
        AND column_name = 'id'
        AND constraint_name <> 'contacts_pkey'
    • You can do similar things for indexes and other kinds of constraints
  • 51. Plan and Wait • Output of the spike: • 3 complex migration helpers • 5 migrations • Ended up waiting 5 months before the pain outweighed the risk
  • 52. Five Big Migrations • Simple case easy:
      fix_uuid_primary_key :contact_names
    • Harder cases not so easy:
      fix_uuid_primary_key :avatars
      fix_uuid_primary_key :properties
      fix_uuid_foreign_key :properties, :property_notes, on_delete: :cascade
      fix_uuid_polymorphic_association :sourcings, :sourceable, targets: [:avatars, :properties]
    • Worst case: 3 primary keys, 28 foreign keys, 4 polymorphic tables … all in one migration.
  • 53. From Spike to Solution • Careful review of the migrations and helpers • Ran the migrations many, many times on clone of production DB • Run, fix error, repeat. (Very thankful for Postgres transactional DDL!) • Fixing error usually meant figuring out how to reflect on some new kind of dependency in Postgres and update the helper to deal with it. • Sometimes meant just coding a workaround for an odd case.
  • 54. Being Careful • Ran the migrations in staging for timings • We had the luxury of downtime! • But we wanted to understand how long each maintenance window would be. • Made them reversible! • We planned for never having to reverse them, including careful testing and random spot-checks in the migrations. • But we also made sure they could be reversed (including round-trip testing of both schema and table contents).
  • 55. Being Careful • Build correctness checking into the migration helpers • Remember: we kept the uuid column • At start of change: store random sample of records • After change: find those records and ensure they still refer to same UUID • Finally deployed on five consecutive weekends (simplest first)
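The sample-before, check-after idea from this slide can be sketched with in-memory stand-ins for two tables. Everything here is illustrative: the hashes stand in for rows, `verify_sample` is a hypothetical name, and the real helpers ran inside Postgres migrations rather than over Ruby arrays.

```ruby
# In-memory stand-ins for two tables before the change (UUID keys).
properties = [{ id: "abc" }, { id: "def" }, { id: "ghi" }]
notes = [{ body: "roof", property_id: "def" }, { body: "yard", property_id: "abc" }]

# At start of change: remember which UUID a random sample of notes refers to.
sample = notes.sample(2, random: Random.new(42))
              .to_h { |note| [note[:body], note[:property_id]] }

# The change: integer ids become the keys, the old key is kept as :uuid.
properties.each_with_index { |prop, i| prop.merge!(uuid: prop[:id], id: i + 1) }
int_id_for = properties.to_h { |prop| [prop[:uuid], prop[:id]] }
notes.each { |note| note[:property_id] = int_id_for.fetch(note[:property_id]) }

# After change: follow each sampled note through the new integer key
# and confirm it still lands on the same UUID.
def verify_sample(sample, notes, properties)
  uuid_for = properties.to_h { |prop| [prop[:id], prop[:uuid]] }
  sample.all? do |body, expected_uuid|
    note = notes.find { |n| n[:body] == body }
    uuid_for[note[:property_id]] == expected_uuid
  end
end

puts verify_sample(sample, notes, properties)
```

Keeping the `uuid` column is what makes this check possible: it is the one value that does not change across the migration.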
  • 59. What Drove the Change? • Nearly every query had to be filtered based on source • Extra complexity • Joining through polymorphic table was costly • Sooner or later we would miss it and violate data privacy
  • 60. A Spike • Doing this in Ruby was fairly straightforward … but very slow (about a day per realtor) • Doing it in SQL required fairly advanced skills … but took about ten minutes per realtor • As with stage 1, decided on a user-by-user approach
  • 62. The Strategy • Simple example: one contact shared by three realtors. [Diagram: realtors Alice, Bill, and Carl; contact_relationships rows id: 1, 2, and 3; one contacts row (id: 1) with attached attributes Name: Nancy and Email: nancy@example.com, both with contact_id: 1]
  • 63. The Strategy • First, add old_contact_id column to contact_relationships • Populate it with current value of contact_id [Diagram: all three contact_relationships rows now show old_contact_id: 1]
  • 64. The Strategy • Next, add contact_relationship_id column to contacts • Populate it with NULL • Add uniqueness constraint for that column [Diagram: the contacts row (id: 1) shows contact_relationship_id: NULL, with a uniqueness constraint on contact_relationship_id]
  • 65. The Strategy [Diagram: same state as slide 64]
  • 66. The Strategy • Update contact_relationship_id IF it’s NULL
      UPDATE contacts
      SET contact_relationship_id = contact_relationships.id
      FROM contact_relationships
      WHERE contacts.id = contact_relationships.contact_id
        AND contact_relationships.realtor_id = 1
        AND contacts.contact_relationship_id IS NULL
    [Diagram: the contacts row (id: 1) now shows contact_relationship_id: 1]
  • 67. The Strategy • Now INSERT into contacts for each of Alice’s contact_relationships • ON CONFLICT just set updated_at on the existing one • and then UPDATE contact_relationships to point to the new contact records [Diagram: unchanged]
  • 68. INSERT with ON CONFLICT
      WITH new_contacts AS (
        INSERT INTO contacts (contact_relationship_id, created_at, updated_at) (
          SELECT cr.id AS contact_relationship_id, cr.created_at, cr.updated_at
          FROM contacts
          INNER JOIN contact_relationships cr ON contacts.id = cr.contact_id
          WHERE cr.realtor_id = 1
            AND contacts.contact_relationship_id IS NULL
          ORDER BY cr.id ASC
        )
        ON CONFLICT ON CONSTRAINT contact_relationship_id_uniqueness
        DO UPDATE SET updated_at = EXCLUDED.updated_at
        RETURNING *
      )
      UPDATE contact_relationships
      SET contact_id = new_contacts.id
      FROM new_contacts
      WHERE contact_relationships.id = new_contacts.contact_relationship_id;
  • 74. The Strategy • INSERT into contacts for each of Alice’s contact_relationships • ON CONFLICT just set updated_at on the existing one • and then UPDATE contact_relationships to point to the new contact records [Diagram: a would-be new contacts row (id: 1001, contact_relationship_id: 1) is crossed out]
  • 75. The Strategy • That time, nothing happened, because Alice was the first realtor for contact 1. [Diagram: the contacts row (id: 1) still has contact_relationship_id: 1]
  • 76. The Strategy • Now let’s try Bill.
  • 77. The Strategy • We try to claim the contact for Bill by updating contacts.contact_relationship_id • But it isn’t NULL, so we don’t update it to 2 [Diagram: the attempted update is crossed out]
  • 78. The Strategy • But the INSERT works because it doesn’t create a uniqueness violation [Diagram: a new contacts row appears with id: 1001 and contact_relationship_id: 2]
  • 79. The Strategy • And then the UPDATE fixes up the contact_relationships record • But what about the attached attributes? [Diagram: Bill’s contact_relationships row now points to the new contacts row (id: 1001)]
  • 80. The Strategy • For each of Bill’s contacts where contact_relationships.old_contact_id != contacts.id, go copy all of the attached attributes from old_contact_id
  • 81. The Strategy • For each of Bill’s contacts where contact_relationships.old_contact_id != contacts.id, go copy all of the attached attributes from old_contact_id • A lot of queries, but basically straightforward • Then move on to Carl [Diagram: Bill’s new contacts row (id: 1001, contact_relationship_id: 2) now has its own copies of Name: Nancy and Email: nancy@example.com]
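The claim-or-copy walkthrough above can be simulated in memory to check the logic. This is a sketch under stated assumptions, not the talk's implementation (which was done in SQL): rows are plain hashes, `split_contacts_for` is a hypothetical name, and new contact ids are simply the next integer rather than the 1001 shown on the slides.

```ruby
# In-memory sketch of the per-realtor "claim or copy" step.
def split_contacts_for(realtor_id, contacts, relationships, attributes)
  relationships.select { |cr| cr[:realtor_id] == realtor_id }.each do |cr|
    contact = contacts.find { |c| c[:id] == cr[:contact_id] }

    # Claim the shared contact only if nobody has claimed it yet.
    contact[:contact_relationship_id] ||= cr[:id]
    next if contact[:contact_relationship_id] == cr[:id]

    # Otherwise insert a private copy and repoint the relationship at it.
    copy = { id: contacts.map { |c| c[:id] }.max + 1, contact_relationship_id: cr[:id] }
    contacts << copy
    cr[:contact_id] = copy[:id]

    # Copy the attached attributes from the original contact (old_contact_id).
    attributes.select { |a| a[:contact_id] == cr[:old_contact_id] }
              .each { |a| attributes << a.merge(contact_id: copy[:id]) }
  end
end

# One contact (Nancy) shared by three realtors, as in the slides.
contacts = [{ id: 1, contact_relationship_id: nil }]
relationships = (1..3).map { |i| { id: i, realtor_id: i, contact_id: 1, old_contact_id: 1 } }
attributes = [{ contact_id: 1, name: "Nancy" }, { contact_id: 1, email: "nancy@example.com" }]

[1, 2, 3].each { |realtor| split_contacts_for(realtor, contacts, relationships, attributes) }
```

Running the step again for a realtor who has already been processed changes nothing, which is the property that makes a realtor-by-realtor rollout safe to retry.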
  • 82. Being Careful • Again: ran these transformations against a clone of production • Run for a realtor, compare against that realtor’s production data • Complete run-through of all realtors in staging before moving on to production • During run-through, I plotted changes to table counts as a sanity check
  • 85. An OUTER JOIN should’ve been an INNER JOIN
  • 88. Stage 4: From Join Table to belongs_to
  • 89. What Drove the Change? • Everything’s just a little more complex with the join table • Requires constraints and integrity checks that wouldn’t be necessary without it • Another team member challenged me to get rid of it! • It really wasn’t causing us enough trouble to justify a big push • But I realized we could set this up to do opportunistically
  • 90. The Idea • Go ahead and add the direct contacts.realtor_id foreign key • Populate it to match the existing contact_relationships. • Then just make sure they stay consistent!
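A consistency check of the kind this slide implies could look like the following. It is a sketch only: rows are plain hashes, `inconsistent_contact_ids` is a hypothetical name, and it assumes each contact has exactly one contact_relationships row, which is the state stage 3 leaves behind.

```ruby
# Returns the ids of contacts whose realtor_id disagrees with the join table.
def inconsistent_contact_ids(contacts, relationships)
  realtor_for = relationships.to_h { |cr| [cr[:contact_id], cr[:realtor_id]] }
  contacts.reject { |c| c[:realtor_id] == realtor_for[c[:id]] }.map { |c| c[:id] }
end

relationships = [{ contact_id: 1, realtor_id: 10 }, { contact_id: 2, realtor_id: 11 }]
contacts = [{ id: 1, realtor_id: 10 }, { id: 2, realtor_id: nil }]

# Contact 2 has not had realtor_id populated yet, so it is reported.
p inconsistent_contact_ids(contacts, relationships) # prints [2]
```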
  • 91. Triggers • Rails developers are wary of stored procedures and triggers (for good reason) • But sometimes they’re exactly what you need. This is one of those times. • I had a lot of ignorance to overcome. • So I worked on a spike, curling up with the Postgres manual and experimenting …
  • 96. Triggers Are Difficult (for me, anyway) • For efficiency, control the conditions under which invoked • For correctness, decide before/after • Carefully write updates/inserts to only make changes if things are inconsistent
  • 97. The Plan: A 12-Step Epic • Step 1: build a way to track progress • Step 2: build a way to audit the activity of the triggers • Step 3: add contacts.realtor_id and triggers • Steps 4–6: move fields from contact_relationships to contacts • Steps 7–8: retargeting polymorphic associations • Steps 9–11: retargeting associations, scopes, and query fragments • Step 12: DROP TABLE contact_relationships
  • 98. Tracking Progress
 rg --count --ignore-file .rg_crprogress_ignore '[Cc]ontact_?[Rr]elationship' \
 | cut -d : -f 2 \
 | sed '2,$s/$/+/; $s/$/p/' \
 | dc
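The tail of that pipeline is a compact way to add up numbers: `cut` keeps the count from each `path:count` line, `sed` appends `+` to every line from the second onward and `p` to the last, and `dc` evaluates the resulting reverse-Polish expression and prints the total. For readers less comfortable with `dc`, here is a rough Ruby equivalent of just the summing step; the method name `total_matches` and the sample paths are hypothetical.

```ruby
# Sum the per-file match counts printed by `rg --count`, whose lines look
# like "path/to/file.rb:3". Mirrors the `cut | sed | dc` tail of the
# pipeline. Hypothetical helper, not part of the original tooling.
def total_matches(rg_count_output)
  rg_count_output.each_line.sum do |line|
    # Field 2 of a colon-separated line, as `cut -d : -f 2` selects.
    line.chomp.split(":")[1].to_i
  end
end

# Made-up sample of `rg --count` output.
sample = <<~OUT
  app/models/contact.rb:3
  app/models/realtor.rb:4
  spec/models/contact_spec.rb:10
OUT

puts total_matches(sample) # prints 17
```

Either way the result is a single number, the count of remaining references to the join table, which should fall toward zero as the steps land.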
  • 99. Auditing Trigger Activity • Updated the triggers to log behavior to new contact_relationship_trigger_actions table. • Utility script to audit this table for consistency occasionally.
 id       action        contact_relationship_id  contact_id  performed_update  time
 4995810  c_setrealtor                           5622768     FALSE             2019-02-18 13:31:49.395671
 4995811  cr_insert     10607228                 5622768     TRUE              2019-02-18 13:31:49.395671
 4995812  c_setrealtor                           5622769     FALSE             2019-02-18 13:31:50.181528
 4995813  cr_insert     10607230                 5622769     TRUE              2019-02-18 13:31:50.181528
 4995814  c_setrealtor                           5622770     FALSE             2019-02-18 13:31:50.474147
  • 100. Executing • One step a week • Each took 8–10 hours, on average • Most deployments on weekends, even when no downtime required
  • 103. Slow and Steady • Incremental, “worst pain first” strategy • Contained risk • Enabled feature development • Produced enormous technical improvement over time
  • 104. Keep Looking Ahead • We were always looking for ways to improve the system • An “inventory of pain” helps you to identify which pain is the worst right now
  • 105. Each Stage Was Different! • Entirely different, creative solutions required at each step • Ruby magic • Migrations and database reflections • Fancy Postgres UPSERT (i.e., INSERT … ON CONFLICT) queries and CTEs • Triggers • Entirely different testing strategies, too. • There is no recipe. Find what works.
  • 106. Leverage Your Database • We Rails developers love ActiveRecord and Arel for queries. • But for all its problems, SQL is powerful. • Data and referential integrity protections can save you. • Without Postgres’ transactional DDL, the risk and effort would have been enormously greater. (I’d guess roughly tenfold.) • Stored procedures and triggers have their place.
  • 107. The Luxury of Downtime • We have the luxury of being able to schedule maintenance time. • If you can, do that. • If not, you have to explore other techniques. (It’s worth bringing in an experienced database consultant if you need to explore these.)
  • 108. Focus: The Two-Edged Sword • These kinds of tasks really benefit from intense focus. • But that kind of focus can keep you from seeing danger. • Make sure you come up for air and have someone looking over your shoulder.
  • 109. What Would We Do Differently? • If we had clearly understood our end goal, we could have done all of this in stage 1. • But we still thought we were building a social graph. • You can never be sure you understand the future of your business.
  • 110. What Would We Do Differently? • There is one mistake we could have avoided based on technical principles. • We should never have used UUID primary keys. • They are useful only if you need to distribute primary key creation. • Probably where contention on the primary key sequence is a bottleneck. • Maybe also when you need to provide a key with less latency than a DB round trip. • THAT’S IT.