Presenter: Software Developer at FamilySearch
FamilySearch hosts a collaborative family tree with over a billion editable records. The tree currently serves as many as 10,000 concurrent users at peak weekly load. These users come from across the globe and collectively maintain and enhance the tree around the clock. Recent efforts to port the tree from a relational database to Cassandra have resulted in drastically improved performance and scalability. The database consists of more than 5 billion records in journaled form, and we anticipate having over 10TB of live data available for users to view and edit, with that data size growing significantly as our user base grows. The dataset has resisted sharding in the past, so the port involved rethinking the core data model. The model we chose retains the consistency that our users demand and can be implemented without ACID transactions. Specifically, it combines Convergent and Commutative Replicated Data Types (CvRDT and CmRDT) with Cassandra's atomic batch implementation to meet the demanding consistency needs of the family tree application.
Cassandra Summit 2014: Huge Online Genealogical Database Driven By Cassandra
1. 1
© 2014 by Intellectual Reserve, Inc. All rights reserved.
Huge Online Genealogical Database
Driven By Cassandra
Cassandra Summit 2014
John Sumsion
2. 2
Outline
• Introduction to FamilySearch Family Tree
• Outline of Cassandra reimplementation
• Journal-based Consistency Model
• Experience with Cassandra
3. 3
What is FamilySearch?
FamilySearch.org website
Very large single pedigree (Family Tree)
Largest collection of free genealogical records
Largest genealogical library
Family History Department of The Church of Jesus Christ of Latter-day Saints (known as Mormons)
4. 4
Why does FamilySearch exist?
Visit http://mormon.org/family-history/
5. 5
Family Tree: Where it fits
[Diagram: Family Tree shown alongside Records, Indexing, Memories, and Community]
12. 12
Family Tree Data
Family Tree:
• 900M+ person records, open-edit
• 500M+ relationships, open-edit
• 8.4B change log entries, 100M+ per quarter
• Dynamic OLTP system
• Data-dependent performance issues
13. 13
Family Tree: Example 9 Gen Pedigree
up to 511 person slots
Dynamic content!
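The 511 figure is what a full binary pedigree gives when every person has two parent slots. A minimal sketch of that arithmetic, assuming the root person counts as generation 1 (the function name is illustrative, not from the talk):

```python
def pedigree_slots(generations: int) -> int:
    """Person slots in a full binary pedigree with the given number of generations."""
    # Generation index g (0 for the root person) holds 2**g ancestor slots.
    return sum(2 ** g for g in range(generations))

print(pedigree_slots(9))  # 511
print(pedigree_slots(5))  # 31
```

The same formula yields 31 slots for a 5-generation section, which is why a single pedigree page can fan out into hundreds of person lookups.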
14. 14
Family Tree: Example Pedigree App
31+ persons per section
Dynamic content!
15. 15
Family Tree: Example Ancestor Page
10+ persons in families
100-1000+ changes
Dynamic content!
16. 16
Family Tree: Example Change History
100-1000+ changes
Dynamic content!
18. 18
Performance & Scale
• Slow page views
  • pedigree (500-3000ms for 3 generations)
  • change history (2000+ms for first page of changes)
  • large family view
• Query problems
  • relationships connect persons, range scan by person id
  • every person => person traversal is a 200-300M btree scan (global index)
  • change history queries require an 8B+ btree scan (global index)
19. 19
Performance & Scale
• Query performance problems
[Diagram: a Pedigree query reaches Person -> Relationship -> Person through a wide range scan; a Change History query reaches Change History through a wide range scan]
20. 20
Cassandra Reimplementation
• selected Cassandra after extensive testing
• full data scale proof-of-concept & tests
• required: new data model (performance)
• required: new consistency model (critical!)
21. 21
Cassandra Reimplementation
• event-sourced data model: journal / views
• new data model: no indexes
• new consistency model: satisfies consistency requirements
[Diagram: P1 journal (JE #8) feeding P1 views A and B; P2 journal (JE #6) feeding P2 views A and B]
24. 24
Cassandra Reimplementation
• denormalized relationships
• exact duplication allows bidirectional traversal
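[Diagram: the Person -> Relationship -> Person wide query is replaced by Person/Rels records; P1 and P2 each hold their own relationship copies (R1-R5), with shared relationships R2 and R3 duplicated under both]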
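A minimal sketch of the duplication idea, using an in-memory dictionary as a stand-in for a table partitioned by person id. The function names, relationship types, and record layout are hypothetical and are not the FamilySearch schema:

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a table partitioned by person id.
partitions = defaultdict(dict)

def write_relationship(rel_id, person_a, person_b, rel_type):
    """Store an identical copy of the relationship under both persons."""
    record = (rel_id, person_a, person_b, rel_type)
    for person_id in (person_a, person_b):
        partitions[person_id][rel_id] = record

def related_persons(person_id):
    """Traverse from one person by reading only that person's partition."""
    others = []
    for _rel_id, a, b, _rel_type in partitions[person_id].values():
        others.append(b if a == person_id else a)
    return sorted(others)

write_relationship("R2", "P1", "P2", "couple")
write_relationship("R1", "P1", "P3", "parent-child")
print(related_persons("P1"))  # ['P2', 'P3']
print(related_persons("P2"))  # ['P1']
```

Because each person's partition carries its own copy, traversal in either direction is a single-partition read rather than a scan of a global relationship index.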
25. 25
Cassandra Reimplementation
• change history is a core feature
• denormalized change history
• optimizes for displaying recent changes
[Diagram: P1 journal (JE #8) feeding the P1 Change History View. 1000s of changes are spread over multiple Cassandra cells; the last 100-1000 changes are local to a single Cassandra cell.]
30. 30
Journal-based Consistency Model
Journal
• write-once with quorum & C* batch
• denormalized byte-exact across affected persons & relationships
• each entry stored in separate cell (compaction required for fast journal reads)
[Diagram: Command -> Journal -> Views]
31. 31
Journal-based Consistency Model
Journal
• CmRDT (commutative replicated data type)
• partitions converge without conflict because each entry has a unique UUID
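A minimal sketch of why a write-once journal keyed by unique ids behaves as a commutative type: "add entry" operations can arrive in any order, or more than once, and every replica ends in the same state. The dictionary layout and field names are illustrative assumptions, not the actual journal schema:

```python
import uuid

def append(journal, entry):
    """Apply one 'add entry' operation; re-applying the same entry is a no-op."""
    journal.setdefault(entry["id"], entry)
    return journal

e1 = {"id": uuid.uuid4(), "person": "P1", "change": "name added"}
e2 = {"id": uuid.uuid4(), "person": "P1", "change": "birth date edited"}

# Two replicas receive the same operations in different orders,
# and replica_x also sees a duplicate delivery of e1.
replica_x = {}
for entry in (e1, e2, e1):
    append(replica_x, entry)

replica_y = {}
for entry in (e2, e1):
    append(replica_y, entry)

print(replica_x == replica_y)  # True
```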
33. 33
Journal-based Consistency Model
View
• multiple views for multiple uses (person, person card, change history)
• populated by applying journal entries
• incrementally updated in steady state
• not canonical data, can be recalculated
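A minimal sketch of a view as a fold over journal entries, showing that an incrementally updated view matches one recalculated from scratch. The `seq`, `field`, and `value` keys are assumptions made for illustration; the talk does not describe the entry format:

```python
def apply_entry(view, entry):
    """Return a new current-state view with one journal entry applied."""
    updated = dict(view)
    updated[entry["field"]] = entry["value"]
    return updated

def rebuild(journal):
    """Recalculate the view from scratch by replaying the whole journal."""
    view = {}
    for entry in sorted(journal, key=lambda e: e["seq"]):
        view = apply_entry(view, entry)
    return view

journal = [
    {"seq": 1, "field": "name", "value": "Jon Smith"},
    {"seq": 2, "field": "birth", "value": "1850"},
    {"seq": 3, "field": "name", "value": "John Smith"},
]

stored_view = rebuild(journal[:2])                   # view as of entry 2
stored_view = apply_entry(stored_view, journal[2])   # incremental update
print(stored_view == rebuild(journal))  # True
```

Since the journal stays canonical, a stored view can be thrown away and regenerated at any time, which is what makes it safe to treat views as disposable.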
38. 38
Journal-based Consistency Model
View
• views have same schema as journal
• journal entries are written to view for incremental refresh
• core of the consistency model
39. 39
Journal-based Consistency Model
View
• CvRDT (convergent replicated data type)
• partitions converge with conflict; resolved by full view refresh from canonical journal
• steady state: one view of a given type per entity
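One way to read the convergence rule above is as a merge function over view copies: when two copies agree, keep them; when they disagree, discard both and refresh from the journal. This is a sketch under that assumption, with hypothetical names and entry fields, and is not the FamilySearch implementation:

```python
def refresh_from_journal(journal):
    """Full refresh: replay every journal entry in order."""
    view = {}
    for entry in sorted(journal, key=lambda e: e["seq"]):
        view[entry["field"]] = entry["value"]
    return view

def converge(replica_a, replica_b, journal):
    """Merge two copies of a view; any disagreement triggers a full refresh."""
    if replica_a == replica_b:
        return replica_a
    return refresh_from_journal(journal)

journal = [
    {"seq": 1, "field": "name", "value": "John Smith"},
    {"seq": 2, "field": "birth", "value": "1850"},
]
stale = {"name": "John Smith"}                   # missed entry 2 during a partition
fresh = {"name": "John Smith", "birth": "1850"}

print(converge(stale, fresh, journal) == converge(fresh, stale, journal))  # True
```

The merge is order-independent and idempotent because its result depends only on the canonical journal, never on which copy happened to arrive first.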
40. 40
Journal-based Consistency Model
[Diagram: Command -> Journal -> Views. Journal entry JE #8 for P1 is written to the journal and to P1 views A and B, producing A (new) and B (new).]
41. 41
Journal-based Consistency Model
• Performance & Scale
  • lookup by partition key only, no indexes
  • any cross-entity change happens in duplicate on all
  • stored "current-state" views: cheapest possible read
  • custom views: tunable to different use cases
  • disposable views: able to tweak view over time
42. 42
Journal-based Consistency Model
• Business Rule Enforcement
  • Read / Write / Read & Revert
  • pre-command checks prevent invalid changes
  • write with appropriate quorum ensures consistent write
  • post-command checks prevent business-rules conflicts
  • administrative revert marks command as "not applicable" and thereby causes a full refresh that ignores its changes
  • appropriate quorum: depending on the change, either LOCAL_QUORUM or EACH_QUORUM
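The Read / Write / Read & Revert cycle can be sketched with an in-memory journal. The rule used here (at most two parents) is a hypothetical example chosen for illustration, and tying the revert directly to the post-command check is an assumption about how the steps connect:

```python
MAX_PARENTS = 2  # hypothetical business rule, for illustration only

journal = []            # write-once entries: (command_id, parent_name)
not_applicable = set()  # command ids marked by a revert

def current_parents():
    """Full refresh of the view, ignoring reverted commands."""
    return [parent for cmd, parent in journal if cmd not in not_applicable]

def pre_check():
    """Read before writing: is there still room for another parent?"""
    return len(current_parents()) < MAX_PARENTS

def post_check_or_revert(command_id):
    """Re-read after the write; revert this command if the rule is broken."""
    if len(current_parents()) > MAX_PARENTS:
        not_applicable.add(command_id)
        return False
    return True

journal.append(("c1", "Mary"))

# Two concurrent commands both read before either one writes.
ok_c2, ok_c3 = pre_check(), pre_check()
if ok_c2:
    journal.append(("c2", "Thomas"))
if ok_c3:
    journal.append(("c3", "William"))

print(post_check_or_revert("c2"))  # False
print(post_check_or_revert("c3"))  # True
print(current_parents())           # ['Mary', 'William']
```

The journal itself is never edited: the reverted command stays in place and is simply skipped when the view is refreshed.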
43. 43
Journal-based Consistency Model
• Strong consistency
  • command store: atomic capture of a single user action
  • command handling: idempotent writes to journal, picked up later even if interrupted
  • no global lock needed for optimistic concurrency
• Read after write
  • consistency ONE for normal reads
  • quorum when the client knows it's refreshing after write
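The read-after-write choice follows the usual replica-overlap rule: a read is guaranteed to reach at least one replica holding the latest write when read replicas plus write replicas exceed the replication factor. A small sketch of that arithmetic for a single data center, assuming a replication factor of 3 (the talk does not state the actual value):

```python
def quorum(replication_factor):
    """Replicas required for a quorum: a strict majority."""
    return replication_factor // 2 + 1

def read_sees_write(replication_factor, write_replicas, read_replicas):
    """True when every read replica set must overlap every write replica set."""
    return write_replicas + read_replicas > replication_factor

rf = 3            # assumed replication factor
w = quorum(rf)    # quorum write touches 2 of 3 replicas

print(read_sees_write(rf, w, 1))           # False
print(read_sees_write(rf, w, quorum(rf)))  # True
```

Reads at ONE are cheaper but may miss a write that has not yet reached the replica they hit, so the stronger read is reserved for the moment right after the client's own write.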
44. 44
Journal-based Consistency Model
• Journal / View Concerns
  • native support for change history
  • no journal tombstones in steady state: write-once
  • blob schema implementable on any db engine that supports two-level keys (partition, composite)
  • consistency model implementable on any db engine that supports batches & quorum writes/reads
  • view tombstones on every write, biggest concern
  • leveled compaction?
  • WISH: size-tiered compaction with data locality hoisting
46. 46
Experience with Cassandra
• tested Community 1.2 and 2.0
• fantastic performance
• easy cloud setup
• great developer response
• easy to bulk load through CQL3
• harder to get running inside AWS VPC
47. 47
Experience with Cassandra
• Bulk import experience
  • 8.4B change log records => 5.8B journal entries (2.5TB lzo)
  • "hi1.4xlarge" cluster (2x 1TB SSDs)
  • import through CQL was fast enough
  • 11h to import on a 5-node cluster (5h on a 30-node cluster)
  • 140k writes / sec, fed from 128 writer threads
  • 20 records / unlogged batch write, 1-2k record size
  • minimal post-import compaction (size-tiered)
  • ended up with 3.5-4TB on C* disk after import
• OpsCenter: great visibility for tuning
• Community: harder to automate repairs, etc.
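As a rough consistency check on the import figures, the sketch below assumes the 140k writes/sec rate was sustained for the whole 5-node import; the slide does not say which cluster that rate was measured on:

```python
journal_entries = 5.8e9      # journal entries to import
writes_per_sec = 140_000     # reported write rate
records_per_batch = 20       # records per unlogged batch

hours = journal_entries / writes_per_sec / 3600
batches_per_sec = writes_per_sec / records_per_batch

print(round(hours, 1))   # 11.5
print(batches_per_sec)   # 7000.0
```

Under that assumption the result lines up with the reported 11h import time.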
48. 48
Experience with Cassandra
• Full-scale load test experience
  • got to 25x our peak hourly load on a 25-28 node cluster
  • production peak load included significant write load
  • working-set size was about 2M persons in a month
  • enabled row cache, ran almost entirely without disk access
  • bottlenecked on interconnect socket w/ round robin client
  • got 50% boost from token-aware, round robin client
• OpsCenter: great visibility for tuning
• Large SSD cluster: able to handle repair during scale tests
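A toy sketch of why token-aware routing relieves the interconnect: a plain round-robin client often picks a coordinator that does not own the key and must forward the request, while a token-aware client sends it straight to an owner. The node names are hypothetical, each key has a single owner, and the MD5-modulo placement is only a stand-in for Cassandra's partitioner and token ring:

```python
import hashlib
from itertools import cycle

NODES = ["n1", "n2", "n3", "n4", "n5"]  # hypothetical cluster

def owner(key):
    """Pick the node owning a partition key (MD5 stand-in, not Cassandra's partitioner)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def forwarded_requests(keys, token_aware):
    """Count requests whose coordinator must forward to another node."""
    round_robin = cycle(NODES)
    forwarded = 0
    for key in keys:
        coordinator = owner(key) if token_aware else next(round_robin)
        if coordinator != owner(key):
            forwarded += 1
    return forwarded

keys = [f"person-{i}" for i in range(1000)]
print(forwarded_requests(keys, token_aware=True))   # 0
print(forwarded_requests(keys, token_aware=False))  # most requests take an extra hop
```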
49. 49
Experience with Cassandra
[Chart: current system vs. Cassandra implementation (1x, 10x, 20x)]
50. 50
Experience with Cassandra
[Chart: current system vs. Cassandra implementation (1x, 10x, 20x), log scale]
51. 51
Current Status
• still working on implementation & rollout
• migration, reconciliation, integration...
• consistency model code separate
52. 52
Questions?
53. 53
Contact Info
John Sumsion
Sr. Software Engineer
sumsionjg@familysearch.org
@jdsumsion
Thanks to the team at FamilySearch!
esp. Randy & James for doing the model
Thanks to the awesome presenters & organizers at
#CassandraSummit!