Presenter: Michael Nelson, Development Manager at FamilySearch
A recent research project at FamilySearch.org pushed Cassandra to very high scale and performance limits in AWS using a real application. Come see how we achieved 250K reads/sec with latencies under 5 milliseconds on a 400-core cluster holding 6 TB of data while maintaining transactional consistency for users. We'll cover tuning of Cassandra's caches, other server-side settings, client driver, AWS cluster placement and instance types, and the tradeoffs between regular & SSD storage.
2. Outline
• The App: FamilySearch Family Tree
• The Test: Borland Silk Performer
• The Findings:
  • Row Cache
  • Token Aware Driver
  • Networking Issues
  • Etc.
3. What Is FamilySearch?
• FamilySearch.org Website
• Very Large Single Pedigree (Family Tree)
• Largest Collection of Free Genealogical Records
• Largest Genealogical Library
• The Church of Jesus Christ of Latter-day Saints (Mormons)
4. Why does FamilySearch exist?
Visit http://mormon.org/family-history/
5. Family Tree Data
Family Tree:
• 900M+ Person Records, Open-Edit
• 500M+ Relationships, Open-Edit
• 8.4B Change Log Entries, ~1M / day
• 7TB in Cassandra (13TB in Oracle)
• Dynamic OLTP system
• Data-dependent performance issues
6. Family Tree: Example 9 Gen Pedigree
[Screenshot: pedigree view with up to 511 person slots; dynamic content]
7. Family Tree: Example Pedigree App
[Screenshot: 31+ persons per section; dynamic content]
8. Family Tree: Example Ancestor Page
[Screenshot: 10+ persons in families; 100-1000+ changes; dynamic content]
9. Cassandra Reimplementation
• Event-Sourced Data Model – journal / views
• New Data Model – no indexes
• New Consistency Model – satisfies consistency
[Diagram: person P1 with journal entry JE #8 and P1 views A and B; person P2 with journal entry JE #6 and P2 views A and B]
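The journal / views split on this slide can be illustrated with a small Python sketch. This is a toy under stated assumptions, not FamilySearch's implementation: the `JournalEntry` shape, the `append` and `build_view` helpers, and the sample changes are all hypothetical. Only the general idea comes from the slide: each person has an append-only journal, and read-optimized views are derived from it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JournalEntry:
    """One immutable change to a person (hypothetical shape)."""
    seq: int         # position in the person's journal, e.g. 8 for "JE #8"
    attribute: str   # which attribute of the person changed
    value: str       # the new value


def append(journal: List[JournalEntry], attribute: str, value: str) -> JournalEntry:
    """Record a change by appending; existing entries are never modified."""
    entry = JournalEntry(seq=len(journal) + 1, attribute=attribute, value=value)
    journal.append(entry)
    return entry


def build_view(journal: List[JournalEntry]) -> Dict[str, str]:
    """Derive a view by replaying the journal in sequence order."""
    view: Dict[str, str] = {}
    for entry in sorted(journal, key=lambda e: e.seq):
        view[entry.attribute] = entry.value  # later entries overwrite earlier ones
    return view


# Hypothetical sample data for one person.
p1_journal: List[JournalEntry] = []
append(p1_journal, "name", "John Smith")
append(p1_journal, "birth_year", "1820")
append(p1_journal, "name", "John A. Smith")  # a later correction

p1_view = build_view(p1_journal)
print(p1_view)
```

Because the view is rebuilt purely from the journal, a read can be served from the view alone, without secondary indexes, while the journal keeps the full change history.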
14. Row Cache = 35% More Throughput
Default Key Cache:
• Cached Disk Location
• Data From Disk Cache
• ~11ms Reads
Row Cache:
• Cached Row Contents
• ~7ms Reads
15. Configuring Row Cache
cassandra.yaml:
# Maximum size of the row cache in memory.
# Default value is 0, to disable row caching.
row_cache_size_in_mb: 32768

Enable For Each Table Explicitly:
ALTER TABLE person_view WITH caching = 'ALL';
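Because row caching has to be switched on table by table, a short script can generate the statements. The following is a minimal Python sketch: `person_view` comes from the slide, while `relationship_view` and the helper name `row_cache_statements` are hypothetical. It emits the string form of the `caching` option used on this slide; later Cassandra releases changed this option to a map form, so the syntax should be checked against the version in use.

```python
import re
from typing import Iterable, List

# Unquoted CQL identifiers: a letter followed by letters, digits, or underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def row_cache_statements(tables: Iterable[str]) -> List[str]:
    """Build one ALTER TABLE statement per table to enable row caching."""
    statements = []
    for table in tables:
        if not _IDENTIFIER.match(table):
            # Refuse anything that is not a plain identifier rather than quoting it.
            raise ValueError(f"not a plain CQL identifier: {table!r}")
        statements.append(f"ALTER TABLE {table} WITH caching = 'ALL';")
    return statements


# relationship_view is a hypothetical second table, for illustration only.
for statement in row_cache_statements(["person_view", "relationship_view"]):
    print(statement)
```

The statements are only printed here; running them against a cluster (for example through cqlsh) is left to the operator.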
27. Contact Info
Michael Nelson
Development Manager
nelsonmi@familysearch.org

Thanks to FamilySearch team!

Thanks to the awesome presenters & organizers at #CassandraSummit!