"In this session, we’ll detail Red Hat Storage Server data replication strategies for both near replication (LAN) and far replication (over WAN), and explain how replication has evolved over the last few years. You’ll learn about:
Past mechanisms.
Near replication (client-side replication).
Far replication using timestamps (xtime).
Present mechanisms.
Near replication (server side) built using quorum and journaling.
Faster far replication using journaling.
Unified replication.
Replication using snapshots.
Stripe replication using erasure coding."
13. Traditional replication using AFR
“Automatic File Replication”
Client-based replication
Entry, metadata, and data replication
Automated self-healing when bricks recover after failure
14. AFR Sequence Diagram
Participants: Client 1, Client 2, Server A, Server B
Each write runs as a five-phase transaction against both servers: Lock → Pre-op → Op → Post-op → Unlock
A concurrent Lock from Client 2 stays blocked until Client 1 unlocks; only then does Client 2 proceed to its Pre-op
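The five phases above can be illustrated with a toy model. This is a sketch, not GlusterFS internals: the class and field names are invented, and a single process-local lock stands in for AFR's cluster-wide inode lock.

```python
import threading

class Brick:
    """Toy stand-in for a replica brick: holds data plus 'pending' markers."""
    def __init__(self):
        self.data = {}
        self.pending = {}  # path -> count of operations still in flight

class AfrTransaction:
    """Sketch of AFR's lock / pre-op / op / post-op / unlock phases."""
    def __init__(self, bricks):
        self.bricks = bricks
        self.lock = threading.Lock()  # stands in for the cluster-wide lock

    def write(self, path, value):
        with self.lock:                # Lock: a second writer blocks here
            for b in self.bricks:      # Pre-op: mark the op as pending
                b.pending[path] = b.pending.get(path, 0) + 1
            for b in self.bricks:      # Op: perform the write on every replica
                b.data[path] = value
            for b in self.bricks:      # Post-op: clear the pending marker
                b.pending[path] -= 1
        # Unlock: happens as the 'with' block exits

bricks = [Brick(), Brick()]
txn = AfrTransaction(bricks)
txn.write("/a", b"hello")
print(all(b.data["/a"] == b"hello" for b in bricks))  # True
```

The pending markers are the interesting part: if a brick dies between pre-op and post-op, the surviving marker tells self-heal which replica missed the write.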
15. AFR improvements
In the 3.4 release:
Eager locking
Piggybacking
Server quorum
In the 3.5 release:
Granular self-heal
In the 3.6 release:
Rewrite of the code
Pending counters
Self-healing moved into the self-heal daemon
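The role of pending counters can be sketched as a toy "accusation" table. This is a simplification of how AFR records, on each brick, operations its peers may have missed (in reality stored as extended attributes); the dictionary layout and function below are illustrative only.

```python
# Each brick counts operations its peers may have missed
# (a simplification of AFR's per-peer changelog counters).
pending = {
    "brickA": {"brickB": 2},  # brickA accuses brickB of missing 2 ops
    "brickB": {"brickA": 0},  # brickB does not accuse brickA
}

def heal_sources(pending):
    """A brick that no peer accuses is a valid source for self-heal."""
    accused = {peer for counts in pending.values()
               for peer, n in counts.items() if n > 0}
    return [b for b in pending if b not in accused]

print(heal_sources(pending))  # ['brickA']
```

When every brick accuses every other, no clean source exists: that is the split-brain case the server-quorum and NSR work aims to avoid.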
16. NSR – new-style (aka server-side) replication
Replication at the back end (brick processes)
Controlled by a designated “leader”, also known as the sweeper
Advantages:
Client-network bandwidth usage optimized for direct (FUSE) mounts
Avoidance of split-brain
Sweeper elected using the majority principle
A per-term changelog on the sweeper preserves the ordering of operations
Variable consistency models for trading consistency against performance
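The majority-principle election can be sketched as follows. This is a minimal toy, not NSR's actual election protocol: the voting structure and function name are assumptions, but it shows why a strict majority rules out two sweepers coexisting.

```python
from collections import Counter

def elect_sweeper(votes):
    """Elect a sweeper only if some node holds a strict majority of votes.

    `votes` maps voter -> preferred candidate. Because two disjoint groups
    cannot both hold more than half the votes, at most one sweeper can be
    elected per term, which is what prevents split-brain.
    """
    tally = Counter(votes.values())
    candidate, count = tally.most_common(1)[0]
    if count > len(votes) // 2:   # strict majority required
        return candidate
    return None                   # no quorum: refuse writes rather than diverge

print(elect_sweeper({"n1": "n1", "n2": "n1", "n3": "n3"}))  # 'n1' (2 of 3)
print(elect_sweeper({"n1": "n1", "n2": "n2"}))              # None (tie)
```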
17. NSR high-level blocks
NSR client-side translator
Sends I/O to the sweeper
Sweeper (leader)
Forwards I/O to peers
Commits after all peers complete
Non-sweeper (follower)
Accepts I/O only from the sweeper or from reconciliation
Rejects I/O from clients (client retries)
Changelog
Reconciliation
Uses membership information to determine which terms are missing
Uses changelogs to sync the corresponding terms
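The blocks above fit together roughly as follows. This is a hedged sketch under simplified assumptions: the class names, the single-term counter, and the tuple-based log are all invented for illustration, not NSR's wire protocol.

```python
class Follower:
    """Non-sweeper brick: accepts I/O only from the sweeper."""
    def __init__(self):
        self.log = []  # list of (term, op) entries

    def apply(self, term, op, source):
        if source != "sweeper":            # reject direct client I/O
            raise PermissionError("retry via sweeper")
        self.log.append((term, op))
        return True

class Sweeper:
    """Leader: forwards I/O to peers, commits after all of them complete."""
    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.term = 1                      # bumped on each new election

    def write(self, op):
        acks = [f.apply(self.term, op, source="sweeper")
                for f in self.followers]
        if all(acks):                      # commit only after every peer completes
            self.log.append((self.term, op))
            return "committed"
        return "failed"

def reconcile(healthy_log, stale_log):
    """Replay the entries for terms the stale node missed entirely (sketch)."""
    have = {term for term, _ in stale_log}
    return [entry for entry in healthy_log if entry[0] not in have]

followers = [Follower(), Follower()]
sweeper = Sweeper(followers)
print(sweeper.write("create /x"))  # committed
```

Because the per-term changelog is ordered by the sweeper alone, a recovering follower only has to ask "which terms am I missing?" and replay them, rather than compare file contents.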
23. Crawling and xtime
xtime
Per-inode change time
Propagated up to the root (marker xlator)
Crawling/Scanning
Directory crawl and file synchronization
Synchronize when xtime(master) > xtime(slave)
Slave xtime is maintained by the master
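The pruned crawl the slide describes can be sketched like this. The dictionary layout is an assumption for illustration, not the marker xlator's real on-disk format; only the comparison rule, xtime(master) > xtime(slave), comes from the slide.

```python
def crawl(master, slave, path="/"):
    """Sync only subtrees where xtime(master) > xtime(slave); a sketch.

    master: path -> (xtime, child paths); files have an empty child list.
    slave:  path -> xtime, as last recorded by the master for the slave.
    """
    if master[path][0] <= slave.get(path, 0):
        return []                          # subtree unchanged: prune here
    to_sync = []
    for child in master[path][1]:
        if master[child][1]:               # directory: descend
            to_sync += crawl(master, slave, child)
        elif master[child][0] > slave.get(child, 0):
            to_sync.append(child)          # file is newer on the master
    return to_sync

master = {
    "/":           (100, ["/docs", "/a.txt"]),
    "/docs":       (90,  ["/docs/b.txt"]),
    "/docs/b.txt": (80,  []),
    "/a.txt":      (100, []),
}
slave = {"/": 50, "/docs": 90, "/docs/b.txt": 80, "/a.txt": 40}
print(crawl(master, slave))  # ['/a.txt'] -- /docs is pruned, xtime unchanged
```

Because every change bumps xtime all the way up to the root, an unchanged xtime on a directory lets the crawler skip its entire subtree.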
26. Overview
Multi-node
Distributed (parallel) synchronization
Replica failover
Change detection
Consumable journals
Data synchronization (configurable)
rsync, or tar+ssh (for large numbers of small files)
Efficient processing of renames, deletes, and hardlinks
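Replica failover in this multi-node setup can be pictured as choosing one active syncing worker per replica set, with the others standing by. This is a guess at the semantics expressed as a toy function; the data shapes and names are assumptions, not geo-replication's actual worker-management code.

```python
def assign_workers(replica_sets, alive):
    """Pick one active syncing node per replica set (sketch).

    Each replica set holds the same data, so only one member needs to sync
    it; if that member dies, the next live member takes over (failover).
    """
    active = {}
    for i, rset in enumerate(replica_sets):
        candidates = [n for n in rset if n in alive]
        active[i] = candidates[0] if candidates else None
    return active

sets = [["n1", "n2"], ["n3", "n4"]]
print(assign_workers(sets, alive={"n2", "n3", "n4"}))  # {0: 'n2', 1: 'n3'}
```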
27. Journaling
Journaling translator (changelog)
Records FOPs (efficiently), local to each brick
Data, Entry, Metadata
Change detection: O(1) relative to the number of changes
Consumer library (libgfchangelog)
Per brick
Publish/subscribe mechanism
Journals published periodically
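The record-then-publish cycle can be sketched as a toy per-brick journal. This is illustrative only: the class and method names are invented and the real changelog translator and libgfchangelog use an on-disk format, not in-memory lists.

```python
import time

class Changelog:
    """Toy per-brick journal: records FOPs, publishes them in batches."""
    def __init__(self):
        self.current = []     # journal currently being written
        self.published = []   # closed journals, ready for consumers

    def record(self, fop_class, path):
        # fop_class is one of DATA / ENTRY / METADATA, as on the slide
        self.current.append((time.time(), fop_class, path))

    def publish(self):
        # Rolled over periodically; consumers read whole published
        # journals, so change detection scales with the number of
        # changes rather than the size of the volume.
        batch, self.current = self.current, []
        self.published.append(batch)
        return batch

log = Changelog()
log.record("ENTRY", "/newfile")
log.record("DATA", "/newfile")
batch = log.publish()
print(len(batch))  # 2
```

This is what makes the journal "consumable": geo-replication subscribes to published batches instead of crawling the filesystem.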