Ceph Day San Jose - HA NAS with CephFS
1. HA NAS with CephFS
Wyllys Ingersoll – Keeper Technology, LLC
2. Data and Storage Management Experts
Focus on IC & Commercial Customers for 12 Years
• Multi-PB Enterprise Systems
• Imagery, Computer Forensics, Big Data
• High Volume/High Velocity Data Analysis
• Full Solution Provider
• Keeper Products + Partner Products
Introductions
Keeper Technology
3. • How we implemented HA NAS gateways using
cephfs
• Cluster configuration
• Software used
• Issues encountered
• Performance statistics
Overview
4. • Ceph Jewel 10.2.5 on Ubuntu-based Linux
• 6 Storage servers, ~ 80 TB Usable (3-copy)
• ~15 OSD per server
• 2, 3, and 4TB 7200RPM spinning drives (no SSD)
• 2 Gateways
• HP DL380 G9 Servers w/48GB RAM
• 3 Monitors + 3 MDS Servers
• HP DL360 G6 w/48GB RAM
Configuration
Cluster Configuration
5. • Provide NFS and/or SMB filesystem shares with
redundancy
• Failover if a gateway goes down
• Clients should not lose data
• Minimal interruption of client workflow
• “Seamless” for NFSv3 – others are a work in progress
• Minimum 2 gateways required
Goals
HA NAS Goals
6. • SAMBA 4.5.5 w/CTDB support
• Built with the “--with-cluster-support” flag
• CTDB is key to HA functionality
• CTDB = Clustered Trivial Database
• Node monitor, failover, IP takeover
• Define multiple floating IP addresses in DNS
• CTDB configured with the virtual shared IPs and the real IP of each GW (config sketch below)
• CTDB nodes communicate on private network
• Insecure protocol
Software
SMB
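A minimal sketch of the CTDB files this setup implies; the addresses, interface name, and paths are illustrative assumptions, not values from the talk:

    # /etc/ctdb/nodes -- the real (private) IP of each gateway, one per line
    10.0.0.11
    10.0.0.12

    # /etc/ctdb/public_addresses -- the floating (virtual) IPs that CTDB
    # moves between nodes on failover; format: address/mask interface
    192.168.1.100/24 eth0
    192.168.1.101/24 eth0

    # smb.conf -- enable clustering in a Samba built with --with-cluster-support
    [global]
        clustering = yes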
7. • Ganesha NFS 2.4.3
• User space NFS service, replaces kernel NFS
• Built from source from the GitHub repo
• Store ganesha config on shared FS
• Ex: /cephfs/nfs/ganesha.conf
• HA gateways must have common NFS Export IDs
• Use “VFS” FSAL (not “CEPH”) for Ganesha exports (example below)
Software
NFS
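A minimal ganesha.conf export sketch matching the points above; the export ID, paths, and options shown are illustrative assumptions:

    # Stored on the shared FS, e.g. /cephfs/nfs/ganesha.conf
    EXPORT {
        Export_Id = 100;               # must be identical on every HA gateway
        Path = /cephfs/exports/foobar; # subdirectory of the mounted cephfs
        Pseudo = /foobar;
        Access_Type = RW;
        Protocols = 3;                 # "seamless" failover targets NFSv3
        FSAL {
            Name = VFS;                # VFS FSAL, not the CEPH FSAL
        }
    }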
8. • Single FS per cluster (for now) - /cephfs
• Disable snapshots or restrict to top level
• Hard-linking bug prevents reliable snapshots on subdirs
• Prefer kernel mount over FUSE for performance (mount sketch below)
• Kernel 4.8.10
• Each export is a subdirectory
• /cephfs/exports/foobar
Software
NAS with Cephfs
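A sketch of the kernel-client mount the gateways would use; the monitor addresses and secret file path are illustrative assumptions:

    # Mount the single cephfs at /cephfs via the kernel client (kernel 4.8.10)
    mount -t ceph 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789:/ /cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret

    # Each NFS/SMB export is then a subdirectory, e.g. /cephfs/exports/foobar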
10. • SAMBA locks stored on shared FS
• Ex: /cephfs/ctdb
• CTDB monitors SAMBA and Ganesha services
• Starts and stops as necessary via “callout” scripts
• CTDB assigns virtual IP addrs as needed
Software
NAS with Cephfs
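A sketch of the era-appropriate (legacy-style) CTDB settings implied above; exact variable names varied across CTDB versions, and the paths are illustrative assumptions:

    # /etc/default/ctdb (Samba 4.5-era settings)
    CTDB_RECOVERY_LOCK=/cephfs/ctdb/lockfile       # lock file on the shared FS
    CTDB_NODES=/etc/ctdb/nodes
    CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
    CTDB_MANAGES_SAMBA=yes                         # CTDB starts/stops smbd
    CTDB_MANAGES_NFS=yes                           # ...and NFS, via a "callout"
    CTDB_NFS_CALLOUT=/etc/ctdb/nfs-ganesha-callout # script for Ganesha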
11. • Kernel support for cephfs varies
• Using “bleeding edge” kernels for best results
• Cannot set quotas on subdirectories
• kernel cephfs limitation (quota sketch below)
• Cannot limit size available for a single export
• Each share has max size = entire cephfs data pool
• Snapshots only at top level
• Cannot snapshot each exported subdirectory
Issues
Issues and Problems
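For context, this is how a cephfs directory quota is set; at the time the attribute was honored by the FUSE client but not the kernel client, which is the limitation above (size and path are illustrative):

    # Set a ~1 TB quota on one export's subdirectory
    setfattr -n ceph.quota.max_bytes -v 1000000000000 /cephfs/exports/foobar

    # Read it back
    getfattr -n ceph.quota.max_bytes /cephfs/exports/foobar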
12. • mds_cache_size = 8,000,000 (default was 100k)
• Uses more RAM, but we have 48GB
• Avoid “failing to respond to cache pressure” errors
• Use “default” crush tunables (not “jewel”).
• Works better with older kernels (commands below)
Issues
Adjustments
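The two adjustments above, sketched as the corresponding ceph.conf setting and CLI command:

    # ceph.conf on the MDS hosts -- raise the inode cache from the 100k default
    [mds]
        mds cache size = 8000000

    # Select the older "default" crush tunables profile instead of "jewel"
    ceph osd crush tunables default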
13. • FIO parameters
• Vary block sizes (4k, 64k, 1m, 4m)
• Vary # of jobs (1, 16, 32, 64)
• Iodepth = 1
• Read/write + randread/randwrite
• Ioengine = sync
• direct=1
• 4 distinct clients tested simultaneously on a 10Gb link (example fio command below)
Performance
Test Methodology
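One cell of the test matrix above expressed as an fio invocation; the target directory and file size are illustrative assumptions:

    # Example: 64k random writes, 16 jobs, sync engine, direct I/O
    fio --name=nas-test --directory=/mnt/share \
        --rw=randwrite --bs=64k --numjobs=16 \
        --iodepth=1 --ioengine=sync --direct=1 \
        --size=1g --group_reporting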
14. Performance
Performance Configuration
[Diagram: four clients connect to the NAS gateway, which sits in front of the Ceph cluster. The shared FS (/cephfs) is mounted on each client with NFS or SMB, and FIO with direct I/O is used on each client to read and write data to the share.]
23. • High Availability NAS is possible with cephfs
• Some issues remain
• Deep snapshots
• Quotas/limits on subdirs
• Performance is “OK”
• Standard NAS protocol limitations (NFS, SMB)
Summary
24. Thank You
| 21740 Beaumeade Circle | Suite 150 | Ashburn, VA 20147 | P [571] 333 2725 | F [703] 738 7231 | solutions@keepertech.com | www.keepertech.com