GlusterFS for SysAdmins
Justin Clift, GlusterFS Integration
Open Source and Standards team @ Red Hat
2013-04
#whoami
● Experienced SysAdmin (Solaris and
Linux) for many, many years
● Mostly worked on Mission Critical
systems in corporate/enterprise
environments (e.g. telco, banking, insurance)
● Has been helping build Open Source
Communities after hours for many
years (PostgreSQL, OpenOffice)
● Dislikes networks being a bottleneck
(likes Infiniband. A lot :>)
● Joined Red Hat mid 2010
jclift@redhat.com
Agenda
● Technology Overview
● Scaling Up and Out
● A Peek at GlusterFS Logic
● Redundancy and Fault Tolerance
● Data Access
● General Administration
● Use Cases
● Common Pitfalls
GlusterFS for SysAdmins
Technology
Overview
What is GlusterFS?
● POSIX-Like Distributed File System
● No Metadata Server
● Network Attached Storage (NAS)
● Heterogeneous Commodity Hardware
● Aggregated Storage and Memory
● Standards-Based – Clients, Applications, Networks
● Flexible and Agile Scaling
● Capacity – Petabytes and beyond
● Performance – Thousands of Clients
● Single Global Namespace
GlusterFS vs. Traditional Solutions
● A basic NAS has limited scalability and redundancy
● Other distributed filesystems limited by metadata
● SAN is costly & complicated but high performance &
scalable
● GlusterFS
● Linear Scaling
● Minimal Overhead
● High Redundancy
● Simple and Inexpensive Deployment
GlusterFS for SysAdmins
Technology
Stack
Terminology
● Brick
● A filesystem mountpoint
● A unit of storage used as a GlusterFS building block
● Translator
● Logic between the bits and the Global Namespace
● Layered to provide GlusterFS functionality
● Volume
● Bricks combined and passed through translators
● Node / Peer
● Server running the gluster daemon and sharing
volumes
Foundation Components
● Private Cloud (Datacenter)
● Common Commodity x86_64 Servers
● Public Cloud
● Amazon Web Services (AWS)
● EC2 + EBS
Disk, LVM, and Filesystems
● Direct-Attached Storage (DAS)
-or-
● Just a Bunch Of Disks (JBOD)
● Hardware RAID
● Logical Volume Management (LVM)
● XFS, EXT3/4, BTRFS
● Extended attribute (xattr) support required
● XFS is strongly recommended for GlusterFS
Gluster Components
● glusterd
● Elastic volume management daemon
● Runs on all export servers
● Interfaced through gluster CLI
● glusterfsd
● GlusterFS brick daemon
● One process for each brick
● Managed by glusterd
Gluster Components
● glusterfs
● NFS server daemon
● FUSE client daemon
● mount.glusterfs
● FUSE native mount tool
● gluster
● Gluster Console Manager (CLI)
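The gluster CLI can be run as one-off shell commands or interactively (similar to virsh or ntpq). A quick sketch, assuming a volume named my-dist-vol already exists:

# gluster volume info my-dist-vol
Or interactively:
# gluster
gluster> volume info my-dist-vol
gluster> quit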
Data Access Overview
● GlusterFS Native Client
● Filesystem in Userspace (FUSE)
● NFS
● Built-in Service
● SMB/CIFS
● Samba server required
● Unified File and Object (UFO)
● Simultaneous object-based access
Putting it All Together
GlusterFS for SysAdmins
Scaling
Scaling Up
● Add disks and filesystems to a node
● Expand a GlusterFS volume by adding bricks
Scaling Out
● Add GlusterFS nodes to trusted pool
● Add filesystems as new bricks
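A minimal scale-out sketch, using hypothetical host and volume names (the individual commands are covered in the General Administration section):

gluster> peer probe server5
gluster> volume add-brick my-dist-vol server5:/brick5
gluster> volume rebalance my-dist-vol start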
GlusterFS for SysAdmins
Under
the Hood
Elastic Hash Algorithm
● No central metadata
● No Performance Bottleneck
● Eliminates risk scenarios
● Location hashed intelligently on path and filename
● Unique identifiers, similar to md5sum
● The “Elastic” Part
● Files assigned to virtual volumes
● Virtual volumes assigned to multiple bricks
● Volumes easily reassigned on the fly
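For the curious, each directory on a brick carries its assigned hash range in the trusted.glusterfs.dht extended attribute; a read-only peek from a server, assuming a brick mounted at /brick2 with a directory somedir created through the volume:

# getfattr -n trusted.glusterfs.dht -e hex /brick2/somedir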
Translators
GlusterFS for SysAdmins
Distribution
and Replication
Distributed Volume
● Files “evenly” spread across bricks
● File-level RAID 0
● Server/Disk failure could be catastrophic
Replicated Volume
● Copies files to multiple bricks
● File-level RAID 1
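For example, a simple two-brick mirror (hypothetical volume, server, and brick names):

gluster> volume create my-rep-vol replica 2 server1:/brick1 server2:/brick1
gluster> volume start my-rep-vol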
Distributed Replicated Volume
● Distributes files across replicated bricks
● RAID 1 plus improved read performance
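A sketch with four bricks (hypothetical names). Replica sets are formed in the order the bricks are listed, so alternate servers to keep the copies on different nodes:

gluster> volume create my-distrep-vol replica 2 server1:/brick1 server2:/brick1 server3:/brick2 server4:/brick2
gluster> volume start my-distrep-vol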
Geo Replication
● Asynchronous across LAN, WAN, or Internet
● Master-Slave model -- Cascading possible
● Continuous and incremental
● Data is passed between defined master and slave only
Replicated Volumes vs Geo-replication
Replicated Volumes
● Mirrors data across clusters
● Provides high-availability
● Synchronous replication (each and every file operation is sent across all the bricks)

Geo-replication
● Mirrors data across geographically distributed clusters
● Ensures backing up of data for disaster recovery
● Asynchronous replication (checks for changes in files periodically and syncs them on detecting differences)
GlusterFS for SysAdmins
Layered
Functionality
Striped Volumes
● Individual files split among bricks
● Similar to RAID 0
● Limited Use Cases – HPC Pre/Post Processing
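For example, a two-brick stripe (hypothetical names):

gluster> volume create my-stripe-vol stripe 2 server1:/brick1 server2:/brick2
gluster> volume start my-stripe-vol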
Distributed Striped Volume
● Files striped across two or more nodes
● Striping plus scalability
Striped Replicated Volume
● GlusterFS 3.3+
● Similar to RAID 10 (1+0)
Distributed Striped Replicated Volume
● GlusterFS 3.3+
● Limited Use Cases – Map Reduce
GlusterFS for SysAdmins
Data Access
GlusterFS Native Client (FUSE)
● FUSE kernel module allows the filesystem to be built
and operated entirely in userspace
● Specify mount to any GlusterFS node
● Native Client fetches volfile from mount server, then
communicates directly with all nodes to access data
● Recommended for high concurrency and high write
performance
● Load is inherently balanced across distributed volumes
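A typical native mount, assuming a volume named my-dist-vol and any reachable node server1; the fstab entry is one common way to make it persistent:

# mount -t glusterfs server1:/my-dist-vol /mnt/gluster
# echo 'server1:/my-dist-vol /mnt/gluster glusterfs defaults,_netdev 0 0' >> /etc/fstab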
NFS
● Standard NFS v3 clients
● Note: Mount with vers=3 option
● Standard automounter is supported
● Mount to any node, or use a load balancer
● GlusterFS NFS server includes Network Lock Manager
(NLM) to synchronize locks across clients
● Better performance for reading many small files from a
single client
● Load balancing must be managed externally
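A sketch of an NFS mount against the built-in server, assuming the same my-dist-vol volume. The Gluster NFS service speaks TCP only, so forcing the mount protocol to TCP is commonly needed as well:

# mount -t nfs -o vers=3,mountproto=tcp server1:/my-dist-vol /mnt/nfs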
SMB/CIFS
● GlusterFS volume is first mounted with the Native
Client
● Locally on a GlusterFS peer (the peer re-mounts its own volume)
-or-
● On an external server
● Native mount point is then shared via Samba
● Must be set up on each node you wish to connect to via CIFS
● Load balancing must be managed externally
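A minimal sketch, assuming the volume is mounted natively on the Samba host and shared with stock Samba options (share and path names are hypothetical):

# mount -t glusterfs localhost:/my-dist-vol /mnt/gluster/my-dist-vol

Then in /etc/samba/smb.conf:
[my-dist-vol]
    path = /mnt/gluster/my-dist-vol
    read only = no

# service smb restart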
GlusterFS for SysAdmins
General
Administration
Preparing a Brick
# lvcreate -L 100G -n lv_brick1 vg_server1
# mkfs -t xfs -i size=512 /dev/vg_server1/lv_brick1
# mkdir /brick1
# mount /dev/vg_server1/lv_brick1 /brick1
# echo '/dev/vg_server1/lv_brick1 /brick1 xfs defaults 1 2' >> /etc/fstab
Adding Nodes (peers) and Volumes
Peer Probe
gluster> peer probe server3
gluster> peer status
Number of Peers: 2

Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)

Distributed Volume
gluster> volume create my-dist-vol server2:/brick2 server3:/brick3
gluster> volume info my-dist-vol

Volume Name: my-dist-vol
Type: Distribute
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: server2:/brick2
Brick2: server3:/brick3

gluster> volume start my-dist-vol
Distributed Striped Replicated Volume
gluster> volume create test-volume replica 2 stripe 2 transport tcp \
  server1:/exp1 server1:/exp2 server2:/exp3 server2:/exp4 \
  server3:/exp5 server3:/exp6 server4:/exp7 server4:/exp8
Multiple bricks of a replicate volume are present on the same server. This setup is not
optimal.
Do you still want to continue creating the volume? (y/n) y
Creation of volume test-volume has been successful. Please start the volume to access
data.
[Diagram: test-volume distributes files across striped (stripe 2), replicated (replica 2) brick sets]
Distributed Striped Replicated Volume
gluster> volume info test-volume
Volume Name: test-volume
Type: Distributed-Striped-Replicate
Volume ID: 8f8b8b59-d1a1-42fe-ae05-abe2537d0e2d
Status: Created
Number of Bricks: 2 x 2 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: server1:/exp1
Brick2: server2:/exp3
Brick3: server1:/exp2
Brick4: server2:/exp4
Brick5: server3:/exp5
Brick6: server4:/exp7
Brick7: server3:/exp6
Brick8: server4:/exp8
gluster> volume create test-volume stripe 2 replica 2 transport tcp \
  server1:/exp1 server2:/exp3 server1:/exp2 server2:/exp4 \
  server3:/exp5 server4:/exp7 server3:/exp6 server4:/exp8
Creation of volume test-volume has been successful. Please start the volume to access
data.
Manipulating Bricks in a Volume
gluster> volume add-brick my-dist-vol server4:/brick4
gluster> volume remove-brick my-dist-vol server2:/brick2 start
gluster> volume remove-brick my-dist-vol server2:/brick2 status
Node Rebalanced-files size scanned failures status
--------- ----------- ----------- ----------- ----------- ------------
localhost 16 16777216 52 0 in progress
192.168.1.1 13 16723211 47 0 in progress
gluster> volume remove-brick my-dist-vol server2:/brick2 commit
gluster> volume rebalance my-dist-vol fix-layout start
gluster> volume rebalance my-dist-vol start
gluster> volume rebalance my-dist-vol status
Node Rebalanced-files size scanned failures status
--------- ----------- ----------- ----------- ----------- ------------
localhost 112 15674 170 0 completed
10.16.156.72 140 3423 321 2 completed
Migrating Data / Replacing Bricks
gluster> volume replace-brick my-dist-vol server3:/brick3 server5:/brick5 start
gluster> volume replace-brick my-dist-vol server3:/brick3 server5:/brick5 status
Current File = /usr/src/linux-headers-2.6.31-14/block/Makefile
Number of files migrated = 10567
Migration complete
gluster> volume replace-brick my-dist-vol server3:/brick3 server5:/brick5 commit
Volume Options
Auth
gluster> volume set my-dist-vol auth.allow 192.168.1.*
gluster> volume set my-dist-vol auth.reject 10.*

NFS
gluster> volume set my-dist-vol nfs.volume-access read-only
gluster> volume set my-dist-vol nfs.disable on

Other advanced options
gluster> volume set my-dist-vol features.read-only on
gluster> volume set my-dist-vol performance.cache-size 67108864
Volume Top Command
gluster> volume top my-dist-vol read brick server3:/brick3 list-cnt 3
Brick: server:/export/dir1
==========Read file stats========
read call count    filename
      116          /clients/client0/~dmtmp/SEED/LARGE.FIL
       64          /clients/client0/~dmtmp/SEED/MEDIUM.FIL
       54          /clients/client2/~dmtmp/SEED/LARGE.FIL
● Many top commands are available for analysis of
files, directories, and bricks
● Read and write performance test commands available
● Perform active dd tests and measure throughput
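For example, a write-performance probe against a single brick (hypothetical names; block size and count are arbitrary):

gluster> volume top my-dist-vol write-perf bs 1024 count 1000 brick server3:/brick3 list-cnt 3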
Volume Profiling
gluster> volume profile my-dist-vol start
gluster> volume profile my-dist-vol info
Brick: Test:/export/2
Cumulative Stats:
Block Size:          1b+       32b+       64b+
      Read:            0          0          0
     Write:          908         28          8
...

%-latency    Avg-latency    Min-Latency    Max-Latency    calls    Fop
_______________________________________________________________________
     4.82        1132.28          21.00      800970.00     4575    WRITE
     5.70         156.47           9.00      665085.00    39163    READDIRP
    11.35         315.02           9.00     1433947.00    38698    LOOKUP
    11.88        1729.34          21.00     2569638.00     7382    FXATTROP
    47.35      104235.02        2485.00     7789367.00      488    FSYNC

------------------
Duration     : 335
BytesRead    : 94505058
BytesWritten : 195571980
Geo-Replication
Setup SSH Keys
# ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem
# ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem repluser@slavehost1

Replicate Via SSH to Remote GlusterFS Volume
gluster> volume geo-replication my-dist-vol repluser@slavehost1::my-dist-repl start
Starting geo-replication session between my-dist-vol & slavehost1:my-dist-repl has been successful

gluster> volume geo-replication my-dist-vol status
MASTER          SLAVE                                        STATUS
--------------------------------------------------------------------------------
my-dist-vol     ssh://repluser@slavehost1::my-dist-repl      OK

Output of volume info Now Reflects Replication
gluster> volume info my-dist-vol
...
Options Reconfigured:
geo-replication.indexing: on
GlusterFS for SysAdmins
Use Cases
Common Solutions
● Media / Content Distribution Network (CDN)
● Backup / Archive / Disaster Recovery (DR)
● Large Scale File Server
● Home directories
● High Performance Computing (HPC)
● Infrastructure as a Service (IaaS) storage layer
Hadoop – Map Reduce
● Access data within and outside of Hadoop
● No HDFS name node single point of failure / bottleneck
● Seamless replacement for HDFS
● Scales with the massive growth of big data
CIC Electronic Signature Solutions
● Challenge
● Must leverage economics of the cloud
● Storage performance in the cloud too slow
● Need to meet demanding client SLAs
● Solution
● Red Hat Storage Software Appliance
● Amazon EC2 and Elastic Block Storage (EBS)
● Benefits
● Faster development and delivery of new products
● SLAs met with headroom to spare
● Accelerated cloud migration
● Scale-out for rapid and simple expansion
● Data is highly available for 24/7 client access
Hybrid Cloud: Electronic Signature Solutions
● Reduced time-to-market for new products
● Meeting all client SLAs
● Accelerating move to the cloud
Pandora Internet Radio
● Challenge
● Explosive user & title growth
● As many as 12 file formats for each song
● ‘Hot’ content and long tail
● Solution
● Three data centers, each with a six-node GlusterFS
cluster
● Replication for high availability
● 250+ TB total capacity
● Benefits
● Easily scale capacity
● Centralized management; one administrator to manage
day-to-day operations
● No changes to application
● Higher reliability
Private Cloud: Media Serving
● 1.2 PB of audio served per week
● 13 million files
● Over 50 GB/sec peak traffic
Brightcove
• Over 1 PB currently in Gluster
• Separate 4 PB project in the works
Private Cloud: Media Serving
● Challenge
● Explosive customer & title growth
● Massive video in multiple locations
● Costs rising, esp. with HD formats
● Solution
● Complete scale-out based on commodity DAS/JBOD
and GlusterFS
● Replication for high availability
● 1PB total capacity
● Benefits
● Easily scale capacity
● Centralized management; one administrator to manage
day-to-day operations
● Higher reliability
● Path to multi-site
Pattern Energy
• Rapid and advance weather predictions
• Maximizing energy assets
• Cost savings and avoidance
High Performance Computing for Weather Prediction
● Challenge
● Need to deliver rapid advance weather predictions
● Identify wind and solar abundance in advance
● More effectively perform preventative maintenance and repair
● Solution
● 32 HP compute nodes
● Red Hat SSA for high throughput and availability
● 20TB+ total capacity
● Benefits
● Predicts solar and wind patterns 3 to 5 days in advance
● Maximize energy production and repair times
● Avoid costs of outsourcing weather predictions
● Solution has paid for itself many times over
GlusterFS for SysAdmins
Common
Pitfalls
Split-Brain Syndrome
● Communication lost between replicated peers
● Clients write separately to multiple copies of a file
● No automatic fix
● May be subjective which copy is right – ALL may be!
● Admin determines the “bad” copy and removes it
● Self-heal will correct the volume
● Trigger a recursive stat to initiate
● Proactive self-healing in GlusterFS 3.3
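A sketch of the healing workflow on 3.3+, using the earlier example volume; on older releases, a recursive stat from a client mount is the usual trigger:

gluster> volume heal my-dist-vol info split-brain
gluster> volume heal my-dist-vol

# find /mnt/gluster -noleaf -print0 | xargs --null stat > /dev/null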
Quorum Enforcement
● Disallows writes (EROFS) on non-quorum peers
● Significantly reduces files affected by split-brain
● Preferred when data integrity is the priority
● Not preferred when application integrity is the priority
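Quorum is enabled per volume; a minimal sketch using the automatic policy (writes require a majority of each replica set to be reachable):

gluster> volume set my-dist-vol cluster.quorum-type auto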
Your Storage Servers are Sacred!
● Don't touch the brick filesystems directly!
● They're Linux servers, but treat them like appliances
● Separate security protocols
● Separate access standards
● Don't let your Jr. Linux admins in!
● A well-meaning sysadmin can quickly break your
system or destroy your data
GlusterFS for SysAdmins
Do it!
Do it!
● Build a test environment in VMs in just minutes!
● Get the bits:
● www.gluster.org has packages available for many
Linux distributions (CentOS, Fedora, RHEL,
Ubuntu)
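A rough quickstart on two test VMs (package and service names assume an RPM-based distro of that era; adjust for your platform):

# yum install glusterfs-server          (on both VMs)
# service glusterd start                (on both VMs)
# gluster peer probe vm2                (from vm1)
# mkdir -p /data/brick                  (on both VMs)
# gluster volume create testvol replica 2 vm1:/data/brick vm2:/data/brick
# gluster volume start testvol
# mount -t glusterfs vm1:/testvol /mnt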
Thank You!
● GlusterFS:
www.gluster.org
● Justin Clift - jclift@redhat.com
GlusterFS for SysAdmins
Slides Available at:
http://www.gluster.org/community/documentation/index.php/Presentations


Editor's Notes

  1. -Dustin Black -Red Hat Certified Architect -Senior Technical Account Manager with Red Hat Global Support Services -More than 10 years of systems and infrastructure experience, with focus on Linux, UNIX, and networking -I am not a coder. I'll hack a script together, or read code from others reasonably well, but I would never presume to be a developer. -I believe strongly in the power of openness in most all aspects of life and business. Openness only improves the interests we all share.
  2. I hope to make the GlusterFS concepts more tangible. I want you to walk away with the confidence to start working with GlusterFS today.
  3. -Commodity hardware: aggregated as building blocks for a clustered storage resource. -Standards-based: No need to re-architect systems or applications, and no long-term lock-in to proprietary systems or protocols. -Simple and inexpensive scalability. -Scaling is non-interruptive to client access. -Aggregated resources into unified storage volume abstracted from the hardware.
  4. -Bricks are “stacked” to increase capacity -Translators are “stacked” to increase functionality
  5. -So if you're going to deploy GlusterFS, where do you start? -Remember, for a datacenter deployment, we're talking about commodity server hardware as the foundation. -In the public cloud space, your option right now is Amazon Web Services, with EC2 and EBS. -RHS is the only supported high-availability option for EBS.
  6. -XFS is the only filesystem supported with RHS. -Extended attribute support is necessary because the file hash is stored there
  7. -gluster console commands can be run directly, or in interactive mode. Similar to virsh, ntpq
  8. -The native client uses fuse to build complete filesystem functionality without gluster itself having to operate in kernel space. This offers benefits in system stability and time-to-end-user for code updates.
  9. No metadata == No Performance Bottleneck or single point of failure (compared to single metadata node) or corruption issues (compared to distributed metadata). Hash calculation is faster than metadata retrieval Elastic hash is the core of how gluster scales linerally
  10. Modular building blocks for functionality, like bricks are for storage
  11. When you configure geo-replication, you do so on a specific master (or source) host, replicating to a specific slave (or destination) host. Geo-replication does not scale linearly since all data is passed through the master and slave nodes specifically. Because geo-replication is configured per-volume, you can gain scalability by choosing different master and slave geo-rep nodes for each volume.
  12. Limited Read/Write performance increase, but in some cases the overhead can actually cause a performance degredation
  13. -Graphic is wrong -- -Replica 0 is exp1 and exp3 -Replica 1 is exp2 and exp4
  14. This graphic from the RHS 2.0 beta documentation actually represents a non-optimal configuration. We&amp;apos;ll discuss this in more detail later.
  15. -Native client will be the best choice when you have many nodes concurrently accessing the same data -Client access to data is naturally load-balanced because the client is aware of the volume structure and the hash algorithm.
  16. ...mount with nfsvers=3 in modern distros that default to nfs 4 The need for this seems to be a bug, and I understand it is in the process of being fixed. NFS will be the best choice when most of the data access by one client and for small files. This is mostly due to the benefits of native NFS caching. Load balancing will need to be accomplished by external mechanisms
  17. -Use the GlusterFS native client first to mount the volume on the Samba server, and then share that mount point with Samba via normal methods. -GlusterFS nodes can act as Samba servers (packages are included), or it can be an external service. -Load balancing will need to be accomplished by external mechanisms
  18. An inode size smaller than 512 leaves no room for extended attributes (xattr). This means that every active inode will require a separate block for these. This has both a performance hit as well as a disk space usage penalty.
  19. -peer status command shows all other peer nodes – excludes the local node -I understand this to be a bug that's in the process of being fixed
  20. -Support for striped replicated volumes is added in RHS 2.0 -I'm using this example because it's straight out of the documentation, but I want to point out that this is not an optimal configuration. -With this configuration, the replication happens between bricks on the same node. -We should alter the order of the bricks here so that the replication is between nodes.
  21. -Brick order is corrected to ensure replication is between bricks on different nodes. -Replica is always processed first in building the volume, regardless of whether it's before or after stripe on the command line. -So, a 'replica 2' will create the replica between matching pairs of bricks in the order that the bricks are passed to the command. 'replica 3' will be matching trios of bricks, and so on.
  22. Must add and remove in multiples of the replica and stripe counts
  23. ??where is this stored, and how does it impact performance when on??
  24. Specifying the double-colin between the remote host name and destination tells geo-replication that the destination is a glusterfs volume name. With a single colon, it will treat the destination as an absolute filesystem path. Preceding the remote host with user@ causes geo-replication to interpret the communication protocol as ssh, which is generally preferred. If you use the simple remote syntax of: hostname:volume It will cause the local system to mount the remote volume locally with the native client, which will result in a necessary performance degradation because of the added fuse overhead. Over a WAN, this can be significant.
  25. Cloud-based online video platform
  26. Cloud-based online video platform
  27. Patch written by Jeff Darcy
  28. Question notes: -Vs. CEPH -CEPH is object-based at its core, with distributed filesystem as a layered function. GlusterFS is file-based at its core, with object methods (UFO) as a layered function. -CEPH stores underlying data in files, but outside the CEPH constructs they are meaningless. Except for striping, GlusterFS files maintain complete integrity at the brick level. -With CEPH, you define storage resources and data architecture (replication) separate, and CEPH actively and dynamically manages the mapping of the architecture to the storage. With GlusterFS, you manually manage both the storage resources and the data architecture.