Slides from the S8 File Systems Tutorial at the USENIX LISA'13 conference in Washington, DC. The tutorial covers ext4, btrfs, and ZFS, with an emphasis on Linux implementations.
4. File Systems (ext4, ZFS, btrfs)
(The labels after each slide title show which file systems that slide discusses.)
• Today's discussion: emphasis on Linux
  • ext4, with a few comments on ext3
  • btrfs
  • ZFS
• Not in scope (maybe next year?)
  • ReFS
  • HFS+
5. ext4 Highlights (ext4)
• ext3 was limited
  • 16 TB filesystem size (32-bit block numbers)
  • 32k limit on subdirectories
  • Performance limitations
• ext4 is the natural successor
  • Easy migration from ext3
  • Replaces indirect blocks with extents
  • > 16 TB filesystem size
  • Preallocation
  • Journal checksums
• Now the default on many Linux distros
6. ZFS Highlights (ZFS)
• Figure out why storage has become so complicated
• Blow away 20+ years of obsolete assumptions
• Sun had to replace UFS
• Opportunity to design an integrated system from scratch
• Widely ported: Linux, FreeBSD, OS X
• Built-in RAID
• Checksums
• Large scale (256 ZB)
7. btrfs (btrfs)
• New copy-on-write file system
• Pooled storage model
• Snapshots
• Checksums
• Large scale (16 EB)
• Built-in RAID
• Clever in-place migration from ext3
8. Pooled Storage Model (ZFS, btrfs)
• Old school
  • 1 disk means
    • 1 file system
    • 1 directory structure (directory tree)
  • File systems didn't change when virtual disks (e.g. RAID) arrived
    • OK, so we could partition them... ugly solution
• New school
  • Combine storage devices into a pool
  • Allow many file systems per pool
9. Sysadmin's View of Pools (ZFS, btrfs)
[Diagram: a pool holds configuration information plus multiple datasets; a dataset can be a file system or a volume.]
10. Blocks and Extents (ext4, ZFS, btrfs)
• Early file systems were block-based
  • ext3, UFS, FAT
  • Data blocks are fixed sizes
  • Difficult to scale due to indirection levels and allocation algorithms
• Extents solve many indirection issues
  • An extent is a contiguous area of storage reserved for a file
  • Data blocks are variable sizes
  • ext4, btrfs, ZFS, XFS, NTFS, VxFS
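Not on the slide, but a quick way to see extents in practice: filefrag from e2fsprogs prints a file's extent map on ext4. A minimal sketch, assuming the /mnt.ext4 mount point used later in this tutorial:

  fallocate -l 64M /mnt.ext4/bigfile   # preallocate a 64 MB file (path illustrative)
  filefrag -v /mnt.ext4/bigfile        # print the extent map; a few large extents means low fragmentation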
12. Scalability (ext4, ZFS, btrfs)
Problem: what happens when we need more metadata?
• Block-based: go with indirect blocks
  • Really just pointers to pointers
  • Gets ugly at triple-indirection
  • Function of data size and block size
• Extent-based: grow trees
  • B-trees are popular
    • ext4, for more than 3 levels
    • btrfs
  • ZFS uses a Merkle tree
14. Treed Metadata (ext4, ZFS, btrfs)
[Diagram: a root node pointing to four data blocks.]
• Trees can be large, yet efficiently searched and modified
• Enables copy-on-write (COW)
• Lots of good computer science here!
15. Trees Allow Copy-on-Write (ZFS, btrfs)
[Diagram, four stages:]
1. Initial block tree
2. COW some data
3. COW metadata
4. Update uberblocks & free
16. fsck (ext4, ZFS, btrfs)
Problem: how do we know the metadata is correct?
• Keep redundant copies
• But what if the copies don't agree?
1. File system check reconciles metadata inconsistencies (command sketch below)
  • fsck (ext[234], btrfs, UFS), chkdsk (FAT), etc
  • Repairs problems that are known to occur (!)
  • Does not repair data (!)
2. Build a transactional system with atomic updates
  • Databases (MySQL, Oracle, etc)
  • ZFS
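A hedged sketch of invoking these checkers (device names are illustrative; run the first two on unmounted file systems):

  e2fsck -f /dev/sdf        # ext2/3/4 checker, forced
  btrfs check /dev/sdb      # btrfs offline checker
  zpool scrub zwimming      # ZFS has no fsck; a scrub validates all checksums online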
22. RAID Basics
• Disks fail. Sometimes they lose data. Sometimes they completely die. Get over it.
• RAID = Redundant Array of Inexpensive Disks
• RAID = Redundant Array of Independent Disks
• Key word: Redundant
• Redundancy is good. More redundancy is better.
• Everything else fails, too. You're over it by now, right?
23. RAID-0 or Striping (ZFS, btrfs)
• RAID-0
  • SNIA definition: fixed-length sequences of virtual disk data addresses are mapped to sequences of member disk addresses in a regular rotating pattern
  • Good for space and performance
  • Bad for dependability
• ZFS dynamic stripe
  • Data is dynamically mapped to member disks
  • No fixed-length sequences
  • Allocates up to ~1 MByte/vdev before changing vdev
  • Good combination of the concatenation feature with RAID-0 performance
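A minimal creation sketch for both striped layouts (not on the slide; device and pool names illustrative):

  mkfs.btrfs -d raid0 /dev/sdb /dev/sdc   # btrfs RAID-0 for data
  zpool create ztank /dev/sdd /dev/sde    # ZFS: multiple top-level vdevs form a dynamic stripe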
24. RAID-0 Example (ZFS, btrfs)
[Diagram comparing layouts:]
• RAID-0: column size = 128 kBytes, stripe width = 384 kBytes
• ZFS dynamic stripe: recordsize = 128 kBytes, total write size = 2816 kBytes
25. RAID-1 or Mirroring (ZFS, btrfs)
• Straightforward: put N copies of the data on N disks
• Good for read performance and dependability
• Bad for space
• Arbitration: btrfs and ZFS do not blindly trust either side of a mirror
  • Most recent, correct view of data wins
  • Checksums validate data
26. Traditional Mirrors
[Diagram: a traditional mirror cannot tell which side is correct when the file system does a bad read. If it's a metadata block, the FS panics or a disk rebuild runs; otherwise we get back bad data.]
27. Checksums for Mirrors (ZFS, btrfs)
• What if a disk is (mostly) OK, but the data became corrupted?
• btrfs and ZFS improve dependability by checksumming data and storing the checksums in metadata
28. RAID-5 and RAIDZ (ZFS, btrfs)
• N+1 redundancy
• Good for space and dependability
• Bad for performance
• RAID-5 (btrfs)
  • Parity check data is distributed across the RAID array's disks
  • Must read/modify/write when data is smaller than the stripe width
• RAIDZ (ZFS)
  • Dynamic data placement
  • Parity added as needed
  • Writes are full-stripe writes
  • No read/modify/write (avoids the RAID-5 write hole)
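A hedged creation sketch (not on the slide; names illustrative):

  zpool create ztank raidz /dev/sdb /dev/sdc /dev/sdd   # ZFS single-parity RAIDZ
  mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd        # btrfs RAID-5 for data (still experimental in this era)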
30. RAID-6, RAIDZ2, RAIDZ3 (ZFS, btrfs)
• Adding more parity
  • Parity 1: XOR
  • Parity 2: another Reed-Solomon syndrome
  • Parity 3: yet another Reed-Solomon syndrome
• Double parity: N+2
  • RAID-6 (btrfs)
  • RAIDZ2 (ZFS)
• Triple parity: N+3
  • RAIDZ3 (ZFS)
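And the corresponding double- and triple-parity sketches (names illustrative again):

  zpool create ztank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde            # ZFS, N+2
  zpool create ztank raidz3 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf   # ZFS, N+3
  mkfs.btrfs -d raid6 /dev/sdb /dev/sdc /dev/sdd /dev/sde                  # btrfs, N+2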
31. Dependability vs Space (ZFS, btrfs)
[Chart: MTTDL vs usable space for various RAID configurations.]
• Dependability model metric: MTTDL = mean time to data loss (bigger is better)
• For this analysis, RAIDZ1/2 and RAID-5/6 are equivalent
32. We now return you to your regularly scheduled program
33. Create a Simple Pool (ZFS, btrfs)
1. Determine the name of an unused disk (sketch below)
  • /dev/sd* or /dev/hd*
  • /dev/disk/by-id
  • /dev/disk/by-path
  • /dev/disk/by-vdev (ZFS)
2. Create a simple pool
  • btrfs
    mkfs.btrfs -m single /dev/sdb
  • ZFS
    zpool create zwimming /dev/sdd
    Note: might need the "-f" flag to create an EFI label
3. Woohoo!
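One way to find an unused disk in step 1 (not on the slide): list the block devices and look for ones with no partitions or mountpoints.

  lsblk -o NAME,SIZE,TYPE,MOUNTPOINT   # devices with no children and no mountpoint are candidates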
34. Verify Pool Status (ZFS, btrfs)
• btrfs
  btrfs filesystem show
• ZFS
  zpool status
35. Destroy Pool (ZFS, btrfs)
• btrfs
  • Unmount all btrfs file systems (sketch below)
• ZFS
  zpool destroy zwimming
  • Unmounts file systems and volumes
  • Exports pool
  • Marks pool as destroyed
  • Walk away...
  • Until overwritten, data is still OK and can be imported again
• To see destroyed ZFS pools
  zpool import -D
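btrfs has no destroy command; a hedged sketch of the equivalent steps (device name illustrative):

  umount /mnt.btrfs
  wipefs -a /dev/sdb   # optional: clear the filesystem signature so tools treat the disk as unused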
36. Create Mirrored Pool (ZFS, btrfs)
1. Determine the names of two unused disks
2. Create a mirrored pool
  • btrfs
    mkfs.btrfs -d raid1 /dev/sdb /dev/sdc
    • -d specifies redundancy for data; metadata is redundant by default
  • ZFS
    zpool create zwimming mirror /dev/sdd /dev/sde
3. Woohoo!
4. Verify
38. Create & Mount File System (ext4, ZFS, btrfs)
• Make some mount points for this example
  mkdir /mnt.ext4
  mkdir /mnt.btrfs
• ext4
  mkfs.ext4 /dev/sdf
  mount /dev/sdf /mnt.ext4
• btrfs
  mount /dev/sdb /mnt.btrfs
• ZFS
  • zpool create already made a file system and mounted it at /zwimming
• Verify...
40. Verify Mounted File Systems (ext4, ZFS, btrfs)
• df is a handy tool to verify mounted file systems
  root@ubuntu:~# df -h
  Filesystem  Size  Used  Avail  Use%  Mounted on
  ...
  /dev/sdf    976M  1.3M  924M   1%    /mnt.ext4
  zwimming    976M  0     976M   0%    /zwimming
  /dev/sdb    1.0G  56K   894M   1%    /mnt.btrfs
• WAT?
• Pool space accounting isn't like traditional filesystem space accounting
• NB: the raw disk has 1,073,741,824 bytes
41. Again! (ext4, ZFS, btrfs)
• Try again with our mirrored pool examples
  root@ubuntu:~# df -h
  Filesystem  Size  Used  Avail  Use%  Mounted on
  ...
  /dev/sdf    976M  1.3M  924M   1%    /mnt.ext4
  zwimming    976M  0     976M   0%    /zwimming
  /dev/sdc    2.0G  56K   1.8G   1%    /mnt.btrfs
• WAT, WAT, WAT?
• The accounting is correct; your understanding of the accounting might need a little bit of help
• Adding RAID-5, compression, copies, and deduplication makes accounting very confusing
42. Accounting Sanity (ZFS, btrfs)
• A full explanation of the accounting for pools is an opportunity for aspiring writers!
• A more pragmatic view:
  • The accounting is correct
  • You can tell how much space is unallocated (free), but you can't tell how much data you can put into it, until you do so
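For ZFS, a per-dataset breakdown helps make the numbers add up; a sketch using the pool from the earlier slides:

  zfs list -o space zwimming   # shows avail, used, and the usedby* components per dataset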
44. One Pool, Many File Systems (ZFS, btrfs)
[Diagram: one pool holding configuration information plus several datasets: file systems and a volume.]
• Good idea: create new file systems when you want a new policy
  • readonly, quota, snapshots/clones, etc
• They act like directories, but slightly heavier
45. Create New File Systems (ZFS, btrfs)
• Context: new file system in an existing pool
• btrfs
  btrfs subvolume create /mnt.btrfs/sv1
• ZFS
  zfs create zwimming/fs1
• Verify
  root@ubuntu:~# df -h
  Filesystem    Size  Used  Avail  Use%  Mounted on
  ...
  /dev/sdf      976M  1.3M  924M   1%    /mnt.ext4
  zwimming      976M  128K  976M   1%    /zwimming
  /dev/sdb      1.0G  64K   894M   1%    /mnt.btrfs
  zwimming/fs1  976M  128K  976M   1%    /zwimming/fs1
  root@ubuntu:~# ls -l /mnt.btrfs
  total 0
  drwxr-xr-x 1 root root 0 Nov 2 20:30 sv1
  root@ubuntu:~# ls -l /zwimming
  total 2
  drwxr-xr-x 2 root root 2 Nov 2 20:29 fs1
  root@ubuntu:~# btrfs subvolume list /mnt.btrfs
  ID 256 top level 5 path sv1
46. Nesting (ZFS, btrfs)
• It is tempting to create deep, nested file system structures
• But nesting increases management complexity
• Good idea: use a shallow file system hierarchy
48. Traditional Tools (ext4, ZFS, btrfs)
• For file systems, the traditional tools work as you expect
  • cp, scp, tar, rsync, zip, ...
• For ZFS volumes, dd (sketch below)
• But those are boring; let's talk about snapshots and replication
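A hedged dd sketch for a ZFS volume, assuming a hypothetical volume zwimming/vol1 (ZFSonLinux exposes volumes under /dev/zvol):

  dd if=/dev/zvol/zwimming/vol1 of=/backup/vol1.img bs=1M   # raw copy of the volume (paths illustrative)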
49. Snapshots (ZFS, btrfs)
[Diagram: snapshot tree root and current tree root sharing unchanged blocks.]
• Create a snapshot by not freeing COWed blocks
• Snapshot creation is fast and easy
• Number of snapshots determined by use; no hardwired limit
• Recursive snapshots also possible in ZFS (sketch below)
• Terminology: a btrfs "writable snapshot" is like a ZFS "clone"
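A minimal recursive-snapshot sketch (snapshot name illustrative):

  zfs snapshot -r zwimming@everything   # -r snapshots zwimming and every descendant dataset atomically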
50. Create Read-only Snapshot (ZFS, btrfs)
• btrfs
  • btrfs version v0.20-rc1 or later
  • Read-only snapshots are needed for btrfs send
  btrfs subvolume snapshot -r /mnt.btrfs/sv1 /mnt.btrfs/sv1_ro
• ZFS
  zfs snapshot zwimming@snapme
51. Create Writable Snapshot (ZFS, btrfs)
• btrfs
  btrfs subvolume snapshot /mnt.btrfs/sv1 /mnt.btrfs/sv1_snap
• ZFS
  zfs snapshot zwimming@snapme
  zfs clone zwimming@snapme zwimming/cloneme

root@ubuntu:~# btrfs subvolume snapshot /mnt.btrfs/sv1 /mnt.btrfs/sv1_snap
Create a snapshot of '/mnt.btrfs/sv1' in '/mnt.btrfs/sv1_snap'
root@ubuntu:~# btrfs subvolume list /mnt.btrfs
ID 256 top level 5 path sv1
ID 257 top level 5 path sv1_snap
root@ubuntu:~# zfs snapshot zwimming@snapme
root@ubuntu:~# zfs list -t snapshot
NAME             USED  AVAIL  REFER  MOUNTPOINT
zwimming@snapme  0     -      31K    -
root@ubuntu:~# ls -l /zwimming/.zfs/snapshot
total 0
dr-xr-xr-x 1 root root 0 Nov 2 21:02 snapme
root@ubuntu:~# zfs clone zwimming@snapme zwimming/cloneme
root@ubuntu:~# df -h
Filesystem        Size  Used  Avail  Use%  Mounted on
...
zwimming          976M  0     976M   0%    /zwimming
zwimming/cloneme  976M  0     976M   0%    /zwimming/cloneme
52. btrfs Send and Receive (btrfs)
• New feature in v0.20-rc1
• Operates on read-only snapshots
  btrfs subvolume snapshot -r /mnt.btrfs/sv1 /mnt.btrfs/sv1_ro
• Note: sent data must be on disk; either wait or use the sync command
• Send the stream to stdout, receive from stdin
root# btrfs subvolume snapshot -r /mnt.btrfs/sv1 /mnt.btrfs/sv1_ro
root# sync
root# btrfs subvolume create /mnt.btrfs/backup
root# btrfs send /mnt.btrfs/sv1_ro | btrfs receive /mnt.btrfs/backup
At subvol /mnt.btrfs/sv1_ro
At subvol sv1_ro
root# btrfs subvolume list /mnt.btrfs
ID 256 gen 8 top level 5 path sv1
ID 257 gen 8 top level 5 path sv1_ro
ID 258 gen 13 top level 5 path backup
ID 259 gen 14 top level 5 path backup/sv1_ro
53. ZFS Send and Receive (ZFS)
• Works the same on file systems as on volumes (both are datasets)
• Send a snapshot as a stream to stdout
  • Whole: single snapshot
  • Incremental: difference between two snapshots
• Receive a snapshot into a dataset
  • Whole: creates a new dataset
  • Incremental: adds to an existing, common snapshot
• Each snapshot has a GUID and a creation-time property
  • Good idea: avoid putting the time in the snapshot name; use the properties for automation
• Example (an incremental sketch follows)
  zfs send zwimming@snap | zfs receive zbackup/zwimming
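A hedged incremental sketch, assuming a second snapshot name and that zbackup/zwimming already holds @snap:

  zfs snapshot zwimming@snap2
  zfs send -i zwimming@snap zwimming@snap2 | zfs receive zbackup/zwimming   # sends only the delta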
55. Forward Migration (ext4, ZFS, btrfs)
• But first... backup your data!
• And second... test your backup
• ext3 ➯ ext4
• ext3 or ext4 ➯ btrfs (command sketch below)
  • Cleverly treats the existing ext3 or ext4 data as a read-only snapshot
  • btrfs seed devices
    • Read-only file system as the basis of a new file system
    • All writes are COWed into the new file system
• ZFS is fundamentally different
  • Use traditional copies: cp, tar, rsync, etc
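A hedged sketch of the usual migration commands (device name illustrative; both operate on unmounted file systems, and back up first):

  tune2fs -O extents,uninit_bg,dir_index /dev/sdf   # enable ext4 features on an ext3 filesystem
  e2fsck -fD /dev/sdf                               # required fsck pass after changing features
  btrfs-convert /dev/sdf                            # in-place ext3/ext4 to btrfs conversion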
56. Reverting Migration (ext4, btrfs)
• Once you start to use ext4 features or add data to btrfs, the old ext3 filesystem doesn't see the new data
  • It appears to be unallocated space
  • Reverting loses the changes made after migration (rollback sketch below)
• But first... backup your data!
• And second... test your backup
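The btrfs rollback, hedged as before (possible only while the preserved ext image subvolume is untouched):

  btrfs-convert -r /dev/sdf   # roll back to the original ext3/ext4; post-conversion changes are lost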
58. ext4 Options (ext4)
• Extends the option set available in ext2 and ext3
• Creation options (sketch below)
  • uninit_bg creates the file system without initializing all of the block groups
    • speeds filesystem creation
    • can speed fsck
• Mount options of note
  • barriers enabled by default
  • max_batch_time for coalescing synchronous writes
    • Adjusts dynamically by observing commit time
    • Use with caution; know your workload
  • discard/nodiscard for enabling TRIM for SSDs
    • Is TRIM actually useful? The jury is still out...
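A hedged example of these options (device and values illustrative; max_batch_time is in microseconds, and 0 disables batching):

  mkfs.ext4 -O uninit_bg /dev/sdf
  mount -o discard,max_batch_time=0 /dev/sdf /mnt.ext4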
59. btrfs Options (btrfs)
• Mount options (sketch below)
  • degraded: useful when mounting redundant pools with broken or missing devices
  • compress: select zlib, lzo, or no compression
    • Note: by default, data is only written compressed if it actually compresses
  • discard: enables TRIM (see the ext4 option)
  • fatal_errors: choose the error-failure policy
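Hedged mount examples (device name illustrative):

  mount -o degraded /dev/sdb /mnt.btrfs       # bring up a redundant filesystem with a missing device
  mount -o compress=lzo /dev/sdb /mnt.btrfs   # enable lzo compression for new writes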
60. ZFS Properties (ZFS)
• Recall that ZFS doesn't use fstab or mkfs
• Properties are stored in metadata for the pool or dataset
• By default, properties are inherited
• Some properties are common to all datasets, but a specific dataset type may have additional properties
• Easily set or retrieved via scripts
• Can be set at creation time, or later (restrictions apply)
• In general, properties affect future file system activity
61. Managing ZFS Properties (ZFS)
• Pool properties
  zpool get all poolname
  zpool get propertyname poolname
  zpool set propertyname=value poolname
• Dataset properties
  zfs get all dataset
  zfs get propertyname [dataset]
  zfs set propertyname=value dataset
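A concrete sketch using the dataset from earlier slides (lz4 assumes the pool's feature@lz4_compress is enabled):

  zfs set compression=lz4 zwimming/fs1
  zfs get compression zwimming/fs1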
62. User-defined Properties (ZFS)
• Useful for adding metadata to datasets (sketch below)
  • On pools, limited to a description property
  • Recall each pool has a dataset of the same name
• Names
  • Must include a colon ':'
  • Can contain lowercase alphanumerics or '+' '.' '_'
  • Max length = 256 characters
  • By convention, module:property
    • com.sun:auto-snapshot
• Values
  • Max length = 1024 characters
• Examples
  • com.richardelling:important_files=true
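Setting and reading the slide's example property, sketched against the zwimming/fs1 dataset used earlier:

  zfs set com.richardelling:important_files=true zwimming/fs1
  zfs get com.richardelling:important_files zwimming/fs1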
63. ZFS Pool Properties (ZFS)
(Format: property (readonly/creation, where changes are restricted): brief description. The property tables on the following slides use the same layout.)
• altroot: alternate root directory (a la chroot)
• autoexpand: policy for expanding when vdev size changes
• autoreplace: vdev replacement policy
• available (readonly): available storage space
• bootfs: default bootable dataset for root pool
• cachefile: cache file to use other than /etc/zfs/zpool.cache
• capacity (readonly): percent of pool space used
• dedupditto: automatic copies for deduped data
• dedupratio (readonly): deduplication efficiency metric
• delegation: master pool delegation switch
• failmode: catastrophic pool failure policy
64. More ZFS Pool Properties (ZFS)
• feature@async_destroy: reduces the pain of dataset-destroy workloads
• feature@empty_bpobj: improves performance with lots of snapshots
• feature@lz4_compress: lz4 compression
• guid (readonly): unique identifier
• health (readonly): current health of the pool
• listsnapshots: zfs list policy
• size (readonly): total size of pool
• used (readonly): amount of space used
• version (readonly): current on-disk version
65. Common Dataset Properties (ZFS)
• available (readonly): space available to dataset & children
• checksum: checksum algorithm
• compression: compression algorithm
• compressratio (readonly): compression ratio, logical size vs referenced physical size
• copies: number of copies of user data
• creation (readonly): dataset creation time
• dedup: deduplication policy
• logbias: separate log write policy
• mlslabel: multilayer security label
• origin (readonly): for clones, the origin snapshot
66. More Dataset Properties (ZFS)
• primarycache: ARC caching policy
• readonly: is the dataset in readonly mode?
• referenced (readonly): size of data accessible by this dataset
• refreservation: minimum space guaranteed to a dataset, excluding descendants (snapshots & clones)
• reservation: minimum space guaranteed to a dataset, including descendants
• secondarycache: L2ARC caching policy
• sync: synchronous write policy
• type (readonly): type of dataset (filesystem, snapshot, volume)
67. Still More Dataset Properties (ZFS)
• used (readonly): sum of the usedby* properties (see below)
• usedbychildren (readonly): space used by descendants
• usedbydataset (readonly): space used by the dataset itself
• usedbyrefreservation (readonly): space used by a refreservation for this dataset
• usedbysnapshots (readonly): space used by all snapshots of this dataset
• zoned (readonly): is the dataset added to a non-global zone (Solaris)?
68. ZFS Volume Properties (ZFS)
• shareiscsi: iSCSI service (per-distro option)
• volblocksize (creation): fixed block size
• volsize: implicit quota
• zoned (readonly): set if the dataset is delegated to a non-global zone (Solaris)
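Volumes haven't been created in the examples so far; a hedged sketch (volume name and size illustrative):

  zfs create -V 1G zwimming/vol1               # -V creates a volume; volsize is its implicit quota
  zfs get volblocksize,volsize zwimming/vol1   # volblocksize is fixed at creation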
69. ZFS File System Properties (ZFS)
• aclinherit: ACL inheritance policy, when files or directories are created
• aclmode: ACL modification policy, when chmod is used
• atime: disable access-time metadata updates
• canmount: mount policy
• casesensitivity (creation): filename matching algorithm (CIFS client feature)
• devices: device opening policy for the dataset
• exec: file execution policy for the dataset
• mounted (readonly): is the file system currently mounted?
70. ZFS Filesystem Properties, part 2 (ZFS)
• nbmand (export/import): file system should be mounted with non-blocking mandatory locks (CIFS client feature)
• normalization (creation): Unicode normalization of file names for matching
• quota: max space the dataset and its descendants can consume
• recordsize: suggested maximum block size for files
• refquota: max space the dataset can consume, not including descendants
• setuid: setuid mode policy
• sharenfs: NFS sharing options (per-distro)
• sharesmb: file system shared with SMB (per-distro)
71. ZFS Filesystem Properties, part 3 (ZFS)
• snapdir: controls whether the .zfs directory is hidden
• utf8only (creation): UTF-8 character file name policy
• vscan: virus scan enabled
• xattr: extended attributes policy
72. ZFS Distro Properties (ZFS)
Pool properties
• illumos: comment: human-readable comment field
• ZFSonLinux: ashift: sets default disk sector size
Dataset properties
• Solaris 11: encryption: dataset encryption
• Delphix/illumos: clones: clone descendants
• Delphix/illumos: refratio: compression ratio for references
• Solaris 11: share: combines sharenfs & sharesmb
• Solaris 11: shadow: shadow copy
• NexentaOS/illumos: worm: WORM feature
• Delphix/illumos: written: amount of data written since the last snapshot
74. About Disks (ext4, ZFS, btrfs)
• Hard disk drives are slow. Get over it.

  Disk     Size  RPM     Max Size (GBytes)  Avg Rotational Latency (ms)  Avg Seek (ms)
  HDD      2.5"  5,400   1,000              5.5                          11
  HDD      3.5"  5,900   4,000              5.1                          16
  HDD      3.5"  7,200   4,000              4.2                          8 - 8.5
  HDD      2.5"  10,000  300                3                            4.2 - 4.6
  HDD      2.5"  15,000  146                2                            3.2 - 3.5
  SSD (w)  2.5"  N/A     800                0                            0.02 - 0.25
  SSD (r)  2.5"  N/A     1,000              0                            0.02 - 0.15
75. btrfs Performance (btrfs)
• Move metadata to separate devices
  • A common option for distributed file systems
  • Attribute-intensive workloads can benefit from faster metadata management (profile sketch below)
[Diagram of metadata/pool configurations. Recoverable labels: Minimal (single HDD); Good (RAID-1 HDDs for metadata, RAID-1 HDDs for the pool); Better (RAID-1 SSDs for metadata, RAID-10 of HDDs for the pool).]
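btrfs sets redundancy profiles rather than pinning metadata to named devices; a hedged sketch of separate metadata and data profiles across a set of devices (names illustrative):

  mkfs.btrfs -m raid1 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde   # RAID-1 metadata, RAID-10 data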
81. Great File Systems! (ext4, ZFS, btrfs)
• All of these file systems have great features and bright futures
• Now you know how to use them better!
• ext4 is now the default for many Linux distros
• btrfs takes it to the next level in the Linux ecosystem
• ZFS is widely ported to many different OSes
  • The OpenZFS organization recently launched to be the focal point for open-source ZFS
  • We're always looking for more contributors!
82. Websites (ZFS, btrfs)
• www.Open-ZFS.org
• www.ZFSonLinux.org
• github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubuntu-to-a-Native-ZFS-Root-Filesystem
• btrfs.wiki.kernel.org
83. Online Chats (ZFS, btrfs)
• irc.freenode.net
  • #zfs: general ZFS discussions
  • #zfsonlinux: Linux-specific discussions
  • #btrfs: general btrfs discussions