AOS Lab 10: File system -- Inodes and beyond

Advanced Operating Systems

Zubair Nabi
zubair.nabi@itu.edu.pk

April 10, 2013

Recap of Lab 9: xv6 FS layers

Layers from top to bottom (implementation: what it provides):
• File descriptors: system calls
• Recursive lookup: pathnames
• Directory inodes: directories
• Inodes and block allocator: files
• Logging: transactions
• Buffer cache: blocks

Recap of Lab 9: xv6 FS layers (2)

1 Buffer cache: Reads and writes blocks on the IDE disk and caches them in memory, synchronizing access to disk blocks
• Ensures that only one kernel process can edit any particular block at a time
2 Logging: Ensures atomicity by enabling higher layers to wrap updates to several blocks in a transaction
3 Inodes and block allocator: Provides unnamed files; each unnamed file is represented by an inode and a sequence of blocks holding the file content
4 Directory inodes: Implements directories as a special kind of inode
• The content of this inode is a sequence of directory entries, each of which contains a name and a reference to the named file's inode
5 Recursive lookup: Provides hierarchical path names such as /foo/bar/baz.txt, via recursive lookup
6 File descriptors: Abstracts many Unix resources, such as pipes, devices, files, etc., using the file system interface

Recap of Lab 9: File system layout

• xv6 lays out inodes and content blocks on the disk by dividing the disk into several sections:

    boot | super | inodes... | bitmap... | data... | log...
     0   |   1   |   2 ...

• Block 0 holds the boot sector
• Block 1 (called the superblock) contains metadata about the file system: the file system size in blocks, the number of data blocks, the number of inodes, and the number of blocks in the log
• Blocks starting at 2 hold inodes, with multiple inodes per block
• Inode blocks are followed by bitmap blocks, which keep track of the data blocks in use
• Bitmap blocks are followed by data blocks, which hold file and directory contents
• Finally, the blocks at the end of the disk hold the log, which is required by the transaction layer
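The following worked example computes where each region starts from the four superblock fields. It is a sketch built on several assumptions:

- a 512-byte block and a 64-byte struct dinode, giving 8 inodes per block;
- one bitmap bit per block;
- region sizes modeled on the mkfs tool of the xv6 revision this lab uses.

The names struct layout and compute_layout are hypothetical. The sample numbers (1024 blocks, 200 inodes, 10 log blocks) are only illustrative.

```c
#include <assert.h>

#define BSIZE 512                 /* block size in bytes */
#define DINODE_SIZE 64            /* assumed sizeof(struct dinode) */
#define IPB (BSIZE / DINODE_SIZE) /* inodes per block: 8 */
#define BPB (BSIZE * 8)           /* bitmap bits per block */

/* Mirrors the four superblock fields listed above. */
struct superblock {
  unsigned int size;     /* total blocks in the file system */
  unsigned int nblocks;  /* number of data blocks */
  unsigned int ninodes;  /* number of inodes */
  unsigned int nlog;     /* number of log blocks */
};

/* Hypothetical: first block of each on-disk region, in layout order. */
struct layout {
  unsigned int inode_start, bitmap_start, data_start, log_start;
};

struct layout compute_layout(const struct superblock *sb) {
  struct layout l;
  l.inode_start = 2;                                      /* after boot (0) and super (1) */
  l.bitmap_start = l.inode_start + sb->ninodes / IPB + 1; /* assumed inode-area size */
  l.data_start = l.bitmap_start + sb->size / BPB + 1;     /* assumed bitmap size */
  l.log_start = sb->size - sb->nlog;                      /* log fills the last nlog blocks */
  return l;
}
```

With size = 1024, ninodes = 200 and nlog = 10, this puts the bitmap at block 28 and the data blocks at 29 through 1013. The log occupies 1014 through 1023, which leaves 985 data blocks.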

Inodes

• Have two variants:
1 On-disk data structure containing a file's size and list of data block numbers
2 In-memory version of the on-disk inode, along with extra information needed within the kernel
• All on-disk inodes are stored in a contiguous area of disk, between the superblock and the bitmap blocks
• Each inode has the same size, so given a number n (called the inode number or i-number), it is simple to locate the corresponding inode
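Because every dinode has the same size, locating inode n takes only simple arithmetic. The sketch below makes these assumptions:

- 512-byte blocks;
- a 64-byte struct dinode;
- inode blocks starting at block 2, as in the layout above.

The helpers iblock and ioffset are hypothetical names. xv6 expresses the block calculation with an IBLOCK macro.

```c
#include <assert.h>

#define BSIZE 512
#define DINODE_SIZE 64             /* assumed sizeof(struct dinode) */
#define IPB (BSIZE / DINODE_SIZE)  /* 8 inodes per block */

/* Hypothetical helper: disk block holding inode inum (inode area starts at block 2). */
static unsigned int iblock(unsigned int inum) { return inum / IPB + 2; }

/* Hypothetical helper: byte offset of inode inum inside that block. */
static unsigned int ioffset(unsigned int inum) { return (inum % IPB) * DINODE_SIZE; }
```

For example, inode 9 is the second inode in block 3, at byte offset 64.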

On-disk inodes

• Represented by struct dinode
• Contains a type field to distinguish between files, directories, and special files (devices)
• A type of zero indicates that the dinode is free
• Also keeps track of the number of directory entries that refer to this inode (nlink)
• This link count dictates when the inode should be freed
• Also has fields to hold the number of bytes of content and the block numbers of the disk blocks holding that content

Code: dinode

typedef unsigned int uint;   // as in xv6's types.h
#define NDIRECT 12

struct dinode {
  short type;             // File type
  short major;            // Major device number (T_DEV only)
  short minor;            // Minor device number (T_DEV only)
  short nlink;            // Number of links to inode in file system
  uint size;              // Size of file (bytes)
  uint addrs[NDIRECT+1];  // Data block addresses
};

#define T_DIR  1   // Directory
#define T_FILE 2   // File
#define T_DEV  3   // Device

In-memory inodes

• Represented by struct inode
• An inode is kept in memory only while there are C pointers referring to it
• These pointers come from file descriptors, current working directories, and kernel code
• The iget and iput functions are used to acquire and release pointers to an inode
• A pointer obtained via iget() implements a weak form of lock: it guarantees that the inode stays in the cache until the reference count drops to zero
• These pointers enable long-term references (open files and the current directory) and help prevent deadlock in code that manipulates multiple inodes (pathname lookup)
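The reference counting behind iget and iput can be sketched with a small fixed-size cache. This is a simplified model, not xv6's code. It leaves out:

- the spinlock that protects the cache;
- the I_BUSY/I_VALID flags;
- the on-disk freeing that xv6's iput performs when the last reference to an inode with nlink == 0 is dropped.

The cache size NINODE = 4 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NINODE 4  /* hypothetical cache size */

struct inode {
  unsigned int dev, inum;
  int ref;  /* number of C pointers currently handed out */
};

static struct inode icache[NINODE];

/* Return the cached inode for (dev, inum) with ref incremented,
   recycling an unreferenced slot on a miss. */
struct inode *iget(unsigned int dev, unsigned int inum) {
  struct inode *empty = NULL;
  for (int i = 0; i < NINODE; i++) {
    struct inode *ip = &icache[i];
    if (ip->ref > 0 && ip->dev == dev && ip->inum == inum) {
      ip->ref++;  /* already cached: share it */
      return ip;
    }
    if (empty == NULL && ip->ref == 0)
      empty = ip;  /* remember the first free slot */
  }
  if (empty == NULL)
    return NULL;  /* cache full (xv6 panics here) */
  empty->dev = dev;
  empty->inum = inum;
  empty->ref = 1;
  return empty;
}

/* Drop one reference; at zero the slot may be reused for another inode. */
void iput(struct inode *ip) {
  ip->ref--;
}
```

Two iget calls for the same inode return the same pointer with ref == 2. Once both references are released, the slot can be recycled for a different inode.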

Code: inode

struct inode {
  uint dev;               // Device number
  uint inum;              // Inode number
  int ref;                // Reference count
  int flags;              // I_BUSY, I_VALID
  short type;             // copy of disk inode
  short major;
  short minor;
  short nlink;
  uint size;
  uint addrs[NDIRECT+1];
};
#define I_BUSY 0x1
#define I_VALID 0x2

Inode locks and allocation

• To ensure that an in-memory inode has valid content, the code must read it from disk
• Code that reads or modifies an inode's fields must therefore bracket that work with ilock and iunlock; ilock reads the inode from disk if it has not been read yet
• This allows multiple processes to hold a C pointer to an inode, but only one process can lock it at a time
• Inodes are allocated via ialloc, which works similarly to balloc: it scans the on-disk inodes for a free one (type zero) and claims it by giving it a type
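The sketch below shows the free-slot scan that ialloc performs. It models the on-disk inode area as an in-memory array. The real xv6 version differs in three ways:

- it reads each inode block through the buffer cache;
- it records the change in the log;
- it returns an in-memory inode via iget rather than a bare inode number.

NINODES and ialloc_sketch are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define NDIRECT 12
#define NINODES 16  /* hypothetical number of on-disk inodes */

struct dinode {
  short type, major, minor, nlink;
  unsigned int size;
  unsigned int addrs[NDIRECT + 1];
};

/* Stand-in for the on-disk inode area; type == 0 marks a free dinode. */
static struct dinode disk_inodes[NINODES];

/* Hypothetical: claim the first free dinode for the given type; return its
   inode number, or 0 if none is free. */
unsigned int ialloc_sketch(short type) {
  for (unsigned int inum = 1; inum < NINODES; inum++) {  /* inode 0 is never used */
    struct dinode *dip = &disk_inodes[inum];
    if (dip->type == 0) {
      memset(dip, 0, sizeof(*dip));  /* clear stale contents */
      dip->type = type;              /* a non-zero type marks it allocated */
      return inum;
    }
  }
  return 0;
}
```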

Inode data

• Data is found in the blocks pointed to by the addrs field
• The size of addrs is NDIRECT+1, where NDIRECT is 12
• The first NDIRECT entries of addrs (the direct blocks) can refer to 12 × 512 bytes = 6KB of data
• The 13th entry in addrs points to the indirect block, which holds NINDIRECT = 128 more block numbers and can therefore refer to 64KB of data
• Therefore, while fixed-size blocks simplify lookup, the maximum size of a file in xv6 is 70KB
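The size limits above follow from a little arithmetic. This minimal sketch assumes xv6's 512-byte BSIZE and 4-byte block numbers, so one indirect block holds 128 of them:

```c
#include <assert.h>

#define BSIZE 512
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(unsigned int))  /* block numbers per indirect block: 128 */
#define MAXFILE (NDIRECT + NINDIRECT)              /* 140 blocks per file */

static const unsigned long direct_bytes = NDIRECT * BSIZE;      /* 6144 bytes = 6KB */
static const unsigned long indirect_bytes = NINDIRECT * BSIZE;  /* 65536 bytes = 64KB */
static const unsigned long max_file_bytes = MAXFILE * BSIZE;    /* 71680 bytes = 70KB */
```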

Inodes content

• bmap(struct inode *ip, uint bn) returns the disk address of the bn-th block within inode ip, masking away the complexity of direct and indirect blocks
• If the data block does not exist yet, bmap allocates it
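The following sketch shows how bmap's direct/indirect translation works. It is a simplified user-space model:

- the disk is an in-memory array of 512-byte blocks;
- a hypothetical balloc hands out the next unused block number, and block 0 means "unallocated";
- xv6 reads the indirect block through the buffer cache and logs its updates, and both steps are omitted here.

NBLOCKS, disk and next_free are also hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BSIZE 512
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(unsigned int))
#define NBLOCKS 64  /* hypothetical tiny disk */

/* Each block viewed as an array of block numbers (handy for indirect blocks). */
static unsigned int disk[NBLOCKS][BSIZE / sizeof(unsigned int)];
static unsigned int next_free = 1;  /* block 0 means "not allocated" */

struct inode { unsigned int addrs[NDIRECT + 1]; };

/* Hypothetical allocator: hands out blocks sequentially, zeroed. */
static unsigned int balloc(void) {
  unsigned int b = next_free++;
  assert(b < NBLOCKS);
  memset(disk[b], 0, BSIZE);
  return b;
}

/* Map the bn-th block of ip to a disk block, allocating on demand. */
unsigned int bmap(struct inode *ip, unsigned int bn) {
  if (bn < NDIRECT) {                 /* direct block */
    if (ip->addrs[bn] == 0)
      ip->addrs[bn] = balloc();
    return ip->addrs[bn];
  }
  bn -= NDIRECT;
  assert(bn < NINDIRECT);             /* xv6 panics on an out-of-range bn */
  if (ip->addrs[NDIRECT] == 0)
    ip->addrs[NDIRECT] = balloc();    /* allocate the indirect block itself */
  unsigned int *a = disk[ip->addrs[NDIRECT]];
  if (a[bn] == 0)
    a[bn] = balloc();                 /* xv6 would also log the indirect block here */
  return a[bn];
}
```

Mapping logical block 12 first allocates the indirect block and then the data block that the indirect block points to.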

Inodes content

• itrunc(struct inode *ip) discards the contents of inode ip: it frees all of its data blocks, both direct and indirect (including the indirect block itself), and sets its size to zero
• readi(struct inode *ip, char *dst, uint off, uint n) reads n bytes from inode ip, starting at byte offset off, into dst
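The core of readi is a loop that clamps the request to the end of the file and then copies at most one block's worth of bytes per iteration. This minimal sketch assumes a simplified inode whose blocks are stored contiguously in an in-memory array, which stands in for bmap plus the buffer cache. The device-file case that xv6's readi also handles is omitted.

```c
#include <assert.h>
#include <string.h>

#define BSIZE 512
#define MAXBLOCKS 4  /* hypothetical small file */

/* Simplified inode: block bn of the file lives in data[bn] (no bmap). */
struct inode {
  unsigned int size;
  char data[MAXBLOCKS][BSIZE];
};

/* Copy up to n bytes starting at byte offset off into dst; return bytes read or -1. */
int readi(struct inode *ip, char *dst, unsigned int off, unsigned int n) {
  if (off > ip->size || off + n < off)
    return -1;                   /* offset past EOF, or overflow */
  if (off + n > ip->size)
    n = ip->size - off;          /* clamp the request to the end of the file */
  unsigned int tot, m;
  for (tot = 0; tot < n; tot += m, off += m, dst += m) {
    m = n - tot;
    if (m > BSIZE - off % BSIZE)
      m = BSIZE - off % BSIZE;   /* never cross a block boundary in one copy */
    memcpy(dst, ip->data[off / BSIZE] + off % BSIZE, m);
  }
  return (int)n;
}
```

A 3-byte read at offset 510 copies two bytes from block 0 and one byte from block 1.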

Inodes content (2)

• writei(struct inode *ip, char *src, uint off, uint n) works similarly to readi, but it:
1 Copies data in instead of out
2 Extends the file if the write increases its size
3 Updates the size in the inode
• stati(struct inode *ip, struct stat *st) copies the metadata of inode ip into st, which is exposed to userspace via the stat system call

Directory layer

• A directory is a file with inode type T_DIR whose data is a sequence of directory entries
• Each entry is a struct dirent:

typedef unsigned short ushort;  // as in xv6's types.h
#define DIRSIZ 14

struct dirent {
  ushort inum;         // free, if zero
  char name[DIRSIZ];
};

dirlookup

• Searches a directory for an entry with the given name
• Signature: struct inode* dirlookup(struct inode *dp, char *name, uint *poff)
• If it finds the entry, it returns a pointer to the corresponding inode, obtained via iget and unlocked, and stores the byte offset of the entry within the directory in *poff
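dirlookup walks the directory's content one struct dirent at a time. This sketch differs from xv6 in two ways:

- it models the directory content as an in-memory array;
- it returns the matching inode number rather than an inode from iget.

dirlookup_sketch is a hypothetical name. The name comparison uses strncmp limited to DIRSIZ, as xv6 does.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14

struct dirent {
  unsigned short inum;  /* 0 marks a free entry */
  char name[DIRSIZ];    /* not NUL-terminated if exactly DIRSIZ chars long */
};

/* Hypothetical: scan nent entries for name; return its inum and store the
   entry's byte offset in *poff, or return 0 if the name is absent. */
unsigned short dirlookup_sketch(const struct dirent *de, unsigned int nent,
                                const char *name, unsigned int *poff) {
  for (unsigned int i = 0; i < nent; i++) {
    if (de[i].inum == 0)
      continue;                                    /* skip free slots */
    if (strncmp(name, de[i].name, DIRSIZ) == 0) {
      if (poff != NULL)
        *poff = i * (unsigned int)sizeof(struct dirent);
      return de[i].inum;
    }
  }
  return 0;
}
```

Each entry is 16 bytes, so the third entry starts at byte offset 32.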

dirlink

• Writes a new directory entry with the given name and inode number into directory dp
• Signature: int dirlink(struct inode *dp, char *name, uint inum)

Path names

• Path name lookup is implemented by multiple calls to dirlookup, one for each path component
• namei takes a path and returns the corresponding inode
• nameiparent is similar, but returns the inode of the parent directory and copies the final path element into a name buffer
• Both call namex internally

namex

• Starts by deciding where the path evaluation begins:
• If the path begins with /, evaluation starts at the root directory
• Otherwise, it starts at the current directory
• Uses skipelem to parse the path into path elements
• For each path element, it looks up the element's name in the current directory inode via dirlookup and moves to the inode it finds; after the last element, it returns the resulting inode
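The sketch below is modeled on xv6's skipelem. It copies the next path element into name and returns the rest of the path with leading slashes removed. It returns NULL when no elements remain. A name of DIRSIZ or more characters is truncated to DIRSIZ bytes with no terminating NUL, so name needs only DIRSIZ bytes. The const qualifiers are an adaptation for this sketch.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14

/* Copy the next element of path into name; return the remainder of path
   (leading slashes skipped), or NULL if there are no more elements. */
const char *skipelem(const char *path, char *name) {
  while (*path == '/')
    path++;
  if (*path == '\0')
    return NULL;                 /* nothing left to parse */
  const char *s = path;
  while (*path != '/' && *path != '\0')
    path++;
  size_t len = (size_t)(path - s);
  if (len >= DIRSIZ) {
    memmove(name, s, DIRSIZ);    /* truncate long names, no terminator */
  } else {
    memmove(name, s, len);
    name[len] = '\0';
  }
  while (*path == '/')
    path++;
  return path;
}
```

namex calls this in a loop. skipelem("a/bb/c", name) yields name "a" and the remainder "bb/c", and so on until it returns NULL.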

File descriptor layer

• Everything in Unix is a file, and this interface is enabled by the file descriptor layer
• Each process has its own table of open files (file descriptors)
• Each open file is represented by a struct file

Code: struct file

struct file {
  enum { FD_NONE, FD_PIPE, FD_INODE } type;
  int ref;             // reference count
  char readable;
  char writable;
  struct pipe *pipe;
  struct inode *ip;
  uint off;
};

file

• struct file is simply a wrapper around an inode or a pipe, plus an I/O offset
• Each call to open creates a new struct file
• If multiple processes open the same file independently, each gets its own struct file with its own I/O offset
• The same struct file can appear multiple times: (a) within a single process's file table, and (b) across multiple processes
• When would this happen?
• (a) happens when a process opens a file and then dups the descriptor, and (b) happens when it calls fork
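This sharing can be observed on any POSIX system. The example below is host POSIX C, not xv6. After dup, both descriptors refer to the same open file, so a write through one advances the offset seen through the other. The helper name shared_offset_after_dup is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write 3 bytes through one descriptor, then report the
   offset seen through its duplicate (-1 on error). */
off_t shared_offset_after_dup(void) {
  FILE *tmp = tmpfile();               /* anonymous temporary file */
  if (tmp == NULL)
    return -1;
  int fd = fileno(tmp);
  int fd2 = dup(fd);                   /* new descriptor, same open file */
  off_t off = -1;
  if (fd2 >= 0 && write(fd, "abc", 3) == 3)
    off = lseek(fd2, 0, SEEK_CUR);     /* offset as seen through the duplicate */
  if (fd2 >= 0)
    close(fd2);
  fclose(tmp);
  return off;
}
```

Opening the same file twice with separate open calls would instead give two independent offsets.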

file

• The reference count tracks the number of references to a particular open file
• Read/write access is tracked by the readable and writable fields

Global file table

• All open files in the system are kept in a global file table (ftable)
• ftable has corresponding functions to:
1 Allocate a file: filealloc
2 Create a duplicate reference: filedup
3 Release a reference: fileclose
4 Read from a file: fileread
5 Write to a file: filewrite

File manipulation

• filealloc: Scans the file table for an unreferenced file (f->ref == 0), sets its reference count to 1, and returns it
• filedup: Increments the reference count
• fileclose: Decrements the reference count
• If f->ref drops to 0, the underlying pipe or inode is released
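Here is a minimal sketch of the ftable reference counting. It assumes a tiny table of NFILE = 4 entries (a hypothetical size) and omits the lock that xv6 takes around these operations. In xv6, fileclose also releases the pipe or inode when the count reaches zero; here that moment is only reported through the return value.

```c
#include <assert.h>
#include <stddef.h>

#define NFILE 4  /* hypothetical table size */

struct file {
  int ref;  /* 0 means the slot is unused; other fields omitted in this sketch */
};

static struct file ftable[NFILE];

/* Find an unreferenced slot and hand out its first reference. */
struct file *filealloc(void) {
  for (int i = 0; i < NFILE; i++) {
    if (ftable[i].ref == 0) {
      ftable[i].ref = 1;
      return &ftable[i];
    }
  }
  return NULL;  /* table full */
}

/* Add a reference (used by dup and fork). */
struct file *filedup(struct file *f) {
  assert(f->ref >= 1);
  f->ref++;
  return f;
}

/* Drop a reference; returns 1 when it was the last one, which is where
   xv6 would release the underlying pipe or inode. */
int fileclose(struct file *f) {
  assert(f->ref >= 1);
  return --f->ref == 0;
}
```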

File manipulation (2)

• filestat: Invokes stati, after checking that the file represents an inode
• fileread and filewrite:
1 Check whether the operation is allowed by the open mode
2 Pass the call through to either the underlying pipe or inode implementation
3 If the wrapper is around an inode, the I/O offset is used and then advanced
4 Pipes have no concept of offset
System calls

• sys_link and sys_unlink edit directories by creating or removing references to inodes
• sys_link creates a new name for an existing inode:
1 Takes two strings, old and new, as arguments
2 Increments the nlink field (number of links) of old's inode
3 Creates a new directory entry for new pointing at old's inode
4 The new directory entry must be on the same device as the existing inode, because inode numbers are only meaningful within a single disk

create

• Creates a new name for a new inode
• Generalizes three file creation system calls:
1 open with the O_CREATE flag creates a new ordinary file
2 mkdir creates a new directory
3 mkdev creates a new device file

create (2)

• Calls dirlookup to check whether the name already exists
• If it does not exist, creates a new inode via a call to ialloc
• If create has been invoked by mkdir (T_DIR), it initializes the new directory with . and .. entries
• Finally, it links the new inode into the parent directory

Buffer cache eviction policy

• xv6's buffer cache uses simple LRU eviction
• A number of different policies can be implemented, such as FIFO, not frequently used, aging, random, etc.
• The buffer cache is currently a linked list, but a more efficient implementation could replace it with a hash table and/or a heap
• The buffer cache can also be integrated with the virtual memory system to enable memory-mapped files (mmap in Linux)
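The LRU policy can be sketched with a doubly linked list whose head is the most recently used buffer, similar in spirit to xv6's list of buffers. This sketch makes several simplifying assumptions:

- a hit moves the buffer to the front inside bget itself, whereas xv6 does this when the buffer is released;
- busy buffers, locking and actual disk reads are omitted;
- NBUF = 3 is a hypothetical size.

```c
#include <assert.h>

#define NBUF 3  /* hypothetical cache size */

struct buf {
  unsigned int blockno;
  int valid;                /* 1 once the buffer holds blockno's data */
  struct buf *prev, *next;
};

static struct buf bufs[NBUF];
static struct buf head;     /* sentinel: head.next is MRU, head.prev is LRU */

static void unlink_buf(struct buf *b) {
  b->prev->next = b->next;
  b->next->prev = b->prev;
}

static void push_front(struct buf *b) {
  b->next = head.next;
  b->prev = &head;
  head.next->prev = b;
  head.next = b;
}

void binit(void) {
  head.prev = head.next = &head;
  for (int i = 0; i < NBUF; i++) {
    bufs[i].valid = 0;
    push_front(&bufs[i]);
  }
}

/* Return the buffer caching blockno; on a miss, recycle the LRU buffer. */
struct buf *bget(unsigned int blockno) {
  struct buf *b;
  for (b = head.next; b != &head; b = b->next)
    if (b->valid && b->blockno == blockno)
      break;                  /* cache hit */
  if (b == &head) {
    b = head.prev;            /* miss: evict the least recently used buffer */
    b->blockno = blockno;
    b->valid = 1;             /* a real cache would read the disk block here */
  }
  unlink_buf(b);              /* mark as most recently used */
  push_front(b);
  return b;
}
```

Suppose blocks 10, 11 and 12 are cached and block 10 is then touched again. A request for block 13 evicts block 11, which is now the least recently used.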

Today’s task

• xv6 has no support for memory-mapped files
• Come up with a design to implement mmap [1]

[1] http://man7.org/linux/man-pages/man2/mmap.2.html

Reading(s)

• Chapter 6, “File system”, from “Code: directory layer” onwards, from “xv6: a simple, Unix-like teaching operating system”
