SEFS - Self Expiring File System
Index No: 09440599
S M Udara Rusiri Siyasinghe
smudararusiri@yahoo.com
Supervisor: Mr. Malik Silva
Master of Computer Science
University of Colombo School of Computing
January 15, 2013
Declaration
The thesis is my original work and has not been submitted previously for a degree at this or
any other university/institute.
To the best of my knowledge it does not contain any material published or written by another
person, except as acknowledged in the text.
Student's name: S M Udara Rusiri Siyasinghe Date: ..........
....................................
Signature
This is to certify that this thesis is based on the work of
Mr. S M Udara Rusiri Siyasinghe
under my supervision. The thesis has been prepared according to the format stipulated and is of
acceptable standard.
Certified by:
Supervisor Name:
.................................... Date: ..........
Signature
Abstract
This research is an attempt to design a new file system that helps users protect their
privacy and save disk space. In this file system, files of interest can be marked with an
expiration date so that they are deleted, or moved to a special location where the user can
review them and clear out whatever is no longer wanted. A separate data structure per file is
used to store additional attributes about each file created in this file system, and the desired
behavior is achieved by modifying these attributes accordingly. These additional attributes are
saved as Extended File Attributes, a file system feature supported by most Linux file systems.
Acknowledgement
This research project would not have been possible without the support of many people. It
is with immense gratitude that I acknowledge the invaluable assistance, support and guidance
of my supervisor Mr. Malik Silva. I also owe my deepest gratitude to Dr. Chamath Keppitiyagama,
who was abundantly helpful with suggestions and initially guided this research in the right
direction.
Contents
Declaration
Abstract
Acknowledgement
List of Figures
List of Tables
List of Abbreviations
1 Introduction
1.1 Motivation
2 Background
2.1 What do we have now
2.2 First of its kind
3 Methodology
3.1 Design
3.1.1 SEFS Control Attributes
3.1.2 Core functionality of SEFS
3.1.3 Extended Attributes
3.2 Implementation
3.2.1 Development Environment
3.2.2 SEFS Implementation Architecture
3.2.3 SEFS attributes - How it is implemented
3.2.4 SEFS behavior implementation
3.2.5 sefsutil - The Utility program
4 Evaluation
4.1 Scenario based testing
4.2 Performance testing
4.3 Future Works
A Installation Guide
References
List of Figures
2.1 BleachBit File Cleaner
2.2 FUSE module interactions
3.1 Implementation Architecture
3.2 SEFS attribute names
3.3 SEFS inode struct
3.4 Extended Attribute storage
3.5 Extended Attribute retrieval
3.6 FUSE Operations struct
3.7 Non Overridden operation from FUSE example file system
3.8 An Overridden operation from SEFS file system
4.1 Compile bash script
4.2 Performance Benchmarking through compilation methodology
A.1 sefsutil command help
List of Tables
3.1 SEFS control attributes
3.2 Attribute values for Default Expiration Policy
4.1 SEFS Attributes default policy test
4.2 File expirability test
4.3 SEFS ExpireDate extension test
4.4 Ext4 compilation elapsed time
4.5 FUSE example file system compilation elapsed time
4.6 SEFS file system compilation elapsed time
4.7 Compilation elapsed time comparison
List of Abbreviations
JPEG Joint Photographic Experts Group
SEFS Self Expiring File System
EXT2 2nd Extended Filesystem
FAT32 32 bit File Allocation Table
NTFS New Technology File System
GUI Graphical User Interface
1. Introduction
With advances in technology, mass production and competition, the capacity of secondary
storage devices is rising rapidly while their cost keeps falling. Thanks to much cheaper
storage devices, our computers have persistent storage capacity in abundance. That said, does
this enormous capacity enable us to keep our files stored forever? Not necessarily, because
file sizes are growing along with the same technological advances. A few years ago we had
cameras and camera phones of 1 to 4 megapixels, which generated images averaging about 300 KB.
Those are history now, and cheap cameras produce 8 to 14 megapixel images that take up 1.2 MB
to 3 MB on average when compressed to JPEG. Photos from professional equipment consume even
more. Videography is no exception. Thanks to High Definition technology, HD videos consume
gigabytes of storage, and it is now possible to shoot HD video even with some mobile phones.
So storage devices eventually become full despite their enormous capacity, and we cannot
escape the need to delete unwanted files.
On the other hand, we receive and store files that can be considered sensitive, in the sense
that their security, content or privacy matters more than mere storage. We may already have
used those files and lost interest in them, so keeping them any longer may be unnecessary.
Yet we might forget to delete them, and leaving them unattended can turn against us, because
information leakage may cause irreversible damage to our legal, social and personal lives.
1.1 Motivation
There are many different file systems in the computer world. Some are general purpose and
some serve more specialized purposes. Almost all existing file systems are designed so that
files in secondary storage persist throughout the lifetime of the storage device unless the
user deletes them.
These facts indicate the need for a new file system that provides functionality not found in
conventional file systems. This new file system should satisfy the requirements mentioned
above by deleting files itself. In the new file system we should be able to mark files of
interest with an expiration date so that they are deleted or moved to a special location upon
expiration. At a later point in time we can attend to those expired files and delete what we
no longer want from the storage.
Our objective is to design and implement a new file system, “SEFS - The Self Expiring
File System”, which supports self expiring behavior for the files stored in it, along with a
set of supporting tools to achieve the desired functionality. Implementing a complete file
system in C is a very broad task that needs considerable time and effort, more than is
available for an individual Master’s level research project. To keep the scope of the project
at an acceptable level, this file system will be implemented as a User Space file system[1, 2].
User space file systems are not replacements for standard file systems like Ext2, Ext3, Ext4,
FAT32 or NTFS but are used side by side with them. Among the available user space file system
frameworks, FUSE[3, 4] is a stable open source framework available for Unix-like operating
systems. The FUSE framework will be used to implement the new file system and the Ubuntu Linux
distribution will be the target operating system. Since FUSE is open source and works on most
Linux distributions, the file system will not be limited to Ubuntu. However, FUSE is not
available for the Windows operating system. With FUSE, not all file system operations have to
be implemented, which is a real advantage: I only have to focus on the file expiring behavior
of the file system.
2. Background
2.1 What do we have now
Currently there are hundreds of file cleaners available to any PC user to free disk space
and protect privacy. Some of them are part of the operating system, like ‘Disk Cleanup’ in
Windows and ‘apt-get clean’ for package repository cleanup in Linux. Some work passively
through scheduled tasks and some actively through utility applications that the user has to
start. The cleanup methodology is more or less the same in almost all of these systems. They
use either a predefined set of folder paths or a set of commonly used applications with known
temporary file locations. Files and folders residing in these locations are cleaned by looking
at the last modified date, or everything is simply deleted since the user gave the nod to
delete it. There are some advanced open source applications like BleachBit[5] which give the
user more control to specify file locations and the cleaning outcome (delete, truncate or
shred files). It even provides an XML based meta language called CleanerML for writing custom
cleaners, which puts it one step ahead of other systems, but cleaning is still driven by the
available file system parameters like ‘created date’ and ‘modified date’. Figure 2.1 shows the
GUI of the BleachBit cleaner; most other file cleaners share the same designs and concepts.
Apart from file cleanup applications, we were unable to find any sophisticated system that
gives the user more control through an additional set of attributes or a guaranteed way of
cleaning the files. A new file system is the best way to achieve this, since it is the only
way to have an additional set of attributes and a guaranteed way of cleaning through file
system operations. Surprisingly, we could not find any theoretical or prototype file system of
this nature, so to our knowledge SEFS is the first file system of its kind. Thanks to the FUSE
framework we were able to implement a prototype version of the file system, so that a first
implementation is available for other researchers to improve and implement as a true kernel
file system.
Figure 2.1: BleachBit File Cleaner
2.2 First of its kind
Since the new file system will be implemented as a user space file system on top of the FUSE
framework, let us talk about FUSE and user space file systems a bit. When we talk about user
space file systems, one can ask: what is user space? This is a good entry point to a
discussion of user space file systems. Conventional operating systems divide virtual memory
into two sections called kernel space and user space. The kernel space portion of memory is
reserved for running the operating system kernel, kernel extensions and device drivers. The
kernel is not a process but rather the controller and manager of all processes. It has
exclusive control over everything that occurs on the system. Therefore kernel space programs
execute with administrator (root) privileges. User space, on the other hand, is the portion of
virtual memory in which all user processes are executed. User processes are instances of all
programs other than the kernel, and they cannot interfere with memory that belongs to other
programs or to the kernel. They do not have administrator (root) privileges, so they cannot
access system resources directly and have to go through the kernel. As explained above, a user
space file system can be regarded as a program executing in user space.
Accessing a file stored in a file system is an IO operation; it is a duty of the kernel and
is implemented in the file system itself. Conventional file systems like FAT32, NTFS, Ext2 and
Ext3 are part of the OS kernel and run in kernel space. How, then, can a user space program
perform an IO operation without root privileges or access to kernel resources? This is indeed
a good question. Thanks to the Linux kernel design, all requests for file operations are
handled by a kernel component called VFS - Virtual File system Switch. This is only a standard
interface for handling file operations; the actual implementation is found in separate kernel
modules dedicated to each file system. Using this mechanism, user space file system frameworks
like FUSE[3, 4] install and run a kernel module, and user space file systems are implemented
on top of the framework APIs.
Figure 2.2: FUSE module interactions
Figure 2.2 illustrates the flow of control when we execute a command against a user space
file system implemented using FUSE. First we have to compile the file system code and generate
a binary. Then we can mount the file system at a mount point and execute commands against it.
For example, if we execute the command
ls -l /tmp/fuse
against a folder that resides inside a mounted FUSE file system, control is given to the
kernel because ls -l results in an IO operation. This request is handled by the VFS, which is
just an interface to the actual file system modules. Since this file system runs on top of the
FUSE module, control is passed to the FUSE kernel module, which then goes through glibc and
libfuse (libfuse is the FUSE library in user space) and contacts the actual file system
binary. The results are then returned back down the stack to the FUSE kernel module and back
through the VFS to the user.
The new file system will help the user clean up unwanted files by looking at the user’s
behavior. Ordinary file systems do not support the user by making decisions on his behalf;
rather, the user controls each and every modification made to the files in the file system.
But SEFS is not the only file system that helps the user when making changes to files. Among
such file systems, the Elephant[6] file system is quite interesting because of its core
functionality: it never forgets a file. When you delete a file in the Elephant file system you
do not reclaim the storage. This file system is intended to keep file revisions without the
involvement of the user. Elephant is the total opposite of SEFS, but it also considers the
fact that storage eventually becomes full and has to be reclaimed by deleting some versions of
a file. The decision of which file version to delete is critical to the usability of that file
system. Similarly, we have to come up with a file expiration scheme, which is the core
functionality of SEFS.
3. Methodology
3.1 Design
When the file system was first designed, our thought was to let the user decide which files
needed to be expired. He would have to mark those files with the relevant flags that control
the expiration behavior. Our most important design consideration was that the behavior should
be applied on a per file basis. It is not something defined globally and it does not affect
all files. The user has full control over his actions and the file system behaves the way he
likes. But the fundamental problem that we attempt to solve with this file system is the
cleaning of useless files by the file system itself. If the file system is to fulfill this
requirement, the user has to set the necessary flags. What a typical user would do is simply
store the files he needs in the new file system with the intention of setting the necessary
flags at a later stage. Eventually he forgets to set the flags, and the new file system
behaves just like a conventional file system.
The other extreme of the above design is to let the file system take control of everything.
It sets the required expiration flags and expires the files according to the way it is
implemented. But the user is not able to control which files are expirable and which are not,
or when they expire. A user might not like the idea of a file system that deletes things
without his consent when the file system itself sets the expirable attributes. This is a valid
argument against this particular design, but the design does make sure that the file system is
useful in its intended specialized behavior.
Since our interest is in the ability of the file system to clean files that the user really
considers useless, we have come up with another design which can be considered a hybrid model.
In this design the file system assigns the set of required flags to each and every file stored
in it at the time the file is created. Since the file system assigns flags by default, the
user should be given the ability to change those flags as and when he wishes on a per file
basis. In this model the file system imposes the expiration behavior on the user while also
providing a mechanism to safeguard his useful files (by removing the self expiration flags);
otherwise all files are treated exactly the same way and are governed by the default file
expiration policy. So the user must be active when storing files in SEFS, removing the flags
of each file he wants to keep indefinitely, while the file system carries out its specialized
tasks so that SEFS is not just a conventional file system. Another important behavior was
considered when designing the file system: expired files are not deleted from the storage
unless the user intentionally sets the SEFS attribute DeleteOnExpire on each file. With this
flexible feature the file system helps the user by moving expired files to a designated zone
in the file system which we call the Scavenger Buffer Zone. The user does not have to worry
about files that are being expired, since he is given a second chance to recover them.
Controlled flexibility is the key in this design of the file system.
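The expiry rule of the hybrid model can be sketched as a small decision function. This is a
minimal sketch and not the thesis implementation: the names sefs_action and sefs_decide are
hypothetical, and it assumes a file counts as expired as soon as the current time reaches its
expire date.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical names for illustration; the thesis does not define these. */
typedef enum {
    SEFS_KEEP,            /* file is not expirable, or has not expired yet */
    SEFS_MOVE_TO_BUFFER,  /* expired: move to the Scavenger Buffer Zone    */
    SEFS_DELETE           /* expired and the user opted in to deletion     */
} sefs_action;

/* Decide what should happen to a file when it is examined at time `now`. */
sefs_action sefs_decide(bool expirable, bool delete_on_expire,
                        time_t expire_date, time_t now)
{
    if (!expirable || now < expire_date)
        return SEFS_KEEP;
    return delete_on_expire ? SEFS_DELETE : SEFS_MOVE_TO_BUFFER;
}
```

Deletion is reachable only when the user has explicitly set the delete flag, which mirrors the
second chance that the Scavenger Buffer Zone is meant to provide.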
3.1.1 SEFS Control Attributes
The prominent feature of this file system is its unique set of additional attributes. The
availability and use of these new attributes distinguish this file system from other
mechanisms of automatic file cleaning. Most other systems only allow us to specify folders in
which to store files that we want cleaned by the system, and there is no way of specifying an
expiration date or controlling the expiration behavior. The power and flexibility of SEFS come
from its controlling attributes, which allow the expiration behavior to be controlled at a
fine grained per file level instead of at folder level.
Let us dig deeper into these attributes in this section and eventually see how SEFS behaves
according to these flags. Table 3.1 lists the new file attributes needed to achieve the
desired functionality.
Attribute Name    Type      C Equivalent   Description
Expirable         Boolean   int            True if a file is Expirable by the file system.
BaseDate          Date      time_t         Date value when the Expire Date was assigned.
ExpireDate        Date      time_t         File Expiration Date.
UserModified      Boolean   int            True if user has modified any SEFS flags.
InBuffer          Boolean   int            True if the file is in the Scavenger Buffer.
DeleteOnExpire    Boolean   int            True if user wants to delete the file once expired.
Origin            String    char[]         Original path of the file before moving into the Buffer.
Table 3.1: SEFS control attributes
More on attributes
Expirable This flag can be considered the controlling flag of the entire expiration behavior.
SEFS sets this flag when a new file is created in the file system, and the default
expiration policy is applied if and only if this flag is true. The user is allowed to set
this flag to false if he wishes, and once it is false the expiration policy is never
applied to that particular file.
Base Date This attribute holds the date on which the Expire Date was assigned to a file.
Whenever the user or the file system updates the Expire Date, this attribute is also
updated with the current date.
Expire Date This attribute holds the Expire Date of a file if that particular file is an
expirable file. Both the file system and the user are able to set and update the expire
date, and once this date is reached SEFS expires the file accordingly.
User Modified This flag is set to true if the user has changed any of the SEFS attributes,
so that the file system leaves that particular file untouched, without modifying any
flags, during certain command executions such as moving the file to a new location.
In Buffer This attribute is set to true once a particular file has expired and been moved
into the Scavenger Buffer Zone. It helps the file system take critical decisions when the
user is recovering the file from the Buffer Zone.
Delete On Expire This attribute is supposed to be set only by the user. Once it is set to
true on a particular file, the file is deleted from the file system upon reaching the
expire date without being moved into the Buffer Zone.
Origin This attribute contains the path at which this particular file was stored before it
was moved into the Buffer Zone. The information stored in this attribute helps the user at
the time of file recovery from the Buffer.
3.1.2 Core functionality of SEFS
In this file system the most critical attribute values are set based on a global constant
called the Default Expiration Period. The user is allowed to change only this parameter, and
SEFS derives the rest of the values from it. This way the user is given the required
flexibility and at the same time configuration complexity is reduced. When there are too many
configurable parameters, users might end up over configuring or under configuring the system.
Default Expiration Policy
Attribute Name    Value
Expirable         true
BaseDate          Current System Date Time
ExpireDate        BaseDate + Default Expiration Period
UserModified      false
InBuffer          false
DeleteOnExpire    false
Origin            Original path of the file before moving into the Buffer.
Table 3.2: Attribute values for Default Expiration Policy
Table 3.2 lists the SEFS attributes with the default policy applied. When a new file is created
in the file system, SEFS makes sure that the controlling attributes hold these default values.
Therefore this can be considered the default file expiration policy, and the user is also given
the flexibility to modify certain attribute values later.
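The default policy of Table 3.2 can be illustrated with a short initializer. This is a sketch
under stated assumptions rather than the actual SEFS code: the struct sefs_attrs_sketch, the
function sefs_apply_default_policy and the SKETCH_PATH_MAX limit are hypothetical, and the
Default Expiration Period is assumed to be expressed in seconds.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define SKETCH_PATH_MAX 256   /* assumed limit, for illustration only */

/* Hypothetical in-memory view of the attributes listed in Table 3.2. */
struct sefs_attrs_sketch {
    bool   expirable;
    time_t base_date;
    time_t expire_date;
    bool   user_modified;
    bool   in_buffer;
    bool   delete_on_expire;
    char   origin[SKETCH_PATH_MAX];
};

/* Fill `a` with the default policy for a file created at `now`.
 * `period` is the Default Expiration Period in seconds (assumed unit). */
void sefs_apply_default_policy(struct sefs_attrs_sketch *a, const char *path,
                               time_t now, time_t period)
{
    a->expirable        = true;
    a->base_date        = now;
    a->expire_date      = now + period;
    a->user_modified    = false;
    a->in_buffer        = false;
    a->delete_on_expire = false;
    /* Copy the path, always leaving room for the terminating NUL. */
    strncpy(a->origin, path, sizeof a->origin - 1);
    a->origin[sizeof a->origin - 1] = '\0';
}
```

Only the expiration period is passed in from outside, which matches the idea that it is the
single parameter the user configures while every other value is derived.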
• Whenever a new file is created in the file system, it is marked as an Expirable file
and by default an expire date is assigned by adding the Default Expiration Period to the
system date. The current system date is stored as the Base Date. The current path is
stored in the Origin attribute.
• Once the Expire Date is reached, files are moved into the Buffer Zone, since in the
default policy DeleteOnExpire is set to false.
• Expired files are moved into the Buffer Zone or deleted from the file system at the
time their containing folder is accessed. Most of the time this happens when the user
executes the directory listing command.
• Whenever a file in the Buffer Zone is recovered, it is treated as a useful file and no
SEFS attributes are set by the file system, but the user is allowed to set any flag as he
wishes.
• When the user accesses an expirable file (Expirable attribute set to true), it can be
considered a file that is useful to the user. Therefore the file system gives an extension
to the ExpireDate using formula 3.1, but if the file is a user modified file, no change is
made to the ExpireDate because priority is given to the user’s preference. Whenever
the Expire Date is updated, BaseDate is also updated with the System Date.
\begin{align}
\text{Extension} &= \text{DefaultExpirationPeriod} - (\text{ExpireDate} - \text{CurrentDate}) \notag \\
\text{NewExpireDate} &= \text{OldExpireDate} + \text{Extension} \tag{3.1}
\end{align}
• If the user makes a copy of a file stored in the file system, the copy is treated as a
totally new file created in the file system: the SEFS default policy attribute set is
assigned to the copy, and an extension is given to the Expire Date of the original file.
When a file is moved to a new location, the original attribute set is left untouched if the
UserModified flag is set. Otherwise an extension is given to the ExpireDate.
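The extension rule of formula 3.1 can be worked through in a few lines of C. This is a hedged
sketch, not the author's implementation: struct sefs_dates_sketch and sefs_extend_on_access
are hypothetical names, and dates are assumed to be time_t values counted in seconds.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical subset of the SEFS attributes needed for the extension rule. */
struct sefs_dates_sketch {
    bool   expirable;
    bool   user_modified;
    time_t base_date;
    time_t expire_date;
};

/* Apply formula 3.1 when the file is accessed at `now`.
 * Returns true if the ExpireDate was changed. */
bool sefs_extend_on_access(struct sefs_dates_sketch *d, time_t now,
                           time_t default_period)
{
    if (!d->expirable || d->user_modified)
        return false;   /* user preference wins; leave the dates untouched */

    time_t remaining = d->expire_date - now;        /* ExpireDate - CurrentDate */
    time_t extension = default_period - remaining;  /* Extension                */
    d->expire_date  += extension;                   /* NewExpireDate            */
    d->base_date     = now;                         /* BaseDate follows update  */
    return true;
}
```

Substituting the first line of the formula into the second shows that the new expire date is
simply the current date plus the Default Expiration Period, so every access to an unmodified
expirable file restarts its full expiration window.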
After the initial design was done, the next problem we had to solve was how to store this
new set of attributes for a file. Conventional file systems store the metadata of a file in
the inode structure. Had we needed a new inode structure to make room for these new
attributes, it would have turned into a big development effort. Fortunately, Extended
Attributes[7], a file system feature that provides a mechanism to store an additional set of
user defined metadata about a file, made a new inode implementation unnecessary. Currently
Extended Attributes are supported in the Linux file systems ext3, ext4, JFS, ReiserFS, XFS,
Btrfs and OCFS2 1.6, and in FAT and HPFS on Windows. Clearly this helps to keep the scope
intact, since the base file system handles the basic file system functionality while the FUSE
file system focuses on the SEFS specific behavior. But it imposes a restriction on the
underlying file system that can be used as the base file system. The Ubuntu 12.04 operating
system will be used as the platform to implement the prototype FUSE file system. The Ext4
journaling file system, the latest of the Ext file systems, will be used as the base file
system.
Above all, there is one important thing the user has to pay attention to for the smooth
operation of this file system, and it may also be considered a weakness of SEFS: the user is
required to make sure that the system date and time are accurate. This is something we can
reasonably expect from a user who goes the extra mile to have the SEFS functionality.
3.1.3 Extended Attributes
Since we depend heavily on Extended Attributes, let us talk about them a bit. Extended
attributes (xattr) are a file system feature that enables applications and users to associate
files with an additional set of metadata. Typically this metadata is not interpreted by the
file system the way regular file attributes are. For example, normal file attributes like
size, modified date and permissions have a clearly defined purpose and are maintained and
understood by the file system. In contrast, extended attributes can be used to store the
author of a document, the character encoding of a document, a hash value or a digital
signature, and they are maintained not by the file system but by application programs.
Extended attributes are name/value pairs associated with files as well as directories. An
extended attribute may be defined or undefined. Similarly, given that the attribute is
defined, its value may be empty or non-empty. Extended attributes are stored in separate disk
blocks pointed to by entries in the inodes, whereas normal attributes are stored in the inode
itself. They are often used to provide additional functionality or extend the file system
features. For example, additional security features such as Access Control Lists (ACLs) may be
implemented using extended attributes. Since extended attributes are stored in disk blocks,
the kernel and the file system may impose limits on their maximum number and size. In the
extended file system implementations ext2, ext3 and ext4, each extended attribute must fit in
a single file system block, whose size is specified when the file system is created.
Typically this value is 4096 bytes (4 KB). The space consumed by extended attributes is
counted towards the disk quotas of the file owner and the group.
In some file systems like ext2, ext3 and reiserfs, the file system must be mounted with the
‘user_xattr’ mount option in order to use extended attributes. In the Ubuntu 9.04 operating
system on which we started our implementation, ext3 was the default file system and by
default it was mounted with the ‘user_xattr’ mount option. Later the SEFS implementation was
compiled on Ubuntu 12.04 with ext4 as the base file system, with zero modifications to the
code.
Extended attribute names are zero-terminated strings and should be specified as fully
qualified names. A fully qualified name consists of a namespace and the attribute name.
The namespace mechanism separates extended attributes into different classes, and different
access permissions are required to manipulate the attributes in each class. Currently there
are four namespaces, namely security, system, trusted and user. The number of supported
namespaces varies with the file system implementation, and the ext4 file system supports them
all. SEFS is implemented using only the ‘user’ namespace, which requires a minimal set of
permissions, namely the ordinary file access permissions. Extended ‘user’ attributes can be
assigned to files and directories for storing arbitrary additional information such as the
mime type and character encoding. Therefore it was an obvious choice to use extended user
attributes to store the SEFS specific attributes.
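As an illustration of the ‘user’ namespace, the following sketch stores and reads a string
value through the Linux setxattr(2) and getxattr(2) system calls. It is a minimal example
under stated assumptions, not the SEFS code: the helper names sketch_set_user_xattr and
sketch_get_user_xattr are hypothetical, values are assumed to be short text strings, and the
calls only succeed on a file system that supports user extended attributes.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* Linux-specific: setxattr(2), getxattr(2) */

/* Store a string value under the 'user' namespace. `attr` is the bare
 * name, e.g. "Expirable". Returns 0 on success, -1 on error (errno set). */
int sketch_set_user_xattr(const char *path, const char *attr, const char *value)
{
    char name[256];
    int n = snprintf(name, sizeof name, "user.%s", attr);
    if (n < 0 || (size_t)n >= sizeof name) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* flags = 0: create the attribute or replace an existing value */
    return setxattr(path, name, value, strlen(value), 0);
}

/* Read a value back into `buf` as a NUL-terminated string.
 * Returns the value length, or -1 on error (errno set). */
ssize_t sketch_get_user_xattr(const char *path, const char *attr,
                              char *buf, size_t buflen)
{
    char name[256];
    int n = snprintf(name, sizeof name, "user.%s", attr);
    if (n < 0 || (size_t)n >= sizeof name || buflen == 0) {
        errno = EINVAL;
        return -1;
    }
    ssize_t len = getxattr(path, name, buf, buflen - 1);
    if (len >= 0)
        buf[len] = '\0';   /* xattr values are raw bytes, not C strings */
    return len;
}
```

Because the namespace prefix is added in one place, callers pass only the bare attribute name
and cannot accidentally write outside the ‘user’ namespace.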
3.2 Implementation
The prototype implementation consists of two programs. One is the SEFS file system itself and
the other one is a Utility program which provides the means of manipulating SEFS attributes
and changing the self expiring behavior of the stored files.
3.2.1 Development Environment
The prototype file system is implemented on top of the FUSE framework, which implicitly
imposes a limitation on the choice of operating system. Since FUSE only supports Unix-like
operating systems as of now, the Ubuntu 12.04 Linux distribution was chosen as the underlying
OS. Ubuntu 12.04 is configured to use the ext4 extended file system as its default file
system, so ext4 was used as the base file system. Our prototype implementation should be fast
enough, since this is a file system at the end of the day. Therefore the C language was used
to implement the file system, even though FUSE has bindings for other, easier programming
languages like Java. C++ was abandoned, even though it is much easier to program in C++,
since most of the help and example file systems were available for C.
3.2.2 SEFS Implementation Architecture
C is not an object oriented language, and this makes C code difficult to implement and
maintain as it grows bigger. Figure 3.1 shows a high level overview of the implementation
architecture; best efforts have been made to reuse source code and to achieve a modularized
implementation. Reusable utility functions are implemented and maintained in a separate
source file (sefslib.c), and both the SEFS file system implementation file (sefs.c) and the
utility program (sefsutil.c), which is provided to manipulate the self expiring behavior,
reuse these utility functions.
Figure 3.1: Implementation Architecture
3.2.3 SEFS attributes - How it is implemented
SEFS attributes are an integral part of the new file system, so an efficient storage and
retrieval mechanism is mandatory. Extended attributes are the most efficient place to store
these attributes. The name/value pairs must be manipulated reliably, so that attribute values
do not overwrite each other and retrieved values are always accurate.
#define NAME_PREFIX "user."
#define EXPIRABLE "Expirable"
#define BASE_DATE "BaseDate"
#define EXPIRE_DATE "ExpireDate"
#define USER_MODIFIED "UserModified"
#define IN_BUFFER "InBuffer"
#define DELETE_ON_EXPIRE "DeleteOnExpire"
#define ORIGINE "Origine"
Figure 3.2: SEFS attribute names
The code listing in Figure 3.2 shows how the SEFS attribute names are defined as global
constants. The ‘NAME_PREFIX’ constant defines the extended attribute namespace ‘user’ that
is used in SEFS. Using these constants as names whenever extended attribute values are
manipulated improves reliability, ease of implementation and maintainability. Since they
are defined in the sefslib.h header file, which is included in both the SEFS file system code
and the utility program, attribute names stay uniform across the two programs.
Like the attribute names, the attribute values need a reliable mechanism for manipulation.
Code listing 3.3 shows the ‘SEFS_INODE’ structure, which manages attribute values consistently
across the two programs since it is defined in the same sefslib.h header file. Although it is
called SEFS_INODE, it is nowhere close to an inode of a Unix file system. In the
implementation, however, this struct plays the role of an inode for SEFS and is as important
to SEFS as an inode is to a traditional file system; hence the name.
typedef struct sefsInode
{
    int expirable;
    time_t baseDate;
    time_t expireDate;
    int userModified;
    int inBuffer;
    int deleteOnExpire;
    char origine[PATH_MAX];
} SEFS_INODE;
Figure 3.3: SEFS inode struct
With this design, extended attribute name/value pairs are manipulated consistently in the
two programs. The actual storage and retrieval of the attributes is done via the ‘setxattr’
and ‘getxattr’ system calls, which ext4 supports. Figures 3.4 and 3.5 show the storage and
retrieval of extended attributes respectively.
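Because attribute values are stored as text, numeric values have to be converted on the way in and out. The following is a minimal sketch of that conversion for a `time_t` value, assuming the value is kept as a decimal string. The helper names `formatTimeValue` and `parseTimeValue` are hypothetical. The point it illustrates is that ‘getxattr’ returns raw bytes without a terminating NUL, so the buffer must be terminated before it is parsed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Writes `t` as a decimal string into `buf`; returns the string length, or -1 if it does not fit. */
int formatTimeValue(char *buf, size_t size, time_t t)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%lld", (long long)t);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Parses `len` raw bytes (as returned by getxattr, i.e. NOT NUL-terminated)
 * into a time_t. Returns 0 on success and -1 on malformed or oversized input.
 */
int parseTimeValue(const char *raw, size_t len, time_t *out)
{
    char tmp[32];
    char *end;
    assert(raw != NULL && out != NULL);
    if (len == 0 || len >= sizeof(tmp))
        return -1;
    memcpy(tmp, raw, len);
    tmp[len] = '\0';               /* terminate before any string parsing */
    long long v = strtoll(tmp, &end, 10);
    if (*end != '\0')
        return -1;                 /* trailing characters that are not digits */
    *out = (time_t)v;
    return 0;
}
```

A value written with `formatTimeValue` and stored with its string length can be read back with `parseTimeValue` using the length that ‘getxattr’ reports.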
3.2.4 SEFS behavior implementation
SEFS is implemented by altering the FUSE example file system found at example/fusexmp_fh.c.
This is a C source file with a main() function, and inside main(), fuse_main() is called to
interact with the FUSE libraries.
int main(int argc, char *argv[])
{
    umask(0);
    return fuse_main(argc, argv, &xmp_oper, NULL);
}
/* Stores one attribute and reports a failure without skipping the remaining attributes. */
static void setSefsAttr(const char* path, const char* name, const char* value, size_t size)
{
    if (setxattr(path, name, value, size, 0) != 0)
        perror("Set Attribute Error ");
}

void setSefsFlags(const char* path, SEFS_INODE* node)
{
    char value[32];
    /*Expirable*/
    snprintf(value, sizeof(value), "%d", node->expirable);
    setSefsAttr(path, NAME_PREFIX EXPIRABLE, value, strlen(value));
    /*BaseDate*/
    snprintf(value, sizeof(value), "%ld", (long) node->baseDate);
    setSefsAttr(path, NAME_PREFIX BASE_DATE, value, strlen(value));
    /*ExpireDate*/
    snprintf(value, sizeof(value), "%ld", (long) node->expireDate);
    setSefsAttr(path, NAME_PREFIX EXPIRE_DATE, value, strlen(value));
    /*UserModified*/
    snprintf(value, sizeof(value), "%d", node->userModified);
    setSefsAttr(path, NAME_PREFIX USER_MODIFIED, value, strlen(value));
    /*InBuffer*/
    snprintf(value, sizeof(value), "%d", node->inBuffer);
    setSefsAttr(path, NAME_PREFIX IN_BUFFER, value, strlen(value));
    /*DeleteOnExpire*/
    snprintf(value, sizeof(value), "%d", node->deleteOnExpire);
    setSefsAttr(path, NAME_PREFIX DELETE_ON_EXPIRE, value, strlen(value));
    /*Origine (stored together with its terminating NUL)*/
    setSefsAttr(path, NAME_PREFIX ORIGINE, node->origine, strlen(node->origine)+1);
}
Figure 3.4: Extended Attribute storage
/* Reads one attribute into buf and NUL-terminates it; buf is left empty on error. */
static void getSefsAttr(const char* path, const char* name, char* buf, size_t size)
{
    ssize_t len = getxattr(path, name, buf, size - 1);
    buf[len > 0 ? len : 0] = '\0';
}

void getSefsFlags(const char* path, SEFS_INODE* node)
{
    char value[32];
    char strValue[PATH_MAX];
    getSefsAttr(path, NAME_PREFIX EXPIRABLE, value, sizeof(value));
    node->expirable = atoi(value);
    getSefsAttr(path, NAME_PREFIX BASE_DATE, value, sizeof(value));
    node->baseDate = (time_t) atol(value);
    getSefsAttr(path, NAME_PREFIX EXPIRE_DATE, value, sizeof(value));
    node->expireDate = (time_t) atol(value);
    getSefsAttr(path, NAME_PREFIX USER_MODIFIED, value, sizeof(value));
    node->userModified = atoi(value);
    getSefsAttr(path, NAME_PREFIX IN_BUFFER, value, sizeof(value));
    node->inBuffer = atoi(value);
    getSefsAttr(path, NAME_PREFIX DELETE_ON_EXPIRE, value, sizeof(value));
    node->deleteOnExpire = atoi(value);
    getSefsAttr(path, NAME_PREFIX ORIGINE, strValue, sizeof(strValue));
    strcpy(node->origine, strValue);
}
Figure 3.5: Extended Attribute retrieval
When fuse_main() is called, a very important additional parameter is passed to it along with
the standard main arguments. This is a struct called ‘fuse_operations’, which specifies all
the standard file system operations that the file system subscribes to and overrides. Once
this is done, the new FUSE file system gets control whenever those standard file system
operations are invoked by the kernel or by application programs, enabling it to implement
entirely new behavior or to enhance the standard functionality.
static struct fuse_operations sefs_oper = {
    .getattr = sefs_getattr,
    .opendir = sefs_opendir,
    .readdir = sefs_readdir,
    .mknod = sefs_mknod,
    .mkdir = sefs_mkdir,
    .rmdir = sefs_rmdir,
    .rename = sefs_rename,
    .chmod = sefs_chmod,
    ...
    ...
    ...
    .create = sefs_create,
    .open = sefs_open,
    .read = sefs_read,
    .write = sefs_write,
    .statfs = sefs_statfs,
    .setxattr = sefs_setxattr,
    .getxattr = sefs_getxattr,
    .listxattr = sefs_listxattr,
    .removexattr = sefs_removexattr,
    .lock = sefs_lock,
    .flag_nullpath_ok = 1,
};
Figure 3.6: FUSE Operations struct
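The mechanism behind such an operations struct is a table of function pointers. The sketch below uses a deliberately simplified stand-in (`DEMO_OPERATIONS`), not the real `struct fuse_operations`, and all names in it are hypothetical. It only shows how a framework can call whichever handler the file system registered for an operation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an operations table; this is NOT the real struct fuse_operations. */
typedef struct demoOperations {
    int (*create)(const char *path);
    int (*open)(const char *path);
} DEMO_OPERATIONS;

int createCalls = 0;   /* counts how often the overriding handler ran */

/* Pass-through style handler: would simply forward to the underlying file system. */
static int passthroughOpen(const char *path)
{
    (void)path;
    return 0;
}

/* Overriding handler: does extra bookkeeping before reporting success. */
static int expiringCreate(const char *path)
{
    (void)path;
    createCalls++;     /* stand-in for "set the SEFS attributes" */
    return 0;
}

/* Handlers are registered by name with designated initializers. */
const DEMO_OPERATIONS demoOper = {
    .create = expiringCreate,
    .open   = passthroughOpen,
};

/* Plays the role of the framework: looks up the handler and calls it. */
int dispatchCreate(const DEMO_OPERATIONS *ops, const char *path)
{
    assert(ops != NULL);
    if (ops->create == NULL)
        return -1;     /* operation not provided by this file system */
    return ops->create(path);
}
```

Calling `dispatchCreate(&demoOper, path)` runs `expiringCreate`, which is how an overridden operation gets control without the caller knowing which implementation is behind the table.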
Code listing 3.6 shows part of the fuse_operations struct. Of all the subscribed operations,
only the required ones have been overridden with new behavior; the rest simply call the system
calls provided by the underlying ext4 file system. It was sufficient to override the following
set of operations to achieve the self-expiring behavior of the SEFS prototype implementation.
• sefs_write
• sefs_read
• sefs_open
• sefs_create
• sefs_rename
• sefs_readdir
Code listing 3.7 shows a non-overridden operation from the FUSE example file system and code
listing 3.8 shows an enhanced operation from the SEFS file system.
static int xmp_create(const char *path, mode_t mode, struct fuse_file_info *fi)
{
    int fd;
    fd = open(path, fi->flags, mode);
    if (fd == -1)
        return -errno;
    fi->fh = fd;
    return 0;
}
Figure 3.7: Non Overridden operation from FUSE example file system
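static int sefs_create(const char *path, mode_t mode, struct fuse_file_info *fi)
{
    int fd;
    SEFS_INODE fileNode;
    /* Create the file on the underlying file system, as the example does. */
    fd = open(path, fi->flags, mode);
    if (fd == -1)
        return -errno;
    /* Start from the default attributes and let the file expire one default period later. */
    getDefaultSefsFlags(&fileNode);
    fileNode.expireDate = fileNode.baseDate + DEFAULT_EXPIRE_PERIOD;
    /* Remember where the file was created so that it can be restored later. */
    strcpy(fileNode.origine, path);
    setSefsFlags(path, &fileNode);
    fi->fh = fd;
    return 0;
}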
Figure 3.8: An Overridden operation from SEFS file system
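The create operation relies on getDefaultSefsFlags(), whose body is not listed. The following is a hypothetical sketch of what it might do, with default values taken from the default policy test in Table 4.1. The struct is repeated from Figure 3.3 so that the sketch compiles on its own, and the value of DEFAULT_EXPIRE_PERIOD is an assumption based on the 10-minute period used in the test scenarios.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <time.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Copy of the struct from Figure 3.3 so that the sketch is self-contained. */
typedef struct sefsInode
{
    int expirable;
    time_t baseDate;
    time_t expireDate;
    int userModified;
    int inBuffer;
    int deleteOnExpire;
    char origine[PATH_MAX];
} SEFS_INODE;

/* Assumed value: the 10-minute period used in the test scenarios, in seconds. */
#define DEFAULT_EXPIRE_PERIOD (10 * 60)

/* Hypothetical body for getDefaultSefsFlags(); the original is not shown in the text. */
void getDefaultSefsFlags(SEFS_INODE *node)
{
    assert(node != NULL);
    memset(node, 0, sizeof(*node));     /* userModified, inBuffer, deleteOnExpire = 0 */
    node->expirable = 1;                /* new files expire by default */
    node->baseDate = time(NULL);        /* creation time is the base date */
    node->expireDate = node->baseDate;  /* the caller adds the expiration period */
}
```

After the call, the caller sets `expireDate` to `baseDate + DEFAULT_EXPIRE_PERIOD`, which gives the ‘BaseDate + 10 minutes’ value expected by the default policy test.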
3.2.5 sefsutil - The Utility program
Once the file system is compiled and mounted, a user can use SEFS like any other file system
through the mount point. SEFS then starts to set the flags necessary to carry out its
self-expiring functionality. SEFS was designed so that fine-grained, file-level control is
available to the user. The user should therefore be able to change the SEFS attributes of a
file so that the file system behaves according to the user’s requirements. For that purpose a
separate command line program called ‘sefsutil’ was implemented; it is distributed together
with the SEFS file system prototype.
When a file stored in the SEFS file system expires, it is moved into the SEFS buffer zone
specified by the path in the sefs.conf configuration file. A user who wishes to recover the
file can either move it to a desired location or use the sefsutil program. With sefsutil, the
user can restore the file to its original location using the path stored in the ‘Origine’
SEFS attribute; the file is then considered useful and its ‘Expirable’ flag is removed. The
program can also be used to inspect the current SEFS attribute values and to modify the
values assigned to the ‘Expirable’, ‘DeleteOnExpire’ and ‘ExpireDate’ flags. Usage
instructions are available in Appendix A.
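The restore step described above can be sketched as follows. This is only an illustration under stated assumptions: `RESTORE_STATE` is a stand-in for the two flags involved, `restoreFromBuffer` is a hypothetical name, and the caller is assumed to have already read the original path from the file's attributes. The actual sefsutil code is not shown in this document.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the attributes that matter when restoring; not the full SEFS_INODE. */
typedef struct restoreState {
    int expirable;
    int inBuffer;
} RESTORE_STATE;

/*
 * Moves a file from the buffer zone back to `originPath` and updates the flags.
 * rename() only works within one file system. Returns 0 on success and -1 on
 * failure, in which case the flags are left untouched.
 */
int restoreFromBuffer(const char *bufferPath, const char *originPath, RESTORE_STATE *state)
{
    assert(bufferPath != NULL && originPath != NULL && state != NULL);
    if (rename(bufferPath, originPath) != 0) {
        perror("Restore Error ");
        return -1;
    }
    state->inBuffer = 0;    /* file is back in its original location */
    state->expirable = 0;   /* a restored file is treated as useful */
    return 0;
}
```

Updating the flags only after a successful move keeps the attributes consistent with where the file actually is.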
4. Evaluation
To free disk space and protect privacy in a more productive way, our approach was to implement
a new file system rather than another file cleaner. Once the file system was implemented, the
next question was how to test and evaluate it. The evaluation technique had to demonstrate
SEFS’s file expiring behavior. For this we used scenario-based test protocols, and once the
testing was completed we could conclude that SEFS is capable of expiring files as designed.
To evaluate performance, researchers usually benchmark their file system against an
established file system using various file system benchmarking methods, but there is no single
accepted benchmarking technique [8]. Most researchers have used the Andrew file system
benchmark, which was introduced along with the Andrew file system [9], but this benchmark now
seems obsolete. More modern techniques include Filebench, Postmark, compile benchmarks
(Apache source, Linux kernel, etc.) and various ad hoc techniques. After evaluating these
techniques we felt that a compile benchmark would be suitable for SEFS, since compiling source
code reads files, creates new files, writes to existing files and looks up files. A compile
therefore covers most of the overridden file operations. The SEFS prototype is implemented on
top of the FUSE framework as a user space file system, so its measured performance cannot be
better than that of a kernel file system such as ext4, on top of which SEFS is implemented.
What we are interested in finding out is whether it performs within acceptable limits with the
added functionality.
There are two known limitations of this file system, listed below. They do not make the file
system a failure: a user has to accept them when using SEFS, but we see them as trade-offs
rather than weaknesses.
• The user or the administrator has to make sure that the system date and time are accurate.
If they are incorrect, SEFS will not behave as expected.
• SEFS attributes are assigned only to files; directories are excluded from the expiring
functionality. This file-level expiration gives more control to the user and simplifies the
expiration behavior and configuration.
4.1 Scenario based testing
A set of scenario-based tests was created along with the expected outcomes, and the results
were recorded against each test. Comparing the expected outcome with the actual result shows
whether SEFS works according to its design.
Prerequisites:
Configure the System Date and Time to reflect the correct values.
Mount the SEFS file system at /mnt/sefs mount point.
Configure the SEFS buffer path to /mnt/sefs_buffer.
Configure the Default Expiration Period to 10 minutes.
Scenario 01: Default Expiration Policy Test
Create a file in the path /mnt/sefs/tmp/test1.txt
udara@ubuntu:~/MCS_Working/sefs$ ls -l > /mnt/sefs/tmp/test1.txt
Examine the SEFS attribute values using the sefsutil.
udara@ubuntu:~/MCS_Working/sefs$ ./sefsutil -p /tmp/test1.txt
Test Attribute   Expected Value          Actual Result
Expirable        1                       1
BaseDate         System Date Time        2013-01-15 00:15:05
ExpireDate       BaseDate + 10 minutes   2013-01-15 00:25:05
UserModified     0                       0
InBuffer         0                       0
DeleteOnExpire   0                       0
Origin           tmp/test1.txt           tmp/test1.txt
Table 4.1: SEFS Attributes default policy test
According to Table 4.1 the actual result matches the expected result; therefore Scenario 01
passed.
Scenario 02: File Expirability Test
This test scenario checks that the SEFS file system actually expires a file. Execute the
‘ls -l’ directory listing command against the path /mnt/sefs/tmp/ after 2013-01-15 00:25:05.
udara@ubuntu:~/MCS_Working/sefs$ date
udara@ubuntu:~/MCS_Working/sefs$ ls -l /mnt/sefs/tmp/
Test Case                                                          Expected Value        Actual Result
Execute the ‘date’ command                                         Current Date Time     Tue Jan 15 00:35:07 PST 2013
Check the availability of file /tmp/test1.txt                      Not available         Not available
Check the availability of file /tmp/test1.txt at the SEFS Buffer   Available in Buffer   Available in Buffer
Table 4.2: File expirability test
According to Table 4.2 the actual result is the same as the expected result; therefore
Scenario 02 passed. The SEFS file system is capable of expiring a file upon reaching its
ExpireDate.
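The decision exercised by this scenario can be written down as a small function. The sketch below is a hypothetical policy built from the attributes described in Section 3.2.3; the type and function names are illustrative, and details such as skipping files that are already in the buffer are assumptions rather than documented SEFS rules.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef enum expiryAction {
    EXPIRY_NONE,            /* leave the file where it is */
    EXPIRY_MOVE_TO_BUFFER,  /* move the file to the buffer zone */
    EXPIRY_DELETE           /* remove the file permanently */
} EXPIRY_ACTION;

/* The subset of SEFS attributes that the decision depends on. */
typedef struct expiryAttrs {
    int expirable;
    int inBuffer;
    int deleteOnExpire;
    time_t expireDate;
} EXPIRY_ATTRS;

/* Decides what to do with a file at time `now` (hypothetical policy). */
EXPIRY_ACTION decideExpiryAction(const EXPIRY_ATTRS *attrs, time_t now)
{
    assert(attrs != NULL);
    if (!attrs->expirable || attrs->inBuffer)
        return EXPIRY_NONE;          /* protected, or already handled */
    if (now < attrs->expireDate)
        return EXPIRY_NONE;          /* not yet expired */
    return attrs->deleteOnExpire ? EXPIRY_DELETE : EXPIRY_MOVE_TO_BUFFER;
}
```

With the default attributes (Expirable 1, DeleteOnExpire 0), a file whose ExpireDate has passed is moved to the buffer, which corresponds to the outcome recorded in Table 4.2.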
Scenario 03: Test the ability of preserving a frequently accessed file
Create a file in the path /mnt/sefs/tmp/test3.txt
udara@ubuntu:~/MCS_Working/sefs$ ls -l > /mnt/sefs/tmp/test3.txt
Examine the ExpireDate attribute value using the sefsutil.
udara@ubuntu:~/MCS_Working/sefs$ ./sefsutil -p /tmp/test3.txt
After waiting 5 minutes modify the content of the /tmp/test3.txt file
udara@ubuntu:~/MCS_Working/sefs$ date > /mnt/sefs/tmp/test3.txt
Examine the ExpireDate attribute value using the sefsutil after modifying the file.
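\begin{align}
\mathrm{Extension} &= \mathrm{DefaultExpirationPeriod} - (\mathrm{OldExpireDate} - \mathrm{CurrentDate}) \notag \\
\mathrm{NewExpireDate} &= \mathrm{OldExpireDate} + \mathrm{Extension} \notag \\
&= \mathrm{OldExpireDate} + \mathrm{DefaultExpirationPeriod} - (\mathrm{OldExpireDate} - \mathrm{CurrentDate}) \notag \\
&= \mathrm{CurrentDate} + \mathrm{DefaultExpirationPeriod} \tag{4.1}
\end{align}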
Test Case                                   Expected Value                    Actual Result
Original BaseDate                           System Date                       2013-01-15 01:15:37
Original ExpireDate                         BaseDate + 10 Minutes             2013-01-15 01:25:37
Check the file content after modification   result of the ‘date’ command      Tue Jan 15 01:20:40 PST 2013
New ExpireDate                              Original ExpireDate + Extension   2013-01-15 00:30:40
Table 4.3: SEFS ExpireDate extension test
Equation 4.1 is used to calculate the new ExpireDate after an extension is granted due to a
recent access to the file. The new ExpireDate should therefore be obtained by adding the
Default Expiration Period (10 minutes in these test scenarios) to the result of the ‘date’
command, which is the ‘CurrentDate’ at which the file was actually modified. According to
Table 4.3 the new ExpireDate calculated by equation 4.1 is the same as the test outcome;
therefore Scenario 03 passed.
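The extension rule of equation 4.1 can be checked with a few lines of code. The function name `extendExpireDate` is hypothetical, and the numbers in the note below are arbitrary second counts chosen only to make the arithmetic easy to follow.

```c
#include <assert.h>
#include <time.h>

/* Computes the new ExpireDate after an access at `now`, following equation 4.1. */
time_t extendExpireDate(time_t oldExpireDate, time_t now, time_t defaultPeriod)
{
    assert(defaultPeriod > 0);
    time_t extension = defaultPeriod - (oldExpireDate - now);
    return oldExpireDate + extension;   /* algebraically equal to now + defaultPeriod */
}
```

For a file created at second 1000 with a period of 600 seconds, the old ExpireDate is 1600. An access at second 1300 gives an extension of 600 - (1600 - 1300) = 300, so the new ExpireDate is 1900, which is the access time plus the full period.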
The three scenarios above sufficiently cover the expiration behavior of the SEFS file system.
Since all three passed, we can conclude that SEFS works reliably as designed.
4.2 Performance testing
Demonstrating SEFS’s ability to expire files is not enough on its own to draw a conclusion
about its usability. If its performance is weak, it cannot be considered a useful file system.
To evaluate performance we had to choose a technique that reflects the implementation
overhead of all or most of the overridden SEFS file system operations. The compile technique
was chosen because a compilation has to read, write, create, list and remove files,
effectively covering most of the overridden operations.
To evaluate SEFS’s performance, we compare the compile benchmark results of ext4, the FUSE
example file system on which SEFS is based, and SEFS itself. We expect SEFS to be the slowest
of the three, but we need to show that it remains useful even with some performance
degradation.
The comparison requires a sufficiently large program source, but large sources such as the
Linux kernel or Apache make compilation difficult due to dependency issues. To avoid these
complexities we used the FUSE 2.9.2 source code for the compile benchmark, since we had
already compiled it many times without much difficulty.
The FUSE 2.9.2 source was compiled on a machine with an Intel Core i3 processor. A full build
and installation requires ./configure, make and make install, but for our evaluation it is
enough to execute ./configure and make. Since we need to measure the elapsed time, we created
a simple bash script that executes the two commands and ran it with ‘time script-name’ to
output the elapsed time.
#! /bin/bash
cd /tmp/fuse-2.9.2
./configure
make
Figure 4.1: Compile bash script
Code listing 4.1 shows the bash script used to compile FUSE. The following command outputs
the elapsed time of the script, given that the script is named compile.
> time ./compile
First the compilation was done on the ext4 file system, with the FUSE source located at
/tmp/fuse-2.9.2. The compilation was run 5 times and the average elapsed time was taken.
Before every run the directory /tmp/fuse-2.9.2 was removed and extracted again, so that
./configure had to generate all of its files from scratch. Table 4.4 shows the elapsed
compilation times on the ext4 file system.
Execution Elapsed Time(s)
1 37.651
2 37.811
3 38.301
4 37.191
5 39.192
Average 38.0292
Table 4.4: Ext4 compilation elapsed time
Next the FUSE example file system fusexmp_fh was tested. It was mounted at /mnt/xmp and the
bash script was changed accordingly. Table 4.5 shows the elapsed compilation times on the
FUSE example file system.
Execution Elapsed Time(s)
1 43.605
2 43.538
3 40.940
4 40.676
5 42.620
Average 42.2758
Table 4.5: FUSE example file system compilation elapsed time
Finally, Table 4.6 shows the elapsed compilation times on the SEFS file system.
Execution Elapsed Time(s)
1 45.868
2 41.711
3 42.547
4 42.193
5 42.079
Average 42.8796
Table 4.6: SEFS file system compilation elapsed time
For ease of comparison, the average elapsed times for compiling the FUSE source are put
together in Table 4.7.
File System Average Elapsed Time(s)
Ext4 38.0292
FUSE Example 42.2758
SEFS 42.8796
Table 4.7: Compilation elapsed time comparison
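The averages in Tables 4.4 to 4.7 and the relative slowdown between file systems can be reproduced with the following sketch. The function names `meanOf` and `overheadPercent` are illustrative and are not part of the SEFS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of `count` samples. */
double meanOf(const double *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    return sum / (double)count;
}

/* Relative slowdown of `measured` over `baseline`, in percent. */
double overheadPercent(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (measured - baseline) / baseline * 100.0;
}
```

Applied to the averages in Table 4.7, `overheadPercent(38.0292, 42.8796)` is roughly 12.8 (SEFS against Ext4) and `overheadPercent(42.2758, 42.8796)` is roughly 1.4 (SEFS against the FUSE example file system).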
The data in Table 4.7 is plotted in Figure 4.2. Adding the SEFS functionality to the file
system operations causes only a marginal slowdown with respect to the FUSE example file system
(42.88 s against 42.28 s on average, about 1.4%). The Ext4 kernel file system is clearly
faster than both user space file systems (SEFS is about 13% slower than Ext4), but considering
what user space file systems can offer, a performance hit of this size is acceptable.
As a whole, we evaluated SEFS, the Self Expiring File System, by scenario-based testing and
performance benchmarking, and we conclude that the added functionality and the usability of
the file system are very encouraging.
Figure 4.2: Performance Benchmarking through compilation methodology
4.3 Future Works
This section briefly discusses several remaining issues related to our design and to
limitations of the prototype implementation that we are actively considering. The actual file
expiration is implemented only in the ‘readdir’ file system operation, which limits SEFS’s
usefulness to a certain extent when it comes to reclaiming disk space from expired files.
Because of this limitation an expired file is not cleaned up until a ‘readdir’ operation is
triggered by the ‘ls’ file listing command or by the file manager. We are considering
introducing a background cleaner application that cleans expired files and reclaims free disk
space.
The sefsutil program, which is used to modify the SEFS attributes set by the file system, is
not very user friendly. The user experience could be improved by introducing a GUI application
or, on a larger scale, by modifying the Linux file manager Nautilus to provide that
functionality.
A. Installation Guide
Installation
Follow the step-by-step guideline to install the FUSE framework and the SEFS file system.
Prerequisites:
Ubuntu 12.04 Operating system
FUSE framework 2.9.2
• Download the FUSE framework archive fuse-2.9.2.tar.gz from
http://sourceforge.net/projects/fuse/files/fuse-prerelease/ and extract its content to the
folder fuse-2.9.2.
• Open a terminal and log in as root.
• Install the framework by executing the following commands in the given order.
>./configure
>make
>make install
Once the FUSE framework is successfully installed, the environment is ready to compile the
SEFS file system.
• Extract the sefs.tar.gz file to the folder sefs.
• Open a terminal and go to the sefs folder which contains the extracted files.
• Type ‘make’ and press enter to compile the SEFS file system. This will generate the
following executable files.
sefs - executable for file system.
sefsutil - executable for utility program to interact with the SEFS attributes.
Execution
Follow the steps to mount the SEFS file system.
• Create two folders in the /mnt folder with read, write, execute permissions. (chmod 777)
/mnt/sefs
/mnt/sefs_buffer
• Open a terminal and go to the sefs folder where the filesystem executables were generated.
• Execute the following command to mount the filesystem
>./sefs /mnt/sefs
Note: If the Linux distribution has an old FUSE package installed, the following warning
message is displayed. Either uninstall the old FUSE package or simply ignore the warning
message.
fuse: warning: library too old, some operations may not work
Once the file system is mounted it can be used through the mount point. Its special feature
is that it displays the current directory tree of the root (/) folder when the ‘ls -l’
command is executed.
• To un-mount the file system, execute the following command
>fusermount -u /mnt/sefs
sefsutil Usage
The sefsutil program has built-in help that describes the command and its parameters. To
display the parameter help, execute the following command. Figure A.1 shows the command help
output.
>./sefsutil -h
Figure A.1: sefsutil command help
References
[1] J. B. Layton, “User space file systems.” Available: http://www.linux-mag.com/id/7814/,
June 2010.
[2] A. Rajgarhia and A. Gehani, “Performance and extension of user space file systems,”
SAC’10, Sierre Switzerland, 2010.
[3] M. Szeredi, “Fuse - filesystem in userspace.” Available: http://fuse.sourceforge.net/,
January 2005.
[4] S. Singh, “Develop your own filesystem with fuse.” Available:
http://www.ibm.com/developerworks/linux/library/l-fuse/, May 2011.
[5] A. Ziem, “Bleachbit - clean junk to free disk space and to maintain privacy.” Available:
http://bleachbit.sourceforge.net.
[6] D. J. Santry, M. J. Feeley, and N. C. Hutchinson, “Elephant: The file system that never
forgets,” in Workshop on Hot Topics in Operating Systems, 1999. Available:
http://www.cs.fsu.edu/~awang/courses/cop5611_s2006/elephant.pdf.
[7] A. Gruenbacher and the SGI XFS development team, “Ubuntu manuals, attr - extended
attributes.” Available: http://manpages.ubuntu.com/manpages/maverick/man5/attr.5.html.
[8] V. Tarasov, S. Bhanage, and E. Zadok, “Benchmarking file system benchmarking: It *is*
rocket science.”
[9] J. H. Howard, “An overview of the Andrew file system,” in Winter 1988 USENIX Conference
Proceedings, pp. 23–26, 1988.
M.Sc Dissertation: Simple Digital Libraries
 
bonino_thesis_final
bonino_thesis_finalbonino_thesis_final
bonino_thesis_final
 
eclipse.pdf
eclipse.pdfeclipse.pdf
eclipse.pdf
 
Tr1546
Tr1546Tr1546
Tr1546
 
digital marketing training in bangalore
digital marketing training in bangaloredigital marketing training in bangalore
digital marketing training in bangalore
 
Progress OpenEdge database administration guide and reference
Progress OpenEdge database administration guide and referenceProgress OpenEdge database administration guide and reference
Progress OpenEdge database administration guide and reference
 
DBMS_Lab_Manual_&_Solution
DBMS_Lab_Manual_&_SolutionDBMS_Lab_Manual_&_Solution
DBMS_Lab_Manual_&_Solution
 
IBM Streams - Redbook
IBM Streams - RedbookIBM Streams - Redbook
IBM Streams - Redbook
 
Workflow management solutions: the ESA Euclid case study
Workflow management solutions: the ESA Euclid case studyWorkflow management solutions: the ESA Euclid case study
Workflow management solutions: the ESA Euclid case study
 
An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...
 
Informational user guide_1_212_mo_shell_1
Informational user guide_1_212_mo_shell_1Informational user guide_1_212_mo_shell_1
Informational user guide_1_212_mo_shell_1
 
Thesis
ThesisThesis
Thesis
 

thesis

  • 1. SEFS - Self Expiring File System Index No: 09440599 S M Udara Rusiri Siyasinghe smudararusiri@yahoo.com Supervisor: Mr. Malik Silva Master of Computer Science University of Colombo School of Computing January 15, 2013
  • 2. Declaration The thesis is my original work and has not been submitted previously for a degree at this or any other university/institute. To the best of my knowledge it does not contain any material published or written by another person, except as acknowledged in the text. Student's name: S M Udara Rusiri Siyasinghe Date: .......... .................................... Signature This is to certify that this thesis is based on the work of Mr. S M Udara Rusiri Siyasinghe under my supervision. The thesis has been prepared according to the format stipulated and is of acceptable standard. Certified by: Supervisor Name: .................................... Date: .......... Signature
  • 3. Abstract This research is an attempt to design a new file system that helps users protect their privacy and save disk space. In this file system we can mark files of interest with an expiration date so that they are deleted or moved to a special location, where we can attend to them and clear out what we no longer want. A separate data structure per file stores additional attributes about each file created in this file system, and the desired behavior is achieved by modifying these attributes accordingly. These additional attributes are saved in Extended File Attributes, a file system feature supported by most Linux file systems.
  • 4. Acknowledgement This research project would not have been possible without the support of many people. It is with immense gratitude that I acknowledge the invaluable assistance, support and guidance of my supervisor Mr. Malik Silva. I owe my deepest gratitude also to Dr. Chamath Keppitiyagama, who was abundantly helpful with suggestions and guided this research in the right direction initially.
  • 5. Contents
      Declaration
      Abstract
      Acknowledgement
      List of Figures
      List of Tables
      List of Abbreviations
      1 Introduction
        1.1 Motivation
      2 Background
        2.1 What do we have now
        2.2 First of its kind
      3 Methodology
        3.1 Design
          3.1.1 SEFS Control Attributes
          3.1.2 Core functionality of SEFS
          3.1.3 Extended Attributes
        3.2 Implementation
          3.2.1 Development Environment
          3.2.2 SEFS Implementation Architecture
          3.2.3 SEFS attributes - How it is implemented
          3.2.4 SEFS behavior implementation
          3.2.5 sefsutil - The Utility program
  • 6. Contents (continued)
      4 Evaluation
        4.1 Scenario based testing
        4.2 Performance testing
        4.3 Future Works
      A Installation Guide
      References
  • 7. List of Figures
      2.1 BleachBit File Cleaner
      2.2 FUSE module interactions
      3.1 Implementation Architecture
      3.2 SEFS attribute names
      3.3 SEFS inode struct
      3.4 Extended Attribute storage
      3.5 Extended Attribute retrieval
      3.6 FUSE Operations struct
      3.7 Non Overridden operation from FUSE example file system
      3.8 An Overridden operation from SEFS file system
      4.1 Compile bash script
      4.2 Performance Benchmarking through compilation methodology
      A.1 sefsutil command help
  • 8. List of Tables
      3.1 SEFS control attributes
      3.2 Attribute values for Default Expiration Policy
      4.1 SEFS Attributes default policy test
      4.2 File expirability test
      4.3 SEFS ExpireDate extension test
      4.4 Ext4 compilation elapsed time
      4.5 FUSE example file system compilation elapsed time
      4.6 SEFS file system compilation elapsed time
      4.7 Compilation elapsed time comparison
  • 9. List of Abbreviations
      JPEG   Joint Photographic Experts Group
      SEFS   Self Expiring File System
      EXT2   2nd Extended Filesystem
      FAT32  32 bit File Allocation Table
      NTFS   New Technology File System
      GUI    Graphical User Interface
  • 10. 1. Introduction With the advancement of technology, mass production and competition, the capacity of secondary storage devices is rising rapidly while their cost is falling. Thanks to much cheaper storage devices, our computers have persistent storage capacity in abundance. That said, does this enormous capacity enable us to keep our files stored forever? Not always, because file sizes are also increasing as technology advances. A few years ago we had cameras and camera phones of 1 to 4 megapixels, which generated images with an average size of 300 KB. Those are history now, and cheap cameras generate 8 to 14 megapixel images that consume on average 1.2 MB to 3 MB when compressed to JPEG. Photos from professional equipment consume even more. Videography is no exception. Thanks to High Definition technology, HD videos consume gigabytes of storage, and it is now possible to shoot HD video even with some mobile phones. So storage devices eventually become full even with their enormous capacity, and we cannot escape the need to delete unwanted files. On the other hand, we receive and store files that can be considered sensitive, in the sense that security, file content or privacy matters more than mere storage. We may have already used those files and lost interest in them, so storing them further may be unnecessary. Yet we might forget to delete those files, and leaving them unattended may harm us, because information leakage can cause irreversible damage to our legal, social and personal lives. 1.1 Motivation There are many different file systems in the computer world. Some are general purpose and some are used for more specialized purposes.
Almost all of these existing file systems are designed so that files in secondary storage persist throughout the lifetime of the storage device unless they are deleted by the user.
  • 11. These facts indicate a need for a new file system that provides functionality not found in conventional file systems. This new file system should satisfy the requirements mentioned above by deleting files itself. In the new file system we should be able to mark files of interest with an expiration date so that they are deleted or moved to a special location upon expiration. At a later point in time we can attend to those expired files and delete what we no longer want from the storage. Our objective is to design and implement the new file system, “SEFS - The Self Expiring File System”, which supports self-expiring behavior for the files stored in it, along with a set of supporting tools to achieve the desired functionality. Implementing a new file system in C is a very broad task that needs considerable time and effort, more than is available for an individual Master's level research project. To keep the scope of the project at an acceptable level, this file system will be implemented as a user space file system[1, 2]. User space file systems are not replacements for standard file systems like Ext2, Ext3, Ext4, FAT32 or NTFS but are used side by side with them. Among the available user space file system frameworks, FUSE[3, 4] is a stable open source framework available for Unix-like operating systems. The FUSE framework will be used to implement the new file system, and the Ubuntu Linux distribution will be used as the target operating system. Since FUSE is an open source framework that works on most Linux distributions, the file system will not be limited to Ubuntu. However, FUSE is not available for the Windows operating system. When FUSE is used, not all file system operations have to be implemented, which is a real advantage because I only have to focus on the file-expiring behavior of the file system.
  • 12. 2. Background 2.1 What do we have now Currently there are hundreds of file cleaners available to any PC user to free disk space and protect privacy. Some of them are part of the operating system, like ‘Disk Cleanup’ in Windows and ‘apt-get clean’ for package repository cleanup in Linux. Some work passively through scheduled tasks and some actively through utility applications that users have to start themselves. The cleanup methodology is more or less the same in almost all of these systems. They use either a predefined set of folder paths or a set of commonly used applications with known temporary file locations. Files and folders residing in these locations are cleaned by looking at the last modified date, or everything is simply deleted once the user has given approval. There are some advanced open source applications like BleachBit[5] which give the user more control over file locations and the cleaning outcome (delete, truncate or shred files). It even provides an XML-based meta language called CleanerML for writing custom cleaners, which puts it one step ahead of other systems, but cleaning is still done by looking at the available file system parameters like ‘created date’ and ‘modified date’. Figure 2.1 shows the GUI of the BleachBit cleaner; most other file cleaners share the same designs and concepts. Apart from file cleanup applications, we were unable to find any sophisticated system that gives the user more control through an additional set of attributes or a guaranteed way of cleaning files. A new file system is the best way to achieve this, since it is the only way to have both an additional set of attributes and guaranteed cleaning through file system operations. Surprisingly, we could not find any theoretical or prototype file system of this nature, and SEFS is going to be the first file system of its kind.
Thanks to the FUSE framework we were able to implement a prototype version of the file system, so that a first implementation is available for other researchers to improve and to implement as a true kernel file system.
  • 13. Figure 2.1: BleachBit File Cleaner 2.2 First of its kind Since the new file system will be implemented as a user space file system on top of the FUSE framework, let us briefly discuss FUSE and user space file systems. A natural first question is: what is user space? Conventional operating systems divide virtual memory into two sections called kernel space and user space. The kernel space portion of memory is reserved for running the operating system kernel, kernel extensions and device drivers. The kernel is not a process but rather a controller or manager of all processes. It has exclusive control over everything that occurs on the system. Therefore code running in kernel space executes with full privileges. User space, on the other hand, is the portion of virtual memory in which all user processes are executed. User processes are instances of all programs other than the kernel, and they cannot interfere with memory that belongs to other programs or to the kernel. They run without these kernel-level privileges, so they cannot directly access system resources and have to go through the kernel for them. As explained above, a user space file system can be regarded as a program executing in user space. Accessing a file stored in a file system is an IO operation and is a duty of the kernel and
  • 14. implemented in the file system itself. Conventional file systems like FAT32, NTFS, Ext2 and Ext3 are part of the OS kernel and run in kernel space. How, then, can a user space program perform an IO operation without kernel-level privileges or access to kernel resources? Thanks to the Linux kernel design, all requests for file operations are handled by a kernel layer called VFS, the Virtual File system Switch. It is only a standard interface for handling file operations; the actual implementation is found in separate kernel modules dedicated to each file system. Using this mechanism, a user space file system framework like FUSE[3, 4] is installed and executed as a kernel module, and user space file systems are implemented on top of the framework APIs. Figure 2.2: FUSE module interactions Figure 2.2 illustrates the flow of control when we execute a command against a user space file system implemented using FUSE. First we have to compile the file system code and generate a binary. Then we can mount the file system at a mount point and execute commands against it. For example, if we execute the command ls -l /tmp/fuse against a folder that resides inside a mounted FUSE file system, control is given to the kernel because ls -l results in an IO operation. This request is handled by the VFS, which is just an interface to the actual file system modules. Since this file system runs on
  • 15. top of the FUSE module, control is given to the FUSE kernel module, which then goes through glibc and libfuse (libfuse is the FUSE library in user space) and contacts the actual file system binary. The file system binary then returns the results back down the stack to the FUSE kernel module, and back through the VFS to the user. The new file system will help the user clean unwanted files by looking at the user's behavior. Ordinary file systems do not make decisions on behalf of the user; rather, the user controls each and every modification made to the files in the file system. But SEFS is not the only file system that helps the user when making changes to files. Among such file systems, the Elephant[6] file system is particularly interesting because of its core functionality: it never forgets a file. When you delete a file in the Elephant file system you do not reclaim the storage. This file system is intended to keep file revisions without the user's involvement. Elephant is the total opposite of SEFS, but it also takes into account the fact that storage eventually becomes full and must be reclaimed by deleting some versions of a file. The decision of which file version to delete is very important to the usability of that file system. Similarly, we have to come up with a file expiration scheme, which is the core functionality of SEFS.
  • 16. 3. Methodology 3.1 Design When the file system was first designed, our idea was to let the user decide which files should expire. The user would have to mark the files with the relevant flags that control the expiration behavior. Our most important design consideration was that the behavior should be applied on a per-file basis. It is not defined globally and does not affect all files. The user has full control over his actions and the file system behaves the way he likes. But the fundamental problem that we attempt to solve with this file system is the cleaning of useless files by the file system itself. For the file system to fulfill this requirement under that design, the user has to set the necessary flags. What a typical user would do is simply store the files he needs in the new file system with the intention of setting the necessary flags at a later stage. Eventually he forgets to set the flags, and the new file system behaves just like a conventional file system. The other extreme is to let the file system take control of everything. It sets the required expiration flags and expires the files according to the way it is implemented. But then the user cannot control which files are expirable and which are not, or when they expire. A user might not like the idea of a file system that deletes things without his consent. This is a valid argument against this particular design, but the design does ensure that the file system is useful in its intended specialized behavior. Since our interest is in the ability of the file system to clean files that the user himself considers useless, we have come up with another design which can be considered a hybrid model.
In this design the file system assigns the set of required flags to each and every file stored in it at the time the file is created. Since the file system assigns flags by default, the user is given the ability to change those flags whenever he wishes, on a per-file basis. In this model the file system applies expiration by default and also provides a mechanism to safeguard useful files (by removing the self-expiration flags); otherwise all files are treated
  • 17. exactly the same way and are governed by the default file expiration policy. So the user must be active when storing files in SEFS, removing the flags of each file that he wants to keep indefinitely, while the file system carries out its specialized tasks so that SEFS is not just a conventional file system. Another important behavior was considered when designing the file system: expired files are not deleted from the storage unless the user intentionally sets the SEFS attribute DeleteOnExpire on each file. With this feature the file system helps the user by moving expired files to a designated zone in the file system which we call the Scavenger Buffer Zone. The user does not have to worry about files that are expiring, since he is given a second chance to recover them. Controlled flexibility is the key to this design of the file system. 3.1.1 SEFS Control Attributes The prominent feature of this file system is its unique set of additional attributes. The availability and use of these new attributes distinguish this file system from other mechanisms of automatic file cleaning. Most other systems only allow us to specify folders in which to store files that should be cleaned by the system, and there is no way of specifying an expiration date or controlling the expiration behavior. The power and flexibility of SEFS come from its control attributes, which allow the expiration behavior to be controlled at a fine-grained per-file level instead of at folder level. Let us look more closely at these attributes in this section and see how SEFS behaves according to these flags. Table 3.1 lists the new file attributes needed to achieve the desired functionality.
      Attribute Name   Type     C Equivalent  Description
      Expirable        Boolean  int           True if the file is expirable by the file system.
      BaseDate         Date     time_t        Date on which the Expire Date was assigned.
      ExpireDate       Date     time_t        File expiration date.
      UserModified     Boolean  int           True if the user has modified any SEFS flags.
      InBuffer         Boolean  int           True if the file is in the Scavenger Buffer.
      DeleteOnExpire   Boolean  int           True if the user wants the file deleted once expired.
      Origin           String   char[]        Original path of the file before it was moved into the Buffer.
      Table 3.1: SEFS control attributes
  • 18. More on attributes
      Expirable: This flag is the controlling flag of the entire expiration behavior. SEFS sets this flag when a new file is created in the file system, and the default expiration policy is applied if and only if this flag is true. The user is allowed to set this flag to false, and once it is false the expiration policy is never applied to that particular file.
      Base Date: This attribute holds the date on which the Expire Date was assigned to a file. Whenever the user or the file system updates the Expire Date, this attribute is also updated with the current date.
      Expire Date: This attribute holds the expire date of a file if that file is expirable. Both the file system and the user can set and update the expire date, and once this date is reached SEFS expires the file accordingly.
      User Modified: This flag is set to true if the user has made any change to the SEFS attributes, so that the file system leaves the file untouched, without modifying any flags, during certain commands such as moving the file to a new location.
      In Buffer: This attribute is set to true once a file has expired and been moved into the Scavenger Buffer Zone. It helps the file system make critical decisions when the user recovers the file from the Buffer Zone.
      Delete On Expire: This attribute is supposed to be set only by the user. Once it is set to true on a file, the file is deleted from the file system upon reaching the expire date, without being moved into the Buffer Zone.
      Origin: This attribute contains the path at which the file was stored before it was moved into the Buffer Zone. The information stored in this attribute helps the user at the time of file recovery from the Buffer.
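The control attributes of Table 3.1 could be grouped in a single C structure, roughly as sketched below. This is a hypothetical illustration and not the structure used in the SEFS sources: the names sefs_attrs, sefs_action and sefs_on_expiry, and the fixed SEFS_PATH_MAX size, are assumptions made for the example. The helper only shows how the Expirable, Expire Date and Delete On Expire values described above might combine into an expiry decision.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define SEFS_PATH_MAX 4096  /* assumed upper bound for the Origin path */

/* Hypothetical in-memory view of the Table 3.1 attributes. */
struct sefs_attrs {
    int    expirable;         /* Expirable */
    time_t base_date;         /* BaseDate */
    time_t expire_date;       /* ExpireDate */
    int    user_modified;     /* UserModified */
    int    in_buffer;         /* InBuffer */
    int    delete_on_expire;  /* DeleteOnExpire */
    char   origin[SEFS_PATH_MAX]; /* Origin */
};

/* Possible outcomes when a file is examined (names are illustrative). */
enum sefs_action { SEFS_KEEP, SEFS_MOVE_TO_BUFFER, SEFS_DELETE };

/* Decide what should happen to a file at time `now`. */
enum sefs_action sefs_on_expiry(const struct sefs_attrs *a, time_t now)
{
    assert(a != NULL);
    if (!a->expirable)          /* policy is never applied to such files */
        return SEFS_KEEP;
    if (now < a->expire_date)   /* expire date not reached yet */
        return SEFS_KEEP;
    /* Expired: delete only if the user asked for it, else move to buffer. */
    return a->delete_on_expire ? SEFS_DELETE : SEFS_MOVE_TO_BUFFER;
}
```

With the default values, an expired file yields SEFS_MOVE_TO_BUFFER, which matches the second-chance behavior of the Scavenger Buffer Zone.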
3.1.2 Core functionality of SEFS In this file system the most critical attribute values are set based on a global constant called the Default Expiration Period. The user is allowed to change only this parameter, and SEFS derives the rest of the values from it. This way the user is given the required flexibility and
  • 19. at the same time configuration complexity is reduced. When there are too many configurable parameters, users might end up over-configuring or under-configuring the system. Default Expiration Policy
      Attribute Name   Value
      Expirable        true
      BaseDate         Current System Date Time
      ExpireDate       BaseDate + Default Expiration Period
      UserModified     false
      InBuffer         false
      DeleteOnExpire   false
      Origin           Original path of the file before moving into the Buffer.
      Table 3.2: Attribute values for Default Expiration Policy
    Table 3.2 lists the SEFS attributes with the default policy applied. When a new file is created in the file system, SEFS makes sure that the control attributes hold these default values. Therefore this can be considered the default file expiration policy, and the user is also given the flexibility to modify certain attribute values later.
      • Whenever a new file is created in the file system, it is marked as an Expirable file, and by default an expire date is assigned by adding the Default Expiration Period to the system date. The current system date is stored as the Base Date. The current path is stored in the Origin attribute.
      • Once the Expire Date is reached, files are moved into the Buffer Zone, since in the default policy DeleteOnExpire is set to false.
      • Expired files are moved into the Buffer Zone or deleted from the file system at the time their containing folder is accessed. Most of the time this happens when the user executes the directory listing command.
      • Whenever a file in the Buffer Zone is recovered, it is treated as a useful file and no SEFS attributes are set by the file system, but the user is allowed to set any flag he wishes.
  • 20. • When the user accesses an expirable file (Expirable attribute set to true), the file can be considered useful to the user. Therefore the file system extends the ExpireDate using formula 3.1. If the file is a user-modified file, however, the ExpireDate is not changed, because priority is given to the user's preference. Whenever the Expire Date is updated, the BaseDate is also updated with the system date.
          Extension = DefaultExpirationPeriod - (ExpireDate - CurrentDate)
          NewExpireDate = OldExpireDate + Extension        (3.1)
      • If the user makes a copy of a file stored in the file system, the copy is treated as a totally new file created in the file system: the SEFS default policy attribute set is assigned to it, and an extension is given to the Expire Date of the original file. When a file is moved to a new location, the original attribute set is left untouched if the UserModified flag is set. Otherwise an extension is given to the ExpireDate.
    After the initial design was done, the next problem we had to solve was how to store this new set of attributes for a file. Conventional file systems store metadata related to a file in the inode structure. Designing a new inode structure to make room for these attributes would have been a major development effort. Fortunately, Extended Attributes[7], a file system feature that provides a mechanism for storing an additional set of user-defined metadata about a file, made a new inode implementation unnecessary. Currently Extended Attributes are supported in the Linux file systems ext3, ext4, JFS, ReiserFS, XFS, Btrfs and OCFS2 1.6, and in FAT and HPFS in Windows. This helps keep the scope intact, since the base file system handles the basic file system functionality while the FUSE file system focuses on the SEFS-specific behavior. But it imposes a restriction on the underlying file system that can be used as the base file system.
Ubuntu 12.04 is used as the platform on which to implement the prototype FUSE file system. Ext4, a journaling file system and the latest of the Ext file systems, is used as the base file system. Above all, there is one thing the user has to pay attention to for the smooth operation of this file system, which may also be considered a weakness of SEFS: the user is required to make sure that the system date and time are accurate. This is something we can reasonably expect from a user who goes the extra mile to have the SEFS functionality.
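The default policy applied at file creation and the access-time extension of formula (3.1) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the SEFS implementation: the type sefs_dates and the functions sefs_apply_default and sefs_extend_on_access are hypothetical names, and dates are represented as plain time_t values in seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical subset of the SEFS attributes needed for this sketch. */
struct sefs_dates {
    int    expirable;
    int    user_modified;
    time_t base_date;
    time_t expire_date;
};

/* Default policy on file creation (Table 3.2):
 * ExpireDate = BaseDate + Default Expiration Period. */
void sefs_apply_default(struct sefs_dates *d, time_t now, time_t period)
{
    assert(d != NULL);
    d->expirable = 1;
    d->user_modified = 0;
    d->base_date = now;
    d->expire_date = now + period;
}

/* Extension on access, formula (3.1). Returns 1 if the ExpireDate
 * was extended, 0 if the file was left untouched. */
int sefs_extend_on_access(struct sefs_dates *d, time_t now, time_t period)
{
    assert(d != NULL);
    if (!d->expirable || d->user_modified)
        return 0;  /* the user's preference takes priority */
    time_t extension = period - (d->expire_date - now);
    d->expire_date += extension;
    d->base_date = now;  /* BaseDate follows every ExpireDate update */
    return 1;
}
```

Substituting the first line of (3.1) into the second gives NewExpireDate = CurrentDate + DefaultExpirationPeriod, so in this sketch each access to an unmodified expirable file restarts the full expiration period from the time of access.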
  • 21. 3.1.3 Extended Attributes Since we depend heavily on Extended Attributes, let us discuss them briefly. Extended attributes (xattr) are a file system feature that enables applications and users to associate files with an additional set of metadata. Typically this metadata is not interpreted by the file system in the way regular file attributes are. For example, normal file attributes like size, modified date and permissions have a clearly defined purpose and are maintained and understood by the file system. In contrast, extended attributes can be used to store the author of a document, the character encoding of a document, or a hash value or digital signature, and they are maintained not by the file system but by application programs. Extended attributes are name/value pairs associated with files as well as directories. An extended attribute may be defined or undefined, and if it is defined its value may be empty or non-empty. Extended attributes are stored in separate disk blocks pointed to by entries in the inodes, whereas normal attributes are stored in the inode itself. They are often used to provide additional functionality or to extend file system features. For example, additional security features such as Access Control Lists (ACLs) may be implemented using extended attributes. Since extended attributes are stored in disk blocks, the kernel and the file system may impose limits on the maximum number and size of extended attributes. In the extended file system implementations ext2, ext3 and ext4, each extended attribute must fit in a single file system block, whose size is fixed when the file system is created. Typically this value is 4096 bytes (4 KB). The space consumed for storing extended attributes is counted towards the allocated disk quotas of the file owner and the group.
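As an illustration of the name/value model, the sketch below stores and reads a date value through the Linux setxattr and getxattr calls from <sys/xattr.h>. It is a hedged example rather than the SEFS code: the "user.sefs." name prefix, the decimal-text encoding of the value and the function names are all assumptions made for this example. On Linux, attributes that ordinary users may set carry a "user." prefix in their name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <time.h>

/* Build a fully qualified attribute name such as "user.sefs.ExpireDate".
 * The "sefs." prefix is an assumption of this sketch.
 * Returns 0 on success, -1 if the name does not fit in `out`. */
int sefs_xattr_name(const char *attr, char *out, size_t outlen)
{
    assert(attr != NULL && out != NULL);
    int n = snprintf(out, outlen, "user.sefs.%s", attr);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Store a time value as decimal text in an extended attribute.
 * Returns 0 on success, -1 on failure (errno is set by setxattr). */
int sefs_set_time(const char *path, const char *attr, time_t value)
{
    char name[256], text[32];
    if (sefs_xattr_name(attr, name, sizeof name) != 0)
        return -1;
    int len = snprintf(text, sizeof text, "%lld", (long long)value);
    return setxattr(path, name, text, (size_t)len, 0);
}

/* Read the value back. Returns 0 on success, -1 if the attribute is
 * missing or the base file system does not support user attributes. */
int sefs_get_time(const char *path, const char *attr, time_t *value)
{
    char name[256], text[32];
    assert(value != NULL);
    if (sefs_xattr_name(attr, name, sizeof name) != 0)
        return -1;
    ssize_t len = getxattr(path, name, text, sizeof text - 1);
    if (len < 0)
        return -1;
    text[len] = '\0';  /* attribute values are not NUL-terminated */
    *value = (time_t)strtoll(text, NULL, 10);
    return 0;
}
```

Storing the value as text keeps it readable with command-line tools such as getfattr; a binary encoding would work equally well, at the cost of portability between machines with different time_t sizes.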
In some file systems, such as ext2, ext3 and reiserfs, the file system must be mounted with the ‘user_xattr’ mount option in order to use extended attributes. In Ubuntu 9.04, on which we started our implementation, ext3 was the default file system and was mounted with the ‘user_xattr’ option by default. Later the SEFS implementation was compiled on Ubuntu 12.04, with ext4 as the base file system, without any modification to the code.

Extended attribute names are zero-terminated strings and must be specified as fully qualified names. A fully qualified name consists of a namespace and the attribute name. The namespace mechanism separates extended attributes into classes, and different access permissions are required to manipulate the attributes in each class. Currently there are four namespaces: security, system, trusted and user. The number of supported namespaces varies with the file system implementation; ext4 supports them all. SEFS uses only the ‘user’ namespace, which requires a minimal set of
permissions, such as file access permissions. Extended ‘user’ attributes can be assigned to files and directories to store arbitrary additional information such as MIME type and character encoding. Therefore extended user attributes were an obvious choice for storing the SEFS specific attributes.

3.2 Implementation

The prototype implementation consists of two programs. One is the SEFS file system itself; the other is a utility program which provides the means of manipulating SEFS attributes and changing the self expiring behavior of the stored files.

3.2.1 Development Environment

The prototype file system is implemented on top of the FUSE framework, which implicitly restricts the choice of operating system. Since FUSE only supports Unix-like operating systems as of now, the Ubuntu 12.04 Linux distribution was chosen as the underlying OS. Ubuntu 12.04 uses ext4 as its default file system, so ext4 was used as the base file system. A file system has to be fast, so the prototype was implemented in C even though FUSE has bindings for other languages such as Java. C++ was abandoned, even though it is much easier to program in, because most of the available help and example file systems are written in C.

3.2.2 SEFS Implementation Architecture

C is not an object oriented language, which makes C code difficult to implement and maintain as it grows. Figure 3.1 shows a high level overview of the implementation architecture. Best efforts have been made to reuse source code and to keep the implementation modular.
Reusable utility functions are implemented and maintained in a separate source file (sefslib.c), and both the SEFS file system implementation (sefs.c) and the utility program (sefsutil.c), which is provided to manipulate the self expiring behavior, reuse them.
Figure 3.1: Implementation Architecture

3.2.3 SEFS attributes - How it is implemented

SEFS attributes are an integral part of the new file system, which makes an efficient storage and retrieval mechanism mandatory. Extended Attributes are the most efficient place to store these attributes. The name/value pairs have to be manipulated reliably, so that attribute values do not overwrite each other and retrieved values are accurate.

    #define NAME_PREFIX "user."
    #define EXPIRABLE "Expirable"
    #define BASE_DATE "BaseDate"
    #define EXPIRE_DATE "ExpireDate"
    #define USER_MODIFIED "UserModified"
    #define IN_BUFFER "InBuffer"
    #define DELETE_ON_EXPIRE "DeleteOnExpire"
    #define ORIGINE "Origine"

Figure 3.2: SEFS attribute names

The code listing in Figure 3.2 shows how the SEFS attribute names are defined as global constants. The ‘NAME_PREFIX’ constant defines the extended attribute namespace ‘user’ which is used in SEFS. Using these global constants as names whenever extended attribute values are manipulated avoids misspelled attribute names and makes the code easier to implement and maintain. Since these
are defined in the sefslib.h header file, which is included in both the SEFS file system code and the utility program, attribute names stay uniform across the two programs.

Like the attribute names, the attribute values need a reliable mechanism. Figure 3.3 shows the ‘SEFS_INODE’ structure, which manages attribute values across the two programs since it is defined in the same sefslib.h header file. Although it is called SEFS_INODE, it is nowhere close to an inode of a Unix file system. In the implementation, however, this struct plays the role of an inode for SEFS and is as important to SEFS as a good old file system inode; hence the name.

    typedef struct sefsInode {
        int expirable;
        time_t baseDate;
        time_t expireDate;
        int userModified;
        int inBuffer;
        int deleteOnExpire;
        char origine[PATH_MAX];
    } SEFS_INODE;

Figure 3.3: SEFS inode struct

With this design, extended attribute name/value pairs are manipulated consistently in the two programs. The actual storage and retrieval of the attributes is done via the ‘setxattr’ and ‘getxattr’ system calls, which are served by the underlying ext4 file system. Figures 3.4 and 3.5 show the storage and retrieval of extended attributes respectively.

3.2.4 SEFS behavior implementation

SEFS is implemented by altering the FUSE example file system found at example/fusexmp_fh.c. This is a C source file with a main() function, and inside main(), fuse_main() is called to interact with the FUSE libraries.

    int main(int argc, char *argv[])
    {
        umask(0);
        return fuse_main(argc, argv, &xmp_oper, NULL);
    }
    void setSefsFlags(const char* path, SEFS_INODE* node)
    {
        int r = 0;  /* becomes non-zero if any setxattr call fails */
        char value[32];

        /*Expirable*/
        snprintf(value, sizeof(value), "%d", node->expirable);
        r |= setxattr(path, NAME_PREFIX EXPIRABLE, value, strlen(value), 0);
        /*BaseDate*/
        snprintf(value, sizeof(value), "%ld", (long)node->baseDate);
        r |= setxattr(path, NAME_PREFIX BASE_DATE, value, strlen(value), 0);
        /*ExpireDate*/
        snprintf(value, sizeof(value), "%ld", (long)node->expireDate);
        r |= setxattr(path, NAME_PREFIX EXPIRE_DATE, value, strlen(value), 0);
        /*UserModified*/
        snprintf(value, sizeof(value), "%d", node->userModified);
        r |= setxattr(path, NAME_PREFIX USER_MODIFIED, value, strlen(value), 0);
        /*InBuffer*/
        snprintf(value, sizeof(value), "%d", node->inBuffer);
        r |= setxattr(path, NAME_PREFIX IN_BUFFER, value, strlen(value), 0);
        /*DeleteOnExpire*/
        snprintf(value, sizeof(value), "%d", node->deleteOnExpire);
        r |= setxattr(path, NAME_PREFIX DELETE_ON_EXPIRE, value, strlen(value), 0);
        /*Origine*/
        r |= setxattr(path, NAME_PREFIX ORIGINE, node->origine, strlen(node->origine)+1, 0);
        if (r != 0)
            perror("Set Attribute Error ");
    }

Figure 3.4: Extended Attribute storage

    void getSefsFlags(const char* path, SEFS_INODE* node)
    {
        char value[32];
        ssize_t n;

        /* getxattr does not null-terminate the value, so terminate it before
           parsing; a missing attribute (n < 0) is read as an empty string, i.e. 0 */
        n = getxattr(path, NAME_PREFIX EXPIRABLE, value, sizeof(value) - 1);
        value[n > 0 ? n : 0] = '\0';
        node->expirable = atoi(value);
        n = getxattr(path, NAME_PREFIX BASE_DATE, value, sizeof(value) - 1);
        value[n > 0 ? n : 0] = '\0';
        node->baseDate = atol(value);
        n = getxattr(path, NAME_PREFIX EXPIRE_DATE, value, sizeof(value) - 1);
        value[n > 0 ? n : 0] = '\0';
        node->expireDate = atol(value);
        n = getxattr(path, NAME_PREFIX USER_MODIFIED, value, sizeof(value) - 1);
        value[n > 0 ? n : 0] = '\0';
        node->userModified = atoi(value);
        n = getxattr(path, NAME_PREFIX IN_BUFFER, value, sizeof(value) - 1);
        value[n > 0 ? n : 0] = '\0';
        node->inBuffer = atoi(value);
        n = getxattr(path, NAME_PREFIX DELETE_ON_EXPIRE, value, sizeof(value) - 1);
        value[n > 0 ? n : 0] = '\0';
        node->deleteOnExpire = atoi(value);
        /* read the path directly into the struct, leaving room for the terminator */
        n = getxattr(path, NAME_PREFIX ORIGINE, node->origine, PATH_MAX - 1);
        node->origine[n > 0 ? n : 0] = '\0';
    }

Figure 3.5: Extended Attribute retrieval
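The listings in Figures 3.4 and 3.5 store every numeric value as decimal text. The round trip for a single `time_t` value can be sketched in isolation as follows. This is an illustration only: the function names `format_time_value` and `parse_time_value` are hypothetical and do not appear in sefslib.c, and `strtoll` is used here instead of `atoi` so that malformed input can be detected.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Write `t` as decimal text into `buf`.
   Returns the text length, or -1 if the buffer is too small. */
int format_time_value(time_t t, char *buf, size_t size)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%lld", (long long)t);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Parse `len` bytes of decimal text that are NOT null-terminated
   (the form in which getxattr returns a value).
   Returns 0 on success, -1 on empty, oversized or malformed input. */
int parse_time_value(const char *raw, size_t len, time_t *out)
{
    char tmp[32];
    char *end;

    assert(raw != NULL && out != NULL);
    if (len == 0 || len >= sizeof(tmp))
        return -1;
    memcpy(tmp, raw, len);
    tmp[len] = '\0';

    errno = 0;
    long long v = strtoll(tmp, &end, 10);
    if (errno != 0 || end == tmp || *end != '\0')
        return -1;
    *out = (time_t)v;
    return 0;
}
```

A caller would pass the byte count returned by `getxattr` as `len`, so the value never has to be terminated inside the receive buffer.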
When fuse_main() is called, a very important additional parameter is passed to it along with the standard main arguments. This is a struct called ‘fuse_operations’, which specifies every standard file system operation that the file system subscribes to and overrides. Once this is done, the new FUSE file system gets control whenever one of these operations is invoked by the kernel or by application programs, which enables it to implement entirely new behavior or to enhance the standard functionality.

    static struct fuse_operations sefs_oper = {
        .getattr = sefs_getattr,
        .opendir = sefs_opendir,
        .readdir = sefs_readdir,
        .mknod = sefs_mknod,
        .mkdir = sefs_mkdir,
        .rmdir = sefs_rmdir,
        .rename = sefs_rename,
        .chmod = sefs_chmod,
        ... ... ...
        .create = sefs_create,
        .open = sefs_open,
        .read = sefs_read,
        .write = sefs_write,
        .statfs = sefs_statfs,
        .setxattr = sefs_setxattr,
        .getxattr = sefs_getxattr,
        .listxattr = sefs_listxattr,
        .removexattr = sefs_removexattr,
        .lock = sefs_lock,
        .flag_nullpath_ok = 1,
    };

Figure 3.6: FUSE Operations struct

Figure 3.6 shows part of the fuse_operations struct. Of all the subscribed operations, only those that SEFS needs have been overridden; the rest simply call the system calls provided by the underlying ext4 file system. Overriding the following set of operations was sufficient to achieve the self expiring behavior of the SEFS prototype implementation.

• sefs_write
• sefs_read
• sefs_open
• sefs_create
• sefs_rename
• sefs_readdir

Figure 3.7 shows a non overridden operation from the FUSE example file system and Figure 3.8 shows an enhanced operation from the SEFS file system.

    static int xmp_create(const char *path, mode_t mode, struct fuse_file_info *fi)
    {
        int fd;

        fd = open(path, fi->flags, mode);
        if (fd == -1)
            return -errno;
        fi->fh = fd;
        return 0;
    }

Figure 3.7: Non Overridden operation from FUSE example file system

    static int sefs_create(const char *path, mode_t mode, struct fuse_file_info *fi)
    {
        int fd;
        SEFS_INODE fileNode;

        fd = open(path, fi->flags, mode);
        if (fd == -1)
            return -errno;
        /* attach the default SEFS attributes to the newly created file */
        getDefaultSefsFlags(&fileNode);
        fileNode.expireDate = fileNode.baseDate + DEFAULT_EXPIRE_PERIOD;
        strcpy(fileNode.origine, path);
        setSefsFlags(path, &fileNode);
        fi->fh = fd;
        return 0;
    }

Figure 3.8: An Overridden operation from SEFS file system
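The body of getDefaultSefsFlags() is not listed in this document. The following sketch shows one way the defaults recorded in Table 4.1 could be produced. It is an assumption, not the actual sefslib.c code: the function name `init_default_flags`, the explicit `now` parameter and the 10 minute value of `DEFAULT_EXPIRE_PERIOD` (taken from the test prerequisites) are all chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <time.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback when <limits.h> does not provide it */
#endif

/* Assumed value: 10 minutes, as configured for the test scenarios. */
#define DEFAULT_EXPIRE_PERIOD (10 * 60)

/* Same layout as the struct in Figure 3.3. */
typedef struct sefsInode {
    int expirable;
    time_t baseDate;
    time_t expireDate;
    int userModified;
    int inBuffer;
    int deleteOnExpire;
    char origine[PATH_MAX];
} SEFS_INODE;

/* Fill `node` with default attributes for a file created at time `now`. */
void init_default_flags(SEFS_INODE *node, time_t now, const char *path)
{
    assert(node != NULL && path != NULL);

    memset(node, 0, sizeof(*node)); /* userModified, inBuffer, deleteOnExpire = 0 */
    node->expirable = 1;
    node->baseDate = now;
    node->expireDate = now + DEFAULT_EXPIRE_PERIOD;
    /* Bounded copy; the memset above guarantees the terminating '\0'. */
    strncpy(node->origine, path, sizeof(node->origine) - 1);
}
```

Passing the creation time in as a parameter, instead of calling time() inside the function, keeps the sketch deterministic and easy to check.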
3.2.5 sefsutil - The Utility program

Once the file system is compiled and mounted, a user can use SEFS like any other file system through the mount point. SEFS then starts to set the flags necessary to carry out its self expiring functionality. SEFS was designed so that fine grained, file level control is available to the user. Therefore the user should be able to change the SEFS attributes of a file so that the file system behaves according to the user’s requirements. For that purpose a separate command line program called ‘sefsutil’ was implemented, and it will be distributed together with the SEFS file system prototype.

When a file stored in the SEFS file system expires, it is moved into the SEFS buffer zone specified by the path in the sefs.conf configuration file. If the user wishes to recover the file, he can either move it to a desired location or use the sefsutil program. With sefsutil, the user can restore the file to its original location using the path stored in the ‘Origin’ SEFS attribute; the file is then considered a useful file and its ‘Expirable’ flag is removed. Moreover, sefsutil can be used to view the current SEFS attribute values and to modify the values assigned to the ‘Expirable’, ‘DeleteOnExpire’ and ‘ExpireDate’ flags. Program usage instructions are available in Appendix A.
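How the three user-adjustable flags could combine into a decision for a single file is sketched below. This is a hedged illustration and not the SEFS code: the type and function names are hypothetical, and two behaviors are assumptions made here, namely that a file already in the buffer is left alone and that a file expires as soon as the current time reaches its ExpireDate.

```c
#include <assert.h>
#include <time.h>

/* What to do with a file when it is examined. */
typedef enum { SEFS_KEEP, SEFS_MOVE_TO_BUFFER, SEFS_DELETE } sefs_action;

/* The subset of SEFS attributes that influences the decision. */
typedef struct {
    int expirable;      /* 0: file never expires                      */
    int inBuffer;       /* 1: file already sits in the buffer zone    */
    int deleteOnExpire; /* 1: delete instead of moving to the buffer  */
    time_t expireDate;
} sefs_flags;

sefs_action sefs_expiry_action(const sefs_flags *f, time_t now)
{
    assert(f != NULL);

    if (!f->expirable || f->inBuffer) /* assumption: buffered files are left alone */
        return SEFS_KEEP;
    if (now < f->expireDate)          /* not yet expired */
        return SEFS_KEEP;
    return f->deleteOnExpire ? SEFS_DELETE : SEFS_MOVE_TO_BUFFER;
}
```

Under these assumptions, clearing ‘Expirable’ with sefsutil makes the function return `SEFS_KEEP` regardless of the date, which matches the described effect of restoring a file.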
4. Evaluation

To free disk space and protect privacy in a more productive way, our approach was to implement a new file system rather than another file cleaner. Once the file system was implemented, the next question was how to test and evaluate it. The evaluation technique had to reflect the ability of SEFS to demonstrate its file expiring behavior. To test this we used scenario based test protocols; once the testing was completed, we could conclude that SEFS is capable of demonstrating its file expiring capabilities.

To evaluate performance, researchers benchmark their file system against an established file system using various file system benchmarking methods, but there is no single accepted benchmarking technique [8]. Many researchers have used the Andrew file system benchmark, which was introduced along with the Andrew file system [9], but this benchmark now seems obsolete. More modern techniques include Filebench, Postmark, compile benchmarks (Apache source, Linux kernel, etc.) and various ad-hoc techniques. After evaluating these techniques we felt that a compile benchmark would be suitable for SEFS, since a source code compile reads files, creates new files, writes to existing files and looks up files. A compile therefore covers most of the overridden file operations.

The SEFS prototype is implemented on top of the FUSE framework as a user space file system, so its measured performance will not be better than that of a kernel file system such as Ext4, on top of which SEFS is implemented. What we are interested in finding out is whether it performs within acceptable limits with the added functionality.

This file system has two known limitations, listed below. We do not consider the file system a failure because of them. A user has to accept these inherent limitations when using SEFS, but we see them as trade-offs rather than weaknesses.
• The user or the administrator has to make sure that the system date and time are accurate. If they are incorrect, SEFS will not behave as expected.
• SEFS attributes are assigned only to files; directories are excluded from the expiring functionality. This file-level expiration behavior gives more control to the
user and simplifies the expiration behavior and configuration.

4.1 Scenario based testing

A set of scenario based tests was created along with the expected outcomes, and the results were recorded against each test. By comparing the expected outcome with the actual result, it is easy to determine whether SEFS works according to its design.

Prerequisites:
Configure the system date and time to reflect the correct values.
Mount the SEFS file system at the /mnt/sefs mount point.
Configure the SEFS buffer path to /mnt/sefs_buffer.
Configure the Default Expiration Period to 10 minutes.

Scenario 01: Default Expiration Policy Test

Create a file in the path /mnt/sefs/tmp/test1.txt

    udara@ubuntu:~/MCS_Working/sefs$ ls -l > /mnt/sefs/tmp/test1.txt

Examine the SEFS attribute values using sefsutil.

    udara@ubuntu:~/MCS_Working/sefs$ ./sefsutil -p /tmp/test1.txt

    Test Attribute    Expected Value          Actual Result
    Expirable         1                       1
    BaseDate          System Date Time        2013-01-15 00:15:05
    ExpireDate        BaseDate + 10 minutes   2013-01-15 00:25:05
    UserModified      0                       0
    InBuffer          0                       0
    DeleteOnExpire    0                       0
    Origin            tmp/test1.txt           tmp/test1.txt

Table 4.1: SEFS Attributes default policy test

According to Table 4.1 the actual result matches the expected result; therefore Scenario 01 passed.
Scenario 02: File Expirability Test

This test scenario checks whether the SEFS file system actually expires files. Execute the ‘ls -l’ directory listing command against the path /mnt/sefs/tmp/ after 2013-01-15 00:25:05.

    udara@ubuntu:~/MCS_Working/sefs$ date
    udara@ubuntu:~/MCS_Working/sefs$ ls -l /mnt/sefs/tmp/

    Test Case                                                          Expected Value        Actual Result
    Execute the ‘date’ command                                         Current Date Time     Tue Jan 15 00:35:07 PST 2013
    Check the availability of file /tmp/test1.txt                      Not available         Not available
    Check the availability of file /tmp/test1.txt at the SEFS Buffer   Available in Buffer   Available in Buffer

Table 4.2: File expirability test

According to Table 4.2 the actual result is the same as the expected result; therefore Scenario 02 passed. The SEFS file system is capable of expiring a file upon reaching its ExpireDate.

Scenario 03: Test the ability of preserving a frequently accessed file

Create a file in the path /mnt/sefs/tmp/test3.txt

    udara@ubuntu:~/MCS_Working/sefs$ ls -l > /mnt/sefs/tmp/test3.txt

Examine the ExpireDate attribute value using sefsutil.

    udara@ubuntu:~/MCS_Working/sefs$ ./sefsutil -p /tmp/test3.txt

After waiting 5 minutes, modify the content of the /tmp/test3.txt file.

    udara@ubuntu:~/MCS_Working/sefs$ date > /mnt/sefs/tmp/test3.txt

Examine the ExpireDate attribute value using sefsutil after modifying the file.

\begin{align}
\mathit{Extension} &= \mathit{DefaultExpirationPeriod} - (\mathit{OldExpireDate} - \mathit{CurrentDate}) \nonumber \\
\mathit{NewExpireDate} &= \mathit{OldExpireDate} + \mathit{Extension} \nonumber \\
&= \mathit{OldExpireDate} + \mathit{DefaultExpirationPeriod} - (\mathit{OldExpireDate} - \mathit{CurrentDate}) \nonumber \\
&= \mathit{CurrentDate} + \mathit{DefaultExpirationPeriod} \tag{4.1}
\end{align}
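Equation 4.1 can be sketched as two small C functions. The function names are hypothetical and this is not the SEFS source; the sketch only mirrors the arithmetic, with all times expressed in seconds.

```c
#include <assert.h>
#include <time.h>

/* Extension term of equation 4.1: the part of the default period
   that has already been used up at the moment of access. */
time_t sefs_extension(time_t oldExpireDate, time_t currentDate, time_t defaultPeriod)
{
    return defaultPeriod - (oldExpireDate - currentDate);
}

/* New ExpireDate after an access at `currentDate`. */
time_t sefs_new_expire_date(time_t oldExpireDate, time_t currentDate, time_t defaultPeriod)
{
    time_t result = oldExpireDate
                  + sefs_extension(oldExpireDate, currentDate, defaultPeriod);
    /* The last line of equation 4.1 states the same result more directly. */
    assert(result == currentDate + defaultPeriod);
    return result;
}
```

As a worked example with a 600 second period: a file created at t = 0 expires at t = 600. If it is modified at t = 303, the extension is 600 - (600 - 303) = 303 seconds and the new ExpireDate is 600 + 303 = 903, which is 303 + 600.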
    Test Case                                     Expected Value                      Actual Result
    Original BaseDate                             System Date                         2013-01-15 01:15:37
    Original ExpireDate                           BaseDate + 10 Minutes               2013-01-15 01:25:37
    Check the file content after modification     Result of the ‘date’ command        Tue Jan 15 01:20:40 PST 2013
    New ExpireDate                                Original ExpireDate + Extension     2013-01-15 01:30:40

Table 4.3: SEFS ExpireDate extension test

Equation 4.1 is used to calculate the new ExpireDate after an extension is granted due to recent access to the file. The new ExpireDate is derived by adding the Default Expiration Period (10 minutes in these test scenarios) to the result of the ‘date’ command, which is the current time at which the file was actually modified. Table 4.3 shows that the new ExpireDate calculated by equation 4.1 is the same as the test outcome; therefore Scenario 03 passed.

The above three scenarios sufficiently cover the expiration behavior of the SEFS file system. Since all three scenarios passed, we can conclude that SEFS works reliably as designed.

4.2 Performance testing

Demonstrating the file expiring ability of SEFS alone is not enough to draw a conclusion about its usability. If its performance is weak, it cannot be considered a useful file system. To evaluate the performance we had to choose a technique that reflects the implementation overhead of all or most of the overridden SEFS file system operations. The compile technique is used because a compiler has to read, write, create, list and remove files during compilation, effectively covering most of the overridden operations. To evaluate the performance of SEFS, we compare the compile technique results of Ext4, the FUSE example file system on which SEFS is based, and SEFS itself. We have to anticipate that SEFS will be the slowest of the three, but we need to show that it remains useful even with some performance degradation.
To compare the performance, a sufficiently large program source is required, but large sources like the Linux kernel and Apache are difficult to compile due to dependency issues. To avoid these complexities we used the FUSE 2.9.2 source code for the compilation test, since
we have compiled it many times without much difficulty. The FUSE 2.9.2 source was compiled on a machine with an Intel Core i3 processor. A full build and installation of the framework requires ./configure, make and make install, but for our evaluation it is enough to execute the ./configure and make commands. Since we need to measure the elapsed time, we created a simple bash script to execute the two commands and used ‘time script-name’ to output the elapsed time.

    #! /bin/bash
    cd /tmp/fuse-2.9.2
    ./configure
    make

Figure 4.1: Compile bash script

Figure 4.1 shows the bash script used to compile FUSE. The following command outputs the elapsed time of the script, given that the script is named compile.

    > time ./compile

First the compilation was done on the Ext4 file system, with the FUSE source located and accessed at /tmp/fuse-2.9.2. The compilation was done 5 times and the average elapsed time was taken. Before every compilation the directory /tmp/fuse-2.9.2 was removed and extracted again, so that ./configure and make had to generate all their files from scratch. Table 4.4 shows the compilation times on the Ext4 file system.

    Execution   Elapsed Time(s)
    1           37.651
    2           37.811
    3           38.301
    4           37.191
    5           39.192
    Average     38.0292

Table 4.4: Ext4 compilation elapsed time

Next, the FUSE example file system fusexmp_fh was tested; it was mounted at /mnt/xmp
and the bash script was changed accordingly. Table 4.5 shows the compilation times on the FUSE example file system.

    Execution   Elapsed Time(s)
    1           43.605
    2           43.538
    3           40.940
    4           40.676
    5           42.620
    Average     42.2758

Table 4.5: FUSE example file system compilation elapsed time

Finally, Table 4.6 shows the compilation times on the SEFS file system.

    Execution   Elapsed Time(s)
    1           45.868
    2           41.711
    3           42.547
    4           42.193
    5           42.079
    Average     42.8796

Table 4.6: SEFS file system compilation elapsed time

For ease of comparison, the average elapsed times for compiling the FUSE source are put together in Table 4.7.

    File System     Average Elapsed Time(s)
    Ext4            38.0292
    FUSE Example    42.2758
    SEFS            42.8796

Table 4.7: Compilation elapsed time comparison

The data in Table 4.7 is also presented as the
graph depicted in Figure 4.2. The additional functionality in the SEFS file system operations causes no significant performance degradation with respect to the FUSE example file system: the averages are 42.88 s and 42.28 s, a difference of about 1.4%. As expected, the Ext4 kernel file system performs better than both user space file systems (SEFS is about 12.8% slower than Ext4), but considering what user space file systems can offer, a performance hit of this size is acceptable. As a whole, we evaluated SEFS, the Self Expiring File System, by scenario based testing and performance benchmarking, and we conclude that the added functionality and the usability of the file system are very encouraging.

Figure 4.2: Performance Benchmarking through compilation methodology

4.3 Future Works

This section briefly discusses several remaining issues related to our design and the limitations of the prototype implementation that we are actively considering.

The actual file expiration functionality is implemented only in the ‘readdir’ file system operation, and this limits the usability of SEFS to a certain extent when it comes to reclaiming disk space from expired files. Because of this limitation, a file that has already expired is not cleaned up until the ‘readdir’ operation is invoked by the ‘ls’ file listing command or by the file manager. We are considering introducing a background cleaner application which will clean expired files and reclaim disk space.

The sefsutil program, which is used to modify the SEFS attributes set by the file system, is not very user friendly. We can improve the user experience by introducing a GUI application or, on a grander scale, by modifying the Linux file manager Nautilus to provide that functionality.
A. Installation Guide

Installation

Follow this step by step guideline to install the FUSE framework and the SEFS file system.

Prerequisites:
Ubuntu 12.04 Operating system
FUSE framework 2.9.2

• Download the FUSE framework gzip file fuse-2.9.2.tar.gz from http://sourceforge.net/projects/fuse/files/fuse-prerelease/ and extract the content to the folder fuse-2.9.2.
• Open a terminal and log in as root.
• Install the framework by executing the following commands in the given order.

    >./configure
    >make
    >make install

Once the FUSE framework is successfully installed, the environment is ready to compile the SEFS file system.

• Extract the sefs.tar.gz file to the folder sefs.
• Open a terminal and go to the sefs folder which contains the extracted files.
• Type ‘make’ and press enter to compile the SEFS file system. This will generate the following executable files.

    sefs     - executable for the file system.
    sefsutil - executable for the utility program to interact with the SEFS attributes.
Execution

Follow these steps to mount the SEFS file system.

• Create two folders in the /mnt folder with read, write and execute permissions (chmod 777).

    /mnt/sefs
    /mnt/sefs_buffer

• Open a terminal and go to the sefs folder where the file system executables were generated.
• Execute the following command to mount the file system.

    >./sefs /mnt/sefs

Note: If the Linux distribution has an old FUSE package installed, the following warning message is displayed. We can either uninstall the old FUSE package or just ignore the warning message.

    fuse: warning: library too old, some operations may not work

Once the file system is mounted it can be used through the mount point. A peculiarity of this mount is that executing the ‘ls -l’ command on it displays the current directory tree of the root (/) folder.

• To unmount the file system, execute the following command.

    >fusermount -u /mnt/sefs

sefsutil Usage

The sefsutil program has built-in help that provides information about command and parameter usage. To get the parameter help information, execute the following command. Figure A.1 shows the command help result.

    >./sefsutil -h
Figure A.1: sefsutil command help
References

[1] J. B. Layton, “User space file systems.” Available: http://www.linux-mag.com/id/7814/, June 2010.
[2] A. Rajgarhia and A. Gehani, “Performance and extension of user space file systems,” SAC’10, Sierre, Switzerland, 2010.
[3] M. Szeredi, “Fuse - filesystem in userspace.” Available: http://fuse.sourceforge.net/, January 2005.
[4] S. Singh, “Develop your own filesystem with fuse.” Available: http://www.ibm.com/developerworks/linux/library/l-fuse/, May 2011.
[5] A. Ziem, “Bleachbit - clean junk to free disk space and to maintain privacy.” Available: http://bleachbit.sourceforge.net.
[6] D. J. Santry, M. J. Feeley, and N. C. Hutchinson, “Elephant: The file system that never forgets,” in Workshop on Hot Topics in Operating Systems, 1999. Available: http://www.cs.fsu.edu/~awang/courses/cop5611_s2006/elephant.pdf.
[7] A. Gruenbacher and the SGI XFS development team, “Ubuntu manuals, attr - extended attributes.” Available: http://manpages.ubuntu.com/manpages/maverick/man5/attr.5.html.
[8] V. Tarasov, S. Bhanage, and E. Zadok, “Benchmarking file system benchmarking: It *is* rocket science.”
[9] J. H. Howard, “An overview of the andrew file system,” in Winter 1988 USENIX Conference Proceedings, pp. 23–26, 1988.