Chapter 1 Low-Level I/O Routine
A system call is the fundamental interface between an application and the Linux kernel.
System calls are generally not invoked directly, but rather via wrapper functions in glibc (or
perhaps some other library).
File Manipulation
The UNIX file system supports two main objects: files and directories.
Directories are just files with a special format, so the representation of a file is the basic UNIX
concept.
A file in UNIX is a sequence of bytes. Different programs expect various levels of structure, but the
kernel does not impose a structure on files.
System calls for basic file manipulation are open, read, write, close, unlink, and truncate.
Linux file types (ls -l)
- regular file
d directory
b block device
c character device
l symbolic link
s socket, also called a Unix domain socket
p first-in first-out (FIFO) buffer, also called a named pipe
Review
The Standard C I/O Library
❑ fopen, fclose
❑fread, fwrite
❑fflush
❑fseek
❑fgetc, getc, getchar
❑fputc, putc, putchar
❑fgets, gets
❑printf, fprintf, and sprintf
❑scanf, fscanf, and sscanf
High-level: file pointer (FILE *) based I/O, such as printf(), scanf(), etc.
The stdio library buffers I/O, reducing the system call overhead.
It is more portable, and it is built on top of file descriptor I/O.
Low-level: file descriptor-based I/O.
An existing file is opened by the open system call, which returns a small integer called a file
descriptor.
A file descriptor is simply an integer that is used as an index into a table of open files
associated with each process.
Each running program, called a process, has a number of file descriptors associated with
it.
A file descriptor may then be passed to a read or write system call (along with a buffer
address and the number of bytes to transfer) to perform data transfers to or from the file.
A file is closed when its file descriptor is passed to the close system call.
The truncate call reduces the length of a file.
The values 0, 1, and 2 are special and refer to the stdin, stdout, and stderr streams.
❑open: Open a file or device
❑read: Read from an open file or device
❑write: Write to a file or device
❑close: Close the file or device
❑ioctl: Pass control information to a device driver
The ioctl system call is used to provide some necessary hardware-specific control
Most of these calls return a value of -1 in the event of error and set the variable errno to the error
code. Error codes are documented in the man pages for the individual system calls and in the man
page for errno. The perror() function can be used to print an error message based on the error code
perror - print a system error message
#include <stdio.h>
void perror(const char *s);
strerror - return string describing error number
If an error occurs, it would probably be helpful to your users (or to you, for that matter) to know
what the operating system thinks went wrong.
#include <string.h>
char *strerror(int errnum);
man -k searches the manual page names and short descriptions for a string.
The open() Call
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
The open() call is used to open a file.
pathname: a string with the full or relative pathname to the file to be opened.
mode: specifies the UNIX file mode (permission bits) to be used when creating a file; it should
be present if a file may be created.
flags: one of O_RDONLY, O_WRONLY, or O_RDWR, optionally OR-ed with additional
flags:
Flag Description
❑O_RDONLY Open file for read-only access.
❑O_WRONLY Open file for write-only access.
❑O_RDWR Open file for read and write access.
❑O_CREAT Create the file if it does not exist.
❑O_EXCL Fail if the file already exists.
❑O_NOCTTY If pathname refers to a terminal device, open it without making it the controlling tty,
because we don't want to get killed if line noise sends CTRL-C.
❑O_TRUNC Truncate the file to length 0 if it exists.
❑O_APPEND Append: the file offset is positioned at the end of the file before each write.
❑O_NONBLOCK If an operation cannot complete without delay, return before completing the
operation.
Blocking I/O vs. Non-Blocking I/O
Synchronous I/O vs. Asynchronous I/O
Synchronous (blocking) I/O is the default, where a read call blocks until the read is satisfied.
In non-blocking mode the read call returns immediately; if no data is available, it returns -1
with errno set to EAGAIN.
/* open the device to be non-blocking (read will return immediately) */
fd = open(MODEMDEVICE, O_RDWR | O_NOCTTY | O_NONBLOCK);
❑O_NDELAY Same as O_NONBLOCK.
❑O_SYNC Operations will not return until the data has been physically written to the disk or other
device.
When we create a file using the O_CREAT flag with open, we must use the three-parameter
form. The mode argument is made from a bitwise OR of the following flags:
❑S_IRUSR: Read permission, owner
❑S_IWUSR: Write permission, owner
❑S_IXUSR: Execute permission, owner
❑S_IRGRP: Read permission, group
❑S_IWGRP: Write permission, group
❑S_IXGRP: Execute permission, group
❑S_IROTH: Read permission, others
❑S_IWOTH: Write permission, others
❑S_IXOTH: Execute permission, others
open("/home/myfile", O_CREAT|O_TRUNC|O_WRONLY, S_IRUSR|S_IXOTH);
Umask
The umask is a system variable that encodes a mask for file permissions to be used when a file is
created. You can change the variable by executing the umask command to supply a new value. The
value is a three-digit octal value. Each digit is the result of ORing values from 1, 2, or 4:
4 disallows read, 2 disallows write, and 1 disallows execute permission. The separate digits
refer to "user," "group," and "other" permissions, respectively. A permission bit that is set in
the umask is cleared from the mode requested in open().
The close() Call
You should close a file descriptor when you are done with it. The single argument is the file
descriptor number returned by open().
#include <unistd.h>
int close(int fd);
If this is the last (or only) file descriptor associated with an open file, the entry in the open file table
will be freed.
The read() Call
The read() system call is used to read data from the file corresponding to a file descriptor.
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
fd: file descriptor that was returned from a previous open() call.
buf: pointer to a buffer to copy the data to (which must be large enough to hold the data).
count: the read system call reads up to count bytes of data from the file associated with the file
descriptor fd and places them in the data area buf.
read() returns the number of bytes read (zero at end of file) or a value of -1 if an error occurs (check errno).
The write() Call
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
The write() system call is used to write data to the file corresponding to a file descriptor.
fd: the file descriptor that was returned from a previous open() call.
buf: a pointer to a buffer to copy the data from.
count: gives the number of bytes to write.
write() returns the number of bytes written, which may be less than count, or a value of -1 if an
error occurs (check errno).
svn co svn://192.168.1.251/15_linuxpro/0_joseph/SourceCode mylinux
Installing svn: sudo apt-get install subversion
Example 1-1.c:
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
int main()
{
    char c;
    int in, out;
    in = open("file.in", O_RDONLY);
    out = open("file.out", O_WRONLY|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
    while (read(in, &c, 1) == 1)
        write(out, &c, 1);
    close(in);
    close(out);
    exit(0);
}
POSIX allows a read() that is interrupted after reading some data to return -1
(with errno set to EINTR) or to return the number of bytes already read.
/* CASE 1: read() returns -1 with errno set to EINTR */
while (read(fd, buf, nbytes) < 0) {
    if (errno == EINTR)
        continue;
    else
        FATAL;          /* placeholder: report the error and give up */
}
/* CASE 2: read() returns the number of bytes read, possibly fewer than requested */
char *bp;               /* char *, because arithmetic on void * is not standard C */
bp = buf;
while (nbytes > 0 && (rc = read(fd, bp, nbytes)) > 0) {
    bp += rc;
    nbytes -= rc;
}
The following program combines Case 1 and Case 2.
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#define BLKSIZE 1024
void *copy_file(void *fd);
int main(int argc, char *argv[])
{
    int file_fd[2];
    int *p;
    if (argc != 3) {
        fprintf(stderr, "Usage: %s from_file to_file\n", argv[0]);
        exit(1);
    }
    if ((file_fd[0] = open(argv[1], O_RDONLY)) == -1) {
        fprintf(stderr, "Could not open %s: %s\n",
                argv[1], strerror(errno));
        exit(1);
    }
    if ((file_fd[1] = open(argv[2], O_WRONLY | O_CREAT | O_EXCL,
                           S_IRUSR | S_IWUSR)) == -1) {
        fprintf(stderr, "Could not create %s: %s\n",
                argv[2], strerror(errno));
        exit(1);
    }
    p = copy_file(file_fd);
    printf("[%d] bytes copied\n", *p);
    return 0;
}
void *copy_file(void *fd)
{
    int bytesread, byteswritten;
    static int totalbytes;      /* static so the returned pointer stays valid */
    char buf[BLKSIZE];
    int from_fd, to_fd;
    char *bp;
    from_fd = *((int *)(fd));
    to_fd = *((int *)(fd) + 1);
    totalbytes = 0;
    while ((bytesread = read(from_fd, buf, BLKSIZE)) != 0) {
        if (bytesread == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any data was read: retry */
            break;              /* real error occurred on the descriptor */
        }
        bp = buf;
        while (bytesread > 0) {
            byteswritten = write(to_fd, bp, bytesread);
            if (byteswritten == -1) {
                if (errno == EINTR)
                    continue;   /* interrupted before any data was written: retry */
                break;          /* real write error */
            }
            bp += byteswritten;
            bytesread -= byteswritten;
            totalbytes += byteswritten;
        }
        if (bytesread > 0)
            break;              /* the write failed: stop copying */
    }
    close(from_fd);
    close(to_fd);
    return &totalbytes;
}
The ioctl() Call
The ioctl() system call is a catchall for setting or retrieving various parameters associated with a file
or to perform other operations on the file. The ioctls available, and the arguments to ioctl(), vary
depending on the underlying device.
#include <sys/ioctl.h>
int ioctl(int fd, int command, ...);
The argument fd must be an open file descriptor.
Example 1-4.c:
The fsync() Call
The fsync() system call flushes all of the data written to file descriptor fd to disk or other
underlying device.
#include <unistd.h>
int fsync(int fd);
The Linux filesystem may keep the data in memory for several seconds before writing it to disk in
order to handle disk I/O more efficiently. The O_DSYNC, O_RSYNC, and O_SYNC flags with open()
could slow a program down since each write() does not return until all data have been written to
physical media. fsync() returns zero if successful; otherwise -1 is returned and errno is set.
The ftruncate() Call
The ftruncate() system call truncates the file referenced by file descriptor fd to the length specified
by length.
#include <unistd.h>
int ftruncate(int fd, off_t length);
Return values are zero for success and -1 for an error (check errno).
It is possible to extend the length of a file with this call, not just shorten a file.
The lseek() Call
The lseek() function sets the current position of reads and writes in the file referenced by file
descriptor fildes to position offset.
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fildes, off_t offset, int whence);
❑SEEK_SET: offset is an absolute position
❑SEEK_CUR: offset is relative to the current position
❑SEEK_END: offset is relative to the end of the file
The return value is the resulting offset (relative to the beginning of the file) or a value of (off_t) -1
in the case of error (errno will be set).
1-5.c:
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
typedef struct {
    int integer;
    char string[12];
} RECORD;
#define NRECORDS (50)
int main()
{
    RECORD record;
    int i;
    FILE *fp;
    fp = fopen("records.dat","w+");
    for(i=0; i<NRECORDS; i++) {
        record.integer = i;
        sprintf(record.string,"RECORD-%d",i);
        fprintf(stdout,"RECORD-%d\n",i);
        fwrite(&record,sizeof(record),1,fp);
    }
    fclose(fp);
    /* We now change the integer value of record 43 to 999
       and write this to the 43rd record's string. */
    fp = fopen("records.dat","r+");
    fseek(fp,43*sizeof(record),SEEK_SET);
    fread(&record,sizeof(record),1,fp);
    record.integer = 999;
    sprintf(record.string,"RECORD-%d",record.integer);
    fseek(fp,43*sizeof(record),SEEK_SET);
    fwrite(&record,sizeof(record),1,fp);
    fclose(fp);
    return 0;
}
The dup2() Calls
#include <unistd.h>
int dup2(int oldfd, int newfd);
Example 1-6.c:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
void print_line(int n)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "Line #%d\n", n);
    write(1, buf, strlen(buf));
}
int main()
{
    int fd;
    print_line(1);
    print_line(2);
    print_line(3);
    /* redirect stdout to file junk.out */
    fd = open("junk.out", O_WRONLY|O_CREAT, 0666);
    assert(fd >= 0);
    dup2(fd, 1);
    print_line(4);
    print_line(5);
    print_line(6);
    close(fd);
    close(1);
    return 0;
}
The fstat() Call
The fstat() system call returns information about the file referred to by the file descriptor fildes,
placing the result in the struct stat pointed to by buf. A return value of zero is success and -1 is
failure (check errno).
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
int fstat(int fildes, struct stat *buf);
int stat(const char *path, struct stat *buf);
int lstat(const char *path, struct stat *buf);
lstat(): if path is a symbolic link, the link itself is stat-ed, not the file
that it refers to.
Example 1-7.c: stat()
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    struct stat sb;
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <pathname>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    if (stat(argv[1], &sb) == -1) {
        perror("stat");
        exit(EXIT_FAILURE);
    }
    printf("File type: ");
    switch (sb.st_mode & S_IFMT) {
    case S_IFBLK: printf("block device\n"); break;
    case S_IFCHR: printf("character device\n"); break;
    case S_IFDIR: printf("directory\n"); break;
    case S_IFIFO: printf("FIFO/pipe\n"); break;
    case S_IFLNK: printf("symlink\n"); break;
    case S_IFREG: printf("regular file\n"); break;
    case S_IFSOCK: printf("socket\n"); break;
    default: printf("unknown?\n"); break;
    }
    printf("I-node number: %ld\n", (long) sb.st_ino);
    printf("Mode: %lo (octal)\n", (unsigned long) sb.st_mode);
    printf("Link count: %ld\n", (long) sb.st_nlink);
    printf("Ownership: UID=%ld GID=%ld\n", (long) sb.st_uid, (long) sb.st_gid);
    printf("Preferred I/O block size: %ld bytes\n", (long) sb.st_blksize);
    printf("File size: %lld bytes\n", (long long) sb.st_size);
    printf("Blocks allocated: %lld\n", (long long) sb.st_blocks);
    printf("Last status change: %s", ctime(&sb.st_ctime));
    printf("Last file access: %s", ctime(&sb.st_atime));
    printf("Last file modification: %s", ctime(&sb.st_mtime));
    exit(EXIT_SUCCESS);
}
The st_mode flags returned in the stat structure also have a number of associated macros
defined in the header file sys/stat.h. These macros include names for permission and file-type
flags and some masks to help with testing for specific types and permissions.
st_mode bit layout:
Bits 15-12   Bit 11   Bit 10   Bit 9    Bits 8-6      Bits 5-3      Bits 2-0
File type    SUID     SGID     Sticky   Owner r/w/x   Group r/w/x   Other r/w/x
File-type flags include:
❑S_IFMT 0170000: File type mask (1111)
❑S_IFSOCK 0140000: Entry is a socket (1100)
❑S_IFLNK 0120000: Entry is a symbolic link (1010)
❑S_IFREG 0100000: Entry is a regular file (1000)
❑S_IFBLK 0060000: Entry is a block special device (0110)
❑S_IFDIR 0040000: Entry is a directory (0100)
❑S_IFCHR 0020000: Entry is a character special device (0010)
❑S_IFIFO 0010000: Entry is a FIFO (named pipe) (0001)
❑S_ISUID 0004000: Entry has setUID on execution
❑S_ISGID 0002000: Entry has setGID on execution
Masks to interpret the st_mode flags include
❑S_IRWXU 0000700: User read/write/execute permissions
❑S_IRWXG 0000070: Group read/write/execute permissions
❑S_IRWXO 0000007: Others’ read/write/execute permissions
The following POSIX macros are defined to check the file type using
the st_mode Field
S_ISREG(m) is it a regular file?
S_ISDIR(m) directory?
S_ISCHR(m) character device?
S_ISBLK(m) block device?
S_ISFIFO(m) FIFO (named pipe)?
S_ISLNK(m) symbolic link?
S_ISSOCK(m) socket?
Sticky bit
The Linux kernel ignores the sticky bit on files. When the sticky bit is set on a directory,
files in that directory may only be unlinked or renamed by root, the directory owner, or the
file's owner.
chmod +t /usr/local/tmp
chmod 1777 /usr/local/tmp
The access() Call
check user permissions for a file
#include <unistd.h>
int access(const char *pathname, int mode);
access() checks whether the process would be allowed to read, write, or test for existence of the
file (or other file system object) whose name is pathname. If pathname is a symbolic link,
permissions of the file referred to by this symbolic link are tested.
mode: is a mask consisting of one or more of R_OK, W_OK, X_OK and F_OK.
R_OK, W_OK and X_OK request checking whether the file exists and has read, write and execute
permissions, respectively. F_OK just requests checking for the existence of the file.
On success (all requested permissions granted), zero is returned. On error (at least one bit in mode
asked for a permission that is denied, or some other error occurred), -1 is returned, and errno is set
appropriately.
The fcntl() Call
The fcntl() call is similar to ioctl() but it sets or retrieves a different set of parameters.
#include <unistd.h>
#include <fcntl.h>
int fcntl(int fd, int cmd);
int fcntl(int fd, int cmd, long arg);
fd: file descriptor,
cmd:the command
arg: an argument specific to the particular command.
COMMANDS FOR fcntl()
Command Description
❑ F_DUPFD Duplicates file descriptors. Use dup2() instead.
❑ F_GETFD Gets close-on-exec flag. The file will remain open across exec() family calls if the
low order bit is 0.
❑ F_SETFD Sets close-on-exec flag.
❑ F_GETFL Gets the flags set by open.
❑ F_SETFL Changes the flags set by open.
❑ F_GETLK Gets discretionary file locks (see flock().)
❑ F_SETLK Sets discretionary lock, no wait.
❑ F_SETLKW Sets discretionary lock, wait if necessary.
❑ F_GETOWN Retrieves the process id or process group number that will receive the SIGIO and
SIGURG signals.
❑ F_SETOWN Sets the process id or process group number.
Select -- synchronous I/O multiplexing
How to wait for Input from Multiple Sources?
The select system call will not load the CPU while waiting for input, whereas looping until
input becomes available would slow down other processes executing at the same time.
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
void FD_CLR(int fd, fd_set *set);
int FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);
select() allows a program to monitor multiple file descriptors, waiting until one or more of the file
descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is
considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without
blocking.
Three independent sets of file descriptors are watched. Those listed in readfds will be watched
to see if characters become available for reading (more precisely, to see if a read will not block;
in particular, a file descriptor is also ready on end-of-file), those in writefds will be watched to
see if a write will not block, and those in exceptfds will be watched for exceptions.
On exit, the sets are modified in place to indicate which file descriptors actually changed
status.
Each of the three file descriptor sets may be specified as NULL if no file descriptors are to be
watched for the corresponding class of events.
Four macros are provided to manipulate the sets. FD_ZERO() clears a set. FD_SET() and
FD_CLR() respectively add and remove a given file descriptor from a set. FD_ISSET() tests to see
if a file descriptor is part of the set; this is useful after select() returns.
nfds is the highest-numbered file descriptor in any of the three sets, plus 1.
timeout is an upper bound on the amount of time elapsed before select() returns. It may be zero,
causing select() to return immediately. (This is useful for polling.) If timeout is NULL (no timeout),
select() can block indefinitely.
On Linux, select() modifies timeout to reflect the amount of time not slept; most other
implementations do not do this.
Return Value
On success, select() returns the number of file descriptors contained in the three returned
descriptor sets (that is, the total number of bits that are set in readfds, writefds, exceptfds)
which may be zero if the timeout expires before anything interesting happens.
On error, -1 is returned, and errno is set appropriately; the sets and timeout become undefined,
so do not rely on their contents after an error.
Example
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int
main(void) {
    fd_set rfds;
    struct timeval tv;
    int retval;
    /* Watch stdin (fd 0) to see when it has input. */
    FD_ZERO(&rfds);
    FD_SET(0, &rfds);
    /* Wait up to five seconds. */
    tv.tv_sec = 5;
    tv.tv_usec = 0;
    retval = select(1, &rfds, NULL, NULL, &tv);
    /* Don't rely on the value of tv now! */
    if (retval == -1)
        perror("select()");
    else if (retval)
        printf("Data is available now.\n");
        /* FD_ISSET(0, &rfds) will be true. */
    else
        printf("No data within five seconds.\n");
    return 0;
}
File Locking
Example:
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
const char *lock_file = "/tmp/LCK.test2";
int main() {
    int file_desc = -1;
    int tries = 10;
    while (tries--) {
        /* O_CREAT | O_EXCL fails if the lock file already exists */
        file_desc = open(lock_file, O_RDWR | O_CREAT | O_EXCL, 0444);
        if (file_desc == -1) {
            printf("%d - Lock already present\n", getpid());
            sleep(3);
        } else
            break;
    } /* while */
    if (file_desc == -1)
        exit(EXIT_FAILURE); /* never obtained the lock */
    /* critical region */
    printf("%d - I have exclusive access\n", getpid());
    sleep(10);
    (void)close(file_desc);
    (void)unlink(lock_file);
    /* non-critical region */
    exit(EXIT_SUCCESS);