I/O
- Unix I/O
- RIO (robust I/O) package
- Metadata, sharing, and redirection
- Standard I/O
- Closing remarks
Unix I/O Overview
- A Linux file is a sequence of m bytes
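- Cool fact: All I/O devices are represented as files (e.g., /dev/sda2 for a disk partition, /dev/tty2 for a terminal)
- Even the kernel is represented as a file (e.g., /proc exposes kernel data structures as files)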
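- Elegant mapping of files to devices allows the kernel to export a simple interface called Unix I/O: opening and closing files, reading and writing bytes, and changing the current file position (lseek)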
File Types
- Each file has a type indicating its role in the system
- Regular file: Contains arbitrary data
- Directory: Index of a related group of files
- Socket: For communicating with a process on another machine
- Other file types beyond our scope
- Named pipes
- Symbolic links
- Character and block devices
Regular Files
- A regular file contains arbitrary data
- Applications often distinguish between text files and binary files
- Text files are regular files with only ASCII or Unicode characters
- Binary files are everything else
- Kernel doesn’t know the difference
- A text file is a sequence of text lines
- A text line is a sequence of chars terminated by a newline char ('\n')
- End-of-line (EOL) indicators in other systems
- Linux and Mac OS: '\n' (0xa), line feed (LF)
- Windows and Internet protocols: '\r\n' (0xd 0xa), carriage return (CR) followed by line feed (LF)
Directories
- Directory consists of an array of links
- Each link maps a filename to a file
- Each directory contains at least two entries
- . is a link to itself
- .. is a link to the parent directory in the directory hierarchy
- Commands for manipulating directories
- mkdir: create empty directory
- ls: view directory contents
- rmdir: delete empty directory
Directory Hierarchy
- All files are organized as a hierarchy anchored by root directory named /
- Kernel maintains a current working directory (cwd) for each process, changed with the cd command
Opening Files
- Opening a file informs the kernel that you are getting ready to access that file
- open returns a small identifying integer called a file descriptor (fd)
- fd == -1 indicates that an error occurred
- Each process created by a Linux shell begins life with three open files associated with a terminal
- 0: standard input
- 1: standard output
- 2: standard error
Closing Files
- Closing a file informs the kernel that you are finished accessing that file
- Closing an already closed file is a recipe for disaster in threaded programs
- Moral: Always check return codes, even for seemingly benign functions such as close()
Reading Files
- Reading a file copies bytes from the current file position to memory, and then updates file position
- read returns the number of bytes read from file fd into buf
- Return type ssize_t is a signed integer
- nbytes < 0 indicates that an error occurred
- Short counts (nbytes < sizeof(buf)) are possible and are not errors
Writing Files
- Writing a file copies bytes from memory to the current file position, and then updates current file position
- Returns number of bytes written from buf to file fd
- nbytes < 0 indicates that an error occurred
- As with reads, short counts are possible and are not errors
On Short Counts
- Short counts can occur in these situations
- Encountering EOF on reads
- Reading text lines from a terminal
- Reading and writing network sockets
- Short counts never occur in these situations
- Reading from disk files (except for EOF)
- Writing to disk files
- Best practice is to always allow for short counts
The RIO Package
- RIO is a set of wrappers that provide efficient and robust I/O in apps, such as network programs that are subject to short counts
- RIO provides two different kinds of functions
- Unbuffered input and output of binary data
- rio_readn and rio_writen
- Buffered input of text lines and binary data
- rio_readlineb and rio_readnb
- Buffered RIO routines are thread-safe and can be interleaved arbitrarily on the same descriptor
File Metadata
- Metadata is data about data, in this case file data
- Per-file metadata is maintained by the kernel
- Accessed by users with the stat and fstat functions
Pros and Cons of Unix I/O
- Pros
- Unix I/O is the most general and lowest overhead form of I/O
- All other I/O packages are implemented using Unix I/O functions
- Unix I/O provides functions for accessing file metadata
- Unix I/O functions are async-signal-safe and can be used safely in signal handlers
- Cons
- Dealing with short counts is tricky and error prone
- Efficient reading of text lines requires some form of buffering, also tricky and error prone
- Both of these issues are addressed by the standard I/O and RIO packages
Pros and Cons of Standard I/O
- Pros
- Buffering increases efficiency by decreasing the number of read and write system calls
- Short counts are handled automatically
- Cons
- Provides no function for accessing file metadata
- Standard I/O functions are not async-signal-safe, and not appropriate for signal handlers
- Standard I/O is not appropriate for input and output on network sockets
- There are poorly documented restrictions on streams that interact badly with restrictions on sockets
Choosing I/O Functions
- General rule: use the highest-level I/O functions you can
- Many C programmers are able to do all of their work using the standard I/O functions
- When to use standard I/O
- When working with disk or terminal files
- When to use raw Unix I/O
- Inside signal handlers, because Unix I/O is async-signal-safe
- In rare cases when you need absolute highest performance
- When to use RIO
- When you are reading and writing network sockets
- Avoid using standard I/O on sockets