A small trail through the Linux kernel
Andries Brouwer, aeb@cwi.nl 2001-01-01
A program
---------------------------------------------------------------------------------------------------
#include <unistd.h>
#include <fcntl.h>

int main(){
	int fd;
	char buf[512];

	fd = open("/dev/hda", O_RDONLY);
	if (fd >= 0)
		read(fd, buf, sizeof(buf));
	return 0;
}
---------------------------------------------------------------------------------------------------
This little program opens the block special device referring to the first IDE disk and, if the open succeeds, reads the first sector (512 bytes). What happens in the kernel? Let us read the 2.4.0 source.
open
The open system call is found in fs/open.c:
The routine get_unused_fd() is found in fs/open.c again. It returns the first unused file descriptor:
Here current is the pointer to the user task struct for the currently executing task.
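To make "first unused file descriptor" concrete, here is a small userspace sketch, not the kernel routine. It assumes the open descriptors are tracked in a bitmap, one bit per descriptor; the names toy_first_unused_fd() and toy_set_fd() are invented for this illustration.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical model: bit i of the bitmap is set when descriptor i
 * is in use. Return the lowest descriptor whose bit is clear, or -1
 * if all max_fds descriptors are taken.
 */
int toy_first_unused_fd(const unsigned long *open_fds, int max_fds)
{
	int fd;

	for (fd = 0; fd < max_fds; fd++) {
		unsigned long word = open_fds[fd / TOY_BITS_PER_LONG];

		if (!(word & (1UL << (fd % TOY_BITS_PER_LONG))))
			return fd;
	}
	return -1;
}

/* Mark descriptor fd as in use. */
void toy_set_fd(unsigned long *open_fds, int fd)
{
	open_fds[fd / TOY_BITS_PER_LONG] |= 1UL << (fd % TOY_BITS_PER_LONG);
}
```

In a process that already has descriptors 0, 1 and 2 open (the usual stdin, stdout, stderr), such a search yields 3, which is what the open() in our little program would typically return.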
The struct nameidata is defined in include/linux/fs.h. It is used during lookups.
The routine open_namei() is found in fs/namei.c:
So, essentially, the lookup part of open_namei() is found in path_walk():
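As a rough userspace illustration of what walking a pathname involves (this is not the kernel's path_walk(), which also handles dentries, mount points, symlinks and permissions), the sketch below only peels off one component at a time. The function toy_next_component() is a made-up helper.

```c
#include <stddef.h>

/*
 * Hypothetical helper: store the next component of *path in out
 * (NUL-terminated, truncated to fit; outlen should be at least 2)
 * and advance *path past it. Returns the number of characters
 * stored, or 0 when no component is left.
 */
size_t toy_next_component(const char **path, char *out, size_t outlen)
{
	const char *p = *path;
	size_t n = 0;

	while (*p == '/')		/* skip any run of slashes */
		p++;
	while (*p != '\0' && *p != '/') {
		if (n + 1 < outlen)
			out[n++] = *p;
		p++;
	}
	if (outlen > 0)
		out[n] = '\0';
	*path = p;
	return n;
}
```

Called repeatedly on "/dev/hda" it produces "dev", then "hda", then reports that nothing is left. A real lookup would resolve each component in the directory found in the previous step.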
read
Given a file descriptor (that keeps the inode and the file position of the file) we want to read. In fs/read_write.c we find:
---------------------------------------------------------------------------------------------------
So the building blocks here are getblk(), ll_rw_block(), and wait_on_buffer().
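How these three pieces fit together can be sketched in userspace. The code below is a toy model, not kernel source: the "disk" is an array in memory, the I/O completes immediately, and every toy_ name is invented for the illustration. Only the overall shape (get the buffer, start the read if the contents are not valid, wait, check the result) is meant to mirror the building blocks named above.

```c
#include <string.h>

#define TOY_BLOCK_SIZE	512
#define TOY_NBLOCKS	4

struct toy_bh {
	int blocknr;
	int uptodate;			/* contents are valid */
	int locked;			/* I/O in progress */
	char data[TOY_BLOCK_SIZE];
	void (*end_io)(struct toy_bh *bh, int uptodate);
};

char toy_disk[TOY_NBLOCKS][TOY_BLOCK_SIZE];	/* pretend device */
struct toy_bh toy_cache[TOY_NBLOCKS];		/* one buffer per block */
int toy_io_count;				/* simulated device reads */

/* Role of getblk(): find the buffer for a block; data may be stale. */
struct toy_bh *toy_getblk(int blocknr)
{
	if (blocknr < 0 || blocknr >= TOY_NBLOCKS)
		return NULL;
	toy_cache[blocknr].blocknr = blocknr;
	return &toy_cache[blocknr];
}

/* Completion handler: record the result and unlock the buffer. */
static void toy_end_io_sync(struct toy_bh *bh, int uptodate)
{
	bh->uptodate = uptodate;
	bh->locked = 0;
}

/* Role of ll_rw_block(): start a read for each buffer in the array. */
void toy_ll_rw_block(int nr, struct toy_bh *bhs[])
{
	int i;

	for (i = 0; i < nr; i++) {
		struct toy_bh *bh = bhs[i];

		bh->locked = 1;
		bh->end_io = toy_end_io_sync;
		/* the toy "device" finishes at once */
		memcpy(bh->data, toy_disk[bh->blocknr], TOY_BLOCK_SIZE);
		toy_io_count++;
		bh->end_io(bh, 1);
	}
}

/* Role of wait_on_buffer(): return once the I/O has finished. */
void toy_wait_on_buffer(struct toy_bh *bh)
{
	while (bh->locked)
		;	/* never spins here: toy I/O is synchronous */
}

/* Read one block through the toy cache. */
struct toy_bh *toy_bread(int blocknr)
{
	struct toy_bh *bh = toy_getblk(blocknr);

	if (bh == NULL)
		return NULL;
	if (bh->uptodate)
		return bh;		/* cache hit, no I/O needed */
	toy_ll_rw_block(1, &bh);
	toy_wait_on_buffer(bh);
	return bh->uptodate ? bh : NULL;
}
```

Reading the same block twice touches the toy device only once; the second call is satisfied from the buffer. In the real kernel the waiting step puts the process to sleep until the disk interrupt arrives, which this model does not attempt to show.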
The real I/O is started by ll_rw_block(). It lives in drivers/block/ll_rw_blk.c.
	for (i = 0; i < nr; i++) {
		struct buffer_head *bh = bhs[i];
		bh->b_end_io = end_buffer_io_sync;
Here bh->b_end_io specifies what to do when I/O is finished. In this case:
So, ll_rw_block() just feeds the requests it gets one by one to submit_bh():
Thus, it finds the right queue and calls the request function for that queue.
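The mechanism at work is a function pointer stored in the queue: the generic block layer knows nothing about IDE, it just calls whatever routine the driver registered. A minimal userspace sketch of that idea follows. All names (toy_queue, toy_init_queue(), toy_submit(), toy_do_request()) are invented, and the structure is far simpler than the kernel's request_queue_t.

```c
#include <stddef.h>

struct toy_queue;
typedef void (toy_request_fn)(struct toy_queue *q);

struct toy_queue {
	toy_request_fn *request_fn;	/* driver routine to call */
	void *queuedata;		/* private data for the driver */
	int pending;			/* requests waiting on the queue */
};

/* The driver registers its request function when the queue is set up. */
void toy_init_queue(struct toy_queue *q, toy_request_fn *rfn, void *data)
{
	q->request_fn = rfn;
	q->queuedata = data;
	q->pending = 0;
}

/* The generic layer queues a request and pokes the driver. */
void toy_submit(struct toy_queue *q)
{
	q->pending++;
	q->request_fn(q);
}

/*
 * Example driver routine: drain the queue, counting the handled
 * requests in the int that queuedata points to.
 */
void toy_do_request(struct toy_queue *q)
{
	int *handled = q->queuedata;

	while (q->pending > 0) {
		q->pending--;
		(*handled)++;
	}
}
```

A caller would set up the queue once with toy_init_queue(&q, toy_do_request, &counter) and afterwards only use toy_submit(&q); which driver sits behind the queue is invisible at that point.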
#define DEVICE_NR(dev) (MINOR(dev) >> PARTN_BITS)
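A worked example of this shift may help. The sketch below assumes 8-bit minor numbers and PARTN_BITS equal to 6, that is, 64 minors per IDE drive (so /dev/hda is minor 0 and /dev/hdb is minor 64 on major 3). The TOY_ macros are stand-ins written for this illustration, not the kernel headers.

```c
/* Assumed layout: 8-bit minor in the low bits of the device number. */
#define TOY_MINORBITS	8
#define TOY_MINOR(dev)	((unsigned int)((dev) & ((1U << TOY_MINORBITS) - 1)))
#define TOY_MKDEV(ma, mi)	(((ma) << TOY_MINORBITS) | (mi))

/* Assumed: 6 partition bits, i.e. 64 minors per drive. */
#define TOY_PARTN_BITS	6
#define TOY_DEVICE_NR(dev)	(TOY_MINOR(dev) >> TOY_PARTN_BITS)

/* Drive index on the interface: 0 for hda, 1 for hdb. */
unsigned int toy_device_nr(unsigned int major, unsigned int minor)
{
	return TOY_DEVICE_NR(TOY_MKDEV(major, minor));
}

/* Partition number within the drive: 0 means the whole disk. */
unsigned int toy_partition(unsigned int major, unsigned int minor)
{
	return TOY_MINOR(TOY_MKDEV(major, minor)) & ((1U << TOY_PARTN_BITS) - 1);
}
```

Under these assumptions minors 0..63 all map to drive 0 and minors 64..127 to drive 1, so our /dev/hda (3,0) gives drive 0, partition 0: the whole first disk.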
This .queue field was filled by ide_init_queue():
	q->queuedata = HWGROUP(drive);
	blk_init_queue(q, do_ide_request);
}
And blk_init_queue() (from ll_rw_blk.c again):
Aha, so we found the q->make_request_fn. Here it is:
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
do_ide_request(request_queue_t *q) {
	ide_do_request(q->queuedata, 0);
}

ide_do_request(ide_hwgroup_t *hwgroup, int masked_irq) {
	ide_startstop_t startstop;