The Linux Kernel Module Programming Guide (Part 2)

雨声不在


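Translated from:
http://tldp.org/LDP/lkmpg/2.6/html/lkmpg.html
Other articles in this series:
The Linux Kernel Module Programming Guide (Part 1)
The Linux Kernel Module Programming Guide (Part 2)
The Linux Kernel Module Programming Guide (Part 3)
The Linux Kernel Module Programming Guide (Part 4)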

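Chapter 4. Character Device Files

Character Device Drivers

The file_operations Structure

The file_operations structure is defined in linux/fs.h and holds pointers to functions defined by the driver that perform various operations on the device. Each field of the structure corresponds to the address of some function defined by the driver to handle a requested operation.

For example, every character driver needs to define a function that reads from the device. The file_operations structure holds the address of the module's function that performs that operation. Here is what the definition looks like for kernel 2.6.5: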

struct file_operations {
    struct module *owner;
     loff_t(*llseek) (struct file *, loff_t, int);
     ssize_t(*read) (struct file *, char __user *, size_t, loff_t *);
     ssize_t(*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
     ssize_t(*write) (struct file *, const char __user *, size_t, loff_t *);
     ssize_t(*aio_write) (struct kiocb *, const char __user *, size_t,
                  loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int,
              unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
     ssize_t(*readv) (struct file *, const struct iovec *, unsigned long,
              loff_t *);
     ssize_t(*writev) (struct file *, const struct iovec *, unsigned long,
               loff_t *);
     ssize_t(*sendfile) (struct file *, loff_t *, size_t, read_actor_t,
                 void __user *);
     ssize_t(*sendpage) (struct file *, struct page *, int, size_t,
                 loff_t *, int);
    unsigned long (*get_unmapped_area) (struct file *, unsigned long,
                        unsigned long, unsigned long,
                        unsigned long);
};

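Some operations are not implemented by a driver. For example, a driver that handles a video card won't need to read from a directory structure. The corresponding entries in the file_operations structure should be set to NULL.

There is a gcc extension that makes assigning to this structure more convenient. You'll see it in modern drivers, and it may catch you by surprise. This is what the GNU-extension way of assigning to the structure looks like: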

struct file_operations fops = {
    read: device_read,
    write: device_write,
    open: device_open,
    release: device_release
};

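However, there is also a C99 way of assigning to elements of a structure, and it is definitely preferred over the GNU extension. The version of gcc the author used when writing this, 2.95, supports the C99 syntax. You should use this syntax in case someone wants to port your driver; it helps with compatibility: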

struct file_operations fops = {
    .read = device_read,
    .write = device_write,
    .open = device_open,
    .release = device_release
};

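The meaning is clear, and you should be aware that any member of the structure that you don't explicitly assign will be initialized to NULL by gcc.

An instance of struct file_operations containing pointers to the functions that implement the read, write, open, ... syscalls is commonly named fops.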

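The file Structure

Each device is represented in the kernel by a file structure, which is defined in linux/fs.h. Be aware that a file is a kernel-level structure and never appears in a user-space program. It is not the same thing as a FILE, which is defined by glibc and never appears in a kernel-space function. Its name is also a bit misleading: it represents an abstract open 'file', not a file on a disk, which is represented by a structure named inode.

An instance of struct file is commonly named filp. You'll also see it referred to as struct file file. Resist the temptation.

Go ahead and look at the definition of file. Most of the entries you see, like struct dentry, aren't used by device drivers, and you can ignore them. This is because drivers don't fill file directly; they only use structures contained in file, which are created elsewhere.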

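Registering a Device

As discussed earlier, char devices are accessed through device files, usually located in /dev. The major number tells you which driver handles which device file. The minor number is used only by the driver itself to differentiate which device it is operating on, in case the driver handles more than one device.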

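Adding a driver to your system means registering it with the kernel. This is synonymous with assigning it a major number during the module's initialization. You do this by using the register_chrdev function, defined by linux/fs.h.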

int register_chrdev(unsigned int major, const char *name, struct file_operations *fops);

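Here unsigned int major is the major number you want to request, const char *name is the name of the device as it will appear in /proc/devices, and struct file_operations *fops is a pointer to the file_operations table for your driver. A negative return value means the registration failed. Note that we didn't pass the minor number to register_chrdev. That's because the kernel doesn't care about the minor number; only our driver uses it.

Now the question is, how do you get a major number without hijacking one that's already in use? The easiest way would be to look through Documentation/devices.txt and pick an unused one. That's a bad way of doing things, because you'll never be sure whether the number you picked will be assigned later. The answer is that you can ask the kernel to assign you a dynamic major number.

If you pass a major number of 0 to register_chrdev, the return value will be the dynamically allocated major number. The downside is that you can't make a device file in advance, since you don't know what the major number will be. There are a couple of ways to deal with this. First, the driver itself can print the newly assigned number and we can make the device file by hand. Second, the newly registered device will have an entry in /proc/devices, and we can either make the device file by hand or write a shell script to read the file and make the device file. The third method is to have our driver make the device file using the mknod system call after a successful registration, and rm it during the call to cleanup_module.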

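Unregistering a Device

We can't allow the kernel module to be rmmod'ed whenever root feels like it. If the device file is opened by a process and then we remove the kernel module, using the file would cause a call to the memory location where the appropriate function (read/write) used to be. If we're lucky, no other code was loaded there, and we'll get an ugly error message. If we're unlucky, another kernel module was loaded into the same location, which means a jump into the middle of another function within the kernel. The results of this would be impossible to predict, but they can't be very positive.

Normally, when you don't want to allow something, you return an error code (a negative number) from the function that is supposed to do it. With cleanup_module that's impossible because it's a void function. However, there's a counter that keeps track of how many processes are using your module. You can see its value by looking at the 3rd field of /proc/modules. If this number isn't zero, rmmod will fail. Note that you don't have to check the counter from within cleanup_module, because the check is performed for you by the system call sys_delete_module, defined in kernel/module.c. You shouldn't use this counter directly, but there are functions defined in linux/module.h that let you increment and decrement it:

try_module_get(THIS_MODULE): Increment the use count.
module_put(THIS_MODULE): Decrement the use count.

It's important to keep the counter accurate; if you ever lose track of the correct usage count, you'll never be able to unload the module, and it's reboot time, boys and girls. This is bound to happen to you sooner or later during a module's development.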

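chardev.c

The next code sample creates a char driver named chardev. You can cat its device file (or open the file with a program), and the driver will put the number of times the device file has been read from into the file. We don't support writing to the file (like echo "hi" > /dev/hello), but we catch these attempts and tell the user that the operation isn't supported. Don't worry if you don't see what we do with the data we read into the buffer; we don't do much with it. We simply read in the data and print a message acknowledging that we received it.

Example 4-1. chardev.c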

/*
 *  chardev.c: Creates a read-only char device that says how many times
 *  you've read from the dev file
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <asm/uaccess.h>    /* for put_user */

/*  
 *  Prototypes - this would normally go in a .h file
 */
int init_module(void);
void cleanup_module(void);
static int device_open(struct inode *, struct file *);
static int device_release(struct inode *, struct file *);
static ssize_t device_read(struct file *, char *, size_t, loff_t *);
static ssize_t device_write(struct file *, const char *, size_t, loff_t *);

#define SUCCESS 0
#define DEVICE_NAME "chardev"   /* Dev name as it appears in /proc/devices   */
#define BUF_LEN 80      /* Max length of the message from the device */

/* 
 * Global variables are declared as static, so are global within the file. 
 */

static int Major;       /* Major number assigned to our device driver */
static int Device_Open = 0; /* Is device open?  
                 * Used to prevent multiple access to device */
static char msg[BUF_LEN];   /* The msg the device will give when asked */
static char *msg_Ptr;

static struct file_operations fops = {
    .read = device_read,
    .write = device_write,
    .open = device_open,
    .release = device_release
};

/*
 * This function is called when the module is loaded
 */
int init_module(void)
{
    Major = register_chrdev(0, DEVICE_NAME, &fops);

    if (Major < 0) {
        printk(KERN_ALERT "Registering char device failed with %d\n", Major);
        return Major;
    }

    printk(KERN_INFO "I was assigned major number %d. To talk to\n", Major);
    printk(KERN_INFO "the driver, create a dev file with\n");
    printk(KERN_INFO "'mknod /dev/%s c %d 0'.\n", DEVICE_NAME, Major);
    printk(KERN_INFO "Try various minor numbers. Try to cat and echo to\n");
    printk(KERN_INFO "the device file.\n");
    printk(KERN_INFO "Remove the device file and module when done.\n");

    return SUCCESS;
}

/*
 * This function is called when the module is unloaded
 */
void cleanup_module(void)
{
    /* 
     * Unregister the device 
     */
    int ret = unregister_chrdev(Major, DEVICE_NAME);
    if (ret < 0)
        printk(KERN_ALERT "Error in unregister_chrdev: %d\n", ret);
}

/*
 * Methods
 */

/* 
 * Called when a process tries to open the device file, like
 * "cat /dev/mycharfile"
 */
static int device_open(struct inode *inode, struct file *file)
{
    static int counter = 0;

    if (Device_Open)
        return -EBUSY;

    Device_Open++;
    sprintf(msg, "I already told you %d times Hello world!\n", counter++);
    msg_Ptr = msg;
    try_module_get(THIS_MODULE);

    return SUCCESS;
}

/* 
 * Called when a process closes the device file.
 */
static int device_release(struct inode *inode, struct file *file)
{
    Device_Open--;      /* We're now ready for our next caller */

    /* 
     * Decrement the usage count, or else once you opened the file, you'll
     * never get rid of the module. 
     */
    module_put(THIS_MODULE);

    return 0;
}

/* 
 * Called when a process, which already opened the dev file, attempts to
 * read from it.
 */
static ssize_t device_read(struct file *filp,   /* see include/linux/fs.h   */
               char *buffer,    /* buffer to fill with data */
               size_t length,   /* length of the buffer     */
               loff_t * offset)
{
    /*
     * Number of bytes actually written to the buffer 
     */
    int bytes_read = 0;

    /*
     * If we're at the end of the message, 
     * return 0 signifying end of file 
     */
    if (*msg_Ptr == 0)
        return 0;

    /* 
     * Actually put the data into the buffer 
     */
    while (length && *msg_Ptr) {

        /* 
         * The buffer is in the user data segment, not the kernel 
         * segment so "*" assignment won't work.  We have to use 
         * put_user which copies data from the kernel data segment to
         * the user data segment. 
         */
        put_user(*(msg_Ptr++), buffer++);

        length--;
        bytes_read++;
    }

    /* 
     * Most read functions return the number of bytes put into the buffer
     */
    return bytes_read;
}

/*  
 * Called when a process writes to dev file: echo "hi" > /dev/hello 
 */
static ssize_t
device_write(struct file *filp, const char *buff, size_t len, loff_t * off)
{
    printk(KERN_ALERT "Sorry, this operation isn't supported.\n");
    return -EINVAL;
}

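Writing Modules for Multiple Kernel Versions

The system calls, which are the major interface the kernel shows to processes, generally stay the same across versions. A new system call may be added, but usually the old ones will behave exactly as they used to. This is necessary for backward compatibility: a new kernel version is not supposed to break regular processes. In most cases, the device files will also remain the same. On the other hand, the internal interfaces within the kernel can and do change between versions.

The Linux kernel versions are divided between the stable versions (n.<even number>.m) and the development versions (n.<odd number>.m). The development versions include all the cool new ideas, including those which will be considered a mistake, or reimplemented, in the next version. As a result, you can't trust the interface to remain the same in those versions (which is why I don't bother to support them in this book; it's too much work and it would become dated too quickly). In the stable versions, on the other hand, we can expect the interface to remain the same regardless of the bug-fix version (the m number).

There are differences between kernel versions, and if you want to support multiple kernel versions, you'll find yourself having to write conditional compilation directives. The way to do this is to compare the macro LINUX_VERSION_CODE to the macro KERNEL_VERSION. In version a.b.c of the kernel, the value of this macro is $2^{16}a + 2^{8}b + c$.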

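While previous versions of this guide showed how to write backward-compatible code with such constructs in great detail, we decided to break with this tradition for the better. People interested in doing so should now use an LKMPG with a version matching their kernel. We decided to version the LKMPG like the kernel, at least as far as the major and minor numbers are concerned. We use the patchlevel for our own versioning, so use LKMPG version 2.4.x for kernels 2.4.x, LKMPG version 2.6.x for kernels 2.6.x, and so on. Also make sure that you always use current, up-to-date versions of both the kernel and the guide.

Update: What we've said above was true for kernels up to and including 2.6.10. You might already have noticed that recent kernels look different. In case you haven't, they look like 2.6.x.y now. The meaning of the first three items basically stays the same, but a sub-patchlevel has been added, which indicates security fixes until the next stable patchlevel is out. So people can choose between a stable tree with security updates and using the latest kernel as a developer tree. Search the kernel mailing list archives if you're interested in the full story.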

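Chapter 5. The /proc File System

The /proc File System

In Linux, there is an additional mechanism for the kernel and kernel modules to send information to processes: the /proc file system. Originally designed to allow easy access to information about processes (hence the name), it is now used by every bit of the kernel which has something interesting to report, such as /proc/modules, which provides the list of modules, and /proc/meminfo, which gathers memory usage statistics.

The method to use the proc file system is very similar to the one used with device drivers: a structure is created with all the information needed for the /proc file, including pointers to any handler functions (in our case there is only one, the one called when somebody attempts to read from the /proc file). Then init_module registers the structure with the kernel and cleanup_module unregisters it.

The reason we use proc_register_dynamic (the example in this chapter does the equivalent through create_proc_entry) is that we don't want to determine the inode number used for our file in advance, but to allow the kernel to determine it to prevent clashes. Normal file systems are located on a disk, rather than just in memory (which is where /proc is), and in that case the inode number is a pointer to a disk location where the file's index node (inode for short) is located. The inode contains information about the file, for example the file's permissions, together with a pointer to the disk location or locations where the file's data can be found.

Because we don't get called when the file is opened or closed, there's nowhere for us to put try_module_get and module_put in this module, and if the file is opened and then the module is removed, there's no way to avoid the consequences.

Here is a simple example showing how to use a /proc file. This is the HelloWorld for the /proc file system. There are three parts: create the file /proc/helloworld in the function init_module, return a value (and a buffer) when the file /proc/helloworld is read in the callback function procfile_read, and delete the file /proc/helloworld in the function cleanup_module.

The file /proc/helloworld is created when the module is loaded, with the function create_proc_entry. The return value is a 'struct proc_dir_entry *', and it will be used to configure the file /proc/helloworld (for example, the owner of this file). A NULL return value means that the creation has failed.

Each time the file /proc/helloworld is read, the function procfile_read is called. Two parameters of this function are very important: the buffer (the first parameter) and the offset (the third one). The content of the buffer will be returned to the application which reads it (for example the cat command). The offset is the current position in the file. If the return value of the function is nonzero, the function is called again. So be careful with this function: if it never returns zero, the read function is called endlessly.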

% cat /proc/helloworld
HelloWorld!

Example 5-1. procfs1.c

/*
 *  procfs1.c -  create a "file" in /proc
 *
 */

#include <linux/module.h>   /* Specifically, a module */
#include <linux/kernel.h>   /* We're doing kernel work */
#include <linux/proc_fs.h>  /* Necessary because we use the proc fs */

#define procfs_name "helloworld"

/**
 * This structure holds information about the /proc file
 *
 */
struct proc_dir_entry *Our_Proc_File;

/* Put data into the proc fs file.
 * 
 * Arguments
 * =========
 * 1. The buffer where the data is to be inserted, if
 *    you decide to use it.
 * 2. A pointer to a pointer to characters. This is
 *    useful if you don't want to use the buffer
 *    allocated by the kernel.
 * 3. The current position in the file
 * 4. The size of the buffer in the first argument.
 * 5. Write a "1" here to indicate EOF.
 * 6. A pointer to data (useful in case one common 
 *    read for multiple /proc/... entries)
 *
 * Usage and Return Value
 * ======================
 * A return value of zero means you have no further
 * information at this time (end of file). A negative
 * return value is an error condition.
 *
 * For More Information
 * ====================
 * The way I discovered what to do with this function
 * wasn't by reading documentation, but by reading the
 * code which used it. I just looked to see what uses
 * the get_info field of proc_dir_entry struct (I used a
 * combination of find and grep, if you're interested),
 * and I saw that  it is used in <kernel source
 * directory>/fs/proc/array.c.
 *
 * If something is unknown about the kernel, this is
 * usually the way to go. In Linux we have the great
 * advantage of having the kernel source code for
 * free - use it.
 */
int
procfile_read(char *buffer,
          char **buffer_location,
          off_t offset, int buffer_length, int *eof, void *data)
{
    int ret;

    printk(KERN_INFO "procfile_read (/proc/%s) called\n", procfs_name);

    /* 
     * We give all of our information in one go, so if the
     * user asks us if we have more information the
     * answer should always be no.
     *
     * This is important because the standard read
     * function from the library would continue to issue
     * the read system call until the kernel replies
     * that it has no more information, or until its
     * buffer is filled.
     */
    if (offset > 0) {
        /* we have finished reading, return 0 */
        ret  = 0;
    } else {
        /* fill the buffer, return the buffer size */
        ret = sprintf(buffer, "HelloWorld!\n");
    }

    return ret;
}

int init_module()
{
    Our_Proc_File = create_proc_entry(procfs_name, 0644, NULL);

    if (Our_Proc_File == NULL) {
        /* nothing was created, so there is nothing to remove */
        printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
               procfs_name);
        return -ENOMEM;
    }

    Our_Proc_File->read_proc = procfile_read;
    Our_Proc_File->owner     = THIS_MODULE;
    Our_Proc_File->mode      = S_IFREG | S_IRUGO;
    Our_Proc_File->uid   = 0;
    Our_Proc_File->gid   = 0;
    Our_Proc_File->size      = 37;

    printk(KERN_INFO "/proc/%s created\n", procfs_name);    
    return 0;   /* everything is ok */
}

void cleanup_module()
{
    remove_proc_entry(procfs_name, &proc_root);
    printk(KERN_INFO "/proc/%s removed\n", procfs_name);
}

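Reading and Writing a /proc File

We have seen a very simple example of a /proc file, where we only read the file /proc/helloworld. It's also possible to write to a /proc file. It works the same way as read: a function is called when the /proc file is written. But there is one difference from read: the data comes from the user, so you have to import it from user space into kernel space (with copy_from_user or get_user).

The reason for copy_from_user or get_user is that Linux memory (on Intel architecture; it may be different under some other processors) is segmented. This means that a pointer, by itself, does not reference a unique location in memory, only a location in a memory segment, and you need to know which memory segment it is to be able to use it. There is one memory segment for the kernel, and one for each of the processes.

The only memory segment accessible to a process is its own, so when writing regular programs to run as processes, there's no need to worry about segments. When you write a kernel module, normally you want to access the kernel memory segment, which is handled automatically by the system. However, when the content of a memory buffer needs to be passed between the currently running process and the kernel, the kernel function receives a pointer to the memory buffer, which is in the process segment. The put_user and get_user macros allow you to access that memory. These functions handle only one character; you can handle several characters with copy_to_user and copy_from_user. Because the buffer (in the read or write function) is in kernel space, for the write function you need to import the data because it comes from user space, but not for the read function, because the data is already in kernel space.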

Example 5-2. procfs2.c


/**
 *  procfs2.c -  create a "file" in /proc
 *
 */

#include <linux/module.h>   /* Specifically, a module */
#include <linux/kernel.h>   /* We're doing kernel work */
#include <linux/proc_fs.h>  /* Necessary because we use the proc fs */
#include <asm/uaccess.h>    /* for copy_from_user */

#define PROCFS_MAX_SIZE     1024
#define PROCFS_NAME         "buffer1k"

/**
 * This structure hold information about the /proc file
 *
 */
static struct proc_dir_entry *Our_Proc_File;

/**
 * The buffer used to store character for this module
 *
 */
static char procfs_buffer[PROCFS_MAX_SIZE];

/**
 * The size of the buffer
 *
 */
static unsigned long procfs_buffer_size = 0;

/** 
 * This function is called when the /proc file is read
 *
 */
int 
procfile_read(char *buffer,
          char **buffer_location,
          off_t offset, int buffer_length, int *eof, void *data)
{
    int ret;

    printk(KERN_INFO "procfile_read (/proc/%s) called\n", PROCFS_NAME);

    if (offset > 0) {
        /* we have finished to read, return 0 */
        ret  = 0;
    } else {
        /* fill the buffer, return the buffer size */
        memcpy(buffer, procfs_buffer, procfs_buffer_size);
        ret = procfs_buffer_size;
    }

    return ret;
}

/**
 * This function is called when the /proc file is written
 *
 */
int procfile_write(struct file *file, const char *buffer, unsigned long count,
           void *data)
{
    /* get buffer size */
    procfs_buffer_size = count;
    if (procfs_buffer_size > PROCFS_MAX_SIZE ) {
        procfs_buffer_size = PROCFS_MAX_SIZE;
    }

    /* write data to the buffer */
    if ( copy_from_user(procfs_buffer, buffer, procfs_buffer_size) ) {
        return -EFAULT;
    }

    return procfs_buffer_size;
}

/**
 *This function is called when the module is loaded
 *
 */
int init_module()
{
    /* create the /proc file */
    Our_Proc_File = create_proc_entry(PROCFS_NAME, 0644, NULL);

    if (Our_Proc_File == NULL) {
        /* nothing was created, so there is nothing to remove */
        printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
            PROCFS_NAME);
        return -ENOMEM;
    }

    Our_Proc_File->read_proc  = procfile_read;
    Our_Proc_File->write_proc = procfile_write;
    Our_Proc_File->owner      = THIS_MODULE;
    Our_Proc_File->mode       = S_IFREG | S_IRUGO;
    Our_Proc_File->uid    = 0;
    Our_Proc_File->gid    = 0;
    Our_Proc_File->size       = 37;

    printk(KERN_INFO "/proc/%s created\n", PROCFS_NAME);    
    return 0;   /* everything is ok */
}

/**
 *This function is called when the module is unloaded
 *
 */
void cleanup_module()
{
    remove_proc_entry(PROCFS_NAME, &proc_root);
    printk(KERN_INFO "/proc/%s removed\n", PROCFS_NAME);
}

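Managing /proc Files with Standard File System Operations

We have seen how to read and write a /proc file with the /proc interface. But it's also possible to manage a /proc file with inodes. The main interest is to use advanced functions, like permissions.

In Linux, there is a standard mechanism for file system registration. Since every file system has to have its own functions to handle inode and file operations, there is a special structure to hold pointers to all those functions, struct inode_operations, which includes a pointer to struct file_operations. In /proc, whenever we register a new file, we're allowed to specify which struct inode_operations will be used to access it. This is the mechanism we use: a struct inode_operations together with a struct file_operations that holds pointers to our procfs_read and procfs_write functions. In the listing for this section, the two structures are attached to the entry separately, through its proc_iops and proc_fops fields.

Another interesting point here is the module_permission function. This function is called whenever a process tries to do something with the /proc file, and it can decide whether to allow access or not. Right now it is only based on the operation and the uid of the current user (as available in current, a pointer to a structure which includes information on the currently running process), but it could be based on anything we like, such as what other processes are doing with the same file, the time of day, or the last input we received.

It's important to note that the standard roles of read and write are reversed in the kernel. Read functions are used for output, whereas write functions are used for input. The reason is that read and write refer to the user's point of view: if a process reads something from the kernel, then the kernel needs to output it, and if a process writes something to the kernel, then the kernel receives it as input.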

Example 5-3. procfs3.c

/* 
 *  procfs3.c -  create a "file" in /proc, use the file_operation way
 *          to manage the file.
 */

#include <linux/kernel.h>   /* We're doing kernel work */
#include <linux/module.h>   /* Specifically, a module */
#include <linux/proc_fs.h>  /* Necessary because we use proc fs */
#include <asm/uaccess.h>    /* for copy_*_user */

#define PROC_ENTRY_FILENAME     "buffer2k"
#define PROCFS_MAX_SIZE     2048

/**
 * The buffer (2k) for this module
 *
 */
static char procfs_buffer[PROCFS_MAX_SIZE];

/**
 * The size of the data hold in the buffer
 *
 */
static unsigned long procfs_buffer_size = 0;

/**
 * The structure keeping information about the /proc file
 *
 */
static struct proc_dir_entry *Our_Proc_File;

/**
 * This function is called when the /proc file is read
 *
 */
static ssize_t procfs_read(struct file *filp,   /* see include/linux/fs.h   */
                 char *buffer,  /* buffer to fill with data */
                 size_t length, /* length of the buffer     */
                 loff_t * offset)
{
    static int finished = 0;

    /* 
     * We return 0 to indicate end of file, that we have
     * no more information. Otherwise, processes will
     * continue to read from us in an endless loop. 
     */
    if ( finished ) {
        printk(KERN_INFO "procfs_read: END\n");
        finished = 0;
        return 0;
    }

    finished = 1;

    /* 
     * We use copy_to_user to copy the string from the kernel's
     * memory segment to the memory segment of the process
     * that called us. copy_from_user, BTW, is
     * used for the reverse. 
     */
    if ( copy_to_user(buffer, procfs_buffer, procfs_buffer_size) ) {
        return -EFAULT;
    }

    printk(KERN_INFO "procfs_read: read %lu bytes\n", procfs_buffer_size);

    return procfs_buffer_size;  /* Return the number of bytes "read" */
}

/*
 * This function is called when /proc is written
 */
static ssize_t
procfs_write(struct file *file, const char *buffer, size_t len, loff_t * off)
{
    if ( len > PROCFS_MAX_SIZE )    {
        procfs_buffer_size = PROCFS_MAX_SIZE;
    }
    else    {
        procfs_buffer_size = len;
    }

    if ( copy_from_user(procfs_buffer, buffer, procfs_buffer_size) ) {
        return -EFAULT;
    }

    printk(KERN_INFO "procfs_write: write %lu bytes\n", procfs_buffer_size);

    return procfs_buffer_size;
}

/* 
 * This function decides whether to allow an operation
 * (return zero) or not allow it (return a non-zero
 * which indicates why it is not allowed).
 *
 * The operation can be one of the following values:
 * 1 - Execute (run the "file" - meaningless in our case)
 * 2 - Write (input to the kernel module)
 * 4 - Read (output from the kernel module)
 *
 * This is the real function that checks file
 * permissions. The permissions returned by ls -l are
 * for reference only, and can be overridden here.
 */

static int module_permission(struct inode *inode, int op, struct nameidata *foo)
{
    /* 
     * We allow everybody to read from our module, but
     * only root (uid 0) may write to it 
     */
    if (op == 4 || (op == 2 && current->euid == 0))
        return 0;

    /* 
     * If it's anything else, access is denied 
     */
    return -EACCES;
}

/* 
 * The file is opened - we don't really care about
 * that, but it does mean we need to increment the
 * module's reference count. 
 */
int procfs_open(struct inode *inode, struct file *file)
{
    try_module_get(THIS_MODULE);
    return 0;
}

/* 
 * The file is closed - again, interesting only because
 * of the reference count. 
 */
int procfs_close(struct inode *inode, struct file *file)
{
    module_put(THIS_MODULE);
    return 0;       /* success */
}

static struct file_operations File_Ops_4_Our_Proc_File = {
    .read    = procfs_read,
    .write   = procfs_write,
    .open    = procfs_open,
    .release = procfs_close,
};

/* 
 * Inode operations for our proc file. We need it so
 * we'll have some place to specify the file operations
 * structure we want to use, and the function we use for
 * permissions. It's also possible to specify functions
 * to be called for anything else which could be done to
 * an inode (although we don't bother, we just put
 * NULL). 
 */

static struct inode_operations Inode_Ops_4_Our_Proc_File = {
    .permission = module_permission,    /* check for permissions */
};

/* 
 * Module initialization and cleanup 
 */
int init_module()
{
    /* create the /proc file */
    Our_Proc_File = create_proc_entry(PROC_ENTRY_FILENAME, 0644, NULL);

    /* check if the /proc file was created successfully */
    if (Our_Proc_File == NULL){
        printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
               PROC_ENTRY_FILENAME);
        return -ENOMEM;
    }

    Our_Proc_File->owner = THIS_MODULE;
    Our_Proc_File->proc_iops = &Inode_Ops_4_Our_Proc_File;
    Our_Proc_File->proc_fops = &File_Ops_4_Our_Proc_File;
    Our_Proc_File->mode = S_IFREG | S_IRUGO | S_IWUSR;
    Our_Proc_File->uid = 0;
    Our_Proc_File->gid = 0;
    Our_Proc_File->size = 80;

    printk(KERN_INFO "/proc/%s created\n", PROC_ENTRY_FILENAME);

    return 0;   /* success */
}

void cleanup_module()
{
    remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
    printk(KERN_INFO "/proc/%s removed\n", PROC_ENTRY_FILENAME);
}

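Still hungry for procfs examples? Well, first of all keep in mind that there are rumors around claiming that procfs is on its way out; consider using sysfs instead. Second, if you really can't get enough, there's a highly recommendable bonus level for procfs below linux/Documentation/DocBook/. Use make help in your top-level kernel directory for instructions about how to convert it into your favourite format. Example: make htmldocs. Consider using this mechanism in case you want to document something kernel-related yourself.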

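Managing /proc Files with seq_file

As we have seen, writing a /proc file may be quite "complex". So to help people write /proc files, there is an API named seq_file that helps format a /proc file for output. It's based on a sequence, which is composed of three functions: start(), next(), and stop(). The seq_file API starts a sequence when a user reads the /proc file.

A sequence begins with a call to the function start(). If the return value is non-NULL, the function next() is called. This function is an iterator; its goal is to go through all the data. Each time next() is called, the function show() is also called. It writes data values into the buffer read by the user. The function next() is called until it returns NULL. The sequence ends when next() returns NULL, and then the function stop() is called.

Be careful: when a sequence is finished, another one starts. That means that at the end of the function stop(), the function start() is called again. This loop finishes when the function start() returns NULL. You can see a scheme of this in the figure "How seq_file works".

Figure 5-1. How seq_file works

seq_file provides basic functions for file_operations, such as seq_read, seq_lseek, and some others, but nothing for writing to the /proc file. Of course, you can still use the same approach as in the previous example.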

Example 5-4. procfs4.c

/**
 *  procfs4.c -  create a "file" in /proc
 *  This program uses the seq_file library to manage the /proc file.
 *
 */

#include <linux/kernel.h>   /* We're doing kernel work */
#include <linux/module.h>   /* Specifically, a module */
#include <linux/proc_fs.h>  /* Necessary because we use proc fs */
#include <linux/seq_file.h> /* for seq_file */

#define PROC_NAME   "iter"

MODULE_AUTHOR("Philippe Reynes");
MODULE_LICENSE("GPL");

/**
 * This function is called at the beginning of a sequence.
 * ie, when:
 *  - the /proc file is read (first time)
 *  - after the function stop (end of sequence)
 *
 */
static void *my_seq_start(struct seq_file *s, loff_t *pos)
{
    static unsigned long counter = 0;

    /* beginning a new sequence ? */    
    if ( *pos == 0 )
    {   
        /* yes => return a non null value to begin the sequence */
        return &counter;
    }
    else
    {
        /* no => it's the end of the sequence, return end to stop reading */
        *pos = 0;
        return NULL;
    }
}

/**
 * This function is called after the beginning of a sequence.
 * It's called until the return is NULL (this ends the sequence).
 *
 */
static void *my_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
    unsigned long *tmp_v = (unsigned long *)v;
    (*tmp_v)++;
    (*pos)++;
    return NULL;
}

/**
 * This function is called at the end of a sequence
 * 
 */
static void my_seq_stop(struct seq_file *s, void *v)
{
    /* nothing to do, we use a static value in start() */
}

/**
 * This function is called for each "step" of a sequence
 *
 */
static int my_seq_show(struct seq_file *s, void *v)
{
    /* v points at the unsigned long counter returned by my_seq_start() */
    unsigned long *spos = (unsigned long *) v;

    seq_printf(s, "%lu\n", *spos);
    return 0;
}

/**
 * This structure gather "function" to manage the sequence
 *
 */
static struct seq_operations my_seq_ops = {
    .start = my_seq_start,
    .next  = my_seq_next,
    .stop  = my_seq_stop,
    .show  = my_seq_show
};

/**
 * This function is called when the /proc file is open.
 *
 */
static int my_open(struct inode *inode, struct file *file)
{
    return seq_open(file, &my_seq_ops);
}

/**
 * This structure gather "function" that manage the /proc file
 *
 */
static struct file_operations my_file_ops = {
    .owner   = THIS_MODULE,
    .open    = my_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = seq_release
};


/**
 * This function is called when the module is loaded
 *
 */
int init_module(void)
{
    struct proc_dir_entry *entry;

    entry = create_proc_entry(PROC_NAME, 0, NULL);
    if (entry == NULL)
        return -ENOMEM; /* do not report success without a /proc entry */

    entry->proc_fops = &my_file_ops;

    return 0;
}

/**
 * This function is called when the module is unloaded.
 *
 */
void cleanup_module(void)
{
    remove_proc_entry(PROC_NAME, NULL);
}