dmesg&printk的工作原理

置顶 papaofdoudou

已于 2024-05-13 10:48:48 修改

阅读量2k

点赞数 9

分类专栏： Linux 嵌入式系统文章标签：网络

于 2022-02-02 10:22:20 首次发布

本文链接：https://blog.csdn.net/tugouxp/article/details/122767764

版权

嵌入式系统同时被 2 个专栏收录

355 篇文章 57 订阅

订阅专栏

Linux

217 篇文章 29 订阅

订阅专栏

内核调试者对DMESG应该不会感到陌生，DMESG可以将内核打印信息输出到控制台，从而让我们知道内核发生了什么。一句话不足以概括其中的工作细节，那么，DMESG是如何发挥作用的呢，这里我们就探究一下。

dmesg程序的格式

在UBUNTU18.04上，DMESG是以ELF可执行文件的形式提供功能的。

dmesg工作原理

我们用strace跟踪一下dmesg的工作过程，输入

strace -e trace=open,read,openat,close dmesg

输出如下：

可以看到，DMESG工作依赖于文件句柄为3的文件，而这个文件又是打开/dev/kmsg获得的，所以核心点在于/dev/kmsg驱动的实现。

dmesg如何输出信息？

我们同时跟踪read/write系统调用：

strace -e trace=open,read,write,openat,close dmesg

可以看到，每次read系统调用之后都会伴随一个write系统调用，并且write length - read length 固定等于 12。这说明写出的数据恰好就是read从内核读出的。

另外，除了printk函数可以向__log_buf中输送数据，/dev/kmsg节点也实现了write_iter接口，用户态也可以向/dev/kmsg写入数据，这样用户态的LOG也会出现在dmesg中。

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	int fd = -1;

	fd = open("/dev/kmsg", O_WRONLY);
	char *buf = "Hello World.\n";

	if (fd != -1) {
		write(fd, buf, strlen(buf) + 1);
		close(fd);
	} else {
		printf("unable to get file descriptor.\n");
	}

	return 0;
}

/dev/kmsg的实现

/dev/kmsg是一个混杂字符设备,它和/dev/mem,/dev/kmem, /dev/null,/dev/port,/dev/zero,/dev/full,/dev/random,/dev/urandom共用同一个主设备号，通过次设备号来区分。

在内核中的实现如下：

主设备号为1，次设备号按序分配。

对于这种多个设备节点共享同一个cdev的情况，chrdev_open函数会在打开操作进行时将每个被打开设备节点的inode->i_devices节点加入到所属的cdev->list上进行“登记”。

我们关心的核心实现就是kmsg_fops了。

/proc/kmsg

另一个有意思的节点是 /proc/kmsg，注册的地方在：

他也是通过内核循环缓冲区来实现，读取的方法是

sudo cat /proc/kmsg

不过，/proc/kmsg和/dev/kmsg在读取时的表现有所差异， /dev/kmsg会每次将所有的打印输出一遍，而/proc/kmsg则从上一次输出之后的地方开始输出，这个上一次包括DMESG的输出。比如下图所示，每次插拔U盘，都显示当前最新的输出。

相同点是，无论/dev/kmsg还是/proc/kmsg，都依赖于内核打开CONFIG_PRINTK这项配置。

分析/dev/kmsg实现

dmesg read系统调用是通过函数devkmsg_read实现的，函数的实现中有几个影响阅读的关键变量：

log_next_seq：指向当前可写的struct log 索引号, 递增的自然数（可写的下一笔)。
log_next_idx: 指向环形缓冲区(log_buf)可写的位置，整数，当前写入位置相对于RINGBUF基地址的偏移(也就是log_buf内的写指针，单位为字节)。
log_first_seq：指向当前可读的struct log 索引号（可读的下一笔）。
log_first_idx：指向环形缓冲区可以读的位置(也就是log_buf内的读指针，单位为字节).

关于以上变量的意义的印证，可以从该函数在读取完最后一条MSG后的处理得到证明，在上面的LOG中，当读完最后一条DMESG信息后，用户态会得到一条EAGAIN错误输出，恰恰就是下图所示的地方返回的，按照循环缓冲区的理解，user->seq == log_next_seq代表读位置追上了写位置，没有什么东西可读的了，也就是所有的信息都读取完毕了，符合该处的除了逻辑。

user从何而来

devkmsg_usr是在/dev/kmsg节点打开的时候分配内存的，将其IDX和SEQ初始化为读点的变量，并且将其赋值给private_data指针。另外可以看大devkmsg_usr自带BUF，BUF成员无需初始化。

从如下两个函数定义可以看出,__log_buf中的数据包括两种类型，第一种当然是打印字符数据类型，第二种，就是通常会忽略的LOG头对象(struct printk_log)，它是对一段LOG的描述结构体对象，描述了接下来的LOG长度等信息，并且__log_buf要以struct printk_log对齐。

为了区分循环缓冲区最后一段无法塞下一断LOG的情况，当回绕发生时，最后一笔LOG的printk_log头用0填充,并且将log_next_idx重置为0，也就是log_buf的开始位置。

如下图所示：

printk的工作原理

printk是一个很NB的函数，普通函数对执行上下文有很苛刻的要求，比如有的函数因为包含休眠调用，不能在中断上下文调用，而有些函数由于持有某个锁资源，为了避免死锁，不能在已经持有这把锁的环境下调用。printk 不一般的地方就在于，它可以在任何上下文调用。

printk的实质填充__log_buf的地方,msg指针直接取自于__log_buf,并且作为struct printk_log头填充必要的信息，后面附上LOG的内容，一起填充到LOG BUF里面，最后更新log_next_seq和log_next_idx.

写完之后，更新log_next_idx和log_next_seq。

printk的调用流程：

IDA Pro proximity browser around vprintk_emit:

前面的输出是输出到__log_buf,console_unlock路径才是真正的打印到控制台，串口等等驱动。

可以看到最终输出到了串口(console),当然，如果当前没有直接打印到console的条件，则先将LOG缓冲到LOG BUF中，在由一个异步worker队列完成到终端的输出，具体分析看下文。

更新__log_buf:

如果在启动阶段指定了字符串命令行“log_buf_len = 0xXXXXXX",

当设置的长度大于原来的长度log_buf_len，也就是__LOG_BUF_LEN时，将会更新新BUF的长度到new_log_buf_len中。

之后，在setup_log_buf调用的时候，如果发现new_log_buf_len不为0，将会重新分配一块更大的BUF作为LOG buf.这就是log_buf更新的逻辑。

printk和REBOOT的关系：

在系统重启阶段，会将LOG BUF的内容刷出，用户态的reboot调用入口：

dump指针指向谁呢？在kmsg_dumper结构对象中找到了关联

可见，最终也会通过console_unlock刷新控制台输出。

测试程序

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
 
#define DBG(fmt, ...)   do { printf("%s line %d, "fmt"\n", __func__, __LINE__, ##__VA_ARGS__); } while (0)

struct printk_log {
    uint64_t ts_nsec;
    uint16_t len;
    uint16_t text_len;
    uint16_t dict_len;
    uint8_t facility;
    uint8_t flags:5;
    uint8_t level:3;
    uint32_t caller_id;
};

#define LOG_ALIGN __alignof__(struct printk_log)

static uint32_t msg_used_size(uint16_t text_len, uint16_t dict_len, uint32_t *pad_len)
{                          
    uint32_t size,old;              

    old = size = sizeof(struct printk_log) + text_len + dict_len;
    *pad_len = (-size) & (LOG_ALIGN - 1);
    size += *pad_len;      
    
    if(old > size)
    {
		while(1)
		{
			DBG("old %d, size %d.", old,size);
		}
    }
	else
	{
		DBG("align %ld, old 0x%x, size 0x%x, -size=0x%x.", LOG_ALIGN, old, size, -old);
	}

    return size;           
}      

int main(void)
{
	int32_t i;
	uint32_t pad_len;

	DBG("LOG_ALIGN = %ld.", LOG_ALIGN);

	for(i = 0; i  < 1024; i ++)
	{
		msg_used_size(i, 0, &pad_len);
	}

	return 0;
}

关于printk打印的几个知识点：

如何控制打印级别：

四个成员分别对应了四类不同的控制台输出等级：

如何发挥作用的呢？以console_loglevel为例，每条LOG在打印的时候都会默认有一个等级，并且记录在printk_log结构体中：

在console_unlock打印时，会进行log_level的检查，只会允许优先级高于当前console_loglevel（数值小的优先级高）的LOG输出。

每笔LOG数据都有一个printk_log结构体对象作为前缀，其中记录了这笔数据的输出LEVEL，所以，每笔数据的LOG输出与否都可以单独控制。

什么时候会输出“printk messages dropped”？

在console_unlock函数中，当发现打印速度过快，导致log_first_seq记数漂移到新的位置时（通过log_make_free_space,见log_make_free_space)，则会提示当前打印过快，打印不过来了,同时将LOG输出记数调整到新的log_first_seq位置。

get-log-level:

cat /kmsg/proc

一段验证printk_log buffer的代码：

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>

#define __aligned(x) __attribute__((aligned(x)))

#define max(x, y) ({                \
    typeof(x) _max1 = (x);          \
    typeof(y) _max2 = (y);          \
    (void) (&_max1 == &_max2);      \
    _max1 > _max2 ? _max1 : _max2; })

struct printk_log {
    uint64_t ts_nsec;        /* timestamp in nanoseconds */
    uint16_t len;            /* length of entire record */
    uint16_t text_len;       /* length of text buffer */
    uint16_t dict_len;       /* length of dictionary buffer */
    uint8_t  facility;       /* syslog facility */
    uint8_t  flags:5;        /* internal record flags */
    uint8_t  level:3;        /* syslog level */
};

enum log_flags {
    LOG_NOCONS  = 1,    /* already flushed, do not print to console */
    LOG_NEWLINE = 2,    /* text ended with a newline */
    LOG_PREFIX  = 4,    /* text started with a prefix */
    LOG_CONT    = 8,    /* text is a fragment of a continuation line */
};  

#define LOG_ALIGN __alignof__(struct printk_log)
#define CONFIG_LOG_BUF_SHIFT 20
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf        = __log_buf;
static uint32_t log_buf_len = __LOG_BUF_LEN;

// index and sequence number of the first record stored in the buffer
static uint64_t log_first_seq;
static uint32_t log_first_idx;

// index and sequence number of the next record to store in the buffer
static uint64_t log_next_seq;
static uint32_t log_next_idx;

// the next printk record to read after the last 'clear' command
static uint64_t clear_seq;
static uint32_t clear_idx;

char *log_buf_addr_get(void)
{
    return log_buf;
}

uint32_t log_buf_len_get(void)
{
    return log_buf_len;
}

static char *log_text(const struct printk_log *msg)
{
    return (char *)msg + sizeof(struct printk_log);
}

static char *log_dict(const struct printk_log *msg)
{
    return (char *)msg + sizeof(struct printk_log) + msg->text_len;
}

static struct printk_log *log_from_idx(uint32_t idx)
{
    struct printk_log *msg = (struct printk_log *)(log_buf + idx);

    // A length == 0 record is the end of buffer marker. Wrap around and
    // read the message at the start of the buffer.
    if (!msg->len)
        return (struct printk_log *)log_buf;
    return msg;
}

static uint32_t log_next(uint32_t idx) 
{
    struct printk_log *msg = (struct printk_log *)(log_buf + idx);
    
    // length == 0 indicates the end of the buffer; wrap
    // A length == 0 record is the end of buffer marker. Wrap around and
    // read the message at the start of the buffer as *this* one, and
    // return the one after that.

    if (!msg->len)
    { 
        msg = (struct printk_log *)log_buf; 
        return msg->len;
    }

    return idx + msg->len;
}

// Check whether there is enough free space for the given message.
// The same values of first_idx and next_idx mean that the buffer
// is either empty or full.
// If the buffer is empty, we must respect the position of the indexes.
// They cannot be reset to the beginning of the buffer.

static int logbuf_has_space(uint32_t msg_size, bool empty)
{
    uint32_t free;

    if (log_next_idx > log_first_idx || empty)
        free = max(log_buf_len - log_next_idx, log_first_idx);
    else
        free = log_first_idx - log_next_idx;

    // We need space also for an empty header that signalizes wrapping
    // of the buffer.
    
    return free >= msg_size + sizeof(struct printk_log);
}

static int log_make_free_space(uint32_t msg_size)
{
    while (log_first_seq < log_next_seq && !logbuf_has_space(msg_size, false))
    {
        /* drop old messages until we have enough contiguous space */
        log_first_idx = log_next(log_first_idx);
        log_first_seq++; 
    }
    
    if (clear_seq < log_first_seq)
    { 
        clear_seq = log_first_seq;
        clear_idx = log_first_idx;
    }
    
    /* sequence numbers are equal, so the log buffer is empty */
    if (logbuf_has_space(msg_size, log_first_seq == log_next_seq))
        return 0;

    return -1;
}

// compute the message size including the padding bytes
static uint32_t msg_used_size(uint16_t text_len, uint16_t dict_len, uint32_t *pad_len)
{
    uint32_t size;

    size = sizeof(struct printk_log) + text_len + dict_len;
    *pad_len = (-size) & (LOG_ALIGN - 1);
    size += *pad_len;

    return size;
}

// Define how much of the log buffer we could take at maximum. The value 
// must be greater than two. Note that only half of the buffer is available
// when the index points to the middle.

#define MAX_LOG_TAKE_PART 4
static const char trunc_msg[] = "<truncated>";

static uint32_t truncate_msg(uint16_t *text_len, uint16_t *trunc_msg_len,uint16_t *dict_len, uint32_t *pad_len)
{
    // The message should not take the whole buffer. Otherwise, it might
    // get removed too soon.
    
    uint32_t max_text_len = log_buf_len / MAX_LOG_TAKE_PART;
    if (*text_len > max_text_len)
        *text_len = max_text_len;
    /* enable the warning message */
    *trunc_msg_len = strlen(trunc_msg);
    /* disable the "dict" completely */ 
    *dict_len = 0;
    /* compute the size again, count also the warning message */ 
    return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len);
}

// insert record into the buffer, discard old ones, update heads

static int log_store(int facility, int level,enum log_flags flags, uint64_t ts_nsec, const char *dict, uint16_t dict_len,const char *text, uint16_t text_len)
{
    struct printk_log *msg;
    uint32_t size, pad_len;
    uint16_t trunc_msg_len = 0;

    /* number of '\0' padding bytes to next message */
    size = msg_used_size(text_len, dict_len, &pad_len);

    if (log_make_free_space(size)) {
        /* truncate the message if it is too long for empty buffer */
        size = truncate_msg(&text_len, &trunc_msg_len,
                    &dict_len, &pad_len);
        /* survive when the log buffer is too small for trunc_msg */
        if (log_make_free_space(size))
            return 0;
    }

    if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) {
        /*
         * This message + an additional empty header does not fit
         * at the end of the buffer. Add an empty header with len == 0
         * to signify a wrap around.
         */
        memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
        log_next_idx = 0;
    }

    /* fill message */
    msg = (struct printk_log *)(log_buf + log_next_idx);
    memcpy(log_text(msg), text, text_len);
    msg->text_len = text_len;
    if (trunc_msg_len) {
        memcpy(log_text(msg) + text_len, trunc_msg, trunc_msg_len);
        msg->text_len += trunc_msg_len;
    }
    memcpy(log_dict(msg), dict, dict_len);
    msg->dict_len = dict_len;
    msg->facility = facility;
    msg->level = level & 7;
    msg->flags = flags & 0x1f;
    if (ts_nsec > 0)
        msg->ts_nsec = ts_nsec;
    else
        msg->ts_nsec = 0;
    memset(log_dict(msg) + dict_len, 0, pad_len);
    msg->len = size;

    /* insert message */
    log_next_idx += msg->len;
    log_next_seq++;

    return msg->text_len;
}


int main(void)
{
    char *str = "hello world.";
    int len = strlen(str);

    while(1)
    {
        log_store(0, 0, 0, 0, NULL, 0, str, len);
        printf("%s line %d, write seq %ld, write idx %d, read seq %ld, read idx %d.\n", __func__, __LINE__, log_next_seq, log_next_idx, log_first_seq, log_first_idx);
    }

    return 0;
}

增加消费者线程：

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>
#include <pthread.h>

#define __aligned(x) __attribute__((aligned(x)))

#define max(x, y) ({                \
    typeof(x) _max1 = (x);          \
    typeof(y) _max2 = (y);          \
    (void) (&_max1 == &_max2);      \
    _max1 > _max2 ? _max1 : _max2; })

struct printk_log {
    uint64_t ts_nsec;        /* timestamp in nanoseconds */
    uint16_t len;            /* length of entire record */
    uint16_t text_len;       /* length of text buffer */
    uint16_t dict_len;       /* length of dictionary buffer */
    uint8_t  facility;       /* syslog facility */
    uint8_t  flags:5;        /* internal record flags */
    uint8_t  level:3;        /* syslog level */
};

enum log_flags {
    LOG_NOCONS  = 1,    /* already flushed, do not print to console */
    LOG_NEWLINE = 2,    /* text ended with a newline */
    LOG_PREFIX  = 4,    /* text started with a prefix */
    LOG_CONT    = 8,    /* text is a fragment of a continuation line */
};  

#define LOG_ALIGN __alignof__(struct printk_log)
#define CONFIG_LOG_BUF_SHIFT 20
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf        = __log_buf;
static uint32_t log_buf_len = __LOG_BUF_LEN;

// index and sequence number of the first record stored in the buffer
static uint64_t log_first_seq;
static uint32_t log_first_idx;

// index and sequence number of the next record to store in the buffer
static uint64_t log_next_seq;
static uint32_t log_next_idx;

// the next printk record to read after the last 'clear' command
static uint64_t clear_seq;
static uint32_t clear_idx;

char *log_buf_addr_get(void)
{
    return log_buf;
}

uint32_t log_buf_len_get(void)
{
    return log_buf_len;
}

static char *log_text(const struct printk_log *msg)
{
    return (char *)msg + sizeof(struct printk_log);
}

static char *log_dict(const struct printk_log *msg)
{
    return (char *)msg + sizeof(struct printk_log) + msg->text_len;
}

static struct printk_log *log_from_idx(uint32_t idx)
{
    struct printk_log *msg = (struct printk_log *)(log_buf + idx);

    // A length == 0 record is the end of buffer marker. Wrap around and
    // read the message at the start of the buffer.
    if (!msg->len)
        return (struct printk_log *)log_buf;
    return msg;
}

static uint32_t log_next(uint32_t idx) 
{
    struct printk_log *msg = (struct printk_log *)(log_buf + idx);
    
    // length == 0 indicates the end of the buffer; wrap
    // A length == 0 record is the end of buffer marker. Wrap around and
    // read the message at the start of the buffer as *this* one, and
    // return the one after that.

    if (!msg->len)
    { 
        msg = (struct printk_log *)log_buf; 
        return msg->len;
    }

    return idx + msg->len;
}

// Check whether there is enough free space for the given message.
// The same values of first_idx and next_idx mean that the buffer
// is either empty or full.
// If the buffer is empty, we must respect the position of the indexes.
// They cannot be reset to the beginning of the buffer.

static int logbuf_has_space(uint32_t msg_size, bool empty)
{
    uint32_t free;

    if (log_next_idx > log_first_idx || empty)
        free = max(log_buf_len - log_next_idx, log_first_idx);
    else
        free = log_first_idx - log_next_idx;

    // We need space also for an empty header that signalizes wrapping
    // of the buffer.
    
    return free >= msg_size + sizeof(struct printk_log);
}

static int log_make_free_space(uint32_t msg_size)
{
    while (log_first_seq < log_next_seq && !logbuf_has_space(msg_size, false))
    {
        /* drop old messages until we have enough contiguous space */
        log_first_idx = log_next(log_first_idx);
        log_first_seq++; 
    }
    
    if (clear_seq < log_first_seq)
    { 
        clear_seq = log_first_seq;
        clear_idx = log_first_idx;
    }
    
    /* sequence numbers are equal, so the log buffer is empty */
    if (logbuf_has_space(msg_size, log_first_seq == log_next_seq))
        return 0;

    return -1;
}

// compute the message size including the padding bytes
static uint32_t msg_used_size(uint16_t text_len, uint16_t dict_len, uint32_t *pad_len)
{
    uint32_t size;

    size = sizeof(struct printk_log) + text_len + dict_len;
    *pad_len = (-size) & (LOG_ALIGN - 1);
    size += *pad_len;

    return size;
}

// Define how much of the log buffer we could take at maximum. The value 
// must be greater than two. Note that only half of the buffer is available
// when the index points to the middle.

#define MAX_LOG_TAKE_PART 4
static const char trunc_msg[] = "<truncated>";

static uint32_t truncate_msg(uint16_t *text_len, uint16_t *trunc_msg_len,uint16_t *dict_len, uint32_t *pad_len)
{
    // The message should not take the whole buffer. Otherwise, it might
    // get removed too soon.
    
    uint32_t max_text_len = log_buf_len / MAX_LOG_TAKE_PART;
    if (*text_len > max_text_len)
        *text_len = max_text_len;
    /* enable the warning message */
    *trunc_msg_len = strlen(trunc_msg);
    /* disable the "dict" completely */ 
    *dict_len = 0;
    /* compute the size again, count also the warning message */ 
    return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len);
}

// insert record into the buffer, discard old ones, update heads

static int log_store(int facility, int level,enum log_flags flags, uint64_t ts_nsec, const char *dict, uint16_t dict_len,const char *text, uint16_t text_len)
{
    struct printk_log *msg;
    uint32_t size, pad_len;
    uint16_t trunc_msg_len = 0;

    /* number of '\0' padding bytes to next message */
    size = msg_used_size(text_len, dict_len, &pad_len);

    if (log_make_free_space(size)) {
        /* truncate the message if it is too long for empty buffer */
        size = truncate_msg(&text_len, &trunc_msg_len,
                    &dict_len, &pad_len);
        /* survive when the log buffer is too small for trunc_msg */
        if (log_make_free_space(size))
            return 0;
    }

    if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) {
        /*
         * This message + an additional empty header does not fit
         * at the end of the buffer. Add an empty header with len == 0
         * to signify a wrap around.
         */
        memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
        log_next_idx = 0;
    }

    /* fill message */
    msg = (struct printk_log *)(log_buf + log_next_idx);
    memcpy(log_text(msg), text, text_len);
    msg->text_len = text_len;
    if (trunc_msg_len) {
        memcpy(log_text(msg) + text_len, trunc_msg, trunc_msg_len);
        msg->text_len += trunc_msg_len;
    }
    memcpy(log_dict(msg), dict, dict_len);
    msg->dict_len = dict_len;
    msg->facility = facility;
    msg->level = level & 7;
    msg->flags = flags & 0x1f;
    if (ts_nsec > 0)
        msg->ts_nsec = ts_nsec;
    else
        msg->ts_nsec = 0;
    memset(log_dict(msg) + dict_len, 0, pad_len);
    msg->len = size;

    /* insert message, update pointer at last, lockless must.*/
    log_next_idx += msg->len;
    log_next_seq++;

    return msg->text_len;
}


static uint64_t console_seq;
static uint32_t console_idx;

void *log_output_msg(void *param)
{
    struct printk_log *msg;   
    for( ; ; )
    {
        if (console_seq == log_next_seq)
        {
            continue;
        }
        if (console_seq < log_first_seq)
        {
            console_seq = log_first_seq;
            console_idx = log_first_idx;
        }

        msg = log_from_idx(console_idx);
        console_idx = log_next(console_idx);
        console_seq++;
        printf("\033[31m""%s line %d, payload %s""\x1b[37m""\n", __func__, __LINE__, log_text(msg));
    }

    return NULL;
}

int main(void)
{
    int ret;
    pthread_t pthread_read;

    char *str = "hello world.";
    int len   = strlen(str);

    int err = pthread_create(&pthread_read, NULL, log_output_msg, NULL);
    if(err != 0)
    {
        perror("create pthread failure.");
        return -1;
    }

    while(1)
    {
        log_store(0, 0, 0, 0, NULL, 0, str, len);
        printf("%s line %d, write seq %ld, write idx %d, read seq %ld, read idx %d.\n", __func__, __LINE__, log_next_seq, log_next_idx, log_first_seq, log_first_idx);
    }
    
    pthread_join(pthread_read, NULL);
    return 0;
}

syslog日志文件 kern.log和dmesg的不同

/var/log/kern.log及其轮换的日志(/var/log/kern.log.1/var/log/kern.log.2 ....)包含由内核生成并由syslog处理的日志.

dmesg 是用于检查或控制内核环形缓冲区，实际上，它将显示自上次引导以来/var/log/kern.log的最后16392个八位字节(__log_buf大小).

如何刷新kern.log文件？

如果kern.log文件过大，记录的日志过多，查看会非常不方便，这时，可以手动将kern.log重命名为一个新文件，然后执行：

sudo systemctl restart rsyslog

重启syslog服务，之后如果有新的dmesg日志输出，则会自动创建新的kern.log文件了。

查看压缩日志文件

如果日志过长，将会分多个文件保存，除了当前在写入的LOG文件kern.log之外，已经写完的日志文件会以kern.log.1.gz, kern.log.2.gz。。。等形式保存为多个压缩文件。如下图所示：

查看压缩log文件的方法有两种，第一种，也是最容易想到的，是首先用gunzip解压缩成普通LOG文本文件，再查看，或者使用 gunzip -c命令查看：

第二种方式是使用zcat工具,zcat 工具基本上等价于gunzip -c命令。

还可以通过 journalctl -k查看内核历史日志。

printk日志丢失

printk可能会丢失日志信息而不会给出任何提示，这一点要特别注意，如下图所示，程序中连续的日志计数，在显示成功打印出来的日志中并不连续。

如何扩大dmesg log_buf_len?

方法非常简单，修改/etc/default/grub文件，添加一条启动参数log_buf_len=8M，之后执行 sudo update-grub更新启动文件系统，之后重启系统即可。

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=off log_buf_len=8M"

对于无法增加log_buf的情况，可以查看/var/log/kern.log日志文件，它包含所有的LOG记录。

指针打印需要注意的事项

ifneq ($(KERNELRELEASE),)
obj-m:=pointer.o
else
KERNELDIR:=/lib/modules/$(shell uname -r)/build
PWD:=$(shell pwd)
all:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
 
clean:
	rm -rf *.o *.mod.c *.mod.o *.ko *.symvers *.mod .*.cmd *.order
endif

#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/stat.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/seq_file.h>
#include <linux/sched/signal.h>
#include <linux/proc_fs.h>
#include <linux/pid.h>
#include <linux/pci.h>
#include <linux/usb.h>
#include <linux/kobject.h>
#include <linux/sched/mm.h>
#include <linux/platform_device.h>
#include <linux/netdevice.h>
#include <net/net_namespace.h>
#include <linux/sched/debug.h>
 
static int ptr_init(void)
{
    void *ptr;
    printk("%s line %d.\n", __func__, __LINE__);
    ptr = kmalloc(GFP_KERNEL, 0x1000);
    printk("%s line %d, ptr = 0x%p.\n", __func__, __LINE__, ptr);
    printk("%s line %d, ptr = 0x%pK.\n", __func__, __LINE__, ptr);
    printk("%s line %d, ptr = 0x%px.\n", __func__, __LINE__, ptr);
    printk("%s line %d, ptr = 0x%lx.\n", __func__, __LINE__, (unsigned long)ptr);
    printk("%s line %d, ptr = %pS.\n", __func__, __LINE__, ptr);
    kfree(ptr);
    return 0;
}
 
static void ptr_exit(void)
{
    printk("%s line %d, exit.\n", __func__, __LINE__);
    return;
}
 
module_init(ptr_init);
module_exit(ptr_exit);

MODULE_AUTHOR("zlcao");
MODULE_LICENSE("GPL");

观察打印输出，可以看到%p无法正常打印出指针的合理值。

可以参考linux-5.15.90/lib/vsprintf.c的说明，内核为了安全，避免普通用户获取内核的地址范围，随机化了%p的输出

fmt parameter process flow:

printk->vprintk_func->vprintk_default->vprintk_emit->vprintk_store->vscnprintf->vsnprintf->format_decode(fmt, &spec);->pointer(fmt, str,...)->pointer_string.

if kptr_restrict was set, then use the strict theogy, output the true pointer value,else will output the hashvalue of the true value use ptr_to_id(buf, end, ptr, spec);

another way to disable this is:

echo 0 > /proc/sys/kernel/kptr_restrict

/proc/kallsyms also controlled by kptr_restrict.

you will see all zero if kptr_restrict was 2.

连续打印的方法：

通常用printk打印，无论字符串末尾加不加换行，最后DMESG输出都会换行。

如何仅接着上一行连续打印输出呢?方法如下：

核心代码，一次log_store代表一次刷新，如果当前不是CONT，则会刷调上一笔CONT的数据，调用的是cont_flush接口。如果没有CONT数据，则将当前数据缓存起来作为CONT，等待被下一笔刷调或者接续，如果下一笔是CONT的话。

这种设计下，只有KERN_CONT有“续”上一笔输出的语义，而普通的字符串打印，在末尾没有换行符的情况下，表示愿意作一个头，供结下来的字符输出接序。

log level控制

虽然/proc/sys/kernel/printk可以查看和设置printk的log level,但如同上面对DMESG的分析，这个设置对dmesg log的输出不起作用。printk首先会输出到__log_buf中，其次才会输出到各个前端LOG，比如dmesg, syslog, console等等，所以，输出到log_buf和输出到console是两回事，loglevel仅仅对console起作用（suppress_message_printing等级监测函数在console_unlock中调用）.

所以，loglevel控制不了dmesg的输出,dmesg不是控制台。而对于console的输出，实际上有两条调用路径：

路径1：

printk->vprintk_func->vprintk_default->vprintk_emit->console_unlock:

这种调用路径仅仅发生在打印上下文为非LOGLEVEL_SCHED上下文的情况下，也就是使用printk_deferred打印的环境下，使用这个打印函数的上下文一般是在调度器中，为了保证调度更加安全。

而对于printk_deferred打印的环境,vprintk_emit仅仅将LOG保存到__log_buf中（vprintk_store），之后在一个worker工作队列中调用console_unlock输出，也就是路径2:

路径2:worker->console_callback->console_unlock.

trigger work: schedule_console_callback

所以可以看到，在嵌入式系统中可以用调整LOG LEVEL的方式控制打印到串口的LOG等级，但是无法控制DMESG， SYSLOG的日至等级，无论用什么等级的printk打印，dmesg 都是可以看到的。所以，__log_buf可以看到内核日志系统的后端，包含所有日志，DMESG，CONSOLE，SYSLOG等等是日志系统的前端，各自维护各自的读指针，面向不同业务，由不同的逻辑控制。

printk lock hand over

为了保证互斥操作(exclusive operation),printk实现中会获取一把锁console_sem，它是一个semaphore，但是前面又提到，printk可以在任何上下文中使用，包括中断上下文，众所周知，中断上下文不允许休眠，怎么可以调用有可能会引发休眠的semaphore呢？

原来，当vprintk_emit发现当前上下文是可以打印到控制台，并且存在待打印的有效记录后，会立刻调用console_trylock_spinning获取console_sem，如果成功获取，console_trylock_spinning返回1，才会进入console_unlock调用控制台的write函数进行真正的打印.并且在整个过程中，抢占是被关闭的。实现的关键是console_trylock_spinning函数，分析它的实现，发现最终获取锁的调用并不是down,而是down_trylock,或者尝试测试以下是否能够获取锁，如果能，则尝试获取，如果获取不到，则返回失败，这种行为不会造成上下文休眠，获取不到也没关系，毕竟LOG已经记录到了__log_buf中，内核中有其他机制可以推迟__log_buf中内容的输出。

console_trylock_spinning尽量尝试成功获取，所以如果第一把尝试没有获取到，并不会直接退出，而是进入循环等待，毕竟在关调度，拿锁等层层保护下，每个上下文都得尽快完成任务，好把锁交出去给其他上下文用。

在占用线程释放之前，等待线程执行cpu_relax循环等待资源的释放：

但是这里仍然有一个疑惑，console_trylock_spinning返回1表示成功拿到锁，但是在循环等待一直到函数返回为1的点之间，并未看到有进行拿锁的操作，这是怎么回事呢？
原来，console_unlock和console_trylock_spinning实现之间，有一个小的trick,避免了让打印上下文释放锁再由等待上下文去尝试拿锁的操作，具体做法是，两个上下文之间通过console_waiter进行同步：

首先，等待端拿不到锁，会设置console_waiter 为true，并且，根据逻辑，同一时刻，只允许有一个上下文因为拿不到打印锁而等待，等待方在设置console_waiter为TRUE后，会阻止其它上下文将spin设置为TRUE，整个逻辑被自旋锁console_owner_lock保护，保证逻辑的原子性，所以最多只有一个调用printk的上下文进入等待状态。

当占用锁的上下文打印完成后，console_unlock->console_lock_spinning_disable_and_check中，会设置console_waiter为FALSE，并且导致console_unlock 不释放拿到的锁直接返回，这样，就相当于直接将自己拿到的锁hand over给等待线程。等待线程看到console_waiter为FALSE后，就不需要在去拿锁，因为此时，锁已经是它的了，如果这次又有新的打印上下文进来，则新任拿锁的线程也可以照葫芦画瓢，在打印执行完后将锁handover给下一个进程，所以，理想情况下，这个打印锁只被获取一次，之后就可以被无数个上下文使用了。

但是能够这样做需要有一个前提，就是这个锁不能是MUTEX，这里用的是SEMOPHORE，这是因为，semephore和mutex最大的不同，就是mutex有属主属性，而semephore没有属主属性，后者仅仅关系资源有无，至于资源给了谁，semephore并不关心，而mutex会记录谁持有这个资源。所以，这里用semephore，目的也是为了实现这种线程之间实现semophore hand over的操作，如果换做mutex,则拥有方和等待方就必须要做一个释放和获取的操作了。

等待方在console_trylock_spinning退出后，执行console_unlock->console_lock_spinning_enable，更新console_owner为current.

dmesg -C clear 操作原理

dmesg -C命令会清空dmesg显示，参考util-linux/sys-utils/dmesg.c 的实现，其原理是调用了syslog 控制命令SYSLOG_ACTION_CLEAR :

klogctl是LIBC实现：

对应到内核中,将clear_seq, clear_idx移动到最新的下一次的写入位置：

之后下次执行dmesg命令，会调用SEEK_DATA命令利用clear_seq, clear_idx初始化读取位置的上下文：

如何记录用户态LOG？

编辑文件/etc/rsyslog.d/50-default.conf，曲调15行注释：

user.*                          -/var/log/user.log

保存退出，之后执行service rsyslog restart重启LOG服务：

此时再次向/dev/kmsg节点输入数据，即可在user.log中获取输入的内容，说明用户态日志被正确LOG。

参考文章

https://www.cnblogs.com/hellokitty2/p/16468019.html

pstore_lzhf1122的博客-CSDN博客

Linux内核panic核心执行逻辑_papaofdoudou的博客-CSDN博客

结束！

papaofdoudou

关注

9
点赞
踩
9

收藏

觉得还不错? 一键收藏
打赏
0
评论
dmesg&printk的工作原理

内核调试者对DMESG应该不会感到陌生，DMESG可以将内核打印信息输出到控制台，从而让我们知道内核发生了什么。一句话不足以概括其中的工作细节，那么，DMESG是如何发挥作用的呢，这里我们就探究一下。在UBUNTU18.04上，DMESG是以ELF可执行文件的形式提供功能的。我们用strace跟踪一下dmesg的工作过程，输入输出如下：可以看到，DMESG工作依赖于文件句柄为3的文件，而这个文件又是打开/dev/kmsg获得的，所以核心点在于/dev/kmsg驱动的实现。我们同时跟踪read/writ
复制链接

扫一扫