How QEMU Computes the Dirty Page Rate by Sampling

享乐主

Category: Memory Virtualization

Article link: https://blog.csdn.net/huang987246510/article/details/114037013

Contents

Basic Principle
Interface Implementation

Basic Principle

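The dirty page rate is computed by sampling estimation. Within the memory that QEMU allocates to the VM, pages are picked at random from each memory range; for each sampled page QEMU computes a CRC (cyclic redundancy check) value and keeps it in memory. After a period of time it recomputes the CRC of the same pages and compares the result with the saved value; a page whose CRC has changed is dirty. Counting the dirty pages among all sampled pages gives the ratio of dirty samples to total samples, and that ratio is used to extrapolate the ratio of dirty memory to total memory for the whole VM. Sampling introduces sampling error, but the estimate is good enough to assess the dirty page rate overall. In theory, the more sample points per unit of memory and the shorter the sampling interval, the closer the estimate is to the true dirty page rate. The figure below illustrates the sampling scheme: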
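As the figure shows, the qemu process allocates the VM's memory and describes it with RAMBlock structures, drawn as RAMBlock 0, RAMBlock 1 and RAMBlock N. The dirty rate calculation takes two parameters: the interval after which memory is checked for changes (sample_period_seconds), and the number of pages sampled per GB of memory (sample_pages_per_gigabytes). In the figure, 3 pages are sampled per GB (the default is 512). The dirty page rate is computed in the following steps: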

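Sampling

RAMBlock 0 uses 1 GB of memory, so sample_pages_per_gigabytes yields 3 sample points; RAMBlock 1 uses 2 GB, which yields 6. Once the number of sample points is known, for each sample point we pick a random offset within the RAMBlock's memory range, compute the CRC of the one memory page starting at that offset, and save both the CRC and the offset.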
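
The sample point counts quoted above (3 for 1 GB, 6 for 2 GB) can be reproduced with a small helper. This is only a minimal sketch: the function name sample_pages_for_block is hypothetical, and the rounding behaviour (truncating toward zero) is an assumption rather than something the text specifies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of sample points for a block that uses
 * used_bytes of memory, given a sampling density in pages per GiB.
 */
static uint64_t sample_pages_for_block(uint64_t used_bytes,
                                       uint64_t pages_per_gb)
{
    assert(pages_per_gb > 0);
    /* Multiply first, then divide by 1 GiB (shift by 30) so that
     * blocks smaller than 1 GiB still get a proportional share. */
    return (used_bytes * pages_per_gb) >> 30;
}
```

With a density of 3, a 1 GiB block gets 3 sample points and a 2 GiB block gets 6; with the default density of 512, a 1 GiB block gets 512.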

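Calculation

After sample_period_seconds has elapsed, we take the saved offsets and recompute the CRC of the page at each offset, then compare it with the saved value. If the values are equal, the page has not changed (untouched page); otherwise it is dirty (dirty page) and is counted. Once the CRCs of all sampled pages have been recomputed, we know the number of dirty pages among the samples and hence the ratio of dirty pages to total sampled pages. Multiplying this ratio by the total memory size used by the VM gives the estimated amount of memory dirtied during sample_period_seconds; dividing that by sample_period_seconds gives the amount dirtied per second, which is the dirty page rate we want. The formula is: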

Total sample pages count = sample page count of RAMBlock0(3) +
                           sample page count of RAMBlock1(6) +
                           sample page count of RAMBlockN

Total sample dirty count = sample dirty count of RAMBlock0(2) +
                           sample dirty count of RAMBlock1(1) +
                           sample dirty count of RAMBlockN

dirty rate = Total size of vm * (Total sample dirty count / Total sample pages count) / config.sample_period_seconds
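
To make the arithmetic concrete, here is a minimal sketch of the estimate as a C function. The name estimate_dirty_rate_mbps and its parameters are hypothetical, and the period is taken in milliseconds purely so that integer division loses less precision; none of this is claimed to be QEMU's own code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimated dirty rate in MB/s.
 * rate = total memory * (dirty samples / total samples) / period
 */
static uint64_t estimate_dirty_rate_mbps(uint64_t total_mem_mb,
                                         uint64_t dirty_samples,
                                         uint64_t total_samples,
                                         uint64_t period_ms)
{
    assert(total_samples > 0 && period_ms > 0);
    /* Do all multiplications before the single division to keep
     * the truncation error of integer arithmetic small. */
    return dirty_samples * total_mem_mb * 1000 /
           (total_samples * period_ms);
}
```

Taking only RAMBlock 0 and RAMBlock 1 from the figure (3 GB = 3072 MB in total, 9 samples, 3 of them dirty) and assuming a 1 second period, the estimate is 3072 * 3 / 9 / 1 = 1024 MB/s.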

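Interface Implementation

Dirty rate calculation is exposed through two QMP commands: one starts the calculation and the other queries the result. They are used as follows: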

1. Start a dirty rate calculation with calc-dirty-rate
virsh qemu-monitor-command {vmname} '{"execute":"calc-dirty-rate", "arguments": {"calc-time": {sleeptime}}}'
sleeptime is given in seconds

2. Query the result with query-dirty-rate
virsh qemu-monitor-command {vmname} '{"execute":"query-dirty-rate"}'

The data structures and the workflow behind these commands are described below:

Data Structures

DirtyRateStatus

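QEMU defines three states for the dirty rate calculation:

unstarted: not started; the main thread has not launched the sampling thread yet
measuring: measurement in progress; this state lasts from the moment the main thread launches the sampling thread until the dirty rate has been computed
measured: measurement finished; the sampling thread has computed the dirty rate. The query command returns a valid rate only in this state; in any other state the result carries no rate value.
These three states are defined in migration.json; building QEMU generates the matching enum, whose values are DIRTY_RATE_STATUS_UNSTARTED, DIRTY_RATE_STATUS_MEASURING and DIRTY_RATE_STATUS_MEASURED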

##
# @DirtyRateStatus:
#
# An enumeration of dirtyrate status.
#
# @unstarted: the dirtyrate thread has not been started.
#
# @measuring: the dirtyrate thread is measuring.
#
# @measured: the dirtyrate thread has measured and results are available.
#
# Since: 5.2
#
##
{ 'enum': 'DirtyRateStatus',
  'data': [ 'unstarted', 'measuring', 'measured'] }

DirtyRateConfig

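DirtyRateConfig holds the calculation parameters passed in by the user command. It records two things: the number of samples per unit of memory, and the sampling period (the interval between the two CRC passes). In theory, the more samples and the shorter the period, the more accurately the dirty page rate is described.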

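struct DirtyRateConfig {
    /* Number of sample points per GB of memory; the CRC of one memory page is computed per sample point */
    uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
    /* Interval between the two passes: the first computes the CRCs, the second recomputes them to detect dirty pages */
    int64_t sample_period_seconds; /* time duration between two sampling */
};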

RamblockDirtyInfo

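Sampling walks every eligible RAMBlock, picks a number of random sample points in it, and computes a checksum for each sample point; one sample point corresponds to one memory page. RamblockDirtyInfo records, for one RAMBlock, its used memory size, number of samples, number of dirty pages, the page offsets chosen at random in the first pass, and the page checksums.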

/*
 * Store dirtypage info for each ramblock.
 */
struct RamblockDirtyInfo {
    /* idstr of the RAMBlock */
    char idstr[RAMBLOCK_INFO_MAX_LEN]; /* idstr for each ramblock */
    /* Start address on the host of the memory described by the RAMBlock, i.e. the HVA */
    uint8_t *ramblock_addr; /* base address of ramblock we measure */
    /* Used size of the memory range managed by the RAMBlock, in pages */
    uint64_t ramblock_pages; /* ramblock size in TARGET_PAGE_SIZE */
    /* Offsets of the randomly chosen sample points within the RAMBlock's memory range */
    uint64_t *sample_page_vfn; /* relative offset address for sampled page */
    /* Number of sample points in this RAMBlock */
    uint64_t sample_pages_count; /* count of sampled pages */
    /* Number of dirty pages found in this RAMBlock during the second pass */
    uint64_t sample_dirty_count; /* count of dirty pages we measure */
    /* Array of checksums with sample_pages_count elements */
    uint32_t *hash_result; /* array of hash result for sampled pages */
};

DirtyRateStat

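For each RAMBlock QEMU records the memory size, sample count, dirty page count and so on. From these it derives the total number of samples and the total number of dirty samples across the VM's memory, and from those the dirty page rate. The totals, the resulting dirty page rate, and the start time and duration of the measurement are all stored in the DirtyRateStat structure: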

/*
 * Store calculation statistics for each measure.
 */
struct DirtyRateStat {
    /* Dirty pages counted across all RAMBlocks in the second pass */
    uint64_t total_dirty_samples; /* total dirty sampled page */
    /* Number of samples across all RAMBlocks */
    uint64_t total_sample_count; /* total sampled pages */
    /* Total used memory of all RAMBlocks */
    uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
    /* The resulting dirty page rate */
    int64_t dirty_rate; /* dirty rate in MB/s */
    /* Start time of the first pass */
    int64_t start_time; /* calculation start time in units of second */
    /* Interval between the two passes */
    int64_t calc_time; /* time duration of two sampling in units of second */
};

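Sampling Implementation

We first look at the functions that implement the two sampling passes, record_ramblock_hash_info and compare_page_hash_info. If they are hard to follow, read the workflow section further down first and then come back to them. The listings are abridged; local variable declarations and error handling are left out: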

record_ramblock_hash_info

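Sampling works on one RAMBlock at a time, so the function outputs a pointer to an array of RamblockDirtyInfo, which carries the dirty page information and other metadata of each RAMBlock. The first pass walks all RAMBlocks of the VM, computes the checksums and stores the results in the RamblockDirtyInfo array: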

static bool record_ramblock_hash_info(struct RamblockDirtyInfo **block_dinfo,
                                      struct DirtyRateConfig config,
                                      int *block_count)
{
    /* A sampled RAMBlock must not be too small; RAMBlocks below 128M are skipped */
    RAMBLOCK_FOREACH_MIGRATABLE(block) {
        if (skip_sample_ramblock(block)) {
            continue;
        }
        /* Count the RAMBlocks of the VM that qualify for sampling */
        total_count++;
    }
    /* Now that the number of RAMBlocks to sample is known, allocate the array */
    dinfo = g_try_malloc0_n(total_count, sizeof(struct RamblockDirtyInfo));
    /* Walk every RAMBlock: fetch the metadata needed for sampling, then compute and save the checksums */
    RAMBLOCK_FOREACH_MIGRATABLE(block) {
        /* Skip RAMBlocks below 128M */
        if (skip_sample_ramblock(block)) {
            continue;
        }
        if (index >= total_count) {
            break;
        }
        info = &dinfo[index];
        /* Fetch the sampling metadata of one RAMBlock and save it in RamblockDirtyInfo:
         * 1. number of sample points
         * 2. used memory size of the RAMBlock
         * 3. host virtual start address (HVA) of the RAMBlock
         * 4. id of the RAMBlock
         * */
        get_ramblock_dirty_info(block, info, &config);
        /* Compute the checksums and save them in RamblockDirtyInfo */
        save_ramblock_hash(info);
        index++;
    }
}

Next, save_ramblock_hash, which picks the sample points and stores their checksums; this is the core of the sampling:

static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
{
    unsigned int sample_pages_count;
    GRand *rand;
    /* Number of sample points required for this RAMBlock */
    sample_pages_count = info->sample_pages_count;
    /* Allocate the checksum array: one checksum per sample point */
    info->hash_result = g_try_malloc0_n(sample_pages_count,
                                        sizeof(uint32_t));
    /* Allocate the array of sample offsets, again one entry per sample point */
    info->sample_page_vfn = g_try_malloc0_n(sample_pages_count,
                                            sizeof(uint64_t));
    /* Create a GLib random number generator; g_rand_new() seeds it automatically.
     * Every g_rand_int_range() call advances the generator state, so successive
     * calls yield different random numbers
     * */
    rand  = g_rand_new();
    for (i = 0; i < sample_pages_count; i++) {
        /* Draw a random page index and save it in sample_page_vfn[].
         * g_rand_int_range() returns a value in the half-open range [begin, end),
         * so the index falls in [0, ramblock_pages - 1) of the RAMBlock's used pages
         * */
        info->sample_page_vfn[i] = g_rand_int_range(rand, 0,
                                                    info->ramblock_pages - 1);
        /* Use the random number as the page offset, compute the checksum of that page and save it in hash_result[] */
        info->hash_result[i] = get_ramblock_vfn_hash(info,
                                                     info->sample_page_vfn[i]);
    }
}

The checksum itself is computed by get_ramblock_vfn_hash:

static uint32_t get_ramblock_vfn_hash(struct RamblockDirtyInfo *info,
                                      uint64_t vfn)
{
    uint32_t crc;
    /* Use crc32 to compute the checksum of one memory page. The page starts at:
     * info->ramblock_addr + vfn * TARGET_PAGE_SIZE
     * and is one page long: TARGET_PAGE_SIZE
     * */
    crc = crc32(0, (info->ramblock_addr +
                vfn * TARGET_PAGE_SIZE), TARGET_PAGE_SIZE);
    return crc;
}
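
To see why a checksum is enough to detect a modified page, here is a self-contained sketch that hashes one page of a buffer. It uses a plain bitwise CRC-32 (reflected polynomial 0xEDB88320) instead of linking against zlib; crc32_bitwise, page_hash and the 4 KiB SKETCH_PAGE_SIZE are names and values assumed for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Bitwise CRC-32: slow but dependency-free */
static uint32_t crc32_bitwise(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the lowest bit of crc is set */
            uint32_t mask = -(crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

/* Checksum of the page with index vfn, counted from base */
static uint32_t page_hash(const uint8_t *base, uint64_t vfn)
{
    assert(base != NULL);
    return crc32_bitwise(base + vfn * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
}
```

Flipping a single bit inside page 1 changes page_hash(mem, 1) while page_hash(mem, 0) stays the same, which is exactly the property the second sampling pass relies on.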

compare_page_hash_info

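After the first pass, the samples and their metadata are stored in the RamblockDirtyInfo array. After sleeping for the configured period, the second pass uses that array to locate each RAMBlock and the offsets of its sample points, and recomputes the checksums; every mismatch increments the dirty page count, and the counts of all RAMBlocks are summed up at the end. compare_page_hash_info takes the RamblockDirtyInfo array from the first pass, matches each of the VM's RAMBlocks against it, recomputes the checksums of the matched block, and adds to the global dirty count wherever a checksum differs from the recorded one: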

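static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
                                  int block_count)
{
    /* Walk all RAMBlocks, find the matching record, compute the checksums, update the global statistics */
    RAMBLOCK_FOREACH_MIGRATABLE(block) {
        /* Use the metadata recorded in the first pass to find the record for this RAMBlock */
        block_dinfo = find_block_matched(block, block_count, info);
        /* No matching record: skip this RAMBlock */
        if (block_dinfo == NULL) {
            continue;
        }
        /* Recompute the checksums and count the dirty pages */
        calc_page_dirty_rate(block_dinfo);
        /* Update the global statistics used later to compute the rate */
        update_dirtyrate_stat(block_dinfo);
    }
}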

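To find the record that belongs to a RAMBlock from the first pass, three things are compared: the idstr, the host virtual start address of the RAMBlock, and the RAMBlock size. The procedure is: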

static struct RamblockDirtyInfo *
find_block_matched(RAMBlock *block, int count,
                  struct RamblockDirtyInfo *infos)
{
    /* First find the record with the same idstr */
    for (i = 0; i < count; i++) {
        if (!strcmp(infos[i].idstr, qemu_ram_get_idstr(block))) {
            break;
        }
    }
    /* No record with the same idstr: return NULL, meaning there is no match */
    if (i == count) {
        return NULL;
    }
    /* Check that the host virtual start address is the same, and if so
     * also compare the used memory size used_length.
     * Normally only a VM restart or a migration can change used_length,
     * so if the used size of the RAMBlock has changed
     * the RAMBlock is likewise treated as not matched
     * */
    if (infos[i].ramblock_addr != qemu_ram_get_host_addr(block) ||
        infos[i].ramblock_pages !=
            (qemu_ram_get_used_length(block) >> TARGET_PAGE_BITS)) {
        return NULL;
    }
    matched = &infos[i];
    return matched;
}
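
The matching rule (same id, same base address, same size) can be tried out in isolation with simplified types. In the sketch below, live_block, sampled_block and match_block are hypothetical stand-ins for RAMBlock, RamblockDirtyInfo and find_block_matched, and the block names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified view of a live memory block (hypothetical) */
struct live_block {
    const char *idstr;
    uint8_t *host_addr;
    uint64_t used_pages;
};

/* Simplified first-pass record (hypothetical) */
struct sampled_block {
    char idstr[64];
    uint8_t *addr;
    uint64_t pages;
};

/* Return the record for blk, or NULL if none matches */
static struct sampled_block *match_block(const struct live_block *blk,
                                         struct sampled_block *infos,
                                         int count)
{
    assert(blk != NULL && infos != NULL);
    for (int i = 0; i < count; i++) {
        if (strcmp(infos[i].idstr, blk->idstr) != 0) {
            continue;
        }
        /* Same id but moved or resized: the old samples are stale */
        if (infos[i].addr != blk->host_addr ||
            infos[i].pages != blk->used_pages) {
            return NULL;
        }
        return &infos[i];
    }
    return NULL;
}
```

A block whose size changed between the two passes is rejected even though its id still matches, so its stale offsets are never dereferenced.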

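Once the matching RAMBlock is found, the checksums are recomputed and compared with the values saved in the first pass; each mismatch increments the dirty page count: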

static void calc_page_dirty_rate(struct RamblockDirtyInfo *info)
{
    uint32_t crc;
    int i;
    /* Recompute the checksum of every sample point and compare it with the first-pass result; count a dirty page on mismatch */
    for (i = 0; i < info->sample_pages_count; i++) {
        crc = get_ramblock_vfn_hash(info, info->sample_page_vfn[i]);
        if (crc != info->hash_result[i]) {
            trace_calc_page_dirty_rate(info->idstr, crc, info->hash_result[i]);
            info->sample_dirty_count++;
        }
    }
}
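
Both passes can be put together in a small simulation on an ordinary buffer. This is a sketch under stated assumptions: rand() replaces the GLib generator, a 32-bit FNV-1a hash stands in for crc32 to keep the block short, and sample_set, record_samples and count_dirty are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE   4096u /* assumed page size */
#define SKETCH_MAX_SAMPLES 64u

struct sample_set {
    uint64_t vfn[SKETCH_MAX_SAMPLES];  /* sampled page indexes */
    uint32_t hash[SKETCH_MAX_SAMPLES]; /* first-pass checksums */
    unsigned count;
};

/* FNV-1a over one page; a stand-in for crc32 in this sketch */
static uint32_t page_hash(const uint8_t *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        h = (h ^ page[i]) * 16777619u;
    }
    return h;
}

/* First pass: pick random pages and remember their checksums */
static void record_samples(struct sample_set *s, const uint8_t *mem,
                           uint64_t pages, unsigned count)
{
    assert(pages > 0 && count <= SKETCH_MAX_SAMPLES);
    s->count = count;
    for (unsigned i = 0; i < count; i++) {
        s->vfn[i] = (uint64_t)rand() % pages;
        s->hash[i] = page_hash(mem + s->vfn[i] * SKETCH_PAGE_SIZE);
    }
}

/* Second pass: rehash the same pages and count the mismatches */
static unsigned count_dirty(const struct sample_set *s, const uint8_t *mem)
{
    unsigned dirty = 0;
    for (unsigned i = 0; i < s->count; i++) {
        if (page_hash(mem + s->vfn[i] * SKETCH_PAGE_SIZE) != s->hash[i]) {
            dirty++;
        }
    }
    return dirty;
}
```

If nothing is written between the passes, count_dirty returns 0; if every page is modified, it returns the full sample count. Anything in between depends on which pages happened to be sampled, which is the sampling error mentioned at the start of the article.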

Workflow Implementation

qmp_calc_dirty_rate

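qmp_calc_dirty_rate implements the dirty rate calculation. It has two jobs: first, pick random memory pages and compute their CRCs; second, wait for a while, recompute the CRCs at the sample points, count the share of dirty pages, and compute the dirty page rate. The command creates a dedicated sampling thread, get_dirtyrate, to do this work: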

void qmp_calc_dirty_rate(int64_t calc_time, Error **errp)
{
    /*
     * Init calculation state as unstarted.
     */
    dirtyrate_set_state(&CalculatingState, CalculatingState,
                        DIRTY_RATE_STATUS_UNSTARTED);
    /* Set the sampling period and the number of sample points per GB; the sampling thread computes the rate from this config */
    config.sample_period_seconds = calc_time;
    config.sample_pages_per_gigabytes = DIRTYRATE_DEFAULT_SAMPLE_PAGES;
    /* Start the sampling thread */
    qemu_thread_create(&thread, "get_dirtyrate", get_dirtyrate_thread,
                       (void *)&config, QEMU_THREAD_DETACHED);
}

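The sampling thread first sets the state to measuring, records the start time and the period, and then computes the dirty page rate; when the calculation is done it sets the state to measured: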

void *get_dirtyrate_thread(void *arg)
{
    /* Fetch the sampling config */
    struct DirtyRateConfig config = *(struct DirtyRateConfig *)arg;
    /* Set the state to measuring */
    dirtyrate_set_state(&CalculatingState, DIRTY_RATE_STATUS_UNSTARTED,
                              DIRTY_RATE_STATUS_MEASURING);
    /* Record the start time and the sampling period */
    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
    calc_time = config.sample_period_seconds;
    init_dirtyrate_stat(start_time, calc_time);
    /* Compute the dirty page rate */
    calculate_dirtyrate(config);
    /* Once the rate has been computed, set the state to measured */
    dirtyrate_set_state(&CalculatingState, DIRTY_RATE_STATUS_MEASURING,
                              DIRTY_RATE_STATUS_MEASURED);
}

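Computing the dirty page rate mainly consists of the two sampling passes. The first pass picks random sample points in each RAMBlock's memory range and computes the page checksums; the second pass recomputes the checksums, compares them with the first-pass values and counts the dirty pages; finally the dirty page rate is computed: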

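static void calculate_dirtyrate(struct DirtyRateConfig config)
{
    /* First pass: pick random sample points and compute their checksums */
    record_ramblock_hash_info(&block_dinfo, config, &block_count);
    /* Apply the sampling period: if it has not elapsed yet, sleep until it has */
    set_sample_page_period(msec, initial_time);
    /* Second pass: recompute the checksums and count the dirty pages among the samples */
    compare_page_hash_info(block_dinfo, block_count);
    /* Compute the dirty page rate and store it in the global DirtyRateStat variable */
    update_dirtyrate(msec);
}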

qmp_query_dirty_rate

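The query command is very simple. It checks whether the sampling thread has reached the measured state; only then is the dirty page rate read and returned to the caller. In any other state the rate is not read: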

static struct DirtyRateInfo *query_dirty_rate_info(void)
{
    int64_t dirty_rate = DirtyStat.dirty_rate;
    struct DirtyRateInfo *info = g_malloc0(sizeof(DirtyRateInfo));
    /* In the measured state, read the dirty page rate and store it in the returned info */
    if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
        info->has_dirty_rate = true;
        info->dirty_rate = dirty_rate;
    }
    /* Fill in the remaining fields of the result */
    info->status = CalculatingState;
    info->start_time = DirtyStat.start_time;
    info->calc_time = DirtyStat.calc_time;

    return info;
}
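
The gating on the measured state can be shown with simplified types. In this sketch, sketch_info and build_info are hypothetical stand-ins for DirtyRateInfo and query_dirty_rate_info, and the status values are local to the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_status {
    SKETCH_UNSTARTED,
    SKETCH_MEASURING,
    SKETCH_MEASURED
};

/* Simplified query result (hypothetical) */
struct sketch_info {
    int status;
    bool has_dirty_rate; /* true only when dirty_rate is valid */
    int64_t dirty_rate;  /* MB/s */
};

/* Build a result; the rate is exposed only in the measured state */
static struct sketch_info build_info(int status, int64_t rate)
{
    assert(status >= SKETCH_UNSTARTED && status <= SKETCH_MEASURED);
    struct sketch_info info = {
        .status = status,
        .has_dirty_rate = false,
        .dirty_rate = 0,
    };
    if (status == SKETCH_MEASURED) {
        info.has_dirty_rate = true;
        info.dirty_rate = rate;
    }
    return info;
}
```

A caller should therefore check has_dirty_rate (or the status) before using dirty_rate, since a query issued while the measurement is still running carries no rate.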