Can any tool really dump all the keys in memcached?

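Unlike Redis, memcached has no command like keys * for listing every key, and it does not persist data either. Dumping all of its keys therefore takes a combination of extra commands or tools, and even those don't always do the job. Below is my analysis of the approaches I found recently.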

I. Combining commands

I found the following tool on GitHub: https://github.com/gnomeby/memcached-itool

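Its dumpkeys command can dump memcached's keys. Looking at how it is implemented, it is also built on a combination of commands such as stats items and stats cachedump. Below I describe how this approach works and why.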

1. How it works

First you need to understand how memcached manages memory: the Slab Allocator.

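  • memcached divides its memory into slab classes (the "slabs" below); each class owns pages that are cut into equal-sized chunks;
  • each class has its own chunk size: for example, every chunk in slab 1 is 96 B, every chunk in slab 2 is 120 B, and so on;
  • each item (header, key and value together) is stored in a chunk of the smallest class whose chunk size can hold it.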

I won't go into the details of the allocation mechanism here; if you're interested, see: https://www.cnblogs.com/zhoujinyi/p/5554083.html
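To make the class layout concrete, here is a minimal C sketch (not memcached's own code) of how such a chunk-size table can be built: start from the smallest chunk, multiply by a growth factor, and round up to an 8-byte boundary. The constants are assumptions: a 96-byte smallest chunk as in the example, memcached's default growth factor of 1.25 (the -f option) and 1 MB pages; real values depend on the version and startup options. build_slab_classes, pick_slab_class and chunks_per_page are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; a real server derives them from its
 * version and startup options. */
#define SMALLEST_CHUNK 96u              /* chunk size of slab 1 in the example */
#define GROWTH_FACTOR  1.25             /* memcached's default -f */
#define CHUNK_ALIGN    8u               /* chunks are rounded up to 8 bytes */
#define SLAB_PAGE_SIZE (1024u * 1024u)  /* one slab page: 1 MB */

/* Fill sizes[] with the chunk sizes of slab 1, 2, ... until `cap` entries
 * are written or the next size would exceed max_chunk. Returns the count. */
static size_t build_slab_classes(unsigned int *sizes, size_t cap,
                                 unsigned int max_chunk)
{
    size_t n = 0;
    unsigned int size = SMALLEST_CHUNK;

    assert(sizes != NULL || cap == 0);
    while (n < cap) {
        if (size % CHUNK_ALIGN)                      /* round up to alignment */
            size += CHUNK_ALIGN - size % CHUNK_ALIGN;
        if (size > max_chunk)
            break;
        sizes[n++] = size;
        size = (unsigned int)(size * GROWTH_FACTOR); /* next class ~25% larger */
    }
    return n;
}

/* 1-based id of the smallest class whose chunk can hold a whole item
 * (header + key + value) of item_size bytes, or 0 if no class is big enough. */
static unsigned int pick_slab_class(const unsigned int *sizes, size_t n,
                                    unsigned int item_size)
{
    for (size_t i = 0; i < n; i++)
        if (item_size <= sizes[i])
            return (unsigned int)(i + 1);
    return 0;
}

/* Number of chunks of a given size that fit into one page. */
static unsigned int chunks_per_page(unsigned int chunk_size)
{
    assert(chunk_size > 0);
    return SLAB_PAGE_SIZE / chunk_size;
}
```

With these assumptions the first classes come out as 96, 120, 152 and 192 bytes, and 1 MB / 96 B gives the chunks_per_page value of 10922 that shows up for slab 1 in the stats slabs output below.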

2. Step by step

1) Use the stats items and stats slabs commands to get each slab's id and details

stats items

STAT items:1:number 39       # number of items in this slab
STAT items:1:age 693911
STAT items:1:evicted 0
STAT items:1:evicted_nonzero 0
STAT items:1:evicted_time 0
STAT items:1:outofmemory 0
STAT items:1:tailrepairs 0
STAT items:1:reclaimed 7
STAT items:1:expired_unfetched 4
STAT items:1:evicted_unfetched 0
STAT items:1:crawler_reclaimed 0
STAT items:1:crawler_items_checked 0
STAT items:1:lrutail_reflocked 0
...
stats slabs

STAT 1:chunk_size 96        # chunk size of this slab
STAT 1:chunks_per_page 10922
STAT 1:total_pages 1
STAT 1:total_chunks 10922
STAT 1:used_chunks 39
STAT 1:free_chunks 10883
STAT 1:free_chunks_end 0
STAT 1:mem_requested 3656
STAT 1:get_hits 3666
STAT 1:cmd_set 569
STAT 1:delete_hits 1
STAT 1:incr_hits 0
STAT 1:decr_hits 0
STAT 1:cas_hits 0
STAT 1:cas_badval 0
STAT 1:touch_hits 0
...
  • the number after STAT (after items: in the stats items output) is the slab id
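A dump tool first needs to know which slab ids are worth passing to cachedump. As a sketch of that step, the C function below picks the items:<id>:number lines out of stats items output and keeps the id and item count; parse_items_number is a hypothetical helper, and reading the reply from the socket is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of `stats items` output, e.g. "STAT items:1:number 39".
 * For an items:<id>:number line, store the slab id and item count and
 * return 1; return 0 for every other line (age, evicted, stats slabs, ...). */
static int parse_items_number(const char *line, unsigned int *slab_id,
                              unsigned long *count)
{
    unsigned int id;
    unsigned long n;
    char field[64];

    assert(line != NULL && slab_id != NULL && count != NULL);
    if (sscanf(line, "STAT items:%u:%63s %lu", &id, field, &n) != 3)
        return 0;
    if (strcmp(field, "number") != 0)   /* only the plain "number" field */
        return 0;
    *slab_id = id;
    *count = n;
    return 1;
}
```

Slab ids whose count is 0 can be skipped; the remaining ids are what you would feed into stats cachedump.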

2) Use stats cachedump {slab_id} {limit_num} to get the keys in a slab

stats cachedump 1 5
ITEM seller_shop_im_phone_122 [1 b; 1513935640 s]       # key name
ITEM seller_shop_im_phone_11542 [1 b; 1513935346 s]
ITEM user_third_userid_35543020 [2 b; 1516523664 s]
ITEM seller_shop_im_phone_12331 [1 b; 1513933986 s]
ITEM user_third_userid_70086126 [4 b; 1516517439 s]
END
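  • slab_id is the id of the slab to dump;
  • limit_num is the number of keys to return; 0 means no limit;
  • each ITEM line shows the key, the value size in bytes and the expiry time as an absolute Unix timestamp (per the source quoted below, 0 if the item never expires).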
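The ITEM lines are easy to parse. A minimal C sketch, assuming the 1.5.3 output format shown above and memcached's 250-byte key limit; struct dump_item and parse_cachedump_line are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_MAX_LEN 250   /* memcached's maximum key length */

/* One line of `stats cachedump` output. */
struct dump_item {
    char key[KEY_MAX_LEN + 1];
    unsigned int value_bytes;       /* "<n> b": value size in bytes */
    unsigned long long expires_at;  /* "<t> s": Unix time, 0 = never expires */
};

/* Parse e.g. "ITEM user_third_userid_35543020 [2 b; 1516523664 s]".
 * Returns 1 on success, 0 for anything else (such as the final "END"). */
static int parse_cachedump_line(const char *line, struct dump_item *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);    /* clear fields left from a previous line */
    return sscanf(line, "ITEM %250s [%u b; %llu s]",
                  out->key, &out->value_bytes, &out->expires_at) == 3;
}
```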

3. Limitations

A single cachedump response is capped at 2 MB,

and that value is hard-coded in the memcached source.

This is a serious problem.

I couldn't find the relevant source online, so I downloaded the memcached-1.5.3 source from the official site and searched it. Sure enough:

/*
 * Source location: memcached-1.5.3/items.c
 */
char *item_cachedump(const unsigned int slabs_clsid, const unsigned int limit, unsigned int *bytes) {
    unsigned int memlimit = 2 * 1024 * 1024;   /* 2MB max response size */
    char *buffer;
    unsigned int bufcurr;
    item *it;
    unsigned int len;
    unsigned int shown = 0;
    char key_temp[KEY_MAX_LENGTH + 1];
    char temp[512];
    unsigned int id = slabs_clsid;
    id |= COLD_LRU;

    pthread_mutex_lock(&lru_locks[id]);
    it = heads[id];

    buffer = malloc((size_t)memlimit);
    if (buffer == 0) {
        return NULL;
    }
    bufcurr = 0;

    while (it != NULL && (limit == 0 || shown < limit)) {
        assert(it->nkey <= KEY_MAX_LENGTH);
        if (it->nbytes == 0 && it->nkey == 0) {
            it = it->next;
            continue;
        }
        /* Copy the key since it may not be null-terminated in the struct */
        strncpy(key_temp, ITEM_key(it), it->nkey);
        key_temp[it->nkey] = 0x00; /* terminate */
        len = snprintf(temp, sizeof(temp), "ITEM %s [%d b; %llu s]\r\n",
                       key_temp, it->nbytes - 2,
                       it->exptime == 0 ? 0 :
                       (unsigned long long)it->exptime + process_started);
        if (bufcurr + len + 6 > memlimit)  /* 6 is END\r\n\0 */
            break;
        memcpy(buffer + bufcurr, temp, len);
        bufcurr += len;
        shown++;
        it = it->next;
    }

    memcpy(buffer + bufcurr, "END\r\n", 6);
    bufcurr += 5;

    *bytes = bufcurr;
    pthread_mutex_unlock(&lru_locks[id]);
    return buffer;
}

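As you can see:

  • the function's first line defines a local variable memlimit that caps the response at 2 MB;
  • malloc((size_t)memlimit) then allocates a 2 MB buffer up front;
  • the loop walks the slab's item list (note id |= COLD_LRU: only the slab's COLD LRU list is walked, so items in other LRU segments are not listed), formats each item as an ITEM line and copies it into buffer, stopping when limit is reached or the next line would push the buffer past 2 MB;
  • finally END is appended and buffer is returned.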

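So although this command combination can fetch keys in bulk, each call can dump at most 2 MB of output per slab; if listing a slab's keys takes more than 2 MB, you cannot get all of them.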
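Note that the 2 MB covers the response text (one ITEM line per key), not the cached data itself, so how many keys fit depends on key length. The sketch below estimates it by formatting a line with the same format string as item_cachedump() and applying the same bufcurr + len + 6 > memlimit check; it assumes every line has the same length, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CACHEDUMP_MEMLIMIT (2u * 1024u * 1024u)  /* memlimit in item_cachedump() */

/* Length of one response line, formatted with the same format string as
 * item_cachedump() in memcached-1.5.3/items.c. */
static size_t cachedump_line_len(const char *key, int value_bytes,
                                 unsigned long long exptime)
{
    int len = snprintf(NULL, 0, "ITEM %s [%d b; %llu s]\r\n",
                       key, value_bytes, exptime);
    assert(len > 0);
    return (size_t)len;
}

/* Number of lines of length line_len accepted before the check
 * bufcurr + len + 6 > memlimit fires (6 bytes are kept for "END\r\n\0"). */
static size_t cachedump_max_lines(size_t line_len)
{
    assert(line_len > 0);
    return (CACHEDUMP_MEMLIMIT - 6) / line_len;
}
```

For a key like seller_shop_im_phone_122 the line is 51 bytes, so a single call stops at 41,120 keys per slab, however many items the slab actually holds; longer keys lower that number further.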

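Of course, you could raise memlimit in the source and recompile, but that doesn't really solve the problem: you can't recompile every time the data outgrows the limit, and if you simply set a huge value, I can't promise cachedump won't take memcached down. After all, the source allocates the whole memlimit in one go with malloc((size_t)memlimit), and it holds the LRU lock for the entire walk!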

If any of you are curious, give it a try and let me know how it goes.

II. The libmemcached tool

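Just hearing the name libmemcached gives me a stomachache. A quick search turns up plenty of articles claiming it can dump all of memcached's keys, so I went to its official site to take a look.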

The official site describes it as follows:

memdump dumps a list of “keys” from all servers that it is told to fetch from. Because memcached does not guarantee to provide all keys it is not possible to get a complete “dump”.

That sounds about right, but without access to Google, Baidu turned up nothing useful, so I downloaded the libmemcached source to dig into it myself.

After a long headache I finally understood the core part of memdump:

/*
  We use this to dump all keys.

  At this point we only support a callback method. This could be optimized by first
  calling items and finding active slabs. For the moment though we just loop through
  all slabs on servers and "grab" the keys.
*/

#include <libmemcached/common.h>

static memcached_return_t ascii_dump(Memcached *memc, memcached_dump_fn *callback, void *context, uint32_t number_of_callbacks)
{
  /* MAX_NUMBER_OF_SLAB_CLASSES is defined to 200 in Memcached 1.4.10 */
  for (uint32_t x= 0; x < 200; x++)
  {
    char buffer[MEMCACHED_DEFAULT_COMMAND_SIZE];
    int buffer_length= snprintf(buffer, sizeof(buffer), "%u", x);
    if (size_t(buffer_length) >= sizeof(buffer) or buffer_length < 0)
    {
      return memcached_set_error(*memc, MEMCACHED_MEMORY_ALLOCATION_FAILURE, MEMCACHED_AT,
                                 memcached_literal_param("snprintf(MEMCACHED_DEFAULT_COMMAND_SIZE)"));
    }

    // @NOTE the hard coded zero means "no limit"
    libmemcached_io_vector_st vector[]=
    {
      { memcached_literal_param("stats cachedump ") },
      { buffer, size_t(buffer_length) },
      { memcached_literal_param(" 0\r\n") }
    };

    // Send message to all servers
    ...
    ...

    // Collect the returned items
    memcached_instance_st* instance;
    memcached_return_t read_ret= MEMCACHED_SUCCESS;
    while ((instance= memcached_io_get_readable_server(memc, read_ret)))
    {
      memcached_return_t response_rc= memcached_response(instance, buffer, MEMCACHED_DEFAULT_COMMAND_SIZE, NULL);
      if (response_rc == MEMCACHED_ITEM)
      {
        char *string_ptr, *end_ptr;

        string_ptr= buffer;
        string_ptr+= 5; /* Move past ITEM */

        for (end_ptr= string_ptr; isgraph(*end_ptr); end_ptr++) {} ;

        char *key= string_ptr;
        key[(size_t)(end_ptr-string_ptr)]= 0;

        for (uint32_t callback_counter= 0; callback_counter < number_of_callbacks; callback_counter++)
        {
          memcached_return_t callback_rc= (*callback[callback_counter])(memc, key, (size_t)(end_ptr-string_ptr), context);
          if (callback_rc != MEMCACHED_SUCCESS)
          {
            // @todo build up a message for the error from the value
            memcached_set_error(*instance, callback_rc, MEMCACHED_AT);
            break;
          }
        }
      }
      else if (response_rc == MEMCACHED_END)
      {
        // All items have been returned
      }
      else if ...
      ...
      ...
  return memcached_has_current_error(*memc) ? MEMCACHED_SOME_ERRORS : MEMCACHED_SUCCESS;
}

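The code is a bit long, but on the whole it is fairly clear:

  • an outer for loop runs over slab ids 0 through 199 (hard-coded to 200, per the MAX_NUMBER_OF_SLAB_CLASSES comment) and does the work for each slab;
  • the slab id is first formatted into buffer, and vector[] then assembles the full command stats cachedump <id> 0;
  • next, a while loop collects the responses; the interesting part is the if (response_rc == MEMCACHED_ITEM) branch;
  • string_ptr is pointed at the start of the response line in buffer and advanced past the "ITEM " prefix (compare the cachedump output above);
  • the for loop then moves end_ptr forward while isgraph() is true, i.e. up to the first space or control character, which marks the end of the key name;
  • the key is NUL-terminated there, and the key pointer and its length end_ptr - string_ptr are passed to each registered callback function.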
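The two steps memdump performs per slab can be mimicked in a few lines of C: format stats cachedump <id> 0 and cut the key out of each ITEM line with isgraph(). This is a sketch of the technique, not libmemcached's API; build_cachedump_cmd and extract_key_in_place are hypothetical, and unlike ascii_dump() the sketch checks the ITEM prefix itself (libmemcached relies on the MEMCACHED_ITEM response code) and casts to unsigned char before calling isgraph().

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Format the per-slab request memdump sends: "stats cachedump <id> 0\r\n"
 * (0 = no limit). Returns the command length, or -1 if buf is too small. */
static int build_cachedump_cmd(char *buf, size_t cap, unsigned int slab_id)
{
    int n;

    assert(buf != NULL && cap > 0);
    n = snprintf(buf, cap, "stats cachedump %u 0\r\n", slab_id);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Same idea as ascii_dump(): skip "ITEM ", advance while isgraph() holds,
 * then NUL-terminate in place. Returns the key and stores its length,
 * or returns NULL if the line is not an ITEM line. */
static char *extract_key_in_place(char *line, size_t *key_len)
{
    char *start, *end;

    assert(line != NULL && key_len != NULL);
    if (strncmp(line, "ITEM ", 5) != 0)
        return NULL;
    start = line + 5;                       /* move past "ITEM " */
    for (end = start; isgraph((unsigned char)*end); end++)
        ;                                   /* stop at the space before '[' */
    *key_len = (size_t)(end - start);
    *end = '\0';
    return start;
}
```

Calling build_cachedump_cmd for slab ids 0 to 199 reproduces memdump's outer loop.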

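When I reached the last few steps I got excited: it looked like it was reading every key straight out of memory. Impressive! But then I thought it over and scrolled back up... wait, isn't this still just querying with stats cachedump? Which means it is bound by the same 2 MB per-slab limit.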

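To avoid misleading everyone with my limited skills, I finally remembered to search Stack Overflow, which turned out to be far better than Baidu, and found this right away: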

https://stackoverflow.com/questions/41458274/what-does-the-one-page-per-slab-class-limitation-mean-when-dumping-keys-with-m

Ha. And the comment at the top of the source still says "We use this to dump all keys." My stomach started hurting.

III. The memcached-hack patch

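memcached-hack is a patch-plus-tool package I stumbled upon. After patching the source and recompiling, cachedump lets you specify a starting position within the slab. You can search for it on Google Code and download it there.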

It contains patches for two versions and a tool written in Python:

zhangxueyan PileWorld memcached-hack $ ll
total 56
-rwxr-xr-x@ 1 zhangxueyan  staff   499  8 22  2008 example.py
-rwxr-xr-x@ 1 zhangxueyan  staff  2543  8 22  2008 memcached-1.2.2-cachedump-hack
-rwxr-xr-x@ 1 zhangxueyan  staff  4949  8 22  2008 memcached-1.2.4-cachedump-hack
-rwxr-xr-x@ 1 zhangxueyan  staff  8510  8 22  2008 memcachem.py

I'll skip the Python tool, since it also just uses the patched cachedump. Below are the core changes, using the 1.2.4 patch as an example:

Index: items.c
===================================================================
--- items.c (revision 793)
+++ items.c (working copy)
@@ -276,18 +276,23 @@
 }

 /*@null@*/
-char *do_item_cachedump(const unsigned int slabs_clsid, const unsigned int limit, unsigned int *bytes) {
+char *do_item_cachedump(const unsigned int slabs_clsid, const unsigned int start, const unsigned int limit, unsigned int *bytes) {
     unsigned int memlimit = 2 * 1024 * 1024;   /* 2MB max response size */
     char *buffer;
     unsigned int bufcurr;
     item *it;
+   int i;
     unsigned int len;
     unsigned int shown = 0;
     char temp[512];

     if (slabs_clsid > LARGEST_ID) return NULL;
     it = heads[slabs_clsid];
-
+    i = 0;
+    while (it != NULL && i < start) {
+       it = it->next;
+       i++;
+       }
     buffer = malloc((size_t)memlimit);
     if (buffer == 0) return NULL;
     bufcurr = 0;
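
  • do_item_cachedump gains a start parameter that specifies the starting position (the number of items to skip) within the slab;
  • a new while loop moves the pointer it from the head of the slab's item list forward to the user-specified start position, and reading begins from there.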
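The effect of the patch can be modeled on a plain linked list: skip start nodes, then collect up to limit keys. In the sketch below struct lru_item and dump_from are hypothetical stand-ins; the real function additionally formats each ITEM line and enforces the 2 MB buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memcached's item: just a key and the LRU link. */
struct lru_item {
    const char *key;
    struct lru_item *next;
};

/* Model of the patched cachedump: skip `start` items from the head (the
 * while loop the patch adds), then collect up to `limit` keys into out[]
 * (limit 0 = no limit, still bounded by out_cap). Returns the count. */
static size_t dump_from(struct lru_item *head, unsigned int start,
                        unsigned int limit, const char **out, size_t out_cap)
{
    struct lru_item *it = head;
    unsigned int i = 0;
    size_t shown = 0;

    assert(out != NULL || out_cap == 0);
    while (it != NULL && i < start) {       /* skip to the start position */
        it = it->next;
        i++;
    }
    while (it != NULL && shown < out_cap && (limit == 0 || shown < limit)) {
        out[shown++] = it->key;
        it = it->next;
    }
    return shown;
}
```

With the list xue, zhang, runoob, start = 1 returns zhang and runoob, matching the test results below. Every call walks start nodes from the head again, so deeper offsets get progressively more expensive.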

This approach looks like it really can dump every key, so I ran a simple test.

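I downloaded 1.5.3 from the official site, modified it according to the patch, compiled and started it, and compared it with the stock build: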

Modified 1.5.3:

# all keys in slab 1
stats cachedump 1 0                       
ITEM xue [9 b; 1514291787 s]
ITEM zhang [9 b; 1514291587 s]
ITEM runoob [9 b; 1514291526 s]
END

# slab 1, all keys starting from the second key
stats cachedump 1 1 0                    
ITEM zhang [9 b; 1514291587 s]
ITEM runoob [9 b; 1514291526 s]
END

# slab 1, all keys starting from the third key
stats cachedump 1 2 0
ITEM runoob [9 b; 1514291526 s]
END

Compared with the unmodified 1.5.3:

stats cachedump 1 0
ITEM runoob [9 b; 1514292460 s]
ITEM zhang [9 b; 1514292440 s]
ITEM xue [9 b; 1514292430 s]
END
stats cachedump 1 1 0
ITEM runoob [9 b; 1514292460 s]
END
stats cachedump 1 2 0
ITEM runoob [9 b; 1514292460 s]
ITEM zhang [9 b; 1514292440 s]
END

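Clearly the stock version ignores the last argument and still follows the {slab_id} {limit_num} syntax (1 1 0 returns one key, 1 2 0 returns two), while the modified version, at least in this example, really does support a start position.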
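With the patched syntax, a client can page through a slab by advancing start by the number of keys it received. A minimal C sketch, assuming a hypothetical send_fn transport that sends the command and returns the number of ITEM lines in the reply; fake_send is a toy in-memory stand-in so the loop can be run without a server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical transport: send `cmd` to a patched server and return the
 * number of ITEM lines in the reply. */
typedef size_t (*send_cachedump_fn)(const char *cmd, void *ctx);

/* Page through one slab with "stats cachedump <slab_id> <start> <limit>",
 * advancing start by the number of keys received; a short page ends it.
 * Returns the total number of keys seen. */
static size_t dump_slab_paged(unsigned int slab_id, unsigned int page,
                              send_cachedump_fn send_fn, void *ctx)
{
    char cmd[64];
    size_t total = 0, got;

    assert(page > 0 && send_fn != NULL);
    do {
        snprintf(cmd, sizeof cmd, "stats cachedump %u %zu %u\r\n",
                 slab_id, total, page);
        got = send_fn(cmd, ctx);
        total += got;
    } while (got == page);
    return total;
}

/* Toy in-memory "server" holding n_items items in one slab, for trying the
 * loop without a real memcached. */
struct fake_server {
    unsigned int n_items;
    unsigned int calls;
};

static size_t fake_send(const char *cmd, void *ctx)
{
    struct fake_server *s = ctx;
    unsigned int slab, start, limit;

    s->calls++;
    if (sscanf(cmd, "stats cachedump %u %u %u", &slab, &start, &limit) != 3)
        return 0;
    (void)slab;
    if (start >= s->n_items)
        return 0;
    if (limit == 0 || s->n_items - start < limit)
        return s->n_items - start;       /* rest of the slab */
    return limit;
}
```

Two caveats follow from how the patch works: each page must still fit within the 2 MB response limit (a truncated page looks like the last page and ends the loop early), and since offsets are counted from the head of the list, items set or evicted between calls can shift positions, so the result is likely a best-effort snapshot rather than an exact one.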

IV. Conclusion

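I spent a few scattered days on this problem and can finally put it to rest, having learned quite a bit along the way. Although I didn't find a way to dump all keys directly from stock memcached, a modified build can at least do it.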

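Of course, the instances we currently run in production can't be fully dumped for now, yet the business still needs this. If anyone knows a native way, feel free to message me privately or discuss it in the comments.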

Reposted from: https://my.oschina.net/feicuigu/blog/1595841
