The string-splitting function in Redis

makesifriend

Categories: redis, C; Tags: redis

Article link: https://blog.csdn.net/makesifriend/article/details/90369112


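While reading redis-cli.c, I ran into the string-splitting function sdssplitargs. My first reaction was: it's just a string splitter, I can write one of those myself. So I decided to write my own and see how it differs from the standard one.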

What does a string-splitting function written in C need to take into account?

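1. How should the resulting tokens be stored? With a double pointer (char **), which effectively serves as an array of strings;

2. Which function copies each token from the source string into its newly allocated buffer? strncpy;

3. What about the logic? A '\0' marks the end of the string, and spaces act as separators: the first non-space character after a run of spaces starts a token, and the next space (or the '\0') ends it, so we can record where each token begins and how long it is.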

The code is as follows:

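#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>

//split line on whitespace; store at most max tokens in argvs and return the count
int split_space(char **argvs, int max, const char *line){
	const char *p = line;
	const char *ps;
	char *ptoken;
	int len, i = 0;
	while(*p && i < max){
		//skip to the first non-space character
		while(isspace((unsigned char)*p)) ++p;
		if(*p == '\0') break;   //only trailing spaces were left
		ps = p;        //start of the token
		while(*p && !isspace((unsigned char)*p)) ++p;
		len = p - ps;  //token length
		ptoken = (char *)malloc(sizeof(char) * (len + 1));  //+1 for '\0'
		if(ptoken == NULL) break;
		strncpy(ptoken, ps, len);
		ptoken[len] = '\0';     //strncpy does not terminate the copy here
		argvs[i++] = ptoken;
	}
	return i;
}

void out_put(char **argvs, int l){
	int i = 0;
	for(i = 0; i < l; ++i){
		printf(":%s\n", argvs[i]);
	}
}

int main(void){
	char **argvs;
	const char *s = " ab cd ef";
	int len, i;
	//preallocate 10 string pointers
	argvs = (char **)malloc(sizeof(char *) * 10);
	if(argvs == NULL) return 1;

	len = split_space(argvs, 10, s);
	out_put(argvs, len);

	//free every token, then the pointer array
	for(i = 0; i < len; ++i) free(argvs[i]);
	free(argvs);
	return 0;
}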

The program runs and splits the string on spaces as intended. Good; now let's see what Redis's sdssplitargs does. First, its interface:

sds *sdssplitargs(const char *line, int *argc)

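That makes sense: Redis represents strings as sds, so the return value is an array of sds. The inputs are the line to split and a pointer through which the token count is returned, which is also reasonable. Compared with my function, it returns an sds array where mine fills an array of C strings, and it allocates that array itself instead of requiring the caller to preallocate one.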

Now for its implementation. At a glance it runs to quite a few lines, so what exactly does it do?

sds *sdssplitargs(const char *line, int *argc) {
    const char *p = line;
    char *current = NULL;
    char **vector = NULL;

    *argc = 0;
    while(1) {
        /* skip blanks */
        while(*p && isspace(*p)) p++;
        if (*p) {
            /* get a token */
            int inq=0;  /* set to 1 if we are in "quotes" */
            int insq=0; /* set to 1 if we are in 'single quotes' */
            int done=0;

            if (current == NULL) current = sdsempty();
            while(!done) {
                // inside "double quotes"
                if (inq) {
                    // \xHH: convert a hexadecimal escape into one byte
                    if (*p == '\\' && *(p+1) == 'x' &&
                                             is_hex_digit(*(p+2)) &&
                                             is_hex_digit(*(p+3)))
                    {
                        unsigned char byte;

                        byte = (hex_digit_to_int(*(p+2))*16)+
                                hex_digit_to_int(*(p+3));
                        current = sdscatlen(current,(char*)&byte,1);
                        p += 3;
                    } else if (*p == '\\' && *(p+1)) {
                        // backslash escape, e.g. \n becomes a newline byte
                        char c;

                        p++;
                        switch(*p) {
                        case 'n': c = '\n'; break;
                        ...
                        }
                        current = sdscatlen(current,&c,1);
                    } else if (*p == '"') {
                        /* closing quote must be followed by a space or
                         * nothing at all. */
                        if (*(p+1) && !isspace(*(p+1))) goto err;
                        done=1;
                    } else if (!*p) {
                        /* unterminated quotes */
                        goto err;
                    } else {
                        current = sdscatlen(current,p,1);
                    }
                } else if (insq) {
                    // inside 'single quotes': only \' is unescaped
                    if (*p == '\\' && *(p+1) == '\'') {
                        p++;
                        current = sdscatlen(current,"'",1);
                    } else if (*p == '\'') {
                        /* closing quote must be followed by a space or
                         * nothing at all. */
                        if (*(p+1) && !isspace(*(p+1))) goto err;
                        done=1;
                    } else if (!*p) {
                        /* unterminated quotes */
                        goto err;
                    } else {
                        current = sdscatlen(current,p,1);
                    }
                } else {
                    switch(*p) {
                    case '\n':
                        ...
                    case '\0':
                        done=1;
                        break;
                    case '"':
                        inq=1;
                        break;
                    case '\'':
                        insq=1;
                        break;
                    default:
                        current = sdscatlen(current,p,1);
                        break;
                    }
                }
                if (*p) p++;
            }
            /* add the token to the vector */
            vector = s_realloc(vector,((*argc)+1)*sizeof(char*));
            vector[*argc] = current;
            (*argc)++;
            current = NULL;
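        } else {
            /* Even on empty input string return something not NULL. */
            if (vector == NULL) vector = s_malloc(sizeof(void*));
            return vector;
        }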
    }

err:
    ...
}

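A quick look shows that the idea behind my own splitter was sound: the core loop is the same as in Redis (skip blanks, collect a token, store it, repeat). Redis simply extends that loop with a few features: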

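1. It strips the quotes around a token, and whitespace inside the quotes stays part of the token;

2. Inside double quotes, it converts a hexadecimal escape of the form \xHH into the corresponding byte;

3. It handles escape sequences such as \n. To explain: if the text passed to the function contains "\n" or '\n', the function sees four ordinary characters (the opening quote, a backslash, the letter n, and the closing quote), not a newline. In C source code, printf("\n") prints a newline only because the compiler converts the escape in the string literal into a single newline byte; printf itself does not interpret backslash escapes. Text that arrives at runtime, such as a line typed into redis-cli, still contains the backslash and the n as two separate characters. So a splitting function that wants to support escapes must first recognize the two-character sequence and then append the single character '\n' to the token. That is what the switch statement in sdssplitargs does for double-quoted tokens; inside single quotes, only \' is unescaped.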
