Parse (split) a string in C++ using a string delimiter (standard C++)

Translated from: Parse (split) a string in C++ using string delimiter (standard C++)

I am parsing a string in C++ using the following:

string parsed,input="text to be parsed";
stringstream input_stringstream(input);

if(getline(input_stringstream,parsed,' '))
{
     // do some processing.
}

Parsing with a single char delimiter works fine. But what if I want to use a string as the delimiter?

Example: I want to split:

scott>=tiger

with >= as the delimiter so that I get scott and tiger.


Reply #1

Reference: https://stackoom.com/question/xr81/使用字符串定界符-标准C-在C-中解析-拆分-字符串


Reply #2

strtok accepts several characters as delimiters. Passing ">=" should split your example string correctly, although > and = are then treated as two separate single-character delimiters rather than as one two-character string, so an input such as a=b would be split as well.

EDIT: If you don't want to use c_str() to convert from string to char* (strtok modifies its input, so it needs a writable buffer rather than the const char* that c_str() returns), you can use substr and find_first_of to tokenize.

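string token, mystring("scott>=tiger");
size_t start = 0;
// find_first_of treats '>' and '=' as separate one-character delimiters;
// find_first_not_of skips runs of them so no empty token is printed.
while ((start = mystring.find_first_not_of(">=", start)) != string::npos) {
  size_t end = mystring.find_first_of(">=", start);
  token = mystring.substr(start, end - start);  // takes the rest when end == npos
  printf("%s ", token.c_str());
  start = end;
}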
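
To make the character-set behaviour concrete, here is a minimal sketch. The helper name split_on_any is hypothetical (it is not part of the original answer); it wraps the same find_first_of / find_first_not_of calls in a function that returns the tokens.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits on ANY character contained in `chars`,
// which is how strtok and find_first_of interpret their delimiter argument.
std::vector<std::string> split_on_any(const std::string& s, const std::string& chars) {
    std::vector<std::string> out;
    std::size_t start = s.find_first_not_of(chars);        // skip leading delimiters
    while (start != std::string::npos) {
        std::size_t end = s.find_first_of(chars, start);   // next delimiter character
        out.push_back(s.substr(start, end - start));       // length is clamped when end == npos
        start = s.find_first_not_of(chars, end);           // skip the delimiter run
    }
    return out;
}
```

With this sketch, split_on_any("a=b>=c", ">=") yields three tokens (a, b, c), whereas splitting on the two-character string ">=" would yield only a=b and c. That difference is the reason to prefer a find-based approach when the delimiter really is a string.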

Reply #3

I would use boost::tokenizer. Here's documentation explaining how to make an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm

Here's one that works for your case.

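#include <algorithm>  // std::search
#include <iostream>
#include <iterator>   // std::advance
#include <string>
#include <boost/tokenizer.hpp>

struct my_tokenizer_func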
{
    template<typename It>
    bool operator()(It& next, It end, std::string & tok)
    {
        if (next == end)
            return false;
        char const * del = ">=";
        auto pos = std::search(next, end, del, del + 2);
        tok.assign(next, pos);
        next = pos;
        if (next != end)
            std::advance(next, 2);
        return true;
    }

    void reset() {}
};

int main()
{
    std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
    for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
        std::cout << i << '\n';
}

Reply #4

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string at or after pos, or npos if str is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and containing at most n characters (everything up to the end of the string when n is npos).
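
The two rules above combine conveniently when the delimiter is missing. A minimal sketch, assuming a hypothetical helper named first_token that is not part of the original answer:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text before the first occurrence of `delimiter`.
std::string first_token(const std::string& s, const std::string& delimiter) {
    std::size_t pos = s.find(delimiter);  // std::string::npos when the delimiter is absent
    return s.substr(0, pos);              // substr clamps the length, so npos means "to the end"
}
```

Because substr clamps its length argument, first_token returns the whole string when the delimiter does not occur, so no separate npos check is needed for this particular case.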


If the delimiter occurs several times, then after extracting one token you can remove it (delimiter included) so that the next extraction starts at the following token. The statement s = s.substr(pos + delimiter.length()); has the same effect as the erase call below. Both modify s, so run the loop on a copy if you need to preserve the original string:

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom
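
The loop above consumes s as it goes. One way to keep the caller's string intact is to wrap the loop in a function that takes the string by value. The sketch below is an illustration only; the name split_copy and the empty-delimiter guard are assumptions that do not appear in the original answer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper around the erase-based loop. `s` is taken by value,
// so the loop consumes a private copy and the caller's string is unchanged.
std::vector<std::string> split_copy(std::string s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {   // find("") always matches at position 0, so the loop would never advance
        tokens.push_back(s);
        return tokens;
    }
    std::size_t pos = 0;
    while ((pos = s.find(delimiter)) != std::string::npos) {
        tokens.push_back(s.substr(0, pos));      // text before the delimiter
        s.erase(0, pos + delimiter.length());    // drop the token and the delimiter
    }
    tokens.push_back(s);       // whatever remains after the last delimiter
    return tokens;
}
```

Note that this sketch keeps empty tokens: two adjacent delimiters, as in "a>=>=b", produce an empty string between a and b.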

Reply #5

This method uses std::string::find and leaves the original string unmodified: it remembers where the previous token ended and starts the next search from there.

#include <iostream>
#include <string>

int main()
{
    std::string s = "scott>=tiger";
    std::string delim = ">=";

    std::size_t start = 0;  // same type that find() returns, avoids narrowing
    auto end = s.find(delim);
    while (end != std::string::npos)
    {
        std::cout << s.substr(start, end - start) << std::endl;
        start = end + delim.length();
        end = s.find(delim, start);
    }

    std::cout << s.substr(start) << std::endl;  // remainder after the last delimiter
}
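
The same start/end bookkeeping can be packaged so that the caller decides what to do with each token. The following is a sketch under stated assumptions: for_each_token and its (offset, length) callback signature are hypothetical names chosen for illustration, not something the original answer defines.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: calls fn(offset, length) once per token of `s`,
// without modifying or copying `s`.
template <typename Fn>
void for_each_token(const std::string& s, const std::string& delim, Fn fn) {
    if (delim.empty()) {                  // an empty delimiter would never advance
        fn(std::size_t{0}, s.length());
        return;
    }
    std::size_t start = 0;
    std::size_t end = s.find(delim);
    while (end != std::string::npos) {
        fn(start, end - start);           // token lies in [start, end)
        start = end + delim.length();     // resume just past the delimiter
        end = s.find(delim, start);
    }
    fn(start, s.length() - start);        // final token after the last delimiter
}
```

A caller could pass a lambda such as [&](std::size_t off, std::size_t len) { std::cout << s.substr(off, len) << '\n'; } to reproduce the printing loop above, or use the offsets directly to avoid creating substrings.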

Reply #6

You can use the following function to split a string (note that it skips empty tokens):

vector<string> split(const string& str, const string& delim)
{
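    vector<string> tokens;
    // An empty delimiter matches everywhere and would make the loop below spin forever.
    if (delim.empty()) { if (!str.empty()) tokens.push_back(str); return tokens; }
    size_t prev = 0, pos = 0;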
    do
    {
        pos = str.find(delim, prev);
        if (pos == string::npos) pos = str.length();
        string token = str.substr(prev, pos-prev);
        if (!token.empty()) tokens.push_back(token);
        prev = pos + delim.length();
    }
    while (pos < str.length() && prev < str.length());
    return tokens;
}