Introduction to Boost Tokenizer


amuseme_lu


Article link: https://blog.csdn.net/amuseme_lu/article/details/6931335


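1. Introduction

Boost Tokenizer breaks a character sequence into a series of tokens. You can supply a TokenizerFunction to control how the sequence is split. If you do not specify one, the default splits on whitespace and punctuation and discards those delimiters.

2. A few simple examples

Here is a simple example: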

// simple_example_1.cpp
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>


int main(){
   using namespace std;
   using namespace boost;
   string s = "This is,  a test";
   tokenizer<> tok(s);
   for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Output:
This
is
a
test

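Note that the comma and the extra spaces have been filtered out.

The next example splits a string into fields of fixed character widths: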

// simple_example_3.cpp
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>


int main(){
   using namespace std;
   using namespace boost;
   string s = "12252001";
   int offsets[] = {2,2,4};   // three field widths: 2, 2 and 4 characters
   offset_separator f(offsets, offsets+3);
   tokenizer<offset_separator> tok(s,f);
   for(tokenizer<offset_separator>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Output:
12
25
2001

3. What is a TokenizerFunction

A TokenizerFunction is a function object that, given a character sequence, finds the next token according to its own rules. The library provides three TokenizerFunction models for common cases:

* escaped_list_separator: parses CSV-style strings (e is the escape character, c the field separator, q the quote character)

explicit escaped_list_separator(Char e = '\\', Char c = ',', Char q = '\"')
escaped_list_separator(string_type e, string_type c, string_type q)

* offset_separator: splits a sequence into fields of fixed widths (offsets)

template<typename Iter>
offset_separator(Iter begin, Iter end, bool bwrapoffsets = true, bool breturnpartiallast = true)

* char_separator: splits a sequence at specific delimiter characters

explicit char_separator(const Char* dropped_delims,
                        const Char* kept_delims = "",
                        empty_token_policy empty_tokens = drop_empty_tokens)

4. A simple example: parsing /etc/passwd

/**
 * @author lemo.lu
 * @date 2011.11.03
 *
 * Example of Boost tokenizer template usage. This example uses
 * char_separator with ":" as the delimiter.
 */

// standard library headers
#include <cstddef>                   // std::size_t
#include <fstream>                   // std::ifstream
#include <iostream>                  // std::cout
#include <string>                    // std::string, std::getline

// boost
#include <boost/tokenizer.hpp>       // boost::tokenizer, boost::char_separator

int main(){
    std::ifstream passwdFile("/etc/passwd");

    typedef boost::tokenizer<boost::char_separator<char> > passwdTokenizer;
    // TokenizerFunction: drop the ":" delimiter, keep no delimiters,
    // and keep empty tokens so that empty fields stay aligned
    boost::char_separator<char> tokenSep(":", "", boost::keep_empty_tokens);
    // names of the passwd fields, in order
    static const char* passwd_st[] = { "Account","password","UID","GID","GECOS","Dir","Shell" };
    const std::size_t passwd_n = sizeof(passwd_st) / sizeof(passwd_st[0]);

    // iterate over the passwd file; the loop ends as soon as getline fails,
    // so no spurious record is printed at end of file
    std::string passwdString;
    while(std::getline(passwdFile, passwdString))
    {
        // passwdString outlives tok, so the tokenizer's iterators stay valid
        // (tokenizing a temporary std::string would leave them dangling)
        passwdTokenizer tok(passwdString, tokenSep);
        std::size_t passwd_c = 0;
        // stop after the known fields to avoid indexing past passwd_st
        for(passwdTokenizer::iterator curTok=tok.begin(); curTok!=tok.end() && passwd_c<passwd_n; ++curTok)
            std::cout << passwd_st[passwd_c++] << ":" << *curTok  << std::endl;
        std::cout << "---------------------" << std::endl;
    }
    passwdFile.close();
}

Partial output:

Account:root
password:x
UID:0
GID:0
GECOS:root
Dir:/root
Shell:/bin/bash
---------------------
Account:daemon
password:x
UID:1
GID:1
GECOS:daemon
Dir:/usr/sbin
Shell:/bin/sh
---------------------

5. References

http://www.boost.org/doc/libs/1_47_0/libs/tokenizer/
