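Finite automata are a powerful tool for text matching and text parsing. Following reference [1], this article implements an ini configuration file parser. When parsing text, a state machine works like this: it reads the input one character at a time and handles each character according to the current state, where handling mainly means switching to another state and performing the associated action, until all input characters have been processed.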
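As a warm-up, the read, handle, and transition loop described above can be sketched with a two-state machine that counts words. This is only an illustration: the function name `CountWords` and the states `OutsideWord` and `InsideWord` are hypothetical and do not come from reference [1].

```cpp
#include <string>

// Counts whitespace-separated words with a two-state machine.
// Hypothetical example; names are chosen for illustration only.
int CountWords(const std::string& text) {
  enum State { OutsideWord, InsideWord };
  State state = OutsideWord;
  int words = 0;
  for (std::string::size_type i = 0; i < text.size(); ++i) {
    const char c = text[i];
    const bool is_space = (c == ' ' || c == '\t' || c == '\n');
    switch (state) {
      case OutsideWord:
        // A non-space character starts a new word.
        if (!is_space) {
          state = InsideWord;
          ++words;
        }
        break;
      case InsideWord:
        // A space character ends the current word.
        if (is_space) {
          state = OutsideWord;
        }
        break;
    }
  }
  return words;
}
```

With this sketch, `CountWords("aa = 1")` returns 3, because `aa`, `=`, and `1` are each treated as one word.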
A typical ini file looks like this:
;this is comment
[section1]
aa = 1
bb = 2
[section2]
cc = 3
dd = 4
Parsing an ini file involves the following states:
Start state: the initial state
SectionState: inside a section label
KeyState: reading a key
ValueState: reading a value
CommentState: inside a comment
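The state transitions are:
Start state:
On '[', enter SectionState
On a letter or digit, enter KeyState
On ';' (the program below also accepts '#'), enter CommentState
SectionState:
On ']', return to the start state
KeyState:
On '=', extract the key and enter ValueState
ValueState:
On '\n', extract the value and return to the start state
CommentState:
On '\n', return to the start state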
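The transition list above can be sketched as a pure function from (state, character) to the next state. This is a minimal sketch, not the parser itself: it does not extract keys or values, and the names `NextState` and `RunStates` are hypothetical helpers introduced only for illustration.

```cpp
#include <cctype>
#include <string>

enum ParseState { StartState, SectionState, KeyState, ValueState, CommentState };

// Returns the state that follows `state` after reading `c`,
// following the transition list in the text. Hypothetical helper.
ParseState NextState(ParseState state, char c) {
  switch (state) {
    case StartState:
      if (c == '[') return SectionState;
      if (c == ';') return CommentState;
      if (std::isalnum(static_cast<unsigned char>(c))) return KeyState;
      return StartState;  // any other character is skipped
    case SectionState:
      return c == ']' ? StartState : SectionState;
    case KeyState:
      return c == '=' ? ValueState : KeyState;
    case ValueState:
      return c == '\n' ? StartState : ValueState;
    case CommentState:
      return c == '\n' ? StartState : CommentState;
  }
  return state;
}

// Feeds every character of `input` through NextState, starting from
// StartState, and returns the final state. Hypothetical helper.
ParseState RunStates(const std::string& input) {
  ParseState state = StartState;
  for (std::string::size_type i = 0; i < input.size(); ++i) {
    state = NextState(state, input[i]);
  }
  return state;
}
```

For example, `RunStates("aa = 1")` ends in `ValueState`, and appending the trailing newline brings the machine back to `StartState`.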
The complete program follows:
#include <stdio.h>
#include <map>
#include <string>
// Returns true for ASCII letters and digits, the characters that may start a key.
bool IsAlphabet(char c) {
  return (c >= 'a' && c <= 'z') ||
         (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9');
}
// A comment starts with ';' or '#'.
bool IsCommentStart(char c) {
  return c == ';' || c == '#';
}
// A section label starts with '['.
bool IsSectionLabelStart(char c) {
  return c == '[';
}
// A section label ends with ']'.
bool IsSectionLabelEnd(char c) {
  return c == ']';
}
// A key ends at '='.
bool IsKeyEnd(char c) {
  return c == '=';
}
// A value ends at the end of the line.
bool IsValueEnd(char c) {
  return c == '\n';
}
// A comment ends at the end of the line.
bool IsCommentEnd(char c) {
  return c == '\n';
}
// Parses init_buffer as ini text and stores every key/value pair in *properties.
// Section names are recognized but not recorded, so keys from all sections share one map.
// Keys and values are stored as they appear, including surrounding whitespace.
bool ParseInit(const std::string& init_buffer,
               std::map<std::string, std::string>* properties) {
  enum ParseState {
    StartState,
    SectionLabelState,
    KeyState,
    ValueState,
    CommentState
  };
  std::string::size_type offset = 0;
  std::string::size_type start_offset = 0;  // start of the key or value being read
  std::string key;
  std::string value;
  ParseState parse_state = StartState;
  while (offset < init_buffer.size()) {
    const char c = init_buffer[offset];
    switch (parse_state) {
      case StartState:
        if (IsSectionLabelStart(c)) {
          parse_state = SectionLabelState;
        } else if (IsAlphabet(c)) {
          parse_state = KeyState;
          start_offset = offset;
        } else if (IsCommentStart(c)) {
          parse_state = CommentState;
        }
        break;
      case SectionLabelState:
        if (IsSectionLabelEnd(c)) {
          parse_state = StartState;
        }
        break;
      case KeyState:
        if (IsKeyEnd(c)) {
          parse_state = ValueState;
          key = init_buffer.substr(start_offset, offset - start_offset);
          start_offset = offset + 1;
        }
        break;
      case ValueState:
        if (IsValueEnd(c)) {
          parse_state = StartState;
          value = init_buffer.substr(start_offset, offset - start_offset);
          (*properties)[key] = value;
        }
        break;
      case CommentState:
        if (IsCommentEnd(c)) {
          parse_state = StartState;
        }
        break;
      default:
        break;
    }
    offset++;
  }
  // The input may end without a trailing newline; store the last value in that case.
  if (parse_state == ValueState) {
    value = init_buffer.substr(start_offset);
    (*properties)[key] = value;
  }
  return true;
}
int main(int argc, char** argv) {
  std::string init_buffer = " [section1] aa = 1 \n bb = 2 \n [section2] \n cc = 3 \n [section3] \n dd = 4 \n ff = 5\n";
  std::map<std::string, std::string> properties;
  ParseInit(init_buffer, &properties);
  // Print every parsed pair; std::map iterates in key order.
  std::map<std::string, std::string>::iterator it = properties.begin();
  for (; it != properties.end(); ++it) {
    printf("key: %s, value: %s\n", it->first.c_str(), it->second.c_str());
  }
  return 0;
}
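Note that the parser stores keys and values exactly as they appear between the delimiters, so the line `aa = 1` yields the key `"aa "` and the value `" 1"`. If that is not wanted, the extracted strings could be passed through a trimming step before being stored. The helper below is a sketch under that assumption; the name `Trim` is hypothetical and is not part of the original program.

```cpp
#include <string>

// Returns s without leading and trailing spaces, tabs, or carriage returns.
// Hypothetical helper, not part of the original program.
std::string Trim(const std::string& s) {
  const char* kWhitespace = " \t\r";
  const std::string::size_type first = s.find_first_not_of(kWhitespace);
  if (first == std::string::npos) {
    return std::string();  // the string is empty or all whitespace
  }
  const std::string::size_type last = s.find_last_not_of(kWhitespace);
  return s.substr(first, last - first + 1);
}
```

One way to use it would be to store `(*properties)[Trim(key)] = Trim(value);` instead of the raw substrings.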
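To keep the parser flexible, each condition check is wrapped in its own function, so changing what counts as a delimiter only requires editing that one function.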
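To illustrate that flexibility, the sketch below splits a line into key and value using whatever predicate is passed in, so accepting `:` as well as `=` only means supplying a different predicate. The names `CharPredicate`, `IsKeyEndEquals`, `IsKeyEndEqualsOrColon`, and `SplitKeyValue` are hypothetical and serve only this example.

```cpp
#include <string>
#include <utility>

// Hypothetical predicate type: decides whether a character ends the key.
typedef bool (*CharPredicate)(char);

// Same rule as the program's key-end check: only '=' ends a key.
bool IsKeyEndEquals(char c) { return c == '='; }

// Relaxed variant for illustration: '=' or ':' ends a key.
bool IsKeyEndEqualsOrColon(char c) { return c == '=' || c == ':'; }

// Splits `line` at the first character accepted by is_key_end and returns
// (key, value). If no character is accepted, the whole line becomes the key
// and the value is empty.
std::pair<std::string, std::string> SplitKeyValue(const std::string& line,
                                                  CharPredicate is_key_end) {
  for (std::string::size_type i = 0; i < line.size(); ++i) {
    if (is_key_end(line[i])) {
      return std::make_pair(line.substr(0, i), line.substr(i + 1));
    }
  }
  return std::make_pair(line, std::string());
}
```

With `IsKeyEndEquals`, the line `aa:1` is not split at all, while `IsKeyEndEqualsOrColon` splits it into `aa` and `1`; the splitting code itself stays the same.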
References
[1] 系统程序员成长计划, p. 188