I have an HTML file with very badly formatted code that I get from a website, and I want to extract a few small pieces of information from it.
I am only interested in lines that start like this:
user897HouseA2HouseA Type12 1 of 2user12310
and I want to extract 4 fields:
A:HouseA
B:HouseA Type12
C:user123
D:10
I know I've seen people recommend HTML Agility Pack and libxml2, but I really don't think I need all that. My app is in C/C++.
I am already using getline to read the file line by line; I am just not sure what the best way to proceed is. Thanks!
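#include <cstdio>
#include <cstring>
#include <fstream>
#include <sstream>
#include <string>

int main()
{
    int linenum = 0;  // 1-based number of the line being read
    std::ifstream data("Home.html");
    std::string line;
    while (std::getline(data, line))
    {
        linenum++;
        std::stringstream lineStream(line);
        std::string user;
        // Prefix test: compare only the first strlen(prefix) characters.
        if (strncmp(line.c_str(), "", strlen("")) == 0) {
            printf("found a wanted line in line:%d\n", linenum);
        }
    }
    return 0;
}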
2011-02-17
emge
Have you tried parsing your HTML with regular expressions? :-p
2011-02-17 22:48:43
Are there any libraries you can use, or only the C++ stdlib? What platform are you targeting?
2011-02-17 22:51:14