c - libxml2 HTML chunk parsing - Stack Overflow

韩德雨

Published 2021-06-15 11:53:02

Tags: c, libxml2 HTML parsing

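The poster asks how to use the libxml2 library to parse the parts of a large HTML file that have already been received while the download is still in progress, so that the program can show results to the user sooner. The source HTML files produce warnings when checked with tidy, and the poster asks whether libxml2 can handle such incomplete HTML fragments. The post touches on strategies for parsing large files in real time and on the use of XML and HTML parsing libraries.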

I'm downloading HTML from a website. The file can be quite large, so while it is downloading I want to start parsing the chunks of HTML that are already available, so that the process appears faster to the end user of my program. I don't have control over how the chunks are generated, so a chunk can begin in the middle of a word, e.g. like so:

chunk 1 --->

XKCD

...and so on.

I have seen examples where libxml2 was used to parse XML chunks exactly as I described. Can libxml2 also parse HTML chunks? I have checked the HTML files I'm going to be downloading with tidy; it reports warnings but no errors. Can libxml2 parse those HTML chunks as well?