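Previously, I had already put together:
Now I'm continuing to tinker with it.
1. Will try this out when I have time:
2. Searched for
C# beautifulSoup
and found:
3. Gave this a try: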
Following the code on the official site, I added the sample code:
using Sgml;
using System.Xml;
using System.IO;
XmlDocument FromHtml(TextReader reader)
{
    // setup SgmlReader
    Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
    sgmlReader.InputStream = reader;
    // create document
    XmlDocument doc = new XmlDocument();
    doc.PreserveWhitespace = true;
    doc.XmlResolver = null;
    doc.Load(sgmlReader);
    return doc;
}
4. Then went on to consult:
to see how to use it.
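Finally, after a small modification (taking the HTML as a string and wrapping it in a StringReader), the HTML can be converted into an XML document:
using Sgml;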
using System.Xml;
using System.IO;
XmlDocument htmlToXmlDoc(string html)
{
    // setup SgmlReader
    Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
    //sgmlReader.InputStream = reader;
    sgmlReader.InputStream = new StringReader(html);
    // create document
    XmlDocument doc = new XmlDocument();
    doc.PreserveWhitespace = true;
    doc.XmlResolver = null;
    doc.Load(sgmlReader);
    return doc;
}
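Once the HTML is an XmlDocument, it can be queried with the usual System.Xml XPath calls. Below is a minimal sketch of that idea, not something taken from the SgmlReader documentation: the class XPathDemo and the method ExtractLinks are hypothetical names of my own, and the document is built with LoadXml from a small well-formed string as a stand-in for the result of htmlToXmlDoc(html), so that the snippet runs without the SgmlReader library. It assumes element names are lowercase (which matches CaseFolding.ToLower above) and that the document carries no default XML namespace; if the HTML declares one, an XmlNamespaceManager would probably be needed.

```csharp
using System;
using System.Xml;

public class XPathDemo
{
    // Hypothetical helper: collects the href value of every <a> element that has one.
    public static string[] ExtractLinks(XmlDocument doc)
    {
        // Lowercase element names are assumed here (CaseFolding.ToLower in the converter).
        XmlNodeList anchors = doc.SelectNodes("//a[@href]");
        string[] links = new string[anchors.Count];
        for (int i = 0; i < anchors.Count; i++)
        {
            links[i] = anchors[i].Attributes["href"].Value;
        }
        return links;
    }

    public static void Main()
    {
        // Stand-in for: XmlDocument doc = htmlToXmlDoc(html);
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<html><body><a href=\"http://example.com/a\">A</a>"
                  + "<a href=\"/b\">B</a><a>no href</a></body></html>");

        foreach (string link in ExtractLinks(doc))
        {
            Console.WriteLine(link);
        }
    }
}
```

In real use, the LoadXml stand-in would be replaced by a call to htmlToXmlDoc with the downloaded HTML string.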
5. Later, I ran into and solved the following issues, one by one:
6.
7.