C#: removing HTML tags and styles

13.4. Removing HTML tags: htmlRemoveTag

/*
 * [Function]
 * Remove the HTML tags and keep only the text content.
 * [Input]
 * html: an HTML string that may contain tags
 * [Output]
 * The plain text content, with no HTML tags
 * [Note]
 * Requires the HtmlAgilityPack library.
 */
public string htmlRemoveTag(string html)
{
    string filteredHtml = "";

    if (!string.IsNullOrEmpty(html))
    {
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(html);

        // 1. Remove all comments
        // (1) Get all comment nodes using XPath
        HtmlNodeCollection commentNodeList = htmlDoc.DocumentNode.SelectNodes("//comment()");
        if (commentNodeList != null)
        {
            foreach (HtmlNode comment in commentNodeList)
            {
                // (2) Remove the comment node itself
                comment.ParentNode.RemoveChild(comment);
            }
        }

        // 2. Concatenate the text of every remaining top-level node
        foreach (var node in htmlDoc.DocumentNode.ChildNodes)
        {
            filteredHtml += node.InnerText;
        }
    }

    return filteredHtml;
}
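The function above can be tried in isolation with a small console program. The following is only a minimal sketch: the class name `HtmlRemoveTagDemo`, the static copy `HtmlRemoveTag`, and the sample string are hypothetical and not part of the original library. It assumes the HtmlAgilityPack package is referenced. Because `InnerText` may leave entities such as `&amp;` encoded, the sketch also passes the result through `HtmlEntity.DeEntitize`, a helper that ships with HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

class HtmlRemoveTagDemo
{
    // Compact static copy of htmlRemoveTag, so this sketch compiles on its own.
    static string HtmlRemoveTag(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return "";
        }

        HtmlDocument htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Drop comment nodes first so their text cannot leak into the result.
        HtmlNodeCollection commentNodeList = htmlDoc.DocumentNode.SelectNodes("//comment()");
        if (commentNodeList != null)
        {
            foreach (HtmlNode comment in commentNodeList)
            {
                comment.ParentNode.RemoveChild(comment);
            }
        }

        // Concatenate the text of every remaining top-level node.
        string text = "";
        foreach (HtmlNode node in htmlDoc.DocumentNode.ChildNodes)
        {
            text += node.InnerText;
        }
        return text;
    }

    static void Main()
    {
        // Hypothetical sample input: tags, an entity, and a comment.
        string sample = "<p>Tom &amp; <b>Jerry</b><!-- hidden note --></p>";

        string text = HtmlRemoveTag(sample);
        Console.WriteLine(text);

        // Optional extra step (not in the original function): decode HTML entities.
        Console.WriteLine(HtmlEntity.DeEntitize(text));
    }
}
```

Whether to decode entities depends on how the text is used afterwards; the original function returns the text exactly as `InnerText` yields it.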

Example 13.4. Using htmlRemoveTag

HtmlAgilityPack.HtmlDocument htmlDoc = crl.htmlToHtmlDoc(googleSearchRespHtml);
HtmlNodeCollection liNodeList = htmlDoc.DocumentNode.SelectNodes("//li[@class='g']");
foreach (HtmlNode liNode in liNodeList)
{
    HtmlNode h3ANode = liNode.SelectSingleNode(".//h3[@class='r']/a");
    if (h3ANode != null)
    {
        googleSearchResultItem singleResultItem = new googleSearchResultItem();
        //string titleHtml = h3ANode.InnerHtml; //"Amritanandamayi Math to sponsor charity events - Times Of India"
        string titleHtml = h3ANode.InnerText; //"Amritanandamayi Math to sponsor charity events - Times Of India"
        string filteredTitle = crl.htmlRemoveTag(titleHtml);
