c# 正则删除 html代码,使用C#正则表达式删除HTML标签

The correct answer is don't do that, use the HTML Agility Pack.

Edited to add:

To shamelessly steal from the comment below by jesse, and to avoid being accused of inadequately answering the question after all this time, here's a simple, reliable snippet using the HTML Agility Pack that works with even most imperfectly formed, capricious bits of HTML:

HtmlDocument doc = new HtmlDocument();

doc.LoadHtml(Properties.Resources.HtmlContents);

var text = doc.DocumentNode.SelectNodes("//body//text()").Select(node => node.InnerText);

StringBuilder output = new StringBuilder();

foreach (string line in text)

{

output.AppendLine(line);

}

string textOnly = HttpUtility.HtmlDecode(output.ToString());

There are very few defensible cases for using a regular expression for parsing HTML, as HTML can't be parsed correctly without a context-awareness that's very painful to provide even in a nontraditional regex engine. You can get part way there with a RegEx, but you'll need to do manual verifications.

Html Agility Pack can provide you a robust solution that will reduce the need to manually fix up the aberrations that can result from naively treating HTML as a context-free grammar.

A regular expression may get you mostly what you want most of the time, but it will fail on very common cases. If you can find a better/faster parser than HTML Agility Pack, go for it, but please don't subject the world to more broken HTML hackery.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值