C#: Truncating HTML content at the end of text blocks (block elements)

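Here is some sample code that truncates the inner text of an HTML node. It relies on the InnerText property and the CloneNode method, applied recursively. The HtmlNode, HtmlTextNode and HtmlWeb types come from Html Agility Pack.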

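// Needed at the top of the file: using System; using HtmlAgilityPack;
// Both methods are meant to live inside a class (for example Program, shown further down).

public static HtmlNode TruncateInnerText(HtmlNode node, int length)
{
    if (node == null)
        throw new ArgumentNullException(nameof(node));

    // nothing to do?
    if (node.InnerText.Length < length)
        return node;

    // shallow clone: copies the node itself but none of its children
    HtmlNode clone = node.CloneNode(false);

    // the clone is both the root (used to measure the text copied so far)
    // and the current insertion point
    TruncateInnerText(node, clone, clone, length);
    return clone;
}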

// source:  node being read from the original tree
// root:    top of the truncated copy, used to measure the text copied so far
// current: node in the copy that receives the clones of source's children
private static void TruncateInnerText(HtmlNode source, HtmlNode root, HtmlNode current, int length)
{
    HtmlNode childClone;
    foreach (HtmlNode child in source.ChildNodes)
    {
        // does the whole child still fit?
        int expectedSize = child.InnerText.Length + root.InnerText.Length;
        if (expectedSize <= length)
        {
            // yes, just clone the whole hierarchy
            childClone = child.CloneNode(true);
            current.ChildNodes.Add(childClone);
            continue;
        }

        // is it a text node? then crop it
        HtmlTextNode text = child as HtmlTextNode;
        if (text != null)
        {
            int remove = expectedSize - length;
            childClone = root.OwnerDocument.CreateTextNode(text.InnerText.Substring(0, text.InnerText.Length - remove));
            current.ChildNodes.Add(childClone);
            return;
        }

        // it's not a text node, shallow clone and dive in
        childClone = child.CloneNode(false);
        current.ChildNodes.Add(childClone);
        TruncateInnerText(child, root, childClone, length);
    }
}

Here is also a sample C# console application that takes this very question as its input and truncates it to 500 characters.

class Program
{
    // Requires: using System; using HtmlAgilityPack;
    // TruncateInnerText is the method defined above, placed in this class.
    static void Main(string[] args)
    {
        var web = new HtmlWeb();
        var doc = web.Load("https://stackoverflow.com/questions/30926684/truncating-html-content-at-the-end-of-text-blocks-block-elements");
        var post = doc.DocumentNode.SelectSingleNode("//td[@class='postcell']//div[@class='post-text']");
        var truncated = TruncateInnerText(post, 500);
        Console.WriteLine(truncated.OuterHtml);
        Console.WriteLine("Size: " + truncated.InnerText.Length);
    }
}

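When run, it should display something like the following (shown here as rendered text; OuterHtml prints the actual markup):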

Mainly when we shorten/truncate textual content we usually just truncate it at specific character index. That's already complicated in HTML anyway,but I want to truncate my HTML content (generated using content-editable div) using different measures:

  1. I would define character index N that will serve as truncating startpoint limit
  2. Algorithm will check whether content is at least N characters long (text only; not counting tags); if it's not it will just return the whole content
  3. It would then

Size: 500

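Note: I did not truncate at a word boundary, only at a character boundary, and no, this does not follow the suggestions in my comment at all :-)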
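If a word-boundary cut is wanted instead, the crop position could be moved back to the end of the last complete word. The following is only a minimal sketch of that idea, not part of the original answer: the class name TextCut and the method WordBoundaryCut are hypothetical, and "word" here simply means a run of non-whitespace characters.

```csharp
using System;

public static class TextCut
{
    // Hypothetical helper (an assumption, not from the original answer).
    // Returns the largest cut position <= maxLength that does not split a word
    // and does not leave trailing whitespace before the cut.
    public static int WordBoundaryCut(string text, int maxLength)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));
        if (maxLength >= text.Length)
            return text.Length;   // everything fits, nothing to cut
        if (maxLength <= 0)
            return 0;

        int i = maxLength;

        // If the hard cut lands inside a word, walk back to the start of that word.
        if (!char.IsWhiteSpace(text[i]))
        {
            while (i > 0 && !char.IsWhiteSpace(text[i - 1]))
                i--;
        }

        // Drop the whitespace that now sits just before the cut.
        while (i > 0 && char.IsWhiteSpace(text[i - 1]))
            i--;

        // No earlier word boundary exists: fall back to the hard character cut.
        return i == 0 ? maxLength : i;
    }
}
```

In the text-node branch above, the remaining budget is text.InnerText.Length - remove, so one option would be to pass text.InnerText and that budget to WordBoundaryCut and use the result as the Substring length. The reported size would then be at most 500 rather than exactly 500.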
