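Here is some sample code that truncates the inner text of an HtmlAgilityPack node. It relies on the InnerText property and on the CloneNode method, applied recursively.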
public static HtmlNode TruncateInnerText(HtmlNode node, int length)
{
    if (node == null)
        throw new ArgumentNullException(nameof(node));

    // Nothing to do if the text already fits.
    if (node.InnerText.Length <= length)
        return node;

    // Shallow clone: same tag and attributes, no children yet.
    HtmlNode clone = node.CloneNode(false);
    // The clone is both the root (used to measure the text copied so far)
    // and the current insertion point.
    TruncateInnerText(node, clone, clone, length);
    return clone;
}

private static void TruncateInnerText(HtmlNode source, HtmlNode root, HtmlNode current, int length)
{
    HtmlNode childClone;
    foreach (HtmlNode child in source.ChildNodes)
    {
        // Budget already used up (a deeper call cropped a text node):
        // stop, instead of appending empty shells for the remaining siblings.
        if (root.InnerText.Length >= length)
            return;

        // Would the whole child still fit?
        int expectedSize = child.InnerText.Length + root.InnerText.Length;
        if (expectedSize <= length)
        {
            // Yes, just clone the whole hierarchy.
            childClone = child.CloneNode(true);
            current.AppendChild(childClone);
            continue;
        }

        // Is it a text node? Then crop it to the remaining budget.
        HtmlTextNode text = child as HtmlTextNode;
        if (text != null)
        {
            int remove = expectedSize - length;
            childClone = root.OwnerDocument.CreateTextNode(text.InnerText.Substring(0, text.InnerText.Length - remove));
            current.AppendChild(childClone);
            return;
        }

        // Not a text node: shallow clone it and dive in.
        childClone = child.CloneNode(false);
        current.AppendChild(childClone);
        TruncateInnerText(child, root, childClone, length);
    }
}
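If HtmlAgilityPack is not available, or the input is well-formed XHTML, the same recursive idea can be sketched with System.Xml.Linq from the .NET base library. This is a swapped-in API, not the code above, and the names XhtmlTruncator, Truncate and CopyWithBudget are hypothetical. Instead of re-measuring root.InnerText at every step, it passes a running character budget by ref, so each subtree is measured once.

```csharp
using System;
using System.Xml.Linq;

// Sketch using System.Xml.Linq (assumption: the input is well-formed XHTML),
// not HtmlAgilityPack. Class and method names are hypothetical.
public static class XhtmlTruncator
{
    // Returns a copy of `element` whose text content is at most `length` characters.
    public static XElement Truncate(XElement element, int length)
    {
        if (element == null) throw new ArgumentNullException(nameof(element));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        // Shallow copy: same name and attributes, children are added below.
        var clone = new XElement(element.Name, element.Attributes());
        int remaining = length;
        CopyWithBudget(element, clone, ref remaining);
        return clone;
    }

    // Copies the child nodes of `source` into `target` until the text budget runs out.
    private static void CopyWithBudget(XElement source, XElement target, ref int remaining)
    {
        foreach (XNode child in source.Nodes())
        {
            // Budget exhausted: drop the remaining siblings.
            if (remaining <= 0) return;

            switch (child)
            {
                case XText text: // also matches CDATA, which is copied as plain text
                    string value = text.Value;
                    if (value.Length <= remaining)
                    {
                        target.Add(new XText(value));
                        remaining -= value.Length;
                    }
                    else
                    {
                        // Crop the text node to whatever budget is left.
                        target.Add(new XText(value.Substring(0, remaining)));
                        remaining = 0;
                    }
                    break;

                case XElement el:
                    int size = el.Value.Length; // concatenated text of all descendants
                    if (size <= remaining)
                    {
                        target.Add(new XElement(el)); // deep copy of the whole subtree
                        remaining -= size;
                    }
                    else
                    {
                        // Too big: shallow copy the element and recurse into it.
                        var shallow = new XElement(el.Name, el.Attributes());
                        target.Add(shallow);
                        CopyWithBudget(el, shallow, ref remaining);
                    }
                    break;

                // Comments and processing instructions carry no text; they are skipped.
            }
        }
    }
}
```

For example, truncating `<div><p>Hello <b>bold</b> world</p><p>second</p></div>` to 8 characters should give `<div><p>Hello <b>bo</b></p></div>` when serialized with `ToString(SaveOptions.DisableFormatting)`.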
And here is a sample C# console application that takes this question as an example and truncates it to 500 characters.
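using System;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        var web = new HtmlWeb();
        var doc = web.Load("https://stackoverflow.com/questions/30926684/truncating-html-content-at-the-end-of-text-blocks-block-elements");
        // This XPath matched Stack Overflow's question markup when the answer was written;
        // if the layout has changed, SelectSingleNode returns null and TruncateInnerText throws.
        var post = doc.DocumentNode.SelectSingleNode("//td[@class='postcell']//div[@class='post-text']");
        var truncated = TruncateInnerText(post, 500);
        Console.WriteLine(truncated.OuterHtml);
        Console.WriteLine("Size: " + truncated.InnerText.Length);
    }

    // The two HtmlAgilityPack TruncateInnerText methods above belong in this class
    // (they are static, so Main can call them directly).
}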
When run, it should display:
Mainly when we shorten/truncate textual content we usually just truncate it at specific character index. That's already complicated in HTML anyway,but I want to truncate my HTML content (generated using content-editable div
) using different measures:
- I would define character index
N
that will serve as truncating startpoint limit - Algorithm will check whether content is at least
N
characters long (text only; not counting tags); if it's not it will just return the whole content - It would then
Size: 500
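Note: I did not truncate at word boundaries, only at character boundaries, and no, it doesn't follow the suggestions in my comment at all :-)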
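If word boundaries do matter, one possible adjustment (not part of the answer above; CutAtWord is a hypothetical helper) is to pull the cut in the text-node branch back to the start of the word that straddles the limit:

```csharp
using System;

public static class WordBoundary
{
    // Hypothetical helper: returns at most `max` characters of `text`,
    // shortened so the result does not end in the middle of a word.
    public static string CutAtWord(string text, int max)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (max < 0) throw new ArgumentOutOfRangeException(nameof(max));
        if (text.Length <= max) return text;

        // The character just past the limit is whitespace: the cut already ends a word.
        if (char.IsWhiteSpace(text[max])) return text.Substring(0, max);

        // Otherwise walk back to the start of the word that straddles the limit.
        int i = max;
        while (i > 0 && !char.IsWhiteSpace(text[i - 1])) i--;

        // i == 0: a single word is longer than `max`, so fall back to a hard cut.
        return i == 0 ? text.Substring(0, max) : text.Substring(0, i).TrimEnd();
    }
}
```

In the HtmlAgilityPack version this would replace the Substring call in the text-node branch, i.e. `CutAtWord(text.InnerText, text.InnerText.Length - remove)`. It only looks inside the single text node being cropped, so a word split across elements (for example `bo<b>ld</b>`) can still be cut in the middle.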