Truncating an HTML summary and completing the tags (htmlparser), part 1

Latest recommended article published on 2021-06-18 12:48:53

digyso888

Category: HtmlParser. Tags: html, algorithm, java

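The key to extracting a summary from HTML is that the cut must never land inside a tag, so the truncation has to copy every tag it meets in full. The approach is an algorithm that, while truncating to the specified length, counts only the text outside the tags and does not count the characters inside them. That way the summary reaches the specified length of visible text without ever breaking a tag: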
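To make the counting rule concrete, here is a minimal sketch that only measures the text outside the tags. The class name `VisibleLength` and the method `visibleLength` are hypothetical names introduced for illustration, and the sketch assumes every `<` starts a tag that ends at the next `>`; comments, scripts and entities are not handled.

```java
final class VisibleLength {
    // Hypothetical helper: counts the characters that lie outside <...> tags.
    static int visibleLength(String html) {
        int count = 0;
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;            // a tag starts: stop counting
            } else if (c == '>' && inTag) {
                inTag = false;           // the tag ends: resume counting
            } else if (!inTag) {
                count++;                 // ordinary text character
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "Hello " (6 characters) plus "world" (5 characters)
        System.out.println(visibleLength("<p>Hello <b>world</b></p>")); // prints 11
    }
}
```

The string above is 25 characters long, but only 11 of them count toward the summary length.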

Here is this small algorithm (nothing fancy):

Java code

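import java.io.File;
import java.io.IOException;

public static String readWithTag(File filename, int length) throws IOException {
    // readFileByLines is a helper that is not shown in this post;
    // it is expected to return the file content as a String.
    String content = readFileByLines(filename);
    int pos = 0;   // current index in content
    int count = 0; // number of text characters (outside tags) copied so far
    StringBuilder sb = new StringBuilder();
    // Stop at the requested length or at the end of the input.
    while (count < length && pos < content.length()) {
        char c = content.charAt(pos);
        if (c == '<') {
            int end = content.indexOf('>', pos);
            if (end < 0) {
                // Unterminated tag: stop rather than emit a broken tag.
                break;
            }
            // Copy the whole tag, including '>', without counting it.
            sb.append(content, pos, end + 1);
            pos = end + 1;
        } else {
            // Ordinary text character: copy it and count it.
            sb.append(c);
            count++;
            pos++;
        }
    }
    return sb.toString();
}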
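To try the idea without a file or the `readFileByLines` helper, here is a minimal sketch that takes the HTML directly as a `String`. The class name `TagAwareTruncate` and the method `truncate` are hypothetical names, not part of the original code, and the sketch assumes that every `<` opens a tag that ends at the next `>`.

```java
final class TagAwareTruncate {
    // Hypothetical variant: copies tags in full and counts only
    // the characters outside them toward the requested length.
    static String truncate(String html, int length) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        int count = 0;
        while (count < length && pos < html.length()) {
            char c = html.charAt(pos);
            if (c == '<') {
                int end = html.indexOf('>', pos);
                if (end < 0) {
                    break; // unterminated tag: leave it out
                }
                sb.append(html, pos, end + 1); // whole tag, not counted
                pos = end + 1;
            } else {
                sb.append(c); // text character, counted
                count++;
                pos++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(truncate("<p>Hello <b>world</b></p>", 8));
        // prints <p>Hello <b>wo
    }
}
```

No tag is cut in half, but `<p>` and `<b>` are left open in the result, which is why the tags still need to be completed afterwards.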

For details, see the next post: Truncating an HTML summary and completing the tags, part 2