Paper: Blog Post and Comment Extraction Using Information Quantity of Web Format

This paper presents a method for extracting the post body and the comments from a blog page.

Donglin Cao, Xiangwen Liao, Hongbo Xu, Shuo Bai. Blog Post and Comment Extraction Using Information Quantity of Web Format. In Proceedings of the 2008 Asia Information Retrieval Symposium (AIRS-2008), January 15-18, 2008, Harbin, China.

Abstract: With the development of the research on blogosphere, acquiring the post and comment from blog page becomes more important in improving the search performance. In this paper, we present a two-stage method. First, we combine the advantage of the vision information and the effective text information to locate the main text which represents the theme of blog page. Second, we use the information quantity of separator to detect the boundary between the post and comment. According to our experiments, this method achieves a good performance in extraction and improves the performance of blog search.
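
To make the second stage more concrete, here is a minimal sketch of one way "information quantity of a separator" could be read. It is an illustration under assumptions, not the authors' algorithm, since the abstract gives no formulas. The sketch assumes the main text has already been split into blocks, that each pair of adjacent blocks is divided by a separator such as an HTML tag name, and that information quantity means self-information, -log2(p), where p is the separator's relative frequency on the page. The function names and the sample tag list are hypothetical.

```python
import math
from collections import Counter


def separator_information(separators):
    """Map each separator type to its self-information -log2(p).

    p is the relative frequency of that separator type in the list.
    This definition is an assumption made for illustration only.
    """
    counts = Counter(separators)
    total = len(separators)
    return {sep: -math.log2(count / total) for sep, count in counts.items()}


def find_boundary(separators):
    """Return the index of the first separator with the highest self-information.

    The rarest separator is treated as the candidate post/comment boundary.
    """
    info = separator_information(separators)
    return max(range(len(separators)), key=lambda i: info[separators[i]])


# Hypothetical separators between consecutive blocks of the main text:
# line breaks inside the post, one heading, then repeated comment containers.
separators = ["br", "br", "h3", "div", "div", "div", "div"]
boundary = find_boundary(separators)
print(f"Candidate boundary at separator {boundary}: {separators[boundary]}")
```

In this toy input the single `h3` is the rarest separator, so it is picked as the candidate boundary: blocks before it would be treated as the post and blocks after it as comments. A real page would need the first stage (locating the main text) to run beforehand, and would likely need a more careful probability estimate than raw tag frequency.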