
Original post: ACM, Union-Find: How Many Tables (HDU 1213)

D - How Many Tables Today is Ignatius' birthday. He invites a lot of friends. Now it's dinner time. Ignatius wants to know how many tables he needs at least. You have to notice that not all the f

2017-07-16 20:54:36 371
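The excerpt above is cut off, so the full problem statement is not available here. As a minimal sketch, assuming the usual reading of this problem (friends who know each other, directly or through a chain of acquaintances, share a table, and friends are numbered 1 to n), the number of tables equals the number of connected groups, which union-find can count. The function name `count_tables` and its parameters are illustrative, not taken from the original post.

```python
def count_tables(n, pairs):
    """Count groups among friends 1..n, given pairs (a, b) who know each other.

    Assumption: acquaintance is transitive for seating purposes, so each
    connected group needs exactly one table.
    """
    # parent[i] is the representative link for friend i; index 0 is unused.
    parent = list(range(n + 1))

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    groups = n  # start with every friend at a separate table
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
            groups -= 1              # one table fewer is needed
    return groups


if __name__ == "__main__":
    # Friends 1-2-3 form one group and 4-5 another.
    print(count_tables(5, [(1, 2), (2, 3), (4, 5)]))
```

Decrementing a counter on each successful merge avoids a second pass over `parent` to count roots.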

automatically maintaining wrappers for semi-structured web sources

Download it if you need it. The file is PDF only and does not include source code. The abstract reads: A substantial subset of the web data follows some kind of underlying structure. Nevertheless, HTML does not contain any schema or semantic information about the data it represents. A program able to provide software applications with a structured view of those semi-structured web sources is usually called a wrapper. Wrappers are able to accept a query against the source and return a set of structured results, thus enabling applications to access web data in a similar manner to that of information from databases. A significant problem in this approach arises because web sources may experiment changes that invalidate the current wrappers. In this paper, we present novel heuristics and algorithms to address this problem. Our approach is based on collecting some query results during wrapper operation. Then, when the source changes, they are used to generate a set of labeled examples that are then provided as input to a wrapper induction algorithm able to regenerate the wrapper. We have tested our methods in several real-world web data extraction domains, obtaining high accuracy in all the steps of the process.

2020-04-06
