Reference 1
AN APPLICATION OF THE FELLEGI-SUNTER MODEL OF RECORD LINKAGE TO THE 1990 U.S. DECENNIAL CENSUS
William E. Winkler and Yves Thibaudeau
U.S. Bureau of the Census
Abstract: This paper describes a methodology for computer matching the Post Enumeration Survey with the Census. Computer matching is the first stage of a process for producing adjusted Census counts. All crucial matching parameters are computed solely using characteristics of the files being matched. No a priori knowledge of truth of matches is assumed. No previously created lookup tables are needed. The methods are illustrated with numerical results using files from the 1988 Dress Rehearsal Census for which the truth of matches is known.
Key words and phrases: EM Algorithm; String Comparator Metric; LP Algorithm; Decision Rule; Error Rate.
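For orientation, the decision rule named in the key words can be sketched in the standard Fellegi-Sunter form: each compared field contributes a log-likelihood-ratio weight, and the summed weight is compared against two thresholds. This is a minimal illustration, not the authors' implementation. The field names, the m and u probabilities, and the thresholds below are all hypothetical; in the paper the parameters are estimated from the files being matched.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not from the paper):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
M_U = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "age": (0.85, 0.10),
}


def match_weight(agreement):
    """Sum the log2 likelihood ratios over all fields.

    `agreement` maps each field name to True (agrees) or False (disagrees).
    """
    total = 0.0
    for field, (m, u) in M_U.items():
        if agreement[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision rule; both thresholds are assumed values."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "possible match"


all_agree = {"last_name": True, "first_name": True, "age": True}
only_last = {"last_name": True, "first_name": False, "age": False}
none_agree = {"last_name": False, "first_name": False, "age": False}

for pattern in (all_agree, only_last, none_agree):
    print(classify(match_weight(pattern)))
```

In practice the two thresholds would be chosen to bound the false-match and false-nonmatch error rates, and pairs falling between them would be sent for clerical review.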
Reference 2
Data Cleaning: Problems and Current Approaches
Erhard Rahm, Hong Hai Do
University of Leipzig, Germany
Abstract: We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.