DataMining(2)_Data Preprocessing

Data Quality: Why Preprocess the Data?

Accuracy: are the recorded values correct or wrong?
Completeness: are values missing because they were not recorded or were unavailable?
Consistency: were some copies modified but others not, leaving dangling references?
Timeliness: are the data updated in a timely manner?
Believability: how far can the data be trusted to be correct?
Interpretability: how easily can the data be understood?

Major Tasks in Data Preprocessing

  1. Data cleaning
    Data in the real world is dirty: lots of potentially incorrect data, e.g., from faulty instruments, human or computer error, or transmission errors
    Incomplete (Missing) Data
    Noisy Data
      Binning
      Regression
      Clustering
      Combined computer and human inspection
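
Binning is commonly used to smooth noisy values. The sketch below shows one variant, smoothing by bin means over equal-frequency bins. The function name `smooth_by_bin_means` and the sample prices are illustrative assumptions, not taken from these notes.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into consecutive bins of `bin_size`
    items (equal-frequency binning), and replace every value by the
    mean of its bin. The last bin may be smaller than `bin_size`."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical price data, used only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# → [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Other variants replace each value by the bin median or by the nearest bin boundary; the choice depends on how much local detail should survive the smoothing.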

  2. Data integration
    Combines data from multiple sources into a coherent store
    Handling Redundancy in Data Integration
    Correlation Analysis
    1) Nominal data
    2) Numeric data
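
The formula images for this part did not survive in the source, so the following is only a sketch of measures commonly used for correlation analysis, and it is an assumption that these are the ones the notes intended: the chi-square statistic for two nominal attributes, and covariance and the Pearson correlation coefficient for two numeric attributes. All function names and the sample table are illustrative.

```python
import math


def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of
    rows of observed counts. The expected count of a cell is
    row_total * column_total / grand_total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def covariance(xs, ys):
    """Population covariance of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance divided by the
    product of the two standard deviations. Lies in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical 2x2 table of counts for two nominal attributes.
table = [[250, 200], [50, 1000]]
print(round(chi_square(table), 2))
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A large chi-square value suggests the two nominal attributes are related; a Pearson coefficient near +1 or -1 suggests one numeric attribute is largely redundant given the other, while a value near 0 indicates no linear relationship.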

  3. Data reduction
    Obtain a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results
    Data reduction strategies
      Dimensionality reduction, e.g., remove unimportant attributes
        Wavelet transforms
        Principal Components Analysis (PCA)
        Feature subset selection, feature creation
      Numerosity reduction (some simply call it data reduction)
        Regression and log-linear models
        Histograms, clustering, sampling
        Data cube aggregation
      Data compression
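
Two of the numerosity reduction techniques named above, sampling and histograms, can be sketched in a few lines. This is a minimal illustration under assumptions: the function names, the fixed seed, and the choice of simple random sampling without replacement and equal-width buckets are not specified in these notes.

```python
import random


def simple_random_sample(rows, k, seed=0):
    """Draw k rows without replacement. A private Random instance with
    a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def equal_width_histogram(values, num_buckets):
    """Summarize numeric values as counts over num_buckets buckets of
    equal width spanning [min, max]. The maximum value is placed in
    the last bucket."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1  # avoid zero width if all values are equal
    counts = [0] * num_buckets
    for v in values:
        index = min(int((v - lo) / width), num_buckets - 1)
        counts[index] += 1
    return counts


data = list(range(1, 11))  # hypothetical attribute values 1..10
print(simple_random_sample(data, 3))
print(equal_width_histogram(data, 3))
# the histogram line prints [3, 3, 4]
```

Either summary stands in for the full data set: the sample keeps a few original rows, while the histogram keeps only bucket counts.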

  4. Data transformation and data discretization
