Introduction to Data Mining (1)

Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]

 

categorical/qualitative
1) Nominal:
mode/entropy/contingency correlation/χ² (chi-square) test

2) Ordinal:
median/percentiles/rank correlation/run tests/sign tests

numeric/quantitative

3) Interval:
mean/standard deviation/Pearson's correlation/t and F tests
4) Ratio:
geometric mean/harmonic mean/percent variation
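
The statistics listed for each attribute type can be tried out with a small sketch. It uses only the Python standard library; the sample values and variable names are made up for illustration, and the `entropy` helper is a hypothetical function written here, not a library call.

```python
import math
import statistics
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a nominal attribute."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Nominal: only equality is meaningful, so use mode and entropy.
colors = ["red", "blue", "red", "green"]
nominal_mode = statistics.mode(colors)
nominal_entropy = entropy(colors)

# Ordinal: order is meaningful, so the median is allowed.
grades = [1, 2, 2, 3, 5]  # illustrative ranks coded as integers
ordinal_median = statistics.median(grades)

# Interval: differences are meaningful, so mean and standard deviation apply.
temps_c = [20.0, 22.0, 24.0]
interval_mean = statistics.mean(temps_c)
interval_stdev = statistics.stdev(temps_c)

# Ratio: ratios are meaningful, so geometric and harmonic means apply.
lengths = [1.0, 4.0, 16.0]
ratio_gmean = statistics.geometric_mean(lengths)
ratio_hmean = statistics.harmonic_mean(lengths)
```

Each level also admits the statistics of the levels before it: a ratio attribute still has a mode and a median, but a nominal attribute has no meaningful mean.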


data quality problems:

1) Noise and outliers
2) missing values
why: 1. information was not collected; 2. attributes are not applicable to all objects
how: 1. eliminate data objects; 2. estimate missing values; 3. ignore missing values during analysis; 4. replace with all possible values (weighted by their probabilities)
3) duplicate data
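
The first three ways of handling missing values can be sketched as follows. The records, the field names `age` and `income`, and the helper functions are all hypothetical; mean imputation is only one of several possible ways to estimate a missing value.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "income": 50.0},
    {"age": None, "income": 60.0},
    {"age": 35, "income": None},
    {"age": 45, "income": 70.0},
]

def drop_incomplete(rows):
    """Strategy 1: eliminate data objects that have any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows, key):
    """Strategy 2: estimate missing values with the mean of observed ones."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = statistics.mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def mean_ignoring_missing(rows, key):
    """Strategy 3: ignore missing values during the analysis itself."""
    return statistics.mean(r[key] for r in rows if r[key] is not None)

complete = drop_incomplete(rows)                    # keeps 2 of the 4 records
filled = impute_mean(rows, "age")                   # new list; rows is unchanged
avg_income = mean_ignoring_missing(rows, "income")
```

Dropping objects is the simplest option but here discards half of the data, which is why estimating or ignoring missing values is often preferred when many objects are incomplete.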


data preprocessing:
1) aggregation
2) sampling
3) dimensionality reduction
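curse of dimensionality: as dimensionality increases, the data becomes increasingly sparse, and the notions of density and distance between points become less meaningful
how: Principal Component Analysis (PCA); Singular Value Decomposition (SVD)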
4) feature subset selection

5) feature creation

feature extraction: domain-specific
mapping data to new space: Fourier transform/Wavelet transform
feature construction: combining features

6) discretization and binarization
7) attribute transformation
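
As an illustration of dimensionality reduction from step 3, here is a minimal pure-Python sketch of the idea behind PCA. It finds only the first principal component, and it does so with power iteration on the covariance matrix instead of a library SVD routine. The function names and the toy data are assumptions made for this example.

```python
import math

def first_principal_component(points, iterations=200):
    """Return (unit direction, variance) of the top principal component."""
    n, d = len(points), len(points[0])
    # Center the data: PCA works on deviations from the mean.
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication converges to the eigenvector
    # with the largest eigenvalue (assuming that eigenvalue is unique and
    # the start vector is not orthogonal to its eigenvector).
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v is the quadratic form v^T C v.
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                   for a in range(d))
    return v, variance

def project(points, direction):
    """Reduce each point to one coordinate along the given direction."""
    n, d = len(points), len(direction)
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [sum((p[j] - means[j]) * direction[j] for j in range(d))
            for p in points]

# Toy 2-D data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, variance = first_principal_component(data)
reduced = project(data, direction)  # 2-D points reduced to 1-D coordinates
```

For this data the direction comes out close to the diagonal and captures almost all of the total variance, so one coordinate is enough to describe the points. In practice a library routine for SVD or eigendecomposition would normally be used instead of hand-written power iteration.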


 


 



Euclidean density = number of points per unit volume
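
One way to make this definition concrete is a cell-based estimate: split the space into equal-sized grid cells and divide the number of points in each cell by the cell volume. The sketch below assumes that approach; the function name `cell_density`, the cell size, and the sample points are illustrative choices.

```python
from collections import Counter

def cell_density(points, cell_size):
    """Map each grid cell to (number of points in it) / (cell volume)."""
    dims = len(points[0])
    volume = cell_size ** dims
    # A point belongs to the cell whose index is floor(coordinate / cell_size).
    counts = Counter(
        tuple(int(coord // cell_size) for coord in p) for p in points
    )
    return {cell: n / volume for cell, n in counts.items()}

points = [(0.1, 0.2), (0.4, 0.3), (0.9, 0.8), (1.5, 1.5), (1.6, 1.7)]
density = cell_density(points, cell_size=1.0)
```

The result depends on the chosen cell size: smaller cells give a finer but noisier picture, and in high dimensions most cells end up empty, which is the curse of dimensionality noted earlier.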

 

Reposted from: https://www.cnblogs.com/pxy7896/p/6493064.html
