Introduction to Data Mining - 2

Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.

In this book, we focus on:

1) summary statistics

2) visualization

3) online analytical processing (OLAP)


UCI Machine Learning Repository
http://www.ics.uci.edu/~mlearn/MLRepository.html


1. Summary statistics

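1) The mean is very sensitive to outliers, so the median or a trimmed mean is also commonly used.

2) The variance is also sensitive to outliers, because it is built on the mean and squares each deviation from it.

Average absolute deviation: $\mathrm{AAD}(x) = \frac{1}{m}\sum_{i=1}^{m} |x_i - \bar{x}|$, where $m$ is the number of values and $\bar{x}$ is their mean. It is a more robust measure of spread than the variance.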
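
To make the sensitivity to outliers concrete, here is a minimal Python sketch that compares these statistics on a small made-up data set with one extreme value. The data, the helper names `trimmed_mean` and `average_absolute_deviation`, and the 10% trimming proportion are illustrative assumptions, not taken from the book.

```python
from statistics import mean, median, variance


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # how many values to cut per end
    return mean(ordered[k:len(ordered) - k])


def average_absolute_deviation(values):
    """Mean of the absolute distances between each value and the mean."""
    center = mean(values)
    return mean(abs(v - center) for v in values)


# Made-up data: nine ordinary values and one outlier (100).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(mean(data))                        # 14.5, pulled up by the outlier
print(median(data))                      # 5.5
print(trimmed_mean(data, 0.1))           # 5.5, after dropping 1 and 100
print(variance(data))                    # sample variance, about 909.17
print(average_absolute_deviation(data))  # about 17.1
```

The median and the trimmed mean stay near the bulk of the data, while the mean and especially the variance are dominated by the single outlier.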


2. Visualization

box plot:
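
As a rough illustration of the numbers a box plot draws, the sketch below computes the quartiles (the box), the whisker ends, and the points flagged as outliers. Conventions differ between texts and tools: this sketch assumes the common 1.5 x IQR rule for the whiskers and the "inclusive" quantile method of Python's `statistics` module, while other presentations put the whiskers at fixed percentiles. The function name `box_plot_summary` and the data are made up for the example.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values that are still inside the fences.
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
# q1 = 3.25, median = 5.5, q3 = 7.75, whiskers at 1 and 9, outliers = [100]
```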

Parallel Coordinates:

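There is no single shared vertical axis. Instead, the attributes are laid out along the horizontal axis, each with its own parallel coordinate axis (the order of the attributes affects how the plot is read). Each sample's value for each attribute is marked at the corresponding height on that attribute's axis, and the marks are connected, so each sample is drawn as one line.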
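
The construction can be sketched in a few lines of Python. The code below does not draw anything; it only computes, for each sample, the points of its line as (axis position, scaled height) pairs. The min-max scaling of every attribute to [0, 1], the function name `parallel_coordinates`, and the attribute names and values are assumptions made for the illustration.

```python
def parallel_coordinates(samples, attributes):
    """Return one polyline per sample as a list of (axis_index, height)."""
    # Scale each attribute to [0, 1] so all parallel axes share one height.
    lows = {a: min(s[a] for s in samples) for a in attributes}
    highs = {a: max(s[a] for s in samples) for a in attributes}
    lines = []
    for s in samples:
        line = []
        for i, a in enumerate(attributes):
            span = highs[a] - lows[a]
            # A constant attribute has no spread; put it mid-axis.
            height = (s[a] - lows[a]) / span if span else 0.5
            line.append((i, height))
        lines.append(line)
    return lines


# Made-up samples with three hypothetical attributes.
samples = [
    {"a1": 1.0, "a2": 10.0, "a3": 0.0},
    {"a1": 2.0, "a2": 30.0, "a3": 5.0},
    {"a1": 3.0, "a2": 20.0, "a3": 10.0},
]

lines = parallel_coordinates(samples, ["a1", "a2", "a3"])
print(lines[1])  # [(0, 0.5), (1, 1.0), (2, 0.5)]
```

Passing the attributes in a different order, for example `["a2", "a1", "a3"]`, changes which axes are adjacent and therefore which line crossings are visible, which is why the ordering matters.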


3. OLAP

OLAP uses a multidimensional array representation.
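
As a small sketch of what such a representation might look like, the Python code below turns a made-up fact table into a three-dimensional array (product x location x quarter) and then aggregates it. The dimension names, the sales figures, and the helpers `build_cube`, `totals_by_product` and `slice_by_quarter` are all hypothetical and only illustrate the idea.

```python
# Made-up fact table: (product, location, quarter, sales).
facts = [
    ("pen", "north", "Q1", 10),
    ("pen", "south", "Q1", 5),
    ("ink", "north", "Q1", 7),
    ("pen", "north", "Q2", 3),
]


def build_cube(rows):
    """Return (dims, cube): the values of each dimension and a 3-D array."""
    # The sorted distinct values of each dimension define the array indices.
    dims = [sorted({r[d] for r in rows}) for d in range(3)]
    index = [{v: i for i, v in enumerate(values)} for values in dims]
    cube = [[[0 for _ in dims[2]] for _ in dims[1]] for _ in dims[0]]
    for product, location, quarter, sales in rows:
        p, l, q = index[0][product], index[1][location], index[2][quarter]
        cube[p][l][q] += sales
    return dims, cube


def totals_by_product(dims, cube):
    """Aggregate: sum out the location and quarter dimensions."""
    return {
        product: sum(sum(per_quarter) for per_quarter in cube[p])
        for p, product in enumerate(dims[0])
    }


def slice_by_quarter(dims, cube, quarter):
    """Slice: fix one quarter and keep a product x location table."""
    q = dims[2].index(quarter)
    return [[cell[q] for cell in row] for row in cube]


dims, cube = build_cube(facts)
print(totals_by_product(dims, cube))       # {'ink': 7, 'pen': 18}
print(slice_by_quarter(dims, cube, "Q1"))  # [[7, 0], [10, 5]]
```

Each cell of the array holds the aggregated measure for one combination of dimension values, so summing along an axis or fixing an index corresponds to the usual aggregation and slicing operations.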
