Class Notes: Data Mining (1)

I. Introduction

……

1. Major Issues in Data Mining

User Interaction: Presentation and visualization of data mining results

Efficiency and Scalability

Diversity of data types: Handling complex types of data; Mining dynamic, networked, and global data repositories

Data mining and society: Privacy-preserving data mining; Social impacts of data mining; Invisible data mining

 

II. Getting to Know Your Data

1. Types of Data Sets

Record: Relational records; Data matrix; Text documents; Transaction data

2. Important Characteristics of Structured Data

Dimensionality: Curse of dimensionality;

Sparsity: Only presence counts;

Resolution: Patterns depend on the scale;

Distribution: Centrality and dispersion 

3. Attributes (also called dimensions, features, or variables)

Types: Nominal; Ordinal; Binary (symmetric, asymmetric); Numeric (quantitative): interval-scaled, ratio-scaled

Discrete Attribute

Continuous Attribute

4. Basic Statistical Descriptions of Data

Data dispersion characteristics: median, max, min, quantiles, outliers, variance

Mean: Weighted arithmetic mean; Trimmed mean
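The two variants of the mean can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `weighted_mean` and `trimmed_mean` are hypothetical, and the trimming rule (drop the same fraction from each end of the sorted data) is one common convention.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def trimmed_mean(values, proportion):
    """Mean after dropping the lowest and highest `proportion` of the values.

    Assumption: the same fraction is cut from both ends, and the number of
    dropped values per end is rounded down.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)
```

For example, `trimmed_mean([1, 2, 3, 4, 100], 0.2)` drops 1 and 100 and averages the rest, so the extreme value 100 no longer dominates the result.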

5. Measuring the Dispersion of Data

Quartiles: Q1 (25th percentile), Q3 (75th percentile)

Inter-quartile range (IQR): IQR = Q3 - Q1, the spread of the middle 50% of the data

Five-number summary: min, Q1, median, Q3, max
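As a rough sketch, the five-number summary and the IQR can be computed with Python's standard `statistics` module. The function names below are hypothetical. Quartile conventions differ between textbooks and libraries, so the choice of `method="inclusive"` is an assumption, and other conventions can give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def interquartile_range(values):
    """IQR = Q3 - Q1, the spread of the middle 50% of the data."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1
```

These five values are exactly what a boxplot draws: the box spans Q1 to Q3, the line inside the box marks the median, and the whiskers extend toward the minimum and maximum.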

6. Graphic Displays of Basic Statistical Descriptions

7. Five Kinds of Data Analysis Plots

Boxplot Analysis

Histogram Analysis

Quantile Plot

Quantile-Quantile Plot (Q-Q Plot)

Scatter Plot
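To make the quantile plot concrete, here is a small sketch that computes the points it displays. It assumes the common textbook convention in which the i-th smallest of n values is paired with f_i = (i - 0.5) / n, meaning roughly 100·f_i percent of the data lie at or below that value. The function name `quantile_plot_points` is hypothetical, and the plotting itself is left to whatever library is at hand.

```python
def quantile_plot_points(values):
    """Return (f_i, x_i) pairs for a quantile plot.

    x_i is the i-th smallest value (i = 1..n) and f_i = (i - 0.5) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]
```

Plotting f_i on the horizontal axis against x_i on the vertical axis gives the quantile plot. A Q-Q plot instead plots the quantiles of one distribution against the corresponding quantiles of another.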

8. Categorization of Visualization Methods

Pixel-oriented:

① The m dimension values of a record are mapped to m pixels at the corresponding positions in the windows

② The color of each pixel reflects the corresponding value

③ For a dataset of m dimensions, create m windows on the screen, one for each dimension

Parallel Coordinates: used to plot data with k attributes (k dimensions).

Geometric projection

Icon-based

Chernoff Faces:

Stick Figures: A 5-piece stick figure

Hierarchical:

Dimensional Stacking

Worlds-within-Worlds

Tree-Map

InfoCube

9. Similarity and Dissimilarity

① Data matrix

② Dissimilarity matrix
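The relationship between the two structures can be sketched as follows: a data matrix holds n objects by p attributes, while a dissimilarity matrix holds the n-by-n pairwise distances, usually stored as a lower triangle because d(i, j) = d(j, i) and d(i, i) = 0. The notes do not fix a distance measure at this point, so the use of Euclidean distance here is an assumption, and the function name is hypothetical.

```python
import math


def dissimilarity_matrix(data):
    """Build a lower-triangular dissimilarity matrix from a data matrix.

    `data` is a list of n rows, each with p numeric attributes.
    Row i of the result holds d(i, 0), ..., d(i, i - 1), followed by d(i, i) = 0.
    Assumption: Euclidean distance is used as the dissimilarity.
    """
    matrix = []
    for i, row_i in enumerate(data):
        row = [math.dist(row_i, data[j]) for j in range(i)]
        row.append(0.0)  # an object has zero dissimilarity to itself
        matrix.append(row)
    return matrix
```

For three objects `[[0, 0], [3, 4], [6, 8]]`, the result is `[[0.0], [5.0, 0.0], [10.0, 5.0, 0.0]]`.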

Proximity Measure of Nominal Attributes

a. Simple matching

b. Use a large number of binary attributes: create a new binary attribute for each nominal state
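Both approaches for nominal attributes can be sketched briefly. For simple matching, the usual formula is d(i, j) = (p - m) / p, where p is the total number of attributes and m is the number of attributes on which the two objects match. The function names and the example attribute values below are made up for illustration.

```python
def simple_matching_dissimilarity(a, b):
    """d(i, j) = (p - m) / p for two objects described by nominal attributes."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    p = len(a)                                   # total number of attributes
    m = sum(1 for x, y in zip(a, b) if x == y)   # number of matches
    return (p - m) / p


def binary_encode(value, states):
    """Encode one nominal value as one binary attribute per possible state."""
    return [1 if value == state else 0 for state in states]
```

For instance, two objects `("red", "round", "small")` and `("red", "square", "small")` match on 2 of 3 attributes, so their dissimilarity is 1/3. With states `["blue", "green", "red"]`, the value `"green"` is encoded as `[0, 1, 0]`.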

Standardizing Numeric Data: z-score
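A z-score rescales each value as z = (x - mean) / standard deviation, so that attributes measured in different units become comparable before distances are computed. The sketch below assumes the population standard deviation is used. Some texts use the sample standard deviation or the mean absolute deviation instead. The function name `z_scores` is hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize values to z-scores: z = (x - mean) / std.

    Assumption: the population standard deviation (pstdev) is used.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(x - mean) / std for x in values]
```

After standardization the values have mean 0 and standard deviation 1. For example, in `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5 and the population standard deviation is 2, so the value 9 maps to a z-score of 2.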

 

 
