Data Transformation
Now we have an error-free dataset
It still needs to be standardized
Type Conversion
Normalization
Sampling
Data Types
Continuous
Real values: Temperature, Height, Weight
Discrete
Integer values: Number of people
Nominal
Symbols: {Teacher, Worker, Salesman}, {Red, Green, Blue}, ...
String
Text: "Tsinghua University"
How should these be encoded?
0010
0100
1000
0001
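Each of the four bit patterns above has exactly one bit set, which is consistent with one-hot encoding of a nominal attribute. The slides do not name the scheme or say which symbol gets which code, so the following is only a minimal sketch under that assumption. The function name one_hot_encode and the alphabetical ordering of categories are illustrative choices, not taken from the source.

```python
def one_hot_encode(values):
    """Map each distinct symbol to a bit string containing exactly one '1'."""
    # Alphabetical order is an assumption; any fixed order works.
    categories = sorted(set(values))
    width = len(categories)
    codes = {}
    for index, category in enumerate(categories):
        bits = ["0"] * width
        bits[index] = "1"          # the single "hot" position for this symbol
        codes[category] = "".join(bits)
    return codes


# Nominal values taken from the slide's example set.
jobs = ["Teacher", "Worker", "Salesman", "Worker"]
codes = one_hot_encode(jobs)
encoded = [codes[job] for job in jobs]
```

One bit per symbol avoids implying an order or distance between symbols, which a single integer code such as 1, 2, 3 would suggest.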
Sampling
A database/data warehouse may store terabytes of data
Processing limits: CPU, Memory, I/O
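When the full dataset cannot fit in memory, one possible way to draw a uniform random sample is reservoir sampling, which reads the rows once and keeps only k of them at any time. The slides do not specify a sampling method, so this is a sketch of one common technique, not the lecture's own. The names reservoir_sample, rows, k, and seed are hypothetical.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)      # seed is optional and only makes runs repeatable
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # randint includes both endpoints
            if j < k:
                reservoir[j] = row
    return reservoir


# A range stands in for a table scan; only k rows are held in memory.
sample = reservoir_sample(range(1_000_000), k=5, seed=42)
```

Memory use depends on k, not on the size of the table, so the CPU and I/O cost is a single sequential pass.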