LEC 3
Mathematical Preliminaries
Common Discrete Probability Distributions
1. Bernoulli distribution: models binary outcomes (e.g. a coin flip).
$P(X = \text{head}) = p$ and $P(X = \text{tail}) = 1 - p$
2. Generalised Bernoulli distribution (also called the categorical distribution): models $k > 2$ outcomes (e.g. one roll of a $k$-sided die).
3. Binomial distribution: models a sequence of multiple flips of a coin, i.e. the number of heads in $n$ independent flips.
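4. Multinomial distribution: models a sequence of multiple rolls of a $k$-sided die, for $k > 2$.
If there are $n$ rolls and $x_i$ is the number of times the die came up on side $i$, then the probability of this event is
$$P(x_1, \dots, x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k},$$
where $p_i$ is the probability of side $i$ and $x_1 + \dots + x_k = n$.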
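The four distributions can be compared side by side as probability mass functions. The sketch below is a minimal, standard-library-only illustration, not the lecture's own code: the function names (`bernoulli_pmf`, `categorical_pmf`, `roll_die`, `binomial_pmf`, `multinomial_pmf`) are hypothetical, die sides are numbered from 0 rather than 1, and `probs` is assumed to be a valid probability vector that sums to 1.

```python
import math
import random

# 1. Bernoulli: a single binary trial, P(X = head) = p, P(X = tail) = 1 - p.
def bernoulli_pmf(outcome: str, p: float) -> float:
    if outcome == "head":
        return p
    if outcome == "tail":
        return 1 - p
    raise ValueError("outcome must be 'head' or 'tail'")

# 2. Generalised Bernoulli (categorical): a single roll of a k-sided die.
#    probs[i] is the probability that the die shows side i (sides numbered 0..k-1).
def categorical_pmf(side: int, probs: list[float]) -> float:
    return probs[side]

def roll_die(probs: list[float], rng: random.Random) -> int:
    # Draw one side index, weighted by its probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# 3. Binomial: probability of exactly h heads in n independent flips.
def binomial_pmf(h: int, n: int, p: float) -> float:
    return math.comb(n, h) * p**h * (1 - p) ** (n - h)

# 4. Multinomial: counts[i] = number of times side i came up in n = sum(counts) rolls.
def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    n = sum(counts)
    coef = math.factorial(n)
    for x in counts:
        coef //= math.factorial(x)  # stays an exact integer: n! / (x_1! ... x_k!)
    prob = float(coef)
    for x, p in zip(counts, probs):
        prob *= p**x
    return prob

if __name__ == "__main__":
    print(binomial_pmf(2, 4, 0.5))              # 0.375
    print(multinomial_pmf([2, 2], [0.5, 0.5]))  # 0.375: a two-sided die gives the binomial case
    print(roll_die([1 / 6] * 6, random.Random(0)))  # some side index in 0..5
```

Setting n = 1 in `binomial_pmf` recovers the Bernoulli case, and a two-sided die in `multinomial_pmf` recovers the binomial case, which is one way to see how the four distributions are related.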
Missing values
Ways to handle missing values:
1. Discard the instances that contain missing values.
2. Fill in the values by hand.
3. Set the missing entries to a special value, “missingValue”.
4. Replace with the mean: substitute the average of the known values for the missing value. However, if the data contain an outlier, the mean can be very inaccurate.
5. Predict: first train a classifier to predict the missing values in the data instances, then train a second classifier to predict the target class using all the data points (original values plus the predicted missing values).
6. Accept missing values.
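Methods 1 and 4 are the simplest to show in code. The following is a minimal sketch on a hypothetical numeric column, where `None` stands in for a missing value (an assumption; real datasets may use `NaN`, `?` or empty cells). The helper names `discard_missing` and `impute_mean` are hypothetical. The deliberate outlier illustrates the caveat about mean replacement.

```python
import statistics

# Hypothetical attribute column: None marks a missing value, 250 is an outlier.
ages = [23, 25, None, 27, 250]

def discard_missing(values):
    """Method 1: drop every entry whose value is missing."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Method 4: replace each missing entry with the mean of the observed values."""
    mean = statistics.mean(discard_missing(values))
    return [mean if v is None else v for v in values]

print(discard_missing(ages))            # [23, 25, 27, 250]
print(impute_mean(ages))                # [23, 25, 81.25, 27, 250]: the outlier drags the fill value upwards
print(impute_mean([23, 25, None, 27]))  # gap filled with 25, the mean of 23, 25 and 27
```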
Noisy value
Over-fitting vs. Under-fitting
Over-fitting: the model performs very well on the training set but poorly on the test set.
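One way to see over-fitting is to fit two models to the same small, noisy training set. The sketch below is an illustrative assumption, not the lecture's example: it compares a least-squares straight line with a degree-5 polynomial (Lagrange interpolation) that passes exactly through all six hypothetical training points, then scores both on separate test points taken from the noise-free line y = 2x + 1. All data and helper names (`fit_line`, `fit_interpolant`, `mse`) are hypothetical.

```python
# Hypothetical training data: y = 2x + 1 plus small fixed "noise".
x_train = [0, 1, 2, 3, 4, 5]
noise = [0.3, -0.2, 0.4, -0.3, 0.2, -0.4]
y_train = [2 * x + 1 + e for x, e in zip(x_train, noise)]

# Hypothetical test data from the noise-free line.
x_test = [0.5, 1.5, 2.5, 3.5, 4.5]
y_test = [2 * x + 1 for x in x_test]

def fit_line(xs, ys):
    """Least-squares straight line (a simple model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 through every training point (an overly complex model)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            term = yj
            for m, xm in enumerate(xs):
                if m != j:
                    term *= (x - xm) / (xj - xm)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(x_train, y_train)
poly = fit_interpolant(x_train, y_train)
for name, model in [("line", line), ("degree-5 polynomial", poly)]:
    print(f"{name}: train MSE = {mse(model, x_train, y_train):.4f}, "
          f"test MSE = {mse(model, x_test, y_test):.4f}")
```

The polynomial reaches zero training error because it memorises every point, noise included, yet its test error ends up higher than the line's, which matches the definition above.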
Feature Normalisation
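Method 2: Gaussian normalisation (z-score), $x' = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the feature.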
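Gaussian normalisation rescales a feature so that it has mean 0 and standard deviation 1. A minimal sketch using the standard library follows; the function name `gaussian_normalise` and the height data are hypothetical, and it assumes the population standard deviation (`statistics.pstdev`), whereas some texts use the sample standard deviation (`statistics.stdev`) instead.

```python
import statistics

def gaussian_normalise(values):
    """Z-score each value: subtract the mean, then divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot normalise a constant feature (standard deviation is 0)")
    return [(v - mu) / sigma for v in values]

heights_cm = [150, 160, 170, 180, 190]  # hypothetical feature column
z = gaussian_normalise(heights_cm)
print(z)  # symmetric around 0: the middle value (the mean, 170) maps to 0.0
```

In practice the mean and standard deviation are computed on the training set and then reused to normalise the test set, so both are on the same scale.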