6 Training Neural Networks (Part 1): Activation Functions and Data Preprocessing


Activation Functions

Sigmoid

$\sigma(x) = \frac{1}{1 + e^{-x}}$

  • Squashes numbers to range [0,1]
  • Historically popular

3 problems:

  1. Saturated neurons kill the gradient (see the sketch after this list)
  2. Sigmoid outputs are not zero-centered
  3. exp() is somewhat compute-expensive
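
A minimal numpy sketch (my own, not from the lecture) to make problems 1 and 2 concrete: the local gradient σ(x)(1 - σ(x)) is essentially zero once the neuron saturates, and every output is positive, so the outputs cannot be zero-centered.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: squashes inputs to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(x)
print(s)              # all outputs in (0, 1) and all positive: not zero-centered
print(s * (1.0 - s))  # local gradient; ~0 at x = -10 and x = 10: saturation kills it
```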

tanh(x)

  • Squashes numbers to range [-1, 1]
  • zero-centered 😃
  • still kills gradients when saturated 😦 (a short check follows this list)
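
A short follow-up check (my own sketch, reusing x and numpy from the sigmoid example above): tanh outputs take both signs, but the local gradient 1 - tanh(x)^2 still vanishes once the unit saturates.

```python
t = np.tanh(x)
print(t)           # outputs in [-1, 1], both signs present: zero-centered
print(1.0 - t**2)  # local gradient; ~0 at |x| = 10: still saturates
```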

ReLU

$f(x) = \max(0, x)$

  • Does not saturate 😃
  • very computationally efficient 😃
  • Converges much faster than sigmoid/tanh in practice 😃
  • Actually more biologically plausible than sigmoid 😃

problems:

  • Not zero-centered output (see the sketch below)
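
A minimal numpy sketch (my own) of ReLU, reusing x from above: the elementwise max is cheap and does not saturate for x > 0, but every output is non-negative, so the outputs are not zero-centered.

```python
def relu(x):
    """ReLU: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

print(relu(x))                # [0, 0, 0, 1, 10]: never negative, not zero-centered
print((x > 0).astype(float))  # local gradient: 1 where x > 0, 0 elsewhere
```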

Leaky ReLU

$f(x) = \max(0.01x, x)$

Exponential Linear Units (ELU)

$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(\exp(x) - 1) & \text{if } x \leq 0 \end{cases}$
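
A small numpy sketch (my own) of both Leaky ReLU and ELU as defined above, again reusing x; the slope 0.01 comes from the formula, while alpha = 1.0 is just an assumed default:

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: f(x) = max(0.01x, x); negative inputs keep a small slope."""
    return np.maximum(slope * x, x)

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(leaky_relu(x))  # negative inputs scaled by 0.01 rather than clamped to 0
print(elu(x))         # negative inputs curve smoothly towards -alpha
```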

Maxout Neuron

$\max(w_1^T x + b_1, w_2^T x + b_2)$

  • doubles the number of parameters 😦 (see the sketch below)
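
A minimal numpy sketch (my own, with made-up shapes D and H) of a maxout unit: the elementwise max of two affine maps, which is why the parameter count doubles:

```python
import numpy as np

D, H = 4, 3                                  # hypothetical input / output sizes
x_in = np.random.randn(D)
W1, b1 = np.random.randn(D, H), np.zeros(H)
W2, b2 = np.random.randn(D, H), np.zeros(H)  # second parameter set: count doubles

out = np.maximum(x_in @ W1 + b1, x_in @ W2 + b2)
print(out)   # one activation per hidden unit, no fixed nonlinearity applied
```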

Data Preprocessing

Preprocess the data (a minimal sketch follows this list):

  • zero-centered data
  • normalized data
  • PCA
  • Whitening
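
A minimal numpy sketch (my own, assuming a data matrix X of shape [N, D]) of these four steps; the PCA/whitening part follows the usual SVD-of-the-covariance recipe and is only illustrative:

```python
import numpy as np

N, D = 100, 5
X = np.random.randn(N, D) * 3.0 + 2.0   # hypothetical raw data

X = X - np.mean(X, axis=0)              # zero-centered data
X = X / np.std(X, axis=0)               # normalized data (unit std per dimension)

cov = X.T @ X / N                       # covariance of the centered data
U, S, _ = np.linalg.svd(cov)
X_pca = X @ U                           # PCA: rotate into the decorrelated basis
X_white = X_pca / np.sqrt(S + 1e-5)     # whitening: equalize scale in every direction
```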

Weight Initialization

  • First idea: Small random numbers
    (Gaussian with zero mean and 1e-2 standard deviation)
    W = 0.01 * np.random.randn(D, H)
    Works okay for small networks, but causes problems with deeper networks
  • Xavier initialization
    W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
    (a comparison sketch of the two follows this list)
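
A minimal numpy sketch (my own, loosely in the spirit of the standard layer-statistics experiment; the layer sizes and depth are made up) comparing the two initializations by tracking the spread of tanh activations across layers:

```python
import numpy as np

def activation_stds(init, num_layers=5, size=500):
    """Forward a random batch through tanh layers and record activation stds."""
    h = np.random.randn(1000, size)
    stds = []
    for _ in range(num_layers):
        W = init(size, size)
        h = np.tanh(h @ W)
        stds.append(float(h.std()))
    return stds

small  = lambda fan_in, fan_out: 0.01 * np.random.randn(fan_in, fan_out)
xavier = lambda fan_in, fan_out: np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

print(activation_stds(small))   # stds collapse towards 0: the signal vanishes with depth
print(activation_stds(xavier))  # stds stay roughly constant from layer to layer
```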