Deep Learning Notes 1

Logistic Classification

  • Softmax function:
    We start with a linear function:
    $y = WX + b$
    where $X$ is the input data and $y$ is the vector of scores (logits). Our goal is to train $W$ and $b$ so that the score for the correct label is high. The softmax function takes the scores and turns them into proper probabilities: higher scores map to higher probabilities, with the highest probability ideally on the correct label.
    $S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$
    If you multiply the scores by 10, the probabilities get pushed very close to either 1 or 0. Conversely, if you divide the scores by 10, the probabilities end up very close to each other, almost as if drawn from a uniform distribution (see the short check after the code below).
    In other words, if you scale the scores up, the classifier becomes very confident about its predictions, while if you scale them down, it becomes very unsure. We want our classifier to be unsure at the beginning; over time, it will gain confidence as it learns.

    • The code below is from Udacity:
"""Softmax."""

scores = [3.0, 1.0, 0.2]

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    prob_sum = np.exp(x).sum(axis=0)
    return np.exp(x)/prob_sum

print(softmax(scores))

# Plot softmax curves
import matplotlib.pyplot as plt
x = np.arange(-2.0, 6.0, 0.1)
scores = np.vstack([x, np.ones_like(x), 0.2 * np.ones_like(x)])
plt.plot(x, softmax(scores).T, linewidth=2)
plt.show()
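
    As a quick check of the scaling behaviour described above (reusing the softmax function just defined; the example scores are my own), multiplying the scores by 10 pushes the probabilities toward one-hot, while dividing by 10 pushes them toward uniform:

import numpy as np

scores = np.array([3.0, 1.0, 0.2])
print(softmax(scores))         # moderately peaked: roughly [0.84, 0.11, 0.05]
print(softmax(scores * 10.0))  # very confident: close to [1, 0, 0]
print(softmax(scores / 10.0))  # very unsure: close to uniform, roughly [0.39, 0.32, 0.29]
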
  • Ways to label:

    • One-hot encoding:
      e.g. [a, b, c] -> [1, 0, 0]

      Disadvantages: with a very large label set, say 10,000 labels, every label becomes a 10,000-dimensional vector that is almost entirely zeros, which is very inefficient.

      Advantages: the label has the same shape as the softmax output, so comparing a prediction to a label is just comparing two vectors.

      • Ways to compare two vectors:
        Cross-entropy:
        $D(S, L) = -\sum_i L_i \log(S_i)$, where $S$ is the output of the softmax function and $L$ is the one-hot label.
        Note that cross-entropy is not symmetric: $D(S, L) \neq D(L, S)$, because the log is applied only to the softmax probabilities (a small numeric check follows below).
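
        As a minimal sketch (the helper name cross_entropy and the example numbers are my own, not from the course), comparing a softmax output against a one-hot label:

import numpy as np

def cross_entropy(S, L):
    """D(S, L) = -sum_i L_i * log(S_i)."""
    return -np.sum(L * np.log(S))

S = np.array([0.7, 0.2, 0.1])   # softmax probabilities
L = np.array([1.0, 0.0, 0.0])   # one-hot label for the first class

print(cross_entropy(S, L))      # -log(0.7), roughly 0.357
# Swapping the arguments would take log(0) on the zero entries of L,
# which is one way to see that D(S, L) != D(L, S).
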
      • The entire pipeline: Multinomial Logistic Classification
        $D(S(WX + b), L)$
        Input data $X$ —> logits $y = WX + b$ —> softmax probabilities $S(y)$ —> cross-entropy $D(S, L)$ against the one-hot labels $L$ (a sketch of this forward pass follows below).
        (Pipeline diagram from Udacity.)
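
        A minimal sketch of this forward pass for a single example; the dimensions, weight initialization, and values below are made up for illustration:

import numpy as np

def softmax(y):
    exp_y = np.exp(y)
    return exp_y / exp_y.sum()

def cross_entropy(S, L):
    return -np.sum(L * np.log(S))

# Made-up example: 4 input features, 3 classes.
X = np.array([0.5, -1.2, 3.0, 0.8])   # input data
W = 0.01 * np.random.randn(3, 4)      # weights
b = np.zeros(3)                       # biases
L = np.array([0.0, 1.0, 0.0])         # one-hot label

y = W @ X + b                 # logits
S = softmax(y)                # softmax probabilities
loss = cross_entropy(S, L)    # D(S(WX + b), L)
print(S, loss)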

      • Minimize cross-entropy:
        The training loss is the average cross-entropy over the whole training set:
        $\mathcal{L}(W, b) = \frac{1}{N} \sum_i D(S(WX_i + b), L_i)$

        1. Gradient Descent:
          Step choice:
          $\text{step} = -\alpha \, \nabla \mathcal{L}(w_1, w_2)$
          The step above is written for only two weights, but a typical loss function has thousands (or even millions) of weights.
          Disadvantages
          Numerical precision is one: adding very small values to very large numbers loses accuracy in floating point, e.g.:

          a = 1000000000
          b = 0.000001
          for i in range(1000000):
              a += b
          # Mathematically this should print 1.0, but floating-point rounding
          # makes it print something like 0.95367431640625 instead.
          print(a - 1000000000)


          The printed value is not 1.0. A good way to avoid such issues is to normalize the training data so that it has zero mean and equal variance.
          Weight Initialization
          Initialize the weights from a Gaussian with a small standard deviation σ, so that the classifier starts out uncertain; as training goes on, it gains confidence.
          Then repeat the updates:
          $W \leftarrow W - \alpha \nabla_W \mathcal{L}$
          $b \leftarrow b - \alpha \nabla_b \mathcal{L}$
          looping until we reach the minimum of the loss function (a sketch of the whole loop follows below).
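
          A minimal sketch of this training loop, tying together the normalization, the small-σ initialization, and the gradient updates. The gradient of the softmax cross-entropy with respect to the logits is S - L; the toy data, learning rate, and step count below are made up for illustration:

import numpy as np

def softmax(y):
    exp_y = np.exp(y - y.max(axis=0))   # shift logits for numerical stability
    return exp_y / exp_y.sum(axis=0)

# Made-up toy data: N examples with 4 features each, 3 classes, one-hot labels.
rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=(4, N))
labels = rng.integers(0, 3, size=N)
L = np.eye(3)[labels].T                          # shape (3, N), one column per example

# Normalize inputs: zero mean and equal variance per feature.
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Initialize with a small sigma so the classifier starts out uncertain.
sigma, alpha = 0.01, 0.5
W = sigma * rng.normal(size=(3, 4))
b = np.zeros((3, 1))

for step in range(100):
    S = softmax(W @ X + b)                          # forward pass
    loss = -np.mean(np.sum(L * np.log(S), axis=0))  # average cross-entropy
    dY = (S - L) / N                                # gradient w.r.t. the logits
    W -= alpha * (dY @ X.T)                         # W <- W - alpha * grad_W(L)
    b -= alpha * dY.sum(axis=1, keepdims=True)      # b <- b - alpha * grad_b(L)

print(loss)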

TO BE CONTINUED. Thank You.
