Revision
KNN
Deficiencies:
- Train $O(1)$, Predict $O(n)$
- Ignores position information, as well as the connections between pixels
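The two costs above can be seen in a minimal nearest-neighbor sketch (the `NearestNeighbor` class here is illustrative, not the assignment's code):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # O(1): "training" just memorizes the data.
        self.X_train = X
        self.y_train = y

    def predict(self, x):
        # O(n): compare the query against every stored example (L2 distance).
        dists = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
        return self.y_train[np.argmin(dists)]

knn = NearestNeighbor()
knn.train(np.array([[0.0, 0.0], [10.0, 10.0]]), np.array([0, 1]))
print(knn.predict(np.array([1.0, 1.0])))  # closest to [0, 0] -> label 0
```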
K-fold validation
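A minimal sketch of how the k-fold split can be set up (the helper `k_fold_indices` is assumed for illustration, not the assignment's code): each fold serves once as the validation set while the remaining folds form the training set.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    # split indices into k roughly equal folds
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(10, 5):
    print(len(train_idx), len(val_idx))  # 8 2 in each round
```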
Linear Classifier
Stacking linear layers alone is still linear, so we need activation functions between them; the most common one is ReLU.
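ReLU simply zeroes out negative activations, $f(x) = \max(0, x)$:

```python
import numpy as np

def relu(x):
    # elementwise max(0, x)
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```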
Loss functions
- SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
Intuition: if the score of the correct class is higher than every other score by at least the margin (1), no loss is added.
Otherwise, the shortfall is added to the loss.
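A sketch of this hinge loss for a single example, with score vector `s` and correct label `y_i` (names chosen for illustration):

```python
import numpy as np

def svm_loss(s, y_i, margin=1.0):
    # margins for every class relative to the correct class score
    margins = np.maximum(0, s - s[y_i] + margin)
    margins[y_i] = 0  # the j != y_i condition: skip the correct class
    return margins.sum()

s = np.array([3.2, 5.1, -1.7])  # class 1 outscores the correct class 0
print(svm_loss(s, 0))  # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) ~= 2.9
```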
- softmax loss: $L_i = -\log\left(\frac{\exp(s_{y_i})}{\sum_j \exp(s_j)}\right)$
Intuition: it follows from maximum likelihood estimation (minimizing the negative log-probability of the correct class).
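A sketch of this loss for one example (names chosen for illustration):

```python
import numpy as np

def softmax_loss(s, y_i):
    s = s - np.max(s)  # shift for numerical stability (see the nan discussion below)
    probs = np.exp(s) / np.sum(np.exp(s))
    return -np.log(probs[y_i])

s = np.array([1.0, 1.0, 1.0])
print(softmax_loss(s, 0))  # uniform scores -> -log(1/3)
```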
Use backpropagation to simplify the gradient computation.
(Not in Assignment 1.)
Attention: we must multiply by the Jacobian implicitly (vector-Jacobian products) rather than explicitly forming it, which would consume too much memory.
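A small illustration of the idea, using ReLU (names assumed). Its Jacobian is a diagonal $n \times n$ matrix, but backprop never builds it; the vector-Jacobian product reduces to an elementwise mask:

```python
import numpy as np

def relu_backward(dL_dy, x):
    # equivalent to dL_dy @ J with J = diag(x > 0), without forming J
    return dL_dy * (x > 0)

x = np.array([-1.0, 2.0, 3.0])
dL_dy = np.array([0.5, 0.5, 0.5])
print(relu_backward(dL_dy, x))  # [0.  0.5 0.5]
```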
Q&A for Assignment 1
Q1: What if we find a W s.t. SVM loss = 0?
Then any scaled matrix $kW$ with $k > 1$ also drives the loss to 0, which highlights the importance of regularization.
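A tiny check with made-up numbers: if $W$ gives zero hinge loss on an example, any multiple $kW$ with $k \ge 1$ does too, since scaling only widens the score margins.

```python
import numpy as np

def svm_loss(s, y_i, margin=1.0):
    m = np.maximum(0, s - s[y_i] + margin)
    m[y_i] = 0
    return m.sum()

x = np.array([1.0, 2.0])
W = np.array([[3.0, 0.0], [0.0, 0.0]])  # scores W @ x = [3, 0], correct class 0
print(svm_loss(W @ x, 0))        # 0.0
print(svm_loss((2 * W) @ x, 0))  # still 0.0
```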
Q2: Why do nan values appear?
Numerical Instability
Representing real numbers on a computer almost always introduces some approximation error; in many cases this is simply rounding error. Rounding error causes problems, especially when many operations compound: even an algorithm that works in theory can fail in practice if it was not designed to minimize the accumulation of rounding error.
A2: implementing softmax runs into exactly this problem:
$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$
If every $x_i$ equals the same constant $c$, the output should be $\frac{1}{n}$ for each class. But when $c \rightarrow -\infty$, underflow occurs: each exponential becomes a tiny positive number near 0, and dividing by such a denominator can produce nan. Likewise, when $c \rightarrow \infty$, overflow occurs and nan appears.
Solution
Subtract the maximum first: $x_i \mathrel{-}= \max_j\{x_j\}$. The largest entry becomes 0, so every $e^{x_i} \le 1$ and there is no overflow; the denominator contains the term $e^0 = 1$, so there is no underflow.
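A sketch of the stable version next to the naive one:

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)   # largest exponent becomes exp(0) = 1: no overflow
    e = np.exp(z)
    return e / e.sum()  # denominator contains the term 1: no underflow

x = np.array([1000.0, 1000.0, 1000.0])
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(x) / np.exp(x).sum()  # exp(1000) = inf, and inf/inf = nan
print(naive)       # [nan nan nan]
print(softmax(x))  # [0.33333333 0.33333333 0.33333333]
```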
Q3: Feature methods?
Roughly speaking, HOG should capture the texture of the image while ignoring color information, and the color histogram represents the color of the input image while ignoring texture. As a result, we expect that using both together ought to work better than using either alone.
Using the extracted features, we can greatly improve the accuracy.
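A simplified sketch of the color-histogram half (per-channel histograms on an assumed 0-255 RGB array; the assignment's version histograms the HSV hue channel instead, and HOG features would be concatenated alongside):

```python
import numpy as np

def color_histogram(img, nbins=10):
    # one nbins-bin histogram per channel, concatenated into a feature vector
    feats = [np.histogram(img[..., c], bins=nbins, range=(0, 255))[0]
             for c in range(img.shape[-1])]
    return np.concatenate(feats).astype(np.float64)

img = np.random.randint(0, 256, size=(32, 32, 3))
print(color_histogram(img).shape)  # (30,)
```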