This is a programming exercise from UFLDL (Softmax Regression).
The cost function and gradient after adding weight decay (Softmax regression has an unusual property: it has a "redundant" set of parameters):
- cost function:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\left\{y^{(i)}=j\right\}\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^2$$
- gradient function:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\left\{y^{(i)}=j\right\} - p\left(y^{(i)}=j \mid x^{(i)}; \theta\right)\right)\right] + \lambda\theta_j$$
$p(y^{(i)}=j \mid x^{(i)}; \theta)$ is exactly the hypothesis $h$ computed in Step 2 of the UFLDL exercise.
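For reference, the per-class formulas above can be collected into matrix form (this restatement is mine, but it matches the vectorized solution code below; $G_{ji} = 1\{y^{(i)}=j\}$ is the ground-truth indicator matrix and $X = [x^{(1)}, \ldots, x^{(m)}]$):

$$h_{ji} = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}, \qquad \nabla_{\theta} J(\theta) = -\frac{1}{m}\,(G - h)\,X^T + \lambda\theta$$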
Using bsxfun:
- To prevent overflow, simply subtract some large constant value from each of the $\theta_j^T x^{(i)}$ terms before computing the exponential:

```matlab
% M is the matrix as described in the text
M = bsxfun(@minus, M, max(M, [], 1));
```

- Use the following code to compute the hypothesis:

```matlab
% M is the matrix as described in the text
M = bsxfun(@rdivide, M, sum(M));
```
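Putting both steps together, here is a minimal self-contained sketch (the sizes of theta and data are made up for illustration):

```matlab
% Numerically stable softmax over the columns of M = theta * data.
theta = randn(4, 6);                     % 4 classes, 6 features (arbitrary sizes)
data  = randn(6, 10);                    % 10 examples, one per column
M = theta * data;                        % M(j, i) = theta_j' * x^(i)
M = bsxfun(@minus, M, max(M, [], 1));    % subtract each column's max: no overflow
h = exp(M);
h = bsxfun(@rdivide, h, sum(h));         % normalize each column to probabilities
disp(sum(h, 1));                         % every entry should print as 1
```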
Exercise solutions (try to complete them yourself first, then compare):
- softmaxCost.m:

```matlab
% groundTruth (k x m indicator matrix) and numCases are set up by the
% starter code earlier in the file.
M = theta * data;                        % M(l, i) = theta_l' * x^(i)
M = bsxfun(@minus, M, max(M, [], 1));    % subtract column max to prevent overflow
h = exp(M);
h = bsxfun(@rdivide, h, sum(h));         % h(j, i) = p(y^(i) = j | x^(i); theta)
cost = -1/numCases * sum(sum(groundTruth .* log(h))) + lambda/2 * sum(sum(theta.^2));
thetagrad = -1/numCases * ((groundTruth - h) * data') + lambda * theta;
```
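It is worth validating thetagrad with a finite-difference check. A minimal sketch, assuming the UFLDL starter signature [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels), with theta passed and grad returned in unrolled (vector) form:

```matlab
% Finite-difference gradient check on small random data (sizes are arbitrary).
numClasses = 3; inputSize = 5; lambda = 1e-4;
data   = randn(inputSize, 20);
labels = randi(numClasses, 20, 1);
theta  = 0.005 * randn(numClasses * inputSize, 1);

[~, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels);

epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    e = zeros(size(theta)); e(i) = epsilon;   % perturb one component at a time
    numgrad(i) = (softmaxCost(theta + e, numClasses, inputSize, lambda, data, labels) ...
                - softmaxCost(theta - e, numClasses, inputSize, lambda, data, labels)) / (2 * epsilon);
end
% Should be on the order of 1e-9 if the analytic gradient is correct.
fprintf('relative difference: %g\n', norm(numgrad - grad) / norm(numgrad + grad));
```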
- softmaxPredict.m:

```matlab
% max along dimension 1 returns [maxValues, rowIndices];
% the row index of the largest score is the predicted class.
[~, pred] = max(theta * data, [], 1);
```
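A typical usage check (mirroring the UFLDL exercise script, assuming labels holds the true classes):

```matlab
acc = mean(labels(:) == pred(:));        % fraction of correct predictions
fprintf('Accuracy: %0.3f%%\n', acc * 100);
```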