Logistic Regression

- Gradient descent derivation for logistic regression
- The objective function of logistic regression is convex
Training data: $D = \{ (\mathbf{x}_{1}, y_{1}), \cdots, (\mathbf{x}_{n}, y_{n}) \}$, where $(\mathbf{x}_{i}, y_{i})$ is a single sample, $\mathbf{x}_{i} \in \mathbb{R}^{D}$ is the $D$-dimensional feature vector, and $y_{i} \in \{ 0, 1 \}$ is the label.
The parameters of the logistic regression model are $(\mathbf{w}, b)$. To simplify the derivation, $b$ is usually absorbed into $\mathbf{w}$ by prepending a constant feature of $1$ to each sample; $\mathbf{w}$ and $\mathbf{x}_{i}$ are then rewritten as

$$\mathbf{w} = [w_{0}, w_{1}, \cdots, w_{D}], \quad \mathbf{x}_{i} = [1, x_{i1}, \cdots, x_{iD}]$$
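As a minimal sketch of this bias-absorption trick, the snippet below prepends a constant-$1$ column to a small, made-up feature matrix (the data values are hypothetical, used only for illustration):

```python
import numpy as np

# Hypothetical dataset: n = 3 samples, D = 2 features each.
X = np.array([[0.5,  1.2],
              [2.0, -0.3],
              [1.1,  0.7]])

# Absorb the bias b into w by prepending a constant-1 column to each x_i,
# so that w^T x_i + b becomes simply w^T x_i with w = [w_0, w_1, ..., w_D].
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

print(X_aug.shape)  # each row is now [1, x_i1, x_i2]
```

After this augmentation, $w_{0}$ plays the role of the original bias $b$.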
1 The Objective Function of Logistic Regression
The objective function, also referred to here as the loss function, is denoted $\mathcal{L}(\mathbf{w})$.
The model for the binary classification problem is

$$p(y \mid \mathbf{x}; \mathbf{w}) = p(y = 1 \mid \mathbf{x}; \mathbf{w})^{y} \, [1 - p(y = 1 \mid \mathbf{x}; \mathbf{w})]^{1 - y} \tag{1}$$
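Equation (1) is just a compact way of writing the two-case Bernoulli probability: it reduces to $p(y = 1 \mid \mathbf{x}; \mathbf{w})$ when $y = 1$ and to $1 - p(y = 1 \mid \mathbf{x}; \mathbf{w})$ when $y = 0$. A quick numerical check (the value $p = 0.8$ is arbitrary):

```python
def bernoulli_pmf(y, p):
    """Equation (1): p(y | x; w) = p^y * (1 - p)^(1 - y), where p = p(y=1 | x; w)."""
    return p ** y * (1.0 - p) ** (1 - y)

p = 0.8  # hypothetical value of p(y = 1 | x; w)
print(bernoulli_pmf(1, p))  # reduces to p when y = 1
print(bernoulli_pmf(0, p))  # reduces to 1 - p when y = 0
```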
Maximum likelihood estimation (MLE):

$$
\begin{aligned}
\mathbf{w}^{\ast}
& = \arg \max_{\mathbf{w}} p(\mathbf{y} \mid \mathbf{x}; \mathbf{w}) \\
& = \arg \max_{\mathbf{w}} \prod_{i = 1}^{n} p(y_{i} \mid \mathbf{x}_{i}; \mathbf{w}) \\
& = \arg \max_{\mathbf{w}} \log \left[ \prod_{i = 1}^{n} p(y_{i} \mid \mathbf{x}_{i}; \mathbf{w}) \right] \\
& = \arg \max_{\mathbf{w}} \sum_{i = 1}^{n} \log \left[ p(y_{i} \mid \mathbf{x}_{i}; \mathbf{w}) \right] \\
& = \arg \max_{\mathbf{w}} \sum_{i = 1}^{n} \log \left[ p(y_{i} = 1 \mid \mathbf{x}_{i}; \mathbf{w})^{y_{i}} \, [1 - p(y_{i} = 1 \mid \mathbf{x}_{i}; \mathbf{w})]^{1 - y_{i}} \right] \\
& = \arg \max_{\mathbf{w}} \sum_{i = 1}^{n} \left[ y_{i} \log p(y_{i} = 1 \mid \mathbf{x}_{i}; \mathbf{w}) + (1 - y_{i}) \log [1 - p(y_{i} = 1 \mid \mathbf{x}_{i}; \mathbf{w})] \right]
\end{aligned}
\tag{2}
$$
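The final line of equation (2) is the (log-)likelihood that gradient descent will maximize, i.e. the negative of the cross-entropy loss. A minimal sketch of evaluating it, assuming $p(y = 1 \mid \mathbf{x}; \mathbf{w}) = \sigma(\mathbf{w}^{\top} \mathbf{x})$ with the sigmoid $\sigma$ (the toy data and weights are hypothetical):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """Last line of equation (2):
    sum_i [ y_i * log p_i + (1 - y_i) * log(1 - p_i) ],
    with p_i = sigma(w^T x_i). Assumes X already carries the leading
    1-column that absorbs the bias term."""
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data: n = 3 samples, D = 1 feature plus the bias column.
X = np.array([[1.0,  0.5],
              [1.0, -1.0],
              [1.0,  2.0]])
y = np.array([1, 0, 1])
w = np.array([0.1, 0.8])

print(log_likelihood(w, X, y))
```

A useful sanity check: with $\mathbf{w} = \mathbf{0}$, every $p_{i} = 0.5$, so the log-likelihood is $n \log 0.5$.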