# Preventing overfitting with regularization

L1 regularization:

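$C = C_0 + \frac{\lambda}{n}\sum_w |w|$

where $C_0$ is the original, unregularized cost, $n$ is the number of training examples, and the sum runs over all weights $w$ in the network (the biases are not regularized).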

L2 regularization:

$C = C_0 + \frac{\lambda}{2n}\sum_w w^2$
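The sketch below makes the two penalty terms concrete. It is plain Python and evaluates both terms for a flat list of weights. The function names `l1_penalty` and `l2_penalty` are hypothetical, and the list is assumed to hold only weights, not biases.

```python
def l1_penalty(weights, lam, n):
    """L1 term: (lam / n) * sum(|w|)."""
    return (lam / n) * sum(abs(w) for w in weights)


def l2_penalty(weights, lam, n):
    """L2 term: (lam / (2n)) * sum(w^2)."""
    return (lam / (2 * n)) * sum(w * w for w in weights)


# Compare the two penalties for one small and one large weight (lam = n = 1).
for w in (0.5, 4.0):
    print(w, l1_penalty([w], 1.0, 1), l2_penalty([w], 1.0, 1))
# prints "0.5 0.5 0.125" and then "4.0 4.0 8.0"
```

With these scalings, the L2 term is smaller than the L1 term for $|w| < 2$ and larger beyond that. L2 therefore punishes large weights especially hard.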

Applied to the quadratic cost:

$C = \frac{1}{2n}\sum_x \left\| y - a^L \right\|^2 + \frac{\lambda}{2n}\sum_w w^2$
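Here is a minimal sketch of this regularized quadratic cost. It assumes the output activations $a^L$ for all $n$ training examples have already been computed and are passed in as lists of vectors. The name `quadratic_cost_l2` and its argument layout are hypothetical.

```python
def quadratic_cost_l2(outputs, targets, weights, lam):
    """C = 1/(2n) * sum_x ||y(x) - a^L(x)||^2 + lam/(2n) * sum_w w^2.

    outputs -- list of output-activation vectors a^L, one per training example
    targets -- list of desired-output vectors y, in the same order
    weights -- flat list of all weights (biases excluded)
    """
    n = len(outputs)  # number of training examples
    data_term = sum(
        sum((y - a) ** 2 for y, a in zip(y_vec, a_vec))
        for y_vec, a_vec in zip(targets, outputs)
    ) / (2 * n)
    reg_term = lam / (2 * n) * sum(w * w for w in weights)
    return data_term + reg_term


print(quadratic_cost_l2([[0.5, 0.5], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                        [1.0, 2.0], lam=2.0))  # 2.625
```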

Applied to the cross-entropy cost:

$C = -\frac{1}{n}\sum_x\sum_j \left[ y_j \ln a_j^L + (1 - y_j)\ln(1 - a_j^L) \right] + \frac{\lambda}{2n}\sum_w w^2$
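The regularized cross-entropy cost can be sketched the same way. The function `cross_entropy_cost_l2` is hypothetical. Clipping activations with a small `eps` is a numerical-safety assumption added here to avoid `log(0)`; it is not part of the formula.

```python
import math


def cross_entropy_cost_l2(outputs, targets, weights, lam, eps=1e-12):
    """C = -1/n * sum_x sum_j [y ln a + (1-y) ln(1-a)] + lam/(2n) * sum_w w^2."""
    n = len(outputs)  # number of training examples
    total = 0.0
    for a_vec, y_vec in zip(outputs, targets):
        for a, y in zip(a_vec, y_vec):
            a = min(max(a, eps), 1 - eps)  # assumption: clip so log() stays finite
            total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / n + lam / (2 * n) * sum(w * w for w in weights)


print(cross_entropy_cost_l2([[0.9, 0.1]], [[1.0, 0.0]], [0.5, -0.5], lam=0.1))
```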

More generally, for any unregularized cost $C_0$:

$C = C_0 + \frac{\lambda}{2n}\sum_w w^2$

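$\lambda$ (the regularization parameter) sets the relative importance of the two terms. A small $\lambda$ favors minimizing the original cost $C_0$. A large $\lambda$ favors small weights, that is, minimizing the sum of the squared weights.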
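A one-dimensional toy problem shows this trade-off. The sketch below assumes a made-up unregularized cost $C_0(w) = (w-3)^2/2$, minimized at $w = 3$, with $n = 1$. It runs plain gradient descent on $C = C_0 + \frac{\lambda}{2n}w^2$. The function `minimize_toy_cost` and all numbers are illustrative and do not come from the text. Setting $\frac{dC}{dw} = (w - 3) + \frac{\lambda}{n}w = 0$ gives the exact minimizer $w^* = 3/(1 + \lambda/n)$, so the weight is pulled toward 0 as $\lambda$ grows.

```python
def minimize_toy_cost(lam, n=1, eta=0.1, steps=2000):
    """Gradient descent on the hypothetical cost C(w) = (w - 3)^2 / 2 + lam/(2n) * w^2.

    C_0(w) = (w - 3)^2 / 2 is a toy stand-in whose unregularized minimum is w = 3.
    The iteration converges only if eta * (1 + lam / n) < 2.
    """
    w = 0.0
    for _ in range(steps):
        grad = (w - 3.0) + (lam / n) * w  # dC0/dw + (lam/n) * w
        w -= eta * grad
    return w


for lam in (0.0, 1.0, 10.0):
    # Compare the gradient-descent result with the exact minimizer 3 / (1 + lam/n).
    print(lam, round(minimize_toy_cost(lam), 4), round(3.0 / (1.0 + lam), 4))
```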

Taking partial derivatives (the regularization term does not depend on the biases):

$\frac{\partial C}{\partial w} = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n}w$

$\frac{\partial C}{\partial b} = \frac{\partial C_0}{\partial b}$
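A quick numerical check confirms that the derivative of $\frac{\lambda}{2n}\sum_w w^2$ with respect to each weight is $\frac{\lambda}{n}w$: the factor 2 from $w^2$ cancels the 2 in the denominator. This sketch uses a central finite difference. The helper names `l2_term` and `central_difference` and the sample values are hypothetical.

```python
def l2_term(weights, lam, n):
    """Regularization term lam/(2n) * sum(w^2)."""
    return lam / (2 * n) * sum(w * w for w in weights)


def central_difference(f, weights, i, h=1e-6):
    """Numerical estimate of df/dw_i using a central difference."""
    plus = list(weights)
    minus = list(weights)
    plus[i] += h
    minus[i] -= h
    return (f(plus) - f(minus)) / (2 * h)


weights, lam, n = [0.3, -1.2, 2.0], 0.5, 10
for i, w in enumerate(weights):
    numeric = central_difference(lambda ws: l2_term(ws, lam, n), weights, i)
    analytic = lam / n * w  # the (lambda / n) * w term of dC/dw
    print(f"w={w:+.1f}  numeric={numeric:.6f}  analytic={analytic:.6f}")
```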

$w' = w - \eta\left(\frac{\partial C_0}{\partial w} + \frac{\lambda}{n}w\right) = w - \eta\frac{\partial C_0}{\partial w} - \frac{\eta\lambda}{n}w$

$b' = b - \eta\frac{\partial C_0}{\partial b}$

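$w' = \left(1 - \frac{\eta\lambda}{n}\right)w - \eta\frac{\partial C_0}{\partial w}$

Regularization therefore rescales each weight by the factor $1 - \frac{\eta\lambda}{n}$ before the usual gradient step. This is why L2 regularization is also called weight decay.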
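The regrouped update can be checked against the original form, and the decay effect seen directly. This is a minimal single-weight sketch. The function names `weight_decay_step` and `plain_step` and the parameter values are hypothetical.

```python
def weight_decay_step(w, grad_c0, eta, lam, n):
    """One gradient-descent step in weight-decay form:
    w' = (1 - eta*lam/n) * w - eta * dC0/dw
    """
    return (1 - eta * lam / n) * w - eta * grad_c0


def plain_step(w, grad_c0, eta, lam, n):
    """The same step before regrouping: w' = w - eta * (dC0/dw + (lam/n) * w)."""
    return w - eta * (grad_c0 + (lam / n) * w)


# With no gradient from C0, the weight just shrinks by (1 - eta*lam/n) = 0.9 each step.
w = 1.0
for _ in range(3):
    w = weight_decay_step(w, grad_c0=0.0, eta=0.5, lam=1.0, n=5)
print(w)  # approximately 0.729
```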


$w' = \left(1 - \frac{\eta\lambda}{n}\right)w - \frac{\eta}{m}\sum_x \frac{\partial C_x}{\partial w}$

$b' = b - \frac{\eta}{m}\sum_x \frac{\partial C_x}{\partial b}$

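Here each sum $\sum_x$ runs over the $m$ training examples $x$ in the current mini-batch, as in SGD. $C_x$ is the cost for example $x$, and $n$ in the decay factor is still the size of the full training set.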
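The mini-batch rules can be sketched as a single update function. This is a minimal illustration. It assumes parameters are stored as flat Python lists and that the per-example gradients $\partial C_x/\partial w$ and $\partial C_x/\partial b$ (for instance from backpropagation) are already available. `minibatch_update` is a hypothetical name.

```python
def minibatch_update(weights, biases, grads_w, grads_b, eta, lam, n):
    """One regularized mini-batch SGD step.

    weights, biases -- flat lists of parameters
    grads_w         -- list of m per-example gradient lists dC_x/dw
    grads_b         -- list of m per-example gradient lists dC_x/db
    n               -- size of the full training set (used in the decay factor)
    """
    m = len(grads_w)              # mini-batch size
    decay = 1 - eta * lam / n     # weight-decay factor; biases are not decayed
    new_w = [decay * w - (eta / m) * sum(g[i] for g in grads_w)
             for i, w in enumerate(weights)]
    new_b = [b - (eta / m) * sum(g[i] for g in grads_b)
             for i, b in enumerate(biases)]
    return new_w, new_b


new_w, new_b = minibatch_update([1.0, -2.0], [0.5],
                                [[0.2, 0.0], [0.4, 0.2]], [[0.1], [0.3]],
                                eta=0.5, lam=1.0, n=5)
print(new_w, new_b)
```

Only the weights are multiplied by the decay factor. This matches $\frac{\partial C}{\partial b} = \frac{\partial C_0}{\partial b}$.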