# Machine Learning Case Study Series: A Summary of Loss Functions

## Note the distinction between per-sample loss and dataset loss

The loss of a single sample:

$loss(y, f(x))$

The loss of a sample set is the average of the per-sample losses:

$\frac{1}{N}\sum_{i=1}^{N} loss(y_i, f(x_i))$

Learning then minimizes the empirical risk:

$\min \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i))$

or, with a regularization term added, the structural risk:

$\min \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)$
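As a minimal sketch of the regularized objective above (the helper name `empirical_risk` and the choice of an L2 penalty for $J(f)$ are illustrative assumptions, not from the text):

```python
import numpy as np

def empirical_risk(loss, y, y_hat, w=None, lam=0.0):
    """Mean per-sample loss, plus an optional L2 penalty lam * sum(w_i^2)."""
    risk = np.mean([loss(yi, fi) for yi, fi in zip(y, y_hat)])
    if w is not None:
        risk += lam * np.sum(np.square(w))  # assumed L2 form of J(f)
    return risk

# Squared loss on a toy sample: mean([0.25, 0.0]) + 0.1 * 1.0
risk = empirical_risk(lambda y, f: (y - f) ** 2,
                      y=[1.0, 2.0], y_hat=[1.5, 2.0],
                      w=[1.0], lam=0.1)
print(risk)
```

Any per-sample loss from the sections below can be plugged in as the `loss` callable.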

## 0-1 loss function

$L(Y, f(X)) = \begin{cases} 1, & Y \ne f(X) \\ 0, & Y = f(X) \end{cases}$
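A vectorized sketch of this definition (the function name `zero_one_loss` is an assumption for illustration):

```python
import numpy as np

def zero_one_loss(y, y_pred):
    """0-1 loss: 1 where the prediction differs from the label, else 0."""
    return (np.asarray(y) != np.asarray(y_pred)).astype(int)

# Per-sample losses, and their mean (the misclassification rate)
losses = zero_one_loss([1, -1, 1], [1, 1, 1])
print(losses)         # [0 1 0]
print(losses.mean())
```

Averaging the 0-1 loss over a sample set gives the error rate, which is why it is the reference loss that the convex surrogates below approximate.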

## Log loss (logarithmic loss function)

The log loss is the negative log-likelihood of the label under the model:

$L(Y, P(Y|X)) = -\log P(Y|X)$

Averaged over $m$ samples, the cost is

$J(w,b) = -\frac{1}{m}\sum_{i=1}^{m} D(y_i, p_i)$

For binary classification (e.g. logistic regression) this is the cross-entropy cost:

$J(w,b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)} + \left(1-y^{(i)}\right)\log\left(1-\hat{y}^{(i)}\right)\right]$

For multiclass classification with a one-hot label vector $Y$ over $n$ classes:

$L(Y, f(x)) = -\sum_{j=1}^{n} Y_j \log p_j$
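A sketch of both cross-entropy forms (the function names and the `eps` clipping, added to avoid `log(0)`, are assumptions):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """J = -(1/m) * sum[y*log(p) + (1-y)*log(1-p)] for labels y in {0, 1}."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # guard log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def categorical_cross_entropy(Y, P, eps=1e-12):
    """L = -sum_j Y_j * log(p_j) per sample (Y one-hot), averaged over rows."""
    P = np.clip(np.asarray(P, dtype=float), eps, 1.0)
    return -np.mean(np.sum(np.asarray(Y, dtype=float) * np.log(P), axis=1))

# A maximally uncertain prediction costs log(2) nats
print(binary_cross_entropy([1], [0.5]))
```

Note the multiclass form reduces to the binary form when $n = 2$ and $p_2 = 1 - p_1$.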

## Squared loss (ordinary least squares)

$L(Y, f(X)) = (Y - f(X))^2$
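A one-line sketch of the per-sample squared loss (the name `squared_loss` is an assumption):

```python
import numpy as np

def squared_loss(y, y_pred):
    """Per-sample squared loss L(Y, f(X)) = (Y - f(X))^2."""
    return (np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2

# Averaging over the sample set gives the mean squared error
print(squared_loss([3.0, 1.0], [2.0, 1.0]).mean())  # 0.5
```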

## Absolute loss function

$L(Y, f(X)) = |Y - f(X)|$
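The corresponding sketch (function name assumed for illustration):

```python
import numpy as np

def absolute_loss(y, y_pred):
    """Per-sample absolute loss L(Y, f(X)) = |Y - f(X)|."""
    return np.abs(np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float))

print(absolute_loss([3.0, 1.0], [5.0, 1.0]))  # [2. 0.]
```

Because it grows linearly rather than quadratically in the residual, the absolute loss is less sensitive to outliers than the squared loss.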

## Exponential loss function (AdaBoost)

$L(y, f(x)) = \exp[-y f(x)]$

$L(y, f(x)) = \frac{1}{n}\sum_{i=1}^{n}\exp[-y_i f(x_i)]$
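A sketch of the averaged form, assuming labels $y_i \in \{-1, +1\}$ and real-valued scores $f(x_i)$ (the name `exponential_loss` is an assumption):

```python
import numpy as np

def exponential_loss(y, scores):
    """(1/n) * sum exp(-y_i * f(x_i)) for labels y in {-1, +1}."""
    y = np.asarray(y, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return np.mean(np.exp(-y * scores))

# Zero scores give exp(0) = 1 for every sample
print(exponential_loss([1, -1], [0.0, 0.0]))  # 1.0
```

The loss decays toward 0 for confidently correct predictions ($y f(x) \gg 0$) and grows exponentially for confidently wrong ones, which is what makes it sensitive to noisy labels.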

## Hinge loss (SVM)

$\min L(w) = \frac{1}{n}\sum_{i=1}^{n} H(y_i f(x_i, w)), \quad \text{where } H(t) = \begin{cases} 1-t, & t < 1 \\ 0, & t \ge 1 \end{cases}$
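Writing $H(t) = \max(0, 1-t)$, the objective can be sketched as (function name assumed):

```python
import numpy as np

def hinge_loss(y, scores):
    """(1/n) * sum max(0, 1 - y_i * f(x_i)) for labels y in {-1, +1}."""
    t = np.asarray(y, dtype=float) * np.asarray(scores, dtype=float)
    return np.mean(np.maximum(0.0, 1.0 - t))

# Margins of 2 incur no loss; a score of 0 sits exactly at H(0) = 1
print(hinge_loss([1, -1], [2.0, -2.0]))  # 0.0
print(hinge_loss([1], [0.0]))            # 1.0
```

The hinge penalizes any sample whose margin $y f(x)$ falls below 1, not just misclassified ones, which is what produces the SVM's margin-maximizing behavior.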

## Perceptron loss (L1 margin cost)

$\min L(w) = \frac{1}{n}\sum_{i=1}^{n} H(y_i f(x_i, w)), \quad \text{where } H(t) = \begin{cases} -t, & t < 0 \\ 0, & t \ge 0 \end{cases}$
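Here $H(t) = \max(0, -t)$, i.e. the hinge shifted so that only misclassified samples are penalized (function name assumed):

```python
import numpy as np

def perceptron_loss(y, scores):
    """(1/n) * sum max(0, -y_i * f(x_i)): only wrong-side samples contribute."""
    t = np.asarray(y, dtype=float) * np.asarray(scores, dtype=float)
    return np.mean(np.maximum(0.0, -t))

# Only the second (misclassified) sample contributes: mean([0, 0.5]) = 0.25
print(perceptron_loss([1, -1], [0.5, 0.5]))  # 0.25
```

Unlike the hinge loss, the perceptron loss demands no margin: any correctly classified sample ($y f(x) \ge 0$) costs zero.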

## Decision tree loss function

$H_t(T) = -\sum_{k=1}^{K}\frac{N_{tk}}{N_t}\log\frac{N_{tk}}{N_t}$

$C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha|T|$
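Here $H_t(T)$ is the entropy of leaf $t$ (with $N_{tk}$ samples of class $k$ among $N_t$), and $C_\alpha(T)$ trades empirical fit against tree size $|T|$. A sketch, representing each leaf by its class counts (the function names are assumptions):

```python
import numpy as np

def node_entropy(counts):
    """H_t(T) = -sum_k (N_tk / N_t) * log(N_tk / N_t); empty classes skipped."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def tree_cost(leaf_counts, alpha):
    """C_alpha(T) = sum_t N_t * H_t(T) + alpha * |T| over the leaves."""
    fit = sum(np.sum(c) * node_entropy(c) for c in leaf_counts)
    return fit + alpha * len(leaf_counts)

# Two pure leaves: zero entropy, so only the alpha * |T| term remains
print(tree_cost([[10, 0], [0, 10]], alpha=0.5))  # 1.0
```

Pruning compares $C_\alpha$ before and after collapsing a subtree: larger $\alpha$ favors smaller trees.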

## L1 and L2 regularization

From a Bayesian viewpoint, L1 regularization assumes a Laplace prior over the model parameters, while L2 regularization assumes a Gaussian prior.
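The two penalty terms can be sketched as follows (the function names and the single-$\lambda$ parameterization are assumptions):

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 (lasso) penalty lam * sum |w_i| -- MAP estimate under a Laplace prior."""
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    """L2 (ridge) penalty lam * sum w_i^2 -- MAP estimate under a Gaussian prior."""
    return lam * np.sum(np.square(w))

w = [1.0, -2.0]
print(l1_penalty(w, 0.5))  # 1.5
print(l2_penalty(w, 0.5))  # 2.5
```

Either penalty is added to the empirical risk as the $\lambda J(f)$ term from the first section; L1's non-smooth corner at zero is what tends to drive individual weights exactly to zero (sparsity), while L2 merely shrinks them.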