I. The concept of maximum log-likelihood estimation
Reference blog post: https://towardsdatascience.com/probability-concepts-explained-maximum-likelihood-estimation-c7b4342fdbb1
Key points:
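1. Why is it called maximum likelihood estimation rather than maximum probability estimation? Answer: look at the equation in the referenced post, which has the form L(parameters; data) = P(data; parameters). "The probability density of the data given the parameters" (right-hand side) is equal in value to "the likelihood of the parameters given the data" (left-hand side), but the left-hand side is read as a function of the parameters with the data held fixed, while the right-hand side is read as a function of the data with the parameters held fixed. Here we are solving for the parameters, so the quantity is called a likelihood.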
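2. Why introduce the log? Answer: taking the log turns products and quotients into sums and differences, which are much easier to differentiate. Because log is monotonically increasing, the parameter values that maximise the log-likelihood also maximise the likelihood itself.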
3. What are parameters? Answer: parameters define a blueprint for the model. It is only when specific values are chosen for the parameters that we get an instantiation for the model that describes a given phenomenon.
4. What is the intuitive explanation of maximum likelihood estimation? Answer: maximum likelihood estimation is a method that determines values for the parameters of a model. The parameter values are found such that they maximise the likelihood that the process described by the model produced the data that were actually observed.
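To make the key points above concrete, here is a minimal sketch that fits a Gaussian by maximum likelihood. The Gaussian model, the synthetic data, and the function names `gaussian_log_likelihood` and `gaussian_mle` are assumptions chosen for illustration and do not come from the referenced post. It shows the log turning the product of densities into a sum, and checks that the closed-form estimates score at least as well as nearby parameter values.

```python
import math
import random


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of (mu, sigma) given the data under a Gaussian model.

    The likelihood is a product of densities; taking the log turns it
    into a sum, which is what makes differentiation convenient.
    """
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (
        -n * math.log(sigma)
        - 0.5 * n * math.log(2 * math.pi)
        - squared_error / (2 * sigma ** 2)
    )


def gaussian_mle(data):
    """Closed-form maximisers, obtained by setting the derivatives of the
    log-likelihood with respect to mu and sigma to zero."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat


# Synthetic data for illustration only (true mu = 5.0, true sigma = 2.0).
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(1000)]

mu_hat, sigma_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_hat, sigma_hat)

print("estimated mu:", mu_hat)
print("estimated sigma:", sigma_hat)
# Perturbing either parameter should not increase the log-likelihood.
print(best >= gaussian_log_likelihood(data, mu_hat + 0.1, sigma_hat))
print(best >= gaussian_log_likelihood(data, mu_hat, sigma_hat * 1.1))
```

For models without a closed-form solution, the same log-likelihood function would typically be handed to a numerical optimiser instead.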
II. The concept of KL divergence
Reference blog post: https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8
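Definition: for two discrete distributions p and q over the same outcomes, D_KL(p || q) = Σ_x p(x) log(p(x) / q(x)).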
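As a rough illustration, the sketch below computes the KL divergence for discrete distributions using the standard formula D_KL(p || q) = Σ_x p(x) log(p(x) / q(x)). The function name `kl_divergence` and the example distributions are assumptions made up for this note. It also shows two properties worth remembering: the divergence of a distribution from itself is zero, and the divergence is not symmetric.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as lists of
    probabilities over the same outcomes (natural log, so units are nats)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # convention: 0 * log(0 / q) contributes 0
        if q_i == 0:
            return math.inf  # p puts mass where q puts none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print("KL(p || p):", kl_divergence(p, p))
print("KL(p || q):", kl_divergence(p, q))
print("KL(q || p):", kl_divergence(q, p))  # differs from KL(p || q)
```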
III. Jensen's inequality
Reference blog post: https://blog.csdn.net/baidu_38172402/article/details/89090383
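Note that the inequality generalizes from two points to any finite number of points, and even to integrals. [Proved by mathematical induction]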
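A small numerical check of the finite-point form may help here. The sketch assumes the usual statement of Jensen's inequality: for a convex function f and weights that sum to 1, f(E[X]) <= E[f(X)], with the direction reversed for a concave function such as log. The helper name `jensen_gap` and the sample points are made up for illustration.

```python
import math


def jensen_gap(f, points, weights):
    """Return E[f(X)] - f(E[X]) for a discrete distribution.

    The gap is >= 0 when f is convex and <= 0 when f is concave.
    """
    mean = sum(w * x for x, w in zip(points, weights))
    mean_of_f = sum(w * f(x) for x, w in zip(points, weights))
    return mean_of_f - f(mean)


# Four points rather than two, with weights summing to 1.
points = [1.0, 2.0, 4.0, 8.0]
weights = [0.1, 0.2, 0.3, 0.4]

gap_square = jensen_gap(lambda x: x * x, points, weights)  # convex function
gap_log = jensen_gap(math.log, points, weights)            # concave function

print("gap for x^2 (expected non-negative):", gap_square)
print("gap for log (expected non-positive):", gap_log)
```

The concave case with log is the one that usually appears when deriving lower bounds on a log-likelihood.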
IV. Decomposition with Bayes' formula
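ATTENTION: note the difference between a "cat classifier" and a "cat image generator". The output of a cat image generator should be an image of a cat, whereas a cat classifier outputs a number between 0 and 1, which is the probability that the input is classified as a cat.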
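The distinction can be sketched with a toy example. Everything here is an assumption made for illustration: an "image" is reduced to a single number x, the generator is read as sampling from p(x | cat), and the classifier is read as computing p(cat | x) through Bayes' formula, p(cat | x) = p(x | cat) p(cat) / p(x). The Gaussian parameters, the prior, and the function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical 1-D stand-in for images: (mean, std) of each class-conditional.
PRIOR_CAT = 0.5
CAT = (2.0, 1.0)       # parameters of p(x | cat)
NOT_CAT = (-2.0, 1.0)  # parameters of p(x | not cat)


def generate_cat(rng):
    """Generator: the output is a sample x drawn from p(x | cat)."""
    return rng.gauss(*CAT)


def classify_cat(x):
    """Classifier: the output is the probability p(cat | x), in [0, 1]."""
    joint_cat = gaussian_pdf(x, *CAT) * PRIOR_CAT
    joint_not = gaussian_pdf(x, *NOT_CAT) * (1 - PRIOR_CAT)
    return joint_cat / (joint_cat + joint_not)  # denominator is p(x)


rng = random.Random(0)
sample = generate_cat(rng)          # a "cat image" (here, just a number)
probability = classify_cat(sample)  # a number between 0 and 1

print("generated sample:", sample)
print("p(cat | sample):", probability)
```

The generator returns something that lives in the data space, while the classifier returns a probability about a label. Both are built from the same terms of Bayes' formula.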
V. Variational Bayes
Reference blog post: https://www.jianshu.com/p/86c5d1e1ef93