Common Formula Templates in LaTeX

Latest recommended article published on 2025-03-18 20:10:30

wzg2016

Original article: https://www.cnblogs.com/Sinte-Beuve/p/6160905.html

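Studying machine learning means dealing with a large number of mathematical formulas, which makes writing blog posts tedious. Building each formula by hand in an equation editor is slow; with LaTeX we can type the formulas directly instead.

The goal of this post is that, when you need a simple formula, you do not have to write it from scratch: copy one of the templates below and adjust it. For that reason only the most commonly used syntax is listed here. More formulas will be added as my study of machine learning progresses.

LaTeX formula basics

If this section feels tedious, skip ahead to the miscellaneous examples and come back when something is unclear. It covers only a small subset of the syntax; for complete documentation on writing mathematics in LaTeX, see the references.

Layout modes

Inline elements: wrap the formula in $...$; the two $ signs mark the start and the end of the formula.

Display (block-level) elements: wrap the formula in $$...$$. Display formulas are centered by default.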
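As a minimal sketch of the two modes, the following complete document can be compiled with pdflatex. It assumes the standard article class and the amsmath package, neither of which is mentioned above. In a full LaTeX document \[...\] is generally preferred over $$...$$, although $$...$$ is the form most Markdown and MathJax blogs expect.

```latex
\documentclass{article}
\usepackage{amsmath} % assumption: amsmath is installed (it ships with standard TeX distributions)
\begin{document}

% Inline: the formula flows with the surrounding text.
The relation $E = mc^2$ sits inside this sentence.

% Display: the formula is set on its own centered line.
\[
  E = mc^2
\]

\end{document}
```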

Common Greek letters

\alpha, \beta,...,\omega produce $\alpha, \beta,...,\omega$; for uppercase letters, \Gamma, \Delta,..., \Omega produce $\Gamma, \Delta,..., \Omega$.
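Some lowercase Greek letters also have variant glyphs, and uppercase letters that look identical to Latin ones (such as Alpha and Beta) have no command of their own, so a plain A or B is typed instead. A small sketch of the common variants follows; which glyph to use is a matter of convention in your field.

```latex
$$
\epsilon \quad \varepsilon \qquad
\theta \quad \vartheta \qquad
\phi \quad \varphi \qquad
\rho \quad \varrho
$$
```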

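Superscripts and subscripts

Use ^ for superscripts and _ for subscripts. For example:

x_i^2: $x_i^2$

\log_2 x: $\log_2 x$

Use {} to group terms and remove ambiguity (a precedence issue). For example, 10^10 renders as $10^10$, which is clearly wrong because only the first character after ^ is raised. To get $10^{10}$, the correct syntax is 10^{10}.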
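The same grouping rule applies to any script longer than one character. A few illustrative cases (the particular expressions are chosen here only as examples):

```latex
$$
x_{ij}^{2}, \qquad  % multi-character subscript must be wrapped in braces
a_{n+1}, \qquad     % without braces only the n would be lowered
e^{-x^2}, \qquad    % the whole exponent -x^2 is grouped
2^{2^{n}}           % nested superscripts, each level grouped
$$
```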

Brackets

Parentheses and square brackets can be typed directly. Curly braces are used for grouping, so they must be escaped. \{1+2\}: $\{1+2\}$
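Plain brackets keep a fixed height, so they look too small around tall content such as fractions. One common remedy is to pair them with \left and \right, which scale the delimiter to its content. The sketch below compares the two; \text is provided by amsmath, which most MathJax setups load by default.

```latex
$$
( \frac{a}{b} )^2
\qquad \text{vs.} \qquad
\left( \frac{a}{b} \right)^2
\qquad
\left\{ \frac{1}{2}, \frac{1}{3} \right\}
$$
```

Every \left needs a matching \right in the same formula; use \right. (with a dot) when no visible closing delimiter is wanted.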

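Operations

Fractions: \frac{}{}. For example, \frac{1+1}{2}+1: $\frac{1+1}{2}+1$
Sums: \sum_1^n: $\sum_1^n$
Integrals: \int_1^n: $\int_1^n$
Limits: \lim_{x \to \infty}: $\lim_{x \to \infty}$
Matrices: $$\begin{matrix}…\end{matrix}$$; use & to separate entries within a row and \\ to end a row. For example: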

$$
        \begin{matrix}
        1 & x & x^2 \\
        1 & y & y^2 \\
        1 & z & z^2 \\
        \end{matrix}
$$

This renders as:

$$
\begin{matrix}
1 & x & x^2 \\
1 & y & y^2 \\
1 & z & z^2 \\
\end{matrix}
$$
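The matrix environment draws no delimiters. If amsmath is available (an assumption; it is in most MathJax setups), its sibling environments add them automatically, which is usually shorter than wrapping matrix in \left( ... \right). A brief sketch:

```latex
$$
\begin{pmatrix} 1 & x \\ 1 & y \end{pmatrix}  % parentheses
\qquad
\begin{bmatrix} 1 & x \\ 1 & y \end{bmatrix}  % square brackets
\qquad
\begin{vmatrix} 1 & x \\ 1 & y \end{vmatrix}  % vertical bars (determinant)
$$
```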

Miscellaneous examples

LSTM equations

\begin{equation}
\begin{array}{ll}
\hat{C}_t = \tanh(W_C\otimes[h_{t-1},x_t]+b_C) \\[2ex] % input information: the fused input from which the input gate i_t selects what to keep
i_t=\sigma(W_i\otimes[h_{t-1},x_t]+b_i) \\[2ex] % input (memory) gate: fuses the information in h_{t-1} and x_t; acts on the input information
f_t=\sigma(W_f\otimes[h_{t-1},x_t]+b_f) \\[2ex] % forget gate: discards information from the cell; remembering and forgetting both act on the cell
C_t=f_t * C_{t-1}+i_t * \hat{C}_t \\[2ex] % cell state update, in two parts: forget some of the previous cell, keep part of the current fused input
o_t=\sigma(W_o\otimes[h_{t-1},x_t]+b_o) \\[2ex] % output gate: selects what to output from the updated cell
h_t=o_t * \tanh(C_t) % hidden state output, computed from the current cell state; the cell is the core, and all three gates serve it
\end{array}
\end{equation}

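Notes:

This is essentially the standard formulation of the LSTM and is used in much of the literature. The equations are listed in logical order. Here:

$\hat{C}_t$ is the fused input information at the current time step. The input at the current time step is usually normalized with a tanh activation.

$i_t$ is the input gate: it decides how much of the current input information $\hat{C}_t$ is kept in the cell.

$f_t$ is the forget gate: it discards part of the old cell state and keeps the rest.

$C_t$ is the updated cell state at the current time step.

$o_t$ is the output gate: it selects which information from the current cell state is output.

$h_t$ is the hidden state output at the current time step.

$\sigma$ is the sigmoid activation function.

$\tanh$ is the tanh activation function.

$*$ denotes element-wise (pointwise) multiplication, which is what the Hadamard product is.

$\otimes$ denotes matrix multiplication in these equations (not the Hadamard product, which is the element-wise product written $*$ above).

$[h_{t-1},x_t]$ denotes the concatenation of $h_{t-1}$ and $x_t$.

Summary: an LSTM has three gates, the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, all of which act on the cell state. $i_t$ decides how much of the current input is kept in the cell, $f_t$ decides how much of the previous cell state is forgotten or retained, and $o_t$ decides what the current cell outputs. $h_t$ is the output of the current cell. The three gates are applied by element-wise multiplication; the remaining feature transforms are matrix multiplications here, and become convolutions in the ConvLSTM.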
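If you would rather line the six LSTM equations up at their equals signs, one possible variant replaces array with the aligned environment. This is only a typesetting sketch using the same symbols as above; it assumes amsmath is available and drops the equation number.

```latex
$$
\begin{aligned}
\hat{C}_t &= \tanh(W_C \otimes [h_{t-1}, x_t] + b_C) \\
i_t &= \sigma(W_i \otimes [h_{t-1}, x_t] + b_i) \\
f_t &= \sigma(W_f \otimes [h_{t-1}, x_t] + b_f) \\
C_t &= f_t * C_{t-1} + i_t * \hat{C}_t \\
o_t &= \sigma(W_o \otimes [h_{t-1}, x_t] + b_o) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$
```

Each & marks the alignment point, so every line is anchored at its = sign.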

ConvLSTM equations

\begin{equation}
\begin{array}{ll}
\hat{C}_t = \tanh(W_C\odot[h_{t-1},x_t]+b_C) \\[2ex] % input information: the fused input from which the input gate i_t selects what to keep
i_t=\sigma(W_i\odot[h_{t-1},x_t]+b_i) \\[2ex] % input (memory) gate: fuses the information in h_{t-1} and x_t; acts on the input information
f_t=\sigma(W_f\odot[h_{t-1},x_t]+b_f) \\[2ex] % forget gate: discards information from the cell; remembering and forgetting both act on the cell
C_t=f_t * C_{t-1}+i_t * \hat{C}_t \\[2ex] % cell state update, in two parts: forget some of the previous cell, keep part of the current fused input
o_t=\sigma(W_o\odot[h_{t-1},x_t]+b_o) \\[2ex] % output gate: selects what to output from the updated cell
h_t=o_t * \tanh(C_t) % hidden state output, computed from the current cell state; the cell is the core, and all three gates serve it
\end{array}
\end{equation}

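Notes:

Compared with the LSTM equations, the ConvLSTM changes how each gate is computed: the fully connected matrix multiplication is replaced by a convolution, which reduces the number of parameters and preserves the spatial information of the input data.

$\odot$ denotes the convolution operation.

Note: the symbols for convolution, matrix multiplication, and element-wise (Hadamard) multiplication do not seem to follow a single standard in computer vision. It is enough to state what each symbol means alongside the equations.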
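Because the operator symbols vary from paper to paper, one option is to hide them behind macros so the notation can be changed in a single place. The macro names below (\conv, \elemmul, \matmul) are hypothetical and chosen only for this sketch. In a full LaTeX document the \newcommand lines belong in the preamble; on a MathJax page they would need to sit inside a math block.

```latex
% Hypothetical macro names; change the right-hand sides to match your own convention.
\newcommand{\conv}{\odot}      % convolution, as in the ConvLSTM equations above
\newcommand{\elemmul}{*}       % element-wise multiplication
\newcommand{\matmul}{\otimes}  % matrix multiplication, as in the LSTM equations above

$$
i_t = \sigma(W_i \conv [h_{t-1}, x_t] + b_i), \qquad
C_t = f_t \elemmul C_{t-1} + i_t \elemmul \hat{C}_t
$$
```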

$$h_\theta(x)=\sum_{j=0}^n \theta_jx_j$$

$$J(\theta)=\frac1{2m}\sum_{i=0}^m(y^i-h_\theta(x^i))^2$$

$$
\frac{\partial J(\theta)}{\partial\theta_j}=
-\frac1m\sum_{i=0}^m(y^i-h_\theta(x^i))x^i_j 
$$

$$
f(n) =
    \begin{cases}
    n/2,  & \text{if $n$ is even} \\
    3n+1, & \text{if $n$ is odd}
    \end{cases}
$$

$$
\left\{ 
    \begin{array}{c}
        a_1x+b_1y+c_1z=d_1 \\ 
        a_2x+b_2y+c_2z=d_2 \\ 
        a_3x+b_3y+c_3z=d_3
    \end{array}
\right. 
$$
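The array template above centers each equation in its column. If you prefer the equals signs lined up, a possible alternative is to put an aligned environment inside the same \left\{ ... \right. pair (assuming amsmath is available):

```latex
$$
\left\{
\begin{aligned}
a_1x + b_1y + c_1z &= d_1 \\
a_2x + b_2y + c_2z &= d_2 \\
a_3x + b_3y + c_3z &= d_3
\end{aligned}
\right.
$$
```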

$$X=\left(
        \begin{matrix}
            x_{11} & x_{12} & \cdots & x_{1d}\\
            x_{21} & x_{22} & \cdots & x_{2d}\\
            \vdots & \vdots & \ddots & \vdots\\
            x_{m1} & x_{m2} & \cdots & x_{md}\\
        \end{matrix}
    \right)
    =\left(
         \begin{matrix}
                x_1^T \\
                x_2^T \\
                \vdots\\
                x_m^T \\
            \end{matrix}
    \right)
$$

$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_j}
& = \frac1m\sum_{i=0}^m(y^i-h_\theta(x^i)) \frac{\partial}{\partial\theta_j}(y^i-h_\theta(x^i)) \\
& = -\frac1m\sum_{i=0}^m(y^i-h_\theta(x^i)) \frac{\partial}{\partial\theta_j}\left(\sum_{k=0}^n\theta_kx_k^i-y^i\right) \\
& = -\frac1m\sum_{i=0}^m(y^i-h_\theta(x^i))x^i_j
\end{aligned}
$$
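Outside a blog, in a standalone LaTeX document, multi-line formulas are usually numbered and cross-referenced. The sketch below shows one way to do that with amsmath's align environment, which is used on its own rather than inside $$...$$. The label names eq:cost and eq:grad are hypothetical, and the document needs to be compiled twice for the references to resolve.

```latex
\documentclass{article}
\usepackage{amsmath} % provides align and \eqref
\begin{document}

\begin{align}
J(\theta) &= \frac1{2m}\sum_{i=0}^m \bigl(y^i-h_\theta(x^i)\bigr)^2
  \label{eq:cost} \\ % hypothetical label name
\frac{\partial J(\theta)}{\partial\theta_j}
  &= -\frac1m\sum_{i=0}^m \bigl(y^i-h_\theta(x^i)\bigr)\, x^i_j
  \label{eq:grad}    % hypothetical label name
\end{align}

Differentiating the cost~\eqref{eq:cost} with respect to $\theta_j$
gives the gradient~\eqref{eq:grad}.

\end{document}
```

Adding \notag before the \\ on a line suppresses its number if only some lines should be numbered.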