The Controlled AutoRegressive Moving Average model (CARMA), also known as the AutoRegressive Moving Average model with eXogenous input (ARMAX), is a very widely used linear system model. This article presents one system-identification method for it: the maximum likelihood method.
System Model
$$y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = b_1 u_{k-1} + \cdots + b_n u_{k-n} + e_k + c_1 e_{k-1} + \cdots + c_n e_{k-n}$$
That is,
$$y_k = -\sum_{i=1}^n a_i y_{k-i} + \sum_{i=1}^n b_i u_{k-i} + \sum_{i=1}^n c_i e_{k-i} + e_k$$
This model extends ARMA by accounting for the effect of the external input $u(k)$ on the output. For ease of discussion, the subscripts of $a$, $b$, and $c$ are all taken to run from $1$ to $n$; in practice the three orders need not be equal.
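As a concrete illustration, data from this model can be generated directly from the difference equation. A minimal Python sketch (the function name and test signal are my own; zero initial conditions are assumed):

```python
import numpy as np

def simulate_armax(a, b, c, u, sigma, rng=None):
    """Simulate y_k = -sum_i a_i y_{k-i} + sum_i b_i u_{k-i} + sum_i c_i e_{k-i} + e_k
    with zero initial conditions; returns the output y and the noise sequence e."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, N = len(a), len(u)
    e = rng.normal(0.0, sigma, N)  # zero-mean Gaussian noise, std sigma
    y = np.zeros(N)
    for k in range(N):
        y[k] = e[k]
        for i in range(1, n + 1):
            if k - i >= 0:
                y[k] += -a[i-1] * y[k-i] + b[i-1] * u[k-i] + c[i-1] * e[k-i]
    return y, e

# example: first-order ARMAX driven by a white-noise input
u = np.random.default_rng(1).normal(size=500)
y, e = simulate_armax([0.5], [1.0], [0.3], u, sigma=0.1)
```

A record like this is what the identification procedure later in the article operates on.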
Likelihood Function
In mathematical statistics, a likelihood function is a function of the parameters of a statistical model that expresses how plausible those parameters are.
Given an observation $X = x$, the likelihood of the parameter $\theta$, written $L(\theta \mid X)$, is (numerically) equal to the probability of that observation given $\theta$: $$L(\theta \mid X) = P(X \mid \theta)$$
In statistics, "likelihood" and "probability" are clearly distinguished: probability is used to predict upcoming observations when the parameters are known, whereas likelihood is used to estimate the parameters of the underlying process when observations are given.
Assume the error $e(k)$ follows a zero-mean Gaussian distribution with variance $\sigma^2$. The likelihood function is then:
$$\begin{array}{ll} P(Y_N \mid U_N, \Theta) &= P(y_N, \ldots \mid u_N, \ldots, \Theta) \\ &= P(y_0)\prod_{k=1}^N P(y_k \mid Y_{k-1}, U_N, \Theta) \\ &= P(y_0)\prod_{k=1}^N P(e_k) \\ &= P(y_0)(2\pi)^{-N/2}\sigma^{-N} \exp\left[-(1/2\sigma^2)\sum_{k=1}^N e_k^2\right] \end{array} \tag{1}$$
where
$$\begin{array}{ll} e_k &= y_k - \phi_k^\top \Theta \\ \phi_k &= [-y_{k-1}, \ldots, -y_{k-n} \mid u_{k-1}, \ldots, u_{k-n} \mid e_{k-1}, \ldots, e_{k-n}]^\top \\ \Theta &= [a_1, \ldots, a_n \mid b_1, \ldots, b_n \mid c_1, \ldots, c_n]^\top \end{array}$$
Maximizing the likelihood (1) is equivalent to minimizing the negative log-likelihood (2):
$$J(\sigma, \Theta) = N \ln \sigma + \frac{1}{2\sigma^2}\sum_{k=1}^N e_k^2 \tag{2}$$
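For a given residual sequence, cost (2) can be evaluated directly; a minimal sketch (my own function name) that drops the parameter-independent terms $P(y_0)$ and $(N/2)\ln 2\pi$:

```python
import numpy as np

def neg_log_likelihood(sigma, residuals):
    """J = N*ln(sigma) + sum(e_k^2) / (2*sigma^2), up to additive constants."""
    residuals = np.asarray(residuals, dtype=float)
    N = len(residuals)
    return N * np.log(sigma) + np.sum(residuals**2) / (2.0 * sigma**2)
```

For a fixed residual sequence, the minimizer of $J$ over $\sigma$ is $\sqrt{\sum_k e_k^2/N}$, which is exactly the $\sigma_\nu$ update used in the identification procedure below.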
Identification Procedure
In the actual identification process, since the parameters $\Theta$ are unknown, the errors $e_k$ cannot be obtained exactly, so the computation uses their estimates $\nu_k \simeq e_k$ instead.
Cost function:
$$J(\sigma_\nu, \Theta) = N \ln \sigma_\nu + \frac{1}{2\sigma_\nu^2}\sum_{k=1}^N \nu_k^2 \tag{2'}$$
- Collect $N$ data points and estimate an initial $\Theta_0$, e.g. first assume the error coefficients $c_i = 0$ and solve for $a_i, b_i$ by least squares;
- Set $t = 0$;
- Fix $\Theta$ (and hence $\nu$), and update $\sigma_\nu$:
$$\sigma_\nu = \argmin_{\sigma_\nu} J = \sqrt{\sum_k \nu_k^2 / N}$$
- Fix $\sigma_\nu$, and update $\Theta$, i.e. minimize
$$L = \sum_k \nu_k^2$$
using Newton's method to update the parameters:
$$\Theta_{t+1} = \Theta_t - H^{-1} \nabla_\Theta L$$
- Set $t = t + 1$ and repeat the alternating optimization above until
$$\frac{\sigma_t^2 - \sigma_{t-1}^2}{\sigma_{t-1}^2} < 10^{-4}$$
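The two inner updates can be sketched as follows (my own function names; zero initial conditions for the residual filter):

```python
import numpy as np

def residuals(theta, y, u, n):
    """Forward filter: nu_k = y_k + sum_i a_i y_{k-i} - sum_i b_i u_{k-i} - sum_i c_i nu_{k-i},
    with Theta = [a_1..a_n | b_1..b_n | c_1..c_n] and zero initial conditions."""
    theta = np.asarray(theta, dtype=float)
    a, b, c = theta[:n], theta[n:2*n], theta[2*n:]
    N = len(y)
    nu = np.zeros(N)
    for k in range(N):
        nu[k] = y[k]
        for i in range(1, n + 1):
            if k - i >= 0:
                nu[k] += a[i-1] * y[k-i] - b[i-1] * u[k-i] - c[i-1] * nu[k-i]
    return nu

def sigma_update(nu):
    # closed-form argmin of J over sigma_nu: sqrt(sum nu_k^2 / N)
    return np.sqrt(np.mean(np.asarray(nu)**2))
```

The $\Theta$ update additionally needs $\nabla_\Theta L$ and $H$, which the next section derives.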
Gradient and Hessian with Respect to the Parameters
Gradient
$$\frac{\partial L}{\partial \Theta} = 2\sum_k \nu_k \frac{\partial \nu_k}{\partial \Theta} \tag{3}$$
This is slightly involved, because:
$$\nu_k = y_k + \sum_{i=1}^n a_i y_{k-i} - \sum_{i=1}^n b_i u_{k-i} - \sum_{i=1}^n c_i \nu_{k-i} = y_k - \phi_k^\top \Theta$$
Hence
$$\frac{\partial \nu_k}{\partial a_i} = y_{k-i} - \sum_{l=1}^n c_l \frac{\partial \nu_{k-l}}{\partial a_i} \\ \frac{\partial \nu_k}{\partial b_i} = -u_{k-i} - \sum_{l=1}^n c_l \frac{\partial \nu_{k-l}}{\partial b_i} \\ \frac{\partial \nu_k}{\partial c_i} = -\nu_{k-i} - \sum_{l=1}^n c_l \frac{\partial \nu_{k-l}}{\partial c_i}$$
These can be combined into
$$\frac{\partial \nu_k}{\partial \Theta} = -\phi_k^\top - \sum_{l=1}^n c_l \frac{\partial \nu_{k-l}}{\partial \Theta}$$
Thus $\partial L/\partial \Theta$ has been obtained!
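The residual and gradient recursions map directly onto code; a sketch (my own function names; zero initial conditions, with $\partial \nu_k/\partial \Theta$ stored as the $k$-th row of an $N \times 3n$ array):

```python
import numpy as np

def residual_gradients(theta, y, u, n):
    """Return nu (N,) and dnu (N, 3n), where dnu[k] is the row d nu_k / d Theta."""
    theta = np.asarray(theta, dtype=float)
    c = theta[2*n:]
    N, p = len(y), 3 * n
    nu = np.zeros(N)
    dnu = np.zeros((N, p))
    for k in range(N):
        # phi_k = [-y_{k-1..k-n} | u_{k-1..k-n} | nu_{k-1..k-n}]
        phi = np.zeros(p)
        for i in range(1, n + 1):
            if k - i >= 0:
                phi[i-1], phi[n+i-1], phi[2*n+i-1] = -y[k-i], u[k-i], nu[k-i]
        nu[k] = y[k] - phi @ theta
        # d nu_k / d Theta = -phi_k^T - sum_l c_l * (d nu_{k-l} / d Theta)
        dnu[k] = -phi
        for l in range(1, n + 1):
            if k - l >= 0:
                dnu[k] -= c[l-1] * dnu[k-l]
    return nu, dnu

def grad_L(theta, y, u, n):
    nu, dnu = residual_gradients(theta, y, u, n)
    return 2.0 * dnu.T @ nu  # equation (3)
```

Since $\nu_k$ is an exact (polynomial) function of $\Theta$, this gradient is exact, which a finite-difference check confirms.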
Hessian
Taking $\partial \nu_k / \partial \Theta$ to be a row vector,
$$H = \frac{\partial^2 L}{\partial \Theta^2} = \frac{\partial}{\partial \Theta}\left(2\sum_k \nu_k \frac{\partial \nu_k}{\partial \Theta}\right) = 2\sum_k \left( \left(\frac{\partial \nu_k}{\partial \Theta}\right)^\top \left(\frac{\partial \nu_k}{\partial \Theta}\right) + \nu_k \frac{\partial^2 \nu_k}{\partial \Theta^2}\right)$$
The main term to work out is $\partial^2 \nu_k / \partial \Theta^2$:
$$\frac{\partial^2 \nu_k}{\partial a_i \partial a_j} = -\sum_{l=1}^n c_l \frac{\partial^2 \nu_{k-l}}{\partial a_i \partial a_j} \\ \frac{\partial^2 \nu_k}{\partial a_i \partial b_j} = -\sum_{l=1}^n c_l \frac{\partial^2 \nu_{k-l}}{\partial a_i \partial b_j} \\ \frac{\partial^2 \nu_k}{\partial a_i \partial c_j} = -\frac{\partial \nu_{k-j}}{\partial a_i} - \sum_{l=1}^n c_l \frac{\partial^2 \nu_{k-l}}{\partial a_i \partial c_j} \\ \frac{\partial^2 \nu_k}{\partial b_i \partial c_j} = -\frac{\partial \nu_{k-j}}{\partial b_i} - \sum_{l=1}^n c_l \frac{\partial^2 \nu_{k-l}}{\partial b_i \partial c_j} \\ \frac{\partial^2 \nu_k}{\partial c_i \partial c_j} = -\frac{\partial \nu_{k-j}}{\partial c_i} - \frac{\partial \nu_{k-i}}{\partial c_j} - \sum_{l=1}^n c_l \frac{\partial^2 \nu_{k-l}}{\partial c_i \partial c_j}$$
Written in matrix form (here $H_k$ denotes the per-sample Hessian $\partial^2 \nu_k / \partial \Theta^2$):
$$H_k = -\sum_{l=1}^n c_l H_{k-l} - G_k - G_k^\top \\ G_k = \left[ 0 \ldots 0 \,\bigg|\, 0 \ldots 0 \,\bigg|\, \left(\frac{\partial \nu_{k-1}}{\partial \Theta}\right)^\top \ldots \left(\frac{\partial \nu_{k-n}}{\partial \Theta}\right)^\top \right]_{3n \times 3n}$$
To emphasize once more: $\partial \nu_k / \partial \Theta$ is a row vector, because $\Theta$ is a column vector!
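Putting the pieces together, $L$, $\nabla_\Theta L$, and $H$ can all be assembled in one forward pass; a self-contained sketch (my own function name; zero initial conditions), whose output a Newton step $\Theta \leftarrow \Theta - H^{-1}\nabla_\Theta L$ can then consume:

```python
import numpy as np

def newton_quantities(theta, y, u, n):
    """One forward pass returning L = sum nu_k^2, its gradient, and its Hessian
    with respect to Theta = [a | b | c] (zero initial conditions)."""
    theta = np.asarray(theta, dtype=float)
    c = theta[2*n:]
    N, p = len(y), 3 * n
    nu = np.zeros(N)
    dnu = np.zeros((N, p))      # row k: d nu_k / d Theta
    Hnu = np.zeros((N, p, p))   # Hnu[k]: d^2 nu_k / d Theta^2  (H_k in the text)
    grad = np.zeros(p)
    H = np.zeros((p, p))
    for k in range(N):
        # phi_k = [-y_{k-1..k-n} | u_{k-1..k-n} | nu_{k-1..k-n}]
        phi = np.zeros(p)
        for i in range(1, n + 1):
            if k - i >= 0:
                phi[i-1], phi[n+i-1], phi[2*n+i-1] = -y[k-i], u[k-i], nu[k-i]
        nu[k] = y[k] - phi @ theta
        dnu[k] = -phi
        # G_k: zero except the last n columns, holding (d nu_{k-j} / d Theta)^T
        G = np.zeros((p, p))
        for j in range(1, n + 1):
            if k - j >= 0:
                G[:, 2*n + j - 1] = dnu[k-j]
        for l in range(1, n + 1):
            if k - l >= 0:
                dnu[k] -= c[l-1] * dnu[k-l]
                Hnu[k] -= c[l-1] * Hnu[k-l]
        Hnu[k] -= G + G.T               # H_k = -sum_l c_l H_{k-l} - G_k - G_k^T
        grad += 2.0 * nu[k] * dnu[k]    # equation (3)
        H += 2.0 * (np.outer(dnu[k], dnu[k]) + nu[k] * Hnu[k])
    return np.sum(nu**2), grad, H
```

In practice, $H$ may need damping (e.g. a Levenberg-Marquardt term) when it is poorly conditioned; that detail is omitted here.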
At this point, the complete computation procedure should be clear!