原文 演化博弈方法用于多智能体系统最优资源分配 -CSDN博客
https://ieeexplore.ieee.org/document/8243778/
问题描述
有2种资源分配给6个个体,2种资源的总量分别为
y
1
=
545
,
y
2
=
467
y_1=545,y_2=467
y1=545,y2=467,不同的个体收到不同的资源会产生不同的收益,分配的资源量均大于等于0。以第3个个体为例,它接受的资源为
x
⃗
3
=
[
x
31
,
x
32
]
T
\vec{x}_3=[x_{31},x_{32}]^\text{T}
x3=[x31,x32]T,两种资源的上限分别为
x
31
max
=
66
,
x
32
max
=
120
x_{31}^\text{max}=66,x_{32}^\text{max}=120
x31max=66,x32max=120。可行资源组合
R
3
=
x
⃗
3
\mathcal{R}_3=\vec{x}_3
R3=x3(这我没看懂
R
3
\mathcal{R}_3
R3 有什么用),收到2种资源的收益为
u
3
(
x
⃗
3
)
=
x
31
(
2
x
31
max
−
x
31
)
c
31
x
31
max
+
x
32
(
2
x
32
max
−
x
32
)
c
32
x
32
max
=
x
31
(
2
⋅
66
−
x
31
)
0.4
⋅
66
+
x
32
(
2
⋅
120
−
x
32
)
0.5
⋅
120
\begin{aligned} u_3(\vec{x}_3) =& \frac{x_{31}(2x_{31}^\text{max}-x_{31})} {c_{31}x_{31}^\text{max}} +\frac{x_{32}(2x_{32}^\text{max}-x_{32})} {c_{32}x_{32}^\text{max}} \\ =& \frac{x_{31}(2\cdot 66-x_{31})}{0.4\cdot 66} +\frac{x_{32}(2\cdot 120-x_{32})}{0.5\cdot 120} \\ \end{aligned}
u3(x3)==c31x31maxx31(2x31max−x31)+c32x32maxx32(2x32max−x32)0.4⋅66x31(2⋅66−x31)+0.5⋅120x32(2⋅120−x32)
需要求的是在
x
11
+
x
21
+
x
31
+
x
41
+
x
51
+
x
61
=
545
x
12
+
x
22
+
x
32
+
x
42
+
x
52
+
x
62
=
467
x_{11}+x_{21}+x_{31}+x_{41}+x_{51}+x_{61}=545 \\ x_{12}+x_{22}+x_{32}+x_{42}+x_{52}+x_{62}=467
x11+x21+x31+x41+x51+x61=545x12+x22+x32+x42+x52+x62=467
的约束条件下,求
u
1
+
u
2
+
u
3
+
u
4
+
u
5
+
u
6
u_1+u_2+u_3+u_4+u_5+u_6
u1+u2+u3+u4+u5+u6
的最大值。
一个证明的解读
原文中求和不变性的证明式(16)中
x
˙
31
=
x
31
(
p
31
x
21
+
x
41
y
1
−
x
21
p
21
+
x
41
p
41
y
1
)
p
31
=
∂
u
3
∂
x
31
=
2
(
x
31
max
−
x
31
)
c
31
x
31
max
\begin{aligned} \dot{x}_{31} =& x_{31}\left(p_{31}\frac{x_{21}+x_{41}}{y_1} -\frac{x_{21}p_{21}+x_{41}p_{41}}{y_1}\right) \\ p_{31} =& \frac{\partial u_3}{\partial x_{31}} =\frac{2(x_{31}^\text{max}-x_{31})}{c_{31}x_{31}^\text{max}} \end{aligned}
x˙31=p31=x31(p31y1x21+x41−y1x21p21+x41p41)∂x31∂u3=c31x31max2(x31max−x31)
假设对个体3和它的邻居2,包含这两个个体的式子为
x
˙
31
=
x
31
(
p
31
x
21
+
x
41
y
1
−
x
21
p
21
+
x
41
p
41
y
1
)
x
˙
21
=
x
21
(
p
21
x
11
+
x
31
y
1
−
x
11
p
11
+
x
31
p
31
y
1
)
\dot{x}_{31} = x_{31}\left(p_{31}\frac{x_{21}+x_{41}}{y_1} -\frac{x_{21}p_{21}+x_{41}p_{41}}{y_1}\right) \\ \dot{x}_{21} = x_{21}\left(p_{21}\frac{x_{11}+x_{31}}{y_1} -\frac{x_{11}p_{11}+x_{31}p_{31}}{y_1}\right) \\
x˙31=x31(p31y1x21+x41−y1x21p21+x41p41)x˙21=x21(p21y1x11+x31−y1x11p11+x31p31)
取出所有包含
x
21
x_{21}
x21 和
x
31
x_{31}
x31 的项就是式(16)中
(
x
31
p
31
x
21
−
x
31
x
21
p
21
)
+
(
x
21
p
21
x
31
−
x
21
x
31
p
31
)
=
0
(x_{31}p_{31}x_{21}-x_{31}x_{21}p_{21}) + (x_{21}p_{21}x_{31}-x_{21}x_{31}p_{31})=0
(x31p31x21−x31x21p21)+(x21p21x31−x21x31p31)=0
与拉格朗日乘子法比较
原论文中给出的例子的收益函数里两个资源的收益相互独立,因此本文只分析第一种资源的收益。除了原文介绍的演化博弈法以外,注意到这个问题就是一个约束优化问题。可以试一下拉格朗日乘子法。为便于说明只写3个变量,代码里完整给出。
U
=
x
1
(
2
x
m
1
−
x
1
)
c
1
x
m
1
+
x
2
(
2
x
m
2
−
x
2
)
c
2
x
m
2
+
x
3
(
2
x
m
3
−
x
3
)
c
3
x
m
3
U=\frac{x_1(2x_{m1}-x_1)}{c_1x_{m1}} +\frac{x_2(2x_{m2}-x_2)}{c_2x_{m2}} +\frac{x_3(2x_{m3}-x_3)}{c_3x_{m3}}
U=c1xm1x1(2xm1−x1)+c2xm2x2(2xm2−x2)+c3xm3x3(2xm3−x3)
∂
L
∂
x
i
=
x
m
i
−
x
i
k
i
+
λ
=
0
x
i
−
k
i
λ
=
x
m
i
\frac{\partial L}{\partial x_i}=\frac{x_{mi}-x_i}{k_i} + \lambda = 0 \\ x_i-k_i\lambda = x_{mi}
∂xi∂L=kixmi−xi+λ=0xi−kiλ=xmi
其中
k
i
=
c
i
x
m
i
2
k_i=\displaystyle\frac{c_ix_{mi}}{2}
ki=2cixmi。写成矩阵形式
[
1
k
1
1
k
2
1
k
3
1
1
1
0
]
[
x
1
x
2
x
3
λ
]
=
[
x
m
1
x
m
2
x
m
3
545
]
\left[\begin{matrix} 1 &&& k_1 \\ & 1 && k_2 \\ && 1 & k_3 \\ 1 & 1 & 1 & 0 \\ \end{matrix}\right] \left[\begin{matrix} x_1 \\ x_2 \\ x_3 \\ \lambda \end{matrix}\right] =\left[\begin{matrix} x_{m1} \\ x_{m2} \\ x_{m3} \\ 545 \end{matrix}\right]
111111k1k2k30
x1x2x3λ
=
xm1xm2xm3545
解得
X
=
[
167.604914004914
45.1985257985260
62.6270270270270
104.645700245700
93.6117936117936
71.3120393120392
0.255528255528253
]
X=\left[\begin{matrix} 167.604914004914 \\ 45.1985257985260 \\ 62.6270270270270 \\ 104.645700245700 \\ 93.6117936117936 \\ 71.3120393120392 \\ 0.255528255528253 \end{matrix}\right]
X=
167.60491400491445.198525798526062.6270270270270104.64570024570093.611793611793671.31203931203920.255528255528253
演化博弈法的仿真结果为
X
=
[
167.75139
45.21687
62.42745
104.58382
93.432434
71.588005
]
X=\left[\begin{matrix} 167.75139 \\ 45.21687 \\ 62.42745 \\ 104.58382 \\ 93.432434 \\ 71.588005 \\ \end{matrix}\right]
X=
167.7513945.2168762.42745104.5838293.43243471.588005
可见结果基本相同。
仿真
计算出的最大值是
3615.688
3615.688
3615.688,论文里没给出最大值的具体值但结果差不多。
参考链接
附代码
拉格朗日乘子法的matlab代码:
clear;clc;
Cij = [0.2;0.3;0.4;0.1;0.5;0.85];
Xmaxij = [172;47;66;106;100;80];
K = zeros(6,1);
for n = 1:6
K(n) = Cij(n)*Xmaxij(n)*0.5;
end
A = [eye(6), K; ones(1, 6), 0];
B = [Xmaxij; 545];
X = pinv(A)*B
演化博弈法的python代码:
import matplotlib.pyplot as plt
import numpy as np
def Marginal_Utility(i, j):
return 2*(Xmaxij[i, j] - Xij[i, j]) / (Cij[i, j]*Xmaxij[i, j])
def StateSum():
sum = 0
for n in range(Xij.shape[0]):
sum += Xij[n, 0]
for n in range(Xij.shape[0]):
sum += Xij[n, 1]
return sum
def Utility():
u = 0
for j in range(2):
for i in range(6):
u += Xij[i, j]*(2*Xmaxij[i, j] - Xij[i, j]) / Cij[i, j] / Xmaxij[i, j]
return u
Cij = np.array([
[0.2 , 0.85],
[0.3 , 0.4 ],
[0.4 , 0.5 ],
[0.1 , 0.2 ],
[0.5 , 0.4 ],
[0.85, 0.8 ],
], dtype=np.float32)
Xmaxij = np.array([
[172, 86 ],
[47 , 70 ],
[66 , 120],
[106, 45 ],
[100, 90 ],
[80 , 100],
], dtype=np.float32)
Xij = np.array([
[10, 10],
[10, 10],
[10, 10],
[10, 10],
[10, 10],
[10, 10],
], dtype=np.float32)
Y = np.array([545, 467])
sum = 0
for n in range(Xij.shape[0] - 1):
sum += Xij[n, 0]
Xij[-1, 0] = Y[0] - sum
sum = 0
for n in range(Xij.shape[0] - 1):
sum += Xij[n, 1]
Xij[-1, 1] = Y[1] - sum
Xnew = Xij.copy()
nvec = []
Uvec = []
Xvec = []
for i in range(6):
Xvec.append([])
for n in range(100):
for j in range(2):
for i in range(6):
Ineigh0 = (i+5) % 6
Ineigh1 = (i+1) % 6
f = Marginal_Utility(i, j) * (Xij[Ineigh0, j] + Xij[Ineigh1, j])
fbar = Xij[Ineigh0, j] * Marginal_Utility(Ineigh0, j)
fbar += Xij[Ineigh1, j] * Marginal_Utility(Ineigh1, j)
deltax = Xij[i, j] * (f - fbar) / Y[j]
Xnew[i, j] += deltax*0.1
Xij = Xnew.copy()
for i in range(6):
Xvec[i].append([Xij[i, 0]])
# print(StateSum())
nvec.append(n)
Uvec.append(Utility())
print(Uvec[-1])
# plt.plot(nvec, Uvec)
for i in range(6):
plt.plot(nvec, Xvec[i])
plt.legend([str(i) for i in range(6)])
plt.show()