论文原文:Spectrum Sharing in Vehicular Networks Based on Multi-Agent Reinforcement Learning
论文翻译 & 解读:[论文笔记]Spectrum Sharing in Vehicular Networks Based on Multi-Agent Reinforcement Learning
代码地址:https://github.com/le-liang/MARLspectrumSharingV2X
博客中用到的VISIO流程图(由博主个人绘制,有错误欢迎交流指教):https://download.csdn.net/download/m0_37495408/12353933
使用方法:
(来自原作者github的readme)
- 要训练多主体RL模型:main_marl_train.py + Environment_marl.py + replay_memory.py
- 要训练基准单一代理RL模型:main_sarl_train.py + Environment_marl.py + replay_memory.py
- 要在同一环境中测试所有模型:main_test.py + Environment_marl_test.py + replay_memory.py +'/ model'。
- 可以从运行“ main_test.py”直接复制本文中的图3和图4。通过“ Environment_marl_test.py”中的“ self.demand_size”更改V2V有效负载大小。
- 图5只能从培训期间的记录回报中获得。
- 图6-7显示了任意情节的表现(但随机基线失败且MARL传输成功)。实际上,大多数此类情节都表现出一些有趣的现象,表明了多主体合作。解释取决于读者。
- 不建议在“ main_marl_train.py”中使用“测试”模式。
基本类定义
在Environment_marl.py文件中,定义了架构的四个基本CLASS,分别是:V2Vchannels,V2Ichannels,Vehicle,Environ。其中Environ的方法(即函数)最多,Vehicle没有函数只有几个属性,其余两者各有两个方法(分别是计算路损和阴影衰落)。
Vehicle
初始化时需要传入三个参数:起始位置、起始方向、速度。函数内部将自己定义两个list:neighbors、destinations,分别存放邻居和V2V的通信端(这里两者在数值上相同,因为设定V2V的对象即为邻居)
class Vehicle:
# Vehicle simulator: include all the information for a vehicle
def __init__(self, start_position, start_direction, velocity):
self.position = start_position
self.direction = start_direction
self.velocity = velocity
self.neighbors = []
self.destinations = []
从下方的代码可见destionations的含义
def renew_neighbor(self): # 这个来自CLASS Env
""" Determine the neighbors of each vehicles """
for i in range(len(self.vehicles)):
self.vehicles[i].neighbors = []
self.vehicles[i].actions = []
z = np.array([[complex(c.position[0], c.position[1]) for c in self.vehicles]])
Distance = abs(z.T - z)
for i in range(len(self.vehicles)):
sort_idx = np.argsort(Distance[:, i])
for j in range(self.n_neighbor):
self.vehicles[i].neighbors.append(sort_idx[j + 1])
destination = self.vehicles[i].neighbors
self.vehicles[i].destinations = destination
V2Vchannels
内部参数z:这里将bs和ms的高度设置为1.5m,阴影的std为3,都是来自TR36 885-A.1.4-1;载波频率为2,单位为GHz;
class V2Vchannels:
# Simulator of the V2V Channels
def __init__(self):
self.t = 0
self.h_bs = 1.5
self.h_ms = 1.5
self.fc = 2
self.decorrelation_distance = 10
self.shadow_std = 3
包含两个方法:
计算路损
def get_path_loss(self, position_A, position_B):
d1 = abs(position_A[0] - position_B[0])
d2 = abs(position_A[1] - position_B[1])
d = math.hypot(d1, d2) + 0.001 # sqrt(x*x + y*y)
# 下一行定义有效BP距离
d_bp = 4 * (self.h_bs - 1) * (self.h_ms - 1) * self.fc * (10 ** 9) / (3 * 10 ** 8)
def PL_Los(d):
if d <= 3:
return 22.7 * np.log10(3) + 41 + 20 * np.log10(self.fc / 5)
else:
if d < d_bp:
return 22.7 * np.log10(d) + 41 + 20 * np.log10(self.fc / 5)
else:
return 40.0 * np.log10(d) + 9.45 - 17.3 * np.log10(self.h_bs) - 17.3 * np.log10(self.h_ms) + 2.7 * np.log10(self.fc / 5)
def PL_NLos(d_a, d_b):
n_j = max(2.8 - 0.0024 * d_b, 1.84)
return PL_Los(d_a) + 20 - 12.5 * n_j + 10 * n_j * np.log10(d_b) + 3 * np.log10(self.fc / 5)
if min(d1, d2) < 7:
PL = PL_Los(d)
else:
PL = min(PL_NLos(d1, d2), PL_NLos(d2, d1))
return PL # + self.shadow_std * np.random.normal()
说明:上述代码使用随机过程模型(见[2]-p328)。
路损使用曼哈顿网格布局LOS模型,即:
, for
以及:, for
上面的n_1=2.2、n_2=4.0,分别表示在BP之前和之后的率衰落常数,d'表示有效BP距离,代码中用d_bp表示。
曼哈顿网格布局NLOS模型:
代码后半出现的min函数,在[2]的p344页有描述,这是假设接收机位于垂直街道是对PL的估计方法。
代码中的公式出自IST-4-027756 WINNER II D1.1.2 V1.2 WINNER II
其中有如下表格,与代码中的参数完全符合:
更新阴影衰落
def get_shadowing(self, delta_distance, shadowing):
return np.exp(-1 * (delta_distance / self.decorrelation_distance)) * shadowing \
+ math.sqrt(1 - np.exp(-2 * (delta_distance / self.decorrelation_distance))) * np.random.normal(0, 3) # standard dev is 3 db
这个更新公式是出自文献[1]-A-1.4 Channel model表格后的部分,如下:
V2Ichannels
包含的两个方法和V2V相同,但是计算路损的时候不再区分Los了
def get_path_loss(self, position_A):
d1 = abs(position_A[0] - self.BS_position[0])
d2 = abs(position_A[1] - self.BS_position[1])
distance = math.hypot(d1, d2)
return 128.1 + 37.6 * np.log10(math.sqrt(distance ** 2 + (self.h_bs - self.h_ms) ** 2) / 1000) # + self.shadow_std * np.random.normal()
def get_shadowing(self, delta_distance, shadowing):
nVeh = len(shadowing)
self.R = np.sqrt(0.5 * np.ones([nVeh, nVeh]) + 0.5 * np.identity(nVeh))
return np.multiply(np.exp(-1 * (delta_distance / self.Decorrelation_distance)), shadowing) \
+ np.sqrt(1 - np.exp(-2 * (delta_distance / self.Decorrelation_distance))) * np.random.normal(0, 8, nVeh)
上面的两个方法均是文献[1]-Table A.1.4-2的内容和其后的说明,如下:
Environ

初始化需要传入4个list(为上下左右路口的位置数据):down_lane, up_lane, left_lane, right_lane;地图的宽和高;车辆数和邻居数。除以上所提外,内部含有好多参数,如下:
class Environ:
def __init__(self, down_lane, up_lane, left_lane, right_lane, width, height, n_veh, n_neighbor):
self.V2Vchannels = V2Vchannels()
self.V2Ichannels = V2Ichannels()
self.vehicles = []
self.demand = []
self.V2V_Shadowing = []
self.V2I_Shadowing = []
self.delta_distance = []
self.V2V_channels_abs = []
self.V2I_channels_abs = []
self.V2I_power_dB = 23 # dBm
self.V2V_power_dB_List = [23, 15, 5, -100] # the power levels
self.V2I_power = 10 ** (self.V2I_power_dB)
self.sig2_dB = -114
self.bsAntGain = 8
self.bsNoiseFigure = 5
self.vehAntGain = 3
self.vehNoiseFigure = 9
self.sig2 = 10 ** (self.sig2_dB / 10)
self.n_RB = n_veh
self.n_Veh = n_veh
self.n_neighbor = n_neighbor
self.time_fast = 0.001
self.time_slow = 0.1 # update slow fading/vehicle position every 100 ms
self.bandwidth = int(1e6) # bandwidth per RB, 1 MHz
# self.bandwidth = 1500
self.demand_size = int((4 * 190 + 300) * 8 * 2) # V2V payload: 1060 Bytes every 100 ms
# self.demand_size = 20
self.V2V_Interference_all = np.zeros((self.n_Veh, self.n_neighbor, self.n_RB)) + self.sig2
添加车:有两个方法:add_new_vehivles(需要传输起始坐标、方向、速度),add_new_vehicles_by_number(n)。后者比较有意思,只需要一个参数,n,但是并不是添加n辆车,而是4n辆车,上下左右方向各一台,位置是随机的。
更新车辆位置:renew_position(无)&#