[Code Walkthrough] Spectrum Sharing in Vehicular Networks Based on Multi-Agent RL: a Python Implementation


Original paper: Spectrum Sharing in Vehicular Networks Based on Multi-Agent Reinforcement Learning

Paper translation & commentary: [Paper Notes] Spectrum Sharing in Vehicular Networks Based on Multi-Agent Reinforcement Learning

Code: https://github.com/le-liang/MARLspectrumSharingV2X

Visio flowcharts used in this post (drawn by the author; corrections are welcome): https://download.csdn.net/download/m0_37495408/12353933


Usage:

(from the original authors' GitHub README)

  • To train the multi-agent RL model: main_marl_train.py + Environment_marl.py + replay_memory.py
  • To train the benchmark single-agent RL model: main_sarl_train.py + Environment_marl.py + replay_memory.py
  • To test all models in the same environment: main_test.py + Environment_marl_test.py + replay_memory.py + '/model'
    • Fig. 3 and Fig. 4 in the paper can be reproduced directly by running main_test.py. Change the V2V payload size via self.demand_size in Environment_marl_test.py.
    • Fig. 5 can only be obtained from the returns recorded during training.
    • Figs. 6-7 show the performance of an arbitrary episode (one in which the random baseline fails while the MARL transmission succeeds). In fact, most such episodes exhibit some interesting behavior that suggests multi-agent cooperation; interpretation is left to the reader.
    • The "test" mode in main_marl_train.py is not recommended.

Basic class definitions

Environment_marl.py defines the four basic classes of the framework: V2Vchannels, V2Ichannels, Vehicle, and Environ. Environ has by far the most methods (i.e. functions), Vehicle has no methods and only a few attributes, and the other two classes each have two methods (one computing the path loss, one the shadow fading).

Vehicle

The constructor takes three parameters: start position, start direction, and velocity. Internally it defines two lists, neighbors and destinations, which store the neighbors and the V2V receivers, respectively (the two are numerically identical here, because the V2V receivers are defined to be the neighbors).

class Vehicle:
    # Vehicle simulator: include all the information for a vehicle

    def __init__(self, start_position, start_direction, velocity):
        self.position = start_position
        self.direction = start_direction
        self.velocity = velocity
        self.neighbors = []
        self.destinations = []

The meaning of destinations can be seen from the code below.

    def renew_neighbor(self):  # note: this method belongs to class Environ
        """ Determine the neighbors of each vehicles """

        for i in range(len(self.vehicles)):
            self.vehicles[i].neighbors = []
            self.vehicles[i].actions = []
        z = np.array([[complex(c.position[0], c.position[1]) for c in self.vehicles]])
        Distance = abs(z.T - z)

        for i in range(len(self.vehicles)):
            sort_idx = np.argsort(Distance[:, i])
            for j in range(self.n_neighbor):
                self.vehicles[i].neighbors.append(sort_idx[j + 1])
            destination = self.vehicles[i].neighbors

            self.vehicles[i].destinations = destination

V2Vchannels

Internal parameters: the BS and MS antenna heights are both set to 1.5 m and the shadowing standard deviation to 3 dB, both taken from TR 36.885 Table A.1.4-1; the carrier frequency is 2, in GHz.

class V2Vchannels:
    # Simulator of the V2V Channels

    def __init__(self):
        self.t = 0
        self.h_bs = 1.5
        self.h_ms = 1.5
        self.fc = 2
        self.decorrelation_distance = 10
        self.shadow_std = 3

It contains two methods:

Computing the path loss

    def get_path_loss(self, position_A, position_B):
        d1 = abs(position_A[0] - position_B[0])
        d2 = abs(position_A[1] - position_B[1])
        d = math.hypot(d1, d2) + 0.001  # sqrt(x*x + y*y)
        # the next line defines the effective breakpoint distance
        d_bp = 4 * (self.h_bs - 1) * (self.h_ms - 1) * self.fc * (10 ** 9) / (3 * 10 ** 8)

        def PL_Los(d):
            if d <= 3:
                return 22.7 * np.log10(3) + 41 + 20 * np.log10(self.fc / 5)
            else:
                if d < d_bp:
                    return 22.7 * np.log10(d) + 41 + 20 * np.log10(self.fc / 5)
                else:
                    return 40.0 * np.log10(d) + 9.45 - 17.3 * np.log10(self.h_bs) - 17.3 * np.log10(self.h_ms) + 2.7 * np.log10(self.fc / 5)

        def PL_NLos(d_a, d_b):
            n_j = max(2.8 - 0.0024 * d_b, 1.84)
            return PL_Los(d_a) + 20 - 12.5 * n_j + 10 * n_j * np.log10(d_b) + 3 * np.log10(self.fc / 5)

        if min(d1, d2) < 7:
            PL = PL_Los(d)
        else:
            PL = min(PL_NLos(d1, d2), PL_NLos(d2, d1))
        return PL  # + self.shadow_std * np.random.normal()

Note: the code above follows the stochastic channel model described in [2], p. 328.

The path loss follows the Manhattan-grid-layout LOS model:

PL_{\mathrm{LOS}}(d)\big|_{\mathrm{dB}} = 10\,n_1\lg\!\big(d/1\,\mathrm{m}\big) + 28.0 + 20\lg\!\big(f_c/1\,\mathrm{GHz}\big) + PL_{1|\mathrm{dB}}, \quad 10\,\mathrm{m} < d < d'_{\mathrm{BP}}

and:

PL_{\mathrm{LOS}}(d)\big|_{\mathrm{dB}} = 10\,n_2\lg\!\big(d/d'_{\mathrm{BP}}\big) + PL_{\mathrm{LOS}}(d'_{\mathrm{BP}})\big|_{\mathrm{dB}}, \quad d'_{\mathrm{BP}} < d < 500\,\mathrm{m}

Here n_1 = 2.2 and n_2 = 4.0 are the path-loss exponents before and after the breakpoint, and d'_{BP} is the effective breakpoint distance, written as d_bp in the code.

Manhattan-grid-layout NLOS model:

PL_{\mathrm{NLOS}} = PL_{\mathrm{LOS}}(d_1)\big|_{\mathrm{dB}} + 17.9 - 12.5\,n_j + 10\,n_j\lg\!\big(d_2/1\,\mathrm{m}\big) + 3\lg\!\big(f_c/1\,\mathrm{GHz}\big) + PL_{2|\mathrm{dB}}

The min() that appears in the second half of the code is described on p. 344 of [2]: it estimates the path loss under the assumption that the receiver may be located on the perpendicular street.
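As a quick numeric check using the constructor values above (h_bs = h_ms = 1.5 m, f_c = 2 GHz), the effective breakpoint distance computed in get_path_loss is

d_{\mathrm{bp}} = \frac{4\,(h_{\mathrm{bs}}-1)(h_{\mathrm{ms}}-1)\,f_c\times 10^{9}}{3\times 10^{8}} = \frac{4\cdot 0.5\cdot 0.5\cdot 2\times 10^{9}}{3\times 10^{8}} \approx 6.7\ \mathrm{m},

so for essentially all vehicle separations beyond a few meters the second (post-breakpoint) LOS branch applies.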

The formulas in the code come from IST-4-027756 WINNER II D1.1.2 V1.2 (WINNER II).

That document contains a parameter table whose values match the parameters used in the code exactly.

Updating the shadow fading

    def get_shadowing(self, delta_distance, shadowing):
        return np.exp(-1 * (delta_distance / self.decorrelation_distance)) * shadowing \
               + math.sqrt(1 - np.exp(-2 * (delta_distance / self.decorrelation_distance))) * np.random.normal(0, 3)  # standard dev is 3 db

This update rule comes from the text following the channel-model table in [1], Sec. A.1.4.
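As implemented in get_shadowing above (decorrelation distance d_corr = 10 m and standard deviation σ = 3 dB for V2V), the update is

S_{\mathrm{new}} = e^{-\Delta d/d_{\mathrm{corr}}}\,S_{\mathrm{old}} + \sqrt{1 - e^{-2\Delta d/d_{\mathrm{corr}}}}\;\mathcal{N}(0,\sigma^2),

where Δd is the combined distance the two endpoints have moved since the last slow-fading update (delta_distance[i] + delta_distance[j] in renew_channel).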

V2Ichannels

It contains the same two methods as V2Vchannels, but the path-loss computation no longer distinguishes LOS from NLOS.

    def get_path_loss(self, position_A):
        d1 = abs(position_A[0] - self.BS_position[0])
        d2 = abs(position_A[1] - self.BS_position[1])
        distance = math.hypot(d1, d2)
        return 128.1 + 37.6 * np.log10(math.sqrt(distance ** 2 + (self.h_bs - self.h_ms) ** 2) / 1000) # + self.shadow_std * np.random.normal()

    def get_shadowing(self, delta_distance, shadowing):
        nVeh = len(shadowing)
        self.R = np.sqrt(0.5 * np.ones([nVeh, nVeh]) + 0.5 * np.identity(nVeh))
        return np.multiply(np.exp(-1 * (delta_distance / self.Decorrelation_distance)), shadowing) \
               + np.sqrt(1 - np.exp(-2 * (delta_distance / self.Decorrelation_distance))) * np.random.normal(0, 8, nVeh)

Both methods above implement Table A.1.4-2 in [1] and the explanatory text that follows it.
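For reference, what the code above implements is

PL_{\mathrm{V2I}}\,[\mathrm{dB}] = 128.1 + 37.6\lg\!\big(d\,[\mathrm{km}]\big),

with d the 3-D distance between the vehicle and the BS (accounting for the antenna-height difference), plus log-normal shadowing with an 8 dB standard deviation that is updated with the same exponential-correlation rule as the V2V shadowing.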

Environ

Environ data-flow diagram

The constructor takes four lists (the positions of the down/up/left/right lanes): down_lane, up_lane, left_lane, right_lane; the map width and height; and the numbers of vehicles and neighbors. Besides these, it holds quite a few internal parameters, as follows:

class Environ:
    def __init__(self, down_lane, up_lane, left_lane, right_lane, width, height, n_veh, n_neighbor):
        self.V2Vchannels = V2Vchannels()
        self.V2Ichannels = V2Ichannels()
        self.vehicles = []

        self.demand = []
        self.V2V_Shadowing = []
        self.V2I_Shadowing = []
        self.delta_distance = []
        self.V2V_channels_abs = []
        self.V2I_channels_abs = []

        self.V2I_power_dB = 23  # dBm
        self.V2V_power_dB_List = [23, 15, 5, -100]  # the power levels
        self.V2I_power = 10 ** (self.V2I_power_dB)
        self.sig2_dB = -114
        self.bsAntGain = 8
        self.bsNoiseFigure = 5
        self.vehAntGain = 3
        self.vehNoiseFigure = 9
        self.sig2 = 10 ** (self.sig2_dB / 10)

        self.n_RB = n_veh
        self.n_Veh = n_veh
        self.n_neighbor = n_neighbor
        self.time_fast = 0.001
        self.time_slow = 0.1  # update slow fading/vehicle position every 100 ms
        self.bandwidth = int(1e6)  # bandwidth per RB, 1 MHz
        # self.bandwidth = 1500
        self.demand_size = int((4 * 190 + 300) * 8 * 2)  # V2V payload: 1060 Bytes every 100 ms
        # self.demand_size = 20

        self.V2V_Interference_all = np.zeros((self.n_Veh, self.n_neighbor, self.n_RB)) + self.sig2

Adding vehicles: there are two methods, add_new_vehicles (takes start position, direction, and velocity) and add_new_vehicles_by_number(n). The latter is interesting: it takes a single parameter n, yet it does not add n vehicles but 4n vehicles, one per driving direction (up/down/left/right) in each pass, at random positions.
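Neither method is reproduced in this post. As a rough sketch only (it assumes the constructor stores the lane lists and map size as self.down_lanes, self.width, etc.; the direction strings and the velocity range are illustrative assumptions, not values taken from the repo), add_new_vehicles_by_number could look like this:

    def add_new_vehicles_by_number(self, n):
        # Sketch: each pass adds 4 vehicles, one per driving direction,
        # on a randomly chosen lane at a random coordinate along that lane.
        for _ in range(n):
            ind = np.random.randint(0, len(self.down_lanes))
            self.add_new_vehicles([self.down_lanes[ind], np.random.randint(0, self.height)], 'd', np.random.randint(10, 15))
            self.add_new_vehicles([self.up_lanes[ind], np.random.randint(0, self.height)], 'u', np.random.randint(10, 15))
            self.add_new_vehicles([np.random.randint(0, self.width), self.left_lanes[ind]], 'l', np.random.randint(10, 15))
            self.add_new_vehicles([np.random.randint(0, self.width), self.right_lanes[ind]], 'r', np.random.randint(10, 15))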

Updating vehicle positions: renew_position() (no arguments) iterates over all vehicles and moves each one according to its direction and velocity; at an intersection a vehicle turns (clockwise) with a certain probability, and at the map boundary it is turned clockwise so that it stays inside the map.

Updating neighbors: renew_neighbor(self), already described in the Vehicle section.

Updating the channel: renew_channel(self) defines a very important quantity, channels_abs, which is the sum of path loss and shadow fading. (It contains the information of all vehicles.)

    def renew_channel(self):
        """ Renew slow fading channel """

        self.V2V_pathloss = np.zeros((len(self.vehicles), len(self.vehicles))) + 50 * np.identity(len(self.vehicles))
        self.V2I_pathloss = np.zeros((len(self.vehicles)))
        self.V2V_channels_abs = np.zeros((len(self.vehicles), len(self.vehicles)))
        self.V2I_channels_abs = np.zeros((len(self.vehicles)))
        for i in range(len(self.vehicles)):
            for j in range(i + 1, len(self.vehicles)):
                self.V2V_Shadowing[j][i] = self.V2V_Shadowing[i][j] = self.V2Vchannels.get_shadowing(self.delta_distance[i] + self.delta_distance[j], self.V2V_Shadowing[i][j])
                self.V2V_pathloss[j,i] = self.V2V_pathloss[i][j] = self.V2Vchannels.get_path_loss(self.vehicles[i].position, self.vehicles[j].position)
        self.V2V_channels_abs = self.V2V_pathloss + self.V2V_Shadowing

        self.V2I_Shadowing = self.V2Ichannels.get_shadowing(self.delta_distance, self.V2I_Shadowing)
        for i in range(len(self.vehicles)):
            self.V2I_pathloss[i] = self.V2Ichannels.get_path_loss(self.vehicles[i].position)
        self.V2I_channels_abs = self.V2I_pathloss + self.V2I_Shadowing

Updating the fast-fading channel: renew_channels_fastfading(self) subtracts a random term from channels_abs; before the subtraction, channels_abs is expanded by one dimension whose size equals the number of RBs.

    def renew_channels_fastfading(self):
        """ Renew fast fading channel """

        # 1 2, 3 4 --> 1 1 2 2 3 3 4 4: repeat each element along a new RB axis
        V2V_channels_with_fastfading = np.repeat(self.V2V_channels_abs[:, :, np.newaxis], self.n_RB, axis=2)
        # A - 20 log
        self.V2V_channels_with_fastfading = V2V_channels_with_fastfading - 20 * np.log10(
            np.abs(np.random.normal(0, 1, V2V_channels_with_fastfading.shape) + 1j * np.random.normal(0, 1, V2V_channels_with_fastfading.shape)) / math.sqrt(2))

        # 1 2, 3 4 --> 1 1 2 2, 3 3 4 4
        V2I_channels_with_fastfading = np.repeat(self.V2I_channels_abs[:, np.newaxis], self.n_RB, axis=1)
        self.V2I_channels_with_fastfading = V2I_channels_with_fastfading - 20 * np.log10(
            np.abs(np.random.normal(0, 1, V2I_channels_with_fastfading.shape) + 1j * np.random.normal(0, 1, V2I_channels_with_fastfading.shape))/ math.sqrt(2))
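In equation form, the update above draws an independent unit-power Rayleigh fading gain for every (transmitter, receiver, RB) triple and subtracts its dB value from the slow-fading loss:

PL_{\mathrm{fast}}\,[\mathrm{dB}] = PL_{\mathrm{abs}}\,[\mathrm{dB}] - 20\lg\left|\frac{x + jy}{\sqrt{2}}\right|, \qquad x,\,y \sim \mathcal{N}(0, 1).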

Computing the reward: Compute_Performance_Reward_Train(self, actions_power). Its input is very important: it is the RL action, defined in main_marl_train.py as a 3-D array. Described in (layer, row, column) terms: one layer per vehicle, one row per neighbor, and two columns holding the RB selection (the RB index) and the power selection (also an index, into V2V_power_dB_List), as shown below:

            for i in range(n_veh):
                for j in range(n_neighbor):
                    state_old = get_state(env, [i, j], 1, epsi_final)
                    action = predict(sesses[i*n_neighbor+j], state_old, epsi_final, True)
                    action_all_testing[i, j, 0] = action % n_RB  # chosen RB
                    action_all_testing[i, j, 1] = int(np.floor(action / n_RB))  # power level

The computation proceeds as follows:

  1. Extract the RB selection and the power selection from the action.
  2. Compute the V2I channel capacity V2I_Rate (the returned vector has length n_RB, but it really corresponds to the V2I links, since the number of V2I links equals the number of RBs).
  3. Compute the V2V channel capacity V2V_Rate (one entry per V2V link; the rates of all V2V links are returned).
    1. For each RB, find from actions the vehicles sharing that RB.
    2. Compute the capacity in two steps: interference from V2I links onto V2V links, then interference among V2V links.
  4. Update the remaining demand and the remaining time budget (time_limit).
  5. Build the reward elements (reward_elements = V2V_Rate/10, with entries whose demand has reached 0 set to 1).
  6. Set active_links to 0 wherever the remaining demand is 0 (this is one of only two places where active_links is modified; the other is its initialization to all ones).
    1. Places where active_links is set to all ones:
      1. in Environment_marl.py, inside new_random_game (called once at the very beginning of *train.py)
      2. at the start of each episode in *train.py, where active_links is set to all ones directly

The code:

    def Compute_Performance_Reward_Train(self, actions_power):

        actions = actions_power[:, :, 0]  # the channel_selection_part
        power_selection = actions_power[:, :, 1]  # power selection

        # ------------ Compute V2I rate --------------------
        V2I_Rate = np.zeros(self.n_RB)
        V2I_Interference = np.zeros(self.n_RB)  # V2I interference
        for i in range(len(self.vehicles)):
            for j in range(self.n_neighbor):
                if not self.active_links[i, j]:
                    continue
                V2I_Interference[actions[i][j]] += 10 ** ((self.V2V_power_dB_List[power_selection[i, j]] - self.V2I_channels_with_fastfading[i, actions[i, j]]
                                                           + self.vehAntGain + self.bsAntGain - self.bsNoiseFigure) / 10)
        self.V2I_Interference = V2I_Interference + self.sig2
        V2I_Signals = 10 ** ((self.V2I_power_dB - self.V2I_channels_with_fastfading.diagonal() + self.vehAntGain + self.bsAntGain - self.bsNoiseFigure) / 10)
        V2I_Rate = np.log2(1 + np.divide(V2I_Signals, self.V2I_Interference))  # V2I channel capacity (spectral efficiency)

        # ------------ Compute V2V rate -------------------------
        V2V_Interference = np.zeros((len(self.vehicles), self.n_neighbor))
        V2V_Signal = np.zeros((len(self.vehicles), self.n_neighbor))
        actions[(np.logical_not(self.active_links))] = -1 # inactive links will not transmit regardless of selected power levels
        for i in range(self.n_RB):  # scanning all bands
            indexes = np.argwhere(actions == i)  # find spectrum-sharing V2Vs
            for j in range(len(indexes)):
                receiver_j = self.vehicles[indexes[j, 0]].destinations[indexes[j, 1]]
                V2V_Signal[indexes[j, 0], indexes[j, 1]] = 10 ** ((self.V2V_power_dB_List[power_selection[indexes[j, 0], indexes[j, 1]]]
                                                                   - self.V2V_channels_with_fastfading[indexes[j][0], receiver_j, i] + 2 * self.vehAntGain - self.vehNoiseFigure) / 10)
                # V2I links interference to V2V links
                V2V_Interference[indexes[j, 0], indexes[j, 1]] = 10 ** ((self.V2I_power_dB - self.V2V_channels_with_fastfading[i, receiver_j, i] + 2 * self.vehAntGain - self.vehNoiseFigure) / 10)

                #  V2V interference
                for k in range(j + 1, len(indexes)):  # spectrum-sharing V2Vs
                    receiver_k = self.vehicles[indexes[k][0]].destinations[indexes[k][1]]
                    V2V_Interference[indexes[j, 0], indexes[j, 1]] += 10 ** ((self.V2V_power_dB_List[power_selection[indexes[k, 0], indexes[k, 1]]]
                                                                              - self.V2V_channels_with_fastfading[indexes[k][0]][receiver_j][i] + 2 * self.vehAntGain - self.vehNoiseFigure) / 10)
                    V2V_Interference[indexes[k, 0], indexes[k, 1]] += 10 ** ((self.V2V_power_dB_List[power_selection[indexes[j, 0], indexes[j, 1]]]
                                                                              - self.V2V_channels_with_fastfading[indexes[j][0]][receiver_k][i] + 2 * self.vehAntGain - self.vehNoiseFigure) / 10)
        self.V2V_Interference = V2V_Interference + self.sig2
        V2V_Rate = np.log2(1 + np.divide(V2V_Signal, self.V2V_Interference))

        self.demand -= V2V_Rate * self.time_fast * self.bandwidth
        self.demand[self.demand < 0] = 0 # eliminate negative demands

        self.individual_time_limit -= self.time_fast

        reward_elements = V2V_Rate/10
        reward_elements[self.demand <= 0] = 1

        self.active_links[np.multiply(self.active_links, self.demand <= 0)] = 0 # transmission finished, turned to "inactive"

        return V2I_Rate, V2V_Rate, reward_elements
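In equation form, both rate computations above evaluate a standard SINR and spectral efficiency per link (antenna gains and noise figures are folded into the dB sums, and every dB quantity is converted to linear scale via 10^{(\cdot)/10}):

\gamma = \frac{P_{\mathrm{tx}}\,g_{\mathrm{signal}}}{\sigma^2 + \sum_k P_k\,g_k}, \qquad C = \log_2(1+\gamma)\ \ [\mathrm{bit/s/Hz}],

and the resulting V2V rate is then multiplied by the RB bandwidth and the 1 ms step length to decrement the remaining demand.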

Note: three values are returned here, and the last one is not the final reward; the final reward is obtained by combining the returned values with weights.

Acting for training: act_for_training(self, actions) takes the actions and computes the final reward via Compute_Performance_Reward_Train. The code:

    def act_for_training(self, actions):

        action_temp = actions.copy()
        V2I_Rate, V2V_Rate, reward_elements = self.Compute_Performance_Reward_Train(action_temp)

        lambdda = 0.
        reward = lambdda * np.sum(V2I_Rate) / (self.n_Veh * 10) + (1 - lambdda) * np.sum(reward_elements) / (self.n_Veh * self.n_neighbor)

        return reward
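Written out, the scalar training reward shared by all agents is

r = \lambda\,\frac{\sum_m C^{\mathrm{V2I}}_m}{10\,N_{\mathrm{veh}}} + (1-\lambda)\,\frac{\sum_k r^{\mathrm{V2V}}_k}{N_{\mathrm{veh}}\,N_{\mathrm{neighbor}}},

where r^{V2V}_k equals V2V_Rate/10 for links that still have payload left and 1 for links that have finished. Note that lambdda is hard-coded to 0 in this version, so only the V2V term actually contributes to the reward.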

Acting for testing: act_for_testing(self, actions) is much the same and also calls Compute_Performance_Reward_Train, but it returns V2I_Rate, V2V_success, and V2V_Rate.

    def act_for_testing(self, actions):

        action_temp = actions.copy()
        V2I_Rate, V2V_Rate, reward_elements = self.Compute_Performance_Reward_Train(action_temp)
        V2V_success = 1 - np.sum(self.active_links) / (self.n_Veh * self.n_neighbor)  # V2V success rates

        return V2I_Rate, V2V_success, V2V_Rate

The three quantities above are the final results produced by a single step within an episode, as can be seen in the testing part of main_marl_train.py; part of that code is shown below:

        for test_step in range(n_step_per_episode):
            # trained models
            action_all_testing = np.zeros([n_veh, n_neighbor, 2], dtype='int32')
            for i in range(n_veh):
                for j in range(n_neighbor):
                    state_old = get_state(env, [i, j], 1, epsi_final)
                    action = predict(sesses[i*n_neighbor+j], state_old, epsi_final, True)
                    action_all_testing[i, j, 0] = action % n_RB  # chosen RB
                    action_all_testing[i, j, 1] = int(np.floor(action / n_RB))  # power level

            action_temp = action_all_testing.copy()
            V2I_rate, V2V_success, V2V_rate = env.act_for_testing(action_temp)
            V2I_rate_per_episode.append(np.sum(V2I_rate))  # sum V2I rate in bps
            rate_marl[idx_episode, test_step,:,:] = V2V_rate
            demand_marl[idx_episode, test_step+1,:,:] = env.demand

Computing interference: Compute_Interference(self, actions) accumulates V2V_Interference_all with +=. The code:

    def Compute_Interference(self, actions):
        V2V_Interference = np.zeros((len(self.vehicles), self.n_neighbor, self.n_RB)) + self.sig2

        channel_selection = actions.copy()[:, :, 0]  # column 0 of every layer: the chosen RB
        power_selection = actions.copy()[:, :, 1]  # column 1 of every layer: the chosen power level
        channel_selection[np.logical_not(self.active_links)] = -1  # mark inactive links with -1

        # interference from V2I links
        for i in range(self.n_RB):
            for k in range(len(self.vehicles)):
                for m in range(len(channel_selection[k, :])):
                    V2V_Interference[k, m, i] += 10 ** ((self.V2I_power_dB - self.V2V_channels_with_fastfading[i][self.vehicles[k].destinations[m]][i] + 2 * self.vehAntGain - self.vehNoiseFigure) / 10)

        # interference from peer V2V links
        for i in range(len(self.vehicles)):
            for j in range(len(channel_selection[i, :])):
                for k in range(len(self.vehicles)):
                    for m in range(len(channel_selection[k, :])):
                        # if i == k or channel_selection[i,j] >= 0:
                        if i == k and j == m or channel_selection[i, j] < 0:
                            continue
                        V2V_Interference[k, m, channel_selection[i, j]] += 10 ** ((self.V2V_power_dB_List[power_selection[i, j]]
                                                                                   - self.V2V_channels_with_fastfading[i][self.vehicles[k].destinations[m]][channel_selection[i,j]] + 2 * self.vehAntGain - self.vehNoiseFigure) / 10)
        self.V2V_Interference_all = 10 * np.log10(V2V_Interference)

It is used in get_state in main_marl_train.py to build the V2V_interference part of the state, as follows:

def get_state(env, idx=(0,0), ind_episode=1., epsi=0.02):
    """ Get state from the environment """
    # includes V2I/V2V fast fading, V2V interference, V2I/V2V channel info (PL + shadowing),
    # remaining time, remaining payload

    # V2I_channel = (env.V2I_channels_with_fastfading[idx[0], :] - 80) / 60
    V2I_fast = (env.V2I_channels_with_fastfading[idx[0], :] - env.V2I_channels_abs[idx[0]] + 10)/35

    # V2V_channel = (env.V2V_channels_with_fastfading[:, env.vehicles[idx[0]].destinations[idx[1]], :] - 80) / 60
    V2V_fast = (env.V2V_channels_with_fastfading[:, env.vehicles[idx[0]].destinations[idx[1]], :] - env.V2V_channels_abs[:, env.vehicles[idx[0]].destinations[idx[1]]] + 10)/35

    V2V_interference = (-env.V2V_Interference_all[idx[0], idx[1], :] - 60) / 60

    V2I_abs = (env.V2I_channels_abs[idx[0]] - 80) / 60.0
    V2V_abs = (env.V2V_channels_abs[:, env.vehicles[idx[0]].destinations[idx[1]]] - 80)/60.0

    load_remaining = np.asarray([env.demand[idx[0], idx[1]] / env.demand_size])
    time_remaining = np.asarray([env.individual_time_limit[idx[0], idx[1]] / env.time_slow])

    # return np.concatenate((np.reshape(V2V_channel, -1), V2V_interference, V2I_abs, V2V_abs, time_remaining, load_remaining, np.asarray([ind_episode, epsi])))
    return np.concatenate((V2I_fast, np.reshape(V2V_fast, -1), V2V_interference, np.asarray([V2I_abs]), V2V_abs, time_remaining, load_remaining, np.asarray([ind_episode, epsi])))
    # all quantities of interest are here: V2V_fast, V2I_fast, V2V_interference, V2I_abs, V2V_abs

Some readers may be confused at this point: why is V2V_Interference computed again? Didn't we already compute it? Yes, computing V2V_Rate already required V2V_Interference. The difference, as far as I can tell, is that the earlier computation is organized by RB assignment, whereas this one simply iterates over every vehicle and yields the interference each link would see on every RB (which is what the state needs).

ReplayMemory

This part comes from replay_memory.py. There is not much in it; it defines a single class, ReplayMemory. Note that every agent has its own memory, which can be seen in class Agent in main_marl_train.py, as shown below:

class Agent(object):
    def __init__(self, memory_entry_size):
        self.discount = 1
        self.double_q = True
        self.memory_entry_size = memory_entry_size
        self.memory = ReplayMemory(self.memory_entry_size)

Initialization: the constructor takes entry_size, the length of each stored state vector (the memory capacity itself is hard-coded as memory_size = 200000). The initialization code:

class ReplayMemory:
    def __init__(self, entry_size):
        self.entry_size = entry_size
        self.memory_size = 200000
        self.actions = np.empty(self.memory_size, dtype = np.uint8)
        self.rewards = np.empty(self.memory_size, dtype = np.float64)
        self.prestate = np.empty((self.memory_size, self.entry_size), dtype = np.float16)
        self.poststate = np.empty((self.memory_size, self.entry_size), dtype = np.float16)
        self.batch_size = 2000
        self.count = 0
        self.current = 0

Adding a transition: add(self, prestate, poststate, reward, action). As the signature shows, the arguments are (previous state, next state, reward, action). The code:

    def add(self, prestate, poststate, reward, action):
        self.actions[self.current] = action
        self.rewards[self.current] = reward
        self.prestate[self.current] = prestate
        self.poststate[self.current] = poststate
        self.count = max(self.count, self.current + 1)
        self.current = (self.current + 1) % self.memory_size

Each agent needs to record its own transition at every time step. The use of add can be seen in the Training part of main_marl_train.py, shown below; there is another for-loop over episodes above this one, so at every step of every episode a transition is added for every agent (last line).

        for i_step in range(n_step_per_episode):  # 0.1 / 0.001 = 100 steps per episode
            time_step = i_episode*n_step_per_episode + i_step  # global step index
            state_old_all = []
            action_all = []
            action_all_training = np.zeros([n_veh, n_neighbor, 2], dtype='int32')
            for i in range(n_veh):
                for j in range(n_neighbor):
                    state = get_state(env, [i, j], i_episode/(n_episode-1), epsi)
                    state_old_all.append(state)
                    action = predict(sesses[i*n_neighbor+j], state, epsi)
                    action_all.append(action)

                    action_all_training[i, j, 0] = action % n_RB  # chosen RB
                    action_all_training[i, j, 1] = int(np.floor(action / n_RB)) # power level

            # All agents take actions simultaneously, obtain shared reward, and update the environment.
            action_temp = action_all_training.copy()
            train_reward = env.act_for_training(action_temp)
            record_reward[time_step] = train_reward

            env.renew_channels_fastfading()
            env.Compute_Interference(action_temp)

            for i in range(n_veh):
                for j in range(n_neighbor):
                    state_old = state_old_all[n_neighbor * i + j]
                    action = action_all[n_neighbor * i + j]
                    state_new = get_state(env, [i, j], i_episode/(n_episode-1), epsi)
                    agents[i * n_neighbor + j].memory.add(state_old, state_new, train_reward, action)  # add entry to this agent's memory

Sampling: sample(self). After many calls to add, each agent has accumulated many transitions; during training, batch_size of them are drawn at a time. The code:

    def sample(self):

        if self.count < self.batch_size:
            indexes = range(0, self.count)
        else:
            indexes = random.sample(range(0,self.count), self.batch_size)
        prestate = self.prestate[indexes]
        poststate = self.poststate[indexes]
        actions = self.actions[indexes]
        rewards = self.rewards[indexes]
        return prestate, poststate, actions, rewards

Main script: main_marl_train.py

Defining class Agent: Agent(object) takes memory_entry_size as its only argument and holds a few algorithm parameters. Note that its memory is implemented by ReplayMemory, discussed just above.

class Agent(object):
    def __init__(self, memory_entry_size):
        self.discount = 1
        self.double_q = True
        self.memory_entry_size = memory_entry_size
        self.memory = ReplayMemory(self.memory_entry_size)

Parameter initialization: this part is written at module level, not inside a function. It roughly includes: map properties (lane coordinates, overall map size), the numbers of vehicles, neighbors, RBs and episodes, and a few algorithm parameters; the code is shown below.

To understand the map parameters up_lanes / down_lanes / left_lanes / right_lanes, first note that the system model follows the urban case of 3GPP TR 36.885: each street has four lanes (two per direction), each lane is 3.5 m wide, the road-grid size measured between the yellow lines is 433 m x 250 m, and the total simulation area is 1299 m x 750 m. The simulation scales everything down by a factor of 2 (visible in the /2 on the width and height parameters), which shows up in the lane parameters as the i / 2.0 in the list comprehensions.

Take up_lanes as an example. Since each lane is 3.5 m wide, a vehicle treated as a point moves along the center of its 3.5 m lane, so the 3.5 inside the brackets after in is divided by 2; the second entry (3.5 + 3.5/2) is the center of the second lane in the same direction; the third entry (+250) is the first same-direction lane beyond the next building block, and so on.

up_lanes = [i/2.0 for i in [3.5/2, 3.5 + 3.5/2, 250+3.5/2, 250+3.5+3.5/2, 500+3.5/2, 500+3.5+3.5/2]]
down_lanes = [i/2.0 for i in [250-3.5-3.5/2,250-3.5/2,500-3.5-3.5/2,500-3.5/2,750-3.5-3.5/2,750-3.5/2]]
left_lanes = [i/2.0 for i in [3.5/2,3.5/2 + 3.5,433+3.5/2, 433+3.5+3.5/2, 866+3.5/2, 866+3.5+3.5/2]]
right_lanes = [i/2.0 for i in [433-3.5-3.5/2,433-3.5/2,866-3.5-3.5/2,866-3.5/2,1299-3.5-3.5/2,1299-3.5/2]]

width = 750/2
height = 1298/2

IS_TRAIN = 1
IS_TEST = 1-IS_TRAIN

label = 'marl_model'

n_veh = 4
n_neighbor = 1
n_RB = n_veh

env = Environment_marl.Environ(down_lanes, up_lanes, left_lanes, right_lanes, width, height, n_veh, n_neighbor)
env.new_random_game()  # initialize parameters in env

# n_episode = 3000
n_episode = 600
n_step_per_episode = int(env.time_slow/env.time_fast)  # slow = 0.1, fast = 0.001
epsi_final = 0.02
epsi_anneal_length = int(0.8*n_episode)
mini_batch_step = n_step_per_episode
target_update_step = n_step_per_episode*4

n_episode_test = 100  # test episodes
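As a quick sanity check of the scaling described above, the up_lanes list evaluates to the following lane-center x coordinates (in meters):

up_lanes = [i/2.0 for i in [3.5/2, 3.5 + 3.5/2, 250+3.5/2, 250+3.5+3.5/2, 500+3.5/2, 500+3.5+3.5/2]]
print(up_lanes)  # [0.875, 2.625, 125.875, 127.625, 250.875, 252.625]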

Getting the state: get_state(env, idx=(0,0), ind_episode=1., epsi=0.02). The input is env (the environment); the output includes:

  1. V2V_fast: (PL + shadowing) minus a random term (see "Updating the fast-fading channel" under Environ above)
  2. V2I_fast: same as above
  3. V2V_interference (see "Computing interference" under Environ above)
  4. V2I_abs (PL + shadowing)
  5. V2V_abs (PL + shadowing)

Note that V2I_abs in the code is shifted by -80 and divided by 60; here is the code author's explanation from the GitHub discussion board:

"This is to roughly normalize DQN inputs for the ease of training. The numbers are obtained from several trial runs."

def get_state(env, idx=(0,0), ind_episode=1., epsi=0.02):
    """ Get state from the environment """
    # includes V2I/V2V fast fading, V2V interference, V2I/V2V channel info (PL + shadowing),
    # remaining time, remaining payload

    # V2I_channel = (env.V2I_channels_with_fastfading[idx[0], :] - 80) / 60
    V2I_fast = (env.V2I_channels_with_fastfading[idx[0], :] - env.V2I_channels_abs[idx[0]] + 10)/35

    # V2V_channel = (env.V2V_channels_with_fastfading[:, env.vehicles[idx[0]].destinations[idx[1]], :] - 80) / 60
    V2V_fast = (env.V2V_channels_with_fastfading[:, env.vehicles[idx[0]].destinations[idx[1]], :] - env.V2V_channels_abs[:, env.vehicles[idx[0]].destinations[idx[1]]] + 10)/35

    V2V_interference = (-env.V2V_Interference_all[idx[0], idx[1], :] - 60) / 60

    V2I_abs = (env.V2I_channels_abs[idx[0]] - 80) / 60.0
    V2V_abs = (env.V2V_channels_abs[:, env.vehicles[idx[0]].destinations[idx[1]]] - 80)/60.0

    load_remaining = np.asarray([env.demand[idx[0], idx[1]] / env.demand_size])
    time_remaining = np.asarray([env.individual_time_limit[idx[0], idx[1]] / env.time_slow])

    # return np.concatenate((np.reshape(V2V_channel, -1), V2V_interference, V2I_abs, V2V_abs, time_remaining, load_remaining, np.asarray([ind_episode, epsi])))
    return np.concatenate((V2I_fast, np.reshape(V2V_fast, -1), V2V_interference, np.asarray([V2I_abs]), V2V_abs, time_remaining, load_remaining, np.asarray([ind_episode, epsi])))
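For concreteness, counting the pieces concatenated above with n_veh = n_RB = 4 and n_neighbor = 1 (the script presumably obtains this number by simply calling get_state once and taking its length):

# V2I_fast + V2V_fast + V2V_interference + V2I_abs + V2V_abs + time + load + (ind_episode, epsi)
n_input = 4 + 4*4 + 4 + 1 + 4 + 1 + 1 + 2  # = 33 entries per state vector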
 

Defining the NN:

with g.as_default():
    # ============== Training network ========================
    x = tf.placeholder(tf.float32, [None, n_input])  # input placeholder

    w_1 = tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1))
    w_2 = tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2], stddev=0.1))
    w_3 = tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_3], stddev=0.1))
    w_4 = tf.Variable(tf.truncated_normal([n_hidden_3, n_output], stddev=0.1))

    b_1 = tf.Variable(tf.truncated_normal([n_hidden_1], stddev=0.1))
    b_2 = tf.Variable(tf.truncated_normal([n_hidden_2], stddev=0.1))
    b_3 = tf.Variable(tf.truncated_normal([n_hidden_3], stddev=0.1))
    b_4 = tf.Variable(tf.truncated_normal([n_output], stddev=0.1))

    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, w_1), b_1))
    layer_1_b = tf.layers.batch_normalization(layer_1)
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1_b, w_2), b_2))
    layer_2_b = tf.layers.batch_normalization(layer_2)
    layer_3 = tf.nn.relu(tf.add(tf.matmul(layer_2_b, w_3), b_3))
    layer_3_b = tf.layers.batch_normalization(layer_3)
    y = tf.nn.relu(tf.add(tf.matmul(layer_3_b, w_4), b_4))

    g_q_action = tf.argmax(y, axis=1)

    # compute loss
    g_target_q_t = tf.placeholder(tf.float32, None, name="target_value")

    g_action = tf.placeholder(tf.int32, None, name='g_action')
    action_one_hot = tf.one_hot(g_action, n_output, 1.0, 0.0, name='action_one_hot')
    q_acted = tf.reduce_sum(y * action_one_hot, reduction_indices=1, name='q_acted')

    g_loss = tf.reduce_mean(tf.square(g_target_q_t - q_acted), name='g_loss')  # squared TD error
    optim = tf.train.RMSPropOptimizer(learning_rate=0.001, momentum=0.95, epsilon=0.01).minimize(g_loss)  # RMSProp gradient step

    # ==================== Prediction network ========================
    x_p = tf.placeholder(tf.float32, [None, n_input])  # input placeholder

    w_1_p = tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1))
    w_2_p = tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2], stddev=0.1))
    w_3_p = tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_3], stddev=0.1))
    w_4_p = tf.Variable(tf.truncated_normal([n_hidden_3, n_output], stddev=0.1))

    b_1_p = tf.Variable(tf.truncated_normal([n_hidden_1], stddev=0.1))
    b_2_p = tf.Variable(tf.truncated_normal([n_hidden_2], stddev=0.1))
    b_3_p = tf.Variable(tf.truncated_normal([n_hidden_3], stddev=0.1))
    b_4_p = tf.Variable(tf.truncated_normal([n_output], stddev=0.1))

    layer_1_p = tf.nn.relu(tf.add(tf.matmul(x_p, w_1_p), b_1_p))
    layer_1_p_b = tf.layers.batch_normalization(layer_1_p)
    layer_2_p = tf.nn.relu(tf.add(tf.matmul(layer_1_p_b, w_2_p), b_2_p))
    layer_2_p_b = tf.layers.batch_normalization(layer_2_p)
    layer_3_p = tf.nn.relu(tf.add(tf.matmul(layer_2_p_b, w_3_p), b_3_p))
    layer_3_p_b = tf.layers.batch_normalization(layer_3_p)
    y_p = tf.nn.relu(tf.add(tf.matmul(layer_3_p_b, w_4_p), b_4_p))

    g_target_q_idx = tf.placeholder('int32', [None, None], 'output_idx')  # input: an (n, 2) list of [batch index, action] pairs
    target_q_with_idx = tf.gather_nd(y_p, g_target_q_idx)  # gather the selected Q-values from y_p

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

Only the overall structure is described here; for the details, see the "Sampling and computing the loss" part below, which explains the network structure together with the algorithm.

The graph splits into three parts: the training network, the loss computation, and the prediction (target) network, denoted N1, N2, and N3. N1 and N3 have exactly the same structure; they are the DQN networks of the algorithm and output Q-values. The difference is that N1 is updated at every iteration, while N3 is updated only every so often. N2 takes N1's output Q-values and is responsible for computing the loss and applying the gradient update to N1.

Prediction: predict(sess, s_t, ep, test_ep=False) runs the NN to generate an action. The code:

def predict(sess, s_t, ep, test_ep = False):

    n_power_levels = len(env.V2V_power_dB_List)
    if np.random.rand() < ep and not test_ep:
        pred_action = np.random.randint(n_RB*n_power_levels)
    else:
        pred_action = sess.run(g_q_action, feed_dict={x: [s_t]})[0]
    return pred_action

The action here is a single int, but it encodes both the RB and the power level. It appears in both the Training and Testing parts of this script, used as follows:

                    action = predict(sesses[i*n_neighbor+j], state, epsi)
                    action_all.append(action)

                    action_all_training[i, j, 0] = action % n_RB  # chosen RB
                    action_all_training[i, j, 1] = int(np.floor(action / n_RB)) # power level

Sampling and computing the loss: q_learning_mini_batch(current_agent, current_sess) takes a single agent and uses the sample method of its ReplayMemory, mentioned above. Double Q-learning is also handled here.

def q_learning_mini_batch(current_agent, current_sess):
    """ Training a sampled mini-batch """

    batch_s_t, batch_s_t_plus_1, batch_action, batch_reward = current_agent.memory.sample()

    if current_agent.double_q:  # double q-learning
        pred_action = current_sess.run(g_q_action, feed_dict={x: batch_s_t_plus_1})
        q_t_plus_1 = current_sess.run(target_q_with_idx, {x_p: batch_s_t_plus_1, g_target_q_idx: [[idx, pred_a] for idx, pred_a in enumerate(pred_action)]})
        batch_target_q_t = current_agent.discount * q_t_plus_1 + batch_reward
    else:
        q_t_plus_1 = current_sess.run(y_p, {x_p: batch_s_t_plus_1})
        max_q_t_plus_1 = np.max(q_t_plus_1, axis=1)
        batch_target_q_t = current_agent.discount * max_q_t_plus_1 + batch_reward

    _, loss_val = current_sess.run([optim, g_loss], {g_target_q_t: batch_target_q_t, g_action: batch_action, x: batch_s_t})
    return loss_val

Update (Apr. 23): this function should be read together with the NN structure; personally I find it a bit involved. As the if suggests, it covers the two variants, plain DQN and double Q-learning. Note that both branches only compute the target-network part; feeding the (online) network at the top-left of the algorithm diagram and its iterative update are done by the last line:

    _, loss_val = current_sess.run([optim, g_loss], {g_target_q_t: batch_target_q_t, g_action: batch_action, x: batch_s_t})

This code is easier to understand alongside the figures in this post, so the algorithm diagram and the code flowcharts are included here (the code flowcharts were drawn by me in Visio and do not follow a standard notation; apologies for any mistakes).

Plain DQN

Iterative policy-update flow of plain DQN (figure from Prof. Hung-yi Lee's slides)

Data-flow diagram of plain DQN in the code (red text marks the actual arguments)

Double DQN

Explanation of the Double DQN algorithm (from another blog post)
Data-flow diagram of Double DQN in the code

The difference from plain DQN lies in the target network: plain DQN builds the target directly from the prediction network (the "predict / updated every so often" box in the figure) plus a max, whereas Double DQN builds the target by cascading the training network and the prediction network (the training network chooses the action, the prediction network evaluates it).
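In formulas, with online parameters θ (the training network) and target parameters θ⁻ (the prediction network), the two targets computed in q_learning_mini_batch are

y^{\mathrm{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^-), \qquad y^{\mathrm{Double\,DQN}} = r + \gamma\, Q\big(s', \arg\max_{a'} Q(s', a'; \theta);\, \theta^-\big),

with discount γ = 1 here (self.discount = 1 in class Agent).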

Training loop

for each episode (one full episode iteration):

  • Determine epsi from the episode index (it decreases linearly, then stays constant)
  • Every 100 episodes, update the positions, neighbors, fast fading, and channels.
  • Initialize demand, time_limit, and active_links (all ones)
  • for i_step in the episode (for each step of the episode):
    • Initialize state_old_all, action_all, action_all_training

    • for loop over every link:
      • Get the state of this link [per single link]
      • Get the action via predict (contains the RB and power info) [per single link]

      • Store it into action_all_training = [vehicle, neighbor, RB/power] [collecting the per-link results]

    • Get the reward via act_for_training [this is over all links]; for the SARL version, the reward computation moves inside the for loop above, everything else is the same

    • Append the reward to record_reward

    • Update the fast fading

    • Compute the interference from the actions

    • for loop over every link:

      • Compute the new state

      • Add (state_old, state_new, train_reward, action) to this agent's memory [so each memory entry belongs to a single link]

      • After every mini_batch_step new transitions: compute the loss via q_learning_mini_batch

      • Every target_update_step steps: update the target Q network

record_reward = np.zeros([n_episode*n_step_per_episode, 1])
record_loss = []
if IS_TRAIN:
    for i_episode in range(n_episode):
        print("-------------------------")
        print('Episode:', i_episode)
        if i_episode < epsi_anneal_length:
            epsi = 1 - i_episode * (1 - epsi_final) / (epsi_anneal_length - 1)  # epsilon decreases over each episode
        else:
            epsi = epsi_final

        # every 100 episodes: update positions, neighbors, slow fading, and fast fading
        if i_episode%100 == 0:
            env.renew_positions() # update vehicle position
            env.renew_neighbor()
            env.renew_channel() # update channel slow fading
            env.renew_channels_fastfading() # update channel fast fading

        env.demand = env.demand_size * np.ones((env.n_Veh, env.n_neighbor))
        env.individual_time_limit = env.time_slow * np.ones((env.n_Veh, env.n_neighbor))
        env.active_links = np.ones((env.n_Veh, env.n_neighbor), dtype='bool')

        for i_step in range(n_step_per_episode):  # 0.1 / 0.001 = 100 steps per episode
            time_step = i_episode*n_step_per_episode + i_step  # global step index
            state_old_all = []
            action_all = []
            action_all_training = np.zeros([n_veh, n_neighbor, 2], dtype='int32')
            for i in range(n_veh):
                for j in range(n_neighbor):
                    state = get_state(env, [i, j], i_episode/(n_episode-1), epsi)
                    state_old_all.append(state)
                    action = predict(sesses[i*n_neighbor+j], state, epsi)
                    action_all.append(action)

                    action_all_training[i, j, 0] = action % n_RB  # chosen RB
                    action_all_training[i, j, 1] = int(np.floor(action / n_RB)) # power level

            # All agents take actions simultaneously, obtain shared reward, and update the environment.
            action_temp = action_all_training.copy()
            train_reward = env.act_for_training(action_temp)
            record_reward[time_step] = train_reward

            env.renew_channels_fastfading()
            env.Compute_Interference(action_temp)

            for i in range(n_veh):
                for j in range(n_neighbor):
                    state_old = state_old_all[n_neighbor * i + j]
                    action = action_all[n_neighbor * i + j]
                    state_new = get_state(env, [i, j], i_episode/(n_episode-1), epsi)
                    agents[i * n_neighbor + j].memory.add(state_old, state_new, train_reward, action)  # add entry to this agent's memory

                    # training this agent
                    if time_step % mini_batch_step == mini_batch_step-1:
                        loss_val_batch = q_learning_mini_batch(agents[i*n_neighbor+j], sesses[i*n_neighbor+j])
                        record_loss.append(loss_val_batch)
                        if i == 0 and j == 0:
                            print('step:', time_step, 'agent',i*n_neighbor+j, 'loss', loss_val_batch)
                    if time_step % target_update_step == target_update_step-1:
                        update_target_q_network(sesses[i*n_neighbor+j])
                        if i == 0 and j == 0:
                            print('Update target Q network...')

    print('Training Done. Saving models...')
    for i in range(n_veh):
        for j in range(n_neighbor):
            model_path = label + '/agent_' + str(i * n_neighbor + j)
            save_models(sesses[i * n_neighbor + j], model_path)

    current_dir = os.path.dirname(os.path.realpath(__file__))
    reward_path = os.path.join(current_dir, "model/" + label + '/reward.mat')
    scipy.io.savemat(reward_path, {'reward': record_reward})

    record_loss = np.asarray(record_loss).reshape((-1, n_veh*n_neighbor))
    loss_path = os.path.join(current_dir, "model/" + label + '/train_loss.mat')
    scipy.io.savemat(loss_path, {'train_loss': record_loss})

Testing loop

First the models obtained from training are loaded.

for each episode (one full episode iteration):

  • Update the positions, neighbors, fast fading, and channels.
  • Initialize demand, time_limit, and active_links (all ones)
  • for i_step in the episode (for each step of the episode):
    • Initialize state_old_all, action_all, action_all_testing

    • Get the action via predict (contains the RB and power info)

    • From the action, build action_all_testing = [vehicle, neighbor, RB/power]

    • Get V2I_rate, V2V_success, V2V_rate via act_for_testing

    • Sum V2I_rate and append it to V2I_rate_per_episode

    • Append V2V_rate to rate_marl

    • Update demand

if IS_TEST:
    print("\nRestoring the model...")

    for i in range(n_veh):
        for j in range(n_neighbor):
            model_path = label + '/agent_' + str(i * n_neighbor + j)
            load_models(sesses[i * n_neighbor + j], model_path)

    V2I_rate_list = []
    V2V_success_list = []
    V2I_rate_list_rand = []
    V2V_success_list_rand = []
    rate_marl = np.zeros([n_episode_test, n_step_per_episode, n_veh, n_neighbor])
    rate_rand = np.zeros([n_episode_test, n_step_per_episode, n_veh, n_neighbor])
    demand_marl = env.demand_size * np.ones([n_episode_test, n_step_per_episode+1, n_veh, n_neighbor])
    demand_rand = env.demand_size * np.ones([n_episode_test, n_step_per_episode+1, n_veh, n_neighbor])
    power_rand = np.zeros([n_episode_test, n_step_per_episode, n_veh, n_neighbor])
    for idx_episode in range(n_episode_test):
        print('----- Episode', idx_episode, '-----')

        env.renew_positions()
        env.renew_neighbor()
        env.renew_channel()
        env.renew_channels_fastfading()

        env.demand = env.demand_size * np.ones((env.n_Veh, env.n_neighbor))
        env.individual_time_limit = env.time_slow * np.ones((env.n_Veh, env.n_neighbor))
        env.active_links = np.ones((env.n_Veh, env.n_neighbor), dtype='bool')

        env.demand_rand = env.demand_size * np.ones((env.n_Veh, env.n_neighbor))
        env.individual_time_limit_rand = env.time_slow * np.ones((env.n_Veh, env.n_neighbor))
        env.active_links_rand = np.ones((env.n_Veh, env.n_neighbor), dtype='bool')

        V2I_rate_per_episode = []
        V2I_rate_per_episode_rand = []
        for test_step in range(n_step_per_episode):
            # trained models
            action_all_testing = np.zeros([n_veh, n_neighbor, 2], dtype='int32')
            for i in range(n_veh):
                for j in range(n_neighbor):
                    state_old = get_state(env, [i, j], 1, epsi_final)
                    action = predict(sesses[i*n_neighbor+j], state_old, epsi_final, True)
                    action_all_testing[i, j, 0] = action % n_RB  # chosen RB
                    action_all_testing[i, j, 1] = int(np.floor(action / n_RB))  # power level

            action_temp = action_all_testing.copy()
            V2I_rate, V2V_success, V2V_rate = env.act_for_testing(action_temp)
            V2I_rate_per_episode.append(np.sum(V2I_rate))  # sum V2I rate in bps
            rate_marl[idx_episode, test_step,:,:] = V2V_rate
            demand_marl[idx_episode, test_step+1,:,:] = env.demand

            # random baseline
            action_rand = np.zeros([n_veh, n_neighbor, 2], dtype='int32')
            action_rand[:, :, 0] = np.random.randint(0, n_RB, [n_veh, n_neighbor]) # band
            action_rand[:, :, 1] = np.random.randint(0, len(env.V2V_power_dB_List), [n_veh, n_neighbor]) # power

            V2I_rate_rand, V2V_success_rand, V2V_rate_rand = env.act_for_testing_rand(action_rand)
            V2I_rate_per_episode_rand.append(np.sum(V2I_rate_rand))  # sum V2I rate in bps
            rate_rand[idx_episode, test_step, :, :] = V2V_rate_rand
            demand_rand[idx_episode, test_step+1,:,:] = env.demand_rand
            for i in range(n_veh):
                for j in range(n_neighbor):
                    power_rand[idx_episode, test_step, i, j] = env.V2V_power_dB_List[int(action_rand[i, j, 1])]

            # update the environment and compute interference
            env.renew_channels_fastfading()
            env.Compute_Interference(action_temp)

            if test_step == n_step_per_episode - 1:
                V2V_success_list.append(V2V_success)
                V2V_success_list_rand.append(V2V_success_rand)

        V2I_rate_list.append(np.mean(V2I_rate_per_episode))
        V2I_rate_list_rand.append(np.mean(V2I_rate_per_episode_rand))

        print(round(np.average(V2I_rate_per_episode), 2), 'rand', round(np.average(V2I_rate_per_episode_rand), 2))
        print(V2V_success_list[idx_episode], 'rand', V2V_success_list_rand[idx_episode])

References

[1] 3GPP TR 36.885, "Study on LTE-based V2X services".

[2] 《5G移动通信技术》 (5G Mobile Communication Technology).
