[Paper Reading] September 6, 2021

夕阳下的奔跑517

Category: Literature Abstracts. Tags: congestion control, deep reinforcement learning

Article link: https://blog.csdn.net/bajiaoyu517/article/details/120126080

1 Basic Information

Title: 7 Self-learning Congestion Control of MPTCP in Satellites Communications
Simulator: ns-3 (TensorFlow for DDPG)
Source code (not official):
https://github.com/JamesRaynor67/mptcp_with_machine_learning
https://github.com/kallen666/MPTCP-Deep-Reinforcement-Learning
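Conference: IEEE International Wireless Communications and Mobile Computing Conference (IWCMC) 2019, class B
Institution: Beijing University of Posts and Telecommunications
Highlights: applied to satellite communications; MPTCP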

2 Overview

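Problems to solve:
For MPTCP in LEO satellite communications, congestion control over multiple sub-flows relies mainly on a manual process, which performs poorly in dynamic, complex network environments.
The fast movement of LEO satellites also causes frequent handovers, which must be handled as well.
Proposed method:
A congestion control mechanism is designed that applies DRL to the environment above and learns a control policy for each sub-flow.
What I can do:
Apply RL-CC to underwater electric-field communication.
Further material I want:
DDPG tutorial code, how to use stable-baselines, and an MDP tutorial.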

3 Details

agent
action: [cwnd_i] of each sub-flow
state: congestion window size cwnd_t, round-trip time rtt_t, ACK number ack_t, and retransmission rate rta_t of each sub-flow.

i: ith sub-flow
reward:
algorithm: DDPG
NN: a policy network with two connected hidden layers and a deep Q-value network with three convolutional neural networks.
topology: 6 nodes and 11 full-duplex links; the source node has two IP addresses and the destination node has one IP address.
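
The state and action listed above can be laid out as flat vectors. The following is a minimal sketch, not the paper's implementation: the class name SubflowStats, the helper names, the units, and the cwnd bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubflowStats:
    """Per-sub-flow observation at time t (field names follow the notes)."""
    cwnd: float  # congestion window size (unit assumed: segments)
    rtt: float   # round-trip time (unit assumed: seconds)
    ack: float   # ACK number observed in this step
    rta: float   # retransmission rate (assumed to lie in 0..1)


def build_state(subflows: List[SubflowStats]) -> List[float]:
    """Concatenate (cwnd, rtt, ack, rta) of every sub-flow into one state vector."""
    state: List[float] = []
    for s in subflows:
        state.extend([s.cwnd, s.rtt, s.ack, s.rta])
    return state


def apply_action(action: List[float],
                 min_cwnd: float = 1.0,
                 max_cwnd: float = 1000.0) -> List[float]:
    """Clip the agent's proposed cwnd_i for each sub-flow i to an assumed valid range."""
    return [min(max(a, min_cwnd), max_cwnd) for a in action]
```

With two sub-flows, build_state returns a vector of length 8 and the action has one entry per sub-flow.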

MPTCP (Multipath TCP)
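A transport mechanism that uses multiple paths to increase throughput: one data stream is split into several sub-flows that are sent over multiple paths. It is an extension of TCP that splits TCP into two parts: the MPTCP layer and the TCP layer (sub-flow layer). The MPTCP layer manages the sub-flows beneath it through functions such as path optimization, packet scheduling, and congestion control. It is also transparent: it presents the standard TCP interface to the application layer and hides the complexity of multipath.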

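MPTCP protocol stack:
The IETF (Internet Engineering Task Force) set three goals for MPTCP: improve throughput (a multipath flow should achieve at least the throughput of a single flow on the best path), do no harm (it must not take up so much network capacity that it hurts other single-path TCP flows), and balance congestion (avoid sending data on highly congested paths, which supports the first two goals).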

LEO (low Earth orbit) satellite communications and networking

Pros: low orbit (500-1500 km) and short range, high throughput and lower propagation delay, less energy to deploy (compared with Medium Earth Orbit (MEO) and Geostationary Earth Orbit (GEO)).
Cons: high moving speed (a handover every 8-10 minutes) and frequent handover (which can cause routing failures, changing channel quality, and packet blocking, ultimately degrading service performance).
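
The "lower propagation delay" point can be checked with simple arithmetic. This sketch assumes the satellite is directly overhead and that the signal travels at the vacuum speed of light. The GEO altitude of 35,786 km is a standard figure that does not come from the notes.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # vacuum speed of light


def one_way_delay_ms(altitude_km: float) -> float:
    """One-way ground-to-satellite propagation delay, satellite assumed at zenith."""
    return altitude_km / SPEED_OF_LIGHT_KM_S * 1000.0


# LEO range quoted in the notes, plus GEO for comparison (assumed altitude).
leo_low_ms = one_way_delay_ms(500.0)
leo_high_ms = one_way_delay_ms(1500.0)
geo_ms = one_way_delay_ms(35_786.0)

print(f"LEO 500 km:  {leo_low_ms:.2f} ms")
print(f"LEO 1500 km: {leo_high_ms:.2f} ms")
print(f"GEO:         {geo_ms:.2f} ms")
```

Under these assumptions the LEO one-way delay is a few milliseconds, while the GEO delay is above one hundred milliseconds.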

Satellite Communications with MPTCP
Applying MPTCP to LEO satellite networks not only increases bandwidth but also, during a satellite handover, smoothly shifts the traffic of the disconnected sub-flow to the other sub-flows.

MDP(Markov Decision Process)
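
As a reminder of the standard textbook formulation (not copied from the paper), an MDP and its objective can be written as below. Mapping s_t to the per-sub-flow statistics and a_t to the congestion windows cwnd_i follows the agent description in these notes. The discount factor \gamma is a generic symbol, not a value reported in the source.

```latex
\begin{aligned}
&\text{MDP: } (\mathcal{S}, \mathcal{A}, P, R, \gamma) \\
&s_t \in \mathcal{S}, \quad a_t \in \mathcal{A}, \quad
  s_{t+1} \sim P(\cdot \mid s_t, a_t), \quad r_t = R(s_t, a_t) \\
&G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}, \qquad 0 \le \gamma < 1 \\
&\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_0 \right]
\end{aligned}
```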

Deep Deterministic Policy Gradient (DDPG)
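
Two building blocks of standard DDPG are the critic's TD target, y = r + gamma * Q'(s', mu'(s')), and the soft (Polyak) update of the target networks. The sketch below shows only these two pieces in plain Python. The function names and the values of gamma and tau are assumptions, and the gradient steps for the actor and critic (done with TensorFlow in the paper's setup) are omitted.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Action = Sequence[float]


def td_target(reward: float,
              next_state: State,
              target_actor: Callable[[State], Action],
              target_critic: Callable[[State, Action], float],
              gamma: float = 0.99,
              done: bool = False) -> float:
    """Critic regression target: y = r + gamma * Q'(s', mu'(s'))."""
    if done:
        return reward  # no bootstrapping past a terminal state
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


def soft_update(target_params: List[float],
                online_params: List[float],
                tau: float = 0.005) -> List[float]:
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

The target networks move slowly toward the online networks, which keeps the TD target from shifting too quickly during training.
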
Traditional MPTCP algorithms (for comparison)

round-robin (RR): sub-flows have no priority and are served in round-robin (cyclic) order.
lowest RTT first RR (LRTT): traffic is scheduled onto sub-flows according to the lowest RTT.
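
The two baselines can be sketched as sub-flow selection rules. This is a simplified illustration with hypothetical function names. Real schedulers also check whether a sub-flow's congestion window has room before using it, which is left out here.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(subflow_ids: List[str]) -> Iterator[str]:
    """RR: hand out sub-flows in a fixed cyclic order, with no priority."""
    return cycle(subflow_ids)


def lowest_rtt_first(rtts: Dict[str, float]) -> str:
    """LRTT: pick the sub-flow whose current RTT estimate is smallest."""
    return min(rtts, key=rtts.get)


rr = round_robin(["sf0", "sf1"])
rr_order = [next(rr) for _ in range(5)]
lrtt_choice = lowest_rtt_first({"sf0": 0.030, "sf1": 0.012})
```

Both rules are fixed in advance and never adapt to observed throughput, which is the limitation the DRL approach targets.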

4 Writing

Introduction

In [9], Cao et al. proposed an approximate iterative algorithm based on the "Congestion Equality Principle" to solve the multipath congestion control problem.

In [1], Mai et al. proposed a deep reinforcement learning based congestion control mechanism for the Multipath TCP transport protocol to improve the performance of low Earth orbit satellite communication and networking.

However, these algorithms largely rely on a manual process, which has poor scalability and robustness in complex system control. Therefore, there is a need for more powerful methods to deal with the challenges faced in networking.

However, none of these algorithms is designed for the electrocommunication and networking environment. Therefore, there is a need for more powerful methods to deal with the challenges faced in networking.

Inspired by the recent success of applying machine learning in other challenging domains, such as video games and autonomous vehicles, in this paper we apply a deep reinforcement learning approach to optimize the congestion control strategies so as to maximize throughput and guarantee fairness. In addition, simulation results are presented to evaluate the correctness of our architecture and algorithm.

The rest of this paper is organized as follows. In Section II, we present a new architecture that combines MPTCP with satellite communications and formulate the problem of multipath congestion control in MPTCP. In Section III, we apply the deep deterministic policy gradient algorithm to search for the optimal congestion control strategy. In Section IV, we present simulation results to demonstrate the performance of our architecture and algorithm, and we summarize the work in Section V.

Simulation

These algorithms adopt pre-defined deterministic strategies, which struggle to cope with complex network environments [18]. As shown in Fig. ??, owing to the strong fitting ability of deep neural networks, our algorithm achieves higher throughput than the RR and LRTT algorithms.

Reference
[1] Mai, T., Yao, H., Jin, Y., Xu, X., & Ji, Z. (2019). Self-learning Congestion Control of MPTCP in Satellites Communications. IWCMC 2019.