Notes on AAMAS Review Comments

Three reviewers gave comments, and my heart bled; it felt like my own child was being put through the ten great tortures.
The first reviewer was harsh, but honestly I am convinced (this baby feels wronged). (screenshots of the comments omitted)
The second was reasonable. (screenshot of the comments omitted)
The third seemed to have no exposure to this research direction. (screenshot of the comments omitted)

My replies:
Reply to Reviewer 3:
Dear Reviewer,
Thank you very much for your comments on the manuscript. Following your advice, we recognize the shortcomings of the manuscript and will do our best to improve the paper. Some of your questions are answered below:
We use a deep cyclical phased actor-critic because of the large-scale state and action spaces of continuous control tasks. The target critic network and the replay buffer are the same mechanisms as in [16], and Equation 12 also comes from that paper; the target critic is used in computing the critic loss in Equations 12 and 13. Since PACEE is off-policy, the replay buffer stores experiences and, via random sampling, breaks sample correlation to some extent. The critic loss is computed as in DQN, and we use deterministic policy gradients to compute the actor loss. The only difference between C-PACEE and PACEE is that the actors of C-PACEE work cyclically, so we show only the PACEE algorithm. The time complexity of the algorithm is O(n^2), and it consumes some memory due to the replay buffer and the deep neural networks. The average time per episode in testing is about 0.8053 seconds, and each MuJoCo environment requires roughly 8 hours of training; the CPU of our machine is an Intel® Core™ i7-7770. TRPO, PPO, and DDPG are the methods of references [23], [24], and [16] respectively, and Ant is a continuous control task in MuJoCo.
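For readers who want the mechanics spelled out, below is a minimal PyTorch sketch of the DDPG-style update this reply describes (replay buffer with random sampling, a target actor/critic pair, a DQN-style critic loss, and a deterministic policy gradient for the actor). It is an illustration under assumed names and hyperparameters (GAMMA, TAU, BATCH, the network objects are all hypothetical), not the actual PACEE code:

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

GAMMA, TAU, BATCH = 0.99, 0.005, 64   # assumed hyperparameters
buffer = deque(maxlen=1_000_000)      # replay buffer of (s, a, r, s', done) tensors

def sample(buf, n):
    # Random sampling breaks sample correlation to some extent.
    s, a, r, s2, d = zip(*random.sample(list(buf), n))
    return (torch.stack(x) for x in (s, a, r, s2, d))

def update(actor, critic, target_actor, target_critic, actor_opt, critic_opt):
    s, a, r, s2, done = sample(buffer, BATCH)

    # Critic loss as in DQN, but with a target actor/critic pair:
    # y = r + gamma * (1 - done) * Q'(s', mu'(s'))
    with torch.no_grad():
        y = r + GAMMA * (1 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor loss via the deterministic policy gradient: ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft updates keep the target networks slowly tracking the online ones,
    # which stabilizes the bootstrapped critic target.
    for tgt, src in ((target_critic, critic), (target_actor, actor)):
        for tp, p in zip(tgt.parameters(), src.parameters()):
            tp.data.mul_(1 - TAU).add_(TAU * p.data)
```

The target networks and replay buffer here are the mechanisms the reply attributes to [16]; everything else in the sketch is generic scaffolding.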
Reply to Reviewer 2:
Dear Reviewer,
Thank you very much for your comments on the manuscript. Following your advice, we recognize the shortcomings of the manuscript and will do our best to improve the paper. Some of your questions are answered below:

  1. We believe that \xi lies in (0,1). We found that if \xi is greater than 0.5, the generated experiences are dominated by the experience network, which is not conducive to the actors' learning. We therefore turned the parameter down so that the actors dominate the generated experiences, and finally found that 1e-5 is a good value (see the sketch below this list).
  2. Yes
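To make item 1 above concrete, here is a hypothetical sketch of one way a weight like \xi could blend the experience network's output with the actors' when generating experiences. The blending rule and function names below are assumptions for illustration, not the formula from the paper:

```python
import torch

XI = 1e-5  # the small value settled on, per item 1 above

def generate_action(actor, experience_net, state):
    # Assumed convex blend: with XI << 0.5 the actor dominates the
    # generated experience; with XI > 0.5 the experience network would
    # dominate instead, which is not conducive to the actors' learning.
    return (1 - XI) * actor(state) + XI * experience_net(state)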
Reply to Reviewer 1:
Dear Reviewer,
Thank you very much for your comments on the manuscript. Following your advice, we recognize the shortcomings of the manuscript and will do our best to improve the paper. Some of your questions are answered below:
  1. For environments like HalfCheetah, HumanoidStandup, Ant, and Swimmer, each episode lasts 1000 time steps, so the results correspond to 1000 runs. Hopper and Walker2d have episode lengths below 1000, so they exceed 1000 runs; in all cases we trained for one million time steps.
  2. Yes. From the second experiment, we can see that both Reacher and InvertedPendulum converge within one million time steps.
  3. Table 1, together with the experimental result figures, illustrates the advantages of our approaches.
Regarding applications of multi-robot systems (MRS), here are some examples:
  1. Object detection and tracking: MRS can be used to detect and track targets. For example, in [1], researchers proposed a target-tracking algorithm based on a heterogeneous MRS; by using robots with different sensors and processing capabilities, it achieves efficient target tracking.
  2. Environment perception and mapping: MRS can also be used for environment perception and map building. For example, in [2], researchers developed an autonomous heterogeneous MRS for mapping unknown environments; by using different types of robots, including ground and aerial robots, it achieves efficient and accurate mapping.
  3. Cooperative transport and formation control: MRS can also be used for cooperative transport and formation control. For example, in [3], researchers proposed a cooperative transport algorithm based on a heterogeneous MRS; by using robots with different transport capabilities, it achieves efficient cooperative transport.
  4. Exploration and mapping: MRS are also widely used in exploration and mapping tasks. For example, in [4], researchers developed an autonomous heterogeneous MRS for seafloor exploration and mapping; by using different types of robots, including underwater and ground robots, it achieves efficient and accurate exploration and mapping.
References:
[1] A. P. Delgado et al., "Heterogeneous multirobot system for efficient object tracking," Robotics and Autonomous Systems, vol. 76, pp. 27-37, 2016.
[2] Y. Zhang et al., "Autonomous heterogeneous multi-robot system for mapping in unknown environments," Autonomous Robots, vol. 43, no. 1, pp. 177-192, 2019.
[3] Y. Liu et al., "Heterogeneous multi-robot system for cooperative transport," in Robotics: Science and Systems, 2018.
[4] Y. Zhang et al., "Autonomous underwater and ground multi-robot system for seafloor exploration and map construction," Journal of Field Robotics, vol. 38, no. 5, pp. 359-375, 2021.