sinat_38316070 - CSDN Blog

Original: Error when training on the GPU and evaluating on the CPU with the Ray library

Note: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU. Fix: in torch/st

2020-08-19 11:14:32
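The error message itself names the fix. A minimal sketch, assuming the object was written with `torch.save` (here into an in-memory buffer for illustration; `load_on_cpu` is a hypothetical helper name): passing `map_location=torch.device("cpu")` to `torch.load` remaps any CUDA storages to the CPU, which is what a CPU-only evaluation process needs.

```python
import io

import torch


def load_on_cpu(source):
    """Load a torch-serialized object, remapping any CUDA storages to the CPU.

    `load_on_cpu` is a hypothetical helper; the essential part is map_location.
    """
    return torch.load(source, map_location=torch.device("cpu"))


# Illustration: serialize a small state dict into memory, then load it back.
buffer = io.BytesIO()
torch.save({"weight": torch.ones(3)}, buffer)
buffer.seek(0)

state = load_on_cpu(buffer)
print(state["weight"].device)  # cpu
```

On a GPU machine the saved tensors would live on CUDA; `map_location` is what lets the CPU side read them. Another option is to move tensors to the CPU before handing them over, e.g. `{k: v.cpu() for k, v in model.state_dict().items()}`.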

Original: Error running gym Atari on macOS

# MacOS:
$ brew install cmake boost boost-python sdl2 swig wget

2020-08-17 21:58:50

Original: E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)

sudo rm /var/lib/dpkg/lock-frontend

2020-05-14 16:23:13
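Deleting the lock file is only safe when no apt/dpkg process is actually holding it; otherwise two package operations could end up running at once. A minimal sketch for checking first, assuming (as with apt and dpkg on Debian/Ubuntu) that the lock is an fcntl-style write lock. `lock_is_held` is a hypothetical helper, and opening the real `/var/lib/dpkg/lock-frontend` requires root.

```python
import fcntl
import os
import tempfile


def lock_is_held(path: str) -> bool:
    """Return True if another process holds an fcntl write lock on `path`.

    Hypothetical helper: tries to take the lock without blocking and
    releases it again immediately if that succeeds.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # EACCES/EAGAIN: some other process (e.g. apt or dpkg) owns the lock.
        return True
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demo on a throwaway file; for the real check, run as root with
    # path="/var/lib/dpkg/lock-frontend".
    with tempfile.NamedTemporaryFile() as tmp:
        print(lock_is_held(tmp.name))  # False
```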

Original: pip install timeout

1. Switch to a different package index (mirror).
2. pip --default-timeout=100 install xxx

2020-05-06 20:45:38
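The two fixes can be combined in one invocation. A minimal sketch that builds the pip command for the interpreter currently running; `pip_install_cmd` is a hypothetical helper, and `index_url` stands for whichever mirror you switch to (none is assumed here).

```python
import sys


def pip_install_cmd(package, timeout=100, index_url=None):
    """Build a pip command with a longer network timeout (in seconds).

    Hypothetical helper. If `index_url` is given, pip uses that mirror index.
    """
    cmd = [sys.executable, "-m", "pip", "install", f"--default-timeout={timeout}"]
    if index_url is not None:
        cmd += ["--index-url", index_url]
    cmd.append(package)
    return cmd


print(" ".join(pip_install_cmd("xxx")))
```

To actually install, pass the list to `subprocess.run(..., check=True)`; using `sys.executable` ensures the package lands in the same environment as the running Python.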

Original: TensorBoard error: Fatal error in launcher: Unable to create process using

Solution: python -m tensorboard.main --logdir=(path)

2020-04-09 19:51:42
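Running the module through an interpreter directly sidesteps the broken `tensorboard` launcher executable. A minimal sketch, assuming TensorBoard is installed in the current interpreter's environment; `tensorboard_cmd` is a hypothetical helper and `runs` a placeholder log directory.

```python
import sys


def tensorboard_cmd(logdir):
    """Build `python -m tensorboard.main --logdir=<logdir>` for this interpreter.

    Hypothetical helper. sys.executable is the Python currently running,
    so no launcher .exe is involved.
    """
    return [sys.executable, "-m", "tensorboard.main", f"--logdir={logdir}"]


print(tensorboard_cmd("runs"))
```

Start it with `subprocess.Popen(tensorboard_cmd("runs"))`; the server keeps running until that process is stopped.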

Original: The screen command

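A quick memo:
screen -S Name: create a new window
screen -ls: list windows, e.g. 12345.Name
screen -r 12345: enter (reattach to) window 12345
Ctrl+A, then D: return to the main window; the session keeps running in the background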

2020-04-09 17:09:22
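`screen -ls` identifies each session as `<pid>.<name>` (the `12345.Name` above), which is what `screen -r` takes. A minimal sketch for pulling those IDs out of the listing in a script; `parse_screen_sessions` is a hypothetical helper, and the sample text is an illustrative stand-in, not captured output.

```python
import re

# Matches lines starting with tokens such as "12345.Name": PID, dot, name.
SESSION_RE = re.compile(r"^[ \t]*(\d+)\.(\S+)", re.MULTILINE)


def parse_screen_sessions(listing: str):
    """Return (pid, name) pairs from text printed by `screen -ls`."""
    return [(int(pid), name) for pid, name in SESSION_RE.findall(listing)]


# Illustrative sample only; real output also has header/footer lines.
sample = "There is a screen on:\n\t12345.Name\t(Detached)\n"
print(parse_screen_sessions(sample))  # [(12345, 'Name')]
```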

Repost: Speeding up Docker image pulls on Ubuntu

Address: https://cr.console.aliyun.com/undefined/instances/mirrors, then the "镜像加速器" (image accelerator) entry at the lower left.

2020-03-27 15:10:09
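The accelerator page gives you a personal mirror address, which Docker reads from the `registry-mirrors` key of `/etc/docker/daemon.json`. A minimal sketch that adds the entry while keeping existing settings; `add_registry_mirror` is a hypothetical helper, `https://YOUR_ID.mirror.aliyuncs.com` is a placeholder for the address shown on the page, and writing the real file needs root.

```python
import json
import tempfile
from pathlib import Path


def add_registry_mirror(config_path, mirror_url):
    """Add `mirror_url` to "registry-mirrors" in a Docker daemon.json file.

    Hypothetical helper: existing keys are preserved and duplicates skipped.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    mirrors = config.setdefault("registry-mirrors", [])
    if mirror_url not in mirrors:
        mirrors.append(mirror_url)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demo on a temporary file; the real target is /etc/docker/daemon.json.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "daemon.json"
        print(add_registry_mirror(demo, "https://YOUR_ID.mirror.aliyuncs.com"))
```

Docker only picks up the change after a restart, e.g. `sudo systemctl daemon-reload` followed by `sudo systemctl restart docker`.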

Original: notes4 -- Deep Reinforcement Learning: From Beginner to Proficient (Shusen Wang)

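Actor-Critic Methods. Fix the state $S$ and update the policy network's parameters $\theta$ so that the average value of $V$ increases. The policy network $\pi$ is like a gymnast and the judge is like the value network $q$: the gymnast works to improve their technique so that the judge gives higher scores; the judge's score is the supervision signal, and the gymnast relies on it to improve. The value network's parameters $w$ are updated so that $q$ scores more accurately. The judge, i.e. the value network $q$, is randomly initialized at first, so it has no judging ability. The judge gradually improves its own level so that it...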

2020-03-24 21:12:21
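One Actor-Critic update step can be sketched as follows. To keep it self-contained, this uses tabular stand-ins for the two networks (an assumption, not the lecture's neural networks): `theta[s]` holds softmax logits for the policy $\pi(a|s;\theta)$ (the gymnast) and `q[s, a]` plays the value network $q(s,a;w)$ (the judge). The judge moves toward the TD target, and the policy moves along $q_t \nabla_\theta \ln \pi(a_t|s_t;\theta)$.

```python
import numpy as np


def softmax(logits):
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()


def actor_critic_step(theta, q, s, a, r, s_next, rng,
                      alpha=0.1, beta=0.1, gamma=0.9):
    """One in-place Actor-Critic update on tabular parameters.

    theta: (n_states, n_actions) policy logits  (actor, the "gymnast")
    q:     (n_states, n_actions) action values  (critic, the "judge")
    """
    # Sample the next action from the current policy; it is only used to
    # evaluate q(s', a'), not executed in the environment.
    a_next = rng.choice(q.shape[1], p=softmax(theta[s_next]))

    q_t = q[s, a]  # scalar copy of the judge's current score
    td_target = r + gamma * q[s_next, a_next]
    delta = q_t - td_target

    # Critic: gradient step on 0.5 * delta**2 (d q(s,a) / d q[s,a] = 1).
    q[s, a] -= alpha * delta

    # Actor: ascend q_t * grad log pi(a|s); for softmax logits the gradient
    # of log pi(a|s) with respect to theta[s] is onehot(a) - pi(.|s).
    grad_log_pi = -softmax(theta[s])
    grad_log_pi[a] += 1.0
    theta[s] += beta * q_t * grad_log_pi
    return delta
```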

Original: notes3 -- Deep Reinforcement Learning: From Beginner to Proficient (Shusen Wang)

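Policy-Based RL: approximate the policy function $\pi$ with a neural network. Review: State-Value Function Approximation. Use the policy network $\pi(a|s_{t};\theta)$ in place of the policy function $\pi(a|s_{t})$. Taking the expectation over $S$ eliminates $S$, leaving only $\theta$. The better the policy network, $J(\theta)$...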

2020-03-24 19:31:55
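For reference, the objective and its policy gradient in the notation above, derived under the lecture's simplifying assumption that $Q_\pi$ does not depend on $\theta$:

```latex
\begin{aligned}
V(s;\theta) &= \sum_{a} \pi(a \mid s;\theta)\, Q_\pi(s,a), \qquad
J(\theta) = \mathbb{E}_{S}\big[V(S;\theta)\big], \\
\frac{\partial V(s;\theta)}{\partial \theta}
&= \sum_{a} \frac{\partial \pi(a \mid s;\theta)}{\partial \theta}\, Q_\pi(s,a)
 = \sum_{a} \pi(a \mid s;\theta)\, \frac{\partial \ln \pi(a \mid s;\theta)}{\partial \theta}\, Q_\pi(s,a) \\
&= \mathbb{E}_{A \sim \pi(\cdot \mid s;\theta)}\!\left[\frac{\partial \ln \pi(A \mid s;\theta)}{\partial \theta}\, Q_\pi(s,A)\right].
\end{aligned}
```

Gradient ascent, $\theta \leftarrow \theta + \beta\, \partial V(s;\theta)/\partial \theta$, then increases $J(\theta)$.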

Original: notes2 -- Deep Reinforcement Learning: From Beginner to Proficient (Shusen Wang)

Value-Based Learning. Review: Deep Q-Network (DQN), which uses a neural network to approximate the $Q^{*}$ function (Approximate the Q Function); DQN in Super Mario; Temporal Difference (TD); TD learning for DQN; Summary...

2020-03-24 18:04:05
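TD learning for DQN regresses $Q(s_t,a_t;w)$ toward the TD target $y_t = r_t + \gamma \max_a Q(s_{t+1},a;w)$. A minimal sketch of one update, with a linear $Q(s,a;w) = w_a^\top s$ standing in for the network (an assumption made to keep the gradient explicit). The `done` flag, which drops the bootstrap term at the end of an episode, is also an addition not in these notes.

```python
import numpy as np


def dqn_td_step(w, s, a, r, s_next, done=False, alpha=0.1, gamma=0.9):
    """One TD update for a linear Q(s, a; w) = w[a] @ s (modifies w in place).

    w: (n_actions, n_features) weights; s, s_next: feature vectors.
    """
    q_t = w[a] @ s
    # TD target: reward plus discounted best next-state value (0 if terminal).
    y_t = r if done else r + gamma * np.max(w @ s_next)
    delta = q_t - y_t
    # Gradient of 0.5 * delta**2 with respect to w[a] is delta * s.
    w[a] -= alpha * delta * s
    return delta
```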

Original: notes1 -- Deep Reinforcement Learning: From Beginner to Proficient (Shusen Wang)

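I personally find this a very good introductory resource, so I am summarizing it in notes. Basic concepts: (1) Random Variable: uppercase letters denote random variables; lowercase letters denote observed values. (2) Randomness in RL: actions have randomness; state transitions have randomness. (3) Return: the return $U_{t}$: cumulative...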

2020-03-24 13:58:37
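The return is the discounted sum of future rewards, $U_t = R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots$. For a finished episode it can be computed for every step in one backward pass via $U_t = R_t + \gamma U_{t+1}$. A minimal sketch; `discounted_returns` is a hypothetical helper name.

```python
def discounted_returns(rewards, gamma):
    """Return [U_0, U_1, ...] for one finished episode.

    Uses the recursion U_t = R_t + gamma * U_{t+1}, with U = 0 after the end.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```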

Original: The sparse reward problem in deep reinforcement learning

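DRL Lecture 7 – Sparse Reward – notes – Hung-yi Lee. Sparse rewards in deep reinforcement learning. To solve sparse reward problems, there are three directions: 1. Reward Shaping: the environment has a real reward, but we design additional rewards ourselves. Ex: for a child: take "Play", $r_{t+1}$...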

2020-03-24 11:40:19
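Reward shaping adds a designed term on top of the environment reward. As one concrete form (potential-based shaping, a standard variant not covered in these notes), the extra reward is $F = \gamma\Phi(s') - \Phi(s)$ for a hand-designed potential $\Phi$, which is known to leave the optimal policy unchanged. A minimal sketch; the function name and the potential values are hypothetical.

```python
def shaped_reward(r, phi_s, phi_s_next, gamma=0.99, done=False):
    """Environment reward plus a potential-based shaping term.

    F(s, s') = gamma * Phi(s') - Phi(s); the potential of a terminal state
    is taken as 0. phi_s and phi_s_next are Phi values you design yourself.
    """
    phi_next = 0.0 if done else phi_s_next
    return r + gamma * phi_next - phi_s


# A sparse step (r = 0) that moves toward a higher-potential state
# still receives a positive learning signal.
print(shaped_reward(0.0, phi_s=1.0, phi_s_next=2.0, gamma=0.9))
```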

Original: ML Lecture 2 - Where does the error come from? (2019.8.5)

Bias vs. Variance

2019-08-05 19:31:36
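A standard illustration of estimator bias and variance: estimating the mean $\mu$ and variance $\sigma^2$ from $N$ samples, the sample mean $m$ is unbiased with $\mathrm{Var}[m] = \sigma^2/N$, while $s^2 = \frac{1}{N}\sum_n (x^n - m)^2$ is biased, with $\mathbb{E}[s^2] = \frac{N-1}{N}\sigma^2$. A minimal Monte Carlo sketch that checks this numerically; the Gaussian data, $N = 5$, $\sigma = 2$ and the trial count are arbitrary choices for illustration.

```python
import numpy as np


def estimator_stats(mu=0.0, sigma=2.0, n=5, trials=200_000, seed=0):
    """Monte Carlo averages for the sample mean and the 1/N sample variance."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=(trials, n))  # each row is one dataset of size n
    m = x.mean(axis=1)  # sample mean per dataset (unbiased for mu)
    s2 = x.var(axis=1)  # ddof=0 divides by N (biased for sigma**2)
    return m.mean(), m.var(), s2.mean()


mean_of_m, var_of_m, mean_of_s2 = estimator_stats()
print(f"E[m]   ~ {mean_of_m:.3f}  (mu = 0)")
print(f"Var[m] ~ {var_of_m:.3f}  (sigma^2 / N = 0.8)")
print(f"E[s^2] ~ {mean_of_s2:.3f}  ((N-1)/N * sigma^2 = 3.2)")
```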

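跟着实现的微信小程序.zip (a WeChat Mini Program built by following along with a tutorial)

Includes a home page, a category page, a personal center page, and a product list page; product search, add-to-cart, and favorites features. There is a payment page but no payment API.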

2020-05-12
