Notes on Python functions encountered in 《动手学强化学习》 (Hands-on Reinforcement Learning)

小帅吖

Last modified 2022-07-11 17:20:56


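Category: Deep Reinforcement Learning Code Practice. Tags: python, numpy, machine learning, deep learning, artificial intelligence

First published 2022-06-24 23:40:48

Article link: https://blog.csdn.net/qq_47997583/article/details/125032566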


1. zip()

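zip() takes iterables as arguments, packs their corresponding elements into tuples, and returns an object made up of those tuples. In Python 3 this object is a lazy iterator, which saves memory compared with building the whole list up front.

Use list() to convert the result to a list for display.

If the iterables have different lengths, the result is as long as the shortest one. Combined with the * operator, zip can also reverse the packing and unzip a sequence of tuples back into separate tuples.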

>>> a = [1,2,3]
>>> b = [4,5,6]
>>> c = [4,5,6,7,8]
>>> zipped = zip(a,b)     # returns a zip object (an iterator)
>>> zipped
<zip object at 0x103abc288>
>>> list(zipped)  # list() converts it to a list
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(a,c))              # length matches the shortest input
[(1, 4), (2, 5), (3, 6)]

>>> a1, a2 = zip(*zip(a,b))          # the reverse of zip: zip(*) unzips the pairs back into one tuple per original sequence
>>> list(a1)
[1, 2, 3]
>>> list(a2)
[4, 5, 6]

https://www.runoob.com/python3/python3-func-zip.html
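
In reinforcement-learning code, the unzip idiom is handy for splitting a batch of transitions into one sequence per field. A minimal sketch, assuming a hypothetical list of (state, action, reward) tuples whose names and values are made up for illustration:

```python
# Hypothetical batch of (state, action, reward) transitions.
transitions = [
    ([0.1, 0.2], 0, 1.0),
    ([0.3, 0.4], 1, 0.0),
    ([0.5, 0.6], 0, -1.0),
]

# zip(*...) regroups the tuples field by field.
states, actions, rewards = zip(*transitions)

print(states)   # ([0.1, 0.2], [0.3, 0.4], [0.5, 0.6])
print(actions)  # (0, 1, 0)
print(rewards)  # (1.0, 0.0, -1.0)
```

Each result is a tuple, so wrap it in list() or np.array() if a mutable or numeric container is needed.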

2. np.random.random()

With no argument, it returns a single random float in [0, 1).
With a shape argument, it returns an array of that shape filled with random floats in [0, 1).
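
A short sketch of both call forms; the shape (2, 3) is an arbitrary choice for illustration:

```python
import numpy as np

x = np.random.random()          # a single float in [0, 1)
arr = np.random.random((2, 3))  # array of shape (2, 3), each entry in [0, 1)

print(x)
print(arr.shape)  # (2, 3)
```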

3. numpy.random.randint()

numpy.random.randint(low, high=None, size=None, dtype='l')

Returns random integers from low (inclusive) to high (exclusive), i.e. in [low, high).
If high is omitted, the values are drawn from [0, low).

>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])
>>> np.random.randint(2, high=10, size=(2,3))
array([[6, 8, 7],
       [2, 5, 2]])

https://blog.csdn.net/u011851421/article/details/83544853
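
One place where np.random.random() and np.random.randint() tend to appear together is epsilon-greedy action selection. The following is only a sketch under assumptions: the function name epsilon_greedy and the Q-values are made up for illustration and are not taken from the book's code.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.random() < epsilon:
        # Explore: uniform integer in [0, number of actions).
        return int(np.random.randint(len(q_values)))
    # Exploit: index of the largest Q-value.
    return int(np.argmax(q_values))

q_values = np.array([0.1, 0.5, 0.2])  # made-up values
print(epsilon_greedy(q_values, epsilon=0.0))  # always greedy: 1
print(epsilon_greedy(q_values, epsilon=1.0))  # always random: 0, 1, or 2
```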

4. gather()

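gather returns the values of a tensor at the positions given by the index argument.
Both b.gather(dim, index) and torch.gather(b, dim, index) work; the two arguments that matter are dim and index.

A way to picture it in low dimensions:
dim=0 indexes along rows, i.e. each value in index says which row to take from
dim=1 indexes along columns, i.e. each value in index says which column to take from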
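
The row/column reading above can be checked on a small tensor. This is a minimal sketch; the tensor b and the index values are made up for illustration:

```python
import torch

b = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

# dim=1: out[i][j] = b[i][index[i][j]], so each index value names a column.
col_index = torch.tensor([[2], [0]])
print(b.gather(1, col_index))         # tensor([[3], [4]])
print(torch.gather(b, 1, col_index))  # same result

# dim=0: out[i][j] = b[index[i][j]][j], so each index value names a row.
row_index = torch.tensor([[1, 0, 1]])
print(b.gather(0, row_index))         # tensor([[4, 2, 6]])
```

The dim=1 form is the one typically used to pick, for each state in a batch, the Q-value of the action that was actually taken.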

5. torch.distributions.Categorical()

import torch

# probs need not sum to 1; Categorical normalizes them
probs = torch.FloatTensor([0.9,0.2])
ac = torch.distributions.Categorical(probs)
print(ac)
for _ in range(5):
    print(ac.sample())

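This creates a categorical distribution parameterized by probs. Samples are integers from {0, …, K-1}, where K is the length of probs. In other words, each position is sampled with the probability given by probs, and the sample returned is the integer index of that position.

Here is how an action is chosen from the policy network in RL: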

def take_action(self, state):  # sample an action at random from the action probability distribution
    state = torch.tensor([state], dtype=torch.float).to(self.device)  # 1*4
    probs = self.policy_net(state)  # 1*2
    action_dist = torch.distributions.Categorical(probs)
    action = action_dist.sample()
    return action.item()
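
Policy-gradient methods usually also need the log-probability of the sampled action, which Categorical exposes through log_prob(). A minimal sketch, assuming a fixed, made-up probability vector in place of a real policy network's output:

```python
import torch

probs = torch.tensor([[0.7, 0.3]])  # made-up stand-in for a policy output, shape 1*2
action_dist = torch.distributions.Categorical(probs)

action = action_dist.sample()            # tensor of shape (1,), value 0 or 1
log_prob = action_dist.log_prob(action)  # log of the probability of that action

print(action.item(), log_prob.item())
# log_prob equals log(0.7) if action is 0, log(0.3) if action is 1
```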

6. np.vstack()和np.hstack()

Two ways to join arrays:
np.vstack(): stacks arrays vertically (row-wise)
np.hstack(): stacks arrays horizontally (column-wise)

import numpy as np
arr1=np.array([1,2,3])
arr2=np.array([4,5,6])
print(np.vstack((arr1,arr2)))

print(np.hstack((arr1,arr2)))

a1=np.array([[1,2],[3,4],[5,6]])
a2=np.array([[7,8],[9,10],[11,12]])
print(a1)
print(a2)
print(np.hstack((a1,a2)))

[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]
[[1 2]
 [3 4]
 [5 6]]
[[ 7  8]
 [ 9 10]
 [11 12]]
[[ 1  2  7  8]
 [ 3  4  9 10]
 [ 5  6 11 12]]
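
Looking at the resulting shapes makes the difference between the two functions easier to see, especially for 1-D inputs. A small sketch reusing the same arrays as above:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(np.vstack((arr1, arr2)).shape)  # (2, 3): 1-D inputs become rows
print(np.hstack((arr1, arr2)).shape)  # (6,): 1-D inputs are joined end to end

a1 = np.array([[1, 2], [3, 4], [5, 6]])
a2 = np.array([[7, 8], [9, 10], [11, 12]])

# For 2-D inputs these match np.concatenate along axis 0 and axis 1.
print(np.array_equal(np.vstack((a1, a2)), np.concatenate((a1, a2), axis=0)))  # True
print(np.array_equal(np.hstack((a1, a2)), np.concatenate((a1, a2), axis=1)))  # True
```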

7. np.clip()

numpy.clip(a, a_min, a_max, out=None)

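Parameters:
a: the input array
a_min: the lower bound; may also be an array, in which case its shape must be broadcastable to that of a
a_max: the upper bound; may also be an array, with the same shape requirement
out: an optional array in which the clipped result is stored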

>>> a = np.arange(10)
>>> np.clip(a, 1, 8)
array([1, 1, 2, 3, 4, 5, 6, 7, 8, 8]) # values are limited to the range 1 to 8
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # a itself is unchanged

>>> np.clip(a, 3, 6, out=a) # the clipped result is written back into a
array([3, 3, 3, 3, 4, 5, 6, 6, 6, 6])
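
In continuous-control code, np.clip() is commonly used to keep an action inside its allowed range. A sketch under assumptions: the bounds and the raw action values below are made up for illustration.

```python
import numpy as np

# Made-up action bounds for illustration.
action_low, action_high = -2.0, 2.0

raw_action = np.array([-3.5, 0.4, 2.7])
action = np.clip(raw_action, action_low, action_high)
print(action.tolist())  # [-2.0, 0.4, 2.0]

# Array bounds are applied element by element.
lows = np.array([0.0, 0.0, 0.0])
highs = np.array([1.0, 1.0, 5.0])
print(np.clip(raw_action, lows, highs).tolist())  # [0.0, 0.4, 2.7]
```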

8. numpy.random.randn()

numpy.random.randn(d0,d1,…,dn)

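randn returns one sample or an array of samples drawn from the standard normal distribution.
d0, d1, …, dn are the sizes of each dimension
The return value is an array with the specified dimensions

np.random.randn() # with no arguments, returns a single value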

-1.1241580894939212

np.random.randn(2,4)

array([[ 0.27795239, -2.57882503,  0.3817649 ,  1.42367345],
       [-1.16724625, -0.22408299,  0.63006614, -0.41714538]])
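
To sample from a normal distribution with a different mean and standard deviation, the standard normal samples can be scaled and shifted. A minimal sketch; the values of mu and sigma are made up for illustration:

```python
import numpy as np

mu, sigma = 2.0, 0.5  # made-up mean and standard deviation

# Scale and shift standard normal samples to get N(mu, sigma^2).
samples = mu + sigma * np.random.randn(100000)

print(samples.shape)   # (100000,)
print(samples.mean())  # close to 2.0
print(samples.std())   # close to 0.5
```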

9. torch.distributions.Normal()

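PyTorch's torch.distributions module can define a normal distribution,
as follows:

import torch
from torch.distributions import Normal
mean=torch.Tensor([0,2])
normal=Normal(mean,1)

sample() draws directly from the defined normal distribution (mean mean, standard deviation std of 1):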

c=normal.sample()
print("c:",c)

c: tensor([-1.3362,  3.1730])

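rsample() does not sample from the defined normal distribution directly. It first samples from the standard normal distribution N(0,1) and then outputs:
mean + std × sample
The result follows the same distribution as sample(), but gradients can flow back to mean and std (the reparameterization trick).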

a=normal.rsample()
print("a:",a)

a: tensor([ 0.0530,  2.8396])

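log_prob(value) computes the log of the probability density of value under the defined normal distribution (mean, 1). The probability density function of the normal distribution is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$: ![Normal distribution probability density function](https://img-blog.csdnimg.cn/237fc596b2c742029e04472608817659.png)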

Exponentiating the log-probability recovers the corresponding probability density:

print("c prob:",normal.log_prob(c).exp())

c prob: tensor([ 0.1634,  0.2005])

Reference: https://blog.csdn.net/geter_CS/article/details/90752582
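
The two points above, that log_prob() is the log of the density and that rsample() keeps the sample differentiable, can be checked directly. This is a sketch under assumptions: the test values and the requires_grad setup are chosen for illustration only.

```python
import math
import torch
from torch.distributions import Normal

mean = torch.tensor([0.0, 2.0], requires_grad=True)
std = torch.tensor([1.0, 1.0])
normal = Normal(mean, std)

# log_prob matches the log of the density formula.
value = torch.tensor([-1.0, 3.0])
manual = -((value - mean) ** 2) / (2 * std ** 2) - torch.log(std) - 0.5 * math.log(2 * math.pi)
print(torch.allclose(normal.log_prob(value), manual))  # True

# rsample() is differentiable with respect to mean; sample() is not.
a = normal.rsample()
a.sum().backward()
print(mean.grad)                      # tensor([1., 1.])
print(normal.sample().requires_grad)  # False
```

Because rsample() computes mean + std × noise, the gradient of each sample with respect to its mean is 1, which is what mean.grad shows.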
