[Deep Reinforcement Learning] Commonly Used PyTorch Code


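Preface

I recently reread "Deep Reinforcement Learning" and went through the TD3 code again. A lot of the code out there looks fancy, yet much of it expresses the same idea and performs the same operation. This post summarizes what these common snippets mean.
Along the way I am building a simple, easy-to-read reinforcement learning library for my own use. (Only part of it is built so far; stars are welcome.)
I referred to code written by many people, which I will not list here for now.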

torch version: 2.3.1+cu121
python version: 3.11.9

Reference: Ideas for designing a deep reinforcement learning library

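Tensors

Official documentation

Environments generally return state variables of type np.float32.
So we need the fastest way to convert a NumPy array into a tensor, and a tensor back into a NumPy array.

[Best way: np.array -> tensor]

torch.as_tensor(data_numpy,dtype=torch.float32)

torch.tensor creates a torch.float32 tensor by default when given Python floats (a float64 NumPy array stays float64 unless dtype is passed), and it lets you specify the dtype explicitly, so it reads well.
However:
torch.tensor always copies the data into a new tensor and therefore uses more memory, while torch.as_tensor shares memory with the source whenever possible (same dtype, same device), which makes it the fastest option.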

## Creating tensors with torch.tensor

import torch
import numpy as np
# Create a float32 tensor with torch.tensor
tensor = torch.tensor(np.array([1.0, 2.0, 3.0]), dtype=torch.float32)
print(tensor.dtype)

tensor_1d = torch.FloatTensor([1.0, 2.0, 3.0])
print(tensor_1d.dtype)

tensor_as = torch.as_tensor([1.0, 2.0, 3.0], dtype=torch.float32)
print(tensor_as.dtype)

'''
torch.float32
torch.float32
torch.float32
'''

Difference between torch.tensor and torch.as_tensor

data_numpy = np.array([1, 2, 3])
tensor_from_numpy = torch.tensor(data_numpy) ## non-float data keeps NumPy's integer dtype (int32 in the output below; this depends on platform and NumPy version)

# Using torch.as_tensor
tensor_from_numpy_as = torch.as_tensor(data_numpy)

# Check memory sharing
data_numpy[0] = 10
print(tensor_from_numpy)  # unchanged: torch.tensor made a copy
print(tensor_from_numpy_as)  # first element is now 10: as_tensor shares memory

'''
tensor([1, 2, 3], dtype=torch.int32)
tensor([10,  2,  3], dtype=torch.int32)
'''

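In practice we only need one copy of the data, so we can use the following form:

torch.as_tensor(data_numpy,dtype=torch.float32)

ElegantRL actually holds that dropping dtype=torch.float32 is the fastest, but the library itself uses the form above for readability.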
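The memory-sharing behaviour also depends on the dtype of the source array. The following is a minimal sketch, assuming a CPU tensor; the variable names are illustrative and not taken from any library. It shows that as_tensor shares memory when the array is already float32, and has to copy when a float64 array must be converted.

```python
import numpy as np
import torch

# A float32 state array, as most environments return.
state_f32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)
shared = torch.as_tensor(state_f32, dtype=torch.float32)  # dtype matches: no copy
copied = torch.tensor(state_f32, dtype=torch.float32)     # always copies

state_f32[0] = 10.0
print(shared)  # reflects the change made to the NumPy array
print(copied)  # still holds the old value

# A float64 array (NumPy's default for Python floats) must be converted,
# so as_tensor allocates a new float32 tensor here.
state_f64 = np.array([1.0, 2.0, 3.0])
converted = torch.as_tensor(state_f64, dtype=torch.float32)
state_f64[0] = 10.0
print(converted)  # unaffected by the change
```

So the "no copy" benefit only applies when the environment really returns float32 data.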

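[Best way: tensor -> np.array]

tensor.detach().cpu().numpy()

This form is borrowed directly from ElegantRL, and other codebases use it as well. It is the fastest of the common approaches.

print(tensor.detach().cpu().numpy().dtype)  # prints: float32

tensor.detach().cpu().numpy()
Do not use .data: it is a legacy attribute whose role has been taken over by .detach()
detach() stops PyTorch from tracking gradients for the tensor, so it goes first
cpu() moves the tensor from GPU memory to CPU memory
numpy() converts the tensor into a NumPy array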
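To see why the chain is needed, here is a small sketch. The tensor named action is a hypothetical stand-in for a network output. Calling numpy() directly on a tensor that tracks gradients (or lives on the GPU) raises an error, while the full chain works on either device.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# A tensor that is part of the autograd graph, like a policy output.
action = torch.rand(3, device=device, requires_grad=True) * 2.0

try:
    action.numpy()              # fails: requires grad and/or lives on the GPU
    direct_call_failed = False
except (RuntimeError, TypeError):
    direct_call_failed = True

action_np = action.detach().cpu().numpy()
print(direct_call_failed, type(action_np), action_np.dtype)
```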

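detach

tensor.detach()

detach excludes a tensor from gradient computation. It is typically applied to the output of a target network.
A target network has one distinguishing feature: no optimizer is assigned to it.

The following three forms are commonly seen. The second and third are equivalent in that the result carries no gradient. The first only cuts the gradient flowing back into the input, so its output still tracks gradients with respect to the parameters of net.

tensor = net(tensor.detach())
tensor = net(tensor).detach()
with torch.no_grad():
    tensor = net(tensor)

I choose the second form here because it reads better and is more flexible.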
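A small check of what each form does to gradient tracking, as a sketch: net is a hypothetical nn.Linear standing in for a target network, and the shapes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 1)                      # hypothetical stand-in for a target network
x = torch.rand(2, 4, requires_grad=True)

out_1 = net(x.detach())                    # input detached, parameters still tracked
out_2 = net(x).detach()                    # output detached from the graph
with torch.no_grad():
    out_3 = net(x)                         # no graph is built at all

print(out_1.requires_grad, out_2.requires_grad, out_3.requires_grad)
# → True False False
```

All three produce the same values; only the second and third guarantee that nothing flows back into net. The no_grad form additionally skips building the graph, which saves some memory.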

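Reshaping tensors: reshape, view

reshape, view, transpose, t, permute, flatten, squeeze, unsqueeze

reshape: no restrictions; returns a view when the memory layout allows it and otherwise copies the data, as if .contiguous() had been called
view: restriction: the tensor must be contiguous
transpose: restriction: takes exactly two dimension arguments and swaps those two dimensions
t: 2-D transpose; restriction: the tensor must have at most two dimensions
permute: no restrictions; reorders the dimensions according to the given order
flatten: no restrictions; flattens to 1-D by default
squeeze: with one argument, removes that dimension if its size is 1; with no argument, removes all size-1 dimensions
unsqueeze: restriction: takes exactly one argument and inserts a size-1 dimension at that position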

Reshaping 2-D tensors

## Reshaping tensors
import torch

print('--2-D reshaping--')
torch_tensor = torch.rand(6, 4) # batch_size x feature_size  6x4
# reshape
new_shape_0 = torch_tensor.reshape(3, -1) # 6x4 -> 3x8
new_shape_1 = torch.reshape(torch_tensor, (3, -1)) # 6x4 -> 3x8
print('reshape:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([3, 8]) twice
# view, view_as
new_shape_0 = torch_tensor.view(3, -1) # 6x4 -> 3x8
new_shape_1 = torch_tensor.view_as(torch.Tensor(3, 8)) # 6x4 -> 3x8  # torch.Tensor(3, 8) creates an uninitialized 3x8 tensor
print('view:',new_shape_0.shape,'view_as:',new_shape_1.shape)  # prints: torch.Size([3, 8]) twice
# transpose takes exactly two dimensions
new_shape_0 = torch_tensor.transpose(0, 1) # 6x4 -> 4x6
new_shape_1 = torch.transpose(torch_tensor, 0, 1) # 6x4 -> 4x6
print('transpose:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([4, 6]) twice
# t only works on tensors with at most two dimensions
new_shape_0 = torch_tensor.t() # 6x4 -> 4x6
new_shape_1 = torch.t(torch_tensor) # 6x4 -> 4x6
print('t:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([4, 6]) twice
# permute returns a view with reordered dimensions
new_shape_0 = torch_tensor.permute(1, 0) # 6x4 -> 4x6
new_shape_1 = torch.permute(torch_tensor, (1, 0)) # 6x4 -> 4x6
print('permute:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([4, 6]) twice
# flatten returns a view when possible, otherwise a copy
new_shape_0 = torch_tensor.flatten() # 6x4 -> 24
new_shape_1 = torch.flatten(torch_tensor) # 6x4 -> 24
print('flatten:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([24]) twice
# unsqueeze
new_shape_0 = torch_tensor.unsqueeze(0) # 6x4 -> 1x6x4
new_shape_1 = torch.unsqueeze(torch_tensor, 1) # 6x4 -> 6x1x4
print('unsqueeze:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([1, 6, 4]), torch.Size([6, 1, 4])
# squeeze (the == comparison yields a boolean tensor, confirming both calls give the same result)
new_shape_2 = new_shape_0.squeeze(0) == new_shape_0.squeeze()# 1x6x4 -> 6x4
new_shape_3 = torch.squeeze(new_shape_1) ==  new_shape_1.squeeze(1) # 6x1x4 -> 6x4
print('squeeze:',new_shape_2.shape,new_shape_3.shape)  # prints: torch.Size([6, 4]), torch.Size([6, 4])

new_tensor = torch.rand(1,1,1)
print(new_tensor,new_tensor.squeeze().shape)  # shape prints as torch.Size([]): squeeze() removed every size-1 dimension

# Try view on a transposed tensor --> t(), transpose(), permute() make the tensor non-contiguous; reshape(), flatten(), squeeze(), unsqueeze() do not
try:
    #tensor_t_view = torch_tensor.t().view(3, 8)
    #tensor_t_view = torch_tensor.transpose(0,1).view(3, -1)
    tensor_t_view = torch_tensor.permute(1, 0).view(3, -1)
except RuntimeError as e:
    print("\nError when using view on non-contiguous tensor:")
    print(e)
tensor_t_contiguous = torch_tensor.t().contiguous().view(3, 8)
print(tensor_t_contiguous.shape)  # prints: torch.Size([3, 8])

Reshaping 3-D tensors

print('--3-D reshaping--')
torch_tensor = torch.rand(6, 3, 4) # batch_size x seq_len x feature_size  6x3x4
# reshape
new_shape_0 = torch_tensor.reshape(2, 2, -1) # 6x3x4 -> 2x2x18
new_shape_1 = torch.reshape(torch_tensor, (2, 3, -1)) # 6x3x4 -> 2x3x12
print('reshape:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([2, 2, 18]), torch.Size([2, 3, 12])
# view, view_as
new_shape_0 = torch_tensor.view(2, 2, -1) # 6x3x4 -> 2x2x18
new_shape_1 = torch_tensor.view_as(torch.Tensor(2, 3, 12)) # 6x3x4 -> 2x3x12  # torch.Tensor(2, 3, 12) creates an uninitialized 2x3x12 tensor
print('view:',new_shape_0.shape,'view_as:',new_shape_1.shape)  # prints: torch.Size([2, 2, 18]), torch.Size([2, 3, 12])
# transpose
new_shape_0 = torch_tensor.transpose(0, 2) # 6x3x4 -> 4x3x6
new_shape_1 = torch.transpose(torch_tensor, 0, 2) # 6x3x4 -> 4x3x6
print('transpose:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([4, 3, 6]), torch.Size([4, 3, 6])
# permute
new_shape_0 = torch_tensor.permute(2, 1, 0) # 6x3x4 -> 4x3x6
new_shape_1 = torch.permute(torch_tensor, (2, 1, 0)) # 6x3x4 -> 4x3x6
print('permute:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([4, 3, 6]), torch.Size([4, 3, 6])
# flatten
new_shape_0 = torch_tensor.flatten() # 6x3x4 -> 72
new_shape_1 = torch.flatten(torch_tensor) # 6x3x4 -> 72
print('flatten:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([72]) twice
# unsqueeze
new_shape_0 = torch_tensor.unsqueeze(0) # 6x3x4 -> 1x6x3x4
new_shape_1 = torch.unsqueeze(torch_tensor, 1) # 6x3x4 -> 6x1x3x4
print('unsqueeze:',new_shape_0.shape,new_shape_1.shape)  # prints: torch.Size([1, 6, 3, 4]), torch.Size([6, 1, 3, 4])
# squeeze
new_shape_2 = new_shape_0.squeeze(0) == new_shape_0.squeeze()# 1x6x3x4 -> 6x3x4
new_shape_3 = torch.squeeze(new_shape_1) ==  new_shape_1.squeeze(1) # 6x1x3x4 -> 6x3x4
print('squeeze:',new_shape_2.shape,new_shape_3.shape)  # prints: torch.Size([6, 3, 4]), torch.Size([6, 3, 4])

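Summary:

reshape is more flexible than view and view_as: it handles non-contiguous tensors, reads better, and performs almost identically.
permute is more flexible than transpose and t: it can reorder any number of dimensions at once.

Recommended: reshape, permute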
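One caveat worth keeping in mind when picking between the two: reshape and permute can return the same shape while ordering the elements differently, so they are not interchangeable. This is a small illustration of my own, not taken from the referenced code.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
reshaped = x.reshape(3, 2)          # keeps memory order: [[0, 1], [2, 3], [4, 5]]
permuted = x.permute(1, 0)          # swaps the axes:     [[0, 3], [1, 4], [2, 5]]

print(reshaped.shape == permuted.shape)   # same shape
print(torch.equal(reshaped, permuted))    # different contents
print(permuted.is_contiguous())           # permute left the tensor non-contiguous
```

Use permute when the meaning of the axes changes (for example batch and sequence), and reshape when only the grouping of elements changes.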

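Broadcasting pitfalls

Broadcasting rules: official documentation
The problem typically shows up in target_Q = reward + self.gamma * target_Q * (1 - done)
reward: 4
target_Q: 4x1
done: 4
Adding the two then silently produces a 4x4 result.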
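The shape mismatch described above can be checked without running the full update. The sketch below uses torch.broadcast_shapes on placeholder tensors; the names reward and target_q and the factor 0.99 are illustrative.

```python
import torch

reward = torch.rand(4)        # shape (4,)
target_q = torch.rand(4, 1)   # shape (4, 1)

# Ask PyTorch what shape the two operands would broadcast to.
print(torch.broadcast_shapes(reward.shape, target_q.shape))  # torch.Size([4, 4])

result = reward + 0.99 * target_q
print(result.shape)           # torch.Size([4, 4]), not the intended (4, 1)
```

No error is raised, which is why this bug is easy to miss: the loss is still a scalar, just computed from the wrong pairs.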

### Broadcasting rules
import torch
'''
Broadcasting rule: compare the shapes of the two tensors dimension by dimension, starting from the last one.
1. If the tensors have a different number of dimensions, the one with fewer is padded with size-1 dimensions on the left until both have the same number.
2. For each pair of dimensions, the sizes must be equal, or one of them must be 1; a size-1 dimension is repeated to match the other.

'''
A = torch.rand(4, 3)
B = torch.rand(1, 3) # rule 2: A's first dimension is 4 and B's is 1, so B's first dimension is expanded to 4
Z = A * B   
X = A + B
print(Z.shape,X.shape)  # prints: torch.Size([4, 3]) twice

A = torch.rand(4, 1) # rule 2 applied to the second dimension: A->4x3
B = torch.rand(1, 3) # B->4x3
Z = A * B 
X = A + B
print(Z.shape,X.shape)  # prints: torch.Size([4, 3]) twice

A = torch.rand(4, 1) # A->4x3
B = torch.rand(   3) # rule 1: B->1x3, then rule 2: B->4x3
Z = A * B # rule 1
X = A + B
print(Z.shape,X.shape)  # prints: torch.Size([4, 3]) twice

A = torch.rand(5, 2, 1, 1) # A->5x2x3x1
B = torch.rand(   1, 3, 1) # rule 1: B->1x1x3x1, then rule 2: B->5x2x3x1
Z = A * B
X = A + B
print(Z.shape,X.shape)  # prints: torch.Size([5, 2, 3, 1]) twice

# Cannot broadcast
A = torch.rand(4, 3)
B = torch.rand(2, 3) # fails: the first dimensions are 4 and 2, and neither is 1
try:
    Z = A * B
    print(Z.shape)
    Z = A + B
    print(Z.shape)
except RuntimeError as e:
    print("\nError when broadcasting tensors:")
    print(e)

## Broadcasting pitfall: 4x1 * 4 = 4x4
'''
Typically shows up in target_Q = reward + self.gamma * target_Q * (1 - done)
reward: 4
target_Q: 4x1
done: 4
'''

A = torch.rand(4, 1)
B = torch.rand(   4)    

Z = A * B
X = A + B
print(Z.shape,X.shape)  # prints: torch.Size([4, 4]) twice

## Fix 1: squeeze the 4x1 tensor down to 4
A = torch.rand(4, 1).squeeze(1) #.flatten()
B = torch.rand(   4)

Z = A * B
X = A + B
print(Z.shape,X.shape)  # prints: torch.Size([4]), torch.Size([4])

## Fix 2: reshape the 1-D tensor to 4x1
A = torch.rand(4, 1)
B = torch.rand(   4).reshape(-1, 1)
Z = A * B
X = A + B
print(Z.shape,X.shape)  # prints: torch.Size([4, 1]), torch.Size([4, 1])

'''
Fix 2 is chosen here: later algorithms such as TD3 need two Q values, and fix 1 would then require two squeeze calls
'''

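Fix 2 is chosen here: later algorithms such as TD3 need two Q values, and fix 1 would then require two squeeze calls.
In other words, reshape the 1-D reward tensor into a 2-D one.

Recommended: reshape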
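Applied to a TD3-style target, fix 2 might look like the sketch below. All tensors are random placeholders; next_q1 and next_q2 stand in for the outputs of two target critics, and gamma = 0.99 is an assumed value.

```python
import torch

batch_size, gamma = 4, 0.99
reward = torch.rand(batch_size)               # as sampled from a buffer: (4,)
done = torch.tensor([0.0, 0.0, 1.0, 0.0])     # (4,), 1.0 marks a terminal step
next_q1 = torch.rand(batch_size, 1)           # placeholder twin critic outputs: (4, 1)
next_q2 = torch.rand(batch_size, 1)

# Bring the 1-D tensors to (batch, 1) once, so every later operation lines up.
reward = reward.reshape(-1, 1)
done = done.reshape(-1, 1)

target_q = reward + gamma * torch.min(next_q1, next_q2) * (1 - done)
print(target_q.shape)   # torch.Size([4, 1])
```

Only reward and done are reshaped; both Q tensors stay untouched, which is the advantage over squeezing each critic output.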

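max, argmax

The same applies to min and argmin.

.max(dim = 1)[0]    argmax(dim = -1)

max is used when explicitly selecting the value of the greedy (highest-valued) action.
.max(dim = 1)[0] returns the maximum values, and [1] returns their indices.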

## max, min, argmax, argmin
import torch
import numpy as np

torch_tensor = torch.rand(6, 4)
# max
max_val_0 = torch.max(torch_tensor) # maximum over all elements
max_val_1 = torch_tensor.max() # maximum over all elements
print('max:',max_val_0,max_val_1)  # e.g. tensor(0.9977), tensor(0.9977) (values are random)
max_val_0 = torch.max(torch_tensor, dim=1) # maximum value and index for each row
max_val_1 = torch_tensor.max(dim=1) # maximum value and index for each row
print('max:',max_val_1.values,max_val_1.indices)  
'''
max_val_1.values is commonly written as tensor.max(1)[0] #6x4 -> 6
max_val_1.indices is commonly written as tensor.max(dim=1)[1]
'''

# argmax
argmax_val_0 = torch.argmax(torch_tensor) # index of the maximum in the flattened tensor
argmax_val_1 = torch_tensor.argmax() # index of the maximum in the flattened tensor
print('argmax:',argmax_val_0,argmax_val_1)  # e.g. tensor(13), tensor(13)
argmax_val_0 = torch.argmax(torch_tensor, dim=1) # index of the maximum in each row
argmax_val_1 = torch_tensor.argmax(dim=1) # index of the maximum in each row #6x4 -> 6
print('argmax:',argmax_val_1,argmax_val_1.shape)  # e.g. tensor([1, 3, 0, 1, 1, 3])

argmax_val_1 = torch_tensor.argmax(dim=1,keepdim=True) #  6x4 -> 6x1 # keepdim=True keeps the reduced dimension with size 1
print('argmax:',argmax_val_1,argmax_val_1.shape)  
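In a DQN-style update, these calls are typically used as in the sketch below. next_q is a random placeholder for a target network's output, and keepdim=True is used so the result stays (batch, 1) and avoids the broadcasting pitfall discussed earlier.

```python
import torch

torch.manual_seed(0)
next_q = torch.rand(6, 4)   # placeholder target-network output: batch of 6, 4 actions

max_q = next_q.max(dim=1, keepdim=True)[0]            # value of the greedy action, 6x1
greedy_action = next_q.argmax(dim=1, keepdim=True)    # index of the greedy action, 6x1

print(max_q.shape, greedy_action.shape)   # torch.Size([6, 1]) torch.Size([6, 1])
```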

gather

self.agent.Qnet(state).gather(1, action.long())

gather: collect
In discrete action spaces, this is the function used to obtain the current Q value of the action actually taken.
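# gather
'''
gather(input, dim, index, *, sparse_grad=False, out=None) → Tensor
input (Tensor) – the source tensor
dim (int) – the axis along which to index
index (LongTensor) – the indices of the elements to gather
out (Tensor, optional) – the destination tensor
In short: collect the values of input along dim at the positions given by index, and write them to out
'''
torch_tensor = torch.rand(6, 4)
print(torch_tensor)
index = torch.LongTensor([0, 2, 3, 1, 1, 3]) # size = 6
gather_val = torch.gather(torch_tensor, 1, index.reshape(-1,1)) # 6x4 -> 6x1
print('gather:',gather_val)  
'''
With 6 rows and 4 columns, rows are batch_size and columns are act_dim
Along the column dimension, pick the value at each index
Row 1 takes column 0, row 2 takes column 2, row 3 takes column 3...
'''
gather_val = torch_tensor.gather(1, index.reshape(-1,1)) # 6x4 -> 6x1
'''
Note: index must be a LongTensor (torch.int64)
index must have the same number of dimensions as input, and index.size(d) <= input.size(d) for every dimension d other than dim.
The elements of index must lie within the valid index range of input along dim.
'''

Note: index must be a LongTensor.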
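Replay buffers often store actions as floats, which is why the one-liner above calls action.long(). The following sketch uses random placeholder Q values (not a real Qnet) to show the conversion, and an equivalent form using plain indexing.

```python
import torch

torch.manual_seed(0)
q_values = torch.rand(6, 4)   # placeholder for Qnet(state): batch of 6, 4 actions
action = torch.tensor([0.0, 2.0, 3.0, 1.0, 1.0, 3.0]).reshape(-1, 1)  # float actions, 6x1

try:
    q_values.gather(1, action)           # float index is rejected
    float_index_failed = False
except (RuntimeError, TypeError):
    float_index_failed = True

current_q = q_values.gather(1, action.long())                    # 6x1
same_q = q_values[torch.arange(6), action.long().squeeze(1)]     # same values, shape (6,)
print(float_index_failed, current_q.shape, same_q.shape)
```

gather keeps the (batch, 1) shape, which matches the 2-D convention used for rewards and targets in this post.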

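cat, stack

cat([s,a],dim = 1)    stack([s,s],dim= -1)

cat: concatenate along an existing dimension; default dim = 0
stack: stack along a new dimension; default dim = 0
cat is mostly seen when updating the critic network, to concatenate state and action
stack is mostly seen in multi-agent code, where the agents' log-probabilities are stacked along one dimension and then summed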

# cat
'''
cat(tensors, dim=0) → Tensor
tensors (sequence of Tensors) – the tensors to concatenate
dim (int) – the existing dimension to concatenate along
'''
torch_tensor = torch.rand(6, 4)
cat_val = torch.cat([torch_tensor, torch_tensor], dim=0) # 6x4 -> 12x4 # dim=0 is the default
print('cat:',cat_val,cat_val.shape)

# stack
'''
stack(tensors, dim=0) → Tensor
tensors (sequence of Tensors) – the tensors to stack
dim (int) – the position of the new dimension
'''
torch_tensor = torch.rand(6, 4)
stack_val = torch.stack([torch_tensor, torch_tensor], dim=0) # two 6x4 tensors -> 2x6x4 # dim=0 is the default
print('stack:',stack_val,stack_val.shape)
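The two use cases mentioned above could look like the following sketch. The Critic class, its layer sizes, and the three-agent setup are hypothetical and only illustrate where cat and stack appear.

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Hypothetical Q(s, a) network; layer sizes are arbitrary."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, s, a):
        # Concatenate along the feature dimension: (B, obs) + (B, act) -> (B, obs + act)
        return self.net(torch.cat([s, a], dim=1))


critic = Critic(obs_dim=3, act_dim=2)
s, a = torch.rand(6, 3), torch.rand(6, 2)
q = critic(s, a)                                        # 6x1

# One log-probability vector per agent, stacked on a new last dimension and summed.
log_probs = [torch.rand(6) for _ in range(3)]
joint = torch.stack(log_probs, dim=-1).sum(dim=-1)      # 6x3 -> 6
print(q.shape, joint.shape)
```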

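Other details

Input to the action-selection network

When feeding a neural network, prefer 2-D input and 2-D output to speed up network updates.

That is, for the ob in self.agent.Qnet(ob), both a 1-D ob and a 2-D ob work.
However, PyTorch layers are designed around batched input of shape (batch_size, feature_size), and PyTorch's matrix operations are well optimized, so 2-D input is used here.
In a single test of mine, on both GPU and CPU, 2-D input took about 3/4 of the time of 1-D input, i.e. roughly 25% less time.

So use 2-D input and 2-D output for the network wherever possible.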
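A minimal sketch of action selection with 2-D input, assuming a discrete action space. q_net is a hypothetical nn.Linear standing in for self.agent.Qnet, and the observation is random.

```python
import numpy as np
import torch
import torch.nn as nn

q_net = nn.Linear(4, 2)                       # hypothetical stand-in for self.agent.Qnet
ob = np.random.rand(4).astype(np.float32)     # single observation from the env, shape (4,)

# Add the batch dimension before the forward pass: (4,) -> (1, 4)
ob_2d = torch.as_tensor(ob, dtype=torch.float32).reshape(1, -1)
with torch.no_grad():
    q = q_net(ob_2d)                          # (1, 2)
action = q.argmax(dim=1).item()               # plain Python int to pass to env.step
print(ob_2d.shape, q.shape, action)
```

This keeps the single-sample path and the batched training path on the same shape convention, so the network code never needs to special-case 1-D input.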
