(17-5) A Reinforcement-Learning-Based Autonomous Driving System: Deep Learning Models

17.6  Deep Learning Models

The "models" directory contains several program files that define the project's deep learning models, including an autoencoder, a variational autoencoder (VAE), and models related to reinforcement learning. These models process perception data, learn latent representations, generate images, and support the reinforcement learning task.

17.6.1  The Encoder

In this project, the encoder (Encoder) converts input data (typically images) into low-dimensional embedding vectors. Its main job is to extract features from the input and encode them into a compact representation. Specifically, the encoder serves the following purposes:

  1. Feature extraction: the encoder extracts useful features from the input through a series of convolutional layers (here, strided convolutions take the place of pooling). These features capture both local and global structure in the input data.
  2. Dimensionality reduction: the encoder maps high-dimensional input (e.g., an image) into a low-dimensional embedding space. This compact representation reduces storage and computational cost.
  3. Feature learning: the encoding process can be viewed as feature learning, in which the model learns a representation of the input suited to downstream tasks such as reconstruction, classification, or semantic segmentation.
  4. Information compression: by encoding the input as an embedding vector, the encoder compresses the information, reducing storage requirements while preserving the input's essential content.

The file models/autoencoder.py defines a deep autoencoder (Autoencoder) and an autoencoder for semantic segmentation (AutoencoderSEM). The implementation of models/autoencoder.py proceeds as follows.

(1) Create the classes ConvBlock and DeConvBlock, which define the convolutional and transposed-convolutional building blocks used to construct the encoder and decoder layers. ConvBlock consists of a convolution, batch normalization, and a ReLU activation; DeConvBlock consists of a transposed convolution, batch normalization, and a ReLU activation. The implementation is shown below.

import torch
from torch import nn
import pytorch_lightning as pl

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # convolution -> batch norm -> ReLU
        self.conv = nn.Sequential(nn.Conv2d(in_channels, out_channels,
                                            kernel_size, stride, padding),
                                  nn.BatchNorm2d(out_channels),
                                  nn.ReLU(inplace=True))
    def forward(self, x):
        return self.conv(x)
    
class DeConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, output_padding=0):
        super().__init__()
        self.deconv = nn.Sequential(nn.ConvTranspose2d(in_channels, out_channels,
                                                       kernel_size, stride, padding,
                                                       output_padding),
                                    nn.BatchNorm2d(out_channels),
                                    nn.ReLU(inplace=True))
    def forward(self, x):
        return self.deconv(x)
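Both blocks follow the standard size formulas: a convolution outputs (H + 2p - k)//s + 1 pixels per spatial dimension, and a transposed convolution outputs (H - 1)*s - 2p + k + op. A minimal sanity check with hypothetical tensors (assuming the imports above):

block = ConvBlock(4, 32, kernel_size=5, stride=2, padding=2)
x = torch.randn(1, 4, 256, 256)
y = block(x)
print(y.shape)           # torch.Size([1, 32, 128, 128]): (256 + 4 - 5)//2 + 1 = 128

deblock = DeConvBlock(32, 4, kernel_size=5, stride=2, padding=2, output_padding=1)
print(deblock(y).shape)  # torch.Size([1, 4, 256, 256]): (128 - 1)*2 - 4 + 5 + 1 = 256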

(2) Create the encoder class Encoder, which encodes the input into a low-dimensional embedding vector. The input is a 4-channel tensor of spatial size (256, 256) (an image plus additional information); five convolutional blocks (conv1 through conv5) extract its features, and a fully connected layer (fc) maps them to an embedding of the specified dimension (emb_size), which is finally L2-normalized. The implementation is shown below.

class Encoder(nn.Module):
    def __init__(self, input_size=(256, 256), emb_size=256):
        super().__init__()
        self.conv1 = ConvBlock(4, 32, kernel_size=5, stride=2, padding=2)
        self.conv2 = ConvBlock(32, 64, kernel_size=5, stride=2, padding=2)
        self.conv3 = ConvBlock(64, 128, kernel_size=3, stride=2, padding=1)
        self.conv4 = ConvBlock(128, 256, kernel_size=3, stride=2, padding=1)
        self.conv5 = ConvBlock(256, 64, kernel_size=3, stride=2, padding=1)

        self.y_final = input_size[0] // (2 ** 5)
        self.x_final = input_size[1] // (2 ** 5)

        self.fc = nn.Linear(64*self.x_final*self.y_final, emb_size)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)

        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        x = nn.functional.normalize(x, dim=1)
        return x
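A minimal shape check for the encoder (hypothetical input batch):

encoder = Encoder(input_size=(256, 256), emb_size=256)
x = torch.randn(2, 4, 256, 256)   # batch of two 4-channel inputs
emb = encoder(x)
print(emb.shape)                  # torch.Size([2, 256])
print(emb.norm(dim=1))            # each row has unit L2 norm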

(3) Create the decoder class Decoder, which decodes an embedding vector back into an image of the same size as the input. It consists of a fully connected layer (fc) that expands the embedding into an intermediate feature map, followed by transposed-convolution blocks (deconv1 through deconv5) that upsample it; a final convolution (convfinal) produces the output image. If additional data is used (use_additional_data), the decoder also includes fully connected heads for that data (fc_data and fc_junction). The implementation is shown below.

class Decoder(nn.Module):
    def __init__(self, input_size=(256, 256), emb_size=256, out_ch=29, use_additional_data=True):
        super().__init__()
        self.y_inicial = input_size[0] // (2 ** 5)
        self.x_inicial = input_size[1] // (2 ** 5)

        self.fc = nn.Linear(emb_size, 64*self.x_inicial*self.y_inicial)

        self.deconv1 = DeConvBlock(64, 256, kernel_size=5, stride=2, padding=2, output_padding=1)
        self.deconv2 = DeConvBlock(256, 128, kernel_size=5, stride=2, padding=2, output_padding=1)
        self.deconv3 = DeConvBlock(128, 64, kernel_size=5, stride=2, padding=2, output_padding=1)
        self.deconv4 = DeConvBlock(64, 32, kernel_size=5, stride=2, padding=2, output_padding=1)
        self.deconv5 = DeConvBlock(32, 16, kernel_size=6, stride=2, padding=2, output_padding=0)
        self.convfinal = nn.Conv2d(16, out_ch, kernel_size=3, stride=1, padding=1)

        self.use_additional_data = use_additional_data
        if self.use_additional_data:
            self.fc_data = nn.Linear(emb_size, 3)
            self.fc_junction = nn.Linear(emb_size, 1)

    def forward(self, emb):
        x = self.fc(emb)
        x = x.view(-1, 64, self.y_inicial, self.x_inicial)
        x = self.deconv1(x)
        x = self.deconv2(x)
        x = self.deconv3(x)
        x = self.deconv4(x)
        x = self.deconv5(x)
        x = self.convfinal(x)
        if self.use_additional_data:
            data = self.fc_data(emb)
            junction = self.fc_junction(emb)
            return x, data, junction
        else:
            return x, None, None
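A matching shape check for the decoder (hypothetical embedding batch):

decoder = Decoder(input_size=(256, 256), emb_size=256, out_ch=29)
emb = torch.randn(2, 256)
img, data, junction = decoder(emb)
print(img.shape)        # torch.Size([2, 29, 256, 256])
print(data.shape)       # torch.Size([2, 3])
print(junction.shape)   # torch.Size([2, 1])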

(4) Create the model class Autoencoder, a complete autoencoder combining the encoder and decoder. It can be trained to reconstruct its input, using the AdamW optimizer together with a learning-rate scheduler. The implementation is shown below.

class Autoencoder(pl.LightningModule):
    def __init__(self, input_size=(256, 256), emb_size=256, out_ch=1, lr=1e-3, weights=(0.8, 0.1, 0.1), use_additional_data=True):
        super().__init__()
        self.save_hyperparameters()
        self.encoder = Encoder(input_size, emb_size)
        self.decoder = Decoder(input_size, emb_size, out_ch, use_additional_data)
        self.lr = lr
        self.weights = weights
        self.use_additional_data = use_additional_data

    def encode(self, x):
        return self.encoder(x)
    
    def decode(self, emb):
        x, data, junction = self.decoder(emb)
        x = torch.sigmoid(x).squeeze()
        return x, data, junction

    def forward(self, x):
        x = self.encode(x)
        x, data, junction = self.decode(x)
        return x, data, junction
    
    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.lr, weight_decay=0.05)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.2, patience=20, min_lr=5e-5)
        return {"optimizer": optimizer, "lr_scheduler": scheduler, "monitor": "val_loss"}
    
    def training_step(self, batch, batch_idx):
        x, sem, data, junction = batch

        sem_hat, data_hat, junction_hat = self(x)
        loss_sem = nn.functional.mse_loss(sem_hat, sem)
        if self.use_additional_data:
            loss_data = nn.functional.mse_loss(data_hat, data)
            loss_junction = nn.functional.binary_cross_entropy_with_logits(junction_hat, junction)
            loss = self.weights[0]*loss_sem + self.weights[1]*loss_data + self.weights[2]*loss_junction
        else:
            loss = loss_sem
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, sem, data, junction = batch

        sem_hat, data_hat, junction_hat = self(x)
        loss_sem = nn.functional.mse_loss(sem_hat, sem)
        if self.use_additional_data:
            loss_data = nn.functional.mse_loss(data_hat, data)
            loss_junction = nn.functional.binary_cross_entropy_with_logits(junction_hat, junction)
            loss = self.weights[0]*loss_sem + self.weights[1]*loss_data + self.weights[2]*loss_junction
        else:
            loss = loss_sem
        self.log('val_loss', loss, on_step=False, on_epoch=True, prog_bar=True, logger=True)
        return loss
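A minimal training sketch under assumed data shapes (the TensorDataset below is hypothetical; in the project, batches of (x, sem, data, junction) come from its own data pipeline):

from torch.utils.data import DataLoader, TensorDataset

x = torch.randn(8, 4, 256, 256)   # inputs
sem = torch.rand(8, 256, 256)     # reconstruction targets in [0, 1]
data = torch.randn(8, 3)          # auxiliary regression targets
junction = torch.rand(8, 1)       # binary junction labels
ds = TensorDataset(x, sem, data, junction)

model = Autoencoder(out_ch=1)
trainer = pl.Trainer(max_epochs=1, log_every_n_steps=1)
trainer.fit(model, DataLoader(ds, batch_size=4), DataLoader(ds, batch_size=4))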

(5) Create the model class AutoencoderSEM, an autoencoder for semantic segmentation that likewise combines the encoder and decoder. It is trained with a cross-entropy loss to perform pixel-wise classification. The implementation is shown below.

class AutoencoderSEM(pl.LightningModule):
    def __init__(self, input_size=(256, 256), emb_size=256, num_classes=29, lr=1e-3, weights=(0.8, 0.1, 0.1), use_additional_data=True):
        super().__init__()
        self.save_hyperparameters()
        self.encoder = Encoder(input_size, emb_size)
        self.decoder = Decoder(input_size, emb_size, num_classes, use_additional_data)
        self.lr = lr
        self.weights = weights
        self.use_additional_data = use_additional_data

    def encode(self, x):
        return self.encoder(x)
    
    def decode(self, emb):
        x, data, junction = self.decoder(emb)
        return x, data, junction

    def forward(self, x):
        x = self.encode(x)
        x, data, junction = self.decode(x)
        return x, data, junction
    
    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.lr, weight_decay=0.05)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.2, patience=20, min_lr=5e-5)
        return {"optimizer": optimizer, "lr_scheduler": scheduler, "monitor": "val_loss"}
    
    def training_step(self, batch, batch_idx):
        x, sem, data, junction = batch

        sem_hat, data_hat, junction_hat = self(x)
        loss_sem = nn.functional.cross_entropy(sem_hat, sem)
        if self.use_additional_data:
            loss_data = nn.functional.mse_loss(data_hat, data)
            loss_junction = nn.functional.binary_cross_entropy_with_logits(junction_hat, junction)
            loss = self.weights[0]*loss_sem + self.weights[1]*loss_data + self.weights[2]*loss_junction
        else:
            loss = loss_sem
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, sem, data, junction = batch

        sem_hat, data_hat, junction_hat = self(x)
        loss_sem = nn.functional.cross_entropy(sem_hat, sem)
        if self.use_additional_data:
            loss_data = nn.functional.mse_loss(data_hat, data)
            loss_junction = nn.functional.binary_cross_entropy_with_logits(junction_hat, junction)
            loss = self.weights[0]*loss_sem + self.weights[1]*loss_data + self.weights[2]*loss_junction
        else:
            loss = loss_sem
        self.log('val_loss', loss, on_step=False, on_epoch=True, prog_bar=True, logger=True)
        return loss
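Note that nn.functional.cross_entropy expects per-pixel class logits of shape (batch, num_classes, H, W) and integer class targets of shape (batch, H, W). A brief inference sketch with hypothetical inputs:

model = AutoencoderSEM(num_classes=29)
x = torch.randn(2, 4, 256, 256)
logits, data, junction = model(x)   # logits: (2, 29, 256, 256)
pred = logits.argmax(dim=1)         # (2, 256, 256) per-pixel class indices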

In this project, the encoder supports two different tasks:

  1. In the Autoencoder model, the encoder compresses the input image into an embedding vector, and the decoder reconstructs the image from that vector. This pipeline is used for feature learning and reconstruction.
  2. In the AutoencoderSEM model, the encoder produces an embedding that is then used for semantic segmentation. The encoder learns to extract semantic information about objects and the scene from the input image so that pixel-level labels can be predicted.

In short, the encoder plays a key role in this project: it turns input data into useful low-dimensional embedding vectors that support the downstream tasks.

17.6.2  Variational Autoencoder

The file models/vae.py defines a variational autoencoder (Variational Autoencoder, VAE). A VAE is a generative model that learns a latent representation of the data and can generate new samples; by combining an encoder and a decoder it performs both encoding and generation, making it a powerful generative model. The implementation is shown below.

import torch
from torch import nn
import pytorch_lightning as pl

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_channels, out_channels,
                                            kernel_size, stride, padding),
                                  nn.BatchNorm2d(out_channels),
                                  nn.ReLU(inplace=True))
    def forward(self, x):
        return self.conv(x)
    
class DeConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, output_padding=0, last=False):
        super().__init__()
        if not last:
            self.deconv = nn.Sequential(nn.ConvTranspose2d(in_channels, out_channels,
                                                       kernel_size, stride, padding,
                                                       output_padding),
                                        nn.BatchNorm2d(out_channels),
                                        nn.ReLU(inplace=True))
        else:
            self.deconv = nn.Sequential(nn.ConvTranspose2d(in_channels, out_channels,
                                                         kernel_size, stride, padding,
                                                            output_padding))
    def forward(self, x):
        return self.deconv(x)
    
class EncoderVAE(nn.Module):
    def __init__(self, input_size=(256, 256), emb_size=256):
        super().__init__()
        self.conv1 = ConvBlock(4, 32, kernel_size=4, stride=2, padding=1)
        self.conv2 = ConvBlock(32, 64, kernel_size=4, stride=2, padding=1)
        self.conv3 = ConvBlock(64, 128, kernel_size=4, stride=2, padding=1)
        self.conv4 = ConvBlock(128, 256, kernel_size=4, stride=2, padding=1)
        self.conv5 = ConvBlock(256, 64, kernel_size=4, stride=2, padding=1)

        self.y_final = input_size[0] // (2 ** 5)
        self.x_final = input_size[1] // (2 ** 5)

        self.fc = nn.Linear(64*self.x_final*self.y_final, emb_size)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)

        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        return x
    
class DecoderVae(nn.Module):
    def __init__(self, input_size=(256, 256), emb_size=256, out_ch=4):
        super().__init__()
        self.y_inicial = input_size[0] // (2 ** 5)
        self.x_inicial = input_size[1] // (2 ** 5)

        self.fc = nn.Linear(emb_size, 64*self.x_inicial*self.y_inicial)

        self.deconv1 = DeConvBlock(64, 256, kernel_size=4, stride=2, padding=1, output_padding=0)
        self.deconv2 = DeConvBlock(256, 128, kernel_size=4, stride=2, padding=1, output_padding=0)
        self.deconv3 = DeConvBlock(128, 64, kernel_size=4, stride=2, padding=1, output_padding=0)
        self.deconv4 = DeConvBlock(64, 32, kernel_size=4, stride=2, padding=1, output_padding=0)
        self.deconv5 = DeConvBlock(32, out_ch, kernel_size=4, stride=2, padding=1, output_padding=0, last=True)

    def forward(self, x):
        x = self.fc(x)
        x = x.view(-1, 64, self.y_inicial, self.x_inicial)
        x = self.deconv1(x)
        x = self.deconv2(x)
        x = self.deconv3(x)
        x = self.deconv4(x)
        x = torch.sigmoid(self.deconv5(x)).squeeze()
        return x
    
class VAE(pl.LightningModule):
    def __init__(self, input_size=(256, 256), emb_size=256, lr=1e-3, out_ch=4):
        super().__init__()
        self.save_hyperparameters()
        self.encoder = EncoderVAE(input_size, emb_size)
        self.decoder = DecoderVae(input_size, emb_size, out_ch)
        self.fc_mu = nn.Linear(emb_size, emb_size)
        self.fc_var = nn.Linear(emb_size, emb_size)
        self.lr = lr
        
    def encode(self, x):
        x = self.encoder(x)
        mu = self.fc_mu(x)
        log_var = self.fc_var(x)
        std = torch.exp(log_var / 2)
        q = torch.distributions.Normal(mu, std)
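        # .sample() is non-differentiable, which is fine for inference;
        # training_step re-encodes and uses rsample() so gradients can flow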
        z = q.sample()
        return z
    
    def decode(self, z):
        x_hat = self.decoder(z)
        return x_hat

    def forward(self, x):
        z = self.encode(x)
        x_hat = self.decode(z)
        return x_hat
    
    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.lr, weight_decay=0.05)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.2, patience=20, min_lr=5e-5)
        return {"optimizer": optimizer, "lr_scheduler": scheduler, "monitor": "val_loss"}
    
    def training_step(self, batch, batch_idx):
        x, y, _, _ = batch

        # encode x to get the mu and variance parameters
        x_encoded = self.encoder(x)
        mu, log_var = self.fc_mu(x_encoded), self.fc_var(x_encoded)

        # sample z from q
        std = torch.exp(log_var / 2)
        q = torch.distributions.Normal(mu, std)
        z = q.rsample()

        # decoded 
        x_hat = self.decoder(z)

        # reconstruction loss
        recon_loss = nn.functional.mse_loss(x_hat, y)

        # kl
        kld_loss = torch.mean(-0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim = 1), dim = 0)

        loss = kld_loss + recon_loss

        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss
    
    def validation_step(self, batch, batch_idx):
        x, y, _, _ = batch

        # encode x to get the mu and variance parameters
        x_encoded = self.encoder(x)
        mu, log_var = self.fc_mu(x_encoded), self.fc_var(x_encoded)

        # sample z from q
        std = torch.exp(log_var / 2)
        q = torch.distributions.Normal(mu, std)
        z = q.rsample()

        # decoded 
        x_hat = self.decoder(z)

        # reconstruction loss
        recon_loss = nn.functional.mse_loss(x_hat, y)

        # kl
        kld_loss = torch.mean(-0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim = 1), dim = 0)

        loss = kld_loss + recon_loss

        self.log('val_loss', loss, on_step=False, on_epoch=True, prog_bar=True, logger=True)
        return loss

The code above is explained as follows:

  1. Classes ConvBlock and DeConvBlock: define the convolutional and transposed-convolutional blocks used to build the encoder and decoder layers, combining convolution (or transposed convolution), batch normalization, and an activation function. (DeConvBlock's last flag omits the normalization and activation on the final layer.)
  2. Class EncoderVAE: the encoder, which maps the input to a feature vector. A stack of strided convolutions progressively reduces the input image to a low-dimensional representation, which a fully connected layer projects to emb_size dimensions. (The mean and variance heads live in the VAE class itself.)
  3. Class DecoderVae: the decoder, which reconstructs the input from a latent vector. A fully connected layer and a stack of transposed convolutions map the latent vector back to the input image's shape, and a final sigmoid bounds the output to [0, 1].
  4. Class VAE: the main VAE model, containing the encoder, the decoder, and the fully connected layers for the mean (fc_mu) and log-variance (fc_var). It defines the forward pass: encode, sample, decode.
  5. encode method: encodes the input, computes the mean and log-variance, samples from the resulting Normal distribution, and returns the latent vector z.
  6. decode method: generates a reconstructed image from the latent vector z.
  7. configure_optimizers method: configures the optimizer and learning-rate scheduler.
  8. training_step and validation_step methods: define each training/validation step, computing the reconstruction loss (MSE) and the KL-divergence loss and summing them into the total loss; sampling uses rsample() (the reparameterization trick) so that gradients can flow through the sampling step.
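For reference, the kld_loss line implements the closed-form KL divergence between the approximate posterior $\mathcal{N}(\mu,\sigma^{2})$ and the standard normal prior, averaged over the batch:

$$\mathrm{KL}\left(\mathcal{N}(\mu,\sigma^{2})\,\|\,\mathcal{N}(0,I)\right) = -\frac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_{j}^{2} - \mu_{j}^{2} - \sigma_{j}^{2}\right)$$

Once trained, new images can be generated by decoding samples drawn from the prior. A minimal sketch (in practice vae would be loaded from a trained checkpoint):

vae = VAE()
z = torch.randn(1, 256)   # draw z from the standard normal prior
img = vae.decode(z)       # sigmoid output in [0, 1]
print(img.shape)          # torch.Size([4, 256, 256]) after squeeze()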

17.6.3  Defining the Reinforcement Learning Models

The file models/agent_parts.py defines the neural network models used in deep reinforcement learning: the Actor, Critic, TwinCritic, and Environment models. Such models are the building blocks of reinforcement learning algorithms like Deep Deterministic Policy Gradient (DDPG) and its twin-critic variants. The implementation is shown below.

import torch
from torch import nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, emb_size=256):
        super().__init__()
        self.fc1 = nn.Linear(emb_size+2, 256)
        self.fc2 = nn.Linear(256, 256)

        self.fc3_left = nn.Linear(256, 256)
        self.out_left = nn.Linear(256, 2)

        self.fc3_right = nn.Linear(256, 256)
        self.out_right = nn.Linear(256, 2)

        self.fc3_straight = nn.Linear(256, 256)
        self.out_straight = nn.Linear(256, 2)

    def forward(self, emb, command, action):
        x = torch.cat((emb, action), dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))

        x_left = F.relu(self.fc3_left(x))
        x_left = self.out_left(x_left)

        x_straight = F.relu(self.fc3_straight(x))
        x_straight = self.out_straight(x_straight)

        x_right = F.relu(self.fc3_right(x))
        x_right = self.out_right(x_right)

        x = torch.stack((x_left, x_straight, x_right), dim=0)

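        # pick the branch (0=left, 1=straight, 2=right) selected by each sample's command index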
        x = torch.gather(x, 0, command.expand((-1,2)).view(1,-1,2)).squeeze(0)

        x = torch.tanh(x)

        return x
    
class Critic(nn.Module):
    def __init__(self, emb_size=256):
        super().__init__()
        self.fc1 = nn.Linear(emb_size+2, 256)
        self.fc2 = nn.Linear(256, 256)

        self.fc3_left = nn.Linear(256, 256)
        self.out_left = nn.Linear(256, 1)

        self.fc3_right = nn.Linear(256, 256)
        self.out_right = nn.Linear(256, 1)

        self.fc3_straight = nn.Linear(256, 256)
        self.out_straight = nn.Linear(256, 1)

    def forward(self, emb, command, action):
        x = torch.cat((emb, action), dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))

        x_left = F.relu(self.fc3_left(x))
        x_left = self.out_left(x_left)

        x_straight = F.relu(self.fc3_straight(x))
        x_straight = self.out_straight(x_straight)

        x_right = F.relu(self.fc3_right(x))
        x_right = self.out_right(x_right)

        x = torch.stack((x_left, x_straight, x_right), dim=0)

        x = torch.gather(x, 0, command.view(1,-1,1)).squeeze(0)

        return x
    
class TwinCritic(nn.Module):
    def __init__(self, emb_size=256):
        super().__init__()
        self.critic1 = Critic(emb_size)
        self.critic2 = Critic(emb_size)

    def forward(self, emb, command, action):
        return self.critic1(emb, command, action), self.critic2(emb, command, action)
    
class Environment(nn.Module):
    def __init__(self, emb_size=256):
        super().__init__()
        self.fc1_transition = nn.Linear(2, 128)
        self.fc2_transition = nn.Linear(128+emb_size, 512)
        self.out_transition = nn.Linear(512, emb_size)

        self.fc1_reward = nn.Linear(2, 128)
        self.fc2_reward = nn.Linear(128+emb_size*2, 512)
        self.fc3_reward = nn.Linear(512, 256)
        self.out_reward = nn.Linear(256, 1)
    
    def forward(self, emb, action):
        o = self.transition_model(emb, action)
        r = self.reward_model(emb, action, o)

        return o, r
    
    def transition_model(self, emb, action):
        o = F.relu(self.fc1_transition(action))
        o = torch.cat((o, emb), dim=1)
        o = F.relu(self.fc2_transition(o))
        o = self.out_transition(o)
        return o
    
    def reward_model(self, emb, action, next_emb):
        r = F.relu(self.fc1_reward(action))
        r = torch.cat((r, emb, next_emb), dim=1)
        r = F.relu(self.fc2_reward(r))
        r = F.relu(self.fc3_reward(r))
        r = self.out_reward(r)
        return r

The code above is explained as follows:

(1) The Actor model

  1. A neural network representing the policy.
  2. Its inputs are the state embedding (emb), the navigation command (command), and an action vector (action), which is concatenated with the embedding.
  3. It consists of shared fully connected layers (fc1, fc2), per-branch layers (fc3_left, fc3_straight, fc3_right), and per-branch output layers (out_left, out_straight, out_right).
  4. The forward pass computes a candidate action for each of the three command branches (turn left, go straight, turn right) and uses the command index to gather the branch that applies to each sample.
  5. Hidden layers use ReLU activations, and the selected output is bounded with tanh.

(2) The Critic model

  1. A neural network that estimates the value of a state-action pair.
  2. Its inputs are the state embedding (emb), the navigation command (command), and the action (action).
  3. It consists of shared fully connected layers (fc1, fc2), per-branch layers (fc3_left, fc3_straight, fc3_right), and per-branch output layers (out_left, out_straight, out_right).
  4. The forward pass computes a value estimate for each command branch and gathers the one matching each sample's command.

(3) The TwinCritic model

  1. Composed of two Critic models, each independently estimating the value of a state-action pair.
  2. Using two independent critics helps curb value overestimation and stabilizes training.

(4) The Environment model

  1. A neural network that models the environment's dynamics and reward.
  2. It contains a transition model and a reward model.
  3. The transition model predicts the next state embedding (emb) from the current embedding and the action.
  4. The reward model predicts the reward signal from the current embedding, the action, and the predicted next embedding.
  5. Both consist of several fully connected layers plus an output layer.

In a reinforcement learning pipeline, these models serve as the agent's policy (Actor), the value estimators for state-action pairs (Critic and TwinCritic), and a learned model of the environment's dynamics and reward (Environment). They are the key building blocks for implementing deep reinforcement learning algorithms.
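A minimal usage sketch under assumed shapes (hypothetical tensors; command is a LongTensor of branch indices 0/1/2 with shape (batch, 1), and the action is a 2-dimensional control vector, e.g. steering and throttle):

actor = Actor(emb_size=256)
twin = TwinCritic(emb_size=256)
env_model = Environment(emb_size=256)

emb = torch.randn(4, 256)               # state embeddings
command = torch.randint(0, 3, (4, 1))   # 0=left, 1=straight, 2=right
action = torch.rand(4, 2) * 2 - 1       # control vector in [-1, 1]

a = actor(emb, command, action)         # (4, 2), bounded by tanh
q1, q2 = twin(emb, command, a)          # two independent (4, 1) value estimates
next_emb, reward = env_model(emb, a)    # predicted next embedding and reward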

To be continued.
