Biomacromolecule Platform (5)

南山夜梦

Column: Biomacromolecule Platform (生物大分子平台). Tags: deep learning, python, artificial intelligence

Article link: https://blog.csdn.net/fengjiuxin/article/details/120985644

2021SC@SDUSC

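Contents

Biomacromolecule Platform (5)
0 Preface
1 Reinforcement learning code
- 1.1 Definition of reinforcement learning
- 1.2 Reading the reinforcement learning code
2 Data visualization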

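0 Preface

This week's main tasks were reading the code for the reinforcement learning network, then reading the data visualization code and working through motif visualization.

1 Reinforcement learning code

1.1 Definition of reinforcement learning

Reinforcement learning is a machine learning approach in which an agent, by interacting with an environment, learns a policy that maximizes its return or achieves a specific goal.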
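
As a small illustration of what "maximizing return" means, the sketch below computes the discounted return of one reward sequence. The function name `discounted_return` and the reward list are made up for this example; the default discount factor 0.999 matches the GAMMA used later in this post.

```python
def discounted_return(rewards, gamma=0.999):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Walk backwards so each step is G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: the agent receives +1 on each of three steps.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

A policy that keeps the episode going longer collects more +1 rewards, so its return is larger; that is the quantity the agent tries to increase.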

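1.2 Reading the reinforcement learning code

Libraries used
Library note: itertools is the iterator library; its lazy iterators can substantially reduce the memory use and running time of some algorithms and code snippets.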
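
To show why `count` is handy, here is a minimal sketch of an open-ended loop of the kind an episode loop needs, where the number of steps is not known in advance. `run_episode` and `max_steps` are hypothetical names, and the stop condition is only a stand-in for the environment reporting that the episode is done.

```python
from itertools import count


def run_episode(max_steps):
    """Loop until a stop condition fires; return how many steps were taken."""
    # count() yields 0, 1, 2, ... lazily, so no list of indices is ever built.
    for t in count():
        done = (t + 1 >= max_steps)  # stand-in for the environment's done flag
        if done:
            return t + 1


print(run_episode(5))  # 5
```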

import gym
import math
import random
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from collections import namedtuple
from itertools import count
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as T
env = gym.make('CartPole-v0').unwrapped
# set up matplotlib
is_ipython = 'inline' in matplotlib.get_backend()
if is_ipython:
    from IPython import display
plt.ion()
# if gpu is to be used
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

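Two classes are defined. The first, Transition, is a named tuple that represents a single transition in our environment. It essentially maps a (state, action) pair to its (next_state, reward) result, where state and next_state are screen-difference images. The second, ReplayMemory, is a bounded cyclic buffer that holds the most recently observed transitions. It also implements a .sample() method that randomly selects a batch of transitions from the experience store, ready to be used for training the agent.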

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))
class ReplayMemory(object):
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.position = 0
    def push(self, *args):
        """Saves a transition."""
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = Transition(*args)
        self.position = (self.position + 1) % self.capacity
    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)
    def __len__(self):
        return len(self.memory)
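
For comparison, the same bounded buffer can be sketched with `collections.deque(maxlen=...)`, which discards the oldest entry automatically. This is an alternative written for illustration, not the code used in this post; `DequeReplayMemory` and `demo_memory` are made-up names, and toy integers stand in for image tensors.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))


class DequeReplayMemory:
    """Bounded replay buffer backed by a deque (illustrative alternative)."""

    def __init__(self, capacity):
        # maxlen makes the deque drop its oldest item once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        """Save a transition."""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # random.sample needs a sequence; a deque qualifies.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


demo_memory = DequeReplayMemory(capacity=3)
for step in range(5):
    demo_memory.push(step, 0, step + 1, 1.0)  # toy values, not real states

print(len(demo_memory))                        # 3
print([t.state for t in demo_memory.memory])   # [2, 3, 4]
```

After five pushes into a buffer of capacity 3, only the three newest transitions remain, which is the same behavior the `position` index achieves in the list-based class above.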

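The Q-network model is a convolutional neural network that takes in the difference between the current and previous screen patches and has two outputs, one per action. In effect, the network tries to predict the expected return of taking each action given the current input.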

class DQN(nn.Module):
    def __init__(self, h, w, outputs):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(32)
        def conv2d_size_out(size, kernel_size = 5, stride = 2):
            return (size - (kernel_size - 1) - 1) // stride  + 1
        convw = conv2d_size_out(conv2d_size_out(conv2d_size_out(w)))
        convh = conv2d_size_out(conv2d_size_out(conv2d_size_out(h)))
        linear_input_size = convw * convh * 32
        self.head = nn.Linear(linear_input_size, outputs)
    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        # Flatten the feature maps and map them to one Q value per action.
        return self.head(x.view(x.size(0), -1))
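
The size bookkeeping in `__init__` can be checked by hand. The sketch below repeats `conv2d_size_out` outside the class and applies it to a 40x90 input, the approximate screen size quoted in the comments of the training setup in this post. The exact size depends on the rendered frame, so treat the numbers as an illustration; `after_three_convs` is a helper name invented here.

```python
def conv2d_size_out(size, kernel_size=5, stride=2):
    """Output length of an unpadded convolution along one dimension."""
    return (size - (kernel_size - 1) - 1) // stride + 1


def after_three_convs(size):
    """Apply the size formula once per convolution layer (three layers)."""
    for _ in range(3):
        size = conv2d_size_out(size)
    return size


convh = after_three_convs(40)   # 40 -> 18 -> 7 -> 2
convw = after_three_convs(90)   # 90 -> 43 -> 20 -> 8
linear_input_size = convw * convh * 32  # 32 channels come out of conv3
print(convh, convw, linear_input_size)  # 2 8 512
```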

Getting the input

resize = T.Compose([T.ToPILImage(),
                    T.Resize(40, interpolation=Image.BICUBIC),
                    T.ToTensor()])

def get_cart_location(screen_width):
    world_width = env.x_threshold * 2
    scale = screen_width / world_width
    return int(env.state[0] * scale + screen_width / 2.0)

def get_screen():
    screen = env.render(mode='rgb_array').transpose((2, 0, 1))
    _, screen_height, screen_width = screen.shape
    screen = screen[:, int(screen_height*0.4):int(screen_height * 0.8)]
    view_width = int(screen_width * 0.6)
    cart_location = get_cart_location(screen_width)
    if cart_location < view_width // 2:
        slice_range = slice(view_width)
    elif cart_location > (screen_width - view_width // 2):
        slice_range = slice(-view_width, None)
    else:
        slice_range = slice(cart_location - view_width // 2,
                            cart_location + view_width // 2)
    screen = screen[:, :, slice_range]
    screen = np.ascontiguousarray(screen, dtype=np.float32) / 255
    screen = torch.from_numpy(screen)
    return resize(screen).unsqueeze(0).to(device)

env.reset()
plt.figure()
plt.imshow(get_screen().cpu().squeeze(0).permute(1, 2, 0).numpy(),
           interpolation='none')
plt.title('Example extracted screen')
plt.show()
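
The cropping rule in `get_screen` keeps a fixed-width window centered on the cart and clamps it at the screen edges. Below is a minimal sketch of just that rule, using plain integers instead of an image. `crop_window` is a name invented here, and the 600-pixel screen width is an assumption for the example.

```python
def crop_window(cart_location, screen_width, view_width):
    """Return the slice of columns kept, mirroring the rule in get_screen."""
    half = view_width // 2
    if cart_location < half:
        # Cart near the left edge: keep the leftmost view_width columns.
        return slice(view_width)
    if cart_location > screen_width - half:
        # Cart near the right edge: keep the rightmost view_width columns.
        return slice(-view_width, None)
    # Otherwise center the window on the cart.
    return slice(cart_location - half, cart_location + half)


columns = list(range(600))  # stand-in for pixel columns of an assumed 600-wide frame
view_width = 360            # 60% of the assumed width, as in get_screen
for cart in (50, 300, 580):
    kept = columns[crop_window(cart, 600, view_width)]
    print(cart, kept[0], kept[-1], len(kept))
# 50 0 359 360
# 300 120 479 360
# 580 240 599 360
```

In all three cases the window has the same width, so the network always sees an input of the same shape.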

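Training and hyperparameter configuration
select_action chooses an action according to an epsilon-greedy policy: sometimes we use our model to choose the action, and sometimes we simply sample one uniformly at random. The probability of choosing a random action starts at EPS_START and decays exponentially toward EPS_END, with EPS_DECAY controlling the rate of decay. plot_durations is a helper function that plots the duration of each episode together with the average over the last 100 episodes. The plot appears below the cell containing the main training loop and is updated after every episode.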

BATCH_SIZE = 128
GAMMA = 0.999
EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 200
TARGET_UPDATE = 10
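# Get the screen size so that the layers can be initialized correctly from the shape returned by AI gym.
# Typical dimensions at this point are close to 3x40x90, the result of a clamped and down-scaled render buffer in get_screen().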
init_screen = get_screen()
_, _, screen_height, screen_width = init_screen.shape
n_actions = env.action_space.n
policy_net = DQN(screen_height, screen_width, n_actions).to(device)
target_net = DQN(screen_height, screen_width, n_actions).to(device)
target_net.load_state_dict(policy_net.state_dict())
target_net.eval()
optimizer = optim.RMSprop(policy_net.parameters())
memory = ReplayMemory(10000)
steps_done = 0
def select_action(state):
    global steps_done
    sample = random.random()
    eps_threshold = EPS_END + (EPS_START - EPS_END) * \
        math.exp(-1. * steps_done / EPS_DECAY)
    steps_done += 1
    if sample > eps_threshold:
        with torch.no_grad():
            return policy_net(state).max(1)[1].view(1, 1)
    else:
        return torch.tensor([[random.randrange(n_actions)]], device=device, dtype=torch.long)
episode_durations = []
def plot_durations():
    plt.figure(2)
    plt.clf()
    durations_t = torch.tensor(episode_durations, dtype=torch.float)
    plt.title('Training...')
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    plt.plot(durations_t.numpy())
    if len(durations_t) >= 100:
        means = durations_t.unfold(0, 100, 1).mean(1).view(-1)
        means = torch.cat((torch.zeros(99), means))
        plt.plot(means.numpy())
    plt.pause(0.001) 
    if is_ipython:
        display.clear_output(wait=True)
        display.display(plt.gcf())
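
To get a feel for the exploration schedule, the sketch below evaluates the same threshold formula used in `select_action` at a few step counts. `epsilon_at` is a helper name introduced for this example only; the constants are the ones defined above.

```python
import math

EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 200


def epsilon_at(step):
    """Probability of taking a random action after `step` calls to select_action."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-1. * step / EPS_DECAY)


for step in (0, 200, 1000):
    print(step, round(epsilon_at(step), 4))
# 0 0.9
# 200 0.3627
# 1000 0.0557
```

With EPS_DECAY = 200, the gap between the current value and EPS_END shrinks by a factor of e every 200 steps, so after roughly a thousand steps the agent is acting almost entirely on the network's predictions.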

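optimize_model performs a single optimization step:
Transpose the sampled batch, converting a batch-array of Transitions into a Transition of batch-arrays.
Compute a mask of non-final states and concatenate the batch elements.
Compute Q(s_t, a): the model computes Q(s_t), then we select the columns of the actions that were taken.
These are the actions that would have been taken for each batch state according to the policy network.
Compute V(s_{t+1}) for all next states.
The expected action values for non-final next states are computed with the "older" target network, selecting the best value with max(1)[0]. The result is merged through the mask, so we get either the expected state value or 0 when the state is final. Finally, compute the expected Q values, compute the Huber loss, and optimize the model.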

def optimize_model():
    if len(memory) < BATCH_SIZE:
        return
    transitions = memory.sample(BATCH_SIZE)
    batch = Transition(*zip(*transitions))
    non_final_mask = torch.tensor(tuple(map(lambda s: s is not None,
                                          batch.next_state)), device=device, dtype=torch.bool)
    non_final_next_states = torch.cat([s for s in batch.next_state
                                                if s is not None])
    state_batch = torch.cat(batch.state)
    action_batch = torch.cat(batch.action)
    reward_batch = torch.cat(batch.reward)
    state_action_values = policy_net(state_batch).gather(1, action_batch)
    next_state_values = torch.zeros(BATCH_SIZE, device=device)
    next_state_values[non_final_mask] = target_net(non_final_next_states).max(1)[0].detach()
    expected_state_action_values = (next_state_values * GAMMA) + reward_batch
    loss = F.smooth_l1_loss(state_action_values, expected_state_action_values.unsqueeze(1))
    optimizer.zero_grad()
    loss.backward()
    for param in policy_net.parameters():
        param.grad.data.clamp_(-1, 1)
    optimizer.step()
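
The steps of `optimize_model` can be traced on toy data without tensors. The sketch below is an illustration in plain Python, not the code from this post: the three transitions, the "max next Q" values, and the predicted Q values are all made up, and `huber` is a helper written here that follows the smooth L1 formula with beta = 1.

```python
from collections import namedtuple

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))
GAMMA = 0.999

# Three toy transitions; next_state=None marks the end of an episode.
transitions = [
    Transition('s0', 0, 's1', 1.0),
    Transition('s1', 1, 's2', 1.0),
    Transition('s2', 0, None, 1.0),
]

# Transpose: a list of Transitions becomes one Transition of tuples.
batch = Transition(*zip(*transitions))
print(batch.action)  # (0, 1, 0)

# Mask of non-final states, as in optimize_model.
non_final_mask = [s is not None for s in batch.next_state]

# Made-up max_a Q_target(s', a) values for the two non-final next states.
max_next_q = iter([2.0, 3.0])
next_state_values = [next(max_next_q) if keep else 0.0
                     for keep in non_final_mask]

# Expected Q value: r + GAMMA * V(s'), with V(s') = 0 for final states.
expected = [r + GAMMA * v for r, v in zip(batch.reward, next_state_values)]
print([round(e, 3) for e in expected])  # [2.998, 3.997, 1.0]


def huber(prediction, target, beta=1.0):
    """Huber (smooth L1) loss for one pair of numbers."""
    diff = abs(prediction - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


predicted = [2.5, 4.0, 3.0]  # made-up Q(s_t, a_t) from the policy network
loss = sum(huber(p, t) for p, t in zip(predicted, expected)) / len(expected)
```

The last transition shows why the Huber loss is used: its error of 2.0 contributes linearly (1.5) rather than quadratically, so a single badly estimated Q value does not dominate the update.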

Reinforcement learning workflow

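2 Data visualization

2.1 Data visualization

The data visualization libraries most commonly used in Python today are matplotlib and seaborn; libraries suited to bioinformatics also include weblogo. Their usage and underlying principles differ.

2.2 Reading matplotlib usage

The main difficulty with matplotlib is understanding the relationship between the figure (the canvas), the axes (a coordinate system), and the axis (a single coordinate axis).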

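import numpy as np
import matplotlib.pyplot as plt

t = np.arange(-1, 2, .01)
s = np.sin(2 * np.pi * t)

# Curve
plt.plot(t, s)

# Thick horizontal line at y=0 (the default position)
plt.axhline(linewidth=8, color='#d62728')

# Horizontal line at y=1
plt.axhline(y=1)

# Vertical line at x=1
plt.axvline(x=1)

# Draw a thick blue vline at x=0 that spans the upper quadrant of the yrange
# plt.axvline(x=0, ymin=0.75, linewidth=8, color='#1f77b4')

# Horizontal segment at y=0.5; xmin/xmax are fractions of the axes width (0 to 1)
plt.axhline(y=.5, xmin=0.25, xmax=0.75)

# Horizontal band between y=0.25 and y=0.75
plt.axhspan(0.25, 0.75, facecolor='0.5', alpha=0.5)

# Vertical band between x=1.25 and x=1.55
plt.axvspan(1.25, 1.55, facecolor='#2ca02c', alpha=0.5)

# Axis limits: [xmin, xmax, ymin, ymax]
plt.axis([-1, 2, -1, 2])

plt.show()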
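
To make the figure / axes / axis relationship described at the start of this section concrete, here is a minimal sketch using matplotlib's object-oriented interface. The variable names and the choice of two side-by-side plots are assumptions for this example; the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# One figure (the canvas) holding two axes (two coordinate systems).
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

t = np.arange(-1, 2, .01)
ax_left.plot(t, np.sin(2 * np.pi * t))
ax_right.plot(t, np.cos(2 * np.pi * t))

# Each axes owns its own x axis and y axis objects.
ax_left.xaxis.set_label_text("t")
ax_left.yaxis.set_label_text("sin")
ax_right.set_xlim(-1, 2)  # convenience method that configures ax_right.xaxis

print(len(fig.axes))           # 2
print(ax_left.figure is fig)   # True
```

The `plt.*` calls used earlier act on the "current" axes of the "current" figure, which is why they need no explicit object; the explicit form becomes useful as soon as one figure holds more than one plot.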

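2.3 Calling weblogo from the command line

First set up the server environment; the server setup will be covered in the next post.
The steps must be carried out under Linux.
http://weblogo.threeplusone.com/examples.html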
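
As background for what a sequence logo displays, the sketch below computes the per-position information content (in bits) of a small set of aligned DNA sites, which is the quantity conventionally used for the height of each logo column. This is not WebLogo's own code: `information_content` is a function written for this example, the sites are made up, and any small-sample correction a logo tool may apply is left out.

```python
import math
from collections import Counter


def information_content(sequences, alphabet="ACGT"):
    """Per-column information content in bits for equal-length sequences."""
    max_bits = math.log2(len(alphabet))  # 2 bits for a 4-letter alphabet
    result = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column)
        # Shannon entropy of the observed letter frequencies in this column.
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        result.append(max_bits - entropy)
    return result


# Made-up aligned DNA sites, for illustration only.
sites = ["ACGT", "ACGA", "ACTT", "AGGC"]
print([round(bits, 3) for bits in information_content(sites)])
# [2.0, 1.189, 1.189, 0.5]
```

A fully conserved position (the first column) reaches the 2-bit maximum, while a variable position (the last column) carries little information and would appear as a short stack in the logo.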
