Python Learning: deque

Deque

Example: d = collections.deque([(start, 1)])

A double-ended queue, or deque, supports adding and removing elements from either end. The more commonly used stacks and queues are degenerate forms of deques, where the inputs and outputs are restricted to a single end.

import collections

d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end:', d[0]
print 'Right end:', d[-1]

d.remove('c')
print 'remove(c):', d

Since deques are a type of sequence container, they support some of the same operations as lists, such as examining the contents with __getitem__(), determining length, and removing elements from the middle with remove(), which deletes the first element equal to the given value.

$ python collections_deque.py

Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end: a
Right end: g
remove(c): deque(['a', 'b', 'd', 'e', 'f', 'g'])
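
Since stacks and queues are just deques restricted to one end, here is a minimal sketch (not part of the original example files) showing the same type used both ways:

import collections

# LIFO stack: push and pop both happen at the "right" end
stack = collections.deque()
stack.append('a')
stack.append('b')
stack.append('c')
print('stack pop: %s' % stack.pop())      # 'c', the most recently added item

# FIFO queue: add on the right, consume from the left
queue = collections.deque()
queue.append('a')
queue.append('b')
queue.append('c')
print('queue pop: %s' % queue.popleft())  # 'a', the oldest item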

Populating

A deque can be populated from either end, termed “left” and “right” in the Python implementation.

import collections

# Add to the right
d = collections.deque()
d.extend('abcdefg')
print 'extend    :', d
d.append('h')
print 'append    :', d

# Add to the left
d = collections.deque()
d.extendleft('abcdefg')
print 'extendleft:', d
d.appendleft('h')
print 'appendleft:', d

Notice that extendleft() iterates over its input and performs the equivalent of an appendleft() for each item. The end result is that the deque contains the input sequence in reverse order.

$ python collections_deque_populating.py

extend    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft: deque(['g', 'f', 'e', 'd', 'c', 'b', 'a'])
appendleft: deque(['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'])
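
Because extendleft() reverses its input, a common idiom, sketched here as an assumed example rather than part of the original listing, is to reverse the sequence first when the new items should keep their original order at the left end:

import collections

# extendleft() alone stores the new items in reverse order
d = collections.deque('xyz')
d.extendleft('abc')
print(d)   # deque(['c', 'b', 'a', 'x', 'y', 'z'])

# Reversing the input first keeps the original order on the left end
d = collections.deque('xyz')
d.extendleft(reversed('abc'))
print(d)   # deque(['a', 'b', 'c', 'x', 'y', 'z'])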

Consuming

Similarly, the elements of the deque can be consumed from both or either end, depending on the algorithm being applied.

import collections

print 'From the right:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.pop()
    except IndexError:
        break

print '\nFrom the left:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.popleft()
    except IndexError:
        break

Use pop() to remove an item from the “right” end of the deque and popleft() to take from the “left” end.

$ python collections_deque_consuming.py

From the right:
g
f
e
d
c
b
a

From the left:
a
b
c
d
e
f
g
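
The pop()/popleft() split is also why a deque works well as the frontier in a breadth-first search, which is what the deque([(start, 1)]) example at the top of this post hints at. Below is a minimal sketch assuming a small, made-up adjacency-list graph and a start node 'A':

import collections

# Hypothetical graph used only for illustration
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': [],
}

start = 'A'
d = collections.deque([(start, 1)])   # (node, depth) pairs, as in the example above
seen = set([start])

while d:
    node, depth = d.popleft()         # consuming from the left gives FIFO, breadth-first order
    print('%s at depth %d' % (node, depth))
    for neighbor in graph[node]:
        if neighbor not in seen:
            seen.add(neighbor)
            d.append((neighbor, depth + 1))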

Since deques are thread-safe, the contents can even be consumed from both ends at the same time from separate threads.

import collections
import threading
import time

candle = collections.deque(xrange(11))

def burn(direction, nextSource):
    while True:
        try:
            next = nextSource()
        except IndexError:
            break
        else:
            print '%8s: %s' % (direction, next)
            time.sleep(0.1)
    print '%8s done' % direction
    return

left = threading.Thread(target=burn, args=('Left', candle.popleft))
right = threading.Thread(target=burn, args=('Right', candle.pop))

left.start()
right.start()

left.join()
right.join()

The threads in this example alternate between each end, removing items until the deque is empty.

$ python collections_deque_both_ends.py

    Left: 0
   Right: 10
   Right: 9
    Left: 1
   Right: 8
    Left: 2
   Right: 7
    Left: 3
   Right: 6
    Left: 4
   Right: 5
    Left done
   Right done

Rotating

Another useful capability of the deque is to rotate it in either direction, to skip over some items.

import collections

d = collections.deque(xrange(10))
print 'Normal        :', d

d = collections.deque(xrange(10))
d.rotate(2)
print 'Right rotation:', d

d = collections.deque(xrange(10))
d.rotate(-2)
print 'Left rotation :', d

Rotating the deque to the right (using a positive rotation) takes items from the right end and moves them to the left end. Rotating to the left (with a negative value) takes items from the left end and moves them to the right end. It may help to visualize the items in the deque as being engraved along the edge of a dial.

$ python collections_deque_rotate.py

Normal        : deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Right rotation: deque([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
Left rotation : deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
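
One place rotation shows up in practice is a simple round-robin scheduler: rotating by -1 after looking at the left end sends that item to the back of the line. This is a minimal sketch (an assumed example, not from the original article):

import collections

tasks = collections.deque(['task1', 'task2', 'task3'])

# Give each task two turns in round-robin order
for _ in range(6):
    print('running %s' % tasks[0])   # the task currently at the front
    tasks.rotate(-1)                 # move the front task to the back of the line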
