Quantifiers

Greedy quantifiers

| Construct | Matches |
| --- | --- |
| `X?` | `X`, once or not at all |
| `X*` | `X`, zero or more times |
| `X+` | `X`, one or more times |
| `X{n}` | `X`, exactly n times |
| `X{n,}` | `X`, at least n times |
| `X{n,m}` | `X`, at least n but not more than m times |

Reluctant quantifiers

| Construct | Matches |
| --- | --- |
| `X??` | `X`, once or not at all |
| `X*?` | `X`, zero or more times |
| `X+?` | `X`, one or more times |
| `X{n}?` | `X`, exactly n times |
| `X{n,}?` | `X`, at least n times |
| `X{n,m}?` | `X`, at least n but not more than m times |

Possessive quantifiers

| Construct | Matches |
| --- | --- |
| `X?+` | `X`, once or not at all |
| `X*+` | `X`, zero or more times |
| `X++` | `X`, one or more times |
| `X{n}+` | `X`, exactly n times |
| `X{n,}+` | `X`, at least n times |
| `X{n,m}+` | `X`, at least n but not more than m times |
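The three flavors differ in how much input they consume and whether they give any of it back during backtracking. A minimal sketch in Java's `java.util.regex` (the class name, helper method, and input string are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        show(".*foo", input);  // greedy:     [xfooxxxxxxfoo]
        show(".*?foo", input); // reluctant:  [xfoo] [xxxxxxfoo]
        show(".*+foo", input); // possessive: no match at all
    }

    static void show(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        System.out.print(regex + ":");
        while (m.find()) {
            System.out.print(" [" + m.group() + "]");
        }
        System.out.println();
    }
}
```

Greedy `.*` swallows the whole input and then backs off just enough for the trailing `foo` to match; reluctant `.*?` consumes as little as possible, yielding two shorter matches; possessive `.*+` consumes everything and never backtracks, leaving nothing for `foo`, so the match fails.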

Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression `((A)(B(C)))`, for example, there are four such groups:

| Group | Subexpression |
| --- | --- |
| 1 | `((A)(B(C)))` |
| 2 | `(A)` |
| 3 | `(B(C))` |
| 4 | `(C)` |

Group zero always stands for the entire expression.
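The numbering can be checked with `Matcher` directly; a small sketch (the class name and input are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingDemo {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        if (m.matches()) {
            System.out.println(m.groupCount()); // 4
            System.out.println(m.group(0));     // ABC (group zero: the entire match)
            System.out.println(m.group(1));     // ABC
            System.out.println(m.group(2));     // A
            System.out.println(m.group(3));     // BC
            System.out.println(m.group(4));     // C
        }
    }
}
```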

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.
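For example, `\1` inside a pattern must re-match whatever group 1 captured, and `Matcher.group(int)` retrieves the capture once matching is done; a minimal sketch (the class name and input are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackReferenceDemo {
    public static void main(String[] args) {
        // \1 is a back reference to group 1, so the pattern matches
        // a word followed by a repeat of the same word.
        Matcher m = Pattern.compile("(\\w+) \\1").matcher("hello hello");
        if (m.matches()) {
            System.out.println(m.group(1)); // hello
        }
    }
}
```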

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, then its previously captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression `(a(b)?)+`, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
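The "aba" example can be reproduced directly (the class name is illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureRetentionDemo {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("(a(b)?)+").matcher("aba");
        if (m.matches()) {
            System.out.println(m.group(1)); // a -- group 1's most recent match
            System.out.println(m.group(2)); // b -- retained: (b)? failed on the second pass
        }
    }
}
```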

Groups beginning with `(?` are pure, non-capturing groups that do not capture text and do not count towards the group total.
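For instance, `(?:X)` groups `X` for quantification without creating a capture; a short sketch (the class name and input are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    public static void main(String[] args) {
        // (?:ab)+ groups without capturing, so (\d+) is group 1.
        Matcher m = Pattern.compile("(?:ab)+(\\d+)").matcher("ababab42");
        if (m.matches()) {
            System.out.println(m.groupCount()); // 1
            System.out.println(m.group(1));     // 42
        }
    }
}
```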


### Greedy DQN: overview

Greedy DQN (Deep Q-Network) is an important advance in reinforcement learning. It approximates the Q-function with a deep neural network in order to overcome the curse of dimensionality that classical Q-learning runs into, which lets it handle high-dimensional input spaces and learn effective policies in complex environments [^1].

In greedy DQN, "greedy" refers to the greedy action-selection rule: always take the action with the highest currently estimated return. In practice, to balance exploration against exploitation, this is usually combined with an ε-greedy mechanism: most of the time the agent follows the maximum-expected-return rule, but with some probability it picks another action at random to explore.

### Implementation

Below is a simplified greedy DQN written in Python:

```python
import torch
import random

class GreedyDQN(torch.nn.Module):
    def __init__(self, state_dim, action_dim, hidden_size=64):
        super(GreedyDQN, self).__init__()
        # Two-layer fully connected network approximating Q(s, a).
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(state_dim, hidden_size),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_size, action_dim)
        )

    def forward(self, x):
        return self.fc(x)

def select_action(model, state, epsilon, n_actions):
    sample = random.random()
    if sample > epsilon:
        with torch.no_grad():
            q_values = model(state).squeeze(0)
            action = q_values.argmax().item()  # greedy choice based on current policy
    else:
        action = random.randrange(n_actions)   # exploration: choose randomly
    return action
```

This code defines a simple two-layer fully connected network that represents the value function, plus a method that picks an action based on the model's predicted values for the given state. A larger `epsilon` favors exploring unknown territory; a smaller one leans more on accumulated experience.

### Applications

Greedy DQN has been applied to a wide range of sequential decision problems and has been especially successful in game-playing AI. The AlphaGo family of programs, for example, used related ideas to defeat top human players. The technique has also shown great potential in areas such as robot path planning and autonomous-vehicle control [^4].