rqn 137 找试场 (Simulation)


Problem link: http://www.rqnoj.cn/Problem_137.html

Solution approach: a straightforward simulation. Keep the current position (x, y) and a facing direction dir, encoded as 0/1/2/3 for north/east/south/west. A "left" or "right" command rotates dir through the lookup tables l[] and r[]; any other token is a step count, which moves the position along the current direction and prints the new coordinates. If the command sequence contains no movement command at all, print (0,0). See the code below.

#include <stdio.h>
#include <string.h>

const int N = 10;
/* Direction encoding: 0 = north (+y), 1 = east (+x), 2 = south (-y), 3 = west (-x).
 * r[dir] is the direction after a right turn, l[dir] after a left turn. */
const int r[] = {1, 2, 3, 0};
const int l[] = {3, 0, 1, 2};

int n;
char order[N];

int main() {
    while (scanf("%d", &n) == 1) {
        int x = 0, y = 0, dir = 0, flag = 0;    /* start at the origin, facing north */
        for (int i = 0; i < n; i++) {
            scanf("%s", order);
            if (strcmp(order, "left") == 0)
                dir = l[dir];                   /* turn left */
            else if (strcmp(order, "right") == 0)
                dir = r[dir];                   /* turn right */
            else {                              /* any other token is a step count */
                int step;
                flag = 1;                       /* at least one move happened */
                sscanf(order, "%d", &step);
                if (dir == 0)
                    y += step;
                else if (dir == 1)
                    x += step;
                else if (dir == 2)
                    y -= step;
                else if (dir == 3)
                    x -= step;
                printf("(%d,%d)\n", x, y);      /* report position after every move */
            }
        }
        if (flag == 0)                          /* no move command: still at the origin */
            printf("(0,0)\n");
    }
    return 0;
}
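
For illustration, here is a made-up input (not taken from the problem statement) and the output the program above produces for it:

Input:
5
5 right 3 left 2

Output:
(0,5)
(3,5)
(3,7)

Starting at (0,0) facing north: move 5 to (0,5), turn right to face east, move 3 to (3,5), turn left back to north, move 2 to (3,7).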

