[Play CF, Learn Algorithms: One Star] Gym 100548A Built with Qinghuai and Ari Factor (Asia Xi'an)

[About CF]

Submission link: Built with Qinghuai and Ari Factor


Problem statement:

Built with Qinghuai and Ari Factor
Description

DISCLAIMER: All names, incidents, characters and places appearing in this problem
are fictitious. Any resemblance to actual events or locales or real persons, living
or dead, is purely coincidental.
Shamatisan is a somewhat famous smartphone maker in China, they built smartphones
that hope to contend with Abble and Dami for the hearts, minds and the wallets of China’s
consumers. They have a famous advertising word, saying that Shamatisan phones are built
with Qinghuai (a concept which is hard to explain in English). Their latest phone T-1 has
just begun taking reservations recently, or to be precise, at the beginning of this month. But
those who are tracking its progress on Aripapapa’s online store Skyat noticed an interesting
fact that has led to an apology by the online shopping site.
Those (being like sleuths in this story) who are always curious about questions like “In
how many attoseconds[1] were the Dami phones sold out?” found something unusual about the
reservation count of Shamatisan T-1. It always has a divisor three! What’s the logic behind
this mystery? A bit of digging into the site coding showed that the number of reservations
had been multiplied by three. After this discovery, people started rumors like “Three is the
Qinghuai factor, applied broadly by Shamatisan internally.” and began to call integers, which
are divisible by three, Qinghuai numbers. They also defined if all elements in a sequence are
Qinghuai numbers, the sequence itself is said to be built with Qinghuai. Moreover, after some
research, people found that there is a feature called “Buy Buy Buy Ring” on Skyat, causing
all reservation counts multiplied by a factor (possibly 1). The rumor “Any real number can
be represented as an Aripapapa Factor (also known as Ari Factor)” had been spread widely.
Later, an Aripapapa’s spokeswoman said this was an incident and posted an official apology
announcement. It is said that a programmer “made a highly unscientific decision”. As a result,
main programmer of Skyat whose name is Beiguoxia lost his job.
Our protagonist Pike loves to write programs that are able to automatically grab some
data from Internet. As you may already know, such programs are usually called “spider”.
Pike has already collected some sequences using his spider. Now he wonders whether these
sequences are built with Qinghuai. Help Pike figure this out!
Input
The first line of the input gives the number of test cases, T. T test cases follow.
For each test case, the first line contains an integer n (1 ≤ n ≤ 100), the length of
sequence S. The second line contains n integers, which represent n integers in sequence S.
All the numbers in the input will not exceed 10^6.
[1] 1 attosecond equals 10^-18 seconds.
The 2014 ACM-ICPC Asia Xi’an Regional Contest, October 26, 2014
Output
For each test case output one line “Case #x: y”, where x is the case number (starting
from 1) and y is “Yes” (without quotes) if the sequence S is built with so-called “Qinghuai”,
otherwise “No” (without quotes).
Samples
Sample Input 

2
3
1 2 3
2
3000 996

Sample Output
Case #1: No
Case #2: Yes
Hints
In the first case, since the sequence contains numbers which are too small to have Qinghuai,
it cannot be called being built with Qinghuai.
In the second case, the first integer is the signage of Shamatisan, and the second integer
represents core values of Aripapapa, we can declare that the sequence is built with Qinghuai.
Also note that the whole problem statement (including hints) had deliberately been written
as a joke, don’t be so serious!


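Solution:

     A warm-up problem. The statement is very long, but only two sentences matter: if every number in the sequence is divisible by 3, output "Yes"; otherwise output "No".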


Code:

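#include <iostream>
using namespace std;
int main()
{
	int n,t,tmp;
	cin>>t;	// number of test cases
	for(int i=1;i<=t;i++)
	{
		bool flag=true;	// stays true while every element is divisible by 3
		cin>>n;	// length of the sequence
		for(int j=1;j<=n;j++)
		{
			cin>>tmp;
			// no early break: the rest of the line must still be consumed
			if(tmp%3)	// non-zero remainder, so not a Qinghuai number
				flag=false;
		}
		cout<<"Case #"<<i<<": ";
		if(flag)cout<<"Yes\n";
		else cout<<"No\n";
	}
	return 0;
}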


