Greedy strategy: Problem 1014

Problem O

Time Limit : 2000/1000ms (Java/Other)   Memory Limit : 131072/65536K (Java/Other)
Total Submission(s) : 71   Accepted Submission(s) : 13
Problem Description
Before bridges were common, ferries were used to transport cars across rivers. River ferries, unlike their larger cousins, run on a guide line and are powered by the river's current. Cars drive onto the ferry from one end, the ferry crosses the river, and the cars exit from the other end of the ferry. 
There is a ferry across the river that can take n cars across the river in t minutes and return in t minutes. m cars arrive at the ferry terminal by a given schedule. What is the earliest time that all the cars can be transported across the river? What is the minimum number of trips that the operator must make to deliver all cars by that time?
 

Input
The first line of input contains c, the number of test cases. Each test case begins with n, t, m. m lines follow, each giving the arrival time for a car (in minutes since the beginning of the day). The operator can run the ferry whenever he or she wishes, but can take only the cars that have arrived up to that time.
 

Output
For each test case, output a single line with two integers: the time, in minutes since the beginning of the day, when the last car is delivered to the other side of the river, and the minimum number of trips made by the ferry to carry the cars within that time.

You may assume that 0 < n, t, m < 1440. The arrival times for each test case are in non-decreasing order.
 

Sample Input
  
  
2 2 10 10 0 10 20 30 40 50 60 70 80 90 2 10 3 10 30 40
 

Sample Output
  
  
100 5 50 2
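Task: a ferry carries cars across a river. Given the ferry's capacity n, the number of cars m, and their arrival times, find the earliest time at which all cars are across, using the minimum number of trips.
Approach: the total time is shortest when the last car reaches the far bank as early as possible, so the last load should leave as soon as the last car arrives. The other cars that share a load with it (up to the capacity n) add nothing to the time. Applying the same rule to each earlier load in turn gives the result. If m % n is 0, the cars go in full loads of n and the time is computed load by load; otherwise the first m % n cars go on the first trip and the rest go in full loads of n.
Detail: each time the ferry returns to the near bank, check whether a full load has already accumulated. If it has, the ferry leaves immediately (the previous departure time plus 2t); if not, it waits for the last car of that load to arrive.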
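The approach above can be written as a recurrence. This is only a sketch in notation that the original post does not use: a_1 <= ... <= a_m are the arrival times, k = ceil(m / n) is the number of trips, r is m mod n (or n when m is divisible by n), g_j = r + (j - 1)n is the index of the last car in load j, and d_j is the departure time of load j.

$$
d_1 = a_{g_1}, \qquad d_j = \max\left(d_{j-1} + 2t,\; a_{g_j}\right) \quad (2 \le j \le k), \qquad \text{finish} = d_k + t
$$

For the second sample (n = 2, t = 10, arrivals 10, 30, 40): r = 1 and k = 2, so d_1 = 10, d_2 = max(10 + 20, 40) = 40, and the finish time is 40 + 10 = 50, matching the sample output `50 2`.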
#include <cstdio>
#include <algorithm>
using namespace std;

int main()
{
    int n, t, m, c;
    scanf("%d", &c);
    while (c--)
    {
        scanf("%d%d%d", &n, &t, &m);
        int trips = (m + n - 1) / n;      // minimum number of trips: ceil(m / n)
        int first = m % n;                // size of the first (possibly partial) load
        if (first == 0) first = n;        // m divisible by n: every load is full
        int depart = 0;                   // departure time of the current load
        for (int i = 1; i <= m; i++)
        {
            int e;                        // arrival time of car i
            scanf("%d", &e);
            if (i == first)
                depart = e;               // first load leaves when its last car arrives
            else if (i > first && (i - first) % n == 0)
                // last car of a later load: wait for the ferry's return
                // (previous departure + 2t) or for this car, whichever is later
                depart = max(depart + 2 * t, e);
        }
        // the last load still needs t minutes to cross
        printf("%d %d\n", depart + t, trips);
    }
    return 0;
}
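
To check the greedy against the samples without going through standard input, the same logic can be isolated in a function. The following is a minimal sketch, not part of the original solution: the name `ferrySchedule` and its signature are hypothetical, and it assumes `n > 0` and a non-empty, non-decreasing list of arrival times, as the problem statement guarantees.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not in the original post): returns {finish time, trips}
// for capacity n, one-way crossing time t, and sorted arrival times.
std::pair<int, int> ferrySchedule(int n, int t, const std::vector<int>& arrivals)
{
    assert(n > 0 && !arrivals.empty());    // preconditions taken from the problem statement
    int m = static_cast<int>(arrivals.size());
    int first = (m % n == 0) ? n : m % n;  // size of the first load
    int trips = (m + n - 1) / n;           // ceil(m / n)
    int depart = arrivals[first - 1];      // first load leaves when its last car arrives
    for (int i = first + n; i <= m; i += n)
    {
        // Later loads wait for the ferry's return and for their own last car.
        depart = std::max(depart + 2 * t, arrivals[i - 1]);
    }
    return {depart + t, trips};            // the last load still has to cross
}
```

Called with the two sample cases, `ferrySchedule(2, 10, {0, 10, 20, 30, 40, 50, 60, 70, 80, 90})` gives (100, 5) and `ferrySchedule(2, 10, {10, 30, 40})` gives (50, 2), which agrees with the sample output.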

