Game Theory DP Collection


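When working on this kind of game problem, it is easy to get stuck trying to find an optimal greedy strategy. I started this collection to gather such problems as a reminder to myself not to fall into that trap.

These problems are usually solved with dynamic programming implemented as memoized search.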




Play Game hdu-4597

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65535/65535 K (Java/Others)
Total Submission(s): 936    Accepted Submission(s): 551


Problem Description
Alice and Bob are playing a game. There are two piles of cards. There are N cards in each pile, and each card has a score. They take turns to pick up the top or bottom card from either pile, and the score of the card will be added to his total score. Alice and Bob are both clever enough, and will pick up cards to get as many scores as possible. Do you know how many scores can Alice get if he picks up first?
 

Input
The first line contains an integer T (T≤100), indicating the number of cases.
Each case contains 3 lines. The first line is the N (N≤20). The second line contains N integers a_i (1≤a_i≤10000). The third line contains N integers b_i (1≤b_i≤10000).
 

Output
For each case, output an integer, indicating the most score Alice can get.
 

Sample Input
  
  
2
1
23
53
3
10 100 20
2 4 3
 

Sample Output
  
  
53
105

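Problem: there are two piles of n cards each. Two players take turns drawing one card from either end of either pile. Each card has a value. What is the maximum total the first player can obtain?

Approach: first, keep in mind that this is a game, so both players always choose the move that benefits them most; that is, in the current position I draw the card that maximizes my final total.

We use dp[l][r][ll][rr] to describe a state: the first pile still holds cards [l, r], the second pile still holds cards [ll, rr], and the value is the maximum total the player to move can obtain from here. For each of the up to four possible moves we only need to compute: value of the card I take + sum of all remaining cards - the opponent's optimal result in the resulting state. The answer is the maximum over those moves. The recursion bottoms out when one card or zero cards remain, depending on how the code is written; I use zero cards as the base case.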

#pragma comment(linker, "/STACK:1024000000,1024000000") 
#include<cstdio>
#include<cstring>
#include<iostream>
#include<algorithm>
#include<stdlib.h>
#include<vector>
#include<stack>
#include<queue>
#include<map>
#include<string>
using namespace std;

#define LL long long
#define ULL unsigned long long
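int n;  // number of cards in each pile

int num1[25],num2[25];  // card values of pile 1 and pile 2 (1-based)
int sum1[25],sum2[25];  // prefix sums of num1 and num2
// dp[l][r][ll][rr]: best total for the player to move when pile 1 still holds
// cards l..r and pile 2 still holds cards ll..rr; -1 means not computed yet
int dp[25][25][25][25];
// An empty pile shows up as l == r+1 (or ll == rr+1), so its range sum is 0.
int dfs(int l,int r,int ll,int rr){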
    if(dp[l][r][ll][rr]!=-1) return dp[l][r][ll][rr];
    if(l>r&&ll>rr) return 0;
    else if(ll>rr){
        if(l==r) return dp[l][r][ll][rr] = num1[l];
        else {
            dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num1[l]+sum1[r]-sum1[l]+sum2[rr]-sum2[ll-1]-dfs(l+1,r,ll,rr));
            dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num1[r]+sum1[r-1]-sum1[l-1]+sum2[rr]-sum2[ll-1]-dfs(l,r-1,ll,rr));
            return dp[l][r][ll][rr];
        }
    }        
    else if(l>r){
        if(ll==rr) return dp[l][r][ll][rr] = num2[ll];
        else {
            dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num2[ll]+sum1[r]-sum1[l-1]+sum2[rr]-sum2[ll]-dfs(l,r,ll+1,rr));
            dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num2[rr]+sum1[r]-sum1[l-1]+sum2[rr-1]-sum2[ll-1]-dfs(l,r,ll,rr-1));
            return dp[l][r][ll][rr];
        }
    }
    else {
        dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num1[l]+sum1[r]-sum1[l]+sum2[rr]-sum2[ll-1]-dfs(l+1,r,ll,rr));
        dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num1[r]+sum1[r-1]-sum1[l-1]+sum2[rr]-sum2[ll-1]-dfs(l,r-1,ll,rr));
        dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num2[ll]+sum1[r]-sum1[l-1]+sum2[rr]-sum2[ll]-dfs(l,r,ll+1,rr));
        dp[l][r][ll][rr] = max(dp[l][r][ll][rr],num2[rr]+sum1[r]-sum1[l-1]+sum2[rr-1]-sum2[ll-1]-dfs(l,r,ll,rr-1));
        return dp[l][r][ll][rr];
    }    
    
    
}
int main(void){
    int t;
    scanf("%d",&t);
    while(t--){
        scanf("%d",&n);
        sum1[0]=sum2[0]=0;
        for(int i=1;i<=n;i++) scanf("%d",&num1[i]),sum1[i]=sum1[i-1]+num1[i];
        for(int i=1;i<=n;i++) scanf("%d",&num2[i]),sum2[i]=sum2[i-1]+num2[i];
        memset(dp,-1,sizeof(dp));
        printf("%d\n",dfs(1,n,1,n));
    }
    
    return 0;
}
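Every move in the solution above has the form "card taken + everything left - opponent's best", and "card taken + everything left" is just the sum of the cards currently on the table. So the branches can be folded into one rule: the value of a state is the current total minus the smallest value the opponent can be left with. Below is a minimal sketch of that reformulation, not the submitted solution above. The names bestScore, playGame, pre1, pre2 and memo are made up for this illustration, and the vector-based interface is an assumption rather than the judge's input format.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Illustrative names only; the recurrence itself is the one described above.
const int MAXN = 22;
int pre1[MAXN], pre2[MAXN];        // prefix sums of the two piles (1-based)
int memo[MAXN][MAXN][MAXN][MAXN];  // -1 means "not computed yet"

// Best total for the player to move when pile 1 holds cards l..r and
// pile 2 holds cards ll..rr. An empty pile is encoded as l == r + 1.
int bestScore(int l, int r, int ll, int rr) {
    if (l > r && ll > rr) return 0;  // nothing left to take
    int &res = memo[l][r][ll][rr];
    if (res != -1) return res;
    // Sum of every card still on the table.
    int total = (pre1[r] - pre1[l - 1]) + (pre2[rr] - pre2[ll - 1]);
    // The opponent moves next; leave them the worst possible position.
    int opp = total;
    if (l <= r) {
        opp = std::min(opp, bestScore(l + 1, r, ll, rr));  // take top of pile 1
        opp = std::min(opp, bestScore(l, r - 1, ll, rr));  // take bottom of pile 1
    }
    if (ll <= rr) {
        opp = std::min(opp, bestScore(l, r, ll + 1, rr));  // take top of pile 2
        opp = std::min(opp, bestScore(l, r, ll, rr - 1));  // take bottom of pile 2
    }
    return res = total - opp;
}

// Assumes each pile has at most 20 cards, as in the problem statement.
int playGame(const std::vector<int> &a, const std::vector<int> &b) {
    int na = static_cast<int>(a.size());
    int nb = static_cast<int>(b.size());
    pre1[0] = pre2[0] = 0;
    for (int i = 1; i <= na; ++i) pre1[i] = pre1[i - 1] + a[i - 1];
    for (int i = 1; i <= nb; ++i) pre2[i] = pre2[i - 1] + b[i - 1];
    std::memset(memo, -1, sizeof(memo));
    return bestScore(1, na, 1, nb);
}
```

On the two sample cases, playGame({23}, {53}) gives 53 and playGame({10, 100, 20}, {2, 4, 3}) gives 105.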


poj-1440
Varacious Steve
Time Limit: 3000MS Memory Limit: 10000K
Total Submissions: 360 Accepted: 166

Description

Steve and Digit bought a box containing a number of donuts. In order to divide them between themselves they play a special game that they created. The players alternately take a certain, positive number of donuts from the box, but no more than some fixed integer. Each player's donuts are gathered on the player's side. The player that empties the box eats his donuts while the other one puts his donuts back into the box and the game continues with the "looser" player starting. The game goes on until all the donuts are eaten. The goal of the game is to eat the most donuts. How many donuts can Steve, who starts the game, count on, assuming the best strategy for both players? 

Write a program that: 

  • reads the parameters of the game from the standard input, 

  • computes the number of donuts Steve can count on, 

  • writes the result to the standard output. 

Input

The first and only line of the input contains exactly two integers n and m separated by a single space, 1 <= m <= n <= 100 - the parameters of the game, where n is the number of donuts in the box at the beginning of the game and m is the upper limit on the number of donuts to be taken by one player in one move. 

Process to the end of file. 

Output

The output contains exactly one integer equal to the number of donuts Steve can count on.

Sample Input

5 2

Sample Output

3

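Problem: a box holds n donuts and two players take turns, each taking at least 1 and at most m donuts per move. When the box is emptied, the player who emptied it eats all the donuts they have collected, while the other player puts their collected donuts back into the box and the game restarts with that other player moving first. When the game can no longer continue, what is the maximum number of donuts the first player can have eaten?

Approach: let dp[a][b][c] be the number of donuts I (the player to move) eventually eat, starting from the state where I have taken a, the opponent has taken b, and c remain in the box.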

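When c > m: dp[a][b][c] = max(dp[a][b][c], a+b+c-dp[b][a+k][c-k]), where k is the number I take, in the range [1, m]. Since dp[b][a+k][c-k] is the opponent's final result from that state, subtracting the opponent's best result from the total a+b+c gives what I end up with.

When c <= m, k ranges over [1, c]. For k < c the equation is the same as above. For k = c I empty the box:

dp[a][b][c] = max(dp[a][b][c], a+b+c-dp[0][0][b])

Here I win the round: I eat my a+c donuts, and the game restarts with the opponent's b donuts in the box and the opponent moving first. The base case is the state where everything has been taken, dp[0][0][0] = 0.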


#pragma comment(linker, "/STACK:1024000000,1024000000") 
#include<cstdio>
#include<cstring>
#include<iostream>
#include<algorithm>
#include<stdlib.h>
#include<vector>
#include<stack>
#include<queue>
#include<map>
#include<string>
using namespace std;

#define LL long long
#define ULL unsigned long long
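// dp[a][b][c]: donuts the player to move eventually eats when that player holds a,
// the opponent holds b, and c remain in the box; -1 means not computed yet
int dp[105][105][105];
int n,m;  // n: donuts at the start, m: maximum donuts per move
int dfs(int a,int b,int c){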
	if(dp[a][b][c]!=-1) return dp[a][b][c];
	if(a==0&&b==0&&c==0) return dp[a][b][c]=0;
	else if(c>m){
		for(int i=1;i<=m;i++)
			dp[a][b][c] = max(dp[a][b][c],a+b+c-dfs(b,a+i,c-i));
		return dp[a][b][c];
	}
	else {
		for(int i=1;i<=c;i++)
			if(i==c)
				dp[a][b][c] = max(dp[a][b][c],a+b+c-dfs(0,0,b));
			else dp[a][b][c] = max(dp[a][b][c],a+b+c-dfs(b,a+i,c-i));
		return dp[a][b][c];
	}
}
int main(void){
	while(~scanf("%d%d",&n,&m)){
		memset(dp,-1,sizeof(dp));
		int ans = dfs(0,0,n);
		printf("%d\n",ans);
	}
	
	return 0;
}
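The two cases c > m and c <= m differ only in the upper bound for k, so they can be merged by letting k run from 1 to min(m, c) and treating k == c as the move that ends the round. The following is a minimal sketch of that merged form, not the submitted solution above. The names eat, steve, donutMemo and limitM are invented for this illustration, and passing n and m as function arguments is an assumption made to keep the example callable.

```cpp
#include <algorithm>
#include <cstring>

// Illustrative names only; the recurrence is the one described above.
const int MAXD = 101;              // a + b + c never exceeds n <= 100
int donutMemo[MAXD][MAXD][MAXD];   // -1 means "not computed yet"
int limitM;                        // maximum donuts per move

// Donuts the player to move eventually eats when that player holds `mine`,
// the opponent holds `theirs`, and `box` donuts remain in the box.
int eat(int mine, int theirs, int box) {
    if (mine == 0 && theirs == 0 && box == 0) return 0;  // all donuts eaten
    int &res = donutMemo[mine][theirs][box];
    if (res != -1) return res;
    int total = mine + theirs + box;
    int opp = total;  // smallest result the opponent can be held to
    for (int k = 1; k <= std::min(limitM, box); ++k) {
        int next;
        if (k == box) {
            // I empty the box and eat mine + box; the opponent puts
            // `theirs` back and starts the next round as the mover.
            next = eat(0, 0, theirs);
        } else {
            // Roles swap: the opponent moves with my pile grown by k.
            next = eat(theirs, mine + k, box - k);
        }
        opp = std::min(opp, next);
    }
    return res = total - opp;
}

// Assumes 1 <= m <= n <= 100, as in the problem statement.
int steve(int n, int m) {
    limitM = m;
    std::memset(donutMemo, -1, sizeof(donutMemo));  // the table depends on m
    return eat(0, 0, n);
}
```

For the sample input, steve(5, 2) returns 3. The memo table is reset on every call because its contents depend on m.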

