Rock Paper Scissors-Google Kickstart 2021 Round C

Rock Paper Scissors: Problem 3 of Google Kickstart 2021 Round C

You and your friend like to play Rock Paper Scissors. Each day you play exactly 60 rounds and at the end of each day, you tally up the score from these 60 rounds.
During each round, without any knowledge of the other person’s choice, you each make your choice. Then, you both reveal the choice you made and determine your score. Rock wins over Scissors, Scissors wins over Paper, and Paper wins over Rock. Let R represent Rock, P represent Paper, and S represent Scissors. Every day you both agree on values W and E. If your choice wins, you get W points. If you and your friend both pick the same choice, you get E points. If your choice loses, you get nothing.
By accident, you see your friend’s strategy written in an open notebook on a desk one day. Your friend keeps track of how many times you have chosen R, P, and S so far during one day. Let Ai be your choice of R, P, or S on round i, while Bi is your friend’s choice on the same round. Let ri be the number of times Aj= R for 1≤j≤(i−1). Similarly, let pi and si be the total number of times you have chosen P and S, respectively, prior to round i.
On round 1 of each day, i=1 and r1=s1=p1=0, and your friend plays randomly due to the lack of information (i.e. your friend chooses each option with probability 1/3). On every subsequent round, your friend decides Bi by choosing R with probability Pr[R]=si/(i−1), P with probability Pr[P]=ri/(i−1), and S with probability Pr[S]=pi/(i−1). This strategy is adaptive and tough to beat!
You are going on vacation for the next T days. You must leave your assistant with instructions on what choice to pick each round each day. Let integer X be the average reward you are aiming for in this game after T days. Given W and E (different values for different days), provide your instructions as a string of 60 characters, ordered from round 1 to round 60. Each character represents your choice for the corresponding round. Your goal is to choose your set of instructions so that the average expected value of the reward across all the days of your gameplay is at least X. Note that you can choose different instructions for different values of W and E.

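Problem summary:

You play Rock Paper Scissors against a friend. The game runs for 200 days with 60 rounds per day; on day $i$, winning a round earns $W_i$ points and a draw earns $E_i$ points. Your friend's strategy is known:

  1. In the first round of each day, the friend plays rock, paper, or scissors with equal probability.
  2. In round $i$ of a day, if you have played rock $r$ times, paper $p$ times, and scissors $s$ times in that day's first $i-1$ rounds, the friend plays paper with probability $\frac{r}{i-1}$, scissors with probability $\frac{p}{i-1}$, and rock with probability $\frac{s}{i-1}$.

Choose a plan for each day so that, over the 200 days, the average of the daily expected scores is at least $X$.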
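To make the friend's strategy concrete, here is a minimal sketch that evaluates the expected daily score of a fixed plan. The function name `expectedScore` and its signature are illustrative assumptions, not part of the original solution; the probabilities follow the rules listed above.

```cpp
#include <cassert>
#include <string>

// Expected daily score of a fixed plan against the friend's strategy.
// plan: one character per round, each 'R', 'P' or 'S' (hypothetical input format).
// w, e: points for a win and for a draw on that day.
double expectedScore(const std::string& plan, double w, double e) {
    assert(!plan.empty());
    // Round 1: the friend is uniform, so a win and a draw each have probability 1/3.
    double total = (w + e) / 3.0;
    int r = 0, p = 0, s = 0;  // our R/P/S counts before the current round
    for (std::size_t i = 0; i < plan.size(); ++i) {
        char c = plan[i];
        if (i > 0) {
            double d = static_cast<double>(i);  // rounds already played
            // The friend plays R with prob s/d, P with prob r/d, S with prob p/d.
            if (c == 'R')      total += (p * w + s * e) / d;  // R beats S, ties R
            else if (c == 'P') total += (s * w + r * e) / d;  // P beats R, ties P
            else               total += (r * w + p * e) / d;  // S beats P, ties S
        }
        if (c == 'R') ++r; else if (c == 'P') ++p; else ++s;
    }
    return total;
}
```

For example, with W=30 and E=0 the two-round plan "RS" is worth 10 for the first round plus 30 for the second, because after one rock the friend is certain to play paper.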

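Analysis:

With only 60 rounds in total and no requirement to find the optimal answer, simulated annealing is an obvious way to scrape by.
The intended solution is not hard to find either: an $O(n^3)$ expected-value DP works. Let $exp[r][p][s]$ be the highest expected score on day $i$ after having played $r$ rocks, $p$ papers, and $s$ scissors.
Whatever you play in the first round, its expected score is $\frac{W}{3}+\frac{E}{3}$. After that, the transition follows from what was played in the most recent round: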
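The post does not spell out the simulated-annealing idea, so the following is only a rough sketch of what it might look like. The names `planScore` and `annealPlan`, the seed, the iteration count, and the cooling schedule are all illustrative assumptions, and nothing here has been tuned against the judge's thresholds.

```cpp
#include <cmath>
#include <random>
#include <string>

// Expected daily score of a fixed plan against the friend's strategy.
double planScore(const std::string& plan, double w, double e) {
    double total = 0.0;
    int r = 0, p = 0, s = 0;  // our counts before the current round
    for (std::size_t i = 0; i < plan.size(); ++i) {
        char c = plan[i];
        double d = static_cast<double>(i);
        if (i == 0)        total += (w + e) / 3.0;        // uniform friend
        else if (c == 'R') total += (p * w + s * e) / d;  // beats S, ties R
        else if (c == 'P') total += (s * w + r * e) / d;  // beats R, ties P
        else               total += (r * w + p * e) / d;  // beats P, ties S
        if (c == 'R') ++r; else if (c == 'P') ++p; else ++s;
    }
    return total;
}

// Simulated annealing over 60-character plans; returns the best plan seen.
std::string annealPlan(double w, double e, unsigned seed = 12345,
                       int iterations = 20000) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pos(0, 59), pick(0, 2);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const char moves[] = {'R', 'P', 'S'};

    std::string cur(60, 'R');
    for (char& c : cur) c = moves[pick(rng)];  // random starting plan
    double curScore = planScore(cur, w, e);
    std::string best = cur;
    double bestScore = curScore;

    double temp = w + e + 1.0;  // assumed starting temperature
    // Geometric cooling that ends near 1e-3 after all iterations.
    const double cooling = std::pow(1e-3 / temp, 1.0 / iterations);
    for (int it = 0; it < iterations; ++it) {
        std::string cand = cur;
        cand[pos(rng)] = moves[pick(rng)];  // change one round's choice
        double candScore = planScore(cand, w, e);
        double delta = candScore - curScore;
        // Always accept improvements; accept worse plans with prob exp(delta/temp).
        if (delta >= 0 || unit(rng) < std::exp(delta / temp)) {
            cur = cand;
            curScore = candScore;
            if (curScore > bestScore) { best = cur; bestScore = curScore; }
        }
        temp *= cooling;
    }
    return best;
}
```

Because the search is randomized, it gives no guarantee of reaching the target average; the DP is the safer route.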
Write $n=r+p+s$. If round $n$ was rock, the friend's probabilities come from the counts $(r-1,p,s)$ of the first $n-1$ rounds: the friend plays scissors (a win for you) with probability $\frac{p}{n-1}$ and rock (a draw) with probability $\frac{s}{n-1}$. Paper and scissors are handled the same way, which gives

$$
\begin{aligned}
exp[r][p][s]=\max\Big(&exp[r-1][p][s]+\frac{p}{n-1}\times W_i+\frac{s}{n-1}\times E_i,\\
&exp[r][p-1][s]+\frac{s}{n-1}\times W_i+\frac{r}{n-1}\times E_i,\\
&exp[r][p][s-1]+\frac{r}{n-1}\times W_i+\frac{p}{n-1}\times E_i\Big)
\end{aligned}
$$
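Because the plan itself must be output, a second array records which choice attained the maximum. The number of states is $\sum_{r=0}^{n}\binom{n-r+2}{2}$, so the space complexity is $O(n^3)$; each transition takes $O(1)$, so the time complexity is also $O(n^3)$. Since $n$ (the number of rounds) is only 60, both the time and memory limits are generous.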
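As a quick check of the state count, the sum collapses by the hockey-stick identity (substituting $k=n-r$):

```latex
\sum_{r=0}^{n}\binom{n-r+2}{2}
  = \sum_{k=0}^{n}\binom{k+2}{2}
  = \binom{n+3}{3},
\qquad
\binom{63}{3} = \frac{63\cdot 62\cdot 61}{6} = 39711 \quad\text{for } n = 60.
```

So each day needs roughly forty thousand states with three transitions each.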

Key code:

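    //exp is assumed to be a double array and dec an int array, both [61][61][61]; w and e are the day's W and E
    exp[0][0][1]=w/3.0+e/3.0;
    exp[0][1][0]=exp[0][0][1];
    exp[1][0][0]=exp[0][1][0];//initialization: round 1 is worth (W+E)/3 whatever is played
    dec[1][0][0]=0,dec[0][1][0]=1,dec[0][0][1]=2;//0=R, 1=P, 2=S
    for (int n=2; n<=60;n++)//n: number of rounds played so far
    {
        for(int r=0;r<=n;r++)//r: number of rocks played
        {
            for(int p=0;p+r<=n;p++){
                int s=n-p-r;
                double d=n-1,v;
                exp[r][p][s]=-1;//reset so values from an earlier day are never reused
                if(r>0)//round n was R: win if the friend plays S (p/(n-1)), draw on R (s/(n-1))
                {
                    v=exp[r-1][p][s]+(p*w+s*e)/d;
                    if(exp[r][p][s]<v)exp[r][p][s]=v,dec[r][p][s]=0;
                }
                if(p>0)//round n was P: win if the friend plays R (s/(n-1)), draw on P (r/(n-1))
                {
                    v=exp[r][p-1][s]+(s*w+r*e)/d;
                    if(exp[r][p][s]<v)exp[r][p][s]=v,dec[r][p][s]=1;
                }
                if(s>0)//round n was S: win if the friend plays P (r/(n-1)), draw on S (p/(n-1))
                {
                    v=exp[r][p][s-1]+(r*w+p*e)/d;
                    if(exp[r][p][s]<v)exp[r][p][s]=v,dec[r][p][s]=2;
                }//each choice is recorded in the dec array
            }
        }
    }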
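    //output the answer
    double mx=-1;
    int r0=0,p0=0,s0=60,st=60;
    for(int r=0;r<=60;r++)
    {
        for(int p=0;p+r<=60;p++)
        {
            if(exp[r][p][60-r-p]>mx)
            {
                mx=exp[r][p][60-r-p];//remember the best value seen so far
                r0=r,p0=p,s0=60-r-p;
            }
        }
    }//find the best final state
    while(st--)
    {
        ans[st]=dec[r0][p0][s0];//choice made in round st+1 (0=R, 1=P, 2=S)
        if(ans[st]==0)r0--;
        else if(ans[st]==1)p0--;
        else s0--;
        //step back to the previous state
    }//runs 60 times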
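For reference, the fragments above can be assembled into one self-contained function. This is a sketch rather than the exact submitted code: the names `Plan` and `bestPlan` and the flattened array layout are assumptions made for illustration, and the per-round gains are taken directly from the friend's strategy as stated in the problem (R with probability s/(i−1), P with r/(i−1), S with p/(i−1)).

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Plan {
    std::string moves;  // one of 'R', 'P', 'S' per round
    double expected;    // expected daily score of that plan
};

// Expected-value DP over (rocks, papers, scissors) played so far.
Plan bestPlan(double w, double e, int rounds = 60) {
    assert(rounds >= 1);
    const int m = rounds + 1;
    auto id = [m](int r, int p, int s) { return (r * m + p) * m + s; };
    std::vector<double> best(m * m * m, -1.0);  // -1 marks "not computed"
    std::vector<char> last(m * m * m, '?');     // final move into each state

    // Round 1: the friend is uniform, so every choice is worth (w + e) / 3.
    best[id(1, 0, 0)] = best[id(0, 1, 0)] = best[id(0, 0, 1)] = (w + e) / 3.0;
    last[id(1, 0, 0)] = 'R';
    last[id(0, 1, 0)] = 'P';
    last[id(0, 0, 1)] = 'S';

    for (int n = 2; n <= rounds; ++n)
        for (int r = 0; r <= n; ++r)
            for (int p = 0; r + p <= n; ++p) {
                const int s = n - r - p;
                const double d = n - 1;  // rounds played before round n
                auto relax = [&](int pr, int pp, int ps, double gain, char mv) {
                    double v = best[id(pr, pp, ps)] + gain;
                    if (v > best[id(r, p, s)]) {
                        best[id(r, p, s)] = v;
                        last[id(r, p, s)] = mv;
                    }
                };
                if (r > 0) relax(r - 1, p, s, (p * w + s * e) / d, 'R');
                if (p > 0) relax(r, p - 1, s, (s * w + r * e) / d, 'P');
                if (s > 0) relax(r, p, s - 1, (r * w + p * e) / d, 'S');
            }

    // Pick the best final state, then walk the recorded moves backwards.
    int r0 = 0, p0 = 0, s0 = rounds;
    for (int r = 0; r <= rounds; ++r)
        for (int p = 0; r + p <= rounds; ++p)
            if (best[id(r, p, rounds - r - p)] > best[id(r0, p0, s0)]) {
                r0 = r; p0 = p; s0 = rounds - r - p;
            }
    Plan result{std::string(rounds, '?'), best[id(r0, p0, s0)]};
    for (int i = rounds - 1; i >= 0; --i) {
        char c = last[id(r0, p0, s0)];
        result.moves[i] = c;
        if (c == 'R') --r0; else if (c == 'P') --p0; else --s0;
    }
    return result;
}
```

Calling `bestPlan(W, E)` once per day and printing `moves` gives that day's instructions. Several plans can tie for the maximum, so the exact string depends on the order in which the three transitions are tried.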

The problem statements really do keep getting longer (;´༎ຶД༎ຶ`).
