DoCP Study Notes (5-1): Dealing with Uncertainty Through Probability - Lesson 5

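This post is a set of study notes on handling uncertainty through probability, using the game of Pig as the example. The course shows how to use probability theory to cope with uncertainty and how to apply it in programs. It covers the game rules, state representation, action selection, and strategy optimization, and aims to help the reader understand how to manage uncertainty in complex problems.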

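Note 1: Translating the English subtitles of every video into Chinese takes too much time. To speed up my progress I am pausing that work and will only add a few annotations to the English subtitles.
Note 2: Put the .flv video file and the matching .srt subtitle file from the Subtitles folder into the same folder, then open the video in Xunlei Kankan; the subtitles load automatically.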

Dealing with Uncertainty Through Probability

What you will learn:

  • Probability: the game of Pig.
  • Maximizing expected utility to optimize a strategy.

Lesson 5

Video link:
Lesson 5 - Udacity

Course Syllabus

Lesson 5: Dealing with Uncertainty Through Probability

1. Welcome Back

Hey, welcome back. Now, as we’ve said, this class is all about managing complexity.

Now many types of software manage complexity by trying to artificially (by human design) rule out (exclude the possibility of) any type of uncertainty. That is, say (suppose; for example) you have a checkbook-balancing (checkbook: a book of bank checks) program, and it says you’ve got to enter (type in) the exact (precise) amount (quantity). You’ve got to say $39.27. You can’t say, “Oh, I don’t know, about $40.”

It’s easier to write programs that deal that way, but it constrains (constrain: to restrict or limit) what you can do. So, in this unit we’re going to learn about how the laws of probability (that is, probability theory) can allow you to deal with uncertainty in your programs. Now, the truly amazing thing is that you can allow uncertainty in what you know about the world, or what’s true right now, and uncertainty in your actions: if the program does something, what happens next? Even though both of those are uncertain, you can still use the laws of probability to calculate what it means to do the right thing. That is, we can have clarity (clearness) of action. We can know exactly what the best thing to do is even though we’re uncertain about what’s going to happen. So follow along with this unit, and we’ll learn how to do that.

2. Porcine Probability

porcine: of or resembling a pig

This unit is about probability, which is a tool for dealing with uncertainty. Once you understand probability, you’ll be able to tackle (set about dealing with) a much broader range of problems than you could with programs that don’t understand probability.


Often when we have problems with uncertainty, we’re dealing with search problems. Recall (remember), in a search problem, we are in a current state. There are other states that we can transition (move, change) into, and we’re trying to achieve some goal, but we can’t do it all in one step. We have to paste together a sequence (an ordered series) of steps. In doing that, we’re building up a search frontier (boundary, edge) that we’re continuing to explore from.

Now, uncertainty can come into play in two ways.


(1) We can be uncertain about the current state. Rather than knowing exactly where we are, it may be that we start off in one of four possible states and all we know is that we’re somewhere in there, but we’re not sure exactly where we are.


(2) The other place uncertainty can come in is when we apply an action, say this action here–action A–it may be that we don’t get to one specific (definite; particular) state but, rather, we’re uncertain as to what the action will do, and we might end up in this state or this state or this state instead of the one that we were aiming at.

And so we’ll see techniques for dealing with both of these types of uncertainty.

Now, one place where people are used to dealing with uncertainty is in playing games that employ (use) dice (small numbered cubes rolled in games). And that’s what we’re going to deal with. In particular, we’re going to play a dice game which is called Pig. I don’t know why the game is called Pig. I can guarantee no porcine creatures (creature: a living animal) were harmed in the creation (the act of making) of this unit.

Here’s how the game works:

There are two players, although you could play with more. The players take turns, and on his turn a player has the option to roll the dice–a single die (die: the singular of dice)–as often as he wants or to hold–to stop rolling. And the object (goal) of the game is to score a certain number of points. We’re going to say 50 points; 100 is more common, but 50 will be easier on the Udacity servers in terms of the amount of computation required.

And so it’s my turn, and we have a score. So here’s a scoreboard; we’ll have players with the imaginative (showing imagination; said ironically here) names of player 0 and player 1. And the score starts off 0 to 0. Now there’s another part of the scoreboard that is not part of the player’s score. We’ll call that the pending (not yet settled) score.

Let’s say it’s my turn. I pick up the die, I roll it, and let’s say I get a 5. Then 5 goes into the pending score, but I don’t score any points yet.

Now it’s my turn again. Do I roll or do I hold–stop rolling? Let’s say I want to roll again. This time I get a 2, so I add 2 to the pending score; I get 7. Let’s say I roll again. I’m lucky. I get a 6. I add 6 to the pending; I get 13. And I’m going great (go great: to do very well), so I roll again, and this time I get a 1. And a 1 is special. A 1 is called a pig out, and when you roll a pig out it means you lose all the pending points, and for your hand you score not this total in pending, but just the 1. So my score would be just the 1.

Now the other player, player number 1, goes. Let’s say player number 1 says, “I’m going to roll,” gets a 3. “I’m going to roll again,” gets a 4. “I’m going to roll again,” gets a 5. So now we have 12 in the pending, and now player number 1 says, “I think I’ve had enough; I’m going to hold,” and that means we take these points from the pending, the 12 points, put them up on the board for player 1’s score. And now player 1’s turn ends, and it’s player 0’s turn.

So your turn continues until you either hold or pig out, and your score for the turn is the sum of your rolls, if you didn’t pig out, if you decided to hold, and the score is just 1 if you pigged out. And you keep on taking turns until somebody reaches the target–here, 50. So that’s how the game of Pig works. Now let’s go to try to describe the game in a form that we can program.
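
The scoring rule for a single turn can be sketched in a few lines of Python. This is only an illustration: the helper name turn_score is hypothetical, and it assumes the player holds after the last roll in the list unless a 1 comes up first.

```python
def turn_score(rolls):
    """Points scored in one turn of Pig, given the die rolls in order.

    Assumption for this sketch: the player holds after the last roll
    in the list, unless a 1 (a pig out) ends the turn earlier.
    """
    pending = 0
    for d in rolls:
        if d == 1:
            return 1          # pig out: pending is lost, the turn scores 1
        pending += d          # any other roll is added to the pending score
    return pending            # hold: the pending points are scored


# The two turns from the walkthrough above.
print(turn_score([5, 2, 6, 1]))   # pig out with 13 pending -> 1
print(turn_score([3, 4, 5]))      # hold with 12 pending -> 12
```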

3. The State of Pig

So as usual, we’re going to make an inventory of concepts in the game.

This time I’m going to try to break things out a little bit, and I’m going to talk about low-level concepts, high-level concepts, and mid-level concepts.

As we saw in the discussion forums there’s always a question of where do you want to start. Do you want to describe the low level first and build up from there? Do you want to describe the high level first and build down? I think for this case we’ll take more of a middle out approach.

So, at the mid level there’s the concept of current state of the game. We’re sort of (so to speak) inching (moving slowly, bit by bit) towards a search problem, and we know that we have to represent states for a search problem. So, we want to know the current state of the game.

If we’re thinking of search problems then we also have to know about actions we can take. We know that there are two actions: Roll and hold. So, here are some candidates (here: possible items to include) for what’s in the current state.

First, the things that were on the scoreboard. The scoreboard, remember, had three things.

Then the player whose turn it is, we might want that to be part of the state. The previous roll of the dice, whether I just rolled a five or something else, that might be part of the state. The previous turn score, how much did the other player just make on their turn?

So, all of these are possibilities. You might be able to think of other possibilities. I want you to tell me which of these are necessary to describe the state of the game. I guess I should say here that we’re assuming that the goal of the game, the number of points you need to win, is constant (a fixed value) and doesn’t need to be represented in each individual state. We just represent it once for the whole game.

Which of these are necessary for the current state?

the things on the scoreboard:
 · score 0 (player 0's score)
 · score 1 (player 1's score)
 · pending (the pending score)

the player whose turn it is:
 · player

the previous roll of the dice (the result of the last die roll):
 · previous roll

the previous turn score (how many points the other player made on their turn):
 · previous turn score

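(My answer:
I think the first four concepts are required to represent the current state.
The other two, previous roll and previous turn score, are points that are either already in pending or already counted in a player's score on the scoreboard.)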

3. The State of Pig Solution


Well, we certainly have to know the score (score 0 and score 1). We have to know how much is pending, because that’s going to affect the score. We have to know what player is playing.

Now these things, what happened before, they might be interesting, but they don’t really help us to define the current state. So those are unnecessary.

STATE       -- (p, me, you, pending)
ACTIONS     -- roll/hold  

So, the state’s going to end up being something like a four tuple. I’ve written it as p, me, you, pending: the player to move, that player’s score, the other player’s score, and the pending score that hasn’t been reaped (reap: to collect or obtain) yet.
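
As a small illustration of this representation, here is the position from the earlier walkthrough written as a four tuple: player 0 has pigged out and scored 1, and player 1 has rolled 3, 4, 5 and not yet held. The helper describe is hypothetical and exists only to make the field order visible.

```python
# (p, me, you, pending): player to move, that player's score,
# the other player's score, and the points pending on this turn.
state = (1, 0, 1, 12)

goal = 50   # kept once for the whole game, not inside each state


def describe(state):
    """Hypothetical helper: spell out the four fields of a state."""
    p, me, you, pending = state
    return "player %d to move: me=%d, you=%d, pending=%d" % (p, me, you, pending)


print(describe(state))
```

Note that me and you are relative to the player to move, so their positions trade places whenever the turn passes.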

4. Concept Inventory

LOW LEVEL

 · DIE        the roll of a die:
 · SCORES     the implementation of scores:
 · PLAYERS    the implementation of the players:
 · TO MOVE    the player to move:
 · goal       the goal:

At the low level–I count as (regard as) low-level things like the roll of a die, the implementation (realization in code) of scores, the implementation of the players and of the player to move, the goal–so these are all things that we’re going to have to represent.

HIGH LEVEL

 · play-pig    a function play-pig that plays a game between two players:
 · strategy    a strategy that a player is taking in order to play the game:

And then at the high level, I’m going to have a function play-pig, that plays a game between two players, and I have the notion (concept) of a strategy (a plan of play)–a strategy that a player is taking in order to play the game.

Now let’s think about how to implement these things, and when I’m doing the implementation, I’m going to move top-down. So I started sort of (so to speak) middle-out, saying (here: stating) these are the kinds of things I think I’m going to need; now I have a good enough feel for them that I feel confident in moving top-down. I don’t see any difficulties in implementing any of these pieces.

If I start at the top, then I’ll be able to make choices later on without feeling constrained. If I thought there was something down here that was difficult to deal with, I might spend more time now, at the low level, trying to resolve (here: settle, decide) what the right representation is for one of these difficult pieces, and that would inform (here: guide, shape) my high-level decisions. But since I don’t see any difficulty, I’m going to jump to the high level.

HIGH LEVEL

 · play-pig    fn(A, B) --> A
 · strategy    fn(state) --> action

Now, what’s play-pig? Well, I think that’s going to be a function, and let’s just say that its input is two players, A and B, and we haven’t decided yet how we’re going to represent those. And its output is–let’s say it’s going to be the winner of the game.

Maybe A is the winner. And we’ll have to make a choice of how we represent these players.

Now what’s a strategy? Well, a strategy–people sometimes use the word “policy” (another word for strategy) for that. We can also represent that as a function. And it takes as input a state, and it returns an action or a move in the game.
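
To make "a strategy is a function from state to action" concrete, here is a minimal sketch with two hypothetical strategies. Representing actions as the strings 'roll' and 'hold' is an assumption at this point in the notes, and the threshold of 20 is chosen purely for illustration.

```python
def always_roll(state):
    """Hypothetical strategy: never hold voluntarily."""
    return 'roll'


def hold_at_20(state):
    """Hypothetical strategy: hold once 20 or more points are pending."""
    (p, me, you, pending) = state
    return 'hold' if pending >= 20 else 'roll'


# A strategy only looks at the state and names an action;
# it does not change the state itself.
print(hold_at_20((0, 10, 15, 12)))
print(hold_at_20((0, 10, 15, 21)))
```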

MIDDLE LEVEL

 · STATE        (p, me, you, pending)
 · ACTIONS      'roll'/'hold'
     roll(state) --> state
     hold(state) --> state
         roll(state) --> {states}
             roll(state, d) --> state

In this game we said that the actions are roll and hold. We’re starting to move down. Let’s just say now how are we going to represent these actions? Well, we can call the actions just by strings, so we use the strings “roll” and “hold” and that could be what the strategy returns. But then we’ll also need something that implements these actions, so we’ll have to have something that’s a function that says–let’s say– the function “roll” takes a state and returns a new state; function “hold” takes a state and returns a new state.

But that doesn’t seem quite right; there’s a problem here. What about the die? Its outcome affects the result, so roll by itself is not a function from state to state.

Rather, roll–if we wanted to specify it–would be a function from a state to a set of states, and that represents the fundamental (basic, essential) uncertainty.

That’s why we need probability to deal with it. We have an uncertain or nondeterministic (not fully determined) domain because an action doesn’t have a single result; rather, it can have a set of possible results.

And, in some cases it makes sense to go ahead and implement these actions as functions that look like that, that return sets of states. And I considered that as a possibility, but I ended up with an implementation where I talk about the different possibilities for the dice.

So the dice can come up as D, one of the numbers 1 to 6, and now roll, from a particular state with the particular die roll, that is going to return a single state rather than a set of states. And I just think it’s easier to deal this way, although in other applications you might want to deal that way.
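
The two views of roll can be related in a short sketch. The roll(state, d) below follows the solution that appears later in these notes; roll_outcomes is a hypothetical helper showing how the "set of states" view can be recovered by trying each of the six die faces.

```python
other = {1: 0, 0: 1}   # maps each player index to the other player


def roll(state, d):
    """Deterministic result of rolling a particular die value d."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)    # pig out: score 1, turn passes
    return (p, me, you, pending + d)          # otherwise d is added to pending


def roll_outcomes(state):
    """Hypothetical helper: every state that rolling could lead to.

    With a fair die, each of these outcomes has probability 1/6.
    """
    return set(roll(state, d) for d in range(1, 7))


for outcome in sorted(roll_outcomes((0, 5, 15, 10))):
    print(outcome)
```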

LOW LEVEL
 · DIE -- int
 · SCORES -- int
 · goal -- int
 · TO MOVE -- 0, 1        the player to move
 · PLAYERS -- fn          strategy function

(I have a question here: representing the players by their strategies? I don't quite understand this.)
Now the rest seems to be pretty easy. The die can be represented as an integer. Scores can be represented as integers. Likewise (in the same way) the goal. The player to move–we can represent that as an index, 0 or 1, into an array of players. And the players themselves? Well, the simplest way to do it is just to represent the player by their strategy. The strategy is a function, and that could represent the player. We could have something more complex, but it seems like we don’t need anything more than that. So players will be strategy functions.
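
One way to see why a strategy function is enough to stand for a player is to sketch the game loop. The following is a rough sketch under stated assumptions, not the course's implementation: the name play_pig is the Python spelling of play-pig, the optional rolls argument and the hold_at factory are hypothetical, and hold and roll follow the solutions shown later in these notes.

```python
import random

goal = 50
other = {1: 0, 0: 1}


def hold(state):
    (p, me, you, pending) = state
    return (other[p], you, me + pending, 0)


def roll(state, d):
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)
    return (p, me, you, pending + d)


def hold_at(x):
    """Hypothetical factory: a strategy that holds at x or more pending points."""
    def strategy(state):
        (p, me, you, pending) = state
        return 'hold' if pending >= x else 'roll'
    strategy.__name__ = 'hold_at(%d)' % x
    return strategy


def random_rolls():
    """Endless stream of fair die rolls."""
    while True:
        yield random.randint(1, 6)


def play_pig(A, B, rolls=None):
    """Play one game between strategies A and B; return the winning strategy."""
    if rolls is None:
        rolls = random_rolls()
    strategies = [A, B]            # the index p selects the player to move
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, next(rolls))


winner = play_pig(hold_at(20), hold_at(10))
print(winner.__name__)
```

Because the loop only ever calls strategies[p](state), nothing else about a player needs to be stored.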


5. Hold and Roll

Now you’re probably itching (eager) to write some code by now–so let’s get started.

What I want you to do first is write these two action functions, hold and roll, which take a state and return a state.

Here’s the state that results from holding. Here’s the state that results from rolling and getting a d. A state is represented by this four tuple. First is p, the player; it’s either zero or one. In the subsequent (following) state, p would remain the same if the player continues and would swap (exchange) between one and the other otherwise. Then me and you, two integers indicating the score of the player to play and the score of the other player, and then pending, which is the score accumulated (built up) so far but not yet put onto the scoreboard.

Go ahead and write those functions.

# -----------------
# User Instructions
#
# Write the two action functions, hold and roll. Each should take a
# state as input, apply the appropriate action, and return a new
# state.
#
# States are represented as a tuple of (p, me, you, pending) where
# p:       an int, 0 or 1, indicating which player's turn it is.
# me:      an int, the player-to-move's current score
# you:     an int, the other player's current score.
# pending: an int, the number of points accumulated on current turn,
#          not yet scored
#
# Vocabulary notes: instruction (a direction; a lesson),
# appropriate (suitable), accumulated (built up)

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    # Vocabulary note: reap (to collect or obtain)
    # your code here

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    # Vocabulary note: accumulated (built up)
    # your code here

def test():
    assert hold((1, 10, 20, 7))    == (0, 20, 17, 0)
    assert hold((0, 5, 15, 10))    == (1, 15, 15, 0)
    assert roll((1, 10, 20, 7), 1) == (0, 20, 11, 0)
    assert roll((0, 5, 15, 10), 5) == (0, 5, 15, 15)
    return 'tests pass'

print(test())

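(Let me describe my own thought process:
I first recalled the explanation of the rules in the earlier video [Porcine Probability]; it never said that the two players swap scores.
However, in the test cases in test(), the scores of the two players, me and you, do appear to be swapped.
This video also mentions a swap.
Looking at the test cases again, I re-read the rules:
If a player chooses hold, that is, not to roll the die, the turn must pass to the other player;
If a player chooses roll, that is, to roll the die, the turn may stay with that player;
Additional note 1: if the player rolls a 1, the turn must pass to the other player;
Additional note 2: in the case of note 1, pending is also reset to zero and 1 point is added to the score of the player who rolled.

I'll write the hold(state) function first, because it is a little simpler.
First, look at the test cases: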

    assert hold((1, 10, 20, 7))    == (0, 20, 17, 0)
    assert hold((0, 5, 15, 10))    == (1, 15, 15, 0)

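Working through the logic:
If state[0] == 1, add pending to state[1], swap its position with state[2], and return
(0, state[2], state[1]+pending, 0);
If state[0] == 0, add pending to state[1], swap its position with state[2], and return
(1, state[2], state[1]+pending, 0);

In other words, when a player chooses hold, the turn always changes, and because me always refers to the player to move, the two scores trade positions in the tuple.

As follows: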

def hold(state):
    pending = state[3]  # points accumulated on this turn
    if state[0] == 1:
        return (0, state[2], state[1]+pending, 0)
    elif state[0] == 0:
        return (1, state[2], state[1]+pending, 0)

Simplified:

def hold(state):
    return (0, state[2], state[1]+state[3], 0) if state[0]==1 else (1, state[2], state[1]+state[3], 0)

Now for the roll function.
First, look at the test cases:

    #      roll(state,          d)
    assert roll((1, 10, 20, 7), 1) == (0, 20, 11, 0)
    assert roll((0, 5, 15, 10), 5) == (0, 5, 15, 15)

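Working through the logic:
If d equals 1, the turn must change;
in that case pending is reset to zero, 1 is added to state[1], and the sum trades positions with state[2], so we return
(the other player, state[2], state[1]+1, 0)
If d is not 1, the turn does not change and the same player keeps rolling;
in that case d is added to pending and everything else stays the same.

As follows: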

def roll(state, d):
    p, pending = state[0], state[3]  # player to move, points pending
    if d == 1:
        return (int(not p), state[2], state[1]+1, 0)
    else:
        return (state[0], state[1], state[2], pending+d)

Simplified:

def roll(state, d):
    return (int(not state[0]), state[2], state[1]+1, 0) if d == 1 else (state[0], state[1], state[2], state[3]+d)

)

(Here is my complete code; the tests pass:

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    # your code here
    return (0, state[2], state[1]+state[3], 0) if state[0]==1 else (1, state[2], state[1]+state[3], 0)

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    # your code here
    return (int(not state[0]), state[2], state[1]+1, 0) if d == 1 else (state[0], state[1], state[2], state[3]+d)

def test():    
    assert hold((1, 10, 20, 7))    == (0, 20, 17, 0)
    assert hold((0, 5, 15, 10))    == (1, 15, 15, 0)
    assert roll((1, 10, 20, 7), 1) == (0, 20, 11, 0)
    assert roll((0, 5, 15, 10), 5) == (0, 5, 15, 15)
    return 'tests pass'

print(test())

)

5. Hold and Roll Solution

Peter's code:

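def hold(state):
    (p, me, you, pending) = state
    # Return a tuple (not a list) so the result compares equal to the test tuples.
    return (other[p], you, me+pending, 0)

def roll(state, d):
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me+1, 0)  # pig out: score 1, turn passes
    else:
        return (p, me, you, pending+d)   # same player, d added to pending

# Map each player index to the other player.
other = {1: 0, 0: 1}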

Here’s my solution:


So, I have my state–I just broke it up into pieces so that I know what I’m talking about. Then if I hold it becomes the other player’s turn. The other player’s score is the same as it was before.

So now remember the second place is the score of the player whose turn it is. So, that was you previously, and then the score that I got–I just add in the pending. I reap (collect) all of those, and the pending gets reset to zero. When I roll, again let’s figure out what’s in the state. If the roll is one, that’s a pig out: it becomes the other player’s turn, and I only got one lousy point. Pending gets reset to zero.

If the roll is not a one then it’s still my turn. I don’t change my score so far, but I add d onto the pending.

Here’s just a way to map from one player to the other. If the other player, if it was one it becomes zero. If it was zero it becomes one. It’s always a great idea to write some test cases.

Now, one comment on style. Right here I’m taking this state, which is a tuple that has four components, and I’m breaking it up like this into its four components. When you have four components that’s probably okay, but it’s getting to worry me a little bit that maybe I won’t be able to remember which part of the state is which. If I had more than four, if I had five or six components, I’d really start to worry about that.

So there are other possibilities where we can be more explicit (clearly stated) about what the state is rather than just have it be a set of undifferentiated (not distinguished from one another) elements of a tuple that we then define like this. We can define it ahead of time.

6. Named Tuples

Supplementary material below the video (start)

Interested in learning more about namedtuples? Check out the Python documentation.

Supplementary material below the video (end)

Now here’s an alternative.

Instead of just defining a state by just creating a tuple and then getting at the fields of a state by doing an assignment, we can use something called a namedtuple that gives a name to the tuple itself as well as to the individual elements.

We can define a new data type called State, using a capitalized name as is usual for data types. Say State is equal to a namedtuple, and the name of the data type is State, and the fields of the data type are p, me, you, and pending.

state = (1, 2, 3, 4)
(p, me, you, pending) = state

from colle
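
The namedtuple alternative described above might look like the following minimal sketch. It assumes the standard-library collections.namedtuple and the field names p, me, you, and pending given in the text; rewriting hold and roll this way is my own illustration rather than code from the lecture.

```python
from collections import namedtuple

# A State is still a tuple, but its fields can also be read by name.
State = namedtuple('State', 'p me you pending')

other = {1: 0, 0: 1}   # maps each player index to the other player


def hold(state):
    """Reap the pending points; the turn passes to the other player."""
    return State(other[state.p], state.you, state.me + state.pending, 0)


def roll(state, d):
    """Apply a die roll d: a 1 is a pig out, anything else adds to pending."""
    if d == 1:
        return State(other[state.p], state.you, state.me + 1, 0)
    return state._replace(pending=state.pending + d)


s = State(p=1, me=10, you=20, pending=7)
print(hold(s))
print(roll(s, 5).pending)
```

Since a namedtuple is a tuple subclass, these results still compare equal to plain tuples, so the earlier test cases keep working unchanged.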