"D o C P" Study Notes (6 - 1): Word Games

Latest recommended article published on 2022-07-16 13:55:50

HenryQWER

Categories: Software Engineering, SICP, Design of Computer Programs, Study

Article link: https://blog.csdn.net/qq_33528613/article/details/80712826

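Note 1: Translating the English subtitles of every video into Chinese takes too much time. To speed up my study, I am pausing that work and will only add a few annotations to the English subtitles.
Note 2: Put the .flv video file and the matching .srt subtitle file from the Subtitles folder into the same folder, then open the video in Xunlei Kankan; the subtitles load automatically.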

Word Games

What you will learn:

Managing complexity.
Large sets of words.
Appropriate data structures.

Lesson 6

Video links:
Lesson 6 - Udacity

Lesson 6 - YouTube

Course Syllabus

Lesson 6: Word Games

Lesson 6 Course Notes (mainly web pages containing the English subtitles of the course videos.)
Lesson 6 Code
Lesson 6 words4k.txt file

01 Welcome Back

Hi, welcome back. So far in this class we’ve covered a lot of programming techniques, but we’ve mostly done it with small examples of code. In this unit, we’re going to look at a larger example than anything we’ve seen before. We’re going to write an algorithm for finding the highest scoring play in a crossword (word-grid puzzle) tile (a playing piece, as in mahjong) game. Now, versions of this game go by names like Scrabble and Words with Friends. So we’re going to have to represent everything about the words, the tiles, the board, the scoring and the algorithm for finding the highest scoring word. That’s going to be more code. I’m going to be writing a lot of it, and you’re going to get practice reading it, which is an important skill. But I’m also going to stop and leave you plenty of places where you can write some code yourself, and at any point, if you want a bigger challenge, you can stop the video, go ahead yourself, and try to solve as much of it as you can on your own. I would encourage you to do that. This is a big step up (a marked increase in difficulty). I think you are all ready for it. So let’s get started.

02 Word Games

We’ve got a lot to cover and not much time to do it so let’s dig right in. Here’s a game I’m playing online with my friend, Ken.

[Image: screenshot of the online game against Ken]

I’m winning by a little bit mostly because I got good letters like the Z and so on but Ken is catching up.

Let’s dig right in (dig in: get to work in earnest) and come up with our concept inventory. What have we got?

Well, the most obvious thing, there’s a board.
There are letters, both letters on the board and letters in the hand. The letters on the board have to form words; the letters in the hand do not.
There’s the notion (concept; idea) of a legal (permitted) play on the board. RITZY is a word, and it’s a word independent of where it appears, but it was legal to place it here, where it hooks up (hook up: connect) with another letter. It wouldn’t have been legal to place it where it bumps into (collides with) the H or where it’s not attached to anything else.
There’s the notion of a score, including the score for individual letters. Z is worth 10. An I is worth 1.
And there are scores for a play, where you add up the letters. Part of that is that there are bonuses (bonus: an extra award) on the board. DL means double letter score: a letter placed there has its score doubled. DW means double word score: if any letter of the word is on that square, the whole word score is doubled. We also have triples.
Somewhere behind the scenes, there’s a dictionary and all these words are in the dictionary and other combinations of letters are not.
Then not shown here is the notion of a blank tile. Part of the hand might be a blank that isn’t indicating any particular letter, but you’re free to use for any one, similar to the way we had jokers in the game of poker.

(Partial notes:
Z : 10 points
I : 1 point
DL : Double Letter score
DW : Double Word score
TL : Triple Letter
TW : Triple Word

blank tile : part of the hand may be a blank tile. It does not stand for any particular letter, but you are free to use it as any one letter.
)

03 Concept Inventory

Now let’s talk about how to implement any of these, see if there’s any difficulties, any areas that we think might be hard to implement.

The board can be some kind of two-dimensional array, maybe a list of lists is one possibility. One thing I’m not quite clear on now is do I need one board or two? It’s clear I need one board to hold all the letters, but then there’s also the bonus squares. Should that be part of the same board or should that be a separate board and the letters are layered on top of this background of bonus squares? I’m not quite sure yet, but I’m not too worried about it, because I can make either approach work.

A letter can be one character string.

A word can be a string.

A hand can also be a string. It could also be a list of letters. Either one would be fine. Any collection of letters would be okay. Note that a set would not work for the hand. The hand can’t be a set of letters, because we might have duplicates, and sets don’t allow duplicates.
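
A quick check of the point about duplicates, as a minimal sketch (the hand 'LETTERS' is the one used later in these notes): converting the hand to a set silently drops the repeated tiles, while a string or a list keeps them.

```python
hand = 'LETTERS'            # seven tiles, with E and T each appearing twice

as_list = list(hand)        # keeps every tile, duplicates included
as_set = set(hand)          # collapses the duplicate E and T

print(len(hand), len(as_list), len(as_set))   # 7 7 5
```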

Now, for the notion of a legal play, we’ll have some function that generates legal plays, given a board position and a hand, and then the plays themselves will need some representation. Maybe they can be something like a tuple of the starting position (for example, “RITZY” starts in this location), the direction in which the word is going (across or down, the two allowable directions), and the word itself, in this case RITZY. That seems like a good representation for a legal play.

I’m not quite sure yet what the representation of a position or a direction should be, but that’s easy enough.

A score–we’ll have some function to compute the score.

For letters, we can have a dictionary that says the value of Z is 10.

For plays we’ll need some function to compute that.

For the bonus squares, we’ll need some mapping from a position on the board to double word or triple letter or whatever.

A dictionary is a set of words.
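
The representations discussed so far can be sketched as plain Python values. This is only an illustration under stated assumptions: the names POINTS, BONUS, DICTIONARY and play, the (row, column) form of a position, the 'ACROSS' direction marker, and the particular squares in BONUS are all hypothetical. Only the two letter values given in the lecture (Z = 10, I = 1) are filled in.

```python
# Letter values: only the two values stated in the lecture; the rest are omitted.
POINTS = {'Z': 10, 'I': 1}

# Bonus squares: a mapping from a board position to a bonus code.
# The positions here are made up for illustration.
BONUS = {(0, 3): 'TW', (1, 2): 'DL', (2, 1): 'DW', (3, 0): 'TL'}

# The dictionary is a set of words, so membership tests are fast.
DICTIONARY = {'RITZY'}

# A hand is a string, so duplicate letters are preserved.
hand = 'RITZYAE'

# A play as a tuple: (starting position, direction, word).
# The position (7, 4) is an arbitrary example, not taken from the screenshot.
play = ((7, 4), 'ACROSS', 'RITZY')

start, direction, word = play
print(word in DICTIONARY)          # True
print(BONUS.get((1, 2)))           # DL
print(BONUS.get(start))            # None: no bonus on that square
```

Keeping the bonus squares in their own mapping corresponds to the "separate background board" option mentioned above; putting them on the same board as the letters would work as well.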

The blank letter: well, we said letters were strings, so that’s probably okay. We could use the string space or the string underscore to represent the blank. Then it’s dealing with it that will be an issue later on. Now, I’m a little bit worried about blanks, because in poker, Jokers (joker: a wild card that can stand for any card) were easy. We just said, replace them by any card and just deal with all the possibilities. Our routines were fast enough that we could deal with them all. Here I’m pretty confident we can make it fast enough that that approach will work, but it doesn’t quite work, because not only do we have to try all possibilities for the letter, but the scoring rules are actually different. When you use a blank instead of a letter, you don’t get the letter score for that blank. We’ll have to have scoring know about blanks and not just know about filling things in. That’ll be a complication. But overall I went through all the concepts, and I’ve got an implementation for each.
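
To make the scoring complication concrete, here is a minimal sketch. It assumes, purely for illustration, that a letter supplied by a blank tile is written in lowercase while ordinary tiles are uppercase; the lecture has not chosen a representation at this point. POINTS again holds only the two values given earlier, and letter_score and letters_score are hypothetical names.

```python
POINTS = {'Z': 10, 'I': 1}   # partial table: only the values stated in the lecture

def letter_score(letter):
    "Score one placed tile; a lowercase letter stands for a blank and scores 0."
    if letter.islower():
        return 0
    return POINTS[letter]

def letters_score(letters):
    "Sum the tile scores of a sequence of placed letters (no board bonuses)."
    return sum(letter_score(L) for L in letters)

print(letters_score('ZI'))   # 11: a real Z and a real I
print(letters_score('zI'))   # 1: the Z came from a blank, so it scores nothing
```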

[Image: concept inventory notes]

Some of them are functions that I don’t quite know how to do, but I don’t see anything that looks like a show stopper. I think I can go ahead. The difficulty then is not that I have to invent something new in order to solve one of the problems.

The difficulty is just that there’s so much.

When faced with a problem of this size, and problems can be much larger, the notion (concept; idea) of pacing (keeping a steady rate of progress) is an important one.

What do I mean by that? It means I want to attack this, and I know I’m not going to solve it all at once. I’m not just going to sit down for 20 minutes and knock out (finish off quickly) the whole problem. It’s going to be a lot longer than that.

I want to have pacing in that I have intermediate (in-between) goals along the way where I can say, okay, now I’m going to focus on one part of the problem, and I’m going to get that done. Then when I’m done with that part, I can move on to the next part.

If you don’t have that pacing, you can lose your focus. You can get discouraged that there’s so much left to do. But if you break it up into bite-sized (very small) pieces, then you can say, okay, I’m almost there. I just have to finish a little bit more, and now this piece will be done, and then I can move on to the next piece.

The first piece I’m going to look at is finding words from a hand. In other words, I’m going to ignore the whole board. I’m going to pretend the board isn’t there and that all we have is the hand and the dictionary, a set of legal words. I want to know, out of (using, as raw material) that hand, what words in the dictionary can I make?

04 Finding Words

Let’s get started. The first thing I need is to come up with a dictionary of all the words.

Now, we’ve created a small file with about 4,000 words in it, called “words4k.txt.”

Let’s take that file, read it, convert it to uppercase (because Scrabble and Words with Friends use only uppercase letters), split it into a list of words, and assign that to a global variable. We’ll call it WORDS, in all uppercase, just to make sure that it stands out. Let’s make this a set so that access to it is easy: we can figure out very quickly whether a word is in the dictionary. Okay, so now we’re done.

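(Supplementary note: a brief introduction to file()
In Python 2, file() is a built-in that creates a file object, and open() does the same job. The file() built-in was removed in Python 3, where open() is the way to open a file. The lecture’s original line called file('words4k.txt'); the line below uses open() instead, which runs under both Python 2 and Python 3.
)

WORDS = set(open('words4k.txt').read().upper().split())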

We have our words. Then I want to find all the words within a hand. So the hand will be seven letters, and I want to find all the words of seven letters or fewer that can be made out of those letters. I’m going to start with a very straightforward approach, and then we’re going to refine (improve step by step) it over time. Here is what I’ve done:
(Why at most 7 letters?
Look at the first image near the top: the hand shown at the bottom of that image holds at most 7 letters.
)

def find_words(hand):
    "Find all words that can be made from the letters in hand."
    results = set()
    for a in hand:
        if a in WORDS: results.add(a)
        for b in removed(hand, a):
            w = a+b
            if w in WORDS: results.add(w)
            for c in removed(hand, w):
                w = a+b+c
                if w in WORDS: results.add(w)
                for d in removed(hand, w):
                    w = a+b+c+d
                    if w in WORDS: results.add(w)
                    for e in removed(hand, w):
                        w = a+b+c+d+e
                        if w in WORDS: results.add(w)
                        for f in removed(hand, w):
                            w = a+b+c+d+e+f
                            if w in WORDS: results.add(w)
                            for g in removed(hand, w):
                                w = a+b+c+d+e+f+g
                                if w in WORDS: results.add(w)

    return results

(Looking at the results of a few calls to removed should help in understanding what the function does:

>>> removed('letter', 'l')
'etter'
>>> removed('letter', 't')
'leter'
>>> removed('letter', 'set')
'lter'
>>> removed('letter', 'setter')
'l'

)

I haven’t worried about repeating myself or about making the code long. I just wanted to make it straightforward. I start off with an empty set of results, and I’m going to add to it as I go. The first letter a can be any letter in the hand. If that’s a word, then go ahead and add it to my set of results. Next, b can be any letter in the result of removing a from the hand. Now the word that I’m building up is a + b, a two-letter word. If that’s a word, add it. Either way, c can be any letter in the hand without w in it, that is, any of the remaining letters in the hand. The new word is a + b + c. If that’s in WORDS, then add it, and we just keep on going, adding a letter each time and checking whether the result is in WORDS.
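
The seven nested loops enumerate every ordered arrangement of 1 to 7 tiles from the hand. As a cross-check, and not as the lecture’s method, the same enumeration can be written with itertools.permutations. The sketch below is Python 3; find_words_perm and the tiny WORDS set are stand-ins made up for illustration, not the contents of words4k.txt.

```python
from itertools import permutations

# A tiny stand-in dictionary; the real one is read from words4k.txt.
WORDS = {'LET', 'TEL', 'SET', 'LETTER', 'ZOO'}

def find_words_perm(hand, words=WORDS):
    "Return every word in words that can be spelled with tiles from hand."
    results = set()
    for n in range(1, len(hand) + 1):
        # permutations treats repeated tiles as distinct, like the nested loops do
        for tiles in permutations(hand, n):
            w = ''.join(tiles)
            if w in words:
                results.add(w)
    return results

print(sorted(find_words_perm('LETTERS')))   # ['LET', 'LETTER', 'SET', 'TEL']
```

This version examines the same candidates as the nested loops, so it is shorter but no faster.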

(Supplement
A brief introduction to the replace() method:

>>> s
'cheese'
>>> s.replace('e', 'f')
'chffsf'
>>> s.replace('e', 'f', 1)
'chfese'

s.replace(a, b) replaces every occurrence of the substring a in the string s with b;
s.replace(a, b, 1) replaces only the first occurrence of a in s with b.

)

Here’s my definition of removed:
(The comments in the code give my understanding of this function.)

# It takes a hand or a sequence of letters and then the letter or letters to remove.
def removed(letters, remove):
    "Return a str of letters, but with each letter in remove removed once."
    # Iterate over each character L in the string remove
    for L in remove:
        # Replace the first occurrence of L in letters with the empty string '', i.e. delete it
        letters = letters.replace(L, '', 1)
    return letters

It takes a hand or a sequence of letters and then the letter or letters to remove. For each of those letters just replace the letter in the collection of letters with the empty string and do that exactly once, so don’t remove all of them. Then return the remaining letters.

Does it work? Well, if I find words with this sequence of letters in my hand, it comes back with this list.

>>> find_words('LETTERS')
set(['ERS', 'RES', 'RET', 'ERE', 'STREET', 'ELS', 'REE', 'SET', 'LETTERS', 'SER', 'TEE', 'RE', 'SEE', 'SEL', 'TET', 'EL', 'REST', 'ELSE', 'LETTER', 'ET', 'ES', 'ER', 'LEE', 'EEL', 'TREE', 'TREES', 'LET', 'TEL', 'TEST'])
>>>

That looks pretty good. It’s hard for me to verify (confirm) right now that I found everything that’s in my dictionary, but it looks good. I did a little bit of poking around (poke around: look through casually) in the dictionary for likely things, and all the words I could think of that weren’t in this set were not in the dictionary. That’s why they weren’t included. So that looks pretty good. I’m going to be doing a lot of work here, and I’m going to be modifying this function and changing it, so I’d like to have a better set of tests than just one test.

05 Regression Tests

(regression tests: tests that are rerun after each change to check that existing behavior has not changed)

I made up a bigger test. I made up a dictionary of hands that map from a hand to a set of words that I found.

hands = {  ## Regression test
    'ABECEDR': set(['BE', 'CARE', 'BAR', 'BA', 'ACE', 'READ', 'CAR', 'DE', 'BED', 'BEE',
         'ERE', 'BAD', 'ERA', 'REC', 'DEAR', 'CAB', 'DEB', 'DEE', 'RED', 'CAD',
         'CEE', 'DAB', 'REE', 'RE', 'RACE', 'EAR', 'AB', 'AE', 'AD', 'ED', 'RAD',
         'BEAR', 'AR', 'REB', 'ER', 'ARB', 'ARC', 'ARE', 'BRA']),
    'AEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'TIE', 'SIN', 'TAR', 'TAS',
         'RAN', 'SIT', 'SAE', 'RIN', 'TAE', 'RAT', 'RAS', 'TAN', 'RIA', 'RISE',
         'ANESTRI', 'RATINES', 'NEAR', 'REI', 'NIT', 'NASTIER', 'SEAT', 'RATE',
         'RETAINS', 'STAINER', 'TRAIN', 'STIR', 'EN', 'STAIR', 'ENS', 'RAIN', 'ET',
         'STAIN', 'ES', 'ER', 'ANE', 'ANI', 'INS', 'ANT', 'SENT', 'TEA', 'ATE',
         'RAISE', 'RES', 'RET', 'ETA', 'NET', 'ARTS', 'SET', 'SER', 'TEN', 'RE',
         'NA', 'NE', 'SEA', 'SEN', 'EAST', 'SEI', 'SRI', 'RETSINA', 'EARN', 'SI',
         'SAT', 'ITS', 'ERS', 'AIT', 'AIS', 'AIR', 'AIN', 'ERA', 'ERN', 'STEARIN',
         'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE', 'AI', 'IS', 'IT',
         'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'IRE', 'ARS', 'ART', 'ARE']),
    'DRAMITC': set(['DIM', 'AIT', 'MID', 'AIR', 'AIM', 'CAM', 'ACT', 'DIT', 'AID', 'MIR',
         'TIC', 'AMI', 'RAD', 'TAR', 'DAM', 'RAM', 'TAD', 'RAT', 'RIM', 'TI',
         'TAM', 'RID', 'CAD', 'RIA', 'AD', 'AI', 'AM', 'IT', 'AR', 'AT', 'ART',
         'CAT', 'ID', 'MAR', 'MA', 'MAT', 'MI', 'CAR', 'MAC', 'ARC', 'MAD', 'TA',
         'ARM']),
    'ADEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'DEAR', 'TIE', 'SIN', 'RAD', 
         'TAR', 'TAS', 'RAN', 'SIT', 'SAE', 'SAD', 'TAD', 'RE', 'RAT', 'RAS', 'RID',
         'RIA', 'ENDS', 'RISE', 'IDEA', 'ANESTRI', 'IRE', 'RATINES', 'SEND',
         'NEAR', 'REI', 'DETRAIN', 'DINE', 'ASIDE', 'SEAT', 'RATE', 'STAND',
         'DEN', 'TRIED', 'RETAINS', 'RIDE', 'STAINER', 'TRAIN', 'STIR', 'EN',
         'END', 'STAIR', 'ED', 'ENS', 'RAIN', 'ET', 'STAIN', 'ES', 'ER', 'AND',
         'ANE', 'SAID', 'ANI', 'INS', 'ANT', 'IDEAS', 'NIT', 'TEA', 'ATE', 'RAISE',
         'READ', 'RES', 'IDS', 'RET', 'ETA', 'INSTEAD', 'NET', 'RED', 'RIN',
         'ARTS', 'SET', 'SER', 'TEN', 'TAE', 'NA', 'TED', 'NE', 'TRADE', 'SEA',
         'AIT', 'SEN', 'EAST', 'SEI', 'RAISED', 'SENT', 'ADS', 'SRI', 'NASTIER',
         'RETSINA', 'TAN', 'EARN', 'SI', 'SAT', 'ITS', 'DIN', 'ERS', 'DIE', 'DE',
         'AIS', 'AIR', 'DATE', 'AIN', 'ERA', 'SIDE', 'DIT', 'AID', 'ERN',
         'STEARIN', 'DIS', 'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE',
         'AD', 'AI', 'IS', 'IT', 'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'ID', 'ARS',
         'ART', 'ANTIRED', 'ARE', 'TRAINED', 'RANDIEST', 'STRAINED', 'DETRAINS']),
    'ETAOIN': set(['ATE', 'NAE', 'AIT', 'EON', 'TIN', 'OAT', 'TON', 'TIE', 'NET', 'TOE',
         'ANT', 'TEN', 'TAE', 'TEA', 'AIN', 'NE', 'ONE', 'TO', 'TI', 'TAN',
         'TAO', 'EAT', 'TA', 'EN', 'AE', 'ANE', 'AI', 'INTO', 'IT', 'AN', 'AT',
         'IN', 'ET', 'ON', 'OE', 'NO', 'ANI', 'NOTE', 'ETA', 'ION', 'NA', 'NOT',
         'NIT']),
    'SHRDLU': set(['URD', 'SH', 'UH', 'US']),
    'SHROUDT': set(['DO', 'SHORT', 'TOR', 'HO', 'DOR', 'DOS', 'SOUTH', 'HOURS', 'SOD',
         'HOUR', 'SORT', 'ODS', 'ROD', 'OUD', 'HUT', 'TO', 'SOU', 'SOT', 'OUR',
         'ROT', 'OHS', 'URD', 'HOD', 'SHOT', 'DUO', 'THUS', 'THO', 'UTS', 'HOT',
         'TOD', 'DUST', 'DOT', 'OH', 'UT', 'ORT', 'OD', 'ORS', 'US', 'OR',
         'SHOUT', 'SH', 'SO', 'UH', 'RHO', 'OUT', 'OS', 'UDO', 'RUT']),
    'TOXENSI': set(['TO', 'STONE', 'ONES', 'SIT', 'SIX', 'EON', 'TIS', 'TIN', 'XI', 'TON',
         'ONE', 'TIE', 'NET', 'NEXT', 'SIN', 'TOE', 'SOX', 'SET', 'TEN', 'NO',
         'NE', 'SEX', 'ION', 'NOSE', 'TI', 'ONS', 'OSE', 'INTO', 'SEI', 'SOT',
         'EN', 'NIT', 'NIX', 'IS', 'IT', 'ENS', 'EX', 'IN', 'ET', 'ES', 'ON',
         'OES', 'OS', 'OE', 'INS', 'NOTE', 'EXIST', 'SI', 'XIS', 'SO', 'SON',
         'OX', 'NOT', 'SEN', 'ITS', 'SENT', 'NOS'])}

The idea here is that this test is not so much proving that I’ve got the right answer, because I don’t know for sure that these are the right answers. Rather, this is what we call a regression test, meaning that as we change our program we want to make sure that we haven’t broken any of these, that our changes haven’t altered what the functions return.

Even if I don’t know this is exactly the right set, I want to know, when I make a change, whether I have changed the result here. I’ll be able to rerun this and ask whether we have done exactly the same thing. I’ll also be able to time (measure the running time of) the results of running these various hands and see if we can make our function faster. Here is my list of hands. I’ve got eight hands.

Then I did some further tests here.

def test_words():
    assert removed('LETTERS', 'L') == 'ETTERS'
    assert removed('LETTERS', 'T') == 'LETERS'
    assert removed('LETTERS', 'SET') == 'LTER'
    assert removed('LETTERS', 'SETTER') == 'L'
    t, results = timedcall(map, find_words, hands)
    for ((hand, expected), got) in zip(hands.items(), results):
        assert got == expected, "For %r: got %s, expected %s (diff %s)" % (
            hand, got, expected, expected ^ got)
    return t

timedcall(map, find_words, hands)
0.5527249999

I’m testing removing letters, and I got all of those right. Then I’m going through the hands, using the timedcall() function that we built last time. That returns the elapsed time and the results. I make sure all the results are what I expected. Then I return the time elapsed for finding all the words in those eight hands.
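
timedcall() comes from an earlier lesson and is not shown in these notes. A plausible reconstruction for Python 3 is sketched below; the body is an assumption, not the course’s code. One caveat when porting the test: in Python 3, map is lazy, so timedcall(map, find_words, hands) would time almost nothing. The hypothetical helper apply_all forces the work to happen inside the timed call.

```python
import time

def timedcall(fn, *args):
    "Call fn(*args); return a pair (elapsed seconds, result)."
    t0 = time.perf_counter()
    result = fn(*args)
    t1 = time.perf_counter()
    return t1 - t0, result

def apply_all(fn, items):
    "Eagerly apply fn to every item and return the results as a list."
    return [fn(item) for item in items]

# Usage with a stand-in function (len) instead of find_words:
t, results = timedcall(apply_all, len, ['AB', 'CDE'])
print(results)   # [2, 3]
```

Iterating over a dict yields its keys, so apply_all(find_words, hands) visits the hands in the same order as hands.items().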

It turns out it takes half a second. That kind of worries me. That doesn’t sound very good. Sure, if I was playing Scrabble with a friend and they reply in a half second, that’d be pretty good. Much better than me, for example. In this game here it says that I haven’t replied to my friend Ken in 22 hours. This is a lot better, but still, if we’re going to be doing a lot of work and trying to find the best possible play, half a second to evaluate eight hands– that doesn’t seem fast enough.

Why is find_words() so slow? One thing is that it’s got a lot of nested loops, and it always does all of them. A lot of that is going to be wasteful. For example, let’s say the first two letters in the hand were z and q. At the very start here w is z + q, and now I loop through all the other combinations of all the other letters in the hand trying to find words that start with z + q, but there aren’t any words in the dictionary that start with zq. As soon as I got here, I should be able to figure that out and not do all of the rest of these nested loops.
(
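Why is find_words() so slow?
Suppose the first two letters taken from the hand are z and q. No word in the dictionary starts with zq, yet the function still keeps descending into all the inner loops, which is very inefficient.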
)
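
A little arithmetic shows how much work is involved. The nested loops build one candidate string for every ordered selection of 1 to 7 tiles from a 7-tile hand. The sketch below counts them with math.perm (available from Python 3.8) and also counts how many of those candidates begin with one particular ordered pair of tiles, such as Z then Q.

```python
from math import perm

HAND_SIZE = 7

# Candidates of every length 1..7 built by the nested loops.
total = sum(perm(HAND_SIZE, k) for k in range(1, HAND_SIZE + 1))

# Candidates that start with one fixed ordered pair of tiles:
# the pair itself plus every ordered selection of the 5 remaining tiles.
with_fixed_pair = sum(perm(HAND_SIZE - 2, k) for k in range(0, HAND_SIZE - 1))

print(total)            # 13699
print(with_fixed_pair)  # 326
```

So once we know that no word starts with ZQ, 325 of the candidates built on that pair are wasted effort, and the same holds for every other dead-end pair.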

06 Readwordlist

What I’m going to do is introduce a new concept that we didn’t see before in our initial listing of the concepts, but which is an important one–the notion of a prefix of a word. It’s important only for efficiency and not for correctness–that’s why it didn’t show up the first time. The idea is that given a word there are substrings, which are prefixes of the word.

The empty string is such a prefix. Just W is a prefix. W-O is a prefix. W-O-R is a prefix.

Now, we always have to decide what we want to do with the endpoints. I think for the way I want to use it I do want to include the empty string as a valid prefix, but I think I don’t want to include the entire string W-O-R-D. I’m not going to count that as a prefix of the word. That is the word. I’m going to define this function prefixes(word). It’s pretty straightforward. Just iterate through the range, and the prefixes of W-O-R-D are the empty string and these three longer strings. Now here’s the first bit that I want you to do for me. Reading in our list of words from the dictionary is a little bit complicated in that we want to compute two things–a set of words and a set of prefixes for all the words in the dictionary. The set together of prefixes for each word–union all of those together. I’m going to put that together into a function readwordlist(), which takes the file name and returns these two sets. I want you to write the code for that function here.
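
The prefixes function described above might look like the following sketch (Python 3). It follows the stated convention, with the empty string included and the whole word excluded, but the exact body is a reconstruction rather than the lecture’s code.

```python
def prefixes(word):
    "Return a list of the proper prefixes of word, starting with the empty string."
    return [word[:i] for i in range(len(word))]

print(prefixes('WORD'))   # ['', 'W', 'WO', 'WOR']
```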

06 Readwordlist (answer)

(My answer:

def readwordlist(filename):
    """Read the words from a file and return a set of the words 
    and a set of the prefixes."""
    file = open(filename) # opens file
    text = file.read()