Project 2: CS 61A Autocorrected Typing Software

Latest recommended article published 2023-10-18 22:36:09

Balaaam

Category: CS61A. Tags: learning, python

Article link: https://blog.csdn.net/weixin_50697073/article/details/128793988

Autocorrected Typing Software

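1. Introduction

This is the second project of CS 61A, Fall 2022.
The goal is to implement a program that measures typing speed, plus an autocorrect feature that tries to fix the spelling of a word after the user types it.

Most of the work happens in cats.py, which holds the body of the project. The string utility functions in utils.py are also used.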

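2. Phase 1: Typing

Phase 1 implements the typing accuracy check and the speed measurement. It has four problems. Each problem's description below is quoted from the project handout.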

2.1 pick

Throughout the project, we will be making changes to functions in cats.py.

Implement pick. This function selects which paragraph the user will type. It takes three parameters:

a list of paragraphs (strings)
a select function, which returns True for paragraphs that can be selected
a non-negative index k
The pick function returns the kth paragraph for which select returns True. If no such paragraph exists (because k is too large), then pick returns the empty string.

Problem 1 implements pick, which returns the k-th paragraph among those in paragraphs that satisfy select. It exercises basic list usage.

def pick(paragraphs, select, k):
    """Return the Kth paragraph from PARAGRAPHS for which SELECT called on the
    paragraph returns True. If there are fewer than K such paragraphs, return
    the empty string.

    Arguments:
        paragraphs: a list of strings
        select: a function that returns True for paragraphs that can be selected
        k: an integer

    >>> ps = ['hi', 'how are you', 'fine']
    >>> s = lambda p: len(p) <= 4
    >>> pick(ps, s, 0)
    'hi'
    >>> pick(ps, s, 1)
    'fine'
    >>> pick(ps, s, 2)
    ''
    """
    # BEGIN PROBLEM 1
    "*** YOUR CODE HERE ***"
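    selected = []  # Collect the paragraphs that satisfy select.
    for p in paragraphs:  # Avoid naming the loop variable str, which shadows the built-in.
        if select(p):
            selected += [p]
    if k >= len(selected):  # Not enough matching paragraphs: return the empty string.
        return ''
    return selected[k]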
    # END PROBLEM 1
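
As a more compact alternative, the same selection can be sketched with a list comprehension. This is only one possible phrasing, and the name short is a hypothetical select function used for illustration.

```python
def pick(paragraphs, select, k):
    """Return the k-th paragraph for which select returns True, or ''."""
    selected = [p for p in paragraphs if select(p)]
    # Index only when k is in range; otherwise fall back to the empty string.
    return selected[k] if k < len(selected) else ''


ps = ['hi', 'how are you', 'fine']
short = lambda p: len(p) <= 4  # hypothetical select function
print(pick(ps, short, 0))
print(pick(ps, short, 1))
print(repr(pick(ps, short, 2)))
```

The comprehension filters first and indexes afterwards, so the bounds check only has to look at the filtered list.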

2.2 about

Implement about, which takes a list of topic words. It returns a function which takes a paragraph and returns a boolean indicating whether that paragraph contains any of the words in topic.

Once we’ve implemented about, we’ll be able to pass the returned function to pick as the select argument, which will be useful as we continue to implement our typing test.

To be able to make this comparison accurately, you will need to ignore case (that is, assume that uppercase and lowercase letters don’t change what word it is) and punctuation in the paragraph. Additionally, only check for exact matches of the words in topic in the paragraph, not substrings. For example, “dogs” is not a match for the word “dog”.

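This problem implements about, a function that returns another function. The returned function takes a paragraph and returns a boolean indicating whether the paragraph contains any of the words in topic. It exercises lists, string helper functions, the in operator, and higher-order functions.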

def about(topic):
    """Return a select function that returns whether
    a paragraph contains one of the words in TOPIC.

    Arguments:
        topic: a list of words related to a subject

    >>> about_dogs = about(['dog', 'dogs', 'pup', 'puppy'])
    >>> pick(['Cute Dog!', 'That is a cat.', 'Nice pup!'], about_dogs, 0)
    'Cute Dog!'
    >>> pick(['Cute Dog!', 'That is a cat.', 'Nice pup.'], about_dogs, 1)
    'Nice pup.'
    """
    assert all([lower(x) == x for x in topic]), 'topics should be lowercase.'
    # BEGIN PROBLEM 2
    "*** YOUR CODE HERE ***"
    def select(paragraph):
        paragraph = split(lower(remove_punctuation(paragraph)))  # Strip punctuation, lowercase everything, then split into words.
        for s in paragraph:
            if s in topic:
                return True
        return False
    return select
    # END PROBLEM 2
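
To try the idea outside the project, here is a minimal self-contained sketch. It assumes that remove_punctuation simply drops ASCII punctuation; the stand-in below is written for illustration and is not the actual utils.py code.

```python
import string


def remove_punctuation(s):
    # Stand-in for the utils.py helper (assumed behavior): drop ASCII punctuation.
    return s.translate(str.maketrans('', '', string.punctuation))


def about(topic):
    topic_words = set(topic)  # set membership tests are fast on average

    def select(paragraph):
        words = remove_punctuation(paragraph).lower().split()
        # Whole-word comparison: 'dogs' does not match the topic word 'dog'.
        return any(w in topic_words for w in words)

    return select


about_dogs = about(['dog', 'dogs', 'pup', 'puppy'])
print(about_dogs('Cute Dog!'))
print(about_dogs('That is a cat.'))
```

Because the paragraph is split into words before the comparison, substring matches are ruled out without any extra work.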

2.3 accuracy

Implement accuracy, which takes a typed paragraph and a source paragraph. It returns the percentage of words in typed that exactly match the corresponding words in source. Case and punctuation must match as well. “Corresponding” here means that two words must occur at the same indices in typed and source—the first words of both must match, the second words of both must match, and so on.

A word in this context is any sequence of characters separated from other words by whitespace, so treat “dog;” as a single word.

If typed is longer than source, then the extra words in typed that have no corresponding word in source are all incorrect.

If both typed and source are empty, then the accuracy is 100.0. If typed is empty but source is not empty, then the accuracy is zero. If typed is not empty but source is empty, then the accuracy is zero.

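This problem implements accuracy, which takes a typed paragraph (the input) and a source paragraph (the target). It returns the percentage of words in typed that exactly match the word at the same position in source. Pay attention to the edge cases listed above. It again exercises lists and for statements.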

def accuracy(typed, source):
    """Return the accuracy (percentage of words typed correctly) of TYPED
    when compared to the prefix of SOURCE that was typed.

    Arguments:
        typed: a string that may contain typos
        source: a string without errors

    >>> accuracy('Cute Dog!', 'Cute Dog.')
    50.0
    >>> accuracy('A Cute Dog!', 'Cute Dog.')
    0.0
    >>> accuracy('cute Dog.', 'Cute Dog.')
    50.0
    >>> accuracy('Cute Dog. I say!', 'Cute Dog.')
    50.0
    >>> accuracy('Cute', 'Cute Dog.')
    100.0
    >>> accuracy('', 'Cute Dog.')
    0.0
    >>> accuracy('', '')
    100.0
    """
    typed_words = split(typed)
    source_words = split(source)
    # BEGIN PROBLEM 3
    "*** YOUR CODE HERE ***"
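    if not typed_words and not source_words:  # Both contain no words.
        return 100.0
    elif not typed_words or not source_words:  # Exactly one contains no words.
        return 0.0
    # Checking the word lists (not the raw strings) also covers whitespace-only
    # input, which would otherwise cause a division by zero below.
    total = 0
    length = min(len(typed_words), len(source_words))
    for i in range(length):
        if typed_words[i] == source_words[i]:
            total += 1
    return 100 * total / len(typed_words)  # Percentage of typed words that match.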
    # END PROBLEM 3
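
The position-by-position comparison can also be sketched with zip, which stops at the shorter list, so extra typed words count as wrong automatically. This version uses str.split directly, on the assumption that it behaves like the project's split helper.

```python
def accuracy(typed, source):
    typed_words, source_words = typed.split(), source.split()
    if not typed_words and not source_words:
        return 100.0
    if not typed_words or not source_words:
        return 0.0
    # zip pairs words at the same index; True counts as 1 in sum.
    correct = sum(t == s for t, s in zip(typed_words, source_words))
    return 100 * correct / len(typed_words)


print(accuracy('Cute Dog!', 'Cute Dog.'))
print(accuracy('Cute Dog. I say!', 'Cute Dog.'))
```

Dividing by the number of typed words (not source words) is what makes a correct prefix such as 'Cute' score 100.0 against 'Cute Dog.'.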

2.4 wpm (words per minute)

Implement wpm, which computes the words per minute, a measure of typing speed, given a string typed and the amount of elapsed time in seconds. Despite its name, words per minute is not based on the number of words typed, but instead the number of groups of 5 characters, so that a typing test is not biased by the length of words. The formula for words per minute is the ratio of the number of characters (including spaces) typed divided by 5 (a typical word length) to the elapsed time in minutes.

For example, the string "I am glad!" contains three words and ten characters (not including the quotation marks). The words per minute calculation uses 2 as the number of words typed (because 10 / 5 = 2). If someone typed this string in 30 seconds (half a minute), their speed would be 4 words per minute.

This problem implements wpm, which computes typing speed in words per minute. It is fairly simple.

def wpm(typed, elapsed):
    """Return the words-per-minute (WPM) of the TYPED string.

    Arguments:
        typed: an entered string
        elapsed: an amount of time in seconds

    >>> wpm('hello friend hello buddy hello', 15)
    24.0
    >>> wpm('0123456789',60)
    2.0
    """
    assert elapsed > 0, 'Elapsed time must be positive'
    # BEGIN PROBLEM 4
    "*** YOUR CODE HERE ***"
    return len(typed) / 5 * 60 / elapsed
    # END PROBLEM 4
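
The formula can be spelled out step by step. The sketch below names the intermediate quantities (words and minutes are illustrative variable names) and reproduces the "I am glad!" example from the problem description.

```python
def wpm(typed, elapsed):
    assert elapsed > 0, 'Elapsed time must be positive'
    words = len(typed) / 5   # one "word" is a group of 5 characters
    minutes = elapsed / 60   # elapsed is given in seconds
    return words / minutes


# 10 characters -> 2 words; 30 seconds -> 0.5 minutes; 2 / 0.5 = 4 wpm
print(wpm('I am glad!', 30))
```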

2. Phase 2: Autocorrect

Phase 2 implements autocorrect and mainly tests fluency with recursion.

2.1 Autocorrect

Implement autocorrect, which takes a typed_word, a word_list, a diff_function, and a limit.

If the typed_word is contained inside the word_list, autocorrect returns that word.

Otherwise, autocorrect returns the word from word_list that has the lowest difference from the provided typed_word based on the diff_function. However, if the lowest difference between typed_word and any of the words in word_list is greater than limit, then typed_word is returned instead.

A diff function takes in three arguments. The first is the typed_word, the second is the source word (in this case, a word from word_list), and the third argument is the limit. The output of the diff function, which is a number, represents the amount of difference between the two strings.

In short, autocorrect uses diff_function to find the word in word_list that differs least from the given typed_word.

The main thing to watch in this problem is how min (or max) combines with the key parameter.

def autocorrect(typed_word, word_list, diff_function, limit):
    """Returns the element of WORD_LIST that has the smallest difference
    from TYPED_WORD. Instead returns TYPED_WORD if that difference is greater
    than LIMIT.

    Arguments:
        typed_word: a string representing a word that may contain typos
        word_list: a list of strings representing source words
        diff_function: a function quantifying the difference between two words
        limit: a number

    >>> ten_diff = lambda w1, w2, limit: 10 # Always returns 10
    >>> autocorrect("hwllo", ["butter", "hello", "potato"], ten_diff, 20)
    'butter'
    >>> first_diff = lambda w1, w2, limit: (1 if w1[0] != w2[0] else 0) # Checks for matching first char
    >>> autocorrect("tosting", ["testing", "asking", "fasting"], first_diff, 10)
    'testing'
    """
    # BEGIN PROBLEM 5
    "*** YOUR CODE HERE ***"
    if typed_word in word_list:  # An exact match needs no correction.
        return typed_word

    def diff(word):
        return diff_function(typed_word, word, limit)

    closest = min(word_list, key=diff)  # On ties, min returns the earliest word.
    if diff(closest) > limit:  # Even the closest word is too different.
        return typed_word
    return closest
    # END PROBLEM 5
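
The behavior of min with a key function is worth seeing in isolation. In this small sketch, ten_diff comes from the docstring above, while length_gap is a hypothetical key made up for illustration.

```python
words = ["butter", "hello", "potato"]

# Every word gets the same key, so min returns the first element.
ten_diff = lambda w1, w2, limit: 10
tied = min(words, key=lambda w: ten_diff("hwllo", w, 20))
print(tied)

# min returns the element itself, not the key value it was ranked by.
length_gap = lambda w: abs(len(w) - len("hi"))  # hypothetical key function
nearest = min(["hello", "hey", "yo"], key=length_gap)
print(nearest)
```

The tie-breaking rule explains the first doctest of autocorrect: with a constant diff function, the first word in word_list wins.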

2.2 feline_fixes

Implement feline_fixes, which is a diff function that takes two strings. It returns the minimum number of characters that must be changed in the typed word in order to transform it into the source word. If the strings are not of equal length, the difference in lengths is added to the total.

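Problem 6 and Problem 7 each implement a diff function. Problem 6 counts the positions at which the two strings differ, while Problem 7 computes the minimum number of edits needed to reach the goal word. Both must be implemented recursively.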

def feline_fixes(typed, source, limit):
    """A diff function for autocorrect that determines how many letters
    in TYPED need to be substituted to create SOURCE, then adds the difference in
    their lengths and returns the result.

    Arguments:
        typed: a starting word
        source: a string representing a desired goal word
        limit: a number representing an upper bound on the number of chars that must change

    >>> big_limit = 10
    >>> feline_fixes("nice", "rice", big_limit)    # Substitute: n -> r
    1
    >>> feline_fixes("range", "rungs", big_limit)  # Substitute: a -> u, e -> s
    2
    >>> feline_fixes("pill", "pillage", big_limit) # Don't substitute anything, length difference of 3.
    3
    >>> feline_fixes("roses", "arose", big_limit)  # Substitute: r -> a, o -> r, s -> o, e -> s, s -> e
    5
    >>> feline_fixes("rose", "hello", big_limit)   # Substitute: r->h, o->e, s->l, e->l, length difference of 1.
    5
    """
    # BEGIN PROBLEM 6
    min_length = min(len(typed), len(source))
    def func(i, total):  # i: current index; total: mismatches found so far.
        if i == min_length:  # Base case 1: reached the end of the shorter string.
            return total + abs(len(typed) - len(source))
        elif total > limit:  # Base case 2: already over the limit, so stop early.
            return limit + 1
        elif typed[i] != source[i]:  # Mismatch: count it and move to the next index.
            return func(i + 1, total + 1)
        else:  # Match: move to the next index with total unchanged.
            return func(i + 1, total)
    return func(0, 0)  # Start at index 0 with no mismatches.
    # END PROBLEM 6
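
For checking the recursive version by hand, a non-recursive sketch of the same count is handy. It deliberately ignores limit and would not be accepted as a solution, since the problem requires recursion; the name feline_fixes_iterative is made up for this example.

```python
def feline_fixes_iterative(typed, source):
    # Count differing characters over the common prefix length...
    mismatches = sum(a != b for a, b in zip(typed, source))
    # ...then add the difference in lengths.
    return mismatches + abs(len(typed) - len(source))


print(feline_fixes_iterative("range", "rungs"))
print(feline_fixes_iterative("pill", "pillage"))
```

Whenever the true count does not exceed limit, the recursive function above should return the same number as this sketch.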

2.3 minimum_mewtations

Implement minimum_mewtations, which is a diff function that returns the minimum number of edit operations needed to transform the start word into the goal word.

There are three kinds of edit operations, with some examples:

Add a letter to start.
Remove a letter from start.
Substitute a letter in start for another.

Each edit operation contributes 1 to the difference between two words.

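Implement minimum_mewtations, a diff function that returns the minimum number of edit operations needed to turn the start word into the goal word. It mainly exercises string slicing, divide-and-conquer thinking in recursion, and tree recursion.

Core idea: when the first characters differ, minimum edits = 1 + min(edits left after one add, edits left after one remove, edits left after one substitute).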

def minimum_mewtations(start, goal, limit):
    """A diff function that computes the edit distance from START to GOAL.
    This function takes in a string START, a string GOAL, and a number LIMIT.
    Arguments:
        start: a starting word
        goal: a goal word
        limit: a number representing an upper bound on the number of edits
    >>> big_limit = 10
    >>> minimum_mewtations("cats", "scat", big_limit)       # cats -> scats -> scat
    2
    >>> minimum_mewtations("purng", "purring", big_limit)   # purng -> purrng -> purring
    2
    >>> minimum_mewtations("ckiteus", "kittens", big_limit) # ckiteus -> kiteus -> kitteus -> kittens
    3
    """
    if not start and not goal:  # Base case 1: both start and goal are empty.
        # BEGIN
        "*** YOUR CODE HERE ***"
        return 0
        # END
    elif not start or not goal:  # Base case 2: exactly one is empty, so add or remove the rest.
        return abs(len(start) - len(goal))
    elif limit < 0:  # Base case 3: the edit budget is used up.
        # BEGIN
        "*** YOUR CODE HERE ***"
        # With a non-negative starting limit, limit is -1 here, so this returns 0.
        # The +1 added by each caller already pushes the total past the original limit.
        return limit + 1
        # END
    elif start[0] == goal[0]:  # First characters match: recurse on the rest, limit unchanged.
        return minimum_mewtations(start[1:], goal[1:], limit)
    else:  # Try each of the three operations, each costing one edit.
        add = minimum_mewtations(start, goal[1:], limit - 1)  # Add goal[0] to the front of start.
        remove = minimum_mewtations(start[1:], goal, limit - 1)  # Remove start[0].
        substitute = minimum_mewtations(start[1:], goal[1:], limit - 1)  # Replace start[0] with goal[0].
        # BEGIN
        "*** YOUR CODE HERE ***"
        return min(add, remove, substitute) + 1  # Keep the cheapest option.
        # END
        # END
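
The same recurrence can be sketched without the limit argument by using indices and memoization. This is a cross-check under those assumptions, not the project's required form; edit_distance and dist are names chosen for this example, and functools.lru_cache is a swapped-in technique that the original solution does not use.

```python
from functools import lru_cache


def edit_distance(start, goal):
    @lru_cache(maxsize=None)
    def dist(i, j):
        # Minimum edits needed to turn start[i:] into goal[j:].
        if i == len(start):
            return len(goal) - j   # add the remaining goal letters
        if j == len(goal):
            return len(start) - i  # remove the remaining start letters
        if start[i] == goal[j]:
            return dist(i + 1, j + 1)
        add = dist(i, j + 1)
        remove = dist(i + 1, j)
        substitute = dist(i + 1, j + 1)
        return 1 + min(add, remove, substitute)

    return dist(0, 0)


print(edit_distance("cats", "scat"))
print(edit_distance("ckiteus", "kittens"))
```

Memoizing on (i, j) avoids recomputing the same suffix pairs, which the plain tree recursion does many times. The limit parameter in the project serves a similar purpose by cutting off branches that are already too expensive.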

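3. Phase 3: Multiplayer

Phase 3 also has three problems. It implements a typing race in which several users compete at the same time.
Personally, I found Phase 2 the most interesting part of this project. Phase 3 mainly exercises lists and dictionaries, along with the closure property of lists (lists can contain other lists). Only the last problem is a bit involved, so I will just paste the code.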

3.1 report_progress

def report_progress(typed, prompt, user_id, upload):
    """Upload a report of your id and progress so far to the multiplayer server.
    Returns the progress so far.

    Arguments:
        typed: a list of the words typed so far
        prompt: a list of the words in the typing prompt
        user_id: a number representing the id of the current user
        upload: a function used to upload progress to the multiplayer server

    >>> print_progress = lambda d: print('ID:', d['id'], 'Progress:', d['progress'])
    >>> # The above function displays progress in the format ID: __, Progress: __
    >>> print_progress({'id': 1, 'progress': 0.6})
    ID: 1 Progress: 0.6
    >>> typed = ['how', 'are', 'you']
    >>> prompt = ['how', 'are', 'you', 'doing', 'today']
    >>> report_progress(typed, prompt, 2, print_progress)
    ID: 2 Progress: 0.6
    0.6
    >>> report_progress(['how', 'aree'], prompt, 3, print_progress)
    ID: 3 Progress: 0.2
    0.2
    """
    # BEGIN PROBLEM 8
    "*** YOUR CODE HERE ***"
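    total = 0
    for typed_word, prompt_word in zip(typed, prompt):  # zip stops at the shorter list, so no index error.
        if typed_word != prompt_word:  # Stop counting at the first incorrect word.
            break
        total += 1
    progress = total / len(prompt)  # Fraction of the prompt typed correctly so far.
    upload({'id': user_id, 'progress': progress})
    return progress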
    # END PROBLEM 8

3.2 time_per_word

def time_per_word(words, times_per_player):
    """Given timing data, return a match dictionary, which contains a
    list of words and the amount of time each player took to type each word.

    Arguments:
        words: a list of words, in the order they are typed.
        times_per_player: A list of lists of timestamps including the time
                          the player started typing, followed by the time
                          the player finished typing each word.

    >>> p = [[75, 81, 84, 90, 92], [19, 29, 35, 36, 38]]
    >>> match = time_per_word(['collar', 'plush', 'blush', 'repute'], p)
    >>> match["words"]
    ['collar', 'plush', 'blush', 'repute']
    >>> match["times"]
    [[6, 3, 6, 2], [10, 6, 1, 2]]
    """
    # BEGIN PROBLEM 9
    "*** YOUR CODE HERE ***"
    def get_time(timestamps):
        # The time for each word is the gap between consecutive timestamps.
        return [timestamps[i] - timestamps[i - 1] for i in range(1, len(timestamps))]
    times = [get_time(player) for player in times_per_player]
    return match(words, times)
    # END PROBLEM 9
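
The consecutive-difference step can be tried on its own. The sketch below pairs each timestamp with its successor using zip; word_times is a hypothetical helper name, and the data comes from the docstring above.

```python
def word_times(timestamps):
    # Pair each timestamp with the next one and take the difference.
    return [end - start for start, end in zip(timestamps, timestamps[1:])]


print(word_times([75, 81, 84, 90, 92]))
print(word_times([19, 29, 35, 36, 38]))
```

A list of n + 1 timestamps always yields n durations, one per word.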

3.3 fastest_words

def fastest_words(match):
    """Return a list of lists of which words each player typed fastest.

    Arguments:
        match: a match dictionary as returned by time_per_word.

    >>> p0 = [5, 1, 3]
    >>> p1 = [4, 1, 6]
    >>> fastest_words(match(['Just', 'have', 'fun'], [p0, p1]))
    [['have', 'fun'], ['Just']]
    >>> p0  # input lists should not be mutated
    [5, 1, 3]
    >>> p1
    [4, 1, 6]
    """
    player_indices = range(len(get_all_times(match)))  # contains an *index* for each player
    word_indices = range(len(get_all_words(match)))    # contains an *index* for each word
    # BEGIN PROBLEM 10
    "*** YOUR CODE HERE ***"
    all_times = get_all_times(match)

    def fastest_player(w):
        # Index of the player with the smallest time for word w.
        # min returns the first minimum, so ties go to the earlier player.
        # Using min avoids a fixed sentinel such as 1000, which fails for larger times.
        return min(player_indices, key=lambda p: all_times[p][w])

    result = [[] for _ in player_indices]  # One list of words per player.
    for w in word_indices:
        result[fastest_player(w)].append(get_word(match, w))
    return result
    # END PROBLEM 10
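
To see the grouping logic without the match abstraction, here is a sketch on plain lists. The function name fastest_words_plain and the layout times[p][w] (time of player p on word w) are assumptions made for this example.

```python
def fastest_words_plain(words, times):
    """words: list of str; times[p][w]: time player p took on word w."""
    result = [[] for _ in times]  # one list per player
    for w, word in enumerate(words):
        # Ties go to the earlier player because min returns the first minimum.
        fastest = min(range(len(times)), key=lambda p: times[p][w])
        result[fastest].append(word)
    return result


p0 = [5, 1, 3]
p1 = [4, 1, 6]
print(fastest_words_plain(['Just', 'have', 'fun'], [p0, p1]))
```

Both players typed 'have' in 1 unit of time, and the word is credited to player 0, which matches the expected doctest output. The input lists are only read, never modified.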

4. Testing

=====================================================================
Assignment: Project 2: Cats
OK, version v1.18.1
=====================================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Scoring tests

---------------------------------------------------------------------
Problem 1
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 2
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 3
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 4
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 5
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 6
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 7
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 8
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 9
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Problem 10
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed

---------------------------------------------------------------------
Point breakdown
    Problem 1: 1.0/1
    Problem 2: 1.0/1
    Problem 3: 2.0/2
    Problem 4: 1.0/1
    Problem 5: 2.0/2
    Problem 6: 3.0/3
    Problem 7: 3.0/3
    Problem 8: 2.0/2
    Problem 9: 2.0/2
    Problem 10: 2.0/2

Score:
    Total: 19.0