Autocorrected Typing Software
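1. Introduction
This is the second project of CS 61A, Fall 2022.
The project is a program that measures typing speed. It also adds autocorrect, a feature that tries to fix a word's spelling after the user has typed it.
Almost all the work happens in cats.py, which holds the main body of the project. It also uses the string helper functions in utils.py.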
2. Phase 1: Typing
Phase 1 implements typing accuracy checking and speed measurement. It has four problems. Their descriptions below are quoted from the project spec.
2.1 pick
Throughout the project, we will be making changes to functions in cats.py.
Implement pick. This function selects which paragraph the user will type. It takes three parameters:
- a list of paragraphs (strings)
- a select function, which returns True for paragraphs that can be selected
- a non-negative index k
The pick function returns the kth paragraph for which select returns True. If no such paragraph exists (because k is too large), then pick returns the empty string.
Problem 1 returns the kth paragraph in paragraphs that satisfies the select condition. It exercises basic list operations.
def pick(paragraphs, select, k):
    """Return the Kth paragraph from PARAGRAPHS for which SELECT called on the
    paragraph returns True. If there are fewer than K such paragraphs, return
    the empty string.
    Arguments:
        paragraphs: a list of strings
        select: a function that returns True for paragraphs that can be selected
        k: an integer
    >>> ps = ['hi', 'how are you', 'fine']
    >>> s = lambda p: len(p) <= 4
    >>> pick(ps, s, 0)
    'hi'
    >>> pick(ps, s, 1)
    'fine'
    >>> pick(ps, s, 2)
    ''
    """
    # BEGIN PROBLEM 1
    "*** YOUR CODE HERE ***"
    ps = []  # Paragraphs that satisfy select, in their original order.
    for p in paragraphs:  # Named p rather than str to avoid shadowing the built-in.
        if select(p):
            ps += [p]
    if k >= len(ps):  # No kth matching paragraph: return the empty string.
        return ''
    return ps[k]
    # END PROBLEM 1
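For comparison, the same selection can be written with a list comprehension. This is a self-contained sketch rather than the official solution; the name pick_sketch is hypothetical so that it does not clash with the project's pick.

```python
def pick_sketch(paragraphs, select, k):
    """Return the kth paragraph for which select returns True, or '' if there is none."""
    # Keep only the paragraphs that satisfy the predicate, in their original order.
    selected = [p for p in paragraphs if select(p)]
    # An index past the end means there is no kth match.
    return selected[k] if k < len(selected) else ''

ps = ['hi', 'how are you', 'fine']
short = lambda p: len(p) <= 4
print(pick_sketch(ps, short, 0))        # hi
print(pick_sketch(ps, short, 1))        # fine
print(repr(pick_sketch(ps, short, 2)))  # ''
```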
2.2 about
Implement about, which takes a list of topic words. It returns a function which takes a paragraph and returns a boolean indicating whether that paragraph contains any of the words in topic.
Once we’ve implemented about, we’ll be able to pass the returned function to pick as the select argument, which will be useful as we continue to implement our typing test.
To be able to make this comparison accurately, you will need to ignore case (that is, assume that uppercase and lowercase letters don’t change what word it is) and punctuation in the paragraph. Additionally, only check for exact matches of the words in topic in the paragraph, not substrings. For example, “dogs” is not a match for the word “dog”.
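This problem implements about, which returns a function. The returned function takes a paragraph and returns a boolean indicating whether the paragraph contains any word in topic. It exercises lists, string helpers, the in operator, and higher-order functions.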
def about(topic):
    """Return a select function that returns whether
    a paragraph contains one of the words in TOPIC.
    Arguments:
        topic: a list of words related to a subject
    >>> about_dogs = about(['dog', 'dogs', 'pup', 'puppy'])
    >>> pick(['Cute Dog!', 'That is a cat.', 'Nice pup!'], about_dogs, 0)
    'Cute Dog!'
    >>> pick(['Cute Dog!', 'That is a cat.', 'Nice pup.'], about_dogs, 1)
    'Nice pup.'
    """
    assert all([lower(x) == x for x in topic]), 'topics should be lowercase.'
    # BEGIN PROBLEM 2
    "*** YOUR CODE HERE ***"
    def select(paragraph):
        # Strip punctuation, lowercase the paragraph, and split it into words
        # (lower, split and remove_punctuation come from utils.py).
        words = split(lower(remove_punctuation(paragraph)))
        for w in words:
            if w in topic:  # Whole-word match only, so 'dogs' does not match 'dog'.
                return True
        return False
    return select
    # END PROBLEM 2
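The solution above relies on lower, split and remove_punctuation from utils.py. Below is a self-contained approximation that swaps those helpers for standard-library string methods; it assumes they behave like str.lower, str.split and deleting string.punctuation, which may not match utils.py exactly. The name about_sketch is hypothetical.

```python
import string

def about_sketch(topic):
    """Return a predicate that checks whether a paragraph mentions any topic word."""
    # Translation table that deletes every ASCII punctuation character.
    strip_punct = str.maketrans('', '', string.punctuation)
    topic_words = set(topic)  # Set membership avoids scanning the list for every word.

    def select(paragraph):
        words = paragraph.translate(strip_punct).lower().split()
        # Whole-word comparison: 'dogs' does not match the topic word 'dog'.
        return any(w in topic_words for w in words)

    return select

about_dogs = about_sketch(['dog', 'pup'])
print(about_dogs('Cute Dog!'))       # True
print(about_dogs('Nice dogs.'))      # False: 'dogs' is not in the topic list
print(about_dogs('That is a cat.'))  # False
```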
2.3 accuracy
Implement accuracy, which takes a typed paragraph and a source paragraph. It returns the percentage of words in typed that exactly match the corresponding words in source. Case and punctuation must match as well. “Corresponding” here means that two words must occur at the same indices in typed and source: the first words of both must match, the second words of both must match, and so on.
A word in this context is any sequence of characters separated from other words by whitespace, so treat “dog;” as a single word.
If typed is longer than source, then the extra words in typed that have no corresponding word in source are all incorrect.
If both typed and source are empty, then the accuracy is 100.0. If typed is empty but source is not empty, then the accuracy is zero. If typed is not empty but source is empty, then the accuracy is zero.
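This problem implements accuracy, which takes a typed paragraph (the input) and a source paragraph (the target). It returns the percentage of words in typed that exactly match the corresponding words in source, so the matching details above matter. It also exercises lists and for statements.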
def accuracy(typed, source):
    """Return the accuracy (percentage of words typed correctly) of TYPED
    when compared to the prefix of SOURCE that was typed.
    Arguments:
        typed: a string that may contain typos
        source: a string without errors
    >>> accuracy('Cute Dog!', 'Cute Dog.')
    50.0
    >>> accuracy('A Cute Dog!', 'Cute Dog.')
    0.0
    >>> accuracy('cute Dog.', 'Cute Dog.')
    50.0
    >>> accuracy('Cute Dog. I say!', 'Cute Dog.')
    50.0
    >>> accuracy('Cute', 'Cute Dog.')
    100.0
    >>> accuracy('', 'Cute Dog.')
    0.0
    >>> accuracy('', '')
    100.0
    """
    typed_words = split(typed)
    source_words = split(source)
    # BEGIN PROBLEM 3
    "*** YOUR CODE HERE ***"
    # Check the word lists rather than the raw strings: a whitespace-only
    # typed string has no words and would otherwise cause a division by zero.
    if not typed_words and not source_words:  # Both empty: 100% accuracy.
        return 100.0
    elif not typed_words or not source_words:  # Exactly one is empty: 0% accuracy.
        return 0.0
    total = 0
    # Extra typed words beyond the end of source are never counted as correct.
    length = min(len(typed_words), len(source_words))
    for i in range(length):
        if typed_words[i] == source_words[i]:
            total += 1
    return 100 * total / len(typed_words)  # Percentage of typed words that are correct.
    # END PROBLEM 3
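A more compact formulation pairs the words with zip, which stops at the shorter list, so extra typed words automatically count as incorrect. This sketch uses str.split in place of the utils.py split helper, assuming both split on whitespace the same way; accuracy_sketch is a hypothetical name.

```python
def accuracy_sketch(typed, source):
    """Percentage of typed words that match source word for word (case and punctuation count)."""
    typed_words, source_words = typed.split(), source.split()
    if not typed_words and not source_words:
        return 100.0  # Nothing to type and nothing typed.
    if not typed_words or not source_words:
        return 0.0  # Exactly one side is empty.
    # zip stops at the shorter list, so unmatched extra typed words add nothing.
    correct = sum(t == s for t, s in zip(typed_words, source_words))
    return 100 * correct / len(typed_words)

print(accuracy_sketch('Cute Dog!', 'Cute Dog.'))         # 50.0
print(accuracy_sketch('Cute Dog. I say!', 'Cute Dog.'))  # 50.0
print(accuracy_sketch('', ''))                           # 100.0
```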
2.4 wpm (words per minute)
Implement wpm, which computes the words per minute, a measure of typing speed, given a string typed and the amount of elapsed time in seconds. Despite its name, words per minute is not based on the number of words typed, but instead the number of groups of 5 characters, so that a typing test is not biased by the length of words. The formula for words per minute is the ratio of the number of characters (including spaces) typed divided by 5 (a typical word length) to the elapsed time in minutes.
For example, the string "I am glad!" contains three words and ten characters (not including the quotation marks). The words per minute calculation uses 2 as the number of words typed (because 10 / 5 = 2). If someone typed this string in 30 seconds (half a minute), their speed would be 4 words per minute.
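The worked example above can be checked with a short sketch that keeps the two steps of the formula separate (characters to "words", then seconds to minutes). The name wpm_sketch is hypothetical; the project's wpm below folds the same arithmetic into one expression.

```python
def wpm_sketch(typed, elapsed):
    """Words per minute: (characters / 5) divided by the elapsed time in minutes."""
    words = len(typed) / 5  # Every 5 characters, spaces included, count as one word.
    minutes = elapsed / 60  # Convert seconds to minutes.
    return words / minutes

print(wpm_sketch('I am glad!', 30))  # 4.0: 10 characters -> 2 words in half a minute
```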
This problem implements wpm, which computes typing speed in words per minute. It is fairly simple.
def wpm(typed, elapsed):
    """Return the words-per-minute (WPM) of the TYPED string.
    Arguments:
        typed: an entered string
        elapsed: an amount of time in seconds
    >>> wpm('hello friend hello buddy hello', 15)
    24.0
    >>> wpm('0123456789',60)
    2.0
    """
    assert elapsed > 0, 'Elapsed time must be positive'
    # BEGIN PROBLEM 4
    "*** YOUR CODE HERE ***"
    # (characters / 5) words, divided by (elapsed / 60) minutes.
    return len(typed) / 5 * 60 / elapsed
    # END PROBLEM 4
3. Phase 2: Autocorrect
Phase 2 implements autocorrect and mainly tests how comfortable you are with recursion.
3.1 autocorrect
Implement autocorrect, which takes a typed_word, a word_list, a diff_function, and a limit.
If the typed_word is contained inside the word_list, autocorrect returns that word.
Otherwise, autocorrect returns the word from word_list that has the lowest difference from the provided typed_word based on the diff_function. However, if the lowest difference between typed_word and any of the words in word_list is greater than limit, then typed_word is returned instead.
A diff function takes in three arguments. The first is the typed_word, the second is the source word (in this case, a word from word_list), and the third argument is the limit. The output of the diff function, which is a number, represents the amount of difference between the two strings.
In other words, autocorrect returns the word in word_list with the smallest diff_function value relative to typed_word. The main thing to watch in this problem is using min (or max) together with its key parameter.
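The min/key idiom in isolation: min(iterable, key=f) returns the element with the smallest f value, and when several elements tie it returns the first one in iteration order; max behaves symmetrically. The length-gap key below is a toy stand-in for a real diff function, used only for illustration.

```python
words = ['butter', 'hello', 'potato']
typed_word = 'hwllo'

def length_gap(word):
    """Toy key for illustration only: how far a word's length is from typed_word's."""
    return abs(len(word) - len(typed_word))

# min compares the words by length_gap(word), not alphabetically.
print(min(words, key=length_gap))    # hello (gap 0)
# When all keys tie, min returns the first element in list order.
print(min(words, key=lambda w: 10))  # butter
# max with key is symmetric: 'butter' and 'potato' both have length 6, and the first wins.
print(max(words, key=len))           # butter
```

The tie-breaking rule is why the first autocorrect doctest, where the diff function always returns 10, expects 'butter'.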
def autocorrect(typed_word, word_list, diff_function, limit):
    """Returns the element of WORD_LIST that has the smallest difference
    from TYPED_WORD. Instead returns TYPED_WORD if that difference is greater
    than LIMIT.
    Arguments:
        typed_word: a string representing a word that may contain typos
        word_list: a list of strings representing source words
        diff_function: a function quantifying the difference between two words
        limit: a number
    >>> ten_diff = lambda w1, w2, limit: 10 # Always returns 10
    >>> autocorrect("hwllo", ["butter", "hello", "potato"], ten_diff, 20)
    'butter'
    >>> first_diff = lambda w1, w2, limit: (1 if w1[0] != w2[0] else 0) # Checks for matching first char
    >>> autocorrect("tosting", ["testing", "asking", "fasting"], first_diff, 10)
    'testing'
    """
    # BEGIN PROBLEM 5
    "*** YOUR CODE HERE ***"
    diff_list = []
    for s in word_list:  # Difference between typed_word and each candidate word.
        diff_list += [diff_function(typed_word, s, limit)]
    # Keep typed_word if it is already a valid word or no word is within limit.
    if typed_word in word_list or min(diff_list) > limit:
        return typed_word
    def func(word):
        return diff_function(typed_word, word, limit)
    return min(word_list, key=func)  # The word with the smallest difference (first one on ties).
    # END PROBLEM 5
3.2 feline_fixes
Implement feline_fixes, which is a diff function that takes two strings. It returns the minimum number of characters that must be changed in the typed word in order to transform it into the source word. If the strings are not of equal length, the difference in lengths is added to the total.
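Problems 6 and 7 each implement a function that measures how different two words are, and both must be implemented recursively. Problem 6 counts the positions at which the two strings have different characters, while Problem 7 counts the minimum number of edits needed to turn one string into the other.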
def feline_fixes(typed, source, limit):
    """A diff function for autocorrect that determines how many letters
    in TYPED need to be substituted to create SOURCE, then adds the difference in
    their lengths and returns the result.
    Arguments:
        typed: a starting word
        source: a string representing a desired goal word
        limit: a number representing an upper bound on the number of chars that must change
    >>> big_limit = 10
    >>> feline_fixes("nice", "rice", big_limit)    # Substitute: n -> r
    1
    >>> feline_fixes("range", "rungs", big_limit)  # Substitute: a -> u, e -> s
    2
    >>> feline_fixes("pill", "pillage", big_limit) # Don't substitute anything, length difference of 3.
    3
    >>> feline_fixes("roses", "arose", big_limit)  # Substitute: r -> a, o -> r, s -> o, e -> s, s -> e
    5
    >>> feline_fixes("rose", "hello", big_limit)   # Substitute: r->h, o->e, s->l, e->l, length difference of 1.
    5
    """
    # BEGIN PROBLEM 6
    min_length = min(len(typed), len(source))
    def func(i, total):  # i: current index; total: mismatches found so far.
        if i == min_length:  # Base case 1: reached the end of the shorter string.
            total += abs(len(typed) - len(source))  # Add the length difference.
        elif total > limit:  # Base case 2: already over limit, so stop early.
            total = limit + 1  # Any value above limit makes autocorrect reject the word.
        elif typed[i] != source[i]:  # Characters differ: count one substitution.
            return func(i + 1, total + 1)
        else:  # Characters match: advance without changing total.
            return func(i + 1, total)
        return total
    return func(0, 0)  # Start at index 0 with no mismatches.
    # END PROBLEM 6
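The problem requires recursion, but an iterative version makes a handy reference for checking the recursive feline_fixes above whenever limit is large enough not to cut the count short. This is a testing aid under that assumption, not a valid submission; substitutions_reference is a hypothetical name.

```python
def substitutions_reference(typed, source):
    """Mismatched positions plus the length difference, with no limit cut-off."""
    # zip pairs characters up to the end of the shorter string.
    mismatches = sum(a != b for a, b in zip(typed, source))
    return mismatches + abs(len(typed) - len(source))

for typed, source in [("nice", "rice"), ("range", "rungs"), ("pill", "pillage"),
                      ("roses", "arose"), ("rose", "hello")]:
    print(typed, source, substitutions_reference(typed, source))
```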
3.3 minimum_mewtations
Implement minimum_mewtations, which is a diff function that returns the minimum number of edit operations needed to transform the start word into the goal word.
There are three kinds of edit operations:
- Add a letter to start.
- Remove a letter from start.
- Substitute a letter in start for another.
Each edit operation contributes 1 to the difference between two words.
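This problem implements minimum_mewtations, a diff function that returns the minimum number of edit operations needed to transform the start word into the goal word. It exercises string slicing, divide-and-conquer thinking in recursion, and tree recursion.
Core idea: if the first characters of start and goal match, drop both and recurse with the same limit; otherwise, the minimum number of edits = 1 + min(edits after an add, edits after a remove, edits after a substitute).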
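This recurrence is the classic edit (Levenshtein) distance. For comparison only, here is a version without the limit cut-off that uses functools.lru_cache to memoize each (start, goal) pair, so the three-way tree recursion does not recompute overlapping subproblems. This is a swapped-in technique, not the project's approach, which instead prunes the tree with limit; edit_distance is a hypothetical name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def edit_distance(start, goal):
    """Minimum number of add/remove/substitute edits that turn start into goal."""
    if not start or not goal:
        return len(start) + len(goal)  # Add or remove every remaining character.
    if start[0] == goal[0]:
        return edit_distance(start[1:], goal[1:])  # Matching characters need no edit.
    add = edit_distance(start, goal[1:])             # Insert goal[0] before start.
    remove = edit_distance(start[1:], goal)          # Delete start[0].
    substitute = edit_distance(start[1:], goal[1:])  # Replace start[0] with goal[0].
    return 1 + min(add, remove, substitute)

print(edit_distance("cats", "scat"))        # 2
print(edit_distance("purng", "purring"))    # 2
print(edit_distance("ckiteus", "kittens"))  # 3
```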
def minimum_mewtations(start, goal, limit):
    """A diff function that computes the edit distance from START to GOAL.
    This function takes in a string START, a string GOAL, and a number LIMIT.
    Arguments:
        start: a starting word
        goal: a goal word
        limit: a number representing an upper bound on the number of edits
    >>> big_limit = 10
    >>> minimum_mewtations("cats", "scat", big_limit)       # cats -> scats -> scat
    2
    >>> minimum_mewtations("purng", "purring", big_limit)   # purng -> purrng -> purring
    2
    >>> minimum_mewtations("ckiteus", "kittens", big_limit) # ckiteus -> kiteus -> kitteus -> kittens
    3
    """
    if not start and not goal:  # Fill in the condition
        # BEGIN
        "*** YOUR CODE HERE ***"
        return 0  # Base case 1: both strings are empty, so no edits are needed.
        # END
    elif not start or not goal:  # Base case 2: exactly one string is empty.
        return abs(len(start) - len(goal))  # Add or remove every remaining character.
    elif limit < 0:  # Feel free to remove or add additional cases
        # BEGIN
        "*** YOUR CODE HERE ***"
        # Base case 3: the edit budget is used up. Reached through recursion,
        # limit is -1, so this returns 0 and the caller's 1 + 0 already
        # exceeds the caller's limit of 0.
        return limit + 1
        # END
    elif start[0] == goal[0]:  # First characters match: no edit, same limit.
        return minimum_mewtations(start[1:], goal[1:], limit)
    else:  # Try each of the three edits; each one costs 1.
        add = minimum_mewtations(start, goal[1:], limit - 1)  # Fill in these lines
        # add: insert goal[0] at the front of start.
        remove = minimum_mewtations(start[1:], goal, limit - 1)  # remove: delete start[0].
        substitute = minimum_mewtations(start[1:], goal[1:], limit - 1)  # substitute: replace start[0] with goal[0].
        # BEGIN
        "*** YOUR CODE HERE ***"
        return min(add, remove, substitute) + 1  # Cheapest option plus this edit.
        # END
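4. Phase 3: Multiplayer
Phase 3 also has three problems. It implements a multiplayer mode in which several users race each other in the same typing test.
In my view, Phase 2 is the most interesting part of this project. Phase 3 mainly exercises lists and dictionaries, along with the closure property of lists (a list can contain other lists). Only the last problem is somewhat involved, so I will just post the code.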
4.1 report_progress
def report_progress(typed, prompt, user_id, upload):
    """Upload a report of your id and progress so far to the multiplayer server.
    Returns the progress so far.
    Arguments:
        typed: a list of the words typed so far
        prompt: a list of the words in the typing prompt
        user_id: a number representing the id of the current user
        upload: a function used to upload progress to the multiplayer server
    >>> print_progress = lambda d: print('ID:', d['id'], 'Progress:', d['progress'])
    >>> # The above function displays progress in the format ID: __, Progress: __
    >>> print_progress({'id': 1, 'progress': 0.6})
    ID: 1 Progress: 0.6
    >>> typed = ['how', 'are', 'you']
    >>> prompt = ['how', 'are', 'you', 'doing', 'today']
    >>> report_progress(typed, prompt, 2, print_progress)
    ID: 2 Progress: 0.6
    0.6
    >>> report_progress(['how', 'aree'], prompt, 3, print_progress)
    ID: 3 Progress: 0.2
    0.2
    """
    # BEGIN PROBLEM 8
    "*** YOUR CODE HERE ***"
    total = 0
    # Count the correct words at the start of typed, stopping at the first mistake.
    # zip stops at the shorter list, so typed may safely be longer than prompt.
    for typed_word, prompt_word in zip(typed, prompt):
        if typed_word != prompt_word:
            break
        total += 1
    progress = total / len(prompt)
    upload({'id': user_id, 'progress': progress})
    return progress
    # END PROBLEM 8
4.2 time_per_word
def time_per_word(words, times_per_player):
    """Given timing data, return a match dictionary, which contains a
    list of words and the amount of time each player took to type each word.
    Arguments:
        words: a list of words, in the order they are typed.
        times_per_player: A list of lists of timestamps including the time
                          the player started typing, followed by the time
                          the player finished typing each word.
    >>> p = [[75, 81, 84, 90, 92], [19, 29, 35, 36, 38]]
    >>> match = time_per_word(['collar', 'plush', 'blush', 'repute'], p)
    >>> match["words"]
    ['collar', 'plush', 'blush', 'repute']
    >>> match["times"]
    [[6, 3, 6, 2], [10, 6, 1, 2]]
    """
    # BEGIN PROBLEM 9
    "*** YOUR CODE HERE ***"
    def get_time(lst):  # Differences between one player's consecutive timestamps.
        tmp = []
        for i in range(1, len(lst)):
            tmp += [lst[i] - lst[i - 1]]
        return tmp
    times = []
    for i in range(len(times_per_player)):
        times += [get_time(times_per_player[i])]
    return match(words, times)  # Build the match dictionary with the project's match().
    # END PROBLEM 9
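The get_time helper above takes differences between consecutive timestamps. The same step can be sketched with zip(lst, lst[1:]), which pairs each timestamp with the next one. This sketch works on plain lists rather than the project's match dictionary; durations is a hypothetical name.

```python
def durations(timestamps):
    """Time spent on each word: differences between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

times_per_player = [[75, 81, 84, 90, 92], [19, 29, 35, 36, 38]]
print([durations(ts) for ts in times_per_player])  # [[6, 3, 6, 2], [10, 6, 1, 2]]
```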
4.3 fastest_words
def fastest_words(match):
    """Return a list of lists of which words each player typed fastest.
    Arguments:
        match: a match dictionary as returned by time_per_word.
    >>> p0 = [5, 1, 3]
    >>> p1 = [4, 1, 6]
    >>> fastest_words(match(['Just', 'have', 'fun'], [p0, p1]))
    [['have', 'fun'], ['Just']]
    >>> p0  # input lists should not be mutated
    [5, 1, 3]
    >>> p1
    [4, 1, 6]
    """
    player_indices = range(len(get_all_times(match)))  # contains an *index* for each player
    word_indices = range(len(get_all_words(match)))  # contains an *index* for each word
    # BEGIN PROBLEM 10
    "*** YOUR CODE HERE ***"
    def player_helper(lst, i):  # Words whose fastest player (recorded in lst) is i.
        res = []
        for w in word_indices:
            if lst[w] == i:
                res += [get_word(match, w)]
        return res
    def get_min_index(lst):  # Index of the smallest time; ties keep the lower index.
        min_index = 0
        min_time = float('inf')  # Any real time beats infinity; a fixed 1000 could be too small.
        for i in player_indices:
            if lst[i] < min_time:
                min_index = i
                min_time = lst[i]
        return min_index
    # word_times[w][p]: time player p took on word w (the transpose of get_all_times).
    # Named word_times so it cannot clash with a time() helper.
    word_times = []
    tmp = []
    for i in word_indices:
        for j in player_indices:
            tmp += [get_all_times(match)[j][i]]
        word_times += [tmp]
        tmp = []
    index_list = []  # index_list[w]: index of the fastest player for word w.
    for i in word_indices:
        index_list += [get_min_index(word_times[i])]
    result = []
    for i in player_indices:
        result += [player_helper(index_list, i)]
    return result
    # END PROBLEM 10
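Without the match dictionary, Problem 10 reduces to a per-word argmin over players. The sketch below works on plain lists, assuming a hypothetical layout where times[p][w] is the time player p took on word w (mirroring what get_all_times returns). It uses min with key, which keeps the first minimal index and therefore resolves ties in favour of the lower-indexed player; fastest_words_sketch is a hypothetical name.

```python
def fastest_words_sketch(words, times):
    """Group each word under the player who typed it fastest (ties go to the lower index)."""
    players = range(len(times))
    result = [[] for _ in players]  # One list of words per player.
    for w, word in enumerate(words):
        # min returns the first minimal player index, so ties favour the lower index.
        fastest = min(players, key=lambda p: times[p][w])
        result[fastest].append(word)
    return result

p0, p1 = [5, 1, 3], [4, 1, 6]
print(fastest_words_sketch(['Just', 'have', 'fun'], [p0, p1]))  # [['have', 'fun'], ['Just']]
```

Only new lists are built, so the input time lists are left unmodified.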
5. Tests
=====================================================================
Assignment: Project 2: Cats
OK, version v1.18.1
=====================================================================
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Scoring tests
---------------------------------------------------------------------
Problem 1
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 2
Passed: 2
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 3
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 4
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 5
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 6
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 7
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 8
Passed: 1
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 9
Passed: 2
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Problem 10
Passed: 2
Failed: 0
[ooooooooook] 100.0% passed
---------------------------------------------------------------------
Point breakdown
Problem 1: 1.0/1
Problem 2: 1.0/1
Problem 3: 2.0/2
Problem 4: 1.0/1
Problem 5: 2.0/2
Problem 6: 3.0/3
Problem 7: 3.0/3
Problem 8: 2.0/2
Problem 9: 2.0/2
Problem 10: 2.0/2
Score:
Total: 19.0