Functions: File Reading/Writing and Character Frequency Counting (4), by 卖女孩的小火柴

Article link: https://blog.csdn.net/weixin_53307519/article/details/114314186

004 Part 1: Approach


What sets this problem apart is that it works with more than one file:

one opened for reading ('r')
one opened for writing ('w')

fi = open("小女孩.txt","r")
fo = open("PY301-1.txt","w")

Store the file's contents in the variable txt:

txt = fi.read()
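
As a small illustration of this step, the sketch below writes a throwaway file and reads it back. The file name sample.txt, its contents, and the explicit encoding="utf-8" are assumptions for the demo; the original code relies on the platform's default encoding, which may or may not match how 小女孩.txt was saved.

```python
import os
import tempfile

# Hypothetical demo file; not the exam's 小女孩.txt
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("火柴\n火")

fi = open(path, "r", encoding="utf-8")
txt = fi.read()      # the whole file as a single str
fi.close()

chars = list(txt)    # iterating over a str yields one character at a time
print(chars)
```

Because read() returns one string, the later for loop visits individual characters, including the newline.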

Create an empty dictionary, and store the common punctuation marks (mostly Chinese) in exclude.

d = {}
exclude = "，。！？、（）【】<>《》=：+-*—“”…"

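Process the text character by character: if the character is one of the punctuation marks in exclude, skip it; otherwise, add one to its count.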

for word in txt:
    if word in exclude:
        continue
    else:
        d[word] = d.get(word,0)+1
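
For comparison, the standard library's collections.Counter can perform the same tally. This is an alternative technique, not what the original solution uses; the sample string and the shortened exclude set are made up for the demo.

```python
from collections import Counter

txt = "火柴，火柴！小女孩。"   # made-up sample text
exclude = "，。！"            # shortened punctuation set for the demo

# Manual tally, as in the walkthrough
d = {}
for word in txt:
    if word in exclude:
        continue
    d[word] = d.get(word, 0) + 1

# Counter-based tally over the same filtered characters
c = Counter(ch for ch in txt if ch not in exclude)

print(d == dict(c))
```

The d.get(word, 0) form has the advantage of needing no import, which may matter in an exam setting.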

Convert the dictionary to a list and sort it in descending order of count (as the problem requires).
Then write the first tuple's data to PY301-1.txt using str.format():

fo.write("{}:{}".format(ls[0][0],ls[0][1]))
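
The conversion and sort described above only appear in the full listing. A minimal sketch on a made-up dictionary (the counts are illustrative, not taken from 小女孩.txt):

```python
d = {"她": 3, "的": 7, "了": 5}   # made-up counts

ls = list(d.items())                       # list of (character, count) tuples
ls.sort(key=lambda x: x[1], reverse=True)  # highest count first

line = "{}:{}".format(ls[0][0], ls[0][1])  # same format string as the write above
print(line)
```

After sorting, ls[0] is the most frequent character together with its count.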

Finally, remember to close both files.
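
One way to avoid forgetting close() is a with block, which closes the files when the block ends, even if an exception is raised. This is a different style from the original code; the file names below are placeholders created in a temporary directory.

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "in.txt")    # placeholder input file
dst = os.path.join(tmp, "out.txt")   # placeholder output file

with open(src, "w", encoding="utf-8") as f:
    f.write("abc")

# Both files are closed automatically when the with block exits
with open(src, "r", encoding="utf-8") as fi, open(dst, "w", encoding="utf-8") as fo:
    fo.write(fi.read().upper())

print(fi.closed, fo.closed)
```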

Full code:

fi = open("小女孩.txt","r")
fo = open("PY301-1.txt","w")
txt = fi.read()
d = {}
exclude = "，。！？、（）【】<>《》=：+-*—“”…"
for word in txt:
    if word in exclude:
        continue
    else:
        d[word] = d.get(word,0)+1
ls = list(d.items())
ls.sort(key=lambda x:x[1],reverse=True)
# The next line prints the first ten (character, count) tuples for inspection
# print(ls[:10])
fo.write("{}:{}".format(ls[0][0],ls[0][1]))
fo.close()
fi.close()

Part 2: Approach


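Note: the basic approach is very similar to Part 1. This is also an exam tip: an earlier sub-question can serve as a reference for the next one.
The key requirement of this problem is that the output must not contain the newline character. There are two ways to handle that: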

Use an if/else branch to skip the newline character ('\n'):

# This is the key to getting the problem right!
if word == "\n":
    continue
else:
    d[word] = d.get(word,0) + 1

Use dictionary deletion to remove the key. A key and its value stand or fall together, so deleting the key deletes the corresponding value as well:

del d["\n"]
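
One caveat: del d["\n"] raises KeyError if the text happens to contain no newline. A hedged alternative is dict.pop with a default, which removes the key when it is present and does nothing otherwise. The sample dictionary is made up.

```python
d = {"a": 2, "\n": 1}   # made-up counts

d.pop("\n", None)   # key present: removed together with its value
d.pop("\n", None)   # key already gone: no error

try:
    del d["\n"]     # del on a missing key raises KeyError
    raised = False
except KeyError:
    raised = True

print(d, raised)
```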

The for loop here controls the number of iterations so that the program writes out the first ten characters. It cannot be written like this:

for i in ls[:10]:
    fo.write(ls[i][0])
# This raises an error:
Traceback (most recent call last):
  File "D:\KSWJJ\66000001\PY301-2.py", line 18, in <module>
    fo.write(ls[i][0])
TypeError: list indices must be integers or slices, not tuple
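
The error arises because for i in ls[:10] binds i to each (character, count) tuple rather than to an integer index. If iterating over the slice is preferred, the tuple can be used directly. A sketch with a made-up list, collecting the output in a list instead of writing to fo:

```python
ls = [("的", 7), ("了", 5), ("她", 3)]   # made-up, already sorted

out = []
for ch, count in ls[:10]:   # each item is a tuple, unpacked here
    out.append(ch)          # in the real program: fo.write(ch)

print("".join(out))
```

Slicing with [:10] is also safe when the list has fewer than ten entries, whereas range(10) with ls[i] would raise IndexError in that case.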

Full code 1:

fi = open("小女孩.txt","r")
fo = open("PY301-2.txt","w")
txt = fi.read()
d = {}
for word in txt:
    if word == "\n":
        continue
    else:
        d[word] = d.get(word,0) + 1
ls = list(d.items())
ls.sort(key=lambda x:x[1], reverse=True) # sort by count, highest first
for i in range(10):
    fo.write(ls[i][0])
fo.close()
fi.close()

Full code 2:

fi = open("小女孩.txt","r")
fo = open("PY301-2.txt","w")
txt = fi.read()
d = {}
for word in txt:
    d[word] = d.get(word,0) + 1
del d["\n"]
ls = list(d.items())
ls.sort(key=lambda x:x[1], reverse=True) # sort by count, highest first
for i in range(10):
    fo.write(ls[i][0])
fo.close()
fi.close()

19:57, March 5, 2021:

Part 3: Approach


fi = open("小女孩.txt","r")
fo = open("小女孩-频次排序.txt","w")
txt = fi.read()
d = {}
for word in txt:
    if word == '\n' or word == ' ':
        continue
    else:
        d[word] = d.get(word,0) + 1
ls = list(d.items())
ls.sort(key=lambda x:x[1], reverse=True) # sort by count, highest first
for i in ls[:9]:
    fo.write("{}:{},".format(i[0],i[1]))
fo.write('{}:{}'.format(ls[9][0],ls[9][1]))
fo.close()
fi.close()

Reference answer:

fi = open("小女孩.txt","r")
fo = open("小女孩-频次排序.txt","w")
txt = fi.read()
d = {}
for word in txt:
    d[word] = d.get(word,0)+1
del d[" "]
del d["\n"]
ls = list(d.items())
ls.sort(key=lambda x:x[1], reverse=True) # sort by count, highest first
for i in range(len(ls)):
    ls[i] = "{}:{}".format(ls[i][0],ls[i][1])
fo.write(",".join(ls))
fi.close()
fo.close()

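The difference between my answer and the reference answer is in how the separators are handled. The reference answer uses ",".join(), which places commas only between items and so avoids the hassle of writing the first nine entries with a trailing comma and the last one separately. Note that, as written, the reference answer joins every entry in ls, while my version writes only the first ten.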

# Reference answer:
for i in range(len(ls)):
    ls[i] = "{}:{}".format(ls[i][0],ls[i][1])
fo.write(",".join(ls))
 
# My answer:
for i in ls[:9]:
    fo.write("{}:{},".format(i[0],i[1]))
fo.write('{}:{}'.format(ls[9][0],ls[9][1]))
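
If only the first ten entries are wanted, the two ideas can be combined: slice first, then join. A sketch under that assumption, with made-up data and the result kept in a string rather than written to fo:

```python
ls = [("的", 7), ("了", 5), ("她", 3)]   # made-up, already sorted

# join puts a comma between items only, so no trailing comma needs special handling
text = ",".join("{}:{}".format(ch, n) for ch, n in ls[:10])
print(text)
```

In the real program the final step would be fo.write(text).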