Reading stop words from a txt file into a list in Python

Latest recommended article published on 2024-05-12 16:30:17

Am最温柔

Category: Graduation thesis related

Article link: https://blog.csdn.net/weixin_43919570/article/details/104302735

I ran into the following line of code while reading a stop word list, so I am recording how I worked through it:

# Read the stop word list
stopword_list = [k.strip() for k in open('stopwords.txt', encoding='utf8').readlines() if k.strip() != '']

The line is a bit long. It relies on three pieces of Python: list comprehensions, readlines(), and strip(), each covered in turn below.

A list comprehension takes one of two forms, without or with a filter condition:

[expr for iter_var in iterable]
[expr for iter_var in iterable if cond_expr]

Example:

# -*- coding: UTF-8 -*-
list1 = [x * x for x in range(1, 11)]
print(list1)

Output:

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
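
The example above only uses the first form. Since the stop word line relies on the second form (the one with `if cond_expr`), here is a minimal sketch of it. The variable names and sample strings are made up for illustration.

```python
# Second form: [expr for iter_var in iterable if cond_expr]
# Keep only the squares of the even numbers from 1 to 10.
even_squares = [x * x for x in range(1, 11) if x % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]

# The same filter applied to strings: drop entries that are empty after stripping.
raw = ["the\n", "\n", "  of  \n", ""]
cleaned = [s.strip() for s in raw if s.strip() != ""]
print(cleaned)  # ['the', 'of']
```

The condition is evaluated for each element first, and `expr` is only computed for elements that pass it.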

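readlines() reads every line of the file and returns them as a list, one element per line, with each element keeping its trailing newline. Because the whole file is loaded at once, it uses a lot of memory on large files.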

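with open("a.txt") as f:
    lines = f.readlines()  # a list with one string per line, each keeping its '\n'
print(type(lines))
for line in lines:
    print(line, end="")  # the line already ends with a newline

Output:

<class 'list'>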
Hello
Welcome
What is the fuck...
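
To make the two points about readlines() concrete (each element keeps its newline, and the whole file is loaded at once), here is a small sketch. It writes a throwaway temp file so it can run on its own. The file contents and variable names are assumptions for the demo, not the author's a.txt.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as tmp:
    tmp.write("Hello\nWelcome\n")
    path = tmp.name

# readlines() loads every line at once; each element keeps its '\n'.
with open(path, encoding="utf8") as f:
    all_lines = f.readlines()
print(all_lines)  # ['Hello\n', 'Welcome\n']

# Iterating over the file object yields one line at a time instead,
# so the full list never has to sit in memory.
line_count = 0
with open(path, encoding="utf8") as f:
    for line in f:
        line_count += 1
print(line_count)  # 2

os.remove(path)
```

For a stop word file of a few thousand lines either approach is fine. The line-by-line loop mainly matters for very large files.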

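The Python strip() method removes the given characters from the beginning and end of a string. The argument is treated as a set of characters to remove, not as a fixed prefix or suffix.

Note: strip() only removes characters at the start or end of the string, never characters in the middle.

Called without an argument, strip() removes whitespace from both ends, for example \n, \r, \t, and ' '.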
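
A few quick checks of the behaviour described above, as a sketch with made-up sample strings:

```python
# No argument: whitespace (spaces, \t, \r, \n) is removed from both ends.
a = "  stop word \n".strip()
print(repr(a))  # 'stop word'

# A line holding only whitespace becomes the empty string.
b = "\t\r\n".strip()
print(repr(b))  # ''

# Whitespace in the middle is left alone.
c = "a  b".strip()
print(repr(c))  # 'a  b'

# With an argument, any of the listed characters is removed from both ends.
d = "abcHELLOcba".strip("abc")
print(repr(d))  # 'HELLO'
```

The second case is why the stop word line can compare `k.strip()` against `''` to detect blank lines.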

With these three pieces in hand, the line of code at the top is now easy to read.
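
As a cross-check, the one-liner can be unpacked into an explicit loop. This is only a sketch of an equivalent reading: the temp file stands in for stopwords.txt, its contents are hypothetical, and the `with` block is my addition (the original line never closes the file explicitly).

```python
import os
import tempfile

# Hypothetical stop word file contents, written to a temp file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as tmp:
    tmp.write("the\n\n  of \nand\n   \n")
    path = tmp.name

stopword_list = []
with open(path, encoding="utf8") as f:  # `with` closes the file afterwards
    for k in f.readlines():             # every line, '\n' included
        word = k.strip()                # drop surrounding whitespace
        if word != "":                  # skip blank lines
            stopword_list.append(word)

os.remove(path)
print(stopword_list)  # ['the', 'of', 'and']
```

Each step maps onto one part of the comprehension: `k.strip()` is the expression, `open(...).readlines()` is the iterable, and `k.strip() != ''` is the filter condition.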

References:
https://www.runoob.com/python3/python3-string-strip.html
https://blog.csdn.net/AliceGoToAnother/article/details/79119049
