002_006 Python: Processing Every Word in a File

书山登峰人

Article link: https://blog.csdn.net/houyj1986/article/details/21248327

The code is as follows:

#encoding=utf-8

print '中国'

# Process every word in a file, assuming words are separated by whitespace

r''' D:\123.txt contains:
1 a b c 中 国
2 a b c 中 国
'''

# Approach 1: split each line on whitespace
print '------1'
with open(r'd:\123.txt', 'rU') as file_object:
    for line in file_object:
        for word in line.split():
            print word

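# Approach 2: regular expression. On Python 2 byte strings \w matches only
# ASCII letters, digits and underscore, so the Chinese characters are skipped.
print '------2'
import re
re_word = re.compile(r"[\w'-]+")

with open(r'd:\123.txt', 'rU') as file_object:
    for line in file_object:
        for word in re_word.finditer(line):
            print word.group(0)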

# Approach 3: wrap the logic in a generator
print '------3'

def wordsoffile(thefilepath, line_to_words=str.split):
    # The with-block closes the file once the generator finishes or is closed
    with open(thefilepath) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word

for word in wordsoffile(r'd:\123.txt'):
    print word
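
The listing above is Python 2. As a rough sketch of how approach 3 might look on Python 3, the version below keeps the same generator idea. The names `words_of_file`, the `encoding` parameter, and the temporary sample file are assumptions made for illustration and are not part of the original article.

```python
import os
import tempfile


def words_of_file(path, line_to_words=str.split, encoding="utf-8"):
    """Yield every word in the file at `path`, one at a time."""
    # The with-block closes the file once the generator finishes or is closed.
    with open(path, encoding=encoding) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word


# Build a throwaway sample file with the same two lines as the article's 123.txt.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("1 a b c 中 国\n2 a b c 中 国\n")
    sample_path = tmp.name

words = list(words_of_file(sample_path))
os.remove(sample_path)
print(words)
```

Passing a different `line_to_words` callable (for example a function that lowercases and then splits) changes how each line is tokenized without touching the file-handling code.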

Running the Python 2 script above prints the following:

中国
------1
1
a
b
c
中
国
2
a
b
c
中
国
------2
1
a
b
c
2
a
b
c
------3
1
a
b
c
中
国
2
a
b
c
中
国
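
Section 2 of the output lacks 中 and 国 because, on Python 2, the file yields byte strings and `\w` without flags matches only ASCII letters, digits and underscore. On Python 2 the usual remedy is to decode each line to `unicode` and compile the pattern with `re.UNICODE`. The sketch below assumes Python 3 instead, where `str` patterns are Unicode-aware by default, and uses `re.ASCII` to reproduce the ASCII-only behaviour for comparison. The variable names are illustrative only.

```python
import re

line = "1 a b c 中 国"

# Python 3 str patterns are Unicode-aware by default, so \w also matches CJK characters.
unicode_word = re.compile(r"[\w'-]+")
# re.ASCII restricts \w to [a-zA-Z0-9_], mirroring the Python 2 byte-string behaviour.
ascii_word = re.compile(r"[\w'-]+", re.ASCII)

unicode_words = [m.group(0) for m in unicode_word.finditer(line)]
ascii_words = [m.group(0) for m in ascii_word.finditer(line)]

print(unicode_words)  # ['1', 'a', 'b', 'c', '中', '国']
print(ascii_words)    # ['1', 'a', 'b', 'c']
```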