A Python Text-Analysis Demo: Counting the Words That Appear Only Once in a Document

Megustas_JJC

Article link: https://blog.csdn.net/Megustas_JJC/article/details/78640473

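After about a week of learning Python, I am summarizing what I have learned with a small text-processing program. The requirements are:

Measure the length of the file, counted in words
Track the distinct words in the file (each word recorded only once)

The sample text is Abraham Lincoln's 1863 Gettysburg Address.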

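def makeWordList(gFile):
    """Return a list of the lowercased words read from the open file gFile."""
    speech = []
    for lineString in gFile:
        lineList = lineString.split()
        for word in lineList:
            # strip(), not split(): remove leading/trailing periods and commas
            word = word.lower().strip(".,")
            # skip stand-alone "--" tokens, which are punctuation, not words
            if word != "--":
                speech.append(word)
    return speech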

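def makeUnique(speech):
    """Return the distinct words of speech, in order of first appearance."""
    unique = []
    for word in speech:
        # keep only the first occurrence of each word
        if word not in unique:
            unique.append(word)
    return unique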

# Python 3 syntax; the path is specific to the author's machine.
with open("/Users/Megustas/Desktop/gettysburg.txt", "r") as gFile:
    speech = makeWordList(gFile)
print(speech)
print("Speech Length:", len(speech))
unique = makeUnique(speech)
print(unique)
print("Unique Length:", len(unique))
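
A note on makeUnique: the test `word not in unique` rescans the list for every word, so the work grows roughly quadratically with the length of the text. That is fine for a short speech, but a linear-time variant is easy to sketch. The version below assumes Python 3.7 or later, where `dict` keeps insertion order; `make_unique_fast` is a hypothetical name used for illustration and is not part of the original program.

```python
def make_unique_fast(words):
    """Return the distinct items of words, in order of first appearance."""
    # dict keys are unique and (since Python 3.7) keep insertion order,
    # so this removes duplicates in a single pass.
    return list(dict.fromkeys(words))


sample = ["the", "people", "by", "the", "people", "for", "the", "people"]
print(make_unique_fast(sample))  # ['the', 'people', 'by', 'for']
```

For the same input it should return the same list as makeUnique, so it could be swapped in without changing the rest of the script.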

Output:
[Screenshot of the program output]
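
The requirement can also be read more strictly, as "words that occur exactly once in the text", whereas makeUnique returns every distinct word however often it occurs. If the stricter reading is the one wanted, a minimal sketch using `collections.Counter` from the standard library might look like this; `words_seen_once` is a hypothetical helper name, and the sample list is made up for the example rather than taken from the speech file.

```python
from collections import Counter


def words_seen_once(words):
    """Return the words that occur exactly once, in order of appearance."""
    counts = Counter(words)  # maps each word to its number of occurrences
    return [word for word in words if counts[word] == 1]


sample = ["four", "score", "and", "seven", "years", "ago", "and", "four"]
print(words_seen_once(sample))  # ['score', 'seven', 'years', 'ago']
```

Passing the list returned by makeWordList to this function would give the words used only once in the whole speech.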
