A summary of the ways I have read files

weixin_43473864

Category: Code notes

Article link: https://blog.csdn.net/weixin_43473864/article/details/83577799

Reading Excel

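# Read an Excel file (note: xlrd 2.0+ only supports the old .xls format)
import xlrd

workbook = xlrd.open_workbook(filepath)  # filepath: path to the Excel file
table = workbook.sheet_by_index(0)  # contents of the first sheet (indices are 0-based)
datas = []
# Skip the first and second rows
for i in range(2, table.nrows):
    s = table.cell_value(i, 4)  # value in the 5th column
    datas.append(s)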
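
The loop above could be wrapped in a small reusable helper. The following is only a sketch: the names `read_column` and `read_xls_column` and their default arguments are assumptions made for illustration, not part of the original code. It relies only on the `nrows` attribute and the `cell_value(row, col)` method that an xlrd sheet provides.

```python
def read_column(table, col_index, skip_rows=2):
    """Collect the values of one column, skipping the first `skip_rows` rows.

    `table` only needs an `nrows` attribute and a `cell_value(row, col)`
    method, which is what an xlrd sheet object offers.
    """
    return [table.cell_value(row, col_index)
            for row in range(skip_rows, table.nrows)]


def read_xls_column(path, sheet_index=0, col_index=4, skip_rows=2):
    """Open a workbook with xlrd and read one column from one sheet."""
    import xlrd  # imported here so read_column stays usable without xlrd

    workbook = xlrd.open_workbook(path)
    table = workbook.sheet_by_index(sheet_index)  # 0-based sheet index
    return read_column(table, col_index, skip_rows)
```

Keeping the sheet index, column index, and number of skipped rows as parameters makes it harder to confuse 0-based indices with "first sheet" or "fifth column" when the same code is reused on another file.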

Reading txt

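# This txt file is already tokenized: words are separated by spaces, and the first word of each line is the label.
from collections import defaultdict

def get_count(fPath):
    invertedIndex = defaultdict(list)
    docNumber = 0
    text = []
    with open(fPath, 'r', encoding='utf-8') as f:
        for raw_line in f:
            line = raw_line.split()  # list of tokens for this line
            text.append(line)
            lengthOfDocument = len(line)  # document length, i.e. the number of tokens in the line
            docNumber += 1  # running document index, i.e. the line index
            if not line:  # blank lines are still read and counted above, but contribute no terms
                continue
            docIndex = line[0]  # document label

            for term in set(line):
                count = line.count(term)  # occurrences of each distinct term in this line
                invertedIndex[term].append([docIndex, count, lengthOfDocument])
    # The with block closes the file, so no explicit f.close() is needed
    return invertedIndex, text, docNumber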
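
To make the resulting data structure concrete, here is a minimal self-contained sketch that builds the same kind of inverted index from a tiny sample file. The function name `build_inverted_index`, the use of `Counter`, and the sample text are all assumptions made for illustration; it is a compact variant, not the original function.

```python
import os
import tempfile
from collections import Counter, defaultdict


def build_inverted_index(path):
    """Map each term to a list of [label, count, line_length] entries."""
    inverted_index = defaultdict(list)
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            tokens = line.split()  # whitespace-separated tokens
            if not tokens:         # skip blank lines
                continue
            label = tokens[0]      # first token of the line is the label
            for term, count in Counter(tokens).items():
                inverted_index[term].append([label, count, len(tokens)])
    return inverted_index


# Made-up sample: two labelled lines separated by a blank line.
sample = "sports ball game ball\n\nnews game report\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

index = build_inverted_index(sample_path)
os.remove(sample_path)

print(index["ball"])  # [['sports', 2, 4]]
print(index["game"])  # [['sports', 1, 4], ['news', 1, 3]]
```

As in the code above, the label itself is also indexed as a term, because it is part of the token list.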
