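I want to compare the words in multiple .txt files with the words in multiple .xlsx files and find the ones they have in common. This is what the text files look like:
Text file 1: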
(
apples), kkkk, alloal
df, ja_tee
ffgg
hchh.hc
yuytu_member
Text file 2:
(
orange pplL
df
ffgg
hchhhc
yuytu_
Text file 3:
(
pear +lpdlpla;s
df
ffgg
hchhhc
yuytu_
I am trying to extract apples, orange and pear. These values are found in cell A1 of my .xlsx files, i.e.:
.xlsx file 1: cell A1 = apples
.xlsx file 2: cell A1 = orange
.xlsx file 3: cell A1 = pear
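As a rough illustration of the goal, the sketch below splits the first sample text into bare words, so that `apples)` becomes `apples`. The `tokenize` helper, the `SAMPLE_TEXT` constant and the `[A-Za-z_]+` pattern are assumptions made for this example only.

```python
import re

# Content of text file 1, copied from the sample above.
SAMPLE_TEXT = "(\napples), kkkk, alloal\ndf, ja_tee\nffgg\nhchh.hc\nyuytu_member"

def tokenize(text):
    """Return runs of letters and underscores, dropping punctuation and whitespace."""
    return re.findall(r"[A-Za-z_]+", text)

words = tokenize(SAMPLE_TEXT)
print(words)
```

Matching word characters directly avoids leftovers such as `apples)` or empty strings that splitting on a fixed set of separators can produce.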
This is my code so far:
import os, sys
import openpyxl
from openpyxl.reader.excel import load_workbook
import xlwt
from xlwt import Workbook
# The file path that I will be saving my .xls file to:
filepath = 'C:/Users/xxxx/Documents/xxx/abc.xls'
# The .xls workbook that the results are written to:
wb2 = Workbook()
sheet2 = wb2.add_sheet("xyz", cell_overwrite_ok=True)
# The .xlsx files that contain the words I want to compare with the .txt files:
folder_path1 = "C:/Users/xxx/Documents/xxxx/Test python dict"
os.chdir(folder_path1)  # os.chdir() returns None, so the path is kept in its own variable
for file in os.listdir(folder_path1):
    if file.endswith(".xlsx"):
        wb = load_workbook(file, data_only=True)
        ws = wb.active
        cell_range = ws['A1']
        dictionary = cell_range.value
        print(dictionary)  # to check if the words/values are there
# Writing the name of each .txt file to the .xls file:
for r, dir in enumerate(os.listdir("C:/Users/xxx/Documents/xxxx/txt test python")):
    sheet2.write(r+1, 1, dir)
# Reading each .txt file and splitting its text into individual words so that I can compare them with the words from the .xlsx files:
import glob
path = "C:/Users/xxx/Documents/xxxx/txt test python"
from os.path import isfile
files = filter(isfile, glob.glob('%s/*' % path))
for name in files:
    with open(name) as texts:
        data1 = texts.read().strip()
    import re
    data = re.split(r'[,.\n\s]', data1)
    for current_word in data:
        txtwords = current_word
        if txtwords in dictionary:
            print(txtwords)
            #sheet2.write(r+1,2,dir)
# wb2.save(filepath)
However, I cannot get the result I want. My current output is:
apples
orange
pear
pear
Why is only one of the common words printed?
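A likely cause, sketched here with plain strings in place of the workbooks: `dictionary` is reassigned on every pass of the .xlsx loop, so only the last value (`pear`) is left by the time the .txt files are read. In addition, `in` applied to a string tests for a substring, not for a whole word. The list `cell_values` is a stand-in for the three A1 cells and is an assumption of this sketch.

```python
# Stand-in for the A1 values of the three .xlsx files (assumed for illustration).
cell_values = ["apples", "orange", "pear"]

for value in cell_values:
    dictionary = value  # overwritten on each pass, as in the .xlsx loop of the code above

print(dictionary)              # pear
print("apples" in dictionary)  # False: "apples" is not a substring of "pear"
print("ea" in dictionary)      # True: substring test, not a whole-word test

# Collecting the values in a set keeps all of them and gives whole-word lookups.
wanted = set(cell_values)
print("apples" in wanted)      # True
print("ea" in wanted)          # False
```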
Expected output:
apple
orange
pear
apples
orange
pear
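One possible way to get there is sketched below, under the assumptions that every .txt file should be checked against the A1 words of all .xlsx files and that exact word matches are wanted. The function `find_common_words`, the in-memory `texts` dictionary and file names such as `text1.txt` are hypothetical stand-ins for reading the real files.

```python
import re

def find_common_words(texts, wanted_words):
    """Map each text name to the wanted words it contains, in order of appearance."""
    wanted = set(wanted_words)
    result = {}
    for name, text in texts.items():
        found = []
        # Match runs of letters/underscores so that "apples)" yields "apples".
        for word in re.findall(r"[A-Za-z_]+", text):
            if word in wanted and word not in found:
                found.append(word)
        result[name] = found
    return result

# Hypothetical in-memory stand-ins for the three .txt files and the A1 cells.
texts = {
    "text1.txt": "(\napples), kkkk, alloal\ndf, ja_tee\nffgg\nhchh.hc\nyuytu_member",
    "text2.txt": "(\norange pplL\ndf\nffgg\nhchhhc\nyuytu_",
    "text3.txt": "(\npear +lpdlpla;s\ndf\nffgg\nhchhhc\nyuytu_",
}
common = find_common_words(texts, ["apples", "orange", "pear"])
for name, words in common.items():
    print(name, words)
```

With the real files, `texts` could be filled by reading each .txt file, and the list of A1 words by appending `ws['A1'].value` inside the .xlsx loop instead of overwriting a single variable.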