谢谢你的帮助,但我自己解决了这个问题。在
这是我制作的模块。通过调用Sheet('path/to/sheet.xlsx').shapes.text,您现在可以轻松地在文本框中找到文本:import zipfile as z
class Sheet(str):
@property
def shapes(this):
s = z.ZipFile(this)
p='xl/drawings/drawing1.xml' # shapes path, *.xlsx default
p='drs/shapexml.xml' # shapes path, *.xls default
return XML(s.read(p))
class XML(object):
def __init__(self, value):
self.value = str(value)
def __repr__(self):
return repr(self.value)
def __getitem__(self, i):
return self.value[i]
def tag_content(self, tag):
return [XML(i) for i in self.value.split(tag)[1::2]]
@property
def text(self):
t = self.tag_content('xdr:txBody') # list of XML codes, each containing a seperate textboxes, messy (with extra xml that is)
l = [i.tag_content('a:p>') for i in t] # split into sublists by line breaks (inside the textbox), messy
w = [[[h[1:-2] for h in i.tag_content('a:t')] if i else ['\n'] for i in j] for j in l] # clean into sublists by cell-by-cell basis (and mind empty lines)
l = [[''.join(i) for i in j] for j in w] # join lines overlapping multiple cells into one sublist
return ['\n'.join(j) for j in l] # join sublists of lines into strings seperated by newline char
所以现在我的问题中提供的模式将被输出为['comments extra'],而一个模式如:This
is
Text
in
a
textbox
on
a
sheet
And this
is another text box somewhere else
Regardless of the overlapped cells
将作为['This is\nText in a textbox on\na sheet','And this is another text box somewhere else\nRegardless of the overlapped cells']输出。在
不客气。