Windows
Docx
Install the python-docx library:
pip install python-docx
import docx

word = "a.docx"
document = docx.Document(word)
# Iterate over the body paragraphs and print the text of each one
for paragraph in document.paragraphs:
    text = paragraph.text
    print(text)
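To see what python-docx is reading, note that a .docx file is a ZIP package whose main text lives in word/document.xml. The following is a rough, dependency-free sketch (an alternative for illustration, not how the article does it) that pulls paragraph text straight out of that XML. The function name docx_paragraph_texts is hypothetical, and the sketch only collects <w:t> text runs, ignoring tabs, breaks and fields. Unlike python-docx's document.paragraphs, it also picks up paragraphs nested inside tables.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraph_texts(path):
    """Return the text of every <w:p> paragraph in a .docx file.

    Hypothetical helper, a rough sketch: only <w:t> text runs are joined;
    tabs, line breaks, fields and other run content are ignored.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    texts = []
    # iter() walks the whole tree, so paragraphs inside tables are included too
    for p in root.iter(W_NS + "p"):
        texts.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return texts
```

Usage mirrors the python-docx loop:

    for text in docx_paragraph_texts("a.docx"):
        print(text)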
Doc
Install the win32com module (part of pywin32); it only works on Windows:
python -m pip install pypiwin32
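import os
import pythoncom
from win32com import client

# Keep the file path separate from the Word application object;
# Word resolves relative paths against its own working directory, so pass an absolute path
path = os.path.abspath("a.doc")
pythoncom.CoInitialize()
word = client.Dispatch('Word.Application')
word.Visible = 0  # run in the background without showing the window
word.DisplayAlerts = 0  # suppress warning dialogs
try:
    doc = word.Documents.Open(path)
    for para in doc.Paragraphs:
        print(para.Range.Text)
    # FileFormat 2 (wdFormatText) saves the document as plain text
    doc.SaveAs(r'D:\PythonFiles\4paradigm\gdt_flask\file\test.txt', 2)
    doc.Close()
finally:
    # Always shut Word down and release COM, even if opening or saving fails
    word.Quit()
    pythoncom.CoUninitialize()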
Linux
Docx
Install the python-docx library:
pip install python-docx
import docx

word = "a.docx"
document = docx.Document(word)
# Iterate over the body paragraphs and print the text of each one
for paragraph in document.paragraphs:
    text = paragraph.text
    print(text)
Doc
Install antiword
Download: http://www.winfield.demon.nl/linux/antiword-0.37.tar.gz
Extract the archive and enter the directory:
tar -zxvf antiword-0.37.tar.gz
cd antiword-0.37
make && make install
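The install step puts antiword under /root/ (the binaries in /root/bin, the data files in /root/.antiword), so only root can run it. Copy the files into /usr so the command is easy to call: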
cp /root/bin/*antiword /usr/local/bin/
mkdir /usr/share/antiword
cp -R /root/.antiword/* /usr/share/antiword/
chmod 755 /usr/local/bin/*antiword
chmod 755 /usr/share/antiword/*
"""
代码用法
"""
word = "a.doc"
output = subprocess.check_output(["antiword", word])
# 解码
output = output.decode('utf8')
print(output)
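On Linux the two approaches above can be combined behind one entry point that picks the reader by file extension. This is a minimal sketch under stated assumptions: the name read_word_text and its antiword parameter are hypothetical, python-docx is imported only when a .docx is actually read, and antiword's output is assumed to be UTF-8 (undecodable bytes are replaced rather than raising). A missing antiword binary is reported with a clear error instead of a bare FileNotFoundError.

```python
import os
import shutil
import subprocess


def read_word_text(path, antiword="antiword"):
    """Return the plain text of a .doc or .docx file (hypothetical helper).

    .docx is read with python-docx; .doc is passed to the antiword CLI.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".docx":
        import docx  # python-docx, imported lazily so .doc-only setups work without it
        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    if ext == ".doc":
        exe = shutil.which(antiword)
        if exe is None:
            raise RuntimeError(f"{antiword!r} not found on PATH; install antiword first")
        # check_output raises CalledProcessError if antiword exits with an error
        output = subprocess.check_output([exe, path])
        # Assumes UTF-8 output; replace undecodable bytes instead of failing
        return output.decode("utf-8", errors="replace")
    raise ValueError(f"unsupported file type: {ext!r}")
```

For example, print(read_word_text("a.doc")) behaves like the snippet above, and the same call with "a.docx" goes through python-docx.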
Fighter_ma: Weakness and ignorance are not obstacles to survival; arrogance is~