I found several questions that were similar to mine, but none of the answers came close to what I need.
Specifications: I'm working with Python 3 and do not have MS Word. My programming machine is running OS X and cloud machine is linux/ubuntu too.
I'm using python-docx to extract values from a .doc file that is sent to me nightly. However, python-docx only works with .docx files, so I need to convert the file to that extension first.
So, I've got a .doc file that I need to convert to .docx. This script might have to run in the cloud so I can't install any kind of Office or Office-like software. Can this be done?
解决方案
You could use unoconv - Universal Office Converter. Convert between any document format supported by LibreOffice/OpenOffice.
unoconv -d document --format=docx *.doc
subprocess.call(['unoconv', '-d', 'document', '--format=docx', filename])